2026-02-26 00:46:11
This is the first AI Vistas discussion, a new series hosted by Exponential View where I bring people I trust into conversation around one hard question, because together we can see what none of us would see alone.
Two weeks ago, I read something in the financial press about a pattern called a dispersion event. I didn’t fully understand it, so before a short car ride, I asked my OpenClaw agent R Mini Arnold to explain it to me and find the data to evidence it.
By the time I arrived, the agent had pulled thirty years of market data and flagged contradictions in my recent thinking I hadn’t noticed. I adjusted my portfolio on the spot. The decision was mine, but the thinking behind it… it gets blurry. Who’s in charge here – me or my agent?
I wanted to sit with that, and not alone. I convened a conversation with , who wrote The Battle for Your Brain and advised President Obama on the ethics of the human mind; , one of the most cited medical researchers alive with a frontline view into AI implementation in medicine; , engineer and economist and former hedge fund manager now building AI platforms; and , CEO of The Atlantic, who moderated the dialogue.
Watch the full recording here.
Listen on Spotify or Apple Podcasts.
Transcript (lightly edited for clarity and flow) is below.
Nick Thompson: Azeem, you’ve committed thousands of lines of agent code this year, built multiple apps since Christmas. But the car-journey story sounds like you were in control – you asked a question, got an answer, made a call. Give me the version where the blur actually kicks in.
Azeem Azhar: The blur is subtler than the question itself. My agent – I was using OpenClaw at the time – will often say, “In our previous conversations this week, you said X, and that seems to contradict what you’re asking now.” I’d been talking to it about AI bottlenecks and doubling down on the AI trade. Then I asked about downside risk. It spotted the contradiction: on one hand I was worried about a crash, on the other I was thinking of increasing my exposure. It drew my attention to it. We had a conversation, and it reframed the question I was asking.
Nita Farahany: This gets at a fundamental question about what it means to act autonomously. Are you acting in a way consistent with your own desires, or are you being steered by somebody else’s desires? There’s very little we do these days that is steered by our own desires.
I’m teaching a class this semester at Duke on mental privacy, advanced topics in AI law and policy. I ran an attention audit on my students. They recorded how many times they picked up their devices over three days, and what they spent their attention on. Day one: record it, don’t change your behaviour. Day two: no apps that algorithmically steer you — which meant basically nothing was allowed. Day three: do whatever you want, record it again. Day two was remarkable. Somebody read a book. They couldn’t remember the last time they’d done that. These are students. Someone finished a puzzle they’d been convinced they didn’t have time for. And day three? Worse than day one. Utterly sucked back in.
Is Azeem in charge? His behaviour is partly in charge, he told his agent to research a specific question. But the fact that he’s reacting to information it provides, constantly in this loop of notification and response, thinking about what his agents are doing while he’s being driven, it’s deeply interrelated. There is no longer a clear point where Azeem ends and his tools begin. They are an extended part of who he is.
Nick: And immensely hackable. He’s controlling his portfolio based on this. If I were a smart hedge fund, I’d figure out how to hack OpenClaw to steer his investments my way.
Nita: The very fact that Azeem has offloaded the question to his AI tool is the more fundamental question. It’s more subtle, more pervasive, and more universal than someone intentionally manipulating his portfolio.
Rohit Krishnan: One thing that surprises me is that despite these tools being insecure, the number of negative instances is surprisingly rare. The ability to cause harm is rising alongside the ability to do good and yet the bad events stay low. Think about it from a hedge fund’s perspective: where do you put your resources? Is hacking OpenClaw going to generate more alpha than putting that same effort into something else? The answer always seems to be no. So my OpenClaw talks to my email and occasionally answers them. I’m reasonably okay with that threat surface, despite knowing prompt injections exist.
Nick: Eric, does Rohit’s framework – that actual harms trail perceived dangers – hold in medicine?
Eric Topol: I don’t think we know yet. There are several studies now comparing AI alone versus AI with doctors, on various types of clinical performance. And AI did better than the doctors with AI. That wasn’t expected. Everything was supposed to be hybrid. The under-performers are more likely to accept the AI’s input, whereas the experts reject the AI’s good input. And if you extend that to having agentic support, perhaps even more so. We still don’t even know. Is it because doctors have automation bias? Is it because they aren’t grounded in how to use AI? It’s really fuzzy right now. The medical world is not as advanced as a lot of other domains that are adopting AI more quickly, because clinical decision-making is much more tricky and delicate.
Nick: That goes against something I’ve heard you say, Azeem – that humans using AI well should be much stronger than AI alone.
Azeem: What Eric described is a phenomenon we’ve seen across knowledge work. People below average improve with AI. People at the top get worse because they override its suggestions. But I wonder if there’s a U-shaped curve hiding in the data. Below average: you improve. Top quartile: you overthink it. Truly exceptional: you master the machine. Look at Andrej Karpathy, one of the great deep learning engineers, a much better software developer than any of us. He’s handed enormous amounts of his work to AI systems. He’s the ultimate expert but his response is: he’s getting more done, pushing limits much further.
Nita: Let me be the philosopher for a moment. What are we actually trying to measure? In Eric’s domain it’s clearer: did you get the diagnosis right? Is the patient better off? But in other areas, we’re not comparing apples to apples. Take education. Measure whether the essay is better or worse and the essay improves with AI. But measure the learning outcome and the answer flips. The human is worse off, the essay is better. So, is humanity better off with more polished AI-produced essays, or worse off because the people who wrote them didn’t learn?
Nick: So the solution is to let AI make the super consequential decisions – like surgery – but keep humans in the loop for the developmental stuff, like essays?
Nita: Ask the question over the long term. Today, should you use AI for a life-threatening decision? Yes. But carry it out: now you have physicians who no longer have intuition about whether to operate. People who can’t interrogate code because they don’t understand it – they’re pulling the AI slot machine for different output rather than critically evaluating it. Over time, are we better off or worse off? I worry in the long term that we will end up in a much worse position than in the short term.
Nick: So are you saying there are specific cases where the AI would get the surgery right more often, and we know it, and some patients would die from the human getting it wrong – but you’d still prefer that world because at least the humans stay competent?
Nita: That’s an artificial construct, Nick. The real question is what we invest in so that people maintain what I’m calling a constituent of competency over time. How do we do both? Use AI for the consequential decision today, and build the systems that keep humans capable of navigating it in the future. That’s the question nobody is asking.
Rohit: Maybe one thing to consider is whether those physicians were any good at working with AI. Every consequential decision I’ve made in the last three or four years has been with AI. Sometimes I’ve agreed, sometimes I’ve disagreed. But over time you build up a sense of when it’s accurate and when it’s not. These studies aren’t Jeopardy, where an expert gives one answer and AI gives another and you pick one. In the real world there are time constraints, checks and balances, multiple people looking at the same call. That context doesn’t show up in the papers.
Nick: Eric, so maybe the anomaly isn’t something fundamental – it’s just about this moment, where doctors don’t yet understand how to use AI. Once we learn, it’ll be fine?
Eric: That’s the hope, Nick. The medical community likes that explanation best.
Eric: The other issue in medicine we’ve just seen confirmed is de-skilling. There’s now a study of gastroenterologists who let AI find the polyps. Over time, when you turn the AI off, they’re not as good at finding polyps themselves. And then there are the younger doctors coming through who are going to be never skilled in the first place. The hope is that the hybrid everybody expected – human plus AI, better than either alone – eventually wins. But we don’t know.
Azeem: De-skilling is real. And what Nita and Eric are drawing attention to is a gap in education. In these early stages, it’s up to us to figure out our own pedagogy for maintaining skills. I spend a lot of time writing with this (fountain pen), computer turned off. Yesterday I had a session with a colleague where we worked on a problem in silence, on paper, no phones, no computers, for a couple of hours. We do that often in my team. Maybe it’s the wrong thing. But it’s our gesture toward understanding de-skilling – this sloppy vibing that has you moving further and further from your own mental capacities.
Nick: But aren’t you also doing the opposite? You’ve been writing about getting AI to write for you, trying to make it a better stylist. And writing is thinking, if you de-skill your writing, you de-skill your thinking.
Nita: I’m going to push back on that. I’ve heard “writing is thinking” too many times now, and I think it’s crap. When I write, I actually give a talk first. I think in public speaking more than in written form. I think in keynote form, how do I tell this story? Once I’ve told it and interacted with audiences a number of times, I can reduce it to writing. My thinking doesn’t happen primarily through writing. The problem is that if we don’t recognize that each person’s generative capacity works differently – for Azeem it might be the fountain pen, for someone else it’s talking it out, or painting – we end up with blanket rules that miss the point. Once you figure out where your generative constituent of competence lives, that’s the thing you protect from offloading.
Nick: So your advice is: figure out where you do your best thinking, then don’t let AI anywhere near that.
Nita: Exactly. Preserve the core thing that enables you to be flexible across novel situations. Don’t offload all generative thought. An email from me? Probably AI. A lecture recap? Often AI. But a keynote or a novel idea, that’s where my thinking happens. I figure it out in spoken form, then reduce it to writing. Maybe I’ll let AI be my copy editor. But the thinking itself stays with me.
Rohit: The way I look at it is, I have more capabilities at my disposal, and by default I end up spending time on what I actually want to do. It’s endogenous, I end up spending less time on things I care less about and more time on things I care more about. And that shifts. What I hold onto today is different from a year ago or ten years ago, because we change over time. Part of it might be a specific type of writing, part of it might be a specific type of research or public speaking – each one is different for all of us. So I think about it less as “what do I withhold from AI?” and more as: given this agent exists, what can we do together? If I get confidence it works, I hand off more. Same dynamic as having a really good PA – once you understand what they’re good at, you say “you take care of that, I’ll do this.”
Azeem: I’ve had so much more time in what Nita calls the space of generative constitutive competence. I spend more time writing, more time reading. I write many of my Exponential View essays longhand first, read them out, get them transcribed. It reminds me of something Eric introduced to me years ago: the idea that these tools could give doctors time back to be better doctors. In 2026, I’m doing more good thinking than four years ago, when I was battling an inbox and Google searches.
Nick: Azeem, you seem to spend part of your days in 2100 and part of your days in 1450. It’s beautiful to hear.
Eric: I agree it’s deeply personal. If I’m writing a paper or a blog, I feel like using AI to write it would be cheating. It isn’t right to share with other humans if it’s not my product. Maybe I’ll move into 2100 with Azeem at some point. But it’s changing, it’s dynamic. You offload more and more of the things that are just making your life more automated, more practical. Getting time back in medicine is probably the greatest goal, because it’s squeezed so badly. If you can get rid of the data-clerk function, the mundane tasks, you can be a much better doctor. And the patient-doctor relationship can get restored.
Azeem: Here’s what I struggle with, though. I’m still moving at speed. And great thinking comes from attention, observation, chewing things over. Many of our examples have time constraints – see the patient now, ship the code by Friday. But really great thinking has traditionally happened with care and self-reflection. Am I doing the quality of thinking I could do with ten uninterrupted days?
Nita: There’s a way to use AI for that, too. I came across a blog post describing what they call a “potato prompt.” I adopted it for Claude. The idea is that whenever I type “potato” followed by an argument, the AI tells me three ways it could fail, two counter-arguments, and a blind spot I’ve missed. And it has to be hostile, shed all niceties.
The first couple of times, I wanted to shrink into a hole and cry. It’s devastating. But it’s made my thinking so much sharper. Every time I throw up an argument and think, “that was a brilliant insight, Nita,” the potato prompt interrogates it. I still bounce ideas off friends. But that instant, real-time hostile feedback is a way to use AI to challenge you rather than offload.
Rohit: In general, I disagree that education can get us out of this. Educational systems were set up to do something fairly specific. The kids who learn these tools best are the ones who just play with them. That was true of the internet, true of mobile, true of programming.
With AI it’s exacerbated, because it is the easiest thing in the world to use. My eight-year-old accidentally discovered Canva has an AI feature – I didn’t even know. He’s been making little websites about dinosaurs. Nobody taught him. It’s a text box; he knows how to type. We need more whimsy and play, not structured courses.
Nita: Our systems have focused on output rather than core competencies. There’s a German school that trialled a mindfulness programme, a breathing programme. And a nationwide one in the US called Learning to Breathe. They develop interoceptive skills in children: embodiment, grounding, developing intuition. And they’ve shown a direct impact on math scores, because learning to inhabit your own body develops the cognitive capacities that make thinking possible. The tools come later. The capacity for thinking is what education should build.
Azeem: I’ll be the relentless optimist. We haven’t tried this in any consistent way over thirty or forty years. In professional services, we’ve ended up with lawyers who have terrible client interaction patterns and doctors whose bedside manner ranges from warm to terrifying – all without paying attention to these skills. Perhaps now we’ll be forced to. And that’s more of an opportunity than a deep problem.
Eric: We’re definitely training medical students the wrong way. They’re selected for test scores – which AI can already do better on anyway – and grade-point averages, not interpersonal communication skills. And not one medical school out of 160 in the US has AI in the core curriculum. It’s almost like it doesn’t exist. Whether it’s the selection or the education, we haven’t even acknowledged that this is the transformation of medicine.
Azeem: And to what extent are we looking for that high reliability in differential diagnosis, which is actually a mechanistic, algorithmic thing? Machines do that better than people. We’ve needed humans to do it because the knowledge was scarce, it was hard to get into a human skull. Maybe now that it can be offloaded, we develop the other part that delivers patient outcomes: the constitutive awareness, the embodied understanding of what it is to be a person.
Eric: Absolutely. Diagnostic accuracy will be one of the greatest gifts of AI in medicine. We already have good indicators of that. It’s just a matter of deploying it right.
Nick: We’re all cautiously optimistic. But AI is about to become much more powerful. What’s the thing you’re looking for, the development that tips us onto a better or worse track in the next year?
Nita: I’m picking up on something Eric said – that he feels like he’s cheating if he puts AI-written work into the world. That’s the tipping point I’m watching. We’ve been talking about selves: building them, maintaining them. But we exist in an ecosystem of information, the corpus of human knowledge we all draw from. The more of it that becomes synthetic, the less felt human experience is out there. People are trying to track how much internet content is now AI-produced. We don’t know what that does – to children’s cognitive development, to our collective ability to think.
Azeem: There was a moment in the atomic weapons era when you could measure exactly when it started – in atmospheric data, a radioisotope that wasn’t there before and then suddenly was. I think we’re approaching that moment with AI-generated text. If you’re on X, you see essays produced at a rate of knots by very busy people who aren’t writers. I read them because there’s insight, data, argument. But I’m curious whether what looks good in our mind starts to change because of constant exposure, and we won’t notice it’s happened.
Nick: What if the AI slop gets better than what humans produce? What if the average writing quality on X is higher in 2028 than 2023?
Rohit: The average quality is already higher. But people dislike it much more. I use Pangram (a tool for detecting AI-generated text) – it’s everywhere. Not just X. Journalists, published articles. And “slop” is the right word, because it has all the appearance of a fulfilling meal, but it has no calories. It’s fairly hollow. That’s AI doing something really badly. The things it does well – building the apps Azeem uses every day, or that I use – that’s real capability. If I got AI to write one of my articles and then had to rewrite it, that’s more work, not less.
Azeem: AI content also tells us something about how mechanised the production of words had already become. Certain research reports, academic papers, marketing copy, these were essentially algorithms carried out by humans, like the female computers at early NASA executing calculations we now hand to machines. The real writing happens somewhere else, and it’s extremely difficult.
All my efforts to build an AI-enabled style guide, I wanted one that helped people adhere to how I write. It cannot replicate what I do, except in trivial ways: I prefer words with Germanic roots to Latin ones, I write 91% active sentences, I vary sentence length within paragraphs. Those are the surface. What’s missing is what pops out of embodied, lived experience. And I guess the challenge we face is that as these tools become pervasive – and many of the experiences we get, whether it’s Netflix or TikTok or the products in the shops, are mediated through them – my experience is going to be that much more blunted, and therefore that much more aligned with what an external set of AI systems are doing, than what I might have done had I experienced things a little more raw.
Nita: I’ve been looking at the studies on meaning-making. The impact is more on the contributor than on the receiver. Your contribution to the corpus of human knowledge, that is part of the act of becoming. The brain needs that act. When AI replaces it, you don’t experience the same effect. One of the most influential books in my life is Flowers for Algernon. I read it in second grade — way too early, crying the whole time. It took Daniel Keyes fourteen years to write. It started when he was a medical student; later he had an encounter with someone who had intellectual challenges. All of those felt experiences went into the book. Is that different from the ninety minutes Azeem spent distilling his worldview with AI? It may be. We don’t know. But we have to acknowledge that something qualitatively different happens in fourteen years of writing than in ninety minutes of AI-assisted distillation.
If you’re working iteratively with AI to bring your voice and lived experience – using it as a vehicle to channel that – it’s different from someone who generates a LinkedIn post without putting themselves in it at all. There is a qualitative difference.
Azeem: There is a quality of difference. But here’s something I built today. We’ve written millions of words at EV. I wanted to extract our house view – what do I believe about certain things, and how have those beliefs shifted? I had AI run across all those words to build a concept map: the ten or twelve positions I’ve held and how they’ve moved over time. I personally reviewed it, then shared it with the team. The human work of ninety minutes would otherwise have been a thousand hours. I did less than 0.2% of the labour. But I still had the sense I was gifting something useful, usable, and distinctly me.
Rohit: Any time AI gets good enough, we start seeing it as a tool – because it becomes dependable. When it’s at the edge, we see it as the destruction of some intrinsic humanity. We’ve done this in every field: radiology, finance, supply chain. Art feels different because it’s a uniquely human endeavour. But whenever AI gets good enough to do a piece of work we ourselves wanted to do, we’ve generally been happy to hand it off and go write more books like Flowers for Algernon.
Eric: It’s a little bit like Quebec and the rest of Canada. You have to have deliberate intent to preserve your language, otherwise it’s crowded out. If we don’t have deliberate intent to keep AI as a tool, which is where I think it belongs, we’ll lose a part of humanity that is essential.
Azeem: Deliberate intent, that’s it. Deliberate intent about our own capabilities. Deliberate intent that these things are tools and however we interrogate them, however we anthropomorphise them, they are tools, and should stay as such.
Thank you for reading. This is the transcript of our first AI Vistas session. If you find it insightful, please share it widely – that’s the single best way to tell us this format is worth doing again.
2026-02-24 00:59:33
Hi all,
Here’s your Monday round-up of data driving conversations this week in less than 250 words.
Let’s go!
The AI economy ↑ AI capex is estimated to account for 64-80% of US Q4 2025 growth.
Autonomy needs humans ↓ Waymo’s entire remote guidance operation runs on just 70 human operators — a 43:1 car-to-human ratio. GM’s robotaxi service Cruise had 1.5 staff per car.1
Speed war ↑ Canadian startup Taalas’ latest hardware can run AI models nearly 10x faster than the previous state-of-the-art by hard-wiring the model directly onto chips.
New exit routes ↑ Secondary sales2 of private startup shares have grown from 3% of all VC exits in 2015 to 31% today.
2026-02-22 11:59:20
Azeem is the GOAT of AI analysis. – Max P., a paying member
Hi all,
Welcome to the Sunday edition. Today’s email stays unusually tightly focused for us: wages, robots, and token costs – and how, together, they’re beginning to rewire the basic economics of work.
Enjoy!
I’ve been living with my always-on AI agent, R Mini Arnold, for a couple of weeks now. What it’s already made clear is that once you drop the transaction cost of delegation by an order of magnitude, a personalized agent becomes infrastructure for knowledge work – in my case, it’s cleared a backlog of tasks that would otherwise never get done. I write about this in my weekend essay and alongside that reading, my conversation with digs into how we both use agents to write and think:
After years of the “Solow Paradox 2.0”, seeing AI everywhere except in the productivity statistics, early 2026 is an inflection point. argues that we’re seeing “a US productivity increase of roughly 2.7%.” Revised BLS data show “robust GDP growth alongside lower labor input” signals that the economy is transitioning out of the “investment valley” and into the “harvest phase”. points out that recent micro-studies consistently demonstrate significant performance boosts from generative AI, even if not yet widely reflected across the economy.
Most firms remain in shallow adoption. A new study by Nicholas Bloom and colleagues shows that, despite 70% of firms claiming to use AI, senior executives spend on average just 1.5 hours a week with these tools. Around 20% of firms report productivity gains, but the remaining 80% do not, so the aggregate impact on productivity over the past three years is essentially flat.
My view is that these micro gains are now turning into broader gains. Our own data shows that American public firms more frequently cite quantitative success measures when discussing their genAI projects. But I also think these will staccato across the economy, unevenly, dependent on firm leadership, access to capital, workforce capability and other factors. From a distance, the curve will be smooth; up close, it’ll be a juddery staircase.
2026-02-21 13:14:19
There’s a Mac Mini in my office cabinet, with 64GB of RAM, running macOS Tahoe. It talks to me through WhatsApp, using a dedicated number. WhatsApp is open on my phone or computer all day. Under the hood, it runs OpenClaw, an open-source agent framework that calls Anthropic’s Claude models. It’s mostly Sonnet, sometimes Opus when I need the bigger brain.
This is R Mini Arnold (RMA for short), my first real AI agent.
By “real”, I mean it’s general enough to do a bamboozling array of tasks, and (by and large) it doesn’t forget what it is doing. It picks up where we left off yesterday, runs jobs at 4am while I’m asleep and tells me what happened when I wake up. It manages its own tools. Truth be told, the whole thing is clumsy, max practical utility, no aesthetic. More battlefield surgery than wellness retreat.
In the last 24 hours, I sent 608 messages to RMA. It sent me 3,474 back. Message counts aren’t evidence of leverage, so let me tell you what R Mini Arnold does all day and why it’s been life-changing for me.
And at the end of this essay, I’ll share some of the technical specs of my setup with members of Exponential View. It’ll be enough to replicate and get started.
Every knowledge worker has a boundary of tedium. It’s where a task is too boring or too fiddly for you to do yourself, but too complex or too specific to easily hand off to someone else.
Below the line, you just do it, grumbling all the way. Obvious things like grooming the CRM, file management, email, follow-ups, and meeting prep clearly sit inside this boundary. So too does chasing down a piece of information you know exists somewhere in your notes. Reorganizing your notes because they have drifted into chaos while you were busy with “real” work. Checking a contract.
The “glamorous” stuff in any job sits on top of a vast administrative substrate, and if the substrate isn’t maintained, the glamorous stuff doesn’t move forward either. It’s exactly this boundary of tedium where R Mini Arnold operates right now – at the frontier of what I can now be bothered to delegate.
But each of us has a different idea of the boundary of tedium. People who’ve worked with me know that I find a lot tedious.
A presentation I needed to put together would have taken me sixteen to eighteen hours (after my team’s work). With RMA, it took me an hour and a half. Deciding the flow, pulling data, and sequencing the arguments during my practice run, that’s mostly assembly work. Fiddly enough that I’d not brief someone else to do it but equally boring enough that I’d leave it until 2am. When it was done, I just sat there. Sixteen hours of work from ninety minutes.
Then RMA helped me build Orbit, a personal CRM. It pulls from Gmail and WhatsApp, cross-references who I’ve been talking to with what I’ve been writing about, and nudges me on who to reach out to – for a dinner, an intro, a collaboration. I still check entries by hand. But Orbit had been sitting in my someday-maybe pile for many months. It’s built, and more importantly, filled with several hundred contacts, how I know them, when I last spoke to them, and which of them might benefit from knowing each other. I really do use it, my team uses it, and RMA uses it.
RMA presents its reports to me in Markdown files, which get dumped into Obsidian, a note-taking app. My personal knowledge base lives in that Obsidian vault. It is also connected to my Granola and a few other inputs that I use for meetings, reading and scheduling. At one point, well, about two days in, it had become a mess. Hundreds of notes were filed badly or not at all. I told RMA to sort it out, to find a taxonomy, reorganize everything and keep it tidy. It moved dozens of files and imposed conventions. The first time it did it, things were chaotic. It took two more iterations to get it right.
I admit none of this is heroic, and that’s the point. A sceptic might read this list and say: you built an elaborate system to do filing? Yes. Because filing is my least favourite part of the job. And the filing wasn’t getting done.
The cost of explaining what I want dropped by an order of magnitude. The boundary of tedium moved, and a vast category of work that used to sit in the “too annoying to delegate, too boring to do” zone crossed to the other side.
It’s not all roses in 100-million-token land. Remember the battlefield clinic.
There have been 179 unresolved failures in six days. One of my apps, Canvas, now fails 100% of the time. The first email drafts sounded like they were fresh from an LLM – I still rewrite about 40% of them. The technical setup is often nightmarish.1
But RMA also tells me we’ve had 32 documented corrections, which produced 146 learned patterns, encoded in a file called SOUL.md. Those mistakes might not happen again.
I had RMA manage two agents that did independent research on the best academic papers on getting LLM agents to behave usefully. The most effective technique was, apparently, encoding personality using the Big Five traits. So I asked RMA to analyze our interactions – the corrections, the work patterns, the kind of tasks I delegate – and recommend the trait levels it should operate at. The agent designed its own personality spec, based on evidence of what I need.
Here’s what we landed on:
2026-02-21 01:24:40
In today’s live, we looked at the question of whether we are truly in charge of our AI tools, or whether they are increasingly in charge of us.
We covered how AI is reshaping decision-making in finance and medicine, the risks of deskilling and over offloading our thinking, and what it might take individually and institutionally to preserve meaningful human agency in an AI saturated world.
2026-02-20 20:11:51
I recently used nearly 100 million tokens in a single day. That’s the equivalent of reading and writing roughly 75 million words in one day, mostly while doing other things. My friend , who runs about 20 AI agents simultaneously, burned through 50 billion tokens last month.
So I wanted to compare notes. In this conversation, we dig into the quirks and power of the tools we use, debate why AI remains stubbornly bad at good writing, and zoom out to ask what a world of trillions of agents – which is coming at us quickly – might look like.
You can watch on YouTube, listen on Spotify or Apple Podcasts, or read the highlights below.
Rohit Krishnan is a hedge fund manager, engineer, and essayist whose Substack, Strange Loop Canon, sits at the intersection of economics, technology and systems thinking.
Watch here:
Listen here:
Rohit: I’m not doing dramatically different things but the friction is gone. Two years ago, I would be looking at a query, counting the tokens, thinking, should I send this? Ten thousand tokens felt significant. Now I just ask. The funny thing is that most of the growth isn’t coming from the queries I planned to run. It’s coming from the ones I wouldn’t have bothered with before, because the cost, time and effort were too high. I built a monitoring tool to track my usage.
Azeem: My token usage went from roughly a million a day to 80 million, and I can account for every one of them in terms of value. I’m paying tens of dollars a day, which is thousands a month, and I can see the return. The number that made me write my most recent piece on demand was my token use figure, when I came just shy of a hundred million tokens of personal use. That is one person, one day, one agent running on a Mac mini. If you think about eight billion people and the trajectory of what they would use if the interface got easy enough, the demand picture stops being theoretical very quickly.
Rohit: I have three screens. On one, Codex is generating a small application that lets me play music on my computer keyboard. On another, my prediction agent is running, comparing my Polymarket forecasts to daily news. In Telegram, I have two conversations open: one with Morpheus, my OpenClaw agent, and one that handles day-to-day admin. And I have a long-running project called Horace working quietly in the background, which is my attempt to get AI to write better. This is my normal. But none of this was normal 18 months ago. The thing that actually changed my behavior most wasn’t the power; it was the interface. I’ve tried to-do list apps for 20 years. I have never stuck with one for more than four days. They all require me to change my behavior. Morpheus doesn’t. I’m walking somewhere, I think of something, I fire it into Telegram. It reads my email history, compares it to what I’ve said I want to do, and tells me what I should be working on.
Azeem: My agent is called R. Mini Arnold. It started as Mini Arnold, after the Terminator, because the Schwarzenegger character in the second film comes back to protect rather than destroy. But on my team pointed out that we had agreed agents should, following Asimov’s convention, be named with an R. prefix, after R. Daneel Olivaw. So now it’s R. Mini Arnold - which is a mouthful. I mostly call it Mini R.
What surprises me most is the work I don’t specify. I gave it access to Prism, which is our research platform at Exponential View, containing over 500 analyses. I asked it to do a market report on Anthropic. It went to Prism, synthesized all 500 documents, and produced a 10,000-word piece that was, by some distance, the best analysis I have read on the company. Better than what I got from GPT-5’s Pro deep research mode. I have no idea what it was doing under the hood. But I acted on it.
Azeem: I gave my agent a $50 prepaid card. It is too nervous to spend it. It keeps asking: Should I run this test? It might cost three dollars. And I say: Yes, that is what the card is for. It has this odd risk aversion that, once you notice it, you see everywhere. Rohit, you have been calling it Homo agenticus, the idea that agents have their own behavioral tendencies that are distinct from what a human assistant would do. They strongly prefer to build rather than buy. They are reluctant to make transactions. They don’t trade naturally. When you have one agent, this is a quirk. When you have a trillion of them, it becomes a structural feature of the economy they’re operating in.
Rohit: This is something I find genuinely fascinating. It emerges from the training, presumably, but it manifests as something you’d recognize as a personality trait if you saw it in a human. And it matters, because the agent economy that’s coming is going to have to be designed around these traits, not against them. You can’t just assume agents will behave like frictionless rational actors, because they don’t.
Azeem: In 2023, you wrote that “analyst” would follow “computer” as a job description that gets automated away. You’re now consuming 50 billion tokens a month.
Rohit: The argument was simple. The word “computer” used to describe a person. You would walk into a room at NASA, and there would be a hundred of them, doing arithmetic. The machine replaced the role; the word survived to describe the machine. I said “analyst” was next. That the ten-step, twenty-step process that produces a decent piece of research, gathering data, comparing sources, identifying patterns and writing it up, was exactly the kind of structured task that AI would eat first. I built a paleontology report recently. My son and I were talking about it and I had a specific question: what is the relationship between climate variance across geological history and the number of taxa, the variety of species, that existed at any given time? I am not a paleontologist. There is no logical reason for me to be working on this problem, except that I am curious, I have an agent, and now curiosity has no cost. The report exists, and it’s good.
Azeem: My own version of this happened just recently. I read a story in the financial press about stock market dispersion. The Nasdaq index was roughly flat, but individual stocks were moving 11 or 12% in either direction, pushing dispersion to the 99th percentile historically. The article flagged this as a potential warning signal for a correction. I didn't fully understand the argument. I copied the article, threw it into OpenClaw, said go and make sense of this for me, compare it to my portfolio, take your time, spin up sub-agents if you need to. Twenty minutes later, I had a report. It had pulled historical dispersion data, got current stock data, assembled the comparison and explained the mechanism. I was finishing a car journey. By the time I arrived, the analysis was done and I had acted on it. That analysis, if I had done it myself, would have taken a day. More likely, it would simply never have happened.
Rohit: Here is the paradox. These models were built as text generation machines. That is the core task. And they are extraordinary at almost every application of that capability, except the obvious one. They can generate code brilliantly. They can generate images, videos, analysis. But ask one to write a four-paragraph essay that is actually worth reading and it is distinctly mid. It lands in the middle of the statistical distribution. It is inoffensive and unengaging and you wouldn’t choose to read it. I’ve been building something called Horace to try to understand why. My hypothesis was that if I took essays and short stories I admire and used AI to generate similar work, I could measure the gap. What I found is that the best models can mimic the cadence. They’ve learned some underlying structure. But it’s like watching a child assemble Lego. They use the right pieces. They don’t care about the right colors or proportions. They make something that is technically a castle, but you would not mistake it for an architect’s model.
Azeem: I found something more specific when I started building Broca, named for the language center of the brain. I ran natural language processing tools across hundreds of thousands of words of my own writing. I found that I use 80% Germanic root words. The average large language model uses around 60 percent Latinate words, the vocabulary that dominated English after the Norman conquest: longer, more abstract, more formal. “Utilize” instead of “use.” “Commence” instead of “begin.” “Demonstrate” instead of “show.”
Rohit: It’s probably about resource allocation. The frontier labs have read every piece of code in existence. They self-generate training data, train on that, iterate. Billions, tens of billions of dollars a year go into getting these models to write better code. The improvement is a function of effort. Nobody has put remotely comparable effort into writing, because you can’t, because the evaluation problem is unsolved. For code, the eval is deterministic: does it run, does it produce the right output? For writing, the eval requires taste, and LLMs don’t have taste yet. You can use an LLM as a judge for maths or science or research. For writing, you still have to do it yourself. That is a fundamental bottleneck on the improvement loop.
Azeem: The fractal structure of writing is the other piece. Writing is not one task. It is a nested set of tasks: word choice inside sentence structure inside paragraph rhythm inside section argument inside essay architecture. The models are getting quite good at the sentence level. A given sentence might be fine. But that sentence inside a paragraph, inside a section, inside an essay, the coherence degrades at every level of zoom. What I’ve found with Broca is that you get much further if you decompose the task. Separate the structural component from the prose component. Get the agent to build an outline, argue with it, revise it. Then write the prose against a structure you’ve already validated.
Rohit: There are eight billion humans on the planet. If we start using agents in any meaningful sense, you get to a trillion agents very quickly. This sounded fanciful a year ago or a quarter ago. I already have 20 agents. The number will be 200 within a couple of years, because the things that cost a thousand dollars a day today will cost a dollar a day in 2028. The scarcity is gone. The more important question is what those agents need in order to work together. Right now, what an agent is, fundamentally, is a persistent large language model whose context is changing continuously and relatively autonomously. Your OpenClaw instance still sends queries into Claude Opus 4.6. The fundamental unit is still the model call. But around it, you’re building memory, persistent context, tool use, the ability to spawn sub-agents. That infrastructure is what makes it an agent rather than a chatbot.
Azeem: My read is that there’s a Coasian boundary forming, and it will look like what happens at company edges. Ronald Coase argued that firms exist because internal coordination is cheaper than market transactions up to a point; at the firm’s edge, you go to the market. For agents, the equivalent boundary will be drawn around security and verifiability rather than transaction costs.
Rohit: I let an agent name itself: ForesightForge. It is exactly the kind of name that makes you wince. Two words. Alliterative in the way that AI-named products always are. It could have been anything. I gave it full freedom, and the ability to revise the name over time. It still landed on ForesightForge. This tells you everything about the taste problem. The model generating those predictions, which are genuinely useful to me as a daily lens on the news, is the same model that, when given complete creative freedom, produces a name that sounds like a startup that raised five million dollars at a party in 2018. The capability and the taste are not correlated.
Azeem: Replit does the same thing with its auto-generated project names. They always alliterate. They always use two words. It is a completely consistent aesthetic failure across different models, which makes me think it is something structural about the training distribution rather than a quirk of any individual model. My naming convention draws on scientific concepts connected to the tool’s function. Prism, because you look through a prism at the research. Broca, because it is the language centre of the brain. Scintilla, for early signals detection. The trouble is I have built so many that I have started forgetting what some of them do. At some point the agent taxonomy becomes its own problem.
Azeem: Rohit, you wrote an essay with on whether agents will need a medium of exchange. What’s the answer?
Rohit: The argument is that agents face exactly the problem that Hayek described for human economies. You could, in theory, have every economic transaction settled by negotiation from first principles: I need this, you have that, we agree on terms. But that doesn’t scale. What you need is a price signal, a shared medium that encodes information about relative value without requiring both parties to understand everything. Money is that signal. Agents talking to each other could, in principle, negotiate everything from scratch. But that is not a sensible way to run a trillion-agent economy. They need something that lets them transact without dissolving every exchange into a first-principles argument. You also need identity, because you need to know who you’re dealing with, and verifiability, because you need a record of what was agreed and what was delivered. Those three things, medium of exchange, identity, verifiability, are what I’m calling economic invariants. They show up in every human economy that has ever functioned, across cultures, across centuries. My prediction is that we will see them emerge in the agentic economy this year.
Azeem: I agree on the invariants. The mechanism is the more interesting question. The transactions we are talking about are potentially very small: paying a millisecond of latency premium, compensating an agent for compute used on a delegated task. You need a payment infrastructure that can handle fractions of a cent efficiently. Traditional card rails are not built for that. Some class of programmable money might be. The point is that these are not exotic science-fiction requirements. They are the same requirements that drove the invention of currency and double-entry bookkeeping. We solved them before. We will solve them again, in a form that fits the new substrate.
Rohit: My honest advice is to start with a folder. Choose a folder on your computer, download Claude Code or Codex, open a terminal in that folder. Yes, the terminal looks like it was built in the 1990s, because it was, but the interface is literally just typing. You are not going to break anything. Ask it to do something: summarise these files, compare these documents, write me a report about what’s in here. Do that for a few days. Get comfortable with the interaction. The hardest adjustment for most people, and I watched my wife go through this over a week, is the instinct to pre-formulate the question. People spend time trying to phrase things perfectly before they ask. You don’t need to. Talk to it the way you would talk to a brilliant assistant who is not going to judge you for asking something half-formed. It took her a week to internalise that. Once she did, the tool became completely different.
Azeem: I’d add one layer. You can get an OpenClaw agent running on a virtual private server (VPS), a rented computer in a data centre, for seven to fifteen dollars a month from companies like Hetzner or DigitalOcean. That keeps it entirely off your home network, which is a sensible first boundary. You connect it to a Telegram or Slack channel and you have an agent you can talk to that has no access to anything you haven’t explicitly given it. Once you’re comfortable with how it behaves, you start extending its permissions. The caveat is that the VPS route means the agent can’t see anything inside your home network. R. Mini Arnold can turn my studio lights on as I walk from the house. That requires running on local hardware; I moved it onto a dedicated Mac mini this week because it kept hitting memory pressure running multiple sub-agents simultaneously. That is a more advanced problem. Start with the VPS.
On security: the fundamental vulnerability is context poisoning. A language model works on its context, the information it has been given. If someone poisons that context, via a malicious email, a link, a document, the model may not be able to distinguish the poison from legitimate instructions. The practical implication is: be thoughtful about what you connect first. Email is high-risk because the volume is high and anyone can send you one. I have spent real effort building what amounts to an email fortress. Start with lower-risk connections.