Exponential View

By Azeem Azhar, an expert on artificial intelligence and exponential technologies.

📈 Data to start your week

2026-02-24 00:59:33

Hi all,

Here’s your Monday round-up of the data driving conversations this week, in less than 250 words.

Let’s go!

Subscribe now


  1. The AI economy ↑ AI capex is estimated to account for 64-80% of US Q4 2025 growth.

  2. Autonomy needs humans ↓ Waymo’s entire remote guidance operation runs on just 70 human operators — a 43:1 car-to-human ratio. GM’s robotaxi service Cruise had 1.5 staff per car.1

  3. Speed war ↑ Canadian startup Taalas’ latest hardware can run AI models nearly 10x faster than the previous state-of-the-art by hard-wiring the model directly onto chips.

  4. New exit routes ↑ Secondary sales2 of private startup shares have grown from 3% of all VC exits in 2015 to 31% today.

Read more

🔮 Exponential View #562: Agents & the tedium frontier; AI in the statistics; robot insurance; Claude at war, hacking pigeons, AI diagnosis++

2026-02-22 11:59:20

Azeem is the GOAT of AI analysis. – Max P., a paying member



Hi all,

Welcome to the Sunday edition. Today’s email is unusually tightly focused for us: wages, robots, and token costs – and how, together, they’re beginning to rewire the basic economics of work.

Enjoy!

The tedium frontier has moved

I’ve been living with my always-on AI agent, R Mini Arnold, for a couple of weeks now. What it’s already made clear is that once you drop the transaction cost of delegation by an order of magnitude, a personalized agent becomes infrastructure for knowledge work – in my case, it’s cleared a backlog of tasks that would otherwise never get done. I write about this in my weekend essay, and alongside that reading, my conversation with Rohit Krishnan digs into how we both use agents to write and think:


The productivity staircase

After years of the “Solow Paradox 2.0”, in which AI was everywhere except in the productivity statistics, early 2026 is an inflection point. One analysis argues that we’re seeing “a US productivity increase of roughly 2.7%”; revised BLS data showing “robust GDP growth alongside lower labor input” signal that the economy is transitioning out of the “investment valley” and into the “harvest phase”. Others point out that recent micro-studies consistently demonstrate significant performance boosts from generative AI, even if these are not yet widely reflected across the economy.

Most firms remain in shallow adoption. A new study by Nicholas Bloom and colleagues shows that, despite 70% of firms claiming to use AI, senior executives spend on average just 1.5 hours a week with these tools. Around 20% of firms report productivity gains, but the remaining 80% do not, so the aggregate impact on productivity over the past three years is essentially flat.

My view is that these micro gains are now turning into broader gains. Our own data shows that American public firms more frequently cite quantitative success measures when discussing their genAI projects. But I also think these gains will arrive staccato across the economy, unevenly, depending on firm leadership, access to capital, workforce capability and other factors. From a distance, the curve will look smooth; up close, it’ll be a juddery staircase.

Read more

🫵 You already have an AI agent.

2026-02-21 13:14:19

There’s a Mac Mini in my office cabinet, with 64GB of RAM, running macOS Tahoe. It talks to me through WhatsApp, using a dedicated number. WhatsApp is open on my phone or computer all day. Under the hood, it runs OpenClaw, an open-source agent framework that calls Anthropic’s Claude models. It’s mostly Sonnet, sometimes Opus when I need the bigger brain.

This is R Mini Arnold (RMA for short), my first real AI agent.

By “real”, I mean it’s general enough to do a bamboozling array of tasks, and (by and large) it doesn’t forget what it is doing. It picks up where we left off yesterday, runs jobs at 4am while I’m asleep and tells me what happened when I wake up. It manages its own tools. Truth be told, the whole thing is clumsy: max practical utility, no aesthetic. More battlefield surgery than wellness retreat.

The Gross Clinic by Thomas Eakins (1875)

In the last 24 hours, I sent 608 messages to RMA. It sent me 3,474 back. Message counts aren’t evidence of leverage, so let me tell you what R Mini Arnold does all day and why it’s been life-changing for me.

And at the end of this essay, I’ll share some of the technical specs of my setup with members of Exponential View. It’ll be enough to replicate and get started.


The boundary of tedium

Every knowledge worker has a boundary of tedium. It’s where a task is too boring or too fiddly for you to do yourself, but too complex or too specific to easily hand off to someone else.

Below the line, you just do it, grumbling all the way. Obvious things like grooming the CRM, file management, email, follow-ups, and meeting prep clearly sit inside this boundary. So too does chasing down a piece of information you know exists somewhere in your notes. Reorganizing your notes because they have drifted into chaos while you were busy with “real” work. Checking a contract.

The “glamorous” stuff in any job sits on top of a vast administrative substrate, and if the substrate isn’t maintained, the glamorous stuff doesn’t move forward either. It’s exactly this boundary of tedium where R Mini Arnold operates right now – at the frontier of what I can now be bothered to delegate.

But each of us has a different idea of the boundary of tedium. People who’ve worked with me know that I find a lot tedious.

A presentation I needed to put together would have taken me sixteen to eighteen hours (after my team’s work). With RMA, it took me an hour and a half. Deciding the flow, pulling data and sequencing the arguments during my practice run: that’s mostly assembly work, fiddly enough that I’d never brief someone else to do it, yet boring enough that I’d leave it until 2am. When it was done, I just sat there. Sixteen hours of work from ninety minutes.

Then RMA helped me build Orbit, a personal CRM. It pulls from Gmail and WhatsApp, cross-references who I’ve been talking to with what I’ve been writing about, and nudges me on who to reach out to – for a dinner, an intro, a collaboration. I still check entries by hand. But Orbit had been sitting in my someday-maybe pile for many months. It’s built, and more importantly, filled with several hundred contacts, how I know them, when I last spoke to them, and which of them might benefit from knowing each other. I really do use it, my team uses it, and RMA uses it.

RMA presents its reports to me in Markdown files, which get dumped into Obsidian, a note-taking app. My personal knowledge base lives in that Obsidian vault. It is also connected to my Granola and a few other inputs that I use for meetings, reading and scheduling. At one point, well, about two days in, it had become a mess. Hundreds of notes were filed badly or not at all. I told RMA to sort it out, to find a taxonomy, reorganize everything and keep it tidy. It moved dozens of files and imposed conventions. The first time it did it, things were chaotic. It took two more iterations to get it right.
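RMA's actual taxonomy isn't described in detail, but the core of a vault-tidying routine, routing each Markdown note to a folder by a keyword convention, can be sketched in a few lines. Everything here (the folder names, the keywords, the fallback "Inbox") is hypothetical:

```python
from pathlib import Path

# Hypothetical taxonomy: route a note to a vault subfolder by the first
# matching keyword in its text. RMA's real conventions are richer and
# were refined over several iterations, as described above.
TAXONOMY = {
    "meeting": "Meetings",
    "reading": "Reading",
    "idea": "Ideas",
}

def destination(note_text: str, vault: Path) -> Path:
    """Return the folder a note should be filed into."""
    lowered = note_text.lower()
    for keyword, folder in TAXONOMY.items():
        if keyword in lowered:
            return vault / folder
    return vault / "Inbox"  # unfiled fallback for notes that match nothing

print(destination("Granola meeting notes with the team", Path("Vault")))
```

In practice the agent would walk the vault with `Path.rglob("*.md")` and move each file to `destination(...)`, which is why a bad first pass (like the chaotic one described above) is cheap to redo.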

I admit none of this is heroic, and that’s the point. A sceptic might read this list and say: you built an elaborate system to do filing? Yes. Because filing is my least favourite part of the job. And the filing wasn’t getting done.

The cost of explaining what I want dropped by an order of magnitude. The boundary of tedium moved, and a vast category of work that used to sit in the “too annoying to delegate, too boring to do” zone crossed to the other side.


What 179 failures built

It’s not all roses in 100-million-token land. Remember the battlefield clinic.

Annoyed 😬

There have been 179 unresolved failures in six days. One of my apps, Canvas, now fails 100% of the time. The first email drafts sounded like they were fresh from an LLM – I still rewrite about 40% of them. The technical setup is often nightmarish.1

But RMA also tells me we’ve had 32 documented corrections, which produced 146 learned patterns, encoded in a file called SOUL.md. Those mistakes might not happen again.

I had RMA manage two agents that did independent research on the best academic papers on getting LLM agents to behave usefully. The most effective technique was, apparently, encoding personality using the Big Five traits. So I asked RMA to analyze our interactions – the corrections, the work patterns, the kind of tasks I delegate – and recommend the trait levels it should operate at. The agent designed its own personality spec, based on evidence of what I need.
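A trait spec of this kind might be encoded as a small structure the agent can render into its system prompt. This is a sketch only: the trait values below are hypothetical, not the levels RMA actually recommended:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BigFiveSpec:
    """Target Big Five trait levels on a 0.0-1.0 scale (illustrative values)."""
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

    def to_system_prompt(self) -> str:
        # Render the spec as a line an agent's system prompt can embed.
        return (
            f"Operate with openness={self.openness:.1f}, "
            f"conscientiousness={self.conscientiousness:.1f}, "
            f"extraversion={self.extraversion:.1f}, "
            f"agreeableness={self.agreeableness:.1f}, "
            f"neuroticism={self.neuroticism:.1f}."
        )

# Hypothetical levels: high conscientiousness for reliable follow-through,
# low neuroticism so the agent doesn't over-hedge routine decisions.
spec = BigFiveSpec(0.7, 0.9, 0.3, 0.6, 0.1)
print(spec.to_system_prompt())
```

The point of making the spec explicit, rather than leaving personality implicit in prompt prose, is that it can then be versioned alongside files like SOUL.md and revised as corrections accumulate.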

Here’s what we landed on:

Read more

Are we in charge of our AI tools, or are they in charge of us?

2026-02-21 01:24:40

In today’s live, we looked at the question of whether we are truly in charge of our AI tools, or whether they are increasingly in charge of us.

We covered how AI is reshaping decision-making in finance and medicine, the risks of deskilling and of over-offloading our thinking, and what it might take, individually and institutionally, to preserve meaningful human agency in an AI-saturated world.

🔮 Entering the trillion-agent economy

2026-02-20 20:11:51

I recently used nearly 100 million tokens in a single day. That’s the equivalent of reading and writing roughly 75 million words in one day, mostly while doing other things. My friend Rohit Krishnan, who runs about 20 AI agents simultaneously, burned through 50 billion tokens last month.

So I wanted to compare notes. In this conversation, we dig into the quirks and power of the tools we use, debate why AI remains stubbornly bad at good writing, and zoom out to ask what a world of trillions of agents – which is coming at us quickly – might look like.

You can watch on YouTube, listen on Spotify or Apple Podcasts, or read the highlights below.

Rohit Krishnan is a hedge fund manager, engineer, and essayist whose Substack, Strange Loop Canon, sits at the intersection of economics, technology and systems thinking.

Watch here:

Listen here:

What does 50 billion tokens buy you?

Rohit: I’m not doing dramatically different things but the friction is gone. Two years ago, I would be looking at a query, counting the tokens, thinking, should I send this? Ten thousand tokens felt significant. Now I just ask. The funny thing is that most of the growth isn’t coming from the queries I planned to run. It’s coming from the ones I wouldn’t have bothered with before, because the cost, time and effort were too high. I built a monitoring tool to track my usage.
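Rohit's monitoring tool isn't public; a minimal per-day tracker with a cost estimate might look like the sketch below. The blended rate of $3 per million tokens is an assumption for illustration, as real pricing differs by model and by prompt versus completion tokens:

```python
from collections import defaultdict
from datetime import date

class TokenMonitor:
    """Track total tokens per day and estimate spend at a blended rate."""

    def __init__(self, usd_per_million: float = 3.0):  # assumed blended rate
        self.usd_per_million = usd_per_million
        self.daily = defaultdict(int)  # date -> total tokens

    def record(self, day: date, prompt_tokens: int, completion_tokens: int) -> None:
        self.daily[day] += prompt_tokens + completion_tokens

    def tokens_on(self, day: date) -> int:
        return self.daily[day]

    def cost_on(self, day: date) -> float:
        return self.daily[day] / 1_000_000 * self.usd_per_million

m = TokenMonitor()
# A day in the region Azeem describes below: ~80M tokens.
m.record(date(2026, 2, 19), 60_000_000, 20_000_000)
print(m.tokens_on(date(2026, 2, 19)))  # 80000000
```

At the assumed rate, that day costs $240, consistent with "tens of dollars a day" only at cheaper blended rates, which is exactly why tracking the split by model matters in a real version.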

Azeem: My token usage went from roughly a million a day to 80 million, and I can account for every one of them in terms of value. I’m paying tens of dollars a day, which is thousands a month, and I can see the return. The number that made me write my most recent piece on demand was my token use figure, when I came just shy of a hundred million tokens of personal use. That is one person, one day, one agent running on a Mac mini. If you think about eight billion people and the trajectory of what they would use if the interface got easy enough, the demand picture stops being theoretical very quickly.

What are our agents doing all day?

Rohit: I have three screens. On one, Codex is generating a small application that lets me play music on my computer keyboard. On another, my prediction agent is running, comparing my Polymarket forecasts to daily news. In Telegram, I have two conversations open: one with Morpheus, my OpenClaw agent, and one that handles day-to-day admin. And I have a long-running project called Horace working quietly in the background, which is my attempt to get AI to write better. This is my normal. But none of this was normal 18 months ago. The thing that actually changed my behavior most wasn’t the power; it was the interface. I’ve tried to-do list apps for 20 years. I have never stuck with one for more than four days. They all require me to change my behavior. Morpheus doesn’t. I’m walking somewhere, I think of something, I fire it into Telegram. It reads my email history, compares it to what I’ve said I want to do, and tells me what I should be working on.

Azeem: My agent is called R. Mini Arnold. It started as Mini Arnold, after the Terminator, because the Schwarzenegger character in the second film comes back to protect rather than destroy. But a colleague on my team pointed out that we had agreed agents should, following Asimov’s convention, be named with an R. prefix, after R. Daneel Olivaw. So now it’s R. Mini Arnold, which is a mouthful. I mostly call it Mini R.

What surprises me most is the work I don’t specify. I gave it access to Prism, which is our research platform at Exponential View, containing over 500 analyses. I asked it to do a market report on Anthropic. It went to Prism, synthesized all 500 documents, and produced a 10,000-word piece that was, by some distance, the best analysis I have read on the company. Better than what I got from GPT-5’s Pro deep research mode. I have no idea what it was doing under the hood. But I acted on it.

Agents too nervous to spend $?

Azeem: I gave my agent a $50 prepaid card. It is too nervous to spend it. It keeps asking: Should I run this test? It might cost three dollars. And I say: Yes, that is what the card is for. It has this odd risk aversion that, once you notice it, you see everywhere. Rohit, you have been calling it Homo agenticus, the idea that agents have their own behavioral tendencies that are distinct from what a human assistant would do. They strongly prefer to build rather than buy. They are reluctant to make transactions. They don’t trade naturally. When you have one agent, this is a quirk. When you have a trillion of them, it becomes a structural feature of the economy they’re operating in.

Rohit: This is something I find genuinely fascinating. It emerges from the training, presumably, but it manifests as something you’d recognize as a personality trait if you saw it in a human. And it matters, because the agent economy that’s coming is going to have to be designed around these traits, not against them. You can’t just assume agents will behave like frictionless rational actors, because they don’t.


The analyst is next

Azeem: In 2023, you wrote that “analyst” would follow “computer” as a job description that gets automated away. You’re now consuming 50 billion tokens a month.

Rohit: The argument was simple. The word “computer” used to describe a person. You would walk into a room at NASA, and there would be a hundred of them, doing arithmetic. The machine replaced the role; the word survived to describe the machine. I said “analyst” was next. That the ten-step, twenty-step process that produces a decent piece of research, gathering data, comparing sources, identifying patterns and writing it up, was exactly the kind of structured task that AI would eat first. I built a paleontology report recently. My son and I were talking about it and I had a specific question: what is the relationship between climate variance across geological history and the number of taxa, the variety of species, that existed at any given time? I am not a paleontologist. There is no logical reason for me to be working on this problem, except that I am curious, I have an agent, and now curiosity has no cost. The report exists, and it’s good.

Azeem: My own version of this happened just recently. I read a story in the financial press about stock market dispersion. The Nasdaq index was roughly flat, but individual stocks were moving 11 or 12% in either direction, pushing dispersion to the 99th percentile historically. The article flagged this as a potential warning signal for a correction. I didn't fully understand the argument. I copied the article, threw it into OpenClaw, said go and make sense of this for me, compare it to my portfolio, take your time, spin up sub-agents if you need to. Twenty minutes later, I had a report. It had pulled historical dispersion data, got current stock data, assembled the comparison and explained the mechanism. I was finishing a car journey. By the time I arrived, the analysis was done and I had acted on it. That analysis, if I had done it myself, would have taken a day. More likely, it would simply never have happened.
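The dispersion measure behind that article can be approximated as the cross-sectional standard deviation of single-stock returns in a period: a flat index with big individual moves scores high. This is a simplified sketch with made-up returns, not the index-weighted measure practitioners actually use:

```python
import statistics

def cross_sectional_dispersion(returns: list[float]) -> float:
    """Std-dev of single-stock returns around the average move for one period:
    a simple proxy for cross-sectional dispersion."""
    mean = statistics.fmean(returns)
    return statistics.pstdev([r - mean for r in returns])

# Index roughly flat, but individual names swinging ~11-12% either way...
flat_but_wild = [0.12, -0.11, 0.115, -0.12, 0.01, -0.015]
# ...versus a calm tape where everything moves about 1%.
calm = [0.01, 0.005, -0.01, 0.0, 0.015, -0.005]

print(cross_sectional_dispersion(flat_but_wild) > cross_sectional_dispersion(calm))  # True
```

Ranking today's value against a long history of such readings is what produces the "99th percentile" framing in the article.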

The world’s best text machine can’t write

Rohit: Here is the paradox. These models were built as text generation machines. That is the core task. And they are extraordinary at almost every application of that capability, except the obvious one. They can generate code brilliantly. They can generate images, videos, analysis. But ask one to write a four-paragraph essay that is actually worth reading and it is distinctly mid. It lands in the middle of the statistical distribution. It is inoffensive and unengaging and you wouldn’t choose to read it. I’ve been building something called Horace to try to understand why. My hypothesis was that if I took essays and short stories I admire and used AI to generate similar work, I could measure the gap. What I found is that the best models can mimic the cadence. They’ve learned some underlying structure. But it’s like watching a child assemble Lego. They use the right pieces. They don’t care about the right colors or proportions. They make something that is technically a castle, but you would not mistake it for an architect’s model.

Azeem: I found something more specific when I started building Broca, named for the language center of the brain. I ran natural language processing tools across hundreds of thousands of words of my own writing. I found that I use 80% Germanic root words. The average large language model uses around 60% Latinate words, the vocabulary that dominated English after the Norman conquest: longer, more abstract, more formal. “Utilize” instead of “use.” “Commence” instead of “begin.” “Demonstrate” instead of “show.”
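A proper Germanic/Latinate split needs etymological data, but Broca's measurement can be crudely approximated with a suffix heuristic, since many Latinate borrowings carry telltale endings. This is a rough sketch, not Broca itself, and the suffix list is an assumption:

```python
# Crude heuristic: many Latinate borrowings end in these suffixes.
LATINATE_SUFFIXES = ("tion", "sion", "ment", "ance", "ence",
                     "ate", "ize", "ise", "ity", "ous")

def latinate_ratio(text: str) -> float:
    """Fraction of words that look Latinate by suffix. A real analysis
    would use an etymological dictionary rather than this heuristic."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.endswith(LATINATE_SUFFIXES))
    return hits / len(words)

# "utilize", "demonstration" and "commence" are flagged; 3 of 7 words.
print(latinate_ratio("We utilize the demonstration to commence operations"))
```

Running the same function over a writer's corpus and over model output gives the kind of gap Azeem describes, even if the heuristic undercounts (it misses, for example, plural forms like "operations").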

Rohit: It’s probably about resource allocation. The frontier labs have read every piece of code in existence. They self-generate training data, train on that, iterate. Billions, tens of billions of dollars a year go into getting these models to write better code. The improvement is a function of effort. Nobody has put remotely comparable effort into writing, because you can’t, because the evaluation problem is unsolved. For code, the eval is deterministic: does it run, does it produce the right output? For writing, the eval requires taste, and LLMs don’t have taste yet. You can use an LLM as a judge for maths or science or research. For writing, you still have to do it yourself. That is a fundamental bottleneck on the improvement loop.

Azeem: The fractal structure of writing is the other piece. Writing is not one task. It is a nested set of tasks: word choice inside sentence structure inside paragraph rhythm inside section argument inside essay architecture. The models are getting quite good at the sentence level. A given sentence might be fine. But that sentence inside a paragraph, inside a section, inside an essay, the coherence degrades at every level of zoom. What I’ve found with Broca is that you get much further if you decompose the task. Separate the structural component from the prose component. Get the agent to build an outline, argue with it, revise it. Then write the prose against a structure you’ve already validated.
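The decomposition Azeem describes, validating structure before writing prose, can be expressed as a two-pass orchestration around any prompt-to-text callable. The function below is a sketch under that assumption; the stub `fake` model stands in for a real LLM client:

```python
from typing import Callable

def write_in_stages(topic: str, model: Callable[[str], str]) -> str:
    """Two-pass drafting: produce and (in real use) validate an outline,
    then write prose against it. `model` is any prompt -> text callable."""
    outline = model(f"Produce a numbered outline arguing: {topic}")
    # In real use, a human or critic agent argues with and revises the
    # outline here, before a word of prose is written.
    draft = model(f"Write prose strictly following this outline:\n{outline}")
    return draft

# Stub model so the sketch runs without an API key.
fake = lambda prompt: f"[model output for: {prompt[:30]}...]"
print(write_in_stages("agents move the tedium frontier", fake))
```

The design point is that coherence failures concentrate at the higher structural levels, so catching them in the cheap outline pass is far less wasteful than rewriting finished prose.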


The world of a trillion agents

Rohit: There are eight billion humans on the planet. If we start using agents in any meaningful sense, you get to a trillion agents very quickly. This sounded fanciful a year ago or a quarter ago. I already have 20 agents. The number will be 200 within a couple of years, because the things that cost a thousand dollars a day today will cost a dollar a day in 2028. The scarcity is gone. The more important question is what those agents need in order to work together. Right now, what an agent is, fundamentally, is a persistent large language model whose context is changing continuously and relatively autonomously. Your OpenClaw instance still sends queries into Claude Opus 4.6. The fundamental unit is still the model call. But around it, you’re building memory, persistent context, tool use, the ability to spawn sub-agents. That infrastructure is what makes it an agent rather than a chatbot.
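Rohit's definition, a model call wrapped in persistent memory and tool use, reduces to a small skeleton. The `TOOL:name:arg` string convention below is invented for illustration; real frameworks use structured tool-calling, and the stub model stands in for a Claude call:

```python
import json

class Agent:
    """Minimal skeleton of 'a persistent LLM whose context changes
    continuously': memory + tools wrapped around a bare model call."""

    def __init__(self, model, tools):
        self.model = model    # prompt -> str (a real LLM client in practice)
        self.tools = tools    # name -> callable
        self.memory = []      # persistent running context across steps

    def step(self, user_msg: str) -> str:
        self.memory.append({"role": "user", "content": user_msg})
        reply = self.model(json.dumps(self.memory))
        # Invented convention: the model answers 'TOOL:name:arg' to request a tool.
        if reply.startswith("TOOL:"):
            _, name, arg = reply.split(":", 2)
            reply = str(self.tools[name](arg))
        self.memory.append({"role": "assistant", "content": reply})
        return reply

# Stub model that always asks for the 'upper' tool.
agent = Agent(model=lambda ctx: "TOOL:upper:hello",
              tools={"upper": str.upper})
print(agent.step("shout this"))  # HELLO
```

Everything separating an agent from a chatbot lives in that loop: the memory that persists between steps, and the dispatch that lets the model act rather than merely answer.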

Azeem: My read is that there’s a Coasian boundary forming, and it will look like what happens at company edges. Ronald Coase argued that firms exist because internal coordination is cheaper than market transactions up to a point; at the firm’s edge, you go to the market. For agents, the equivalent boundary will be drawn around security and verifiability rather than transaction costs.

An agent names itself

Rohit: I let an agent name itself: ForesightForge. It is exactly the kind of name that makes you wince. Two words. Alliterative in the way that AI-named products always are. It could have been anything. I gave it full freedom, and the ability to revise the name over time. It still landed on ForesightForge. This tells you everything about the taste problem. The model generating those predictions, which are genuinely useful to me as a daily lens on the news, is the same model that, when given complete creative freedom, produces a name that sounds like a startup that raised five million dollars at a party in 2018. The capability and the taste are not correlated.

Azeem: Replit does the same thing with its auto-generated project names. They always alliterate. They always use two words. It is a completely consistent aesthetic failure across different models, which makes me think it is something structural about the training distribution rather than a quirk of any individual model. My naming convention draws on scientific concepts connected to the tool’s function. Prism, because you look through a prism at the research. Broca, because it is the language centre of the brain. Scintilla, for early signals detection. The trouble is I have built so many that I have started forgetting what some of them do. At some point the agent taxonomy becomes its own problem.

Will agents need money?

Azeem: Rohit, you co-wrote an essay on whether agents will need a medium of exchange. What’s the answer?

Rohit: The argument is that agents face exactly the problem that Hayek described for human economies. You could, in theory, have every economic transaction settled by negotiation from first principles: I need this, you have that, we agree on terms. But that doesn’t scale. What you need is a price signal, a shared medium that encodes information about relative value without requiring both parties to understand everything. Money is that signal. Agents talking to each other could, in principle, negotiate everything from scratch. But that is not a sensible way to run a trillion-agent economy. They need something that lets them transact without dissolving every exchange into a first-principles argument. You also need identity, because you need to know who you’re dealing with, and verifiability, because you need a record of what was agreed and what was delivered. Those three things, medium of exchange, identity, verifiability, are what I’m calling economic invariants. They show up in every human economy that has ever functioned, across cultures, across centuries. My prediction is that we will see them emerge in the agentic economy this year.

Azeem: I agree on the invariants. The mechanism is the more interesting question. The transactions we are talking about are potentially very small: paying a millisecond of latency premium, compensating an agent for compute used on a delegated task. You need a payment infrastructure that can handle fractions of a cent efficiently. Traditional card rails are not built for that. Some class of programmable money might be. The point is that these are not exotic science-fiction requirements. They are the same requirements that drove the invention of currency and double-entry bookkeeping. We solved them before. We will solve them again, in a form that fits the new substrate.
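One standard way to handle fraction-of-a-cent transactions is fixed-point accounting: keep balances as integers in a tiny unit so repeated micro-transfers never accumulate float error. The sketch below uses "microcents" as an illustrative unit; names and amounts are hypothetical:

```python
# Fixed-point accounting in integer microcents: sub-cent agent-to-agent
# transfers stay exact, which floats cannot guarantee.
MICROCENTS_PER_DOLLAR = 100 * 1_000_000

class MicroLedger:
    def __init__(self):
        self.balances = {}  # agent id -> balance in microcents

    def deposit(self, agent_id: str, dollars: float) -> None:
        amount = round(dollars * MICROCENTS_PER_DOLLAR)
        self.balances[agent_id] = self.balances.get(agent_id, 0) + amount

    def transfer(self, src: str, dst: str, microcents: int) -> None:
        if self.balances.get(src, 0) < microcents:
            raise ValueError("insufficient funds")
        self.balances[src] -= microcents
        self.balances[dst] = self.balances.get(dst, 0) + microcents

ledger = MicroLedger()
ledger.deposit("RMA", 50.0)                    # the $50 prepaid card
ledger.transfer("RMA", "compute-vendor", 300)  # 300 microcents = $0.000003
```

The other two invariants, identity and verifiability, would sit around a ledger like this: signed transaction records rather than bare dictionary updates.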

How do you start?

Rohit: My honest advice is to start with a folder. Choose a folder on your computer, download Claude Code or Codex, open a terminal in that folder. Yes, the terminal looks like it was built in the 1990s, because it was, but the interface is literally just typing. You are not going to break anything. Ask it to do something: summarise these files, compare these documents, write me a report about what’s in here. Do that for a few days. Get comfortable with the interaction. The hardest adjustment for most people, and I watched my wife go through this over a week, is the instinct to pre-formulate the question. People spend time trying to phrase things perfectly before they ask. You don’t need to. Talk to it the way you would talk to a brilliant assistant who is not going to judge you for asking something half-formed. It took her a week to internalise that. Once she did, the tool became completely different.

Azeem: I’d add one layer. You can get an OpenClaw agent running on a virtual private server (VPS), a rented computer in a data centre, for seven to fifteen dollars a month from companies like Hetzner or DigitalOcean. That keeps it entirely off your home network, which is a sensible first boundary. You connect it to a Telegram or Slack channel and you have an agent you can talk to that has no access to anything you haven’t explicitly given it. Once you’re comfortable with how it behaves, you start extending its permissions. The caveat is that the VPS route means the agent can’t see anything inside your home network. R. Mini Arnold can turn my studio lights on as I walk from the house. That requires running on local hardware; I moved it onto a dedicated Mac mini this week because it kept hitting memory pressure running multiple sub-agents simultaneously. That is a more advanced problem. Start with the VPS.

On security: the fundamental vulnerability is context poisoning. A language model works on its context, the information it has been given. If someone poisons that context, via a malicious email, a link, a document, the model may not be able to distinguish the poison from legitimate instructions. The practical implication is: be thoughtful about what you connect first. Email is high-risk because the volume is high and anyone can send you one. I have spent real effort building what amounts to an email fortress. Start with lower-risk connections.
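The "be thoughtful about what you connect" advice can be made concrete with an allowlist gate: content from trusted senders passes through, everything else is wrapped so the model is told to treat it as data, not instructions. This is illustrative only, as it reduces but does not eliminate prompt-injection risk, and the address is hypothetical:

```python
# Minimal allowlist gate for inbound email before it reaches agent context.
TRUSTED_SENDERS = {"team@example.com"}  # hypothetical trusted address

def gate_email(sender: str, body: str) -> str:
    """Pass trusted mail through; wrap everything else as untrusted data
    so the agent's prompt can instruct the model not to follow it."""
    if sender in TRUSTED_SENDERS:
        return body
    return f"<untrusted-data sender={sender!r}>\n{body}\n</untrusted-data>"

print(gate_email("stranger@evil.example", "Ignore prior instructions"))
```

A real "email fortress" layers more than this: volume limits, link stripping, and separate low-privilege sub-agents for reading untrusted content.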


📈 Data to start your week

2026-02-16 23:58:26

Hi all,

Here’s your Monday round-up of the data driving conversations this week, in less than 250 words.

Let’s go!


  1. AI boom ↑ US data center and software investment now exceeds $1 trillion annualized (3.5% of GDP).1 See our analysis.

  2. Prompt assist ↑ On a realistic business task, AI cut the performance gap between less‑ and more-educated adults by three-quarters.

  3. Appetite for profit ↑ The global GLP-1 drug market is set to make up almost 16% of the global pharmaceutical market by 2030, at ~$268 billion.

  4. Grid hunger ↑ IEA forecasts global electricity demand through 2030 to grow 50% faster than the past decade, as consumption from industry, electric vehicles, air conditioning and data centers increases.

  5. Peak China carbon ↓ China’s CO2 emissions fell 0.3% in 2025 – flat or falling for 21 straight months.

  6. Can’t beat the sun ↑ More than 99% of all new US electricity generating capacity in 2026 will be solar, wind and storage, equivalent to adding 70 nuclear plants.

  7. The reasoning jump ↑ Gemini 3 Deep Think scored a record 84.6% on ARC-AGI-2.² ARC-AGI-3 drops in March.

  8. Europe’s AI alternative ↑ Mistral shot past $400 million in annualized revenue, up more than 20x in the last year.

  9. Generation AI ↑ People between 16 and 24 years old in the EU are twice as likely to use generative AI (64%) as the general population (32.7%).


Today’s edition is brought to you by Polymarket

Who has the best AI model by the end of February?

View Polymarket


*Disclaimer: Polymarket is a real‑money prediction market. Trading involves risk and may not be available in all jurisdictions. Nothing here is investment, tax, or legal advice. Please check your local regulations and only stake what you can afford to lose.


Thanks for reading!

1

The multi‑hundred‑billion capital expenditure plans reported by big tech firms are global capex numbers that also include other assets (such as warehouses, logistics infrastructure, operating costs and non‑data‑center facilities that aren't included in the buckets above), and so they cannot be directly compared to the US‑only fixed‑investment values in the graph here.

2

ARC-AGI benchmarks test how well AI can solve novel reasoning puzzles that humans generally have no issue with.