Understanding AI

By Timothy B. Lee, a tech reporter with a master’s in computer science who covers AI progress and policy.

16 charts that explain the AI boom

2025-10-28 02:37:58

AI bubble talk has intensified in recent weeks. Record investments combined with economic weakness have left some leery of a dip in investor confidence and potential economic pain.

For now, though, we’re still in the AI boom. Nvidia keeps hitting record highs. OpenAI released another hit product, Sora, which quickly rose to the top of the app store. Anthropic and Google announced a deal last Thursday to give Anthropic access to up to one million of Google’s chips.

In this piece, we’ll visualize the AI boom in a series of charts. It’s hard to put all of AI progress into one graph. So here are 16.

If you enjoy this article, please subscribe to get future pieces delivered straight to your inbox. It’s free!

1. The largest technology companies are investing heavily in AI

If I had to put the current state of AI financials into one chart, it might be this one.

Training and running current AI models require huge, expensive collections of GPUs, stored in data centers. Someone has to invest the money to buy the chips and build the data centers. One major contender: big tech firms, which account for 44% of the total data center market. This chart shows how much money five big tech firms have spent on an annualized basis on capital expenditures, or capex.

Not all tech capex is spent on data centers, and not all data centers are dedicated to AI. The spending shown in this chart includes all the equipment and infrastructure a company buys. For instance, Amazon also needs to pay for new warehouses to ship packages. Google’s capex also covers servers to support Google search.

But a large and increasing percentage of this spending is AI related. For instance, Amazon’s CEO said in their Q4 2024 earnings call that AI investment was “the vast majority” of Amazon’s recent capex. And it will continue to grow. In Meta’s Q2 2025 earnings call, CFO Susan Li noted that “scaling GenAI capacity” will be the biggest driver of increased 2026 capex.

2. AI spending is significant in historical terms

Amazon, Meta, Microsoft, Alphabet, and Oracle spent $241 billion in capex in 2024 — that was 0.82% of US GDP for that year. In the second quarter this year, the tech giants spent $97 billion — 1.28% of the period’s US GDP.
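
Here's the back-of-the-envelope math behind those percentages; the capex figures are the ones above, while the GDP figures are rough approximations (about $29.2 trillion for full-year 2024 and about $7.6 trillion for the second quarter of 2025, not annualized):

```python
# Back-of-the-envelope check of capex as a share of US GDP.
# GDP figures are rough approximations, not precise BEA numbers.

capex_2024 = 241e9          # combined 2024 capex of the five firms
gdp_2024 = 29.2e12          # approximate US nominal GDP, full year 2024

capex_q2_2025 = 97e9        # combined Q2 2025 capex
gdp_q2_2025 = 7.6e12        # approximate US nominal GDP for that single quarter

print(f"2024 share: {capex_2024 / gdp_2024:.2%}")           # ~0.83%
print(f"Q2 2025 share: {capex_q2_2025 / gdp_q2_2025:.2%}")  # ~1.28%
```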

If this pace of spending continues for the rest of 2025, it will exceed peak annual spending during some of the most famous investment booms in the modern era, including the Manhattan Project, NASA’s spending on the Apollo Project, and the internet broadband buildout that accompanied the dot-com boom.

This isn’t the largest investment in American history — Paul Kedrosky estimated that railroad investments peaked at about 6% of the US economy in the 1880s. But it’s still one of the largest investment booms since World War II. And tech industry executives have signaled they plan to spend even more in 2026.

One caveat about this graph: not all the tech company capex is directed at the US. Probably only around 70 to 75% of the big tech capex is going to the US.1 However, not all data center spending is captured by big tech capex, and other estimates of US AI spending also lie around 1.2 to 1.3% of US GDP.

3. Companies are importing a lot of AI chips

AI chips have a famously long and complicated supply chain. Highly specialized equipment is manufactured all across the world, with most chips assembled by TSMC in Taiwan. Basically all of the AI chips that tech companies buy for US use have to be imported first.

The Census data displayed above show this clearly. There’s no specific trade category corresponding to Nvidia GPUs (or Google’s TPUs), but the red line corresponds to large computers (“automatic data processing machines” minus laptops), a category that includes most GPU and TPU imports. This has spiked to over $200 billion in annualized spending recently. Similarly, imports of computer parts and accessories (HS 8473.30), such as hard drives or power supply units, have also doubled in the past year.

These imports have been exempted from Trump’s tariff scheme. Without that exemption, companies would have had to pay somewhere between $10 and $20 billion in tariffs on these imports, according to Joey Politano.

4. They’re building a lot of data centers too

The chart above shows the construction costs of all data centers built in the US, according to Census data. This doesn’t include the value of the GPUs themselves, nor of the underlying land. (The Stargate data center complex in Abilene, Texas is large enough to be seen from space). Even so, investment is skyrocketing.

In regions where data centers are built, the biggest economic benefits come during construction. A report to the Virginia legislature estimated that a 250,000 square foot data center (around 30 megawatts of capacity) would employ up to 1,500 construction workers, but only 50 full-time workers after the work was completed.

A few select counties do still earn significant tax revenues; Loudoun County in Virginia earns 38% of its tax revenue from data centers. However, Loudoun County also has the highest data center concentration in the United States, so most areas receive less benefit.

5. Data centers, particularly large ones, are geographically concentrated

This is a map from the National Renewable Energy Laboratory (NREL) which shows the location of data centers currently operating or under construction. Each circle represents an individual data center; larger circles (with a deeper shade of red) are bigger facilities.

There’s a clear clustering pattern, where localities with favorable conditions — like cheap energy, good network connectivity, or a permissive regulatory environment — attract most of the facilities in operation or under construction. This is particularly true of the big data centers being constructed for AI training and inference. Unlike data centers serving internet traffic, AI workloads don’t require ultra-low latency, so they don’t need to be located close to users.

6. Few data centers are being built in California

The chart above shows ten of the largest regions in the US for data center development, according to CBRE, a commercial real estate company. Northern Virginia has the largest data center concentration in the world.

Access to cheap energy is clearly attractive to data center developers. Of the ten markets pictured above, six feature energy prices below the US industrial average of 9.2¢ per kilowatt-hour (kWh), including the five biggest. Despite California’s proximity to tech companies, high electricity rates seem to have stunted data center growth there.

Electricity prices are one major advantage that the US has over the European Union in data center construction. In the second half of 2024, commercial electricity prices in Europe averaged €0.19 per kilowatt-hour, around double the comparable US rate.

7. Low vacancy and high demand are pushing up data center rents

Companies often rent space in data centers, and the rents companies pay are often on a per kilowatt basis. During the 2010s, these costs were steadily falling. But that has changed over the last five years, as the industry has gotten caught between strong AI-driven demand and increasing physical constraints. Even after adjusting for inflation, the cost of data center space has risen to its highest level in a decade.

According to CBRE, this is most true with the largest data centers, which “recorded the sharpest increases in lease rates, driven by hyperscale demand, limited power availability and elevated build costs.” In turn, hyperscalers like Microsoft consistently claim that demand for their cloud services outstrips their capacity.

This leads to a situation where major construction is paired with record low vacancy rates, around 1.6% in what CBRE classifies as “primary” markets. So even if tech giants are willing to spend heavily to expand their data centers, physical constraints may prevent them from doing so as quickly as they’d like.

8. Data center power consumption might double by 2030 — or it might not

The International Energy Agency estimates that data centers globally consumed around 415 terawatt-hours (TWh) of electricity in 2024. This figure is expected to more than double to 945 TWh by 2030.

That’s 530 TWh of new demand in six years. Is that a lot? In the same report, the IEA compared it to expected growth in other sources of electricity demand. For example, electric vehicles could add more than 800 TWh of demand by 2030, while air conditioners could add 650 TWh and electric heating could add 450 TWh.

There’s significant uncertainty to projections of data center demand. McKinsey has estimated that data center demand could grow to 1,400 TWh by 2030. Deloitte believes data center demand will be between 700 and 970 TWh. Goldman Sachs has a wide range between 740 and 1,400 TWh.

Unlike other categories, data centers’ electricity demand will be concentrated, straining local grids. But the big picture is the same for any of these estimates: data center electricity growth is going to be significant, but it will only be a modest slice of overall electricity growth as the world tries to decarbonize.

9. Water use is an overrated problem with AI

There’s been a lot of media coverage about data centers guzzling water, harming local communities in the process. But total water usage in data centers is small compared to other uses, as shown in this chart using data compiled by Andy Masley.

In 2023, data centers used around 48 million gallons per day, according to a report from Lawrence Berkeley National Laboratory. That sounds like a lot until you compare it to other uses. AI-related data centers use so little water, relative to golf courses or mines, that you can’t even see the bar.

Although some data centers use water to cool computer chips, this actually isn’t the primary way data centers drive water consumption. More water is used by power plants that generate electricity for data centers. But even if you include these off-site uses, total water use is about 250 million gallons per day.

That’s “not out of proportion” with other industrial processes, according to Bill Shobe, an emeritus professor of environmental economics at the University of Virginia. Shobe told me that “the concerns about water and data centers seem like they get more time than maybe they deserve compared to some other concerns.”

There are still challenges around data centers releasing heated water back into the environment. But by and large, data centers don’t consume much water. In fact, if there is a data center water-use problem, it’s that some municipalities in dry areas like Texas charge too little for water. If these municipalities priced their water appropriately, that would encourage companies to use water more efficiently — or perhaps build data centers in other places where water is abundant.

10. There’s a lot of demand for AI inference

It’s not often that you get to deal with a quadrillion of something. But in October, Google CEO Sundar Pichai announced that the company was now processing 1.3 quadrillion tokens per month between their product integrations and API offerings. That’s equivalent to processing 160,000 tokens for every person on Earth. That’s more than the length of one Lord of the Rings book for every single person in the world, every month.

It’s difficult to compare Google’s number with other AI providers. Google’s token count includes AI features inside Google’s own products — such as the AI summary that often appears at the top of search results. But OpenAI has also announced numbers in the same ballpark. On October 6, OpenAI announced that it was processing around six billion tokens per minute, or around 260 trillion tokens per month on its developer API. This was about a four-fold increase from the 60 trillion monthly in January 2025 that The Information reported.
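
Here's the rough arithmetic behind those comparisons. The token counts come from the announcements above; the world population figure (about 8.1 billion) is an approximation:

```python
# Rough token arithmetic behind the figures above. Token counts are from the
# announcements; world population (~8.1 billion) is an approximation.

google_tokens_per_month = 1.3e15                   # 1.3 quadrillion tokens per month
world_population = 8.1e9
print(google_tokens_per_month / world_population)  # ~160,000 tokens per person per month

openai_tokens_per_minute = 6e9                     # ~6 billion tokens per minute on the API
minutes_per_month = 60 * 24 * 30
openai_tokens_per_month = openai_tokens_per_minute * minutes_per_month
print(openai_tokens_per_month / 1e12)              # ~259 trillion tokens per month
print(openai_tokens_per_month / 60e12)             # ~4.3x the January 2025 figure
```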

11. Consumer AI products are getting more popular — especially ChatGPT

Consumer AI usage has steadily increased over the past three years. While ChatGPT famously reached a million users within five days of its release, it took another 11 months for the service to reach 100 million weekly active users. Since then, reported users have grown to 800 million, though the true number may be slightly lower. An academic paper co-written by OpenAI researchers noted that their estimates double-counted users with multiple accounts; the numbers given by executives may similarly be overestimates.

Other AI services have grown more slowly: Google’s Gemini has 450 million monthly active users per CEO Sundar Pichai, while Anthropic’s Claude currently has around 30 million monthly active users, according to Business Insider.

12. Tech giants have enough profits to pay for their AI investments

With such high levels of AI investment, one might worry about the financial stability of the firms behind the data center rollout. But for most of the big tech firms, this isn’t a huge issue. Cash flow from operations continues to exceed their infrastructure spending.

There is some variation among the set, though. Google earns so much money from search that they started issuing dividends in 2024, even amidst the capex boom. Microsoft and Meta have also reported solid financial performance. On the other hand, both Amazon and Oracle have had a few recent quarters with negative free cash flow (Amazon’s overall financial health is excellent; Oracle has been accumulating debt, partly as a result of aggressive stock buybacks).

There are some reasons to take companies’ reported numbers with a grain of salt. Meta recently took a 20% stake in a $27 billion joint venture that will build a data center in Louisiana, which Meta will operate. This allows Meta to acquire additional data center capacity without paying the full costs upfront. Notably, Meta agreed to compensate its partners if the data center loses significant value over the first 16 years, which means the deal could be expensive for Meta in the event of an AI downturn.

13. OpenAI expects to lose billions over the next five years

Tech giants like Google, Meta, and Microsoft can finance AI investments using profits from their non-AI products. OpenAI, Anthropic, and xAI do not have this luxury. They need to raise money from outside sources to cover the costs of building data centers and training new models.

This chart, based on reporting from The Information, shows recent OpenAI internal projections of its own cash flow needs. At the start of 2025, OpenAI expected to reach a peak negative cash flow ($20 billion) in 2027. OpenAI expected smaller losses in 2028 and positive cash flow in 2029.

But in recent months, OpenAI’s projections have gotten more aggressive. Now the company expects to reach peak negative cash flow (more than $40 billion) in 2028. And OpenAI doesn’t expect to reach positive cash flow until 2030.

So far, OpenAI hasn’t had trouble raising money; many people are eager to invest in the AI boom. But if public sentiment shifts, fundraising opportunities could dry up quickly.

14. OpenAI deals boost partners’ stock

Over the past two months, OpenAI has made four deals that could lead to the construction of 30 gigawatts of additional data center capacity. According to CNBC, one gigawatt of data center capacity costs around $50 billion at today’s prices. So the overall cost of this new infrastructure could be as high as $1.5 trillion — far more than the $500 billion valuation given to OpenAI in its last fundraising round.

Each of these deals was made with technology companies that were acting as OpenAI suppliers: three deals with chipmakers and one with Oracle, which builds and operates data centers.

OpenAI got favorable terms in each of these deals. Why? One reason is that partnering with OpenAI boosted the partners’ stock price. In total, the four companies gained $636 billion in stock value on the days their respective deals were announced. (Some stocks have since decreased slightly in value).

It’s unclear whether these deals will fully come to fruition as planned. 30 gigawatts is a huge amount of capacity. It’s almost two thirds of the total American data center capacity in operation today (according to Baxtel, a data center consultancy).

It also dwarfs OpenAI’s current data center capacity. In a recent internal Slack note, reported by Alex Heath of Sources, Sam Altman wrote that OpenAI started the year with “around” 230 megawatts of capacity, and that the company is “now on track to exit 2025 north of 2 gigawatts of operational capacity.”
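
Putting the numbers in this section together gives a sense of scale. This is rough, order-of-magnitude arithmetic, not a forecast:

```python
# Rough scale of OpenAI's recent infrastructure deals, using figures from this section.

planned_gw = 30                    # new capacity across the four deals
cost_per_gw = 50e9                 # ~$50 billion per gigawatt (CNBC estimate)
total_cost = planned_gw * cost_per_gw
print(total_cost / 1e12)           # ~$1.5 trillion of buildout

openai_valuation = 500e9           # valuation in OpenAI's last fundraising round
print(total_cost / openai_valuation)      # ~3x OpenAI's valuation

current_capacity_gw = 2            # capacity OpenAI expects to exit 2025 with
print(planned_gw / current_capacity_gw)   # planned capacity is ~15x that
```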

15. OpenAI’s annualized revenue has risen to $13 billion

The key question for investors is how quickly AI startups can grow their revenue. Thus far, both OpenAI and Anthropic have shown impressive revenue growth, at least according to figures reported in the media. OpenAI expects $13 billion in revenue in 2025, while Anthropic recently told Reuters that its “annual revenue run rate is approaching $7 billion.”

Both companies are still losing billions of dollars a year, however, so continued growth is necessary.

OpenAI and Anthropic have different primary revenue streams. 70% of OpenAI’s revenue comes from consumer ChatGPT subscriptions. Meanwhile, Anthropic earns 80% of its revenue from enterprise customers, according to Reuters. Most of that revenue appears to come from selling access to Claude models via an API.

16. OpenAI predicts huge revenue growth

This chart from The Information shows OpenAI’s internal revenue projections.

After generating $13 billion this year, OpenAI hopes to generate $30 billion next year, $60 billion in 2027, and a whopping $200 billion in 2030. As you can see, OpenAI’s revenue projections have gotten more optimistic over the course of 2025. At the start of the year, the company was projecting “only” $174 billion in revenue in 2030.

OpenAI hopes to diversify its revenue streams over the next few years. The company expects ChatGPT subscriptions will continue to be the biggest moneymaker. But OpenAI is looking for healthy growth in its API business. And the company hopes that agents like Codex will generate tens of billions of dollars per year by the end of the decade.

The AI giant is also looking to generate around $50 billion in revenue from new products, including showing ads to free users of OpenAI products. OpenAI needs strategies to make money off the 95% of ChatGPT users who do not currently pay for a subscription. This is probably a large part of the logic behind OpenAI’s recent release of in-chat purchases.

Anthropic has similarly forecast that its annualized revenue could reach $26 billion by the end of 2026, up from between $6 billion and $7 billion today.

These predictions are aggressive: a recent analysis by Greg Burnham of Epoch AI was unable to find any American companies that have gone from $10 billion in annual revenue to $100 billion in less than seven years. OpenAI predicts that it will take fewer than four.

On the other hand, Burnham found that OpenAI was potentially the second fastest company ever to go from $1 billion to $10 billion, after pandemic-era Moderna. If OpenAI can sustain its current pace of growth (roughly 3x per year), it will be able to hit its revenue targets.
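
Here's the growth math, using the revenue figures reported in this piece. The compound-growth calculation is standard; the dollar amounts are the ones above:

```python
# Growth rates implied by OpenAI's revenue projections (figures from this piece).

rev_2025 = 13e9
rev_2030_target = 200e9
years = 5

# Compound annual growth needed to hit the 2030 target from the 2025 base.
required_multiple = (rev_2030_target / rev_2025) ** (1 / years)
print(f"required: ~{required_multiple:.2f}x per year")     # ~1.73x

# What sustaining the current ~3x-per-year pace would produce by 2030.
print(f"at 3x/year: ~${rev_2025 * 3**years / 1e9:,.0f}B")  # ~$3,159B, well above the target
```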

Whether OpenAI and Anthropic can do so is already a trillion-dollar question.

Thanks to Joey Politano, Nat Purser, and Cathy Kunkel for helpful comments on this article.

If you enjoy this article, please subscribe to get future pieces delivered straight to your inbox. It’s free!

1

This number is based on two proxies. First, Epoch AI estimates that 74% of current GPU-intensive data capacity is located in the United States. Second, big tech companies have reported that a large majority of their long-lived assets (which includes data centers) are located in the US. Specifically, at the end of its most recent fiscal year Microsoft had 60.2% of its long-lived assets in the US. The figure was 73.5% for Amazon, 75.3% for Google, 75.6% for Oracle, and 86.2% for Meta.

Sora, OpenAI’s chart-topping AI video app, explained

2025-10-09 04:56:04

On September 25, Meta announced Vibes, a “new way to discover, create, and share AI videos.” The next week, OpenAI announced a new app called Sora for creating and sharing AI-generated videos.

The public reaction to these launches could not have been more different. As I write this, the iOS App Store ranks Sora as its number one free app. The Meta AI app…

Read more

AI isn't replacing radiologists

2025-10-01 21:03:30

We’re pleased to publish this guest post from Deena Mousa, a researcher at Open Philanthropy. If you’d like to read more of her work, please subscribe to her Substack or follow her on Twitter.

The piece was originally published by Works in Progress, a truly excellent magazine, which is part of Stripe. You can subscribe to their email newsletter or (if you live in the US or UK) their new print edition.


CheXNet can detect pneumonia with greater accuracy than a panel of board-certified radiologists. It is an AI model released in 2017, trained on more than 100,000 chest X-rays. It is fast, free, and can run on a single consumer-grade GPU. A hospital can use it to classify a new scan in under a second.

Since then, companies like Annalise.ai, Lunit, Aidoc, and Qure.ai have released models that can detect hundreds of diseases across multiple types of scans with greater accuracy and speed than human radiologists in benchmark tests. Some products can reorder radiologist worklists to prioritize critical cases, suggest next steps for care teams, or generate structured draft reports that fit into hospital record systems. A few, like IDx-DR, are even cleared to operate without a physician reading the image at all. In total, there are over 700 FDA-cleared radiology models, which account for roughly three-quarters of all medical AI devices.

Radiology is a field optimized for human replacement, where digital inputs, pattern recognition tasks, and clear benchmarks predominate. In 2016, Geoffrey Hinton – computer scientist and Turing Award winner – declared that “people should stop training radiologists now.” If the most extreme predictions about the effect of AI on employment and wages were true, then radiology should be the canary in the coal mine.

But demand for human labor is higher than ever. In 2025, American diagnostic radiology residency programs offered a record 1,208 positions across all radiology specialties, a four percent increase from 2024, and the field’s vacancy rates are at all-time highs. In 2025, radiology was the second-highest-paid medical specialty in the country, with an average income of $520,000, over 48 percent higher than the average salary in 2015.

Three things explain this. First, while models beat humans on benchmarks, the standardized tests designed to measure AI performance, they struggle to replicate this performance in hospital conditions. Most tools can only diagnose abnormalities that are common in training data, and models often don’t work as well outside of their test conditions. Second, attempts to give models more tasks have run into legal hurdles: regulators and medical insurers so far are reluctant to approve or cover fully autonomous radiology models. Third, even when they do diagnose accurately, models replace only a small share of a radiologist’s job. Human radiologists spend a minority of their time on diagnostics and the majority on other activities, like talking to patients and fellow clinicians.

Artificial intelligence is rapidly spreading across the economy and society. But radiology shows us that it will not necessarily dominate every field in its first years of diffusion — at least until these common hurdles are overcome. Exploiting all of its benefits will involve adapting it to society, and society’s rules to it.

Islands of automation

All AIs are functions or algorithms, called models, that take in inputs and spit out outputs. Radiology models are trained to detect a finding, which is a measurable piece of evidence that helps identify or rule out a disease or condition. Most radiology models detect a single finding or condition in one type of image. For example, a model might look at a chest CT and answer whether there are lung nodules, rib fractures, or what the coronary arterial calcium score is.

For every individual question, a new model is required. In order to cover even a modest slice of what they see in a day, a radiologist would need to switch between dozens of models and ask the right questions of each one. Several platforms manage, run, and interpret outputs from dozens or even hundreds of separate AI models across vendors, but each model operates independently, analyzing for one finding or disease at a time. The final output is a list of separate answers to specific questions, rather than a single description of an image.

Even with hundreds of imaging algorithms approved by the Food and Drug Administration (FDA) on the market, the combined footprint of today’s radiology AI models still covers only a small fraction of real-world imaging tasks. Many cluster around a few use cases: stroke, breast cancer, and lung cancer together account for about 60 percent of models, but only a minority of the actual radiology imaging volume that is carried out in the US. Other subspecialties, such as vascular, head and neck, spine, and thyroid imaging, currently have relatively few AI products. This is in part due to data availability: the scan needs to be common enough for there to be many annotated examples that can be used to train models. Some scans are also inherently more complicated than others. For example, ultrasounds are taken from multiple angles and do not have standard imaging planes, unlike X-rays.

Once deployed outside of the hospital where they were initially trained, models can struggle. In a standard clinical trial, samples are taken from multiple hospitals to ensure exposure to a broad range of patients and to avoid site-specific effects, such as a single doctor’s technique or how a hospital chooses to calibrate its diagnostic equipment.1 But when an algorithm is undergoing regulatory approval in the US, its developers will normally test it on a relatively narrow dataset. Out of the models in 2024 that reported the number of sites where they were tested, 38 percent were tested on data from a single hospital. Public benchmarks tend to rely on multiple datasets from the same hospital.

The performance of a tool can drop as much as 20 percentage points when it is tested out of sample, on data from other hospitals. In one study, a pneumonia detection model trained on chest X-rays from a single hospital performed substantially worse when tested at a different hospital. Some of these challenges stemmed from avoidable experimental issues like overfitting, but others are indicative of deeper problems like differences in how hospitals record and generate data, such as using slightly different imaging equipment. This means that individual hospitals or departments would need to retrain or revalidate today’s crop of tools before adopting them, even if they have been proven elsewhere.

The limitations of radiology models stem from deeper problems with building medical AI. Training datasets come with strict inclusion criteria, where the diagnosis must be unambiguous (typically confirmed by a consensus of two to three experts or a pathology result) and without images that are shot at an odd angle, look too dark, or are blurry. This skews performance towards the easiest cases, which doctors are already best at diagnosing, and away from real-world images. In one 2022 study, an algorithm that was meant to spot pneumonia on chest X-rays faltered when the disease presented in subtle or mild forms, or when other lung conditions resembled pneumonia, such as pleural effusions, where fluid builds up in lungs, or in atelectasis (collapsed lung). Humans also benefit from context: one radiologist told me about a model they use that labels surgical staples as hemorrhages, because of the bright streaks they create in the image.

Medical imaging datasets used for training also tend to have fewer cases from children, women, and ethnic minorities, making their performance generally worse for these demographics. Many lack information about the gender or race of cases at all, making it difficult to adjust for these issues and address the problem of bias. The result is that radiology models often predict only a narrow slice of the world,2 though there are scenarios where AI models do perform well, including identifying common diseases like pneumonia or certain tumors.

The problems don’t stop there. Even a model for the precise question you need and in the hospital where it was trained is unlikely to perform as well in clinical practice as it did in the benchmark. In benchmark studies, researchers isolate a cohort of scans, define goals in quantitative metrics, such as the sensitivity (the percentage of people with the condition who are correctly identified by the test) and specificity (the percentage of people without the condition who are correctly identified as such), and compare the performance of a model to the score of another reviewer, typically a human doctor. Clinical studies, on the other hand, show how well the model performs in a real healthcare setting without controls. Since the earliest days of computer-aided diagnosis, there has been a gulf between benchmark and clinical performance.
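
For readers who haven't seen these metrics before, here's a minimal example of how sensitivity and specificity are computed from a model's predictions. The counts are invented for illustration and don't come from any study discussed here:

```python
# Sensitivity and specificity from a confusion matrix (illustrative counts only).

true_positives = 90    # sick patients the model correctly flags
false_negatives = 10   # sick patients the model misses
true_negatives = 850   # healthy patients correctly cleared
false_positives = 50   # healthy patients incorrectly flagged

# Sensitivity: share of people *with* the condition who are correctly identified.
sensitivity = true_positives / (true_positives + false_negatives)   # 0.90

# Specificity: share of people *without* the condition who are correctly identified.
specificity = true_negatives / (true_negatives + false_positives)   # ~0.94

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```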

In the 1990s, computer-aided diagnosis, effectively rudimentary AI systems, were developed to screen mammograms, or X-rays of breasts that are performed to look for breast cancer. In trials, the combination of humans and computer-aided diagnosis systems outperformed humans alone in accuracy when evaluating mammograms. More controlled experiments followed, which pointed to computer-aided diagnosis helping radiologists pick up more cancer with minimal costs.

The FDA approved mammography computer-aided diagnosis in 1998, and Medicare started to reimburse the use of computer-aided diagnosis in 2001. The US government paid radiologists $7 more to report a screening mammogram if they used the technology; by 2010, approximately 74 percent of mammograms in the country were read by computer-aided diagnosis alongside a clinician.

But computer-aided diagnosis turned out to be a disappointment. Between 1998 and 2002 researchers analyzed 430,000 screening mammograms from 200,000 women at 43 community clinics in Colorado, New Hampshire, and Washington. Among the seven clinics that turned to computer-aided detection software, the machines flagged more images, leading to clinicians conducting 20 percent more biopsies, but uncovering no more cancer than before. Several other large clinical studies had similar findings.

Another way to measure performance is to compare having computerized help to a second clinician reading every film, called ‘double reading’. Across ten trials and seventeen studies of double reading, researchers found that computer aids did not raise the cancer detection rate but led to patients being called back an additional ten percent more often. In contrast, having two readers caught more cancers while slightly lowering callbacks. Computer-aided detection was worse than standard care, and much worse than another pair of eyes. In 2018, Medicare stopped reimbursing doctors more for mammograms read with computer-aided diagnosis than those read by a radiologist alone.

One explanation for this gap is that people behave differently if they are treating patients day to day than when they are part of laboratory studies or other controlled experiments.3 In particular, doctors appear to defer excessively to assistive AI tools in clinical settings in a way that they do not in lab settings. They did this even with much more primitive tools than we have today: one clinical trial all the way back in 2004 asked 20 breast screening specialists to read mammogram cases with the computer prompts switched on, then brought in a new group to read the identical films without the software. When guided by computer aids, doctors identified barely half of the malignancies, while those reviewing without the model caught 68 percent. The gap was largest when computer aids failed to recognize the malignancy itself; many doctors seemed to treat an absence of prompts as reassurance that a film was clean. Another review, this time from 2011, found that when a system gave incorrect guidance, clinicians were 26 percent more likely to make a wrong decision than unaided peers.

Works in Progress is becoming a print magazine – you can subscribe here.

Humans in the loop

It would seem as if better models and more automation could together fix the problems of current-day AI for radiology. Without a doctor involved whose behavior might change, we might expect real-world results to match benchmark scores. But regulatory requirements and insurance policies are slowing the adoption of fully autonomous radiology AI.

The FDA splits imaging software into two regulatory lanes: assistive or triage tools, which require a licensed physician to read the scan and sign the chart, and autonomous tools, which do not. Makers of assistive tools simply have to show that their software can match the performance of tools that are already on the market. Autonomous tools have to clear a much higher bar: they must demonstrate that the AI tool will refuse to read any scan that is blurry, uses an unusual scanner, or is outside its competence. The bar is higher because, once the human disappears, a latent software defect could harm thousands before anyone notices.

Meeting those criteria is difficult. Even state-of-the-art vision networks falter with images that lack contrast, have unexpected angles, or contain lots of artifacts. IDx-DR, a diabetic retinopathy screener and one of the few cleared to operate autonomously, comes with guardrails: the patient must be an adult with no prior retinopathy diagnosis; there must be two macula-centred photographs of the fundus (the rear of the eye) with a resolution of at least 1,000 by 1,000 pixels; if glare, small pupils or poor focus degrade quality, the software must self-abort and refer the patient to an eye care professional.

Stronger evidence and improved performance could eventually clear both hurdles, but other requirements would still delay widespread use. For example, if you retrain a model, you are required to receive new approval even if the previous model was approved. This contributes to the market generally lagging behind frontier capabilities.

And when autonomous models are approved, malpractice insurers are not eager to cover them. Diagnostic error is the costliest mistake in American medicine, resulting in roughly a third of all malpractice payouts, and radiologists are perennial defendants. Insurers believe that software makes catastrophic payments more likely than a human clinician, as a broken algorithm can harm many patients at once. Standard contract language now often includes phrases such as: ‘Coverage applies solely to interpretations reviewed and authenticated by a licensed physician; no indemnity is afforded for diagnoses generated autonomously by software’. One insurer, Berkley, even carries the blunter label ‘Absolute AI Exclusion’.

Without malpractice coverage, hospitals cannot afford to let algorithms sign reports. In the case of IDx-DR, the vendor, Digital Diagnostics, includes a product liability policy and an indemnity clause. This means that if the clinic used the device exactly as the FDA label prescribes, with adult patients, good-quality images, and no prior retinopathy, then the company will reimburse the clinic for damages traceable to algorithmic misclassification.

Today, if American hospitals wanted to adopt AI for fully independent diagnostic reads, they would need to believe that autonomous models deliver enough cost savings or throughput gains to justify pushing for exceptions to credentialing and billing norms. For now, usage is too sparse to make a difference. One 2024 investigation estimated that only 48 percent of radiologists are using AI in their practice at all. A 2025 survey reported that only 19 percent of respondents who have started piloting or deploying AI use cases in radiology reported a ‘high’ degree of success.

Better AI, more MRIs

Even if AI models become accurate enough to read scans on their own and are cleared to do so, radiologists may still find themselves busier, rather than out of a career.

Radiologists are useful for more than reading scans; a study that followed staff radiologists in three different hospitals in 2012 found that only 36 percent of their time was dedicated to direct image interpretation. More time is spent on overseeing imaging examinations, communicating results and recommendations to the treating clinicians and occasionally directly to patients, teaching radiology residents and technologists who conduct the scans, and reviewing imaging orders and changing scanning protocols.4 This means that, if AI were to get better at interpreting scans, radiologists may simply shift their time toward other tasks. This would reduce the substitution effect of AI.

As tasks get faster or cheaper to perform, we may also do more of them. In some cases, especially if lower costs or faster turnaround times open the door to new uses, the increase in demand can outweigh the increase in efficiency, a phenomenon known as Jevons Paradox. This has historical precedent in the field: in the early 2000s hospitals swapped film jackets for digital systems. Hospitals that digitized improved radiologist productivity, and time to read an individual scan went down. A study at Vancouver General found that the switch boosted radiologist productivity 27 percent for plain radiography and 98 percent for CT within a year of going filmless. This occurred alongside other advancements in imaging technology that made scans faster to execute. Yet, no radiologists were laid off.

Instead, the overall American utilization rate per 1,000 insured patients for all imaging increased by 60 percent from 2000 to 2008. This is not explained by a commensurate increase in physician visits. Instead, each visit was associated with more imaging on average. Before digitization, the nonmonetary price of imaging was high: the median reporting turnaround time for X-rays was 76 hours for patients discharged from emergency departments, and 84 hours for admitted patients. After departments digitized, these times dropped to 38 hours and 35 hours, respectively.

Faster scans give doctors more options. Until the early 2000s, only exceptional trauma cases would receive whole-body CT scans; the increased speed of CT turnaround times mean that they are now a common choice. This is a reflection of elastic demand, a concept in economics that describes when demand for a product or service is very sensitive to changes in price. In this case, when these scans got cheaper in terms of waiting time, demand for those scans increased.

The first decade of diffusion

Over the past decade, improvements in image interpretation have run far ahead of their diffusion. Hundreds of models can spot bleeds, nodules, and clots, yet AI is often limited to assistive use on a small subset of scans in any given practice. And despite predictions to the contrary, head counts and salaries have continued to rise. The promise of AI in radiology is overstated by benchmarks alone.

Multi‑task foundation models may widen coverage, and different training sets could blunt data gaps. But many hurdles cannot be removed with better models alone: the need to counsel the patient, shoulder malpractice risk, and receive accreditation from regulators. Each hurdle makes full substitution the expensive, risky option and human plus machine the default. Sharp increases in AI capabilities could certainly alter this dynamic, but it is a useful model for the first years of AI models that benchmark well at tasks associated with a particular career.

There are industries where conditions are different. Large platforms rely heavily on AI systems to triage or remove harmful or policy-violating content. At Facebook and Instagram, 94 percent and 98 percent of moderation decisions respectively are made by machines. But many of the more sophisticated knowledge jobs look more like radiology.

In many jobs, tasks are diverse, stakes are high, and demand is elastic. When this is the case, we should expect software to initially lead to more human work, not less. The lesson from a decade of radiology models is neither optimism about increased output nor dread about replacement. Models can lift productivity, but their implementation depends on behavior, institutions and incentives. For now, the paradox has held: the better the machines, the busier radiologists have become.

1

A few groups have started doing this, like the 2025 ‘OpenMIBOOD’ suite which explicitly scores chest-X-ray models on 14 out-of-distribution collections, but that hasn’t yet become standard.

2

A few companies and research groups are working to mitigate this, such as by training on multi-site datasets, building synthetic cases, or using self-supervised learning to reduce labeling needs, but these approaches are still early and expensive. This limitation is an important reason why AI models do not yet perform as expected.

3

One study tracked 27 mammographers and compared how well each interpreted real screening films versus a standardised ‘test-set’ of the same images. The researchers found no meaningful link between a radiologist’s accuracy in the lab and accuracy on live patients; the statistical correlation in sensitivity-specificity scores was essentially zero.

4

This dynamic is not exclusive to radiology. A study that tracked US occupations since 1980 found that those that adopted computers actually hired people faster than others, about 0.9 percent a year, because workers shifted to new, complementary tasks that the machines could not handle.

How AI is shaking up the study of earthquakes

2025-09-29 21:30:53

On January 1, 2008, at 1:59 AM in Calipatria, California, an earthquake happened. You haven’t heard of this earthquake; even if you had been living in Calipatria, you wouldn’t have felt anything. It was magnitude -0.53, about the same amount of shaking as a truck passing by. Still, this earthquake is notable: not because it was large, but because it was small and yet we know about it.

Over the past seven years, AI tools based on computer imaging have almost completely automated one of the fundamental tasks of seismology: detecting earthquakes. What used to be done by human analysts and, later, by simpler computer programs can now be done automatically and quickly by machine learning tools.

These machine learning tools can detect smaller earthquakes than human analysts, especially in noisy environments like cities. Earthquakes give valuable information about the composition of the Earth and what hazards might occur in the future.

“In the best-case scenario, when you adopt these new techniques, even on the same old data, it’s kind of like putting on glasses for the first time, and you can see the leaves on the trees,” said Kyle Bradley, co-author of the Earthquake Insights newsletter.

I talked with several earthquake scientists, and they all agreed that machine learning methods have replaced humans for the better in these specific tasks.

“It’s really remarkable,” Judith Hubbard, a Cornell professor and Bradley’s co-author, told me.

Less certain is what comes next. Earthquake detection is a fundamental part of seismology, but there are many other data processing tasks that have yet to be disrupted. The biggest potential impacts, all the way to earthquake forecasting, haven’t materialized yet.

“It really was a revolution,” said Joe Byrnes, a professor at the University of Texas at Dallas. “But the revolution is ongoing.”

What do seismologists do?

"I live next to a wall of rock 20 miles thick. There's no way around or over it. I'm trapped on this side forever. 
"I study the stuff on the other side."

Mantle Geology seems like the most frustrating field.
(xkcd CC BY-NC 2.5)

When an earthquake happens in one place, the shaking passes through the ground similar to how sound waves pass through the air. In both cases, it’s possible to draw inferences about the materials the waves pass through.

Imagine tapping a wall to figure out if it’s hollow. Because a solid wall vibrates differently than a hollow wall, you can figure out the structure by sound.

With earthquakes, this same principle holds. Seismic waves pass through different materials (rock, oil, magma, etc.) differently, and scientists use these vibrations to image the Earth’s interior.

The main tool that scientists traditionally use is a seismometer. These record the movement of the Earth in three directions: up–down, north–south, and east–west. If an earthquake happens, seismometers can measure the shaking in that particular location.

An old-fashioned physical seismometer. Today, seismometers record data digitally. (From Yamaguchi先生 on Wikimedia CC BY-SA 3.0)

Scientists then process raw seismometer information to identify earthquakes.

Earthquakes produce multiple types of shaking, which travel at different speeds. Two types, Primary (P) waves and Secondary (S) waves are particularly important, and scientists like to identify the start of each of these phases.

Finding quakes before machine learning

Before good algorithms, earthquake cataloging had to happen by hand. Byrnes said that “traditionally, something like the lab at the United States Geological Survey would have an army of mostly undergraduate students or interns looking at seismograms.”

However, there are only so many earthquakes you can find and classify manually. Creating algorithms to effectively find and process earthquakes has long been a priority in the field — especially since the arrival of computers in the early 1950s.

“The field of seismology historically has always advanced as computing has advanced,” Bradley told me.

There’s a big challenge with traditional algorithms though: they can’t easily find smaller quakes, especially in noisy environments.

Composite seismogram of common events. Note how each event has a slightly different shape. (From EarthScope Consortium CC BY 4.0)

As we see in this seismogram above, many different events can cause seismic signals. If a method is too sensitive, it risks falsely detecting events as earthquakes. The problem is especially bad in cities, where the constant hum of traffic and buildings can drown out small earthquakes.

However, earthquakes have a characteristic “shape.” The magnitude 7.7 earthquake above looks quite different from the helicopter landing, for instance.

So one idea scientists had was to make templates from human-labeled datasets. If a new waveform correlates closely with an existing template, it’s almost certainly an earthquake.

Template matching works very well if you have enough human-labeled examples. In 2019, Zach Ross’s lab at Caltech used template matching to find ten times as many earthquakes in Southern California as had previously been known, including the earthquake at the start of this story. Almost all of the new 1.6 million quakes they found were very small, magnitude 1 and below.

If you don’t have an extensive pre-existing dataset of templates, however, you can’t easily apply template matching. That isn’t a problem in Southern California — which already had a basically complete record of earthquakes down to magnitude 1.7 — but it’s a challenge elsewhere.

Also, template matching is computationally expensive. Creating a Southern California quake dataset using template matching took 200 Nvidia P100 GPUs running for days on end.
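
To see why template matching is both effective and expensive, here's a minimal sketch of the core operation: sliding a known earthquake waveform along a continuous recording and computing a normalized correlation at every offset. It's an illustrative toy, not the pipeline the Caltech team actually used:

```python
import numpy as np

def template_match(recording: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Slide a template waveform along a continuous recording and return the
    normalized correlation at each offset. Values near 1.0 mark segments that
    closely resemble the template -- likely repeats of a known earthquake."""
    n = len(template)
    template = (template - template.mean()) / template.std()
    scores = np.empty(len(recording) - n + 1)
    for i in range(len(scores)):
        window = recording[i:i + n]
        window = (window - window.mean()) / (window.std() + 1e-12)
        scores[i] = np.dot(window, template) / n
    return scores

# The cost scales with (recording length) x (template length), repeated for every
# template and every station -- which is why large template-matching studies have
# needed hundreds of GPUs running for days.
```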

There had to be a better way.

Breaking down the Earthquake Transformer

AI detection models solve both these problems:

  • They are faster than template matching. Because AI detection models are very small (around 350,000 parameters compared to billions in LLMs), they can be run on consumer CPUs.

  • AI models generalize well to regions not represented in the original dataset.

As an added bonus, AI models can give better information about when the different types of earthquake shaking arrive. Timing the arrivals of the two most important waves — P and S waves — is called phase picking. It allows scientists to draw inferences about the structure of the quake. AI models can do this alongside earthquake detection.

The basic task of earthquake detection (and phase picking) looks like this:

Cropped figure from “Earthquake Transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking.” (CC BY 4.0)

The first three rows represent different directions of vibration (east–west, north–south, and up–down respectively). Given these three dimensions of vibration, can we determine if an earthquake occurred, and if so, when it started?

Ideally, our model outputs three things at every time step in the sample:

  1. The probability that an earthquake is occurring at that moment.

  2. The probability that a P wave arrives at that moment.

  3. The probability that an S wave arrives at that moment.

We see all three outputs in the fourth row: the detection in green, the P wave arrival in blue, and the S wave arrival in red. (There are two earthquakes in this sample).

To train an AI model, scientists take large amounts of labeled data like what’s above and do supervised training. I’ll describe one of the most used models: Earthquake Transformer, which was developed around 2020 by a Stanford team led by S. Mostafa Mousavi — who later became a Harvard professor.

Like many earthquake detection models, Earthquake Transformer adapts ideas from image classification. Readers may be familiar with AlexNet, a famous image recognition model that kicked off the deep learning boom in 2012.

AlexNet used convolutions, a neural network architecture that’s based on the idea that pixels that are physically close together are more likely to be related. The first convolutional layer of AlexNet broke an image down into small chunks — 11 pixels on a side — and classified each chunk based on the presence of simple features like edges or gradients.

The next layer took the first layer’s classifications as input and checked for higher-level concepts such as textures or simple shapes.

Each convolutional layer analyzed a larger portion of the image and operated at a higher level of abstraction. By the final layers, the network was looking at the entire image and identifying objects like “mushroom” and “container ship.”

Images are two-dimensional, so AlexNet is based on two-dimensional convolutions. In contrast, seismograph data is one-dimensional, so Earthquake Transformer uses one-dimensional convolutions over the time dimension. The first layer analyzes vibration data in 0.1-second chunks, while later layers identify patterns over progressively longer time periods.

It’s difficult to say what exact patterns the earthquake model is picking out, but we can analogize this to a hypothetical audio transcription model using one-dimensional convolutions. That model might first identify consonants, then syllables, then words, then sentences over increasing time scales.

Earthquake Transformer converts raw waveform data into a collection of high-level representations that indicate the likelihood of earthquakes and other seismologically significant events. This is followed by a series of deconvolution layers that pinpoint exactly when an earthquake — and its all-important P and S waves — occurred.

The model also uses an attention layer in the middle of the model to mix information between different parts of the time series. The attention mechanism is most famous in large language models, where it helps pass information between words. It plays a similar role in seismographic detection. Earthquake seismograms have a general structure: P waves followed by S waves followed by other types of shaking. So if a segment looks like the start of a P wave, the attention mechanism helps it check that it fits into a broader earthquake pattern.
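
Here's a highly simplified sketch, in PyTorch, of the general shape of such a model: a one-dimensional convolutional encoder over three-channel waveform data, a self-attention layer that mixes information across time, and per-time-step outputs for detection, P-wave, and S-wave probabilities. It's a toy illustration of the ideas described above, not the actual Earthquake Transformer code, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

class TinyQuakePicker(nn.Module):
    """Toy detector/phase-picker: 3-channel waveform in, 3 per-step probabilities out."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # 1D convolutional encoder: each layer sees progressively longer time spans.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, channels, kernel_size=11, padding=5), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=11, padding=5), nn.ReLU(),
        )
        # Self-attention mixes information between distant parts of the time series,
        # e.g. letting a P-wave-like segment check whether an S wave follows.
        self.attention = nn.MultiheadAttention(embed_dim=channels, num_heads=4,
                                               batch_first=True)
        # Per-time-step heads: detection, P arrival, S arrival.
        self.head = nn.Conv1d(channels, 3, kernel_size=1)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 3, time) -- east-west, north-south, and up-down channels
        x = self.encoder(waveform)                 # (batch, channels, time)
        x_t = x.transpose(1, 2)                    # (batch, time, channels) for attention
        attended, _ = self.attention(x_t, x_t, x_t)
        x = attended.transpose(1, 2)               # back to (batch, channels, time)
        return torch.sigmoid(self.head(x))         # (batch, 3, time) probabilities

model = TinyQuakePicker()
fake_minute = torch.randn(1, 3, 6000)   # e.g. one minute of 100 Hz, three-component data
probs = model(fake_minute)              # detection, P, and S probabilities per sample
```

A real model adds downsampling, deconvolution layers to recover full time resolution, and far more careful training, but the overall encoder, attention, and decoder structure described above is the same general idea.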

Scaling earthquake data

All of the Earthquake Transformer’s components are standard designs from the neural network literature. Other successful detection models, like PhaseNet, are even simpler. PhaseNet uses only one-dimensional convolutions to pick the arrival times of earthquake waves. There are no attention layers.

Generally, there hasn’t been “much need to invent new architectures for seismology,” according to Byrnes. The techniques derived from image processing have been sufficient.

What made these generic architectures work so well then? Data. Lots of it.

We’ve previously reported on how the introduction of ImageNet, an image recognition benchmark, helped spark the deep learning boom. Large, publicly available earthquake datasets have played a similar role in seismology.

Earthquake Transformer was trained using the Stanford Earthquake Dataset (STEAD), which contains 1.2 million human-labeled segments of seismogram data from around the world. (The paper for STEAD explicitly mentions ImageNet as an inspiration). Other models, like PhaseNet, were also trained on hundreds of thousands or millions of labeled segments.

All recorded earthquakes in the Stanford Earthquake Dataset. (CC BY 4.0)

The combination of the data and the architecture just works. The current models are “comically good” at identifying and classifying earthquakes, according to Byrnes. Typically, machine learning methods find ten or more times the quakes that were previously identified in an area. You can see this directly in an Italian earthquake catalog.

AI tools won’t necessarily detect more earthquakes than template matching. But AI-based techniques are much less compute- and labor-intensive, making them more accessible to the average research project and easier to apply in regions around the world.

All in all, these machine learning models are so good that they’ve almost completely supplanted traditional methods for detecting and phase-picking earthquakes, especially for smaller magnitudes.

What does all this AI stuff do?

The holy grail of earthquake science is earthquake prediction. For instance, scientists know that a large quake will happen near Seattle, but have little ability to know whether it will happen tomorrow or in a hundred years. It would be helpful if we could predict earthquakes precisely enough to allow people in affected areas to evacuate.

You might think AI tools would help predict earthquakes, but that doesn’t seem to have happened yet.

The applications are more technical, and less flashy, said Cornell’s Judith Hubbard.

Better AI models have given seismologists much more comprehensive earthquake catalogs, which have unlocked “a lot of different techniques,” Bradley said.

One of the coolest applications is in understanding and imaging volcanoes. Volcanic activity produces a large number of small earthquakes, whose locations help scientists understand the structure of the magma system. In a 2022 paper, John Wilding and co-authors used a large AI-generated earthquake catalog to create this incredible image of the structure of the Hawaiian volcanic system.

Each dot represents an individual earthquake. (From Wilding et al., The magmatic web beneath Hawai‘i.)

They provided direct evidence of a previously hypothesized magma connection between the deep Pāhala sill complex and Mauna Loa’s shallow volcanic structure. You can see this in the image with the arrow labeled as Pāhala-Mauna Loa seismicity band. The authors were also able to clarify the structure of the Pāhala sill complex into discrete sheets of magma. This level of detail could potentially facilitate better real-time monitoring of earthquakes and more accurate eruption forecasting.

Another promising area is lowering the cost of dealing with huge datasets. Distributed Acoustic Sensing (DAS) is a powerful technique that uses fiber-optic cables to measure seismic activity across the entire length of the cable. A single DAS array can produce “hundreds of gigabytes of data” a day, according to Jiaxuan Li, a professor at the University of Houston. That much data can produce extremely high resolution datasets — enough to pick out individual footsteps.

AI tools make it possible to time earthquake arrivals in DAS data very accurately. Before AI techniques for phase picking in DAS data were available, Li and some of his collaborators tried traditional techniques; while these “work roughly,” they weren’t accurate enough for their downstream analysis. Without AI, much of his work would have been “much harder,” he told me.

Li is also optimistic that AI tools will be able to help him isolate “new types of signals” in the rich DAS data in the future.


Not all AI techniques have paid off

As in many other scientific fields, seismologists face some pressure to adopt AI methods whether or not they’re relevant to their research.

“The schools want you to put the word AI in front of everything,” Byrnes said. “It’s a little out of control.”

This can lead to papers that are technically sound but practically useless. Hubbard and Bradley told me that they’ve seen a lot of papers based on AI techniques that “reveal a fundamental misunderstanding of how earthquakes work.”

They pointed out that graduate students can feel pressure to specialize in AI methods at the expense of learning the fundamentals of the scientific field. They fear that if this type of AI-driven research becomes entrenched, older methods will get “out-competed by a kind of meaninglessness.”

While these are real issues, and ones Understanding AI has reported on before, I don’t think they detract from the success of AI earthquake detection. In the last five years, an AI-based workflow has almost completely replaced the traditional approach to one of seismology’s fundamental tasks, and the field is better for it.

That’s pretty cool.

The case for AI doom isn't very convincing

2025-09-26 04:16:40

A striking thing about the AI industry is how many insiders believe AI could pose an existential risk to humanity.

Just last week, Anthropic CEO Dario Amodei described himself as “relatively an optimist” about AI. But he said there was a “25 percent chance that things go really really badly.” Among the risks Amodei worries about: “the autonomous danger of the model.”

In a 2023 interview, OpenAI CEO Sam Altman was blunter, stating that the worst-case scenario was “lights out for all of us.”

No one has done more to raise these concerns than rationalist gadfly Eliezer Yudkowsky. In a new book with co-author Nate Soares, Yudkowsky doesn’t mince words: If Anyone Builds It, Everyone Dies. Soares and Yudkowsky believe that if anyone invents superintelligent AI, it will take over the world and kill everyone.

Normally, when someone predicts the literal end of the world, you can write them off as a kook. But Yudkowsky is hard to dismiss. He has been warning about these dangers since the early 2010s, when he (ironically) helped get some of the leading AI companies off the ground. Legendary AI researchers like Geoffrey Hinton and Yoshua Bengio take Yudkowsky’s concerns seriously.

So is Yudkowsky right? In my mind, there are three key steps to his argument:

  1. Humans are on a path to develop AI systems with superhuman intelligence.

  2. These systems will gain a lot of power over the physical world.

  3. We don’t know how to ensure these systems use their power for good rather than evil.

Outside the AI industry, debate tends to focus on the first claim; many normie skeptics think superintelligent AI is simply too far away to worry about. Personally, I think these skeptics are too complacent. I don’t know how soon AI systems will surpass human intelligence, but I expect progress to be fast enough over the next decade that we should start taking these questions seriously.

Inside the AI industry, many people accept Yudkowsky’s first two premises—superintelligent AI will be created and become powerful—but they disagree about whether we can get it to pursue beneficial goals instead of harmful ones. There’s now a sprawling AI safety community exploring how to align AI systems with human values.

But I think the weakest link in Yudkowsky and Soares’s argument is actually the second claim: that an AI system with superhuman intelligence would become so powerful it could kill everyone. I have no doubt that AI will give people new capabilities and solve long-standing problems. But I think the authors wildly overestimate how transformational the technology will be—and dramatically underestimate how easy it will be for humans to maintain control.

Subscribe now

Grown, not crafted

Over the last two centuries, humans have used our intelligence to dramatically increase our control over the physical world. From airplanes to antibiotics to nuclear weapons, modern humans accomplish feats that would have astonished our ancestors.

Yudkowsky and Soares believe AI will unlock another, equally large, jump in our (or perhaps just the AI’s) ability to control the physical world. And the authors expect this transformation to happen over months rather than decades.

Biology is one area where the authors expect radical acceleration.

“The challenge of building custom-designed biological technology is not so much one of producing the tools to make it, as it is one of understanding the design language, the DNA and RNA,” Yudkowsky and Soares argue. According to these authors, “our best wild guess is that it wouldn’t take a week” for a superintelligent AI system to “crack the secrets of DNA” so that it could “design genomes that yielded custom life forms.”


For example, they describe trees as “self-replicating factories that spin air into wood” and conclude that “any intelligence capable of comprehending biochemistry at the deepest level is capable of building its own self-replicating factories to serve its own purposes.”

Ironically, I think the first four chapters of the book do a good job of explaining why it’s probably not that simple.

These chapters argue that AI alignment is a fool’s errand. Due to the complexity of AI models and the way they’re trained, the authors say, humans won’t be able to design AI models to predictably follow human instructions or prioritize human values. I think this argument is correct, but it has broader implications than the authors acknowledge.

Here’s a key passage from Chapter 2 of If Anyone Builds It:

The way humanity finally got to the level of ChatGPT was not by finally comprehending intelligence well enough to craft an intelligent mind. Instead, computers became powerful enough that AIs can be churned out by gradient descent, without any human needing to understand the cognitions that grow inside.

Which is to say: engineers failed at crafting AI, but eventually succeeded in growing it.
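To see what “growing” means in practice, consider a toy sketch of gradient descent (my own illustration, not an example from the book): the programmer writes down a loss function and an update rule, but the final parameters are whatever the optimization happens to converge on.

```python
# A toy illustration of "growing" behavior by gradient descent: the
# programmer writes the training loop, not the final parameters.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)  # hidden "true" relationship

w, b = 0.0, 0.0   # parameters start out knowing nothing
lr = 0.1
for step in range(500):
    err = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # ends up near (3.0, 0.5) without anyone writing those numbers in
```

Scale that loop up to billions of parameters and trillions of tokens and you get roughly the situation the authors describe: the builders understand the training procedure, not the particular “mind” that comes out of it.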

“You can’t grow an AI that does what you want just by training it to be nice and hoping,” they write. “You don’t get what you train for.”

The authors draw an analogy to evolution, another complex process with frequently surprising results. For example, the long, colorful tails of male peacocks make it harder for them to flee predators. So why do they have them? At some point, early peahens developed a preference for large-tailed males, and this led to a self-reinforcing dynamic in which males grew ever larger tails to improve their chances of finding a mate.

“If you ran the process [of evolution] again in very similar circumstances you’d get a wildly different result” than large-tailed peacocks, the authors argue. “The result defies what you might think natural selection should do, and you can’t predict the specifics no matter how clever you are.”

I love this idea that some systems are so complex that “you can’t predict the specifics no matter how clever you are.” But there’s an obvious tension with the idea that once an AI system “cracks the secrets of DNA,” it will be able to rapidly invent “custom life forms” and “self-replicating factories” that serve the purposes of the AI.

Yudkowsky and Soares believe that some systems are too complex for humans to fully understand or control, but superhuman AI won’t have the same limitations. They believe that AI systems will become so smart that they’ll be able to create and modify living organisms as easily as children rearrange Lego blocks. Once an AI system has this kind of predictive power, it could become trivial for it to defeat humanity in a conflict.

But I think the difference between grown and crafted systems is more fundamental. Some of the most important systems—including living organisms—are so complex that no one will ever be able to fully understand or control them. And this means that raw intelligence only gets you so far. At some point you need to perform real-world experiments to see if your predictions hold up. And that is a slow and error-prone process.

And not just in the domain of biology. Military conflicts, democratic elections, and cultural evolution are other domains that lie beyond the predictive power, and hence the control, of even the smartest humans. Many doomers expect that superintelligent AIs won’t face such limitations: that they’ll be able to perfectly predict the outcome of battles or deftly manipulate the voting public to achieve their desired outcomes in elections.

But I’m skeptical. I suspect that large-scale social systems like this are so complex that it’s impossible to perfectly understand and control them no matter how clever you are. Which isn’t to say that future AI systems won’t be helpful for winning battles or influencing elections. But the idea that superintelligence will yield God-like capabilities in these areas seems far-fetched.


Chess is a poor model

Yudkowsky and Soares repeatedly draw analogies to chess, where AI has outperformed the best human players for decades. But chess has some unique characteristics that make it a poor model for the real world. Chess is a game of perfect information; both players know the exact state of the board at all times. The rules of chess are also far simpler than the physical world, allowing chess engines to “look ahead” many moves.

The real world is a lot messier. There’s a military aphorism that “no plan survives contact with the enemy.” Generals try to anticipate the enemy’s strategy and game out potential counter-attacks. But the battlefield is so complicated—and there’s so much generals don’t know prior to the battle—that things almost always evolve in ways that planners don’t anticipate.

Many real-world problems have this character: smarter people can come up with better experiments to try, but even the smartest people are still regularly surprised by experimental results. And so the bottleneck to progress is often the time and resources required to gain real-world experience, not raw brainpower.

In chess, both players start the game with precisely equal resources, and this means that even a small difference in intelligence can be decisive. In the real world, in contrast, specific people and organizations start out with control over essential resources. A rogue AI that wanted to take over the world would start out with a massive material disadvantage relative to governments, large corporations, and other powerful institutions that won’t want to give up their power.

There have been historical examples where brilliant scientists made discoveries that helped their nations win wars. Two of the best known are from World War II: the physicists in the Manhattan Project who helped the US build the first nuclear weapons and the mathematicians at Bletchley Park who figured out how to decode encrypted Nazi communications.

But it’s notable that while Enrico Fermi, Leo Szilard, Alan Turing, and others helped the Allies win the war, none of them personally wound up with significant political power. Instead, they empowered existing Allied leaders such as Franklin Roosevelt, Winston Churchill, and Harry Truman.

That’s because intelligence alone wasn’t sufficient to build an atomic bomb or decode Nazi messages. To make the scientists’ insights actionable, the government needed to mobilize vast resources to enrich uranium, intercept Nazi messages, and so forth. And so despite being less intelligent than Fermi or Turing, Allied leaders had no trouble maintaining control of the overall war effort.

A similar pattern is evident in the modern United States. Currently the most powerful person in the country is Donald Trump. He has charisma and a certain degree of political cunning, but I think even many of his supporters would concede that he is not an intellectual giant. Neither was Trump’s immediate predecessor, Joe Biden. But it turns out that other characteristics, such as Trump’s wealth and fame, are at least as important as raw intelligence for achieving political power.

We can use superintelligent AIs as tools

I see one other glaring flaw with the chess analogy. There’s actually an easy way for a human to avoid being humiliated by an AI at chess: run your own copy of the AI and do what it recommends. If you do that, you’ve got about a 50/50 chance of winning the game.
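For the literal-minded, the chess version of that advice fits in a few lines of code. The sketch below is my own illustration using the python-chess library and a locally installed Stockfish binary (the binary path is an assumption): it simply asks the engine for a move and plays it, so a human following the recommendations plays at the engine’s level by construction.

```python
# Sketch: avoid being humiliated at chess by consulting your own engine.
# Uses python-chess plus a local Stockfish binary; the path is an assumption.
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("/usr/local/bin/stockfish")
board = chess.Board()

while not board.is_game_over():
    # Whoever is to move asks the engine and plays its recommendation.
    result = engine.play(board, chess.engine.Limit(time=0.5))
    board.push(result.move)

print(board.result())  # the same engine advising both sides yields roughly even results
engine.quit()
```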

And I think the same point applies to AI takeover scenarios, like the fictional story in the middle chapters of If Anyone Builds It. Yudkowsky and Soares envision a rogue AI outsmarting the collective intelligence of billions of human beings. That seems implausible to me in any case, but it seems especially unlikely when you remember that human beings can always ask other AI models for advice.

This is related to my earlier discussion of how much AI models can accelerate technological progress. If it were true that a superintelligent AI system could “crack the secrets of DNA” in a week, I might find it plausible that it could gain a large enough technological head start to outsmart all humans.

But it seems much more likely that the first superhuman AI will be only slightly more intelligent than the smartest humans, and that within a few months rival AI labs will release their own models with similar capabilities.

Moreover, it’s possible to modify the behavior of today’s AI models through either prompting or fine-tuning. There’s no guarantee that future AI models will work exactly the same way, but it seems pretty likely that we’ll continue to have techniques for making copies of leading AI models and giving them different goals and behaviors. So even if one instance of an AI “goes rogue,” we should be able to create other instances that are willing to help us defend ourselves.

So the question is not “will the best AI become dramatically smarter than humans?” It’s “will the best AI become dramatically smarter than humans advised by the second-best AI?” It’s hard to be sure about this, since no superintelligent AI systems exist yet. But I didn’t find Yudkowsky and Soares’s pessimistic case convincing.

Welcome Kai!

2025-09-18 23:10:05

You might have noticed that yesterday’s piece had a new byline—Kai Williams!


After earning a bachelor’s degree in mathematics from Swarthmore College last year (and scoring in the top 500 on the Putnam math competition), Kai spent a year honing his programming skills at the Recurse Center and studying AI safety at the prestigious ML Alignment and Theory Scholars (MATS) program.

In short, Kai is smart and knows quite a lot about AI. I expect great things from him.

Talk to Kai!

Kai would like to get to know Understanding AI readers! He has opened up some slots on his calendar today and tomorrow for video calls.

If you’d like to talk to Kai, please click here and grab a time slot. [Update: readers have grabbed all time slots!] You could discuss your own use of AI, how AI is affecting your industry or profession, topics you’d like to see us cover, or anything else AI-related that’s on your mind.

He’d especially like to hear from people in K-12 and higher education, as this will likely be a focus of his reporting.

You can also follow Kai on Twitter.

Kai is supported by a Tarbell Fellowship

For the next nine months, Kai will be writing for Understanding AI full time, supported by a fellowship from the Tarbell Center for AI Journalism. Tarbell is funded by groups like Open Philanthropy and the Future of Life Institute that believe AI poses an existential risk. However, Tarbell says that “our grantees and fellows maintain complete autonomy over their reporting.”

So I don’t plan to change anything about the way I’ve covered these topics. I don’t expect existential risk from AI to be a major focus of Kai’s reporting.