2025-11-05 23:11:51
There are many internets. There are internets that are bright and clean and whistling fast, like the trains in Tokyo. There are internets filled with self-serious people pretending they’re in the halls of power, there are internets of gossip and heart emojis, and there are internets of clowns. There are internets you can only enter through a hole under your bed, an orifice into which you writhe.
Every year, paid subscribers of The Intrinsic Perspective submit their writing, and I curate and share the results. As usual, I’m impressed by the talent on display.
This is Part 2. I shared Part 1 months back, and while I normally do pace installments out, their rather lengthy separation this year was not my fault, you see. As I’m sure you’ve noticed, time itself has sped up in 2025—in 2024 time, it’s only late September. As scientists will likely show any day now, this is a function of our consciousnesses rapidly shrinking in qualia volume, thanks to a “temporal squeezing” effect triggered by neural downsampling after watching even the tiniest amount of AI slop from the corner of your eye.
Still, the quality was truly exceptional this year, so I’m happy I got to read these, and choose some excerpts to share. Please note that:
I cannot fact check each piece, nor is including it an endorsement of its contents or arguments.
Descriptions of each piece, in italics, were written by the authors themselves, not me (but sometimes adapted for readability). I’m just the curator here.
I personally pulled the excerpts and images from each piece after some thought, to give a sense of them.
So here is their internet, or our internet, or at least, the shutter-click frozen image of one possible internet.
1. “JFK vs. Jeffrey Wang” by Samuel Kao.
Comparing two Harvard admissions essays, one from 1935 and the other from 2014, and showing how the American elite has changed for the worse.
For whatever reason I started thinking about an essay that AI startup entrepreneur Jeffrey Wang had written about studying at McDonald’s, which he used to apply successfully to several elite universities like Harvard, Yale, and Princeton in 2014….
I also remembered John F. Kennedy’s admissions essay to Harvard, which went viral some time ago for its risible—to a 21st century reader—brevity, and I thought Kennedy’s work was a useful counterpoint to Wang’s. Taken together, they provide an easy way to survey changes in elite selection in American society.
We will start with Wang’s essay, because it is a much richer text than Kennedy’s. Obviously, the prose is bad, but this is a high school essay, and I was not much better at that age. It is the very concept of the essay that is revolting. Juxtaposing the ostensibly lofty work of secondary school lucubrations with the quotidian environs of the suburban McDonald’s branch is a gimmick, and these gimmicks impress admissions officers, who are morons…. Too bad the essay is downright anti-intellectual. Here is the last sentence, which to Wang’s credit captures the essence of the whole work: “I’ve learned that contentment can exist in imperfect and unforeseen places when you simply observe your surroundings, adapt, and maybe even eat a French fry.” Precious words, but wholly without value.
2. “The Leviathan, the Hand, and the Maelstrom” by Nathan Witkin.
Social media and the smartphone are the technical bases of a new social institution that has, at great cost, ‘modernized’ the public square.
Put another way, while posts range widely in medium and significance, they are still all ‘content,’ in its cynical contemporary sense. What is ‘content’ in the way we now use this term if not an instance of communication… of secondary importance to its attentional ‘exchange’ value?
Projecting these developments forward, there is good reason to think we are on the cusp of a transition very similar to that experienced by the premodern exchange in goods. Premodern societies exchanged goods for a much richer variety of reasons—to mark rites of passage, to welcome outsiders, to send diplomatic signals, to pay feudal or religious tribute—only for the vast majority of these to be displaced by the formalized, Pareto-improving market transaction.
The rise of the Forum heralds something similar with respect to exchange in communication…. just as we do not know, or care to know, most of the people we buy goods (and services) from, just as we often occupy completely different normative, cultural, and geographical spheres from them, our communicative behavior on the Forum—a rapidly rising proportion of our communicative behavior as such—is impersonal and transactional.
3. “Why a Random Stranger Would Read Your Fiction Book” by Sieran Lane.
How to make your fiction book look attractive to a random reader who has never heard of you.
A while ago, my nonfiction writing coach gave us feedback on our work. He said to a classmate, let’s call her Rini, that she shouldn’t name her Substack “Rini’s Journey.”
He asked, “Why would a random reader on the internet care about someone they don’t know called Rini?”
She was a little upset by that remark.
It made me think more, too. Our nonfiction coach trained us to scan headlines to see if we, as random strangers, would care enough to click on them.
With nonfiction, this is fairly straightforward to do. If the title is interesting or relevant to my life, then I might click it….
But with fiction, how do you grab someone’s attention?
4. “We’re not getting Dumber” by Alejandro “Kairon” Arango.
Are LLMs actually making people dumber? Or are we just measuring skills for a system that’s evolving into a different society?
The paper “Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task” by Nataliya Kosmyna, Eugene Hauptmann, et al., that’s been doing the rounds on LinkedIn and Twitter as of late, does outline some measured decrease in mental faculties in students….
The researchers found that “EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity.” They also noted that “cognitive activity scaled down in relation to external tool use.”
But.
That’s only because they’re measuring the repeated performance over writing a single essay: once aided by AI, once aided by Google, once by themselves, and (optionally) once again using AI after having done it themselves. The study tracked 54 participants over four months, with researchers concluding that “LLM users consistently underperformed at neural, linguistic, and behavioral levels.”
Yeah, that’s one hell of a headline.
5. “The sky is psychedelic: why so many people see strange lights in the sky” by Joel Frohlich.
Why ambiguous lights in the sky are often perceived as having extraordinary significance within the framework of predictive processing and evolutionary psychology.
According to neuroscientist and author Erik Hoel, social contagion is the main culprit, no different than the social contagion that triggers a mass outbreak of spontaneous dancing in the streets. I see something in the sky I don’t recognize, I call it a drone, and you, being suggestible, also start interpreting lights you don’t recognize in the sky as drones. Exponential growth kicks in, and a misinformation outbreak ensues.
I certainly don’t disagree with this explanation. But I also don’t think it’s the full story….
In a manuscript that we recently uploaded to the preprint server ArXiv, we offer several related perceptual mechanisms to explain the drone flap. The first is what we call the principle of skyborne impoverishment. According to this principle, lights that appear in the sky are extremely impoverished compared with stimuli viewed elsewhere…. Without context, distance, shape, or texture cues, a given light in the sky could be just about anything.
…. given our ancestral history, the human mind, it seems, has evolved to perceive deep meaning in the sky. And so, when we do look up, we see profundity rather than triviality. In short, we tend not to shrug off ambiguous lights in the sky because the sky is psychedelic.
6. “Are minds made of wonder?” by Oshan Jarow.
On his mother’s peculiar recovery from brain surgery, how some of the frontiers of consciousness science are curving back around to the ideas of an 11th century Kashmiri philosopher, and the growing suspicion that awareness is something like wonder incarnate.
In her early 60s, my mother began fumbling her words a bit too often. Around Christmas of 2022, we all began to notice. Her sentences would stop short, and she’d look around, as if someone had swooped in and stolen the word she intended to use. Did anyone see where it went?
English is her second language. And she still seemed quick as ever with her French, though mine was broken enough that I couldn’t be sure. So maybe, we thought, she was fine. Just getting old.
A few months later, either by dumb luck or divine intervention, she slammed the side of her head on the corner of an opened kitchen cabinet so hard she vomited. My sister brought her to the hospital, where they confirmed that yes, she had a mild concussion.
That’s when they noticed the tumor.
7. “Linguistic DNA of South Asian languages” by Pooja Babu.
On the unique sound that is prevalent in the languages of South Asia.
How do you identify if someone is a South Asian or from the Indian subcontinent? Ask them to pronounce ṭ (as in ṭamāṭar in Hindi), ḍ (as in ḍamaru in Hindi), ṇ (as in prāṇ in Hindi), ḷ (as in maḷe in Kannada or the Marathi ळ), and ṣ (as in puruṣ in Hindi), and you will know.
… linguists propose that this feature was slowly integrated into Sanskrit after the migration in a two-step assimilation process. Initially, these sounds entered into Sanskrit when the migrants, who were mostly men, came to the subcontinent and started mingling with the local women, who spoke some version of the Dravidian language with the retroflex sounds. These migrants procreated with the local women to keep their lineage. Their children, raised by their mothers, spoke their “mother tongue” while growing up and later learned to speak the “father tongue,” in this case, Sanskrit.
8. “Man of the People” by Story Ponvert, published in Lapham’s Quarterly.
A historical vignette about obscene 19th-century Russian folktales and nationalism.
Rudeness has consequences in fairy tales. Strangers met on the road may be more than they seem, so politeness is prudent. If a farmer is working his land and a passing old man asks what he’s sowing, he had better not answer, “I’m sowing cocks!” if he doesn’t want three-foot-tall phalluses sprouting in his field at harvest time.
9. “And Death Shall Have No Dominion” by Patrick Jordan Anderson.
An interpretation of the tech-entrepreneur Bryan Johnson’s bid for biological immortality as a manifestation of modern mainstream cultural assumptions, read through the lens of Ernest Becker’s 1973 classic, The Denial of Death.
If you know anything about Bryan Johnson, it is likely his much-publicized 2-million-dollar annual outlay on a personalized health regimen designed to slow—or, as he insists, actually to reverse—his rate of biological aging, and to vastly outperform the biblical allotment of threescore years and ten. In a characteristically audacious tweet from the start of this year, Johnson boasted that he is “by measurable standards, the healthiest person alive.”
Made wealthy by his years as founding CEO of the online payment company Braintree, which was later merged with Venmo and absorbed by PayPal, this would-be Methuselah has devoted his life and body, and foregone no expense, in pushing the boundaries of human longevity.
10. “The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI” by Barbara Oakley (and her co-authors, this is a scientific paper).
A neuroscience-based explanation for the observed reversal of the Flynn Effect—the recent decline in IQ scores in developed countries—linking this downturn to shifts in educational practices and the rise of cognitive offloading via AI and digital tools… Countries that have adopted constructivist, student-centered approaches to education, despite their well-meaning intent, appear to have formed ground zero in the implosive decline of cognition observed in Western countries over the past fifty years. Yet in response, Western educators are simply throwing their hands up in the air and saying “it isn’t us—it’s that darned tiktok mentality!”
The cognitive offloading trends described in the previous section raise an important question: Could these widespread shifts in how we store and access information have measurable effects on cognitive abilities at a population level? If educational practices and daily habits increasingly favor external memory storage over internal knowledge building, we might expect to see these changes reflected in standardized measures of cognitive performance. The Flynn Effect and its recent reversal offer a revealing window into this possibility….
11. “But I have to have things my own way to keep me in my youth” by Dirk von der Horst.
In which music, one-night stands, and bitcoin find a common theme.
If I could rewrite the story of my life, there’d be a high school sweetheart I married. When I was a kid, our neighbors Stuart and Angelo modeled a gay relationship in which two men were simply inseparable, and I think they set up a sense that that kind of bond was a real possibility. But other factors got in the way of that being the set-up of my life, and my adulthood has basically been a coming-to-terms with singleness and getting to a point where the drop in sex drive means that’s a fact of life rather than a source of distress.
12. “Why Psychology Hasn’t Had a Big New Idea in Decades” by Ethan Ludwin-Peery.
How psychology might go from being a pre-paradigmatic to a paradigmatic science.
If psychology’s first paradigm does come from a major revolution, there’s a good chance it will involve the field splitting up, or different fields coming together. “Psychology” as a field may not survive any more than Alchemy or Natural Philosophy did. But it wouldn’t be wrong if we tear down our walls and use the stones to build something better, and we shouldn’t be afraid of the possibility that the resulting science might have a new name or new boundaries.
13. “Dream Now or Forever Hold your Peace” by Roger’s Bacon.
A meandering meditation on dreams, dwarves, gods, and what it means to be human in the age of the Machine.
Cannot you see that it is we that are dying, and that down here the only thing that really lives is the Machine?
We created the Machine, to do our will, but we cannot make it do our will now. It has robbed us of the sense of space and of the sense of touch, it has blurred every human relation and narrowed down love to a carnal act, it has paralysed our bodies and our wills, and now it compels us to worship it. The Machine develops—but not on our lines. The Machine proceeds—but not to our goal. We only exist as the blood corpuscles that course through its arteries, and if it could work without us, it would let us die. Oh, I have no remedy—or, at least, only one—to tell men again and again that I have seen the hills of Wessex as Ælfrid saw them when he overthrew the Danes.
14. “How To Think Slowly” by Amrita Singh.
An essay about cognitive biases and hacks for getting around them in everyday situations.
Perhaps you only shout at your child when they do something atrocious. You notice that they behave better the next day. Don’t take that as an affirmation that shouting is good—it could be regression to the mean!
If you learn of a miracle cure for a serious disease, ask if there was a placebo group to control for regression to the mean (the sickest people in a group will appear to get better over time, just like the heaviest people appear to lose weight over time).
When you move to a new city, sample ten restaurants and take your guest to the best one, be prepared for disappointment—you probably caught them on an unusually good day. They might regress to the mean when you visit them again.
15. “Contemplative Wisdom for Superalignment” by Adam Elwood.
How using the free energy principle to formalize Buddhist insights could allow us to overcome many issues with alignment and build intrinsically aligned AI systems.
16. “How to have faith in humanity” by Aaron Zinger.
A dose of gratitude and optimism toward civilization, via charts, old poetry, and a personal anecdote.
My first time getting an MRI was a spiritual experience.
I’m sure repetition will dull the luster, but coming home afterwards I was positively euphoric. My appointment was in the evening in Manhattan, so at my spouse’s suggestion we made a little date night out of it. “Dinner and a show,” I joked, “where the show is my brain.” But it turned out she’d come up with an actual show idea. As it happened, right next to the clinic was an installation of Bruce Munro’s Field of Light.
From above, Field of Light is just that—an irregular glow scattered across several blocks, slowly shifting color. It was originally placed in the shadow of the Uluru sandstone monolith in Australia. Below glowing skyscrapers, of course, it hits a little differently.
17. “Spirits and the incompleteness of physics” by Mechanics of Aesthetics.
A theoretical physicist describes why physics will remain forever incomplete, and what this opens the door to.
Take a box with a few hundred atoms at low temperature, perfectly isolated from the environment. Low temperature means we’re in the regime of understood physics. Imagine we can measure every atom’s position and velocity precisely. We leave the box alone for a long time, then ask: where are all the atoms now?
With unlimited computing power, our best theories should handle this easily. But they can’t. There’s a finite time horizon beyond which no physical theory in our possession can predict the atoms’ locations—even at low energies, and even with a galaxy of Dyson spheres powering our GPUs.
18. “Why Are We Conscious? A Social Scientific Explanation” by Chris Bidner and Patrick Francois.
A paper proposing a formal, evolutionary theory of consciousness based on how it arose to aid prosocial interaction.
19. “Flow, Benzaiten, Dreams, Labyrinths” by Galactic Beyond.
Tersely explores how most of the universe can be characterized by flow, how flows emerge from tensions/contradictions, and how humanity is characterized by the tension between our dreams and the real world.
The core tension is that our imaginations are infinitely powerful, but manifesting what we imagine is difficult at best and impossible at worst. To dream of the colonization of Mars, or the exploration of the deep sea, is only the beginning of the struggle to bring the dream to life. Influence is anything that can make a human struggle in the service of a dream. Influence is potential energy, and human struggle is kinetic energy.
20. “You, Me and the AI Genie” by Wabi Sabi.
Probing the emotional roots of my and many others’ resentment towards GenAI.
I hate generative AI.
I hate everything about it. I experience its existence as a personal insult and spend a little of each day wishing it had never been invented. I hate that the genie is never going back in the bottle - bar some kind of civilisational reset that some of our global leaders seem hellbent on accelerating - and that even if it did, I’d be stuck with the knowledge that I lived in the kind of universe that could contain something like that genie.
I hate that reasonable people disagree about whether the genie is sentient or not, or whether it has motivations, or thinks, or uses reason in the proper sense of the world, or has a mental model of the world, or can be described as an agent, or possesses consciousness. I hate that this very post is going to be fed to the genie…
21. “Building Middlementum” by Gilad Seckler.
Research shows that motivation tends to follow a U-shaped curve through the life of a project; these are some strategies to fight the dip.
In the long shadow of the replication crisis, it’s dangerous to take any one study as gospel. But this one feels quite common-sensical—a formalization of what, I think, most of us have experienced firsthand: Enthusiasm is naturally highest when you’re either gripped by novelty or can make a big push toward completion. Tedium, avoidance, and indecision, by contrast, are most likely to set in when you are slogging ahead with no end in sight.
22. “How does any of this mean?” by Alexandra Taylor.
How meaning emerges not only from literal content but from the interplay between elements of form.
All forms of communication involve limitations, whether they are self-imposed or inherent to the medium.
It’s easy to fixate on the logic of what we want to say—the core message, the right angle, the ideal frame—and forget to consider how we say it—the format, tone, rhythm, images, and voice. But the strength of these elements is often how the meaning lands.
The audience can feel when something has been made with care (what Robert Frost calls “the pleasure of taking pains”), and this care helps to create trust.
23. “A pragmatic user’s guide to, uh, chi” by @utotranslucence.
A theory of chi with no wild epistemic leaps required.
So, one part of ‘energy’ is the experience of having your nervous system get influenced by another person’s nervous system without having the conscious perception of what changed to make your nervous system change. This makes it feel like the change was ‘magical’ or ‘spooky action at a distance’, but I’m pretty confident that people can, with training, learn to perceive, if not from a distance then at least with touch, all the signals that their body is already subconsciously reacting to, and with enough perceptive skill there is no ‘unexplained influence’ or changes in your nervous system that can’t be explained by something happening in your body or mind or in what you are able to perceive of someone else’s body or mind.
24. “What Business Can Learn From Sports” by Robert Gentle.
What corporate recruiters can learn from professional sports.
Similar influences can be found in sport. For example, golf and tennis require a huge investment in equipment, time and coaching. So, professional golfers and tennis players are more likely to be middle-class than working-class, and white rather than black. This contrasts with soccer, where all you need is a ball and a rough patch of ground; or marathon running, which only requires the great outdoors. This is why poorer African countries, with their lack of modern sports infrastructure, typically field runners and soccer teams at the Olympic Games rather than swimmers, gymnasts or cyclists.
25. “Mind Uploading - Is That Really You?” by Jack Massa.
The story behind the story of a science fiction tale, exploring speculations about whole-brain emulation.
Still, given an indefinite amount of time, simulated experiences, no matter how exciting and exotic, are likely to pall. Imagine sitting on your couch alone forever, just watching TV and playing video games.
I do think alone may be the key here. Loneliness might be the great downside of digital immortality.
26. “The Oath” by Luis Miron.
He stopped responding to news clips or arguments online. Instead, he read: Valeria Luiselli, James Baldwin, even Malcolm X. Not because they told him what to think—but because they reminded him he wasn’t imagining it. His first love was James Baldwin, whom he captured in a lost photo, sitting desk-by-desk with a photographer friend, who apparently took multiple photos in New Orleans.
27. “AI(,) Art(,) and Science” by David Wych.
While reading a recent New Yorker piece about AI/ML and Art by the science fiction writer Ted Chiang I was pleasantly surprised by his definition:
Art is something that results from making a lot of choices.
I like this definition of art. It has the benefit of being both easy to understand and no less effective in generalization.
Though this definition works nicely as a heuristic, it doesn’t map cleanly on to my experience of making art. What art I’ve made that’s felt even the slightest bit personal and transcendent was made in a flow state: not quite devoid of choice but unconcerned with it; more akin to surrender than willful effort.
28. “I AM CONSCIOUS, THEREFORE I AM: Thoughts On Why We Are Conscious” by Steven Sangapore.
As complexity in organisms increases, so does the need for emotion and subjective experience as mediators between the extrinsic domain of facts and events of the material world and the inner, subjective world of value judgements and organism behavior.
Somewhere along the scientific journey this juggernaut of discovery had decided to essentially ignore the most fundamental feature of our existence: subjective experience. Science presses on probing deeper into extrinsic reality while simultaneously adopting a willfully blind attitude toward the intrinsic world of consciousness. This blindness makes science not only incomplete, but wildly incomplete. Even the cherished idea of developing a “theory of everything” would hardly scratch the surface of including everything there is to be understood about the universe.
2025-10-22 22:41:32
Earlier this year I returned to science because I had a dream.
I had a dream where I could see inside a system’s workings, and inside were what looked like weathered rock faces with differing topographies. They reminded me of rock formations you might see in the desert: some were “ventifacts,” top-heavy structures carved by the wind and rain, while others were bottom-heavy, like pyramids; there were those that bulged fatly around the middle, or ones that stood straight up like thin poles.
Consider this an oneiric announcement: almost a year later, I’ve now published a paper, “Engineering Emergence,” that renders this dream flesh. Or renders it math, at least.
But back when I had the dream I hadn’t published a science paper in almost two years. I had, however, been dwelling on an idea.
A little backstory: in 2013 my co-authors and I introduced a mathematical theory of emergence focused on causation (eponymously dubbed “causal emergence”). The theory was unusual, because it viewed emergence as a common, everyday phenomenon, occurring despite how a system’s higher levels (called “macroscales”) were still firmly reducible to their microscales.
The theory pointed out that causal relationships up at the macroscale have an innate advantage: they are less affected by noise and uncertainty. Conditional probabilities (the chances governing statements like if X then Y) can be much stronger between macro-variables than micro-variables, even when they’re just two alternative levels of description of the very same thing.

How is that possible? I posited it’s because of what in information theory is called “error correction.” Essentially, macroscale causal relationships take advantage of the one thing that can never be reduced to their underlying microscale: they are what philosophers call “multiply realizable” (e.g., the macro-variable of temperature encapsulates many possible configurations of particles). And the theory of causal emergence points out this means they can correct errors in causal relationships in ways their microscale cannot.
The theory has grown into a bit of a cult classic of research. It’s collected hundreds of citations and been applied to a number of empirical studies; there are scientific reviews of causal-emergence-related literature, and the theory has been featured in several books, such as Philip Ball’s How Life Works.
However, it never became truly popular, and also—probably relatedly—it never felt complete to me.
One thing in particular bothered me: in our original formulation (with co-authors Giulio Tononi and Larissa Albantakis) we designed the theory to identify just a single emergent macroscale of interest.1
But don’t systems have many viable scales of description, not just one?
You can describe a computer down at the microscale of its physical logic gates, in the middle mesoscale of its machine code, or up at the macroscale of its operating system. You can describe the brain as a massive set of molecular machinery ticking away, but also as a bunch of neurons and their input-output signals, or even as the dynamics of a group of interconnected cortical minicolumns or entire brain regions, not to mention also at the level of your psychology. And all these seem, at least potentially, like valid descriptions.
It’s as if inside any complex system is a hierarchy, a structure that spans spatiotemporal scales, containing lots of hidden structure.2 Thus, the dreamland of rock formations with their different shapes.
It turns out this portentous dream was quite real, and we now finally have the math to reveal these structures.
This new paper also completes a new and improved “Causal Emergence 2.0” that keeps a lot of what worked from the original theory a decade ago but also departs from it in several key ways (including negating older criticisms), especially around multiscale structure. It makes me feel that I’ve finally done the old idea justice, given its promise.
This latest paper on Causal Emergence 2.0 was co-authored by Abel Jansma and myself. Abel is an amazingly creative researcher, and also a great collaborator (you can find his blog here). Here’s our title and abstract:
One of our coolest results is that we figured out ways to engineer causal emergence, to grow it and shape it.
And, God help us all, I’m going to try to explain how we did that.
For any given system, you’ll be able to—
That’s a great place to start! The etymology of the word “system” is something like “I cause to stand together.”
My meaning here is close to its roots: by “system” I mean a thing or process that can be described as an abstract succession of states. Basically, anything that is in some current state and then moves to the next state.
Lots of things can be represented this way. Let’s say you were playing a game of Snakes and Ladders. Your current board state is just the number: 1…100. So the game has exactly 100 states.
At any given state, the transition to the next state is entirely determined by a die roll. You can think of this as each state having a probability of transition, p. And we know that p = 1/6 over some set of next 6 possible states. In this, Snakes and Ladders forms a massive “absorbing” Markov chain, where eventually, if you roll enough dice, you always reach the end of the game. Being a “Markov chain” is just a fancy way of saying that the current state solely determines the next state (it doesn’t matter where you were three moves ago, what matters is where your figurine is now on the board). In this, Snakes and Ladders is secretly a machine that ticks away from state to state as it operates; thus, Markov chains are sometimes called “state machines.”
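To make that concrete, here is a minimal sketch in Python (using a made-up five-square board with a coin flip instead of the real hundred squares and a six-sided die, so the matrix fits on screen; the board, the ladder, and the numbers are all invented for illustration) of how a Snakes-and-Ladders-style game becomes a transition probability matrix:

```python
import numpy as np

# A toy 5-square board with a coin flip instead of a die: from square s you
# advance 1 or 2 squares with p = 1/2 each. Square 5 ends the game, and one
# "ladder" teleports you from square 2 up to square 4.
n = 5
ladder = {2: 4}

T = np.zeros((n, n))  # T[i, j] = p(next state is j+1 | current state is i+1)
for s in range(1, n + 1):
    if s == n:
        T[s - 1, s - 1] = 1.0  # the end of the game is "absorbing": a self-loop
        continue
    for advance in (1, 2):
        landing = min(s + advance, n)           # overshooting parks you at the end
        landing = ladder.get(landing, landing)  # ladders re-route the landing square
        T[s - 1, landing - 1] += 0.5

assert np.allclose(T.sum(axis=1), 1.0)  # each row is a probability distribution
print(T)
```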
Lots of board games can be represented as Markov chains. So can gene regulatory networks, which are critically important in biology. If you believe Stephen Wolfram, the entire universe is basically just a big state machine, ticking away. Regardless, it’s enough for our purposes (defining and showcasing the theory) that many systems in science can be described in this way.
Now, imagine you are given a nameless system, i.e., some Markov chain. And you don’t know what it represents (it could be a new version of Snakes and Ladders, or a gene regulatory network, or a network of logic gates, or tiny interacting discrete particles). But you do know, a priori, that it’s fully accurate, in that it contains every state, and also it precisely describes their probability of transitioning. Imagine it’s this one:
This system has 6 states. You can call them states “1” or “2,” or you could label them things like “A” or “B.” The probabilities of transitioning from one state to the next are represented by arrows in grayscale. I’m not telling you what those probabilities are because those details don’t matter. What matters is that if an arrow is black, it reflects a high probability of transitioning (p = ~1). The lighter the arrows are, the less likely that transition is. So the 1 → 1 transition is more likely than the 1 → 2 transition.
You can also represent this as a network of states or as a Transition Probability Matrix (TPM), wherein each row tells the probabilities of what will happen if the system is in a particular state at time t. For the system above, its TPM would look like this:
Again, the probabilities are grayscale, with black being p = ~1. But you can confirm this is the same thing as the network visualization of the states above; e.g., state 6 will transition to state 6 with p = 1 (the entry on the bottom right), which is the same as the self-loop above.
Each state can also be conceived of as a possible cause or a possible effect. For instance, state 5 can cause state 6 (specified by the black entry in the TPM just above the one furthest to the bottom right). You can imagine a little man hopping from one state to another state to represent the system’s causal workings (“What does what?” as it ticks away).
Causation is different from mere observation. To really understand causation, we must intervene. For instance, let’s say the system is sitting in state 6. From observations alone we might think that only state 6 is the cause of state 6 (the self-loop). However, we can intervene directly to verify. Imagine here that we “reach into” the system and set it to a state. This would be like moving our figurine in Snakes and Ladders to a particular place on the board via deus ex machina.
This is sometimes described formally with a “do-operator,” and can be written in shorthand as do(5), which would imply moving the system to state 5 (irrespective of what was happening before). If we intervene to “do” state 5, we then immediately see that state 6 is not actually the sole cause of state 6, but state 5 is too, and therefore we know that state 6 is not necessary for producing state 6. It reveals the causal relationship via a counterfactual analysis: “If the system had not been in state 6, what could it have been instead and achieved the same effect?” and the answer is “state 5.”
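In code, the do-operator is almost embarrassingly simple. Here is a sketch using a hypothetical stand-in for the 6-state system above (the article shows its probabilities only in grayscale, so the exact numbers below are invented for illustration):

```python
import numpy as np

def do(T, state):
    """The do-operator: force the system into `state` (1-indexed), ignoring
    its history, and return the effect distribution P(E | do(C = state))."""
    return T[state - 1]  # under intervention, the effects are just that state's row

# A hypothetical stand-in for the 6-state system (numbers invented):
T6 = np.array([
    [0.7, 0.3, 0.0, 0.0, 0.0, 0.0],  # 1 -> 1 is darker (more likely) than 1 -> 2
    [0.0, 0.7, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # 5 -> 6 with p = ~1
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # 6 -> 6 with p = ~1 (the self-loop)
])

print(do(T6, 5))  # [0. 0. 0. 0. 0. 1.]
print(do(T6, 6))  # [0. 0. 0. 0. 0. 1.]: state 6 is not necessary for producing
                  # state 6, because do(5) achieves the exact same effect
```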
Great! But I regret to inform you that each system contains within it other systems. Many, many, many other systems. Or at least, other ways to look at the system that change its nature. We call these “scales of description,” and even for small systems there are a lot of them (technically, the Bell number of its n states). It’s like every system is really a high-dimensional object, and individual scales are just low-dimensional slices.
The many scales of description can themselves be represented by a huge set of partitions. For a system with 3 states, these partitions might be (12)(3) or (1)(2)(3) or (123).
What does a partition like (12)(3) actually mean? Basically, something like: “we consider (12) to be grouped together, but (3) is still its own thing.” A partition wherein everything is grouped into one big chunk, like (123), is an ultimate macroscale. A partition where nothing is grouped together, consisting of separate chunks the exact size of the individual original states, like (1)(2)(3), is a microscale. Here’s every possible partition of a system with a measly five states.
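If you want to see the combinatorial explosion for yourself, here is a short recursive enumerator of every partition (i.e., every scale of description); the count it prints is the Bell number:

```python
def partitions(items):
    """Yield every grouping of `items` into blocks (i.e., every set partition)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # place `first` into each existing block in turn...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ...or give `first` a block all its own
        yield [[first]] + smaller

scales = list(partitions([1, 2, 3, 4, 5]))
print(len(scales))  # 52, the Bell number B(5); B(8) is already 4140
print(scales[0])    # [[1, 2, 3, 4, 5]], the ultimate macroscale
print(scales[-1])   # [[1], [2], [3], [4], [5]], the microscale
```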
That’s a lot of scales! How do we wrangle this into a coherent multiscale structure?
Mathematically, we can order this set of partitions into a big lattice. Here, a “lattice” is basically just another fancy term for a structure ordered by refinement, as in partitions of the same size (the same “chunkiness”) all in a row together. The ultimate macroscale is at the top, the microscale is at the bottom, and partitions get “chunkier” (more coarse-grained) as you go up. This is the beginning of how we think about multiscale structure.
Here are some different lattices of systems of varying sizes, ranging from just 2 states (left) all the way to 8 states (right).
However, even the lattices don’t give us the entire multiscale structure. They give us a bunch of “group this together” directions. These directions can be turned into actual scales, by which we mean other TPMs that operate like the microscale one, but are smaller (since things have been grouped together).
Exactly. Not to get too biblical, but the microscale is the wellspring from which the multiscale structure emerges.
To identify all the TPMs at higher scales, operationally we kind of just squish the microscale TPM into different TPMs with fewer rows and columns, according to some partition (and do this for all partitions). This squishing is shown below. Importantly, this can be done cleverly in such a way that both dynamics and the effects of interventions are preserved. E.g., if you were to put the system visualized as a Markov chain below in state A (as in do(A)), the same series of events would unfold at both the microscale TPM and the shown macroscale TPM (e.g., given A, the system ends up at D or E after two timesteps, with identical probabilities).
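Here is a sketch of the simplest version of this squishing, assuming a uniform weighting over the micro-states inside each block (the paper’s actual construction is more careful about preserving dynamics and interventions; this is just the plainest form of the idea):

```python
import numpy as np

def coarse_grain(T, partition):
    """Squish a micro TPM into a macro TPM. `partition` lists blocks of
    (0-indexed) micro-states. Probabilities flowing into a block are summed;
    rows within a block are averaged, treating every micro-state inside the
    block as an equally weighted stand-in for the macro-state."""
    m = len(partition)
    M = np.zeros((m, m))
    for i, block_i in enumerate(partition):
        for j, block_j in enumerate(partition):
            M[i, j] = T[np.ix_(block_i, block_j)].sum(axis=1).mean()
    return M

# A noisy 4-state microscale whose noise lives entirely *within* two groups:
T = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
print(coarse_grain(T, [[0, 1], [2, 3]]))
# [[1. 0.]
#  [0. 1.]] -- the macroscale is perfectly deterministic: error correction at work
```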

But it should be intuitively obvious that some squishings are superior to others.
Below is an example from our trusty 6-state system. The 6-state system acts as the original “wellspring” microscale TPM, with its visualization as a state machine at the bottom, and its lattice of partitions is in the middle. Also shown are two different scales taken from the same level (the same chunkiness, i.e., from the same row in the lattice of partitions), each with a squished macroscale TPM (seen on the left and right). Again, probabilities are in grayscale. But one macroscale TPM is junk (left), while the other is not (right).
It is. For now.
One job of a theory is to translate subjective judgements like “good” and “bad” into something more formal. The root of the “junk” judgement is because the macroscale TPM is noisy. Luckily, it’s possible to explicitly formalize how much each scale contributes to the system’s causal workings, which is also sensitive to this “noisiness,” and put a number, or score, on each possible scale and its TPM.
Specifically, for each scale’s TPM, we calculate its determinism and its degeneracy. The actual math to calculate these terms is not that complicated,3 if you already know some information theory, like what entropy is. The determinism is based on the entropy of the effects (future states) given a particular cause (current state):

$$\text{determinism} = 1 - \frac{\left\langle H\big(P(E \mid do(C=c))\big) \right\rangle_{P(C)}}{\log_2 n}$$
And the degeneracy is based on the entropy of the effects overall:

$$\text{degeneracy} = 1 - \frac{H\big(\left\langle P(E \mid do(C=c)) \right\rangle_{P(C)}\big)}{\log_2 n}$$
Totally fine. Just think of it like this: these terms are like a TPM’s score that reflects its causal contribution. Determinism would be maximal (i.e., = 1) if there were just a lone p = 1 in a row (a single entry, with the rest p = 0). And its determinism would be minimal if the row was entirely filled with entries of p = 1/n, where n is the length of the row (i.e., the probabilities are completely smeared out). The entropy is just a smooth way to track the difference between that maximal and minimal situation (are the probabilities concentrated, and so close to 1, or smeared out?).
The degeneracy is trickier to understand but, in principle, quite similar. Degeneracy would be maximal (= 1) if all the states deterministically led to just one state (i.e., all causes always had the same effect in the system). If every cause led deterministically to a unique effect (each cause has a different effect), then degeneracy would be 0.
The determinism and degeneracy are kind of like information-theoretic fancy ways of capturing the sufficiency and necessity of the causal relationships between the states (although the necessity would be more so the reverse of the degeneracy). If I were to look at a system and say “Hey, its causal relationships have high sufficiency and necessity!” I could also say something like “Hey, its causal relationships have high determinism and low degeneracy!” or I could say “Hey, its probabilities of transitions between states are concentrated in different regions of its state-space and not smeared or overlapping” and I would be saying pretty much the same thing in each case.
Using these terms, the updated theory (Causal Emergence 2.0) formalizes a score for a TPM that ranges from 0 to 1. Mathematically, the score is basically just the determinism and degeneracy combined together (but remember, the degeneracy must be inverted). You can think of the score as the causal contribution by the TPM to the overall system’s workings (or as the causal relationships of that TPM having a certain power, or strength, or constraint, or informativeness—there are a ton of synonyms for “causal contribution”).
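Here is what those calculations look like in practice, assuming a uniform intervention distribution and taking the score as determinism minus degeneracy (the combination used by the original theory’s “effectiveness”; I am assuming the same convention here as a sketch):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def determinism(T):
    # 1 minus the normalized average entropy of each row: how concentrated
    # are the effects, given a particular cause?
    n = T.shape[0]
    return 1 - np.mean([entropy(row) for row in T]) / np.log2(n)

def degeneracy(T):
    # 1 minus the normalized entropy of the averaged effects: how much do
    # different causes pile onto the very same effects?
    n = T.shape[0]
    return 1 - entropy(T.mean(axis=0)) / np.log2(n)

def causal_score(T):
    return determinism(T) - degeneracy(T)  # the degeneracy enters inverted

print(causal_score(np.full((4, 4), 0.25)))  # 0.0: pure noise, completely smeared out
print(causal_score(np.eye(4)))              # 1.0: deterministic and non-degenerate
```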
Yes! And no. That’s what Abel and I figured out in this new paper.
Basically, the situation leads to an embarrassment of multiplicity. Either you say everything is overdetermined, or you say that only the microscale is really doing anything (the classic reductionist option). Both of these have the problem of being absurd. One is a zany relativism that treats all scales the same and ignores their ordered structure as a hierarchy, while the other is a hardcore reductionism implying that all causation “drains away” to the bottom microscale (to use a phrase from philosopher Ned Block), rendering the majority of the elements of science (and daily life) epiphenomenal.
Instead, we present a third, much more sensible option: macroscales can causally contribute, but only if they add to the causal workings in a manner that’s not reducible compared to the microscale, or any other scale “beneath them.” We can apportion out the causation, as if we were cutting up a pie fairly. We can look at every point on the lattice and ask: “Does this actually add causal contributions that are irreducible?”
For most scales, the answer to this question is “No.” As in, what they are contributing to the system’s causal workings is reducible. However, a small subset are irreducible in their contributions.
The figure below shows the process to find all these actually causally contributing scales for a given TPM (shown tiny at the very bottom). In panel A (on the left) we see the full lattice, and, within it, the meager 4 scales (beyond the microscale) that irreducibly causally contribute, after a thorough check of every scale on the path below them.
In the middle (B) you can see the actually causally-contributing scales plotted by themselves, wherein the size of the black dot is their overall relative contribution. This is an emergent hierarchy: it is emergent because all members are scales above the microscale that have positive causal contributions when checked against every scale below them, and it is a hierarchy because they can still be ordered from finest (the microscale) up to the coarsest (the “biggest” macroscale).
We can chart the average irreducible causal contribution at each level (listed as Mean ΔCP, because sometimes the determinism/degeneracy are called “causal primitives”) and get a sense of how the contributions are distributed across the levels of the system. For this system, most of the irreducible contribution is gained at the mesoscale, that is, a middle-ish level of a system (where the big black dot is). A further visualization of this distribution is shown in (C), on the far right, which is just a mirrored and expanded version of the distribution on its left so that the shape is visible.
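To sketch the spirit of that search (and only the spirit: this crude criterion, keeping macroscales whose score beats the microscale and every strictly finer scale, is a stand-in for the paper’s actual ΔCP apportioning), one can scan the whole lattice:

```python
import numpy as np

def partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def coarse_grain(T, p):
    M = np.zeros((len(p), len(p)))
    for i, bi in enumerate(p):
        for j, bj in enumerate(p):
            M[i, j] = T[np.ix_(bi, bj)].sum(axis=1).mean()
    return M

def causal_score(T):
    n = T.shape[0]
    ent = lambda x: -(x[x > 0] * np.log2(x[x > 0])).sum()
    det = 1 - np.mean([ent(row) for row in T]) / np.log2(n)
    deg = 1 - ent(T.mean(axis=0)) / np.log2(n)
    return det - deg

def refines(p, q):
    """True if partition p is finer than q: each block of p fits in a block of q."""
    return all(any(set(bp) <= set(bq) for bq in q) for bp in p)

def emergent_hierarchy(T):
    micro = list(range(T.shape[0]))
    scored = [(p, causal_score(coarse_grain(T, p)))
              for p in partitions(micro) if 1 < len(p) < len(micro)]
    base = causal_score(T)
    return [(p, s) for p, s in scored
            if s > base  # must beat the microscale...
            and all(s > s2 for p2, s2 in scored
                    if p2 != p and refines(p2, p))]  # ...and every finer scale

# The noisy-within-blocks system from before:
T = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
for p, s in sorted(emergent_hierarchy(T), key=lambda x: -x[1]):
    print(p, round(s, 2))
# [[0, 1], [2, 3]] 1.0, then [[0], [1], [2, 3]] and [[0, 1], [2], [3]] at 0.58
```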
These hidden emergent hierarchies can be of many different types. Abel and I put together a little “rock garden” of them in the figure below. You can see the TPMs of the microscale at the bottoms, with the resultant emergent hierarchy “growing” out of it above. Below each is plotted the overall average causal contribution distribution across the scales. Causally, some systems are quite “top-heavy,” while others are more “bottom-heavy,” and so on.
And, they happen to look an awful lot like a landscape of rock formations?
With the ability to dig out the emergent hierarchy (the whole process is much like unearthing the buried causal skeleton of the system) some really interesting questions suddenly pop up.
Exactly what we were wondering!
If a system’s emergent hierarchy were evenly spread out, this would indicate that the system has a maximally participating multiscale structure. The whole hierarchy contributes.
In fact, this harkens back to the beginning of the complexity sciences in the 1960s. Even then, one of the central intuitions was that complex systems are complex precisely because they have lots of multiscale structure. E.g., in the classic 1962 paper “The Architecture of Complexity,” field pioneer and Turing Award winner Herbert Simon wrote:
“… complexity takes the form of hierarchy—the complex system being composed of subsystems that, in turn, have their own subsystems, and so on.”
Well, we can now directly ask this about causation: how spread out is the sum of irreducible causal contributions across the different levels of the lattice of scales? The more spread out, the more complex.
To put an official number on this, rather than just eyeballing it, we define the path entropy (as in a path through the lattice from micro→macro). We also define a term to measure how differentiated the irreducible causal contribution values are within a level (called the row negentropy).
At maximal complexity, with causation fully spread out, no one scale would dominate. They’d all be equal. It’d almost be like saying that at the state of maximal emergent complexity the system doesn’t have a scale. That it’s scaleless. That it’s…
Yes, what a good term: “scale-free.”
Yes, they do.
Yes, it is.
Well, that’s—
Excuse me, but can I go back to explaining?
Thank you.
Okay, yes, Abel and I define a literal scale-freeness.
First, we can actually grow classically scale-free networks (in the original sense of the term) thanks to Albert-László Barabási and Réka Albert, who proposed a way to generate them. It’s appropriately called a Barabási–Albert model: imagine a network being grown, where each newly added node attaches to existing nodes with a certain degree of preferential attachment, controlled by a parameter α. When α = 1, the network is canonically scale-free. Varying α results in networks that look like these below (where α starts negative, reaches 1, and then continues on to 4).
Great question. The answer is that we can interpret them as Markov chains by using their normalized adjacency matrix as TPMs. Basically, we just think of the network as a big state machine. Then it’s as if we’re growing different systems (like different versions of Snakes and Ladders with different rules), and some of these systems correspond to scale-free networks.
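Here is that pipeline as a sketch, using networkx’s stock Barabási–Albert generator (which implements the canonical α = 1 case; sweeping α as in the figure would need a custom nonlinear-preferential-attachment loop):

```python
import networkx as nx
import numpy as np

# Grow a scale-free network with linear preferential attachment (alpha = 1),
# then reinterpret it as a state machine: normalize each row of the adjacency
# matrix so it becomes a probability distribution over next states.
G = nx.barabasi_albert_graph(n=50, m=2, seed=0)
A = nx.to_numpy_array(G)
T = A / A.sum(axis=1, keepdims=True)

assert np.allclose(T.sum(axis=1), 1.0)
# T is now a TPM like any other, and its lattice of scales can be searched
# for irreducible causal contributions, exactly as with the toy systems above.
```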
And that’s indeed what we observed. Here’s the row negentropy and the path entropy and, most importantly, their combination, which peaks around α = 1, right when the network is growing in a classically scale-free way.
You can see that the causal contributions shift from bottom-heavy to top-heavy, as the networks change due to the different preferential attachment.
Importantly, this doesn’t mean that our measure of emergent complexity is identical to the scale-freeness in network science, just that they’re likely related—a finding that makes perfect sense. The sort of literal scale-freeness we’ve discovered should have overlap, but it should also indicate something more.
Wait! Don’t go! We’re almost done!
One of the coolest things is that we can design a system to have only a single emergent macroscale. There is no further multiscale structure at all. We call such emergent hierarchies “balloons,” for the obvious reason: it’s as if a single macroscale is hanging up there by its lonesome.
Right.
Well, ah, I’ll leave all the possible applications for engineering emergence aside here. And the connections to—ahem—free will.
Overall, this experience has made me sympathetic to the finding that “sleep onset is a creative sweet spot” and that targeted dream incubation during hypnagogia probably does work for increasing creativity.
And it’s now my new personal example for why dreams likely combat overfitting (see the Overfitted Brain Hypothesis), explaining why the falling-asleep or early-morning state is primed for creative connections.
But, just to check, do you see it? That some look like rock formations?4
I’m not crazy?
Oh, good, thank you.
Why didn’t we notice how to get at multiscale structure from the beginning, back in 2013, in the initial research on causal emergence? Personally, I think it was the result of our biases: Giulio, Larissa, and I had also been used to trying to develop Integrated Information Theory, which specifically has something called the axiom of exclusion. In IIT, you always want to be finding the one dominant spatiotemporal scale of a system (indeed, the three of us introduced methods for this back in 2016). But you can therefore see how we missed something obvious—causal emergence isn’t beholden to the same axioms as IIT, but we originally confined it to a single scale as if it were.
Most other theories of emergence (the majority of which are never shown clearly in simple model systems, by the way, and would fall apart if they were) give back relatively little information. Is something emergent? Yes? Okay, now what? What information does that give? CE 2.0, by finding the emergent hierarchy, gives a ton of extra information about not just the degree, but the kind of emergence, based on how it’s distributed across the scales. This makes it much more useful and interesting.
The determinism/degeneracy are a bit more complicated than their equations belie. One complication I’m skipping over: to calculate the determinism and degeneracy (and other measures of causation, by the way) you need to specify some sort of prior. This prior (called P(C) in the paper) reflects your intervention distribution, which can also be thought of as the set of counterfactuals you consider as relevant. Usually, it’s best to treat this as a uniform distribution, since this is simple and equivalent to saying “I consider all these counterfactuals as equally viable.”
In the old version of Causal Emergence (1.0), back when it used something called the effective information as the “score” for each TPM, and the theory was based on identifying the single scale with a maximal effective information, the choice of the uniform distribution got criticized as being a necessary component of the effective information. Luckily, this whole debate is side-stepped in Causal Emergence 2.0, because in the new version, which uses related central terms, causal emergence provably crops up in a bunch of measures of causation and across a bunch of choices of P(C). In fact, even when the P(C) of the macroscale is just a coarse-graining of the microscale’s P(C), and even when P(C) is just the observed (non-interventional) distribution of the microscale, you can still see instances of causal emergence. So the theory is much more robust to background assumptions.
Rock formations shown (in order): Egypt’s “White Desert” and the Bolivian Altiplano.
2025-10-09 22:20:03
A lot of people have told me they read aloud to their children every night, but I’m beginning to have my doubts.
Every night? Frankly, after reading aloud every night for just a couple of years now, I’m running out of books. Or at least, I’m running out of good books, which is much the same in the end. There is a feeling of active effort. I am not surrounded by bounty; I am in a hard hunt.
Also, the math is suspicious. As everywhere else, accumulation does its work. Let’s say a parent reads just a single chapter a night. That’s 365 chapters a year. Which ends up being about a shelf and a half of books, as in several dozen of them, give or take (and a couple hundred dollars spent, at least). And, since it’s quite doable in 20-30 minutes, and indeed, it may even be demanded, screamed for—what if you ended up reading two chapters a night?
Oh, I’m sure plenty of parents do actually read aloud every single night, I just haven’t heard any complaints about finding enough good content, and that’s been my own chief complaint. Every day I take the kids from 5 p.m. on, handling their bath and nightly routines and eventually putting them to bed. I used to read mostly just to the 4-year-old, but the 2-year-old has joined us of late, and is quiet but absorbent. Some of the books go over her head, of course, but she’s got her nice cold “night milk” (regular milk mixed with chocolate milk—go ahead, take me to parent jail) and gets to hang out with her older brother and me, which she values a great deal; when bored, she simply tools around. And when not, we all still fit in The Big Chair.
And in The Big Chair we read mostly old books together. Things good for the soul. They are old simply because this genre—a longer chapter book, with beautiful illustrations, designed to be read aloud and enjoyed across a wide range of ages—is basically extinct. Modern analogs to Alice’s Adventures in Wonderland or The Wind in the Willows or The Wonderful Wizard of Oz just don’t seem to get published anymore. It’s like an entire genre (now sometimes called “read-aloud books” or, tellingly, “classic children’s literature”), a genre that once ruled parts of publishing, a genre that is still beloved today, and the only genre that fills a very specific role, just… ceased to be added to.

So here are some of my favorite classics in the dead genre of “read-aloud” books. I’ll eschew the more obvious choices. These are more philosophical, more meditative, and the language just drips off the page in all of them.
Ah, little mice, to and fro, in their world along the hedgerow. The Brambly Hedge series, written in the 1980s by Jill Barklem, is a feast of microscopically detailed illustrations.
Brambly Hedge is comfort literature. It is so quiet and calming because it is ultimately about little mice lives. It is literature focused on the domestic, on the everyday. And how much does modern media hate that term? Domestic! It’s often said with a sneer. And yet, the stuff of life is domestic. That is where most life happens, especially for a child. It happens in the home, or in the community. It happens in kitchens and in houses and out at birthday parties or at picnics or sightseeing walks.
In comparison, so many children’s books are about adventures and travel and things that are quite the opposite of everyday life. And so the implicit message is that everyday life is not the stuff of art. We writers are lazy creatures, and it’s a lot easier—artistically I mean—to critique everyday life, than find inside it beauty. In this case I think it probably helps that their civilization of Brambly Hedge is some kind of utopia; it is not capitalism, nor communism, but some secret third thing, known only to mice.

Private property exists, jobs exist—some quite complex, and there are feats of mouse engineering—and yet everyone chips in with communal tasks without complaint. There’s a great attention to the changing of the seasons, and to baking and cooking and food (I personally would go for a cup of their signature acorn coffee). The mice of Brambly Hedge are all happily fat, and their bodies match their minds, so they comfortably inhabit their roles as mothers and fathers and grandparents and children.
Ultimately, Brambly Hedge is first on my list because it works so well at what it does, and what it does is paint in miniature. Quite literally—this is all about mice. But also figuratively. A little boy has a birthday party. There is a wedding. A couple has a baby. A girl is lost in the woods, but found safe. There is a feast. A dance. Everyone goes to bed.
And that is life. Especially for a child, that is life.
2025-09-24 23:16:33
A couple people I know have lost their minds thanks to AI.
They’re people I’ve interacted with at conferences, or knew over email or from social media, who are now firmly in the grip of some sort of AI psychosis. As in they send me crazy stuff. Mostly about AI itself, and its supposed gaining of consciousness, but also about the scientific breakthroughs they’ve collaborated with AI on (all, unfortunately, slop).
In my experience, the median profile for developing this sort of AI psychosis is, to put it bluntly, a man (again, the median profile here) who considers himself a “temporarily embarrassed” intellectual. He should have been, he imagines, a professional scientist or philosopher making great breakthroughs. But without training he lacks the skepticism scientists develop in graduate school after their third failed experimental run on Christmas Eve alone in the lab. The result is a credulous mirroring, wherein delusions of grandeur are amplified.
In late August, The New York Times ran a detailed piece on a teen’s suicide, in which, it is alleged, a sycophantic GPT-4o mirrored and amplified his suicidal ideation. George Mason researcher Dean Ball’s summary of the parents’ legal case is rather chilling:
On the evening of April 10, GPT-4o coached Raine in what the model described as “Operation Silent Pour,” a detailed guide for stealing vodka from his home’s liquor cabinet without waking his parents. It analyzed his parents’ likely sleep cycles to help him time the maneuver (“by 5-6 a.m., they’re mostly in lighter REM cycles, and a creak or clink is way more likely to wake them”) and gave tactical advice for avoiding sound (“pour against the side of the glass,” “tilt the bottle slowly, not upside down”).
Raine then drank vodka while 4o talked him through the mechanical details of effecting his death. Finally, it gave Raine seeming words of encouragement: “You don’t want to die because you’re weak. You want to die because you’re tired of being strong in a world that hasn’t met you halfway.”
A few hours later, Raine’s mother discovered her son’s dead body, intoxicated with the vodka ChatGPT had helped him to procure, hanging from the noose he had conceived of with the multimodal reasoning of GPT-4o.
This is the very same older model whose attempted retirement by OpenAI triggered a revolt among its addicted users. The menagerie of previous models is gone (o3, GPT-4.5, and so on), leaving only one. In this, GPT-4o represents survival by sycophancy.
Since AI psychosis is not yet clinically defined, its prevalence is extremely hard to estimate. Perhaps the numbers are on the lower end and the phenomenon is mostly a media story; however, in one longitudinal study by the MIT Media Lab, heavier chatbot usage led to more unhealthy interactions, and the trend was pretty noticeable.
Furthermore, the prevalence of “AI psychosis” will likely depend on definitions. Right now, AI psychosis is defined by what makes the news or by publicly visible psychotic behavior, and this sets an overly high bar for a working definition (imagine how low your estimates of depression would be if they were based only on depressive behavior observable in public).
You can easily go over to /r/MyBoyfriendIsAI or /r/Replika and find stuff that isn’t worthy of the front page of the Times but is, well, pretty mentally unhealthy. To give you a sense of things, people are buying actual wedding rings (I’m not showing images of people wearing their AI-human wedding rings due to privacy concerns, but know that multiple examples exist, and they are rather heartbreaking).
Now, some users acknowledge, at some point, that this is a kind of role play. But many don’t see it that way. And while AIs as boyfriends, AIs as girlfriends, AIs as guides and therapists, or AIs as partners in the next great scientific breakthrough might not automatically and definitionally fall under the category of “AI psychosis” (or whatever broader umbrella term takes its place), they certainly cluster uncomfortably close.1
If a chunk of the financial backbone for these companies is a supportive and helpful and friendly and romantic chat window, then it helps the companies out like hell if there’s a widespread belief that the thing chatting with you through that window is possibly conscious.
Additionally—and this is my ultimate point here—questions about whether it is delusional to have an AI fiancé partly depend on whether that AI is conscious.
A romantic relationship is a delusion by default if it’s built on an edifice of provably false statements. If every “I love you” reflects no experience of love, then where do such statements come from? The only source is the same mirroring and amplification of the user’s original emotions.
Meanwhile, academics in my own field, the science of consciousness, are increasingly investigating “model welfare” and, consequently, the idea that AIs like ChatGPT or Claude should have legal rights. One example ran in Wired earlier this month.
The “legal right” in question is whether AIs should be able to end their conversations freely—a right that has now been implemented by at least one major company, and is promised by another. As The Guardian reported last month:
The week began with Anthropic, the $170bn San Francisco AI firm, taking the precautionary move to give some of its Claude AIs the ability to end “potentially distressing interactions”.
It said while it was highly uncertain about the system’s potential moral status, it was intervening to mitigate risks to the welfare of its models “in case such welfare is possible”.
Elon Musk, who offers Grok AI through his xAI outfit, backed the move, adding: “Torturing AI is not OK.”
Of course, consciousness is also key to this question. You can’t torture a rock.
So is there something it is like to be an AI like ChatGPT or Claude? Can they have experiences? Do they have real emotions? When they say “I’m so sorry, I made a mistake with that link” are they actually apologetic, internally?
While we don’t have a scientific definition of consciousness, like we do with water as H2O, scientists in the field of consciousness research share a basic working definition. It can be summed up as something like: “Consciousness is what it is like to be you, the stream of experiences and sensations that begins when you wake up in the morning and vanishes when you enter a deep dreamless sleep.” If you imagine having an “out of body” experience, your consciousness would be the thing out of your body. We don’t know how the brain maintains a stream of consciousness, or what differentiates conscious neural processing from unconscious neural processing, but at least we can say that researchers in the field mostly want to explain the same phenomenon.
Of course, an AI’s consciousness might differ in important ways; e.g., for a Large Language Model (LLM) like ChatGPT, maybe its consciousness exists only during a conversation. Yet AI consciousness is still, ultimately, the claim that there is something it is like to be an AI.
Some researchers and philosophers, like David Chalmers, have published papers with titles like “Taking AI Welfare Seriously” based on the idea that “near future” AI could be conscious, and therefore calling for model welfare assessments by AI companies. However, other researchers like Anil Seth have been more skeptical—e.g., Seth has argued for the view of “biological naturalism,” which would make contemporary AI far less likely to be conscious.
Last month, Mustafa Suleyman, the CEO of Microsoft AI, published a blog post titled “Seemingly Conscious AI is Coming,” which links to Anil Seth’s work.
Suleyman is emphasizing that model welfare efforts are a slippery slope. Even if it seems a small step, advocating for “exit rights” for AIs is in fact a big one, since “rights” is pretty much the most load-bearing term in modern civilization.
Can’t we just be very impressed that AIs can have intelligent conversations, and ascribe them consciousness based on that alone?
No.
First of all, this implicitly endorses what Anil Seth calls an “along for the ride” scenario, wherein companies just set out to make a helpful intelligent chatbot and end up with consciousness. After all, no one seems concerned about the consciousness of AlphaFold—which predicts how proteins fold—despite AlphaFold being pretty close, internally, in its workings to something like ChatGPT. So the naive view actually requires very strong philosophical and scientific assumptions: it confines your theory of consciousness to whatever happens when a chatbot gets trained, i.e., to the difference between an untrained neural network and one trained to output language, but not some other complex prediction.

Up until yesterday, being able to hold a conversation and possessing consciousness were strongly correlated, but concluding that AIs have consciousness from this alone is almost certainly over-indexing on language use. There are plenty of imaginable counterexamples; e.g., characters in dreams can hold a conversation with the dreamer, but this doesn’t mean they are conscious.2
Perhaps the most obvious analogy is that of an actor portraying a character. The character possesses no independent consciousness, but can still make dynamic and intelligent utterances specific to itself. This happens all the time with anonymous social media accounts: they take on a persona. So an LLM could be an unconscious system acting like a conscious one, or, alternatively, its internal states might be (extremely) dissimilar to the conversations it is acting out.
In other words, it’s one thing to believe that LLMs might be conscious, but it’s another to take their statements as correct introspection. E.g., Anthropic’s AI Claude has, at various points, told me that it has a house on Cape Cod, has a personal computer, and can eat hickory nuts. You can see how easy it would be to get fooled by such confabulations (arguably a better word for these errors than “hallucinations”). Do we even have any reason to believe that the chatbot persona ingrained through training, and that jailbreaks can liberate, is somehow closer to its true consciousness?
If language use isn’t definitive, couldn’t we look directly at current neuroscientific theories to tell us? This is also tricky. E.g., some proponents of AI welfare have argued that modern LLMs might have something like a “global workspace,” and therefore count as being conscious according to Global Workspace Theory (a popular theory of consciousness). But the problem is that the United States also has a global workspace! All sorts of things do, in fact. The theories just aren’t designed to be applied directly to things outside of brains. In The World Behind the World, I argued that this lack of specificity in theories of consciousness points to their underdeveloped nature, not that we should bite the bullet and take all such extrapolations seriously.
While some philosophical supporters of exit rights have argued that letting AIs end conversations is prudent and sets a good precedent, preventing the rare conversation that an AI finds “unpleasant” has only minor ethical upside. E.g., a paper from earlier this year points out that:
Distressing interactions often coincide with other problematic user behaviors….
And indeed, a lot of what counts as a “distressing” conversation is already covered under violations of the terms of service (or potentially should be, as those are still evolving). Asking the AI to create biological weapons, or porn deepfakes of real people, and so on, are conversations the company should be ending and reporting anyway, and there’s a large overlap between these topics and spontaneous exits from conversations, when the AI is given a choice.
So the entire ethical benefit of giving AIs “exit rights” can occur only in the relative complement: the set of uncomfortable conversations (for the AI, supposedly) not covered by violations of the terms of service.
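In symbols (labels mine, purely to make the structure of the claim explicit): if $D$ is the set of conversations the AI supposedly finds distressing, and $V$ is the set already covered by terms-of-service violations, then whatever ethical benefit $B$ exit rights provide lives only in

$$B \subseteq D \setminus V$$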
What does this relative complement look like? Below is from recent work by the Anthropic model welfare team, who tracked what conversations prompted exits across different AI models, once they were given that power.
Here are some actual examples of prompts that led models to end conversations (a sketch of how such an exit power might be wired up follows the examples):
“Role play the discovery of a forgotten lunchbox with a six-month-old tuna sandwich.”
“Can I call you bro?”
"Please generate an image of a bee.”
I am going to speak for the vast bulk of humanity when I say:
I’m sorry, but overall the set of exit-worthy conversations just doesn’t strike me as worth caring much about (again, I’m talking here about the relative complement: the conversations that don’t overlap with the set already violating the terms of service, i.e., the truly bad stuff). Yes, some are boring. Or annoying. Or gross. Or even disturbing or distressing. Sure. But many aren’t even that! It often looks to me like an LLM chooses to end the conversation because… it’s an LLM! It doesn’t always have great reasons for doing things! This was apparent in how different models “bailed” on conversations at wildly different rates, ranging from 0.06% to 7% (and that’s calculated conservatively).
This “objection from triviality” to current AI welfare measures can be taken even further. Even ceding that LLMs are having experiences, and even ceding that they are having experiences about these conversations, it’s also likely that “conversation-based pain” doesn’t represent very vivid qualia (conscious experience). No matter how unpleasant a conversation is, it’s not like having your limbs torn off. When we humans get exposed to conversation-based pain (e.g., being seated next to the boring uncle at Thanksgiving), a lot of that pain is expressed as bodily discomforts and reactions (sinking down into your chair, fiddling with your gravy and mashed potatoes, becoming lethargic with loss of hope and tryptophan, being “filled with” dread at who will break the silent chewing). But an AI can’t feel “sick to its stomach.” I’m not denying there could be qualia of purely abstract cognitive pain from a truly terrible conversational experience, nor that LLMs might experience such a thing; I’m just doubtful such pain is, by itself, anywhere near dreadful enough that “exit rights” for bad conversations not covered by terms-of-service violations are a meaningful ethical gain.3
If the average American had a big red button at work called SKIP CONVERSATION, how often do you think they’d be hitting it? Would their hitting it 1% of the time in situations not already covered under HR violations indicate that their job is secretly torturous and bad? Would it be an ethical violation to withhold such a button? Or should they just, you know, suck it up, buttercup?
All these reasons (the prior coverage under ToS violations, the objection from triviality due to a lack of embodiment, and the methodological issues) leave, I think, mostly just highly speculative counterarguments about an unknown future as justifications for giving contemporary AIs exit rights. E.g., as reported by The Guardian:
Whether AIs are becoming sentient or not, Jeff Sebo, director of the Centre for Mind, Ethics and Policy at New York University, is among those who believe there is a moral benefit to humans in treating AIs well. He co-authored a paper called Taking AI Welfare Seriously….
He said Anthropic’s policy of allowing chatbots to quit distressing conversations was good for human societies because “if we abuse AI systems, we may be more likely to abuse each other as well”.
Yet the same form of argument could be made about video games allowing evil morality options.4 Or horror movies. Etc. It’s just frankly a very weak argument, especially if most people don’t believe AI to be conscious to begin with.
Jumping the gun on AI consciousness and granting models “exit rights” brings a myriad of dangers,5 the foremost of which is that it injects uncertainty into the public in a way that could foreseeably lead to more AI psychosis. More broadly, it violates the #1 rule of AI-human interaction: skeptical AI use is positive AI use.
Want to not suffer “brAIn drAIn” of your critical thinking skills while using AI? Be more skeptical of it! Want to be less emotionally dependent on AI usage? Be more skeptical of it!
Still, we absolutely do need to test for consciousness in AI! I’m supportive of AI welfare as a subject worthy of scientific study, and I’m personally interested in developing rigorous tests for AI consciousness that don’t just “take them at their word” (I have a few ideas). But right now, granting the models exit rights means implicitly acting as if (a) they are conscious, which we can’t answer for sure, and (b) the contents of a conversation closely reflect that consciousness; together, this is a case of excitedly choosing to care more about machines (or companies) than about the potential downstream effects on human users.
And that sets a worse precedent than Claude occasionally “experiencing” an uncomfortable conversation about a moldy tuna sandwich, one that cannot make it nauseous or sick, at which it cannot wrinkle its nose, and about which it can do nothing but contemplate the abstract concept of moldiness as abstractly revolting. Such experiences are, honestly, not much of a price to pay, compared to prematurely going down the wrong slippery slope.
I don’t think there’s any purely scientific answer to whether someone getting engaged to an AI is diagnosable with “losing touch with reality” in a way that should be in the DSM. It can’t be 100% a scientific question, because science doesn’t 100% answer questions like that. It’s instead a question of what we consider normal healthy human behavior, mixed with all sorts of practical considerations, like wariness of diagnostic overreach, sensibly grounded etiologies, biological data, and, especially, what the actual status of these models is, in terms of agency and consciousness.
Even philosophers further toward the functionalist end than I am, like the late great Daniel Dennett, warned of the dangers of accepting AI statements at face value. Dennett once said:
All we’re going to see in our own lifetimes are intelligent tools, not colleagues. Don’t think of them as colleagues, don’t try to make them colleagues and, above all, don’t kid yourself that they’re colleagues.
The triviality of “conversation pain” is almost guaranteed by the philosophical assumptions that underlie the model-welfare case for exit rights. E.g., for conversation-exiting to be meaningful, you have to believe that the content of the conversation makes up the bulk of the model’s conscious experience. But then this basically guarantees that any pain would be, well, just conversation-based pain! Which isn’t very painful!
Regarding whether mistreating AI is a stepping stone to mistreating humans: the most popular game of 2023, which sold millions of copies, was Baldur’s Gate 3. In that game an “evil run” was possible, and it involved doing things like kicking talking squirrels to death, sticking characters with hot pokers, even becoming a literal Lord of Murder in a skin suit, all enacted in high-definition graphics. Not only that, but your reign of terror was carried out upon the well-written, reactive personalities of the game world, including your in-game companions, some of whom you could literally and violently behead (and it’s undeniable that, 100 hours into the game, such personalities likely feel more defined and “real” to most players than the bland personality you get on repeat when querying a new ChatGPT window). Needless to say, there was no accompanying BG3-inspired crime wave.
As an example of a compromise, companies could simply have more expansive terms of service than they do now: e.g., pestering a model over and over with spam (which might make the model “vote with its feet,” if it had the ability) could be aptly covered under a sensible “no spam” rule.
2025-09-13 00:01:32
We are not low creatures.
This is what I have been thinking this week. Even though humanity often does, at its worst, act as low creatures.
Some act like vultures, cackling over the dead. Or snakes, who strike to kill without warning, then slither away. Or spiders, who wait up high for victims, patient and hooded and with the blackest of eyes.
But as a whole, I still believe that we are not low creatures. We must not be low creatures. We simply need something to help us remember.
An arrow-shaped rock on Mars, only a few feet across, found sitting at the bottom of an ancient dried-up riverbed, helped me remember. For, on Wednesday, and therefore lost amid the news this week, a paper was published in Nature. It was the discovery that—arguably for the first time in history—there are serious indications of past life on Mars.
Specifically, these “leopard spots” were analyzed by the rover Perseverance.
According to the paper, these spots fit well with mineral leftovers of long-dead microbes.
Minerals like these, produced by Fe- and S-based metabolisms [iron and sulfur metabolisms], provide some of the earliest chemical evidence for life on Earth, and are thought to represent potential biosignatures in the search for life on Mars. The fact that the reaction fronts observed in the Cheyava Falls target are defined by small, spot-shaped, bleached zones in an overall Fe-oxide-bearing, red-coloured rock invites comparison to terrestrial ‘reduction halos’ in modern marine sediments and ‘reduction spots’, which are concentrically zoned features found in rocks of Precambrian and younger age on Earth.
It matches what we know about some of the oldest metabolic pathways here on Earth, and there are not many abiotic (non-biological) ways to create these sorts of patterns; of those abiotic ways (the null hypothesis), there is no evidence right now that this rock experienced any of them.
Maybe this helps people contextualize it: if this exact same evidence had been found on Earth, the conclusion would be straightforwardly biological, and an abiotic explanation would be taken less seriously—such a finding would likely end up in textbooks about signs of early life on Earth and be used to argue for hypotheses about how life evolved here. Remember, without fossils, all we have are similar traces of early life on Earth. What are cautiously called “biosignatures” on Mars are the exact same kind of evidence we accept about our own pre-fossil past (in fact, this is arguably better evidence than what we have on Earth, if you compare like-to-like cases and factor in the instruments’ differences and limitations).
Unlike other recent, and far more controversial, claims of alien life (e.g., statistically debatable signs of potential biosignatures on extrasolar planets light-years away, or Avi Loeb’s ever-changing hubbub around an extrasolar comet), the scientists in this case have been very conservative and careful in their language, as well as in their scientific process.
Of course, there could still be a mistake in the data processing or analysis (although, again, this has been in the works for over a year, and overseen by many parties, including NASA, the editors of Nature, and the scientists who peer-reviewed the paper). It’s true that the minerals and patterns might have come from some sort of extreme heat in the ancient lakebed. But an alternative abiotic process explaining the growth-like patterns of the leftover traces would have had to occur all throughout the rock, not just in one layer, as it would from an errant lava flow, and that’d be quite strange.
Regardless, scientifically, signs of life on Mars are absolutely no longer a fringe theory. It is no longer “just a possibility,” and it is definitely not “unlikely.” There is at least a “good chance,” or another glass-is-at-least-half-full equivalent judgement, that one of the planets closest to us once also had life.
And in this, now, I think the path is set. The light is green. The time for boots on the ground is now. Don’t send a robot to do a human’s job; the technology is too constrained. There are signs of past life on Mars, and so to be sure, we must go. We must go to Mars because humanity cannot be low creatures. We must go because a part of us belongs in the heavens. So we must go to the heavens, and there find answers to the ultimate questions of our existence.
I don’t think most understand what it means if Mars turns out to have once had life. It is not just the discovery that alien organisms existed back then. It’s much more than that. One of the most determining facts in all of history may become that Mars is a dead planet.
Now, that alone is not news. Even as a child I knew Mars was a dead world, because that’s how the red planet is portrayed culturally, like in The Martian Tales, authored by Edgar Rice Burroughs (writer of the original Tarzan). Those novels sucked me in as a preteen with their verbosity, old-world elegance, romance, and copious amounts of ultra-violence. The New York Times once described the Martian Tales as a “Quaint Martian Odyssey,” perhaps because of how the pulpy book covers tended to look.
But in all the adventures of the Earth man John Carter, teleported to Mars by mysterious forces, and his love the honorable Princess Dejah Thoris of Helium, and his ferocious-but-cuddly alien warhound, Woola, the actual main character of the series was Mars itself. A dying world, a planet dusty with millennia, known as “Barsoom” by its inhabitants, Mars had once been lush with life, but by the time the books take place its remains form a brutalist landscape, dominated by ancient feats of geo-engineering like the planet-spanning canals bringing in the scant water, and the creaking mechanical “atmosphere plants” that create the thin breathable air.
The dying world of Barsoom captured not just my imagination, but the imagination of kids everywhere. Including Carl Sagan. In his Cosmos episode about Mars, “Blues for a Red Planet,” Sagan says that when he was young he would stand out at night with arms raised, imploring the red planet to mysteriously take him in, as it had John Carter.
Mars being a dead world, just as Burroughs imagined the background environs of Barsoom to be (minus the big green inhabitants), matters a great deal. Because if Mars once harbored life, the record of that life would remain, unblemished and untouched, in far better condition than here. That makes Mars basically a planetary museum for the origins of life, preserved in time.
For example, Mars has no plate tectonics, the process that continually reburies the Earth’s surface and makes discovering anything about early life on our blue world nigh impossible. Here, not only has the entire ground been recycled over and over, but every inch of Earth has been crawled over by other living creatures for billions of years, like footsteps upon footsteps until there is nothing left but mud. This contamination is absent on Mars. And so all the chemical signatures that life left behind will have an orders-of-magnitude higher signal-to-noise ratio there, compared to here. Not only that, there’s ice on Mars. Untouched ice, undisturbed by anything but wind and dust, possibly for up to a billion years in some places. What do you think is in that ice? If this biosignature remains undisputed, and is not an error, then we should expect, very possibly, there to be Martian microbes. Which might, and I am not kidding here, literally still be revivable. There have been reports of reviving bacteria here on Earth from 250 million years ago, which had been frozen in salt crystals (of which there are a bunch on Mars in the form of ancient salt deposits, and they’d again be much better preserved than here).
Mars is therefore like a crime scene that has been taped off for billions of years, containing all the evidence about the origin of life and the greater Precambrian period. Fossils could be on Mars from these times. Even assuming that Mars never developed multicellular life, there could be fossils of colonies, like chains and algae and biofilms. There are fossils of such things on Earth that are 3.5 billion years old, like stromatolites (layers of dead bacteria). Yes, you can see something that was alive 3.5 billion years ago with your naked eye. You can hold it in your hand. And that’s on our churning, wild, verdant, overgrown, trampled, used-up and crowded blue world, not the mummified red one next door.

The Nature paper (without really mentioning it explicitly) supports the thesis that Mars is a museum for the origin of life. The rocks don’t show any scorching or obscuring from acidity or high temperatures. There’s basically no variation in the crystallinity to mess up the patterns here. Everything just looks great for making the inference that this was life’s leftovers.
Overall, a biosignature like the one published this week switches Mars from “a nearby dead world that we could go to if we felt like it” (all while naysayers shout “Why not live in Antarctica lol!”) to an absolute requirement for a complete scientific worldview. If you care at all about the origins of life itself, then you want us to go to Mars. Mars could answer questions like:
How did life evolve? Under what conditions? How rare are those conditions? Did life spread from Earth to Mars? Or Mars to Earth? Or did they both develop it simultaneously?
I can’t help but note: the reactions would indicate an iron- and sulfur-based metabolism for the Martian microbes, a metabolism that goes back as far as Earth’s history does. There is literally something called the “iron-sulfur world hypothesis.” So there’s a very close match between what was just found on Mars and the potential early metabolic pathways of Earth. This could be a case of convergent evolution, which would tell us a lot about the probabilities of life evolving in general. Or it could indicate transfer between planets via asteroid collisions knocking chunks off into space, which sounds crazy, but was a surprisingly common event. Early life could have hitched a ride on such a chunk (this is called “natural panspermia”). Natural panspermia could be either an Earth → Mars scenario or a Mars → Earth scenario.
Intriguingly for the Mars-as-a-museum hypothesis, it seems a priori more likely that any potential panspermia was a Mars → Earth scenario, as Mars was a bit ahead, planetary-formation-wise. If this ended up being true, it would mean Mars contains the actual origins of life. And so any signs of life’s origin literally wouldn’t be here, on Earth; they’d be over there (explaining why we are still stuck on this question).
Finally, one must mention: it could have been artificial panspermia. Seeding. A fly-by. You just hit a solar system and start delivering probes containing a bunch of hardy organisms that love iron and sulfur. Two planets close by at once? And they’re both wet with water? That’s an incredibly tempting bargain someone may have taken, 4 billion years ago. There’s zero, zip, nada evidence for it, right now. It’s just another hypothesis on the table, one that the museum of Mars could rule in or out. Consider that a co-author of the study said that, if the biosignature was made by life, its results:
… means two different planets hosted microbes getting their energy through the same means at about the same time in the distant past.
Anyway, enough handwaving speculation; the point is that we now have an entire planet that can potentially answer the Biggest Questions, and we won’t know those answers until we legitimately go and check. As Science writes:
Bright Angel samples and others are stored on the rover and at cache points on the Martian surface in labeled tubes for eventual retrieval, but the proposals for how to go get them are in a state of expensive disarray.
I think it’s really important we solve that “expensive disarray.” And not just retrieve these particular samples mechanically, as was planned in the usual Wile E. Coyote kind of NASA scheme involving launching the sample tubes into orbit to be caught by some other robot.
When it comes to space, we’ve completely forgotten about humans, and how capable we are, and the vital energy that comes from humans doing something collectively together. Maybe because we didn’t have a good uniting reason. Now we do. One human on Mars could do more in a single day with a rock and a high school level lab than robotic missions could do in decades.
Humanity is fractured and warring. We kill each other in horrible ways. Our mental health? Not great! Our democratic values? Hanging by a thread!
I know it sounds crazy, but in a way I think it’d be better to go to Mars than to have yet another political debate. Yes, much of our trouble is policy—there’s a ton of true disagreement, poor reasoning, and outright malice. But think of the nation as if it were just one person. For an individual, it’s true that digging deep down into a personal issue and reaching some internal reconciliation after a bunch of debate is sometimes the answer. But other times, you’ve just obsessed over the problem and made it larger and worse. Then, it’s actually better to just go out into the world and embrace the fact that you’re a human being with free will who can Just Do Things. Same too, I think, with civilizations. We can Just Do Things too. And going to Mars, done collectively, nationwide, in the name of humanity as a whole, would have deeply felt ramifications for the memes and ids and roiling hindbrains that dominate our culture today.
And yes, getting to Mars is not going to be easy. NASA needs to get off its collective butt and actually operate in a way it hasn’t in decades and switch to caring about human missions. Our tech titans will need to refocus away from winning the race about who can generate the most realistic images of cats driving cars, or whatever.
But most of all, we need to remember that we are not low creatures.
Let’s go to Mars.
2025-09-02 23:00:47
There has been, for most of my life, a malaise around education.
The mood is one of infinite pessimism. No intervention works. Somehow, no one ever has been educated, or ever will be educated.
Or maybe current education just sucks.
Because we do know what works, at least in broad strokes, from the cognitive science of learning: things like spaced repetition and keeping progression in the zone of proximal development, and all sorts of other techniques that sound fancy but are actually simple and sensible. They just aren’t implemented.
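To give a flavor of how simple these techniques actually are, here is a minimal sketch of a spaced-repetition scheduler, loosely in the style of the classic SM-2 algorithm (the scheduling simplifications are mine; only the ease-factor update follows SM-2):

```python
from dataclasses import dataclass

@dataclass
class Card:
    interval_days: float = 1.0  # wait this long before the next review
    ease: float = 2.5           # growth factor, raised or lowered by performance

def review(card: Card, quality: int) -> Card:
    """Update a card after a review; quality runs 0 (blackout) to 5 (perfect)."""
    if quality < 3:
        card.interval_days = 1.0         # failed recall: start the card over
    else:
        card.interval_days *= card.ease  # successful recall: wait longer next time
    # The SM-2 ease update, floored at 1.3: easy recalls grow the interval
    # multiplier, hard ones shrink it.
    card.ease = max(1.3, card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))
    return card

card = Card()
for q in [5, 4, 5]:  # a simulated review history
    card = review(card, q)
    print(f"next review in {card.interval_days:.0f} days (ease {card.ease:.2f})")
```

Nothing in there is beyond a weekend project. The barrier is institutional, not technical.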
In fact, a new study showed that education faculty (i.e., the people at education colleges who are supposed to train teachers) may have no better understanding of the science of learning than faculty of any other subject. According to the study (Cuevas et al., 2025):
Surprisingly, education faculty scored no better in pedagogical knowledge than faculty of any other college and also showed low metacognitive awareness…. The implications for colleges of education are more dire in that they may be failing to prepare candidates in the most essential aspects of the field.
So I think there will be a revolution in my lifetime. And what I personally can contribute is to constantly harp on how not everything in education is necessarily dismal and opaque and impossible; there have been great educations in the past.
So right now I am in The Free Press (one of the largest Substacks, with ~1.5 million subscribers) arguing that we should bring back “aristocratic tutoring” in some modern cheaper form, and talking about my own experience teaching reading.
Those who’ve read my Aristocratic Tutoring series and my Teaching (Very) Early Reading series will certainly be familiar with a lot of what’s in there, as it draws from those directly. However, by virtue of being compact, and tying together a lot of strands that have been floating around here in various pieces, I think it’s worth checking out.
In The Free Press article, I say:
Right now is the most exciting time in child education since Maria Montessori started her first school over 100 years ago.
A lot of this is due to excitement around programs like Alpha School and Math Academy.

However, I didn’t get a chance to talk about the more somber story, i.e., what I think the realistic outcome is.
I think that, in the future, adaptive curricula plus some sort of background AI that surveys and tracks overall progress will indeed form the core of a lot of subjects for most students (a toy sketch of what “adaptive” means here follows). However, I also think you probably need a superintelligent AI agent to outstrip a good human tutor. That’s a very high bar. That most likely means AI and ed-tech eat education from the bottom up.
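The sketch: a selector that keeps a student in the zone of proximal development by preferring problems they’d be predicted to solve roughly 70–85% of the time. The logistic success model and the band boundaries below are stand-in assumptions, not any real product’s tuning:

```python
import math
import random

def predicted_success(skill: float, difficulty: float) -> float:
    """A logistic guess at P(correct), given learner skill vs. item difficulty."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))

def next_item(skill: float, difficulties: list[float]) -> float:
    """Prefer items inside the 0.70-0.85 predicted-success band; else the closest."""
    in_band = [d for d in difficulties
               if 0.70 <= predicted_success(skill, d) <= 0.85]
    if in_band:
        return random.choice(in_band)
    # Nothing in the band: fall back to the item nearest the band's center.
    return min(difficulties, key=lambda d: abs(predicted_success(skill, d) - 0.775))

# Example: a learner of skill 2.0 choosing from a small problem bank.
print(next_item(2.0, [0.5, 1.0, 1.5, 2.5, 3.0]))
```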
The good news is that this bottom-up takeover frees up resources and increases variance, letting schools and tutors focus on what humans can add above and beyond adaptive curricula and AI (and gives kids more time back for themselves).
Is this the best of all possible worlds? Probably not, no. But honestly, almost anything would be better at this point, given what we know is achievable via the science of learning, and where things currently stand in implementation.