LessWrong

An online forum and community dedicated to improving human reasoning and decision-making.

Talk English, Think Something Else

2026-04-13 04:47:29

There's an adage from programming in C++ which goes something like "Yes, you write C++, but you imagine the machine code as you do." I assumed this was bullshit, that nobody actually does this. Am I supposed to imagine writing the machine code, and then imagine imagining the binary? And then imagine imagining imagining the transistors?

Oh and since I don't actually use compiled languages, should I actually be writing Python, then imagining the C++ engine, and so on?

Then one day, I was vibe-coding, and I realized I was writing in English and thinking in Python. Or something like it. I wasn't actually imagining every line of Python, but I was imagining the structure of the program that I was describing to Claude, and adding in extra details to shape that structure.
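A minimal sketch of what that gap looks like, with everything hypothetical (the prompt, the data, and the function names are mine, not anything Claude actually produced): the English sentence on top, and the Python-shaped structure I'd be imagining underneath it.

```python
# Entirely hypothetical example: the English sentence I might type, and the
# Python-shaped structure I'm imagining while I type it.
prompt = ("Read the records, drop any with a missing price, "
          "then report the average price per category.")

def drop_missing(rows):
    """The 'drop any with a missing price' clause."""
    return [r for r in rows if r.get("price") is not None]

def average_by_category(rows):
    """The 'average price per category' clause."""
    totals = {}
    for r in rows:
        total, count = totals.get(r["category"], (0.0, 0))
        totals[r["category"]] = (total + r["price"], count + 1)
    return {cat: total / count for cat, (total, count) in totals.items()}

# Imagined sample data, just to check the structure holds together:
rows = [
    {"category": "fruit", "price": 2.0},
    {"category": "fruit", "price": 4.0},
    {"category": "veg", "price": None},
    {"category": "veg", "price": 3.0},
]
print(average_by_category(drop_missing(rows)))  # {'fruit': 3.0, 'veg': 3.0}
```

The extra details you add while prompting ("drop any with a missing price") are exactly the details that pin down which structure you get.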

Pub Philosophy Bros

This post is actually about having sane conversations with philosophy bros at the pub.

People like to talk in English (or other human languages) because our mouths can't make sounds in whatever internal neuralese our brains use. Sometimes, like in mathematics, we can make the language of choice trivially isomorphic to the structures that we're talking about. But most of the time we can't do that.

Consider the absolutely nonsensical white horse paradox, where "a white horse is not a horse" is read as the statement:

"The set of white horses is not identical to the set of horses."

And the phrase "a white horse is a horse" is read as the statement:

"Every white horse is a member of the set of horses."

I often think in a language of causal graphs. English isn't very good at talking about causal graphs. It doesn't have individual words for "A contains the same information as B", "A is the same node as B", "A is an abstraction over B", or "A is a node which is causally upstream of B".
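To make that vocabulary concrete, here's a toy sketch (my own hypothetical three-node graph, nothing from the post): "causally upstream" is a one-word edge query in graph terms, but a whole clause in English.

```python
# Toy graph, purely illustrative: each node maps to its direct causal parents.
parents = {"A": [], "B": ["A"], "C": ["B"]}

def upstream_of(graph, a, b):
    """True if `a` is causally upstream of `b` (an ancestor in the graph)."""
    frontier = list(graph[b])
    while frontier:
        node = frontier.pop()
        if node == a:
            return True
        frontier.extend(graph[node])
    return False

print(upstream_of(parents, "A", "C"))  # True: A is upstream of C via B
print(upstream_of(parents, "C", "A"))  # False: causation doesn't run backwards
```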

I remember talking about "consciousness" with a philosophy guy at the pub once. I think I said something like "A certain structure of computation causes consciousness", meaning "Consciousness is a label applied to certain computational structures", but which he interpreted as "The presence of a certain computational structure is a node upstream of consciousness". This caused immense confusion.

I call the problems here "beetle problems".

Beetle Problems

Wittgenstein proposed a thought experiment. Suppose you have a society where:

  1. Everyone gets a box.
  2. Everyone uses the word "beetle" to refer to what's in the box.
  3. Everyone can look in their own box.
  4. Nobody can look in anybody else's box.

In this case, the meaning of the word "beetle" is entirely socially constructed. Wittgenstein was exaggerating here: if I talk to you, and you do something with your beetle (dirty jokes aside) and report the results, I can get some information about your beetle, based on what you say back to me. The beetle is causally entangled with us both. It's just not a very efficient way of talking about things.

Even if we both have identical beetles, it might take us a while to get them oriented the same way round: what I call an antenna, you might call a leg; what I call a wing-case, you call a carapace. And so on.

To unfairly single out an example: I personally find this particularly salient when talking to people in the Oxford EA/longtermist cluster. I know they're smart people who can put together an argument, but they've developed a language I just cannot penetrate. It takes a long time for me to figure out what on earth they mean. Ohh, you have your beetle upside down compared to mine.

Even worse, I think a lot of people don't actually think in terms of causal graphs the way I do. This comes up when I try to read pieces on moral realism. When someone brings up a stance-independent reason to do something, I simply cannot map this onto any concept which exists in my mental language. What do you mean your beetle has fur and claws and keeps going "meow"? Are you sure?

Solutions

Uhh... I don't have many. Beetle problems take a while to figure out. I once got feedback on an essay test that said "Your ideas seemed confused." and I thought "Man, your draft seemed confused!". I don't think I could have done much better, without spending time in person hashing out the beetle problems.

It might have helped to have a better conception of beetle problems, though. I could at least have pointed it out. Perhaps in future I'll come back with a wonderful beetle-problem-solving method.


Editor's note: this post was written as part of Doublehaven (unaffiliated with Inkhaven).


Morale

2026-04-13 04:15:20

One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your morale is low, you can't push through adversity. It's also very easy to accidentally drop your morale through standard rationalist life-optimization.

It's easy to optimize for wellbeing and miss out on the factors which affect morale, especially if you're working on something important, like not having everyone die. One example is working at an office that feeds you three meals per day. This seems optimal: eating is nice, and cooking is effort. Obvious choice.

Example

But morale doesn't come from having nice things. Consider a rich teenager. He gets basically every material need satisfied: maids clean, chefs cook, his family takes him on holiday four times a year. What happens when this kid comes up against something really difficult in school? He probably doesn't push through.

"Aha", I hear you say. "That kid has never faced adversity. Of course he's not going to handle it well." Ok, suppose he gets kicked in the shins every day and called a posh twat by some local youths, but still goes into school. That's adversity. Will that work? Will he have higher morale now? I don't think so.

Now, what if he plays the cello in the school orchestra, or plays for the school football team? I think that might work, even if he's not the best kid in the school at either of those things. It's not about having nice things or having bad things; it's about something else.

II

Morale comes from having the nice things in your life correlated with effort. Cooking your own dinner is basically microdosing returns to effort: if you put in effort, you eat steak frites with peppercorn sauce. If you don't, you eat chicken and rice.

It doesn't have to be cooking, basically any hobby works like this, as long as you get returns to effort. It might be art, or weightlifting, or whatever. You just need to keep reminding your brain that effort has a purpose.

This is especially important when you work in an area (like not having everyone die) where the returns on effort are hard to come by. Good software engineering looks like solving a PR in a day or so (or whatever you people do). Good alignment research might mean chasing a concept for weeks, only to have it fail.

The early stages of dating can also induce low morale. Sometimes, things just fall apart due to random incompatibilities which aren't your fault. Long-term relationships are much less like this: you can just do things (plan dates with your partner and enjoy their company).

John Wentworth has written about a minor depression presenting as extremely low morale amongst rationalist types. I don't think you should wait until it gets that bad before you improve your morale. I think you should think about it now.

III

Morale doesn't just matter on an individual level, it also matters on the scale of whole societies. In this case, it doesn't just matter whether an individual gets rewarded for effort, it matters whether they see others rewarded for effort, and whether or not they see others punished for a lack of effort.

It's a truism that the most effective way to kill morale is to reward lazy or incompetent employees. You can do one better if you reward active sabotage. The harm of small but visible crimes (like fare-dodging on public transport) is, in part, the damage to the morale of everyone around.

There is a hack for societal morale, though: economic growth. People generally put some amount of effort into their work. If they can afford a better car each year, they'll attribute that to their own grit, and not to an increase in the productivity of a Chinese factory.

Unfortunately, there's a twist in the twist. People are really awful at understanding nominal inflation. If the price goes up a bit (even if their wages more than match it), the price increase just feels like a random, unfair, morale-reducing loss. I conjecture this is a big contributor to the American Vibecession.
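A toy calculation with made-up numbers: even when wages outpace prices, the price rise is the part that gets noticed.

```python
# Made-up numbers, purely illustrative.
wage_growth = 0.08   # nominal wages up 8%
inflation = 0.05     # prices up 5%

# Real wage growth: deflate the nominal wage by the price level.
real_wage_growth = (1 + wage_growth) / (1 + inflation) - 1
print(f"{real_wage_growth:.1%}")  # 2.9%: better off, but the 5% is what's felt
```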




Eggs, rooms, puzzles, and talking about AI

2026-04-13 04:00:16

I live with five friends in a big house, and two things I’ve done in it on this particular Sunday are hide 156 easter eggs all around, and reach a tentative joint decision on the allocation of four of its rooms.

These tasks are delightful to me for a reason they have in common, and from which I hope to gesture at extremely far-reaching conclusions.

Easter eggs

A room usually seems like a simple thing to me—a big box, with some smaller mostly boxish objects and holes in it. Each of those things also usually seems simple: a cupboard is a box-shaped hole, with a movable thin-box-shaped front, which has hinges (the most complicated part, but in this picture their only qualities are letting flat surfaces rotate around fixed edges). Sometimes a cupboard has shelves, which are like planes breaking up the space.

In this picture, hiding easter eggs well is hard! Like, I could put one in the cupboard? On the top shelf? Or the bottom shelf! They’ll never find it there!

These are not good hiding places.

In order to hide easter eggs well, you need to see a lot of detail that you were abstracting away in the simple picture. The weird ridge along the back of the cupboard, or a wire looping under a lip around the front, or brackets holding up the shelves that have spaces in them where something could be wedged, or a rogue curl of onion peel in a back corner.

Here is one of my favorite hiding spots—can you see the egg?

[Image: cushion with hidden egg]

Answer below:

.

.

.

.

.

.

.

.

.

.

.

.

[Image: hidden egg, up closer]

[Image: hidden egg, very close]

I like it because a cushion so much seems like an inflated square in my mind—yes, with some sort of pattern, and perhaps somewhat worn out, but I don’t expect a pattern + worn out = you can hide a substantial solid object on the surface of it.

Here is an especially empty room (one of the ones in need of allocation), currently known as ‘the puzzle room’:

[Image: the puzzle room]

I hid ten eggs in it (probably two visible in this picture), and it took a while for people to find them all, which seemed to aggressively help some of the egg-seekers receive a similar experience of space containing details that are somehow really hard to see even if you try.


It would be one thing to have a kind of "level of detail" dial that you could read and consciously turn up and down as you see fit. But an interesting thing about watching people search for easter eggs is that they can't necessarily choose which things they are abstracting out, or fully tell how "carefully" they are looking. You can put eggs in plain sight of them, and they think they are looking carefully, but just don't see the egg. By the time a person has perceived anything at all, they have simplified it. You can't just look at all the raw detail, and check it for eggs.

Besides not being able to control which abstractions you use, it seems to me now that an adversary (such as an egg-hider) can guess and exploit your habits of abstracting. Among the details of the cupboard, even if you are looking carefully at the shape of the sides, you might still miss the onion peel, because it’s random dirt, and you are examining the cupboard. That’s another nice thing about the ragged cushion—if you habitually round off worn-out things to what they are meant to be, it’s hard to see the detail of how it is falling apart, and thus the egg.

In another possible example, one of our bathrooms has a ‘bathroom!’ label on it, which I expect my housemates are used to seeing and ignoring, and visitors perhaps also tune out on their way to look for eggs inside what they have already determined to be a bathroom. I put an egg behind it, held by the super-post-it-note glue, which was a pretty unsubtle disruption to the smoothness of the sign, but this egg wasn’t found until it was accidentally knocked out at the very end.


Rooms

Allocating rooms seems like it should be a simple thing—there are only a few options! Like, if you have four rooms, and Alice and Bob each basically need a place to sleep and to work, then it seems like you should be able to consider the 24 possibilities and be done. But actually (at least in houses I live in) what exact spaces count as the "rooms" in question is often more ambiguous than you might think, and the set of activities that will be expected in each, or which people will be its owners, also contains many more possibilities than I see at first.

I’m more confused about how this happens with rooms, but I have twice in this house had the experience of mulling over such a question for what seems like unreasonably long, and coming up with new ideas we hadn’t thought of or taken seriously, and ending up with a satisfactory arrangement. This time, our tentative plan involves one of the bedrooms also being a recording studio, and there being three total rooms with beds in among two people. Which all feels very simple in retrospect, but I have been haplessly ideating about this for weeks.

It again feels kind of magical and wholesome to stare at the simple things long enough and well enough to see them more richly, in ways that you couldn’t just choose to, and for this to solve your problem.

Classic puzzles

This kind of situation (an abstraction you take for granted that makes a problem hard, and gaps in the abstraction that let you do better) is a classic way to construct a puzzle. For instance (from Reddit):

[Image: puzzle from Reddit]

AI risk

A thing that has annoyed me for a long time in talking to people about AI risk is that they often do it in very abstract terms—”we need safety progress relative to capabilities progress”, or “such and such will get a decisive strategic advantage and there will be value lockin”—and then expect to be correct, like pretty confidently!

I love abstractions quite a lot compared to most people (I once scored 100% on the relevant axis of the Myers-Briggs test!) but I’m also expecting abstractions to have relevant frayed edges all over the place. And this is particularly relevant if you are trying to solve problems and are struggling to see solutions.

In particular, for instance, I often hear that it is pointless or silly to try not to build really dangerous AI technology because "it's a race". But before you give up on preventing this disaster, I really want you to spend at least as much attention seeing the details of the world below the level of "arms race" as my boyfriend spent peering at our laundry machines before he found the egg there.




Book Review: Existential Kink

2026-04-13 00:16:57

At a recent rationalist unconference multiple people recommended Carolyn Elliott's Existential Kink to me, one of them even postulating that it would be useful for me specifically. So I was really surprised to open up a rather generic self-help book, with the author gloating about her success and generally just advertising the book for the first chapter. Professional advice-givers tend, in my experience, to reach only the audience that self-selects for a certain self-help format. The name containing the word kink could have already rung alarm bells had I been awake; it's just the sort of correctly-toned provocation that makes such people pick up these books [1].

As required for any self-respecting book on anything even slightly resembling philosophy or life advice, an ancient story has to be invoked quite early. Fitting the nature of the book, the prologue begins with the author's retelling of the Rape of Persephone. It briefly covers the story [2], and then dives straight into astrology and, somehow even worse, metaphorical alchemy. Suffice it to say that I haven't ingested such utter balderdash since reading well-cherry-picked GPT-2 outputs. For instance, the word "magic" is used for your own thoughts affecting anything, especially yourself.

While I appreciate the condescending tone that the book sometimes reaches for, fondly reminding me of Sadly, Porn, this particular passage in the intro was almost enough to make me stop reading [3]:

I feel this sense of shameful wrongness at times. Maybe you don’t feel it at all. Maybe you’re free—in which case, kudos! You are very welcome to close this book and go about your enlightened life, my friend.

Fortunately, I had already decided to read the book. Sadly, the condescension never lasts long and quickly changes to what I'd describe as a fake-excited [4], authoritative tone. It also continues gloating and promising good outcomes. Every paragraph of actual advice seems to be surrounded by at least three made of fluff. It also keeps inventing fancy words or borrowing them from other woo fandoms, including psychoanalysis and Buddhism, in order to sound more sophisticated. Ok ok, I'll attempt to get over the writing flavor and focus on the actual content from now on... after this one example [5]:

Even the most rigorous scientific experiment can only be experienced subjectively. There’s simply no world outside of our subjective awareness.

And the point is? Please? Get to it some day? Solipsism was a funny joke fifteen years ago.

Two pages later, finally, there's the statement I've been waiting for:

Okay, so that’s some far-out metaphysical stuff, what the hell does that mean, in practical terms?

If you expected to find something to address the question above, you'll be sorely disappointed.


The book consists of a couple of lessons introducing the reader to the core ideas, including the basic meditation technique. Then it lists some anecdotes about how it has worked with some of the author's clients. After this there are 13 exercises for experiencing and experimenting with the methodology. And then more anecdotes. The book ends with a Q&A section, which actually addresses some of my concerns.

One of the core principles in the book goes like this:

[...] contrary to some airy Law of Attraction notions, we rarely get what we consciously want (unless we do the kind of deep solve work addressed in this book), but we always get what we unconsciously want.

I've had the exact opposite experience. I seem to eventually end up getting everything that I consciously want, but still end up feeling like something's missing. Maybe I'm just interpreting this wrong? That said, I feel I'm pretty well on the same page with myself about things that I want, compared to others around me [6].

To engage on a metaphysical level: There's an interesting theory, which I first got from reading Yudkowsky's High Challenge: perhaps I'm currently living in a simulation with the optimal difficulty level for my own enjoyment. It feels true quite often and is, of course, completely unfalsifiable. But it's one of my nicer mental frames to look at difficulties from, and it resonates quite well with the book's message.

You can integrate and evolve those previously unconscious desires of yours for a partner who cheats, mopes, drinks, fails to wash dishes, or believes in Flat Earth theories—whatever your particular kink amongst the thousands possible happens to be.

If your partner cheats on you, that's exactly what you enjoy? If you break up with them because they cheated on you, then you wanted to be a person who has broken up with a cheater?

Yeah. Super useful. This is totally the key to fulfilling relationships. Oh wait:

At such a point of recognition and integration, you either lose all interest in the present relationship and end it gracefully, freeing yourself to go find a better one, or you find that you, yourself, your partner, and the relationship as a whole, evolve in a fascinating way.

Ok so... the model contradicts itself. Even better.


The core of the book consists of some meditative techniques. Perhaps they could be useful. I'll try the basic meditation practice on one of my own problems to see if it works. I'll need to pick something I don't like. Something where "having is evidence of wanting" rings false on first intuition. Maybe this one...

I'm somewhat overweight. I don't like it, for both aesthetic and instrumental reasons. It's quite easy to point at the supposed reasons why I'm like this. Firstly, I like food, and through some long periods of depression that was my primary source of enjoyment, along with videogames, which surely didn't help much either. Secondly, I have a hard time differentiating between anxiety and hunger, and I get stressed easily.

I don't think there's any perverse self-sabotage going on here, just conflicting wants and a compromise that follows the path of least resistance. Sure, looking like an almost-rotting cave troll can be a nice source of self-deprecating humor, but that's of limited use. Perhaps I have a secret desire to feel terrible all the time? Nope; I think Groon the Walker got it right in Erogamer: this is a blight upon earth and getting rid of it would be almost purely positive. "You're not really trying so it doesn't work for you!" Perhaps you should attempt running across the barrier between platforms nine and ten at King's Cross station?

Perhaps we could look a bit deeper? The overeating is self-sabotage that I do, because...? Maybe I use it to uphold my class clown personality, which owned that bit early on? Or maybe I use it to appease the expectations of my childhood bullies, none of whom I've seen in years? Perhaps I like people having a negative halo effect on me? I don't think I can find the theories far-fetched enough to fit here. No, I self-sabotage because my evolution-misguided brain wants more calories.

A perceptive author might notice that avoiding physical exercise might actually count here. I hate receiving praise for anything healthy [7], and this was a big part of why I was for a long time really anxious about this. However, I'm again confused about why I'm supposed to enjoy that instead of getting over it, as I mostly have.

Perhaps the author just has a Meta-Existential Kink, which makes them want to think that everything bad happens because they subconsciously want bad things to happen to them?


For some other problems, the answers are much cleaner.

But if we’re talking about endemic human problems like war or racism or child abuse, odds are it’s more of a collective unconscious issue. So war and abuse and all the challenging stuff that transpires in the world result from millennia of unintegrated, repressed, denied shadow desires of individuals conglomerated into collective forces.

My first thought is that perhaps this has some interesting connections to mistake theory?

My second thought is that this is easily refutable [8]. Take cancer, for instance. I fortunately don't have cancer [9]. If I get one, my reaction will not and should not be "this is exactly what I wanted". My take is "fuck cancer", end of discussion. I'll also accept "it is what it is" and even "at least now I don't have to worry about many of those other things" if you can really deeply believe that, and, grudgingly, "you play with the cards that you're dealt". If (mentally) masturbating to the idea appeals to you, feel free to, but that's not my thing.


The problems that the book describes solving seem to be almost purely social, consisting of shame and guilt. The solutions in the anecdotes seem to just magically appear from outside when the main character decides to absolve themselves. Being ok with the situation itself isn't enough for any of them. They still need the world to accommodate them, often in deus-ex-machina-like fashion. This seems to go directly against the primary claim of the book, learning to enjoy the misery. There's a story about how Louisa learns to be content with her old car. And then buys a new one. In another story, June tries to accept that it's ok to miss a flight, then realizes that she'd miss her mother, and literally manifests boarding passes with wishful thinking [10].

The people in the anecdotes are also all women. I find this complementary to my interpretation of Jordan Peterson's take on gender roles, namely that among losers, men lack a spine and women lack agency itself. I do not endorse this, which is why it's rather interesting to see it here, as a literary trope if nothing else. [11]

Then again, why would a self-help book include stories where the model doesn't work? Disclaimers? Statistics? What would be the point?

In general, the book is very femininity-coded, and that might be part of why I find it so difficult to identify with. I don't relax with baths, chamomile tea and crying. I relax with sauna, violence and engineering. I'm not part of the intended audience, as I don't like self-help books that much anyway. Also, in the Q&A there's a warning that depression [12] or asexuality [13] likely makes the book's methods ineffective.


I try the next exercise:

Close your eyes for a moment and feel into your current state.

Are you holding any resentments? Judgments of yourself or other people? Worries? Criticisms about the state of the world? Complaints about your body, your work, your life?

And the answer is simply no.

I made an attempt to try most of these exercises. The results were not good, but then again I've always had a really hard time easing into stuff like this, and I find it likely that this is my personal skill issue [14]. Fortunately exercise #13 contains instructions for approximately this same problem. Unfortunately it seems even more fake than everything I've seen before, so I'll just quote the primary segment here:

Here’s how it works. Try leveraging your dread by saying this to yourself:

“Oh no, if only there was something I could do to stop the inevitable arrival of this magnificent new partner in my life. This is so awful. [...] terrifying fate of being completely fulfilled in love.” Ahhhhh, can you feel the honesty there? Refreshing, isn’t it? Because there is some shadowy part of you that’s disgusted and miserable at the idea of fresh new love, isn’t there?

No, actually, I cannot see anything resembling honesty here, and I doubt anyone else can either [15].


Irrational levels of self-confidence are certainly useful. This might be one path there.

Bootstrapping feedback loops is sometimes easier with a little bit of self-deception. Sustaining them indefinitely shouldn't require it. Perhaps the author already thought of this and realized that anyone who fixes their problems like this eventually confronts the truth? I don't think [16] so. In any case, there's no need to get fully delusional.

And sure, you can be "turned on" about anything all the time.

But just like with regular old arrogance, that sometimes leads to results that you do not endorse. Perhaps permanent physical injuries or prison time are also enjoyable with the right mindset, but neither helps you achieve anything in life. Perhaps you can learn to be turned on about being a loser in all senses of the word. I have values higher than my own happiness. I don't want to feel permanent fulfillment. I'm content with not being content. I want more. I have no goals beyond the joy of the journey. Quite contradictory, I know.


Why am I writing this post in such a defensive tone? No idea. Really? I do have an idea. The book would say that I'm doing it to protect my sense of identity. Correct! Next accusation please.


My understanding is that the book does a Jungian take on this. Sadly, Porn, with which I mostly contrast it in my head, adopts the Lacanian perspective instead. Both books take a weirdly sexual primary lens on the subject, and hide their points behind layers of obscurity to make you think about it all. EK claims that it's ok to be terrible and it's there to help you. SP simply shouts that you're terrible, you're a disappointment, and maybe you ought to do something about it if you weren't such an unagentic disappointment. I vastly prefer the latter.


It's one of the worst books I've ever read. That said, I did read it. It provoked some thoughts. It definitely wasn't the most useless book I've read.

It might just be that I'm not that much into kink, or submission, or masochism. Or sex. Or astrology, spiritualism, solipsism, empowerment, soft-fuzzy-feelings, or woo. Or fancy words. I'm not a "nasty freaky thing", to the best of my knowledge, in any of the senses Elliott describes, nor do I want to be one. I rarely feel particularly guilty. Shame sometimes limits my actions more than I'd wish on reflection, but even that seems mostly reasonable and useful.

Perhaps focusing more on the sadistic instead of the masochistic perspective would have been more relatable. It would also have resonated better with the Nietzschean master morality that the book seems to somewhat half-heartedly endorse. Or maybe having gotten into Lacanian psychoanalysis just filled the slot where the Jungian model of mind would fit.

The book confuses cause and effect; learning to think in a particular way doesn't mean you were always like that. It speaks in absolutes and defends this to an absurd degree. It just states things that seem obviously incorrect and seems to be content with it. It never explicitly owns any of this, which I both like and don't like.

An older version of me would have thought that the people helped by this kind of thing are very horribly broken in some incomprehensibly twisted ways. Nowadays, I'm of the opinion that we're all broken and what mostly matters is what you do with that. So, if that works for you, go for it. Some of the stuff described would probably work for me, were I not feeling so disdainful of it. The reverse psychology affirmations, at least, sound genuinely useful.

I also appreciate the subtle Nietzsche references, at least. Like this one:

All nonhumble reactions to the human, all-too-human thirst for power have the effect of warping that natural, beautiful drive into numbness that steamrolls over other people instead of inspiring and uplifting them like genuine, epic power can.

Of course the book also says that you're literally Hitler if you think that your desire for power is what makes you evil.

Perhaps it's just all outside my Overton window? Is my aversion to woo (and sex) just social group membership signaling? Who knows. It's still who I am. Woo feels silly. It's for people who cannot take joy in the merely real due to some hangup. Likely I have the opposite hangup. We can both feel smug at having a superior viewpoint, nice [17].

I'm no stranger to silver linings. I also sometimes make things awful for a while just to keep them more interesting.

Perhaps I had already internalized the core lessons from other sources, so there wasn't that much novelty in there? Or perhaps I didn't get it at all. I'm also really good at inventing intellectual (and thus incorrect) explanations on why I do or want things.

I can extract some of the core lessons from the text. I'm not sure if that's actually useful. As EK consistently demonstrates, you can interpret any text however you want and produce whatever lessons you feel like producing. For instance, seen through a rationalist lens, the text contains themes like Yudkowskian heroic responsibility and "but first, losing must be thinkable". From another lens you could interpret it as talking about moral nihilism combined with Nietzschean master morality.

Other lessons the book completely inverts, primarily about enduring pain. Pain has a purpose: it engraves "this was a mistake" in you. This is a valuable tool. Yes, sometimes we overdo it. The book claims we always overdo it. It is wrong. When you touch a hot stove, the impulse to pull away your hand is useful. If you start masturbating to the pain instead, your hand will be less useful tomorrow.

The author has nothing to protect and it shows. Of course feeling guilty or humiliated is useless if it's about your own insecurities. But if you have, say, children to feed, then feeling guilty for failing to feed them is exactly what guilt is for. Is it always productive? No. But it's there for a reason [18].

They've found a useful tool and then jumped to thinking it solves all the problems. This is not wisdom [19]. You can solve computer problems with a hammer too, you just won't have a computer afterwards. The author suppresses their agency to endure the pain. That's a valid strategy. That's also a tragedy.

Not every reason is an excuse, even though most of them might be.


Instead of this book, I'd recommend books that do not force the self-help format. The Elephant in the Brain, or perhaps Sadly, Porn, provide far more accurate [20] and entertaining [20:1] commentary than this one. Or if you want fiction instead, try Erogamer, although fair warning, it's a bit slow. These will not be easy, motivational, authoritative books. You'll have to do your own thinking. That's the kind of pain I enjoy.

  1. For instance, The Subtle Art of Not Giving a F*ck by Mark Manson fits the same pattern. ↩︎

  2. I recommend reading the actual story somewhere else and comparing for yourself what's missing. For instance, Pluto (Hades) is an uncle of Persephone. This kind of stuff was rather typical among Greek gods, so perhaps it's a rather understandable omission. This paper contains interesting analysis of the text, but is largely irrelevant here, as the story is just there to invoke the ancient myth trope and is discarded quickly. ↩︎

  3. Read: it was a good provocation. ↩︎

  4. My excitement-faking detector is broken/oversensitive. Known issue. ↩︎

  5. Unlike Elliott, who tries to limit their whining to the opening section, I simply cannot. ↩︎

  6. I have no idea if that's actually true, but I feel like that. ↩︎

  7. This would require another post to explain, especially since I don't understand it too well myself. ↩︎

  8. Read: Only a delusional loser could actually write this and believe it. Or perhaps it's just a brilliant ragebait? ↩︎

  9. As far as I know. ↩︎

  10. Confirmation bias says hello! ↩︎

  11. Oh no, another misogyny amplifier, now I'll need to spend some time reading flat earth stuff or incel forums to keep my misanthropy in balance. ↩︎

  12. Of course, the book's answer is therapy and psychedelics. ↩︎

  13. It doesn't even consider the possibility of not feeling pleasurable sexual sensations through any lens other than trauma, which would also explain a lot. ↩︎

  14. Naturally I just think I'm a better person because of this, for some obscure reasons. ↩︎

  15. Of course I don't actually doubt that, the space of human minds is vast beyond my imagination. ↩︎

  16. In both senses of the word. ↩︎

  17. Woo's a mental crutch, losers! ↩︎

  18. Mr. Chesterton says hello! ↩︎

  19. I understand that sometimes, when explaining a model, it makes sense to discard nuance for a while. This doesn't mean you should say that the nuance doesn't exist. ↩︎

  20. Well, you know, that's just like uh, my opinion, man. ↩︎ ↩︎




Sparse Autoencoders for Single-Cell Models

2026-04-13 00:07:50

People are rushing to build bigger and bigger single cell foundation models (trained on RNA sequencing data), but in my view we have not extracted even a small fraction of the knowledge and capabilities that already exist inside the models we have today.

To explain what I mean, I want to argue three things in this post, and then show the empirical work behind them.

Thesis 1: Biological foundation models are not like LLMs, and the field's habit of evaluating them the same way is causing us to systematically underestimate what they contain. When you interact with GPT, the surface-level outputs (the text it generates) are a fairly good proxy for the model's capabilities. You can read what it writes and form a reasonable opinion. Biological foundation models are fundamentally different in this respect. A model like Geneformer or scGPT takes a cell's gene expression profile and produces embeddings, predictions of masked genes, or cell type classifications. These surface-level outputs are only a small sliver of what the model is doing internally. The model has been trained on tens of millions of cells, and the representations it has built to solve its training objective contain compressed biological knowledge that never directly appears in any output you can look at. Evaluating these models by their benchmark performance on cell type annotation or perturbation prediction is like evaluating a human scientist by asking them to fill in blanks on a multiple-choice exam. 

Thesis 2: People keep calling biological foundation models "virtual cells," but this label is implied rather than tested or validated. The term gets used in grant applications, press releases, and even some papers, as though it were an established fact that these models have internalized a working simulation of cellular biology. Maybe they have. Or maybe they have learned sophisticated statistical regularities that look like biology on the surface but dissolve under closer inspection. My work shows these models are, in a meaningful sense, models of the cell, but that is an empirical claim that needs empirical treatment.

Thesis 3: The right tools already exist, and they come from the AI safety community's work on mechanistic interpretability. Sparse autoencoders (SAEs), causal circuit tracing, feature ablation, activation patching: these methods were developed to understand language models, largely motivated by alignment concerns. It turns out they are extraordinarily well-suited to biological foundation models, and for a good reason: in language models, when you discover a circuit, you often lack ground truth about whether the circuit is "correct" in any deep sense, because there is no objective external reality that the model's internal computations are supposed to correspond to. In biological foundation models, you have decades of molecular biology, curated pathway databases, genome-scale perturbation screens, and well-characterized regulatory networks to validate against. Biology gives you the ground truth that language lacks. This makes biological FMs arguably the best (real) testbed for mechanistic interpretability methods that currently exists.

What follows is the story of three papers I recently produced, each building on the previous one, in which I applied the SAE-based interpretability toolkit to the two until recently leading single-cell foundation models (Geneformer V2-316M and scGPT whole-human) and progressively mapped what they know, how they compute, and where their knowledge runs out.

The SAE Atlas

(arXiv:2603.02952)

The first question was very simple: what is inside these models?

Neural networks encode information in superposition. This is well-established in the interpretability literature for language models, but nobody had systematically demonstrated it for biological foundation models or attempted to resolve it.

I trained TopK sparse autoencoders on the residual stream activations of every layer of Geneformer V2-316M (18 layers, d=1152) and scGPT whole-human (12 layers, d=512). The SAEs decompose the dense, superimposed activations into sparse, interpretable features, each of which (ideally) corresponds to a single biological concept. The result was a pair of feature atlases: 82,525 features for Geneformer, 24,527 for scGPT, totaling over 107,000 features across 30 layers.
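To make the method concrete, here is a minimal numpy sketch of a TopK SAE forward pass. The dimensions and weights are toy placeholders, not the trained atlases, and `topk_sae_forward` is a name I made up for illustration:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a TopK sparse autoencoder (toy sketch).

    x: (d,) residual-stream activation; W_enc: (n_feat, d); W_dec: (d, n_feat).
    Only the k largest pre-activations survive; the rest are zeroed,
    which enforces k-sparsity without an L1 penalty term.
    """
    pre = W_enc @ x + b_enc                  # feature pre-activations
    acts = np.maximum(pre, 0.0)              # ReLU
    if k < acts.size:
        cutoff = np.partition(acts, -k)[-k]  # k-th largest activation
        acts = np.where(acts >= cutoff, acts, 0.0)
    x_hat = W_dec @ acts + b_dec             # reconstruction of x
    return acts, x_hat

# Toy check: 8-dim activations, 32 features, k = 4
rng = np.random.default_rng(0)
d, n_feat, k = 8, 32, 4
W_enc = rng.normal(size=(n_feat, d)) / np.sqrt(d)
W_dec = rng.normal(size=(d, n_feat)) / np.sqrt(n_feat)
acts, x_hat = topk_sae_forward(rng.normal(size=d), W_enc,
                               np.zeros(n_feat), W_dec, np.zeros(d), k)
print((acts > 0).sum())  # at most k features fire for this input
```

Training then amounts to minimizing the reconstruction error of `x_hat` against `x` over millions of cell activations; each surviving feature direction in `W_dec` is a candidate "concept."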

The superposition is massive. 99.8% of the features recovered by the SAEs are invisible to standard linear methods like SVD, meaning that if you tried to understand these models using PCA or similar approaches, you would be looking at 0.2% of the representational structure. This alone should give pause to anyone who thinks they understand what these models are doing based on standard dimensionality reduction.

The features are biologically rich. Systematic annotation against five major databases (Gene Ontology, KEGG, Reactome, STRING, and TRRUST) revealed that 29 to 59% of features map to known biological concepts, with an interesting U-shaped profile across layers: high annotation rates in early layers (capturing basic pathway membership), declining in middle layers (where the model appears to build more abstract, less easily labeled representations), and rising again in late layers (where it reconstructs output-relevant biological categories). The features also organize into co-activation modules (141 in Geneformer, 76 in scGPT), exhibit causal specificity (when you ablate one feature, the downstream effects are concentrated on specific output genes rather than diffusing broadly, with a median specificity of 2.36x), and form cross-layer information highways connecting 63 to 99.8% of features into functional pipelines.

So far, so encouraging. The models have clearly internalized a great deal of organized biological knowledge: pathways, protein interactions, functional modules, hierarchical abstraction. This looks close to the "virtual cell" story that the field likes to tell.

Mapping the Wiring 

(arXiv:2603.01752)

The SAE atlas told us what features exist inside these models. The next question was: how do they interact? What is the computational graph?

I introduced causal circuit tracing for biological foundation models. The method works by ablating an SAE feature at its source layer (setting its activation to zero in the residual stream) and then measuring how every downstream SAE feature across all subsequent layers responds. This gives you directed, signed, causal edges: feature A at layer L causally drives feature B at layer L+k with effect size d and direction (excitatory or inhibitory). This is not correlation, not co-activation, not mutual information, but an intervention.
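The intervention itself is mechanically simple. Below is a toy numpy sketch of the idea: remove one SAE feature's contribution from the residual stream, re-run the downstream computation, and record the signed change in every downstream feature. All matrices here are random stand-ins, not the real models:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_feat = 16, 64

# Hypothetical stand-ins: one layer's SAE, a toy "next layer" map,
# and the next layer's SAE encoder.
W_enc = rng.normal(size=(n_feat, d)) / np.sqrt(d)
W_dec = rng.normal(size=(d, n_feat)) / np.sqrt(n_feat)
layer_map = rng.normal(size=(d, d)) / np.sqrt(d)
W_enc_next = rng.normal(size=(n_feat, d)) / np.sqrt(d)

def downstream_acts(resid):
    """Feature activations at the next layer for a residual-stream state."""
    return np.maximum(W_enc_next @ (layer_map @ resid), 0.0)

def ablate_feature(resid, j):
    """Subtract feature j's decoder contribution from the residual stream."""
    acts = np.maximum(W_enc @ resid, 0.0)
    return resid - acts[j] * W_dec[:, j]

# Signed effect of source feature j on every downstream feature.
# In one sign convention, a downstream feature that LOSES activation
# when its source is ablated marks a dependency ("inhibitory" edge).
resid = rng.normal(size=d)
effect = downstream_acts(ablate_feature(resid, j=3)) - downstream_acts(resid)
print(effect.shape)
```

Averaging such effects over many cells and testing them for significance is what turns this per-input difference into a causal edge with an effect size.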

Applied across four experimental conditions, the result was a causal circuit graph of 96,892 significant edges, computed over 80,191 forward passes.

Several properties of this graph were surprising.

Inhibitory dominance. Between 65 and 89% of causal edges are inhibitory: ablating a source feature reduces downstream feature activations. This means that features predominantly encode necessary information. Removing a feature causes the downstream features that depend on it to lose activation, rather than freeing up capacity for other features (which would produce excitatory edges). The model's computational structure is one of mutual dependency, not competition. The roughly 20% excitatory fraction likely reflects disinhibition: removing some features releases others from suppression.

Biological coherence. Of the edges where both source and target have biological annotations, 53% share at least one ontology term. Over half of the model's internal computational pathways connect biologically related features. Specific circuits are directly interpretable as known biological cascades. For instance, in Geneformer, an L0 DNA Repair feature causally drives an L1 DNA Damage Response feature (d = -1.87, 113 shared ontology terms), which in turn connects to an L6 Kinetochore feature (d = -3.47), recapitulating the well-established link between DNA damage detection, repair machinery activation, and mitotic checkpoint engagement. The model has, through training on gene expression data alone, discovered a circuit that molecular biologists needed decades of experimental work to characterize.

Cross-model convergence. When I compared the causal wiring of Geneformer and scGPT (the models with different architectures, training data compositions, and training objectives), I found that they independently learn strikingly similar internal circuits. 1,142 biological domain pairs are conserved across architectures at over 10x enrichment over chance. Even more telling, disease-associated domains are 3.59x overrepresented in this consensus set, meaning the biology that matters most for human health is exactly the biology both models converge on most reliably. Two quite different neural networks, trained independently, wire up the same biology internally, and this convergence is strongest for disease-relevant pathways.

Going Exhaustive and Finding the Dark Matter of Biological Features

(arXiv:2603.11940)

In the third paper, instead of 30 cherry-picked features, I traced every single one of the 4,065 active SAE features at layer 5 in Geneformer, producing 1,393,850 significant causal edges. This is a 27-fold expansion over the selective sampling in Paper 2.

The result overturned several conclusions from the selective analysis. The complete circuit graph reveals a heavy-tailed hub architecture where just 1.8% of features account for disproportionate connectivity. But here is the interesting part: 40% of the top-20 hub features have zero biological annotation. They do not map to any known pathway in GO, KEGG, or Reactome. These are the features the model relies on most heavily for its computations, and they are precisely the ones that our earlier annotation-biased sampling had systematically excluded.

This has serious methodological implications! If you only interpret features that already have biological labels, you are looking under the streetlight: you will recover known biology and conclude that the model has learned biology, while the features the model actually relies on most heavily sit in the dark, unstudied. Some of these unlabeled hubs may represent novel biological programs that do not map neatly onto existing pathway databases, others may be computational abstractions the model has invented to compress cellular state in ways we have not conceptualized yet. Either way, they are exactly where the most interesting discoveries are likely hiding, and any interpretability pipeline that pre-filters for annotation is structurally incapable of finding them!

Also, the initial SAE atlas had shown that certain features correlate with differentiation state: some features are more active in mature cells, others in progenitor cells. But that is just correlation, and the question that matters for the "virtual cell" claim is whether amplifying a differentiation-associated feature actually pushes a cell's state toward maturity.

It does. Late-layer features (L17) causally push cells toward maturity, while early-layer features push them away from it. The model has learned a layer-dependent differentiation gradient, and we can steer it: amplify a late-layer differentiation feature and the cell's computed state moves toward a more mature phenotype. This is the first causal evidence that these models encode something like a functional developmental program, and it is the closest thing we have to validation of the "virtual cell" metaphor.

What Does This All Mean?

The good news is that biological foundation models contain far more knowledge than anyone has extracted. Over 107,000 interpretable features, organized into biological pathways, connected by causal circuits that recapitulate known molecular biology, converging across independent architectures. The "virtual cell" metaphor is not baseless; there is real, structured, biologically meaningful computation happening inside these models, and we can identify, map, and even steer it. Yes, a significant part of this knowledge is correlational, but not all of it. And we still have a big problem: at least the previous generation of models does not learn regulatory networks. See more here.

There is also a clear methodological warning: the features that matter most computationally are disproportionately the ones that lack biological labels. Any future work in this space needs to grapple with the annotation bias problem, or it will keep producing results that confirm what we already knew while missing what we do not.

I am more and more convinced that there is a big opportunity here. Mechanistic interpretability, developed for AI safety, turns out to be a powerful tool for extracting biological knowledge from foundation models.




Counterintuitive Coin Toss. Part II

2026-04-12 23:37:29

Translation from Russian. Original text available here. The first part available here.

This Is a Fraud, Gentlemen!

Last time we ended with a look at games where everything is fair.

Well, "fair" in the sense that the chances of winning in a basic game are equal — although perhaps the very fact that they do not depend on the player's intelligence, skill, or morality is precisely what is unfair.

But let's take a look at what happens if one of the players is so clever that he can slightly predict the coin toss. Therefore, his probability of guessing, and consequently winning, is slightly more than one-half.

Or, if you prefer, a slightly asymmetrical coin is used, landing heads slightly more often, and the particularly talented player always bets on heads.

In this case, the mathematical expectation of the win is no longer zero. In general — for two possible outcomes — it is calculated by the formula:

E = p₁·v₁ + (1 − p₁)·v₂

Where p₁ is the probability of the first outcome, and v₁ with v₂ are the values of the outcomes.

In this case, the first outcome is the first player winning, and the outcome values are winning and losing the same amount — say, one dollar. So in this game, the mathematical expectation of the first player's win is…

E = p·1 + (1 − p)·(−1) = 2p − 1

It’s easy to see that if the win probability is ½, the expectation is zero, but what happens if the probability differs from one-half?

Let's write it like this:

p = ½ + ε

Now, substituting this into the expectation formula gives:

E = 2·(½ + ε) − 1 = 2ε

Let's suppose that, instead of cleverly guessing, the talented player simply convinces his opponent that his services to society and himself absolutely require that when he guesses correctly, he receives more than he loses when he doesn't. Say, more by Δ. However, he will now guess and not guess with equal probability.

E = ½·(1 + Δ) + ½·(−1) = Δ/2

Comparing these two results, we can conclude that a game with a higher probability of guessing is identical in terms of expected win to a game with a higher winning amount, if they are placed in the ratio:

2ε = Δ/2, that is, Δ = 4ε

For example, if the talented player guesses with a frequency of 0.6 instead of 0.5, he could just as well stop cheating and simply demand a win of not one dollar, but…

Δ = 4·0.1 = 0.4, that is, $1.40 instead of $1
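This equivalence is easy to confirm numerically. A quick Monte Carlo sketch of my own (not part of the original article): play both games many times and compare the average win per round with the analytic value 2ε = 0.2.

```python
import random

random.seed(42)
N = 200_000
eps = 0.1  # the talented player guesses with probability 0.5 + eps = 0.6

# Game A: guess with probability 0.6; win or lose $1 per round.
game_a = sum(1 if random.random() < 0.5 + eps else -1 for _ in range(N)) / N

# Game B: fair guessing, but a correct guess pays 1 + 4*eps = $1.40,
# while a wrong one still costs $1.
delta = 4 * eps
game_b = sum(1 + delta if random.random() < 0.5 else -1 for _ in range(N)) / N

# Both averages should land near the analytic expectation 2*eps = 0.2.
print(round(game_a, 3), round(game_b, 3))
```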

If we conduct a whole series of games with that specified probability of guessing — say, one hundred rounds — then in terms of the money in the players' hands, we would see approximately the following.

image1.png

As can be seen, although the talented player even loses slightly to the other player at times, the severe bonus in guessing probability (or the increased win compared to the other player) still prevails. And over a long series of games, it will prevail in the vast majority of cases.

Thus, in only about 165 series of 100 coin tosses out of 10,000 will this clever fellow lose.

image2.png
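The roughly-165-in-10,000 figure can be checked exactly with the binomial distribution (my own check, not the author's simulation): the clever player loses a 100-toss series when he guesses at most 49 tosses correctly.

```python
from math import comb

# Probability that a player guessing each toss with p = 0.6 still loses
# a 100-toss series, i.e. gets at most 49 of the 100 tosses right.
p = 0.6
p_lose = sum(comb(100, k) * p**k * (1 - p)**(100 - k) for k in range(50))
print(p_lose)  # on the order of the ~165-in-10,000 figure above
```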

If the number of tosses per game increases to 1000, then the second player would be very lucky to win even once out of 10,000.

Play, Come On

You might ask: who in their right mind would play such games, where winning is possible only over a short series of rounds, while a long series brings near-inevitable loss?

Oh, you would be amazed at how many people agree to this.

Take roulette, for example. If you bet on red or black, it seems the probability of winning is ½. And if you win, they return double your bet…

However, roulette has a colorless zero, which makes the probability of guessing less than ½. And it's good if there's only one — sometimes there are two.

In total, on a single-zero roulette wheel, there are 37 numbers, so the probability of winning is:

p = 18/37 ≈ 0.486

The mathematical expectation of winning in a roulette game is thus:

E = (18/37)·s + (19/37)·(−s) = −s/37

Where s is the bet size.

That is, in each game, on average, you give the casino one thirty-seventh of what you bet. Quite an interesting tip.

Let's play a trial series of virtual roulette games with a one-dollar bet.

image3.png

In this experiment, the beginning is lucky, but somewhere around the two-thousandth game the virtual player's life clearly goes downhill.
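A bankroll trajectory like this is easy to reproduce. A minimal sketch of my own re-simulation, assuming even-money bets on red:

```python
import random

random.seed(7)

def roulette_walk(n_games=10_000, bet=1):
    """Bankroll trajectory betting $1 on red each spin of a single-zero
    wheel: 18 of 37 pockets win, paying even money."""
    balance, path = 0, []
    for _ in range(n_games):
        balance += bet if random.random() < 18 / 37 else -bet
        path.append(balance)
    return path

path = roulette_walk()
# The drift is -bet/37 per game: around -$270 after 10,000 games,
# with random-walk noise of roughly +/- $100 on top.
print(path[-1])
```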

"Yes, but he was winning at first, wasn't he?" someone might say. "He could have stopped in time and left."

Indeed, you could. But only if you knew when.

Moreover, besides the impossibility of knowing exactly when to leave, there is a second point: you can never return. Never.

Because the process does not "reset" at the moment you leave. If this virtual player had left around the thousandth game with a $50 win, and then came again later, exactly this graph could repeat itself. And by the ten-thousandth game, he would have a total loss of $200 (from the amount he had before the first game).

Furthermore, I note, this will be the case even if the casino does not cheat and the croupier does not try to land the ball in a specific spot during the throw.

However, the presence of local winning streaks, visible to the naked eye on the graph, not to mention the inner feeling of "I'm on a roll today," can mislead about the entire process and make one think that the main thing is to leave on time.

Oh no, the main thing is never to return.

In summary, out of a thousand people brave enough to play 10,000 roulette games with a one-dollar bet per game, about four will end up with a small win.

image4.png

The happiest of them will win about $70.

But how much will the casino win in total?

Drumroll…

$273,430.

Wow from Wit

Alright, so far I've considered cheating and luck, but there are other ways to win games.

Clearly, the hint about roulette was meant to finally lead the reader to that very "skewed bell curve" mentioned at the end of the previous article.

"Look, that distortion is supposedly caused by some players cheating."

"But wait, perhaps they just play better? In that very game where profitable deals are made not by coin toss but by rational calculation, hard work, and other positive things?"

Oh yes, blaming players for cheating would negate the conclusion that the observed distribution is strictly a result of luck. Blind chance, and all that. If we assume cheating, why not assume something else — like hard work and valuable skills?

However, I, surprisingly, was not going to assume anything of the sort — not even cheating. On the contrary, I added this option — cheating or cleverness — only later, after I had found another option that actually yielded the desired distribution.

Nevertheless, to dispel doubts about "what if this option also works?!", let's take a look at how the option with cheating — or, if you prefer, with intelligence and talent — would change the outcomes in the previously considered series of pairwise games. As shown in the previous sections, it can indeed manifest itself.

Let me remind you of the rules. Players are randomly divided into pairs and play a coin-tossing game (now with unequal probabilities of winning) for a random bet from 1 to 10.

Everyone starts with $10,000.

Suppose we have 1000 players, most of whom have roughly the same skill level, but some of them still guess better.

I decided to reflect this with a function of the player's serial number:

image5.png

Accordingly, for a pair of players, the probability of winning will be determined as:

10.png

After everyone has played a thousand games, we get the following distribution.

image6.png

We already see a long "tail," as is usually the case in real income statistics, but the main part of the bell curve is not skewed.

However, here's what the distribution of income or capital looks like in reality. Approximately like this (the numbers on the axes are conditional here).

image7.png

In general, it turned out somewhat similar, but not quite. There is a tail, but the "main bell" is not "skewed."

OK. Maybe we need to introduce bad players too. Let's try this distribution of "abilities":

image8.png

Alas, it got worse.

image9.png

Now there are two asymmetric "tails," not at all the desired skewed bell with a "tail" on the right.

Alright, we can assume that people's abilities are distributed in a similar bell-shaped curve, which corresponds to experimental results, and use this probability of victory ratio:

image10.png

But even this does not yield the desired distribution: the left side, instead of "flattening," on the contrary, stretches out.

image11.png

We could also try cutting off the left side of the previous option, assuming that the really stupid simply do not think to play this game and lose their money to the smart ones.

image12.png

Sadly, that doesn't work either.

image13.png

But why?

And Here Is the Reason

We could try many more options, but the crux of the matter is that in this model, in all these experiments, we effectively end up with a histogram of each player's expected win multiplied by the expected bet.

Both expectations are constants for each player. Therefore, with some noise, the shapes of these histograms are predetermined from the start: the distribution of game results will resemble the distribution of abilities in shape.

If we look again at the desired distribution of results…

image7-2.png

…we can conclude that the first variant of the ability distribution…

image5-2.png

…indeed gave something relatively close to the target.

image6-2.png

If we try to consciously adjust the ability distribution, we can use the following considerations.

A player's expected result is proportional to the ratio of his abilities to the abilities of all other players. Therefore, for a long "tail" on the right, a small group of players must have a sharp increase in abilities compared to everyone else.

For the rest, abilities must grow very smoothly, according to some very intricate pattern, to provide the desired skewed bell.

Somewhere at the very beginning of the graph, something else must happen to provide a decline towards complete losers — steeper than the transition from normal players to particularly talented ones.

Furthermore, the result turns out to be very sensitive to the distribution of abilities, and at the slightest deviations, it immediately strongly distorts the distribution of results compared to the target.

This suggests that the real process, very likely, does not depend on abilities or the ability to cheat — because if such an income distribution repeats for decades and in all countries of the world, what would ensure such high stability given such a strong dependence on the distribution of abilities?

Moreover, in the left part of the distribution in the best of the found options, there is still too obvious an inflection, which in the target distribution (based on real income and personal capital distributions) is almost invisible to the naked eye.

However, fine. Let's assume that such a hypothesis has a right to exist: that is, in the world, there might indeed exist some constant proportion of mega-geniuses who win so well that the distribution of their abilities provides a long tail for the distribution of results, and some non-trivial distribution of abilities among everyone else, the cause of which is unclear.

Especially since this distribution of results (called "lognormal") is often a consequence of the presence of more than one random process in the system — what if that's the case here too?

But could there be a simpler explanation for all this, one that yields the same result without all these experimentally unverified assumptions and intricately twisted, but unobservable in studies, distributions of abilities?

After all, if there are relatively simple rules of the game that provide this distribution in a fairly stable variant on their own, and something very similar to such rules is observed in reality, then it is very likely that the rules of the game themselves are the cause of the observed results, and everything else only complements them to a small extent.

For example, to explain the results of "fair" pairwise coin toss, no special assumptions were needed — the rules themselves sufficed. It is possible that the same applies here.

Can such rules be found?

Better to Be Rich and Healthy

The ability to win more often, regardless of circumstances, is something like a "hidden parameter" in this process. But simultaneously with it, there is an "open" one: the amount of money the player currently has.

Let's assume that the probability of winning depends not on some "skills," but simply on the amount of capital at the moment.

This is a quite logical assumption: the outcome of a coin toss does not depend on the amount of money in hand, but in real transactions, it may well be that the richer person has some additional opportunities to tilt the deal in his favor. For example, bribing government agencies, hiring lobbyists in parliament, sending the mafia, or even simply benefiting from an unspoken property qualification.

Suppose, for instance, that the probability of the richer player winning depends on the difference in the capitals of the two players as follows:

image14.png

Let's run the previously described series of games with this probability of winning for the richer player in each pair, making the bet in each game for each pair a random number from 1 to 100.

image15.png

As can be seen, the hypothesis about the determining role of wealth is also not confirmed for the desired distribution: we get approximately the same symmetric bell-shaped distribution as before, which simply "spreads out" faster as the number of games played increases than it did with equal win probabilities.

If we make the bet constant and higher — say, $1000 — we find that by the thousandth game, the bell curve has disappeared altogether, and the players are almost uniformly distributed according to the amounts of money they hold.

image16.png

That is, if implemented in reality, such a process would not give us a stable distribution of the desired shape.

The rich may win more often, but some other factor is needed to explain the outcome.

Dependent Stake

Another assumption we can make is that the richer can play for a higher stake. After all, the amount they are not afraid to lose is significantly higher than that of the poor.

The stake is apparently determined by the player in each pair with less money, and let's assume it is limited to one-twentieth of the money he has. However, even if the player goes deep into debt, the stake cannot be less than one dollar.
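These rules are compact enough to simulate directly. A minimal sketch of one round-by-round run (the player count, starting capital, and round count are my assumptions, not the author's exact setup):

```python
import random

random.seed(3)

def play_rounds(n_players=1000, start=10_000, n_rounds=200):
    """Pairwise coin-toss games where the stake is 1/20 of the poorer
    player's current capital, but never below $1 (debt is allowed)."""
    money = [float(start)] * n_players
    for _ in range(n_rounds):
        order = random.sample(range(n_players), n_players)  # random pairing
        for a, b in zip(order[::2], order[1::2]):
            stake = max(1.0, min(money[a], money[b]) / 20)
            if random.random() < 0.5:       # fair coin decides the winner
                money[a] += stake; money[b] -= stake
            else:
                money[a] -= stake; money[b] += stake
    return money

money = play_rounds()
# Total money is conserved; the distribution grows a long right tail.
print(round(sum(money)))
```

A histogram of `money` after a few hundred rounds shows the skewed bell with a right tail; run it longer and, as described below, the bulk collapses toward zero.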

And here, at some stage of the game, we finally see the desired distribution.

image17.png

True, by the thousandth game, the rich have almost completely fleeced most players, so the distribution becomes degenerate, "flattening" its "skewed bell" somewhere near zero.

image18.png

The distribution turns out to be unstable, but over a fairly long number of games it still has the desired shape.

Moreover, in this experiment, along with determining the stake based on the capital of the poorer player in the pair, the same principle as in the previous section was used: the rich win more often.

However, if we make winning equally probable, regardless of capital, the distribution is still maintained — only it takes more games to achieve the desired distribution and its subsequent degeneration: if by the thousandth game, with the probability of winning increasing with capital, the distribution has already degenerated, then with equal win probability, at the thousandth game, something still quite close to the desired distribution is observed.

image19.png

In other words, perhaps the rich do win more often, but this alone does not yield the desired distribution. On the other hand, the dependence of the stake on the current capital of the poorest player in the pair provides the desired distribution even with equal win probabilities.

The same can be said about the influence of "talent" or cheating.

A Small Possible Modification

I note that the variant considered here has at least one almost identical counterpart.

The difference between them is only that in the original variant, each pair plays one game per round, and the stake is determined by the share of the poorest player. In the modification, however, each player per round can play several games with different players — so that the total stakes in them approximately equal a predetermined share of his capital at the beginning of the round.

This modification yields exactly the same distribution, though the processes in it proceed somewhat faster: the "tail" on the right grows faster, and the rich more quickly strip the poor of everything, if nothing is done about it. But this is mainly because the modified round includes more games for the same number of participants than the unmodified one.

However, this variant is more similar to what is observed in reality, because per unit of time a richer person can indeed participate in a larger number of low-stakes transactions than a poor one: for example, by opening a store and serving many customers, including through hired employees, thus engaging in deals with many customers and many employees at once.

But studying such a modification is somewhat more complex for reasoning and illustration, so I will only mention that such a variant exists, and its results, simply by the very construction of the game rules, will be analogous to the results considered here.

Well, after mentioning this, we can move on to the next important question.

Ensuring Stability

As mentioned two sections ago, there is one problem: this distribution turns out to be unstable and degenerates with a large number of games.

If things went exactly like this in the world, the outcome of this process would be the complete impoverishment of the vast majority of players and the super-wealth of a small group of people.

Though I have nagging doubts about that "if": in some places in our world, this is precisely what is observed.

That is, some modification of this process is needed that preserves the desired distribution even over a large number of games played.

And this modification, generally speaking, is very common in reality. It is welfare benefits for the poor. They are what save particularly unlucky players from complete ruin and prevent the "bell curve" from collapsing.

Let's introduce such benefits into the game process.

However, if we introduce them as a fixed amount for all time, it will only slightly delay the degeneration of the income distribution.

To achieve stability, the benefit amount must depend on the current situation, and to calculate it, we will take a fairly simple maneuver.

Find the largest current capital among the poorest proportion p of players (in other words, the p-th quantile of the current capitals), and denote it C_p.

Define the benefit amount as:

11.png

It will be paid to the poorest two-tenths of players, which should shift them approximately to where the players in the third tenth from the left are currently located (the graph shows only the poorest 400 of the 1000 players; otherwise it would be hard to see the essence of what happened).

image20.png

It turns out that this simplest modification is enough to maintain the stability of the distribution indefinitely.

Here are the results after the six-hundredth game.

image21.png

After the thousandth.

image22.png

After the three-thousandth.

image23.png

It can be seen that as the number of games increases, the "tail" of the distribution stretches, but the shape itself, similar to the desired one, is preserved.

Finally, since the benefits are effectively paid with newly printed money, inflation clearly occurs: starting with $10,000 per person, after three thousand games we reach a state where even the poor have nine-figure capitals.

However, inflation can be eliminated: instead of printing new money, we can introduce "taxes" from which benefits will be paid.

After each round, a certain percentage of each player's current capital will be collected, which will then be immediately distributed evenly among those in need. This percentage will be determined at each stage so that the total tax from all players fully covers all benefits paid.
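The tax-funded version of the benefit step can be sketched like this. Note an assumption: the original defines the benefit amount via a formula shown only as an image, so here it is simply taken as a given parameter, and the function name is mine.

```python
def tax_and_redistribute(capitals, benefit_per_player, poor_share=0.2):
    """Collect a flat percentage of every player's capital that exactly
    covers the benefits, then pay the benefit to the poorest players.

    `benefit_per_player` is assumed given; the source defines it by a
    formula (shown as an image), so it is treated as a parameter here.
    """
    n_poor = int(len(capitals) * poor_share)
    total_benefits = benefit_per_player * n_poor
    # flat tax rate chosen so the total collected covers all benefits
    rate = total_benefits / sum(capitals)
    taxed = [c * (1 - rate) for c in capitals]
    # indices of the poorest players (by pre-tax capital)
    poorest = sorted(range(len(capitals)), key=capitals.__getitem__)[:n_poor]
    for i in poorest:
        taxed[i] += benefit_per_player
    return taxed
```

Because the tax collected equals the benefits paid, the total amount of money in the game is unchanged, which is what eliminates the inflation.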

image24.png

Now, as can be seen, bliss has arrived: the distribution is exactly what is needed, it is stable as the number of games played increases, and there is no inflation.

I even tried conducting ten thousand games for ten thousand players, instead of a thousand for a thousand, and making the stake one-fifth of the capital of the poorest in the pair, instead of one-twentieth. And everything still worked out.

image25.png

Compare with the most successful variant of simulating the desired distribution using talent or cheating.

image6.png

And with the desired — "classical" lognormal distribution itself.

image7-3.png

A Suspicious Model

Just in case, I will describe the essence of the process once more.

A group of players, possessing absolutely equal initial capital, is randomly divided into pairs, and then in each pair, the players play one game of coin toss.

The coin toss is completely fair, so each player's win in the pair is equally probable.

The stake in each game of each pair is determined by a certain share of the current capital in the hands of the poorest player in that pair.

After the game, the poorest two-tenths of players receive benefits collected from all players in the form of a percentage of their current capital, such that the total collected covers the benefits paid.

After that, the players are randomly divided into pairs again and play coin toss again.
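Putting these rules together, the whole process fits in a short simulation. This is only a sketch: the exact benefit formula in the original is given as an image, so here the benefit is assumed to top each of the poorest players up toward the capital at the 30th percentile, and all names and parameters are mine.

```python
import random

def play_rounds(n=1000, start=10_000.0, share=1/20, poor_frac=0.2,
                ref_frac=0.3, rounds=1000, seed=0):
    """Random pairing, fair coin, stake = share of the poorer player's
    capital (min $1), then tax-funded benefits for the poorest."""
    rng = random.Random(seed)
    caps = [float(start)] * n
    for _ in range(rounds):
        # random pairing; each pair plays one fair coin-toss game
        order = list(range(n))
        rng.shuffle(order)
        for i in range(0, n - 1, 2):
            a, b = order[i], order[i + 1]
            st = max(1.0, min(caps[a], caps[b]) * share)
            if rng.random() < 0.5:
                caps[a] += st; caps[b] -= st
            else:
                caps[a] -= st; caps[b] += st
        # benefit step (assumed form): lift the poorest poor_frac of
        # players toward the capital at the ref_frac quantile,
        # financed by a flat tax rate on everyone's capital
        idx = sorted(range(n), key=caps.__getitem__)
        ref = caps[idx[int(n * ref_frac)]]
        poor = idx[:int(n * poor_frac)]
        benefits = {i: max(0.0, ref - caps[i]) for i in poor}
        total = sum(caps)
        rate = sum(benefits.values()) / total if total > 0 else 0.0
        caps = [c * (1 - rate) for c in caps]
        for i, b in benefits.items():
            caps[i] += b
    return caps
```

Since the tax exactly covers the benefits each round, the total money is conserved, and the resulting capitals show the skewed "bell with a tail" discussed above.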

As a result, we obtain a lognormal distribution of capitals — in the form of a left-skewed "bell curve" with a tail. This distribution is quite stable — with the caveat that the tail continues to lengthen as the number of games played increases (and indeed, a similar phenomenon occurs in the real world).

Nothing depends on the intelligence or talents of the players.

Only the size of the bet depends on capital.

But, surprisingly, the distribution obtained in this game replicates the one that is actually observed in the distribution of people's capitals (and incomes).

The determining rule of the game turns out to be the quite rational and expected dependence of the stake on the capitals of the players in each pair. With this rule, it is possible to reproduce the distribution (and the tail-lengthening observed in reality) even with an absolutely equal probability of winning and losing. Without it, no linking of the win probability to "talent" or current capital helps.

Stabilizing the distribution is achieved through the distribution of benefits. This is also observed in reality — as is the inevitable impoverishment of the majority of citizens in the absence of benefits in one form or another.

That is, this distribution is embedded in the very "rules of the game" — in the very way the "players" interact.

And indeed, primarily in the rules themselves.

A player's talent, which increases the probability of winning, or the ability to use capital to pressure circumstances and likewise increase the probability of winning: these are just additions to the process. They perhaps accelerate it and introduce some non-essential corrections (and I checked: this is indeed the case), but they are not themselves the main factors forming this distribution.

With players absolutely identical in terms of their abilities and completely equal in rights and opportunities, regardless of capital, we would still observe exactly the same income distribution.

The rules of a fairly simple and completely random game turn out to be more important than everything else.

It is enough simply to make free commercial transactions, where it is equally probable to win or lose an amount that both partners consider acceptable to lose, and to pay benefits to those who lose particularly heavily.

And that's it. It will be roughly what exists now. Everywhere.

However, these rules of the game are not the only ones. I have another version of the game that yields similar results. Perhaps, at least in it, we will be able to observe the determining role of talents?

Spoiler: no.

But that will be in the next part.


