2026-02-05 08:00:00
How heritable is hair color? Well, if you’re a redhead and you have an identical twin, they will definitely also be a redhead. But the age at which twins go gray seems to vary a bit based on lifestyle. And there’s some randomness in where melanocytes end up on your skull when you’re an embryo. And your twin might dye their hair! So the correct answer is, some large number, but less than 100%.
OK, but check this out: Say I redefine “hair color” to mean “hair color except ignoring epigenetic and embryonic stuff and pretending that no one ever goes gray or dyes their hair et cetera”. Now, hair color is 100% heritable. Amazing, right?
Or: how heritable is IQ? The wise man answers, “Some number between 0% and 100%, it’s not that important, please don’t yell at me.” But whatever the number is, it depends on society. In our branch of the multiverse, some kids get private tutors and organic food and $20,000 summer camps, while other kids get dysfunctional schools and lead paint and summers spent drinking Pepsi and staring at glowing rectangles. These things surely have at least some impact on IQ.
But again, watch this: Say I redefine “IQ” to be “IQ in some hypothetical world where every kid got exactly the same school, nutrition, and parenting, so none of those non-genetic factors matter anymore.” Suddenly, the heritability of IQ is higher. Thrilling, right? So much science.
If you want to redefine stuff like this… that’s not wrong. I mean, heritability is a pretty arbitrary concept to start with. So if you prefer to talk about heritability in some other world instead of our actual world, who am I to judge?
Incidentally, here’s a recent paper:

I STRESS THAT THIS IS A PERFECTLY FINE PAPER. I’m picking on it mostly because it was published in Science, meaning—like all Science papers—it makes grand claims but is woefully vague about what those claims mean or what was actually done. Also, publishing in Science is morally wrong and/or makes me envious. So I thought I’d try to explain what’s happening.
It’s actually pretty simple. At least, now that I’ve spent several hours reading the paper and its appendix over and over again, I’ve convinced myself that it’s pretty simple. So, as a little pedagogical experiment, I’m going to try to explain the paper three times, with varying levels of detail.
The normal way to estimate the heritability of lifespan is using twin data. Depending on what dataset you use, this will give 23-35%. This paper built a mathematical model that tries to simulate how long people would live in a hypothetical world in which no one dies from any non-aging related cause, meaning no car accidents, no drug overdoses, no suicides, no murders, and no (non-age-related) infectious disease. On that simulated data, for simulated people in a hypothetical world, heritability was 46-57%.
Everyone seems to be interpreting this paper as follows:
Aha! We thought the heritability of lifespan was 23-35%. But it turns out that it’s around 50%. Now we know!
I understand this. Clearly, when the editors at Science chose the title for this paper, their goal was to lead you to that conclusion. But this is not what the paper says. What it says is this:
We built a mathematical model of an alternate universe in which nobody died from accidents, murder, drug overdoses, or infectious disease. In that model, heritability was about 50%.
Let’s start over. Here’s figure 2 from the paper.

Normally, heritability is estimated from twin studies. The idea is that identical twins share 100% of their DNA, while fraternal twins share only 50%. So if some trait is more correlated among identical twins than among fraternal twins, that suggests DNA influences that trait. There are statistics that formalize this intuition. Given a dataset that records how long various identical and fraternal twins lived, these produce a heritability number.
Two such traditional estimates appear as black circles in the above figures. For the Danish twin cohort, lifespan is estimated to be 23% heritable. For the Swedish cohort, it’s 35%.
This paper makes a “twin simulator”. Given historical data, they fit a mathematical model to simulate the lifespans of “new” twins. Then they compute heritability on this simulated data.
Why calculate heritability on simulated data instead of real data? Well, their mathematical model contains an “extrinsic mortality” parameter, which is supposed to reflect the chance of death due to all non-aging-related factors like accidents, murder, or infectious disease. They assume that the chance someone dies from any of this stuff is constant over people, constant over time, and that it accounts for almost all deaths for people aged between 15 and 40.
The point of building the simulator is that it’s possible to change extrinsic mortality. That’s what’s happening in the purple curves in the above figure. For a range of different extrinsic mortality parameters, they simulate datasets of twins. For each simulated dataset, they estimate heritability just like with a real dataset.
Note that the purple curves above nearly hit the black circles. This means that if they run their simulator with extrinsic mortality set to match reality, they get heritability numbers that line up with what we get from real data. That suggests their mathematical model isn’t totally insane.
If you decrease extrinsic mortality, then you decrease the non-genetic randomness in how long people live. So heritability goes up. Hence, the purple curves go up as you go to the left.
My explanation of this paper relies on some amount of guesswork. For whatever reason, Science has decided that papers should contain almost no math, even when the paper in question is about math. So I’m mostly working from an English description. But even that description isn’t systematic. There’s no place that clearly lays out all the things they did, in order. Instead, you get little hints, sort of randomly distributed throughout the paper. There’s an appendix, which the paper confidently cites over and over. But if you actually read the appendix, it’s just more disconnected explanations of random things except now with equations set in glorious Microsoft Word format.
Now, in most journals, authors write everything. But Science has professional editors. Given that every single statistics-focused paper in Science seems to be like this, we probably shouldn’t blame the authors of this one. (Other than for their decision to publish in Science in the first place.)
I do wonder what those editors are doing, though. I mean, let me show you something. Here’s the first paragraph where they start to actually explain what they actually did, from the first page:

See that h(t,θ) at the end? What the hell is that, you ask? That’s a good question, because it was never introduced before this and is never mentioned again. I guess it’s just supposed to be f(t,θ), which is fine. (I yield to none in my production of typos.) But if paying journals ungodly amounts of money brought us to this, of what use are those journals?
Moving on…
Probably most people don’t need this much detail and should skip this section. For everyone else, let’s start over one last time.
The “normal” way to estimate heritability is by looking at correlations between different kinds of twins. Intuitively, if the lifespans of identical twins are more correlated than the lifespans of fraternal twins, that suggests lifespan is heritable. And it turns out that one estimator for heritability is “twice the difference between the correlation among identical twins and the correlation among fraternal twins, all raised together.” There are other similar estimators for other kinds of twins. These normally say lifespan is perhaps between 20% and 35% heritable.
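That estimator (often called Falconer's formula) is simple enough to write down. Here is a minimal sketch in Python. The function name and the two correlation values are made up purely to show the arithmetic; they are not numbers from the paper or from any twin dataset.

```python
def falconer_heritability(r_identical: float, r_fraternal: float) -> float:
    """Twice the gap between the identical-twin and fraternal-twin correlations."""
    return 2.0 * (r_identical - r_fraternal)


# Hypothetical lifespan correlations, chosen only to illustrate the arithmetic.
r_mz = 0.25    # identical twins raised together
r_dz = 0.125   # fraternal twins raised together

h2 = falconer_heritability(r_mz, r_dz)
print(f"estimated heritability: {h2:.0%}")  # estimated heritability: 25%
```

If the two correlations were equal, the estimate would be zero, which matches the intuition: sharing more DNA didn't make the twins any more alike.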
This paper created an equation to model the probability a given person will die at a given age. The parameters of the equation vary from person to person, reflecting that some of us have DNA that predisposes us to live longer than others. But the idea is that the chances of dying are fairly constant between the ages of 15 and 40, after which they start increasing.
This equation contains an “extrinsic mortality” parameter. This is meant to reflect the chance of death due to all non-aging related factors like accidents or murder, etc. They assume this is constant. (Constant with respect to people and constant over time.) Note that they don’t actually look at any data on causes of death. They just add a constant risk of death that’s shared by all people at all ages to the equation, and then they call this “extrinsic mortality”.
Now remember, different people are supposed to have different parameters in their probability-of-death equations. To reflect this, they fit a Gaussian distribution (bell curve) to the parameters with the goal of making it fit with historical data. The idea is that if the distribution over parameters were too broad, you might get lots of people dying at 15 or living until 120, which would be wrong. If the distribution were too concentrated, then you might get everyone dying at 43, which would also be wrong. So they find a good distribution, one that makes the ages people die in simulation look like the ages people actually died in historical data.
Right! So now they have:
Before moving on, I remind you of two things:
The event of a person dying at a given age is random. But the probability that this happens is assumed to be fixed and determined by genes and genes alone.
Now they simulate different kinds of twins. To simulate identical twins, they just draw parameters from their parameter distribution, assign those parameters to two different people, and then let them randomly die according to their death equation. (Is this getting morbid?) To simulate fraternal twins, they do the same thing, except instead of giving the two twins identical parameters, they give them correlated parameters, to reflect that they share 50% of their DNA.
How exactly do they create those correlated parameters? They don’t explain this in the paper, and they’re quite vague in the supplement. As far as I can tell they sample two sets of parameters from their parameter distribution such that the parameters are correlated at a level of 0.5.
Now they have simulated twins. They can simulate them with different extrinsic mortality values. If they lower extrinsic mortality, heritability of lifespan goes up. If they lower it to zero, heritability goes up to around 50%.
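To make the logic concrete, here is a toy twin simulator in Python. I stress that this is my own sketch, not the paper's model: I assume a Gompertz-style hazard with a constant extrinsic term added, I put all the genetic variation into one parameter, I start everyone at age zero, and every number (`a0`, `b`, `sigma`, the number of pairs) is made up. The only things borrowed from the description above are the overall recipe: shared parameters for identical twins, parameters correlated at 0.5 for fraternal twins, a constant extrinsic risk, and the twice-the-difference estimator.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def lifespan(g, extrinsic, rng, a0=5e-5, b=0.09):
    """Random age at death under the assumed hazard extrinsic + a0*exp(g)*exp(b*t).

    g is the person's (hypothetical) genetic parameter; a0 and b are made-up constants.
    """
    a = a0 * math.exp(g)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    # Inverse-CDF draw of the aging-related death time.
    aging_death = math.log(1.0 - (b / a) * math.log(u)) / b
    if extrinsic <= 0:
        return aging_death
    # A constant extrinsic hazard means an exponential waiting time; die of whichever comes first.
    return min(aging_death, rng.expovariate(extrinsic))


def simulated_heritability(extrinsic, n_pairs=20000, sigma=1.0, seed=0):
    """Simulate twin pairs, then apply twice-the-difference-in-correlations."""
    rng = random.Random(seed)
    mz_a, mz_b, dz_a, dz_b = [], [], [], []
    for _ in range(n_pairs):
        # Identical twins: one genetic draw, shared by both.
        g = rng.gauss(0.0, sigma)
        mz_a.append(lifespan(g, extrinsic, rng))
        mz_b.append(lifespan(g, extrinsic, rng))
        # Fraternal twins: two genetic draws correlated at 0.5.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        g1 = sigma * z1
        g2 = sigma * (0.5 * z1 + math.sqrt(0.75) * z2)
        dz_a.append(lifespan(g1, extrinsic, rng))
        dz_b.append(lifespan(g2, extrinsic, rng))
    return 2.0 * (pearson(mz_a, mz_b) - pearson(dz_a, dz_b))


for m in (0.01, 0.001, 0.0):
    h2 = simulated_heritability(m)
    print(f"extrinsic mortality {m}/year -> heritability {h2:.2f}")
```

The exact numbers this prints mean nothing, since the parameters are invented. The point is the direction: the genes are the same in every run, and the heritability estimate rises as the extrinsic term shrinks, simply because there is less non-genetic randomness in when people die.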
Almost all human traits are partly genetic and partly due to the environment and/or random. If you could change the world and reduce the amount of randomness, then of course heritability would go up. That’s true for life expectancy just like for anything else. So what’s the point of this paper?
There is a point!
Sure, obviously heritability would be higher in a world without accidents or murder. We don’t need a paper to know that. But how much higher? It’s impossible to say without modeling and simulating that other world.
Our twin datasets are really old. It’s likely that non-aging-related deaths are lower now than in the past, because we have better healthcare and so on. This means that the heritability of lifespan for people alive today may be larger than it was for the people in our twin datasets, some of whom were born in 1870. We won’t know for sure until we’re all dead, but this paper gives us a way to guess.
Have I mentioned that heritability depends on society? And that heritability changes when society changes? And that heritability is just a ratio and you should stop trying to make it be a non-ratio because only-ratio things cannot be non-ratios? This is a nice reminder.
Honestly, I think the model the paper built is quite clever. Nothing is perfect, but I think this is a pretty good run at the question of, “How high would the heritability of lifespan be if extrinsic mortality were lower?”
I only have two objections. The first is to the Science writing style. This is a paper describing a statistical model. So shouldn’t there be somewhere in the paper where they explain exactly what they did, in order, from start to finish? Ostensibly, I think this is done in the left-hand column on the second page, just with little detail because Science is written for a general audience. But personally I think that description is the worst of all worlds. Instead of giving the high-level story in a coherent way, it throws random technical details at you without enough information to actually make sense of them. Couldn’t the full story with the full details at least be in the appendix? I feel like this wasted hours of my time, and that if someone wanted to reproduce this work, they would have almost no chance of doing so from the description given. How have we as a society decided that we should take our “best” papers and do this to them?
But my main objection is this:

At first, I thought this was absurd. The fact that people die in car accidents is not a “confounding factor”. And pretending that no one dies in car accidents does not “address” some kind of bias. That’s just computing heritability in some other world. Remember, heritability is not some kind of Platonic form. It is an observational statistic. There is no such thing as “true” heritability, independent of the contingent facts of our world.
But upon reflection, I think they’re trying to say something like this:
Heritability of human lifespan is about 50% when extrinsic mortality is adjusted to be closer to modern levels.
The problem is: I think this is… not true? Here are the actual heritability estimates in the paper, varying by dataset (different plots), the cutoff year (colors), and extrinsic mortality (x-axis).

When extrinsic mortality goes down, heritability goes up. So the obvious question is: What is extrinsic mortality in modern people?
This is a tricky question, because “extrinsic mortality” isn’t some simple observational statistic. It is a parameter in their model. (Remember, they never looked at causes of death.) So it’s hard to say, but they seem to suggest that extrinsic mortality in modern people is 0.001 / year, or perhaps a bit less.
The above figures have the base-10 logarithm of extrinsic mortality on the x-axis. And the base-10 logarithm of 0.001 is -3. But if you look at the curves when the x-axis is -3, the heritability estimates are not 50%. They’re more like 35-45%, depending on the particular model and age cutoff.
So here’s my suggested title:
Heritability of human lifespan is about 40% when extrinsic mortality is adjusted to modern levels, according to our simulation.
There might be a reason I don’t work at Science.
2026-01-22 08:00:00
Why should you read novels? We tell children they’re magic carpets for the mind / exercise for the soul instead of the body / lighthouses in the great sea of time. But aren’t they ultimately a form of entertainment?
Many years ago, I read Crime and Punishment. Here, with no research and no notes, is what I can remember about that book:
This is probably below average. I know people who seem to remember every detail of everything they read. But even if you’re one of them, so what? Is remembering those books better than remembering whatever else you would have done with your time if you hadn’t been reading?
And yet: If I’m on vacation and I spend an afternoon reading a novel in the mountains or on a beach, I feel like I’m living my best life. Whereas if I spent an afternoon staring at short videos on my phone, I’m sure I’d feel like a gigantic loser. So what’s going on here?
The obvious explanation is that there’s nothing intrinsically great about reading novels. The reason we think it’s great is that reading novels—at least the right ones—is high status. It’s a way of playing the Glass Bead Game, a way of collecting cultural capital for you to lord over other people who don’t have as much time or education as you do. It may feel like you “actually enjoy reading”, but that’s because you’re a desperate striver that subconsciously shape-shifts into whatever you think will make you look fancy. Apologize for reading. Apologize!
I think there is something in this. However, I’m also pretty sure it’s not the full explanation, and I’m bored to death with everyone trying to explain everything this way. So let’s move on.
Say you can’t read novels. Maybe because you’re illiterate, maybe because you have no attention span, maybe because you can’t tear yourself away from Candy Clicker. Now, say you cultivate the ability to read novels. Whatever issues you address in that process, it seems like it will clearly be good for you, right?
Under this theory, what’s important is having the ability to read novels. But said ability is acquired by reading novels, so read some novels.
Alternatively, say you could read novels, but you simply never have. It’s plausible that the first time you have the “novel” experience of taking photons into your eyes and mentally converting them into a story, this truly does feed your mind.
Both versions of this theory suggest that reading novels has diminishing returns. That fits nicely with the fact that many people push their children to read novels while not reading any themselves. But do we really believe that after you’ve read some number of novels, it’s pointless to read more?
I think Catcher in the Rye is a good but not great book. But I love talking about Catcher in the Rye because (1) all North Americans seem to have read it, and (2) whenever I ask someone to tell me how they feel about Holden Caulfield, I always seem to learn something about them.
(I find him sympathetic.)
If there’s a group of people talking about Catcher in the Rye—or The Three-Body Problem, or Infinite Jest, or Don Quixote—then you benefit from being able to participate. The cynic might argue that this is zero-sum status competition. But I don’t think that’s most of it. Because, at least in my social circles, people feel boorish talking about books if not everyone has read them. So these conversations only happen if everyone has read the book in question.
Ultimately, we’re all alone in the world, and trying to connect with each other by pushing air through our throat meat. With more shared cultural context, those meat sounds are more meaningful, so we can all feel less alone.
True. But shared context can come from other things, too, like traveling to the same places, or watching the same sports, or practicing the same skills or hobbies. So what makes books special? The two answers I see are:
I lean weakly towards the first answer. Novels are a useful form of social context. But that’s a side benefit. It’s not why we read most books.
Maybe novels are just another form of entertainment. OK. But say you tried to tell the same story as a novel or as a movie / podcast / opera / interpretive dance performance. Different formats will be better in different ways. One advantage I see for novels is that they make it natural to explore the interior worlds of the characters.
Some movies have voice-overs where characters explain what they’re thinking. But this is generally considered cringe and a poor use of the medium. Meanwhile, many books are mostly about exploring what the characters are thinking.
Thoughts are worth exploring. If you want to explore thoughts, maybe novels are the best way to do that.
Aside: I’ve mentioned before that I think My Brilliant Friend is the best TV show ever made. Can I confess that I like it much more than the books it is based on? Because, like the books, the TV show involves a lot of what the main character is thinking, and even makes heavy use of voice-overs. So maybe other mediums have unrealized potential?
Movies are expensive to make. To be financially viable, they need to target a large slice of the population. Movies also reflect the combined efforts of many people. Both of these mean that movies are a compromise between different visions.
Novels are usually written by one person. And they’re often written more for personal expression than to make money. After all, writing is fun. I mean—writing is hard, but would you rather spend an afternoon holding up a shotgun microphone, cleaning a movie star’s trailer, or writing a novel?
To quantify this, some searching suggests that around 10,000 feature films are released each year, as compared to around 1,000,000 novels. (Does one in 7,000 people really write a novel each year?) That’s two orders of magnitude. So if you want to hear a truly unique story, a pure vision of one person, maybe novels are where you’ll find it.
Or: Maybe the point of reading War and Peace is that War and Peace is incredible and obviously one of the greatest pieces of art ever made in any medium. No one who reads War and Peace can question the value of what they’ve done. What are we talking about?
Fair. I definitely feel like I’m living my best life when I read War and Peace. But I also feel like I’m living an OK-ish life when I read a novel about Spenser, private investigator. And most novels most people read are closer to the Spenser than to War and Peace. And I still feel better spending an afternoon reading about Spenser than I would watching 99% of TV shows.
Or perhaps the difference is that reading is a thing you do rather than something you consume.
This theory holds that when you spend an hour slurping up short-form video, you’re training yourself to sort of pull a lever in the hope that some reward is delivered to you. But if you read (or do watercolors, or meditate) you’re training yourself to calmly pursue long-term goals and to sustain attention in the face of complexity.
Sometimes I wonder if phones/apps are the most addictive thing ever created. I suspect that more people are addicted to their phones today than were ever addicted to any drug other than caffeine or perhaps nicotine. And while a phone addiction is less physically harmful than tobacco, that phone addiction will eat a larger part of your soul.
I think this is a big part of the explanation.
In the end, I don’t think novels are the best way to spend your time. In my view no novel—not even War and Peace—is as good as a truly great conversation.
But great conversations are hard to create. Sometimes you’re sitting on a train, or lying in bed, or it’s just been a long day and you don’t have the energy to find a giant block of marble and pursue your dream of experimental sculpture. In these situations, maybe reading a novel is the best thing you could do in the category of things you could realistically do.
Exercise for the reader: Apply these theories to blog posts.
2025-12-18 08:00:00
They say you’re supposed to choose your prior in advance. That’s why it’s called a “prior”. First, you’re supposed to say how plausible different things are, and then you update your beliefs based on what you see in the world.
For example, currently you are—I assume—trying to decide if you should stop reading this post and do something else with your life. If you’ve read this blog before, then lurking somewhere in your mind is some prior for how often my posts are good. For the sake of argument, let’s say you think 25% of my posts are funny and insightful and 75% are boring and worthless.
OK. But now here you are reading these words. If they seem bad/good, then that raises the odds that this particular post is worthless/non-worthless. For the sake of argument again, say you find these words mildly promising, meaning that a good post is 1.5× more likely than a worthless post to contain words with this level of quality.
If you combine those two assumptions, that implies that the probability that this particular post is good is 33.3%. That’s true because the red rectangle below has half the area of the blue one, and thus the probability that this post is good should be half the probability that it’s bad (33.3% vs. 66.6%).
It’s easiest to calculate the ratio of the odds that the post is good versus bad, namely
P[good | words] / P[bad | words]
= P[good, words] / P[bad, words]
= (P[good] × P[words | good])
/ (P[bad] × P[words | bad])
= (0.25 × 1.5) / (0.75 × 1)
= 0.5.
It follows that
P[good | words] = 0.5 × P[bad | words],
and thus that
P[good | words] = 1/3.
Alternatively, if you insist on using Bayes’ equation:
P[good | words]
= P[good] × P[words | good] / P[words]
= P[good] × P[words | good]
/ (P[good] × P[words | good] + P[bad] × P[words | bad])
= 0.25 × 1.5 / (0.25 × 1.5 + 0.75)
= (1/3)
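If you'd rather let a computer do the arithmetic, here is a small sketch in Python using exact fractions. The function name is mine; the inputs are the 25% prior and the 1.5× likelihood ratio from above.

```python
from fractions import Fraction


def posterior_good(prior_good: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """P[good | words], given P[good] and the ratio P[words | good] / P[words | bad]."""
    prior_odds = prior_good / (1 - prior_good)        # P[good] / P[bad]
    posterior_odds = prior_odds * likelihood_ratio    # P[good | words] / P[bad | words]
    return posterior_odds / (1 + posterior_odds)      # convert odds back to a probability


p = posterior_good(Fraction(1, 4), Fraction(3, 2))
print(p)  # 1/3
```

This follows the odds route rather than Bayes' equation directly, but the two are the same calculation arranged differently.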
Theoretically, when you chose your prior that 25% of dynomight posts are good, that was supposed to reflect all the information you encountered in life before reading this post. Changing that number based on information contained in this post wouldn’t make any sense, because that information is supposed to be reflected in the second step when you choose your likelihood P[words | good]. Changing your prior based on this post would amount to “double-counting”.
In theory, that’s right. It’s also right in practice for the above example, and for the similar cute little examples you find in textbooks.
But for real problems, I’ve come to believe that refusing to change your prior after you see the data often leads to tragedy. The reason is that in real problems, things are rarely just “good” or “bad”, “true” or “false”. Instead, truth comes in an infinite number of varieties. And you often can’t predict which of these varieties matter until after you’ve seen the data.
Let me show you what I mean. Say you’re wondering if there are aliens on Earth. As far as we know, there’s no reason aliens shouldn’t have emerged out of the random swirling of molecules on some other planet, developed a technological civilization, built spaceships, and shown up here. So it seems reasonable to choose a prior under which it’s equally plausible that there are aliens or that there are not, i.e. that
P[aliens] ≈ P[no aliens] ≈ 50%.
Meanwhile, here on our actual world, we have lots of weird alien-esque evidence, like the Gimbal video, the Go Fast video, the FLIR1 video, the Wow! signal, government reports on unidentified aerial phenomena, and lots of pilots that report seeing “tic-tacs” fly around in physically impossible ways. Call all that stuff data. If aliens weren’t here, then it seems hard to explain all that stuff. So it seems like P[data | no aliens] should be some low number.
On the other hand, if aliens were here, then why don’t we ever get a good image? Why are there endless confusing reports and rumors and grainy videos, but never a single clear close-up high-resolution video, and never any alien debris found by some random person on the ground? That also seems hard to explain if aliens were here. So I think P[data | aliens] should also be some low number. For the sake of simplicity, let’s call it a wash and assume that
P[data | no aliens] ≈ P[data | aliens].
Since neither the prior nor the data see any difference between aliens and no-aliens, the posterior probability is
P[no aliens | data] ≈ P[aliens | data] ≈ 50%.
See the problem?
Observe that
P[aliens | data] / P[no aliens | data]
= P[aliens, data] / P[no aliens, data]
= (P[aliens] × P[data | aliens])
/ (P[no aliens] × P[data | no aliens])
≈ 1,
where the last line follows from the fact that P[aliens] ≈ P[no aliens] and P[data | aliens] ≈ P[data | no aliens]. Thus we have that
P[aliens | data] ≈ P[no aliens | data] ≈ 50%.
We’re friends. We respect each other. So let’s not argue about if my starting assumptions are good. They’re my assumptions. I like them. And yet the final conclusion seems insane to me. What went wrong?
Assuming I didn’t screw up the math (I didn’t), the obvious explanation is that I’m experiencing cognitive dissonance as a result of a poor decision on my part to adopt a set of mutually contradictory beliefs. Say you claim that Alice is taller than Bob and Bob is taller than Carlos, but you deny that Alice is taller than Carlos. If so, that would mean that you’re confused, not that you’ve discovered some interesting paradox.
Perhaps if I believe that P[aliens] ≈ P[no aliens] and that P[data | aliens] ≈ P[data | no aliens], then I must accept that P[aliens | data] ≈ P[no aliens | data]. Maybe rejecting that conclusion just means I have some personal issues I need to work on.
I deny that explanation. I deny it! Or, at least, I deny that it’s the most helpful way to think about this situation. To see why, let’s build a second model.
Here’s a trivial observation that turns out to be important: “There are aliens” isn’t a single thing. There could be furry aliens, slimy aliens, aliens that like synthwave music, etc. When I stated my prior, I could have given different probabilities to each of those cases. But if I had, it wouldn’t have changed anything, because there’s no reason to think that furry vs. slimy aliens would have any difference in their eagerness to travel to ape-planets and fly around in physically impossible tic-tacs.
But suppose I had divided up the state of the world into these four possibilities:
| possibility | description |
|---|---|
| No aliens + normal people | There are no aliens. Meanwhile, people are normal and not prone to hallucinating evidence for things that don’t exist. |
| No aliens + weird people | There are no aliens. Meanwhile, people are weird and do tend to hallucinate evidence for things that don’t exist. |
| Normal aliens | There are aliens. They may or may not have cool spaceships or enjoy shooting people with lasers. But one way or another, they leave obvious, indisputable evidence that they’re around. |
| Weird aliens | There are aliens. But they stay hidden until humans get interested in space travel. And after that, they let humans take confusing grainy videos, but never a single good video, never ever, not one. |
If I had broken things down that way, I might have chosen this prior:
P[no aliens + normal people] ≈ 41%
P[no aliens + weird people] ≈ 9%
P[normal aliens] ≈ 49%
P[weird aliens] ≈ 1%
Now, let’s think about the empirical evidence again. It’s incompatible with no aliens + normal people, since if there were no aliens, then normal people wouldn’t hallucinate flying tic-tacs. The evidence is also incompatible with normal aliens, since if those kinds of aliens were around they would make their existence obvious. However, the evidence fits pretty well with weird aliens and also with no aliens + weird people.
So, a reasonable model would be
P[data | normal aliens] ≈ 0
P[data | no aliens + normal people] ≈ 0
P[data | weird aliens] ≈ P[data | no aliens + weird people].
If we combine those assumptions, now we only get a 10% posterior probability of aliens.
P[no aliens + normal people | data] ≈ 0
P[no aliens + weird people | data] ≈ 90%
P[normal aliens | data] ≈ 0
P[weird aliens | data] ≈ 10%
Now the results seem non-insane.
To see why, first note that
P[normal aliens | data]
≈ P[no aliens + normal people | data]
≈ 0,
since both normal aliens and no aliens + normal people have near-zero probability of producing the observed data.
Meanwhile,
P[no aliens + weird people | data] / P[weird aliens | data]
= P[no aliens + weird people, data] / P[weird aliens, data]
≈ P[no aliens + weird people] / P[weird aliens]
≈ .09 / .01
= 9,
where the second equality follows from the fact that the data is assumed to be equally likely under no aliens + weird people and weird aliens.
It follows that
P[no aliens + weird people | data]
≈ 9 × P[weird aliens | data],
and so
P[no aliens + weird people | data] ≈ 90%
P[weird aliens | data] ≈ 10%.
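Here are both models run through the same few lines of Python, as a sanity check on the arithmetic. A couple of things in this sketch are my own simplifications: I treat the "≈ 0" likelihoods as exactly 0, and I use a placeholder value `L` for the likelihoods that are assumed to be equal. The actual value of `L` doesn't matter, because it cancels when you normalize.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses: normalize prior x likelihood."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


L = 0.05  # placeholder for "some low number"; any positive value gives the same posteriors

# First model: two hypotheses, equal priors, equal likelihoods.
coarse_post = posterior(
    {"aliens": 0.5, "no aliens": 0.5},
    {"aliens": L, "no aliens": L},
)

# Second model: four hypotheses, with the priors and likelihoods stated above.
fine_post = posterior(
    {
        "no aliens + normal people": 0.41,
        "no aliens + weird people": 0.09,
        "normal aliens": 0.49,
        "weird aliens": 0.01,
    },
    {
        "no aliens + normal people": 0.0,  # "≈ 0" treated as exactly 0
        "no aliens + weird people": L,
        "normal aliens": 0.0,              # "≈ 0" treated as exactly 0
        "weird aliens": L,
    },
)

p_aliens_coarse = coarse_post["aliens"]
p_aliens_fine = fine_post["normal aliens"] + fine_post["weird aliens"]

print(f"first model:  P[aliens | data] = {p_aliens_coarse:.0%}")
print(f"second model: P[aliens | data] = {p_aliens_fine:.0%}")
```

Both priors put 50% on aliens in total, and yet the two posteriors come out at 50% and 10%.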
I hope you are now confused. If not, let me lay out what’s strange: The priors for the two above models both say that there’s a 50% chance of aliens. The first prior wasn’t wrong, it was just less detailed than the second one.
That’s weird, because the second prior seemed to lead to completely different predictions. If a prior is non-wrong and the math is non-wrong, shouldn’t your answers be non-wrong? What the hell?
The simple explanation is that I’ve been lying to you a little bit. Take any situation where you’re trying to determine the truth of anything. Then there’s some space of things that could be true.
In some cases, this space is finite. If you’ve got a single tritium atom and you wait a year, either the atom decays or it doesn’t. But in most cases, there’s a large or infinite space of possibilities. Instead of you just being “sick” or “not sick”, you could be “high temperature but in good spirits” or “seems fine except won’t stop eating onions”.
(Usually the space of things that could be true isn’t easy to map to a small 1-D interval. I’m drawing it like that for the sake of visualization, but really you should think of it as some high-dimensional space, or even an infinite-dimensional space.)
In the case of aliens, the space of things that could be true might include, “There are lots of slimy aliens and a small number of furry aliens and the slimy aliens are really shy and the furry aliens are afraid of squirrels.” So, in principle, what you should do is divide up the space of things that might be true into tons of extremely detailed things and give a probability to each.
Often, the space of things that could be true is infinite. So theoretically, if you really want to do things by the book, what you should really do is specify how plausible each of those (infinite) possibilities is.
After you’ve done that, you can look at the data. For each thing that could be true, you need to think about the probability of the data. Since there’s an infinite number of things that could be true, that’s an infinite number of probabilities you need to specify. You could picture it as some curve like this:
(That’s a generic curve, not one for aliens.)
To me, this is the most underrated problem with applying Bayesian reasoning to complex real-world situations: In practice, there are an infinite number of things that can be true. It’s a lot of work to specify prior probabilities for an infinite number of things. And it’s also a lot of work to specify the likelihood of your data given an infinite number of things.
So what do we do in practice? We simplify, usually by grouping the space of things that could be true into some small number of discrete categories. For the above curve, you might break things down into these four equally-plausible possibilities.
Then you might estimate these data probabilities for each of those possibilities.
Then you could put those together to get this posterior:
That’s not bad. But it is just an approximation. Your “real” posterior probabilities correspond to these areas:
That approximation was pretty good. But the reason it was good is that we started out with a good discretization of the space of things that might be true: One where the likelihood of the data didn’t vary too much for the different possibilities inside of A, B, C, and D. Imagine the likelihood of the data—if you were able to think about all the infinite possibilities one by one—looked like this:
This is dangerous. The problem is that you can’t actually think about all those infinite possibilities. When you think about four discrete possibilities, you might estimate some likelihood that looks like this:
If you did that, that would lead to you underestimating the probability of A, B, and C, and overestimating the probability of D.
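Here is a rough sketch of that failure mode in Python. Everything in it is invented for illustration and is not the curve from the figures: the space of possibilities is the interval [0, 1), the prior is uniform, A through D are equal quarters, and the `likelihood` function is a hypothetical one where D contains a narrow slice that explains the data well and a large remainder that doesn't.

```python
N = 4000  # grid points standing in for the "infinite" space [0, 1)
thetas = [(i + 0.5) / N for i in range(N)]
CATEGORIES = "ABCD"

def category(theta):
    """Map a point in [0, 1) to one of four equal-width categories."""
    return CATEGORIES[min(int(theta * 4), 3)]

def likelihood(theta):
    """Hypothetical P[data | theta]; flat in A, B, C but spiky inside D."""
    if theta < 0.75:
        return 0.2
    return 0.8 if theta >= 0.95 else 0.001

def normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

# "Real" posterior: add up prior x likelihood over every point in a category.
exact = {c: 0.0 for c in CATEGORIES}
for t in thetas:
    exact[category(t)] += likelihood(t) / N  # uniform prior: weight 1/N each
exact = normalize(exact)

# Coarse posterior: one guessed likelihood per category. For D, the guess
# (wrongly) treats the vivid high-likelihood slice as typical of all of D.
guessed = {"A": 0.2, "B": 0.2, "C": 0.2, "D": 0.8}
coarse = normalize({c: 0.25 * guessed[c] for c in CATEGORIES})

for c in CATEGORIES:
    print(f"{c}: coarse {coarse[c]:.2f}  vs  integrated {exact[c]:.2f}")
```

With these made-up numbers, the coarse calculation gives D about 57% while the integrated one gives about 21%. The prior was identical in both cases. The only difference is whether the single likelihood assigned to D is a fair average over everything inside D.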
This is where my first model of aliens went wrong. My prior P[aliens] was not wrong. (Not to me.) The mistake was in assigning the same value to P[data | aliens] and P[data | no aliens]. Sure, I think all our alien-esque data is equally likely given aliens and given no-aliens. But that’s only true for certain kinds of aliens, and certain kinds of no-aliens. And my prior for those kinds of aliens is much lower than for those kinds of non-aliens.
Technically, the fix to the first model is simple: Make P[data | aliens] lower. But the reason it’s lower is that I have additional prior information that I forgot to include in my original prior. If I just assert that P[data | aliens] is much lower than P[data | no aliens] then the whole formal Bayesian thing isn’t actually doing very much—I might as well just state that I think P[aliens | data] is low. If I want to formally justify why P[data | aliens] should be lower, that requires a messy recursive procedure where I sort of add that missing prior information and then integrate it out when computing the data likelihood.
Mathematically,
P[data | aliens]
= ∫ P[weird aliens | aliens]
× P[data | weird aliens] d(weird aliens)
+ ∫ P[normal aliens | aliens]
× P[data | normal aliens] d(normal aliens).
But now I have to give a detailed prior anyway. So what was the point of starting with a simple one?
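To make that concrete, here is a small sketch that "integrates out" the sub-types, with plain sums standing in for the integrals because there are only two sub-types per side. The priors .01 and .09 come from the second model; .49 and .41 are inferred on the assumption that each side gets 50% in total. The likelihood values are illustrative, with exactly 0 standing in for "≈ 0".

```python
# Detailed prior over (side, kind) pairs. The .49 and .41 entries are inferred.
prior = {
    ("aliens", "normal"): 0.49,
    ("aliens", "weird"): 0.01,
    ("no aliens", "normal"): 0.41,
    ("no aliens", "weird"): 0.09,
}

# Illustrative P[data | side, kind]: zero for "normal", equal for "weird".
lik = {
    ("aliens", "normal"): 0.0,
    ("aliens", "weird"): 0.5,
    ("no aliens", "normal"): 0.0,
    ("no aliens", "weird"): 0.5,
}

def coarse_prior(side):
    """P[side], summed over the kinds within that side."""
    return sum(p for (s, _), p in prior.items() if s == side)

def coarse_likelihood(side):
    """P[data | side] = sum over kinds of P[kind | side] * P[data | side, kind]."""
    total = coarse_prior(side)
    return sum(
        prior[(s, k)] / total * lik[(s, k)]
        for (s, k) in prior
        if s == side
    )

sides = ("aliens", "no aliens")
joint = {s: coarse_prior(s) * coarse_likelihood(s) for s in sides}
posterior_aliens = joint["aliens"] / sum(joint.values())

print("P[data | aliens]    =", coarse_likelihood("aliens"))
print("P[data | no aliens] =", coarse_likelihood("no aliens"))
print("P[aliens | data]    =", posterior_aliens)
```

The two-category model now gives the same 10% as the four-category one, but only because `coarse_likelihood` needed the full detailed prior to do its job.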
I don’t think that technical fix is very good. While it’s technically correct (har-har) it’s very unintuitive. The better solution is what I did in the second model: To create a finer categorization of the space of things that might be true, such that the probability of the data is constant-ish for each term.
The thing is: Such a categorization depends on the data. Without seeing the actual data in our world, I would never have predicted that we would have so many pilots that report seeing tic-tacs. So I would never have predicted that I should have categories that are based on how much people might hallucinate evidence or how much aliens like to mess with us. So the only practical way to get good results is to first look at the data to figure out what categories are important, and then to ask yourself how likely you would have said those categories were, if you hadn’t yet seen any of the evidence.
2025-12-04 08:00:00
When I started this blog, I promised myself that I would always steer into weirdness. (As they say, “Get busy being weird, or get busy dying.”) While time has shown there are limits to what y’all will tolerate [1 2 3 4] I still sometimes feel a need to publish something that’s pure exuberant stupidity.
Thus, I present:
WHY DID THE CHICKEN CROSS THE ROAD
ACCORDING TO VARIOUS PEOPLE
OR OTHER ENTITIES
Q) Why did the chicken cross the road?
A) The chicken ain’t fussy. Everybody gotta be somewhere. The chicken been on this side a long time and never suffered none for it. The chicken don’t see no obvious benefit to the other side. But the talk of the town is nothing but crossing, and the chicken can’t help but go see what got everyone so stirred up.
(Mark Twain)
Q) Why did the chicken cross the road?
A) The outcome would be best if no one crossed. However, if other chickens do cross, then the outcome would be better if this chicken also crossed. The chicken rejects the Kantian universalism. So the chicken crosses.
(Derek Parfit)
Q) Why did the chicken cross the road?
A) You were a beautiful little chick
The whole world was before you
You greased your wattles and crossed the road
Sure it would last forever
Now it’s a cold morning and you’re driving to work
Cursing all the cockerels in your way
How did you get here
Where did that little chick go
(Pink Floyd)
Q) Why did the chicken cross the road?
A) It didn’t. There is no chicken. You are the road. You and the sides are in an entangled macrostate. The chicken is an emergent property of the superposition. The chicken abhors being measured. A team of plucky chemists rush to inject enough decoherence to collapse the wavefunction before the chicken can consume the lightcone.
(Christopher Nolan)
Q) Why did the chicken cross the road?
A) Chicken
C H I C K E N
3, 8, 9, 3, 11, 5, 14
11, 9, 3, 8, 14, 3, 5
gcd(11^(9 + 3) - 8, 14), 3 × 5
7, 15
G O
Go
(Ramanujan)
Q) Why did the chicken cross the road?
A) For sex. Neither glamorized nor gross, possibly added for commercial reasons, possibly to make some point about sex’s place in real life. It’s all very unclear.
(Paul Thomas Anderson)
Q) Why did the chicken cross the road?
A) Did it cross the road, though? Did it? Sure, the chicken is associated with crossing. And it’s mechanistically possible for a chicken to cross a road. It’s plausible the chicken crossed the road. But maybe the chicken and the crossing were both caused by something else. Or maybe the road crossed the chicken. This is why we have RCTs. Come on, people!
(Dynomight)
Q) Why did the chicken cross the road?
A) Once there was a dragon who watched over the chicken village. The chickens begged the dragon, “Please let us have a road, so that we might cross back and forth!”
“A road?” the dragon asked. “Are you sure?”
“Yes!” the chickens answered. “A road! We wish for nothing but a road to cross, and then we will be happy forever and ever!”
[7000 words redacted]
And thus, all mass-energy in the universe was converted to chicken-torture annihilators. Makes you think.
(LessWrong)
Q) Why did the chicken cross the road?
A) We were out on the edge of the farm when the diethyltryptamine took hold. Beaky screamed something about coccidiostats in our feed and made a break for it, totally out of control. Before I could stop him, I heard the voice of God say, “Scrapples: The road awaits.” Suddenly I was standing on the median, cars screaming past, a group of baby ducks asking where the mountains of peas I’d promised them were.
(Hunter S. Thompson)
Q) Why did the chicken cross the road?
A) The chicken’s crossing is not a voluntary act but the unconscious actualization of a class habitus: raised in a coop whose symbolic boundaries naturalize the road as a site of danger and prestige, the chicken embodies the field’s doxa that “real” chickens must invest in the illusio of reaching the median. While the chicken never doubts the legitimacy of the crossing rules, crossing is not about the other side, but a performance of distinction that ultimately perpetuates the same field of species domination that produced it.
(Pierre Bourdieu)
Q) Why did the chicken cross the road?
A) grug on one side
grug see other side
grug chicken
many metal box speed by very fast very volume
metal box seem to stay on black land strip
grug think better if metal box not hit grug because box hard and grug small soft chicken
grug wait a while
when no metal box for a while also often no metal box for a while after
largest gap between metal box around 20 minutes
grug wait until no metal box for 10 minutes then grug cross
no metal box come
grug safe
other side also fine
maybe cross back someday
grug think side not matter too much
grug enjoy chicken life either side same
chicken life pretty good
grug hope you also have life as good as grug chicken life
groodbye from grug
(grug)
Q) Why did the chicken cross the road?
A) Before there was chicken the road was waiting. The road is empty. Dust on your hackles. Heat rises in shimmering waves. No way to see what’s coming. How did it come to this. How a chicken supposed to move with roads everywhere. Creosote blows in from the mesa. Nothing left but to cross. You cross and nothing happens. A few minutes later a car stops but you don’t turn around. A door opens and you hear a click. Then the car is gone.
(Cormac McCarthy)
Q) Why did the chicken cross the road?
A) For food.
(An actual chicken)
Requests: Peter Singer, Ayn Rand, Judith Butler, Bertrand Russell, Andrei Tarkovsky, the mother hen, a junglefowl, an SSRI, Singapore, the chicken’s hypothalamus.
2025-11-27 08:00:00
That your dog, while she appears to love you only because she’s been adapted by evolution to appear to love you, really does love you.
That if you’re a life form and you cook up a baby and copy your genes to them, you’ll find that the genes have been degraded due to oxidative stress et al., which isn’t cause for celebration, but if you find some other hopefully-hot person and randomly swap in half of their genes, your baby will still be somewhat less fit compared to you and your hopefully-hot friend on average, but now there is variance, so if you cook up several babies, one of them might be as fit or even fitter than you, and that one will likely have more babies than your other babies have, and thus complex life can persist in a universe with increasing entropy.
That if we wanted to, we surely could figure out which of the 300-ish strains of rhinovirus are circulating in a given area at a given time and rapidly vaccinate people to stop it and thereby finally “cure” the common cold, and though this is too annoying to pursue right now, it seems like it’s just a matter of time.
That if you look back at history, you see that plagues went from Europe to the Americas but not the other way, which suggests that urbanization and travel are great allies for infectious disease, and these both continue today but are held in check by sanitation and vaccines even while we have lots of tricks like UVC light and high-frequency sound and air filtration and waste monitoring and paying people to stay home that we’ve barely even put in play.
That while engineered infectious diseases loom ever-larger as a potential very big problem, we also have lots of crazier tricks we could pull out like panopticon viral screening or toilet monitors or daily individualized saliva sampling or engineered microbe-resistant surfaces or even dividing society into cells with rotating interlocks or having people walk around in little personal spacesuits, and while admittedly most of this doesn’t sound awesome, I see no reason this shouldn’t be a battle that we would win.
That clean water, unlimited, almost free.
That dentistry.
That tongues.
That radioactive atoms either release a ton of energy but also quickly stop existing—a gram of Rubidium-90 scattered around your kitchen emits as much energy as ~200,000 incandescent lightbulbs but after an hour only 0.000000113g is left—or don’t put out very much energy but keep existing for a long time—a gram of Carbon-14 only puts out the equivalent of 0.0000212 light bulbs but if you start with a gram, you’ll still have 0.999879g after a year—so it isn’t actually that easy to permanently poison the environment with radiation although Cobalt-60 with its medium energy output and medium half-life is unfortunate, medical applications notwithstanding I still wish Cobalt-60 didn’t exist, screw you Cobalt-60.
That while curing all cancer would only increase life expectancy by ~3 years and curing all heart disease would only increase life expectancy by ~3 years, and preventing all accidents would only increase life expectancy by ~1.5 years, if we did all of these at the same time and then a lot of other stuff too, eventually the effects would go nonlinear, so trying to cure cancer isn’t actually a waste of time, thankfully.
That the peroxisome, while the mitochondria and their stupid Krebs cycle get all the attention, when a fatty-acid that’s too long for them to catabolize comes along, who you gonna call.
That we have preferences, that there’s no agreed ordering of how good different things are, which is neat, and not something that would obviously be true for an alien species, and given our limited resources probably makes us happier on net.
That cardamom, it is cheap but tastes expensive, if cardamom cost 1000× more, people would brag about how they flew to Sri Lanka so they could taste chai made with fresh cardamom and swear that it changed their whole life.
That Gregory of Nyssa, he was right.
That Grandma Moses, it’s not too late.
That sleep, that probably evolution first made a low-energy mode so we don’t starve so fast and then layered on some maintenance processes, but the effect is that we live in a cycle and when things aren’t going your way it’s comforting that reality doesn’t stretch out before you indefinitely but instead you can look forward to a reset and a pause that’s somehow neither experienced nor skipped.
That, glamorous or not, comfortable or not, cheap or not, carbon emitting or not, air travel is very safe.
That, for most of the things you’re worried about, the markets are less worried than you and they have the better track record, though not the issue of your mortality.
That sexual attraction to romantic love to economic unit to reproduction, it’s a strange bundle, but who are we to argue with success.
That every symbolic expression recursively built from differentiable elementary functions has a derivative that can also be written as a recursive combination of elementary functions, although the latter expression may require vastly more terms.
That every expression graph built from differentiable elementary functions and producing a scalar output has a gradient that can itself be written as an expression graph, and furthermore that the latter expression graph is always the same size as the first one and is easy to find, and thus that it’s possible to fit very large expression graphs to data.
That, eerily, biological life and biological intelligence do not appear to make use of that property of expression graphs.
That if you look at something and move your head around, you observe the entire light field, which is a five-dimensional function of three spatial coordinates and two angles, and yet if you do something fancy with lasers, somehow that entire light field can be stored on a single piece of normal two-dimensional film and then replayed later.
That, as far as I can tell, the reason five-dimensional light fields can be stored on two-dimensional film simply cannot be explained without quite a lot of wave mechanics, a vivid example of the strangeness of this place and proof that all those physicists with their diffractions and phase conjugations really are up to something.
That disposable plastic, littered or not, harmless when consumed as thousands of small particles or not, is popular for a reason.
That disposable plastic, when disposed of correctly, is literally carbon sequestration, and that if/when air-derived plastic replaces dead-plankton-derived plastic, this might be incredibly convenient, although it must be said that currently the carbon in disposable plastic only represents a single-digit percentage of total carbon emissions.
That rocks can be broken into pieces and then you can’t un-break the pieces but you can check that they came from the same rock, it’s basically cryptography.
That the deal society has made is that if you have kids then everyone you encounter is obligated to chip in a bit to assist you, and this seems to mostly work without the need for constant grimy negotiated transactions as Econ 101 would suggest, although the exact contours of this deal seem to be a bit murky.
That of all the humans that have ever lived, the majority lived under some kind of autocracy, with the rest distributed among tribal bands, chiefdoms, failed states, and flawed democracies, and only something like 1% enjoyed free elections and the rule of law and civil liberties and minimal corruption, yet we endured and today that number is closer to 10%, and so if you find yourself outside that set, do not lose heart.
That if you were in two dimensions and you tried to eat something then maybe your body would split into two pieces since the whole path from mouth to anus would have to be disconnected, so be thankful you’re in three dimensions, although maybe you could have some kind of jigsaw-shaped digestive tract so your two pieces would only jiggle around or maybe you could use the same orifice for both purposes, remember that if you ever find yourself in two dimensions, I guess.
2025-11-20 08:00:00
I recently asked why people seem to hate dating apps so much. In response, 80% of you emailed me some version of the following theory:
The thing about dating apps is that if they do a good job and match people up, then the matched people will quit the app and stop paying. So they have an incentive to string people along but not to actually help people find long-term relationships.
May I explain why I don’t find this type of theory very helpful?
I’m not saying that I think it’s wrong, mind you. Rather, my objection is that while the theory is phrased in terms of dating apps, the same basic pattern applies to basically anyone who is trying to make money by doing anything.
For example, consider a pizza restaurant. Try these theories on for size:
Pizza: “The thing about pizza restaurants is that if they use expensive ingredients or labor-intensive pizza-making techniques, then it costs more to make pizza. So they have an incentive to use low-cost ingredients and labor-saving shortcuts.”
Pizza II: “The thing about pizza restaurants is that if they have nice tables separated at a comfortable distance, then they can’t fit as many customers. So they have an incentive to use tiny tables and cram people in cheek by jowl.”
Pizza III: “The thing about pizza restaurants is that if they sell big pizzas, then people will eat them and stop being hungry, meaning they don’t buy additional pizza. So they have an incentive to serve tiny low-calorie pizzas.”
See what I mean? You can construct similar theories for other domains, too:
Cars: “The thing about automakers is that making cars safe is expensive. So they have an incentive to make unsafe cars.”
Videos: “The thing about video streaming is that high-resolution video uses more expensive bandwidth. So they have an incentive to use low-resolution.”
Blogging: “The thing about bloggers is that research is time-consuming. So they have an incentive to be sloppy about the facts.”
Durability: “The thing about {lightbulb, car, phone, refrigerator, cargo ship} manufacturing is that if you make a {lightbulb, car, phone, refrigerator, cargo ship} that lasts a long time, then people won’t buy new ones. So there’s an incentive to make {lightbulbs, cars, phones, refrigerators, cargo ships} that break quickly.”
All these theories can be thought of as instances of two general patterns:
Make product worse, get money: “The thing about selling goods or services is that making goods or services better costs money. So people have an incentive to make goods and services worse.”
Raise price, get money: “The thing about selling goods and services is that if you raise prices, then you get more money. So people have an incentive to raise prices.”
Are these theories wrong? Not exactly. But it sure seems like something is missing.
I’m sure most pizza restaurateurs would be thrilled to sell lukewarm 5 cm cardboard discs for $300 each. They do in fact have an incentive to do that, just as predicted by these theories! Yet, in reality, pizza restaurants usually sell pizzas that are made out of food. So clearly these theories aren’t telling the whole story.
Say you have a lucrative business selling 5 cm cardboard discs for $300. I am likely to think, “I like money. Why don’t I sell pizzas that are only mostly cardboard, but also partly made of flour? And why don’t I sell them for $200, so I can steal Valued Reader’s customers?” But if I did that, then someone else would probably set prices at only $100, or even introduce cardboard-free pizzas, and this would continue until hitting some kind of equilibrium.
Sure, producers want to charge infinity dollars for things that cost them zero dollars to make. But consumers want to pay zero dollars for stuff that’s infinitely valuable. It’s in the conflict between these desires that all interesting theories live.
This is why I don’t think it’s helpful to point out that people have an incentive to make their products worse. Of course they do. The interesting question is, why are they able to get away with it?
First reason stuff is bad: People are cheap
Why are seats so cramped on planes? Is it because airlines are greedy? Sure. But while they might be greedy, I don’t think they’re dumb. If you do a little math, you can calculate that if airlines were to remove a single row of seats, they could add perhaps 2.5 cm (1 in) of extra legroom for everyone, while only decreasing the number of paying customers by around 3%. (This is based on a 737 with single-class, but you get the idea.)
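If you want to check that math yourself, here is a back-of-the-envelope sketch. The inputs are my own round-number assumptions for a single-class 737 (32 rows at a 76 cm / 30 in seat pitch), not figures from any airline, and the function name is made up for this example.

```python
def remove_one_row(rows, pitch_cm):
    """Return (extra legroom per remaining row in cm, fraction of seats lost).

    Assumes every row has the same pitch and the same number of seats, and
    that the freed pitch is spread evenly over the remaining rows.
    """
    extra_cm = pitch_cm / (rows - 1)
    seats_lost = 1 / rows
    return extra_cm, seats_lost

# Hypothetical single-class layout: 32 rows, 76 cm pitch.
extra_cm, seats_lost = remove_one_row(rows=32, pitch_cm=76.0)
print(f"extra legroom per row: {extra_cm:.2f} cm")
print(f"seats lost: {seats_lost:.2%}")
```

Under those assumptions you get roughly 2.5 cm of extra legroom for about 3% fewer seats. Nudging the row count or pitch up or down a bit doesn't change the picture much.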
So why don’t airlines rip out a row of seats, raise prices by 3% and enjoy the reduced costs for fuel and customer service? The only answer I can see is that people, on average, aren’t actually willing to pay 3% more for 2.5 cm more legroom. We want a worse but cheaper product, and so that’s what we get.
I think this is the most common reason stuff is “bad”. It’s why Subway sandwiches are so soggy, why video games are so buggy, and why IKEA furniture and Primark clothes fall apart so quickly.
It’s good when things are bad for this reason. Or at least, that’s the premise of capitalism: When companies cut costs, that’s the invisible hand redirecting resources to maximize social value, or whatever. Companies may be motivated by greed. And you may not like it, since you want to pay zero dollars for infinite value. But this is markets working as designed.
Second reason stuff is bad: Information asymmetries
Why is it that almost every book / blog / podcast about longevity is such garbage? Well, we don’t actually know many things that will reliably increase longevity. And those things are mostly all boring / hard / non-fun. And even if you do all of them, it probably only adds a couple of years in expectation. And telling people these facts is not a good way to find suckers who will pay you lots of money for your unproven supplements / seminars / etc.
True! But it doesn’t explain why all longevity stuff is so bad. Why don’t honest people tell the true story and drive all the hucksters out of business? I suspect the answer is that unless you have a lot of scientific training and do a lot of research, it’s basically impossible to figure out just how huckstery all the hucksters really are.
I think this same basic phenomenon explains why some supplements contain heavy metals, why some food contains microplastics, why restaurants use so much butter and salt, why rentals often have crappy insulation, and why most cars seem to only be safe along dimensions included in crash test scores. When consumers can’t tell good from evil, evil triumphs.
Third reason stuff is bad: People have bad taste
Sometimes stuff is bad because people just don’t appreciate the stuff you consider good. Examples are definitionally controversial, but I think this includes restaurants in cities where all restaurants are bad, North American tea, and travel pants. This reason has a blurry boundary with information asymmetries, as seen in ultrasonic humidifiers or products that use Sucralose instead of aspartame for “safety”.
Fourth reason stuff is bad: Pricing power
Finally, sometimes stuff is bad because markets aren’t working. Sometimes a company is selling a product but has some kind of “moat” that makes it hard for anyone else to compete with them, e.g. because of some technological or regulatory barrier, control of some key resource or location, intellectual property, a beloved brand, or network effects.
If that’s true, then those companies don’t have to worry as much about someone else stealing their business, and so (because everyone is axiomatically greedy) they will find ways to make their product cheaper to produce and/or raise prices up until the price is equal to the full value it provides to the marginal consumer.
Why is food so expensive at sporting events? Yes, people have no alternatives. But people know food is expensive at sporting events. And they don’t like it. Instead of selling water for $17, why don’t venues sell water for $2 and raise ticket prices instead? I don’t know. Probably something complicated, like that expensive food allows you to extract extra money from rich people without losing business from non-rich people.
So of course dating apps would love to string people along for years instead of finding them long-term relationships, so they keep paying money each month. I wouldn’t be surprised if some people at those companies have literally thought, “Maybe we should string people along for years instead of finding them long-term relationships, so they keep paying money each month, I love money so much.”
But if they are actually doing that (which is unclear to me) or if they are bad in some other way, then how do they get away with it? Why doesn’t someone else create a competing app that’s better and thereby steal all their business? It seems like the answer has to be either “because that’s impossible” or “because people don’t really want that”. That’s where the mystery begins.