2025-05-15 08:00:00
They say you can’t truly hate someone unless you loved them first. I don’t know if that’s true as a general principle, but it certainly describes my relationship with NumPy.
NumPy, by the way, is some software that does computations on arrays in Python. It’s insanely popular and has had a huge influence on all the popular machine learning libraries like PyTorch. These libraries share most of the same issues I discuss below, but I’ll stick to NumPy for concreteness.
NumPy makes easy things easy. Say A is a 5×5 matrix, x is a length-5 vector, and you want to find the vector y such that Ay=x. In NumPy, that would be:
y = np.linalg.solve(A, x)
So elegant! So clear!
But say the situation is even a little more complicated. Say A is a stack of 100 5×5 matrices, given as a 100×5×5 array. And say x is a stack of 100 length-5 vectors, given as a 100×5 array. And say you want to solve Aᵢyᵢ=xᵢ for 1≤i≤100.
If you could use loops, this would be easy:
y = np.empty_like(x)
for i in range(100):
    y[i,:] = np.linalg.solve(A[i,:,:], x[i,:])
But you can’t use loops. To some degree, this is a limitation of loops being slow in Python. But nowadays, everything is GPU and if you’ve got big arrays, you probably don’t want to use loops in any language. To get all those transistors firing, you need to call special GPU functions that will sort of split up the arrays into lots of little pieces and process them in parallel.
The good news is that NumPy knows about those special routines (at least if you use JAX or CuPy), and if you call np.linalg.solve correctly, it will use them.

The bad news is that no one knows how to do that.
Don’t believe me? OK, which of these is right?
y = linalg.solve(A,x)
y = linalg.solve(A,x,axis=0)
y = linalg.solve(A,x,axes=[[1,2],1])
y = linalg.solve(A.T, x.T)
y = linalg.solve(A.T, x).T
y = linalg.solve(A, x[None,:,:])
y = linalg.solve(A,x[:,:,None])
y = linalg.solve(A,x[:,:,None])[:,:,0]
y = linalg.solve(A[:,:,:,None],x[:,None,None,:])
y = linalg.solve(A.transpose([1,2,0]),x[:,:,None]).T
No one knows. And let me show you something else. Here’s the documentation:
Read that. Meditate on it. Now, notice: You still don’t know how to solve Aᵢyᵢ=xᵢ for all i at once. Is it even possible? Did I lie when I said it was?
As far as I can tell, what people actually do is try random variations until one seems to work.
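(How would you even check? One option is to compare a candidate against the forbidden loop on random data. Here’s a minimal sketch — I’m fairly sure this particular candidate is the right one, but the point is that you don’t have to trust me:)

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5, 5))
x = rng.normal(size=(100, 5))

# reference answer, computed with the loop we're not supposed to use
y_loop = np.empty_like(x)
for i in range(100):
    y_loop[i,:] = np.linalg.solve(A[i,:,:], x[i,:])

# one of the candidates from the list above
y_candidate = np.linalg.solve(A, x[:,:,None])[:,:,0]
print(np.allclose(y_loop, y_candidate))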
NumPy is all about applying operations to arrays. When the arrays have 2 or fewer dimensions, everything is fine. But if you’re doing something even mildly complicated, you inevitably find yourself with some operation you want to apply to some dimensions of array A, some other dimensions of array B, and some other dimensions of array C. And NumPy has no theory for how to express that.
Let me show you what I mean. Suppose:

- A is a K×L×M array
- B is an L×N array
- C is a K×M array

And say that for each k and n, you’d like to compute the mean over the L and M dimensions. That is, you want

Dₖₙ = (1/LM) ∑ₗₘ Aₖₗₘ Bₗₙ Cₖₘ.
To do that, you’ve got two options. The first is to use grotesque dimension alignment tricks:
D = np.mean(
    np.mean(
        A[:,:,:,None] *
        B[None,:,None,:] *
        C[:,None,:,None],
        axis=1),
    axis=1)
The hell, you ask? Why is None everywhere? Well, when indexing an array in NumPy, you can write None to insert a new dimension. A is K×L×M, but A[:,:,:,None] is K×L×M×1. Similarly, B[None,:,None,:] is 1×L×1×N and C[:,None,:,None] is K×1×M×1. When you multiply these together, NumPy “broadcasts” all the size-1 dimensions to give a K×L×M×N array. Then, the np.mean calls average over the L and M dimensions.
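(If you don’t feel like trusting me about those shapes, here’s a quick check, with some made-up sizes:)

import numpy as np

K, L, M, N = 2, 3, 4, 5
A = np.zeros((K, L, M))
B = np.zeros((L, N))
C = np.zeros((K, M))
assert A[:,:,:,None].shape == (K, L, M, 1)
assert B[None,:,None,:].shape == (1, L, 1, N)
assert C[:,None,:,None].shape == (K, 1, M, 1)
assert (A[:,:,:,None] * B[None,:,None,:] * C[:,None,:,None]).shape == (K, L, M, N)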
I think this is bad. I’ve been using NumPy for years and I still find it impossible to write code like that without always making mistakes.
It’s also borderline-impossible to read. To prove this, I just flipped a coin and introduced a bug above if and only if the coin was tails. Is there a bug? Are you sure? No one knows.
Your second option is to desperately try to be clever. Life is short and precious, but if you spend a lot of yours reading the NumPy documentation, you might eventually realize that there’s a function called np.tensordot, and that it’s possible to make it do much of the work:
D = (1/L) * np.mean(
    np.tensordot(A, B, axes=[1,0]) *
    C[:,:,None],
    axis=1)
That’s correct. (I promise.) But why does it work? What exactly is np.tensordot doing? If you saw that code in some other context, would you have the slightest idea what was happening?
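(For the record, here’s my attempt to spell out what that np.tensordot call is doing, using loops and made-up sizes:)

import numpy as np

K, L, M, N = 2, 3, 4, 5
rng = np.random.default_rng(0)
A = rng.normal(size=(K, L, M))
B = rng.normal(size=(L, N))

T = np.tensordot(A, B, axes=[1,0])  # contract A's axis 1 with B's axis 0
assert T.shape == (K, M, N)

# the same thing with loops: T[k,m,n] = sum over l of A[k,l,m] * B[l,n]
T_loop = np.zeros((K, M, N))
for k in range(K):
    for m in range(M):
        for n in range(N):
            T_loop[k,m,n] = np.sum(A[k,:,m] * B[:,n])
assert np.allclose(T, T_loop)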
Here’s how I’d do it, if only I could use loops:
D = np.zeros((K,N))
for k in range(K):
    for n in range(N):
        a = A[k,:,:]
        b = B[:,n]
        c = C[k,:]
        assert a.shape == (L,M)
        assert b.shape == (L,)
        assert c.shape == (M,)
        D[k,n] = np.mean(a * b[:,None] * c[None,:])
People who’ve written too much NumPy may find that clunky. I suspect that’s a wee bit of Stockholm Syndrome. But surely we can agree that it’s clear.
In practice, things are often even worse. Say that A had shape M×K×L rather than K×L×M. With loops, no big deal. But NumPy requires you to write monstrosities like A.transpose([1,2,0]). Or should that be A.transpose([2,0,1])? What shapes do those produce? No one knows.
Loops were better.
There is a third option:
D = 1/(L*M) * np.einsum('klm,ln,km->kn', A, B, C)
If you’ve never seen Einstein summation before, that might look terrifying. But remember, our goal is to find

Dₖₙ = (1/LM) ∑ₗₘ Aₖₗₘ Bₗₙ Cₖₘ.

The string in the above code basically gives labels to the indices in each of the three inputs (klm,ln,km) and the target indices for the output (->kn). Then, np.einsum multiplies together the corresponding elements of the inputs and sums over all indices that aren’t in the output.
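(If you’re suspicious, here’s a sanity check that the einsum matches the loopy version, again with made-up sizes:)

import numpy as np

K, L, M, N = 2, 3, 4, 5
rng = np.random.default_rng(0)
A = rng.normal(size=(K, L, M))
B = rng.normal(size=(L, N))
C = rng.normal(size=(K, M))

D_einsum = 1/(L*M) * np.einsum('klm,ln,km->kn', A, B, C)

D_loop = np.zeros((K, N))
for k in range(K):
    for n in range(N):
        D_loop[k,n] = np.mean(A[k,:,:] * B[:,n][:,None] * C[k,:][None,:])

assert np.allclose(D_einsum, D_loop)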
Personally, I think np.einsum is one of the rare parts of NumPy that’s actually good. The strings are a bit tedious, but they’re worth it, because the overall function is easy(ish) to understand, is completely explicit, and is quite general and powerful.

Except, how does np.einsum achieve all this? It uses indices. Or, more precisely, it introduces a tiny domain-specific language based on indices. It doesn’t suffer from NumPy’s design flaws because it refuses to play by NumPy’s normal rules.

But np.einsum only does a few things. (Einops does a few more.) What if you want to apply some other function over various dimensions of some arrays? There is no np.linalg.einsolve. And if you create your own function, there’s certainly no “Einstein” version of it.

I think np.einsum’s goodness shows that NumPy went wrong somewhere.
Here’s a painting which feels analogous to our subject.
Here’s what I want from an array language. I ain’t particular about syntax, but it would be nice if:
Wouldn’t that be nice? I think NumPy doesn’t achieve these because of its original sin: It took away indices and replaced them with broadcasting. And broadcasting cannot fill indices’ shoes.
NumPy’s core trick is broadcasting. Take this code:
A = np.array([[1,2],[3,4],[5,6]])
B = np.array([10,20])
C = A * B
print(C)
This outputs:
[[ 10  40]
 [ 30  80]
 [ 50 120]]
Here, A is a 3×2 array, and B is a length-2 array. When you multiply them together, B is “broadcast” to the shape of A, meaning the first column of A is multiplied with B[0]=10 and the second is multiplied with B[1]=20.
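(If you want to see it spelled out, the broadcast multiply here is the same as manually tiling B up to A’s shape first:)

import numpy as np

A = np.array([[1,2],[3,4],[5,6]])
B = np.array([10,20])
B_tiled = np.broadcast_to(B, A.shape)  # [[10, 20], [10, 20], [10, 20]]
assert (A * B == A * B_tiled).all()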
In simple cases, this seems good. But I don’t love it. One reason is that, as we saw above, you often have to do gross things to the dimensions to get them to line up.
Another reason is that it isn’t explicit or legible. Sometimes A*B multiplies element-by-element, and sometimes it does more complicated things. So every time you see A*B, you have to figure out which case in the broadcasting conventions is getting triggered.
But the real problem with broadcasting is how it infects everything else. I’ll explain below.
Here’s a riddle. Take this code:
A = np.ones((10,20,30,40))
i = np.array([1,2,3])
j = np.array([[0],[1]])
B = A[:,i,j,:]
What shape does B have?
It turns out the answer is 10×2×3×40. That’s because the i and j indices get broadcast to a shape of 2×3 and then something something mumble mumble mumble. Try to convince yourself it makes sense.
Done? OK, now try these:
C = A[:,:,i,j]
D = A[:,i,:,j]
E = A[:,1:4,j,:]
What shapes do these have?
C is 10×20×2×3. This seems logical, given what happened with B above.

What about D? It is 2×3×10×30. Now, for some reason, the 2 and 3 go at the beginning?

And what about E? Well, “slices” in Python exclude the endpoint, so 1:4 is equivalent to [1,2,3] which is equivalent to i, and so E is the same as B. Hahaha, just kidding! E is 10×3×2×1×40.
Yes, that is what happens. Try it if you don’t believe me! I understand why NumPy does this, because I’ve absorbed this 5000 word document that explains how NumPy indexing works. But I want that time back.
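(Or, “try it” in code form — these print the shapes I claimed above:)

import numpy as np

A = np.ones((10,20,30,40))
i = np.array([1,2,3])
j = np.array([[0],[1]])
print(A[:,i,j,:].shape)    # (10, 2, 3, 40)
print(A[:,:,i,j].shape)    # (10, 20, 2, 3)
print(A[:,i,:,j].shape)    # (2, 3, 10, 30)
print(A[:,1:4,j,:].shape)  # (10, 3, 2, 1, 40)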
I also asked a bunch of AI models, using this query:
Take this python code
A = np.ones((10,20,30,40))
i = np.array([1,2,3])
j = np.array([[0],[1]])
B = A[:,i,j,:]
C = A[:,:,i,j]
D = A[:,i,:,j]
E = A[:,1:4,j,:]

what shapes do B, C, D, and E have?
Claude 3.7 used “extended thinking”. Here are all the incorrect outputs:
AI | B | C | D | E
---|---|---|---|---
GPT 4.1 | | | 10×2×3×30 |
Grok 3 | | | 10×3×30×2 | 10×3×2×40
Claude 3 Opus | 10×3×2×30 | 10×20×3×2 | 10×3×30×2 | 10×3×2×40
Llama 4 Maverick | | | 10×3×30×2 | 10×3×2×40
o3 | | | 10×2×3×30 |
Claude 3.7 | | | 10×3×30×2 | 10×3×2×40
AI | B | C | D | E
---|---|---|---|---
GPT 4.1 | ✔️ | ✔️ | X | ✔️
Grok 3 | ✔️ | ✔️ | X | X
Claude 3 Opus | X | X | X | X
Llama 4 Maverick | ✔️ | ✔️ | X | X
o3 | ✔️ | ✔️ | X | ✔️
Claude 3.7 | ✔️ | ✔️ | X | X
Gemini 2.5 Pro | ✔️ | ✔️ | ✔️ | ✔️
DeepSeek R1 | ✔️ | ✔️ | ✔️ | ✔️
(DeepSeek’s chain of thought used “wait” 76 times. It got everything right the first time, but when I tried it again, it somehow got B, C, and D all wrong, but E right.)
This is insane. Using basic features should not require solving crazy logic puzzles.
You might think, “OK, I’ll just limit myself to indexing in simple ways.” Sounds good, except sometimes you need advanced indexing. And even if you’re doing something simple, you still need to be careful to avoid the crazy cases.
This again makes everything non-legible. Even if you’re just reading code that uses indexing in a simple way, how do you know it’s simple? If you see A[B,C], that could be doing almost anything. To understand it, you need to remember the shapes of A, B, and C and work through all the cases. And, of course, A, B, and C are often produced by other code, which you also need to think about…
Why did NumPy end up with a np.linalg.solve(A,B) function that’s so confusing? I imagine they first made it work when A is a 2D array and b is a 1D or 2D array, just like the mathematical notation of A⁻¹b or A⁻¹B.
So far so good. But then someone probably came along with a 3D array. If you could use loops, the solution would be “use the old function with loops”. But you can’t use loops. So there were basically three options:
1. Add an axes argument, so the user can specify which dimensions to operate over. Maybe you could write solve(A,B,axes=[[1,2],1]).
2. Create different functions with different names, so solve_matrix_vector would do one thing and solve_tensor_matrix would do another.
3. Adopt some Convention: solve will internally try to line up the dimensions. Then it’s the user’s problem to figure out and conform to those Conventions.

All these options are bad, because none of them can really cope with the fact that there are a combinatorial number of different cases. NumPy chose: All of them. Some functions have axes arguments. Some have different versions with different names. Some have Conventions. Some have Conventions and axes arguments. And some don’t provide any vectorized version at all.
But the biggest flaw of NumPy is this: Say you create a function that solves some problem with arrays of some given shape. Now, how do you apply it to particular dimensions of some larger arrays? The answer is: You re-write your function from scratch in a much more complex way. The basic principle of programming is abstraction—solving simple problems and then using the solutions as building blocks for more complex problems. NumPy doesn’t let you do that.
One last example to show you what I’m talking about. Whenever I whine about NumPy, people always want to see an example with self-attention, the core trick behind modern language models. So fine. Here’s an implementation, which I humbly suggest is better than all 227 versions I found when I searched for “self-attention numpy”:
# self attention by your friend dynomight
input_dim = 4
seq_len = 4
d_k = 5
d_v = input_dim
X = np.random.randn(seq_len, input_dim)
W_q = np.random.randn(input_dim, d_k)
W_k = np.random.randn(input_dim, d_k)
W_v = np.random.randn(input_dim, d_v)
def softmax(x, axis):
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e_x / np.sum(e_x, axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    d_k = W_k.shape[1]
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v
    scores = Q @ K.T / np.sqrt(d_k)
    attention_weights = softmax(scores, axis=-1)
    return attention_weights @ V
This is fine. Some of the axis stuff is a little obscure, but whatever.
But what language models really need is multi-head attention, where you sort of do attention several times in parallel and then merge the results. How do we do that?
First, let’s imagine we lived in a sane world where we were allowed to use abstractions. Then you could just call the previous function in a loop:
# multi-head self attention by your friend dynomight
# if only we could use loops
n_head = 2
X = np.random.randn(seq_len, input_dim)
W_q = np.random.randn(n_head, input_dim, d_k)
W_k = np.random.randn(n_head, input_dim, d_k)
W_v = np.random.randn(n_head, input_dim, d_v)
W_o = np.random.randn(n_head, d_v, input_dim // n_head)
def multi_head_attention(X, W_q, W_k, W_v, W_o):
    projected = []
    for n in range(n_head):
        output = attention(X, W_q[n,:,:], W_k[n,:,:], W_v[n,:,:])
        my_proj = output @ W_o[n,:,:]
        projected.append(my_proj)
    projected = np.array(projected)
    output = []
    for i in range(seq_len):
        my_output = np.ravel(projected[:,i,:])
        output.append(my_output)
    return np.array(output)
Looks stupid, right? Yes—thank you! Cleverness is bad.
But we don’t live in a sane world. So instead you need to do this:
# multi-head self attention by your friend dynomight
# all vectorized and bewildering
def multi_head_attention(X, W_q, W_k, W_v, W_o):
    d_k = W_k.shape[-1]
    Q = np.einsum('si,hij->hsj', X, W_q)
    K = np.einsum('si,hik->hsk', X, W_k)
    V = np.einsum('si,hiv->hsv', X, W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    output = weights @ V
    projected = np.einsum('hsv,hvd->hsd', output, W_o)
    return projected.transpose(1, 0, 2).reshape(seq_len, input_dim)
Ha! Hahahahahaha!
To be clear, I’m only suggesting that NumPy is “the worst array language other than all the other array languages”. What’s the point of complaining if I don’t have something better to suggest?
Well, actually I do have something better to suggest. I’ve made a prototype of a “better” NumPy that I think retains all the power while eliminating all the sharp edges. I thought this would just be a short motivational introduction, but after I started writing, the evil took hold of me and here we are 3000 words later.
Also, it’s probably wise to keep some distance between one’s raving polemics and one’s constructive array language API proposals. So I’ll cover my new thing next time.
2025-05-12 08:00:00
So you’ve made a thing. I’ll pretend it’s a blog post, though it doesn’t really matter. If people read your thing, some would like it, and some wouldn’t.
You should try to make a good thing that many people would like. That presents certain challenges. But our subject today is only how to give your thing a title.

My advice is: Think of the title as a “classifier”.
When people see the title, some are likely to click on it and some won’t. Abstractly speaking, the title adds a second dimension to the above figure:
A title has two goals. First, think of all the people in the world who, if they clicked on your thing, would finish it and love it. Ideally, those people would click. That is, you want there to be people in the like + click region:
Other people will hate your thing. It’s fine, some people hate everything. But if they click on your thing, they’ll be annoyed and tell everyone you are dumb and bad. You don’t want that. So you don’t want people in the hate + click region.
I find it helpful to think about all title-related issues from this perspective.
Everyone is deluged with content. Few people will hate your thing, because very few will care enough to have any feelings about it at all.
The good news is that it’s a big world and none of us are that unique. If you make a thing that you would love, then I guarantee you at least 0.0001% of other people would love it too. That’s still 8000 people! The problem is finding them.
That’s hard. Because—you don’t like most things, right? So you start with a strong prior that most things are bad (for you). Life is short, so you only click on things when there’s a very strong signal they’ll be good (for you).
Say you write a post about concrete. Should you call it, “My favorite concrete pozzolanic admixtures”, even though 99.9% of people have no idea what pozzolanic means? Well, think of the people who’d actually like your thing. Do they know? If so, use “pozzolanic”. That gives a strong signal to Concrete People: “Hey! This is for you! And you know I’m not lying about that, because I’ve driven away all the noobs.”
So ideally you’re aiming for something like this:
Be careful imitating famous people. If Barack Obama made a thing called “Thoughts on blockchain”, everyone would read it, because the implicit title is “Thoughts on blockchain, by Barack Goddamn Obama”. Most of the titles you see probably come from people who have some kind of established “brand”. If you don’t have that, you probably don’t want to choose the same kind of titles.
The title isn’t just about the subject. I called this post, “How to title your blog post or whatever” partly because I hope some of this applies to other things beyond blog posts. But mostly I did that because it signals that my style is breezy and informal. I think people really underrate this.
Some people choose clever punny titles. If you have a big audience that reads all your things, then your title doesn’t need to be a good classifier. I’m not in that situation, but I sometimes find a pun so amusing that I can’t resist. “Fahren-height” was worth it. “Taste games” was not.
Traditional advice says that you should put your main “message” in the title. I have mixed feelings about that. On the one hand, it provides a lot of signal. On the other hand, it seems to get people’s hackles up. The world is full of bad things that basically pick a conclusion and ignore or distort all conflicting evidence. If you’re attempting to be fair or nuanced, putting your conclusion in the title might signal that you’re going to be a typical biased/bad thing. It will definitely lead to lots of comments “refuting” you from people who didn’t read your thing.
Putting your conclusion in the title may also ruin your “story”. Though you should ask: Do the people who’d like your thing really care about your story?
A difficult case is things that create new “labels”. Sometimes there’s an idea floating around, and we need someone to make a canonical thing with a Name. To serve that role, the thing’s title needs to be that name. This presents a trade-off. A post titled “The Waluigi Effect” is great for people who want to know what that is, but terrible for everyone else.
For the best title ever I nominate, “I’m worried about Chicago”. It doesn’t look fancy, but do you see how elegantly it balances all the above issues?
You’d think that, by 2025, technology would have solved the problem of things getting to people. I think it’s the opposite. Social media is optimized to keep people engaged and does not want people leaving the walled garden. Openly prohibiting links would cause a revolt, so instead they go as close as people will tolerate. Which, it turns out, is pretty close.
Boring titles are OK. I know that no one will click on “Links for April” who doesn’t already follow me. But I think that’s fine, because I don’t think anyone else would like it.
Consider title-driven thing creation. That is, consider first choosing a title and then creating a thing that delivers on the title. It’s sad to admit, but I think there are many good things that simply don’t have good titles. Consider not making those things. The cynical view of this is that without a good title, no one will read your thing, so why bother? The optimistic view is that we’re all drowning in content, so what the world actually needs is good things that can find their way to the people who will benefit from them. In practice, it’s often something in the middle: You start to create your thing, then you choose a title, then you structure your thing to deliver on the title.
My favorite thing category is “Lucid examination of all sides of an issue which finds some evidence pointing in various directions and doesn’t reach a definitive conclusion because the world is complicated”. Some people make fun of me for spending so much time researching seed oils and then lamely calling my thing “Thoughts on seed oil”. But what should I have called that instead? Lots of bloggers create things in this category, and no one seems to have solved the problem.
[Insert joke about how bad the title of this post is.]
2025-05-08 08:00:00
This is an article that just appeared in Asimov Press, who kindly agreed that I could publish it here and also humored my deep emotional need to use words like “Sparklepuff”.
Do you like information theory? Do you like molecular biology? Do you like the idea of smashing them together and seeing what happens? If so, then here’s a question: How much information is in your DNA?
When I first looked into this question, I thought it was simple:
Easy, right? Sure, except:
Such questions quickly run into the limits of knowledge for both biology and computer science. To answer them, we need to figure out what exactly we mean by “information” and how that’s related to what’s happening inside cells. In attempting that, I will lead you through a frantic tour of information theory and molecular biology. We’ll meet some strange characters, including genomic compression algorithms based on deep learning, retrotransposons, and Kolmogorov complexity.
Ultimately, I’ll argue that the intuitive idea of information in a genome is best captured by a new definition of a “bit”—one that’s unknowable with our current level of scientific knowledge.
What is “information”? This isn’t just a pedantic question, as there are actually several different mathematical definitions of a “bit”. Often, the differences don’t matter, but for DNA, they turn out to matter a lot, so let’s start with the simplest.
In the storage space definition, a bit is a “slot” in which you can store one of two possible values. If some object can represent 2ⁿ possible patterns, then it contains n bits, regardless of which pattern actually happens to be stored.
So here’s a question we can answer precisely: How much information could your DNA store?
A few reminders: DNA is a polymer. It’s a long chain of chunks of ~40 atoms called “nucleotides”. There are four different chunks, commonly labeled A, T, C, and G. In humans, DNA comes in 23 pieces of different lengths, called “chromosomes”. Humans are “diploid”, meaning we have two versions of each chromosome. We get one from each of our parents, made by randomly weaving together sections from the two chromosomes they got from their parents.
Technically there’s also a tiny amount of DNA in the mitochondria. This is neat because you get it from your mother basically unchanged and so scientists can trace tiny mutations back to see how our great-great-…-great grandmothers were all related. If you go far enough back, our maternal lines all lead to a single woman, Mitochondrial Eve, who probably lived in East Africa 120,000 to 156,000 years ago. But mitochondrial DNA is tiny so I won’t mention it again.
Chromosomes 1-22 have a total of 2.875 billion nucleotides; the X chromosome has 156 million, and the Y chromosome has 62 million. From here, we can calculate the total storage space in your DNA. Remember, each nucleotide has 4 options, corresponding to 2 bits. So if you’re female, your total storage space is:
(2×2875 + 2×156) million nucleotides
× 2 bits / nucleotide
= 12.12 billion bits
= 1.51 GB.
If you’re male, the total storage space is:
(2×2875 + 156 + 62) million nucleotides
× 2 bits / nucleotide
= 11.94 billion bits
= 1.49 GB.
For comparison, a standard single-layer DVD can store 37.6 billion bits or 4.7 GB. The code for your body, magnificent as it is, takes up as much space as around 40 minutes of standard definition video.
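(If you want to fiddle with these numbers yourself, here’s the same arithmetic in a few lines of Python, using the chromosome lengths quoted above:)

autosomes = 2875e6  # nucleotides in one copy of chromosomes 1-22
chr_x = 156e6
chr_y = 62e6

female_nt = 2*autosomes + 2*chr_x
male_nt = 2*autosomes + chr_x + chr_y

for label, nt in [("female", female_nt), ("male", male_nt)]:
    bits = 2 * nt  # 4 possible nucleotides = 2 bits each
    print(label, bits / 1e9, "billion bits,", bits / 8 / 1e9, "GB")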
So in principle, your DNA could represent around 2^12,000,000,000 different patterns. But hold on. Given human common ancestry, the chromosome pair you got from your mother is almost identical to the one you got from your father. And even ignoring that, there are long sequences of nucleotides that are repeated over and over in your DNA, enough to make up a significant fraction of the total. It seems weird to count all this repeated stuff. So perhaps we want a more nuanced definition of “information.”
A string of 12 billion zeros is much longer than this article. But most people would (I hope) agree that this article contains more information than a string of 12 billion zeros. Why?
One of the fundamental ideas from information theory is to define information in terms of compression. Roughly speaking, the “information” in some string is the length of the shortest possible compressed representation of that string.
So how much can you compress DNA? Answers to this question are all over the place. Some people claim it can be compressed by more than 99 percent, while others claim the state of the art is only around 25 percent. This discrepancy is explained by different definitions of “compression”, which turn out to correspond to different notions of “information”.
Fun facts: Because of these deletions and insertions, different people have slightly different amounts of DNA. In fact, each of your chromosome pairs have DNA of slightly different lengths. When your body creates sperm/ova it uses a crazy machine to align the chromosomes in a sensible way so different sections can be woven together without creating nonsense. Also, those same measures of similarity would say that we’re around 96 percent identical with our closest living cousins, the bonobos and chimpanzees.
The fact that we share so much DNA is key to how some algorithms can compress DNA by more than 99 percent. They do this by first storing a reference genome, which includes all the DNA that’s shared by all people and perhaps the most common variants for regions of DNA where people differ. Then, for each individual person, these algorithms only store the differences from the reference genome. Because that reference only has to be stored once, it isn’t counted in the compressed representation.
That’s great if you want to cram as many of your friends’ genomes on a hard drive as possible. But it’s a strange definition to use if you want to measure the “information content of DNA”. It implies that any genomic content that doesn’t change between individuals isn’t important enough to count as “information”. However, we know from evolutionary biology that it’s often the most crucial DNA that changes the least precisely because it’s so important. Heritability tends to be lower for genes more closely related to reproduction.
The best compression without a reference seems to be around 25 percent. (I expect this number to rise a bit over time, as the newest methods use deep learning and research is ongoing.) That’s not a lot of compression. However, these algorithms are benchmarked in terms of how well they compress a genome that includes only one copy of each chromosome. Since your two chromosomes are almost identical (at least, ignoring the Y chromosome), I’d guess that you could represent the other half almost for free, meaning a compression rate of around 50 percent + ½ × 25 percent ≈ 62 percent.
So if you compress DNA using an algorithm with a reference genome, it can be compressed by more than 99 percent, down to less than 120 million bits. But if you compress it without a reference genome, the best you can do is 62 percent, meaning 4.6 billion bits.
Which of these is right? The answer is that either could be right. There are two different definitions of a “bit” in information theory that correspond to different types of compression.
In the Kolmogorov complexity definition, named after the remarkable Soviet mathematician Andrey Kolmogorov, a bit is a property of a particular string of 1s and 0s. The number of bits of information in the string is the length of the shortest computer program that would output that string.

In the Shannon information definition, named after the also-remarkable American polymath Claude Shannon, a bit is again a property of a particular sequence of 1s and 0s, but it’s only defined relative to some large pool of possible sequences. In this definition, if a given sequence has a probability p of occurring, then it contains n bits for whatever value of n satisfies 2ⁿ=1/p. Or, equivalently, n=-log₂ p.
The Kolmogorov complexity definition is clearly related to compression. But what about Shannon’s?
Well, say you have three beloved pet rabbits, Fluffles, Marmalade, and Sparklepuff. And say you have one picture of each of them, each 1 MB large when compressed. To keep me updated on how you’re feeling, you like to send me these same pictures over and over again, with different pets for different moods. You send a picture of Fluffles ½ the time, Marmalade ¼ of the time, and Sparklepuff ¼ of the time. (You only communicate in rabbit pictures, never with text or images.)
But then you decide to take off in a spacecraft, and your data rates go way up. Continuing the flow of pictures is crucial, so what’s the cheapest way to do that? The best thing would be that we agree that if you send me a 0, I should pull up the picture of Fluffles, while if you send 10 I should pull up Marmalade, and if you send 11, I should pull up Sparklepuff. This is unambiguous: If you send 0011100, that means Fluffles, then Fluffles again, then Sparklepuff, then Marmalade, then Fluffles one more time.
It all works out. The “code length” for Fluffles is the number n so that 2ⁿ=1/p:
pet | probability p | code | code length n | 2ⁿ | 1/p
---|---|---|---|---|---
Fluffles | ½ | 0 | 1 | 2 | 2
Marmalade | ¼ | 10 | 2 | 4 | 4
Sparklepuff | ¼ | 11 | 2 | 4 | 4
Intuitively, the idea is that if you want to send as few bits as possible over time, then you should give short codes to high-probability patterns and long codes to low-probability patterns. If you do this optimally (in the sense that you’ll send the fewest bits over time), it turns out that the best thing is to code a pattern with probability p with about n bits, where 2ⁿ=1/p. (In general, things don’t work out quite this nicely, but you get the idea.)
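(If you’d rather see that arithmetic as code than as formulas, here’s a tiny sketch in Python:)

import math

probs = {"Fluffles": 1/2, "Marmalade": 1/4, "Sparklepuff": 1/4}
for pet, p in probs.items():
    print(pet, -math.log2(p), "bits")  # code length n = -log2(p) = log2(1/p)

# the average number of bits sent per picture: 1.5
print(sum(-p * math.log2(p) for p in probs.values()))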
In the Fluffles scenario, the Kolmogorov complexity definition would say that each of the images contains 1 MB of information since that’s the smallest each image can be compressed. But under the Shannon information definition, the Fluffles image contains 1 bit of information, and the Marmalade and Sparklepuff images contain 2 bits. This is quite a difference!
Now, let’s return to DNA. There, the Kolmogorov complexity definition basically corresponds to the best possible compression algorithm without a reference. As we saw above, the best-known current algorithm can compress by 62 percent. So, under the Kolmogorov complexity definition, DNA contains at most 12 billion × (1-0.62) ≈ 4.6 billion bits of information.
Meanwhile, under the Shannon information definition, you can assume that the distribution of all human genomes is known. The information in your DNA only includes the bits needed to reconstruct your genome. That’s essentially the same as compressing with a reference. So, under the Shannon information definition, your DNA contains less than 12 billion × (1-0.99) ≈ 120 million bits of information.
While neither of these is “wrong” for DNA, I prefer the Kolmogorov complexity definition for its ability to best capture DNA that codes for features and functions shared by all humans. After all, if you’re trying to measure how much “information” our DNA carries from our evolutionary history, surely you want to include that which has been universally preserved.
At some point, your high-school biology teacher probably told you (or will tell you) this story about how life works:
First, your DNA gets transcribed into matching RNA.
Next, that RNA gets translated into protein.
Then the protein does Protein Stuff.
If things were that simple, we could easily calculate the information density of DNA just by looking at what fraction of your DNA ever becomes a protein (only around 1 percent). But it’s not that simple. The rest of your DNA does other important things, like regulating what proteins get made. Some of it seems to exist only for the purpose of copying itself. Some of it might do nothing, or it might do important things we don’t even know about yet.
So let me tell you that story again with slightly more detail:
In the beginning, your DNA is relaxing in the nucleus.
Some parts of your DNA, called promoters, are designed so that if certain proteins are nearby, they’ll stick to the DNA.
If that happens, then a hefty little enzyme called “RNA polymerase” will show up, crack open the two strands of DNA, and start transcribing the nucleotides on one side into “pre-messenger RNA” (pre-mRNA).
Eventually, for one of several reasons—none of which make any sense to me—the enzyme will decide it’s time to stop transcribing, and the pre-mRNA will detach and float off into the nucleus. At this point, it’s a few thousand or a few tens of thousands of nucleotides long.
Then, my personal favorite macromolecular complex, the “spliceosome”, grabs the pre-mRNA, cuts away most of it, and throws those parts away. The sections of DNA that code for the parts that are kept are called exons, while the sections that code for parts that are thrown away are called introns.
Next, another enzyme called “RNA guanylyltransferase” (we can’t all be beautiful) adds a “cap” to one end, and an enzyme called “poly(A) polymerase” adds a “tail” to the other end.
The pre-mRNA is now all grown up and has graduated to being regular mRNA. At this point, it is a few hundred or a few thousand nucleotides long.
Then, some proteins notice that the mRNA has a tail, grab it, and throw it out of the nucleus into the cytoplasm, where the noble ribosome lurks.
It’s thought that ~1 percent of your DNA is exons and ~24 percent is introns. What’s the rest of it doing?
Well, while the above dance is happening, other sections of DNA are “regulating” it. Enhancers are regions of DNA where a certain protein can bind and cause the DNA to physically bend so that some promoter somewhere else (typically within a million nucleotides) is more likely to get activated. Silencers do the opposite. Insulators block enhancers and silencers from influencing regions they shouldn’t influence.
While that might sound complicated, we’re just warming up. The same region of DNA can be both an intron and an enhancer and/or a silencer. That’s right, in the middle of the DNA that codes for some protein, evolution likes to put DNA that regulates some other, distant protein. When it’s not regulating, it gets transcribed into (probably useless) pre-RNA and then cut away and recycled by the spliceosome.
Telomeres shrink as we age. The body has mechanisms to re-lengthen them, but it mostly only uses these in stem cells and reproductive cells. Longevity folks are interested in activating these mechanisms in other tissues to fight aging, but this is risky since the body seems to intentionally limit telomere repair as a strategy to prevent cancer cells from growing out of control.
Further complicating this picture are many regions of DNA that code for RNA that’s never translated into a protein but still has some function. Some regions make tRNA, whose job is to bring amino acids to the ribosome. Other regions make rRNA, which bundle together with some proteins to become the ribosome. There’s siRNA, microRNA, and piRNA that screw around with the mRNA that gets produced. And there’s scaRNA, snoRNA, rRNA, lncRNA, and mrRNA. Many more types are sure to be defined in the future, partly because it’s hard to know for sure if DNA gets transcribed, partly because it’s hard to know what functions RNA might have, and partly because academics have strong incentives to invent ever-finer subcategories.
In more serious cases, these mutations might make the organism non-viable, or lead to problems like Tay-Sachs disease or Cystic fibrosis. But this wouldn’t be considered a pseudogene.
Why? Why is this all such a mess? Why is it so hard to say if a given section of DNA does anything useful?
Biologists hate “why” questions. We can’t re-run evolution, so how can we say “why” evolution did things the way it did? Better to focus on how biological systems actually work. This is probably wise. But since I’m not a biologist (or wise), I’ll give my theory: Cells work like this because DNA is under constant attack from mutations.
Mutations most commonly arise during cell replication. Your DNA is composed of around 250 billion atoms. Making a perfect copy of all those atoms is hard. Your body has amazing nanomachines with many redundant mechanisms to try to correct errors, and it’s estimated that the error rate is less than one per billion nucleotides. But with several billion nucleotides, mutations happen.
There are also environmental sources of mutations. Ultraviolet light has more energy than visible light. If it hits your skin, that energy can sort of knock atoms out of place. The same thing happens if you’re exposed to radiation. Certain chemicals, like formaldehyde, benzene, or asbestos, can also do this or can interfere with your body’s error correction tricks.
“DNA transposons” get cut out and stuck back in somewhere else, while “retrotransposons” create RNA that’s designed to get reverse-transcribed back into the DNA in another location. There are also “retroviruses” like HIV that contain RNA that they insert into the genome. Some people theorize that retrotransposons can evolve into retroviruses and vice-versa.
It’s rare for retrotransposons to actually succeed in making a copy of themselves. They seem to have only a 1 in 100,000 or 1 in 1,000,000 chance of copying themselves during cell division. But this is perhaps 10 times as high in the germ line, so the sperm from older men is more likely to contain such mutations.
Mutations in your regular cells will just affect you, but mutations in your sperm/eggs could affect all future generations. Evolution helps manage this through selection. Say you have 10 bad mutations, and I have 10 bad mutations, but those mutations are in different spots. If we have some babies together, some of them might get 13 bad mutations, but some might only get 7, and the latter babies are more likely to pass on their genes.
But as well as selection, cells seem designed to be extremely robust to these kinds of errors. Instead of just relying on selection, there are many redundant mechanisms to tolerate them without much issue.
And remember, evolution is a madman. If it decides to tolerate some mutation, everything else will be optimized against it. So even if a mutation is harmful at first, evolution may later find a way to make use of it.
So, in theory, how should we define the “information content” of DNA? I propose a definition I call the “phenotypic Kolmogorov complexity”. (This has surely been proposed by someone before, but I can’t find a reference, try as I might.) Roughly speaking, this is how short you could make DNA and still get a “human”.
The “phenotype” of an animal is just a fancy way of referring to its “observable physical characteristics and behaviors”. So this definition says, like Kolmogorov complexity, to try and find the shortest compressed representation of the DNA. But instead of needing to lead to the same DNA you have, it just needs to lead to an embryo that would look and behave like you do.
This definition isn’t totally precise, because I’m not saying how precisely the phenotype needs to match. Even if there’s some completely useless section of DNA and we remove it, that would make all your cells a tiny bit lighter. We need to tolerate some level of approximation. The idea is that it should be very close, but it’s hard to make this precise.
So what would this number be? My guess is that you could reduce the amount of DNA by at least 75 percent, but not by more than 98 percent, meaning the information content is:
12 billion bits
× (2 to 25 percent)
= 240 million to 3 billion bits
= 30 MB to 375 MB
But in reality, nobody knows. We still have no idea what (if anything) lots of DNA is doing, and we’re a long way from fully understanding how much it can be reduced. Probably, no one will know for a long time.
2025-05-05 08:00:00
In a recent post about trading stuff for money, I mentioned:
Europe had a [blood plasma] shortage of around 38%, which it met by importing plasma from paid donors in the United States, where blood products account for 2% of all exports by value.
The internet’s reaction was: “TWO PERCENT?” “TWO PERCENT OF U.S. EXPORTS ARE BLOOD!?”
Well, I took that 2% number from a 2024 article in the Economist:
Last year American blood-product exports accounted for 1.8% of the country’s total goods exports, up from just 0.5% a decade ago—and were worth $37bn. That makes blood the country’s ninth-largest goods export, ahead of coal and gold. All told, America now supplies 70% or so of the plasma used to make medicine.
I figured the Economist was trustworthy on matters of economics. But note:
The article doesn’t explain how they arrived at 1.8%. And since the Economist speaks in the voice of God (without bylines), I can’t corner and harass the actual journalist. I’d have liked to reverse-engineer their calculations, but this was impossible since the world hasn’t yet caught on that they should always show lots of digits.
So what’s the right number? In 2023, total US goods exports were $2,045 billion, almost exactly ⅔ of all exports, including services.
How much of that involves blood? Well, the government keeps statistics on trade based on an insanely detailed classification scheme. All goods get some number. For example, dirigibles fall under HTS 8801.90.0000:
Leg warmers fall under HTS 6406.99.1530:
So what about blood? Well, HTS 3002 is the category for:
Human blood; animal blood prepared for therapeutic, prophylactic or diagnostic uses; antisera and other blood fractions and modified immunological products, whether or not obtained by means of biotechnological processes; vaccines, toxins, cultures of micro-organisms (excluding yeasts) and similar products:
The total exports in this category in 2023 were $41.977 billion, or 2.05% of all goods exports. But that category includes many products that don’t require human blood, such as most vaccines.
To get the actual data, you need to go through a website maintained by the US Trade Commission. This website has good and bad aspects. On the one hand, it’s slow and clunky and confusing and often randomly fails to deliver any results. On the other hand, when you re-submit, it clears your query and then blocks you for submitting too many requests, which is nice.
But after a lot of tearing of hair, I got what seems to be the most detailed breakdown of that category available. There are some finer subcategories in the taxonomy, but they don’t seem to have any data.
So let’s go through those categories. To start, here are some that would seem to almost always contain human blood:
Category | Description | Exports ($) | Percentage of US goods exports |
---|---|---|---|
3002.12.00.10 | HUMAN BLOOD PLASMA | 5,959,103,120 | 0.2914% |
3002.12.00.20 | NORMAL HUMAN BLOOD SERA, WHETHER OR NOT FREEZE-DRIED | 38,992,251 | 0.0019% |
3002.12.00.30 | HUMAN IMMUNE BLOOD SERA | 5,608,090 | 0.0003% |
3002.12.00.90 | ANTISERA AND OTHER BLOOD FRACTIONS | 4,808,069,119 | 0.2351% |
3002.90.52.10 | WHOLE HUMAN BLOOD | 22,710,898 | 0.0011% |
TOTAL | (YES BLOOD) | 10,834,483,478 | 0.5298% |
Next, there are several categories that would seem to essentially never contain human blood:
Category | Description | Exports ($) | Percentage of US goods exports |
---|---|---|---|
3002.12.00.40 | FETAL BOVINE SERUM (FBS) | 146,026,727 | 0.0071% |
3002.42.00.00 | VACCINES FOR VETERINARY MEDICINE | 638,191,743 | 0.0312% |
3002.49.00.00 | VACCINES, TOXINS, CULTURES OF MICRO-ORGANISMS EXCLUDING YEASTS, AND SIMILAR PRODUCTS, NESOI | 1,630,036,341 | 0.0797% |
3002.59.00.00 | CELL CULTURES, WHETHER OR NOT MODIFIED, NESOI | 79,384,134 | 0.0039% |
3002.90.10.00 | FERMENTS | 361,418,233 | 0.0177% |
TOTAL | (NO BLOOD) | 2,869,107,296 | 0.1403% |
Finally, there are categories that include some products that might contain human blood:
Category | Description | Exports ($) | Percentage of US goods exports |
---|---|---|---|
3002.13.00.00 | IMMUNOLOGICAL PRODUCTS, UNMIXED, NOT PUT UP IN MEASURED DOSES OR IN FORMS OR PACKINGS FOR RETAIL SALE | 624,283,112 | 0.0305% |
3002.14.00.00 | IMMUNOLOGICAL PRODUCTS, MIXED, NOT PUT UP IN MEASURED DOSES OR IN FORMS OR PACKINGS FOR RETAIL SALE | 5,060,866,208 | 0.2475% |
3002.15.01.00 | IMMUNOLOGICAL PRODUCTS, PUT UP IN MEASURED DOSES OR IN FORMS OR PACKINGS FOR RETAIL SALE | 13,317,356,469 | 0.6512% |
3002.41.00.00 | VACCINES FOR HUMAN MEDICINE, NESOI | 7,760,695,744 | 0.3795% |
3002.51.00.00 | CELL THERAPY PRODUCTS | 595,963,010 | 0.0291% |
3002.90.52.50 | HUMAN BLOOD; ANIMAL BLOOD PREPARED FOR THERAPEUTIC, PROPHYLATIC OR DIAGNOSTIC USES; ANTISERA AND OTHER BLOOD FRACTIONS, ETC. NESOI | 914,348,561 | 0.0447% |
TOTAL | (MAYBE BLOOD) | 28,273,513,104 | 1.3826% |
The biggest contributor here is IMMUNOLOGICAL PRODUCTS (be they MIXED or UNMIXED, PUT UP or NOT PUT UP). The largest fraction of these is probably antibodies.
Antibodies are sometimes made from human blood. You may remember that in 2020, some organizations collected human blood from people who’d recovered from Covid to make antibodies. But it’s important to stress that this is quite rare. Human blood, after all, is expensive. So—because capitalism—whenever possible animals are used instead, often rabbits, goats, sheep, or humanized mice.
I can’t find any hard statistics on this. But I know several people who work in this industry. So I asked them to just guess what fraction might include human blood. Biologists don’t like numbers, so this took a lot of pleading, but my best estimate is 8%.
When looking at similar data a few years ago, Market Design suggested that immunoglobulin products might also fall under this category. But as far as I can tell this is not true. I looked up the tariff codes for a few immunoglobulin products, and they all seem to fall under 3002.90 (“HUMAN BLOOD; ANIMAL BLOOD PREPARED FOR THERAPEUTIC, PROPHYLATIC OR DIAGNOSTIC USES; ANTISERA AND OTHER BLOOD FRACTIONS, ETC. NESOI”).
What about vaccines or cell therapy products? These almost never contain human blood. But they are sometimes made by growing human cell lines, and sometimes those cell lines require human blood serum to grow. More pleading with the biologists produced a guess that this is true for 5% of vaccines and 80% of cell therapies.
Aside: Even if they do require blood serum, it’s somewhat debatable if they should count as “blood products”. How far down the supply chain does that classification apply? If I make cars, and one of my employees gets injured and needs a blood transfusion, are my cars now “blood products”?
Anyway, here’s my best guess for the percentage of products in this middle category that use human blood:
Category | Description | Needs blood (guess) | Exports ($) | Percentage of US goods exports |
---|---|---|---|---|
3002.13.00.00 | IMMUNOLOGICAL PRODUCTS, UNMIXED, NOT PUT UP IN MEASURED DOSES OR IN FORMS OR PACKINGS FOR RETAIL SALE | 8% | 49,942,648 | 0.0024% |
3002.14.00.00 | IMMUNOLOGICAL PRODUCTS, MIXED, NOT PUT UP IN MEASURED DOSES OR IN FORMS OR PACKINGS FOR RETAIL SALE | 8% | 404,869,296 | 0.0198% |
3002.15.01.00 | IMMUNOLOGICAL PRODUCTS, PUT UP IN MEASURED DOSES OR IN FORMS OR PACKINGS FOR RETAIL SALE | 8% | 1,065,388,517 | 0.0521% |
3002.41.00.00 | VACCINES FOR HUMAN MEDICINE, NESOI | 5% | 388,034,787 | 0.0190% |
3002.51.00.00 | CELL THERAPY PRODUCTS | 80% | 476,770,408 | 0.0233% |
3002.90.52 | HUMAN BLOOD; ANIMAL BLOOD PREPARED FOR THERAPEUTIC, PROPHYLATIC OR DIAGNOSTIC USES; ANTISERA AND OTHER BLOOD FRACTIONS, ETC. NESOI | 90% | 822,913,704 | 0.0402% |
TOTAL | (GUESSED BLOOD) | 3,207,919,363 | 0.1569% |
So 0.5298% of goods exports almost certainly use blood, and my best guess is that another 0.1569% of exports also include blood, for a total of 0.6867%.
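(If you want to check my arithmetic, here’s the bottom line in a few lines of Python, using the 2023 figures from the tables above:)

goods_exports = 2_045e9  # total 2023 US goods exports, $

definitely_blood = 10_834_483_478  # the (YES BLOOD) total above

# (MAYBE BLOOD) categories: export $ and my guessed fraction that needs human blood
maybe_blood = [
    (624_283_112, 0.08),     # 3002.13
    (5_060_866_208, 0.08),   # 3002.14
    (13_317_356_469, 0.08),  # 3002.15
    (7_760_695_744, 0.05),   # 3002.41
    (595_963_010, 0.80),     # 3002.51
    (914_348_561, 0.90),     # 3002.90.52
]
guessed_blood = sum(v * f for v, f in maybe_blood)

print(f"{definitely_blood / goods_exports:.4%}")                    # ~0.53%
print(f"{(definitely_blood + guessed_blood) / goods_exports:.4%}")  # ~0.69%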
Obviously, this is a rough cut. But I couldn’t find any other source that shows their work in any detail, so I hoped that by publishing this I could at least prod Cunningham’s law into action. Sorry for all the numbers.
2025-05-01 08:00:00
Examples are good. Let’s start with some examples:
We all need kidneys, or at least one kidney. Donating a kidney sucks, but having zero working kidneys really sucks. Paying people for kidneys would increase the number available, but it seems gross to pay people for part of their body. Donating a kidney is low-risk, but not zero risk. If you pay for kidneys, the extra kidneys tend to come from poorer people. So we don’t pay, and every day people die for lack of a kidney.
Except for Iran. Yes, in Iran you can legally buy or sell a kidney for a few thousand dollars. There is no waiting list for transplants, but most sellers seem driven by desperation and overall it doesn’t sound super awesome.
We all need a heart. Paying someone for their heart would mean paying for suicide. If we were to auction off hearts from organ donors, they would tend to go to rich people. People die every day from lack of a heart, but you don’t hear much about trading hearts for money.
Many people need blood plasma. For some people (me) donating blood plasma is a psychological nightmare. For other people it’s fine. Not getting plasma when you need it is very bad. Paying people for plasma means more plasma, mostly from low-income people. Much of Europe has long prohibited paying for plasma. Denmark and Italy met their needs with altruistic donors (Edit: Incorrect, Thanks to The Plasma Professor), but overall Europe had a shortage of around 38%, which it met by importing plasma from paid donors in the United States, where blood products account for ~~2%~~ 0.7% of all (goods) exports by value.
The EU recently legalized limited payments for blood donations. The French government opposed this change. The French government owns a company that runs paid plasma centers in the United States.
Some people want hair. Prohibiting people from selling their hair is stupid. You should be allowed to sell your hair.
We all need a liver. You can—amazingly—give away half your liver and re-grow the rest in a few months. This is pretty safe, but compared to donating a kidney is a more complex surgery with a longer recovery period and 3-10× the mortality risk.
Steve Jobs got pancreatic cancer in 2003. This was a rare form that often responds to treatment, but Jobs initially refused surgery and spent almost a year doing “alternative” treatments. Finally in 2004 he had surgery. In 2009, he had a liver transplant. This may have been needed as a consequence of Jobs’ decision to delay treatment in 2003. Tim Cook offered half his liver, but Jobs angrily refused. Most people in this situation would not have been eligible for a liver from the public donor registry, but Jobs was able to leverage his wealth and connections to both get classified as eligible and jump the queue. Jobs died two years later.
We all need food. Food that is healthier or tastier is often more expensive. Rich people get to eat more of it. Our for-profit food production system is really efficient and in rich countries the main problem is eating too much food.
We all need somewhere to live. Housing that is closer to high-paying jobs or larger/nicer is more expensive. Richer people get to live in nicer homes. The cost of housing means many people need to accept long commutes or live with lots of roommates or cities with worse job opportunities.
Buildings need roofs. In North America, roofs are most often made of asphalt shingles, which need to be replaced every 10-30 years. Roofing work is exhausting and miserable and dangerous. People would rather not do roofing. Roofing is well-paid given the qualifications. We have the technology to make roofs that last for 100 years, at a lower long-term cost. Nobody suggests making it illegal to pay people to do roofing.
Large pink diamonds are rare. Only rich people get to have large pink diamonds. This is fine.
If there’s a sudden shortage of fuel, then you can either ration or let prices go up. If you let prices go up, then rich people get to drive more, but if you need fuel to drive grandma to the hospital, you can buy some.
Cars need to park. If there’s a shortage of parking, you can either raise prices or let people fight for spots. If you raise prices, then rich people get to park more, but if you need to park next to the hospital to drop off grandma, you can do so. If you don’t raise prices, people drive around endlessly looking for spots, wasting energy, creating pollution, and slowing traffic.
We all want to buy goods and services. People sell these to us for money. They do that because they can use the money to buy other stuff they want. If money didn’t provide any advantage, they wouldn’t do that.
Many people want babies. The idea of auctioning off babies is gross. Nobody wants to auction off babies.
Many people want babies, but can’t biologically carry a baby to term. Carrying a baby to term is hard on your body and deeply personal. In much of the world, it’s illegal to have someone else do this for you. In most of the rest, it’s illegal to pay someone to do it. In a few places it’s legal to pay. (Contemplate this list: Arkansas, Belarus, California, Florida, Illinois, Kazakhstan, Maine, Nevada, New Hampshire, Russia, Ukraine, Vermont, Washington.) The people who purchase this service are usually richer than the women they buy it from. If you’re willing to pay a woman to be a surrogate, some third party might coerce her and steal the money. People who live in places where commercial surrogacy is illegal often buy it from places where it’s legal.
Most adults want sex. Some have difficulty accessing it. Paying for sex increases the supply of sex. Some people believe paid sex is degrading or has harmful cultural effects. If you’re willing to pay someone for sex, some third party might coerce them and steal the money. Paying for sex is illegal in most of the world. In places where it’s legal, organized brothels are often illegal. In a few places (Canada, France, Ireland, Norway, Sweden) it’s legal to sell sex but not buy it.
Sometimes on planes I think about offering the person in front of me some money to not recline their seat. I don’t do this because I’m pretty sure it would end with them either (A) refusing and thinking I’m a huge jerk or (B) doing it for free and thinking I’m a huge jerk.
Lots of people want to move to rich countries. Some rich countries let people in based on employment, some based on family, and some on “points”. If you auctioned off the right to move to a rich country, you’d get a mixture of people who (A) have lots of money, and (B) would economically benefit from moving. A few places—including arguably the United States—do this already.
Lots of people want their kids to get better grades. Lots of people pay for tutors or extra after-school education. You could directly pay your kids to get good grades. This seems strange and possibly bad, though I’m not sure why.
After working through these kinds of cases, I feel: Squishy.
I’m attracted to simple rules that can rise to tame the complexity of the real world. But the more I think about these cases, the less optimistic I feel about such rules.
Like every rationalist-adjacent blogger, I lean vaguely libertarian and consequentialist. (I wish I was more unique and interesting.) So I sometimes find myself thinking in high-handed slogans. Things like, “The government should not intrude in arrangements between consenting adults”, or “The right policy is whatever makes the outcome as good as possible.” I like how those words sound. But are they actually useful?
For example: Paid sex is not my thing. But there are some scenarios (e.g. people with certain disabilities) where prohibiting it seems downright cruel and providing this service downright noble.
On the other hand, when you talk about “arrangements between consenting adults”, it seems to call to mind a sort of theoretical idealized society. Like most people, I like to blithely imagine the Netherlands are such a society. After formally legalizing sex work in 2000, they’ve been creative and tenacious in trying to address organized crime and coercion. It sounds like it’s going OK, but not exactly great? I guess almost every other country has lower state capacity and would do somewhat worse.
Or take kidneys again. Say we had a total free market libertarian utopia/dystopia: If a rich person wants a kidney, they can go find a drug addict, hustle them into a clinic, get them to sign some forms, hand them some cash, and then take their kidney. That sounds gross. I’m not 100% confident I could win a debate arguing from first-principles that it’s grosser than our current system in which thousands of people die every year for lack of a kidney. But I’m not too worried about that, because it has zero chance of happening.
The Coalition to Modify the National Organ Transplant Act wants to pay people to donate kidneys. They suggest a months-long screening process that only the 10% of people at lowest risk would pass. Donors would get no money up front, but would get $10,000 per year when they file their taxes for the next five years. This seems less gross than the libertarian {u,dys}topia because people couldn’t donate if they were high risk, because there’s a long waiting period, and because the resulting kidneys would be given out according to the current (non-market) system based on need and potential benefit.
The Coalition also points out that lower-income people would benefit the most from extra kidneys, since rich people tend to have healthy friends and family who are willing and able to give a directed donation. They also point out that the lowest-income people are the least likely to qualify as low-risk donors. But common sense still says the extra donors you get by paying people will tend to be lower income.
I don’t love that. But I think it’s silly to look at the flaws of one system without comparing to the flaws of the alternatives. As far as I can tell, those are: (1) Do nothing and let thousands of people continue to die every year. (2) Pay rich people extra when they donate. (3) Force everyone to register for some kind of kidney donation “lottery”. (4) Reeducation campaigns. (5) Marxism. Maybe the Coalition’s proposal is the “worst system other than all the other systems”.
In both cases (paid sex and paid kidneys), rules and slogans are weak. The action is in the details.
What makes some things seem grosser than others? There seem to be many factors. Do some people need the stuff more than others? Will trading for money get the stuff to the people who need it more? Will money increase production? Do we want more production?
Here’s a case I find particularly confounding: Why does paying a surrogate mother seem not-that-bad (at worst), but auctioning off a baby seem horrific? Sure, surrogate mothers usually use genetic material from the clients, but even with an embryo from third parties, it still seems OK. Yet, if I buy an embryo and then pay a surrogate mother, haven’t I just bought a baby in advance? I can’t find any clear distinction, but I also can’t get myself to bite the bullet and say the two are equivalent.
But I do have one theory.
In terms of how gross it is to sell body parts like normal market products, I think everyone agrees the order is hair < blood ≪ kidney < liver ≪ heart.
I don’t think that order is controversial. The main way people differ is in terms of where they’d draw the line.
As you’ve surely surmised, I lean somewhere right of “kidney”. While this is a minority view in the world, I suspect it’s a majority view among people reading this. So I thought I should make the case for drawing the line near the left end of the spectrum.
Here goes: When I picture paying someone for a kidney, I picture someone who is healthy and hearty. They’re thriving in life and don’t need money, but they drive a Honda and they really want an Acura, so they sell a kidney and buy an Acura and live happily ever after. When I think of paid surrogates, I picture a woman who loves being pregnant so much she’d almost do it for fun.
Lovely. But the existing organ industry in Iran sounds grim. Many sellers seem motivated by extreme poverty and financial desperation.
If someone does something out of desperation, you can argue that—almost by definition—this means it helps them, and removing the option would hurt them.
But suppose that if everyone had their basic needs met, then almost no one would donate their kidneys for money. Then you can argue that paying for kidneys is a step in the wrong direction. We should be moving towards a society where no one is desperate and people donate out of altruism. Paying for donations calcifies the current systems and papers over our problems instead of correcting them.
I don’t really agree, because I like incremental progress and I’m allergic to anything that verges on “the worse the better”. But I see where it’s coming from.
2025-04-25 08:00:00
I don’t know how to internet, but I know you’re supposed to get into beefs. In the nearly five years this blog has existed, the closest I’ve come was once politely asking Slime Mold Time Mold, “Hello, would you like to have a beef?” They said, “That sounds great but we’re really busy right now, sorry.”
Beefing is a funny thing. Before we invented police, laws, and courts, gossip was the only method human beings had to enforce the social compact. So we’re naturally drawn to beefs. And, as I’ve written before, I believe that even with laws and courts, social punishment remains necessary in many circumstances. The legal system is designed to supplement social norms, not replace them.
Beefs tend to get a lot of attention. I like attention. I hope I get credit when the world inevitably turns against ultrasonic humidifiers. But I don’t really want attention for beefing. I don’t have a “mission” for this blog, but if I did, it would be to slightly increase the space in which people are calm and respectful and care about getting the facts right. I think we need more of this, and I’m worried that society is devolving into “trench warfare” where facts are just tools to be used when convenient for your political coalition, and everyone assumes everyone is distorting everything, all the time.
Nevertheless, I hereby beef with Crémieux.
[Screenshots]
That’s the start of a recent thread from Crémieux on the left, and sections from a post I wrote in 2022 on the right.
[The side-by-side screenshots continue.]
Is this plagiarism? I think so. And I don’t think it’s a close call. Here’s a common definition of plagiarism:
Presenting work or ideas from another source as your own, with or without consent of the original author, by incorporating it into your work without full acknowledgement.
And in particular:
Paraphrasing the work of others by altering a few words and changing their order, or by closely following the structure of their argument, is plagiarism if you do not give due acknowledgement to the author whose work you are using.
A passing reference to the original author in your own text may not be enough; you must ensure that you do not create the misleading impression that the paraphrased wording or the sequence of ideas are entirely your own.
Applying this definition requires some judgement. Crémieux took eleven (unattributed) screenshots from my posts, as well as the entire structure of ideas. But there was a link at the end. Would it be clear to readers where all the ideas came from? Is the link at the end “due acknowledgement”? I think few reasonable people would say yes.
There are also several phrases and sentences that are taken verbatim or almost verbatim. E.g. I wrote:
Aspartame is a weird synthetic molecule that’s 200 times sweeter than sucrose. Half of the world’s aspartame is made by Ajinomoto of Tokyo—the same company that first brought us MSG back in 1909.
And Crémieux wrote:
Aspartame is a sugary sweet synthetic molecule that’s 200 times sweeter than sucrose. More than half of the world’s supply comes from Ajinomoto of Tokyo, better known for bringing the world MSG.
This does not happen by accident. Crémieux seemed to understand this when former Harvard president Claudine Gay was accused of plagiarism. But I still consider this something of a technicality. It happens that Crémieux got sloppy and didn’t rephrase some stuff. But you could easily use AI to rephrase more and it would still be plagiarism.
I don’t understand twitter. Maybe this is normal there. But I understand rationalist-adjacent blogs.
If this was some random person, I’d probably let it go. But Crémieux presents as a member of my community. And inside that community, I feel comfortable saying this is Not Done. And if it is done, I expect an apology and a correction, rather than a long series of suspiciously practiced deflections.
I don’t expect this post will do much for my reputation. When I read it, I feel like I’m being a bit petty, and I should be spending my time on all the important things happening in the world. I think that’s what Crémieux is counting on: There’s no way to protest this behavior without hurting yourself in the process. But I’ve read Schelling, and I’m not going to play the game on that level.
I’d like to be known as a blogger with a quiet little community that calmly argues about control variables and GLP-1 trials and the hard problem of consciousness, not someone who whines about not getting enough credit. But I’ve decided to take the reputational hit, because norms are important, and if you care about something, you have to be willing to defend it.