2025-03-07 00:11:00
Reward and punishment is the lowest form of education.
— Zhuangzi
What is the value of a glass of water? Well, it has great value if you’re in the middle of the desert, but not much value at all on the shores of Lake Champlain.
What’s the value of the action, “put on a heavy down coat”? It has positive value if you find yourself in Saskatchewan on January 3rd, but negative value in the Arizona summer.
And what’s the value of taking an outdoor shower in cool water? This one has negative value in Saskatchewan and positive value in Phoenix.
Building a mind around notions of “value” quickly leads you into contradictions.
Let’s say we have a mouse in a cage. We’ve designed the cage to vary wildly in temperature, so by default, the mouse is uncomfortable.
But we’re not monsters. We’ve given the mouse some control over the temperature: two levers, a red one that raises the temperature a bit, and a blue one that lowers it. If the mouse can learn to operate this system, it will be able to maintain a reasonable temperature with little trouble.
How would a mouse do on this task, if God saw fit to grace it with a brain that runs on reward and punishment?
Well, let’s say that first the temperature got too high. The mouse tries the red lever. This makes things even hotter. Clearly the red lever is a punishment! The mouse assigns the red lever a value like [-1] or something. Next the mouse tries the blue lever. This makes the cage less hot. A reward! The blue lever gets a value like [+1].
Because it is rewarding, the mouse presses the blue lever until the cage is a comfortable temperature. Then what happens? That’s right, the mouse keeps pressing the lever! After all, the mouse is trying to seek rewards and avoid punishments, and the blue lever has always been rewarding in the past.
Soon the cage is too cold. Pressing the blue lever becomes a punishment [-1], since it only makes things colder. The mouse slowly updates the value of the blue lever until it reaches [0], at which point it stops pressing the lever.
Then what happens? Well, it doesn’t press the blue lever, because it has an expected value of [0]. And it doesn’t press the red lever either! After all, the red lever still has an expected value of [-1]. In all past experience, pressing the red lever always makes things “worse”.
This system of reward and punishment has left the mouse entirely confused. Its conclusion is that the blue lever has no value, and that the red lever is always a negative experience.
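If you want to see the mouse fail in silico, here is a toy version of the story. The cage, the levers, and every number below are invented for illustration; the only point is that a single scalar "value" per lever paints the mouse into a corner.

```python
# A toy version of the naive reward-learning mouse. The cage, the levers,
# and every number here are invented for illustration.

def press(lever, temp, comfort, value, alpha=0.5):
    """Press a lever, then score the press as reward (+1) or punishment (-1)."""
    delta = +1.0 if lever == "red" else -1.0          # red heats, blue cools
    before = abs(temp - comfort)
    temp += delta
    reward = 1.0 if abs(temp - comfort) < before else -1.0
    value[lever] += alpha * (reward - value[lever])   # one scalar per lever
    return temp

def run_naive_mouse(steps=100):
    temp, comfort = 30.0, 20.0                  # the cage starts too hot
    value = {"red": 0.0, "blue": 0.0}
    # As in the story: the mouse samples each lever once while it's hot.
    temp = press("red", temp, comfort, value)   # hotter -> "punishment"
    temp = press("blue", temp, comfort, value)  # cooler -> "reward"
    # Then it chases reward: press whichever lever has the highest value.
    for _ in range(steps):
        lever = max(value, key=value.get)
        if value[lever] < 0:                    # every option looks punishing,
            break                               # so the mouse freezes
        temp = press(lever, temp, comfort, value)
    return temp, value

temp, value = run_naive_mouse()
# The mouse overshoots to 19 degrees, too cold, then refuses both levers:
# red is still scored negative, and blue has just turned negative too.
```

Run it and the mouse ends up shivering below its comfort point with both levers scored as "bad", exactly the deadlock described above.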
You can try to solve this with awkward kludges, but most of them don’t work. For example, you might have it so that the mouse learns separate values for the levers in separate environments, the idea being that it will learn that the blue lever is rewarding when it’s too warm, and punishing when it’s too cold. But then the mouse will have to learn thousands of different values for each action in thousands of different environments — a separate value for the blue lever when it is sunny, overcast, breezy, when the mouse is bored, when the lab techs are talking too loud, etc.
Worse, the mouse will have no ability to generalize. If it learns that the blue lever is “punishing” when the cage is cold, it won’t be able to apply this knowledge outside that immediate situation. It will not learn to press the blue lever when the cage is too hot, because it has reduced the experience to an abstract number.
Much easier for the mouse to learn what the blue lever does: it lowers the temperature of the cage, which the mouse experiences as a decline in body temperature.
Is this a reward or a punishment? Neither. What is the value of this action? It has none. Value is absurd. Pushing the blue lever has specific rather than general consequences. It is simply a thing the mouse can do, and the mouse learns how the things it can do affect the things that interest it.
The mouse is naturally endowed with systems interested in its body temperature: at least two governors, one dedicated to keeping it from being too hot, the other keeping it from being too cold. The governors pay attention to things that might knock the mouse’s body temperature away from its set points, and actions that can set the body temperature right again. So the governors are very interested in these levers, and quickly learn their uses.
Both the (keep-mouse-from-getting-too) cold governor and the (keep-mouse-from-getting-too) hot governor track the mouse’s body temperature, though they defend different set points. When the mouse pulls the blue lever, there is a change in the mouse’s body temperature. Since both governors control that variable, both of them learn that the action of pulling the blue lever reduces the mouse’s body temperature. When the mouse pulls the red lever, both governors learn that the action of pulling the red lever increases the mouse’s body temperature.
The governors gain the same information, but they use it in different ways. The cold governor knows to vote for pulling the red lever when the mouse is below its target temperature, and to vote against pulling the blue lever when that would drive the mouse below its target temperature. This is implicit in its design. The hot governor knows to vote for pulling the blue lever when the mouse is above its target temperature, and to vote against pulling the red lever when that would drive the mouse above its target temperature.
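Here is the same setup sketched as two governors, again with every name and number invented for illustration. Both governors record the same fact about each lever; only their votes differ, because they defend different set points.

```python
# Two governors sketched in code. Both learn the same facts about the
# levers; their votes differ only because they defend different set points.
# The class, set points, and vote values are all illustrative assumptions.

class Governor:
    def __init__(self, kind, setpoint):
        self.kind = kind          # "cold" defends a floor, "hot" a ceiling
        self.setpoint = setpoint
        self.effects = {}         # learned effect of each action on body temp

    def observe(self, action, delta_temp):
        # Record what the action *does*, not whether it was "rewarding".
        self.effects[action] = delta_temp

    def vote(self, action, temp):
        effect = self.effects.get(action, 0.0)
        if self.kind == "cold" and temp < self.setpoint:
            return +1 if effect > 0 else -1   # too cold: favor warming actions
        if self.kind == "hot" and temp > self.setpoint:
            return +1 if effect < 0 else -1   # too hot: favor cooling actions
        return 0                              # no error, no opinion

cold = Governor("cold", setpoint=36.0)
hot = Governor("hot", setpoint=38.0)
for gov in (cold, hot):
    gov.observe("red", +1.0)    # both learn: red raises body temperature
    gov.observe("blue", -1.0)   # both learn: blue lowers it

# At 35 degrees the mouse is too cold: red collects a vote, blue is opposed.
votes_red = cold.vote("red", 35.0) + hot.vote("red", 35.0)     # +1
votes_blue = cold.vote("blue", 35.0) + hot.vote("blue", 35.0)  # -1
```

The same learned facts now serve both governors: flip the temperature to 39 degrees and the vote totals flip with it, no relearning required.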
From this analysis, we think that the mind doesn’t represent “value” at all (compare: The case against economic values in the orbitofrontal cortex (or anywhere else in the brain)). Instead, governors track how actions affect their error signals.
Each governor learns on its own, and keeps some kind of record of what actions increase or decrease the variables it cares about, and by how much. This is itself a complicated process, and we don’t mean to discount it. But governors clearly learn how actions change the world, not whether or not they are “valuable”. There is no reward and no punishment.
Some experiences are consistently “punishing”, like getting hit in the face by a 2×4. But this is incidental: it’s only because the pain governor has strong opinions about damage to the face — the opinion that this variable should always be kept very close to zero. So the pain governor will always oppose such hardware-related incidents.
And in fact, even this is not always punishing. If you are born without a pain governor, or your pain governor is temporarily turned off (by drugs, for example), then getting hit in the face by a 2×4 is no longer “punishing”. More like a gentle romp.
And there is nothing at all that is always “rewarding”. Your first donut after a long day at work will be rewarding, but by the 10th donut you will start to find donuts “punishing”. By the 100th donut, anyone would find additional donuts excruciating (ok, almost anyone).
Even with that said, there are still a lot of open questions. It may be, for example, that governors learn more quickly when an action actually corrects their error, compared to when they observe it in a neutral situation.
Imagine it’s 20 °F outside and you go and stand near a campfire. Up to that point you were pretty cold, so your cold governor had a pretty big error. When you walk over to the campfire, your cold governor will be very interested — it will notice that standing near a campfire is a good way to warm up.
But what will your hot governor learn? Hopefully it will learn something. After all, standing near a campfire affects the variable it cares about, body temperature. It would be good for the hot governor to remember this, so it can avoid standing near campfires in the future when it’s hot out. But in this moment, the hot governor’s error is zero. So it’s possible that the hot governor doesn’t learn such a strong lesson about the effect of campfires as the cold governor did.
If some day it is 98 °F outside, and there’s a campfire, will the hot governor remember what it learned? At 98 °F, you are too hot, the hot governor has an error. Will it remember that standing near the campfire will increase your body temperature, and so will increase its error? Or will it have to learn that lesson all over again, because last time you encountered a campfire, it was sleeping, because it had no error.
Similarly, we don’t know if a governor will learn more when its error is bigger. But it seems plausible. If it is 78 °F and you go stand near a campfire, that will increase your hot governor’s error from small to medium, and it will remember that. But if it is 98 °F and you go stand near a campfire, that will increase your hot governor’s error from large to extra large! It seems possible that the hot governor will remember that even more, that increasing an error will be remembered more seriously when the error is already somewhat large.
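This speculation can at least be written down. Below is one made-up rule where the effective learning rate grows with the governor's current error; the formula and every number are assumptions, not claims about real brains.

```python
# A made-up rule for error-scaled learning: the effective learning rate
# grows with the governor's current error. The formula and all numbers
# here are assumptions, written down only to make the speculation concrete.

def update_effect(estimate, observed, error, base_alpha=0.2, scale=0.1):
    """Update a governor's estimate of an action's effect on its variable."""
    alpha = min(1.0, base_alpha * (1.0 + scale * abs(error)))
    return estimate + alpha * (observed - estimate)

# Same observation (the campfire adds about 5 degrees), different errors.
calm = update_effect(0.0, 5.0, error=0.0)      # no error: baseline update
alarmed = update_effect(0.0, 5.0, error=20.0)  # large error: bigger update
```

Under this rule, the hot governor that is already suffering takes a much larger step toward the truth about campfires than the one whose error is zero.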
We probably won’t have to invent the exact rules that run inside a mouse’s head when it’s learning to manage all those levers. Our guess is that many of these algorithms have already been discovered, in past research on reinforcement learning.
A complete recap of reinforcement learning is beyond the scope of this book, but we can give you a rough sense, and suggest the few tweaks it might need to fit into our new paradigm.
There are many kinds of reinforcement learning algorithms, but the difference between them isn’t our current focus. For today we’ll use Q-learning as our example, a model-free algorithm that uses this update function:

Q_new(S, A) = Q(S, A) + α [R_t+1 + γ max_a Q(S_t+1, a) – Q(S, A)]
Q-learning works by keeping track of the value of different actions A the agent can take in states S. The function Q(S, A) gives us the value of taking action A in state S.
This update function describes how the value representation of Q(S, A), the current estimate of the value of choosing action A in state S, changes in the light of new evidence.
The core of the equation is very simple. The new Q(S, A) is equal to the old Q(S, A) plus some change (for now ERROR) times a learning rate:
Q_new(S, A) = Q(S, A) + α [ERROR]
The learning rate α is a parameter between 0 and 1 that controls how much the Q-value is updated in each step. If the learning rate is higher, then more weight is given to new information and the mouse learns faster. But it might learn too much from the most recent example, and ignore past experience. If the learning rate is lower, then less weight is given to new information, the mouse learns slower, and updates are more conservative.
Hopefully that makes sense so far. You update the new value based on the old value, adjusting it by some amount, tuned by a parameter that controls whether the update is fast or slow. But what about this error? Let’s break it down.
R_t+1 is the immediate reward the animal just experienced after taking action A in state S.
Next we see γ, the discount factor. This is another parameter between 0 and 1, and it controls how much future rewards are valued compared to immediate rewards. If γ is close to 1, the agent weighs long-term rewards heavily; if it’s close to 0, it focuses mainly on immediate rewards.
The next term, max_a Q(S_t+1, a), is a little trickier but not actually that bad. It looks ahead to the next step (t+1, where t stands for time), to the state we’ll be in next, and estimates the maximum value among the possible actions available in that state S_t+1. So this term represents the agent’s best estimate of the value of future actions from the next state onward. This is important because if an action puts us in situations that lead to future rewards, we should learn that the action is rewarding even if it doesn’t lead to a reward directly; it sets us up for success, which is nearly as good.
Finally, the current estimate Q(S, A) is subtracted, because what we want to know here is how far the current reward plus expected future rewards is from the existing estimate of the value of this action.
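Put together, the whole update is only a few lines of code. This is ordinary tabular Q-learning; the state and action labels below are just made-up examples.

```python
# The whole update in code. This is ordinary tabular Q-learning; the state
# and action labels are just made-up examples.

from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.2, gamma=0.9):
    """Q(S,A) <- Q(S,A) + alpha * [R + gamma * max_a Q(S',a) - Q(S,A)]"""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    error = reward + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * error
    return Q[(s, a)]

Q = defaultdict(float)   # unknown (state, action) pairs start at value 0
q_update(Q, "too-hot", "blue", reward=1.0, s_next="comfortable",
         actions=["red", "blue"])
```

With everything starting at zero, this first update moves Q("too-hot", "blue") a fraction α of the way toward the observed reward.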
Let’s take a few perverse examples that will make this equation transparent. To keep things simple, we’ll assume that the discount factor is exactly 1.
Let’s start by considering a situation where we have already learned the correct value. The expected value of action A is 10, and we’ll see that that is perfectly correct. When we take action A in state S, we get a reward of 8, and that puts us in a new state S_t+1 with a maximum expected value of 2. This was all anticipated, so the existing value of Q(S, A) is 10:
NEW = 10 + α(8 + 2 – 10)
This gives:
NEW = 10 + α(0)
So we see that the weight of α doesn’t matter, because the error was zero, and anything multiplied by zero is zero. The organism was entirely correct in its expectations, so there will be no update at all. The reward from this action in this state (including anticipated future rewards) was 10, the old estimate was 10. The new value will be 10, the same as the old value.
But let’s say the estimate is off, and the mouse expected a value of 7 from the action A in state S. Then the function is:
NEW = 7 + α(8 + 2 – 7)
Now the learning rate matters. If the learning rate is 1, then the new estimate of this action in this state will be changed to the exact value of the most recent experience:
NEW = 7 + 1(8 + 2 – 7)
10 = 7 + 1(3)
But this is probably a mistake. It erases all the past experience. Maybe this was an unusually good time in state S to take action A, and we shouldn’t take this good outcome as representative. So instead we can use a more modest learning rate like 0.2:
NEW = 7 + 0.2(8 + 2 – 7)
7.6 = 7 + 0.2(3)
The only change from adding the discount factor back in is that the mouse doesn’t count the full value of the best possible future rewards. They’re only possible — a cheese in the hand and all that. Here’s the same situation with a discount factor of 0.9:
NEW = 7 + 0.2(8 + 0.9(2) – 7)
NEW = 7 + 0.2(8 + 1.8 – 7)
7.56 = 7 + 0.2(2.8)
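You can check all of this arithmetic in a few lines. Here α is the learning rate and γ the discount factor; the numbers come straight from the examples above.

```python
# The worked examples above, checked in code. alpha is the learning rate,
# gamma is the discount factor; the numbers come straight from the text.

def new_q(old_q, reward, max_next, alpha, gamma=1.0):
    return old_q + alpha * (reward + gamma * max_next - old_q)

assert new_q(10, 8, 2, alpha=0.5) == 10                 # zero error, no update
assert new_q(7, 8, 2, alpha=1.0) == 10                  # full step to the sample
assert abs(new_q(7, 8, 2, alpha=0.2) - 7.6) < 1e-9     # modest learning rate
assert abs(new_q(7, 8, 2, alpha=0.2, gamma=0.9) - 7.56) < 1e-9
```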
In summary, Q-learning works by adjusting the Q-value of each action based on immediate rewards and estimated future rewards, gradually refining an estimate for the likely value of each action in each state.
It takes only a simple tweak to adapt this style of algorithm for cybernetics.
Reinforcement learning assumes that each agent has a single value function that it tries to maximize, and that all kinds of value are the same. In this perspective, 2 points from hot chocolate is the same as 2 points worth of high fives.
The cybernetic paradigm rejects that — abstract “rewards” don’t exist. Instead, governors track specific changes. So in this case, an algorithm like Q-learning is running inside each of the governors. The keep-mouse-from-getting-too cold governor is keeping track of what different actions in different states do to its error signal. The keep-mouse-from-getting-too hot governor is keeping track of what different actions in different states do to its error signal.
Each of the governors has its own ledger of the effect of different possible actions, and is keeping track of how each of these actions influences the signal(s) it cares about. Then all the governors get together and vote for their favorite action(s).
2025-02-28 00:11:00
Human nature is not a machine to be built after a model, and set to do exactly the work prescribed for it, but a tree, which requires to grow and develop itself on all sides, according to the tendency of the inward forces which make it a living thing.
—John Stuart Mill
The cybernetic paradigm gives you a theory of personality for free.
There are lots of governors in your mind, and some governors are stronger than others. Other things being equal, a stronger governor has more influence over your actions than a weaker governor. It gets more votes and has more of a say when it comes time for your governors to decide what to do.
Someone with an unusually strong hunger governor will seek out food sooner and will spend more effort to get it than someone with an especially weak hunger governor.
Someone with an especially strong status governor will be especially sensitive to changes in their status, and will invest lots of time and effort into status games. Someone with an especially weak status governor will appear almost blind to status, and it will hardly ever influence their behavior.
This provides the cybernetic paradigm’s theory of personality. People differ in many ways, but a particularly important way they can differ is in the strength of each of their different governors/emotions. In the cybernetic paradigm, differences between people are differences between parameters like the setpoints, strength, and sensitivity of their different governors.
To say that one person is more extraverted than another is to say either that their setpoint for social interaction is higher, that they defend the setpoint more aggressively, or that they’re more sensitive to disturbances away from that setpoint. To say that someone is brazen is to suggest that their shame governor is weaker than normal. To say that they are humble says something about the governor that pays attention to status.
Let’s break this down a little further.
First: People can have different setpoints for the same governor. We don’t know what units danger is measured in, but if one person has a danger set point of 5 units and another person has a danger set point of 10 units, the first person will keep themselves much safer than the second person. They will avoid situations where they feel that danger is above 5 units, while the other person won’t be sensitive, won’t feel any fear, until the danger is much higher.
That said, we actually don’t think that most personality differences are differences in setpoints, because the setpoints we know about are pretty similar across different people. Most people defend very similar setpoints for body temperature (about 98.6 °F), very similar setpoints for plasma osmolality (about 280 mOsm/L), very similar setpoints for serum potassium (about 4 mmol/L).
But there are certainly some exceptions. People can defend very different body weights, making some people extremely lean and others extremely obese. And set points can change, so they’re sometimes different even within one person. A fever is a short-term change in the body temperature set point(s). Obesity is a long-term change in the body weight set point(s).
Finally, even if people do defend very similar setpoints across the board, there will always be small differences between their setpoints, which will lead to some differences in personality.
Second: People’s governors can be stronger or weaker when it comes time to negotiate with other governors. When two governors disagree, which one wins?
Mark’s anger governor is especially strong, and gets many more votes than the other governors. So when anger goes up against anything else, it almost always wins. Mark has anger-control issues.
Julie’s fatigue governor is especially weak, and gets many fewer votes than the other governors. So when fatigue goes up against anything else, it almost always loses. Julie often stays up until she is very tired, doing all sorts of activities until she practically collapses. She barely seems aware that she’s tired. Even when she lies down, she often has a very hard time falling asleep. If there’s anything else she has in mind, her fatigue is not strong enough to keep her from thinking of it, then getting up and doing it.
You can describe this in terms of each governor having a different weight, with a weight of 1 meaning average strength. If one of your governors has a weight of 1, then that drive is as strong for you as it is for the average person. Weights above 1 mean the governor is stronger than normal; weights below 1 mean it’s weaker.
If you are born with the weight on your fear governor set to 2, your experience of fear is twice as powerful as normal, it has something like twice the influence over your actions. This makes you very cowardly, since your fear becomes overpowering in situations that other people would find mildly concerning. After all, it has twice as many votes as usual!
If you are born with the weight on your fear governor set to 0.5, your experience of fear is half as powerful as normal, it gets half as many votes as it would normally. This makes you very brave. In situations that other people would find terrifying, your fear barely has enough votes to call a motion.
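Here is a toy version of weighted voting. The governors, actions, and vote counts are all invented; the point is just that scaling one governor's votes flips the outcome.

```python
# Toy weighted voting: each governor's votes are scaled by its weight,
# with 1.0 meaning average. Governors, actions, and counts are invented.

def decide(action_votes, weights):
    """Tally weighted votes per action and return the winner."""
    totals = {
        action: sum(weights.get(gov, 1.0) * v for gov, v in votes.items())
        for action, votes in action_votes.items()
    }
    return max(totals, key=totals.get)

# "Flee" is backed by fear, "stand ground" by status, with equal raw votes.
votes = {"flee": {"fear": 3.0}, "stand ground": {"status": 3.0}}

coward = decide(votes, {"fear": 2.0})   # doubled fear: flee wins, 6 to 3
hero = decide(votes, {"fear": 0.5})     # halved fear: stand wins, 3 to 1.5
```

Same situation, same raw votes; only the weight on the fear governor differs, and the behavior flips.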
Third: People’s governors can be more or less sensitive to disturbances. By analogy, a thermostat might have a narrow or a wide acceptable range around the target temperature. Strict sensitivity would mean frequent corrections as soon as the temperature drifts even 0.1 °F away from the set point, while a looser control system would allow more drift before reacting, with control not kicking in until the temperature is 2-3 °F off target.
This is a natural tradeoff. Strict/aggressive control means you spend more energy, reacting even to small changes and adjusting constantly, but it also means you stay very close to the set point. Loose/sluggish control means you spend more time out of alignment but you also save a lot of energy on not making all these neurotic adjustments. Some things really do need to be kept right at the set point, but other things can be allowed to wander a bit.
We think these three kinds of differences are probably important. But just to show that this isn’t an exhaustive list, here are two more ways that people’s governors might be different.
For example, an important parameter in control systems is gain. A sluggish system applies weak corrections (low gain), meaning it takes longer to reach the target. An aggressive system cranks up corrections harder (high gain), leading to faster changes, but possibly overshooting.
So some governors respond to an error with a big correction all at once, while other governors respond to an error of the same size with many small, incremental corrections. This might look like a personality difference of overreacting or underreacting.
This isn’t the same as sensitivity to disturbances. For example, Julie has a cleanliness governor with low sensitivity and high gain. She lets her apartment get pretty dirty (because of the low sensitivity), but once it’s a certain level of mess, she cleans it all at once, back to a high level of cleanliness (high gain).
Mark also has a cleanliness governor with low sensitivity, but his has low gain. He also lets his apartment get pretty dirty (because of the low sensitivity), but once it’s a certain level of mess, he slowly cleans it bit by bit until it doesn’t bother him anymore (low gain).
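Julie's and Mark's cleanliness governors can be sketched as a trigger threshold (sensitivity) plus a correction size (gain). All the numbers below are made up.

```python
# Julie's and Mark's cleanliness governors as a trigger threshold
# (sensitivity) plus a correction size (gain). All numbers are made up.

def cleaning_spree(mess, trigger, gain, target=0.0):
    """Return the mess level after each cleaning action, once triggered."""
    levels = []
    if mess < trigger:
        return levels          # below the threshold, the governor is quiet
    while mess > target:
        mess = max(target, mess - gain)
        levels.append(mess)
    return levels

# Same low sensitivity (trigger=8), different gain.
julie = cleaning_spree(mess=10.0, trigger=8.0, gain=10.0)  # one big cleanup
mark = cleaning_spree(mess=10.0, trigger=8.0, gain=2.0)    # many small ones
```

Both apartments end up clean, but Julie gets there in one sweep while Mark takes five small passes.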
A related idea is damping. Some thermostats have a built-in “wait time” after making a correction, which helps prevent the temperature from swinging wildly. If our governors have some kind of damping, this might also vary between people.
With a fear governor set to low damping, you would respond very quickly to danger, but might sometimes freak out over nothing. It might even look like an extreme flinch response. With a fear governor set to high damping, you would respond very slowly and deliberately to new threats — good in some situations, but very bad in others!
All these parameters can combine in some interesting ways. Consider two people who have unusual sugar-governors, but unusual in different ways. Alice has a normal sugar setpoint, but her sugar-governor is unusually strong. Bob has a normal weight on his sugar-governor, but an unusually high sugar setpoint.
Alice’s sugar-governor gets more votes than other people’s. Since it tends to have the votes it needs, from the outside this looks like making sweet foods a priority. She always eats her sweets first. But if you kept a close measure of how much sugar she’s eating, you’d see that it’s actually the same amount as the average person, because her set point is the same.
Bob’s sugar-governor gets the normal amount of votes, but aims for a higher setpoint. For a given level of desire, Bob doesn’t prioritize sugar more than other people. But if you keep track over the long term, he does consume more sugar to reach that higher set point.
The upshot is that there are at least as many personality dimensions as there are emotions, and each of these personality dimensions are linked to the “settings” of a particular emotion.
As of this writing, the most widely accepted theory of personality is the “Big Five” personality traits.
This theory comes from statistical analysis. When you have people rate themselves and others on a wide variety of adjectives, and then apply various statistical techniques, you usually end up with five clusters of adjectives. Over time people settled on a set of labels for those clusters: openness, conscientiousness, extraversion, agreeableness, and neuroticism.
It’s not hard to see how these might map on to various emotions. For example, extraversion is probably a rough measure of the strength of various social emotions.
But the Big Five has some problems as a theory. The first is fundamental — the Big Five are an abstraction, not a model. We all have a casual sense of what it means to be neurotic, and we know what kind of superficial behavior to expect from someone described with this word, but the theory says nothing about the mechanisms that cause someone to behave in a neurotic way. It caps out at recording that one measure is correlated with another. It can neither explain nor, in any meaningful way, predict. (For more about these problems, see The Prologue.)
In addition, the method psychologists used to come up with these five factors is limited.
The Big Five were discovered through a method called factor analysis, a statistical approach that searches for clusters of correlated variables and hypothesizes factors that might account for the patterns it finds. Psychologists collected large sets of descriptive adjectives like “friendly” and “bashful” and had people rate how well the adjectives applied to themselves or others. Then they used factor analysis to estimate how these ratings co-occurred. This usually gave a solution of five factors — five clusters of adjectives that tended to be highly correlated within the clusters.
But language doesn’t capture all of the true personality differences, or at least doesn’t capture all of them to the same degree.
There are some terms, like “salt tooth” and “sweet tooth”, which hint at recognition of the fact that in some people the salt-hunger governor is unusually strong, and in other people the sugar-hunger governor is unusually strong. But these terms aren’t as much a part of our language as dimensions like “does this person spend lots of time around other people” or “is this person reliable”, which come out into the factors of “extraversion” and “conscientiousness”.
This is for social-historical reasons — at the moment, our culture cares a lot about communicating whether or not a person is sociable and/or reliable, and cares very little about their preferences for sweet or salty foods. Compare this to how Ancient Greek and Latin both had lots of different words for different kinds of shields. In their culture, the kind of shield you used said a lot about where you fit in society, so they had terms to make these distinctions. But in our culture no one cares what kind of shield you use, so modern English does not.
Different times and cultures will have different priorities, and will want sets of words that help them describe variation in the drives they care about the most. There’s still variation in the drives they don’t care about as much, but since they don’t care about that variation, they won’t talk about it, so they won’t need any words for it.
The fear governor is real, and martial cultures of the past had many ways to talk about differences in how someone responds to fear. How you responded to fear was very relevant in these cultures, it came up a lot. But today we are safe most of the time and these differences rarely matter, so the words we’ve inherited from such times, like brave and cowardly, are too few to pull their own group in a factor analysis. (You could get more by adding archaic terms like dauntless, plucky, valiant, doughty, aweless, and orped, but these probably don’t go in the surveys.)
The Icelandic language, on the other hand, which has changed much less than English over the centuries, still retains several words for these concepts — huglaus, óframur, ragur, blauður, deigur, all these mean something like “fearful” or “cowardly”. And on the opposite side, Icelandic has about a dozen words for “brave”.
But even though English doesn’t give them dozens of adjectives apiece, emotions like cold, tiredness, needing to pee, etc. all have personality dimensions just the same. Some people are driven more by the need to keep warm, and some barely notice the cold. Some people are driven by their bed. For some people, when nature calls, you must answer.
The seven deadly sins are a bit judgy as a personality measure, but they had it a little better. Gluttony and sloth are clearly ways to talk about individual differences in things like hunger and tiredness. And lust is, if anything, one of the most notable personality dimensions. How could you possibly explain Aella’s personality without mentioning that she is much, much hornier than average? On the opposite side, having a weight on this governor near zero would lead to asexuality, so being asexual should also be understood as part of personality.
There are also some differences that are not linked to the emotions and drives, that don’t reflect the settings on different governors.
For example, people can also differ in the parameters of motivation we described in Part II, like the gate threshold, i.e. the minimum number of votes needed to make an action happen. If you have a higher gate threshold, you are more likely to just sit there and less likely to do anything; every action needs a larger number of votes just to activate. If you have a lower gate threshold, you are constantly jumping around; every time an action gets any votes, you do it. Similarly, to say that someone is decisive is to imply something about the parameters of their selector, not their governors.
One underrated individual difference is being a night owl versus being a morning lark (sometimes called your chronotype). The dimension is related to sleep, but doesn’t seem like a parameter of the drive for sleep (probably?). Instead it’s a tendency or preference for when sleep will occur.
Some people are certainly more curious than others. But curiosity may not be an emotion, because it doesn’t seem to be satisfying a drive to send a signal to some specific target.
Another difference is taste preference. Certainly some tastes, like those for salt or fat, are nutritive, necessary for survival, and therefore probably controlled by a governor. But some taste preferences may not come from the drives, they may just be variation. Chunky and creamy peanut butter have almost exactly the same nutritional profile, but some people prefer one to the other. The same goes for preferences for smells — there is probably not a lavender-smell governor, but some people still like the smell of lavender more than others.
If these preferences really are preferences, and aren’t attached to drives, we’ll be able to tell because they will not be exhausted like drives are. Even someone who likes salt very much will eventually eat enough salty food and will stop eating it for a while. Their salt drive will send its error signal to zero and then be satisfied. But someone who likes the smell of lavender shouldn’t get satisfied by it in the same way, their preference should be mostly constant.
The reason for these differences is the same as for any kind of differences: diversity. It’s not just random chance; it is by design, because: bees.
How do the bees decide how many of them should be fanning? … There’s no communication, but as the ventilation gets worse in the hive, more and more bees start fanning their wings. How would you design bees to solve this problem? You don’t want every bee fanning their wings 24/7 or they’re wasting time, but a nice ratio of ‘bees fanning’ to ‘bees not fanning’ that adapts in order to hit your ventilation criteria.
When Huber examined the fanning problem, he came up with an elegant theory. He suggested that bees are differentially sensitive to noxious smells. So as the noxious smells get worse, the sensitivity threshold of more and more bees is reached, and more of them begin fanning until ultimately the entire hive is fanning.
If everyone in your village has the same set point for danger, then as danger increases, for a long time no one takes any precautions, and then at some point everyone flips over and starts fortifying the town all at once. This is kind of a nuts way to do things.
It’s better to have some diversity. If there’s only a little danger, a small number of villagers are stockpiling food and reinforcing the town walls. As the danger increases, more and more villagers attend to the safety of the town. This is actually its own form of control system.
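This threshold scheme is easy to sketch. Here's a toy version in Python (all thresholds and numbers invented for illustration): a village where everyone has the same set point does nothing and then flips all at once, while a village with spread-out set points responds gradually as danger rises.

```python
# Toy sketch: graded response from diverse thresholds (numbers invented).
def responders(thresholds, danger):
    """Count how many individuals start taking precautions at this danger level."""
    return sum(1 for t in thresholds if danger >= t)

identical = [0.7] * 10                     # everyone flips at the same point
diverse = [i / 10 for i in range(1, 11)]   # set points spread from 0.1 to 1.0

for danger in (0.2, 0.5, 0.8):
    print(danger, responders(identical, danger), responders(diverse, danger))
# the identical village jumps from 0 responders straight to 10;
# the diverse village scales its response smoothly with the danger level
```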
The same thing goes for preferences. If everyone in your band of hunter-gatherers falls asleep exactly at dusk and rises at dawn, then you are all defenseless at the same time. But if some of you are morning larks and some of you are night owls, then someone is always awake to tend the fire and watch for saber-toothed tigers.
Now apply the same reasoning to taste and smell. If everyone in your town has identical tastes, then they will all eat pretty much the same food; if that food becomes rotten, everyone gets sick at once. Better to have variation in food preferences so you’re eating different things. Then if some food goes bad, only some of you get sick. Avoid a single point of failure.
To sum up, differences in the strength of different governors are a major part of personality, though not the only part. There are also various other individual differences, including simple preferences.
Academic psychologists claim they can’t find any clear mental differences between the sexes (mostly; for the nuanced version of things, see here). But here’s one: the huge and obvious differences in the desire to play certain kinds of video games.
About half of gamers are women. But a few genres are overwhelmingly played by men. In particular, men are much more interested in tactical shooters like ARMA 3, and in grand strategy paint-the-map games like Europa Universalis. These games are about violent competition and domination, so this pattern may point to the existence of something like a “need to dominate” emotion.
Looking closer, the experiences of shooters and strategy games are quite different, suggesting that there might actually be two separate dominance-related emotions that tend to be much stronger in men than in women. Let's consider these drives one at a time.
The experience of a tactical shooter is shooting people in the head; it’s about as close as you can get these days to crushing your enemies, seeing them driven before you, and hearing the lamentations of their women. You may be wondering whether people really have a drive for such a thing, especially if you don’t play tactical shooters. But there’s good evidence that many people do. As one example, the subreddit r/CombatFootage (TAKE CARE IN CLICKING, CONTAINS DISTURBING COMBAT FOOTAGE) has 1.7 million members. Top videos on the subreddit get thousands of likes and hundreds of comments. For comparison, r/vegan also has 1.7 million members. Some people really want to see this stuff.
In contrast, grand strategy games are abstract and bloodless, lovingly referred to as spreadsheet simulators. These don’t seem like they could be about personal, physical domination, since they don’t even simulate that. But they’re not pacifistic — they do a very good job simulating the experience of forcing other societies to make concessions, become your vassals, and so on.
Between the two genres, there’s plausibly one dominance emotion about personally thrashing your enemies, and another dominance emotion about being in charge of organizing the logistics of thrashing — something like social domination, or having your group dominate other groups.
We see something similar in the list of words known better by males than by females, and vice versa. Men are much more likely to know words like howitzer, katana, and bushido (not just military terms, but historical military terms) while women are much more likely to know words like peplum, chignon, and damask (fabric and hairdressing terms). The authors of this paper characterize the result as, “gender differences in interests (games, weapons, and technical matters for males; food, clothing, and flowers for females)”.
The list suggests that on average men tend to have stronger dominance emotions and women tend to have stronger decorative emotions, or perhaps hygienic emotions (in the sense that being properly dressed is hygiene).
We are of course talking about average differences. There are plenty of women with strong dominance emotions, and plenty of men with strong decorative emotions. (And women may in fact have higher tuning on a different set of dominance emotions.) But on average there seems to be some difference.
We don’t care about the cause — differences could be the result of socialization, of nature, or both. Or something else. But there do seem to be average personality differences between the sexes, which make perfect sense when you think of personality as differences in the strength of different governors.
It’s also worth considering if sex differences we think of as physiological might actually be psychological. Women typically feel colder than men — this might be biological, something to do with their body size or metabolic rate. But it could also be psychological, something to do with the set point or strength of their cold governor.
Like most biological attributes, the strength of our governors probably falls on a normal distribution. The majority of people will have a fairly usual weight on each governor. But in rare cases, weights will be set incredibly high or incredibly low.
Since we have no idea what the units are for “strength of a governor”, as before we will just say that 1 is the population average. Having a weight of 0.5 on a drive means it is half the strength of the population average, and having a weight of 2 on a drive means it is twice the strength of the population average.
If you set the weight on a governor to 0, we call this a “knockout”. It’s functionally equivalent to not having that drive at all, because when the weight on a governor is 0, the governor gets no votes.
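As a toy sketch (units and numbers invented, since we have no idea what the real units are), votes might simply scale with the weight on a governor, with a knockout contributing nothing no matter how large the error:

```python
# Toy sketch of governor weights (units and numbers invented).
# Weight 1.0 is the population average; weight 0.0 is a "knockout".

def governor_votes(weight, error):
    """Votes a governor casts, scaled by its weight on this person."""
    return weight * error

error = 8                           # some fixed amount of fear-provoking input
print(governor_votes(1.0, error))   # average person: 8 votes
print(governor_votes(2.0, error))   # twice-average weight: 16 votes
print(governor_votes(0.5, error))   # half-average weight: 4 votes
print(governor_votes(0.0, error))   # knockout: 0 votes, regardless of error
```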
For example, take Alex Honnold, sometimes called “the World’s Greatest Solo Climber”. Alex enjoys climbing sheer cliffs without a rope, an experience so terrifying that many people can’t even stand to watch the videos. When neuroscientists put Honnold through an fMRI and showed him terrifying and gruesome pictures, they found that his brain is intact — he does have an amygdala — but he has almost no fear response.
Whatever the exact biological issue might be — whether he was born that way, or if he’s somehow turned down the fear governor through training and exposure — Honnold appears to be someone with a fear knockout. The weight on his fear governor is set very close to zero.
In cybernetic psychology, a lot of psychiatric conditions look, in a literal sense, like personality disorders. Personality is largely made up of differences in the weights on a person’s various governors. Personality disorders occur when some of those weights are not merely different, but set extremely low or extremely high.
Consider fear. Most people are somewhat concerned about things some of the time. They have a weight on their fear governor around 1. If you set the weight on “fear” to 10, they will instead be very concerned about things lots of the time. That looks a lot like paranoia.
This is a good spot to point out that a cybernetic system has multiple parts and can be broken in many ways. Let’s take the fear governor as an example.
You can break the input function, so it perceives danger as being higher than it otherwise would. This will cause paranoia. You can change the fear governor’s set point to a very low level of danger, so it reacts to even very small amounts of danger. This will cause paranoia. You can damage the output function, so that it thinks that large interventions are appropriate for small amounts of danger. This will cause paranoia. Or you can change how many votes the fear governor gets in the parliament of the mind. Again, this will cause paranoia.
These changes may present slightly differently, but notice how even though these are four different problems with the fear governor, you end up seeing basically the same behavior in every case. Among other things, this makes diagnosis and treatment quite tricky. You have at least four disorders, with categorically different causes, yet nearly identical presentation.
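To make the four failure modes concrete, here's a toy fear governor in Python. Every parameter and number below is invented for illustration; the point is just that four categorically different broken knobs produce nearly the same outward behavior:

```python
# Toy fear governor (all parameters invented). Four separate "breakages"
# each produce the same presentation: lots of votes for precautions.

def fear_votes(danger, input_gain=1.0, set_point=5.0, output_gain=1.0, weight=1.0):
    perceived = input_gain * danger           # input function
    error = max(0.0, perceived - set_point)   # comparison against the set point
    response = output_gain * error            # output function
    return weight * response                  # votes in the parliament of the mind

danger = 6.0                                  # a mildly sketchy situation
print(fear_votes(danger))                     # healthy baseline: barely reacts
print(fear_votes(danger, input_gain=5.0))     # broken input: paranoia
print(fear_votes(danger, set_point=0.5))      # low set point: paranoia
print(fear_votes(danger, output_gain=10.0))   # broken output: paranoia
print(fear_votes(danger, weight=10.0))        # too many votes: paranoia
```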
This also offers a plausible model for conditions like autism and psychopathy. Both appear to be congenital abnormalities in various emotions — conditions that happen when you are born with a couple of your emotions unusually strong or weak.
“Autism” seems to be a label that we apply to people who have very low weights, or complete knockouts, on some of their social emotions.
“Psychopathy” seems to be a label that we apply to people who have very low weights or knockouts on a different set of social emotions, especially when combined with high weights on emotions like anger or need for dominance.
As you can tell from our hedging, we suspect these categories are poorly-formed. There probably isn’t “a disorder” that can be identified with autism. It’s just a word, an abstraction that we use to refer to various personality types that are similar in the sense that they have low weights on certain social emotions. (See the Prologue for more on this.)
Autism and psychopathy are often framed as deficiencies, but you can also see them as deficiencies in some things combined with superabundances in other things.
We tend to call people “psychopaths” not when they merely lack in fear or compassion, but when a lack of fear or compassion are combined with unusually strong drives for status and dominance.
People tend to be considered autistic not when they merely lack a drive for status, but when this is combined with unusually strong interest in social rules and an unusually strong drive for compassion. People get confused about this. You often hear things like, “people who are autistic don’t understand social conventions”. But actual people who are autistic seem to believe things like, “if you eat a non-prime number of chicken nuggets you’re breaking the rules”.
It’s not clear if these are specific “disorders”, or just the extremes of normal personality variation. Some people have stronger social emotions than others. When the weights on your social emotions are 0.7, nobody cares, you just seem kind of introverted. But when some of your weights are 0.5 or lower, maybe they start calling you autistic.
Same thing for psychopathy. The lower your social weights are, and the higher your aggression and dominance weights, the more likely people are to call you a psychopath. But there’s not a bright line. It’s more like height than blood type. Type O and type AB blood are categorically different, but there’s no objective point at which you become “tall” or “short”, those are relative.
2025-02-21 00:11:00
Inland Empire: What if *you* only appear as a large singular body, but are actually a congregation of tiny organisms working in unison?
Physical Instrument: Get out of here, dreamer! Don’t you think we’d know about it?
— Disco Elysium
When you’re hungry, you eat a sandwich. When you feel kind of gross, you take a shower. When you’re lonely, you hang out with friends.
But what about when you want to do all these things and more? Well, you have to pick. You have many different drives, but only one body. If you try to eat a hamburger, kiss a pretty girl, and sing a comic opera at the same time, there will be a traffic jam in the mouth. You will suffocate, or at least you will greatly embarrass yourself. Only a true libertine can eat a sandwich in the shower while hanging out with friends.
To handle this, you need some kind of system for motivation.
For starters, consider this passage from Stephan Guyenet’s The Hungry Brain:
How does the lamprey decide what to do? Within the lamprey basal ganglia lies a key structure called the striatum, which is the portion of the basal ganglia that receives most of the incoming signals from other parts of the brain. The striatum receives “bids” from other brain regions, each of which represents a specific action. A little piece of the lamprey’s brain is whispering “mate” to the striatum, while another piece is shouting “flee the predator” and so on. It would be a very bad idea for these movements to occur simultaneously – because a lamprey can’t do all of them at the same time – so to prevent simultaneous activation of many different movements, all these regions are held in check by powerful inhibitory connections from the basal ganglia. This means that the basal ganglia keep all behaviors in “off” mode by default. Only once a specific action’s bid has been selected do the basal ganglia turn off this inhibitory control, allowing the behavior to occur. You can think of the basal ganglia as a bouncer that chooses which behavior gets access to the muscles and turns away the rest. This fulfills the first key property of a selector: it must be able to pick one option and allow it access to the muscles.
The human mind, and the minds of most vertebrates, operates in essentially the same way.
Motivation and action are determined by the collective deliberation of multiple governors. Each governor is one of the control systems described in Part I — some governors for thirst, some for pain, some for fear, and so on. They come together and submit bids for different actions and vote on which action to take next.
Inside Out, Disco Elysium, Internal Family Systems, The Sims, etc. — we have a deep intuition that behavior is the result of a negotiation between inner forces that want different things. This keeps manifesting in pop culture, but academic psychology has mostly missed it.
The technical term for this problem is selection, so we’ll refer to this system as the selector. In a physical sense this process probably happens in the basal ganglia, but we’ll let someone else worry about the neuroscience. For now we just want to talk about the psychology.
We can't say exactly how the selector works; there are too many mysteries, too much work still to be done, too many possible lines of research. But here's some speculation about how we think it might work, which will sketch out some of the open questions.
Governors cast votes based on the strength of their error signal. The stronger the error, the more votes it gets. When you’re not at all thirsty, the thirst governor gets basically no votes, because it doesn’t need them. Other priorities are more important. But if you are very thirsty, the thirst governor gets lots of votes (or if you prefer, one very strong vote). If you are starving, your hunger governor gets plenty of votes so it can drive you to eat and become less hungry.
Governors vote for behaviors that they expect will decrease their errors. The thirst governor votes for actions like “find water” and “drink water”. Later, the have-to-pee governor votes for actions like, “find a bathroom”. The pain governor votes for things like “stop picking a fight with the lions, get the hell out of the lion enclosure.”
Governors can also vote against behaviors that would increase their errors. It’s clear that the pain governor can vote against touching a hot stove, even if pain is currently at zero. You don’t have to wait until you burn your hand for your pain governor to realize this will be a bad idea.
This is because governors are predictive. If something is hurting you, the pain governor will vote for you to stop doing that, to avoid the thing that is causing you pain, to withdraw. But you don’t have to be in pain for the pain governor to influence your actions. As behaviors come up for a vote, the pain governor looks at each of them and tries to predict if they will increase its error, that is, if they will cause you pain. If it thinks some behavior will increase its error, the pain governor votes against that behavior.
So we see that governors don’t only get votes based on their current error signals — they also have the power to vote against behaviors they anticipate will increase their error. Maybe governors cast votes not based on the current strength of their error signal, but based on the predicted change in their error if the action were to be carried out. In this way when hunger is high, the hunger governor gets votes for “eat ham sandwich” because this is predicted to correct the error. And even when pain is zero, the pain governor still gets votes against “touch the electric fence” because touching the fence is predicted to increase its error. This would also fit most observed behavior.
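Here's a toy sketch of this predicted-change scheme (all predictions and magnitudes invented): each governor casts votes equal to how much it expects an action to reduce its error, and casts votes against actions it expects to make things worse.

```python
# Toy sketch: governors vote on the PREDICTED change in their error
# (all predictions invented). A negative predicted change earns votes for
# the action; a positive predicted change earns votes against it.

predictions = {
    # action: predicted change in each governor's error if taken
    "eat ham sandwich":     {"hunger": -7, "pain": 0},
    "touch electric fence": {"hunger": 0,  "pain": +9},
    "sit quietly":          {"hunger": 0,  "pain": 0},
}

def tally(action):
    """Net votes: positive means net support, negative means net opposition."""
    return -sum(predictions[action].values())

for action in predictions:
    print(action, tally(action))
# eating wins votes even though no error has changed yet;
# the fence is voted down even while pain is currently zero
```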
Wherever votes come from, the governors need to allocate their votes, so there’s some procedure for this as well. One simple way to do things is for governors to propose behaviors and submit bids on those behaviors to the selector, and the strongest bid wins. If this is how it works, then each governor is supporting only one behavior at a given time.
This seems unlikely. We think it’s more likely that governors support many possible behaviors at once — just like how legislators in a real congress support many possible policies at once.
Actions that happen all the time are so common because they are popular with lots of governors. For example, the “eat a hamburger” action captures the votes of basically the whole hunger voting bloc — salt-hunger, fat-hunger, calorie-hunger, et cetera. Many different hungers will vote for this hamburger. No one dares to vote against the hamburger policy, except maybe the shame governor, if you’ve been taught that hamburgers are sinful or something.
It’s also not clear whether votes are conserved. If the hunger governor has 100 votes and you give it 50 options, can it only give each option 2 votes? Is this why no one can agree what they want for dinner? Or can it put all 100 votes towards every option that it likes?
Some governors may get more votes than others. You can imagine why the governor in charge of keeping you breathing might get extra votes — it has a very important job and it can’t wait to build a coalition. The same thing goes for governors like fear and pain. When you’re in serious danger, they always have the votes they need.
Our assumption so far is that the relationship between error signal and votes is linear. But certain governors, controlling things that are critical to your survival, may get more votes for the same amount of error signal — there may be different curves. This is how The Sims did it. If this is the case, it should be possible to discover the formula for votes as a function of error for each governor.
On the other hand, maybe the more critical governors just have stronger error signals than less-important governors. In any case, we should notice that things like suffocation and pain tend to get the votes they need, however that works out under the hood.
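As a toy illustration of different vote curves (the formulas here are entirely invented), a critical governor might earn votes superlinearly in its error, so it stays quiet most of the time but dominates the parliament when it matters:

```python
# Toy sketch: two possible vote curves (formulas invented).

def thirst_votes(error):
    return error          # linear: votes grow in proportion to error

def suffocation_votes(error):
    return error ** 2     # superlinear: quiet at small errors,
                          # overwhelming at large ones

for error in (1, 5, 10):
    print(error, thirst_votes(error), suffocation_votes(error))
# at error 1 the curves are equal; at error 10, suffocation
# has ten times the votes of thirst and always gets its way
```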
However votes are determined, the outcome is simple. Whatever action gets the most votes is the action you take next, assuming the action wins by a large enough margin.
This is not exactly a winner-take-all system. You can sometimes do more than one thing at once, the selector does try to account for multitasking — you can chew and drive at the same time, since your mouth and hands are not deadlocked. But you cannot e.g. both pee and stay in your clean, dry bed. Someone is going to have to win that vote.
An organism that can’t sit still and keeps doing stuff, even when it doesn’t need to, is wasting resources for no reason and putting itself in danger. Sometimes organisms do nothing at all, so our model of the selector needs to account for that.
We think it does that through a mechanism that recognizes votes below a certain threshold and reduces them to zero. In audio engineering, this is called a gate. An audio gate stops sounds below a certain volume from passing through, which is good for cutting out background noise and static. For more information, watch this Vox explainer or listen to some Phil Collins.
In the mental selector, the gate stops votes that are below some minimum threshold. If you are a tiny bit hungry, you shouldn’t bother leaving the house to get a meal, even if there is nothing better to do. Don’t go out and see people if you are only a tiny bit lonely.
An organism without a gate, or with a broken gate, will eat as soon as it is a tiny bit hungry, leave the house as soon as it is even a tiny bit lonely. It will constantly put on and take off its sweater to try to maintain a precise target temperature. But this is clearly not a good use of time or energy. Better to wait until you’re actually some minimum amount of hungry or lonely, before taking steps to correct things.
The gate may act on governors directly, preventing governors with very small error signals from voting at all. When you’re not in any danger, who cares what the fear governor thinks?
Or it could be that the gate acts on behaviors, and behaviors that get below some fixed number of votes are treated like they got zero votes instead. If no action gets a number of votes above the threshold, then no behavior occurs.
Also, it seems like an action only happens as long as it beats the next-highest action by some margin. It's not clear whether it needs to win by a certain number of votes ("action with the most votes happens as long as it has at least 20 more votes than the action with the second-most votes") or by some kind of fraction ("action with the most votes happens as long as the action with the second-most votes has no more than 90% of its count"), or if this is even a meaningful question given how our motivation system is designed. The important thing is that if "drink coffee" gets 151 votes and "run to catch the bus" gets 152 votes, you will stand there looking like an idiot and miss your bus. (cf. Buridan's ass)
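A toy selector with both mechanisms might look like this (gate and margin values invented): weak bids are silenced entirely, and the winner must beat the runner-up by a margin, or nothing happens at all.

```python
# Toy selector (thresholds invented): a gate zeroes weak bids, and the
# winner must beat the runner-up by a margin or no action occurs.

GATE = 20      # bids below this are silenced entirely
MARGIN = 10    # winner must beat the runner-up by at least this much

def select(bids):
    gated = {a: v for a, v in bids.items() if v >= GATE}
    if not gated:
        return None                  # nothing clears the gate: sit still
    ranked = sorted(gated, key=gated.get, reverse=True)
    if len(ranked) > 1 and gated[ranked[0]] - gated[ranked[1]] < MARGIN:
        return None                  # Buridan's ass: deadlocked, do nothing
    return ranked[0]

print(select({"get a snack": 5, "check phone": 12}))          # gated out
print(select({"drink coffee": 151, "catch the bus": 152}))    # deadlock
print(select({"flee the lion": 900, "finish sandwich": 40}))  # clear winner
```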
We designed this model of motivation without concerning ourselves at all with neuroscience, so one reason for optimism is that it is largely convergent with a model of the function of the basal ganglia developed in 1999, also inspired by cybernetics. This was “The Basal Ganglia: A Vertebrate Solution to the Selection Problem?” by Redgrave, Prescott, and Gurney.
So far we’ve been assuming that governors are the only things that drive behavior, the only things that ever get votes in the selector. But there may be exceptions.
Curiosity is an unusual case, kind of an enigma. It might be an emotion, but it’s a bit strange. It might be something else, some other kind of signal.
Like an emotion, curiosity seems to be able to drive behavior. We’ve all done things simply because we were curious. This suggests it might, like the other emotions, be the error signal of some kind of governor. And it seems to be able to compete with the other governors, because curiosity often wins out over concerns like sleep or even sex.
But in other ways, curiosity does not look like the other emotions. Unlike hunger or fear, it’s not obviously an error signal from a drive that keeps us alive. It’s not obviously connected to immediate survival in the way the other emotions are. A person who doesn’t sleep or breathe dies. A person who doesn’t feel shame is ostracized, and (in nature) soon dies. But a person who doesn’t act on their curiosity is just frustrated.
And unlike the other emotions, curiosity doesn’t seem to be easily satisfied. Acting on your fear should make you less afraid, acting on your thirst should make you less thirsty, but acting on your curiosity often seems to make you more curious.
We do have one suggestion of how curiosity might work. Let’s return to the idea that emotions are predictive. The fear governor not only knows that escaping the basement will reduce its error, it can also predict beforehand that entering the basement will increase its error. In general, governors have a model of the world which they use to predict how different behaviors will influence their errors.
Unlike the governors, which vote for behaviors that they predict will correct their errors, curiosity is a special drive that votes for behaviors the emotions have a hard time predicting. Actions can be ranked by how certain the governors are about their consequences. Curiosity, the most perverse, votes for actions that the other governors rate as having the greatest uncertainty.
This helps us learn about actions that the governors might otherwise ignore. It’s another way to encourage exploration. If you only act in response to emotions, then you lose the opportunity to learn about things that might be really important later. It’s a better long-term strategy to use your extra energy to try things that are probably safe, but where you aren’t sure what will happen. (See this paper for more on this kind of model.)
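A toy version of this scheme (all numbers invented): give curiosity a fixed budget of votes and let it spend them in proportion to how uncertain the other governors are about each action's consequences.

```python
# Toy sketch: curiosity votes for whatever the other governors are least
# sure about (votes and uncertainties invented, uncertainty on a 0-1 scale).

actions = {
    # action: (net votes from the governors, governors' uncertainty)
    "eat familiar berries":    (30, 0.1),
    "poke the weird mushroom": (0,  0.9),
    "nap in the usual spot":   (10, 0.05),
}

CURIOSITY_BUDGET = 40   # votes curiosity has to spend

def with_curiosity(actions):
    totals = {}
    for action, (votes, uncertainty) in actions.items():
        # curiosity's votes scale with how unpredictable the outcome is
        totals[action] = votes + CURIOSITY_BUDGET * uncertainty
    return max(totals, key=totals.get)

print(with_curiosity(actions))
# with a strong enough curiosity budget, the maximally uncertain
# action can outvote the safe, familiar ones
```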
You know who loves doing this? Toddlers. Toddlers love doing this. It may not be that children are more curious than adults, but simply that adults have learned more about the consequences of their actions and have fewer of these very uncertain behaviors to explore.
One of the mysteries of motivation is that sometimes, you want to do something and it's super easy to do. Why are some actions so effortless?
The answer is simple. When a behavior gets votes from a governor, it’s easy to do. Outside of clinical depression, you don’t have to drag yourself to a delicious meal, or to hear the new hot gossip. Popular emotions are throwing all their votes behind these actions, they are going to become policy.
Behaviors that don’t have a governor behind them are hard to do. Evolution didn’t include a governor for “write your term paper”, so this project tends to go pretty slowly, especially if it’s in competition with behaviors that do have governors voting for them, like “hang out with your friends”. Sometimes the term paper never happens.
The same thing goes for the big-picture aspirations people so often struggle with. Intellectually you might want to become a famous author, or learn Japanese, or memorize pi to 100 digits. But the sad truth is that no governor is willing to support these ideas. You just don’t have the votes.
Things that can’t get votes from a governor only get votes from your executive function. Executive function must not have many votes to spend, because these actions tend to be very difficult.
Even if you can temporarily scrape together the votes for one of these actions, you have to hold your coalition together. This usually fails. You will inevitably get distracted once any of the other governors gets a large enough error signal to vote for something else, like getting a snack. This is why you are always looking in the fridge instead of studying.
One workaround is to convince a governor to vote for these actions. If you get a lot of praise and status at school for doing well on your math test, social governors that are concerned with status will be willing to vote for math-related activities in the future, because they realize that it’s good for their bottom line. Or if there’s a pretty girl in your Japanese class, you may find that it becomes easier for you to work on your presentation, in an effort to impress her. No points for guessing which governor is voting for this!
This is probably why people seem to find over and over again that money is not very motivating.
Money is motivating when it can directly address your needs. If you are starving, the connection between $5 and a block of cheese is pretty clear. As a result, the hunger governor will vote for things that get you $5.
But in a modern economy, most people's remaining needs cannot be easily met by more money. They already have enough money to get all the food, warmth, sleep, and so on that they need. The only drives they have problems satisfying are the drives where, for one reason or another, there isn't or can't be a normal market.
Social factors like friendship or a feeling of importance are often left unsatisfied, but these are hard to trade directly for money. You can't buy these things for any amount, or at least, there are no effective markets in these "goods". So money is no longer very motivating for people who need these things. Their active governors, the ones with big errors, the ones that get the votes, understand that more money won't solve their problems, so they don't vote for actions that would get them more money.
As we hinted at above, we might assume that there is also an executive function that gets some votes. Executive function is why you can make yourself do dumb things that are in no way related to your survival, why you can plan for the very-long-term, and also why you have self-control in the face of things like cold and pain.
Eventually we may discover that what appears to be “self-control” is actually just the combined action of social emotions like shame. It may be that there is no such thing as an executive function, and what feels like self-control is really the result of different social emotions, the drives to do things like maintain our status or avoid shame, voting for things that are in their interest. But for now let’s keep the assumption that there is someone driving this thing.
Even so, executive function doesn’t have very many votes, which is why most people cannot starve themselves to death or hold their breath until they suffocate. At some point, the suffocation governor ends up with so many votes that it can make you do whatever it wants, and it always votes for the same thing: breathe.
Here’s another thing people find surprising: why don’t we maximize happiness?
People often complain about not being as happy as they would like. But their revealed preferences are clear: they don’t always do things that make them happy, even when they know what those things are, even when it’s easy. People often choose to do things that are painful, difficult, even pointless.
This is because there is no governor voting for happiness. Happiness is more like a side-effect, something that happens whenever you successfully correct any governor’s error signal. People who live challenging lives end up happy, assuming they are able to meet those challenges, but there is no force inside you that is voting for you to go and become more happy per se.
Remember that happiness isn’t an emotion. All emotions are error signals generated by a governor dedicated to controlling some signal related to survival. Governors have a simple relationship with the error signals they generate: they vote for behaviors that will drive their error signal towards zero. So if happiness were some kind of emotion, the governor that generated it would vote, whenever possible, to drive happiness towards zero!
Clearly people don’t behave in a way that tries to drive happiness to zero. While we aren’t happiness-maximizers either, many of our actions do make us happier, and when we take an action that makes us less happy, we’re less likely to take that action in the future. This is clear evidence that happiness isn’t an emotion.
The paradoxes of motivation are a lot like the paradoxes of democracy. A democracy does not institute the policies that are the best for its citizens. It doesn’t even institute the policies that are most popular. Democracies institute the policies that get enough votes.
Similarly, a person does not take the actions that make them happiest. They do not take the actions that are best for them, or even the actions that are most likely to lead to their survival. No, people take the actions that get the most votes.
Like with democracy, the system still mostly works, because “what gets the most votes” is close enough to “what’s good for you”, enough of the time. But there are all kinds of situations that lead to behavior that can appear mystifying, until you learn to see things through the lens of parliamentary procedure.
There’s nothing wrong with not being happy. You can not be happy and still be doing perfectly fine. So why do people find this startling, and ruminate about their lack of happiness? Isn’t it strange that people obsess so much over happiness, but don’t actually change their actions to become more happy?
The explanation may be purely social. In modern American culture, we are expected to be happy. Not being happy is seen as a sign of failure and weakness. Being unhappy, or even just feeling neutral, is enough to make us lose status in the eyes of others; it can be a source of ridicule and shame. Being anything less than perfectly happy can be enough to make you a subject of pity. So even though happiness is not directly controlled, if you exist in a culture with these norms, some of your social governors (associated with emotions like shame and drives for status) will vote for you to do things that will make you happy, just so you can get one over on the Joneses.
But our social emotions are not voting to make us happy per se — they are actually concerned with making sure we avoid the social consequences that would come from appearing unhappy. They want to make sure that we don’t lose status for being seen as gloomy, and keep us from feeling shame for our melancholy. One way to do this is to vote for actions that will make you happier. But equally good, better even, is to vote for actions that make you seem happy!
So other things being equal, the social emotions tend to drive us towards the appearance of happiness, rather than actual happiness. Actual happiness may or may not make us appear happy in a way that will increase our status or reduce our shame. But the appearance of happiness always appears happy. So that’s what gets the votes.
This is what makes people neurotic about not being as happy as they should be. When they’re feeling reflective, it makes some people worry that they are fake, since they feel consistently driven towards the appearance of happiness, even at the expense of what would actually make them happy.
This is a well-known problem in contemporary American culture, and for cultures that have borrowed American standards for happiness. But most other cultures don’t expect people to be happy all the time. Without this expectation, people from these cultures don’t have the problem of feeling like they must both seek happiness and perform it, and don’t run into this weird vicious cycle. (Though of course, other cultures have problems of their own.)
For a similar example, consider the problem of self-sabotage. In some cultures and contexts it’s not appropriate to perform better than your peers, or to get too much better too quickly (cf. tall poppy syndrome). In this case, some of the social governors will vote against performing your best, to avoid the social disapproval that might come from performing better than you “should”.
This suggests that the treatment for self-sabotage is to surround yourself with people who think that failure is shameful and success is impressive, rather than the other way around. And it suggests that something you can do for the people around you is to express polite disappointment when they accomplish less than they hope for and genuine enthusiasm when they accomplish more. Even an expression of envy can be a supportive thing to do for your friends, as long as it’s clear that it comes from a place of admiration rather than competition.
Of course, if you go too far in this direction, you can end up with a culture that is neurotic about success rather than about conformity. Decide your own point on the tradeoff, but we’d argue that self-sabotage is worse than pushing yourself too hard.
Why do people sometimes seek out extreme experiences? Why do we subject ourselves to things like roller coasters, saunas, horror movies, extreme sports, and even outright suffering?
Psychologist Paul Bloom explains these decisions in terms of chosen suffering versus unchosen suffering. For example, in this interview he says, “You should avoid being assaulted… there’s no bright side to the death of a loved one… there’s no happiness in watching your house burn down… nor is there happiness to be found in getting a horrible disease. Unchosen suffering is awful.”
In contrast he says, “chosen suffering, the sort of suffering we seek-out can be a source of pleasure … You choose to have kids, you choose to run a marathon, you choose to eat spicy food. You choose these things because there’s a payoff later in future pleasure.”
We think this is close. He’s picked the right examples, but getting assaulted, losing a loved one, or getting a horrible disease are just bad. Choosing them wouldn’t make them any better. So it can’t be the chosen versus unchosen nature of these examples that makes the difference.
A better way to think about this is whether the suffering is under your control. If suffering is under your control, it can be corrected at any time. And since happiness is generated when errors are corrected, controlled suffering is a neat hack — a free way to generate happiness at no risk to actual life and limb.
Controlled suffering is like a sauna or a horror movie. You’re sweating or you’re scared, but you can stop at any time, and stopping feels pretty great; it’s a relief. The uncontrolled version would be more like being trapped in a sauna, or locked inside a haunted house — not so pleasurable, and not the sort of thing people go looking for. A really uncontrolled version would be the experience of being trapped inside a burning building, or being chased by an actual serial killer, where the stakes are not only real, they have permanent consequences.
When given a choice, people tend to choose only controlled suffering, and they tend to endure uncontrolled suffering only when they have no choice. So almost all chosen suffering is controlled, and all uncontrolled suffering is unchosen. This should come as no surprise. But it has led Bloom to mistake the choosing for the active ingredient, when the active ingredient is really the controlled nature of the suffering.
Choosing uncontrolled suffering doesn’t make it good for you. Choosing to get assaulted is about as bad as getting assaulted by accident. Unchosen but controlled suffering isn’t usually that bad. Taking a wrong turn and ending up in the sauna by mistake is not that much of a bummer.
If you do want to become happier, the solution is simple — make yourself hungry, thirsty, cold, hot, tired, lonely, scared, etc. And then correct these errors promptly. It will feel amazing. If it doesn’t feel amazing, you are probably depressed in some more serious way. (See upcoming sections for more speculation about what this means for you.)
2025-02-14 00:11:00
[PROLOGUE – EVERYBODY WANTS A ROCK]
When the hands that operate the motor lose control of the lever;
When the mind of its own in the wheel puts two and two together…
— Thermostat, They Might Be Giants
There are lots of ways to die.
To avoid biting the dust, lots of things need to be juuuust right. If you get too hot or too cold, you die. If you don’t eat enough food, you die. But if you eat too much food, you also die. If you produce too much blood, or too little blood, if you [other thing], if you [third thing], dead dead dead.
It’s a miracle that organisms pull this off. How do they do it? Easy: they make thermostats.
A thermostat is a simple control system.
Thermostats are designed to keep your house at a certain temperature. You don’t want the house to get much hotter than the target temperature, and you don’t want it to get much colder.
To make this happen, the thermostat is designed to drive the temperature of the house towards the target. If you’re not too allergic to anthropomorphism, we can say that the goal of the thermostat is to keep the house at that temperature. Or we can describe it as a control system, and say that the thermostat is designed to control the temperature of the house, keeping it as close to the target as possible.
The basic idea is simple. We divide the world into the inside of the thermostat and the outside of the thermostat, like so:
To begin with, we need some kind of sensor (sometimes called an input function) that can read the temperature of the house and communicate that information to the inside of the thermostat.
Some sensors are better than others, but it doesn’t really matter. As long as the sensor can get a rough sense of the temperature of the house and transport that information to the guts of the device, the thermostat should be able to do its job.
The sensor is a part of the thermostat, so we color-code it white, but it interacts with the outside world, so the box sticks a little bit out into the house.
The sensor creates a signal that we call a perception. In this case, the sensor perceives that the house is 68 degrees Fahrenheit.
The sensor can be very simple, like a thermometer that measures the temperature at one spot in the house. Or it can be very complicated — for example, a network of different kinds of sensors all throughout the house, feeding into a complex algorithm that references and weighs each one, providing some kind of statistical average.
The important thing is that the sensor generates a perception of the thing it’s trying to measure, the signal the control system is aiming to control. In this case, the sensor is trying to get an estimate of the temperature in the house, and it has sensed that the temperature is about 68 °F.
The thermostat also needs a part that can interpret the signal coming in from the sensor. This part of the thermostat is usually called the comparator.
We call this part the comparator because its main job is to compare the temperature perception coming from the sensor to the target temperature for the house. To compare these two things, the thermostat needs to know the target temperature. So let’s add a set point.
The target is set by a human, and in this case we can see that they set it to 72 °F. So the set point for the thermostat is 72 °F.
If the set point is 72 °F and the sensor detects a temperature of 72 °F, the thermostat doesn’t need to do anything. Everything is all good. When the perception from the sensor is the same as the set point, then assuming the sensor is working correctly, the house is the correct temperature. There is a difference of 0 °F.
But sometimes everything is not all good. Sometimes the set point is 72 °F but the sensor is only reading 68 °F, like it is here.
In this case, the comparator compares the set point (72 °F) to the perception (68 °F) and finds that there is a difference of -4 °F. The perception of the house’s temperature is four degrees colder than the target, so the house itself is about four degrees colder than we want it to be.
Having done this math, the comparator creates an error signal, which is simply the difference between the perception and the set point. If there’s no difference between the perception and the set point, then the error signal will be zero, i.e. no difference at all. If the error is zero, the thermostat doesn’t need to do anything. But in this case, the difference between the perception and the set point is -4 °F, so the error signal is -4 °F too.
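To make the comparator concrete, here’s a minimal sketch in Python (our own illustration, following the sign convention above, where the error is the perception minus the set point):

```python
# Comparator sketch: the error signal is simply the difference
# between the perception and the set point.
def error_signal(perception, set_point):
    """Zero means the house is at the target; negative means too cold."""
    return perception - set_point

# A 68 °F perception against a 72 °F set point gives an error of -4 °F.
print(error_signal(68, 72))  # -4
print(error_signal(72, 72))  # 0
```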
For the thermostat to do its job, we need to close the loop. The final thing the thermostat needs is some way of influencing the outside world. This is often called the output function or the control element, which is the name we will use here:
Like the sensor, the control element sticks out into the exterior world, to indicate that it can interact with things outside the thermostat.
But you’ll notice that the loop is still not closed. The control element needs ways to influence the outside world.
A really simple thermostat might have only one way to influence things — it might only be able to turn on the furnace, which will raise the temperature:
But this is a pretty basic thermostat. It can’t control how hot the furnace is running, it can only turn it on or off.
It will do better if we give it more options. We can improve this thermostat by installing three settings for the furnace, like so:
This is much better. If the house is just a little cold, the control element can turn on the lowest furnace setting. This will keep the thermostat from overshooting the set point and sending the temperature above 72 °F. But if the house is freezing, it can turn on the turbo setting, and drive the temperature to the set point much more quickly.
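A control element with these three settings might be sketched like this (the thresholds of -2 °F and -6 °F are our own made-up values, chosen purely for illustration):

```python
def furnace_setting(error):
    """Pick a furnace setting from the error signal (perception - set point).
    The -2 and -6 thresholds are illustrative, chosen arbitrarily."""
    if error >= 0:
        return "off"     # at or above the set point: no heat needed
    elif error > -2:
        return "low"     # just a little cold: avoid overshooting
    elif error > -6:
        return "medium"
    else:
        return "turbo"   # freezing: close the gap as fast as possible

print(furnace_setting(-1))   # low
print(furnace_setting(-4))   # medium
print(furnace_setting(-10))  # turbo
```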
But there’s still a problem: our poor thermostat still has no way to lower the temperature. If the house goes above 72 °F, it can’t do a thing. The temperature will go above the set point and stay there until it comes down on its own; the thermostat is powerless.
This is unacceptable. But we can fix this problem by giving the thermostat access to air conditioning:
The control element can have many different possible outputs. Its job is to measure the error signal and decide what to do about it, and its goal is to drive the error signal to zero, or as close to zero as it can manage.
Similar to the sensor, the control element can be very simple or very complex. A simple control element might just turn on the heat any time the error signal is negative, or whenever the error signal falls below some threshold. A more complicated control element might look at the rate of change of the temperature over time and try to control the temperature predictively.
A very smart control element might use machine learning, or might have access to information about the weather, time of day, or day of the week, and might learn to use different strategies in different situations. You could give it a bunch of output options and just let it mess around with them, learning how different outputs influence the error signal in different ways.
More sophisticated techniques will give you a more effective control system. But as long as the control element has some way to influence the temperature, the thermostat should work ok.
Back in our example thermostat, the temperature in this house is too low, so the control element turns on the furnace. This raises the temperature, driving the error signal towards zero:
Once the error signal is zero, the control element turns off the furnace:
But even with this success, it’s important for the loop to remain closed. Even when the thermostat has driven the house’s temperature to the set point, and driven the error signal to zero, the house is still subject to disturbances. People open the door, they turn on the oven, they spill ice cream on the floor. Some heat escapes through the windows, the sun beats down on the roof. Let’s add disturbances to the diagram:
Because of these outside disturbances, the temperature of the house is always changing. To control the house’s temperature, to keep it near the set point in the face of all these disturbances, the control system needs to remain active.
This makes it easy to tell whether or not the thermostat is working like it should. Successful behavior drives the temperature (or at least the perception of that temperature) to the set point, and drives the error signal to zero. In the face of disturbances, it keeps the error signal close to zero, or quickly corrects it there.
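Here’s a toy simulation of the whole loop, disturbances included. Everything about it (heating and cooling rates, disturbance sizes) is made up for illustration; the point is just that the loop runs continuously and keeps the error close to zero:

```python
import random

def run_thermostat(steps=200, set_point=72.0, start_temp=60.0, seed=0):
    """Sense, compare, act, repeat, while random disturbances push back."""
    rng = random.Random(seed)
    temp = start_temp
    for _ in range(steps):
        perception = temp               # sensor (assumed perfectly accurate)
        error = perception - set_point  # comparator
        if error < 0:
            temp += min(-error, 1.0)    # furnace: up to 1 °F per step
        elif error > 0:
            temp -= min(error, 1.0)     # air conditioning: up to 1 °F per step
        temp += rng.uniform(-0.5, 0.5)  # doors, ovens, sun, spilled ice cream
    return temp

# Despite constant disturbances, the loop holds the house near 72 °F.
print(round(run_thermostat(), 1))
```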
In many older thermostats, the sensor is a bimetallic coil of brass and steel. Because of differences in the two metals, this coil expands when it gets warmer and contracts when it gets cooler. If this is all set up properly, the coil gives a decent measure of the temperature and helps the rest of the mechanism drive the house’s temperature to a given target.
But if you were to hold this coil closed, or tie a string around it and pull it tight enough to give a reading of 60 °F, the system would behave as though the temperature were always 60 °F. If the set point is 72 °F, the system would experience a large error signal, just as though the real temperature of the house were 60 °F, and would make a futile attempt to raise the house’s temperature, pushing as hard as it can, forever, until the thermostat breaks or the coil is released.
The thing to hold on to here is that every control system produces multiple signals.
Dividing a control system into individual parts helps us understand what happens when a control system breaks in different ways:
The thermostat is just an example; control systems are everywhere. The technical term used to describe control systems like these is “cybernetic”, and the study of these systems is called cybernetics.
Both words come from the Ancient Greek κυβερνήτης (kubernḗtēs, “steersman”), from κυβερνάω (kubernáō, “I steer, drive, guide, act as a pilot”). Norbert Wiener and Arturo Rosenblueth, who invented the word, chose this term because control systems steer or guide their targets, and because a steersman or pilot acts as a control system by keeping the ship pointed in the right direction.
The English word “governor” comes from the same root (kubernetes -> gubernetes -> Latin gubernator -> Old French gouvreneur -> Middle English governour), so control systems are sometimes called cybernetic governors, or just governors.
Most famous of these is the centrifugal governor used to regulate the speed of steam engines. Look closely at any steam engine, and you should see one of these:
The engine’s output is connected to the governor by a belt or chain, so the governor spins along with the engine. As the engine starts to speed up, the governor spins faster, and its spinning balls gain kinetic energy and move outward, like they’re trying to escape.
This outward movement isn’t just for show; if the motion goes far enough, it causes the lever arms to pull down on a thrust bearing, which moves a beam linkage, which reduces the aperture of a throttle valve, which controls how much steam is getting into the engine. So, the faster the engine goes, the more the governor closes the valve. This keeps the engine from going too fast — it controls the engine’s speed.
Control systems maintain homeostasis, driving a system to some kind of equilibrium. A thermostat controls the temperature of a house, a centrifugal governor controls the speed of a steam engine, but you can control just about anything if you put your mind to it. As long as you can measure a variable in some way, influence it in some way, and you can put a comparator between these two components to create an error signal, you can make a control system to drive that variable towards whatever set point you like.
Every organism needs to make sure it doesn’t get too dry, too hot, too cold, etc. If it gets any of these wrong, it dies.
As a result, a lot of biology is made up of control systems. Every organ is devoted to maintaining homeostasis in one way or another. Your kidneys control electrolyte concentrations in your blood, the pancreas controls blood sugar, and the thyroid controls all kinds of crap.
The brain is a homeostatic organ too. But the brain does homeostasis with a twist. Unlike the other organs, which mostly drive homeostasis by changing things inside the body, the brain controls things with external behavior.
Thirst
One of the first behavioral control systems to evolve must have been thirst. All animals need water; without water, they die. So the brain has a control system that aims to keep the body hydrated.
This is a control system, just like a thermostat. Hydration is the goal, but that goal needs to be measured in some way. In this case the input function seems to be a measure of plasma osmolality, as detected by the brain. This perception is then compared to a reference value (in humans this is around 280-295 mOsm/kg), which generates an error signal that can drive behaviors like finding and consuming water.
As we see in the diagram, in this control system the error signal is thirst. We can tell this must be the case because successful behavior drives thirst to zero. The perception of osmolality can’t be the error signal, because osmolality is driven to 280-295 mOsm/kg, which is how we know that number is the target or set point. Whatever value is driven towards zero must be the error signal.
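As a sketch (our own, with 290 mOsm/kg chosen as an illustrative set point inside the 280-295 range above), thirst can be written as a one-sided error signal:

```python
def thirst(plasma_osmolality, set_point=290.0):
    """One-sided error sketch: dehydration concentrates the blood, raising
    osmolality above the set point; drinking drives the error back to zero."""
    return max(0.0, plasma_osmolality - set_point)

print(thirst(300.0))  # 10.0 -- dehydrated, so thirsty
print(thirst(285.0))  # 0.0  -- hydrated, no thirst
```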
Just like with a thermostat, the output function can be very simple or very complex. Organisms with simple nervous systems may have only one output; they drink water when it happens to be right in front of them. Animals that live in freshwater streams and ponds may execute a program as simple as “open your mouth”, since they are always immersed in water that is perfectly good for them to drink.
Organisms with complex nervous systems, or that are adapted to environments where water is more scarce, will have more complex responses. A cat can go to its water bowl. In the dry season, elephants search out good locations and actively dig wells. Humans get in the car and drive to the store and make small talk while exchanging currency to purchase Vitamin Water®, a very complex response. But it’s all to control plasma osmolality by reducing the error signal of thirst to zero.
Hot and Cold
Organisms need to maintain a constant temperature, so the brain also includes systems for controlling heat and cold.
This adaptation actually consists of two different control systems — one that keeps the body from getting too warm, and another that keeps the body from getting too cold. We have two separate systems, rather than one system that handles both, because of the limits of how neurons can be wired up. “Neural comparators work only for one sense of error, so two comparators, one working with inverted signals, would be required to detect both too much and too little of the sensor output level.” (Powers, 1973)
We can also appeal to intuition — it feels entirely different to be hot than to be cold; they are totally different sensations. And when you are sick, you can sometimes feel both too hot and too cold at the same time, something that would be impossible if this were a single system.
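Powers’ point about one-sided comparators can be sketched directly (the 97 °F and 99 °F bounds here are ours, purely for illustration):

```python
def hot_error(body_temp, upper=99.0):
    """Fires only when the body is too warm; silent otherwise."""
    return max(0.0, body_temp - upper)

def cold_error(body_temp, lower=97.0):
    """The second comparator, working with inverted signals:
    fires only when the body is too cool."""
    return max(0.0, lower - body_temp)

print(hot_error(101.0), cold_error(101.0))  # 2.0 0.0
print(hot_error(95.0), cold_error(95.0))    # 0.0 2.0
```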
As usual, the output function can be simple or complex. Some responses are relatively automatic, like sweating, and might not usually be considered “behavior”. But other responses, like putting on a cardigan, are definitely behavior.
This is a chance to notice something interesting. A human can shiver, sweat, put on a coat, or open a window to control their temperature. But they can also… adjust the set point on the thermostat for their house! One way a control system can act on the world is by changing the set point of a different control system, and letting that “lower” system carry out the control for it.
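Here’s a sketch of that layering (entirely our own toy example): the higher system never touches the furnace at all; it just rewrites the lower system’s set point and lets the thermostat do the rest.

```python
def thermostat(room_temp, set_point):
    """Lower system: a bare-bones on/off thermostat."""
    return "furnace on" if room_temp < set_point else "furnace off"

def person(feeling_cold, thermostat_set_point):
    """Higher system: corrects its own 'too cold' error by raising
    the lower system's set point (the 3-degree bump is arbitrary)."""
    return thermostat_set_point + 3 if feeling_cold else thermostat_set_point

set_point = 68
print(thermostat(69, set_point))     # furnace off
set_point = person(True, set_point)  # the person feels cold...
print(set_point)                     # 71
print(thermostat(69, set_point))     # furnace on
```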
Pain
Organisms need to keep from getting injured, so they have ways to measure damage to their bodies, and a system to control that damage.
Again, we see that pain is the error signal that’s generated in response to some kind of measure of damage or physical harm.
A very simple control system will respond to pain, and nothing else. This might be good enough for a shellfish. But a more complex approach (not pictured in this diagram) is for the control system to predict how much pain might be coming, and drive behavior to avoid that pain, instead of merely responding. Compare this to a thermostat that can tell a blizzard is coming, and turns on the furnace in anticipation, before the cold snap actually hits.
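The anticipating thermostat might be sketched like this (a toy of our own, where the forecast is simply handed in as a number):

```python
def furnace_on(current_temp, set_point, forecast_drop=0.0):
    """Act on the predicted error, not just the current one.
    forecast_drop is how far the temperature is expected to fall."""
    predicted_error = (current_temp - forecast_drop) - set_point
    return predicted_error < 0

# At the set point with no storm coming: stay off.
print(furnace_on(72, 72))                    # False
# At the set point, but a 10-degree cold snap is forecast: preheat.
print(furnace_on(72, 72, forecast_drop=10))  # True
```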
Hunger
Most organisms need to eat in order to live. Once the food is inside your body there are other control systems that put it to the right use, but you need to express some behavior to get it there. So there’s another control system in charge of locating nutritious objects and putting them inside your gob.
Obviously in this case, the error signal is hunger — successful eating behavior tends to drive hunger to zero.
More realistically, there is not one eating control system, and not one kind of hunger, but several. There might even be dozens.
One control system probably controls something like your blood sugar level, and drives behavior that makes you put things with calories inside your mouth.
But man cannot live on calories alone, and neither can any other organism. For one thing, you definitely need salt. So there must be another control system that drives behavior to make you put salty things (hopefully salty foods, though perhaps not always) inside your mouth. This is confirmed by the fact that people sometimes crave salty foods. If you’ve ever had a moose lick the salt off your car, you’ll know that we’re right.
It’s hard to tell exactly how many kinds of hunger there are, but humans need several different nutrients to survive, and we clearly can have cravings for many different kinds of foods, so there must be several kinds of hunger. The same goes for other animals.
Fear
Organisms also need to avoid getting eaten themselves. This is somewhat more tricky than controlling things like heat and fluid levels, but evolution has found a way.
To accomplish this, organisms have been given a very complicated input function that estimates the threats in our immediate area, by weighing information like “is there anything nearby that looks or sounds like a tiger?” This input function creates a complicated perception that we might call “danger”.
This danger estimate is then compared to some reference level of acceptable danger, creating the error signal of fear. If you are in more danger than is considered acceptable, you RUN AWAY (or local equivalent).
Disgust
Getting eaten is not the only danger we face. Organisms also need to avoid eating poisonous things that will kill them, and avoid contact with things that will expose them to disease.
Like fear, the input function here is very complicated. It’s not as simple as checking the organism’s blood osmolality or some other internal signal. Trying to figure out what things out there in the world might be poisonous or diseased is a difficult problem.
But smelling spoiled milk or looking at a rotting carcass clearly creates some kind of signal, which is compared with some reference level, and creates an error signal that drives behavior. That’s why, if you drink too much Southern Comfort and later puke it up, you’ll never want Southern Comfort again.
In this case, the error signal is disgust.
Shame
Every organism needs to maintain internal homeostasis in order to survive. Organisms that can perceive the world and flop around a bit also tend to develop the ability to control things about the outside world, things like how close they get to predators. This improves their ability to survive even further.
Social organisms like humans also control social variables. It’s hard to know exactly what variables are being controlled, since they are not as simple as body temperature. They are at least as complicated as an abstract concept like “danger” — something we certainly perceive and can control, but that must be very complicated.
However, we can make reasonable guesses. For one, humans control things like status. You want to make sure that your status in your social group is reasonably high, that it doesn’t go down, that it maybe sometimes even goes up.
In this case, the error signal when status is too low is probably something like what we call shame. Sadness, loneliness, anger, and guilt all seem to be error signals for similar control systems that attempt to control other social variables.
Every sense you possess is an instrument for reacting to change. Does that tell you nothing?
— Frank Herbert, God Emperor of Dune
Control of blood osmolality leads to an error signal we call thirst. This drives behavior to keep you hydrated.
Control of body temperature leads to error signals we know as the experiences “hot” and “cold”. These drive behavior to keep you comfortable, even cozy.
Control of various nutritional values leads to error signals that we collectively call hunger. These drive behavior that involves “chowing down”.
While they can be harder to characterize, control of social values like status and face leads to error signals we identify with words like “shame” and “guilt”. These drive social behavior like trying to impress people and prove our value to our group.
All of these things are of the same type. They’re all error signals coming from the same kinds of biological control systems, error signals that drive external behavior which, when successful, pushes that error signal towards zero.
All of these things are of the same type, and the word for this type is “emotion”. An emotion is the error signal in a behavioral biological control system.
We say “behavioral” because your body also regulates things like the amount of copper in your blood, but there’s no emotion associated with serum copper regulation. It’s regulated internally, by processes you are unaware of, processes that may not even involve the brain. In contrast, emotions are the biological control errors that drive external behavior.
Thirst, hot, cold, shame, disgust, fear, and all the different kinds of hunger are all emotions. Other emotions include anger, pain, sleepy, need to pee, suffocation, and horny. There are probably some emotions we don’t have words for. All biological error signals that are in conscious awareness and that drive behavior are emotions.
Some emotions come in pairs that control two ends of one variable. The emotions of hot and cold are a good example. You try to keep your body temperature in a certain range, so it needs one control system (and one emotion) to keep you from getting too cold, and another control system (and another emotion) to keep you from getting too hot.
Feeling hot and feeling cold are clearly opposites, two emotions that keep your body temperature in the right range. There’s also an opposite of hunger — the emotion you feel when you have eaten too much and shouldn’t eat any more. We don’t have a common word for it in English, but “fullness” or “satiety” are close.
But for many goals, there’s only a limit in one direction: you need to keep some variable from getting too high, or from getting too low, but not both. You’ll notice that “need to pee” is an emotion, but it doesn’t have an opposite. While your bladder can be too full, it can’t be too empty, so there’s no emotion for that.
This is counterintuitive to modern psychology because academic psychologists act as though nothing of interest happens below the neck. They couldn’t possibly imagine that “hungry” or “needs to pee” could be important to the study of psychology — even though most human time and energy is spent on eating, sleeping, peeing, fuckin’, etc.
In contrast, when the goal is to model believable human behavior, and not just to produce longwinded journal articles, these basic drives and emotions come about naturally. Even The Sims knew that “bladder” is one of the eight basic human motivations.
This is a joke; there are more than eight motivations. But frankly, the list they came up with for The Sims is pretty good. It’s clear that there are drives to keep your body and your living space clean, and it seems plausible that these might be different emotions. We don’t have words for the associated emotions in English, but The Sims calls these motivations “Hygiene” and “Environment” (originally called “Room”).
Emotions are easy to identify because they are errors in a control system. Like any error in a control system, successful behavior drives the error to zero. This means that happiness is not an emotion.
After all, it’s clearly not an error signal. Behavior doesn’t try to drive happiness to zero. That means it’s not the same kind of thing as the rest of these signals, which are all clearly error signals. And that means happiness isn’t an emotion. Happiness is some kind of signal, but it’s not an emotion.
Now you may be thinking, “Hold on a minute there, SMTM. I was on board with you about the biological control systems. I understand how hunger and cold and whatnot are all the error signals of various control systems, that’s very interesting. But you can’t just go around saying that pain and thirst are emotions, and that happiness isn’t an emotion. You can’t just go around using accepted words in totally made-up new ways. That’s not what science is all about.”
We disagree; we think that this IS what science is all about. Adapting old words to new forms is like half of the project.
For starters, language always changes over time. The word “meteor” comes from the Greek metéōron, which literally meant “thing high up”. For a long time it referred to anything that happened high up, like rainbows, auroras, shooting stars, and unusual clouds. This sense is preserved in meteorology, the study of the weather, i.e. the study of things high up. But in common use, “meteor” is now restricted to space debris burning up as it enters the atmosphere. And there’s nothing wrong with that.
Second, changing the way we use words is a very normal part of any scientific revolution.
Take this passage from Thomas Kuhn’s essay, What Are Scientific Revolutions?:
Revolutionary changes are different and … problematic. They involve discoveries that cannot be accommodated within the concepts in use before they were made. In order to make or to assimilate such a discovery one must alter the way one thinks about and describes some range of natural phenomena. … [Consider] the transition from Ptolemaic to Copernican astronomy. Before it occurred, the sun and moon were planets, the earth was not. After it, the earth was a planet, like Mars and Jupiter; the sun was a star; and the moon was a new sort of body, a satellite. Changes of that sort were not simply corrections of individual mistakes embedded in the Ptolemaic system. Like the transition to Newton’s laws of motion, they involved not only changes in laws of nature but also changes in the criteria by which some terms in those laws attached to nature.
The same is true of this revolution. Before this transition, happiness and fear were emotions, while hunger was not. After it, hunger is an emotion, like shame and loneliness; happiness is some other kind of signal; and other signals like stress may be new sorts of signals as well.
As in any revolution, we happen to be using the same word, but the meaning has changed. This kind of change has been a part of science from the beginning.
Kuhn can be a little hard to follow, so here’s the same idea in language that’s slightly more plain:
Ontologically, where “planet” had meant “lights that wander in the sky,” it now meant “things that go around the sun.” Empirically, the claim was that all the old planets go around the sun, except the moon and the sun itself, so those are not really planets after all. Most troublingly, the earth too goes around the sun, so it is a planet.
The earth does not wander in the sky; it does not glow like the planets; it is extremely large, whereas most planets are mere pinpoints. Why call the earth a planet? This made absolutely no sense in Copernicus’ time. The claim appeared not false, but absurd: a category error. But for Copernicus, the earth was a planet exactly in that it does wander around the universe, instead of sitting still at the center.
Maybe heliocentrism would have succeeded sooner if Copernicus used a different word for his remodeled category! This is a common pattern, though: an existing word is repurposed during remodeling. There is no fact-of-the-matter about whether “planet” denoted a new, different category, or if the category itself changed and kept its same name.
So just like Copernicus, our claims aren’t false, they’re absurd. In any case, it’s too cute to hold so closely onto the current boundaries for the word “emotion”, given that the term is not even that old. Before the 1830s, English-speakers would have said “passions” or “sentiments” instead of “emotions”. So to slightly change the meaning of “emotion” is not that big a deal.
In any case, we can use words however we want. So back to the question at hand: If happiness isn’t an emotion, or at least isn’t a cybernetic error signal, then what is it?
The answer is quite simple. People and animals have many different governors that try to maintain a signal at homeostasis, near some target or on one side of some threshold. When one of these signals is out of alignment, the governor creates an error signal, an emotion like fear or thirst. The governor then does its best to correct that error.
When a governor sends its signal back into alignment, correcting an error signal, this causes happiness. Happiness is what happens when a thirsty person drinks, when a tired person rests, when a frightened person reaches safety.
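This picture can be put in code. Below is a minimal sketch of one governor; the class name, the hydration example, and the numbers are all invented for illustration, and "happiness equals error corrected" is just the simplest possible reading of the claim above:

```python
# A minimal sketch of one governor: a control loop whose error signal
# plays the role of an emotion, and whose corrections produce happiness.
# All names and numbers here are illustrative, not from any real model.

class Governor:
    def __init__(self, name, setpoint):
        self.name = name          # e.g. "hydration"
        self.setpoint = setpoint  # the homeostatic target
        self.level = setpoint     # current value of the controlled signal

    @property
    def emotion(self):
        """The error signal: how far the signal is from its target."""
        return abs(self.setpoint - self.level)

    def correct(self, amount):
        """Move the signal back toward the setpoint.

        Returns the happiness generated: the amount of error
        actually corrected by this action.
        """
        error_before = self.emotion
        step = min(amount, error_before)  # move toward target, no overshoot
        self.level += step if self.level < self.setpoint else -step
        return error_before - self.emotion  # happiness = error corrected

hydration = Governor("hydration", setpoint=100)
hydration.level = 60          # a thirsty person: a large error signal
print(hydration.emotion)      # thirst = 40
print(hydration.correct(40))  # drinking: 40 points of error corrected
print(hydration.emotion)      # back at homeostasis: 0
```

When the signal sits at its target, the emotion is zero and no behavior is called for; any correction while out of alignment pays out happiness in proportion to the error it removes.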
Consider the experiences that cause the greatest happiness. A quintessential happy experience might be finishing a long solitary hike in the February cold, arriving at the lodge freezing and battered, and throwing open the door to the sight of a roaring fire, soft couches, dry socks, good company, and an enormous feast.
The reason this kind of experience is so joyous is because a person who has just finished a long winter’s hike has driven many of their basic control systems far out of alignment, creating many large error signals. They are cold, thirsty, hungry, tired, perhaps they are in a bit of discomfort, or even pain. The opportunity to correct these error signals by stepping into a warm ski lodge leads to 1) many errors being corrected at once, and 2) the corrections being quite fast and quite large.
When errors are corrected by a large amount, or they are corrected very quickly, that creates more happiness than when they are corrected slowly and incrementally. A man who was lost in the desert will feel nothing short of bliss at his first sip of cool water — he is incredibly thirsty, and correcting that very large error creates a lot of happiness.
Imagine yourself on a hot summer day. To quaff a tall glass of ice water and eliminate your thirst all at once is immensely pleasurable. To sip the same amount over the course of an hour is not nearly so good. More happiness is created when a correction is fast than when it is slow.
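One way to encode "fast and large corrections beat slow and incremental ones" is to make the happiness from each correction grow superlinearly with its size. The square used below is an arbitrary illustrative choice, not a claim about the true curve:

```python
# Happiness grows superlinearly with the size of each correction.
# The exponent of 2 is an assumption chosen purely for illustration.

def happiness_from(corrections):
    """Total happiness from a sequence of error corrections."""
    return sum(c ** 2 for c in corrections)

# Quaffing: one 40-point thirst error eliminated all at once.
quaff = happiness_from([40])
# Sipping: the same 40 points corrected in ten 4-point steps.
sip = happiness_from([4] * 10)

print(quaff, sip)  # 1600 vs 160: same total correction, 10x the happiness
```

Under any such curve, the man lost in the desert gets far more out of his first long drink than a comfortable person gets from an afternoon of occasional sips, even though the total error corrected is the same.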
Or consider:
Here we see some confirmation that “need to pee” is an emotion. We also see evidence of the laws of how error correction causes happiness. Since the error signal was so big, and since it was resolved all at once in that dirty little gas station bathroom, the correction was both large and sudden, which is why peeing made the author so happy. “Moans”, or happiness in general, “are connected with not getting what you want right away, with putting things off.” Or take Friedrich Nietzsche, who asked: “What is happiness? The feeling that power is growing, that resistance is overcome.”
Correcting any error signal creates happiness, and the happiness it creates persists for a while. But over time, happiness does fade. We don’t know the exact rate of decay, but if you create 100 points of happiness today, you might have only 50 points of happiness tomorrow. The next day you will have only 25, and eventually you will have no happiness at all.
But in practice, your drives are constantly getting pushed out of alignment and you are constantly correcting them, and in most cases this leads to a steady stream of happiness. You get hungry, thirsty, tired, and you correct these errors, generating more happiness each time. As long as you generate happiness faster than the happiness decays, you will be generally happy on net.
You can think of this as a personal happiness economy. Just like a business must have more money coming in than going out to stay in the black, you’ll feel happy on net as long as errors are being corrected faster than happiness decays.
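The happiness economy is easy to simulate. The sketch below assumes the halving-per-day decay rate mentioned above; the income figures are arbitrary:

```python
# A toy "happiness economy": existing happiness halves each day
# (the 100 -> 50 -> 25 decay described above), while daily error
# corrections deposit new happiness. All numbers are illustrative.

def simulate(daily_income, days, decay=0.5):
    """Track net happiness when corrections deposit `daily_income`
    points per day and yesterday's balance decays by `decay`."""
    happiness = 0.0
    history = []
    for _ in range(days):
        happiness = happiness * decay + daily_income
        history.append(happiness)
    return history

# A steady stream of corrected errors settles at a stable positive level:
print(simulate(daily_income=100, days=7))
# Stop correcting errors, and the balance drains to nothing:
print(simulate(daily_income=0, days=7))
```

In this sketch the balance converges to daily_income / (1 − decay): you stay in the black as long as deposits keep outpacing decay, which is the point of the analogy.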
In this model, there are as many ways to feel bad as there are things that are being controlled. But there’s only one way to feel good. Which would mean that all of our words for positive emotion — joy, excitement, pride — are really referring to the same thing, just in different contexts.
Happiness is also related to the concept of “agency”, the general ability to affect your world in ways of your choosing. A greater ability to affect your world means more ability to cause large changes in any context. If you have a lot of ability to make things change, you can make big corrections in your error signals — you can take the situation of being very hungry and correct that error decisively, leading to a burst of happiness.
(It may also be the case that even an arbitrary exercise of agency can make you somewhat happy, since people do seem to gain happiness from meeting some very arbitrary goals. But this is hard to distinguish from social drives — maybe you are just excited at how impressed you think everyone will be when they see how many digits of pi you have memorized.)
People are consistently surprised to find that living in posh comfort and having all your needs immediately met isn’t all that pleasurable. But with this model of happiness, it makes perfect sense. Pleasure and happiness are only generated when you are out of alignment in a profound way, a way that could legitimately threaten your very survival, and then you are brought back into alignment in a way that is literally life-affirming.
This is why people who are well-off, the idle rich in particular, often feel like their lives are pointless and empty. To have all your needs immediately met generates almost no happiness, so the persistently comfortable go through life in something of a gray fog.
Does this suggest that horrible experiences can, at least under the right circumstances, make you happy and functional? Yes.
See this section about the Blitz during World War Two, from the book Tribe (h/t @softminus):
On and on the horror went, people dying in their homes or neighborhoods while doing the most mundane things. Not only did these experiences fail to produce mass hysteria, they didn’t even trigger much individual psychosis. Before the war, projections for psychiatric breakdown in England ran as high as four million people, but as the Blitz progressed, psychiatric hospitals around the country saw admissions go down. Emergency services in London reported an average of only two cases of “bomb neuroses” a week. Psychiatrists watched in puzzlement as long-standing patients saw their symptoms subside during the period of intense air raids. Voluntary admissions to psychiatric wards noticeably declined, and even epileptics reported having fewer seizures. “Chronic neurotics of peacetime now drive ambulances,” one doctor remarked. Another ventured to suggest that some people actually did better during wartime.
The positive effects of war on mental health were first noticed by the great sociologist Emile Durkheim, who found that when European countries went to war, suicide rates dropped. Psychiatric wards in Paris were strangely empty during both world wars, and that remained true even as the German army rolled into the city in 1940. Researchers documented a similar phenomenon during civil wars in Spain, Algeria, Lebanon, and Northern Ireland. An Irish psychologist named H. A. Lyons found that suicide rates in Belfast dropped 50 percent during the riots of 1969 and 1970, and homicide and other violent crimes also went down. Depression rates for both men and women declined abruptly during that period, with men experiencing the most extreme drop in the most violent districts. County Derry, on the other hand—which suffered almost no violence at all—saw male depression rates rise rather than fall. Lyons hypothesized that men in the peaceful areas were depressed because they couldn’t help their society by participating in the struggle.
Horrible events can also traumatize people, of course. Being bombed by the Luftwaffe is dangerous to your health. But in other ways, being thrust into catastrophe can be very reassuring, even affirming. We were put together in an era of constant threat, it should be no surprise that we can be functional in that kind of environment.
So happiness isn’t an emotion, and doesn’t drive behavior. The natural question has to be, why does happiness exist at all? What function does it serve if it is not, like an emotion, helping to drive some important signal to homeostasis?
We think happiness is a signal used to calibrate explore versus exploit.
The exploration-exploitation dilemma is a fancy way of talking about a basic problem. Should you mostly stick to the options you know pretty well, and “exploit” them to the fullest extent, or should you go out and “explore” new options that might be even better?
For example, if you live in a city and have tried 10 out of the 100 restaurants in the area, when you decide where to go to lunch, should you go to the best restaurant you’ve found so far, for an experience that is guaranteed to be pretty good, or should you try a new restaurant and maybe discover a new favorite? And how much time should you spend with your best friend, versus making new friends?
It’s a tradeoff. If you spend all your time exploring, you never get the opportunity to enjoy the best options you’ve found. But if you exploit the first good thing you find and never leave, you’re likely to miss out on better opportunities somewhere else. You have to find a balance.
This dilemma makes explore versus exploit one of the core issues of decision-making, and finding the right balance is a fundamental problem in machine learning approaches like reinforcement learning. So it’s not at all surprising that psychology would have a signal that helps to tune this tradeoff.
Remember that in this model of happiness, behavior is successful when it corrects some error, and creates some amount of happiness. This makes happiness a rough measure of how consistently you are correcting your errors.
If you are reliably generating happiness, that means you’re correcting your errors all the time, so your overall strategies for survival must be working pretty well. Keep doing what you’re doing. On the other hand, if you are not frequently generating happiness, that means you are almost never correcting your errors, and you must be doing rather poorly. Your strategies are not serving you well — in nature, you would probably be on the fast track to a painful death. In this situation, you should switch up your strategies and try something new. In a word, you should explore.
When you’re generating plenty of happiness, you are surviving, your strategies are working, and you should stick with them. When you’re not generating much happiness, your strategies are not working, you may not be surviving long, and you should change it up and try new things in an attempt to find new strategies that are better.
All this makes sense in a state of nature, where sometimes you have to change or die. But note that in the modern world, you can survive for a long time without generating much happiness at all. This is why modern people sometimes explore their way into very strange strategies.
(Tuning explore vs. exploit is just one theory. Another possibility is that your happiness is a signal for other people to control. For example, a parent might have a governor that tries to make sure their child has at least a certain level of happiness. There are reasons to suspect this might be the case — we are much more visibly happy and unhappy than we are visibly hungry or tired. If this is true, then our happiness might be more important for other people than for ourselves.)
Psychologists don’t usually think of happiness in these terms, but this perspective isn’t entirely original. See this Smithsonian Magazine interview with psychologist Dan Gilbert from 2007. The interviewer asks, “Why does it seem we’re hard-wired to want to feel happy, over all the other emotions?” Dan responds with the following:
That’s a $64 million question. But I think the answer is something like: Happiness is the gauge the mind uses to know if it’s doing what’s right. When I say what’s right, I mean in the evolutionary sense, not in the moral sense. Nature could have wired you up with knowing 10,000 rules about how to mate, when to eat, where to seek shelter and safety. Or it could simply have wired you with one prime directive: Be happy. You’ve got a needle that can go from happy to unhappy, and your job in life is to get it as close to H as possible. As you’re walking through woods, when that needle starts going towards U, for unhappy, turn around, do something else, see if you can get it to go toward H. As it turns out, all the things that push the needle toward H—salt, fat, sugar, sex, warmth, security—are just the things you need to survive. I think of happiness as a kind of fitness-o-meter. It’s the way the organism is constantly updated about whether its behavior is in support of, or opposition to, its own evolutionary fitness.
As for terms like “unhappiness”, we think they should be defined out of existence. When people use the word “unhappy”, we think they mean one of two things. Either their happiness levels are low, in which case they are not-happy rather than un-happy; or some error, like fear or shame, has just increased by a large amount. This is unpleasant, and there is a sense of being more out of alignment than before, but it’s always linked to specific emotions. It’s not some generic deficit of happiness, and happiness cannot go negative; there is no anti-happiness.
2025-02-07 00:11:00
We who have nothing to “wind string around” are lost in the wilderness. But those who deny this need are “burning our playhouse down.” If you put quotes around certain words it sounds more like a metaphor.
— John Linnell, 2009 interview with Rolling Stone
Take almost anything, heat it up, and it gets bigger. Heat it up enough, it melts and becomes a liquid. Heat it up even more, it becomes a gas, and takes up even more space. Or, cool it down, it contracts and becomes smaller again.
The year is 1789. Antoine Lavoisier has just published his Traité Élémentaire de Chimie. Robert Kerr will soon translate it into English under the title Elements of Chemistry in a New Systematic Order containing All the Modern Discoveries, usually known as just Elements of Chemistry.
The very first thing Lavoisier talks about in his book is this mystery about heat. “[It] was long ago fully established as a physical axiom, or universal proposition,” he begins, “that every body, whether solid or fluid, is augmented in all its dimensions by any increase of its sensible heat”. When things get hotter, they almost always get bigger. And when things get colder, they almost always shrink. “It is easy to perceive,” he says, “that the separation of particles by heat is a constant and general law of nature.”
Lavoisier is riding a wave. About two hundred years earlier, Descartes had suggested that we throw out Aristotle’s way of thinking, where each kind of thing is imbued with its own special purpose, and instead bring back a very old idea from Epicurus, that everything is made out of tiny particles.
The plan is to see if “let’s start by assuming it’s all particles” might be a better angle for learning about the world. So Lavoisier’s goal here is to try to describe heat in terms of some kind of interaction between different particles.
He makes the argument in two steps. First, Lavoisier says that there must be two forces: one force that pushes the particles of the object apart (which we see when the object heats up), and another force that pulls them together (which we see when the object cools down). “The particles of all bodies,” he says, “may be considered as subjected to the action of two opposite powers, the one repulsive, the other attractive, between which they remain in equilibrio.”
The force pushing the particles apart obviously has something to do with heat, but there must also be a force pushing the particles together. Otherwise, the separating power of heat would make the object fly entirely apart, and objects wouldn’t get smaller when heat was removed, things wouldn’t condense or freeze as they got cold.
“Since the particles of bodies are thus continually impelled by heat to separate from each other,” he says, “they would have no connection between themselves; … there could be no solidity in nature, unless they were held together by some other power which tends to unite them, and, so to speak, to chain them together; which power, whatever be its cause, or manner of operation, we name Attraction.” Therefore, there is also a force pulling them together.
Ok, that was step one. In step two, Lavoisier takes those observations and proposes a model:
It is difficult to comprehend these phenomena, without admitting them as the effects of a real and material substance, or very subtle fluid, which, insinuating itself between the particles of bodies, separates them from each other; and, even allowing the existence of this fluid to be hypothetical, we shall see in the sequel, that it explains the phenomena of nature in a very satisfactory manner.
Let’s step back and notice a few things about what he’s doing.
First: While he’s happy to speculate about an attractive force, Lavoisier is very careful. He doesn’t claim anything about the attractive force, does not even speculate about “its cause, or manner of operation”. He just notes that there appears to be some kind of force causing the “solidity in nature”, and discusses what we might call it.
He does the same thing with the force that separates. Since it seems to be closely related to heat, he says we can call this hypothetical fluid “caloric” — “but there remains a more difficult attempt, which is, to give a just conception of the manner in which caloric acts upon other bodies.”
We don’t know these fluids exist from seeing or touching them — we hypothesize them from making normal observations, and asking, what kind of thing could there be, invisible but out there in the world, that could cause these observations? “Since this subtle matter penetrates through the pores of all known substances,” he says, “since there are no vessels through which it cannot escape, and consequently, as there are none which are capable of retaining it, we can only come at the knowledge of its properties by effects which are fleeting, and difficultly ascertainable.”
And Lavoisier warns us against thinking we are doing anything more than speculating. “It is in these things which we neither see nor feel,” he says, “that it is especially necessary to guard against the extravagance of our imagination, which forever inclines to step beyond the bounds of truth, and is very difficultly restrained within the narrow line of facts.”
Second: In addition to speculating, Lavoisier proposes a model.
But not just any model. Lavoisier’s theory of heat is a physical model. He proposes that heat is a fluid with particles so small they can get in between the particles of any other body. And he proposes that these particles create a force that separates other particles from each other. The heat particles naturally seep in between the particles of other objects, because they are so small. And this leads to the expansion and contraction that was the observation we started with.
Lavoisier is proposing a model of entities and rules. In this case, the entities are particles. There are rules governing how the particles can interact: Heat particles emit a force that pushes apart other particles. Particles of the same body mutually attract. There may be more entities, and there will certainly be more rules, but that’s a start.
Third: Instead of something obscure, he starts by trying to explain existing, commonplace observations.
People often think that a theory should make new, testable predictions. This thought seems to come from falsificationism: if a theory gives us a prediction that has never been seen before, we can go out and try to falsify the theory. If the prediction stands, then the theory has some legs.
But this is putting the cart before the horse. The first thing you actually want is for a theory to make “testable” predictions about existing observations. If a new proposal cannot even account for the things we already know about, if the entities and rules don’t duplicate a single thing we see from the natural world, it is a poor theory indeed.
It’s good if your model can do the fancy stuff, but first it should do the basic shit. A theory of weather doesn’t need to do much at first, but it should at least anticipate that water vapor makes clouds and that clouds make rain. It’s nice if your theory of gravity can account for the precession of the perihelion of Mercury, but it should first anticipate that the moon won’t fall into the earth, and that the earth attracts apples rather than repels them.
Fourth: His proposal is wrong! This model is not much like our modern understanding of heat at all. However, Lavoisier is entirely unconcerned. He makes it very clear that he doesn’t care whether or not this model is at all accurate in the entities:
…strictly speaking, we are not obliged to suppose [caloric] to be a real substance; it being sufficient … that it be considered as the repulsive cause, whatever that may be, which separates the particles of matter from each other; so that we are still at liberty to investigate its effects in an abstract and mathematical manner.
People are sometimes very anxious about whether their models are right. But this anxiety is pointless. A scientific model doesn’t need to be right. It doesn’t even need to describe a real entity.
Lavoisier doesn’t care about whether the entities he describes are real; he cares about the fact that the entities he proposes 1) would create the phenomenon he’s trying to understand (things generally expand when they get hotter, and contract when they get colder) and 2) are specific enough that they can be investigated.
Lavoisier’s proposal involves entities that operate by simple rules. The rules give rise to phenomena about heat that match existing observations. That is all that is necessary, and Lavoisier is quite aware of this. “Even allowing the existence of this fluid to be hypothetical,” he says, “we shall see … that it explains the phenomena of nature in a very satisfactory manner.”
This is how scientific progress has always worked: Propose some entities and simple rules that govern them. See if they give rise to the things we see all the time. It’s hard to explain all of the things, so it’s unlikely that you’ll get this right on the first try. But does it explain any of the things?
If so, congratulations! You are on the right track. From here, you can tweak the rules and entities until they fit more and more of the commonly known phenomena. If you can do this, you are making progress. If at some point you can match most of the phenomena you see out in the world, you are golden.
If you can then go on to use the entities as a model to predict phenomena in an unknown set of circumstances, double congratulations. This is the hardest step of all, to make a called shot, to prove your model of rules and entities in unknown circumstances.
But first, you should prove it in known circumstances. If your theory of heat doesn’t even account for why things melt and evaporate, there’s no use in trying to make more exotic predictions. You need to start over.
Much of what passes for knowledge is superficial.
We mean “superficial” in the literal sense. When we call something superficial, we mean that it deals only with the surface appearances of a phenomenon, without making appeal or even speculating about what might be going on beneath the surface.
There are two kinds of superficial knowledge: predictions and abstractions.
1. Predictions
Predictions are superficial because they only involve anticipating what will happen, and not why.
If you ask an astronomer, “What is the sun?” and he replies, “I can tell you exactly when the sun will rise and set every day”… that’s cool, but this astronomer does not know what the sun is. That will still be true even if he can name all the stars, even if he can predict eclipses, even if he can prove his calculations are accurate to the sixth decimal place.
Most forms of statistics suffer from this kind of superficiality. Any time anyone talks about correlations, they are being superficial in this way. “The closer we get to winter, the less time the sun spends in the sky.” Uh huh. And what is the sun, again?
Sometimes it is ok to talk about things just in terms of their surface appearances. We didn’t say “don’t talk about correlations”. We said, “correlations are superficial”. But often we want to go deeper. When you want to go deeper, accept no substitutes!
Sometimes all you want to do is predict what will happen. If you’re an insurance company, you only care about getting your bets right — you need to have a good idea which homes will be destroyed by the flood, but you don’t need to understand why. You know that your business involves uncertainty, and these predictions are only estimates. If all you want to do is predict, that’s fine.
But in most cases, we want more than just prediction. If you’re a doctor choosing between two surgeries, you certainly would rather conduct the surgery with the 90% survival rate than the surgery with the 70% survival rate. But you’d ideally like to understand what’s actually going on. Even having chosen the surgery with better odds, what can you do to make sure your patient is in the 90% that survive, rather than the 10% that do not? What are the differences between these two groups? We aspire to do more than just rolling the dice.
Consider this for any other prediction. In the Asch conformity experiments, most participants conformed to the group. From this, we can predict that in similar situations, most people will also conform. But some people don’t conform. Why not? Prediction by itself can’t go any deeper.
Or education. Perhaps we can predict which students will do well in school. We predict that certain students will succeed. But some of these students don’t succeed, and some of the students we thought would be failures do succeed. Why? Prediction by itself can’t go any deeper.
There’s something a little easy to miss here, which is that having a really good model is one way to make really good predictions. However good your predictions are when you predict the future by benchmarking off the past, having a good model will make them even better. And, you will have some idea of what is actually going on.
But people often take this lesson in reverse — they think that good predictions are a sign of a good understanding of the processes behind the thing being predicted. It can be easy to just look for good predictions, and think that’s the final measure of a theory. But in reality, you can often make very good predictions despite having no idea of what is actually happening under the hood.
This is why you can operate a car or dishwasher, despite having no idea how they work. You know what will happen when you turn on your dishwasher, or shift your car into reverse. Your predictions are very good, nearly 100%. But you don’t know in a mechanical sense why your car moves backwards when you shift into reverse, or how your dishwasher knows how to shut off when it’s done.
If you want to fix a dishwasher that’s broken, or god forbid design a better one, you need to understand the inner guts of the beast, the mechanical nature of the machine that creates those superficial features that you know how to operate. You “know” how to operate the superficial nature of a TV, but how much do you understand of this:
Let’s take another example. This Bosch dishwasher has only 6 buttons. Look how simple it is for any consumer to operate:
But look how many parts there are inside. Why are some of the parts such weird shapes? How much of this do you understand? How much of it does the average operator understand?
2. Abstractions
Successful models will always be expressed in terms of entities and rules. That might seem obvious — if you’re going to describe the world, of course you need to propose the units that populate it, and the rules that govern their behavior!
But in fact, people almost never do this. Instead, they come up with descriptions that involve neither entities nor rules. These are called abstractions.
Abstractions group similar observations together into the same category. But this is superficial, because the classification is based on the surface-level attributes of the observations, not their nature. All crabs look similar, but as we learned more about their inner nature, what we call DNA, we discovered that some of these crabs are only superficially similar: they arrived at their crab-like design from entirely different lineages. The same thing is true of trees.
We certainly cannot do without abstractions like “heat”, “depression”, “democracy”, “airplane”, and so on. Sometimes you do want to group together things based on their outward appearance. But these groups are superficial at best. Airplanes have some things in common abstractly, but open them up, and under the hood you will find that each of them functions in its own way. Democracies have things in common, but each has its own specific and mechanical system of votes, representation, offices, checks and balances, and so on.
Imagine that your car breaks down and you bring it to a mechanic and he tells you, “Oh, your car has a case of broken-downness.” You’d know right away: this guy has no idea what he’s talking about. “Broken-downness” is an abstraction; it doesn’t refer to anything, and it’s not going to help you fix a car.
Instead, a good mechanic will describe your car’s problem in terms of entities and rules. “Your spark plugs are shot [ENTITIES], so they can’t make the pistons [ENTITIES] go up and down anymore [RULES].”
It’s easy to see how ridiculous abstractions are when we’re talking about cars, but it can be surprisingly hard to notice them when we’re talking about science.
For instance, if you feel sad all the time, a psychologist will probably tell you that you have “depression.” But depression is an abstraction — it involves no theory of the entities or rules that cause you to feel sad. It’s exactly like saying that your car has “broken-downness.” Abstractions like this are basically useless for solving problems, so it’s not surprising that we aren’t very good at treating “depression.”
Abstractions are often so dissociated from reality that over time they stop existing entirely. We still use words like “heat”, “water”, and “air”, but we mean very different things by these words than the alchemists did. Medieval physicians thought of medicine in terms of four fluids mixing inside your body: blood, phlegm, yellow bile, and black bile. We still use many of those words today, but the “blood” you look at is not the blood of the humoralists.
It’s possible that one day we’ll stop using the word “depression” at all. Some people find that idea crazy — depression is so common, so baked into our culture, that surely it’s going to stick around. But stuff like this happens all the time. In the 19th and 20th centuries, “neurasthenia” was a common diagnosis for people who felt sad, tired, and anxious. It used to be included in the big books of mental disorders, the Diagnostic and Statistical Manual (DSM) and the International Statistical Classification of Diseases and Related Health Problems (ICD).
Now it isn’t. But that’s not because people stopped feeling sad, tired, and anxious — it’s because we stopped using “neurasthenia” as an abstraction to describe those experiences. Whatever people learned or wrote about neurasthenia is now useless except for historical study. That’s the thing about abstractions: they can hang around for a hundred years and then disappear, and we can be just as clueless about the true nature of the world as when we began. Don’t even get us started on Brain fag syndrome.
The DSM will never fully succeed because it’s stuck dealing with abstractions. One clue we’re still dealing with geocentric psychology here is that the DSM groups disorders by their symptoms rather than their causes, even though causes can vary widely for the same symptoms (e.g. insomnia can be biological, psychological, or your cat at 3 am).
Imagine doing this for physical diseases instead — if you get really good at measuring coughing, sneezing, aching, wheezing, etc. you may ultimately get pretty good at distinguishing between, say, colds and flus. But you’d have a pretty hard time distinguishing between flu and covid, and you’d have no chance of ever developing vaccines for them, because you have no concept of the systems that produce the symptoms.
Approaches like this, where you administer questionnaires and then try to squeeze statistics out of the responses, will always top out at that level. At best, you successfully group together certain clusters of people or behaviors on the basis of their superficial similarities. This can make us better at treating mental disorders, but not much better.
If you don’t understand problems, it’s very unlikely you will solve them.
Abstractions are dangerous because they seduce you into thinking you know something. Medicine is especially bad at this. Take an abstraction, give it a Latin name, then say “because”, and it sounds like an explanation. You’ve got bad breath? That’s because you have halitosis, which means “bad breath”. This isn’t an explanation; it’s a tautology.
Will the treatment for one case of halitosis work on another case? Impossible to say. It certainly could. One reason things sometimes have the same surface appearance is because they were caused in the same way. But some people have halitosis because they never brush their teeth, some people have it because they have cancer, and other people have it because they have a rotting piece of fish stuck in their nose. Those causes will require different treatments.
Abstractions are certainly useful. But by themselves, abstractions are a dead end, because they don’t make specific claims. This is exemplified by flowchart thinking. You can draw boxes “A” and “B” and draw an arrow between them, but what is the specific claim made by this diagram? At most it seems to be that measures of A will be correlated with measures of B, and if the arrow is in one direction only, that changing measures of A will also change measures of B.
That’s fine if this is the level of result you’re satisfied with, but it bears very little resemblance to the successes of the mature sciences. Chemistry’s successes don’t come from little flow charts going PROTON —> GOLD <—> MERCURY. If anything, that flowchart looks a lot more like alchemy.
Abstractions can be useful starting points, but they’re bad ending points. For example, people noticed that snow melts in the sunlight and gold melts in a furnace. They noticed that hot water boils and that hot skin burns. It seemed like the same force was at work in all of these cases, so they called it “heat”.
The sensation of warmth, the force of sunlight, the similarities between melting and evaporation, are abstracted: “these go together so well that maybe they are one thing”.
That’s only a starting point. Next you have to take the hypothesis seriously and try to build a model of the thing. What are the entities and rules behind all this warming, melting, and burning?
That’s what Lavoisier did: he came up with a model to try to account for these superficial similarities. Subsequent chemists proposed updates to the entities and the rules that did an even better job, and now we have a model that accounts for heat very well. We still call it “heat”, but because the model is a proposal about the underlying structure, it’s not superficial, so it’s not an abstraction.
This is Conway’s Game of Life:
The universe of this game is an infinite two-dimensional grid of square cells. This means each cell has eight neighbors, i.e. the cells that are horizontally, vertically, and diagonally adjacent.
The cells have only two properties — each cell is either alive or dead (indicated as black and white); and each cell has a location in the infinite two-dimensional grid. Time occurs in discrete steps and is also infinite. This is the full list of the entities in this world.
At each step in time, the following rules are applied: any live cell with two or three live neighbors survives; any dead cell with exactly three live neighbors becomes a live cell; all other live cells die, and all other dead cells stay dead.
This is the full list of the rules in this world.
All those parts, and no others, come together to create this world. You can try it for yourself here.
Despite being inspired by things like the growth of crystals, Conway’s Game of Life isn’t a model for any particular part of the natural world. However, it is an example of a set of simple entities, and simple rules about how those entities can interact, that gives rise to complex outcomes.
This is the kind of model that has served as the foundation for our most successful sciences: a proposal for a set of entities, their features, and the rules by which they interact, that gives rise to the phenomena we observe.
Instead of being a chain of abstractions, a flowchart that operates under vaguely implied rules, Conway’s Game of Life is a set of entities that interact in specific ways. And because it is so precise, it makes specific claims.
In principle, we can give you any starting state in the Game of Life, and you should be able to apply the rules to figure out what comes next. You can do that for as big of a starting state as you want, or for as many timesteps as you want. The only limit is the resources you are willing to invest. For example, see if you can figure out what happens to this figure in the next timestep:
Or if you want a more challenging example, try this one:
There are, of course, an infinite number of these exercises. Feel free to try it at home. Draw a grid, color in some cells at random, and churn through these rules. Specific claims get made.
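Those exercises are mechanical enough that a machine can churn through them. Here is a minimal sketch in Python (the set-of-live-cells representation is our own choice, not part of the game’s definition):

```python
from collections import Counter

def step(live):
    """One timestep of Conway's Game of Life.

    Live cells are stored as a set of (x, y) coordinates, so the
    infinite grid costs nothing: any cell not in the set is dead.
    """
    # Count, for every cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next step if it has exactly three live
    # neighbors, or two live neighbors and was already alive.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }

# A "blinker": three cells in a row flip between horizontal and
# vertical, returning to the original state every two steps.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))   # [(1, 0), (1, 1), (1, 2)]
assert step(step(blinker)) == blinker
```

Notice that the whole world fits in a dozen lines of rules: that is what it looks like when entities and rules are fully specified.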
In comparison, take a look at this diagram. Wikipedia assures us that the diagram depicts “mental state in terms of challenge level and skill level, according to Csikszentmihalyi’s flow model”:
You might wonder what exactly is being claimed here. Yes, if you are medium challenged and low skilled, you are “worried”. But it’s not clear what that means outside of the context of these words.
This diagram is just mapping abstractions to abstractions. There is no proposal about the entities underlying those abstractions. What, specifically, might be going on when a person is medium skilled, or low challenged? LOW SKILL + HIGH CHALLENGE —> ANXIETY sounds like a scientific statement, but it isn’t. It’s like saying LOW CAR ACTIVITY + HIGH AMOUNTS OF WEIRD NOISES —> CAR BROKEN-DOWNNESS. Forget about such questions, what matters is that HIGH SKILL + HIGH CHALLENGE —> FLOW.
The Big Five is considered one of the best theories in psychology. It provides five dimensions for describing personality, dimensions like extraversion and openness. But the dimensions are only abstractions. The theory doesn’t make any claim about what constitutes being “high openness”, what that factor is literally made up of. The claims are totally superficial. At most, the Big Five is justified by showing that its measures are predictive. This so-called theory is not scientific.
Modern scientists often claim that they are building models. However, these are usually statistical models. They are based on historical data and can be used to guess what the future will look like, assuming the future looks like the past. Statistical models predict relationships between abstract variables, but don’t attempt to model the processes that created the data. A linear regression is a “model” of the data, but no one really thinks that the data entered the world through a linear model like the one being used to estimate it.
This is made even more confusing because there is another totally different kind of “statistical model” found in fields like statistical physics. These are models in the sense that we mean. Despite involving the word “statistical”, they are nothing like a linear regression. Instead of looking backwards at historical data of abstract variables, models in statistical physics take hypothetical particles and step them forward, in an attempt to describe the collective behavior of complex systems from microscopic principles about how each particle behaves. These models are “statistical” only in the sense that they use probability to attempt to describe collective behavior in systems with many particles.
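The difference is easy to see in a toy example. The sketch below is a model of the second kind: nothing is fitted to historical data. Instead we posit entities (particles) and a rule (each one takes a random step left or right every tick), step them forward, and watch collective behavior emerge. All the numbers here are arbitrary choices of ours:

```python
import random

random.seed(0)  # make the sketch reproducible

def diffuse(n_particles=10_000, n_steps=100):
    """Entities: particles with a position on a line.
    Rule: every timestep, each particle moves one step left or
    right at random. That's the whole microscopic story."""
    positions = [0] * n_particles
    for _ in range(n_steps):
        positions = [x + random.choice((-1, 1)) for x in positions]
    return positions

positions = diffuse()
mean_sq_displacement = sum(x * x for x in positions) / len(positions)

# The collective prediction falls out of the microscopic rule:
# mean squared displacement comes out close to n_steps (100),
# the signature of diffusion -- derived, not extrapolated.
print(mean_sq_displacement)
```

A linear regression summarizes the data it was shown; this kind of model proposes a mechanism and lets the data-like behavior fall out of it.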
We want a model that is a proposal for simple entities, their properties, and the rules that govern them, that can potentially give rise to the natural phenomena we’re interested in. The difference between the Game of Life and a genuine scientific model is simply that while the Game of Life is an artificial set of entities and rules that are true by fiat, answering to nothing at all about the real world, a scientific model is a proposal for a set of entities and rules that could be behind some natural phenomenon. All we have to do is see if they are a good match.
Physics first got its legs with a model that goes something like this. The world is made up of bodies that exist in three-dimensional space and one-dimensional time. The most important properties of bodies are their mass, velocity, and position. They interact according to Newton’s laws. There are also some forces, like gravity, though the idea of forces was very controversial at first.
If you read Newton’s laws, you’ll see that these are the only entities he mentions. Bodies that have mass, velocity, and a location in space and time. Also there is a brief mention of forces.
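That short list of entities and rules is already enough to compute with. Here’s a toy sketch, assuming nothing beyond the model above: a body with a position and a velocity, stepped forward under a constant gravitational force (the numbers and step size are our own choices):

```python
G = 9.81   # gravitational acceleration near Earth's surface, m/s^2
DT = 0.01  # size of one timestep, seconds

def step(pos, vel):
    """The entity: a body with position and velocity (mass cancels
    out in free fall). The rule: velocity changes position, force
    changes velocity -- a crude Euler integration of Newton's laws."""
    return pos + vel * DT, vel - G * DT

# Drop a body from 100 m and count timesteps until it lands.
pos, vel, t = 100.0, 0.0, 0.0
while pos > 0:
    pos, vel = step(pos, vel)
    t += DT

print(round(t, 2))  # close to the analytic answer sqrt(2 * 100 / G) ≈ 4.5 s
```

A specific proposal about entities and rules yields a specific, checkable number, which is exactly what makes it possible to discover the proposal is wrong.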
Since this model was invented, things have gotten much more complicated. We now have electrical forces, Einstein changed the nature of the entities for space/time/mass, and there is all sorts of additional nonsense going on at the subatomic level.
We were able to get to this complicated model by starting with a simpler model that was partially right, a model that made specific claims about the entities and rules underlying the physical world, and therefore made at least somewhat specific predictions. These predictions were specific enough to be wrong, and that made them useful, because they could be tested. Claims about the rules and entities could be challenged, and the models could be refined. They did more than simply daisy-chain together a series of abstractions.
Coming up with the correct model on the first go is probably impossible. But coming up with a model that is specific enough to be wrong is our responsibility. Specific enough to be wrong means proposals about entities and rules, rather than superficial generalizations and claims about statistical relationships.
Like Lavoisier, we should be largely unconcerned as to whether these models are real or purely hypothetical. We should be more concerned about whether it “explains the phenomena of nature in a very satisfactory manner.” Remember that “we are not obliged to suppose this to be a real substance”!
As another example, consider different models of the atom.
Dalton was raised in a system where elements had been discovered by finding substances that could not be broken down into anything else. Hydrogen and oxygen were considered elements because water could be separated into the two gases, but the gases themselves couldn’t be divided further. So Dalton thought of atoms as indivisible.
When electrons were discovered, we got a plum pudding model. When Rutherford found that atoms were mostly empty space, we got a model with a small nucleus and electrons in orbit. Emission spectra and other observations led to electron shells rather than orbits. None of these models were right, but they were mechanical and accounted for many observations.
Anyways, what is science?
Most people these days claim that the legitimacy of science comes from the fact that it’s empirical, that you’re going out and collecting data. You see this in phrases like, “ideas are tested by experiment”. As a result, people who do any kind of empirical work often insist they are doing science.
Testing ideas by experiment is essential — what else are you going to rely on, authority figures? But what kind of ideas can be tested by experiment? Science can’t settle normative questions, like “how should I raise my child?” or “what kind of hat is best?” It also can’t settle semantic questions like “is a hot dog a sandwich?”
Some things are empirical but don’t seem very much like science at all. For example, imagine a study where we ask the question, “are red cars faster than blue cars?” You can definitely go out and get a set of red cars and a set of blue cars, race them under controlled conditions, and get an empirical answer to this question. But something about this seems very wrong — it isn’t the kind of thing we imagine when we think about science, and doesn’t seem likely to be very useful.
Similarly, you could try to get an empirical answer to the question, “who is the most popular musician?” There are many different ways you could try to measure this — record sales, awards, name recognition, etc. — and any approach you chose would be perfectly empirical. But again, this doesn’t really feel like the same thing that Maxwell and Newton and Curie were doing.
You could object to these studies on the grounds that the questions are moving targets. Certain musicians are very popular today, but someday a different musician will be more popular. Even if right now, across all cars, red cars are faster than blue cars, that may not be true in the future, and may not have been true in the past. If you go far enough back in time, there weren’t any cars at all.
You could also object that the results aren’t very stable; they can be easily altered. If we paint some of our red cars blue, or spend some marketing dollars on one musician over another, the empirical answer to these questions could change.
Both of these complaints are correct. But they identify symptoms, not causes. They follow from the fact that the questions are nonsensical, but they’re not the source of the nonsense.
Better to say, these studies are unscientific because they make no claim about the underlying entities.
We say that science is when metaphysical proposals about the nature of the entities that give rise to the world around us are tested empirically. In short, you propose entities and rules that can be tested, and then you test your proposal. Science does have to be empirical. But being empirical is not enough to make something science.
A good way to think of this is that we’re looking for a science that is not merely empirical, but mechanical, in the sense of getting at a mechanism. The ideal study tries to get a handle on proposals about the mechanics of some part of the natural world. And you can only get at the mechanics by making a proposal for entities and rules that might produce parts of the natural world that we observe.
This isn’t always possible at first. When you hear there’s some hot new mold that cures infections, your first question should be plain and empirical — does it actually cure infections or not? The practical reason to firmly establish empirical results is to avoid dying of infections. But the scientific reason is so that you can come around and say, “now that we have established that this happens, let’s try to figure out why it happens.” Now you are back to mechanism.
But you still have to be careful, because many things that people think are mechanisms are actually more abstractions. Psychology gets this wrong all the time. Let’s pick on the following diagram, which is ostensibly a claim about mechanism, i.e. the mechanism by which your death/life IAT is correlated with some measure of depression. But “zest for life” isn’t a proposal for a mechanism, it’s just another abstraction. You need a specific proposal of what is happening mechanically for something to be a mechanism.
Incidentally, this suggests that having a background in game design may give you a serious leg up as a theoretical scientist.
Game designers can’t be satisfied with abstractions. Their job is to invent mechanisms: to fill a world with entities and laws that make the gameplay they want possible, make the gameplay they don’t want impossible, and help players have the intended experience.
Compare this story from Richard Feynman:
[My Father] was happy with me, I believe. Once, though, when I came back from MIT (I’d been there a few years), he said to me, “Now that you’ve become educated about these things, there’s one question I’ve always had that I’ve never understood very well.”
I asked him what it was.
He said, “I understand that when an atom makes a transition from one state to another, it emits a particle of light called a photon.”
“That’s right,” I said.
He says, “Is the photon in the atom ahead of time?”
“No, there’s no photon beforehand.”
“Well,” he says, “where does it come from, then? How does it come out?”
I tried to explain it to him—that photon numbers aren’t conserved; they’re just created by the motion of the electron—but I couldn’t explain it very well. I said, “It’s like the sound that I’m making now: it wasn’t in me before.” (It’s not like my little boy, who suddenly announced one day, when he was very young, that he could no longer say a certain word—the word turned out to be “cat”—because his “word bag” had run out of the word. There’s no word bag that makes you use up words as they come out; in the same sense, there’s no “photon bag” in an atom.)
He was not satisfied with me in that respect. I was never able to explain any of the things that he didn’t understand. So he was unsuccessful: he sent me to all these universities in order to find out those things, and he never did find out.
You can see why Feynman’s father found this frustrating. But to a game designer, nothing could be more trivial than to think that God designed things so that atoms spawn photons whenever the rules call for it. Where were the photons before? The question isn’t meaningful: “photons” is just a number in the video game engine, and when the rules say there should be new photons, that number goes up.
This is also why abstractions don’t work for science. Listening to someone explain a new board game is already one of the most frustrating experiences of all time. But imagine someone explaining the rules to you in abstractions rather than in mechanics.
In Settlers of Catan, the universe is an island consisting of 19 hexagonal tiles. Settlements can be built at the intersections of tiles, and tiles generate resources depending on their type. The game could be described abstractly. But this is not as useful as describing it mechanically:
MR. ABSTRACTIO: You can make a new settlement with resources. Maritime trade creates value. The player with the best economy wins. Okay, let’s play!
MR. MECHANICO: Building a new settlement requires a Brick, Lumber, Wool, and Grain card. A settlement or a city on a harbor can trade the resource type shown at 3:1 or 2:1 as indicated. You win by being the first to reach 10 victory points, and you earn victory points from settlements (1 point each), cities (2 points each), certain development cards (1 point each), having the longest road (2 points), and having the largest army (2 points).
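One test of whether a claim is mechanical is whether it can be written as code. Mr. Mechanico’s rules pass; Mr. Abstractio’s don’t. A sketch (the function and variable names are ours, not anything official):

```python
# Building a settlement requires one each of Brick, Lumber, Wool, Grain.
SETTLEMENT_COST = {"brick": 1, "lumber": 1, "wool": 1, "grain": 1}

def can_build_settlement(hand):
    """hand maps resource name -> number of cards held."""
    return all(hand.get(resource, 0) >= needed
               for resource, needed in SETTLEMENT_COST.items())

def victory_points(settlements, cities, vp_cards,
                   longest_road=False, largest_army=False):
    # Settlements are worth 1, cities 2, certain development cards 1,
    # longest road and largest army 2 each. First to 10 wins.
    return (settlements
            + 2 * cities
            + vp_cards
            + (2 if longest_road else 0)
            + (2 if largest_army else 0))

print(can_build_settlement({"brick": 1, "lumber": 1, "wool": 2, "grain": 1}))  # True
print(victory_points(settlements=3, cities=2, vp_cards=1, longest_road=True))  # 10
```

Try doing the same for “maritime trade creates value”: there is nothing to write down, because no entities or rules were proposed.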
Another source of unappreciated mechanical thinking is video game speedrunners. Game designers have a god’s-eye view of science, as they make the rules of a world from scratch; speedrunners are more like scientists and engineers, using experiments to infer the underlying rules of the world, and then exploiting the hell out of them.
Many sciences like neuroscience and nutrition pretend to be model-building, but are actually just playing with abstractions. They appear to make claims about specific entities, but on closer inspection, the claims are just abstractions in a flowchart.
This can be hard to spot because many of these entities, like neurotransmitters or vitamins, really are specific entities in the chemical sense. But in neuroscience and nutrition these entities are often invoked only as abstractions, where they interact abstractly (e.g. more of X leads to more of Y) rather than mechanically. They tell you, “X upregulates Y.” How fascinating, what are the rules that lead to this as a consequence?
As neuroscientist Erik Hoel puts it:
If you ask me how a car works, and I say “well right here is the engine, and there are the wheels, and the steering wheel, that’s inside,” and so on, you’d quickly come to the conclusion that I have no idea how a car actually works.
Explanations are often given in terms of abstractions. “Please doc, why am I depressed?” “Easy, son: Not enough dopamine.” If you’re like us, you’ve always found these “explanations” unsatisfying. This is because abstractions can’t make sense of things. They just push the explanatory burden on an abstract noun, and hope that you don’t look any deeper.
Explanations need to be in terms of something, and scientific explanations need to be in terms of a set of entities and their relationships. Why do sodium and chlorine form a salt? Because sodium has one extra electron in its outer shell, which it gives up to become positively charged, while chlorine is missing one electron in its outer shell, which it gains to become negatively charged, and the two ions form an ionic bond. This is why chlorine also readily forms a salt with potassium, etc. etc. The observed behavior is explainable in terms of the entities and properties we’ve inferred over several hundred years of chemistry, interacting according to the rules we’ve inferred from the same.
The fake version of this can be hard to spot. “Why am I depressed? Not enough dopamine” sounds a lot like “Why does my car not start? Not enough gasoline.” But the second one, at least implicitly, leads to a discussion of spark plugs, pistons, and fuel pumps acting according to simple rules, genuine mechanics’ mechanics. The first one promises such an implied mechanism but, in our understanding at least, does not deliver.
This also dissolves one of our least-favorite discussions about psychology, whether or not there are “real truths in the social sciences”. There may or may not be real truths in the social sciences. But human behavior, and psychology more generally, is definitely the result of some entities under the hood behaving in some way, and we can definitely do more to characterize those entities and how they interact.
There’s a common misunderstanding. We’ll use an example from our friend Dynomight, who says:
Would you live longer if you ate less salt? How much longer? We can guess, but we don’t really know. To really be sure, we’d need to take two groups of people, get them to eat different amounts of salt, and then see how long they live.
This way of thinking follows a particularly strict standard, namely “randomized controlled experiments are the only way to infer causality”. But this isn’t really how things have ever worked. This is pure extrapolation, not model-building. In contrast to the impressionistic research of inventing abstractions, you might call this brute-force empiricism.
Experiments are useful, but we can’t let them distract from the real goal of science, which is building models that work towards a mechanistic understanding of the natural world.
To get to the moon, we didn’t build two groups of rockets and see which group made it to orbit. Instead, over centuries we painstakingly developed a mechanical understanding of physics, or at least a decent model of physics, that allowed us to make reasonable guesses about what kind(s) of rockets might work. There was a lot of testing involved, sure, but it didn’t look like a series of trials where we did head-to-head comparisons of hundreds of pairs of rocket designs, one pair at a time.
So to “get to the live longer”, we probably won’t build a low-salt and high-salt diet and fire them both at the moon. Instead we will, slowly, eventually, hopefully, develop a mechanical understanding of what salt does in the body, where things are likely to go well, and where they’re likely to go wrong. Then we will compare these models to observations over time, to confirm that the models are roughly correct and that things are going as anticipated, and we’ll correct the models as we learn more.
It won’t look like two large groups of people eating broadly different diets. That is science done with mittens. There is a better way than losing all articulation and mashing together different conditions.
Astronomy may have forced us to do science the right way because it enforces a “look but don’t touch” approach. Newton didn’t run experiments where he tried the solar system one way and then tried it the other way. Instead he (and everyone else) looked, speculated, came up with models, and saw which models would naturally cause the action they had already seen in the heavens. None of the models were entirely right, but some of them were close, and some of them made interesting predictions. And in time, some of them got us to the moon.
These are the insights you need to make sense of the famously confusing but deeply insightful philosopher of science Thomas Kuhn.
One-paragraph background on Kuhn: Thomas Kuhn was a philosopher of science who introduced the concept of “paradigms”. According to Kuhn, each science (biology, chemistry, etc.) is built on a paradigm, and scientific progress is more than the slow accumulation of facts: it involves revolutions, where an old paradigm is tossed out and a new one installed as the foundation.
But even though it’s his biggest concept, Kuhn can be kind of vague about what a “paradigm” involves, and this has led to a lot of confusion. So let’s try to pin it down.
A paradigm is not just a shared set of assumptions or tools and techniques. If it were, any tennis club would have a paradigm.
A paradigm is specifically a proposal (or rather, class of proposals) about the entities, properties, and relationships that give rise to some natural phenomenon.
Kuhn says:
Effective research scarcely begins before a scientific community thinks it has acquired firm answers to questions like the following: What are the fundamental entities of which the universe is composed? How do these interact with each other and with the senses? What questions may legitimately be asked about such entities and what techniques employed in seeking solutions? At least in the mature sciences, answers (or full substitutes for answers) to questions like these are firmly embedded in the educational initiation that prepares and licenses the student for professional practice.
(The Structure of Scientific Revolutions, Chapter 1)
Why “a class of proposals” and not “a proposal”? Well, because the specifics are always very much up for debate, or at least subject to empirical scrutiny. Any particular proposal, with exact values and all questions pinned down, cannot be a paradigm. A paradigm is a general direction that includes some flexibility.
For example, we may not know if the mass of a specific particle is 2 or 1 or 156 or 30,532 — but we do agree that things are made up of particles and that one of the things you can say about a particle is that it has some mass.
There may even be disagreement about the limits of the proposal itself — can the mass of a particle be any real number, say 1.56, or is mass limited to the positive integers, like 2, 4, and 10? Can the mass of a particle be negative? But in general we have a basic agreement on what kind of thing we are looking for, i.e. the types of entities, their features, and their interactions.
Kuhn gives an example based on Descartes’s corpuscularism. Descartes didn’t give a specific proposal about exactly what kinds of corpuscles there are, or exactly the rules by which they can interact. Instead, it was more of an open-ended suggestion: “hey guys, seems like a good model for physics would be something in the class of proposals where all things are made up of tiny particles”:
After the appearance of Descartes’s immensely influential scientific writings, most physical scientists assumed that the universe was composed of microscopic corpuscles and that all natural phenomena could be explained in terms of corpuscular shape, size, motion, and interaction. That nest of commitments proved to be both metaphysical and methodological. As metaphysical, it told scientists what sorts of entities the universe did and did not contain: there was only shaped matter in motion. As methodological, it told them what ultimate laws and fundamental explanations must be like: laws must specify corpuscular motion and interaction, and explanation must reduce any given natural phenomenon to corpuscular action under these laws. More important still, the corpuscular conception of the universe told scientists what many of their research problems should be. For example, a chemist who, like Boyle, embraced the new philosophy gave particular attention to reactions that could be viewed as transmutations.
(The Structure of Scientific Revolutions, Chapter 4)
Kuhn’s arguments definitely line up with one proposal: the one laid out by the cyberneticist William Powers in his book Behavior: The Control of Perception. And the two men must have recognized at least some of this in each other, judging from the blurb that Kuhn wrote for Powers’s book:
Powers’ manuscript, Behavior: The Control of Perception, is among the most exciting I have read in some time. The problems are of vast importance, and not only to psychologists; the achieved synthesis is thoroughly original; and the presentation is often convincing and almost invariably suggestive. I shall be watching with interest what happens to research in the directions to which Powers points.
And it’s worth considering what Powers says about models:
In physics both extrapolation and abstract generalization are used and misused, but the power of physical theories did not finally develop until physical models became central. A model in the sense I intend is a description of subsystems within the system being studied, each having its own properties and all—interacting together according to their individual properties—being responsible for observed appearances.
As you can see, this is another description of a model based on rules and entities.
The final concept to take away here is that these models are mechanistic. There’s a reason that Descartes was celebrated for his mechanical philosophy. When you assume the universe is akin to a gigantic clock, a real machine where the hands and numbers on the face are driven by the interaction of gears and levers below, your theories will be mechanical too. They will appeal to the interaction of gears and wires, rather than to abstract notions of what is happening on the clock face. (“The minute-hand has minute-force, and that’s why it moves faster than the hour-hand, which only has hour-force.”)
If a model is not mechanical in this way, if it does not speculate about the action of mechanisms beneath what is seen, it will be superficial. And it is not enough to speculate about things beneath. You can layer abstractions on abstractions (e.g. your anxiety is caused by low self-esteem). But you can’t design a watch without talking about individual pieces and how they will interact according to fixed rules.
Psychology is pre-paradigmatic. It’s not simply that we can’t agree on what entities make up the mind — it’s that there have been almost no proposals for these entities in the first place. There are almost no models, or even proposals for models, that could actually give rise to even a small fraction of the behavior we observe. A couple hundred years of psychology, and almost all we have to show for it are abstractions.
But there are a few exceptions, proposals that really did try to build a model.
The first major exception is Behaviorism. This was an attempt to explain all human and animal behavior in terms of reward, punishment, stimulus, and muscle tension, according to the laws of association. If, after some stimulus, some muscle tension was followed by reward, there would be more of that muscle tension in the future following that stimulus; if followed by punishment, there would be less.
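The behaviorist learning rule can be sketched in a few lines. This is a minimal illustration of my own, not anything from the behaviorist literature; the stimulus and response names, the `value` table, and the `learn` function are all invented for the example, echoing the mouse-and-levers setup from the prologue:

```python
import random

random.seed(0)

# value[(stimulus, response)]: the learned association strengths
value = {}

def choose(stimulus, responses):
    # Pick the response with the highest learned value (ties broken randomly).
    best = max(value.get((stimulus, r), 0.0) for r in responses)
    return random.choice([r for r in responses
                          if value.get((stimulus, r), 0.0) == best])

def learn(stimulus, response, outcome, rate=0.5):
    # outcome: +1 for reward, -1 for punishment
    key = (stimulus, response)
    value[key] = value.get(key, 0.0) + rate * outcome

# The mouse from the prologue: while the cage is too hot, the blue lever
# (cooling) is rewarded and the red lever (heating) is punished.
for _ in range(10):
    lever = choose("too_hot", ["red", "blue"])
    learn("too_hot", lever, +1 if lever == "blue" else -1)
```

Note that the rule has no notion of a set point: “blue” keeps its high stored value even once the cage is comfortable, which is exactly the failure mode the prologue describes.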
This ended up being a terrible way to do psychology, but it was admirable for being an attempt at describing the whole business in terms of a few simple entities and rules. It was precise enough to be wrong, rather than vague to the point of being unassailable, which has been the rule in most of psychology.
A more popular proposal is the idea of neural networks. While models based on this proposal can get pretty elaborate, at the most basic level the proposal is about a very small set of entities (neurons and connections) that function according to simple rules (e.g. backpropagation). And it’s hard to look at modern deep learning and large language models and not see that they create some behavior that resembles behaviors from humans and animals.
That said, it’s not clear how seriously to take neural networks as a model for the mind. Despite the claim of being “neural”, these models don’t resemble actual neurons all that much. And there’s a thornier problem, which is that neural networks are extremely good function approximators. You can train a neural network to approximate any function, which means that seeing a neural network approximate some function (even a human behavior like language) is not great evidence that the thing it is approximating is also the result of a neural network.
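To see why this is weak evidence, note how little machinery universal approximation requires. Here is a minimal sketch of my own (all variable names invented): a one-hidden-layer network trained by backpropagation to fit an arbitrary smooth target function, in this case sin(πx):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function to approximate (any smooth function would do).
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(np.pi * x)

# Entities: neurons and connections (weights). Rule: backpropagation.
hidden = 16
W1 = rng.normal(0.0, 2.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 1.0, (hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)     # hidden-layer activations
    return h, h @ W2 + b2        # network output

def loss(pred):
    return float(np.mean((pred - y) ** 2))

_, pred = forward(x)
initial_loss = loss(pred)

lr = 0.03
for _ in range(5000):
    h, pred = forward(x)
    grad_out = 2.0 * (pred - y) / len(x)         # d(loss)/d(output)
    gW2, gb2 = h.T @ grad_out, grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    gW1, gb1 = x.T @ grad_h, grad_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(x)
final_loss = loss(pred)
```

Swap in nearly any other target for `y` and the same few dozen lines will drive the loss down just as happily, which is the point: a good fit tells you about the approximator, not about the thing being approximated.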
Finally, there is a proposal that the main entities of the mind are negative feedback loops, and that much or even all of psychology can be explained in terms of the action of these feedback loops when organized hierarchically. This proposal is known as cybernetics.
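To make the contrast with the reward-learning mouse concrete, here is a minimal sketch of a single negative feedback loop defending a temperature set point. This is my own illustration, not Powers’s notation; the numbers and names are invented:

```python
# One negative feedback loop: compare a perception to a reference
# ("set point") and act in proportion to the error.
def control_step(perception, reference, gain=0.5):
    error = reference - perception
    return gain * error          # output opposes the error

temp = 35.0          # the cage starts too hot
reference = 22.0     # the temperature the loop defends
history = []
for _ in range(30):
    temp += control_step(temp, reference)   # action feeds back on perception
    history.append(temp)
```

Unlike the lever-pressing mouse, the loop stops acting as the error reaches zero. There is no stored “value” of the action to keep chasing, only a live comparison between perception and reference.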
2025-02-03 05:56:25
Just over a year ago we launched the Potato Diet Riff Trial, the first of its kind.
The riff trial is a new type of study design. In most studies, all participants sign up for the same protocol, or for a small number of similar conditions. But in a riff trial, you start with a base protocol, and every participant follows their own variation. Everyone tests a different version of the original protocol, and you see what happens.
As the first test of this new design, we decided to riff on one of our previous studies: the potato diet. For many people, eating a diet of nothing but potatoes (or almost nothing but potatoes) causes quick, effortless weight loss, 10.6 lbs on average. It’s not a matter of white-knuckling through a boring diet — people eat as much (potato) as they want, and at the end of a month of spuds, they say things like, “I was quite surprised that I didn’t get tired of potatoes. I still love them, maybe even more so than usual?!”
Why the hell does this happen? Well, there are many theories. The hope was that running a riff trial would help us get a sense of which theories are plausible, find some boundary conditions, or just explore the diet-space more randomly. We thought it might also help us figure out if there are factors that slow, stop, or perhaps even accelerate the rate of weight loss we saw on the full potato diet.
In the first two months after launching the riff trial, we heard back from ten riffs. Those results are described in the First Potato Riffs Report. Generally speaking, we learned that Potatoes + Dairy seems to work just fine, at least for some people, and we saw more evidence against the idea that the potato diet works because you are eating only one thing (people still lost weight eating more than one thing), or because the diet is very bland (it isn’t).
Between January 5th and March 18th, 2024, we heard back from an additional seventeen riffs. Those results are described in the Second Potato Riffs Report. Generally speaking, we learned that Potatoes + Dairy still seems to work just fine. Adding other vegetables may have slowed progress, and the protein results were mixed. However, the Potatoes + Skittles riff was an enormous success.
Between March 18th and October 9th, 2024, we heard back from an additional eleven riffs. Those results are described in the Third Potato Riffs Report. Generally speaking, we saw continued support for Potatoes + Dairy.
The trial is closed, but since the last report, we’ve heard back from an additional two riffs, which we will report in a moment. This gives us a total of 40 riffs in this riff trial. Note that this is not the same as 40 participants, since some people reported multiple riffs, and a few riffs were pairs of participants.
Raw data are available on the OSF.
Participant 87259648 did a Fried Potatoes riff, specifically, “mostly fried in a mix of coconut oil and tallow or lard” and continuing her “normal daily coffees with raw whole milk, heavy cream, honey and white sugar.”
Despite consuming only “around 30 percent potato on average”, she lost a small amount of weight and “found [the] diet to be easy and enjoyable, I never felt sick of potato although I did have a hard time getting myself to eat MORE potato each day.”
Participant 80826704 was formerly participant 41470698, but asked for a new number to do a new kind of riff. In Riff Trial Report Two, he had done Potatoes + Eggs as participant 41470698 and lost almost no weight. This time, he did a full potato diet and lost a lot of weight, more than 13 lbs:
This definitely fits with our suspicion that eggs may be related to weight gain, and the observation that eggs often contain high concentrations of lithium.
Let’s recap all the riffs. Here’s a handy table:
Mean weight change was a loss of 6.4 lbs. The largest gain was 5.2 lbs, and the largest loss was 19.8 lbs, a mark hit by two different people. One person gained weight, one person saw no change, one person reported no data, and the rest lost weight. One person also gained 6.3 lbs on “Whole Foods” + Chocolate, but this was not a potato diet (only about 10% of her diet was potatoes).
Here are all the completed riffs, plotted by the amount of weight change and sorted into very rough riff categories:
There are also a large number of people who signed up but never reported finishing their riff. We’re not going to analyze them at this point, but all signup data are available on the OSF if you want to take a look at the demographics.
The potato diet continues to be really robust. You can eat potatoes and ketchup, protein powder, or even Skittles, and still lose more than 10 lbs in four weeks.
The main thing we learned is that Potatoes + Dairy works almost as well as the normal potato diet. There were many variations, but looking at the 10 cases that did exclusively potatoes and dairy, the average weight lost on these riffs was 9.2 lbs. This is pretty comparable to the 10.6 lbs lost on the standard potato diet, suggesting that Potatoes + Dairy is almost as good as potatoes by themselves (though probably not better).
We didn’t see much evidence that there might be a protocol more effective than the potato diet. This is sad, because it would have been really funny if Potatoes + Skittles turned out to be super effective.
That said, three riffs did do unusually well, and it’s still possible that there is some super-potato-diet that causes more weight loss than potatoes on their own, or that’s better in some other way.
There’s some evidence that meat, oil, vegetables, and especially eggs make the potato diet less effective. But with such a small sample, it’s hard to know for sure. This could be a productive direction for future research. You could organize it as an RCT, and compare a Just-Potato condition to a Potato + Other Thing condition. Or an individual could test this by first doing a potato diet with one of these extra ingredients for a few weeks, then removing the extra ingredient and doing a standard potato diet for a few weeks as comparison.
The strongest evidence is against eggs, because participant 41470698 / 80826704 did exactly that. First he did a Potatoes + Eggs riff and lost only 1.8 lbs. Then he did a standard potato diet and lost 13.2 lbs. That’s not proof positive, but it’s a pretty stark comparison. If that happens in general, it would be hard not to conclude that eggs stop potatoes from working their weight-loss wonders.
If you want to try the potato diet for weight loss, our current recommendation is this funnel:
If dairy doesn’t work for you for some reason (like you’re a vegan, or you just hate milk), consider replacing Step 2 with a different riff that showed good results, like Potatoes + Lentils or Potatoes + Skittles.
Remember to get vitamin A. Mixing in some sweet potatoes is a good idea for this reason.
Remember to get plenty of water. Thirst can feel different on the potato diet; you will need to drink more water than you expect.
Remember to eat! In potato mode, hunger signals often feel different. But if you don’t eat you will start to feel terrible, even if you don’t feel hungry. If anything, eating a good amount of potatoes each day may make you lose weight faster than you would by skipping meals.
If the potato diet makes you miserable, try the three steps above. If you try those three steps and you’re still miserable, stop the diet.
This is the first-ever riff trial. But it won’t be the last. So for the next time someone does one of these, here’s what we’ve learned about how to do them right.
We hoped that riff trials would use the power of parallel search to quickly explore the boundary conditions of the base protocol, and discover what might make it work better or worse.
This works. We had suspected that dairy might stop the potato effect, but we quickly learned that we were wrong. We saw that the potato effect is also sometimes robust to lots of other foods, like Skittles. And we saw that other foods, like eggs and meat, seem like they might interfere with the weight-loss effect.
That said, there was not as much diversity in the riffs as we might have hoped.
Most people signed up for some version of Potatoes + Dairy. This was great because it provided a lot of evidence that Potatoes + Dairy works, and works pretty damn well. But it was not great for the riff trial’s ability to explore the greater space of possible riffs.
In future riff trials, the organizers should think about what they can do to encourage people to sign up for different kinds of riffs. If you don’t, there’s a good chance you’ll find that most of your scouting parties went off in the same direction, and that’s not ideal if you want to really explore the landscape.
One way to do this would be to run a riff trial with multiple rounds. First, you have a small number of people sign up and complete their riffs. Then, you take some of the most interesting riffs from the first round and encourage people to sign up to riff off of those. You could even do three or four rounds.
In fact, this is kind of what we did. Since we reported the results in waves, and had rolling signups, some people were definitely inspired to try things like Potatoes + Dairy or Potatoes + Lentils because of what they saw from completed riffs. But we could have done this even more explicitly, and that might be a good idea in the future.
There’s no formal skincare riff trial. But it does kind of exist anyway. People get interested in skincare, and go look at other people’s routines. They copy the routines they like, but usually with some modifications. This is all it takes for skincare protocols to mutate, combine, and spread through the population, getting better and better over time.
The same is true of any protocol floating out there in the culture, including the potato diet itself. Even if we hadn’t run the riff trial, people would have experimented with potato diets for the next 10 or 20 years, trying new variations and learning new things about the diet-space. But this process would have been slow, and it would have been hard to tell what we were learning, because the results would have been spread out over time and space.
The fact that we planted our flag and ran this as a riff trial didn’t change the nature of this exploration. But making it one study, clearly marking out its existence, definitely sped things up and made all the riffs easier to compare and interpret.
Potatoes, mostly fried in a mix of coconut oil and tallow or lard. I will continue with my normal daily coffees with raw whole milk, heavy cream, honey and white sugar. Maybe occasional fruit on cheat days but mostly just potatoes, dairy, coconut oil, tallow, coffee and honey/sugar. 28 days. My reasoning for choosing this is that fried potatoes are delicious, i really don’t want to give up my coffee routine, or waste the raw milk that i get through a cow share, and anecdotally, coconut oil and stearic acid have both been reported to help with weight loss.
So I didn’t lose a lot of weight, but I definitely lost somewhere between 3 – 6.5 lbs (hard to tell due to fluctuations in water weight) and an inch off my waist despite doing a pretty relaxed version of the diet.
What I ended up doing was a diet of around 30 percent potato on average (even though I only ate potatoes for dinner and “grazed” on smallish things throughout the rest of the day, it was hard for me to get past around 30 percent potato calorie-wise). The rest of my diet was mostly dairy (raw milk, heavy cream, sour cream, butter, cheese and occasional ice cream), fruit, sugar (and sugary drinks), honey, chocolate and saturated fats (coconut oil and beef tallow).
I rarely boiled the potatoes so the potato portion of the diet was mainly peeled yellow or red potatoes pan-fried in a mixture of tallow and coconut oil, baked russet potatoes with the skins, or roasted red and yellow baby potatoes with the skins.
I occasionally supplemented extra potassium, as well as other supplements. Around day 5 I started drinking coconut water in order to get extra potassium.
I found this diet to be easy and enjoyable, I never felt sick of potato although I did have a hard time getting myself to eat MORE potato each day. The skins didn’t seem to bother me. Something about the diet definitely seemed to have an appetite lowering effect, although my appetite did fluctuate from day to day. I never intentionally cut calories or deprived myself of anything I really wanted. So even on the very low calorie days I ate as much as I felt like eating that day. (i am used to doing extended fasts so this is not super unusual for me, but I DO think that the extra potassium or something DID result in more days than usual where I didn’t feel like eating as much).
I didn’t exercise any more or less than I usually do.
My husband and another male family member did even less strict versions of the diet along with me (potatoes for dinner, whatever else they wanted the rest of the day) and they both seemed to lose more weight than I did, but they didn’t keep track of any data. I’m a 49 year old female, the other two men are 49 and 66. In the last couple years it has gotten much harder for me to lose weight, and I have been pretty fatigued in general. I didn’t notice any extra energy on this diet, but appetite did often seem suppressed.
I didn’t observe any noteworthy reduction in pulse or body temperature over the course of the diet. Three weeks after finishing the diet I have not been able to keep the weight off and am back up to 190.
I kept track of everything in the Cronometer app, so if you have any questions I can access some data that’s even more specific from there, let me know!
Formerly participant 41470698, who asked for a new number: “I would like to try the full potato diet at some point during 2024. Could you prepare a new Google Sheet for me for this purpose?”
I completed the potato only version in August, but neglected to send you a report. Happy to report that I’ve completed it and filled the 4 week sheet.
In terms of feeling it was very similar to my riff experiment. In terms of results this has been completely different. One thing I am now thoroughly convinced about is the “ad libitum” part. I am hungry, I eat. It’s so simple it’s scandalous, but it’s been buried under years of well meant status quo advice.
From that point it simply matters which food types I eat. Even if the lithium hypothesis turns out wrong, this part I am thoroughly convinced about now.
Difficulty
In a way this was easier than potatoes + eggs. One reason I remember for this was the forced pre-planning. Because I knew I was going to eat only potatoes I generally tried to peel way more potatoes than I was hungry for. Because of this, for the next meal I would have potatoes already lying around. I could then eat those as-is, or more tasty, (re-)baking them in a frying pan.
Somehow I had less inclination to cheat.
I’ve also gone to McDonalds like 6 times, ordering only fries without sauce. And a lot of fries from a Snackbar (https://nl.wikipedia.org/wiki/Snackbar). It’s super convenient when going by train to just order a big portion of fries without sauce.
Fun stuff
Potatoes are fucking delicious by the way. I’ve taken to eating them without sauce, because now it just feels like potatoes with sauce taste like sauce. And then I’m missing the potato flavor. Maillard reaction for the win.
With a group of friends I did a “potato tasting”. I bought 8 breeds of potatoes and cooked them with the oven or boiled. So we tasted 16 different kinds. People were truly surprised by the amount of variation.
My surprise was mostly about how difficult the different breeds were to peel. Some potatoes are truly monsters.