
Why I Transitioned: A Response


Published on January 20, 2026 2:06 AM GMT

Fiora Sunshine's post, Why I Transitioned: A Case Study (the OP), articulates a valuable theory of why some MtFs transition.

If you are MtF and feel the post describes you, I believe you.

However, many statements from the post are wrong or overly broad.

My claims:

  1. There is evidence of a biological basis for trans identity. Twin studies are a good way to see this.
     
  2. Fiora claims that trans people's apparent lack of introspective clarity may be evidence of deception. But trans people are incentivized not to attempt to share accurate answers to "why do you really want to transition?". This is the Trans Double Bind.
     
  3. I am a counterexample to Fiora's theory. I was an adolescent social outcast weeb but did not transition. I spent 14 years actualizing as a man, then transitioned at 31 only after becoming crippled by dysphoria. My example shows that Fiora's phenotype can co-occur with or mask medically significant dysphoria.

A. Biologically Transgender

In the OP, Fiora presents the "body-map theory" under the umbrella of "arcane neuro-psychological phenomena", and then dismisses medical theories because the body-map theory doesn't fit her friend group.

The body-map theory is a straw man for biological causation because there are significant sex differences between men and women that are (a) not learned and (b) not reducible to subconscious expectations about one's anatomy.

The easiest way to see this is CAH. To quote from Berenbaum and Beltz, 2021[1]:

Studies of females with congenital adrenal hyperplasia (CAH) show how prenatal androgens affect behavior across the life span, with large effects on gendered activity interests and engagement, moderate effects on spatial abilities, and relatively small (or no) effects on gender identity

The sex difference in people-vs-things interests (hobbies, occupations) has been discussed extensively in our community. CAH shifts females towards male-patterned interests with small effects on gender identity, without changes in anatomy.

This finding is also notable because it shows male-patterned interests and female gender identity can coexist, at least in natal females.

 

Twin Studies à la LLM

I'm trans so I have a motive to search for evidence that suggests I am ~biologically valid~ and not subject to some kind of psychosocial delusion. It would be easy for me to cherry-pick individual papers to support that view. I'm trying to not do that. I'm also not going to attempt a full literature review here. Luckily it is 2026, and we have a better option.

The ACE model from psychiatric genetics is a standard framework for decomposing the variance in a trait into 3 components:

A = Additive Genetics: cumulative effect of individual alleles

C = Common Environment: parents, schooling, SES, etc.

E = Nonshared Environment (+ error): randomness, idiosyncratic life events[2]
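As a worked illustration of how these components are estimated (my addition; the numbers are invented to land near the table below, not taken from any study), the classical twin design uses Falconer's identities. With $r_{MZ}$ and $r_{DZ}$ the trait correlations for identical and fraternal twin pairs:

$$r_{MZ} = A + C, \qquad r_{DZ} = \tfrac{1}{2}A + C$$

$$\Rightarrow\; A = 2(r_{MZ} - r_{DZ}), \quad C = 2r_{DZ} - r_{MZ}, \quad E = 1 - r_{MZ}$$

So, for example, $r_{MZ} = 0.45$ and $r_{DZ} = 0.25$ would give $A = 0.4$, $C = 0.05$, $E = 0.55$.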

There are at least 9[3] primary twin studies on transgender identity or gender dysphoria. I created an LLM prompt[4] asking for a literature review with the goal of extracting signal, not just from the trans twin literature, but from other research that could help give us some plausible bounds on the strength of biological and social causation. Here are the results. The format is POINT_ESTIMATE, RANGE:

| model | A | C | E |
| --- | --- | --- | --- |
| Opus 4.5 | 0.4, 0.2-0.6 | 0.05, 0-0.2 | 0.55, 0.35-0.7 |
| Opus 4.5 Research | 0.375, 0.2-0.6 | 0.125, 0-0.3 | 0.5, 0.3-0.6 |
| GPT 5.2 Pro | 0.35, 0.2-0.55 | 0.1, 0-0.25 | 0.55, 0.35-0.7 |
| o3 Deep Research | 0.4, 0.3-0.5 | 0.05, 0-0.2 | 0.55, 0.5-0.7 |
| point est. average | 0.38 | 0.08 | 0.54 |

 

I'm moderately confident my prompt was not biased because the A values here are lower than what I've gotten from Claude when asking for heritability estimates from twin studies only. Also, all the models included some discussion of the rapid rise in adolescent cases in the 2010s, often mentioning "social contagion" and ROGD theories explicitly. All the models also pointed out that the ACE model is a simplification and that gene-environment interaction may be significant.

These are pretty wide error bars. But since A is trying to capture heredity only, we can take A as a rough lower bound for biological causation. Even if E is purely social, 38% is significant.

Also, none of this tells us how much variation there is at the individual level. And we have no trans GWAS.

The big question is whether E is dominated by social or biological factors.

If social factors mattered a lot I would expect parental attitudes to be significant in affecting transgender identity. But most studies find low C. This holds even for population-based studies that do not suffer from ascertainment bias. I would be surprised if peer influences were highly causal but parental influences were not.

I think the evidence from CAH, fraternal birth order effects, and animal models also provides good mechanistic reasons to think there are significant biological effects in E as well as A.

How do trans people view this line of research? They tend to hate it. They're afraid it will eventually lead to:

  1. not choosing "trans embryos" during IVF
  2. aborting "trans fetuses"
  3. lab/genetic testing to determine who is allowed to medically transition

This is what I'll call "medical eradication": one half of the Double Bind.

 

B. The Trans Double Bind

The purpose of medicine is to improve health and reduce suffering.

In general, the state should not subsidize healthcare that does not increase QALYs. A rational healthcare system would ration care based on ranking all available treatments by QALYs saved per dollar, and funding all treatments above a cutoff determined by the budget.
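To make that rationing rule concrete, here's a minimal sketch (my illustration; the treatments, QALY figures, and costs are all invented):

```python
# Greedy rationing: rank treatments by QALYs saved per dollar and fund
# down the list until the budget is exhausted.
def fund_treatments(treatments, budget):
    """treatments: iterable of (name, qalys_saved, cost) tuples."""
    ranked = sorted(treatments, key=lambda t: t[1] / t[2], reverse=True)
    funded, spent = [], 0.0
    for name, qalys, cost in ranked:
        if spent + cost <= budget:
            funded.append(name)
            spent += cost
    return funded

print(fund_treatments(
    [("statins", 500, 1_000_000), ("hrt", 300, 500_000), ("proton_beam", 50, 2_000_000)],
    budget=1_600_000,
))  # ['hrt', 'statins']
```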

The US healthcare system has a very creative interpretation of reality, but other countries like the UK at least attempt to do this.

To receive gender-affirming treatment, trans people must argue that such treatment alleviates suffering. This argument helped establish gender medicine in the 20th century. 

But in fact, the claim "being transgender involves suffering and requires medical treatment" is very controversial within the trans community. This is surprising, because disputing this claim threatens to undermine access to trans healthcare.

Moreover, this controversy explains why trans people do not appear to accurately report their own motivations for transition.

 

Motivations to transition

There are three possible sources:

  1. biological
  2. psychological/cognitive
  3. social

These can co-occur and interact.

Society at large recognizes only (1) as legitimate.

Trans people know this. They know they may be sent to psychotherapy, denied HRT, or judged illegitimate if they report wanting to transition for psychosocial reasons.

There is strong pressure for trans people to accept and endorse a biological/medical framing for their transitions.

But adopting this framing carries downsides:

  • Dependence on medical authorities for legitimacy
    • Historically, medicine has treated us very poorly[5]
    • We have little power to negotiate for better medical care if we are dependent on medicine to validate us to the rest of society
  • Psychological costs
    • Trans-cultural memory of medical mistreatment
    • Many find medicalization demeaning and resent dependence
  • Possible medical eradication
    • We can't claim we need care if we don't suffer[6], but one day the medical system might find a more direct way to eliminate our suffering: preventing trans people from coming into existence in the first place.
       

This is the Double Bind: many trans people need medical treatment, but find the psychological threat of medicalization and eradication intolerable.

Consequently, they will not claim their transition is justified because of biology. However, they know that psychological and social justifications will also not be accepted. In this situation, platitudes like "I am a woman because I identify as one" are a predictable response to incentives. If you attempt to give a real answer, it will be used against you.

Maybe you are thinking:

Marisa, this is hogwash! All the trans people I know are constantly oversharing lurid personal details despite obvious social incentives not to. The most parsimonious explanation is that people who say "I'm __ because I identify as __" literally believe that.

Yes, good point. I need to explain another dynamic.

So far I've only discussed external incentives, but there is incentive pressure from within the trans community as well.

In the 2010s, the following happened:

  • Youth transitions increased
  • Nonbinary identification increased, especially among people not medically transitioning 
  • Acceptance, awareness, and politicization all increased
  • Social media happened

Suddenly the trans community was fighting for a much broader set of constituents and demands. 20th century binary transsexualism coheres with medical framings, but 2010s Tumblr xenogenders do not. And trans people of all kinds have always had insecurities about their own validity-- both internal and external.

Here is the key insight:

It's difficult to enforce norms that protect external political perception.

It's easy to enforce norms that protect ingroup feelings.

Assume I've performed and posted some porn on the internet. This porn is optically really, really bad. Like actually politically damaging. Conscientious trans people will attempt to punish my defection-- but this is difficult. I can cry "respectability politics!" and point to the history of trans sex work in the face of employment discrimination. No one can agree on a theory of change for politics, so it's hard to prove harm. When the political backlash hits, it affects everyone equally[7].

By contrast, assume instead that I'm in a trans community space and I've told someone their reasons for transition are not valid, and they should reconsider. I've just seriously hurt someone's feelings, totally killed the vibe, and I'll probably be asked to leave-- maybe shunned long-term[8]. I have just lost access to perhaps my only source of ingroup social support. This is a huge disincentive. 

This structure, combined with the influx of novel identities in the 2010s, created an environment where it was taboo even to talk about causal theories for one's own transition, because it could be invalidating to someone else. All gender identities were valid at all times. Downstream effects of external social pressure, social media, and politics created an environment of collective ignorance where community norms discouraged investigating the causes of transition.

 

Introspective Clarity

Fiora claims:

Famously, trans people tend not to have great introspective clarity into their own motivations for transition. Intuitively, they tend to be quite aware of what they do and don't like about inhabiting their chosen bodies and gender roles. But when it comes to explaining the origins and intensity of those preferences, they almost universally come up short. I've even seen several smart, thoughtful trans people, such as Natalie Wynn, making statements to the effect that it's impossible to develop a satisfying theory of aberrant gender identities. (She may have been exaggerating for effect, but it was clear she'd given up on solving the puzzle herself.)

This is the wrong interpretation of Natalie Wynn's oeuvre. See Appendix: Contra Fiora on Contra for why.

What would a legitimate explanation of the origins of one's gendered feelings look like?

Fiora never tells us her criteria. And the only example she gives us-- a psychosocial explanation of her own transition-- heavily implies that it was illegitimate.

But she's also dismissive of biological theories. Does that mean no transitions are valid?

I got whole genome sequencing last year. I can point at the sexual and endocrine abnormalities in my genome, but I certainly can't prove they justify my transition. Nevertheless, subjectively, HRT saved my life.

 

C. In the Case of Quinoa Marisa

the author, age 13. Note the oversized Haibane Renmei graphic tee

(Extremely simplified for brevity)

In middle school, puberty started and my life fell apart. I hated my erections, my libido; I felt like a demon had taken over my brain. Unlike my peers, I never developed a felt sense of how to throw my body around. They got rougher, and better at sports. I got injured.

I was pathologically shy and awkward. Locker room talk was utterly repulsive to me. I lost friends and didn't care. Rurouni Kenshin was my first inspiration to grow my hair out. I am very lucky my parents let me.

There was an autistic kid on my soccer team with a speech impediment. He was good at soccer but the other boys would cruelly tease him at practice, in part because he didn't understand they were teasing him. One night after practice I spent the car ride home sobbing about it in front of my dad, who didn't get it at all. I quit soccer.

I was utterly miserable in school. In March of 7th grade, I developed real depression, and started thinking about suicide. Mom took me to two different psychologists. We decided I would homeschool 8th grade. Now, I really had no friends. I was still depressed.

At this point I was only living for WoW and anime. By far, my favorite was Haibane Renmei. It's 13 episodes of angel-girls living in a run-down boarding school and basically just taking care of each other. It is heavily implied that the Haibane are there-- in purgatory-- because they committed suicide in the real world, and must learn how to accept love and care.

It's difficult to explain how much this series resonated with me. It gave structure to feelings I couldn't articulate. I never believed there was any possibility of becoming a girl in real life, so I didn't fantasize much about that. But for a couple years I daydreamed frequently about dying and becoming a Haibane[9].

My hair was long enough at this point that I "passed". I was frequently assumed female in social situations, and men would often tell me I was in the wrong bathroom. I longed for delicate reciprocal care with others who somehow understood what I was going through, even though I could hardly understand it myself. Haibane Renmei showed me this but I had no idea how to find it in the real world.

At 16, boy puberty hit me like a truck. I became ugly. I still had no social skills, and no friends. I dressed like a hobo. The summer after junior year I confronted myself in the mirror and admitted I would never be cute again. I still desperately wanted to be loved, and I believed that the only path to achieving that was becoming a man girls would want to date. That meant improving my appearance and social skills.

I knew that women find weebs unattractive. And my long hair was definitely unattractive. It all melded together. I had no real-world outlet for my femininity so I'd poured it all into identifying with anime characters. And it all seemed like a dead end. I felt that if I stayed in the anime community I would end up socially stunted, since its social standards were lower. I cut my hair and stopped watching anime. I put a lot more effort into socializing.

In college, I read The Man Who Would Be Queen, self-diagnosed as AGP, and actually considered transition for the first time. But it was too late for me-- the sight of my face in the mirror, and the depictions of AGPs in the book were too horrifying. I resolved to never transition, and attempted suicide soon after.

7 months later I fell in love, and that relationship turned my life around. I loved her immeasurably for 5 years, and we lived together for 2 of those. I became, on the outside, socially and professionally actualized as a man. I was a great boyfriend and had no problem getting dates. After the breakup I fell in love 2 more times.

You already know how this ends. No amount of true love or social validation as a man could fix me. I never wanted to transition, but at 31 the strain of repression became unbearable. Things have turned out far better than I ever dared imagine. My parents have remarked on multiple occasions, unprompted, how much happier I am now. They're right.


Overall I fit Fiora's phenotype: I was a mentally ill social outcast weeb, desperately identifying with anime characters as a simulacrum of loving care I had no idea how to find in real life.

But I can't explain my eventual transition at 31 through anything other than a biological cause. I looked obsessively for evidence of some repressed or unconscious ulterior motive, and found none. I believed that transition would be very expensive and time-consuming, physically painful[10], reduce my attractiveness as a mate, and change my social possibilities. All of these predictions have been borne out. What I didn't expect is that HRT drastically improved my mental health even before the physical changes kicked in. My baseline now is my former 90th percentile of calm and happiness.

I'm n=1 but this shows Fiora's phenotype can coexist with biologically rooted dysphoria. Moreover, I believe my middle school social failures were caused as much by gender incongruence as by neurodivergence. It's difficult to socialize when your puberty feels wrong and your social instincts don't match your assigned gender.

It's almost like most of them had deep emotional wounds, often stemming from social rejection, and had transitioned to become cute girls or endearing women as a kind of questionably adaptive coping mechanism.

Maybe. Or a misaligned subconscious sex is part of what caused the social rejection in the first place.

Conclusion

As Fiora implied, "cuteness-maxxing" is probably not a good reason to transition.

Most people desperately want to be loved and this can cause mistakes with transition in both directions. Social media is probably bad for minors. We should emphasize that, at a fundamental level, trans people are neither more nor less lovable than cis people.

The human brain is perhaps the most complex object in our known universe, and we will likely never be able to fully disentangle psychosocial factors from biological ones. That said, I do think humanity will discover ever stronger evidence for biological causes of trans identity within our lifetimes.

Introspection is a noisy way to attempt to answer "am I trans?", and you hit diminishing returns fast. It's also the wrong question. The right question is "should I transition?". Transition is best understood as a Bayesian process where you take small behavioral steps[11] and update on whether your quality of life is improving.

If you start transitioning and your intrinsic health and happiness improves, and you expect the same to be true in the long run, continue. If not, desist. There is no shame in either outcome.

 

  1. ^
  2. ^

    For twins, prenatal environment shows up in both C and E.

  3. ^

    Coolidge et al. (2002), Heylens et al. (2012), Karamanis et al. (2022), Conabere et al. (2025), Sasaki et al. (2016), Bailey et al. (2000), Burri et al. (2011), Diamond (2013), Buhrich et al. (1991).

    If you just want to read a systematic review of these studies, see https://pmc.ncbi.nlm.nih.gov/articles/PMC12494644/

  4. ^

    I'm trying to understand the etiology of transgender identity, particularly the strength of the evidence base for different categories of potential causes. Please segment the analysis into five categories:

    1. Hereditary/genetic factors
    2. Prenatal environment (hormonal, epigenetic, maternal)
    3. Postnatal biological environment (diet, medications, endocrine factors)
    4. Family/microsocial environment
    5. Macrosocial/cultural environment

    For each category, conduct a rigorous literature review prioritizing meta-analyses, large-N studies, and methodologically sound designs. Identify the strongest evidence both supporting and contradicting causal contributions from that category. Flag studies with clear methodological limitations and discuss known publication biases in the field.

    Focus primarily on gender dysphoria and transgender identity as defined in DSM-5/ICD-11, noting where studies conflate distinct constructs or onset patterns.

    Conclude with a variance decomposition estimate using the ACE framework and liability threshold model standard in psychiatric genetics. Provide:

    - Point estimates with plausible ranges for each component (A, C, E)
    - Confidence ratings for each estimate based on evidence quantity and quality
    - Explicit discussion of what each ACE component likely captures, mapped back to the five categories above
    - Acknowledgment of confounds and unmeasurable factors

    Include cross-cultural and temporal trend data as evidence bearing on the cultural/environmental components.

  5. ^

    In general, in the US in the 20th century, if a medical institution decided they simply didn't want to treat trans patients, there would be no public outcry. The doctors and organizations that did treat us could set terms. Prior to the 2010s there was little awareness of trans people, and the awareness we had was often prejudicial. IBM fired Lynn Conway after all.

  6. ^

    Some trans people (for example, Abigail Thorn and Andrea Long Chu) have attempted to argue that access to gender-affirming care should not be contingent on either (a) suffering prior to receiving treatment or (b) demonstrated therapeutic benefit for the treatment. These arguments were not well-received even within the trans community.

  7. ^

    It took r/MtF until 2025 to ban porn, after years of infighting. https://www.reddit.com/r/MtF/comments/1kaxn18/alright_lets_talk_about_porn_and_porn_accounts/

  8. ^

    This norm is not totally unreasonable. The purpose of community spaces is primarily social support for those early in transition, which can be difficult to find anywhere else. I went through this phase too.

  9. ^

    Yes, this is perverse and contradicts the moral of the story.

  10. ^

    Electrolysis is the most physically painful thing I've experienced. I've done 40 hours so far and will likely do 150-200 total.

  11. ^

    Voice training, experimenting with name/pronouns/clothing, laser hair removal, HRT. 




Appendix: Contra Fiora on Contra


Published on January 20, 2026 1:53 AM GMT

This is an appendix post for Why I Transitioned: A Response.

In Why I Transitioned: A Case Study, Fiora Sunshine claims:

Famously, trans people tend not to have great introspective clarity into their own motivations for transition. Intuitively, they tend to be quite aware of what they do and don't like about inhabiting their chosen bodies and gender roles. But when it comes to explaining the origins and intensity of those preferences, they almost universally come up short. I've even seen several smart, thoughtful trans people, such as Natalie Wynn, making statements to the effect that it's impossible to develop a satisfying theory of aberrant gender identities. (She may have been exaggerating for effect, but it was clear she'd given up on solving the puzzle herself.)

The evidence most strongly suggests that Natalie did not give up-- she was bullied into silence.

This misreading matters because it illustrates one half of the Trans Double Bind. Natalie's words in Canceling were chosen under extreme social pressure from the online/Twitter/leftist contingent of the trans community. This social pressure existed because the community felt they were enforcing norms necessary to ensure respect and acceptance for enbys[1].

The linked video, Canceling, is Natalie defending against accusations of transmedicalism[2] due to using a voice-over from transmedicalist Buck Angel in her previous video.

And in the linked section specifically, she is defending and attempting to recontextualize one of her tweets.

One of the most important facts about Natalie is that, despite what her on-screen persona suggests, she is sensitive and suffers greatly from hate comments online, especially from within the trans community[3].

This video reply to being canceled was high-stakes because it had major long-term implications not just for her Patreon livelihood and career but her dignity, physical safety, and social acceptance.

As far as I can tell, Natalie is not lying in Canceling. But she is defending her record in part through omission and vagueness.

I can't tell you what her genuine beliefs are. In part because of this controversy she deliberately moved away from making comments or videos directly about trans issues, and has expressed general despair about the situation.

I do not believe Natalie is a transmedicalist, secretly or otherwise. There is a lot of theory-space between "all genders/transitions are valid no matter what" and transmedicalism.

But her blanket retraction ("I no longer believe there can be any rational justification of gender identity") is not credible because:

A. The context of Canceling highly incentivized her to make her commentary on her tweet as politically defensible as possible (If you disavow reason then it is impossible to exclude anyone).

B. The evidence suggests her real views are more nuanced.

She has made multiple extremely personal, searching videos about her dysphoria and motivations to transition, most notably Autogynephilia. Beauty is surprisingly critical of the usage and concept of gender dysphoria (and motivations for pursuing medical transition). Transtrenders deals with all these topics in skit form, and was also heavily scrutinized online.

Prior to Canceling, Natalie stated on multiple occasions that she transitioned because of gender dysphoria. This illustrates the Double Bind because the online trans community took this to imply that she believed dysphoria was an important part of justifying transition-- which would exclude people who do not report dysphoria, and threaten to reduce their acceptance in their identified gender.

The other side of the Double Bind is weak here because, in the 2010s as a binary trans woman with substantial income, Natalie's access to HRT and surgery was not conditional on endorsing transmedicalism.

I think her comments in her AMAs are more interesting and revealing. I can't link to these videos directly (paywall) and I don't know if anyone here cares to read long transcripts. But I will end this post by including some here because they are both interesting and relevant.

 

August 2018 Patron AMA stream

QUESTION (19:25): Becoming more the person you are was the thought that came to mind. It reminded me of something Schopenhauer said about the empirical character as a manifestation of the intelligible character. That what we appear to be outwardly is just an imperfect expression of our true immutable inmost nature. Does that resonate at all? Do you think it is a useful way of thinking about gender transition? Are you an expression of transcendental freedom? Could a cranky sexist 19th century philosopher be invoked against reductive shit lord rationalizing?

NATALIE: I think I actually take the opposite view. I take more of the Wittgenstein pragmatic view, which is that the self is, like, invented instead of discovered. More trans people do actually think of it the way you're suggesting: that by transitioning they're actually realizing this inherent, like, essence or singularity that's always there. That their exterior appearance is kind of finally becoming... like, their insides finally matching outside. It's, like, sort of not... that's not really the sense I have, to be quite honest. Like, I kind of want to pretend that it is, because it's a more attractive thing to say about yourself, right? I think people might be more attracted to me if I was expressing the true feminine essence of my being, but the truth is that I designed this. Femininity is something I've worked on, and it's an invention, it's a creation of mine as much as it is a discovery.

 

November 2018 Patron AMA stream

QUESTION (2:24): How did you find out you were transgender?

NATALIE: ...I started taking hormones before I was 100% sure I identified as a woman, to be honest, because I wanted the effects of the hormones... once I had started hormones... I'm like, I'm not non-binary, I just want to be a woman, and so it was like one step at a time...

When you discover that you like taking female hormones, and it makes you feel better about yourself, and you like the physical changes, you just look at your life, and you're like, well, this is just going to be easier if I just be a woman. Like, that sounds very pragmatic, but that to me is kind of the thinking. If I went into it honestly, there was sort of a pragmatic reasoning behind it, like, my life is going to be better if I just live as a woman. And so that's when I decided, like, fuck it, like, let's just go all in on this.


September 2019 Patron AMA stream

QUESTION (54:02): Do you think dysphoria is externally or internally generated? That is if we lived in a world without transphobia where trans identities were immediately 100% accepted by all people, would dysphoria still exist?

NATALIE: ...it's hard for me to imagine, like, what that would even look like, because I think there's a difference between transphobia and some trans identities not being accepted immediately. Because I think that part of what gender is is the assumption that there's two categories of people that, in terms of all the senses, present in a different way, and if we just completely dropped the idea that gender is something that you identify based on the way someone looks, and instead started thinking of gender as a purely psychological phenomenon, it's a little bit hard for me to imagine, like, what being trans even would mean in that situation...

...I just sort of don't get, like, I don't get what people are talking about when they talk about hypotheticals like this...

...what does it mean to identify as a woman when all woman means is a psychological state?

...I don't know how to talk about, like, I'm so used to the idea that, like, I just can't talk about this, that, like, I sort of don't know how much I should say...

...there's trans people, right, who present totally within the normal range of what is expected of someone who's assigned their gender at birth, and I'm not saying they're not valid, I'm just saying that, like, I sort of don't recognize it as what being trans is to me.

...my own trans identity, it's so connected to this desire to socially fit in as a woman [and look female] and... so when someone identifies as trans without either of those components... I don't understand it yet.


QUESTION (02:55:25): Are there any videos you would like to make but feel like you can't because they're too different or frivolous or inflammatory?

NATALIE: ...one I don't think I'll ever do would be a follow-up to the Autogynephilia video... I kind of feel like that video in particular is kind of weak. Despite its length, I don't think it really deals with the subject matter well, and I think that the video I have in mind would be about a lot of the difficult questions about why trans women transition, and how, in my opinion, there is anthropological truth to Blanchardism-- like clearly he's observing real trends, right?

...if you read Magnus Hirschfeld's work from the 30s... it comes to the same conclusions as Blanchard, and those things have troubled me throughout my transition, and in some ways have troubled me more as I've met more and more trans women, and feel that, you know, there really are these kinds of two stark clusters of trans women with very different backstories, and... if I were to make a theory about trans women I would do a kind of post-Blanchardism that starts with a lot of those observations and then tries to come up with a more nuanced way of talking about them than what Blanchard offers.

My Autogynephilia video has a million views and that's unusual. It's the only video of mine that's that old that has that many views. Why does that video have so many views? A lot of people are googling this topic. And if you look at the more sinister parts of trans internet it's kind of an obsessive topic, and I think that part of the reason for that is that a lot of mainstream trans discourse is very euphemistic about things. There's a heavily ideologically loaded concept of trans woman and you're supposed to believe all these things, like you're supposed to say I was always a woman and that I was a woman born in a man's body, and, like, the fact of the matter is that this just does not line up with a very large number of people's experiences...

And then on the other side you have Blanchard who talks about, there's this group of trans women who, before transition, live as feminine gay men and... the fundamental problem of their life is femininity, and often, you know, it's something they're bullied for, and it's just like this issue throughout their childhood, adolescence, and early adulthood. On the other hand, you have a whole second group of trans women who basically seem to pass as normal men until, you know, they come out as trans and shock everyone, and it's just that these are two very different experiences, so it's like such a deeply taboo topic...

The problem I have with my Autogynephilia video is that in a way I was pushing too hard against some of Blanchard's things, right, because it's a very threatening theory to trans women, because what it's saying is that you are men. I want to try to make sense of Blanchard's observations without reaching his conclusion that these are just either male homosexuals or male fetishists, because I don't believe that.

I've met hundreds of trans women at this point and, um, it's pretty hard not to notice that the two-type typology is based on something that's real, right? I'm not saying that the typology is theoretically good. I'm just saying that it's based on something that is quite clearly real, and so far as I'm aware there's simply no way of talking about that except Blanchardism, and that's not superfucking great, is it...

I hate the way a lot of people summarize my video, like they'll just summarize it as: oh, I said there's no such thing as autogynephilia, no one has those feelings; that's clearly not true. I think it's actually quite common for men to, um, like, yeah, you know, like a straight guy who likes taking pictures of his butt in women's yoga pants, like, sending them to his friends or something? It's a feeling. I don't think this is what causes people to transition, but I think it's a dimension to a lot of people's sexuality that I don't particularly see the point in denying. Nor do I think that Blanchardism is a good theory.

 

  1. ^

    By the mid 2010s the lines of battle had shifted so much that binary trans people were no longer perceived to be under threat, and the focus shifted towards nonbinary issues. These were more politically salient (nonbinary => overthrowing the binary => overthrowing patriarchy) which made them more conducive to a social media positive feedback loop, and were also subject to more social opposition in everyday interactions.

  2. ^

    The view that trans people are only valid if they experience gender dysphoria

  3. ^

    See for example the 17 minutes at the beginning of her October 2019 patron AMA stream, right after the start of the controversy, where she is upset to the point of altering her speaking cadence, and at one point on the verge of tears.




A Criterion for Deception


Published on January 20, 2026 1:25 AM GMT

What counts as a lie?

Centrally, a lie is a statement that contradicts reality, and that is formed with the explicit intent of misleading someone. If you ask me if I’m free on Thursday (I am), and I tell you that I’m busy because I don’t want to go to your stupid comedy show, I’m lying. If I tell you that I’m busy because I forgot that a meeting on Thursday had been rescheduled, I’m not lying, just mistaken.

But most purposeful misrepresentations of a situation aren’t outright falsehoods, they’re statements that are technically compatible with reality while appreciably misrepresenting it. I likely wouldn’t tell you that I’m busy if I really weren’t; I might instead bring up some minor thing that I have to do that day and make a big deal out of it, to give you the impression that I’m busy. So I haven’t said false things, but, whether through misdirecting, paltering, lying by omission, or other such deceptive techniques, I haven’t been honest either.

We’d like a principled way to characterize deception, as a property of communications in general. Here, I’ll derive an unusually powerful one: deception is misinformation on expectation. This can be shown at the level of information theory, and used as a practical means to understand everyday rhetoric.

 

Information-Theoretic Deception

Formally, we might say that Alice deceives Bob about a situation if:

First Definition: She makes a statement to him that, with respect to her own model of Bob, changes his impression of the situation so as to make it diverge from her own model of the situation.

We can phrase this in terms of probability distributions. (If you’re not familiar with probability theory, you can skip to the second definition and just take it for granted). First, some notation:

  1. For a possible state $w$ of a system $W$, let

$$P_A(w), \qquad P_B(w)$$

be the probabilities that Alice and Bob, respectively, assign to that state. These probability assignments $P_A$ and $P_B$ are themselves epistemic states of Alice and Bob. If Alice is modeling Bob as a system, too, she may assign probabilities to possible epistemic states $P_B$ that Bob might be in: $P_A(P_B)$.

2. Let

  1. $P_{B|s}$ be Bob's epistemic state after he updates on information $s$. In other words, $B|s$ is the Bob who has learned $s$.
  2. The subscript on a distribution names the system it's over; take the default system to be the world $W$. We'll leave it implicit when it's the only subscript.

With this notation, a straightforward way to operationalize deception is as information Alice presents to Bob that she expects to increase the difference between Bob’s view of the world and her own.

Taking the Kullback-Leibler divergence as the information-theoretic measure of difference between probability distributions, this first definition of deception is written as:

$$\mathbb{E}_{P_A(P_B)}\!\left[D_{\mathrm{KL}}\!\left(P_A \,\middle\|\, P_{B|s}\right)\right] \;>\; \mathbb{E}_{P_A(P_B)}\!\left[D_{\mathrm{KL}}\!\left(P_A \,\middle\|\, P_B\right)\right]$$

We can manipulate this inequality:

$$\mathbb{E}_{P_A(P_B)}\!\left[\sum_w P_A(w)\log\frac{P_A(w)}{P_{B|s}(w)} \;-\; \sum_w P_A(w)\log\frac{P_A(w)}{P_B(w)}\right] > 0$$

$$\mathbb{E}_{P_A(P_B)}\!\left[\sum_w P_A(w)\log\frac{P_B(w)}{P_{B|s}(w)}\right] > 0$$

Write $W \times B$ for the product system composed of $W$ and $B$, whose states are just pairs of states of $W$ and $B$. The inequality can then be written in terms of an expected value:

$$\mathbb{E}_{P_A(w,\,P_B)}\!\left[\log\frac{P_B(w)}{P_{B|s}(w)}\right] > 0$$

This term is the proportion to which Alice expects the probability Bob places on the actual world state to be changed by his receiving the information $s$. If we write this in terms of surprisal, or information content,

$$I_P(w) := -\log P(w),$$

we have

$$\mathbb{E}_{P_A(w,\,P_B)}\!\left[\,I_{P_{B|s}}(w) - I_{P_B}(w)\,\right] > 0.$$

This can be converted back to natural language: Alice deceives Bob with the statement $s$ if:

Second Definition: She expects that the statement would make him more surprised to learn the truth as she understands it[1].

In other words, deception is misinformation on expectation.

Misinformation alone isn’t sufficient—it’s not deceptive to tell someone a falsehood that you believe. To be deceptive, your message has to make it harder for the receiver to see the truth as you know it. You don’t have to have true knowledge of the state of the system, or of what someone truly thinks the state is. You only have to have a model of the system that generates a distribution over true states, and a model of the person to be deceived that generates distributions over their epistemic states and updates.
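As a toy illustration (my sketch, not from the post; the distributions are invented, and it assumes the independence simplification discussed in footnote 1), the second definition is directly computable for discrete world states:

```python
import numpy as np

def expected_surprisal_change(p_alice, p_bob_prior, p_bob_posterior):
    """Alice's expected change in Bob's surprisal at the true world state.

    Positive => deceptive by the second definition: Alice expects the
    message to leave Bob more surprised by the truth as she sees it.
    """
    p_a = np.asarray(p_alice, dtype=float)
    i_before = -np.log(np.asarray(p_bob_prior, dtype=float))
    i_after = -np.log(np.asarray(p_bob_posterior, dtype=float))
    return float(np.sum(p_a * (i_after - i_before)))

# Alice is near-certain the world is in state 0; her (perhaps literally
# true) statement moves Bob's probability mass toward state 1.
print(expected_surprisal_change([0.95, 0.05], [0.5, 0.5], [0.2, 0.8]) > 0)  # True
```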

 

This is a criterion for deception that routes around notions of intentionality. It applies to any system that

  • forms models of the world,
  • forms models of how other systems model the world, and
  • determines what information to show to those other systems based on its models of these systems.

An AI, for instance, may not have the sort of internal architecture that lets us attribute human-like intents or internal conceptualizations to it; it may select information that misleads us without the explicit intent to mislead[2]. An agent like AlphaGo or Gato, that sees humans as just another game to master, may determine which statements would get us to do what it wants without even analyzing the truth or falsity of those statements. It does not say things in order to deceive us; deception is merely a byproduct of the optimal things to say.

In fact, for sufficiently powerful optimizers, deception ought to be an instrumental strategy. Humans are useful tools that can be easily manipulated by providing information, and it’s not generally the case that information that optimally manipulates humans towards a given end is simultaneously an accurate representation of the world. (See also: Deep Deceptiveness).

 

Rhetorical Deception

This criterion can be applied anywhere people have incentives to be dishonest or manipulative while not outright lying.

In rhetorical discussions, it’s overwhelmingly common for people to misrepresent situations by finding the most extreme descriptions of them that aren’t literally false[3]. Someone will say that a politician “is letting violent criminals run free in the streets!”, you’ll look it up, and it’ll turn out that they rejected a proposal to increase mandatory minimum sentencing guidelines seven years ago. Or “protein shakes can give you cancer!”, when an analysis finds that some brands of protein powder contain up to two micrograms of a chemical that the state of California claims is not known not to cause cancer at much larger doses. And so on. This sort of casual dishonesty permeates almost all political discourse.

Descriptions like these are meant to evoke particular mental images in the listener: when we send the phrase “a politician who’s letting violent criminals run free in the streets” to the Midjourney in our hearts, the image is of someone who’s just throwing open the prison cells and letting out countless murderers, thieves, and psychos. And the person making this claim is intending to evoke this image with their words, even though they'll generally understand perfectly well that that’s not what’s really happening. So the claim is deceptive: the speaker knows that the words they’re using are creating a picture of reality that they know is inaccurate, even if the literal statement itself is true.

This is a pretty intuitive test for deception, and I find myself using it all the time when reading about or discussing political issues. It doesn’t require us to pin down formal definitions of “violent criminal” and a threshold for “running free”, as we would in order to analyze the literal truth of their words. Instead, we ask: does the mental image conveyed by the statement match the speaker’s understanding of reality? If not, they’re being deceptive[4].

Treating expected misinformation as deception also presents us with a conversational norm: we ought to describe the world in ways that we expect will cause people to form accurate mental models of the world.

 

 

(Also posted on Substack)

 

  1. ^

This isn’t exactly identical to the first definition. Note that I converted the final double integral into an expected value by implicitly identifying

$$P_A(w, P_B) = P_A(w)\,P_A(P_B),$$

i.e. by making Bob’s epistemic state independent of the true world state, within Alice’s model. If Alice is explicitly modeling a dependence of Bob’s epistemic state on the true world state for reasons outside her influence, this doesn’t work, so the first and second definitions can differ.

    Example:  If I start having strange heart problems, I might describe them to a cardiologist, expecting that this will cause them to form a model of the world that’s different from mine. I expect they’ll gain high confidence that my heart has some specific problem X that I don’t presently consider likely due to my not knowing cardiology. So, to me, there’s an expected increase in the divergence between our distributions that isn’t an expected increase in the cardiologist’s surprisal, or distance from the truth. Because the independence assumption above is violated—I take the cardiologist’s epistemic state to be strongly dependent on the true world state, even though I don’t know that state—the two definitions differ. Only the second captures the idea that honestly describing your medical symptoms to a doctor shouldn’t be deception, since you don’t expect that they’ll be mis-informed by what you say.

  2. ^

    Even for humans, there’s a gray zone where we do things whose consequences are neither consciously intended nor unintended, but simply foreseen; it’s only after the action and its consequences are registered that our minds decide whether our narrative self-model will read “yes, that was intended” or “no, that was unintended”. Intentionality is more of a convenient fiction than a foundational property of agents like us.

  3. ^

    Resumes are a funnier example of this principle: if someone says they placed “top 400” in a nationwide academics competition, you can tell that their actual rank is at least 301, since they’d be saying “top 300” or lower if they could.

  4. ^

    Of course everyone forms their own unique mental images; of course it’s subjective what constitutes a match; of course we can’t verify that the speaker has any particular understanding of reality. But you can generally make common-sense inferences about these things.




Evidence that would update me towards a software-only fast takeoff


Published on January 20, 2026 12:58 AM GMT

In a software-only takeoff, AIs improve AI-related software at an increasing speed, leading to superintelligent AI. The plausibility of this scenario is relevant to questions like:

  • How much time do we have between near-human and superintelligent AIs?
  • Which actors have influence over AI development?
  • How much warning does the public have before superintelligent AIs arrive?

Knowing when and how much I expect to learn about the likelihood of such a takeoff helps me plan for the future, and so is quite important. This post presents possible events that would update me towards a software-only takeoff.

What are returns to software R&D?

The key variable determining whether software progress alone can produce rapid, self-sustaining acceleration is returns to software R&D (r), which measures how output scales with labor input. Specifically, if we model research output as

$$O \propto I^{\,r},$$

where O is research output (e.g. algorithmic improvements) and I is the effective labor input (AI systems weighted by their capability), then r captures the returns to scale.

If r is greater than 1, doubling the effective labor input of your AI researchers produces sufficient high-quality research to more than double the effective labor of subsequent generations of AIs, and you quickly get a singularity, even without any growth in other inputs. If it's less than 1, software improvements alone can't sustain acceleration, so slower feedback loops like hardware or manufacturing improvements become necessary to reach superintelligence, and takeoff is likely to be slower.

Projected software capacity growth under different returns-to-scale assumptions, holding hardware constant. ASARA is AI Systems for AI R&D Automation. When r > 1, each generation of AI researchers produces more than enough capability gain to accelerate the next generation, yielding explosive growth (red, purple). At r = 1 (orange), gains compound but don't accelerate. When r < 1 (green, blue), diminishing returns cause growth to asymptotically approach the dashed baseline, making hardware or other bottleneck improvements necessary for continued acceleration. From Forethought.

A software-only singularity could be avoided if r is not initially above 1, or if r decreases over time, for example because research becomes bottlenecked by compute, or because algorithmic improvements become harder to find as low-hanging fruit is exhausted.
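To make the three regimes concrete, here is a toy discrete-time version of the feedback loop (my sketch; the constants are arbitrary and nothing here estimates the real r):

```python
# Toy software-only feedback: effective labor I is set equal to current
# capability c, hardware is held fixed, and each step's algorithmic
# progress scales as I**r (the returns-to-scale parameter above).
def capability_path(r, steps=40, c0=1.0, k=0.05):
    c, path = c0, [c0]
    for _ in range(steps):
        c += k * c**r
        path.append(c)
    return path

for r in (0.7, 1.0, 1.3):
    print(f"r={r}: capability after 40 steps = {capability_path(r)[-1]:.1f}")
# r < 1 decelerates toward a crawl, r = 1 compounds steadily,
# r > 1 accelerates toward a finite-time singularity.
```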

Initial returns to software R&D

The most immediate way to determine if returns to software R&D are greater than 1 would be observing shortening doubling times in AI R&D at major labs (i.e. accelerating algorithmic progress), but it would not be clear how much of this is because of increases in labor rather than (possibly accelerating) increases in experimental compute. This has stymied previous estimates of returns.

Posterior distributions of returns to software R&D (r) across four domains. Only SAT solvers have a 90% confidence interval entirely above 1. From Epoch AI.

Evidence that returns to labor in AI R&D are greater than 1:

  1. Progress continues to accelerate after chip supplies near capacity constraints. This would convince me that a significant portion of continued progress is a result of labor rather than compute and would constitute strong evidence.
  2. Other studies show that labor inputs result in compounding gains. This would constitute strong evidence.
    1. Any high-quality randomized or pseudorandom trial on this subject.
    2. Work that effectively separates increased compute from increased labor input [1].
  3. Labs continue to be able to make up for less compute than competitors with talent (like Anthropic in recent years). This would be medium-strength evidence.
  4. A weaker signal would be evidence of large uplifts from automated coders. Pure coding ability is not very indicative of future returns, however, because AIs’ research taste is likely to be the primary constraint after full automation.
    1. Internal evaluations at AI companies like Anthropic show exponentially increasing productivity.
    2. Y Combinator startups grow much faster than previously (and increasingly fast over time). This is likely to be confounded by other factors like overall economic growth.

Compute bottlenecks

The likelihood of a software-only takeoff depends heavily on how compute-intensive ML research is. If progress requires running expensive experiments, millions of automated researchers could still be bottlenecked. If not, they could advance very rapidly.

Here are some things that would update me towards thinking little compute is required for experiments:

  1. Individual compute-constrained actors continue to make large contributions to algorithmic progress[2]. This would constitute strong evidence. Examples include:
    1. Academic institutions which can only use a few GPUs.
    2. Chinese labs that are constrained by export restrictions (if export restrictions are reimposed and effective).
  2. Algorithmic insights can be cross-applied from smaller-scale experimentation. This would constitute strong evidence. For example:
    1. Optimizers developed on small-scale projects generalize well to large-scale projects[3].
    2. RL environments can be iterated with very little compute.
  3. Conceptual/mathematical work proves particularly useful for ML progress. This is weak evidence, as it would enable non-compute-intensive progress only if such work does not require large amounts of inference-time compute.

Diminishing returns to software R&D

Even if returns on labor investment are compounding at the beginning of takeoff, research may run into diminishing returns before superintelligence is produced. This would result in the bumpy takeoff below.

Three intelligence explosion/takeoff scenarios. In the rapid scenario, a software-only takeoff reaches a singularity. In the bumpy scenario, software-only takeoff stalls until AI can improve hardware and other inputs. In the gradual scenario, meaningful capability gains only occur once AI can augment the full stack of inputs to production. From Forethought.

 

The evidence I expect to collect before takeoff is relatively weak, because current progress rates don't tell us much about the difficulty of discovering more advanced ideas we haven't yet tried to find. That said, some evidence might be:

  1. Little slowdown in algorithmic progress in the next few years. Evidence would include:
    1. Evidence of constant speed of new ideas, controlling for labor. Results from this type of analysis that don’t indicate quickly diminishing returns would be one example.
    2. Constant time between major architectural innovations (e.g. a breakthrough in 2027 of similar size to AlexNet, transformers, and GPT-3)[4].
    3. New things to optimize (like an additional component to training, e.g. RLVR).
    4. Advances in other fields like statistics, neuroscience, and math that can be transferred with some effort. For example:
      1. Causal discovery algorithms that let models infer causal structure from observational data.
  2. We have evidence that much better algorithms exist and could be implemented in AIs. For example:
    1. Neuroscientific evidence of the existence of much more efficient learning algorithms (which would require additional labor to identify).
    2. Better understanding of how the brain assigns credit across long time horizons.

Conclusion

I expect to get some evidence of the likelihood of a software-only takeoff in the next year, and reasonably decisive evidence by 2030. Overall I think evidence of positive feedback in labor inputs to software R&D would move me the most, with evidence that compute is not a bottleneck being a near second. 

Publicly available evidence that would update us towards a software-only singularity might be particularly important because racing companies may not disclose progress. This evidence is largely not required by existing transparency laws, and so should be a subject of future legislation. Evidence of takeoff speeds would also be helpful for AI companies to internally predict takeoff scenarios.

Thanks for feedback from other participants in the Redwood futurism writing program. All errors are my own. 

  1. ^

    This paper makes substantial progress but does not fully correct for endogeneity, and its 90% confidence intervals straddle an r of 1, the threshold for compounding, in all domains except SAT solvers.

  2. ^

     It may be hard to know if labs have already made the same discoveries.

  3. ^

    See this post and comments for arguments about the plausibility of finding scalable innovations using small amounts of compute.

  4. ^

    This may only be clear in retrospect, since breakthroughs like transformers weren't immediately recognized as major.




There may be low hanging fruit for a weak nootropic


Published on January 20, 2026 12:51 AM GMT

The problem

You are routinely exposed to CO2 concentrations an order of magnitude higher than your ancestors experienced. You are almost constantly exposed to concentrations two times higher. Part of this is due to the baseline increase in atmospheric CO2 from fossil fuel use, but much more of it is due to spending a lot of time in poorly ventilated indoor environments. These elevated levels are associated with a decline in cognitive performance in a variety of studies. I first heard all of this years ago when I came across this video, which is fun to watch but, as I'll argue, presents a one-sided view of the issue[1].

This level of exposure is probably fine for both short- and long-term effects, but essentially everyone alive today has never experienced pre-industrial levels of CO2, which might be making everyone very slightly dumber. I don't think this is super likely, and if it is happening it is a small effect. But it is also the kind of thing I would like to be ambiently aware of, and I am kind of disappointed in the lack of clarity in the academic literature. Some studies claim extremely deleterious effects from moderate increases in CO2[2]; some claim essentially none even at 4000ppm[3], ten times the atmospheric concentration.

The main graphs from the above studies show ridiculously different results. These were intentionally chosen to contrast and make the point.

A lot of the standard criticisms of this kind of thing apply: underpowered studies, methodological flaws in measuring cognitive performance or controlling CO2 concentration, unrepresentative populations[4], and p-hacking via tons of different metrics for cognitive performance. All of this makes even meta-analysis a little unclear. This blog post covers a meta-analysis pretty well; the conclusion was that there is a statistically significant decrease in performance on a Strategic Management Simulation (SMS), but that was comparing <1500ppm to <3000ppm, which is a really wide and kind of arbitrary range.

However, nobody has done the experiment I think would be most interesting: a trial where subjects are given custom gas mixes at 0ppm, 400ppm, and 800+ppm. This would answer not only whether people are losing ability in poorly ventilated spaces, but also whether we are missing out on some brain power by having any CO2 at all in the air we breathe. Again, the effect size is probably pretty small, but one of the studies was looking at a drop in productivity of 1.4% and concluded that that level of productivity loss justified better ventilation. Imagine if the whole world is missing out on that from poor ventilation. Imagine if the whole world is missing out on that because we are at 400ppm instead of 0. Again, not likely, but the kind of thing that would have big (cumulative) downsides if true.
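For a sense of the sample sizes a clean version of that trial would need, here's a quick power calculation (my own, using statsmodels; the effect size is a guessed "small" effect, roughly where a ~1.4% productivity change might land):

```python
from statsmodels.stats.power import TTestIndPower

# Per-group n to detect a small effect (Cohen's d = 0.2) between two
# independent groups at alpha = 0.05 with 80% power, two-sided test.
n_per_arm = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(round(n_per_arm))  # about 393 per arm; most CO2 studies are far smaller
```

A within-subject (crossover) design would need far fewer people.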

I tried looking at the physiological effects of CO2 and did not do as deep a dive as I would have liked, but this paper claims that there is a dose-response relationship between cerebral blood flow and CO2 concentration (in the blood) and that it really levels out beneath roughly normal physiological levels. I take this to mean that there would be a small, but measurable, physiological response if I could remove all the CO2 from my blood, which they did by hyperventilating.

Along the way I started looking at physiological effects of O2 availability and, well, I have some words about a particular article I found. Look at this graph:

It looks like there is some homeostasis going on, where your cerebral blood flow can go down because there is more oxygen in the blood (%CaO2), giving you the same amount delivered (%CDO2). The only issue is that they said “When not reported, DO2 was estimated as the product of blood flow and CaO2.” When I read that I felt like I was losing my mind. Doesn’t that defeat the whole purpose of looking at multiple studies? If you just assume that the effect is given by some relation, fill in data based on that assumption, and average it out with real data, of course you’re going to get something like the relation you put in. As one of the many non-doctors in the world, maybe I should stay in my lane, but this does strike me as a bit circular. I am not convinced that an increase in atmospheric O2 does not lead to an increase in the O2 delivered to the brain. Especially because decreases in O2 partial pressure are definitely related to decreases in O2 (and cognition) in the brain, and it would be kind of weird if the curve was just totally flat after normal atmospheric levels[5].

I also found one very optimistic group claiming, in two main papers, that breathing 100% O2 could increase cognitive performance. They are both recent and from a small university, so it makes sense that this didn’t get a ton of attention, but that doesn’t really make me less skeptical that it’s just that easy. The first paper claimed a 30% increase in motor learning, and I would expect that effect size to decrease significantly upon replication.

All this leaves four main possibilities the way I see it:

  1. No effect, everything is business as usual for usual O2/CO2 ranges
  2. CO2 decreases cognitive ability with a dose-response relationship even at low doses
  3. O2 enriched air can have significant gains that basically nobody has captured
  4. VOCs[6] have bad effects and ventilation reduces their concentration and that is what confuses the hell out of all these studies

 

My solution

Well, I don’t have the resources to do a randomized control trial. But I do have the ability to make a CO2 scrubber and feed the treated air into a facemask so I can breathe it. If I do this (I’m not buying the parts until I confirm nobody leaves a comment just demolishing the central thesis), I would probably wait until spring, as opening my windows seems like a big important step to having low ambient CO2[7] but would be pretty miserable for me while there’s still snow outside.

This is a chance to talk about some cool applications of chemistry. The idea is that CO2 can react with NaOH to form only aqueous products, removing the CO2 from the air. These can then react with Ca(OH)2 to yield a solid precipitate (CaCO3), which can be heated to release the CO2; adding water to the resulting CaO re-forms the Ca(OH)2. This is, apparently, all pretty common for controlling the pH of fish tanks, so that’s convenient and cheap.
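To get a sense of scale, here is a back-of-envelope sizing of the reagent loop. The daily breathing volume is my assumption (roughly 11,000 L/day, on the order of resting minute ventilation), not a measured number:

```python
# Back-of-envelope scrubber sizing. The reactions, per the paragraph above:
#   CO2 + 2 NaOH     -> Na2CO3 + H2O            (absorption, aqueous)
#   Na2CO3 + Ca(OH)2 -> CaCO3 (solid) + 2 NaOH  (regenerates the NaOH)
#   CaCO3 --heat-->  CaO + CO2 ;  CaO + H2O -> Ca(OH)2  (regenerates the lime)

air_per_day_L = 11_000                 # assumed daily breathing volume
co2_ppm = 800                          # indoor concentration to strip
co2_L = air_per_day_L * co2_ppm / 1e6  # ~8.8 L of CO2 per day
co2_g = co2_L * 1.98                   # CO2 is ~1.98 g/L at room conditions
naoh_g = co2_g * (2 * 40.0) / 44.0     # 2 mol NaOH (40 g/mol) per mol CO2 (44 g/mol)

print(f"{co2_g:.0f} g CO2/day -> {naoh_g:.0f} g NaOH/day")  # ~17 g -> ~32 g
```

A few tens of grams of NaOH per day, continuously recycled through the lime loop, which is why fish-tank-scale hardware seems plausible for this.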

I’ve already been trying to track my productivity along with a few interventions, so I plan to just roll this in with that. This won’t be a blinded trial, but I am happy to take a placebo win if it increases my productivity, and if it doesn’t do anything measurable I’m really not interested in it.

As for oxygen enrichment, you can buy oxygen concentrators, nitrogen filters that people use for making liquid nitrogen instead of liquid air, medical grade oxygen, oxygen for other purposes, or make it with electrolysis. All of these strike me as being somewhat dangerous or quite expensive to do for long periods of time. Someone else on LessWrong wanted oxygen (for a much better and less selfish reason) and got some for divers/pilots. I would do that, but again, expensive.

With any luck, I will have a case study done on myself at some point and can update everyone with the results.

  1. ^

    I don’t want to be harsh; the video is only a few minutes long, is made by a climate activist who already has some strong beliefs on CO2, and he did put his own mind on the line as a test case to make a point, which I applaud. Given those reasons, and that he seemed to have quite negative effects from the CO2 himself, I think it is quite fair that he didn’t present a detailed counterargument.

  2. ^
  3. ^
  4. ^

    The group used “astronaut-like subjects” which is fine but I don’t know if that generalizes to most other people.

  5. ^

    Not hugely surprising though, we did evolve to use the atmospheric level so I wouldn’t be shocked if it was flat, just that this study didn’t convince me that it was flat.

  6. ^

    I realized I did not talk about VOCs, volatile organic compounds, at all. They are just a wide variety of chemicals that permeate the modern world and are probably bad in ways we aren’t certain of.

  7. ^

    As an aside, I would not be shocked if poor ventilation during the winter was a contributing factor to seasonal affective disorder but I don’t have that and did not look into anyone checking if it is true.



Discuss

Everybody Wants to Rule the Future - Is Longtermism's Mandate of Heaven by Arithmetic Justified?

2026-01-20 07:31:58

Published on January 19, 2026 11:31 PM GMT

Dnnn Uunnn, nnn nnn nnn nuh nuh nuh nuh, dnnn unnn nnn nnn nnn nuh nuh nuh NAH (Tears for Fears)

 

I was reading David Kinney’s interesting work from 2022 “Longtermism and Computational Complexity” in which he argues that longtermist effective altruism is not action-guiding because calculating the expected utility of events in the far future is computationally intractable. The crux of his argument is that longtermist reasoning requires probabilistic inference in causal models (Bayesian networks), and such inference is NP-hard.[1]

This has important consequences for longtermism, as it is standardly utilized in the EA community, and especially for the works of Ord and MacAskill. Kinney suggests their framework cannot provide actionable guidance because mortal humans lack the computational bandwidth to do Bayesian updating. Therefore, the troubling conclusion is that utilizing this framework does not allow people to determine which interventions actually maximize expected value.

In this essay I want to show that even if we could magically solve Kinney’s inference problem (a genie gives us perfect probability distributions over every possible future), we still can’t make definitive expected value comparisons between many longtermist strategies, because doing so is an undecidable problem. Any intervention comprises a series of actions which end up acting as a constraint on the strategies still available to you. When we compare interventions we are comparing classes of possible strategies and trying to determine the superior strategy in the long run (dominance of constrained optima).

Because I am going to talk a lot about expected value, I want to be clear that I am not claiming that using it as a private heuristic is bad, but rather that many Longtermists utilize it as a public justification engine, in other words, a machine that purports to show mathematically what is more correct and what you should obey. This is the sense of EV at issue in this essay.

I show, utilizing some standard CS results from the 2000s, that the retort of “can’t we just estimate it” ends up being NP-hard, undecidable, or outright uncomputable to guarantee, depending on the restrictions. This challenges a thread that continues to exist in the EA/Longtermist community in 2025. For example, MacAskill continues to make strong dominance claims in his Essays on Longtermism. Even with the hedging included in his arguments (not requiring optimal policies, approximations sufficing for large numbers, meta-options existing, etc.), serious computational roadblocks arise. For general policies the problem turns out to be undecidable. If you constrain your work to memoryless stationary policies, then polynomial approximation is only possible if P=NP. And if we go even narrower, to average-reward cases, no computable approximation exists.

EAs frequently utilize a sort of borrowed epistemic credibility based on very finite and restricted projects (say, distributing malaria nets) and then unwarrantedly extend this into areas with extremely long (or infinite) timelines where it can be shown that mathematical tractability ceases to exist (panspermia, AI safety, etc.), and where these interventions cannot be compared against one another.

That said, not every Longtermist claim is so hard, and there are likely restricted domains that are comparable. However, as a general schema it falls apart and cannot guarantee correctness. Longtermists who want to claim superiority by mathematical maximization must specify how they are simplifying their models and show why these simplified models have not defined away the critical elements of the future that longtermists vaunt.

Context

Greaves and MacAskill make a dominance claim about moral action using EV when they say:

“The potential future of civilisation is vast... [therefore] impact on the far future is the most important feature of our actions today”

which they then formalize as:

"Axiological strong longtermism (ASL): In the most important decision situations facing agents today... (ii) Every option that is near-best overall delivers much larger benefits in the far future than in the near future."

This notion can be expressed as $V^*(I_A) > V^*(I_B)$, with $V^*(I)$ representing the optimal expected value achievable under an intervention $I_A$ versus $I_B$. Such a statement requires a methodological guarantee to gain authority as a ranking procedure (i.e. you need to be able to demonstrate why intervention $I_A$ is superior to $I_B$). Such claims are crucial to the justification of longtermism as a methodologically superior and more moral reasoning procedure for these questions.

When Kinney presented his results that showed inference to be NP-hard, a standard response could be that bounded agents, which don’t require exact probabilities, are sufficient. So let us grant even more than a bounded agent requires: we allow an agent to have a perfect probabilistic representation of the world. For the model classes used by longtermists, the optimization (control) problem ends up being a distinct and undecidable problem. In other words, even if some deus ex machina saved the inference problem, Longtermists still would not be able to fix the control problem.

A Model of Interventions

To model these types of moral decisions about the far future we should select a method that has action-conditioned dynamics (that is, a person or agent can influence the world) and one that is partially observable (we can’t know everything about the universe, only a limited slice of it). To achieve this it is sensible to use a finite-description Partially Observable Markov Decision Process (POMDP), formally defined here as:

$$\mathcal{M} = \langle S, A, O, T, \Omega, R, \gamma, b_0 \rangle$$

Where $S$, $A$, and $O$ refer to the states, actions, and observations available to the agent. The function $T(s' \mid s, a)$ is a transition function determining the probability of a state change given an action. $\Omega(o \mid s', a)$ captures the observation probabilities and $R(s, a)$ is the reward function. $\gamma \in [0, 1)$ is the discount applied to a reward based on how far in the future it is, but note that the results below hold even if you remove discounting. Finally, $b_0$ represents the initial probability distribution over states.
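For concreteness, here is a minimal sketch of a finite-description POMDP as a plain tabular data structure (the field names are illustrative, not from any particular library):

```python
# A minimal tabular POMDP container; dict-based representations are assumed.
from dataclasses import dataclass

@dataclass
class POMDP:
    states: list        # S
    actions: list       # A
    observations: list  # O
    T: dict             # T[(s, a)] -> {s_next: prob}, transition function
    Omega: dict         # Omega[(s_next, a)] -> {o: prob}, observation probabilities
    R: dict             # R[(s, a)] -> immediate reward
    gamma: float        # discount factor in [0, 1)
    b0: dict            # b0[s] -> initial state probability
```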

It is important to distinguish between the levels of control necessary for complex open-ended futures (General Policies, $\Pi_{\text{gen}}$), versus the limited capabilities of agents with bounded memory (Finite State Controllers, $\Pi_{\text{FSC}}$, i.e. bounded agents), versus Stationary Policies ($\Pi_{\text{stat}}$) that are memoryless, because the reasoning and the justification should operate at the same level. For example, it is not coherent to assume access to general policies about the far future, but then retreat to bounded agents and claim the mathematics has been provably solved.

I am going to model an intervention $I$ as a constraint on the admissible policy set, because real-world interventions usually describe the initial step rather than the series of actions over all time. So you can do something like distributing malaria nets at $t=0$, but then pursue a perfect strategy after that. Let $\Pi_I$ be the set of policies consistent with intervention $I$ and $V^*(I)$ represent the maximum, or perfect, expected value of the intervention:

$$V^*(I) = \sup_{\pi \in \Pi_I} \mathbb{E}_\pi\left[\sum_{t=0}^{\infty} \gamma^t R_t\right]$$

So then we can define the problem of identifying the superior intervention, given $I_A$ and $I_B$, as deciding whether:

$$V^*(I_A) > V^*(I_B)$$

There are three questions a Longtermist should be able to answer:

  1. The Threshold Problem: is a specific standard of success mathematically achievable by some policy? Given $\mathcal{M}$ and a rational threshold $\theta$, does there exist a policy $\pi$ in $\Pi$ such that $V^\pi(\mathcal{M}) \geq \theta$?
  2. The Approximation Problem: can you output an estimated value $\hat{V}$ that is within a specified error bound of the true optimal value $V^*$? Can a Bounded Agent produce an approximated value that is close enough to the true optimal value? Output a value $\hat{V}$ such that $|\hat{V} - V^*| \leq \epsilon |V^*|$ (multiplicative) or $|\hat{V} - V^*| \leq \epsilon$ (additive).
  3. The Dominance Problem: given a formal model of cause prioritization $(\mathcal{M}, I_A, I_B)$, can you show the optimal value of $I_A$ is strictly greater than the optimal value of $I_B$? Is $V^*(I_A) > V^*(I_B)$?

Three Theorems

To examine whether the three questions above are computationally tractable I am going to utilize results from Madani, Hanks, and Condon (1999)[2] and Lusena, Goldsmith, and Mundhenk (2001)[3]. Can an algorithm exist that takes a longtermist model $\mathcal{M}$ and outputs answers to the Threshold Problem and Approximation Problem? After that I will examine the Dominance Problem.

Madani demonstrated that when the time horizon is infinite, trying to verify that a specific value is achievable creates a paradox similar to the Halting Problem (of course Omega played a role in my thoughts on this project). I am evaluating the Threshold Problem for $\Pi_{\text{gen}}$ (the broad policies required to model open-ended future strategies).

My first Theorem is derived from Madani and says that for finite-description, infinite-horizon POMDPs, the Threshold Problem is undecidable under the discounted criterion when $\Pi$ includes general (implicitly represented) policies, and remains undecidable under the undiscounted total-reward criterion.

Theorem 1: For finite-description, infinite-horizon POMDPs with policy class $\Pi_{\text{gen}}$, the Threshold Problem (does there exist $\pi \in \Pi_{\text{gen}}$ with $V^\pi(\mathcal{M}) \geq \theta$?) is undecidable, under both the discounted and undiscounted total-reward criteria.

This implies that for the general longtermism case no algorithm exists that can definitively answer “can we achieve this value?”

My second Theorem examines the Approximation Problem. A Longtermist may downgrade the agent and assume it utilizes a restricted policy class, such as $\Pi_{\text{stat}}$, which are memoryless maps from observations to actions ($O \to A$). However, Lusena demonstrated that these restrictions do not necessarily solve the tractability problem.

Theorem 2: A polynomial-time algorithm achieving an $\epsilon$-approximation of the optimal stationary policy exists only if P = NP.

This shows that for infinite-horizon POMDPs under the total discounted or average-reward criteria, calculating an $\epsilon$-approximation of the optimal stationary policy is NP-hard.

Utilizing this same paper, I can show that if we use the average-reward criterion in an unobservable setting, things get even worse: there is no computable algorithm that can produce an approximation within an additive error $\epsilon$.

Theorem 3: For unobservable POMDPs under average reward with time-dependent policies, no computable $\epsilon$-approximation exists.

These three Theorems, utilizing well-known results, show that for general policies the problem is undecidable and for restricted policies it is either NP-hard or not approximable. 

Schema-Level Reduction

One criticism a Longtermist might have is that it is easier to calculate the preference order of something ($I_A$ is better than $I_B$) than its exact value ($I_A$ is a 9.8, which is better than $I_B$, which is a 6.7). However, it turns out that this is not the case for this class of problems, and I will show that the Dominance Problem is at least as hard as the Threshold Problem.

Lemma 1: the Threshold Problem reduces to the Intervention Dominance Problem.

Proof by Construction: Let $(\mathcal{M}, \theta)$ be an instance of the Threshold Problem with discount $\gamma$, where I want to determine if $V^*(\mathcal{M}) > \theta$. First construct a new POMDP $\mathcal{M}'$ with a new initial state $s_{\text{start}}$ that has only two actions: it can Enter, which causes a transition to a state $s$ with probability $b_0(s)$ (the initial distribution of $\mathcal{M}$) for an immediate reward of 0, or it can Safe, which transitions deterministically to an absorbing state $s_{\text{safe}}$ at time $t=1$ for an immediate reward of 0.

The rewards for this structure begin once an agent enters $\mathcal{M}$ via the Enter action, and from then on follow the original reward structure in $\mathcal{M}$. If the agent chooses Safe, they enter $s_{\text{safe}}$ and receive a constant reward $c = (1-\gamma)\theta$ at every single time step forever.

Let’s now compare the optimal values of these interventions starting at $s_{\text{start}}$. The Value of Entering is discounted by one step because the agent enters $\mathcal{M}$ at $t=1$. Since the transition probabilities match $b_0$, the expected value of the next state is exactly the value of starting $\mathcal{M}$:

$$V(\text{Enter}) = \gamma \, V^*(\mathcal{M})$$

For the Value of Safety, the agent enters $s_{\text{safe}}$ at $t=1$ and receives the constant reward forever in a geometric series:

$$V(\text{Safe}) = \sum_{t=1}^{\infty} \gamma^t (1-\gamma)\theta = \gamma\theta$$

So $V(\text{Enter}) > V(\text{Safe}) \iff \gamma V^*(\mathcal{M}) > \gamma\theta \iff V^*(\mathcal{M}) > \theta$.

This proves that $V(\text{Enter})$ is strictly greater than $V(\text{Safe})$ iff the original optimal value $V^*(\mathcal{M})$ is greater than the threshold $\theta$. Any algorithm that could solve the Dominance Problem could therefore solve the Threshold Problem, but we showed in Theorem 1 that the Threshold Problem is undecidable, so the Dominance Problem is also undecidable.
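As a sanity check on the construction (not a proof of anything), here is the arithmetic for a toy case where the inner POMDP is a single fully observable state with per-step reward $r$, so $V^*(\mathcal{M}) = r/(1-\gamma)$ is known in closed form:

```python
# Numeric sanity check of the Lemma 1 gadget on a one-state inner model.
gamma = 0.9   # discount factor
r = 1.0       # per-step reward inside M, so V*(M) = r / (1 - gamma) = 10.0
theta = 8.0   # threshold to test against

v_star = r / (1 - gamma)   # optimal value of starting M at t = 0
v_enter = gamma * v_star   # Enter: one discounted step, then optimal play in M

# Safe: constant reward (1 - gamma) * theta from t = 1 on, summing to gamma * theta.
v_safe = sum((1 - gamma) * theta * gamma**t for t in range(1, 10_000))

print(v_enter > v_safe, v_star > theta)  # True True -- the rankings agree
```

Swapping in theta = 12.0 flips both comparisons to False, as the reduction requires.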

Bounded Agents and the Certification Gap

Another objection could take the form of “we understand that finding the global optimum $V^*$ is undecidable, but as bounded agents we are optimizing over a more restricted class (say $\Pi_{\text{FSC}}$) using a heuristic solver (say something like SARSOP).” However, this retreat from maximizing optimality surrenders Dominance. If they claim Intervention $I_A$ is better than Intervention $I_B$ and use a heuristic solver $H$, they only establish:

$$\hat{V}_H(I_A) > \hat{V}_H(I_B)$$

This is a statement about algorithms, not interventions. For $I_A$ to actually permit better outcomes than $I_B$ you must assume the Certification Gap is small or bounded:

$$\left| V^*(I) - \hat{V}_H(I) \right| \leq \delta \quad \text{for } I \in \{I_A, I_B\}$$

Unfortunately, this usually reduces to the Approximation Problem, and Lusena’s work demonstrates that even for restricted stationary policies, guaranteeing an approximation is NP-hard. So the trade swaps undecidability for intractability, and this calculation of “EV” is not a normative one, but rather an unverified hypothesis that the heuristic’s blind spots are distributed symmetrically across interventions. To verify this hypothesis we would have to solve the problem we have shown is either undecidable or intractable.
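To make the symmetry assumption concrete, here is a toy illustration with made-up numbers, where the heuristic’s blind spots happen to fall more heavily on one intervention:

```python
# Asymmetric certification gaps can flip a ranking; all numbers are invented.
v_true = {"A": 10.0, "B": 9.0}   # true optimal values (uncomputable in general)
gap = {"A": 2.0, "B": 0.5}       # heuristic underestimation per intervention
v_heuristic = {k: v_true[k] - gap[k] for k in v_true}

print(max(v_true, key=v_true.get))            # "A" -- the truly better intervention
print(max(v_heuristic, key=v_heuristic.get))  # "B" -- what the heuristic recommends
```

Nothing in the heuristic’s output signals that the ranking has flipped, which is exactly why the gap needs a bound before “EV dominance” carries any authority.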

Conclusion

None of this work is meant to imply I don’t think we should care about future lives or long-term difficult problems. I think these are enormously important topics to work on. I do, however, believe these results challenge the narrative that longtermists can rely on EV dominance as a source of normative authority.

For the broad model classes that are of critical importance to Longtermists, I have shown that it is undecidable whether one intervention is better than another (Theorem 1), and that even with significant restrictions, obtaining correctness guarantees is NP-hard (Theorem 2).

At times Longtermists will play a sophisticated game of kicking the can down the road on these types of questions. This is often expressed in the form of a “pause” or “moratorium” until they learn more. However, as we have shown, even if they were granted perfect knowledge, they would still not be able to solve the control problem for these long-duration interventions. That is a serious problem for the “delay” approach.

I think this leaves Longtermists with a much weaker case for why they should be the unique arbiters of long-term issues like AI-control, panspermia, etc. They simply don’t have compelling enough math, on its own, to argue for these cases, and it is often the math which is the bedrock of their spiritual authority. 

Longtermists should specify the policy restrictions and approximation guarantees they are utilizing when relying on the authority of mathematical optimization. They should also shift from claiming “$I_A$ is better than $I_B$” and instead reveal the heuristic being utilized, saying something like “Heuristic X prefers $I_A$ to $I_B$.”

Finally, I would suggest that in making the restrictions necessary to argue about long-term dynamics, they will frequently end up defining away the very features that they purport to value. It may be the case that other philosophical methods are necessary to help answer these questions.

At the top we asked “Is Longtermism's Mandate of Heaven by Arithmetic Justified?” The good news is that a Mandate of Heaven in ancient China was only divine justification until something really bad came up. As soon as there was a famine, the Divine Mandate dried up and it was time for a new one. It might be that time for the core of Longtermism.

  1. ^

    Scott Aaronson brought attention to computational complexity when discussing the problematic implications for an “ideal reasoner” given finite compute.

  2. ^

    Madani, O., Hanks, S., & Condon, A. (1999). “On the Undecidability of Probabilistic Planning.” AAAI.

  3. ^

    Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). “Nonapproximability results for partially observable Markov decision processes.” JAIR, 14:83–103.



Discuss