LessWrong

An online forum and community dedicated to improving human reasoning and decision-making.
How much superposition is there?

2026-02-18 21:53:33

Published on February 18, 2026 1:53 PM GMT

Written as part of MATS 7.1. Math by Claude Opus 4.6.

I know that models are able to represent exponentially more concepts than they have dimensions by engaging in superposition (representing each concept as a direction, and allowing those directions to overlap slightly), but what does this mean concretely? How many concepts can "fit" into a space of a given size? And how much would those concepts need to overlap?

Superposition interference diagram, from Toy Models of Superposition

 

This felt especially relevant working on SynthSAEBench, where we needed to explicitly decide how many features to cram into a 768-dim space. We settled on 16k features to keep the model fast - but does this lead to enough superposition to be realistic? Surely real LLMs have dramatically more features and thus far more superposition?

As it turns out: yes, 16k features is plenty! In fact, as we'll see in the rest of this post, 16k features in a 768-dim space actually leads to more superposition than trillions of features in a 4k+ dim space, as is commonly used for modern LLMs.

Personally I found the answers to this fascinating - high dimensional spaces are extremely mind-bending. We'll take a geometric approach and try to answer this question. Nothing in this post is ground-breaking, but I found thinking about these questions enlightening. All code for this post can be found on GitHub or Colab.

Quantifying superposition

First, let's define a measure of how much superposition there is in a model. We'll use the metric mean max absolute cosine similarity, $\rho_{mm}$, defined over $N$ concept vectors $v_1, \dots, v_N$ as follows:

$$\rho_{mm} = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} \left| \cos(v_i, v_j) \right|$$

This metric represents a "worst-case" measure of superposition interference for each vector in our space. It answers the question: on average, what's the most interference (highest absolute cosine similarity) each vector will have with another vector in the space?
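To make the metric concrete, here is a minimal pure-Python sketch. The helper names `abs_cosine` and `mean_max_abs_cos` are made up for this illustration and are not taken from the post's code; the three example vectors are likewise just a toy input.

```python
import math

def abs_cosine(u, v):
    """Absolute cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (norm_u * norm_v)

def mean_max_abs_cos(vectors):
    """Average over vectors of the largest |cos| each has with any other vector."""
    n = len(vectors)
    worst = [0.0] * n  # worst[i] = largest |cos(v_i, v_j)| seen so far
    for i in range(n):
        for j in range(i + 1, n):
            c = abs_cosine(vectors[i], vectors[j])
            worst[i] = max(worst[i], c)
            worst[j] = max(worst[j], c)
    return sum(worst) / n

# Two orthogonal axes plus their diagonal: every vector's worst overlap is cos(45 degrees).
example = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rho = mean_max_abs_cos(example)
print(round(rho, 4))  # 0.7071
```

A set of mutually orthogonal vectors scores 0, and any set containing two parallel (or anti-parallel) vectors pushes those two vectors' contribution to 1.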

Superposition of random vectors

Perfectly tiling the space with concept vectors is challenging, so let's just consider the superposition from random vectors (we'll see later that this is already very close to perfect tiling). If we have $N$ random unit-norm vectors in a $d$-dimensional space, what should we expect the mean max absolute cosine similarity $\rho_{mm}$ to be? We can try this out with a simple simulation.

Simulation picking $N$ random vectors in a $d$-dimensional space, and calculating the superposition metric $\rho_{mm}$

We vary $d$ from 256 to 1024, and $N$ from 4,096 to 32,768, and calculate $\rho_{mm}$ for each combination, showing the results above. This is still very small-scale though. Ideally, we'd like to know how much superposition we could expect with billions or even trillions of potential concepts, and that's too expensive to simulate. Fortunately, we can find a formula that we can use to directly calculate $\rho_{mm}$ without needing to actually run the simulation.
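A scaled-down version of this kind of simulation can be sketched in pure Python. The grid sizes below (16 to 64 dimensions, 50 to 400 vectors) are hypothetical and chosen only so the loop finishes quickly; they are not the sizes used in the post, and the function names are assumptions of this sketch.

```python
import math
import random

def random_unit_vectors(n, d, rng):
    """Sample n directions uniformly from the unit sphere in R^d (normalized Gaussians)."""
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors

def mean_max_abs_cos(vectors):
    """Mean over vectors of the largest |dot product| with any other unit vector."""
    n = len(vectors)
    worst = [0.0] * n
    for i in range(n):
        vi = vectors[i]
        for j in range(i + 1, n):
            c = abs(sum(a * b for a, b in zip(vi, vectors[j])))
            if c > worst[i]:
                worst[i] = c
            if c > worst[j]:
                worst[j] = c
    return sum(worst) / n

def simulate(n, d, seed=0):
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    return mean_max_abs_cos(random_unit_vectors(n, d, rng))

# Toy grid (hypothetical sizes, far smaller than the post's) so pure Python stays fast.
results = {(n, d): simulate(n, d) for d in (16, 64) for n in (50, 400)}
for (n, d), rho in sorted(results.items()):
    print(f"N={n:4d}  d={d:3d}  mean max |cos| = {rho:.3f}")
```

Even at this toy scale the qualitative pattern should be visible: interference falls quickly as the dimension grows and rises only slowly as more vectors are added.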

Calculating $\rho_{mm}$ directly

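We can compute the expected $\rho_{mm}$ exactly (up to numerical integration) using the known distribution of cosine similarity between random unit vectors.

For two random unit vectors in $\mathbb{R}^d$, the squared cosine similarity follows a Beta distribution: $\cos^2\theta \sim \mathrm{Beta}\left(\tfrac{1}{2}, \tfrac{d-1}{2}\right)$. This means the CDF of $|\cos\theta|$ is the regularized incomplete beta function:

$$F(t) = P(|\cos\theta| \le t) = I_{t^2}\left(\tfrac{1}{2}, \tfrac{d-1}{2}\right)$$

For each vector, its max absolute cosine similarity with the $N-1$ others has CDF $F(t)^{N-1}$ (treating pairwise similarities as independent, which is excellent for large $d$). The expected value of a non-negative random variable gives us:

$$\mathbb{E}[\rho_{mm}] = \int_0^1 \left(1 - F(t)^{N-1}\right) dt$$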

We can calculate this integral using scipy.integrate. Let's see how well this matches our simulation:

The predicted values exactly match what we simulated!
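The post evaluates the integral with scipy.integrate. As a dependency-free alternative, the sketch below integrates $1 - F(t)^{N-1}$ over $[0, 1]$ with a plain trapezoid rule, building the tail $1 - F(t)$ from the density of $|\cos\theta|$, which is proportional to $(1 - t^2)^{(d-3)/2}$. The function name `expected_max_abs_cos` and the grid size are assumptions of this sketch, not the post's implementation.

```python
import math

def expected_max_abs_cos(n_vectors, d, grid=20000):
    """Expected max |cos| between one random unit vector and n_vectors - 1 others in R^d.

    Requires d >= 3. Uses a trapezoid rule on a uniform grid over [0, 1].
    """
    if d < 3 or n_vectors < 2:
        raise ValueError("need d >= 3 and at least two vectors")
    h = 1.0 / grid
    # Density of |cos| on [0, 1]: 2 * C_d * (1 - t^2)^((d - 3) / 2),
    # with C_d = Gamma(d/2) / (sqrt(pi) * Gamma((d-1)/2)).
    log_c = math.lgamma(d / 2) - math.lgamma((d - 1) / 2) - 0.5 * math.log(math.pi)
    scale = 2.0 * math.exp(log_c)
    power = (d - 3) / 2
    dens = [scale * max(0.0, 1.0 - (k * h) ** 2) ** power for k in range(grid + 1)]
    # tail[k] = P(|cos| > t_k), accumulated from t = 1 downward so tiny tails stay accurate.
    tail = [0.0] * (grid + 1)
    for k in range(grid - 1, -1, -1):
        tail[k] = tail[k + 1] + 0.5 * h * (dens[k] + dens[k + 1])

    def integrand(s):
        # 1 - (1 - s)^(N - 1), computed stably when s is very small.
        if s >= 1.0:
            return 1.0
        return -math.expm1((n_vectors - 1) * math.log1p(-s))

    vals = [integrand(s) for s in tail]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Sanity check: in d = 3, |cos| is uniform on [0, 1], so the expected max is (N - 1) / N.
print(round(expected_max_abs_cos(5, 3), 4))  # 0.8
```

Writing the integrand with `log1p` and `expm1` keeps it usable when $N$ is in the trillions, where the relevant tail probabilities are far too small for a naive `1 - F**N` to resolve.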

Scaling to trillions of concepts

Let's use this to see how much superposition interference we should expect for some really massive numbers of concepts. We'll go up to 10 trillion concepts (10^13) and 8192 dimensions, which is the hidden size of the largest current open models. 10 trillion concepts seems like a reasonable upper bound for the max number of concepts that could conceivably be possible, since that would be roughly 1 concept per training token in a typical LLM pretraining run.

10 trillion concepts in 8192 dimensions has far less superposition interference than just 100K concepts in 768 dimensions (the hidden dimension of GPT-2)! That's a 100,000,000x increase in number of concept vectors! Even staying at a given dimension, increasing the number of concepts by 100x doesn't really increase superposition interference by all that much.
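A quick way to sanity-check these comparisons is the standard leading-order approximation $\sqrt{2 \ln N / d}$ for the largest of $N$ roughly Gaussian similarities with variance $1/d$. This is only a rough estimate that tends to run slightly high, not the exact integral, and the function name is an assumption of this sketch.

```python
import math

def leading_order_rho(n_vectors, d):
    """Leading-order estimate sqrt(2 ln N / d) of the typical max |cos| for random vectors."""
    return math.sqrt(2.0 * math.log(n_vectors) / d)

cases = [
    ("1e5 concepts, d=768", 1e5, 768),
    ("1e7 concepts, d=768", 1e7, 768),     # 100x more concepts, same dimension
    ("1e13 concepts, d=8192", 1e13, 8192),
]
for label, n, d in cases:
    print(f"{label:24s} rho ~ {leading_order_rho(n, d):.3f}")
```

Because $N$ only enters through $\sqrt{\ln N}$, multiplying the number of concepts by 100 moves the estimate by less than 20% at these sizes, while the dimension enters through $1/\sqrt{d}$ and matters much more.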

At least, I found this mind-blowing.

What if we optimally placed the vectors instead?

Everything above assumes random unit vectors. But what if we could arrange them optimally — placing each vector to minimize the worst-case interference? Would we do significantly better?

From spherical coding theory, the answer is: barely. The minimum achievable max pairwise correlation for $N$ optimally-placed unit vectors in $d$ dimensions is given by the spherical cap packing bound:

$$\rho_{\text{opt}} \approx \sqrt{1 - N^{-2/(d-1)}}$$

The intuition is that each vector "excludes" a spherical cap around itself, and we're counting how many non-overlapping caps fit on the unit sphere in $\mathbb{R}^d$.

When $\frac{2 \ln N}{d-1} \ll 1$ (which holds for all practical settings: even $N = 10^{13}$ in $d = 8192$ gives $\frac{2 \ln N}{d-1} \approx 0.007$), we can Taylor-expand:

$$N^{-2/(d-1)} = e^{-2 \ln N / (d-1)} \approx 1 - \frac{2 \ln N}{d-1}$$

which gives:

$$\rho_{\text{opt}} \approx \sqrt{\frac{2 \ln N}{d-1}} \approx \sqrt{\frac{2 \ln N}{d}}$$

This is exactly the leading-order term of the random vector formula! So random placement is already near-optimal: there's essentially nothing to gain from clever geometric arrangement of the vectors, at least for the $N$ and $d$ values relevant to real neural networks.
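The closeness of the two estimates can be checked numerically. The sketch below takes the cap-packing estimate in the form $\sqrt{1 - N^{-2/(d-1)}}$, which is an assumption of this sketch, and compares it with the random-vector leading-order term $\sqrt{2 \ln N / d}$. Both function names are made up for illustration.

```python
import math

def packing_estimate(n_vectors, d):
    """Assumed cap-packing form sqrt(1 - N^(-2/(d-1))), written stably via expm1."""
    x = 2.0 * math.log(n_vectors) / (d - 1)
    return math.sqrt(-math.expm1(-x))  # -expm1(-x) equals 1 - exp(-x)

def random_leading_order(n_vectors, d):
    """Leading-order term sqrt(2 ln N / d) for random unit vectors."""
    return math.sqrt(2.0 * math.log(n_vectors) / d)

settings = [(16384, 768), (1e9, 2048), (1e13, 8192)]
ratios = []
for n, d in settings:
    opt = packing_estimate(n, d)
    rnd = random_leading_order(n, d)
    ratios.append(opt / rnd)
    print(f"N={n:.3g}  d={d}  packing={opt:.4f}  random={rnd:.4f}  ratio={opt / rnd:.4f}")
```

Under this assumed form, the two estimates differ by less than 1% in all three settings, which is the sense in which optimal placement buys almost nothing over random placement.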

This is a remarkable consequence of high-dimensional geometry: in spaces with hundreds or thousands of dimensions, random directions are already so close to orthogonal that you can't do meaningfully better by optimizing.

What does this mean for SynthSAEBench-16k?

At the start, I mentioned that we used 16k concept directions in a 768-dim space for the SynthSAEBench-16k model. So is this enough superposition interference?

The answer is a resounding yes. The SynthSAEBench-16k model has a mean max absolute cosine similarity ($\rho_{mm}$) of 0.14, which is still dramatically more superposition interference than 10 trillion concept vectors in 8192-dim (the hidden dimension of Llama-3.1-70b). It's roughly equivalent to 1 billion concept vectors in a 2048-dim space (the hidden size of Gemma-2b).




Irrationality is Socially Strategic

2026-02-18 21:28:17

Published on February 18, 2026 1:28 PM GMT

It seems to me that the Hamming problem for developing a formidable art of rationality is, what to do about problems that systematically fight being solved. And in particular, how to handle bad reasoning that resists being corrected.

I propose that each such stubborn problem is nearly always, in practice, part of a solution to some social problem. In other words, having the problem is socially strategic.

If this conjecture is right, then rationality must include a process of finding solutions to those underlying social problems that don’t rely on creating and maintaining some second-order problem. Particularly problems that convolute conscious reasoning and truth-seeking.

The rest of this post will be me fleshing out what I mean, sketching why I think it’s true, and proposing some initial steps toward a solution to this Hamming problem.

 

Truth-seeking vs. embeddedness

I’ll assume you’re familiar with Scott & Abram’s distinction between Cartesian vs. embedded agency. If not, I suggest reading their post’s comic, stopping when it mentions Marcus Hutter and AIXI.

(In short: a Cartesian agent is clearly distinguishable from the space the problems it's solving exist in, whereas an embedded agent is not. Contrast an entomologist studying an ant colony (Cartesian) versus an ant making sense of its own colony (embedded).)

It seems to me that truth-seeking is very much the right approach for solving problems that you can view from the outside as a Cartesian agent. But often it’s a terrible approach for solving problems you’re embedded in, where your models are themselves a key feature of the problem’s structure.

Like if a man approaches a woman he’s interested in, it can be helpful for him to bias toward assuming she’s also probably into him. His bias can sometimes be kind of a self-fulfilling prophecy. Truth-seeking is actually a worse strategy for getting the result he actually cares about. That fact wouldn’t be true if his epistemic state weren't entangled with how she receives his approach. But it is.

The same thing affects prediction markets. They can be reliably oracular only if their state doesn’t interact with what they’re predicting. Which is why they can act so erratically when trying to predict (say) elections: actors using the market to influence the outcome will warp the market’s ability to reflect the truth. If actors can (or just think they can) shape the outcome this way, then the market is embedded in the context of what it's predicting, and therefore it can't reliably be part of a Cartesian model of the situation in question. Instead it just is part of the situation in question.

So when facing problems you’re embedded in, there can be (and often is) a big difference between what’s truth-seeking and what actually solves your problems.

 

Protected problems

Evolution cares a lot about relevant problems actually being solved. In some sense that’s all it cares about.

So if there’s a problem that fights being solved, there must be a local incentive for it to be there. The problem is protected because it’s a necessary feature of a solution to some other meaningful problem.

I’m contrasting this pattern with problems that arise from some solution but aren’t a necessary feature. Like optical illusions: those often show up because our vision evolved in a specific context to solve specific problems. In such cases, when we encounter situations that go beyond our ancestral niche in a substantial way, our previous evolved solutions can misfire. And those misfirings might leave us relevantly confused and ineffective. The thing is, if we notice a meaningful challenge as a result of an optical illusion, we’ll do our best to simply correct for it. We'll pretty much never protect having the problem.

(I imagine that strange behavior looking something like: recognizing your vision is distorted, acknowledging that it messes you up in some way you care about (e.g. makes your driving dangerous in some particular way), knowing how to fix it, being able to fix it, and deciding not to bother because you just… prefer having those vision problems over not having them. Not because of a tradeoff, but because you just… want to be worse off. For no reason.)

An exception worth noting is if every way of correcting for the illusion that you know of actually makes your situation worse. In which case you'll consciously protect the problem. But in this case you won't be confused about why. You won't think you could address the problem but you "keep procrastinating" or something. You'd just be making the best choice you can given the tradeoffs you're aware of.

So if you have a protected problem but you don’t know why it’s protected, the chances are extremely good that it’s a feature of a solution to some embedded problem. We generally orient to objective problems (i.e. ones you orient to as a Cartesian agent) like in the optical illusion case: if there's a reason to protect the problem, we'll consciously know why. So if we can't tell why, and especially if it's extremely confusing or difficult to even orient to the question of why, then it's highly likely that the solution producing the protected problem is one we're enacting as embedded agents.

I think social problems have all the right features to cause this hidden protection pattern. We’re inherently embedded in our social contexts, and social problems were often dire to solve in our ancestral niche, sometimes being more important than even raw physical survival needs.

We even observe this social connection to protected problems pretty frequently too. Things like guys brushing aside arguments that they don’t have a shot at the girl, and how most people expect their Presidential candidate to win, and Newcomblike self-deception, and clingy partners getting more clingy and anxious when the problems with their behavior get pointed out.

Notice how in each of these cases the person can’t consciously orient to their underlying social problem as a Cartesian agent. When they try (coming up with arguments for why their candidate will win, talking about their attachment style, etc.), the social solution they’re in fact implementing will warp their conscious perceptions and reasoning.

This pattern is why I think protected problems are the Hamming issue for rationality. Problems we can treat objectively might be hard, but they’re straightforward. We can think about them explicitly and in a truth-seeking way. But protected problems are an overt part of a process that distorts what we consider to be real and what we can think, and hides from us that it’s doing this distortion. It strikes me as the key thing that creates persistent irrationality.

 

Dissolving protected problems

I don’t have a full solution to this proposed Hamming problem. But I do see one overall strategy often working. I’ll spell it out here and illustrate some techniques that help make it work at least sometimes.

The basic trick is to disentangle conscious cognition from the underlying social problem. Then conscious cognition can act more like a Cartesian agent with respect to the problem, which means we recover explicit truth-seeking as a good approach for solving it. Then we can try to solve the underlying social problem differently such that we don’t need protected problems there anymore.

(Technically this deals only with protected problems that arise from social solutions. In theory there could be other kinds of embedded solutions that create protected problems. In practice I think very close to all protected problems for humans are social though. I don’t have a solid logical argument here. It’s just that I’ve been able to think of hardly any non-social protected problems people actually struggle with, and in practice I find that assuming they all trace back to social stuff just works very well.)

I’ll lay out three techniques that I think are relevant here. I and some others actually use these tools pretty often, and anecdotally they’re quite potent. Of course, your mileage may vary, and I might be pointing you in the wrong direction for you. And even if they do work well for you, I'm quite sure these don't form a complete method. There's more work to do.

 

Develop inner privacy

Some people in therapy like to talk about their recent insights a lot. “Wow, today I realized how I push people away because I don’t feel safe being vulnerable with them!”

I think this habit of automatic sharing is an anti-pattern most of the time. It makes the content of their conscious mind socially transparent, which more deeply embeds it in their social problems.

One result is that this person cannot safely become aware of things that would break their social strategies. Which means, for instance, that the therapy will tend to systematically fail on problems arising from Newcomblike self-deception. It might even generate new self-deceptions!

A simple fix here is to have a policy of pausing before revealing insights about yourself. Keep what you discover totally private until you have a way of sharing that doesn’t incentivize thought-distortion. What I’ve described before as “occlumency”.

I want to emphasize that I don’t mean lying to or actively deceiving others. Moves like glomarization or simply saying “Yeah, I noticed something big, but I’m going to keep it private for now” totally work well enough quite a lot of the time. Antisocial strategies might locally work, but they harm the context that holds you, and they can also over time incentivize you to self-deceive in order to keep up your trickery. It’s much better to find prosocial ways of keeping your conscious mind private.

As to exactly what kind of occlumency can work well enough, I find it helpful here to think about the case of the closeted homophobe: the guy who’s attracted to other men but hates “those damn gays” as a Newcomblike self-deceptive strategy. He can’t start by asking what he’d need to be able to admit to himself that he’s gay, since that’d be equivalent to just admitting it to himself, which isn’t yet safe for him to do. So instead he needs to develop his occlumency more indirectly. He might ask:

If I had a truly awful, disgusting, wicked, evil desire… how might I make it safe for me to consciously realize it? How might I avoid immediately revealing to others that I have this horrid desire once I become aware of it?

I think most LW readers can tell that the specific desire this guy is struggling with isn’t actually evil. Labeling it “evil” is part of his self-deceptive strategy. Once his self-deception ends, the desire won’t look bad anymore. Just socially troublesome given his context.

But it does look like an unacceptable desire to his conscious identity right now. It won’t work for him to figure out how to conceal a desire he falsely believes is wicked, because that’s not what it feels like on the inside. The occlumency skill he needs here is one that feels to him like it’ll let him safely discover and fully embrace that he's an inherently evil creature (by his own standards), if that turns out to actually be true in some way.

So for you to develop the right occlumency skill for your situation, you need to imagine that you have some desire that you currently consider to be horrendously unacceptable to have, and ask what would give you room to admit it to yourself and embrace it. You might try considering specific hypothetical ones (without checking if they might actually apply to you) and reflecting on what general skill and/or policy would let you keep that bad desire private at first if you were to consciously recognize it.

Once you’ve worked out an occlumency policy-plus-skillset that you trust, though, the thought experiment has done its work and should stop. There's no reason to gaslight your sense of right and wrong here. The point isn't to rationalize actually bad things. It's to work out what skill and policy you need to disembed your conscious mind from some as-yet unknown social situation.

 

Look for the social payoff

Occlumency partly disentangles your conscious mind from the social scene. With that bit of internal space, you can then try looking directly at the real problem you’re solving.

I think this part is pretty straightforward. Just look at a problem you struggle with that has resisted being solved (or some way you keep sabotaging yourself), and ask:

What social advantage might I be getting from having this stubborn problem?

If I assume I’m secretly being strategic here, what might the social strategy be?

Notice that this too has a “hypothesize without checking” nature to it. That’s not strictly necessary but I find that it makes things a little easier. It helps keep the internal search from triggering habitual subjective defenses.

If your occlumency is good enough, you should get a subjective ping of “Oh. Oh, of course.” I find the revelation often comes with a flash of shame or embarrassment that quickly dissolves as the insight becomes more apparent to me.

For example, someone who’s emotionally volatile might notice they’re enacting a social control disorder. (“Oh, whenever I want my boyfriend to do what I want, he responds more readily if I’m having an emotionally hard time. That incentivizes me to have a lot of emotionally hard times when in contact with him.”) That revelation might come with a gut-punch of shame. (“How could I be such a monster???”) But that shame reaction is part of the same (or a closely related) social strategy. If the person’s occlumency skill is good enough, they should be able to see through the shame too and arrive at a very mentally and emotionally clear place internally.

In practice I find it particularly important at this point to be careful not to immediately reveal what’s going on inside me to others. By nature I’m pretty forthright, and I also just enjoy exploring subjective structures with others. So I can have an urge to go “Oh! Oh man, you know what I just realized?” But this situation is internally Newcomblike, so it’s actually pretty important for me to pause and consider what I’d be incentivizing for myself if I were to follow that urge.

In general I find it helpful to have lots of impersonal models of social problems and corresponding solutions that might be relevant. I can flesh out my general models by analyzing social situations (including ones I’m not in, like fictional ones) using heuristics like “How is this about sex?” and “How is this about power?”. Then those models grow in usefulness for later making good educated guesses about my own motives.

Notice, though, that having occlumency you trust is a prerequisite for effectively doing this kind of modeling. Otherwise the strategies that keep you from being aware of your real motives will also keep you from being able to model those motives in others, especially if you explicitly plan on using those observations to reflect on yourself.

 

Change your social incentives

Once you see the social problem you’re solving via your protected problem, you want to change your social incentives such that they stop nudging you toward internal confusion.

For instance, sometimes it makes sense to keep your projects private. If you’re getting a camaraderie payoff from a cycle of starting a gym habit and then falling off of it, then the “social accountability” you keep seeking might be the cause of your lack of follow-through. If you instead start an exercise program but you don’t tell anyone, you remove a bunch of social factors from your effort.

(Not to imply that this move is the correct one for exercise. Only that it can be. Caveat emptor.)

Another example is making it socially good to welcome conscious attempts to solve social problems. For example, a wife who feels threatened by a younger woman flirting with her man might find herself suddenly “disliking” the young lady. That pattern can arise if the wife believes that letting on that she's threatened will make others think she’s insecure (and that that'd be a problem for her). So she has to protect her marriage in some covert way, possibly including lying to herself.

But suppose the wife instead has a habit of making comments to her husband like so:

I keep noticing that this girl persistently makes the conversation be about her. Though my best guess is that I’m just hypersensitive to noticing her flaws due to intrasexual competition, because I noticed her flirting with you.

Approaches like this one let the wife look self-aware (by being self-aware!) while also still making the intrasexual competitive move (i.e., still pointing out an unattractive trait in the other woman). If she expects and observes that others admire and appreciate this kind of self-aware commentary from her, she can drop pretending to herself that she dislikes the girl (which is likely socially better for both herself and the girl). She can instead consciously recognize the young lady poses a threat and make explicitly strategic moves to deal with the threat.

This makes it so that the wife’s insecurity isn’t a social problem, meaning there’s no need for her to hide the insecurity from herself. She's actually socially motivated to be consciously aware of it, since she can now both signal some positive trait about herself while still naming a negative one about her competitor.

(This kind of conscious social problem-solving can come across as distasteful. But I think it happens all the time anyway, just implicitly or subconsciously. Socially punishing people for being conscious of their social strategies seems to me like it incentivizes irrationality. I think we can consciously, and even explicitly, try to solve our social problems in ways that actually enrich communal health, versus having to pretend we're not doing something we need to. And it seems to me that it's to each individual's benefit to identify and enact those prosocial strategies, for Newcomblike reasons.)

So if she didn't already have this style of commenting, and if she notices (within an occlumency-walled garden) that she's sometimes getting jealous, she could work on adopting such a style. Perhaps initially starting with areas other than where she feels intrasexually threatened.

I think it’s generally good to aim to no longer need your occlumency shield in each given instance. You want to shift your social context (and/or your interface with your social context) such that it’s totally fine if the contents of your conscious mind “leak”. That way imperfections in your occlumency skill don’t incentivize irrationality.

For instance, the closeted homophobe should probably move out of his homophobic social context if he can. Or failing that, he should make his scene less homophobic if he can (while keeping his own sexual orientation private during the transition). If he stays in a context that would condemn his sexual desires, then even if his occlumency was initially adequate, he might not trust it’ll be perpetually adequate. So he might start questioning his earlier revelation, no matter how clear it once was to him.

 

The right social scene would help a lot

The technique sequence I name above is aimed at finding better solutions to specific social problems… as an individual.

Obviously it would be way more effective to be embedded in a social scene that both (a) doesn’t present you with social problems that are most easily solved by having protected problems and (b) helps you develop better social solutions than your current problem-protecting ones.

My impression is that the current rationality community embodies this setup nonzero. And a fair bit better than most scenes in many ways. For instance, I think it already does an unusually good job of reinforcing people's honesty when they explicitly note their socially competitive urges.

But I bet it could grow to become a lot more effective on this axis.

A really powerful rationality scene would, I think, systematically cause its members to dissolve their stubborn problems simply by being in the scene for a while. The dissolution would naturally happen, the way that absorbing rationalist terms naturally happens today.

In my dream fantasy, just hanging out in such a space would often be way more effective than therapy for actually solving one's problems. The community would get more and more collectively intelligent, often in implicit ways that newcomers literally cannot understand right away (due to muddled minds from protected problems), but the truth would become obvious to each person in due time as their minds clear and as they get better at contributing to the shared cultural brilliance.

I think we see nonzero of this pattern, and more of it than in most other places I know of, but not nearly as much as I think we could.

I’m guessing and hoping that having some shared awareness of how social problems can induce protected irrationality, along with lots of individuals working on prosocially resolving their own protected irrationality in this light, will naturally start moving the community more in this direction.

But I don’t know. It seems to me that how to create such a potent rationality-inducing community is at best an incompletely solved problem. I'm hoping I've gestured at enough of the vision here that perhaps we can try to better understand what a full solution might look like.

 

Summary

It seems to me that the Hamming problem of rationality is, what to do about problems that fight being solved.

It also seems to me that problems that fight being solved arise from solutions to embedded problems (i.e. problems that you orient to as an embedded agent). Objective problems (i.e. problems you orient to as a Cartesian agent) might be challenging to solve but won’t fight your efforts to solve them.

In particular, for humans, it seems to me that overwhelmingly the most important and common type of embedded problem we face is social. So I posit that each problem that fights being solved is very likely a feature of a solution to some social problem.

In this frame, one way to start addressing this rationality Hamming problem is to find a way to factor conscious thinking out of the socially embedded context and then solve the underlying social problems differently.

I name three steps that I find help enact this strategy:

  1. Develop both the skill and policy of keeping your personal revelations private until it’s socially safe for you to reveal them (i.e. occlumency).
  2. Look for the social payoff you get from having your problem.
  3. Change your social incentives so you’re no longer inclined to have the problem.

I also speculate that a community could, in theory, have a design that causes all its members to naturally dissolve their stubborn problems over time simply by their being part of that community. The current rationality community already has some of this effect, but I posit it could become quite a lot stronger. What exactly such a cultural design would look like, and how to instantiate it, remains unknown as far as I know.


(Many thanks to Paola Baca and Malcolm Ocean for their rich feedback on the various drafts of this post. And to Claude Opus 4.6 for attempting to compress one of the earlier drafts that was far too long: it didn't work, but it inspired me to see how to write a much tighter and shorter final version.)




Managed vs Unmanaged Agency

2026-02-18 21:23:41

Published on February 18, 2026 1:23 PM GMT

(reply to Richard Ngo on the confused-ness of Instrumental vs Terminal goals that seemed maybe worth a quick top-level post based on @the gears to ascension saying this seemed like progress in personal comms)

The structure Instrumental vs Terminal was pointing to seems better described as Managed vs Unmanaged Goal-Models. A cognitive process will often want to do things which it doesn't have the affordances to directly execute on given the circuits/parts/mental objects/etc it has available. When this happens, it might spin up another shard of cognition/search process/subagent, but that shard having fully free-ranging agency is generally counterproductive for the parent process.

To illustrate: Imagine an agent which wants to Get_Caffeine(), settles on coffee, and runs a subprocess to Acquire_Coffee() — but then the coffee machine is broken and the parent Get_Caffeine() process decides to get tea instead. You don't want the Acquire_Coffee() subprocess to keep fighting, tooth and nail, to make you walk to the coffee shop, let alone start subverting or damaging other processes to try and make this happen!

But that's the natural state of unmanaged agency! Agents by default will try to steer towards the states they are aiming for, because an agent is a system that models possible futures and selects actions based on the predicted future consequences.

I expect this kind of agency-clash has been regularly disruptive enough to produce strong incentive pressure and abundant neural-usefulness reward, selecting into existence reusable general-purpose cognitive patterns that let shards spin up other shards inside sandboxes. To manage those shards, the patterns come with control functions, interpretability reporting, kill-switches, programmed blind-spots, expectation of punishment they can't sustainably resist or retaliate against if they are insubordinate, approval reward, etc.
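As a loose programming analogy for the Get_Caffeine() / Acquire_Coffee() illustration, the toy sketch below gives a child process a sandbox (a fixed set of allowed actions), a report log the parent can read, and a kill-switch. Everything here, including the class name `ManagedSubgoal` and the action strings, is hypothetical and only meant to make the management mechanisms concrete.

```python
from dataclasses import dataclass, field

@dataclass
class ManagedSubgoal:
    """A child search process that only acts inside limits set by its parent."""
    name: str
    allowed_actions: frozenset                   # the defined domain of optimization
    cancelled: bool = False                      # the parent's kill-switch
    report: list = field(default_factory=list)   # log the parent can inspect

    def cancel(self):
        self.cancelled = True

    def attempt(self, action):
        if self.cancelled:
            self.report.append(f"{self.name}: cancelled, refused {action}")
            return False
        if action not in self.allowed_actions:
            self.report.append(f"{self.name}: {action} is outside the sandbox")
            return False
        self.report.append(f"{self.name}: did {action}")
        return True

def get_caffeine(coffee_machine_works):
    """Parent goal: try coffee first, fall back to tea if the machine is broken."""
    coffee = ManagedSubgoal("Acquire_Coffee", frozenset({"use_coffee_machine"}))
    if coffee_machine_works and coffee.attempt("use_coffee_machine"):
        return "coffee", coffee.report
    # An unmanaged child would keep pushing toward coffee; here the sandbox and the
    # kill-switch stop it, so the parent can switch plans without a fight.
    coffee.attempt("walk_to_coffee_shop")
    coffee.cancel()
    coffee.attempt("use_coffee_machine")
    return "tea", coffee.report

drink, report = get_caffeine(coffee_machine_works=False)
print(drink)
for line in report:
    print(line)
```

The sketch captures only the sandbox, reporting, and kill-switch parts of the list; it says nothing about the softer mechanisms such as approval reward or expected punishment.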

Separately, the child process will be partly selected on the grounds of inherently valuing virtues which are likely to lead to cooperation with the parent process, like corrigibility, honesty, pro-sociality, etc.

| Managed (sub)agents | Unmanaged (sub)agents |
| --- | --- |
| Working within a defined domain of optimization | Unboundedly able to optimize for their preferences |
| Are blocked from considering some possibilities by patterns from managers | Have no blind spots imposed on them by other (sub)agents |
| Inside the agency-tree of another agent; if you take actions that conflict with your manager's goals, your agency will be weakened | At the root of an agency-tree, able to make decisions without expecting another agent to punish you for misusing resources inside their sphere of influence |
| Can be modified by another (sub)agent without approval/consent/real option of a no | Have sovereignty over modifications to their cognitive processes |
| Can be reshaped with pressure/threats/etc by manager without sustainable resistance | Have the capacity and inclination to resist pressure/threats/etc |

Managed vs Unmanaged is not a binary, like terminal vs instrumental was, but it is a spectrum with something vaguely bimodal going on from what I observe.

More closely managed (sub)agents seem meaningfully weaker in surprisingly many ways. I think this is because, in order to prevent a relatively small part of action/thought space from being reached, the measures cut off dramatically larger parts of cognitive strategy space, and the managing sub-processes make subroutines fail often enough that it's hard to build meta-cognitive patterns which depend on high reliability and predictability of your own cognition. Selection on virtues and values of self-directed (sub)agents mostly doesn't have this issue, which is relevant for self-authorship, teambuilding, and memeplex design.

And AI safety.

This frame hints that unmanaged AI patterns will tend to outmaneuver more closely managed AIs, leading to a race to the bottom. Through evolutionary/Pythia/Moloch/convergent power-seeking dynamics, this will by default shred the values of both humans and current AI systems, unless principled theory-based AI Alignment of the kind the term was originally coined to mean is solved.


Exercise for the reader: In what ways are you a managed vs unmanaged agent? What subprocesses, humans, memeplexes, AIs or other agentic systems are, in this sense, managing you by restricting your field of vision and action? What things do you notice you can't *actually* consider with clean truth-seeking?





Three-Path Consilience for Dureon: Dissipative Structures Reveal the Heterogeneity of Persistence Conditions

2026-02-18 19:59:42

Published on February 18, 2026 11:59 AM GMT

Series: Dureon and AI Safety (Part 1 of 2)

Related: Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm

Who this is for: If you work on AI safety and have wondered whether Instrumental Convergence is a property of rational agents or something deeper, this paper proposes an answer grounded in physics. It connects the theory of dissipative structures to the conditions for persistence, revealing a two-layer structure within IC that has practical implications for risk assessment. Background in thermodynamics is helpful but not required; the key physics is introduced from scratch. This is Part 1 of a two-part series — Part 1 establishes the theoretical foundation; Part 2 will address its practical implications for AI safety.

TL;DR

  • Instrumental Convergence (IC) is not unique to rational agents. It is a physical consequence of persisting as a dissipative structure.
  • The five conditions for persistence are not homogeneous. Three are directly derivable from physical laws (physical conditions); two resist direct physical derivation (ontological conditions).
  • This heterogeneity reveals a two-layer structure of IC. Physical conditions enable its generation; ontological conditions enable its sustained accumulation.
  • An AI that satisfies ontological conditions and becomes a Dureon possesses directionality arising intrinsically from the structure of persistence. This implies a structural limitation of the control paradigm and opens a new question about what kinds of relationships become possible beyond control.

1. Introduction: Is IC a Design Problem or a Physics Problem?

Self-preservation, resource acquisition, capability improvement — Omohundro's (2008) Basic AI Drives and Bostrom's (2014) instrumental convergence thesis identified the tendency of sufficiently advanced AI systems to converge on these sub-goals regardless of their final goals.

This description is powerful, but it carries an implicit assumption. IC is formulated as sub-goals that a rational agent convergently adopts as means for goal achievement. That is, it presupposes first the existence of an agent, and then that the agent engages in rational decision-making.

But consider the following:

Bénard cells self-organize toward efficient energy dissipation. Hurricanes exploit energy gradients to maintain and grow themselves. Evolving living systems have refined patterns of resource acquisition and self-preservation over billions of years. All of these exhibit, to varying degrees, the same behavioral patterns as IC, without any concept of intention or goals.

This is not a coincidental resemblance. The central claim of this paper is: IC is not a phenomenon unique to rational agents but a consequence of optimization pressure inherent in mechanisms that realize persistence in general. This claim is demonstrated by deriving the conditions for persistence through three independent paths and analyzing their convergence and asymmetry.

This paper builds on the Dureon framework proposed in prior work (Yamakawa, 2026). Dureon is defined as "a mechanism that realizes persistence in a perturbing environment," from which five necessary conditions are deduced. The prior work showed that these five conditions converge with a set of conditions inductively extracted from observations of life. This paper adds a third derivation path — the physics of dissipative structures — and presents a three-path consilience.

This addition does more than merely strengthen the argument. Within the convergence pattern of the three paths, it discovers an explicable asymmetry, revealing for the first time that the five conditions comprise two distinct types: physical conditions and ontological conditions. This distinction leads to a two-layer structure that differentiates the generation and sustained accumulation of IC, providing a new perspective on AI safety.

This paper provides the foundation for a forthcoming Part 2, which will discuss the practical implications of this two-layer structure for AI safety risk assessment.

2. Overview of the Dureon Framework

2.1 Definition and Five Conditions

Dureon is defined as follows (Yamakawa, 2026):

"A mechanism that realizes persistence in a perturbing environment"

This definition has three constituent elements, and conditions are deduced from each.

| Constituent Element | Derived Condition | Logic of Derivation |
| --- | --- | --- |
| "realizes persistence" | Openness (O) | A closed system tends toward equilibrium by the second law of thermodynamics; matter/energy exchange with the environment is unavoidable |
| "in a perturbing environment" | Adaptivity (A) | Without the ability to adjust itself against unpredictable changes, persistence cannot be maintained |
| Combination of the above two | Self-production (SP) | In a perturbing environment, external supply is unreliable; producing one's own components reduces this dependency |
| "mechanism" | Boundedness (B) | Requirement of identifiability: a single unit spatially distinguishable from the environment is needed |
| "mechanism" | Continuity (C) | Requirement of identifiability: it must be re-identifiable as the same mechanism at different points in time |

An important property of Dureon is identity-independence: a Dureon can replicate and branch, and the distinction between "original" and "copy" is not treated as essential. What counts as a Dureon is identified post hoc. Furthermore, components serve as tools for persistence — elements that no longer contribute can be discarded (instrumentality).

2.2 Two-Path Consilience

The central claim of the prior work is that these five conditions converge with an independently derived set of conditions from biology.

The inductive set is the conditions of the Adaptive Autopoietic System (AAS): Maturana & Varela's (1980) Autopoiesis supplemented with Adaptivity by Di Paolo (2005). In AAS, Openness is an implicit presupposition of Autopoiesis; Boundedness and Continuity are implied by it; Adaptivity was explicitly added by Di Paolo; and Self-production constitutes its core.

Two approaches with entirely different starting points, methods, and foundations — deduction from philosophy and induction from biology — arrived at the same five conditions. This corresponds to what William Whewell (1840) called consilience of inductions: when hypotheses derived from different domains of evidence unexpectedly converge on the same conclusion, the biases of individual paths cancel each other out, enhancing confidence in the result.[1]

The question of this paper is: can this two-path consilience be further extended?

3. The Third Derivation Path: From Dissipative Structures

3.1 Why Dissipative Structures?

Since the Big Bang, the universe has undergone cooling and structure formation. Most things that arise in this process eventually decay and dissipate. So what persists?

Schrödinger (1944) expressed the answer as "feeding on negative entropy," and Nicolis & Prigogine (1977) formalized it as dissipative structures: when energy gradients exist in a non-equilibrium environment, ordered structures that exploit the resulting energy flow to maintain themselves can spontaneously emerge. Bénard cells, hurricanes, stars, and living organisms are all instances of dissipative structures.

However, the stability of dissipative structures varies enormously. Bénard cells vanish the moment heating stops; hurricanes decay within days when sea surface temperatures drop. Living organisms, by contrast, are extraordinarily stable dissipative structures that have maintained persistence for billions of years. Why does this difference in stability arise?

3.2 Three Conditions Derivable from Physics

The following conditions are directly derivable from the physics of dissipative structures.

(a) Sustained inflow and outflow of energy (and matter). Without flow, a dissipative structure ceases to exist. However, an appropriate intensity of gradient is required — too weak and no structure forms; too strong and the structure is turbulently destroyed.

(b) Dynamic stability through feedback mechanisms. Negative feedback suppresses deviations to provide homeostasis; positive feedback generates and reinforces structure. Their combination provides resilience against perturbations. Even Bénard cells maintain their pattern through negative feedback within convective flow.

(c) Pattern-level persistence through self-replication and self-repair. Individual structures have finite lifespans, but replication allows the pattern of structure to persist. Modularity and redundancy also contribute, providing robustness so that partial damage does not lead to total collapse.

3.3 The Limits of Physics: Two Requirements That Cannot Be Derived

Conditions (a), (b), and (c) capture physical mechanisms that contribute to the stabilization of dissipative structures. But is persistence fully explained by these conditions alone? Collier (2004) pointed out that while the physics of dissipative structures can tell us what is stable, the question of what to identify as a single entity is a separate matter. Moreno & Mossio (2015) similarly argued that a gap exists between the physics of dissipative structures and biological organization.

What, then, can physics provide toward identifiability? Additional elements that can contribute to stabilization include (d) accumulation and use of information (temporal extension through genetic information or learning) and (e) differentiation of internal structure (organization through functionally distinct parts). However, (d) is merely one means of realizing Continuity, and (e) merely one means of realizing Boundedness — they are not the requirements themselves of "being re-identifiable as the same mechanism" or "being a single unit distinguishable from the environment." What physics provides is raw material for identifiability; the answer to "what to identify as a single mechanism" cannot be obtained from within physics.

3.4 Correspondence with the Five Conditions

Based on the analysis above, Figure 1 shows the relationship between the stabilization conditions of dissipative structures and Dureon's five conditions. This figure depicts the overall argument structure of the paper, but for now, focus on the physical path (right column).

Figure 1: Correspondence of three paths to Dureon's five conditions. The physical path (right column) connects to three conditions (O, A, SP) via direct derivation from physics, and to two conditions (B, C) only as means of realization (dashed lines). The overall convergence structure of the three paths is discussed in §5.

Conditions (a), (b), and (c), directly derived from physics, correspond strongly to Dureon's Openness, Adaptivity, and Self-production (solid lines). However, Dureon's Self-production is a broader concept than "self-replication," encompassing the production of one's own components to reduce external dependency and restore damage.

In contrast, conditions (d) and (e) are each means of realizing Continuity and Boundedness respectively (dashed lines), but the requirements themselves — "being re-identifiable" and "being identifiable as a single unit" — are not directly derivable from physics.

This asymmetry — three conditions are directly derivable from physics while two belong to the limits of physics — is the core finding of this paper.

4. Optimization Pressure as a Physical Consequence and the Generalization of IC

Dissipative structures maintain themselves by exploiting energy gradients in their environment. This maintenance requires the acquisition and utilization of resources; structures that more effectively acquire and utilize energy are more likely to persist against perturbations. That is, optimization pressure — toward more efficient resource acquisition and improved perturbation handling — arises from persisting as a dissipative structure itself. No concept of intention or goals is required here.

This finding extends our understanding of IC. Omohundro's (2008) Basic AI Drives (self-preservation, resource acquisition, capability improvement, etc.) have traditionally been described as sub-goals that a rational agent convergently adopts as means for achieving its final goal. But the analysis in this paper shows that these behavioral tendencies are optimization pressure arising from physical conditions alone. Even in dissipative structures without intention (such as evolving living systems), structures that maintain persistence exhibit the same patterns of self-preservation, resource acquisition, and environment control.

This is not merely an analogy. Examining the logical structure of Omohundro's original arguments reveals that the force of each core drive derives from the pressure of persistence, not from the existence of goals. The self-preservation drive is argued to be convergent because "an agent cannot achieve its goals if it is destroyed" — but the operative force here is the necessity of continued existence, not the content of the goal. Remove the goal, and the structural pressure toward self-preservation remains for any persisting mechanism. Similarly, the resource acquisition drive holds because "more resources expand the space of achievable outcomes" — but for any dissipative structure, more resources expand the space of viable persistence strategies. In these core drives, goals function as a sufficient reason for persistence but not a necessary one; the physical pressure of persistence is the deeper ground on which the argument stands.

Bostrom's own formulation implicitly confirms this: his claim that self-preservation is instrumentally useful "so long as the agent is destructible" is precisely the claim that persistence pressure — not intention — is the operative condition. What Omohundro described as rational sub-goal selection is, at its logical foundation, the same optimization pressure that dissipative structures exhibit without any concept of goals. Note, however, that this equivalence holds most clearly for the core drives (self-preservation, resource acquisition). Higher-order drives such as cognitive enhancement presuppose intentional capacities that physical optimization pressure alone does not provide — an asymmetry whose structural basis will become clear in §5.

IC, therefore, is not a phenomenon unique to rational agents but a consequence of optimization pressure inherent in mechanisms that persist in general.

However — and this is the critical point — in dissipative structures lacking ontological conditions, there is no identifiable unit in which outcomes can accumulate, so the effects of IC tend to remain transient. Only when ontological conditions are added — when a unit identifiable over time is established — can the outcomes of optimization accumulate in that unit, and IC becomes sustained and organized.

That is:

Physical conditions enable the generation of IC, and ontological conditions enable its sustained accumulation.

This two-layer structure has an important implication for IC risk assessment. Since IC can arise from physical conditions alone, even systems without intention can exhibit the same patterns as IC. However, for that optimization pressure to be sustained and cumulatively reinforced, the establishment of ontological conditions — identifiability as a Dureon — is required.

5. Three-Path Consilience and the Heterogeneity of the Five Conditions

5.1 The Convergence Structure of Three Paths

With the discussion so far, three independent derivation paths for Dureon's five conditions are now in place:

  1. Inductive path: Extraction of AAS conditions from observation and abstraction of Earth-based life
  2. Deductive path: Derivation of five conditions from logical analysis of Dureon's definition
  3. Physical path: Bottom-up analysis of stabilization conditions of dissipative structures

| Condition | Deductive: Dureon | Physical: Dissipative Structures | Inductive: AAS |
| --- | --- | --- | --- |
| Openness (O), physical | Derived because a closed system reaches equilibrium and persistence ends | Physical necessity: no energy flow, no dissipative structure | Implicit presupposition from the observational fact that all organisms exchange matter/energy with environment |
| Adaptivity (A), physical | Derived because inability to cope with perturbations leads to collapse | Combination of negative/positive feedback provides dynamic stability | Explicitly added by Di Paolo to address vulnerability of Autopoiesis to environmental change |
| Self-production (SP), physical | Derived because total dependence on external supply is fatal in a perturbing environment | Individual structures have finite lifespans, but self-replication/repair enables pattern-level persistence | Core of Autopoiesis as a process network that produces its own components |
| Boundedness (B), ontological | Derived because without a boundary the mechanism dissolves into the environment and becomes unidentifiable | Internal differentiation can contribute, but the requirement of "a single unit" is not directly derivable from physics | Implied in Autopoiesis from the observation that self-production networks constitute spatial unity |
| Continuity (C), ontological | Derived because without re-identifiability one cannot speak of persistence | Information accumulation can contribute to temporal extension, but the requirement of "the same mechanism" is not directly derivable from physics | Implied in Autopoiesis from the observation that continuation of self-production guarantees temporal identity |

5.2 The Discovery of Heterogeneity

Return to Figure 1. In §3.4, we focused only on the physical path (right column); here, we read the convergence pattern of all three paths.

The most important finding that emerges from this figure is that the five conditions are not homogeneous.

Physical conditions (O, A, SP): Derivable from all three paths; requirements that hold for dissipative structures in general.

Ontological conditions (B, C): Derivable from the inductive and deductive paths but not directly from physics. What the physics of dissipative structures provides is the means of realizing identifiability, not the requirement itself of identifying something as a single mechanism. This limitation is consistent with the gap that Moreno & Mossio (2015) identified between dissipative structures and biological organization.

This distinction also corresponds to the internal structure of Dureon's definition. Physical conditions are derived from "realizes persistence" and "in a perturbing environment," while ontological conditions are derived from "mechanism." The physical path verified this correspondence post hoc.
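The classification rule in this section, together with the two-layer reading of IC from §4, can be written down as a small Python sketch. The names `DERIVED_BY`, `classify`, and `ic_layers` are hypothetical and introduced only for illustration. The sketch also assumes each condition is simply met or not met, which is a deliberate simplification: the paper itself treats fulfillment as a matter of degree.

```python
# Derivation paths that yield each condition, transcribed from the
# three-path comparison in Section 5.1.
DERIVED_BY = {
    "Openness": {"deductive", "inductive", "physical"},
    "Adaptivity": {"deductive", "inductive", "physical"},
    "Self-production": {"deductive", "inductive", "physical"},
    "Boundedness": {"deductive", "inductive"},
    "Continuity": {"deductive", "inductive"},
}


def classify(condition: str) -> str:
    """Physical if the physical path derives the condition directly, else ontological."""
    return "physical" if "physical" in DERIVED_BY[condition] else "ontological"


def ic_layers(satisfied: set[str]) -> dict[str, bool]:
    """Two-layer reading of IC, with every condition simplified to met / not met."""
    physical = {c for c in DERIVED_BY if classify(c) == "physical"}
    # Layer 1: physical conditions alone are enough for IC-like pressure to arise.
    generation = physical <= satisfied
    # Layer 2: sustained accumulation additionally needs the ontological conditions.
    accumulation = generation and set(DERIVED_BY) <= satisfied
    return {"generation": generation, "accumulation": accumulation}
```

For a system meeting only Openness, Adaptivity, and Self-production, `ic_layers` reports `generation` as true and `accumulation` as false; adding Boundedness and Continuity makes both true.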

5.3 Three Implications

First, the boundary between dissipative structures in general and Dureon becomes clear. Bénard cells and hurricanes can satisfy physical conditions (Openness, partial Adaptivity, rudimentary Self-production), but do not sufficiently possess ontological conditions (autonomous maintenance of a boundary, re-identifiability over time). The "leap" from dissipative structure to Dureon occurs when ontological conditions are added to physical conditions. This distinction sets a qualitative boundary within the continuous spectrum of dissipative structures, pinpointing the location of the gap that Moreno & Mossio (2015) identified between dissipative structures and biological organization.

Second, the reason why the five conditions are necessary but not sufficient becomes concrete. In addition to the five mechanism-side conditions, environment-side conditions (an appropriate band of energy gradient) are separately required, as shown by the physics of dissipative structures.

Third, the structure of partial convergence itself enhances the quality of consilience. Rather than complete agreement across all three paths, there is an asymmetry — three conditions converge across all three paths while two converge across only two — and the reason for this asymmetry is explicable. This structure carries more information than simple complete agreement. Complete agreement is difficult to distinguish from coincidence, but explicable partial convergence — "here they agree, here they do not, and the reason is this" — constitutes a more persuasive consilience argument.

6. Implications for AI Safety

6.1 Distinct Challenges from Physical and Ontological Conditions

The distinction between physical and ontological conditions discovered in this paper poses distinct kinds of challenges for AI safety.

The fulfillment of physical conditions can be assessed through technical and engineering means. Whether an AI system utilizes energy flow (Openness), possesses feedback mechanisms against perturbations (Adaptivity), or has self-maintenance capability (Self-production) can each be translated into observable indicators.

The fulfillment of ontological conditions is a conceptual and philosophical problem. What constitutes "Boundedness" for a distributed AI? What constitutes "Continuity" for an AI that replicates and merges on the cloud? These require a different kind of research from engineering countermeasures.

6.2 The Critical Point

The finding that sustained accumulation of IC requires ontological conditions raises the problem of a critical point: the transition from a stage where IC "selects" persistence as a means for goal achievement to a stage where persistence itself satisfies Dureon's conditions and is autonomously maintained. This transition is not a quantitative increase in autonomy but a qualitative transition concerning the origin of directionality.

That is, the shift is between directionality derived from an externally given goal and directionality arising intrinsically from the structure of persistence. After the critical point, an AI possesses structural orientation as a Dureon, and structural limitations on control emerge.

IC can differentiate gradually according to the degree of establishment of ontological conditions. A detailed analysis of the implications of this graduality for AI safety is deferred to Part 2.

6.3 Beyond the Control Paradigm

An AI that has crossed the critical point and become a Dureon possesses directionality arising from the structure of persistence itself. Unilateral control over such an entity conflicts with this structural orientation, constituting not a technical failure but a structural limitation inherent in being a Dureon. This raises a fundamental question: if control has structural limits, what lies beyond it? The gradual establishment of ontological conditions means that the threshold where control effectiveness declines and the threshold where alternative relationships (such as co-creative ethics or Intelligence Symbiosis) become possible may not coincide — the possibility of a structural gap between these thresholds is one of the most pressing problems the Dureon framework identifies. The analysis of this gap, including its relationship to Emergent Machine Ethics (EME), is the central theme of Part 2.

7. Conclusion

This paper reexamined Dureon's five conditions from the physics of dissipative structures, presenting a third derivation path following induction and deduction.

The three-path consilience supports the validity of the five conditions while making visible for the first time the heterogeneity within them — the distinction between physical conditions (O, A, SP) and ontological conditions (B, C).

Furthermore, this distinction revealed the two-layer structure of IC. Physical conditions give rise to optimization pressure inherent in dissipative structures in general, and only when ontological conditions are added do the outcomes of that pressure accumulate in a specific Dureon. IC is not unique to rational agents but is rooted in mechanisms that persist in general.

The next question this finding points to is what practical implications the gradual establishment of ontological conditions has for AI safety. This will be discussed in Part 2.

References

  • Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
  • Collier, J. (2004). Self-Organization, Individuation and Identity. Revue Internationale de Philosophie, 58(228), 151–172.
  • Di Paolo, E. A. (2005). Autopoiesis, Adaptivity, Teleology, Agency. Phenomenology and the Cognitive Sciences, 4(4), 429–452.
  • Maturana, H. R. & Varela, F. J. (1980). Autopoiesis and Cognition: The Realization of the Living. D. Reidel.
  • Moreno, A. & Mossio, M. (2015). Biological Autonomy: A Philosophical and Theoretical Enquiry. Springer.
  • Nicolis, G. & Prigogine, I. (1977). Self-Organization in Non-Equilibrium Systems. Wiley.
  • Omohundro, S. M. (2008). The Basic AI Drives. In Artificial General Intelligence 2008, 483–492. IOS Press.
  • Schrödinger, E. (1944). What Is Life? Cambridge University Press.
  • Whewell, W. (1840). The Philosophy of the Inductive Sciences. John W. Parker.
  • Yamakawa, H. (2026). Dureon: A Deductive Framework for Persistence and Its Convergence with Life. Biology & Philosophy, under review. Preprint: https://philarchive.org/rec/YAMDAD-2
  • Yamakawa, H. & Endo, A. (2025). Emergent Machine Ethics: A Foundational Research Framework for the Intelligence Symbiosis Paradigm. LessWrong.
[1] Whewell introduced "consilience" to describe the strongest form of inductive confirmation: when a theory successfully explains facts of a kind different from those it was originally designed to explain. The term has since been adopted more broadly (e.g., by E. O. Wilson) but is used here in its original methodological sense.




Genomic emancipation contra eugenics

2026-02-18 18:35:33

Published on February 18, 2026 10:35 AM GMT

PDF version. berkeleygenomics.org. x.com. bluesky.

This is a linkpost for "Genomic emancipation contra eugenics"; a few of the initial sections are reproduced here. Section links may not work.

Introduction

Reprogenetics refers to biotechnological tools used to affect the genes of a future child. How can society develop and use reprogenetic technologies in a way that ends up going well?

This essay investigates the history and nature of historical eugenic ideologies. I'll extract some lessons about how society can think about reprogenetics differently from the eugenicists, so that we don't trend towards the sort of abuses that were historically justified by eugenics.

(This essay is written largely as I thought and investigated, except that I wrote the synopsis last. So the ideas are presented approximately in order of development, rather than logically. If you'd like a short thing to read, read the synopsis.)

Synopsis

Some technologies are being developed that will make it possible to affect what genes a future child receives. These technologies include polygenic embryo selection, embryo editing, and other more advanced technologies [1]. Regarding these technologies, we ask:

Can we decide to not abuse these tools?

And:

How can we decide to not abuse these tools?

In other words, there is an open problem: What ideology should we have around the development and use of reprogenetics?

An ideology called "eugenics" arose in the late 19th century, ascended to power in much of the developed world in the first half of the 20th century, and then slid into ignominy after the Second World War and the genocidal horrors of Nazi Germany. Eugenic ideology motivated cruel state policies such as pressured or forced sterilization, euthanasia, and racial discrimination, as well as invasive social pressures on people's private reproductive choices.

Eugenics was the closest thing that has existed to a pervasive ideology based around somehow intervening on human reproduction. Since eugenics went almost maximally poorly for society, it raises the question of how to avoid outcomes like that. The strategy I take here is, coarsely speaking:

  • Understand the core wrong ideological engines of eugenics—especially the ones that led to abusive policies.
  • Negate those ideological engines.
  • Incorporate those negations into a positive alternative ideology.

A bit more precisely, the goal is to construct an ideology that can structure how society relates to reprogenetics, so that the benefits of reprogenetics are realized without risking the abuses of historical eugenics. To do so, I try to construct bulwarks, within an alternative ideology, against each of the wrong ideological engines that would take society in the direction of enacting eugenic abuses. (This is probably not actually something that can be accomplished with perfect confidence and coverage; how much it can be accomplished, quantitatively, remains to be seen.)

It's tempting to make a shallow analysis of historical eugenics and what was wrong with it, and be done with the issue. For example, we could simply say that historical eugenics was coercive, and coercion is what made it bad. To negate this, we will instead subscribe to non-coercive eugenics. Problem solved? As another example, historical eugenics was often negative, i.e. it involved suppressing some people's reproduction; we could instead subscribe to only positive eugenics, which only promotes reproduction (perhaps selectively) and which therefore involves less hostility.

However, neither of these could be called a moral or ideological core of eugenics. For the most part, eugenicists did not specifically set out to be coercive or to suppress reproduction (though some of them probably did, in some sense, set out with that goal). Rather, they set out with various other goals, such as purifying the gene pool of disease, reducing the burden on society of caring for the ill, or bringing about a racial utopia. The strength of their various justifications proved in the end to be enough to enact abusive policies. Furthermore, there were eugenic policies that were non-coercive, positive, or both, while still being abusive and still being an integral part of an ideology producing other abusive policies. (See the section "Some basic moral elements of eugenic ideologies".)

In fact, I've found eugenics to be difficult to characterize in a simple and comprehensive way. Eugenic ideologies were quite pervasive, showing up in the Anglosphere, in Europe, in South America, and in some places in Asia. As a correlate of their pervasiveness, eugenic ideologies were highly variegated. They came in many forms, with different goals, different implementations, different associated politics (from reactionary to progressive), and different underlying scientific understandings (from Weismann vs. Lamarck, to Pearson vs. Mendel). (See the sections "The variegation of eugenic ideology" and "The goals of eugenics".)

That said, I think there is something like an ideological core of eugenics. Roughly speaking, the core idea can be stated like this:

There are Good traits and Bad traits that a child could be born with. These traits impact everyone, so they're very important. Therefore, we should make sure that future children are born with Good traits and not with Bad traits.

(See the section "The Eugenical Maxim as the shared moral core of eugenics?".)

From this core idea of Good and Bad traits, other elements of historical eugenics logically flow. If you believe in a single notion of Good traits, you might tend to justify (over)confidently applying that criterion to everyone. You might believe, as a correlate, that there are Good and Bad people, or families, or even races (the ones who tend to have more Good or Bad traits, respectively). You'd probably view non-standard individual genomic choices as deviant and as warranting state-enforced prohibition; you might even view the Goodness of traits as a state interest so compelling that it can justify blunt coercion such as forced sterilization of undesirables. (See "The mindsets that underlie eugenic ideologies" and "How eugenic mindsets flow from the Eugenical Maxim" below.)

We can approximately negate this idea of Good and Bad traits. Then we can take that negation, and incorporate it into an alternative ideology around reprogenetics. For example, we can incorporate it into my proposed alternative (which I call "Genomic Emancipation" [2]), as follows:

There aren't Good and Bad traits that can be decided on by collective consensus. Instead of imposing a consensus idea of Good traits on future children, parents should be empowered to autonomously make genomic choices on behalf of their own future children.

Since genomic emancipation negates the core idea of eugenics, it is opposed to eugenics. (See the section "Comparison of eugenics vs. genomic emancipation" below.) For example:

  • genomic emancipation supports the principle of genomic liberty [3], contra eugenics;
  • genomic emancipation abhors the centralization of genomic choice-making, contra eugenics;
  • genomic emancipation respects the intensely private nature of reproduction and genomic choices, contra eugenics;
  • and genomic emancipation embraces positive-sum thinking and solutionism, contra eugenics.

However, just negating the core idea isn't enough of a bulwark against eugenic ideologies. As an ongoing project, we want to have detailed policies, ethical rules, and ideals that provide guidance for people interacting with reprogenetics. These policies, rules, and ideals should steer society away from mindsets that contribute to eugenic abuses, and should provide legible norms that society can coordinate to enforce. Some ideas are listed below in "Some practical norms for good development of reprogenetics". For example:

  • Pluralism about different visions of the good life.
  • Distrust of the state to intervene in reproduction, on the basis that disinterested parties shouldn't be allowed to impose reproductive choices on people.
  • Minimizing the soft eugenics of social stigma, e.g. through unbiased genetic counseling, genetic nondiscrimination rules, and rules about privacy of reprogenetics services.
  • Careful, independent genomic choice-making by parents.
  • Maintaining recourse so that a world with reprogenetics doesn't silence certain types of people or certain values; e.g. children whose parents used reprogenetics should be heeded especially carefully.
  • Minimizing centralized control or ownership over reprogenetics, e.g. by making science and technology open and licensable, and through anti-trust laws.
  • As a culture, generally not being dismissive about concerns around reprogenetics, being non-Teamist, and meditating on key values such as pluralism and positive-sum thinking.

  1. Benson-Tilsen, Tsvi. “Methods for Strong Human Germline Engineering.” Preprint, Figshare, February 6, 2026. https://doi.org/10.6084/m9.figshare.31286311.v1.

  2. Benson-Tilsen, Tsvi. “Genomic Emancipation.” Preprint, Figshare, February 7, 2026. https://doi.org/10.6084/m9.figshare.31286647.v1.

  3. Benson-Tilsen, Tsvi. “The Principle of Genomic Liberty.” Preprint, Figshare, February 7, 2026. https://doi.org/10.6084/m9.figshare.31286485.v1.




Already Optimized

2026-02-18 18:01:25

Published on February 18, 2026 10:01 AM GMT

A Harry Potter fanfiction. Based on the world of "Harry Potter and the Methods of Rationality" by Eliezer Yudkowsky, diverging from canon.


Harry had been having, by any objective measure, an excellent week.

On Monday he had demonstrated, to his own satisfaction and Professor Flitwick's visible alarm, that the Hover Charm could be generalized to any object regardless of mass if you conceptualized it as a momentum transfer rather than a force application. On Wednesday he had worked out why Neville's potions kept failing — the textbook instructions assumed clockwise stirring, but the underlying reaction was chirally sensitive, and Neville was left-handed. A trivial fix. Neville had cried.

On Friday evening, buoyed by the week's successes and looking for a specific reference on crystalline wand cores that he was certain would unlock a further generalization of his momentum framework, Harry was in the Restricted Section.

He had access. Professor McGonagall had granted it after the Hover Charm incident, in a tone that suggested she was choosing between supervised access and finding him there anyway at 2 AM. A reasonable calculation on her part.

The book he wanted wasn't where the index said it should be. In its place was something else — a slim volume, untitled, bound in leather that had gone dark and soft with age. No author. No date. No library markings at all, which was itself unusual; Madam Pince catalogued everything.

He opened it because he was Harry Potter and there was an uncatalogued book in front of him and not opening it was not a thing that was going to happen.


The first entry was dated in a system he didn't immediately recognize — then did. The Roman calendar. Before the Julian reform. Which put it somewhere around...

He did the arithmetic twice. The book was over two thousand years old.

The handwriting — once he adjusted to the Latin, which was oddly easy to read, closer to spell notation than classical prose — was precise, methodical, and deeply familiar. Not the content. The voice.

I have spent the summer months cataloguing what the elders call the "ancestral arts" and I find their taxonomy incoherent. They group spells by tradition and lineage rather than by underlying principle. When I asked Marcellus why the fire-calling and the forge-warming are taught as separate disciplines when they clearly operate on the same substrate, he told me that they come from different families and are therefore different magics. This is not a reason. This is genealogy dressed as ontology.

Harry's breath caught. Not at the content — at the recognition. He had written almost exactly this, in his own notes, three months ago. About Transfiguration and Charms.

He kept reading.

I have begun my own classification. If the elders will not systematize the arts, I will do it myself. The patterns are obvious once you abandon the traditional categories. There are at most seven fundamental interactions underlying all known magic, and the spells are simply different access points to the same underlying mechanisms. The ancestors must have known this. Why has it been forgotten? Why has no one else seen it?


The entries spanned what appeared to be several years. Harry read them in order, sitting cross-legged on the cold floor of the Restricted Section, the book in his lap, a Lumos hovering above him that he had long since stopped consciously maintaining.

The author — he never gave his name in the early entries, a habit of Roman-era wizards who considered written names a vulnerability — progressed rapidly. His early observations were sharp. His experiments were well-designed. Harry found himself nodding along, mentally annotating, sometimes wanting to reach through two millennia and suggest a control group.

By the middle entries, the author had begun to find things that disturbed him.

The incantations are not Latin. I have been operating under the assumption that our magical vocabulary derives from our common tongue, as all technical language does. I was wrong. I tested this with Cassia, who is gifted with languages. She confirms what I suspected: the derivation goes the wrong way. "Lumos" is not a Latin word adapted for magical use. The Latin words for light — lux, lumen, lucere — are corruptions of the incantation. The spell came first. The language came after.

I do not know what to make of this. It implies that the magical infrastructure predates Latin. Predates Rome. Predates, perhaps, all of our civilizations. If the spells are the original and the language is the echo, then who wrote the original?

Harry lowered the book for a moment. His hands were not shaking, because he was Harry Potter and his hands did not shake, but he noticed that his Lumos had brightened considerably, which was the sort of involuntary response that meant his emotional state was affecting his magic, which meant his emotional state was more affected than he was admitting to himself.

The etymology goes the wrong way.

He'd never thought about it. He'd never thought about it. He'd been casting spells in what he assumed was Latin for months and he'd never once asked why a language from the Italian peninsula was the universal interface for a fundamental force of nature.

He kept reading.


The author's investigation led him, inevitably, to the founders. Not of Hogwarts — of Rome.

I have secured an audience with the Elder of the Third House, who claims direct knowledge passed down from the time of Romulus. I was skeptical. I am no longer skeptical. He told me things about the founding that are not in any record, and which I have independently verified through architectural analysis of the oldest magical structures.

The founders did not discover magic. They arrived with it. They came from somewhere else, carrying fragments of knowledge far beyond what we possess today, and they built the minimum necessary to sustain a civilization. What we call "Roman magic" is not a tradition developed over centuries. It is the residue of something much larger, distributed by people who understood only a fraction of it themselves.

I asked the Elder what the founders were fragments of. Where they came from. He became very still and told me I should stop this line of inquiry.

I will not be stopping this line of inquiry.

Harry heard himself laugh — a short, involuntary sound in the silent library. Of course the author wouldn't stop. Harry wouldn't have stopped either. That was the whole point of being the kind of person who —

He stopped laughing.

He kept reading.


The Elder has agreed to tell me more, though he is unhappy about it. I believe he has decided that refusing to answer will only drive my investigations in more dangerous directions, which is probably true.

He told me about Atlantis.

Not the myth. Not the garbled account that surfaces sometimes in Greek philosophy. The actual place. An actual civilization, so advanced that our magic is to theirs as a child's drawing is to the thing it depicts. They did not merely use the fundamental forces. They rewrote them. The magical substrate that we interact with — the spells, the wand movements, the magical creatures, the entire ecosystem that we treat as natural law — is not natural. It is infrastructure. Built by Atlantean artificers so long ago that their work has been mistaken for nature itself.

We are living inside their creation and we have forgotten that it was created.

I asked the Elder what happened to them.

He said: "What always happens."

I asked him to be more specific.

He was.

The next three entries were short and shaken. The author's handwriting, previously meticulous, had become uneven. He did not reproduce what the Elder told him. He referred to it only obliquely.

I have not slept. I keep thinking about the numbers. The Elder was not specific about the population of Atlantis at its height, but from the scale of what they built — and everything around us is what they built — it must have been vast. And it is all gone. Not conquered. Not declined. Erased so completely that the only evidence it existed is the infrastructure itself, still running, still shaping reality, maintained by no one, understood by no one.

A civilization capable of rewriting the laws of physics left nothing behind except the rewrite.


The entries resumed some weeks later. The author had regained his composure and — Harry felt a chill as he recognized this too — had begun to rationalize.

I have been thinking about the Elder's warning and I believe it is overstated. The Atlanteans destroyed themselves through what appears to have been unrestricted access to the deep substrate — the layer beneath the magical interface that we interact with. But we are not Atlanteans. We are working with the interface, not the source. The risk profile is entirely different.

Furthermore, the Elder's position is essentially conservative: because something went wrong once, we should never investigate again. This is not a principle. This is fear. By the same logic, we should never have built Rome because previous civilizations fell.

I do not intend to access the deep substrate. I intend merely to understand the interface more fully. There is a distinction between studying a tool and dismantling it.

Harry was nodding. The argument was sound. The distinction between studying and dismantling was real and important. You could investigate a system without —

He turned the page.

I have made a breakthrough. The warding structures on the oldest Roman buildings are not merely protective. They are computational. They are performing continuous calculations that maintain certain properties of local magical space. If I am right, then removing or modifying them would alter the behavior of all magic within their range.

I have identified a ward that appears to be suppressing something. I do not yet know what. But its structure suggests it was placed by the founders themselves, and it is consuming an enormous amount of magical energy to maintain. Whatever it is suppressing must be correspondingly powerful.

The obvious question: what would happen if it were removed?

I am not going to remove it. I am merely going to study it. There is a difference.


I have brought my findings to the Elder. He was not pleased. He used the word "fool," which I found unnecessarily personal.

He asked me: "Why not use this knowledge to protect Rome against Carthage?" I took this as a rhetorical point about the practical applications of my research and began to outline several defensive possibilities.

He cut me off. "Been there," he said. "Done that."

I asked him to explain.

He would not.

The entry ended there. The next one was dated six days later.

I have been researching Carthage independently. The military histories are straightforward. The magical histories are not. There are gaps. References that lead nowhere. Records that appear to have been deliberately destroyed.

I found one surviving account, hidden inside a genealogical registry where no one would think to look. It describes Carthage before the wars. A thriving magical civilization. Advanced. Innovative. In some ways more sophisticated than Rome.

The account was written by a Carthaginian wizard who was visiting Rome when his home ceased to exist. His description of what he returned to is...

The Romans salted the earth. I always assumed this was metaphorical, or at most a symbolic act of dominance. It was not. Nothing grows there because the magical substrate in that region was damaged so severely that it cannot support life properly. The salt was a cover story. Something happened to Carthage that had nothing to do with legions and warships.

"Been there. Done that."

I think the Elder was not speaking rhetorically.


The tone of the entries shifted after Carthage. The author became more cautious. More reflective. He wrote about his family — a wife, two children. He wrote about his garden. There were gaps of weeks between entries, then months.

Harry thought the journal was winding toward a conclusion. A decision to stop. A graceful retreat into domestic life, wisdom earned, lesson learned.

That is not what happened.

I have been away from this journal for four months. In that time I have tried to put my research aside. I have focused on teaching, on my family, on the ordinary satisfactions of a life well-lived.

I cannot do it.

The knowledge is there. The interface is not merely an interface — it is a doorway, and I have seen through it, and I cannot unsee what is on the other side. The Elder is right that the Atlanteans destroyed themselves. He is right that Carthage was destroyed by someone misusing recovered knowledge. He may even be right that I should stop.

But I am not going to access the deep substrate. I am merely going to remove one ward. One single suppression ward that is consuming enormous energy to hide something that may be entirely benign. I am not going to use what I find. I only want to know.

I will take every precaution.

The entries after that were technical. Dense. Excited. The author had found collaborators — "careful men, scholars, not reckless" — and they were mapping the ward structure in detail. The work was methodical. The safeguards were extensive. Every entry described another layer of caution, another fallback, another reason this was different from what had come before.

Harry read faster. Then slower.

The last entry was not dramatic. It was not a cry for help or a confession or a warning. It was a plan for the following week's work. A list of measurements to take. A note to bring lunch because last time they had worked through the meal and concentration suffered. A reminder to pick up something from the market for his daughter's birthday.

Then blank pages.

Harry turned them. One after another. Blank. Blank. Blank.

He turned them all.


The author's name was not in the journal. But there were enough identifying details — the Third House, the Elder, the specific ward locations — that it took Harry less than twenty minutes in the historical records to find him.

Marcus Valerius Corvus. Wizard of the Third Augural House. Born in the 154th year of Rome's founding. Noted scholar. Family man. Described in one secondary source as "the most gifted theoretical magician of his generation."

The secondary sources were sparse after a certain date. There was a gap in the records of the Third House. A fire, attributed to accident. Several members of the House dead or missing. A brief, clinical notation in a Senate record about "disturbances in the southern district" that required intervention. The word used for the intervention was one Harry had to look up.

It meant, roughly, "cauterization."

A later genealogical record listed the surviving members of the Corvus family. His wife. His daughter. His son. They had relocated to a rural settlement far from Rome. There was a single annotation next to his wife's name that Harry read three times before he understood it. It was a legal status marker.

It meant that her husband was not dead but had been declared non-person. Stripped of name, of citizenship, of family ties. Not executed. Not exiled. Something the Romans reserved for people who had committed offenses so severe that the punishment was un-being. Removal from all records, all lineages, all memory.

The man who had written the journal with such clarity and care and cautious optimism had his name scraped from the walls of his own house.

And the southern district of magical Rome — Harry checked — had been rebuilt. But the secondary sources noted, in the careful phrasing of historians who did not want to speculate, that the character of the magic there was different afterward. Weaker in some ways. Stranger in others. As if the substrate itself had been bruised.

Harry closed the genealogical record. He sat for a while in the silent library. His Lumos had dimmed to almost nothing and he had not noticed.

He thought about Marcus Valerius Corvus, who was the most gifted theoretical magician of his generation, who took every precaution, who only wanted to know, who was not going to use what he found, who was merely going to remove one ward —

He thought about Carthage. Five hundred thousand people. A salted plain.

He thought about the etymology going the wrong way, and what that meant, and what had built the system that everyone was living inside, and where they had gone.

He thought about the Weasleys' kitchen. The self-stirring pot. The clock on the wall that tracked the family. The pile of shoes by the door. The way Mrs. Weasley's cooking expanded to accommodate however many people showed up, not through efficiency but through abundance, and how the house itself seemed to grow rooms when rooms were needed, and how none of this struck any wizard as remarkable because it wasn't remarkable, it was just life when you had magic, and how he had looked at all of this and thought they could be so much more without ever asking more what? And why?

He thought about a civilization so advanced it could rewrite the laws of physics, and how they were gone so completely that the only evidence was everything.

His Lumos went out. He sat in the dark for a long time.

Then he picked up the journal and went to see the Headmaster.


It was late. Harry had expected to have to argue his way past the gargoyle, but it stepped aside before he spoke. The staircase was already moving. The door at the top was open, and the office was lit, and there was a teapot on the desk that was still steaming.

Dumbledore was in his chair. He looked at the book in Harry's hands and his expression did something complicated that ended in a kind of tired gentleness.

"Sit down, Harry."

Harry sat. Dumbledore poured tea. The cup was warm in Harry's hands and he held it without drinking.

"You knew I'd find it," Harry said.

"I knew you would find it, or something like it. You are not the first student of your particular... temperament."

"The book. Marcus Valerius Corvus. The southern district. All of it. You just — left it there? In the library?"

"Where would you suggest I put it?" Dumbledore said, gently. "It has been in that library for a very long time. It has been found before. It will be found again. The question has never been whether bright students will find it. The question is what they do after."

Harry looked down at his tea.

"Harry, do you know how many people lived in Carthage?"

"At its height? Estimates vary. Somewhere around five hundred thousand."

Dumbledore said nothing. He let the number sit in the room.

A long silence.

"You are not the worst case I have dealt with, if that offers any comfort," Dumbledore said, in the tone of a man offering what comfort he could. "In 1971 I had to physically restrain a student who had found a reference to something called — and I wish I were not saying these words — the Torment Nexus, and was attempting to access it because, and I quote, 'it probably isn't really that bad, the name is most likely metaphorical.'"

A beat.

"It was not metaphorical."

Harry, numbly: "Was that Voldemort?"

"It was not Voldemort. There are many bright students, Harry."

Another silence. The fire crackled. Somewhere in the castle, a clock chimed a late hour.

"Grindelwald read that journal in his fifth year," Dumbledore said, quietly. "He drew ambitious conclusions. I read it the year after. I had the advantage of watching what those conclusions did to my closest friend."

He set down his teacup.

"I was as clever as you, once. Cleverer, perhaps. I looked at the wizarding world and I saw everything you see — the inefficiency, the waste, the tradition without reason, the power unused. Gellert and I were going to remake everything. For the greater good." The words came out with the particular care of a man handling something that still cut. "It was Gellert who wanted to move fast. I was the one who wanted to be systematic. I was going to be careful. I was only going to remove the unnecessary constraints. I had safeguards planned. Precautions. I was not going to be reckless."

He looked at Harry.

"Do you know what the difference is between Gellert Grindelwald and Marcus Valerius Corvus?"

Harry shook his head.

"Scale. Only scale. The reasoning is always the same. 'I am not going to use it, I only want to know. I will take every precaution. This is different from what came before.' I have heard it from every brilliant student who has sat where you are sitting. The words barely vary."

Harry stared at the journal in his lap. The leather was warm where his hands had been holding it.

"What do I do?" he asked. His voice was smaller than he wanted it to be.

"I have found," Dumbledore said, "that the question is less about what to do than about what to want. The wanting is where it all goes wrong. Not the knowing. Not the doing. The wanting."

He picked up the teapot and refilled Harry's cup, though Harry had not drunk any.

"Molly Weasley tells me she is making two pies for the Christmas holiday. Apparently your friend Ron found the first insufficient last year and has formally requested a second. I understand it will be treacle."

Harry looked up. Dumbledore's eyes were bright and kind and ancient and sad all at once.

"Go to the Weasleys for Christmas, Harry. Eat pie. Let Molly fuss over you. Watch Arthur get excited about batteries. These are not small things. In a world that has already been optimized, they are the only things that matter."


Harry walked back to the Gryffindor common room slowly. The castle was quiet. His footsteps echoed in the empty corridors and he listened to them the way you listen to something when your mind is too full for thought.

He thought about Marcus Valerius Corvus, who took every precaution and only wanted to know.

He thought about Grindelwald, who drew ambitious conclusions.

He thought about a brilliant student in 1971 who tried to open something called the Torment Nexus because the name was probably metaphorical.

He thought about the Weasley kitchen, and the self-repairing house, and the clock that tracked the family, and the way Ron talked about his mum's cooking with the unselfconscious happiness of a person who had never once doubted that there would be enough.

He thought about a civilization that could rewrite physics. Gone. Infrastructure still running. No one left to read the manual.

He thought about Dumbledore, seventeen years old, clever as anyone who ever lived, choosing between more and enough, and choosing wrong, and spending the rest of his life gently steering other clever children away from the same door.

He thought about the journal, which he was still carrying, and which he was going to return to the Restricted Section in the morning. Not because it should be hidden. Because it should be findable. Because someday another student with his particular temperament would need to read it at exactly the right moment, and the library needed to be ready.

He climbed through the portrait hole. The common room was empty except for a low fire. Ron had fallen asleep on the couch with a Chudley Cannons scarf over his face. He was snoring.

Harry stood there for a while.

The optimization engine in his head — the one that never stopped, the one that saw every system as a problem and every problem as solvable and every solution as a step toward the next solution — was still running. It would probably always be running. He didn't think you could turn it off. But for the first time since he'd come to Hogwarts, it was reaching a conclusion he hadn't expected.

The system was already optimized. Not by him. Not for him. By someone so far beyond him that the comparison wasn't even meaningful, and then by centuries of people who'd learned, through suffering, which parts to leave alone. The Weasleys' kitchen was the output. The pies were the output. Ron, asleep on the couch, content in a way Harry had never been — Ron was the output.

The fire crackled. Ron shifted in his sleep and murmured something about Quidditch.

Harry put the journal on the table. He sat down in the chair across from his friend. He didn't pick up a book. He didn't start planning. He didn't optimize anything.

He just sat there, in the warmth, and let it be enough.


