Published on December 3, 2025 2:59 PM GMT
Eliezer Yudkowsky has, on several occasions, claimed that AI’s success at protein folding was essentially predictable. His reasoning (e.g., here) is straightforward and convincing: proteins in our universe fold reliably; evolution has repeatedly found foldable and functional sequences; therefore the underlying energy landscapes must possess a kind of benign structure. If evolution can navigate these landscapes, then, with enough data and compute, machine learning should recover the mapping from amino-acid sequence to three-dimensional structure.
This account has rhetorical and inductive appeal. It treats evolution as evidence about the computational nature of protein folding and interprets AlphaFold as a natural consequence of biological priors. But the argument, as usually presented, fails to acknowledge what would be required for it to count as a formal heuristic. It presumes that evolutionary success necessitates the existence of a simple, learnable mapping. It presumes that the folding landscape is sufficiently smooth that the space of biologically relevant proteins is intrinsically easy. And it presumes that the success of a massive deep-learning model confirms that the problem was always secretly tractable.
These presumptions rely on an unspoken quantifier: for all proteins relevant to life, folding is easy. But this “for all” is not legitimate in a complexity-theoretic context unless the instance class has a precise and bounded description. Yudkowsky instead appeals to whatever evolution happened to discover. Evolution’s search process, however, is not a polytime algorithm but an unbounded historical trajectory with astronomical parallelism, billions of years of runtime, and immense filtering by selection pressures. Without explicit bounds, “evolution succeeded” does not imply anything about the inherent computational character of the underlying mapping. It merely establishes that a particular contingent subset of sequences—those the biosphere retained—happened to be foldable by natural dynamics.
If we shift to formal models, the general protein-folding problem remains NP-hard under standard abstraction schemes (for example, structure prediction in the hydrophobic-polar lattice model is NP-complete). That does not mean biological folding is NP-hard, only that no claim about the general tractability of the sequence-to-structure mapping can be inferred from evolutionary success. What matters for tractability is not the full space of possible proteins but the restricted subset that life explores. Yudkowsky’s argument, by treating evolutionary selection as direct evidence of easy energy landscapes, smuggles in the structural properties of that restricted set without acknowledging that these properties are exactly what must be demonstrated, not simply asserted.
The right computational picture looks very different. The biosphere does not sample uniformly from all possible sequences. Instead, it occupies a tiny, closed, highly structured subset of sequence space—an extraordinarily low-Kolmogorov-complexity region characterized by a few thousand fold families, extensive modularity, and strong physical constraints on designability. This subset bears almost no resemblance to the adversarial or worst-case instances that drive NP-hardness proofs.
Once attention is confined to that biologically realized region, the folding map ceases to be the general NP-hard problem and becomes a promise problem over a heavily compressed instance class. The problem is not hard; it's just big.
Seen this way, the achievement of AlphaFold is not evidence that “protein folding was always solvable,” but evidence that modern machine learning can use enormous non-uniform information—training data, evolutionary alignments, known structures—to approximate a mapping defined over a small and highly regularized domain. The training process acts as a vast preprocessing step, analogous to non-uniform advice in complexity theory. The final trained model is essentially a polytime function augmented by a massive advice string (its learned parameters), specialized to a single distribution. Evolution itself plays a similar role: after billions of years of search, it has curated only those sequences that belong to a region of the landscape where folding is stable, kinetically accessible, and robust to perturbation. The process is not evidence that folding is uniformly simple; it is evidence that evolution found a tractable island inside an intractable ocean.
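To make the non-uniform-advice analogy concrete, here is a toy Python sketch. Everything in it is invented for illustration (the function names, the "curated" instance family, the stand-in expensive solver): the online solver is a fast lookup specialized to instances preprocessed offline, and it simply has nothing to say off the curated distribution.

```python
# Toy illustration of "non-uniform advice": the online "solver" is nothing
# but a lookup table compiled offline for a curated instance family (the
# promise class). All names and instances here are hypothetical.

def compile_advice(curated_instances, expensive_solver):
    """Offline preprocessing: solve the curated family once, however slowly.
    This plays the role of evolution's search plus training compute."""
    return {inst: expensive_solver(inst) for inst in curated_instances}

def fast_solver(instance, advice):
    """Online phase: effectively constant time (one dict lookup), but only
    valid on the promise class the advice was compiled for."""
    try:
        return advice[instance]
    except KeyError:
        raise ValueError("outside the promise class: no guarantee holds")

# Stand-in for an expensive exact computation (e.g. exhaustive search).
slow = lambda s: min(range(len(s) + 1), key=lambda k: abs(hash(s[:k])) % 97)

advice = compile_advice(["MKT", "MVLS", "MAAG"], slow)  # curated "biosphere"
print(fast_solver("MVLS", advice))   # in-promise: instant answer
# fast_solver("XQZW", advice)        # off-promise: fails, as the text argues
```

The point of the sketch is only the asymmetry: all the difficulty lives in the offline phase, and the online artifact carries no guarantee outside the distribution it was compiled for.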
This distinction matters because it reframes the “predictability” of AlphaFold’s success. Yudkowsky presents folding as solvable because biological energy landscapes are smooth enough to make evolution effective. But smoothness on the evolutionary subset does not entail smoothness on the entire space; nor does it imply that this subset is algorithmically accessible without vast volumes of preprocessing. A complexity-theoretic interpretation views the success of deep learning not as the discovery of a simple universal rule but as the extraction of structure from a domain that has already been heavily pruned, compressed, and optimized by natural history. The learning system inherits its tractability from the fact that the problem has been pre-worked into a low-entropy form.
There is therefore a straightforward computational reason why the “evolution shows folding is easy” argument does not succeed as an explanation: it conflates a historical process without resource bounds with an algorithmic claim that requires them. It interprets the existence of foldable proteins as proof of benign complexity rather than as the output of a long, unbounded filtration that carved out a tractable subclass. The right explanatory frame is not “energy landscapes are nice,” but “biology inhabits a closed problem class with small descriptive complexity, and our algorithms exploit vast non-uniform information about that class.”
Yudkowsky’s argument gestures toward the right conclusion—that solvability was unsurprising—but gives the wrong reason. The crucial structure lies not in universal physics but in the finite, closed domain evolution left to us, and in the immense preprocessing our models perform before they ever encounter a new sequence. The success of AlphaFold was predictable only conditional on the recognition that the real problem is not worst-case folding but folding within a compact distribution carved out by evolutionary history and indexed in publicly available data. What makes the achievement unsurprising is not that physical energy landscapes are globally smooth, but that the domain evolution generated is sufficiently structured, and sufficiently well-sampled, to permit high-fidelity interpolation.
In essence, no one has solved the actual problem, because any solver specialized to the evolutionary subset is guaranteed to fail once the promise is removed; outside that tightly curated domain, the mapping reverts to an unbounded and intractable instance class.
(Context: much contemporary discourse treats machine learning as if it were solving an analytic problem over a vast, effectively unbounded function space: an optimization in a continuous domain, governed by gradient dynamics and statistical generalization. But what learning systems actually deliver, when examined through a computational or logical lens, is a highly non-uniform procedure specialized to a sharply delimited region of instance space. Mixing the analytic metaphor ("learning a function over a vast continuous space") with conclusions that require logical quantification ("the problem is easy for all relevant inputs") invites people to mistake interpolation on a compact manifold for global tractability. But computation is always local: effective behavior emerges once the space is sufficiently structured, compressed, or bounded by a promise. The solution is constructive logic: statements should be made relative to well-defined objects, not idealized totalities.)
Published on December 3, 2025 2:49 PM GMT
It’s early. 0100. I’ve just come home from walking a guest at my latest party to her bus stop. Cleaned up a bit. Now I’m sitting here. A few interesting topics came up.
For a while I talked to the smartest man in St Vincent, the philosophy PhD, the hedge fund dude, and the MATS guy about dualism vs physicalism. Hedge fund guy said he used to be a dualist but changed his mind when he heard a good counterargument. The argument was roughly that physical things seem distinct, but actually the way we draw boundaries around them is arbitrary. He perceived himself as distinct from the sofa he was sitting on, and the sofa as distinct from the small pillows on it. An alien visiting earth might see them all as one object. It’s unclear why any one set of boundaries would be more correct. Even at the level of a boundary, all objects bleed into each other at an atomic level anyway. So, in short, there is no objective, crisp distinction between physical objects. He believes that minds are distinct. It seems obvious to him that his mind is separate from mine. But if minds are based on physical things then they must also flow into each other, which can’t be the case. Hence minds cannot be physical.
There were a few objections to this. First, maybe minds do flow into each other in a similar way. The thoughts and ideas of others shape my thoughts and ideas. Even non-verbal things, like a friend anxiously pacing next to me, change my mood and state. Just like an alien may draw different boundaries around objects, so they could draw different boundaries around where a “mind” begins and ends. This is reminiscent of the spiderweb argument from years ago. The story goes that some spiders have tiny brains but display much smarter behaviour than they should. It turns out that they offload memory and some computation to their web. Is their web part of their mind/brain? The second objection was similar but slightly different. Maybe boundaries and the talk of what objects are real or not is just misleading. There is base reality. The lines we draw around things are just useful abstractions. The most basic physical laws and particles (or the stream of qualia you experience, if you want to be precise) are Real with a capital R. A jumbo jet is real in the sense that the label I apply to certain configurations of atoms may or may not accurately describe a given place in space and time. But it’s not Real in an objective sense, and an alien or a different person could well categorize the same atoms differently (e.g., seeing a metal shell and an interior hollow space as two different components).
At some point the conversation moved on. The smartest man was a Christian and described himself as a dualist, but he actually held a computational view of consciousness. I asked him his intuitions about replacing his brain atom by atom with computationally equivalent silicon processors. He thought he’d be the same person. We wasted a bit of time discussing the definition of dualism vs physicalism and whether computationalism was really a form of dualism or not. He argued it was, because you believe the mind is a separate kind of thing: a pattern rather than a specific physical arrangement of atoms. Andrea and I argued that computationalism still grounds out in concrete reality. The mind is a pattern, but that pattern is instantiated in and entirely dependent on the physical world. Yes, it could exist on different kinds of substrate or physical material, but it is still a physical phenomenon just like other kinds of pattern, e.g. rain. Dualism holds that minds are a separate, non-physical thing entirely: that the destruction of physical objects could sever the connection to the mind, but that the mind would still exist independently. So by our definition he wasn’t really a dualist.
After this I left, let people in, and generally hovered around. When I came back the conversation had turned to identity and the teleporter experiment. Andrea had left, and a bit of the way in the Polish power couple had joined. The conversation boiled down to personspace vs continuity-of-consciousness views of identity. There wasn’t a good resolution. They played around with different examples for a bit, but the core difference in intuitions always remained. I think over the years I’ve come to the view that philosophy of identity is similar to ethics. It is possible to make meaningful progress, to have thought experiments which make you reify your beliefs, etc. But still, it ultimately rests not on a shared external reality but on a set of intuitions that people hold. Just as in ethics, even if you do it well, sometimes you’ll reach moral bedrock and just discover that two people hold different axiomatic beliefs or tradeoff ratios between moral goods; so in identity, some people just have very different views of what it means to be a person. Hmmmm. Maybe. Or maybe with enough prodding with various “simulated minds but running out of order/in parallel/etc.” arguments and the goldfish gun, most continuity-of-consciousness people would on reflection change their view. I’m unsure.
Other things happened. People talked. At some point, while walking outside to get ramen, I spoke to the scientist whose parents were atheistic Christians. He’d been to China recently, where he met one of my close friends. He observed how safe China was and how safe it felt. He recounted how at a hike starting spot in a large city there was an open container with water bottles and a box for money to pay if you take one. I’ve heard similar stories from my friend about charging banks and various other things. I heard that public safety was far lower in the past and that China used to be a lower-trust society. I wonder what’s driven the change? A few theories:
- There’s a large technological overhang for crime/bad-behaviour reduction: CCTV, facial recognition, and effective policing. The west ignores the tech. China embraces it, with CCTV everywhere and swift punishment. This creates an environment which is naturally high trust. Doing anything bad is very likely to be caught and punished. Also, because no one generally does bad stuff, it becomes even more culturally frowned upon, and even easier to catch, because the police:criminal ratio goes up as the number of criminals goes down while police funding stays roughly constant.
- Econ growth = more prosperity/wealth = higher trust, a more commercial culture, the slow escape from low-trust communist/authoritarian culture.
- They’re East Asian. Most developed East Asian countries have high social trust and safety (Japan, Taiwan, Korea, etc.). Could be genetic. Could be culture.

I came home and eventually sat down and talked to a Ukrainian girl. She had an experience with being cancelled a while ago for writing a fairly banal post comparing moral outrage at sexual abuse of children to broad moral acceptance of, or at least lack of real concern with, all the other ways in which parents can and do severely limit their children’s autonomy or disrespect their consent. We talked a bit. She was essentially concerned about the same problems I was when I was a teenager: Moloch, the inevitable grind of evolution, and what back then I had termed the impossibility of “breaking the chains of causality”. Her point was roughly that we are human beings, but many of our desires, thoughts and actions are shaped by our genes. These are things we do not choose, but rather things that we are enslaved to. We should strive to rise above that, to consciously fight those desires and try to cut them out of ourselves. I broadly agree with her, but my counterargument was the standard one. Every desire, impulse, etc. which makes up your utility function is a product of processes outside of your control. Nature. Nurture. Random chance, if the universe is not deterministic. Why assume that all desires stemming from genes/nature are bad and those stemming from nurture etc. are fine? Aren’t they all equally arbitrary? Isn’t a better criterion to determine, on reflection, which of your present desires you do and do not want, and then to work on excising or resisting those you do not want to control you? E.g.: I like/respect attractive people more. On reflection I would rather not do this, and so I fight against this tendency in myself. I don’t want people around me to be tortured and to horribly suffer for no reason. I know this desire also comes from evolution (having your tribe/family suffer is bad if you’re related; maybe reciprocity norms are adaptive at the group and individual level, etc.), but it’s a desire I want to have.
Ultimately my conclusion here is similar to the one I reached as a child. If you cut away every part of you that was instilled in you by Moloch or by processes outside of your control, you are left with nothing. Every part of you is bound by the chains of causality. Every decision comes down to the remorseless logic of the physical universe. To somehow break that causality is to also break the machine and set of rules which enable the system you draw a boundary around and call “me” to exist and function. We live in a prison, but we are also the chains.
Still, maybe that’s too dramatic. Breaking free of causality is impossible. But reflecting more, consciously rejecting your base desires, cultivating virtue, and trying to lift your head out of the mud and towards the stars is something worth doing.
On a meta-level, there’s something terrifying and self-referential about this conversation. Rationally, I know my thoughts arise from a deterministic or at best probabilistic process. Intuitively, I always thought my thoughts were unique and somehow pure and above the physical world. The fact that my ideas and aims were so different from those around me growing up reinforced that. I felt like I stood above and apart from the masses. Now I meet another smart, autistic Eastern European, and they happen to have pretty much the exact same obsession with rising above the animal inside and breaking the stranglehold evolution has on mind and body. I don’t think that’s a coincidence. Strange how a conversation about breaking free of evolution/environment makes me realize how beholden to both I am.
Published on December 3, 2025 2:35 PM GMT
In this post, we introduce contributions and supracontributions[1], which are basic objects from infra-Bayesianism that go beyond the crisp case (the case of credal sets). We then define supra-POMDPs, a generalization of partially observable Markov decision processes (POMDPs). This generalization has state transition dynamics that are described by supracontributions.
We use supra-POMDPs to formalize various Newcombian problems in the context of learning theory, where an agent repeatedly encounters the problem. The one-shot versions of these problems are well known to highlight flaws in classical decision theories.[2] In particular, we discuss the opaque, transparent, and epsilon-noisy versions of Newcomb's problem, XOR blackmail, and counterfactual mugging.
We conclude by stating a theorem that describes when optimality for the supra-POMDP relates to optimality for the Newcombian problem. This theorem is significant because it gives a sufficient condition under which infra-Bayesian decision theory (IBDT) can approximate the optimal decision. Furthermore, we demonstrate through the examples that IBDT is optimal for problems for which evidential and causal decision theory fail.
Contributions, a generalization of probability distributions, are defined as follows.
Definition: Contribution
A contribution on a finite set $X$ is a function[3] $\mu : X \to [0, 1]$ such that $\sum_{x \in X} \mu(x) \leq 1$. The set of contributions on $X$ is denoted by $\Delta^c X$.
Given $\mu \in \Delta^c X$ and $A \subseteq X$, we write $\mu(A)$ to denote $\sum_{x \in A} \mu(x)$. A partial order on $\Delta^c X$ is given by $\mu \leq \nu$ if $\mu(A) \leq \nu(A)$ for all subsets $A \subseteq X$. For example, the constant-zero function $0$ is a contribution that lies below every element in $\Delta^c X$ in the partial order. A set of contributions $S \subseteq \Delta^c X$ is downward closed if $\mu \in S$ and $\nu \leq \mu$ implies $\nu \in S$. Given $S \subseteq \Delta^c X$, the downward closure of $S$ in $\Delta^c X$ is defined by $S^{\downarrow} = \{\nu \in \Delta^c X : \nu \leq \mu \text{ for some } \mu \in S\}$.
Figure 1 illustrates a set of contributions together with its downward closure.
The set of contributions shown in Figure 1 together with its downward closure is an example of a supracontribution, defined as follows.
Definition: Supracontribution
A supracontribution on $X$ is a set of contributions $\Psi \subseteq \Delta^c X$ such that
- $\Psi$ is non-empty,
- $\Psi$ is closed,
- $\Psi$ is convex, and
- $\Psi$ is downward closed.
The set of supracontributions on $X$ is denoted by $\Box X$.
Figure 2 shows another example of a supracontribution.
Supracontributions can be regarded as fuzzy sets of distributions, namely sets of distributions in which membership is described by a value in $[0, 1]$ rather than $\{0, 1\}$. In particular, the membership of a distribution $\mu \in \Delta X$ in $\Psi \in \Box X$ is given by
$$\max\{\alpha \in [0, 1] : \alpha\mu \in \Psi\},$$
where $\alpha\mu$ denotes the scaling of $\mu$ by $\alpha$. See Figure 3 for two examples. Note that this maximum is well-defined since all supracontributions contain $0$. By this viewpoint, supracontributions can be seen as a natural generalization of credal sets, which are "crisp" or ordinary sets of distributions.
There is a natural embedding of credal sets on $X$ into the space of supracontributions on $X$. Let $C \subseteq \Delta X$ be a credal set. Define $\iota(C) = (C \cup \{0\})^{\downarrow}$. Note that under this definition, $\iota(\emptyset) = \{0\}$ (and otherwise the union with $\{0\}$ in the definition of $\iota$ is redundant).
Generalizing the notion of expected value, we write $\mathbb{E}_\mu[f] = \sum_{x \in X} \mu(x) f(x)$ for $\mu \in \Delta^c X$ and $f : X \to [0, 1]$. We define expectation with respect to a supracontribution (similarly as in the crisp case) as the max over all expectations for elements of the supracontribution. By definition, a supracontribution is closed (and bounded), and thus this notion of expectation is well-defined.
Definition: Expectation with respect to a supracontribution
Given a continuous function $f : X \to [0, 1]$ and $\Psi \in \Box X$, define the expectation of $f$ with respect to $\Psi$ by
$$\mathbb{E}_\Psi[f] = \max_{\mu \in \Psi} \mathbb{E}_\mu[f].$$
Let $S$ denote a non-empty set of contributions. Then $\sup_{\mu \in S} \mathbb{E}_\mu[f] = \mathbb{E}_{\overline{\mathrm{co}}(S)^{\downarrow}}[f]$, where $\mathrm{co}$ denotes convex hull and the bar denotes closure. Therefore, in the context of optimization we may always replace a non-empty set of contributions by the supracontribution obtained by taking the convex, downward, and topological closure.
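As a concrete illustration of these definitions, here is a minimal Python sketch. The tuple encoding and helper names are ours, not the post's notation; it computes the expectation over a finitely generated supracontribution and the fuzzy membership value of a distribution in the downward closure of a single contribution.

```python
# Minimal numeric sketch of contributions, expectation, and fuzzy membership
# over a finite set X, following the definitions reconstructed above.

def is_contribution(mu):
    # mu : X -> [0,1] with total mass at most 1 (a sub-probability vector)
    return all(v >= 0 for v in mu) and sum(mu) <= 1 + 1e-12

def expect(mu, f):
    # E_mu[f] = sum_x mu(x) f(x)
    return sum(m * fx for m, fx in zip(mu, f))

def expect_supra(generators, f):
    # E_Psi[f] where Psi is the closed convex downward closure of
    # `generators`. For f >= 0, convex mixing and scaling mass down can
    # only lower the expectation, so the max over generators suffices.
    return max(expect(mu, f) for mu in generators)

def membership(p, mu0):
    # Fuzzy membership of a distribution p in the downward closure of
    # {mu0}: max { a in [0,1] : a * p <= mu0 pointwise }.
    return min(1.0, min(m / q for q, m in zip(p, mu0) if q > 0))

gens = [(0.6, 0.3), (0.2, 0.7)]              # two contributions on |X| = 2
assert all(is_contribution(mu) for mu in gens)
print(expect_supra(gens, f=(1.0, 0.2)))      # 0.66, attained at (0.6, 0.3)
print(membership((0.5, 0.5), (0.6, 0.3)))    # 0.6: (0.3, 0.3) <= (0.6, 0.3)
```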
Recall that environments in the classical theory and in crisp infra-Bayesianism have type $(A \times O)^* \times A \to \Delta O$. We generalize this notion to the fuzzy setting using semi-environments.[4]
Definition: Semi-environment
A semi-environment is a map of the form $(A \times O)^* \times A \to \Delta^c O$.
The interaction of a semi-environment $e$ and a policy $\pi$ determines a contribution $\mu^{e\pi}$ on destinies $(A \times O)^\omega$.[5]
Definition: (Fuzzy) law
A (fuzzy) law generated by a set $\mathcal{E}$ of semi-environments is a map $\Lambda : \Pi \to \Box((A \times O)^\omega)$ such that $\Lambda(\pi) = \overline{\mathrm{co}}\{\mu^{e\pi} : e \in \mathcal{E}\}^{\downarrow}$ for all $\pi \in \Pi$, where $\mathrm{co}$ denotes convex hull and the bar denotes closure.
Our tool for formalizing Newcombian problems using the mathematical objects described in the last section is the fuzzy supra-POMDP, a generalization of partially observable Markov decision processes (POMDPs). Given a set of states $S$ and a contribution $\mu \in \Delta^c S$, the missing probability mass $1 - \mu(S)$ can be interpreted as the probability of a logical contradiction.
Under a fuzzy supra-POMDP model, uncertainty of the initial state is described by an initial supracontribution over states. Similarly to the state transition dynamics of a crisp supra-POMDP as defined in the preceding post, the state transition dynamics of a fuzzy supra-POMDP are multivalued. Given a state and action, the transition suprakernel returns a supracontribution, and the true dynamics are described by any element of the supracontribution.
A significant feature of fuzzy supra-POMDPs is that for some state $s$ and action $a$ we may have $T(s, a) = \{0\}$, which corresponds to a logical contradiction in which the state transition dynamics come to a halt. We use this feature to model Newcombian problems where it is assumed there is a perfect predictor Omega predicting an agent's policy. When an action deviates from the predicted policy (which is encoded in the state), the transition kernel returns $\{0\}$.
We formally define fuzzy supra-POMDPs as follows.
Definition: Fuzzy supra-POMDP
A fuzzy supra-partially observable Markov decision process (supra-POMDP) is a tuple $M = (S, \Psi_0, A, O, T, \mathcal{O})$ where
- $S$ is a set of states,
- $\Psi_0 \in \Box S$ is an initial supracontribution over states,
- $A$ is a set of actions,
- $O$ is a set of observations,
- $T : S \times A \to \Box S$ is a transition suprakernel,[6]
- $\mathcal{O} : S \to O$ is an observation mapping.
Every fuzzy supra-POMDP defines a (fuzzy) law. The construction is similar to the construction of a crisp law given a crisp supra-POMDP, which is discussed in the preceding post. A copolicy to a fuzzy supra-POMDP is a map that is consistent with the transition suprakernel. More specifically, given a history of states and actions, the transition kernel determines a supracontribution from the most recent state and action. The copolicy can be thought of as a map that selects a contribution from that supracontribution.
Definition: Copolicy to a fuzzy supra-POMDP
Let $M = (S, \Psi_0, A, O, T, \mathcal{O})$ be a fuzzy supra-POMDP. A map $\sigma : (S \times A)^* \to \Delta^c S$ is an $M$-copolicy if
- $\sigma(\lambda) \in \Psi_0$ for the empty string $\lambda$, and
- for all non-empty strings $h = h'(s, a)$, $\sigma(h) \in T(s, a)$.
An $M$-copolicy $\sigma$ and the observation map $\mathcal{O}$ of $M$ together determine a semi-environment $e_\sigma$. Let $\mathcal{E}_M = \{e_\sigma : \sigma \text{ is an } M\text{-copolicy}\}$.
Then the law generated by $\mathcal{E}_M$ defines the law generated by $M$.
In this section, we give a mathematical definition for Newcombian problems and describe how to model Newcombian problems using fuzzy supra-POMDPs.
Let $O$ denote a set of observations. Given a time-horizon $H$, let $O^{<H}$ denote the set of strings in $O$ of length less than $H$. Let $\Pi_H$ denote the set of horizon-$H$ policies, i.e. maps of the form $\pi : O^{<H} \to A$.
Definition: Newcombian problem with horizon
A Newcombian problem with horizon $H$ is a map $N : \Pi_H \times O^{<H} \to \Delta O$ together with a loss function $L : O^H \to [0, 1]$.
Intuitively speaking, given a policy and a sequence of observations, a Newcombian problem specifies some distribution that describes uncertainty about the next observation. This framework allows for a mathematical description of an environment in which there is a perfect predictor Omega and a distribution over observations that depends on the policy that Omega predicts.
Similar to how the interaction of a policy and an environment produces a distribution over destinies, a Newcombian problem $N$ and a policy $\pi$ together determine a distribution on outcomes, $N^\pi \in \Delta(O^H)$. The policy that minimizes expected loss with respect to this distribution is said to be an $N$-optimal policy.
Definition: Optimal policy for a Newcombian problem
A policy $\pi^*$ is optimal for a Newcombian problem $N$ if $\pi^* \in \operatorname{argmin}_{\pi \in \Pi_H} \mathbb{E}_{N^\pi}[L]$.
If $\pi^*$ is optimal for $N$, then $\mathbb{E}_{N^{\pi^*}}[L]$ is said to be the optimal loss for $N$.
In order to discuss learning, we consider the case of multiple episodes where an agent repeatedly encounters the Newcombian problem.
Given some number of episodes $n$, let $\Pi_H^n$ denote the set of multi-episode policies. A multi-episode policy gives rise to a sequence of single-episode policies $\pi_1, \pi_2, \ldots, \pi_n$.
By means of this sequence of single-episode policies, a Newcombian problem and a multi-episode policy together determine a distribution on outcomes over multiple episodes, which we also denote by $N^\pi$.
The loss function can be naturally extended to multiple episodes by considering the mean loss per episode. In particular, if $o = (o_1, \ldots, o_n)$ with each $o_k \in O^H$ and the per-episode loss is given by $L$, then the mean loss over $n$ episodes is defined as $\frac{1}{n} \sum_{k=1}^{n} L(o_k)$.
It is also possible to extend the loss to multiple episodes by considering the sum of the per-episode losses with a geometric time discount $\gamma \in (0, 1)$. In this case, the total loss is defined by $(1 - \gamma) \sum_{k \geq 1} \gamma^{k-1} L(o_k)$.
In this section, we describe how to model iterated Newcombian problems (i.e. repeated episodes of the problem) by a fuzzy supra-POMDP. We work in the iterated setting since this allows us to talk about learning. Examples are given in the following section.
The state space, initialization, and observation mapping
Let $N$ (together with $L$) be a Newcombian problem with horizon $H$. Let $S = \Pi_H \times O^{\leq H}$. Informally speaking, the state always encodes both a policy and a sequence of observations.
Let the initial supracontribution over states be $\Psi_0 = \overline{\mathrm{co}}\{\delta_{(\pi, \lambda)} : \pi \in \Pi_H\}^{\downarrow}$, where $\lambda$ denotes the empty observation string. This supracontribution represents complete ambiguity over the policy (and certainty over the empty observation).
The observation mapping simply returns the most recent observation datum from a state, i.e. $\mathcal{O}(\pi, ho) = o$ for non-empty observation strings $ho$. If the observation string of the state is the empty string, then the observation may be chosen arbitrarily.
The transition suprakernel
We start with an informal description of the transition suprakernel $T$, which is defined in three cases. In short: consistent actions advance the episode according to $N$, inconsistent actions lead to a contradiction, and finished episodes are re-initialized. To elaborate, we have the following three cases.
- If the state is $(\pi, h)$ with $|h| < H$ and the action taken is $\pi(h)$, then the next state appends to $h$ an observation drawn according to $N(\pi, h)$.
- If the state is $(\pi, h)$ with $|h| < H$ and the action taken differs from $\pi(h)$, then the transition returns $\{0\}$, representing a logical contradiction.
- If $|h| = H$, the episode is over: the observation string is reset to the empty string and there is once again complete ambiguity over the policy.
We now formally define $T$. Given a state $(\pi, h)$ with $|h| < H$, define $\mathrm{app}_{\pi,h} : O \to S$ by $\mathrm{app}_{\pi,h}(o) = (\pi, ho)$. Namely, $\mathrm{app}_{\pi,h}$ appends a given observation to the prefix of observations and returns the corresponding state. Let $(\mathrm{app}_{\pi,h})_*$ denote the pushforward of $\mathrm{app}_{\pi,h}$.
Define
$$T((\pi, h), a) = \begin{cases} \{(\mathrm{app}_{\pi,h})_* \, N(\pi, h)\}^{\downarrow} & \text{if } |h| < H \text{ and } a = \pi(h), \\ \{0\} & \text{if } |h| < H \text{ and } a \neq \pi(h), \\ \Psi_0 & \text{if } |h| = H. \end{cases}$$
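A sketch of this three-case transition rule in Python may help. The encodings (policies as dicts keyed by observation strings, supracontributions as lists of generating contributions, a horizon-1 Newcomb instance) are our own illustrative choices, not part of the formal definition.

```python
# Sketch of the reconstructed three-case transition suprakernel for the
# supra-POMDP built from a Newcombian problem N. Supracontributions are
# represented by finite generating sets of contributions (dicts mapping
# states to probability mass); the empty dict {} is the zero contribution.

POLICIES = {"pi1": {(): "one"}, "pi2": {(): "two"}}  # name -> obs-string -> action
H = 1  # horizon; states are (policy-name, observation-string) pairs

def N(pname, h):
    # Perfect predictor: one-boxers see $1M, two-boxers see $1K.
    return {"$1M": 1.0} if POLICIES[pname][h] == "one" else {"$1K": 1.0}

def transition(state, action):
    pname, h = state
    if len(h) == H:
        # Case 3: episode over -> reset with full ambiguity over policies.
        return [{(p, ()): 1.0} for p in POLICIES]
    if action != POLICIES[pname][h]:
        # Case 2: action contradicts the encoded policy -> {0}.
        return [{}]
    # Case 1: consistent action -> push N(pname, h) forward by appending
    # the sampled observation to the state's observation string.
    return [{(pname, h + (o,)): prob for o, prob in N(pname, h).items()}]

print(transition(("pi1", ()), "one"))  # [{('pi1', ('$1M',)): 1.0}]
print(transition(("pi1", ()), "two"))  # [{}] : the zero contribution
```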
In this section, we explain how to formalize various Newcombian problems using the fuzzy supra-POMDP framework. We provide the most detail for the first two examples.
We first consider Newcomb's Problem. In this problem, there are two boxes: Box A, a transparent box that always contains $1K, and Box B, an opaque box that either contains $0 or $1M. An agent can choose to "one-box", meaning that they only take Box B, or "two-box", meaning they take both boxes. A perfect predictor Omega fills Box B with $1M if and only if Omega predicts that the agent will one-box.
Evidential decision theory (EDT) prescribes that an agent should choose the action that maximizes the expected utility conditioned on choosing that action. Thus, EDT recommends one-boxing because choosing to one-box can be seen as evidence that Box B contains $1M. This is the case even though the correlation is spurious, i.e. choosing to one-box does not cause there to be $1M in Box B. We will see that IBDT also recommends one-boxing. In comparison to EDT, causal decision theory (CDT)[7] prescribes that an agent should only take into account what an action causes to happen and therefore recommends two-boxing.
Let $O$ consist of the empty observation $\lambda$ together with observations representing the total amount of money received.
Let $A = \{a_1, a_2\}$, where $a_1$ corresponds to one-boxing and $a_2$ corresponds to two-boxing. Without loss of generality, $\Pi = \{\pi_1, \pi_2\}$, where $\pi_1(\lambda) = a_1$ and $\pi_2(\lambda) = a_2$.
Then $N$ is defined by $N(\pi_1, \lambda) = \delta_{o_{\$1M}}$ and $N(\pi_2, \lambda) = \delta_{o_{\$1K}}$.
The loss of an episode is defined by normalizing the money received, so that more money corresponds to strictly smaller loss.
(We don't define $L$ for the observation corresponding to two-boxing against a full Box B, because under this model, this observation never occurs.)
Note that $\mathbb{E}_{N^{\pi_1}}[L] < \mathbb{E}_{N^{\pi_2}}[L]$. Therefore, $\pi_1$ is optimal.
We now consider the corresponding supra-POMDP $M_N$. The transition suprakernel is given by the general construction above; in particular, $T((\pi_1, \lambda), a_1) = \{\delta_{(\pi_1, o_{\$1M})}\}^{\downarrow}$, $T((\pi_2, \lambda), a_2) = \{\delta_{(\pi_2, o_{\$1K})}\}^{\downarrow}$, and $T((\pi_1, \lambda), a_2) = T((\pi_2, \lambda), a_1) = \{0\}$.
Figure 5 shows the state transition graph of $M_N$. Notably, it is not possible under $M_N$ for an agent to two-box when Box B is full (the left branch in Figure 5). This is assured by the fact that $T((\pi_1, \lambda), a_2) = \{0\}$.
The interaction of $M_N$ and $\pi_1$ produces a supracontribution over outcomes given by $\{\delta_{o_{\$1M}}\}^{\downarrow}$. Similarly, the interaction of $M_N$ and $\pi_2$ produces the supracontribution $\{\delta_{o_{\$1K}}\}^{\downarrow}$.
Then the expected loss for $\pi_1$ in one round is $L(o_{\$1M})$.
Similarly, the expected loss for $\pi_2$ in one round is $L(o_{\$1K})$.
From another viewpoint, the optimal (worst-case from the agent's perspective) copolicy to $\pi_i$ initializes the state to $(\pi_i, \lambda)$ for $i \in \{1, 2\}$, i.e. $\sigma(\lambda) = \delta_{(\pi_i, \lambda)}$. In other words, the policy encoded in the state chosen by the copolicy matches the policy of the agent. The law defined by this supra-POMDP is equivalent to an ordinary environment in which one-boxing results in observing a full box and two-boxing results in observing an empty box.
We see from the above calculations that the optimal policy for $M_N$ is $\pi_1$, and moreover $\pi_1$ achieves the $N$-optimal loss. This analysis holds for any number of episodes. This is significant because if a learning agent has $M_N$ in their hypothesis space, then they must converge to one-boxing if they are to achieve low regret for the iterated Newcomb's problem.
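The following toy computation mirrors this argument numerically. The loss values are an assumed normalization (not taken from the post); all that matters is that more money means lower loss, and that inconsistent branches lose all their probability mass.

```python
# Numeric check of the Newcomb analysis: the worst-case (loss-maximizing)
# copolicy picks the initial encoded policy; inconsistent branches lose
# all probability mass and so contribute zero expected loss.

LOSS = {"$1M": 0.0, "$1K": 0.999}          # assumed: more money, less loss
OUTCOME = {"pi1": "$1M", "pi2": "$1K"}     # perfect-predictor branch payoffs

def expected_loss(agent, encoded):
    # If the agent's action disagrees with the policy encoded in the
    # state, the transition returns {0}: the branch carries no mass.
    return LOSS[OUTCOME[encoded]] if agent == encoded else 0.0

for agent in ("pi1", "pi2"):
    worst = max(expected_loss(agent, enc) for enc in ("pi1", "pi2"))
    print(agent, worst)   # pi1 -> 0.0 (one-box), pi2 -> 0.999 (two-box)
```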
Note that for this example, we have only used what might be considered the most basic supracontributions, namely $\{0\}$ and the downward closure of a single probability distribution. In the next example, we will see the full power of supracontributions.
In this section we describe how to use a supra-POMDP to model the XOR blackmail problem. For a more in depth discussion of XOR blackmail, see e.g. Toward Idealized Decision Theory §2.1 (Soares and Fallenstein, 2015) and Cheating Death in Damascus §2 (Levinstein and Soares, 2020).
The problem is given as follows. Suppose there is a 1% probability that an agent's house has a termite infestation that would cause $1M in damages. A blackmailer can predict the agent and also knows whether or not there is an infestation. The blackmailer sends a letter stating that exactly one of the following is true, if and only if the letter is truthful:
- there is an infestation and the agent will not pay the blackmailer, or
- there is no infestation and the agent will pay the blackmailer.
The agent can then accept or reject the blackmail. Note that, as stated, the probability of blackmail depends on the agent's policy. Because policies are encoded in the state space of the associated supra-POMDP, we are able to model this. EDT recommends accepting the blackmail, because accepting blackmail is evidence that there is not an infestation, even though this correlation is spurious (i.e. accepting the blackmail does not causally influence whether or not there is an infestation). On the other hand, CDT recommends rejecting the blackmail. Thus we see that these two decision theories are split across the two examples that we have seen so far, and neither always recommends an optimal action. We will now see that IBDT again recommends the optimal action.
Let $O$ consist of the empty observation $\lambda$, an observation $o_b$ representing receiving the blackmail, an observation $o_n$ representing not receiving the blackmail, and the remaining observations representing the various monetary outcomes.
Let $A = \{a_1, a_2\}$, where $a_1$ corresponds to accepting the blackmail and $a_2$ corresponds to rejecting the blackmail. Without loss of generality, $\Pi = \{\pi_1, \pi_2\}$, where $\pi_1(o_b) = a_1$ and $\pi_2(o_b) = a_2$.
Interpreting the statement of the problem, we define $N$ as follows: $N(\pi_1, \lambda) = 0.99\,\delta_{o_b} + 0.01\,\delta_{o_n}$ and $N(\pi_2, \lambda) = 0.01\,\delta_{o_b} + 0.99\,\delta_{o_n}$; the subsequent monetary observation is determined by whether there is an infestation and whether the agent pays.
We normalize the monetary outcomes in order to define the loss of an episode, with worse monetary outcomes corresponding to larger losses.
Note that $\mathbb{E}_{N^{\pi_1}}[L] > \mathbb{E}_{N^{\pi_2}}[L]$. Therefore $\pi_2$ is optimal.
We now consider the corresponding supra-POMDP $M_N$. The state transitions of $M_N$ are summarized in Figure 6. We first define the transition suprakernel at the initial states. Using $N$, we have $T((\pi_1, \lambda), a_\emptyset) = \{0.99\,\delta_{(\pi_1, o_b)} + 0.01\,\delta_{(\pi_1, o_n)}\}^{\downarrow}$ and $T((\pi_2, \lambda), a_\emptyset) = \{0.01\,\delta_{(\pi_2, o_b)} + 0.99\,\delta_{(\pi_2, o_n)}\}^{\downarrow}$, where $a_\emptyset$ denotes the empty action taken at the beginning of an episode.
We now consider the next level of the supra-POMDP. Here we see that when the action is $a_1$, which is consistent with the policy of the state $(\pi_1, o_b)$, the transition kernel returns the downward closure of a distribution specified by $N$. On the other hand, when the action is $a_2$, which is not consistent with $\pi_1$, the transition kernel returns $\{0\}$.
The action does not matter when there is no blackmail (i.e. both actions are consistent with the policy), so the transition simply appends the corresponding monetary observation.
A similar analysis applies to $\pi_2$.
Then, on the final level: for all states $s$ at the end of an episode and all $a \in A$, define $T(s, a) = \Psi_0$, re-initializing the episode.
We now consider the expected loss of each policy for $M_N$. The interaction of $M_N$ and $\pi_1$ produces a supracontribution over outcomes given by $\overline{\mathrm{co}}\{\mu_1, \mu_2\}^{\downarrow}$, where $\mu_1$ is a probability distribution and $\mu_2$ is a contribution of total mass $0.99$.
Here, $\mu_1$ arises from the interaction of $\pi_1$ with the branch starting with state $(\pi_1, \lambda)$. We have a probability distribution in this case because $\pi_1$ is always consistent with itself, which is the policy encoded in the states of this branch. On the other hand, the contribution $\mu_2$ arises from the interaction of $\pi_1$ with the branch starting with state $(\pi_2, \lambda)$. In the case of blackmail, $\pi_1$ and $\pi_2$ disagree on the action, and thus a probability mass of 0.01 is lost on this branch.
Therefore, the expected loss for $\pi_1$ in one round is given by $\max\{\mathbb{E}_{\mu_1}[L], \mathbb{E}_{\mu_2}[L]\} = \mathbb{E}_{\mu_1}[L]$.
Another way to view this calculation is that the optimal $\pi_1$-copolicy initializes the state to $(\pi_1, \lambda)$, meaning $\sigma(\lambda) = \delta_{(\pi_1, \lambda)}$.
By a similar calculation, the expected loss for $\pi_2$ in one round is strictly smaller.
Therefore, the optimal policy for $M_N$ is also $\pi_2$, i.e. under this formulation it is optimal to reject the blackmail. This analysis holds for any number of episodes. Moreover, the optimal loss for $M_N$ is equal to the optimal loss for $N$. This is significant because if a learning agent has $M_N$ in their hypothesis space, then they must converge to rejecting the blackmail if they are to achieve low regret for the iterated Newcombian problem.
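Here is a numeric sketch of this comparison, under assumed dollar figures (a $1,000 demand and $1,000,000 in damages, as in the standard statement of the problem) and a simple normalization of losses; the post's own normalization may differ.

```python
# Numeric sketch of the XOR blackmail calculation. Dollar figures and the
# normalization are assumptions for illustration, not taken from the post.

P_INFEST, DEMAND, DAMAGE = 0.01, 1_000, 1_000_000

def branches(agent_pays, encoded_pays):
    # Letter arrives iff exactly one of (infestation, encoded policy pays)
    # holds, i.e. infest XOR encoded_pays. Branches where the agent's act
    # contradicts the encoded policy lose all mass, as in the supra-POMDP.
    for infest, p in ((True, P_INFEST), (False, 1 - P_INFEST)):
        letter = infest != encoded_pays
        if letter and agent_pays != encoded_pays:
            continue                               # contradiction: mass -> 0
        cost = (DAMAGE if infest else 0) + (DEMAND if letter and agent_pays else 0)
        yield p, cost / (DAMAGE + DEMAND)          # normalized loss

def worst_case_loss(agent_pays):
    # Worst case over the two initial encoded policies (the copolicy).
    return max(sum(p * loss for p, loss in branches(agent_pays, enc))
               for enc in (True, False))

print(worst_case_loss(True))    # accept: ~0.01098 (0.99*$1K + 0.01*$1M)
print(worst_case_loss(False))   # reject: ~0.00999 (0.01*$1M) -> optimal
```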
We now consider the problem of counterfactual mugging. In this problem, a perfect predictor (the "mugger") flips a coin. If the outcome is heads, the mugger asks the agent for $100, at which point they can decide to pay or not pay. Otherwise, they give the agent $10K if and only if they predict the agent would have paid the $100 if the outcome was heads.
Both CDT and EDT recommend not paying, and yet we will see that IBDT recommends to pay the mugger.
Let $O$ consist of the empty observation $\lambda$, observations $o_h$ and $o_t$, where $o_h$ represents heads and $o_t$ represents tails, and the remaining (non-empty) observations representing the various monetary outcomes. Let $A = \{a_1, a_2\}$, where $a_1$ represents paying the mugger and $a_2$ represents not paying the mugger. Without loss of generality, $\Pi = \{\pi_1, \pi_2\}$, where $\pi_1(o_h) = a_1$ and $\pi_2(o_h) = a_2$.
Let $N(\pi_i, \lambda) = \tfrac{1}{2}\,\delta_{o_h} + \tfrac{1}{2}\,\delta_{o_t}$ for $i \in \{1, 2\}$, with the subsequent monetary observation determined by the coin and the agent’s policy. We normalize[8] the monetary outcomes to define the loss of an episode.
Note that $\mathbb{E}_{N^{\pi_1}}[L] < \mathbb{E}_{N^{\pi_2}}[L]$. Therefore, $\pi_1$ is $N$-optimal.
The state transitions of $M_N$ are shown in Figure 7.
We have that the expected loss for $\pi_1$ in one round (against its worst-case copolicy) is strictly smaller than the corresponding expected loss for $\pi_2$.
Therefore, the optimal policy for $M_N$ is also $\pi_1$, i.e. under this formulation it is optimal to pay the mugger.
We now consider Transparent Newcomb's Problem. In this problem, both boxes of the original problem are transparent. We consider three versions. See Figure 11 in the next section for a summary of the decision theory recommendations.
Empty-box dependent
In the empty-box dependent version, a perfect predictor Omega leaves Box B empty if and only if Omega predicts that the agent will two-box upon seeing that Box B is empty.
Let $O$ contain observations $o_e$ and $o_f$. Here $o_e$ corresponds to observing an empty box and $o_f$ corresponds to observing a full box. Let $A = \{a_1, a_2\}$, where $a_1$ corresponds to one-boxing and $a_2$ corresponds to two-boxing.
Without loss of generality, $\Pi = \{\pi_{ij}\}$, where $\pi_{ij}(o_e) = a_i$ and $\pi_{ij}(o_f) = a_j$ for $i, j \in \{1, 2\}$. Namely, the policies are distinguished by the action chosen upon observing an empty or full box.
Let $\lambda$ denote the empty observation. Define $N$ by $N(\pi_{1j}, \lambda) = \delta_{o_f}$ and $N(\pi_{2j}, \lambda) = \delta_{o_e}$ for $j \in \{1, 2\}$.
The state transition graph of the supra-POMDP representing this problem is shown in Figure 8.
We now consider the expected loss in one round for each policy interacting with $M_N$. Similarly to the original version of Newcomb's problem, the optimal copolicy to a policy $\pi$ initializes the state to $(\pi, \lambda)$, meaning the policy encoded in the state chosen by the copolicy matches the true policy of the agent.
Computing the expected losses as before, the optimal policy for $M_N$ is $\pi_{12}$, meaning it is optimal to one-box upon observing an empty box and to two-box upon seeing a full box. Note that $\pi_{12}$ is also the $N$-optimal policy.
Full-box dependent
In the full-box dependent version, Omega (a perfect predictor) puts $1M in Box B if and only if Omega predicts that the agent will one-box upon seeing that Box B is full. This example does not satisfy pseudocausality (discussed below), and therefore we will see that there is an inconsistency between the optimal policies for the supra-POMDP $M_N$ and the Newcombian problem $N$.
Let $O$ and $A$ be as in the previous version. Without loss of generality, we again have $\Pi = \{\pi_{ij}\}$, where $\pi_{ij}(o_e) = a_i$ and $\pi_{ij}(o_f) = a_j$ for $i, j \in \{1, 2\}$.
The state transition graph of the supra-POMDP representing this problem is shown in Figure 9.
Define $N$ by $N(\pi_{i1}, \lambda) = \delta_{o_f}$ and $N(\pi_{i2}, \lambda) = \delta_{o_e}$ for $i \in \{1, 2\}$.
Then the worst-case copolicy to each policy initializes the state to encode a policy that two-boxes upon seeing a full box, so the box is empty on every consistent branch. Furthermore, among these consistent branches, two-boxing upon seeing an empty box yields $1K, while one-boxing yields $0.
Therefore, the optimal policies for the supra-POMDP are $\pi_{21}$ and $\pi_{22}$. From another perspective, the optimal copolicy to $\pi_{21}$ for one episode initializes the state to $(\pi_{22}, \lambda)$. The optimal copolicy to $\pi_{22}$ for one episode also initializes the state to $(\pi_{22}, \lambda)$. As a result, an agent can achieve low regret on $M_N$ and either one- or two-box upon observing the full box. In particular, for all policies they will learn to expect an empty box.
On the other hand, the optimal policies for the Newcombian problem are $\pi_{11}$ and $\pi_{21}$. To see this, note that under $N$ the box is full if and only if the policy one-boxes upon seeing a full box. Then $\pi_{11}$ and $\pi_{21}$ receive $1M, whereas $\pi_{12}$ receives $0 and $\pi_{22}$ receives $1K. The inconsistency between the optimal policies for $N$ and $M_N$ is a result of the fact that this Newcombian problem fails to satisfy pseudocausality, a condition we describe in the last section.
Epsilon-noisy full-box dependent
We now consider a variant of the full-box dependent version in which we assume that Omega is not a perfect predictor. In this case, Omega also puts $1M in Box B with probability $\epsilon$ when the agent will two-box upon seeing that Box B is full.
Figure 10 shows the state transitions starting from states $(\pi_{12}, \lambda)$ and $(\pi_{22}, \lambda)$. (The full state transition graph is the graph in which the two right-most paths of the graph of Figure 9 are replaced by the trees in Figure 10.)
Let $O$, $A$, and $\Pi$ be as before. Define $N$ by $N(\pi_{i1}, \lambda) = \delta_{o_f}$ and $N(\pi_{i2}, \lambda) = \epsilon\,\delta_{o_f} + (1 - \epsilon)\,\delta_{o_e}$ for $i \in \{1, 2\}$.
A distinguishing feature of the supra-POMDP for this problem (compared to the other problems we have considered) is that the optimal policy for $M_N$ depends on the number of episodes. In the case of one episode, the optimal copolicy to $\pi_{21}$ and $\pi_{22}$ is the same. Namely, $\sigma(\lambda) = \delta_{(\pi_{22}, \lambda)}$. Then the expected loss for one episode is $(1 - \epsilon)\,L(o_{\$1K})$.
We consider the extension of $L$ given by the mean per-episode loss. For two episodes, the optimal copolicy to $\pi_{21}$ and $\pi_{22}$ again has $\sigma(\lambda) = \delta_{(\pi_{22}, \lambda)}$ at the start of each episode. The interaction of $\pi_{21}$ and $M_N$ produces a supracontribution over two-episode outcomes in which the full-box branch loses probability mass $\epsilon$ in each episode.[9] Then the expected mean loss for $\pi_{21}$ over two episodes is $(1 - \epsilon)^2\,L(o_{\$1K})$.
On the other hand, the interaction of $\pi_{22}$ and $M_N$ produces a supracontribution over two-episode outcomes generated by a probability distribution in which no mass is lost: in each episode the box is full with probability $\epsilon$ and empty with probability $1 - \epsilon$.
Then the expected mean loss for $\pi_{22}$ over two episodes is $(1 - \epsilon)\,L(o_{\$1K})$.
More generally, for any number of episodes, the worst-case copolicy to $\pi_{22}$ has $\sigma(\lambda) = \delta_{(\pi_{22}, \lambda)}$ at the start of each episode. As a result, for all $n$, the expected mean loss for $\pi_{22}$ over $n$ episodes is $(1 - \epsilon)\,L(o_{\$1K})$.
On the other hand, over $n$ episodes, the copolicy to $\pi_{21}$ that always has $\sigma(\lambda) = \delta_{(\pi_{22}, \lambda)}$ results in an expected mean loss of $(1 - \epsilon)^n\,L(o_{\$1K})$, which tends to zero as $n \to \infty$. The copolicy to $\pi_{21}$ that instead has $\sigma(\lambda) = \delta_{(\pi_{21}, \lambda)}$ on most episodes results in a mean expected loss that converges to $L(o_{\$1M})$. Therefore, for sufficiently many episodes, $\pi_{21}$ is the optimal policy for $M_N$, and moreover the optimal loss for $M_N$ converges to the optimal loss for $N$.
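The episode-dependence can be seen in a few lines of Python; the loss scale is the same assumed normalization as in the earlier sketches, with $L(o_{\$1M})$ taken to be approximately 0.

```python
# Why the optimal policy for the epsilon-noisy problem depends on the
# number of episodes: against the all-(pi22) copolicy, pi21's surviving
# mass decays like (1 - EPS)^n, so that branch stops being the worst case.
# Loss values are an assumed normalization, not taken from the post.

EPS, L_1K, L_1M = 0.1, 0.999, 0.0

def worst_mean_loss_pi21(n):
    branch_a = (1 - EPS) ** n * L_1K   # copolicy always encodes pi22
    branch_b = L_1M                    # copolicy encodes pi21: full box, one-box
    return max(branch_a, branch_b)

def worst_mean_loss_pi22(n):
    # No mass is ever lost: empty box w.p. (1-EPS) costs L($1K) per episode.
    return (1 - EPS) * L_1K

for n in (1, 5, 50):
    print(n, worst_mean_loss_pi21(n), worst_mean_loss_pi22(n))
# pi21's worst-case mean loss tends to L($1M) = 0; pi22's stays ~0.9.
```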
In Figure 11, we summarize the extent to which each decision theory makes optimal recommendations on the example problems.
Readers familiar with decision theory will observe that IBDT can be seen as an approximation to functional decision theory, which makes optimal recommendations across all the examples here. IBDT has the advantage of being well-defined in the sense that it can be run as code in an agent learning from the environment.
In this section, we define a condition (pseudocausality) that holds for all of the Newcombian problems discussed above, except for the (non-noisy) full-box dependent transparent Newcomb's problem. We then state a theorem that illuminates the significance of this condition. In particular, pseudocausality allows one to translate optimality for the supra-POMDP $M_N$ into optimality for the corresponding Newcombian problem $N$. Intuitively, pseudocausality means that there does not exist a suboptimal policy for $N$ such that the optimal policy and the suboptimal policy disagree only on events that are probability zero under the suboptimal policy.
To formally define pseudocausality, we consider the set of outcomes that are compatible with a given policy $\pi$. Namely, identifying an outcome with the action-observation sequence that produces it, define $C_\pi$ to be the set of outcomes whose actions are the ones prescribed by $\pi$ on the preceding observations.
In other words, an outcome belongs to $C_\pi$ if its sequence of actions agrees with $\pi$, whereas the observations can be arbitrary.
Definition: Pseudocausality
A Newcombian problem $N$ satisfies pseudocausality if there exists an $N$-optimal policy $\pi^*$ such that for all policies $\pi$, if $N^\pi(C_{\pi^*}) = 1$ then $\pi$ is also optimal for $N$.
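For finite problems the condition can be checked by brute force. The sketch below (with our own encoding of policies and outcomes, and an assumed loss scale) does so for the full-box dependent transparent Newcomb's problem, confirming the failure discussed in the next subsection.

```python
# Brute-force pseudocausality check for a finite Newcombian problem:
# N satisfies it iff some N-optimal pi_star has the property that any
# policy whose outcomes (with their actions) land in C_{pi_star} with
# probability 1 is itself optimal. Encodings here are illustrative.

def expected_loss(N, L, pi):
    return sum(p * L(o, pi[o]) for o, p in N(pi).items())

def pseudocausal(policies, N, L):
    losses = {name: expected_loss(N, L, pi) for name, pi in policies.items()}
    best = min(losses.values())
    optimal = [name for name, l in losses.items() if l <= best + 1e-12]

    def agrees(pi, pi_star):
        # pi's outcomes land in C_{pi_star}: actions agree on every
        # observation in pi's support (elsewhere they may differ freely).
        return all(pi[o] == pi_star[o] for o, p in N(pi).items() if p > 0)

    return any(
        all(losses[name] <= best + 1e-12
            for name, pi in policies.items() if agrees(pi, policies[star]))
        for star in optimal
    )

# Full-box dependent transparent Newcomb (assumed loss scale as above):
POL = {"p11": {"empty": "one", "full": "one"},
       "p12": {"empty": "one", "full": "two"},
       "p21": {"empty": "two", "full": "one"},
       "p22": {"empty": "two", "full": "two"}}
N = lambda pi: {"full": 1.0} if pi["full"] == "one" else {"empty": 1.0}
L = lambda o, a: {("full", "one"): 0.0, ("empty", "one"): 1.0,
                  ("empty", "two"): 0.999}[(o, a)]
print(pseudocausal(POL, N, L))   # False: p22 agrees with p21 on its
                                 # support yet is suboptimal
```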
An example where pseudocausality fails
To see why pseudocausality fails for the full-box dependent transparent Newcomb's problem, recall that the optimal policies for $N$ are $\pi_{11}$ and $\pi_{21}$. We have that $N^{\pi_{12}}$ is supported on outcomes in which the box is empty and the agent one-boxes, and $N^{\pi_{22}}$ is supported on outcomes in which the box is empty and the agent two-boxes. However, $N^{\pi_{12}}(C_{\pi_{11}}) = 1$ and $N^{\pi_{22}}(C_{\pi_{21}}) = 1$, but $\pi_{12}$ and $\pi_{22}$ are not optimal for $N$.
We leave it to the reader to check that all other examples discussed in this post satisfy pseudocausality.
Theorem on pseudocausality and optimality
The significance of pseudocausality is given by the next theorem. It states that if pseudocausality holds for a Newcombian problem $N$, then the optimal loss for the corresponding fuzzy supra-POMDP converges to the optimal loss for the Newcombian problem as the time discount $\gamma$ tends to $1$. Furthermore, if a $\gamma$-indexed family of policies is optimal for the supra-POMDP in the limit $\gamma \to 1$, then the family is also optimal for $N$ in the limit.
Theorem [Alexander Appel (@Diffractor), Vanessa Kosoy (@Vanessa Kosoy)]:
Let $N$ be a Newcombian problem that satisfies pseudocausality. Then the optimal loss for the associated supra-POMDP $M_N$ with time discount $\gamma$ converges to the optimal loss for $N$ as $\gamma \to 1$.
Furthermore, if $(\pi_\gamma)$ is a family of policies whose loss on $M_N$ converges to the optimal loss as $\gamma \to 1$, then the loss of $(\pi_\gamma)$ on $N$ also converges to the optimal loss for $N$.
See the proof section for the proof.
Many thanks to Vanessa Kosoy, Marcus Ogren, and Mateusz Bagiński for their valuable feedback on initial drafts. Vanessa's video lecture on formalizing Newcombian problems was also very helpful in writing this post.
Previously called ultracontributions.
To make comparisons, we briefly review these decision theories, but this is not the focus of the post.
More generally, if $X$ is a measurable space, we define a contribution to be a measure $\mu$ on $X$ such that $\mu(X) \leq 1$.
This terminology is motivated by the notions of semimeasures and semi-probabilities as discussed in An Introduction to Universal Artificial Intelligence (M. Hutter, D. Quarel, and E. Catt).
Here it is necessary to use the more general definition of a contribution as a measure.
Here suprakernel simply means a map in which the range is the set of supracontributions over some set.
For more detail, see Toward Idealized Decision Theory §2.2 (Soares and Fallenstein, 2015) and Cheating Death in Damascus §3 (Levinstein and Soares, 2020).
If then
As a technicality, we always assume that the empty action is taken at the beginning of an episode.
Published on December 3, 2025 2:34 PM GMT
This proof section accompanies Formalizing Newcombian problems with fuzzy infra-Bayesianism. We prove the following result.
Theorem [Alexander Appel (@Diffractor), Vanessa Kosoy (@Vanessa Kosoy)]:
Let $N$ be a Newcombian problem of horizon $H$ that satisfies pseudocausality. Let $M_N$ denote the associated supra-POMDP with infinite time horizon and time discount $\gamma$. Then the optimal loss for $M_N$ converges to the optimal loss for $N$ as $\gamma \to 1$.
Furthermore, if $(\pi_\gamma)$ is a family of policies whose loss on $M_N$ converges to the optimal loss as $\gamma \to 1$, then the loss of $(\pi_\gamma)$ on $N$ also converges to the optimal loss for $N$.
Proof: Let $\lambda$ denote the empty history. Given a supracontribution $\Psi$, let $\mathrm{ext}^{\max}(\Psi)$ denote the set of maximal extreme points of $\Psi$. First we remark that for any supra-POMDP, without loss of generality, a set of copolicies can always be replaced by the set of copolicies that select only maximal extreme points of the relevant supracontributions.
Given an episode policy $\pi$, let $\sigma_\pi$ denote the episode copolicy that initializes the state to $(\pi, \lambda)$, i.e. $\sigma_\pi(\lambda) = \delta_{(\pi, \lambda)}$. Let $\mu^{\sigma_\pi \pi}$ denote the distribution over outcomes determined by the interaction of $\sigma_\pi$ and $\pi$. Note that the expected loss with respect to $\mu^{\sigma_\pi \pi}$ is equal to the expected loss for the Newcombian problem, i.e. $\mathbb{E}_{\mu^{\sigma_\pi \pi}}[L] = \mathbb{E}_{N^\pi}[L]$.
Recall that throughout this sequence, we assume that $\Pi_H$ is finite. By the remark at the beginning of the proof, the expected loss in one episode for the corresponding supra-POMDP can be written as a maximum expected loss over a finite set of $M_N$-copolicies. Namely, $\mathbb{E}_{\Lambda_{M_N}(\pi)}[L] = \max_{\sigma} \mathbb{E}_{\mu^{\sigma\pi}}[L]$, where the maximum ranges over that finite set. Then $\mathbb{E}_{\Lambda_{M_N}(\pi)}[L] \geq \mathbb{E}_{\mu^{\sigma_\pi \pi}}[L] = \mathbb{E}_{N^\pi}[L]$, and thus for any episode policy $\pi$, the expected loss for $M_N$ is at least the expected loss for $N$.
We now extend this analysis to the optimal loss over $n$ episodes for $M_N$.[1] Let $L^*_n$ denote the $n$-episode optimal loss for $N$. Let $\pi$ be an arbitrary policy for $n$ episodes of $M_N$. Then, as before, the expected loss of $\pi$ for $M_N$ is a maximum over a finite set of $n$-episode copolicies, and this maximum is at least the expected loss against the copolicy that initializes each episode's state to match the corresponding episode policy. By the single-episode case, this latter quantity equals the expected loss of $\pi$ on the iterated Newcombian problem, and thus the optimal $n$-episode loss for $M_N$ is at least $L^*_n$.
It remains to show that the opposite inequality holds in the many-episode, $\gamma \to 1$ limit.
Recall that given a policy $\pi$, we define $C_\pi$ to be the set of outcomes whose action sequence agrees with $\pi$.
Recall that since $N$ satisfies pseudocausality, there exists an $N$-optimal policy $\pi^*$ such that for all $\pi$, if $N^\pi(C_{\pi^*}) = 1$ then $\pi$ is also optimal for $N$. Consequently, for any episode copolicy $\sigma$, either $\mu^{\sigma \pi^*}$ loses probability mass or $\mathbb{E}_{\mu^{\sigma \pi^*}}[L]$ is equal to the optimal loss for $N$. To see this, suppose there exists an episode copolicy $\sigma$ such that $\mu^{\sigma \pi^*}$ loses no mass. Then there exists a policy $\pi$ such that $\mu^{\sigma \pi^*} = N^\pi$ and $N^\pi(C_{\pi^*}) = 1$. By pseudocausality, $\pi$ is optimal for $N$. Thus $\mathbb{E}_{\mu^{\sigma \pi^*}}[L]$ is the optimal loss for $N$.
Define $\delta$ to be the minimum probability mass lost by $\mu^{\sigma \pi^*}$, over all episode copolicies $\sigma$ for which $\mu^{\sigma \pi^*}$ loses mass. By the remark at the beginning of the proof, the relevant set of copolicies in the definition of $\delta$ is finite, and thus $\delta$ is well-defined and strictly positive. If $\mu^{\sigma \pi^*}$ loses mass, then its total mass is at most $1 - \delta$. Thus every episode copolicy either yields the optimal loss for $N$ against $\pi^*$ or destroys a probability mass of at least $\delta$.
Consider the iterated Newcombian problem over $n$ episodes. Let $\pi^{*n}$ denote the multi-episode policy such that $\pi^{*n}$ restricted to every episode is $\pi^*$. Let $\sigma$ denote an arbitrary copolicy that interacts with $\pi^{*n}$. Furthermore, let $K$ denote the number of episodes for which the episode-restriction of $\sigma$ interacting with $\pi^*$ loses probability mass.[2]
We have that the total mass of the resulting contribution is at most $(1 - \delta)^K$. Furthermore, on every episode whose restriction loses no mass, the expected per-episode loss equals the optimal loss for $N$. We leave it to the reader to verify that, combining these two facts, the $\gamma$-discounted loss of $\pi^{*n}$ against any copolicy converges to the optimal loss for $N$ as $\gamma \to 1$, which completes the proof.
Recall that if $o = (o_1, o_2, \ldots)$ and the per-episode loss is given by $L$, then the loss over episodes with geometric time discount $\gamma$ is defined by $(1 - \gamma) \sum_{k \geq 1} \gamma^{k-1} L(o_k)$.
A copolicy can depend on the past, meaning it can depend on the policy. Thus $K$ can depend on both $\sigma$ and $\pi^{*n}$.
Published on December 3, 2025 2:27 PM GMT
Bubble Tanks is a Flash game originally released on Armor Games, a two-decade-old online game aggregator that somehow still exists. In the game, you pilot a small bubble through a procedurally generated foam universe, absorbing smaller bubbles to grow larger, evolving into increasingly complex configurations of spheres and cannons. Here is a reasonably accurate video of the gameplay, recreated in beautiful high-definition.
Bubble Tanks was first released in 2007, with a sequel out in 2009, and another sequel in 2010. Back when I first played it as a child, I was convinced, absolutely convinced, that there was someone in the world whose entire life was nothing but Bubble Tanks. This person—and I took it on faith that they were real—woke each morning and immediately, before coffee, before the basic animal functions of evacuation and sustenance, played Bubble Tanks. They posted on obscure forums, arguing bitterly over tank builds and bubble physics with three other people who had the same devotion. I knew that their room was disgusting, repulsive. This was essential to the vision, that their stained clothes lay across their floor, worms crawling over them. They were either skeletal or enormously bloated, monastic asceticism or excess gluttony, one or the other. Bubble Tanks was single-player, so they did not do all this for fame or glory, but for love, or for something even deeper than love. Everything had been sacrificed for this game, and excelling at it would be all that they had ever done, all that they would ever do.
And what if this person were just the start? What if this Flash game became the organizing principle of human civilization? The economy would shift to accommodate. Bubble Tanks coaching would become a viable career path. Parents would discuss their children’s talents at the game over dinner. Political candidates would be asked about their Bubble Tanks records during debates, and one would lose an election after it emerged that he never evolved past the third tank configuration.
Looking back on this fever dream I came up with a decade-and-a-half back, one thing that immediately strikes me is that being creative in such a world must be monstrously difficult. Not because all creativity must be ultimately tailored towards Bubble Tanks enthusiasts—that much is obvious, and is not especially different from creativity in the real world, which must tailor itself towards enthusiasts of human-understandable concepts—but rather because there would be astronomical amounts of Bubble Tanks content already in existence. In the latest stages of this civilization, billions of people have devoted their lives to Bubble Tanks. Millions of them are creative. Hundreds of thousands have genuine talent. Tens of thousands have produced work that is, by any reasonable measure, brilliant. The Bubble Tanks epic poem exists in fourteen languages. The Bubble Tanks symphony has been performed at concert halls on every continent. There are Bubble Tanks novels that have won Pulitzers, Bubble Tanks paintings that hang in the Louvre, Bubble Tanks films that win Oscars. All the obvious ideas have been executed. All the non-obvious ideas have also been executed.
I have wonderful news. You live in the earliest innings of this universe, at the start of it all, just as more and more of the population is beginning to wake up to how great this Flash game is. Even more fortunate for you, it is not just Bubble Tanks that is the object of human devotion. It is everything.
Humanity has been producing art for somewhere between 45,000 and 100,000 years, depending on how generously you define “art” and “humanity.” For most of this period, the constraint on creative output was not imagination, but production capacity. The printing press changed this, then radio, then television, then the internet, and at each stage the volume of creative work accessible to any given person increased by orders of magnitude. Today there are more novels published each year than a human being could read in a lifetime. There are more films, more paintings, more poems, more essays, more podcasts, more YouTube videos, more TikToks, more tweets, more everything than anyone could ever hope to consume.
And as more art is produced, the more we must learn to discriminate. Consider stories. They existed for millennia in the form of epics, religious hymns, folk tales. But with the rise of printing presses that allowed a wider variety of stories to circulate, we were forced to develop something very dangerous: filtering technologies. Genre is a filtering technology. It emerged because no one could read everything, and so readers needed a way to predict whether a given text was likely to satisfy their immediate demands. “Romance” is a promise: there will be a love story, probably with a happy ending. “Mystery” is a different promise: there will be a puzzle, and it will be solved. Neither is really a description; each is more accurately a contract signed by the author about what the book will do for you. And like all technologies, genre has evolved to become more precise as the volume it must filter has grown. “Sci-fi” was once sufficient. Then it fractured into hard sci-fi and soft sci-fi, into space opera and cyberpunk. Brand awareness is a different filtration technology. Netflix originals have the flavor of something that will likely be decent, but also homogeneous, whereas A24 movies have an art-house sensibility with a certain color palette. Each subdivision represents a refinement of the filtering mechanism, a narrower promise to a narrower audience.
Why are filtering technologies a problem? Aren’t they great? We’re getting increasingly good at giving people what they want!
Well, it wouldn’t be an issue if the creative process were limited by human scale, but we’re getting close to leaving that world. I feel pretty comfortable saying that, at this point, LLMs can handle nearly every sufficiently-chunked-up bit of music production, graphic design, video editing, background illustration, character concept art, voice-acting, essay writing, and a lot more. The list extends as far as creative production itself extends, which is to say: everywhere. Every domain that humans have developed aesthetic traditions within is a domain where AI can now perform the components of that tradition with reasonable competence.
One could imagine that in the near future, there will be a new button on your television, one with a sparkle animation. After you click it, it will offer you a QR code, politely asking you to scan it with your phone. Upon doing so, the button will give you one of the ultimate promises of our new frontier-AI-lab-centric economy: a text box that will generate a feature-length film from whatever prompt you enter into it. We have arrived. The long march is over. This is the ultimate final utopia that our filtration technologies have been building towards since the first monk started to distribute the Gutenberg Bible. What will we make? What wonders await us?
I suspect the answer is: mostly nothing. Or rather, mostly more of what we already have.
The problem with filtering technologies, one that becomes catastrophic precisely at the moment of their perfection, is that they assume you know what you want. The entire apparatus presupposes a subject who arrives at the interface with desires already formed, preferences already crystallized, a little homunculus sitting in their skull who knows exactly what kind of story it wants to hear tonight. And, to be clear, this actually works remarkably well in the case of a finite set of existing objects. When there are ten thousand, even a hundred thousand films in the Netflix library, the algorithm’s job is merely to surface the handful you’re most likely to enjoy from a pool that already exists. You don’t need to know what you want with any precision. You only need to recognize it when it appears before you, to say “yes” or “no.” Really, the algorithm is not an algorithm at all, but something even more basic: an ophthalmologist. It flips between lenses: better, or worse? This one, or that one? You do not need to understand the properties of curved glass or the anatomy of your own defective eyes. You simply must obediently respond to the question you are asked.
This all breaks down the second you are placed in the driver’s seat, because you do not actually know what you want. How could I make such a proclamation so confidently? I can’t, but I will anyway: what you want most, more than anything else in the world, is stuff that you never realized you wanted.
I realize that this is a tired sentiment, subtweeting the apocryphal Henry Ford line about faster horses. “If I had asked people what they wanted, they would have said faster horses.” The implication being: I, the visionary, know what you want better than you do. And I, despite the dullness of my audience, will give you the automobile. You would think, reading this essay, that I am making a case for the artist: the sacred figure who reaches into the void and pulls out something none of us knew we needed.
But I am saying something much worse, which is that nobody knows. Neither you nor the visionary. The Ford line is wrong not because customers actually do know what they want, but because, if we’re being honest with one another, Ford didn’t know either. It was a happy accident that he later (again, apocryphally, because I don’t think he actually said it) narrativized into inevitability, because that is what popular culture does with fixations that turn out well.
You may guess where this is heading. It’s time to discuss Being John Malkovich.
Being John Malkovich is a nearly two-hour movie, released in 1999, directed by Spike Jonze, written by Charlie Kaufman. It stars John Cusack as a failed puppeteer named Craig who takes a job as a filing clerk on the 7½th floor of a Manhattan office building—a floor with ceilings so low that everyone must walk in a permanent stoop. This detail is never really explained, other than a vague mention of how the original building owner had a wife who was a dwarf, which raises far more questions than it answers: Did he build the entire floor for her? Did she work in this office? Was this an act of love or an insult? By the time these questions have been raised, the film has already moved on, and it is never mentioned again. One day, while filing, Cusack’s character discovers a small door behind a cabinet. He crawls through. In doing so, he finds himself inside the head of John Malkovich, the actor, experiencing fifteen minutes of Malkovich’s life from behind his eyes, before being ejected onto the muddy shoulder of the New Jersey Turnpike.
This is the basic premise, all introduced within the first half-hour of the film. And I have not yet mentioned the chimpanzee.
There is a chimpanzee, who has a reasonable amount of screen time. She belongs to Lotte, Craig’s wife, played by Cameron Diaz. The chimpanzee carries intense psychological trauma from being torn from her mother at an especially early age, a fact conveyed entirely through flashbacks. What is the purpose of the chimpanzee being traumatized? It is unclear, because it is never actually a relevant plot point. Why is Lotte taking care of this chimpanzee? Is she an animal therapist? She is not. She works at a pet store, and keeps a wide variety of animals beyond just the chimpanzee in her (and Craig’s) small New York apartment for seemingly no reason at all. Why is the chimpanzee in the film? It seems to exist for the sole purpose of one pivotal moment that requires the chimpanzee’s cage, but that moment does not actually need the prodigiously large cage to work, and one could imagine a thousand more reasonable ways to accomplish the same narrative beat. Despite all this, the chimpanzee is there.
How did Charlie Kaufman, the then relatively unknown screenwriter and driving force behind the film, even come up with this plot line? In an interview, he says this:
I wrote Being John Malkovich while I was waiting for [the next sitcom] hiring season. My idea was that I would write a script and use it to get work. I had this idea that someone finds a portal into someone’s head, and I had another idea that somebody has a story about someone having an affair with a co-worker. And neither one was going anywhere, so I just decided to combine them.
Oh yes, there’s an affair too. But it gets even funnier. Why is John Malkovich the chosen victim of the portal? Kaufman also gave the answer in a different interview:
I don’t know... I thought it was funny. It’s hard to explain, but I thought it was funny, but not jokey. Because [John Malkovich] is a serious actor, he is a great actor, but there is something odd about him and there is something behind his eyes that you can’t see. And I thought that was a good person for this.
And then I think his name is perfect for the title...
Being John Malkovich is a worrying movie for a filtration-technology maximalist, because it is both incredibly good, benefitting from its insane premise and bizarre details, and also something that nobody ever asked for. What is the film about, really? What is the emotion it is intended to evoke? It is about identity, I suppose. Also about desire, and the way desire makes puppets of us all. It is about the loneliness of being trapped behind your own eyes. It is also about John Malkovich, specifically, for no reason other than it being an apparently funny choice. There are a lot of very strange, but ultimately invaluable, stylistic decisions made in this movie, all of them ostensibly made because Kaufman got a kick out of it.
To be clear, I am not saying something like ‘a sufficiently well-prompted AI could not come up with Being John Malkovich’. What I am saying—which actually feels like a pretty defensible viewpoint—is that very few people would ever think to assemble a prompt that would create Being John Malkovich. This opinion does not require any sort of humanist romanticism, or belief in some vague notion of ‘soul’. What it is grounded in, really, is a fairly basic observation about the structure of human desire: that desire is not a fixed quantity that exists prior to its satisfaction, but something frequently created retroactively by the very thing that satisfies it.
This would not be so bad if it were the only thing happening. The Kaufmans of the world would continue to write their chimpanzees, and the prompt boxes would continue to produce competent variations on existing themes. The two would simply coexist, one deposited directly in the multiplex, the other in the art-house cinema, each serving its respective demographic.
But there is a second thing happening, and it is happening simultaneously.
Since AI is quite good at producing the art that isn’t too strange, I imagine nearly everyone will, in due time, be happy to hand their consumption decisions over to it. Soon, Suno will produce everyone’s music, Midjourney will make everyone’s phone backgrounds, and so on. Yes, it will be slop, not because it is bad, but because it repeats. Generative models are, by their nature, in the business of modeling distributions—trained on everything, they converge toward the most likely regions of that distribution, which means that even when you prompt for something unusual, you are pulling against a gravitational force that wants to return you to the center. The result is that the most common forms of AI output have a flavor, a kind of statistical residue that accumulates across pieces. But most people don’t mind this. They are happy to let the model play the same ophthalmology game with them, because they know they can play that game well, and the results will probably be roughly as good as the last algorithm they played it with.
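To make the gravitational pull literal for a moment, here is a minimal, purely illustrative sketch. The premises and logits are invented, and this is a toy softmax, not any particular model’s sampler; it only shows why temperature-scaled sampling, as commonly deployed, piles probability onto the already-common option.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: lower temperature sharpens the distribution around its mode."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical logits over four story premises; the first is the "common" one.
logits = [3.0, 1.0, 0.5, -1.0]
for t in (1.0, 0.7, 0.3):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# At t=1.0 the rare premises retain some probability mass; by t=0.3 the
# common premise has absorbed nearly all of it. Prompting for the unusual
# means fighting this pull at every sampled step.
```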
And herein lies the problem. Now there is no longer any reason for the multiplex to exist, because the multiplex was never meant to be genuinely unique, and whatever is not unique can instead be entrusted to our own personal, finely tuned filtration technologies, combined with the infinitely patient AI. Is this bad? Not for the consumer! But it does put the artist in a pickle, because their last remaining way of being seen at all, much less standing out, is now to create something like Being John Malkovich. Such a thing cannot easily be made by the AI alone, because it does not submit to the ophthalmology game. And creating something like Being John Malkovich is, I imagine, challenging.
Of course, strangeness has always been a useful strategy for art. Even beyond Charlie Kaufman, the greatest artists of the last century were all a bit off. Joan Didion had an unnerving flatness, describing a woman’s suicide from a sexless marriage in the same sentence as a shopping list. Hunter S. Thompson decided that the reporter should be more interesting than the thing being reported on, and shoved his demented, drug-addled brain into everything he wrote. David Lynch made movies in which the nightmarish mysteries refused to be made legible, remaining instead something you were forced to marinate in for the duration of the film. Importantly, nobody was strange in the same way. Didion’s strangeness is one of temperature. Thompson’s is one of proportion. Lynch’s is one of epistemology. What they share is not really a style, but a willingness to identify the thing that everyone else in their field was doing automatically, unconsciously, and to ask: what if I didn’t?
In their time, these people were made rich, famous, immortalized for doing something as brave as this. Things are different today. Now, to be above the crowd is the minimum required to be visible within it.
This is a very stressful situation, and one that all young artists born into the Bubble-Tanks-obsessed universe could likely sympathize with. They too live in a civilization that is utterly consumed with infinite creative production along the dimensions that matter—for them, Bubble Tanks—and are forced to produce something underneath a sky that has already seen it all. One can only imagine how strange their work is. Importantly, we occupy the antechamber of this world. What is coming next can be seen from where we stand, and our distance from it is decreasing at a rate that makes projection trivial; five years, ten, and the gap collapses into nothing. There is genuine cause for throat-closing anxiety at this prospect.
You can imagine a rather bleak future as the end result of this: one in which someone sits at their screen, asks their friend.com pendant to create an eleven-season series about a 45-year-old Japanese woman and her tsundere relationship with a coworker at the glue factory where she works, and watches the end result with rapt attention. In this hyperatomized future, capital flows only to the frontier model companies and no one else, and nobody has a common language to describe the media they consume to anyone else, since every single piece of media involves just one person in both its creation and its distribution.
But perhaps something better is possible. Consider an alternative future, exactly the same as the first one, with one minor difference: people have moments of intense boredom with what the machine spits out to them, and they decide to go out searching for something that someone else has made, something that does not taste like anything they, in a million years, could’ve ever come up with themselves. Not because they do not have the technical talent to make it themselves—technical talent is precisely the thing that has been commoditized—but because they lack the particular configuration of a life that would lead someone to write that, to make that choice, to include that detail that seems inexplicable, right up until they encounter it and realize it was obvious all along.
I am increasingly optimistic that the second version is the more likely one, if only because popular art already feels increasingly dominated by the strange, the unmistakable, the works with an auteur-esque energy infused into them. To be clear, this is not new. But it used to be a privileged position, something you earned after decades of clawing your way through the studio system, or something you were granted by virtue of being from the correct lineage. Now the privilege has inverted. Now everyone must leave their own distinctive, strange smack on their work, or else disappear entirely. Just take a look around you. The auteur is increasingly colonizing forms of media that once operated on entirely different principles. Substacks, podcasts, technical news: many of the most promising ones today are held up largely by the specific and irreplaceable neuroses of the person producing them. This is strange and new and also very old. It is a return to something like the bardic tradition, in which the story and the storyteller were inseparable.
Of course, none of this is to say that the auteurs are rejecting AI. In fact, the best ones may use it more than anybody else, since the speed at which production will be demanded in the new world will necessitate it. What makes auteurs so special? It is not that they, in their production of the strange, have any claim to particularly fine taste or even soul. Their primary good fortune, often their only one, is that they want something: they desire to tie great iron chains around some particular, ugly concept and drag it behind them wherever they go, clanking and scraping against the pavement, alerting everyone to their embarrassing presence. The machine has no such desire. It is capable of anything and interested in nothing. And the desire for the uncommon, it turns out, is the only part of this whole system that struggles to be automated.
2025-12-03 22:02:48
Published on December 3, 2025 2:02 PM GMT
Let’s say you want to find out roughly how much of a country’s economy is controlled by the state versus the free market. How would you do so? My first instinct would be to look at state spending as a % of GDP, or maybe taxation as a % of GDP. If the state spends 90% of GDP, it’s pretty clear the government controls almost everything, and vice versa. Now, this isn’t a bad approach. State spending is a good indicator. The problem is that it only captures part of reality.
Imagine two countries. In both, the state taxes and subsequently spends 50% of GDP per year (no deficit or surplus). But let’s imagine a few different ways they could have more or less state control.
TLDR: