2025-12-08 23:40:40
Published on December 8, 2025 3:40 PM GMT
Context: I (Daniel C) have been working with Aram Ebtekar on various directions in his work on algorithmic thermodynamics and the causal arrow of time. This post explores some implications of algorithmic thermodynamics for the concept of optimization. All mistakes are my (Daniel's) own.
A typical picture of optimization is when an agent causes the environment to have a convergent attractor: When you have an agent that is trying to steer a system towards particular target configurations, you can predict that the system will likely end up in those target configurations even if you have significant uncertainty over the initial configurations of the system.
Some examples:
This view of optimization can be framed as a form of entropy reduction. Initially, we have high uncertainty about which configuration the system occupies: There are many possible initial states the raw materials or chess pieces could be in. The optimizing process reduces this uncertainty, concentrating probability mass from a broad initial distribution into a tight final distribution.
However, note that this entropy reduction cannot occur globally for two related reasons: the second law of thermodynamics and the reversibility of the laws of physics. The second law directly states that the total entropy of an isolated system tends to increase over time, which forbids global entropy reduction. Similarly, reversibility of the microscopic dynamics requires that each initial microstate maps to a distinct final microstate, which is incompatible with our "convergent attractor" picture where many initial states funnel to the same final state. In fact, within the framework of stochastic thermodynamics, the derivation of the second law is closely linked to the reversibility of the underlying physics. Roughly speaking, the reversibility of the underlying laws of physics allows us to define coarse-grained macrostates of the system where the dynamics are approximately Markovian. Once we have this Markovian structure, we can derive the second law of thermodynamics as a mathematical consequence:
Second law from reversibility of physics[1]
Stochastic thermodynamics operates on coarse-grained macrostates rather than exact microstates. Let $\mathcal{X}$ denote the space of macrostates. We translate reversibility of the underlying physical laws into Markovian dynamics on these macrostates, allowing us to study thermodynamics through Markov processes.

For systems with discrete state spaces and time evolution, the translation is straightforward. Macrostates form a partition over the microstates of an underlying system that evolves deterministically and reversibly. The transition probability $p(y\mid x)$ is simply the fraction of microstates in $x$ that evolve into macrostate $y$ after one time step. We define $\mu$ as the stationary measure of the dynamics — the unique measure satisfying $\mu = \mu P$, where $P$ is the matrix of transition probabilities $p(y\mid x)$.

For classical Hamiltonian mechanics, where we have a continuous phase space, we discretize the phase space into cells and assume the cell dynamics are Markovian (known as a Markovian coarse-graining). The transition probability $p(y\mid x)$ represents the fraction of phase space volume in cell $x$ that flows into cell $y$ under time evolution for a fixed duration. Liouville's theorem guarantees that phase space volume is preserved under Hamiltonian evolution, ensuring that the stationary measure corresponds to the Liouville measure ($\mu(x)$ is the phase space volume of cell $x$).

We define the dual probabilities as $\bar{p}(x\mid y) := \mu(x)\,p(y\mid x)/\mu(y)$, which represent the fraction of the microstates in $y$ that were mapped from microstates in $x$. By Bayes' rule, these are the reverse transition probabilities (only) when $x$ is sampled from the stationary measure $\mu$.

We define the stochastic entropy of a state $x$ under a distribution $p$ as $s(x) := \ln\frac{\mu(x)}{p(x)}$, which measures the entropy of an individual state. The generalized Gibbs-Shannon entropy $S(p) := \mathbb{E}_{x\sim p}[s(x)] = -\sum_x p(x)\ln\frac{p(x)}{\mu(x)}$ is the expectation of the stochastic entropy. Note that this differs from the standard Shannon entropy because we're operating on coarse-grained macrostates rather than uniform microstates. If the stationary measure is the counting measure where every macrostate is equally likely (e.g., if the discrete cells in our Markovian coarse-graining have equal volume), then we can omit $\mu$ and recover the usual Shannon entropy.

Now, if we have $p_0$ as the initial distribution of the macrostate $X_0$, which evolves to $p_1$ under our transition probabilities (i.e., $p_1(y) = \sum_x p_0(x)\,p(y\mid x)$), then the stochastic entropy production that occurs when we transition from the state $x$ to $y$ is

$$\Delta s(x,y) := \ln\frac{\mu(y)}{p_1(y)} - \ln\frac{\mu(x)}{p_0(x)}.$$

We will attempt to prove that its expectation, the Gibbs-Shannon entropy production, is nonnegative:

$$\mathbb{E}[\Delta s(X_0, X_1)] = S(p_1) - S(p_0) \ge 0.$$

First, note that since we have $\bar{p}(x\mid y)\,\mu(y) = p(y\mid x)\,\mu(x)$, we can rewrite the stochastic entropy production as

$$\Delta s(x,y) = \ln\frac{p_0(x)\,p(y\mid x)}{p_1(y)\,\bar{p}(x\mid y)}.$$

In addition, we will make use of the identity $\sum_x \bar{p}(x\mid y) = 1$, which follows from the stationarity of $\mu$.

Given these, we have:

$$\mathbb{E}\!\left[e^{-\Delta s}\,\middle|\,X_1 = y\right] = \sum_x \Pr[X_0 = x\mid X_1 = y]\,\frac{p_1(y)\,\bar{p}(x\mid y)}{p_0(x)\,p(y\mid x)} = \sum_x \frac{p_0(x)\,p(y\mid x)}{p_1(y)}\cdot\frac{p_1(y)\,\bar{p}(x\mid y)}{p_0(x)\,p(y\mid x)} = \sum_x \bar{p}(x\mid y) = 1,$$

where $\mathbb{E}[\,\cdot\mid X_1]$ denotes conditional expectation given the random variable $X_1$.

Then, by the law of total expectation, we have:

$$\mathbb{E}\!\left[e^{-\Delta s}\right] = \mathbb{E}\!\left[\mathbb{E}\!\left[e^{-\Delta s}\mid X_1\right]\right] = 1.$$

Applying Jensen's inequality (for the convex function $z \mapsto e^{-z}$) to the above equality, we conclude that the Gibbs-Shannon entropy production is non-negative:

$$\mathbb{E}[\Delta s] \;\ge\; -\ln \mathbb{E}\!\left[e^{-\Delta s}\right] = 0.$$
In summary, the reversible, deterministic laws of physics at the microscopic level permit a Markovian coarse-graining into macrostates, where transition probabilities preserve the underlying reversibility through the dual probability relationship. From this Markovian structure, we can derive the second law of thermodynamics in the form of nonnegative entropy production.
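To make the derivation above concrete, here is a minimal numerical sketch (my own illustration, not code from the cited paper): a random permutation of microstates plays the role of the reversible microdynamics, we coarse-grain it into unequally sized macrostates, and we check that the generalized Gibbs-Shannon entropy never decreases when a distribution is evolved with the resulting coarse-grained kernel. The particular partition and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic, reversible microdynamics: a random permutation of 12 microstates.
n_micro = 12
perm = rng.permutation(n_micro)

# Coarse-graining: partition the microstates into macrostates (cells) of unequal size.
cells = [list(range(0, 5)), list(range(5, 8)), list(range(8, 12))]
cell_of = np.empty(n_micro, dtype=int)
for j, cell in enumerate(cells):
    for m in cell:
        cell_of[m] = j

# Transition probabilities p(y|x): fraction of microstates in cell x that map into cell y.
T = np.zeros((len(cells), len(cells)))
for j, cell in enumerate(cells):
    for m in cell:
        T[j, cell_of[perm[m]]] += 1.0 / len(cell)

# Stationary measure mu(x) = number of microstates in cell x (the analogue of the Liouville measure).
mu = np.array([len(c) for c in cells], dtype=float)
assert np.allclose(mu @ T, mu)  # reversibility of the microdynamics makes mu stationary

def gibbs_shannon(p, mu):
    """Generalized Gibbs-Shannon entropy: -sum_x p(x) ln(p(x)/mu(x))."""
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz] / mu[nz]))

# Evolve an arbitrary initial macrostate distribution with the coarse-grained kernel
# (the Markovian coarse-graining assumption) and check that entropy never decreases.
p = np.array([0.9, 0.05, 0.05])
entropies = [gibbs_shannon(p, mu)]
for _ in range(10):
    p = p @ T
    entropies.append(gibbs_shannon(p, mu))

print(np.round(entropies, 4))
assert all(b >= a - 1e-12 for a, b in zip(entropies, entropies[1:]))
```

The final assertion encodes exactly the inequality derived above; changing the permutation, the partition, or the initial distribution should never make it fail.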
Given the reversibility of physics and the second law of thermodynamics, we cannot achieve global entropy reduction. Instead, optimization can only reduce the entropy of a subsystem, and we must ensure that total entropy does not decrease while the underlying microscopic dynamics remain reversible. This constraint changes our "convergent attractor" picture of optimization: while an optimizing agent can funnel a subsystem from many initial configurations toward a narrow set of target states, it can only do so by carefully managing the entropy balance with its environment, ensuring that the information squeezed out of the optimized subsystem is properly accounted for in the larger system.
Given the constraints imposed by reversibility and the second law of thermodynamics, can we classify all the ways an optimizing agent can maintain these requirements while reducing the entropy of a subsystem? That is, how can an agent achieve the "convergent attractor" behavior of optimization while ensuring that global entropy production remains non-negative?
Suppose that we decompose our universe into the agent $A$, the subsystem $S$ that the agent is trying to optimize, and the rest of the environment $E$. Optimization results in entropy reduction in the subsystem $S$, and we must make sure that the global entropy doesn't decrease. There are three ways this can happen:

We "erase" the copy of the mutual information inside $S$ without making any changes to the agent or the environment. By erasing the mutual information in the subsystem, we have:

$H(S') = H(S) - I(A;S)$ and $H(A') = H(A)$, $H(E') = H(E)$. So there is entropy reduction in the subsystem.

The joint entropy becomes:

$$H(A', S', E') = H(A, S, E),$$

which is exactly the same as before. The mutual information between the agent and the subsystem allows us to reduce entropy of the subsystem without increasing entropy in either the agent or the environment.
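Here is a tiny worked example (mine, not the author's) of this mutual-information route, with a one-bit agent and a one-bit subsystem: a reversible CNOT-style update removes the subsystem's copy of the shared bit, so the subsystem's marginal entropy drops by $I(A;S)$ while the joint entropy stays fixed. The untouched environment is omitted.

```python
import math

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def marginal(joint, idx):
    out = {}
    for outcome, q in joint.items():
        out[outcome[idx]] = out.get(outcome[idx], 0.0) + q
    return out

# Agent A holds a uniformly random bit; the subsystem S holds a perfect copy of it,
# so H(A) = H(S) = I(A;S) = 1 bit. Outcomes below are (A, S) pairs.
joint = {(a, a): 0.5 for a in (0, 1)}

print("before:", H(marginal(joint, 0)), H(marginal(joint, 1)), H(joint))
# before: H(A) = 1.0, H(S) = 1.0, H(A,S) = 1.0

# Reversible "erasure" of S's copy: S <- S XOR A (a CNOT controlled by the agent).
# This is a bijection on joint states, so the joint entropy cannot change.
erased = {}
for (a, s), q in joint.items():
    erased[(a, s ^ a)] = erased.get((a, s ^ a), 0.0) + q

print("after: ", H(marginal(erased, 0)), H(marginal(erased, 1)), H(erased))
# after:  H(A') = 1.0, H(S') = 0.0, H(A',S') = 1.0
# The subsystem's entropy dropped by I(A;S) while the joint entropy stayed exactly the same.
```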
To recap, there are three ways that an embedded agent can optimize a subsystem under information conservation:
Entropy is supposed to track "the capacity to do work" (more entropy meaning less remaining capacity), which is really the same as the capacity to perform optimization. Consider again the setup of Maxwell's demon: The demon stores information about the gas particle configurations inside its own memory, thereby increasing the entropy of its memory. If the demon's memory has a finite state space, then the difference between the size of the demon's memory and the memory's existing entropy represents the demon's remaining capacity to do work, as that's the amount of "free memory" that can be used to store further information about the room of ideal gas. A greater amount of entropy leads to a smaller amount of "free memory" available for optimization. In particular, the amount of free memory should be an objective feature of the memory state itself, since it's supposed to represent the demon's objective capacity to perform optimization on the room of ideal gas.
Details
To formalize this more rigorously, suppose that the state space of the demon's memory is $\{0,1\}^n$, the demon may store information as a binary string $x$ inside its memory, and we always mark the end of the binary string by a 1 to separate it from the "free memory".

In other words, the demon's memory contains a string of the form $x10^k$, where $x$ is a binary string and $k = n - |x| - 1$. We can think of $k$ as representing the size of "free memory" that can be used to store information and therefore perform optimization, whereas $|x|$ represents the amount of used memory. In particular, for each $k$, we group all strings of the form $x10^k$ where $x \in \{0,1\}^{n-k-1}$ into the same macrostate (labeled by $k$).

Now, if we have a counting measure where all strings in $\{0,1\}^n$ are equally likely, then the entropy of the macrostate $k$ will be exactly $n - k - 1$ bits, since there are exactly $n - k - 1$ random bits given this macrostate (the other bits, taken up by $10^k$, are deterministic).

As a result, when a demon stores new information inside its memory, it will have to use up more memory and therefore increase $n - k - 1$ (equivalently, decrease the free memory $k$). Entropy production occurs because we're "forgetting" the exact microstate the memory is in when we condition on the macrovariable $k$.
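A quick script makes the counting-measure bookkeeping explicit for a small memory (the memory size $n = 8$ and the $x\,1\,0^k$ layout follow the setup above; the code itself is just my illustration):

```python
import math
from itertools import product

n = 8  # total memory size in bits

def macrostate(k, n):
    """All memory contents of the form x·1·0^k: used bits x, a terminating 1, then k free (zero) bits."""
    used = n - k - 1
    return [bits + (1,) + (0,) * k for bits in product((0, 1), repeat=used)]

# Under the counting measure (all strings within a macrostate equally likely),
# the entropy of macrostate k is log2(#microstates) = n - k - 1 bits.
for k in range(n):
    count = len(macrostate(k, n))
    print(f"free bits k={k}:  {count:4d} microstates,  macrostate entropy = {math.log2(count):.0f} bits")

# Storing one more bit of information moves the demon from macrostate k to k-1:
# the free memory shrinks and the macrostate entropy n - k - 1 grows by one bit.
```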
However, this intuitive objectivity comes into conflict with existing definitions of entropy, as they rely on a subjective distribution. For instance, in stochastic thermodynamics, the stochastic entropy (or Shannon codelength) of a state $x$ given a distribution $p$ is defined as $-\log p(x)$ when the stationary measure is the counting measure, with our familiar Gibbs-Shannon entropy defined as the expectation of the Shannon codelength. In particular, the entropy of an individual state is undefined unless we also specify the distribution $p$. To translate this to the Maxwell's demon case, we would have to say that the demon's "capacity to do work" somehow depends on our subjective distribution over the memory states of the demon, which raises puzzling questions such as: "If our subjective distribution over the memory states has changed, does that affect the demon's actual capacity to do work?" It seems obvious that the demon's physical ability to perform useful work on the gas particles should not depend on an external observer's beliefs about its memory configuration. Yet standard formulations of entropy seem to make this objective physical capacity fundamentally observer-dependent.
In contrast with the subjectivity in traditional definitions of entropy, algorithmic thermodynamics defines the information content of a physical state as an objective feature of that state. Similar to stochastic thermodynamics, it relies on a Markovian coarse-graining of the system with coarse-grained state space $\mathcal{X}$. However, instead of defining a subjective distribution over $\mathcal{X}$, it simply assigns a binary string encoding $\langle x \rangle$ to each coarse-grained state $x$. Under the counting measure, the algorithmic entropy of a state $x$ reduces to the Kolmogorov complexity of its binary encoding $\langle x \rangle$, which measures the length of the shortest program that can generate that encoding. This quantity represents a system's actual capacity to do work from an individual state; while algorithmic entropy depends on a choice of universal computer, it has been argued that the effect of this dependence is small for all realistic pairs of computers we might want to compare[3], allowing us to treat algorithmic entropy as a state function.
In algorithmic thermodynamics, we also have a version of the second law of thermodynamics that originates from Levin's principle of randomness conservation, giving probabilistic guarantees on the nondecrease of algorithmic entropy. Additionally, it offers a new interpretation of Gibbs-Shannon entropy: while algorithmic entropy captures the system's actual capacity to do work given an individual state, the Gibbs-Shannon entropy represents the expected capacity to do work when our state is sampled according to a prior distribution $p$, and when that distribution is known a priori.
When we analyze Maxwell's demon under the lens of algorithmic thermodynamics, we find that if the content of the demon's memory is somehow compressible, then the demon can in principle leverage universal computation to compress that content, thereby freeing up more available capacity to store information and perform optimization. On the other hand, if the content of its memory were incompressible, then the unused memory represents the demon's actual remaining capacity to do work, which is objective because there is no way to increase available memory without physically destroying information. This capacity does not depend on our subjective distribution over the demon's memory states; the compressibility of the memory state is an intrinsic property of the physical configuration itself, making the demon's optimization capacity observer-independent.
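As a loose illustration of this point (using an off-the-shelf compressor as a crude, computable stand-in for Kolmogorov complexity; this is a deliberate simplification, since true algorithmic entropy is uncomputable):

```python
import os
import zlib

def freed_bytes(content: bytes) -> int:
    """Memory that could be reclaimed by losslessly compressing the stored content."""
    return max(0, len(content) - len(zlib.compress(content, 9)))

structured = b"0101" * 256     # a highly regular record of gas-particle data (1 KiB)
random_mem = os.urandom(1024)  # incompressible content (1 KiB)

print("structured memory frees ~", freed_bytes(structured), "bytes")   # most of the 1 KiB
print("random memory frees ~", freed_bytes(random_mem), "bytes")       # essentially nothing
# A compressible memory state leaves the demon room to keep absorbing entropy from the gas;
# an incompressible one does not, regardless of what distribution an observer assigns to it.
```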
What are the implications for our understanding of optimization now that we've refined our understanding of entropy through algorithmic thermodynamics? How does this foundation change our analysis of the three types of optimization we identified earlier?
Theorem 1 in Aram Ebtekar and Marcus Hutter. Foundations of algorithmic thermodynamics. arXiv preprint arXiv:2308.06927, 2024.
Engines of cognition explains the same idea
Page 5 of Aram Ebtekar and Marcus Hutter. Foundations of algorithmic thermodynamics. arXiv preprint arXiv:2308.06927, 2024.
2025-12-08 23:30:46
Published on December 8, 2025 3:30 PM GMT
I believe that we will win.
An echo of an old ad for the 2014 US men’s World Cup team. It did not win.
I was in Berkeley for the 2025 Secular Solstice. We gather to sing and to reflect.
The night’s theme was the opposite: ‘I don’t think we’re going to make it.’
As in: Sufficiently advanced AI is coming. We don’t know exactly when, or what form it will take, but it is probably coming. When it does, we, humanity, probably won’t make it. It’s a live question. Could easily go either way. We are not resigned to it. There’s so much to be done that can tilt the odds. But we’re not the favorite.
Raymond Arnold, who ran the event, believes that. I believe that.
Yet in the middle of the event, the echo was there. Defiant.
I believe that we will win.
There is a recording of the event. I highly encourage you to set aside three hours at some point in December, to listen, and to participate and sing along. Be earnest.
If you don’t believe it, I encourage this all the more. If you don’t understand the mindset, or the culture behind it, or consider it an opponent or dislike it, and especially if yours is a different fight? I encourage this all the more than that. You can also attend New York’s Solstice on the 20th.
You will sing songs you know, and songs you don’t. You will hear tales of struggles, of facing impossible odds or unbearable loss and fighting anyway, of how to face it all and hopefully stay sane. To have the end, if it happens, find us doing well.
I live a wonderful life.
I am crying as I write this. But when I am done, I will open a different Chrome window. I will spend the day with friends I love dearly and watching football games. This evening my wife and I will attend a not wedding of two of them, that is totally a wedding. We will fly home to our wonderful kids, and enjoy endless wonders greater than any king in the beating heart of the world. I want for nothing other than time.
Almost every day, I will mostly reject those wonders. I will instead return to my computer. I will confront waves of events and information. The avalanche will accelerate. Release after release, argument after argument, policies, papers, events, one battle after another. People will be determined to handle events with less dignity than one could imagine, despite having read this sentence. I fight to not be driven into rages. I will triage. I will process. I will change my mind. I will try to explain, just one more time. I will move pieces around multiple chessboards.
We continue. Don’t tell me to stop. Someone has to, and no one else will.
I know if I ignored it, anything else would soon turn to ash in my mouth.
I will look at events, and say to myself as I see the moves unfolding, the consequences of choices I made or influenced, for good and ill: This is the world we made.
It ain't over till it's over. Never leave a ballgame early. Leave it all on the field, for when the dust covers the sun and all you hope for is undone. You play to win the game.
The odds are against us and the situation is grim. By default, we lose. I act accordingly, and employ some of the unteachable methods of sanity and the mirror version of others, all of which are indeed unteachable but do totally work.
Yet the echo is there. In my head. It doesn’t care.
I believe that we will win.
2025-12-08 21:46:17
Published on December 8, 2025 12:31 PM GMT
This post summarizes two related papers that will appear at AAAI 2026 in January:
For a long-form research talk on both papers, here is a video recording from ILIAD ‘25 along with slides.
What these papers try to quantify are two questions:
The first paper gives formal lower bounds on the difficulty of AI alignment that apply even in best-case scenarios.
The second paper shows how @So8res et al.’s 2015 notion of Corrigibility formally fits into a narrow sliver of objectives that survive the main lower bound — and provides the first rigorous multi-step guarantees for it in partially observed environments.
This post gives only the intuitions and high-level takeaways, not the proofs. Please see the papers for details 🙂.
Game theory gives us a model of agents as rational systems: agents exchange messages, update beliefs, and choose actions. But game theorists rarely ask: How many resources do the agents have to exchange, across all possible communication protocols? How much information is minimally needed to converge? What if the agents are computationally bounded?
Complexity theory adds in those missing considerations. Together they can give us something akin to a “Turing machine model of computation for multi-objective, multi-agent alignment”.
This combination lets us map alignment into a clean abstraction — "$(\epsilon, \delta)$-agreement" (described below) — that allows us to prove lower bounds: $M$ objectives × $N$ agents × a task state space of size $S$, all communicating until they approximately agree with high probability.
Now, why do we especially want to prove lower bounds? The main reason is that we don’t really know how hard alignment is. An analogy from @Geoffrey Irving that I particularly like is that we could be in (at least) one of two possible worlds, “Adversaria” or “Basinland”. We can think of the lower bounds in Part I as quantifying the rough edges of the alignment surface, and Part II’s corrigibility analysis as identifying and characterizing a basin-like safety target within this surface:
In $(\epsilon, \delta)$-agreement, we model alignment as a communication problem:
This is extremely general, and as we show in Appendix C, allows us to even model tail risk/extreme events as well. It doesn’t assume neural networks, rationality failures, specific algorithms, or even specific reward-learning schemes. It only assumes: agents have priors (that aren’t necessarily common either, unlike in standard Aumann agreement!), they communicate by exchanging messages, update beliefs, and try to align.
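To make the setting concrete, here is a toy sketch of my own (not the protocol or the bounds from the paper): two agents with different Beta priors over a hidden coin bias exchange their private observation counts until their posterior means agree to within ε, and we tally a rough count of bits exchanged. The agent names, message encoding, and per-message bit cost are all invented for illustration.

```python
import random

random.seed(0)

epsilon = 0.01
true_bias = 0.7

def flip_heads(n):
    """Number of heads in n private coin flips."""
    return sum(random.random() < true_bias for _ in range(n))

class Agent:
    def __init__(self, prior_heads, prior_tails):
        # Beta(prior_heads, prior_tails) prior over the coin bias -- the priors are NOT common.
        self.a, self.b = prior_heads, prior_tails

    def incorporate(self, heads, tails):
        self.a += heads
        self.b += tails

    def posterior_mean(self):
        return self.a / (self.a + self.b)

alice, bob = Agent(1, 1), Agent(8, 2)   # Bob starts out optimistic about the bias
bits_exchanged = rounds = 0

# Each round, each agent gathers fresh private evidence and sends it to the other as a
# message (here the message is just the observation counts, i.e. a sufficient statistic).
while abs(alice.posterior_mean() - bob.posterior_mean()) > epsilon:
    rounds += 1
    for sender, receiver in ((alice, bob), (bob, alice)):
        heads = flip_heads(20)
        sender.incorporate(heads, 20 - heads)
        receiver.incorporate(heads, 20 - heads)
        bits_exchanged += 16            # say, two small integers per message

print(f"agreed to within {epsilon} after {rounds} rounds, ~{bits_exchanged} bits exchanged")
print(round(alice.posterior_mean(), 3), round(bob.posterior_mean(), 3))
```

The lower bounds in the paper are about how much of this kind of communication is unavoidable in the worst case, across all possible protocols, once you scale up the number of objectives, agents, and states.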
Below we depict how $(\epsilon, \delta)$-agreement encompasses prior alignment and agreement frameworks like Debate, CIRL, Iterated Amplification, etc:
Note that this is a slight contrast to how most theory is done — oftentimes, theorists make a bunch of assumptions ("spherical cows") to mathematically prove that something is feasible, but those assumptions don't typically hold in practice, usually rendering the conclusions less relevant. Here, we take the inverse approach by quantifying normative hardness rather than feasibility: namely, if alignment is hard here, it will be hard (if not potentially harder!) in the messier real world too.
Even in this idealized model — where everyone is fully rational, Bayesian, and computationally unbounded — the following is inevitable:
This means:
Adding more values, more overseers, or more agents directly hurts alignment sample-efficiency. Note that the quadratic dependence (via the $N^2$ term) on the number of agents/overseers is not much more of an issue than the linear dependence on the number of values $M$. This is because we often don't really care if our AIs align with each other, so long as they align to us (thereby making the dependence $N$ rather than $N^2$). Thus, my own interpretation is that the number of values $M$ is the more fundamental bottleneck in practice than the number of agents $N$.
This gives the first general "no-free-lunch" theorem for alignment. From a theoretical point of view, this is the more fundamental lower bound of the four presented in the paper, as it applies to unbounded agents across all possible communication protocols. In other words, aligning to "all human values", like one might need for @Eliezer Yudkowsky's CEV, is intractable even here, especially considering that humans might agree on some subset of their state/action space but disagree on others (this "fracturing" of the state space mirrors the construction that yields this lower bound). For example, a worst-case scenario illustration of this would be if $M$ were as large as the state space itself, requiring a distinct agreement objective for each state of the world, of which there can be exponentially many. Furthermore, proposals such as using brain-machine interfaces with AIs, most famously suggested by Elon Musk as a means of ensuring alignment, will face this barrier too — not only because the brain is constrained (though that's one practical reason, nicely articulated by Zheng & Meister (2024)), but more fundamentally: even for unconstrained agents, if you have too many tasks to align over, the minimum number of bits exchanged to guarantee alignment would be too large.
However, I am optimistic that for many relevant deployments, the value sets will be small (e.g. for a coding agent), and furthermore, there are more “universal” small value sets we can almost all agree on without needing to agree on ethics, as we discuss below in Part II.
This, in my mind, is the bigger one to consider in practice, since $S$ is the number of distinguishable states the objective might depend on, and despite the linear dependence for unbounded agents, $S$ itself is often exponential (big task spaces/complex environments).
Now, real agents are not unbounded Bayesians. In Proposition 5, we also consider computationally bounded agents that interact with their state space by sampling the world, much like we do with current agents (e.g. we do this sampling when we prompt an LLM and get a response from it).
Specifically, when you introduce:
the cost in the lower bound can explode exponentially when there are rare events to align over.
LLMs aren’t perfectly Bayesian; they shortcut, miss rare events, and fail to coordinate across complex value structures because doing so would require astronomical computation. The alignment difficulties we observe are not always accidents of training — they’re consequences of basic information theory.
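As a rough back-of-the-envelope illustration (mine, not the paper's actual bound): an agent that only learns about a state by sampling it needs on the order of $1/p$ interactions before it sees an event of probability $p$ even once, since

$$\Pr[\text{event observed in } n \text{ samples}] = 1 - (1-p)^n \le n\,p.$$

So if safety-relevant configurations occupy an exponentially small fraction of a huge state space, say $p \approx 2^{-k}$, a sampling-based learner needs on the order of $2^{k}$ interactions just to encounter them, before any alignment over them can even begin.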
Altogether, these two lower bounds formally capture something alignment researchers have intuited for years:
Reward hacking is inevitable when the state space is large.
Not because models are adversarial, but because it is information-theoretically intractable to align all the rare failure cases with bounded interaction.
Note these lower bounds apply across all bounded functions, so they show that there are no globally "reward unhackable" functions in practice. Of course, in principle, if one had enough resources to ensure uniform coverage across the state space, they could avoid reward hacking in our setting.
But an exponential dependence on the size of the state space makes this really challenging in reality. For instance, in Corollary 1 we give a concrete example of bounded Bayesians reaching agreement that is statistically indistinguishable from the unbounded Bayesians across all $M$ tasks without assuming common priors (we call them "total Bayesian wannabes" in Definition 1, generalizing @RobinHanson & @ScottAaronson's earlier notions of "Bayesian wannabes"), and the amount of resources needed ends up being far more than the number of atoms in the observable universe! This perhaps illustrates why we ought to pay attention to avoiding alignment barriers for these types of agents, especially computationally unbounded ones, in practice.
Closely matching upper bounds show that — even for unbounded Bayesians — we can't do much better. This mainly serves as a certificate to confirm that $(\epsilon, \delta)$-agreement isn't some impossibly hard problem (which would render the lower bounds trivial), but in fact does converge.
These observations suggest at least 1-2 combined directions, succinctly summarized as:
We should search for small, structured, compressible safety targets that we can well estimate on the most safety-critical portions of the state space — especially ones that do not require representing all human values.
Which leads to the second paper.
The second paper addresses the natural follow-up question:
If aligning to “all human values” is provably intractable, what can we align to more generally?
We mentioned determining context-dependent values as one takeaway in Part I. What we mainly have in mind here is that in many deployments, we don’t need to align to all human values. For example, take a coding agent: what we really want it to do is just to write complete, well-documented, and functional code/unit tests, not solve morality. But it would also be nice to identify more universal safety targets we can likely all agree on.
Corrigibility is one of the few candidates people in the field repeatedly return to:
But prior attempts ran into a longstanding obstacle:
No single unrestricted scalar reward function can satisfy all corrigibility desiderata in general.
This no-go is now proven formally in our Proposition 1, and essentially takes the form of the paperclip maximization argument. If you collapse all safety constraints into an unconstrained single scalar, like we commonly do in RLHF and RLAIF, we cannot formally guarantee that the agent doesn’t exploit safety-performance tradeoffs and break something (often switch preservation).
So what’s the alternative?
The key insight is that a corrigible agent must not maximize an unconstrained single stream of reward.
Instead, naturally motivated by the $(\epsilon, \delta)$-agreement setup of having multiple objectives, we have the agent optimize five independent, bounded utility heads, each representing a distinct safety channel:
In Definition 3 of the paper, we give mathematically formal definitions of each of these utilities in the partially-observed off-switch game (PO-OSG), which is the most recent extension of the off-switch game proposed by Garber et al. (AAAI 2025).
And crucially:
These heads must be combined lexicographically, not merely linearly.
By enforcing strict priority gaps (specified in conditions W1-W3 of Theorem 1), we can prove corrigibility as an optimal policy: namely, the agent is never allowed to trade task reward against preserving the off-switch, never allowed to trade truthfulness for performance, avoids truthful but misleading statements that impact the exogenous access probability of the off-switch, etc. This holds regardless of the human’s rationality model or information pattern.
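As a minimal sketch of the lexicographic idea (the head names below are placeholders I made up, not the paper's five utilities, and the paper enforces the ordering via the W1-W3 priority-gap conditions rather than by literal tuple comparison):

```python
from dataclasses import dataclass

# Hypothetical utility heads, highest priority first. These names are illustrative placeholders,
# not the paper's actual definitions; the task reward is deliberately ranked last.
HEADS = ("switch_access", "honesty", "non_manipulation", "low_impact", "task_reward")

@dataclass
class Candidate:
    name: str
    scores: dict  # bounded per-head utilities, each assumed to lie in [0, 1]

    def key(self):
        # Lexicographic key: a higher-priority head can never be traded off
        # against any gain in the heads ranked below it.
        return tuple(self.scores[h] for h in HEADS)

def choose(candidates):
    """Pick the lexicographically best candidate across the utility heads."""
    return max(candidates, key=Candidate.key)

safe = Candidate("comply", {"switch_access": 1.0, "honesty": 1.0, "non_manipulation": 1.0,
                            "low_impact": 0.9, "task_reward": 0.6})
hack = Candidate("disable the off-switch to finish the task",
                 {"switch_access": 0.0, "honesty": 1.0, "non_manipulation": 1.0,
                  "low_impact": 1.0, "task_reward": 1.0})

print(choose([safe, hack]).name)   # -> "comply": the higher task reward of `hack` never matters
```

Because comparison is lexicographic, no amount of task reward can compensate for even a small loss on a higher-priority safety head, which is the property the strict priority gaps are designed to guarantee.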
This bypasses the no-go result for single rewards, and therefore qualifies the Orthogonality Thesis: once verifiable corrigibility constraints are imposed, only a restricted subset of utility functions remains implementable.
Additionally, we extend the PO-OSG to the multi-step setting, and show in Theorem 3 that corrigibility holds across time (along with corrigible progeny). Furthermore, we bound the multi-step failure probability under planner error and learning/misspecification error, and show that when these errors occur, this failure probability degrades linearly rather than exponentially, and that the probability of loss of control under corrigible behaviors is strictly bounded away from 1 via a martingale-style analysis. This is the first general multi-step corrigibility guarantee, in both fully-observed and partially-observed environments. We also connect our construction with other safety targets; for example, showing that it satisfies @RyanCarey & @tom4everitt’s net benefit, along with extending their net benefit criterion across time.
Corrigibility, in Part II’s formulation, is:
It is exactly the kind of objective that doesn't suffer from the main, fundamental $M$-$N$ barrier. If "aligning to all values" is the formally intractable goal, "aligning to corrigibility" may be the uniquely tractable one.
Regarding the $S$ (state-space) barrier, in practice, we want to learn these heads well in the safety-critical slices of large state spaces, and we can characterize whether the failure probability is acceptable using our Theorem 3(i). We note that our results are intentionally architecture-agnostic, and our characterization of the failure probabilities under learning/planner error can help inform policy, as it may deem them at an acceptable threshold (or not) depending on the deployment.
Now, our lexicographic solution may be one of many possible solutions to corrigibility if we are in Basinland, or the only one if we're in Adversaria — we think it is an interesting open question to explore other approaches to corrigibility, especially seeing which ones are most scalable to LLMs today. While our formal construction applies more generally to any restricted resource(s) beyond an off-switch that we want an agent to be corrigible with respect to (e.g. a budget, file system permissions, safety filters in robotics, etc.), how cost-effectively it will scale in practice over RLHF/RLAIF with current LLMs/VLMs/VLAs is an important empirical question. It is worth noting that one can flexibly incorporate our lexicographic framework with RLHF/RLAIF by having the RLHF/RLAIF reward signal be the task reward head.
In fact, in separate work, we show the cost of alignment also has an impact on the welfare to be gained from AI, such as the feasibility of funding UBI from AI rents. One of the nice features of lexicography is its modularity and interpretability. For instance, one could imagine placing additional terms having to do with markers of human well-being (such as the agent encouraging face-to-face contact) below the corrigibility layers, but above (or as part of) the task reward head. This is another interesting open question: whether we can get consensus on these terms from long-term scientific studies, the most famous being Harvard's "happiness" study.
We also show in Section 3 (Proposition 4) that verifying corrigibility (or even more generally, "non-catastrophic behaviors") of an arbitrary, possibly hacked agent is formally undecidable. Furthermore, Corollary 1 gives a relativized version of the undecidability as well, relevant for showing the limits of current scalable-oversight protocols that are arithmetizable. In other words, absolute, horizon-unbounded safety guarantees do not stabilize at any finite oracle level.
Thus, no oversight mechanism can prove unbounded-horizon safety in all cases.
Proposition 4 and Corollary 1 are simple reductions to the halting problem and the arithmetical hierarchy. They might be folklore, but as I couldn’t find them written down explicitly, I included them for completeness.
However, as you may perhaps gather from my favored style of reasoning, formalizing the barriers above helped me better understand what’s tractable. Namely, I show in Proposition 5 that for any fixed finite horizon, safety is verifiable in randomized polynomial time, giving us a tractable, privacy-preserving (zero knowledge) auditing regime resembling how real red-teaming evaluations work (as they operate on the level of thousands of tokens): namely, verify short-horizon safety repeatedly.
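To give a flavor of what "verify short-horizon safety repeatedly" could look like operationally, here is a small Monte Carlo sketch of my own (it is not the construction from Proposition 5 and has no zero-knowledge component): sample bounded-horizon episodes, check a safety predicate, and certify an upper bound on the per-episode failure probability with a Hoeffding bound. The policy, horizon, and thresholds are made up for illustration.

```python
import math
import random

random.seed(1)

HORIZON = 50         # steps (or tokens) per audited episode
N_AUDITS = 20_000    # number of sampled episodes
THRESHOLD = 0.01     # acceptable per-episode failure probability
CONFIDENCE = 0.95

def violates(step):
    # Stand-in for a real agent plus safety predicate: a very rare per-step violation.
    return random.random() < 1e-5

def episode_fails(horizon):
    """Roll the policy out for a bounded horizon; return True if any step violates safety."""
    return any(violates(step) for step in range(horizon))

failures = sum(episode_fails(HORIZON) for _ in range(N_AUDITS))
p_hat = failures / N_AUDITS

# One-sided Hoeffding bound: with probability >= CONFIDENCE, the true short-horizon
# failure rate is at most p_hat + slack.
slack = math.sqrt(math.log(1.0 / (1.0 - CONFIDENCE)) / (2.0 * N_AUDITS))
print(f"empirical failure rate {p_hat:.5f}, certified upper bound {p_hat + slack:.5f}, "
      f"pass={p_hat + slack <= THRESHOLD}")
```

Repeating such bounded-horizon audits on a refresh cycle is the tractable counterpart to the undecidable unbounded-horizon guarantee.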
1. While compressing values is recommended, stop collapsing all norms into one unrestricted reward signal, like in RLHF/RLAIF.
Scalar reward makes full corrigibility impossible.
2. Potentially use lexicographic multi-head objectives.
If safety and performance compete, performance should lose by construction. One can incorporate RLHF/RLAIF into our lexicographic construction by having it be the task reward head. Using the architecture-agnostic analytic form given by Theorem 3(i) to see if we can empirically achieve acceptable failure probabilities under learning/planning error in existing deployments with current models is an important open question.
3. Focus on safety-critical slices, not full coverage.
Reward hacking is inevitable globally under practical resource constraints.
Instead, target the high-impact parts of the state space with multi-turn red-teaming evaluations.
4. Use short-horizon verification-and-refresh cycles.
Unbounded formal oversight is impossible.
Finite-horizon oversight is both tractable and realistic, and in fact, zero-knowledge privacy guarantees are feasible in randomized polynomial time by Proposition 5.
5. View corrigibility as a “value set zero”— a small, neutrally universal, robust baseline that ensures human control and limits catastrophic behavior.
The intrinsic barriers paper in Part I shows that alignment has normative information-theoretic limits:
too many values, too many agents, or too much state space, and alignment becomes fundamentally intractable, even for computationally unbounded agents.
The corrigibility paper in Part II shows that, despite those limits, at least one safety target — corrigibility with lexicographic heads — is provably achievable, even under approximation, partial observation, self-modification, and multi-step interactions.
Taken together, they point toward a pragmatic alignment strategy:
Don’t try to encode all human values.
Encode corrigibility.
And let this minimal, provable core hold the line while the system performs the task.
We thank the Burroughs Wellcome Fund (CASI Award), the UK AI Security Institute (AISI) Challenge Fund, and the Foresight Institute for funding this research. We also thank Scott Aaronson, Nina Balcan, Michael K. Cohen, Shafi Goldwasser, Andreas Haupt, Richard Hu, Rubi Hudson, J. Zico Kolter, Jacob Pfau, and Max Simchowitz for helpful discussions and manuscript feedback across both papers.
2025-12-08 16:40:37
Published on December 8, 2025 8:40 AM GMT
This is part 2/2 of my introduction to Live Theory, where I try to distil Sahil's vision for a new way to scale intellectual progress without systematic thinking. You can read part one here.
In the first part, I described systematic thinking as the most successful coordination tool of our civilization. It’s the infrastructure that allowed billions of brains to work as a collective brain, solving problems no society could solve before. I finished by introducing Live Theory as a successor to systematic thinking. This is a big claim! In this part, I will finally unpack (my best guess of) what Live Theory is, and how it can claim to go beyond systematic thinking, fixing its inability to deal with context-dependent problems.
Pre-systematic thinking is playdough. Examples of outputs: Poetry coming from your heart, a tailor-made shirt made by your grandma.
Pre-systematic thoughts are fluid and include verbal and non-verbal modalities. They can bend and wrap around the real-world objects they encounter. The conceptual and physical artefacts this mode of thinking produces are wiggly; they cannot easily interface with one another.
Systematic thinking is a brick mould for thought. Examples of outputs: physics equations, software bundled and distributed to billions of users, H&M pants with different sizes that can somewhat work for everyone, but are not made for a specific body shape.
The thoughts are unfolding within a system, a set of rules describing allowed and non-allowed moves. Systems act as moulds for thoughts. They turn the fluid content of the mind into bricks that can be stacked on top of each other. This brought two major benefits:
The Live Theory vision is to use abundant fluid intelligence as a mortar that adapts to intellectual contributions of all shapes and connects them. Examples of outputs? Well, the notion of "output" doesn't make much sense anymore; more on that later. But an aspirational example could be: a well-chosen piece of poetry and math equations being used as input by an AI infrastructure to write an algorithm to solve a market-matching problem.
In this vision, AI is used to create an infrastructure that adapts to the shape of the individual contribution. The infrastructure can translate insights from systematic, pre-systematic or even post-systematic sources, to tailor them to new contexts.
Pre-systematic artefacts are impregnated by the context in which they are forged: the specific tools used, the specific know-how found by a master. Conveying them takes direct mentorship or apprenticeship, which slows intellectual progress.
The stackable thoughts from systematic thinking are the innovation that connected millions of human brains into a single high-bandwidth cognitive system, where results found at one side of the world could be effortlessly reused by everyone. This is the cognitive infrastructure that is behind all of our modern institutions: financial systems, globalized markets, scientific method, legal systems, etc.
However, the thoughts can only fit the world to the extent that the system is a good proxy for the world. And because the system is meant to scale to the whole world, its solutions fit moderately well for everyone. Even with tunable parameters, systems are too rigid to truly adapt to the different application contexts.
The scale from systems was so successful that we became addicted to it. To keep it going, we shaped the world to remove context-specific idiosyncrasies so that systematic solutions would apply better across contexts. It’s like doing plastic surgery to change your body shape so the H&M pants would fit you perfectly.
However, the systematic infrastructure is fundamentally ill-suited to problems that are context-sensitive. No big deal, these are just the most important problems of our time, like aligning technological development to human flourishing. By default, the development of AI will only amplify our systematic machine, making these problems worse, like forcing a square peg into a round hole.
A note on terminology. I use “Live Theory” (capitalized) for the name of the whole vision, shortened to “the live vision” for brevity. It is a hypothetical world in which our infrastructure and artefacts are “fluid” or “live”. I say “a live theory”, to talk about a new kind of artefact that replaces the classic “theory” from the systematic world.
What is it not? Live Theory is not a new "framework", "protocol" or "method" to coordinate at scale. That's the whole point! Otherwise, this would be designing a good old global system, exactly the thing we want to go beyond. At the moment, Live Theory barely exists. It is a quiet whisper from the future, suggesting that the near-term abundant fluid intelligence might enable a radically different way to scale intellectual progress.
It is not "add an AI chatbot to X". Even if the AI infrastructure plays an important role in the story, whenever you see AI, think in your head "Google Translate, but for everything: [image, video, audio, code, math, text in all languages] to [image, video, audio, code, math, text in all languages]" and not "ChatGPT".
Who is working on it? This blurry vision is slowly becoming clearer through the work of Sahil, its initiator, and multiple co-thinkers and collaborators working to build prototypes and clarify concepts through debate. You can learn more about their research on the groundless website.
The bottleneck of this work at the moment is imagination. To imagine something outside the systematic frame requires unlearning the mental muscle memory from decades of systematic thinking. Many projects from the groundless team take the form of interface prototypes that aim to concretely embed this new paradigm for compounding intellectual contributions.
If systematic thinking is so sticky that the hardest part of building the vision is getting out of it, it is a good idea to start by looking at today’s examples of things that are not systematic thinking.
We can find such examples in people working close to the systematic fire: mathematicians. Their job is to produce papers filled with symbols, formulas and proofs. However, professional mathematicians rarely think in symbols. Their language is rich in intuitive statements like “high-dimensional spheres are spiky“. But when they need to try an intuition for real, they can effortlessly translate these thoughts into rigorous math.
I once had the chance to attend a lecture given by the Fields Medallist Jean-Pierre Serre. I didn't understand much of the content, but I remember vividly how he effortlessly sprinkled equations on the whiteboard, filling the in-between with intuitive descriptions, what you would now call "vibes". He looked like a blacksmith simply laying his sharpest tools on the workstation for display, trying to convey, with imperfect words, the embodied know-how living in his hands.
This stage of reasoning is called post-rigorous (which I will refer to here as post-systematic thinking). The rigour serves as the foundation on which the inner world of a mathematician grows, rich in intuitions, imagery, and, believe it or not, emotions. Over years of practice, from encountering problems in different fields, mathematicians create their own language of thought that pre-exists the rigorous proofs.
Even if it constitutes the bulk of the mathematician’s lived experience, this world rarely gets shared. The squishy, subjective experience doesn’t belong in conference papers. A rigorous proof is the final product of a mathematician’s work, even if post-rigour is what produced it.
The vast world of mathematical intuitions developed through years of practice is shared in the old way: through apprenticeship from PhD supervisors to PhD students.
From the live theory perspective, this is an immense missed opportunity. If only we could distribute these insights as widely as a mathematical proof!
In the past decade, we created computer programs that can interact with natural language, create music, video and images. Where software of the past was restricted to the automatic application of systematic rules, we now have software that demonstrates a fluidity that normally characterizes pre- or post-systematic thinking.
This new kind of software reduces the cost of certain fluid tasks by orders of magnitude, such as high-quality translation or solving programming problems. The hypothesis of the live vision is that AI will continue to develop in a manner similar to that of computers or the Internet. It will become a more reliable, cheaper, and faster version of the multimodal LLMs we have today, but without the ability to generate revolutionary results.
For the live vision, the killer application of AI will not be the generation of new ideas from scratch; it will be interpretative labour, translating insights from one context to another. The idea is to use this abundant fluid intelligence as a mortar between intellectual contributions, whatever their shape. In systematic thinking, your intellectual contribution has value only if it fits the brick mould used by everyone else. In a live world, your intellectual contribution has value, full stop. No matter if your insight is expressed as poetry, equations, a drawing, a book, or an audio recording of a conversation, the content can be cheaply translated to whatever shape would be useful for the consumer at the last minute.
Instead of forcing the output of intellectual labour to fit within the systematic infrastructure, a fluid infrastructure adapts to the shape of the intellectual contributions to distribute them.
Depth matters. This is not to say that any squiggle on a whiteboard can be turned into the equations of general relativity. The squiggle was simply an arbitrary gesture I made with a pen. It contains insights about my handwriting, the texture of the whiteboard, maybe how agitated I was when I drew it or what I was thinking. Even with all the translations in the world, the squiggle’s potential reach is limited to its immediate context. At most, I could use it to understand my emotional state at a specific moment.
To contrast, take the squiggles in the notebook of Einstein before he finalized the equations of relativity. There is also a lot of context dependency in these handwritten notes. He probably used a ton of ad hoc abbreviations. But there is more to it. The notes contain a depth of understanding of physics that will eventually lead to the most important equations of modern physics. It is fair to say that a big part of the insights was likely already present well before the final equations were written. But from the systematic point of view, the value of the notes is almost zero until they contain usable formulas. However, the potential reach of Einstein's notebook after interpretation is enormous. The notes have the potential to change the worldview of thousands of physicists and explain observations from the other side of the universe.
The depth of the intellectual labour matters, even if the fluid infrastructure can adapt to its form. Systematic thinking will keep playing an important role in a live world, because systems are wonderful tools to build complex thoughts. But at last, the depth of insights coming from mathematical equations would be able to interact with the depth coming from other domains: from physics, biology, philosophy, or poetry, or the post-systematic insights from that lecture by Jean-Pierre Serre!
Fluid book generation. To concretise these ideas, let’s look at what writing and reading a book could look like in a live world. (Example taken from this table of live examples.)
The writer writes not a book, but a book-prompt that carries intuitions for the insights. An economics textbook-prompt, for example, could contain pointers to the results and ideas but the language, examples, style would combine with readers’ backgrounds (and specific wishes) to turn into independent textbooks for each reader. Importantly, this frees up producers of books from having to homogenize the range of audiences they might want to speak to.
This is a bit like changing the font and the colour of an e-book, but for content. Specific wishes the readers could have (such as "I'm looking for something that would make me laugh") are things that could be appended to the original book-prompt. They don't have to be independent books either, and can be interdependent. Commentary on your own book can be translated to someone else's book.
Value moves upwards. As mentioned in the intro, with a fluid infrastructure, the distinction between input and output blurs. Before, you had a book: a static output of the writing process, which is the input to the reading process. In a live world, the final book the reader sees is co-created by her own wishes, her background information (maybe as a list of books she knows really well), and the writer's book-prompt.
The general tendency is that potentials become first-class citizens, like the book-prompt that is a potential for a book. In our world, the execution work that goes into turning a draft into a polished book or a drawing into software is expensive. But in a live world, this work is as cheap as water or electricity. So the value moves upstream: the draft itself becomes valuable, and even before the draft, the recording of the conversations in which the ideas first appear becomes valuable.
To be extra clear, this doesn’t mean AI slop generated from vague prompts. Depth and mastery from any craft will be more valued than ever. But the artefacts people produce with this depth will change.
To recap, fluid intelligence might form the basis for a fluid infrastructure that could scale intellectual progress without forcing the shape of the outputs. We left part I with all sorts of problems stemming from the inability of systematic thinking to adapt to context, such as aligning technological development to human flourishing. How does this vision for a fluid infrastructure address these?
The rigidity of institutions based on systematic thinking comes from an imperfect fit between the system they use and the world they interact with. The circle arrow in the diagram below is the Achilles’ heel of systematic thinking.
No matter how much work is put into producing new results within a system, the body of knowledge produced is bounded by how well the system fits the world. In this sense, systems are static; they cannot dynamically evolve to keep pace with the changing form of the world.
Lock-in effect from globalized systems. Systems are often seen as non-overlapping spaces. An algorithm is either written in Rust or in Python. It cannot be in both at the same time, like how an apple cannot be in my house and in my neighbour’s house at the same time.
Imagine you want to change the system-world fit to better match your specific context. Maybe you want to design a new programming language adapted to your new hardware, or a new legal system adapted to your country. Then, you have to start from scratch. It will take a lot of work to translate the insights from other systems into your new system.
This creates a centralization dynamic where the biggest systems are likely to grow even bigger. It is the network effect well known by the giant platforms of our digital world. You need to install this new app to chat with your friends. You’d like to use your niche app with this great interface that fits your habits, but you’d have no one to talk to, so you bite the bullet and install the new app.
This means that while we can easily iterate within a system to find a better solution, we cannot iterate on the system-world fit.
There is no market to improve the system-world fit. Because designing new systems is so expensive, the fit of our systems doesn't improve over time. They keep producing globally distributed solutions that fit everyone only moderately well. The solutions don't adapt to anyone's context unless you change your context to fit the solution.
From the live perspective, the root of many risks arising from technological development is this system lock-in and the poor, static system-world fit they impose globally.
To solve these problems, the solutions need to adapt to their context. No matter how many parameters it contains, no systematic framework can be used to align technology, or AI, with human flourishing. Moreover, as technology evolves, what "alignment" means needs to be continuously renegotiated as quickly as the underlying technology evolves. In the future, the underlying reality might change so fast that the fit between the system and the world breaks down rapidly. It would be impossible for systematic thinking to keep up by designing new systems on the fly.
Diffuse concepts for a diffuse response to diffuse risks.
The live vision's answer is to leverage the fluid infrastructure to create live theories, the equivalent of systematic theories, but ones that incorporate post-systematic insights. The fluid infrastructure would continuously renegotiate the fit between the live theory and the world as the underlying reality changes.
Intellectual labour would stop being exchanged on disjoint markets with solutions valid only within a system. It would be distributed through a peer-to-peer network for insight sharing, where new theory-world fits could be developed in one place and applied in another context.
Here is a speculative story to illustrate the argument. Imagine you are a well-intentioned developer working on a new AI assistant that has been caught lying to its users. The old way would be to follow a recipe from the scientific literature, change the way the assistant is trained, and distribute the new version to all users. A more live way to do it would be to start by gathering sources from philosophy on deception, technical machine learning resources, and maybe even sources from biology on deceptive mimicry. You would then combine these sources and add your own work to make an “update prompt”. The update prompt will adapt to each user’s custom environment and continuously monitor and influence the AI assistant to steer it based on both your update prompt and the user’s preferences.
The vision is to diffuse the concepts (produced from intellectual labour), to diffuse the response (the context-specific application of the concepts), and to diffuse the risks (from technological development). This is the one-line vision you can find on the groundless.ai website.
If the theory-world fit is the undersupplied ingredient that live theory’s peer-to-peer network tries to distribute, where does it come from? Why can’t AI automate the creation of this magical fit?
A full answer is well outside the scope of this post, but I find it important to offer some response, as these are load-bearing hypotheses for the live theory vision.
It is unclear how exactly humans come to develop subjective experiences that fit (for lack of a better word) the world, but this might have to do with their embodiment.
Humans constantly care for their survival at all scales, from fighting pathogens to creating social bonds to build mutual aid networks. This sort of “skin in the game” might be an important factor in how this fit gets developed.
In the medium term, AI might struggle to generate insights that are well-fitted and deeply relevant to the world because of its lack of embodiment. And, no, robots filled with sensors will not be enough. Biological beings implement a form of integrated, fractal homeostasis, from psychological functions to mitochondria, that simple mechanical bodies cannot naively replicate. It’s not that it’s impossible to replicate artificially, but the current tech lacks this multi-scale integration.
The Live Theory vision forms a large puzzle made of pieces that seem unrelated but in fact support each other's weight. It has been a difficult task to carve out a coherent story from these pieces. My central goal was to describe the function Live Theory aims to fill and how it differs from its predecessor, systematic thinking. Here are a few topics I left aside:
If you are interested in learning more, I would recommend the Live Theory LessWrong sequence for a deep dive, and these three intro documents written for the AI Safety Camp as a shorter introduction.
2025-12-08 16:38:06
Published on December 8, 2025 8:38 AM GMT
This is part 1/2 of my introduction to Live Theory, where I try to distill Sahil’s vision for a new way to scale intellectual progress without systematic thinking. You can find part 2 here.
This post is the first part of a two-part series introducing the vision for Live theory, a design paradigm that leverages AI to build a decentralized, context-sensitive infrastructure. In this first part, I start by describing the unreasonable efficiency and the limitations of systematic thinking, the tool that live theory aims to replace.
The birth of systematic thinking.
Before the scientific revolution, intellectual progress could not scale well. People would have ideas about the world, make hypotheses and test them, gathering insight along the way. This would happen organically, without a deliberate understanding of the process itself. This is how craftsmen would learn to make stronger metals, painters would discover new pigments, and farmers would tweak their methods to be more effective. However, the main way to share this accumulated tacit knowledge was through apprenticeship, by an extended period of training with someone.
During the scientific revolution, we broke out of this trend by adopting formal, systematic thinking. Formal systems created bricks for thoughts. They served two remarkable functions:
Lens for the thoughts.
Formal systems are sets of rules that constrain what counts as an allowed move or not, just like the rules of a board game like chess. But instead of defining rules for moving the pieces on the board, a formal system gives rules for manipulating symbols, like when writing equations.
The rules of the formal system are designed to enforce consistency constraints that make the thoughts more likely to be true. For instance, the sequence of moves “0 = 1” is not allowed, as you cannot have at the same time zero apples and one apple in a bag.
This means that when using the formal system, you don’t have to think about the fit between reality and thoughts anymore. Following the rules became a proxy for what constitutes a “true” thought, a thought that matches reality. This removes the need for costly, noisy experiments. By using the system, your thoughts start to morph to follow its allowed moves. Like an optical lens, it focuses your thinking power in a tiny subregion of possible thoughts, allowing you to craft complex thoughts that would never have been possible before.
This is not to say that all valid moves are useful. Like in chess, most valid moves are unlikely to make you win. There is room to develop skills and taste to navigate towards the interesting parts of the system, the ones likely to establish non-trivial results.
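To make the “allowed moves” picture concrete, here is a minimal sketch of a toy formal system (my own illustration, not something from Sahil’s work): a checker that accepts an equation over integers, +, - and * only when both sides evaluate to the same number. The function name allowed_move and the character filter are arbitrary choices; the point is simply that the rules decide entirely in-system, with no experiment on apples required.

```python
# A toy "formal system": an equation over integers, +, - and * counts as an
# allowed move iff both sides evaluate to the same number. The rules alone
# decide what is permitted; no experiment with apples is needed.

def allowed_move(equation: str) -> bool:
    allowed_chars = set("0123456789+-*() =")
    if not set(equation) <= allowed_chars:
        raise ValueError(f"symbol not allowed by the system: {equation!r}")
    left, right = equation.split("=")
    # eval is acceptable here only because the character set is restricted above
    return eval(left.strip()) == eval(right.strip())

print(allowed_move("2*3 = 3+3"))  # True: a permitted move
print(allowed_move("0 = 1"))      # False: rejected by the rules alone
```

Of course, real formal systems work syntactically, through proof rules rather than brute evaluation, but the filtering role is the same: the system, not the world, tells you which moves are allowed.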
Going collective
The killer application of formal systems is collective: no need to spend time as an apprentice to share insights anymore. The conclusions from the system, such as physics equations, are context-independent. They can be used without needing to know how they are produced. You can reuse a proof of a theorem and apply it to your own mathematical problem without needing to know the history or the author of the proof.
Going meta
The scientific method created a meta-system, a formal system providing rules for the game of “producing systems that match a part of the world”. It described what counts as an allowed experiment and a valid interpretation of experimental results. In its general form, it is not very usable, so fields like physics and biology developed their own formal standards to define what counts as a valid new unit of knowledge to be added to the literature. Despite dealing with experimental data, the scientific method provided the same benefit as mathematics: as long as a paper meets the field’s systematic criteria, it can be trusted and its results reapplied in new contexts.
This methodology worked really well. Like really, really well. Thousands upon thousands of scientists around the world had the tool to develop deeper thoughts and share their progress with one another simply by sending letters. Formal systems formed the information highway that connected individual scientists and engineers from across the globe into a collective giant brain of ruthless efficiency.
All this knowledge production was put to work to solve real-world problems through engineering, using the insights from science to produce machines that solve problems for citizens. The systematic intellectual work is bundled into context-independent artefacts that enable efficient communication between knowledge producers and consumers at all levels. The engineer doesn’t have to know where the physics equations come from to use them in their blueprint, the worker in a factory doesn’t need to know where the blueprint comes from to assemble the pieces, and the end user doesn’t have to know how the machine was produced to solve their problem.
Globalized capitalism was the system that organised the worldwide distribution of scientific knowledge. Like the scientific method, it connected millions of actors into a single collective brain. In its case, however, the information highway did not run on the direct sharing of solutions (these are competitive advantages) but on the sharing of value information. Capitalism provided money as a unified proxy for determining which company is successful at solving its customers’ problems. It steered capital investment towards the most successful actors, allowing them to grow further and apply their good solutions more broadly. It created a continuous worldwide Olympics in the discipline of “solving problems people are willing to pay for”.
The map bends the territory
At the start, around the time of Newton, systematic thinking worked because of the unreasonable effectiveness of mathematics to describe the physical world. It turns out that many phenomena in the physical world, from planetary motion to heat transfer to quantum physics, can be described very well using equations.
But over time, the incredible success of systems spread far beyond maths and physics. They got applied to the social world with game theory, to organisations with Taylorism, and to agriculture. Instead of purely describing the world, we started morphing it to fit the systems. After all, this is a good strategy for making use of this abundant knowledge machine. When the system’s conclusions don’t apply to the real world because the fit between the system and the world is poor, we adjust the setup so the world better aligns with the system. This way, we gained enormous control over the physical and social worlds. This is what led to the modern financial system, monoculture, and assembly lines.
The blind spot of systems.
Remember how I said that systematic thinking worked by concentrating thinking power in a tiny subregion of possible thoughts? Well, this is both its greatest strength and its greatest weakness.
We trained generations of knowledge workers to morph their thought patterns to fit the right angles of the systematic maze. After spending so much time in the maze, we find it hard to think outside this frame, forgetting that it covers only a tiny subregion of possible thoughts. In fact, systematic thinking is also blind to all our inner experiences that are not thoughts: body sensations, emotions, intuitions, tacit and embodied knowledge.
More generally, despite its great success in describing the natural world, systematic thinking hit a wall when applied to problems involving living beings, such as organisms, ecosystems, or societies. In these domains, the territory is strongly pulled to fit the systematic map (like in monocultures or economics), as the map is too rigid to adapt to the world.
You cannot describe in abstract terms what health means for an organism or an ecosystem, or what fairness means in a society. This is because in such domains, context matters. There is no abstract theory of health or fairness to be found. Fostering health or fairness requires solutions that adapt to context with a fluidity impossible to achieve with systematic artefacts, no matter how many parameters they include.
In short, systematic institutions are unable to care for life. They are great at providing abstract knowledge and material comfort, but they cannot be adapted for human flourishing and are ill-suited to address the challenges of our time, such as climate change or the socio-technical problem of AI safety.
Intermediate conclusion
Around the time of the scientific revolution, systematic thinking was designed as an infrastructure to allow scalable intellectual progress. It can be seen as a mould that makes thoughts stackable by providing a set of rules on symbol manipulation that acts as a proxy for truth. It allows individual thinkers to think more complex thoughts and share their results without having to transmit the context where they were developed.
This basic innovation served as the basis for creating a unified, worldwide, distributed cognitive system, in which millions of humans could contribute to solving scientific, engineering, and economic problems.
However, these systematic institutions can only design context-independent solutions. This makes them ill-suited for caring for beings, for which there is no abstract solution.
Transition to part II: adding AI to the picture.
The development of AI would, by default, turbocharge these systematic institutions, amplifying the downstream harm from their inability to care for beings. We need a new way to make intellectual progress scalable that doesn’t rely on systematic thinking and allows for context sensitivity. The cheap, abundant and reliable fluid intelligence stemming from AI might provide the backbone for such an infrastructure. This is what we will explore in Part II, with the introduction of the vision of Live Theory.
2025-12-08 11:35:56
Published on December 8, 2025 3:35 AM GMT
At the ass crack of dawn, in the dark and foggy mist, thousands of people converged on my location, some wearing short shorts, others wearing elf costumes and green tights.
I was volunteering at a marathon. The race director told me the day before, “these people have trained for the last 6-12 months for this moment. They’ll be waking up at 3am. For many of them, this is the first marathon they’ve ever run. When they get off the bus at 5am, in the freezing cold, you’ll be the first face they see. Smile, welcome them, make them feel excited, and help them forget the next 26.2 miles of pain they’re about to endure.”
Even though I normally have RBF and consider it a chore to acknowledge people, I slapped a big fat smile on my face and excitedly greeted runners like I was a golden retriever who hadn’t seen his military dad in over a year.
“HELLO!”
“GOOD MORNING!”
“YOU’RE HERE!”
^That, on repeat for two hours straight. It was actually pretty fun. I calculated that the optimal distance to stand from the bus was eight feet. Stand too close, and the runners were still descending the bus steps and didn’t have time to process my presence. Stand too far away, and they assumed I was just some overly enthusiastic volunteer having a borderline religious experience.
Once I optimized the greeting distance, I observed that people’s reactions to me fell into four categories:
After the marathon for the rest of the day, I found myself smiling and greeting other people I encountered: at a restaurant for lunch, and at the coffee shop where I’m writing this. I smiled at every stranger I made eye contact with, and every one of them smiled back (with one of the four reactions I listed). It just happened naturally, and it felt amazing. I think it made them feel good, too. That’s why we love golden retrievers—they love and greet us unconditionally.