
Be Naughty


Published on November 22, 2025 6:35 AM GMT

Context: Post #10 in my sequence of private Lightcone Infrastructure memos edited for public consumption. 

This one, more so than any other in this sequence, is something I do not think is good advice for everyone, and I do not expect it to generalize that well to broader populations. If I had been writing this with the broader LessWrong audience in mind I would have written something pretty different, but I feel like for the sake of transparency I should include all the memos on Lightcone principles I have written, and this one in particular would feel like a bad one to omit. 


In "What We Look for in Founders" Paul Graham says:

4. Naughtiness

Though the most successful founders are usually good people, they tend to have a piratical gleam in their eye. They're not Goody Two-Shoes type good. Morally, they care about getting the big questions right, but not about observing proprieties. That's why I'd use the word naughty rather than evil. They delight in breaking rules, but not rules that matter. 

The world is full of bad rules, and full of people trying to enforce them. Not only that, it's commonplace to combine those rules with memes and social pressure to get you to internalize those rules as your own moral compass.

A key tension I repeatedly notice in myself as I am interfacing with institutions like zoning boards, or university professors asking me to do my homework, or not too infrequently Effective Altruists asking me to be vegan, is that together with the request to not do something, comes a request to also adopt a whole stance of "good people do not do this kind of thing".

Moral argument, of course, is important and real. And social institutions rely on shared norms and ethical codes. But nevertheless, almost all rules so invoked are not worthy of the guilt they produce when internalized. Their structure and claims are often easily exposed as flimsy, and their existence is often well-explained by an attempt at rent-seeking or other forms of power-preservation – or simply a reinforced historical accident or signaling competition – but not by genuine moral inquiry or some other kind of functional search over the space of social rules.

A name I considered for today's principle is "have courage". The kind of courage Neville displayed when he tried to prevent Harry and his friends from going to the forbidden third floor against Dumbledore's warnings, and the kind Harry and his friends displayed when they barged right past him anyways. "Courage" as such, is having the strength of will to break the rules that deserve to be broken. 

But I ultimately didn't like "courage", and preferred Paul Graham's "naughtiness"[1]. Courage implies a fear to be overcome, or the presence of some resistance, whereas I think the right attitude is often to miss the mood completely, and to take active joy in violating bad rules. The right attitude towards requests to avoid blasphemy is not hand-wringing and a summoning of courage every time you speak of something adjacent to god; it is simply to never bother thinking of this consideration, unless you are talking directly to someone who might care and the social consequences of the speech act become practically relevant.

Of course some moral rules are important, and some appeals to guilt are valid. As far as I can tell, there is no simple rule to distinguish the good appeals from the bad ones.

However, it is IMO possible to identify certain subsets of moral appeals as invalid and broken. Ozy Brennan identifies one such subset in "The Life Goals of Dead People":

Many people who struggle with excessive guilt subconsciously have goals that look like this:

  • I don’t want to make anyone mad.
  • I don’t want to hurt anyone.
  • I want to take up less space.
  • I want to need fewer things.
  • I don’t want my body to have needs.
  • I don’t want to be a burden.
  • I don’t want to fail.
  • I don’t want to make mistakes.
  • I don’t want to break the rules.
  • I don’t want people to laugh at me.
  • I want to be convenient.
  • I don’t want to have upsetting emotions.
  • I want to stop having feelings.

These are what I call the life goals of dead people, because what they all have in common is that the best possible person to achieve them is a corpse.

Corpses don’t need anything, not even to breathe. Corpses don’t hurt anyone or anger people or fail or make mistakes or break rules. Corpses don’t have feelings, and therefore can’t possibly have feelings that are inappropriate or annoying. Once funeral arrangements have been made, corpses rot peacefully without burdening anyone.

Compare with some other goals:

  • I want to write a great novel.
  • I want to be a good parent to my kids.
  • I want to help people.
  • I want to get a raise.
  • I want to learn linear algebra.
  • I want to watch every superhero movie ever filmed.
  • I don’t want to die of cancer.
  • I don’t want the world to be destroyed in a nuclear conflagration.
  • I don’t want my cat to be stuck in this burning building! AAAAA! GET HER OUT OF THERE

All of these are goals that dead people are noticeably bad at. Robert Jordan aside, corpses very rarely write fiction. Their mathematical skills are subpar and, as parents, they tend to be lacking. Their best strategy for not dying of cancer is having already died of something else. And there is no one less suited than a corpse for time-sensitive emergency situations. 

Lightcone is not an organization for people who would rather be corpses. In the pursuit of our goals, we need the courage to make choices that will violate a large number of moral guidelines other people hold. Only corpses are this certain kind of pure.

Of course, while naughtiness thus defined strikes me as a prerequisite to moral greatness, it also appears to be a prerequisite for most forms of moral damnation. Corpses don't cause atrocities. While I am confident that in order to live a life well-lived you need to take delight in breaking some rules, taking delight in breaking the wrong rules sets you up for a life of great harm. 

Indeed, the very next paragraph in the Paul Graham essay I cite above says:

Sam Altman of Loopt is one of the most successful alumni, so we asked him what question we could put on the Y Combinator application that would help us discover more people like him. He said to ask about a time when they'd hacked something to their advantage—hacked in the sense of beating the system, not breaking into computers. It has become one of the questions we pay most attention to when judging applications. 

Noticing the skulls is left as an exercise to the reader.

  1. ^

    And I also felt like the word "courage" deserved to be reserved for something else of that kind, which I might end up writing about more at a later point in time.




Market Logic I


Published on November 22, 2025 6:01 AM GMT

Garrabrant Induction provides a somewhat plausible sketch of reasoning under computational uncertainty, the gist of which is "build a prediction market". An approximation of classical probability theory emerges. However, this is only because we assume classical logic. The version of Garrabrant Induction in the paper does this by allowing bets to be placed on all boolean combinations of sentences visible to the market. An earlier draft accomplished the same thing via special arbitrage rules, EG, if you own x of a share of φ, and you own x of a share of ¬φ, then you can trade these to the market-maker for x of a dollar. (So for example, if the current price of a share of φ is p, and the current price of a share of ¬φ is q, with p + q < 1, then you can arbitrage by buying one share of both (cost p + q, less than $1) and cashing these in to the market-maker for $1.) This forces the market to converge towards prices satisfying price(φ) + price(¬φ) = 1 for all φ, mimicking classical probability. Similar arbitrage rules can be formulated for the other logical connectives.
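
To make that arbitrage rule concrete, here is a minimal sketch in Python (the function name and the example prices are mine, not from the paper):

```python
def negation_arbitrage_profit(price_phi: float, price_not_phi: float) -> float:
    """Riskless profit (in dollars) from buying one share each of phi and
    not-phi, then redeeming the pair with the market-maker for $1.

    Prices live in [0, 1]. Whenever the two prices sum to less than 1, this
    is free money, so traders exploiting it push the prices up until
    price_phi + price_not_phi converges to 1 (a symmetric trade presumably
    handles the case where the sum exceeds 1).
    """
    cost = price_phi + price_not_phi
    return max(0.0, 1.0 - cost)

# Example with made-up prices: phi at 40 cents, its negation at 50 cents.
print(negation_arbitrage_profit(0.40, 0.50))  # ~0.10 -- ten cents per pair, risk-free
```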

This is nice for what it does, but it means Garrabrant Induction provides no normative evidence for approximating classical probability theory or classical logic. These are baked-in assumptions, not conditions which emerge naturally from beautiful math.

Sam Eisenstat has (privately) formulated a version of Garrabrant Induction based on intuitionistic logic instead. The result does not require the price of a sentence and its negation to sum to 1 (and also violates some other things we expect of probabilities). It has the nice property that conditional on excluded middle, φ ∨ ¬φ, we recover classical probabilities. (This is a natural result to want, since intuitionistic logic plus excluded middle = classical logic.)

This leaves open the question: what logic naturally emerges if we let the math do the talking, rather than imposing one logic or another?

The following modifies a proposal by Sam Eisenstat, and also takes inspiration from yesterday's post.


To lay a foundation for this investigation, I'll sketch something resembling Garrabrant induction, but avoid baking in a specific logic.

I won't go all the way to defining a proper computable thing. Indeed, I won't even define everything completely. I am doing this in compressed time due to Inkhaven, so I may need to come back and fix the math later. Questions/concerns/corrections are appreciated.

Let's start by imagining there are some market goods. I have in mind something similar to the setting in Skyrms' Diachronic Coherence and Radical Probabilism: market goods can be anything, IE, are not restricted a priori to "propositions". However, in order to guarantee the existence of fixed points (as in Garrabrant Induction), I will need to require that the prices are bounded within some range. For simplicity, I'll stipulate that all market goods have the same price range, namely [0, 1]. I'll use dollars, so that the maximum price of any good is $1, and prices will typically be denoted in cents. For any good that can be purchased, one can also purchase fractional shares of that good. The total amount of money in the economy is bounded as well, so everyone's net worth is a bounded number of cents.

Time proceeds in days. On a given day, the market-maker sets the prices for all the market goods, and then the traders look at the prices and decide how much to buy or sell.[1] All transactions go through the market-maker.

On a specific day, a trader's strategy is a function from prices to buy/sell orders. Strategies are required to be Kakutani (basically: continuous functions).[2] The market-maker knows the traders well and can predict the aggregate trades perfectly (only same-day, not days ahead), and sets the prices so that it has no exposure (no downside risk).[3] This is possible thanks to Kakutani's fixed point theorem. The traders, however, need not be so omniscient. Indeed, they need not be rational in any sense. Traders can update their trading strategies from day to day, in any way they like. (A trader is, essentially, a sequence of trading strategies.)
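
As a toy illustration of why continuity buys the market-maker a consistent set of prices, here is a sketch for a single good in which I substitute "aggregate net demand is zero" for the post's "no exposure" condition; the strategies are made up, and the real construction handles many goods at once via Kakutani's theorem rather than bisection:

```python
from typing import Callable, List

# A (single-good) strategy maps a price in [0, 1] to a net order:
# positive = buy that many shares, negative = sell.
Strategy = Callable[[float], float]

def consistent_price(strategies: List[Strategy], tol: float = 1e-9) -> float:
    """Bisect for a price at which aggregate net demand is (approximately) zero.

    Assumes aggregate demand is continuous and weakly decreasing in price,
    which is stronger than the post needs, but makes the fixed point easy
    to find in one dimension.
    """
    def demand(p: float) -> float:
        return sum(s(p) for s in strategies)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if demand(mid) > 0:
            lo = mid  # excess demand: the price must be higher
        else:
            hi = mid  # excess supply: the price must be lower
    return (lo + hi) / 2

# Made-up continuous strategies: a bull who buys more as the price falls,
# and a bear who sells more as the price rises.
bull: Strategy = lambda p: 1.0 - p
bear: Strategy = lambda p: -p

print(round(consistent_price([bull, bear]), 3))  # 0.5
```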

The number of goods on the market can expand as time goes on. For simplicity, I'll assume that there are n different goods on the market on day n. We can also imagine that the number of traders increases over time as well, with the nth trader appearing on day n and starting with some amount of money. New traders can also bring nonzero quantities of other goods with them. We'll want to assume that any strategy appears eventually (drawing from some suitably rich class of strategies).

Money is, essentially, one of the goods (the good we price everything in terms of). It is natural to suppose it appears on day 1, and always has price 1. A trader's starting portfolio on day n is a function from the goods available that day to the real numbers, indicating how much of each good the trader currently owns. Owning a negative quantity represents a short, which means you've promised to provide the good later if demanded.[4] We'll represent this here as a negative contribution to your net worth on day n: the net worth of a trader can be calculated by multiplying the quantity in the starting portfolio for that day by the price of that good (summing over all goods). The day's trades don't change the trader's net worth on that day, but net worth tends to change day-to-day as a result of prices changing. Traders grow in net worth if they can anticipate how prices will change day-to-day and sell what will go down in price / buy what will go up in price.

In order to short a good g, a trader is required to have enough money to fulfill its promise even if the price of g rose to $1. Thus, if a trader holds -x shares of g, it must keep at least $x on hand. The market-maker will reject buys/sells that violate this constraint. The market-maker also doesn't let a trader buy more than they can afford (IE, the trader's money cannot go negative).
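
Here is a minimal sketch of those two checks (my own naming and data layout; for simplicity it validates an order against a single good's collateral in isolation, whereas the market-maker described above would look at the whole portfolio):

```python
from dataclasses import dataclass, field
from typing import Dict

PRICE_CAP = 1.0  # every good's price lies in [0, PRICE_CAP]

@dataclass
class Portfolio:
    money: float                                               # in dollars
    holdings: Dict[str, float] = field(default_factory=dict)   # negative = short

def order_is_allowed(portfolio: Portfolio, good: str, quantity: float, price: float) -> bool:
    """Would buying `quantity` shares of `good` (negative quantity = sell/short)
    at `price` leave the trader solvent and fully collateralized?"""
    new_money = portfolio.money - quantity * price
    new_position = portfolio.holdings.get(good, 0.0) + quantity
    if new_money < 0:
        return False  # can't spend money you don't have
    # A short must be collateralized against the worst case where the price
    # rises to the cap: holding -x of the good requires keeping $x on hand.
    collateral_needed = max(0.0, -new_position) * PRICE_CAP
    return new_money >= collateral_needed

# With 60 cents, shorting one share priced at 50 cents is fine
# (you end up holding $1.10 against $1.00 of required collateral),
# but shorting two shares is rejected ($1.60 against $2.00 required).
p = Portfolio(money=0.60)
print(order_is_allowed(p, "g", -1.0, 0.50))  # True
print(order_is_allowed(p, "g", -2.0, 0.50))  # False
```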

The goal here is to draw an analogy between the operation of this market and logic.

Whereas Garrabrant Induction was shackled to classical logic, I've now described a market in a (more-or-less) neutral way. What logic relates to this market, in the same way that classical logic related to Garrabrant Induction?

As a starting-point, notice that guaranteed-no-loss trade goes in the direction of implication. Here, we're interested in trades that always make sense, by virtue of what is being traded, rather than trades that make sense by virtue of the current prices. For example, if I had a certificate "$1 goes to the owner of this certificate if it rains tomorrow" and you had a certificate "$1 goes to the owner of this certificate if it rains tomorrow and is cloudy in the morning" then you should definitely be willing to trade your certificate for mine, since mine gets $1 in strictly more cases. This corresponds to the fact that a conjunction implies either of its conjuncts.

This suggests that money is like "true", aka "top", written ⊤, because money is guaranteed to have the maximum price of any good, so one should always be willing to accept $1 in exchange for giving up a share of some good.

Shorting is clearly like negation. If you short x shares[5] of a good g, then you must also set aside $x of collateral. Thus, the portfolio contribution associated with shorting a single share of g at price p (the anti-share together with its collateral) is 1 - p. Compare this to the formula for negation in Łukasiewicz Logic.
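
Spelled out (my rendering of the comparison the post gestures at):

```latex
% Value contributed by a short position in one share of g at price p,
% counting the $1 of collateral set aside against the anti-share:
\text{short}(p) \;=\; \underbrace{1}_{\text{collateral}} - \underbrace{p}_{\text{anti-share}} \;=\; 1 - p
% Negation of a truth value p \in [0,1] in Łukasiewicz logic:
\neg p \;=\; 1 - p
```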

I find it helpful to imagine that the market-maker is trying to be maximally helpful, facilitating any trades possible, so long as doing so does not introduce any financial risk. Thus, the market-maker would be willing to give you "$1 goes to the owner of this certificate if it rains tomorrow and is cloudy in the morning" in exchange for "$1 goes to the owner of this certificate if it rains tomorrow" if that's what you wanted. 

What we need to do is characterize the space of things like this.

Our task is to study the space of financial derivatives.

We want to specify a market-maker who is "very helpful", IE, facilitates a broad variety of transactions. These transactions will then give us our logic.

A "derivative" in finance is just a financial instrument that is somehow derived from an underlying financial instrument. For example, if we can invest in a good , we can also bet that  will be above .

This went long, so I'll post part 2 tomorrow!

  1. ^

    You can think of the market-maker as an easy mathematical modeling choice, which represents the more chaotic process of buyers finding sellers & the price at most points in time converging to a small spread.

  2. ^

    Continuous functions allow you to approximate, but never perfectly implement, strategies such as "buy a share of g if its price is below p, sell a share if its price is above p". Kakutani maps allow you to implement this strategy precisely, so long as you're OK with indeterminate behavior at exactly p.

    A Kakutani map is actually a relation, not a function; we can think of it as a set-valued function. The output set needs to be nonempty and convex for each input, and the overall relation needs to be closed (contains all its limit points).
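
As a concrete instance (my own example), the "buy below a target price, sell above it" strategy becomes the following correspondence:

```latex
% Demand correspondence D : [0,1] \to \text{subsets of } [-1, 1], with target price p^{*}:
D(p) \;=\;
\begin{cases}
  \{+1\}   & \text{if } p < p^{*} \quad\text{(buy one share)}\\
  [-1,+1]  & \text{if } p = p^{*} \quad\text{(any net order is acceptable)}\\
  \{-1\}   & \text{if } p > p^{*} \quad\text{(sell one share)}
\end{cases}
```

Each output is a nonempty convex set and the graph is closed, so this is a Kakutani map, whereas no single continuous function can reproduce the jump at p*.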

  3. ^

    For example, suppose there are two traders, Alice and Bob, and one good g (other than money). Say the price of g will be set at p (in dollars). Alice has $p it wants to invest in g, and Bob has $(1 - p) it wants to use to short g.

    The market-maker sets the price of g at p. Alice buys 1 share of g. This 1 share might be worth up to $1 in the future, so if this was the only thing that happened, the market-maker would have an exposure of $(1 - p); that's how much the market-maker could lose if the price rose to $1 and then Alice asked to sell the share.

    Fortunately, Bob is shorting g with its $(1 - p). The market-maker pays Bob $p to take an anti-share of g. Bob now has $1 total, plus an anti-share. The $1 is exactly enough to act as collateral for the anti-share, so this is the maximum amount that the second trader can short g using its $(1 - p).

    No matter what the price of g changes to, the amount potentially owed to Alice now exactly balances with the amount Bob potentially owes. For example, if the price of g shifts down to q, then the market-maker has lost $(p - q) to Bob (Bob can get rid of the anti-share by paying q, and thus, come out $(p - q) ahead). However, the market-maker has gained $(p - q) from Alice to make up for it. (Alice purchased the share for $p, but can only get $q back for it now.)
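
To check that bookkeeping with concrete numbers (illustrative values of my own choosing, not figures from the post):

```python
# Work in cents. Assume an initial price of 50 cents and a later price of 40 cents.
price, new_price, cap = 50, 40, 100

alice_money = price        # just enough for Alice to buy one share
bob_money = cap - price    # just enough for Bob to collateralize one short

# After the price falls, Bob can close his short at a profit...
bob_profit = price - new_price   # received 50c for the anti-share, buys it back for 40c
# ...while Alice's share has lost exactly that much value.
alice_loss = price - new_price   # paid 50c, can now only sell the share back for 40c

print(bob_profit, alice_loss)   # 10 10 -- the market-maker's books balance exactly
```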

  4. ^

    In the current formalism, this demand will never actually be called in; the debt of the short is represented directly by the negative value it adds to the net worth.

    Imagine you're shorting gold and the market-maker says "don't worry, I'll handle the actual exchange of gold; people won't have to come to you to ask for the gold you promised. You'd just have to come to me to buy it anyway. Instead, they'll come to me and I'll give them the gold."

    You ask: " And then you'll charge me for the gold they buy? But how do you decide who to charge when someone comes for gold? You could screw me over by saying it is my turn on a day when the price of gold is high?"

    The market-maker says: "Well, to be honest, I won't ever charge you. Instead, I keep track of your debt, and I simply won't let you spend beyond the point where you wouldn't be able to pay off the debt. I calculate this based on the worst-case scenario where the price of gold soared to its $1 cap. Other people I trade with know that I could always come ask for money from you, so they know I'm solvent, and they'll extend me credit because they know I'm good for it, the same way I'm extending credit to you because I know you're good for it."

    You respond: "That sucks! All transactions have to go through you, so that means it is always as if you've picked the worst possible day, since my effective amount of money to spend is as if the short worked out as badly as possible for me. What is the point of shorting, then? You give me money for selling the good, but in actuality, your policy is such that it is as if my total amount of money has gone down."

    Market-maker: "You can buy later, which cancels out your short, whenever you'd like me to recognize your actual net worth based on the actual value rather than the worst-case. If you short something when you think it is over-valued, and then you buy it later when the value has reduced, you'll have made money. This isn't really about me having a monopoly on trade; it's just me asking you to set aside enough money that you could pay me back if I needed it."

  5. ^

    The plural "shares" is here used in the fractional sense, since traders will almost always be buying fractional shares.




Animal welfare concerns are dominated by post-ASI futures


Published on November 22, 2025 4:08 AM GMT

I've claimed before that animal welfare concerns should probably be dominated by post-ASI futures where humans survive.  Let me outline a hypothetical scenario to illustrate the shape of my concern.

Ten years from now, some lab builds an ASI with the assistance of AIs that are superhuman at AI R&D.  The ASI is not corrigible; that turned out to either be an incoherent target for a general system like that, or an unsafe target (because making it corrigible necessarily traded off against making it act in our interests in situations where we were wrong about either the long-run consequences of what we were asking for, or because we were wrong about our own values).

The lab founders and employees are smart, well-meaning, and have heard of CEV.  The alignment work goes implausibly well and we live in an implausibly friendly universe where CEV as an alignment target has a pretty wide basin of attraction.  We end up with a friendly singleton aligned to humanity's CEV.

CEV, as an alignment target, is about implementing the overlapping (extrapolated) preferences between humans.  As Eliezer wrote:

If your CEV algorithm finds that "People coherently want to not be eaten by paperclip maximizers, but end up with a broad spectrum of individual and collective possibilities for which pizza toppings they prefer", we would normatively want a Friendly AI to prevent people from being eaten by paperclip maximizers but not mess around with which pizza toppings people end up eating in the Future.

Not mess around with which pizza toppings people end up eating in the Future, you say?

In the section titled Role of 'coherence' in reducing expected unresolvable disagreements, Eliezer tries to address the question of animal suffering directly:

A CEV is not necessarily a majority vote. A lot of people with an extrapolated weak preference* might be counterbalanced by a few people with a strong extrapolated preference* in the opposite direction. Nick Bostrom's "parliamentary model" for resolving uncertainty between incommensurable ethical theories, permits a subtheory very concerned about a decision to spend a large amount of its limited influence on influencing that particular decision.

This means that, e.g., a vegan or animal-rights activist should not need to expect that they must seize control of a CEV algorithm in order for the result of CEV to protect animals. It doesn't seem like most of humanity would be deriving huge amounts of utility from hurting animals in a post-superintelligence scenario, so even a small part of the population that strongly opposes* this scenario should be decisive in preventing it.

(ADDED 2023: Thomas Cederborg correctly observes that Nick Bostrom's original parliamentary proposal involves a negotiation baseline where each agent has a random chance of becoming dictator, and that this random-dictator baseline gives an outsized and potentially fatal amount of power to spoilants - agents that genuinely and not as a negotiating tactic prefer to invert other agents' utility functions, or prefer to do things that otherwise happen to minimize those utility functions - if most participants have utility functions with what I've termed "negative skew"; i.e. an opposed agent can use the same amount of resource to generate -100 utilons as an aligned agent can use to generate at most +1 utilon. If trolls are 1% of the population, they can demand all resources be used their way as a concession in exchange for not doing harm, relative to the negotiating baseline in which there's a 1% chance of a troll being randomly appointed dictator. Or to put it more simply, if 1% of the population would prefer to create Hell for all but themselves (as a genuine preference rather than a strategic negotiating baseline) and Hell is 100 times as bad as Heaven is good, compared to nothing, they can steal the entire future if you run a parliamentary procedure running from a random-dictator baseline. I agree with Cederborg that this constitutes more-than-sufficient reason not to start from random-dictator as a negotiation baseline; anyone not smart enough to reliably see this point and scream about it, potentially including Eliezer, is not reliably smart enough to implement CEV; but CEV probably wasn't a good move for baseline humans to try implementing anyways, even given long times to deliberate at baseline human intelligence. --EY)
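
As a worked version of the arithmetic in that addendum (using the quoted numbers: trolls are 1% of the population, and Hell is 100 times as bad as Heaven is good):

```latex
% Expected value, to a non-troll, of the random-dictator negotiation baseline:
\mathbb{E}[\text{baseline}]
  \;=\; \underbrace{0.99 \cdot (+1)}_{\text{aligned dictator}}
  \;+\; \underbrace{0.01 \cdot (-100)}_{\text{troll dictator}}
  \;=\; -0.01
```

Any deal that leaves the non-trolls with more than -0.01 "beats" that baseline, which is how a 1% minority of utility-inverters can extract nearly the whole future as the price of cooperation.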

Let's suppose that the lab employees responsible for implementing the CEV target had even read this 2023 addendum.  Maybe they decide that the appropriate solution is to simply forgo any structure that allows vetoes, and to simply tank the fact that some small percentage of the population will have unpalatable-to-them extrapolated preferences which they will exercise with their shared allocation of natural resources.

Current human preferences are confusing to current humans.  They seem quite sensitive to contingent environmental circumstances.  I would be very surprised if extrapolated human preferences were not also quite sensitive to contingent details of the extrapolation procedure.  This suggests that there might not be an "obvious" or "privileged" endpoint of such extrapolations, unless it turns out that there's a principled meta-procedure which enables trade/averaging across them, or at least finding the maximum overlapping subset.  (This sounds like a familiar problem.)

Let's put that problem aside, for now, and assume there is such a principled meta-procedure (or assume that actually current human preferences cohere extremely reliably in ways that aren't sensitive to the extrapolation procedure).

How confident are you that no current humans will have an extrapolated volition that non-strategically cares about preserving parts of the Earth's environment in their "natural" state?  What about an extrapolated volition that just wants to live out the rest of the universe's lifespan in a non-simulated pastoral lifestyle, with live, unmodified animals and everything?

Remember, we're definitely pressing the button: the other labs aren't too far behind, and it's a miracle that we've hit the target at all.  "Failing" closed at the end of the three-stage procedure described in the "Selfish bastards" problem section isn't an option. Unfortunate that we couldn't find and verify an extrapolation procedure that doesn't have much worse downsides, while still remaining principled enough that we don't feel like we're unilaterally seizing the reins of power to impose our preferences on everyone else.


Alas, naive solutions to the problem I've described above seem pretty unprincipled, since they approximately round off to "try to optimize over the environment to shape how other people's preferences change".  Also, that kind of thing seems like something that CEV would explicitly want to avoid being influenced by, so it's not even obvious that it'd help at all, as opposed to wasting a bunch of resources playing games that smarter agents with better decision theory would avoid playing in the first place.

Having said that, what things might improve the state of animal welfare in the post-ASI future?

(If you're a negative utilitarian, well, the situation seems kinda bleak for you if we do get an aligned singleton, since it doesn't seem like there'll be literally zero suffering in the future.  It might be even worse if we get an unaligned singleton, since it's not clear that there won't be suffering subprocesses all over the place - very unclear.)

The obvious next answers are "making sure we don't die to an unaligned superintelligence", and "figuring out philosophy and meta-philosophy, so that we don't end up implementing a dumb extrapolation procedure which leaves a bunch of value on the table", since at least that way the number of animals living happy, fulfilled lives is likely to vastly dominate the number suffering for contingent reasons.  (Also, it seems pretty likely that most people's extrapolated preferences will be willing to engage in trades where they trade off the "animals suffering" bits of the things that they value for other things that they value more, assuming that those don't get treated as utility inverters in the first place.)

Not very satisfying, and I'm naturally kind of suspicious of that being the bottom line, given my beliefs, but such is life.


Thanks to Habryka for discussion on how CEV would handle historical attempts to strategically influence CEV outcomes.




Habitual mental motions might explain why people are content to get old and die


Published on November 22, 2025 2:52 AM GMT

The fifth post in my recent series. More rambly than ideal, but Inkhaven is relentless.

People are perplexing. They do not seem to follow through on the logical conclusion of the information they have in combination with the values they have.

The most blunt example I have: I think it is neither secret nor controversial that civilization is more technologically advanced than it was a few centuries ago. In particular, we are more medically advanced. We have cures for many conditions, and people live longer and remain healthier for longer than in the past[1]. Clearly, change and improvement are possible – who knows how much?

People don't like aging and death. People bemoan getting old and try to hide signs of aging at great expense. Death is regarded as tragic, and once someone has a serious condition, people invest in fighting. Up close, people try to delay death.

The logical conclusion of "we have great reason to think progress against aging and death is possible" and "we don't like aging and death", is to invest a lot more into anti-aging efforts. 

People should clamor for funding for research, demand results, demand explanations for lack of results, it should be prestigious to work in, it should be on the news. It should be a big deal.

There's not nothing. Bryan Johnson is doing his thing. The SENS Research Foundation and other groups exist. But the typical person still expects to get old and die[2]. When they get close to it they try to delay it, but before then it is accepted as a fact of life[3].

This is crazy. This is really really crazy. It's so inconsistent according to their own values. What is going on in their heads? 

I've wondered about this for many years. In 2019, it felt like I had some clue in modeling some people living in Causal Reality and others in Social Reality. I think that's not entirely wrong, but very low resolution. With this series of posts, I have been building towards a better answer.


Let's examine the line of reasoning I implied was obvious and transparent: (1) as evidenced by history, large medical progress is likely[4] possible, (2) overcoming death and aging is desirable, therefore (3) we should be investing in this.

There are specific mental motions occurring in my mind that cause the above to seem correct and to shape my behavior[5]. Those motions feel natural to me, but perhaps they're actually atypical. 

Though the following is high-level, I actually think many of the relevant motions where I differ are much lower-level, akin to muscle contractions rather than macro movements. But it's much easier to talk about macro stuff. See this helpful comment from Steven Byrnes, which links to a discussion of lower-level traits.

Hidden assumptions and traits relevant to my thinking:

  • Large-scale agency: I believe myself to have the ability to influence very macro things through my actions
  • History realism: past events and world state aren't like fiction, a story in a book, but actually something real that happens and is relevant to my life.
  • Future realism: likewise, the future is a real place that I'm going to, and one that can be shaped one way or another by human actions.
  • History/present/future distinction: the present isn't the only way things can be
  • Reductionism and "mechanicalism": the human body and its conditions are like a car – a system made of parts that can be intervened on. Nothing magical, nothing essential. Just atoms and molecules and cells.
  • Willingness to endure uncomfortable beliefs: negative things can feel worse if they feel avoidable. Believing that death is not inevitable means accepting needless deaths, and might require effort from someone. That's an uncomfortable conclusion I might have bounced off of.
  • Willingness to hold a weird conclusion: most people don't live like aging and death should be fought, to take that up would be weird and likely get a negatively valenced reaction from others. That doesn't faze me.

Alright, now let's tell a story about me. This is illustrative and speculative, but maybe it is right. In my youth, I went to an ultra-Orthodox Jewish school[6]. I was taught that Jews are God's chosen people, that Jews are why God created the world, and that Jews, through their actions, would bring about the Messianic era (utopia). I believed it. It was the endorsed social belief. So yeah, pretty natural for me to think my actions affect Everything.

Likewise, Orthodox Jewish practice is chronically justified and explained with reference to past events that are treated as not a metaphor, no, that actually happened. The Jews were in Egypt, and that's why we have Passover. God commanded Abraham to circumcise; that's why we do it too. Very real.

The group I was involved in made the Messianic era a big focus. It was imminent. Soon, any year or decade now. Big changes. BIG. It was normal and expected to think that. It'd be weird and heretical to predict things continuing as they are.

I was always technically inclined. Things are made of parts. They break, you fix them. I naturally saw things in a reductionistic way. I also think I've always had limited conformity instincts, such that my mind doesn't put up resistance to believing weird things.

Even though I am now thoroughly disabused of the specific religious claims, the broader hypotheses seem like normal and reasonable things to believe. I might affect the entire world? Yeah, I was brought up thinking that. The past and future are real and might be different from the present? I mean, yeah, duh. Plus, growing up in an insular religious community does get you used to believing some wacky things that most other people don't. You just feel a little superior for knowing the truth.

Not having grown up in any other environment, I really don't know what messages other people got. Did the events of Macbeth, Moby Dick, and Magna Carta get lumped into the same bucket as not really relevant to life? All just stories.

What is real anyway?

I have another half-formed idea that a key variable between people is what seems real. Real defined as being actually plugged into one's decision-making. This is related to Taking Ideas Seriously but broader. I think you can be able to follow the steps of a deductive argument, but it takes something beyond that for deductive arguments to change your decisions.

I can imagine someone who can perform deductive reasoning, but such reasoning is only used to prepare material for explicit arguments for things they believe for other reasons. Deductive reasoning is part of a social reasoning game, not part of broader decision-making.

Reductionism is another thing that might not feel real to many. Someone might see a doctor performing reductionistically-sensible actions like prescribing antibiotics to fight a bacterial infection, but relate to it more shamanistically: there's a problem that's solved by recruiting help from the person who is empowered to help with that kind of problem. It's a social role thing. One becomes capable of dealing with a problem by going through the correct rites of passage. The details are just extra ambiance for the story.

I'm imagining a "things are made of parts" belief/mental motion that some people have a lot more of than others, partly from natural inclination and partly because it happened to be a useful way for them to think about things.

What's foreground, what's background for you?

This is an old guess of mine, but I do wonder whether the focus of your world is people, with everything else just a backdrop, or whether the world is a whole thing going on and people are just part of it. The former might set you up to always see things as Player vs Player, while the latter admits Player vs Environment.

If, going through your life, you made progress always via succeeding in social ways, e.g., being popular, being liked, being cool, etc., then your mind learns to focus on the social stuff and the rest doesn't land. The non-social is real in the abstract, but it's not hooked up to the decision-making that drives your actions. 

The inverse of this is the socially oblivious: people who choose their actions for many reasons, but the reactions of others are not among them.

Implications

An argument is a very different thing when both parties' decisions are impacted by deductive chains of reasoning and they merely differ on some premises, versus when one party relates to deductive reasoning as something connected to decision-making and the other sees it as part of a verbal game against foes.

One party might ask, "Is this true? Is this valid?" The other is running each argument through various filters and simulations for how it will sound to friends and foes. "How will this affect my alliances? What will the impact be on the undecided parties I want to court? Do I win in public opinion?"

At the observed level, you might have both parties discussing deductive arguments, but how they're relating to them and what they're doing with them could be very different. And different at a deep and invisible level, such that neither side even sees what they're taking for granted. Thoughts are surprisingly detailed and remarkably autonomous.

We're in a world where a lot of people from the community who think like me are trying to persuade the world that AI is a major risk and certain policies are a good idea. Now, things are not all or nothing; they're not binary here, but I think many or most of the people we're interacting with are running different mental programs, learned over a lifetime. Neither of us is even aware of how different and strange the other's inner mental life is to us.

If we want to figure out how to convince the world to be sane, it might help to better understand the kind of cognition they're running and understand how our statements will land in the context of that.

This might be very hard. As above, I think the real differences are lower-level than the things it's easy for me to point at. That makes it harder to imagine.

An analogy here is neurotransmitters and receptors. You might have heavily developed receptors for reductionistic arguments. They might not. In comparison, their receptors for social arguments are heavily developed. It might prove hard to really occupy the headspace of the other.

Fears of Symmetry

Scott Alexander kindly bequeathed the concept of Symmetric vs Asymmetric Weapons. Asymmetric weapons, like logical debate, are stronger on the side of truth than the side of falsehood. Symmetric weapons, like violence, are equally useful to both sides.

 I am afraid that even if people all around are technically capable of logical debate, for some, this is an underused, if not functionally atrophied, pathway. Instead, the only way to influence them is with symmetric weapons. That'd be really awful. There's no protection there. 

Is there hope?

I'm persuaded that if you can just understand how something works, and understand its constituent pieces, and how they interact, then you can fix it ;)

No, but seriously. I do think modeling this better is a good first step. I also don't think things are binary or fixed here. As I wrote in Same cognitive paints, exceedingly different mental pictures, to a large extent, I think people broadly are capable of the same mental motions; we've just learned to favor some over others. The learning is ongoing, and it's possible to influence people's choice of mental motion. There might be ways of framing arguments that get people to relate to some things as more real and decision-relevant than they did previously. Bump up the weights/coefficients tied to helpful, asymmetric mental motions via altering the context in which the reasoning happens, and other similar approaches.

 

 

  1. ^

    I think fewer people are thinking about this, but there are also creatures like tortoises and sharks (and jellyfish?) with much longer lifespans than humans, which are evidence that biology does not inherently require the demise of the mind and flesh.

  2. ^

    In his book The AI Does Not Hate You, Tom Chivers recounts performing an Internal Double Crux with guidance from Anna Salamon:

    Anna Salamon: What’s the first thing that comes into your head when you think the phrase, “Your children won’t die of old age?”

    Tom Chivers: “The first thing that pops up, obviously, is I vaguely assume my children will die the way we all do. My grandfather died recently; my parents are in their sixties; I’m almost 37 now. You see the paths of a human’s life each time; all lives follow roughly the same path. They have different toys - iPhones instead of colour TVs instead of whatever - but the fundamental shape of a human’s life is roughly the same. But the other thing that popped up is a sense of “I don’t know how I can argue with it”, because I do accept that there’s a solid chance that AGI will arrive in the next 100 years. I accept that there’s a very high likelihood that if it does happen then it will transform human life in dramatic ways - up to and including an end to people dying of old age, whether it’s because we’re all killed by drones with kinetic weapons, or uploaded into the cloud, or whatever. I also accept that my children will probably live that long, because they’re middle-class, well-off kids from a Western country. All these things add up to a very heavily non-zero chance that my children will not die of old age, but they don’t square with my bucolic image of what humans do. They get older, they have kids, they have grandkids, and they die, and that’s the shape of life. Those are two fundamental things that came up, and they don’t square easily.”

  3. ^

    I think people's behavior here is much less crazy if they have religious beliefs that include a persistent soul and an afterlife. Although it is more insane if they believe eternal torture is a possibility and aren't more worried about it for themselves and others.

  4. ^

    I don't think the evidence implies that death and aging are definitely curable, just that we have reason to think they might be. And that's enough.

  5. ^

    I think AI is more urgent than aging/death, but if I wasn't working on AI, I might well be working on aging/death.

  6. ^

    I did not come from a family with a continuous ultra-Orthodox heritage so the messages weren't that strong in the home or from grandparents, etc.; my own family had its own journey and interactions with level of religiosity, but I got a lot of messaging at school.




D&D.Sci Thanksgiving: the Festival Feast


Published on November 22, 2025 2:26 AM GMT

This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.

Estimated Complexity Rating: 2.5/5

STORY

As you wander this strange fantasy world you have been summoned to, gathering allies to your banner to save this world from the Demon King, you have been struck by the many strange practices of the people here.

Today, you are trying to win the favor of the mighty warriors of the far land of Isamanda.  Luckily, you happen to have arrived here shortly before one of their holy days, the Festival Feast of Father Frederick[1].

Image generated using OpenArt

By the strange traditions of this bizarre and otherworldly land, political disputes are held at the Festival Feast.  (You suspect it's a plan to keep the debates from escalating to violence by placating the participants with food[2].)

This is excellent news for you, since it means that you have a wonderful opportunity to raise the issue of the Demon King and push them to join with you against him.

However, you first need to make sure that you hold an excellent Festival Feast!  Hungry Isamandans are famously grumpy, and you're sure they'll be much more receptive when happy and well-fed!  With your Companions to help you hunt and cook various dangerous creatures, and your own Data Science skills to help you choose what dishes have led to the most successful Feasts in the past, you're sure you can draw many guests to your Feast and keep them all very happy!

DATA & OBJECTIVES

  • You need to hold a Festival Feast.
  • You can choose any number of the available dishes to serve at your feast:
    • Ambrosial Applesauce
    • BBQ Basilisk Brisket
    • Chili Con Chimera
    • Displacer Dumplings
    • Ettin Eye Eclairs
    • Fiery Formian Fritters
    • Geometric Gelatinous Gateau
    • Honeyed Hydra Hearts
    • Killer Kraken Kebabs
    • Mighty Minotaur Meatballs
    • Opulent Owlbear Omelette
    • Pegasus Pinion Pudding
    • Roc Roasted Rare
    • Scorching Salamander Stew
    • Troll Tenderloin Tartare
    • Vicious Vampire Vindaloo
    • Wyvern Wing Wraps
  • For example, a solution could be serving just Ambrosial Applesauce.  
  • Or you could serve BBQ Basilisk Brisket and Chili Con Chimera.  
  • Or you could serve every dish except Displacer Dumplings.
  • Your job is to get the highest Feast Quality possible.
  • To help you with this, you have a dataset.  Each row in the dataset contains which dishes were served at a past feast, and the resulting overall Quality of that feast.
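
If you want a starting point for the analysis, here is a minimal sketch; the filename and column names are guesses on my part rather than anything from the actual dataset, and it assumes dish effects are purely additive, which D&D.Sci scenarios frequently violate:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical file layout: one 0/1 indicator column per dish, plus a "Quality" column.
df = pd.read_csv("festival_feasts.csv")
dish_cols = [c for c in df.columns if c != "Quality"]

model = LinearRegression().fit(df[dish_cols], df["Quality"])

# Under a purely additive model, you'd serve every dish with a positive coefficient.
effects = pd.Series(model.coef_, index=dish_cols).sort_values(ascending=False)
print(effects)
print("Serve:", [dish for dish, weight in effects.items() if weight > 0])

# Caution: the true ruleset may hide interactions between dishes (or nonlinear
# effects of the number of dishes), so check residuals and pairwise combinations
# before trusting the additive story.
```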

SCHEDULING & COMMENTS

I'll aim to post the ruleset and results on Dec 1st, but if you find yourself on Thanksgiving weekend being stuffed full of Tiny Truculent Tyrannosaur turkey and want an extension, let me know and I'll be happy to grant one!

As usual, working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers that contain information or questions about the dataset.  To spoiler answers on a PC, type a '>' followed by a '!' at the start of a line to open a spoiler block - to spoiler answers on mobile, type a ':::spoiler' at the start of a line and then a ':::' at the end to spoiler the line.

  1. ^

    None of the residents are quite sure who Father Frederick is, or what they are celebrating, but all of them are quite sure that the celebration is an important part of their culture.

  2. ^

    Or, failing that, to stuff all the participants so full of food that they cannot get up to do anything violent.




Diplomacy during AI takeoff


Published on November 22, 2025 2:12 AM GMT

AI 2027, Situational Awareness, and basically every scenario that tries to seriously wrestle with AGI assume that the US and China are the only countries that matter in shaping the future of humanity. I think this assumption is mostly valid. But if other countries wake up to AGI, how might they behave during AI takeoff?

States will be faced with the following situation: Within a few years, some country will control superintelligence, or create a runaway superintelligence that causes human extinction. Once any nation creates a superintelligence, if humanity is not extinct, then every other nation will be at the mercy of the group that controls the ASI.

ASI-proof alliances

Fundamentally, countries will be in the business of seeking ASI-proof alliances with the country likeliest to first create a superintelligence, such that they gain some control over the superintelligence’s actions. They could avoid being disempowered after ASI through:

  1. Verifiable intent-alignment. For instance, a US ally might demand that the US insert values into US superintelligence which protect the ally’s sovereignty. This might be done through an agreed-upon model spec and inspections.
  2. Shared access. A US ally might demand that they get shared access to all frontier AI systems that the US produces, such that there is never an enormous power difference.
  3. Usage verification. US allies might demand access to inspect any input to a US-owned superintelligence, such that they can veto unwanted inputs that might lead to their disempowerment.
  4. Mercy. If the group controlling ASI likes a specific ally enough, they might decide to show mercy and not disempower their ally. Thus, countries will have the incentive to be sycophantic towards those likely to control ASI.

Most of these strategies require having in-house AI and AI safety expertise, which means many countries might start by forming AI safety institutes.

If it becomes more obvious which country will achieve ASI first, then the global balance of power will shift. Countries will flock to ally with the likely winner to reduce the likelihood of their own disempowerment.

ASI-caused tensions

Nuclear-armed states might be able to take much more drastic actions, largely because control of nuclear weapons gives countries a lot of bargaining power in high-stakes international situations, but also because nuclear weapons are correlated with other forms of power (military and economic).

States might also pick the wrong country to “root for” and have too much sunk cost to switch, meaning they will instead prefer to slow down the likely winner.

I think that “losing states” will likely resort to an escalating set of interventions, similar to what’s described in MAIM. I think it’s plausible (>5% likely) that at some point, nuclear-armed states will be so worried about being imminently disempowered by an enemy superintelligence that these tensions will culminate in a global nuclear war.

Global AI slowdown

There is some chance that states will realize that an AI race is extremely dangerous, due to both misalignment and extreme technological and societal disruption. If states come to this realization, then it’s plausible that there will be an international slowdown such that countries can remain at similar power levels and progress slowly enough that they can adapt to new technologies.

One global ASI project

The natural extreme of an ASI-proof alliance is a global ASI project. Under such a setup, most countries participate in a singular ASI project, where AI development goes forward at a rate acceptable to most nations. In such a project, verifiable intent-alignment, shared access, and usage verification would likely play a role.

I think this approach would dramatically lower the risk of human extinction (from ~70% to ~5%), but it seems quite unlikely to happen, as most governments seem far from “waking up” to the probability of superintelligence in the next decade.


