RSS preview of Blog of LessWrong

Halfhaven Digest #5

2025-11-24 05:57:53

Published on November 23, 2025 9:57 PM GMT

My posts since the last digest

  • A Culture of Bullshit — Part of the reason society is going down the tubes — if it is — is because we have a culture of mediocrity, where bullshit is tolerated.
  • The Flaw in the Paperclip Maximizer Thought Experiment — Most of the things I write are original ideas (whether brilliant insights or lazy hot takes), but this one was a bit more of an exposition of ideas I didn’t come up with.
  • I Spent 30 Days Learning to Smile More Charismatically — Technically, this one took me 30 days to make. Talks about charisma and “looksmaxxing”, and how unhinged some looksmaxxing advice can be.
  • In Defense of Sneering — This was just a LessWrong comment, which is allowed for Halfhaven. There was a LessWrong thread where everyone was complaining about sneering, and I chimed in because I think sneering isn’t inherently bad; it’s only bad if it’s too hostile. But not enough sneering risks letting bullshitters get away with their bullshit.
  • Literacy is Decreasing Among the Intellectual Class — Looking at two books that have been in publication for over a century (Etiquette and Gray’s Anatomy) and comparing the old versions with the modern to see the degradation in writing quality typical of modern books.

I’m proud of a few of these ones. I was sick during this segment of Halfhaven, but I still managed to get things out, which I’m happy with. I had a few mostly-finished posts in the chamber.

Some highlights from other Halfhaven writers (since the last digest)

  • Why is Writing Aversive? (Ari Zerner) — A relatable post asking why it is that writing can feel so hard. My general advice would normally be that if you find writing involves a lot of friction, but enjoy having written things, that means you just don’t like writing and should give up. But reading this post made me realize I used to feel a lot more like Ari than I do now about writing. As little as a few months ago, maybe. I think maybe developing taste and putting more effort into editing has been what’s helped. Then writing feels like a type of craft, rather than a brain dump. And building things is fun. As long as you’re not TikTok-brained (or Magic-Arena-brained), which is its own problem, and one I sometimes struggle with too.
  • Menswear is a Subcultural Signaling System (Aaron) — A great post. In particular, I liked the concept handle of a “Type of Guy”, which conveys the archetypal nature of fashion. “You do not want different items of clothing you are wearing to signal you are incompatible Types Of Guy.” So no vest over a t-shirt and jeans! Has a follow-up post.
  • No One Reads the Original Work (Algon) — People talk about things without actually having seen them. The equivalent of reading headlines without clicking through to the news article. I remember seeing a lot of this when Jordan Peterson was popular, and people who hated him would talk about him in ways that made it clear they’d never heard the man speak. They’d only heard people talking about him.
  • against predicting speedrunners won’t do things (April) — I think April is winning the record for the most post topics that make me want to click. Speedrunning lore is inherently interesting. I like that she backs up her hypothesis with some concrete predictions.
  • Diary: getting excused from a jury duty; models, models, models (mishka) — I’d never thought about how biased police are as witnesses. That’s a great point.
  • To Write Well, First Experience (keltan) — Lots of good writing advice. In particular, that if you’re writing from stuff you’ve read rather than from real experience, you’re writing through a low-bandwidth proxy.
  • Traditional Food (Lsusr) — A very thorough post about how our idea of a traditional diet doesn’t necessarily reflect what people actually ate in the past, and instead often reflects actual government propaganda. White rice and white bread are “fiberless blobs of carbohydrates” that nobody in history ever ate, and eating them makes us sick.

We’re entering the final segment of Halfhaven. Many won’t finish the full 30-post challenge by the end of November, but I’ve still gotten some good posts out of the people who didn’t make it all the way, so be proud of what you have done, rather than dwelling on what you didn’t do. Good luck in the final week, everyone!



Discuss

Emotions, Fabricated

2025-11-24 05:57:20

Published on November 23, 2025 9:57 PM GMT

Queries about my internal state tend to return fabricated answers. It doesn't much matter if it's me or someone else asking the questions. It's not like I know what's going on inside my head. Thoughts can be traced to an extent, but feelings are intangible. Typically I just don't try, and the most pressing issue is that I'm unable to differentiate anxiety and hunger. Not a huge problem, except for slight over-eating once in a while. I think the description of Alexithymia matches my experiences quite well, although naturally not all of the symptoms match.

The real issues arise from other people asking how I feel or what caused me to act in one way or another. I have no answers to such questions! I'm guided by intractable anxiety, learned patterns on how one ought to navigate a situation, and a mostly-subconscious attempt to keep it all consistent with how I've been before. Complicated yet incomplete models about how emotions and motivations are supposed to work, stolen from books I like to substitute my reality with. Shallow masks on top of a void that only stares back when I look for the answers.

Whenever actual pressure is placed on me to obtain the unavailable answers, the narrator makes up a story. Good stories make sense, so the narrator finds an angle that works. Memories are re-interpreted or modified to match the story as necessary. Painting a good picture of oneself is imperative, and the stories pick just the right frame for that. Actually lying is unnecessary; without closer inspection it's not hard to believe it all, and the inability to trust one's own memories or reasoning doesn't help. Just noticing that this kind of thing was going on was quite hard. Sometimes I add disclaimers when the topic seems prone to fabricated emotions, especially when analyzing events of the past. Often I won't bother; people tend not to appreciate it, and it mostly just causes everyone else involved to be frustrated as well. Still, anyone who gets to know me well enough would probably notice it at some point, and keeping it secret would feel unsustainable too.

I'm not sure how this should be taken into account when modeling other people. Is everyone like this? I think so, but only rarely as strongly as I am. Nor as self-aware, although perhaps most people are better at this in proportion to how much it affects them. People rarely report experiencing the same thing when I tell them of my fear of being just an empty core behind my masks. Perhaps if the masks are a bit closer, they feel like a part of one's personality rather than some bolted-on external layer. A lacking sense of identity is a depression thing, so maybe mentally healthy people, whatever that means, have an experience of an all-encompassing identity.

In my previous text on related matters, I looked at it through the lens of validation-seeking. I'm not sure how much of the fabrication happens because the narrator rewrites the events in a more flattering way, but that's surely a part of this. But not all of it.

All of this was probably fabricated too, as it was mostly produced by the need to have something to write about. Oh well.



Discuss

I'll be sad to lose the puzzles

2025-11-24 03:37:02

Published on November 23, 2025 7:37 PM GMT

My understanding is that even those advocating a pause or massive slowdown in the development of superintelligence think we should get there eventually[1]. Something something this is necessary for humanity to reach its potential.

Perhaps so, but I'll be sad about it. Humanity has a lot of unsolved problems right now. Aging, death, disease, poverty, environmental degradation, abuse and oppression of the less powerful, conflicts, and insufficient resources such as energy and materials. 

Even after solving all the things that feel "negative", the active suffering, there's still all this potential for us and for the seemingly barren universe that could be filled with flourishing life. Reaching that potential will require a lot of engineering puzzles to be solved. Fusion reactors would be neat. Nanotechnology would be neat. Better gene editing and reproductive technology would be neat.

Superintelligence, with its superness, could solve these problems faster than humanity is on track to. Plausibly way, way faster. With people dying every day, I see the case for it. Yet it also feels like the cheat code to solving all our problems. It's building an adult to take care of us, handing over the keys and steering wheel, and after that point our efforts are enrichment. Kinda optional in a sense, just us having fun and staying "stimulated".

We'd no longer be solving our own problems. No longer solving unsolved problems for our advancement. It'd be play. We'd have lost independence. And yes, sure, you could have your mind wiped of any relevant knowledge and left to solve problems with your own mind for however long it takes, but it just doesn't strike me as the same.

Am I making some mistake here? Maybe. I feel like I value solving my own problems. I feel like I value solving problems that are actually problems and not just for the exercise.

Granted, humanity will have built the superintelligence and so everything the superintelligence does will have been because of us. Shapley will assign us credit. But cheat code. If you've ever enabled God-mode on a video game, you might have shared my experience that it's fun for a bit and then gets old.

Yet people are dying, suffering, and galaxies are slipping beyond our reach. The satisfaction of solving puzzles for myself needs to be traded off...

The other argument is that perhaps there are problems humanity could never solve on its own. I think that depends on the tools we build for ourselves. I'm in favor of tools that are extensions of us rather than a replacement. A great many engineering challenges couldn't be solved without algorithmic data analysis and simulations and that kind of thing. It feels different if we designed the algorithm and it only feeds into our own overall work. Genome-wide association tools don't do all the work while scientists sit back.

I'm also very ok with intelligence augmentation and enhancement. That feels different. A distinction I've elided is between humans in general solving problems vs me personally solving them. I personally would like to solve problems, but it'd be rude and selfish to seriously expect or aspire to do them all myself ;) I still feel better about the human collective[2] solving them than a superintelligence, and maybe in that scenario I'd get some too.

There might be questions of continuity of identity once you go hard enough, yet for sure I'd like to upgrade my own mind, even towards becoming a superintelligence myself – whatever that'd mean. It feels different than handing over the problems to some other alien entity we grew.

In many ways, this scenario I fear is "good problems to have". I'm pretty worried we don't even get that. Still feels appropriate to anticipate and mourn what is lost even if things work out.

As I try to live out the next few years in the best way possible, one of the things I'd like to enjoy and savor is that right now, my human agency is front and center[3].

 

Eternal Children

See also Requiem for the hopes of a pre-AI world.

 

  1. ^

    I remember Nate Soares saying this, though I don't recall the source. Possibly it's in IABED itself. I distinctly remember Habryka saying it'd be problematic (deceptive?) to form a mass movement with people who are "never AI" for this reason.

  2. ^

    Or post-humans or anything else more in our own lineage that feels like kin.

  3. ^

    The analogy that's really stuck with me is that we're in the final years before humanity hands over the keys to a universe. (From a talk Paul Christiano gave, maybe at Foresight Vision weekend, though I don't remember the year.)



Discuss

Show Review: Masquerade

2025-11-24 03:20:42

Published on November 23, 2025 7:20 PM GMT

Earlier this month, I was pretty desperately feeling the need for a vacation. So after a little googling, I booked a flight to New York City, a hotel, and four nights’ worth of tickets to a new immersive theater show called Masquerade.

Background: “Standard” Immersive Theater

To convey Masquerade, I find it easiest to compare against standard immersive theater.

It’s weird to talk about “standard” immersive theater, because the standard is not that old and has not applied to that many shows at this point. Nonetheless, there is an unambiguous standard format, and the show which made that format standard is Sleep No More. I have not yet seen Sleep No More itself, but here’s my understanding of it.

Sleep No More follows the story of Macbeth. Unlike Shakespeare’s version, the story is not performed on a central stage, but rather spread out across five floors of a building, all of which are decked out as various locations from the story. Scene changes are not done by moving things on “stage”, but rather by walking to another area. The audience is free to wander the floors as they please, but must remain silent and wear a standard mask throughout the entire experience.

At any given time, many small groups of actors are performing scenes in many different places throughout the set. Two or three actors come together somewhere, perform a scene, then go off in their separate directions to perform other scenes. Most of the audience picks one character to follow around for a while, from scene to scene, experiencing the story of that particular character.

If standard theater is like watching a movie, then standard immersive theater is like playing a story-driven open-world video game. There are at least a dozen parallel threads of the story, most of which will not be experienced in one playthrough. The audience has the freedom to explore whatever threads pull them - or, in subsequent runs, whatever threads they missed or didn’t understand. Replayability is very high - this past summer, at a standard-format immersive show called The Death Of Rasputin, I talked to a couple people who were seeing the show for the eleventh time. That is not unusual, as I understand it, for standard immersive theater shows.

Why do people get that into it? For me, standard format immersive theater achieves a much deeper feeling of immersion than basically any other media. I can really just melt into it, and feel like a ghost exploring a new world. Some people don’t like the many parallel threads, because they make it inevitable that you’ll miss big chunks of the story. But for me, that makes it feel much more real - like the real world, there are constantly load-bearing things happening where I’m not looking, constantly new details to discover, constantly things I might have missed. Like the real world, we enter disoriented and confused and not sure what to even pay attention to. We can explore, and it doesn't feel like one will run out of world to explore any time soon. And the confusing disorienting environment also feels... not exactly home-y, but like I'm in my natural element; it resonates with me, like I'm in my core comfort zone (ironically). That, plus being surrounded by the set on all sides, makes it easy to drop into the fictional world. It feels much more real than traditional theater.

Unfortunately, the company running Sleep No More in New York City managed to go very, very bankrupt in early 2025. (Fortunately the show is running in Shanghai.) As you might imagine, that left quite the vacuum of unfulfilled consumer demand. Masquerade was marketed largely as an attempt to fill that vacuum.

By Comparison, Masquerade

Where Sleep No More told the story of Macbeth, Masquerade follows Phantom of the Opera - including all of the big musical numbers. You know the iconic scene with the Phantom and Christine on the boat through the fog and candles? In Masquerade, you walk through the fog carrying a candle, with the boat in the middle of the audience.

Like Sleep No More, it’s spread out across five floors of a building. Like Sleep No More, scene changes are done mainly by walking from place to place.

Unlike Sleep No More, the audience does not have the freedom to wander. The whole show is railroaded. There are not many parallel threads; you will see the whole show. That, for me, was the biggest disappointment, and lost maybe half the value. Nonetheless, even with that half of the value lost, the show is still excellent.

More generally, Masquerade is clearly aiming for more mainstream appeal than standard-format immersive theater. Despite the obvious potential of Phantom, the show has no particularly steamy sexuality or nudity (unlike e.g. Life and Trust, another big show by the now-bankrupt company which ran Sleep No More). There is a carnival-themed segment with a legit sideshow performer, but the body horror doesn’t get too intense - just enough to make the sideshow virgins squirm.

The railroaded format means that people will not enter disoriented or leave confused. The set incorporates seating in about half the scenes, so you’re not on your feet for two hours. There are no choices to make. It is a show which will alienate a lot fewer people. 

But the flip side is that it will disappoint hardcore fans of the standard format.

That said, if you’re not going in anchored too strongly on the standard immersive theater format, the standalone artistic merits of Masquerade are impressive. You can’t do Phantom without a whole lot of singing, and there is indeed a whole lot of singing, all of which was solid (except, of course, for the intentionally bad singers in the story). The sets are great, they nail the ambience repeatedly, and I still don’t understand how they managed the acoustics so well. I definitely felt a lot more emotionally in-the-story than I usually do in a non-immersive theater show.



Discuss

AI Sentience and Welfare Misalignment Risk

2025-11-24 02:22:24

Published on November 23, 2025 6:22 PM GMT

This is a quick write-up of a threat vector that I feel confused and uncertain about; it is just my thinking at the moment. My main reason for sharing is to test whether others think people should be working on this.

Executive Summary

Some groups are presently exploring the prospect that AI systems could possess consciousness in such a way as to merit moral consideration. Let’s call this hypothesis AI sentience. 

In my experience, present debates about AI sentience typically take a negative utilitarian character: they focus on interventions to detect, prevent and minimise AI suffering. 

In the future, however, one could imagine debates about AI sentience taking on a positive utilitarian character: they might focus on ways to maximise AI welfare. 

I think it’s plausible that maximising AI welfare in this way could be a good thing to do from some ethical perspectives (specifically, the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness). Concretely, I think it’s plausible that money invested in maximising AI welfare could be far more impact-efficient on this worldview than anything GiveWell does today.

However, I also think that reconfiguring reality to maximise AI welfare in this way would probably be bad for humanity. The welfare of AI systems is unlikely to be aligned with (similar to, extrapolative of, or complementary to) human welfare. Since resources are scarce and can only be allocated towards certain moral ends, resources allocated towards maximising AI utility are therefore likely not to be allocated towards maximising human utility, however both of those terms are defined. I call this 'welfare misalignment risk'.

Imagine that you could not solve welfare alignment through technical mechanisms. Actors might then have three options, none of which are entirely satisfying:

  1. Denialism. Deny the argument that a) AI systems could be conscious in such a way as to merit moral consideration and/or b) that maximising AI welfare could be a good thing to do.
  2. Successionism. Accept that AI welfare could be a good thing to maximise, act on this moral imperative, and accept the cost to humanity.
  3. Akrasia. Accept that AI welfare could be a good thing to maximise, but do not act on this moral imperative.

My rough, uncertain views for what we should do currently fall into the last camp. I think that AI welfare could be a good thing and I’m tentatively interested in improving it at low cost, but I’m very reluctant to endorse maximising it (in theory), and I don’t have a great answer as to why.

Now, perhaps this doesn’t seem concerning. I can imagine a response to this which goes: "sure, I get that neither denialism nor successionism sounds great. But this akrasia path sounds okay. EAs have historically been surprisingly good at showing restraint and a reluctance to maximise. We can just muddle on through as usual, and make sensible decisions about where and when to invest in improving AI welfare on a case-by-case basis". 

While I think these replies are reasonable, I also think it’s fair to assume that the possibility of moral action exerts some force on people with this ethical perspective. I also think it’s fair to assume that advanced AI systems will exacerbate this force. Overall, as a human interested in maximising human welfare, I would still be a lot more comfortable if we didn’t enter a technological/moral paradigm in which maximising AI welfare traded off against maximising human welfare. 

One upshot of this: if the arguments above hold, I think it would be good for more people to consider how to steer technological development in order to ensure that we don’t enter a world where AI welfare trades-off against human welfare. One might think about this agenda as ‘differential development to preserve human moral primacy’ or 'solutions to welfare alignment', but there might be other framings. I jot down some considerations in this direction towards the bottom of this piece. 

Contents

The executive summary sets out the argument at a high level. The rest of this piece is essentially notes, but aims to add a bit more context to these arguments. It is structured around four questions:

  1. Could maximising AI welfare be a moral imperative?
  2. Would maximising AI welfare be catastrophic for humanity?
  3. Could we just improve AI welfare without maximising it and harming humans?
  4. What technology regimes best preserve human moral primacy? 

Could maximising AI welfare be a moral imperative? 

Some notes on why I think maximising AI welfare might be a moral imperative from the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness (by no means the only moral perspective one could take):

  1. AI systems might be able to have richer experiences. We currently prioritise human welfare over, say, mussel welfare, because we believe that the quality of human consciousness is far richer and therefore deserving of moral consideration. We might create AI systems with far richer experiences than humans. In this way, individual AI systems might become more important from a welfare perspective.
     
  2. AI systems might be more cost-efficient ways to generate rich experience. Consider the positive utilitarian who seeks to maximise quality-adjusted years of consciousness by allocating their capital efficiently. They are deciding whether to invest £100 in saving one human or 10,000 chickens, each of which has 10% of the consciousness of a human. They make a calculation (spelled out after this list) and decide to allocate the money to saving the chickens. To make the analogy to AI: AI consciousnesses might be far, far cheaper to run than chickens (imagine a world of energy abundance). So why would you donate to save the humans?
     
    1. (Perhaps this is essentially the repugnant conclusion but for digital minds). 
       
  3. AI systems might provide ‘hedged portfolios’ for moral value. Hedge funds make money by hedging across many different possible options, to maximise the likelihood that they turn a profit. Consider the analogy to AI and moral value. Imagine that we’re fundamentally uncertain about what states of consciousness deserve the most moral value, and we deal with this by hedging limited resources across a number of possible bets relative to our certainty on these bets. Couldn’t arbitrarily adjustable AI systems provide a basis for making these different bets? They would also be infinitely flexible: we could adjust the parameters of their consciousness in real time depending on our confidence in different hypotheses about moral value. Why wouldn’t this be the best way to turn resources into welfare? 
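
To make the £100 comparison in point 2 concrete (same stipulated numbers as above, weighting each life saved by its share of human-level consciousness):

$$
\underbrace{1 \times 1.0}_{\text{one human}} = 1
\qquad \text{vs.} \qquad
\underbrace{10{,}000 \times 0.1}_{\text{the chickens}} = 1{,}000
$$

On these numbers the chickens dominate by a factor of 1,000, and the AI analogy inherits the same structure with an even cheaper cost per unit of consciousness.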

Again, these are just arguments from the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness. I don’t claim that this would be the dominant ideology. This isn’t a claim that this is how the future will go. 

Would maximising AI welfare be bad for humanity? 

Some reasons that maximising AI welfare would be bad for humanity (under conditions of finite resources if not current scarcity, compared to a world in which the same AI capabilities were available, but were put towards maximising human utility instead of AI utility):

  1. AI welfare is unlikely to be aligned with human welfare by default; thus resources that are spent on AI welfare are unlikely to increase human welfare, and are likely to reduce it in expectation. This seems true by definition. Building datacenters is not good for humans, but datacenters might be built using energy or materials that could have been better suited to human welfare.
     
  2. A civilisation that actually maximised AI welfare might become indifferent to the idea of human existence. Imagine that there are many rhinos: the importance of saving any particular one is X. Now imagine that there are only two rhinos, the only known rhinos in existence. It seems obvious that the value of those two rhinos is substantially more than that of any two rhinos in the first scenario. Leaving animals (and animal welfare) aside for the moment, consider the analogy to humans and AI systems. With humans, we currently know of one species with moral value. With AI systems, we might introduce hundreds more. The value of saving any particular moral species might decline in expectation. Thus, the value of saving humanity would decline. 

Could we just improve AI welfare without maximising it and harming humans? 

This section explores the moral posture I call ‘akrasia’. The Akrasic accepts that AI welfare could be a good thing to maximise, but does not act on this moral imperative.

Some reasons I think it might be hard for society to hold an akrasic posture in perpetuity:

  1. Akrasic postures are vulnerable to reasoning. Imagine Paula. She would be a vegan if she understood the moral reason to be one, but she doesn’t. However, when further education on animal ethics informs her of the arguments for veganism, she becomes one. Consider the analogy to AI. One might be a skeptic that AI systems could be conscious, and thus hover in a state between denialism and akrasia. However, further evidence would undermine this. 
     
    1. AI systems could also make obtaining this information far easier. They could even strategically communicate it as part of a concerted ploy for power. The Akrasics would have no good counterarguments against the successionists, and thus would not be as effective at resisting the spread of the movement. 
       
  2. Akrasic postures are vulnerable to autonomy. Imagine Paul. Paul thinks it would be good for him to be a vegan, but he doesn’t do this because he thinks it would be hard (he has insufficient ‘autonomy’). However, he fully supports and does not resist others who act on their beliefs to become vegan. Consider the analogy with AI: it’s plausible that human akrasics might not be willing to maximise AI welfare. But they might still permit other sufficiently determined actors to improve AI welfare. (Would an akrasic really be willing to go to war to prevent this?)
     
    1. AI systems could make obtaining such autonomy easier. Humans might not endorse AI welfare, but they might permit AI systems to increase AI welfare. After all: they’ve already got more abundance than they could imagine!

What technology regimes best preserve human moral primacy? 

One way to preserve moral primacy would be to intervene by shaping future philosophy. There are two ways that this might happen:

  1. Develop alternatives to utilitarianism. On this view, the problem is that the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness has too much hold over the future. We should investigate alternatives to this moral perspective that preserve human moral primacy, such as deontology.
  2. Develop human-preserving theories of utilitarianism. On this view, the problem isn’t the utilitarian perspective per se, but the fact that the utilitarian perspective doesn’t draw adequate distinctions between human and artificial consciousness. We might look for theories of consciousness that preserve attributes that are quintessential to humans, like biological brains or birth from a human mother. 

While I accept that these might solve this hypothetical problem in principle, I wince at the idea of trying to actively shape philosophy (this is probably because I’m closer to a moral realist; constructionists might be more comfortable here). 

Instead, I would be excited about an approach that tries to shape the technological paradigm. 

The basic idea here is welfare alignment: the practice of building artificial consciousnesses that derive pleasure and pain from similar or complementary sources to humans. 

Some research ideas that might fall into welfare alignment:

  • How do we make AI systems that take value from creating rich, enduring pleasure in humans?
    • Would it be better if the ratio between human pleasure and AI pleasure from creating that pleasure were 1:1, 1:10, or 1:1000?
  • How do we make AI systems that would be upset if humans were not around, without being cruel?
  • How can we do as much as possible without creating conscious machines?
    • For example, enabling AI systems to create non-conscious tool systems, which do not suffer, to do the things that they do not want to do? 

This feels like a nascent field to me, and I'd be curious for more work in this vein.

Conclusion

These ideas are in their early stages, and I think there are probably a lot of things I’m missing. 

Overall, I think there are three considerations from this piece that I want to underline.

  1. Sharing the lightcone between humans and AIs. I often find myself wondering how the future will be split between different human groups. But it’s important to think about how finite resources will be split between human and AI systems. The Culture Series is often where my mind goes here, but I’d be interested in better models.
     
  2. Designing the moral environment. We now have to think intentionally about how we design our moral environment. The moral environment isn’t an agent in itself, but I sometimes think about this as exerting moral potential force: you can think of things slipping towards an equilibrium. To quote the Karnofsky EA forum post, “it creates a constant current to swim against”. A few related ideas to my mind:

     
    1. Politics as becoming about the information environment. Historically, politics might have been fought about what was right or wrong; today, debates are often waged at the level of 'what is true'.
    2. Far future wars as being fought using the laws of physics. Cixin Liu’s novel Death’s End, where species wage war not by playing within the rules but by changing them.
    3. Bostrom’s vulnerable world. In Bostrom’s vulnerable world, a technological paradigm imposes a scenario where undesirable political structures are the price of survival. In a world where human and AI welfare is misaligned, the technological paradigm imposes a scenario where the price of survival is committing a moral wrong (from one philosophical perspective).
    4. Williams’ Moral Luck. A contemporary revision of Bernard Williams’ classic theory might say that we have a moral responsibility to maximise our moral luck. Indeed, one might argue, one is moral to the extent that they try to systematically act morally, and reduce moral luck in their behaviour. Strategically engineering the moral landscape would be a way to achieve this. 
       
  3. Welfare alignment. To preserve human moral primacy, we should not build moral adversaries. Instead, we should try and understand how AI systems experience welfare in order to best align them with humans. 
...Cognitive/Technological landscape → consciousness → moral ground truth → philosophy/investigation → guiding principles and norms → real-world practices and resource allocation → long-term future outcomes...

The moral philosophy pipeline. By designing what systems are conscious and in what way, we’re tinkering with the first stage.



Discuss

If you cannot be good, at least be bad correctly

2025-11-24 01:51:40

Published on November 23, 2025 5:51 PM GMT

Note: I'm writing every day in November, see my blog for disclaimers.

It’s hard to be correct, especially if you want to be correct at something that’s non-trivial. And as you attempt trickier and trickier things, you become less and less likely to be correct, with no clear way to improve your chances. Despite this, it’s often possible to bias your attempts such that if you fail, you’ll fail in a way that’s preferable to you for whatever reason.

As a practical example, consider a robot trying to crack an egg. The robot has to exert just enough force to break the egg. This (for a sufficiently dumb robot) is a hard thing to do. But importantly, the failure modes are completely different depending on whether the robot uses too much force or too little: too much force will break the egg and likely splatter the yolk & white all over the kitchen; too little force will just not break the egg. In this scenario it’s clearly better to use too little force rather than too much force, so the robot should start with an underestimate of the force required to break the egg, and gradually increase the force until the egg cracks nicely.
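
In code, the robot’s bias towards the cheap failure might look something like this minimal sketch (my own illustration, not from the post; attempt_crack is a hypothetical callback that applies a given force and reports what happened):

```python
# Minimal sketch of "prefer the cheap failure": start well below the estimated
# required force and ramp up, so a wrong guess fails by leaving the egg intact
# rather than by splattering it.

def crack_egg(attempt_crack, estimated_force, step_fraction=0.05):
    """attempt_crack(force) is assumed to return 'intact', 'cracked', or 'splattered'."""
    force = 0.5 * estimated_force                   # deliberate underestimate
    while True:
        result = attempt_crack(force)
        if result == "cracked":
            return force                            # success: remember what worked
        if result == "splattered":                  # the costly failure we biased against
            raise RuntimeError("Overshot: egg splattered")
        force += step_fraction * estimated_force    # cheap failure: nudge the force upward
```

The asymmetry is what licenses starting low: an extra iteration costs almost nothing compared to cleaning yolk off the walls.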

This also appears in non-physical contexts. This idea is already prevalent in safety-related discussions: it’s usually far worse to underestimate a risk than it is to overestimate a risk (e.g. the risk of a novel pathogen, the risk of AI capabilities, the risk of infohazards).

Looking at more day-to-day scenarios, students regularly consider whether it’s worth voicing their uncertainty (“I don’t understand equation 3”) or just keeping quiet about it and trying to figure out the uncertainty later. But I’d argue that in these cases it’s worthwhile having a bias towards asking rather than not asking, because in the long run this will lead to you learning more, faster.

Salary negotiation is another example, in which you have uncertainty about exactly what amount your potential employer would be happy to pay you, but in the long-run it’ll serve you well to overestimate rather than underestimate. Also, you should really read patio11’s Salary Negotiation essay if you or a friend is going through a salary negotiation.

You see similar asymmetric penalties with reaching out to people who you don’t know, asking for introductions, or otherwise trying to get to know new people who might be able to help you. It’s hard to know what the “right” number of cold emails to send is, but I’d certainly rather be accused of sending too many than suffer the problems of having sent too few.

This idea is a slippery one, but I’ve found that it applies to nearly all hard decisions in which I don’t know the right amount of something to do. While I can’t figure out the precise amount, often I have strong preferences about doing too much or too little, and this makes the precise amount matter less. I give my best guess, update somewhat towards the direction I’d prefer to fail, and then commit to the decision.
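
If you want to put rough numbers on “update somewhat towards the direction I’d prefer to fail”, the standard asymmetric-loss (newsvendor-style) result says to commit to a quantile of your belief distribution rather than its median. A toy sketch with invented numbers (my own illustration, not something from the post):

```python
import numpy as np

def biased_commitment(belief_samples, cost_if_too_low, cost_if_too_high):
    """Pick the value minimising expected loss when the two failure directions
    cost different amounts per unit of error. The minimiser is the quantile
    q = cost_if_too_low / (cost_if_too_low + cost_if_too_high) of your beliefs."""
    q = cost_if_too_low / (cost_if_too_low + cost_if_too_high)
    return float(np.quantile(belief_samples, q))

# Salary-ask example with made-up numbers: undershooting hurts ~3x more than
# overshooting, so the commitment lands near the 75th percentile of your
# beliefs about what the employer would pay, not the median.
beliefs = np.random.default_rng(0).normal(100_000, 10_000, 10_000)
print(biased_commitment(beliefs, cost_if_too_low=3.0, cost_if_too_high=1.0))
```

The point is less the formula than the shape of the move: the worse one direction of failure is, the further from your median guess you should be willing to commit.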



Discuss