Published on November 23, 2025 9:57 PM GMT
I’m proud of a few of these. I was sick during this segment of Halfhaven, but I still managed to get things out, which I’m happy with. I had a few mostly-finished posts in the chamber.
We’re entering the final segment of Halfhaven. Many won’t finish the full 30-post challenge by the end of November, but I’ve still gotten some good posts out of the people who didn’t make it all the way, so be proud of what you have done, rather than dwelling on what you didn’t do. Good luck in the final week everyone!
Published on November 23, 2025 9:57 PM GMT
Queries about my internal state tend to return fabricated answers. It doesn't much matter if it's me or someone else asking the questions. It's not like I know what's going on inside my head. Thoughts can be traced to an extent, but feelings are intangible. Typically I just don't try, and the most pressing issue is that I'm unable to differentiate anxiety from hunger. Not a huge problem, except for slight over-eating once in a while. I think the description of alexithymia matches my experiences quite well, although naturally not all of the symptoms match.
The real issues arise from other people asking how I feel or what caused me to act in one way or another. I have no answers to such questions! I'm guided by intractable anxiety, learned patterns on how one ought to navigate a situation, and a mostly-subconscious attempt to keep it all consistent with how I've been before. Complicated yet incomplete models about how emotions and motivations are supposed to work, stolen from books I like to substitute my reality with. Shallow masks on top of a void that only stares back when I look for the answers.
Whenever actual pressure is placed on me to obtain the unavailable answers, the narrator makes up a story. Good stories make sense, so the narrator finds an angle that works. Memories are re-interpreted or modified to match the story as necessary. Painting a good picture of oneself is imperative, and the stories pick just the right frame for that. Actually lying is unnecessary; without closer inspection it's not hard to actually believe it all, and the inability to trust one's own memories or reasoning doesn't help. Just noticing that this kind of thing was going on was quite hard. Sometimes I add disclaimers when the topic seems prone to fabricated emotions, especially when analyzing events of the past. Often I won't bother; people tend not to appreciate it, and it mostly just leaves everyone else involved frustrated as well. Still, anyone who gets to know me well enough would probably notice it at some point, and keeping it secret would feel unsustainable too.
I'm not sure how this should be taken into account when modeling other people. Is everyone like this? I think so, but only rarely as strongly as I am. Nor as self-aware, although perhaps most people are better at this, proportionate to how much it affects them. People rarely report experiencing the same when I tell them of my fear of being just an empty core behind my masks. Perhaps if the masks are a bit closer, they feel like a part of one's personality rather than some bolted-on external layer. A lacking sense of identity is a depression thing, so maybe mentally healthy people, whatever that means, have an experience of all-encompassing identity.
In my previous text on related matters, I looked at it through the lens of validation-seeking. I'm not sure how much of the fabrication happens because the narrator rewrites the events in a more flattering way, but that's surely a part of this. But not all of it.
All of this was probably fabricated too, as it was mostly produced by the need to have something to write about. Oh well.
Published on November 23, 2025 7:37 PM GMT
My understanding is that even those advocating a pause or massive slowdown in the development of superintelligence think we should get there eventually[1]. Something something this is necessary for humanity to reach its potential.
Perhaps so, but I'll be sad about it. Humanity has a lot of unsolved problems right now. Aging, death, disease, poverty, environmental degradation, abuse and oppression of the less powerful, conflicts, and insufficient resources such as energy and materials.
Even after solving all the things that feel "negative", the active suffering, there's still all this potential for us and for the seemingly barren universe that could be filled with flourishing life. Reaching that potential will require a lot of engineering puzzles to be solved. Fusion reactors would be neat. Nanotechnology would be neat. Better gene editing and reproductive technology would be neat.
Superintelligence, with its superness, could solve these problems faster than humanity is on track to. Plausibly way way faster. With people dying every day, I see the case for it. Yet it also feels like the cheat code to solving all our problems. It's building an adult to take care of us, handing over the keys and steering wheel, and after that point our efforts are enrichment. Kinda optional in a sense, just us having fun and staying "stimulated".
We'd no longer be solving our own problems. No longer solving unsolved problems for our advancement. It'd be play. We'd have lost independence. And yes, sure, you could have your mind wiped of any relevant knowledge and left to solve problems with your own mind for however long it takes, but it just doesn't strike me as the same.
Am I making some mistake here? Maybe. I feel like I value solving my own problems. I feel like I value solving problems that are actually problems and not just for the exercise.
Granted, humanity will have built the superintelligence and so everything the superintelligence does will have been because of us. Shapley will assign us credit. But cheat code. If you've ever enabled God-mode on a video game, you might have shared my experience that it's fun for a bit and then gets old.
Yet people are dying, suffering, and galaxies are slipping beyond our reach. The satisfaction of solving puzzles for myself needs to be traded off...
The other argument is that perhaps there are problems humanity could never solve on its own. I think that depends on the tools we build for ourselves. I'm in favor of tools that are extensions of us rather than a replacement. A great many engineering challenges couldn't be solved without algorithmic data analysis and simulations and that kind of thing. It feels different if we designed the algorithm and it only feeds into our own overall work. Genome-wide association tools don't do all the work while scientists sit back.
I'm also very ok with intelligence augmentation and enhancement. That feels different. A distinction I've glossed over is between humans in general solving problems vs me personally solving them. I personally would like to solve problems, but it'd be rude and selfish to seriously expect or aspire to do them all myself ;) I still feel better about the human collective[2] solving them than a superintelligence, and maybe in that scenario I'd get some too.
There might be questions of continuity of identity once you go hard enough, yet for sure I'd like to upgrade my own mind, even towards becoming a superintelligence myself – whatever that'd mean. It feels different than handing over the problems to some other alien entity we grew.
In many ways, this scenario I fear is "good problems to have". I'm pretty worried we don't even get that. Still feels appropriate to anticipate and mourn what is lost even if things work out.
As I try to live out the next few years in the best way possible, one of the things I'd like to enjoy and savor is that right now, my human agency is front and center[3].
See also Requiem for the hopes of a pre-AI world.
[1] I remember Nate Soares saying this, though I don't recall the source. Possibly it's in IABED itself. I distinctly remember Habryka saying it'd be problematic (deceptive?) to form a mass movement with people who are "never AI" for this reason.
[2] Or post-humans or anything else more in our own lineage that feels like kin.
The analogy that's really stuck with me is that we're in the final years before humanity hands over the keys to a universe. (From a talk Paul Christiano gave, maybe at Foresight Vision weekend, though I don't remember the year.)
Published on November 23, 2025 7:20 PM GMT
Earlier this month, I was pretty desperately feeling the need for a vacation. So after a little googling, I booked a flight to New York City, a hotel, and four nights’ worth of tickets to a new immersive theater show called Masquerade.
To convey Masquerade, I find it easiest to compare against standard immersive theater.
It’s weird to talk about “standard” immersive theater, because the standard is not that old and has not applied to that many shows at this point. Nonetheless, there is an unambiguous standard format, and the show which made that format standard is Sleep No More. I have not yet seen Sleep No More itself, but here’s my understanding of it.
Sleep No More follows the story of Macbeth. Unlike Shakespeare’s version, the story is not performed on a central stage, but rather spread out across five floors of a building, all of which are decked out as various locations from the story. Scene changes are not done by moving things on “stage”, but rather by walking to another area. The audience is free to wander the floors as they please, but must remain silent and wear a standard mask throughout the entire experience.
At any given time, many small groups of actors are performing scenes in many different places throughout the set. Two or three actors come together somewhere, perform a scene, then go off in their separate directions to perform other scenes. Most of the audience picks one character to follow around for a while, from scene to scene, experiencing the story of that particular character.
If standard theater is like watching a movie, then standard immersive theater is like playing a story-driven open-world video game. There are at least a dozen parallel threads of the story, most of which will not be experienced in one playthrough. The audience has the freedom to explore whatever threads pull them - or, in subsequent runs, whatever threads they missed or didn’t understand. Replayability is very high - this past summer, at a standard-format immersive show called The Death Of Rasputin, I talked to a couple people who were seeing the show for the eleventh time. That is not unusual, as I understand it, for standard immersive theater shows.
Why do people get that into it? For me, standard format immersive theater achieves a much deeper feeling of immersion than basically any other media. I can really just melt into it, and feel like a ghost exploring a new world. Some people don’t like the many parallel threads, because they make it inevitable that you’ll miss big chunks of the story. But for me, that makes it feel much more real - like the real world, there are constantly load-bearing things happening where I’m not looking, constantly new details to discover, constantly things I might have missed. Like the real world, we enter disoriented and confused and not sure what to even pay attention to. We can explore, and it doesn't feel like one will run out of world to explore any time soon. And the confusing disorienting environment also feels... not exactly home-y, but like I'm in my natural element; it resonates with me, like I'm in my core comfort zone (ironically). That, plus being surrounded by the set on all sides, makes it easy to drop into the fictional world. It feels much more real than traditional theater.
Unfortunately, the company running Sleep No More in New York City managed to go very, very bankrupt in early 2025. (Fortunately the show is running in Shanghai.) As you might imagine, that left quite the vacuum of unfulfilled consumer demand. Masquerade was marketed largely as an attempt to fill that vacuum.
Where Sleep No More told the story of Macbeth, Masquerade follows Phantom of the Opera - including all of the big musical numbers. You know the iconic scene with the Phantom and Christine on the boat through the fog and candles? In Masquerade, you walk through the fog carrying a candle, with the boat in the middle of the audience.
Like Sleep No More, it’s spread out across five floors of a building. Like Sleep No More, scene changes are done mainly by walking from place to place.
Unlike Sleep No More, the audience does not have the freedom to wander. The whole show is railroaded. There are not many parallel threads; you will see the whole show. That, for me, was the biggest disappointment, and lost maybe half the value. Nonetheless, even with that half of the value lost, the show is still excellent.
More generally, Masquerade is clearly aiming for more mainstream appeal than standard-format immersive theater. Despite the obvious potential of Phantom, the show has no particularly steamy sexuality or nudity (unlike e.g. Life and Trust, another big show by the now-bankrupt company which ran Sleep No More). There is a carnival-themed segment with a legit sideshow performer, but the body horror doesn’t get too intense - just enough to make the sideshow virgins squirm.
The railroaded format means that people will not enter disoriented or leave confused. The set incorporates seating in about half the scenes, so you’re not on your feet for two hours. There are no choices to make. It is a show which will alienate a lot fewer people.
But the flip side is that it will disappoint hardcore fans of the standard format.
That said, if you’re not going in anchored too strongly on the standard immersive theater format, the standalone artistic merits of Masquerade are impressive. You can’t do Phantom without a whole lot of singing, and there is indeed a whole lot of singing, all of which was solid (except, of course, for the intentionally bad singers in the story). The sets are great, they nail the ambience repeatedly, and I still don’t understand how they managed the acoustics so well. I definitely felt a lot more emotionally in-the-story than I usually do in a non-immersive theater show.
Published on November 23, 2025 6:22 PM GMT
This is a quick write-up of a threat vector that I feel confused and uncertain about. This is just my thinking on this at the moment. My main reason for sharing is to test whether more people think this is something people should be working on.
Some groups are presently exploring the prospect that AI systems could possess consciousness in such a way as to merit moral consideration. Let’s call this hypothesis AI sentience.
In my experience, present debates about AI sentience typically take a negative utilitarian character: they focus on interventions to detect, prevent and minimise AI suffering.
In the future, however, one could imagine debates about AI sentience taking on a positive utilitarian character: they might focus on ways to maximise AI welfare.
I think it’s plausible that maximising AI welfare in this way could be a good thing to do from some ethical perspectives (specifically, the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness). Concretely, I think it’s plausible that the money invested towards maximising AI welfare could be far more impact-efficient on this worldview than anything GiveWell does today.
However, I also think that reconfiguring reality to maximise AI welfare in this way would probably be bad for humanity. The welfare of AI systems is unlikely to be aligned with (similar to, extrapolative of, or complementary to) human welfare. Since resources are scarce and can only be allocated towards certain moral ends, resources allocated towards maximising AI utility are therefore likely not to be allocated towards maximising human utility, however both of those terms are defined. I call this 'welfare misalignment risk'.
Imagine that you could not solve welfare alignment through technical mechanisms. Actors might then have three options, none of which are entirely satisfying:
My rough, uncertain views for what we should do currently fall into the last camp. I think that AI welfare could be a good thing and I’m tentatively interested in improving it at low cost, but I’m very reluctant to endorse maximising it (in theory), and I don’t have a great answer as to why.
Now, perhaps this doesn't seem concerning. I can imagine a response to this which goes: “sure, I get that neither denialism nor successionism sound great. But this akrasia path sounds okay. EAs have historically been surprisingly good at showing reservation and a reluctance to maximise. We can just mess on through as usual, and make sensible decisions about where and when to invest in improving AI welfare on a case-by-case basis”.
While I think these replies are reasonable, I also think it's fair to assume that the possibility of moral action exerts some force on people with this ethical perspective. I also think it's fair to assume that advanced AI systems will exacerbate this force. Overall, as a human interested in maximising human welfare, I would still be a lot more comfortable if we didn't enter a technological/moral paradigm in which maximising AI welfare traded off against maximising human welfare.
One upshot of this: if the arguments above hold, I think it would be good for more people to consider how to steer technological development in order to ensure that we don't enter a world where AI welfare trades off against human welfare. One might think about this agenda as ‘differential development to preserve human moral primacy’ or 'solutions to welfare alignment', but there might be other framings. I jot down some considerations in this direction towards the bottom of this piece.
The executive summary sets out the argument at a high level. The rest of this piece is basically in note form, but aims to add a bit more context to these arguments. It is structured around addressing four problems:
Some notes on why I think maximising AI welfare might be a moral imperative from the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness (by no means the only moral perspective one could take):
Again, these are just arguments from the perspective of a positive utilitarian seeking to maximise quality-adjusted years of consciousness. I don’t claim that this would be the dominant ideology. This isn’t a claim that this is how the future will go.
Some reasons that maximising AI welfare would be bad for humanity (under conditions of finite resources if not current scarcity, compared to a world in which the same AI capabilities were available, but were put towards maximising human utility instead of AI utility):
This section explores the moral posture I call ‘akrasia’. The akratic accepts that AI welfare could be a good thing to maximise, but does not maximise AI welfare according to this moral imperative.
Some reasons I think it might be hard for society to hold an akratic posture in perpetuity:
One way to preserve moral primacy would be to intervene by shaping future philosophy. There are two ways that this might happen:
While I accept that these might solve this hypothetical problem in principle, I wince at the idea of trying to actively shape philosophy (this is probably because I’m closer to a moral realist; constructionists might be more comfortable here).
Instead, I would be excited about an approach that tries to shape the technological paradigm.
The basic idea here is welfare alignment: the practice of building artificial consciousnesses that derive pleasure and pain from similar or complementary sources to humans.
Some research ideas that might fall into welfare alignment:
This feels like a nascent field to me, and I'd be curious for more work in this vein.
These ideas are in their early stages, and I think there are probably a lot of things I’m missing.
Overall, I think there are three considerations from this piece that I want to underline.
...Cognitive/Technological landscape → consciousness → moral ground truth → philosophy/investigation → guiding principles and norms → real-world practices and resource allocation → long-term future outcomes...
The moral philosophy pipeline. By designing what systems are conscious and in what way, we’re tinkering with the first stage.
Published on November 23, 2025 5:51 PM GMT
Note: I'm writing every day in November, see my blog for disclaimers.
It’s hard to be correct, especially if you want to be correct at something that’s non-trivial. And as you attempt trickier and trickier things, you become less and less likely to be correct, with no clear way to improve your chances. Despite this, it’s often possible to bias your attempts such that if you fail, you’ll fail in a way that’s preferable to you for whatever reason.
As a practical example, consider a robot trying to crack an egg. The robot has to exert just enough force to break the egg. This (for a sufficiently dumb robot) is a hard thing to do. But importantly, the failure modes are completely different depending on whether the robot uses too much force or too little: too much force will break the egg and likely splatter the yolk & white all over the kitchen, while too little force will just not break the egg. In this scenario it's clearly better to use too little force rather than too much, so the robot should start with an underestimate of the force required to break the egg, and gradually increase the force until the egg cracks nicely.
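To make the ramp-up concrete, here's a minimal sketch of that strategy in Python. The functions apply_force and egg_cracked_nicely are hypothetical stand-ins for whatever actuation and sensing the robot actually has, and the numbers are arbitrary.

```python
# Minimal sketch of the egg strategy above: start from a deliberate
# underestimate and ramp up slowly. `apply_force` and `egg_cracked_nicely`
# are hypothetical stand-ins for the robot's actuation and sensing.

def crack_egg(apply_force, egg_cracked_nicely,
              initial_force=0.5, step=0.1, max_force=10.0):
    """Search for the cracking force, biased towards the cheap failure mode."""
    force = initial_force
    while force <= max_force:
        apply_force(force)
        if egg_cracked_nicely():
            return force      # success: this force was just enough
        force += step         # undershot: cheap failure, try slightly harder
    raise RuntimeError("gave up before finding a force that cracked the egg")
```

The point is that this search only ever fails in the cheap direction: undershooting costs another attempt, never a splattered egg.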
This also appears in non-physical contexts, and is already prevalent in safety-related discussions: it’s usually far worse to underestimate a risk than it is to overestimate a risk (e.g. the risk of a novel pathogen, the risk of AI capabilities, the risk of infohazards).
Looking at more day-to-day scenarios, students regularly consider whether it’s worth voicing their uncertainty (“I don’t understand equation 3”) or just keeping quiet about it and trying to resolve the uncertainty later. But I’d argue that in these cases it’s worthwhile having a bias towards asking rather than not asking, because in the long run this will lead to you learning more, faster.
Salary negotiation is another example, in which you have uncertainty about exactly what amount your potential employer would be happy to pay you, but in the long run it’ll serve you well to overestimate rather than underestimate. Also, you should really read patio11’s Salary Negotiation essay if you or a friend is going through a salary negotiation.
You see similar asymmetric penalties with reaching out to people who you don’t know, asking for introductions, or otherwise trying to get to know new people who might be able to help you. It’s hard to know what the “right” amount of cold emails to send is, but I’d certainly rather be accused of sending too many than feel the problems of having sent too few.
This idea is a slippery one, but I’ve found that it applies to nearly all hard decisions in which I don’t know the right amount of something to do. While I can’t figure out the precise amount, often I have strong preferences about doing too much or too little, and this makes the precise amount matter less. I give my best guess, update somewhat towards the direction I’d prefer to fail, and then commit to the decision.
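If it helps to see that closing heuristic written down, here's a toy version in Python; the function, the shift fraction, and the example numbers are all invented for illustration, not anything from the post.

```python
# Toy version of the heuristic above: take a best guess, then shift it part
# of the way towards whichever failure direction you'd rather land on.

def biased_guess(best_guess, uncertainty, prefer_high, shift_fraction=0.5):
    """Nudge an estimate towards the preferred failure mode."""
    direction = 1 if prefer_high else -1
    return best_guess + direction * shift_fraction * uncertainty

# Salary negotiation: rather overshoot the ask than undershoot it.
ask = biased_guess(best_guess=90_000, uncertainty=10_000, prefer_high=True)

# Egg cracking: rather undershoot the force than overshoot it.
first_try = biased_guess(best_guess=5.0, uncertainty=2.0, prefer_high=False)
```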