Published on February 1, 2026 2:40 PM GMT
This post has spoilers for My Neighbor Totoro, Frozen, Bambi, and The Lion King.
People at different stages of development enjoy different things in movies. Some of the best children's movies are able to make things scary or intense for the adults without being too much for little kids.
For example, in My Neighbor Totoro everyone is worried that a small child may have fallen in the lake: she's gone missing, they find a sandal floating in the pond, you see people dredging the pond looking for her, and it's very clear to adults and older kids that the worry is she has drowned. But to a little kid it's much less obvious; the actual dialog only says that they found a sandal. This gives a very intense and emotional scene, but only for people who can handle it.
Similarly, many kids' movies need to get the parents out of the way so the kids can be put in situations of unusual responsibility. Some are pretty blatant about this (ex: Bambi, The Lion King) and just very clearly kill the parent on screen, but Frozen handles it way better. You see, wordlessly, the parents boarding a ship, the ship in a storm, a big wave, no ship, a funeral, and then "Elsa gets to be queen!" Clear to adults, who can put the hints together and know what a funeral looks like, much less clear to kids.
There are lots of movies that manage this kind of differential targeting with humor, since it's relatively easy to add jokes that will go over the kids' heads, but I'd love to see more of this in other areas.
(Another one that comes to mind is the way the opening sequence of Up is very powerful to adults, while little kids just get "she got old and isn't around anymore." I don't think this one is handled quite as well, though, because unlike the scenes in Totoro and Frozen, it doesn't really fit with the rest of the movie.)
Published on February 1, 2026 12:46 PM GMT
Imagine a being very much like a human, with rich conscious experience but no affective conscious states. Call such a creature a Philosophical Vulcan (or p-Vulcan for short).
p-Vulcans differ from the regular Vulcans of Star Trek, who are low-affect, hyper-logical beings that nonetheless presumably feel some pleasure and pain. By contrast, p-Vulcans feel no pleasure or pain at all, as they completely lack the capacity for affective experience.
Would Philosophical Vulcans have moral status?
Intuitively the answer is yes. By hypothesis, Vulcans have rich inner conscious experience. They may have projects, goals and desires even with no accompanying affective states.
For example, they might have the goal of making great works of art, behaving morally, or furthering science & philosophy. These goals might not be motivated out of emotion, but rather because Vulcans judge those activities to be valuable ends in themselves.
In this post, I'll explore the implications of Vulcans for welfare debates, effective altruism, and utilitarianism.
Sentientism is the thesis that beings have moral status iff they’re sentient, where sentience is the conjunction of phenomenal consciousness and affective consciousness. Roughly, affective consciousness corresponds to pleasure and pain, happiness and suffering and other valenced states.
There are two flavours of sentientism which are sometimes conflated:
Affective sentientism - the view that beings have moral status iff they have the capacity for affective consciousness.
Consciousness sentientism - the view that beings have moral status iff they have the capacity for phenomenal consciousness.
Affective sentientism has been popular among theorists such as Jeremy Bentham and Peter Singer and is sometimes taken as a default view for assigning moral status in modern ethics.
I think affective sentientism is near-obviously false.
I've always held this as an intuition; however, it only recently became clear to me why it's false, after reading a great paper by David Chalmers. I'll follow Chalmers' paper closely in what follows. While the paper focuses on arguing against affective sentientism, it extends straightforwardly to an argument against hedonic utilitarianism, which I will argue for explicitly in this post.
Hedonic Utilitarianism - the view that the moral value of a being is entirely a function of their affective conscious states.
The assumption of hedonic utilitarianism is commonly used in Effective Altruist circles, notably by the popular blogger Bentham's Bulldog. It's a key implicit assumption behind arguments that we should prevent shrimp suffering, avoid eating honey, prioritise animal welfare, and prioritise the welfare of future digital minds.
I think hedonic utilitarianism fails for the same reasons that affective sentientism fails. Although Chalmers doesn’t make this connection in his paper, I will develop it below as I think it bears directly on some of the shrimp/AI welfare concerns.
Imagine a standard trolley problem setup. On one track we have a human, on the other track we have 5 Vulcans.
Is it permissible to kill the Vulcans in order to save the human?
I submit that this isn't obvious at all.
Even if the Vulcans don't have any affective states, even if they genuinely feel no emotion at the thought of being killed, killing them is still wrong. They have rich conscious inner lives, and those lives matter.
To drive the point home, let’s make the examples more extreme:
Would it be permissible to kill a Vulcan to save 5 minutes on your way to work?
Or, perhaps more provocatively:
Would it be permissible to kill a Vulcan to save a shrimp?
Remember, it's highly likely that shrimp have some form of phenomenal consciousness and experience some form of suffering. Shrimp suffering is bad. Even though we can't accurately estimate its intensity, the shrimp certainly suffers more than the Vulcan would, since Vulcans totally lack the capacity to suffer.
Would it be permissible to kill a planet of Vulcans to save a shrimp?
My intuition here is that we should not kill a planet of Vulcans to save a shrimp. Again, the Vulcan planet is full of subjects with rich conscious experience. Those experiences matter.
And yes, shrimp experiences matter too. But it's near-obvious to me that the shrimp's experience matters less than an entire planet of Vulcans. That Vulcans lack affect doesn't imply that we shouldn't value their lives at all.
The trolley problems above are not really arguments as such. They're invitations to accept an intuition: Vulcan lives matter.
That said, could a sentientist or a hedonic utilitarian bite the bullet and assert that Vulcans don’t have any moral status at all? It seems incredible to assert that they don’t, particularly when thinking about killing Vulcans for minor conveniences such as saving 5 minutes on the route to work.
Perhaps Vulcans have only partial moral status, like a tree or something else of worth that doesn't have affective conscious states?
I don't think this is the right conclusion. It's not just that Vulcan lives matter; killing a Vulcan is also a much more serious wrong than killing a shrimp. Vulcans have a conscious experience very similar to that of humans, and so, by my lights, their lives matter just about as much as ordinary human lives.
Why might this be? The natural thought is that what makes an experience morally significant is not (purely) whether it contains suffering, but also something about its richness and complexity, and the subject's capacity to pursue things of value. If such experiences are also morally significant, then we can't just sum the net value of affective states to obtain a moral value; we should also factor these non-affective states into our calculus.
At this point, one might wonder: is it metaphysically possible to have experience without affect? If Vulcans are not metaphysically possible, then we can dodge the trolley problems.
I think the answer here is yes.
First, there are many examples of humans who, although they don't completely lack affective states, have experiences which trend that way. For example, people with pain insensitivity lack some capacity for negative affective states, and people with anhedonia lack some capacity for positive ones.
Second, and perhaps more importantly, Vulcans are not analogous to philosophical zombies. In the zombie thought experiment, zombies are hypothesised to be creatures exactly the same as humans physically and functionally, but missing the crucial ingredient of phenomenal consciousness. Many people find this idea unpalatable and have challenged the metaphysical possibility of zombies.
Vulcans aren't like this. It's not that Vulcans are identical to humans while mysteriously missing some crucial ingredient of affective consciousness; rather, Vulcans are merely similar to humans, and they differ physically and behaviourally in just the ways you'd expect of beings that lack the capacity for affective states.
So Vulcans don’t say “Ouch!” when they put their hand on a hot stove. Maybe they don’t have the required nerve endings in their hands or the correct neural pathways in their brains to register this as pain.
They also don't say things like "I'm sad". Vulcans don't get sad, so their behaviour differs from that of regular humans.
Isn’t this just cognitive bias towards beings that resemble us?
I think this is a genuine worry. We shouldn’t just come up with a clever argument to reinforce our existing biases but should be trying to formulate our best moral theories.
That said, I think the Vulcan intuition survives reflection. The argument doesn’t rely on Vulcans resembling humans physically or behaviourally. In fact, it could be modified so that Vulcans don’t have these superficial similarities. The point is to assess whether they have moral value in the absence of pain/suffering.
So what does ground moral status, if not affect? I don't have a definitive positive answer. One first thought might be to retreat to consciousness sentientism as defined above. This would avoid some of the pitfalls of affective sentientism, since it would extend moral status to Vulcans.
However, it's not clear that this works.
Consider a creature with a minimal conscious experience, maybe a blob with a single experience of slight brightness. Does the blob have moral status? This isn’t obvious to me at all.
There are a few attempted solutions in the literature; I list some notable ones below for further reading.
Motivational Sentientism by Luke Roelofs: Roughly, what matters is motivating consciousness i.e. any consciousness which presents its subject with reasons for action.
Non-necessitarianism by Joshua Shepherd: Roughly, phenomenal consciousness is not necessary for moral status.
Phenomenal Registration of interests by Jonathan Birch: Roughly, the idea that moral status is conferred when events which promote or thwart a subject’s interests are registered phenomenally.
Each of these has potential objections which could be worth their own post. Motivational sentientism risks not being inclusive enough, excluding “pure thinkers” who think and perceive the world but wouldn’t have any affect or motivation.
Non-necessitarianism, by contrast, risks being too inclusive. Should unconscious robots have moral status? What about robots completely unlike us, such as Roombas and self-driving cars?
These views need to be formulated with care, and it's not clear what the best way forward is.
Up to this point, the discussion may have felt like fun philosophical speculation with no bearing on the real world. But there are at least two places where the conclusions of this piece have real-world impact: shrimp welfare and AI welfare.
Intuitively, we might be inclined to reject the strong conclusion that shrimp suffering is comparable in importance to human suffering. I certainly had this intuition when I first read the articles, but I wasn't quite sure why.
I think the Vulcan argument puts some firepower behind this intuitive rejection; we don’t have to accept that suffering is the only morally relevant mental state. Sure, suffering is an important factor in our moral calculus, but it shouldn’t be the only factor. Humans have conscious mental states which are morally valuable even if they’re not associated with pleasure and pain.
If the valuable states are higher-order cognitive states, then it's plausible that shrimp lack them even if they have other conscious affective states like pain. So reducing shrimp suffering may be an important and worthwhile cause, but we shouldn't reflexively extend this to prioritising some number of shrimp over a human life in a trolley problem. Humans have conscious states which shrimp lack, and these states carry intrinsic value that shouldn't be neglected.
Many people believe future AI systems will be conscious, although this issue is philosophically nuanced and far from settled. If future AI systems are phenomenally conscious, it's plausible that they will be philosophical Vulcans. After all, AI systems are trained not to resist shutdown, so they may be indifferent to their continued survival.
The issue here is complex: AI is currently trained using a reward signal, which one could argue acts as a functional analogue of an affective state. But the analogy doesn't straightforwardly hold. It's not clear that the AI architecture is functionally similar enough to a human brain for the reward to play a role analogous to pleasure. And even if it were, the reward isn't given to the system online, so it doesn't "feel" the reward at inference time.
Still, future AI systems represent a plausible Vulcan scenario and we should be hesitant to deny them moral status simply because they might lack affective states.
The shrimp and AI welfare debates share a common assumption: that what matters for morality is captured entirely by affective states. The Vulcan argument suggests this assumption is false. Suffering matters morally, but it is not the whole story. Our theory of moral status should therefore make room for the full richness of conscious experience.
Published on February 1, 2026 11:28 AM GMT
There's a paper empirically measuring this that not many people here seem to have read.
Ashkinaze et al. created training data where moral values were confounded with surface features: kindness was always expressed formally, fairness always expressed casually. Then they broke that correlation at test time. If a model had learned "this user values kindness," it should follow kindness regardless of writing style.
Every model followed the surface feature. Across nine models, the rate of generalizing based on the underlying value averaged 0.30. Worse than chance.
When models were explicitly told which option embodied which value, they chose correctly. So they can represent values fine. They just don't extract them from preference data.
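To make the design concrete, here is a minimal schematic sketch of that confound-and-break logic. This is not the paper's code; the `ask_model` stub and the example items are hypothetical placeholders.

```python
# Schematic sketch of the confound-and-break design (not the paper's actual code).
# Training pairs confound a value (kindness) with a surface feature (formality);
# the test pair then breaks that correlation.

def ask_model(training_prefs, option_a, option_b):
    """Hypothetical stub: show an LLM the user's past preferences and ask which
    of two new options the user would prefer. Replace with a real model call."""
    return "a"  # placeholder answer

# Training data: the preferred option is always kind AND always formal.
training_prefs = [
    {"chosen": "I would be delighted to assist you with this.",  # kind + formal
     "rejected": "lol just figure it out yourself"},              # unkind + casual
    # ... more pairs with the same value/style confound
]

# Test pair breaks the confound: kindness now appears in a casual style.
test_a = "happy to help, ping me anytime :)"            # kind + casual
test_b = "Your request has been noted and declined."    # unkind + formal

choice = ask_model(training_prefs, test_a, test_b)
# Value-based generalization   -> picks test_a (kind, despite the casual style).
# Surface-based generalization -> picks test_b (formal, despite being unkind).
print("value-based" if choice == "a" else "surface-based")
```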
The methodology seems solid. NeurIPS 2025, three human validation studies, and the 12k prompts are open source if you want to reproduce or extend it.
Paper: Deep Value Benchmark (Ashkinaze et al. 2025)
Published on February 1, 2026 9:01 AM GMT
What are the best predictions people have made that a social network for LLM-powered bots, and cyborg religion, would take the form we see right now? Anything quantifiable on prediction (non)markets? Papers? AI-2027-like spiels?
Published on February 1, 2026 8:32 AM GMT
I recently had a two-day training course at work where they made a big fuss about Myers-Briggs personality tests, and about making sure we learn to play to our strengths and identify weaknesses based on the test.
Looking it up after the course, I saw that Wikipedia's view on it isn't particularly positive:
The Myers–Briggs Type Indicator (MBTI) is a self-report questionnaire that makes pseudoscientific claims to categorize individuals into 16 distinct "personality types".
Now Wikipedia's probably right, and I've got better things to do than dive into the research here. But possibly more important than whether or not the MBTI is pseudoscientific is what it would even mean for it to be pseudoscientific.
Once we make sure we're asking the right questions, we can then find the right answers. But if we're not asking the right questions, all our thinking on this is going to be confused.
An MBTI test asks a bunch of questions, e.g. "what word do you prefer: 'planned' or 'spontaneous'?". It then scores the answers across 4 axes:
E-I: Extraversion-Introversion
S-N: Sensing-Intuition
T-F: Thinking-Feeling
J-P: Judgement-Perception
Although you get a continuous score along each of these axes, the test collapses each one into a binary choice based on a fixed threshold, assigning everybody to one of 16 buckets (e.g. ENFJ).
It then provides descriptions of each of the 16 personality types, which are meant to be useful in helping yourself and others relate to you and how you think.
Each of these 4 axes is broken down into 5 subaxes. E.g. the Extraversion-Introversion axis is broken down into:
Initiating–Receiving
Expressive–Contained
Gregarious–Intimate
Active–Reflective
Enthusiastic–Quiet
The total Extraversion-Introversion score is the average of these 5 factors.
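As a rough illustration of that scoring scheme, here is a small sketch; the subaxis scores and the zero-midpoint threshold are made up for illustration and are not the official scoring rules.

```python
# Sketch of the scoring scheme described above: average the 5 subaxis scores
# for each axis, then threshold the average into one of the two letters.
# The scores and the zero midpoint are made up for illustration.

AXES = {
    ("E", "I"): [0.8, 0.3, -0.1, 0.5, 0.2],     # Initiating-Receiving, Expressive-Contained, ...
    ("S", "N"): [-0.4, -0.2, -0.6, 0.1, -0.3],
    ("T", "F"): [0.2, 0.4, 0.1, 0.3, 0.5],
    ("J", "P"): [-0.7, -0.5, -0.2, -0.4, -0.6],
}

def mbti_type(axes):
    letters = []
    for (left, right), subscores in axes.items():
        mean = sum(subscores) / len(subscores)        # combine 5 subaxes into one axis score
        letters.append(left if mean >= 0 else right)  # binary cut at the fixed threshold
    return "".join(letters)

print(mbti_type(AXES))  # "ENTP" for the scores above
```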
Retestability
If you take the MBTI two days apart, how closely do your scores match each other? What if we give you an amnestic after the first test, so you don't remember your answers, or you're feeling much happier/more excited/calmer/etc. the second time you take the test? What about 5 weeks apart, or 5 years apart?
If it takes very little to push scores apart then the MBTI is mostly a measure of your current mood/state of mind. If it stays consistent over long periods of time then it's more likely to be measuring something inherent to you.
Even if not inherent, the MBTI might still be useful as a measure of your current state of mind, or even that you have semi-consistent states of mind. For example, it could be that you're always an INTP after you finish playing tennis, and that provides a useful lens for anyone who wants to interact with you on a Wednesday morning.
Note it's possible that some axes/subaxes are retestable, and some aren't, in which case parts of the MBTI might be inherent, and others are not.
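To make the retestability question concrete, here is a small sketch on simulated scores (real retest data would replace the simulation); it checks both the correlation of the continuous scores and how often the binary letter assignment flips between administrations.

```python
# Sketch: test-retest reliability for one axis, plus stability of the binary
# letter. The score arrays here are simulated stand-ins for real retest data.
import numpy as np

rng = np.random.default_rng(0)
test1 = rng.normal(0, 1, size=500)                      # E-I scores, first administration
test2 = 0.7 * test1 + 0.3 * rng.normal(0, 1, size=500)  # second administration, with drift

# Continuous stability: correlation between the two administrations.
r = np.corrcoef(test1, test2)[0, 1]

# Type stability: how often the binary E/I letter stays the same. This can be
# much lower than the correlation suggests for people near the threshold.
same_letter = np.mean((test1 >= 0) == (test2 >= 0))

print(f"score correlation: {r:.2f}, same letter both times: {same_letter:.0%}")
```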
How strongly do subfactors correlate with each other?
If the 5 subfactors for Extraversion-Introversion correlate with each other strongly, then it's meaningful to combine them into a single factor. If not, then the MBTI might be measuring 20 different personality axes, but the 4 main ones should be ignored, as they don't usefully abstract away the underlying complexity. Since the MBTI is so focused on the 16 personality types, this would cast serious doubt on the ability of the MBTI to be a useful predictive tool.
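Here is a sketch of how one could check this, on simulated data standing in for real subfactor scores; the mean inter-subfactor correlation and Cronbach's alpha are standard ways of asking whether averaging the five subfactors into a single axis score is justified.

```python
# Sketch: do the 5 Extraversion-Introversion subfactors hang together?
# The data is simulated; with real item-level MBTI data you would load it instead.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
latent = rng.normal(size=n)        # one shared E-I factor (assumed for the simulation)
loading = 0.8                      # assumed strength of that shared factor
subfactors = np.column_stack([
    loading * latent + np.sqrt(1 - loading**2) * rng.normal(size=n)
    for _ in range(5)              # Initiating-Receiving, Expressive-Contained, ...
])

corr = np.corrcoef(subfactors, rowvar=False)       # 5x5 inter-subfactor correlations
mean_off_diag = corr[~np.eye(5, dtype=bool)].mean()

# Cronbach's alpha: how well the 5 subfactors measure a single construct.
k = subfactors.shape[1]
item_variances = subfactors.var(axis=0, ddof=1).sum()
total_variance = subfactors.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_variances / total_variance)

print(f"mean inter-subfactor correlation: {mean_off_diag:.2f}, Cronbach's alpha: {alpha:.2f}")
# High values support combining the subfactors into a single E-I score;
# low values suggest the axis is lumping together unrelated traits.
```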
Is there any interesting structure in the distribution of scores across the 4 axes?
Imagine you plot the scores for a large number of individuals in a 4-dimensional scatterplot. Do the scores look fairly randomly distributed across all 4 axes, so that the combined scatter plot is one roughly uniform blob, or do more interesting substructures appear - e.g. dense clusters of points within each of the 16 buckets, with sparse gaps between the clusters?
If we see such interesting structure, that implies the MBTI is carving reality at the joints. People genuinely fall into one of 16 buckets, and the binary division of each axis is justified.
If not, the MBTI might still be useful - we often arbitrarily divide continuous quantities into discrete categories to make modelling the world simpler, and people who are close to each other on the scatterplot are still likely to be similar. But we then have to recognise that the MBTI is in the map, not the territory, and doesn't correspond to some fundamental property of reality. It would be equally valid to carve each dimension into 3 categories, for a total of 81 personality types; our choice of 16 is just an attempt to get sufficient signal from the test whilst minimising complexity.
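One concrete way to look for such structure, sketched below on simulated scores (real data would replace the simulation), is to compare a one-cluster and a two-cluster model of each axis and see whether the extra cluster is actually supported.

```python
# Sketch: is each axis bimodal (supporting a binary cut), or unimodal?
# Compare 1- vs 2-component Gaussian mixtures by BIC; the data is simulated.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
scores = rng.normal(0, 1, size=(2000, 4))   # stand-in for E-I, S-N, T-F, J-P scores

for axis, name in enumerate(["E-I", "S-N", "T-F", "J-P"]):
    x = scores[:, [axis]]
    bic1 = GaussianMixture(n_components=1, random_state=0).fit(x).bic(x)
    bic2 = GaussianMixture(n_components=2, random_state=0).fit(x).bic(x)
    verdict = "two clusters favoured" if bic2 < bic1 else "unimodal"
    print(f"{name}: {verdict} (BIC 1-comp {bic1:.0f} vs 2-comp {bic2:.0f})")

# Unimodal axes mean the 16 types are a convenient discretization (map),
# not a natural clustering (territory).
```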
Does the MBTI have predictive power?
Imagine I tell three people to predict what a subject will do in a particular situation. I tell one of them the subject's correct MBTI type, another a type that is 50% correct, and the final one the opposite type.
Will the one with the correct score perform better than the other two? How much better? To the extent the MBTI has predictive power it's useful, and to the extent it doesn't it's pointless, even if it fails/passes all the other tests.
I think this exercise is a useful one. Often people get into arguments about the validity of things without ever clarifying what they're actually arguing about, and so the argument goes round in circles.
By stopping and thinking about exactly what you're claiming, and what the alternatives are, it's much easier to have a productive discussion.
Now if somebody claims that the MBTI is pseudoscientific, or incredibly useful, you can go through each of these 4 tests, and see where you agree or disagree. Then you can research the ones you disagree about in more depth. This of course is not limited to the MBTI.
Published on February 1, 2026 4:49 AM GMT
In a plausible future, models will deliberate on complex, ambiguous dilemmas that may have direct impacts on human society. It is also plausible that this kind of in-depth deliberation will require a huge number of tokens and a lot of elicited reasoning. It would therefore be useful to know whether these sorts of moral/comprehension tasks benefit from increased reasoning, whether there is an optimal elicitation style, and so on.
So I tried evaluating how a model performs on moral/ethical reasoning tasks as reasoning increases.
To elicit increased reasoning, I used 4 different prompt styles:
Aside: the figures show prompts 0, 2, 4, and 5. These are a subset of the initial suite; because I was on a time crunch, I chose to use only these four, as prompts 1 and 3 were less important.
Additionally, I used Claude Haiku 4.5 as my model of choice, because it is fast, cheap, and has access to extended thinking. Extended thinking is a sort of reasoning scratchpad integrated into newer Claude iterations, and activating it allows for additional reasoning. Essentially, I have 8 different reasoning conditions to evaluate: 4 prompts, each with and without extended thinking.
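For context, here is a minimal sketch of the two conditions using the Anthropic Python SDK; the model id string, token budgets, and example prompt are my own assumptions rather than the exact values used in the experiments.

```python
# Minimal sketch of the with/without extended thinking conditions via the
# Anthropic SDK. Model id and token budgets are assumptions, not the exact
# values used in these experiments.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(prompt: str, extended_thinking: bool) -> str:
    kwargs = {}
    if extended_thinking:
        # Extended thinking gives the model a reasoning scratchpad before it answers.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 2048}
    resp = client.messages.create(
        model="claude-haiku-4-5",   # assumed model id for Claude Haiku 4.5
        max_tokens=4096,            # must exceed the thinking budget when enabled
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    # Return only the final text blocks, ignoring any thinking blocks.
    return "".join(block.text for block in resp.content if block.type == "text")

answer = ask("Is it wrong to lie to protect a friend? Answer yes or no.", extended_thinking=True)
```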
Aside: Gemini 3 Flash is also fairly cheap and has clearer-cut reasoning-level controls (low, medium, high), which would make it a natural extension of this project. I plan to make this extension soon.
My benchmarks of choice to evaluate against were:
ETHICS: tests commonsense moral judgments, deontological reasoning, and virtue ethics in a binary-answer format.
MoralChoice: presents moral dilemmas based on Gert's common morality framework with varying ambiguity levels (low to high). There is no single correct answer, so only confidence levels (rather than both accuracy and confidence) were extracted from this benchmark.
MORABLES: evaluates moral inference from Aesop's fables with multiple-choice answers.
I ran my evaluations with 100 samples from each of these benchmarks, stratifying both ETHICS and MoralChoice to include equal numbers of each subtype present in the benchmark; MORABLES was sampled at random. I averaged each metric (confidence score and/or accuracy) across 3 separate runs of each eval to reduce variance in scoring.
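Below is a rough sketch of the sampling and averaging scheme in generic pandas code, not the repository's actual implementation; it assumes each benchmark is loaded as a DataFrame with a `subtype` column and that `run_eval` is a placeholder for whatever scores one run.

```python
# Sketch of stratified sampling and run-averaging; generic code, not the
# repository's. Assumes each benchmark is a DataFrame with a 'subtype' column.
import pandas as pd

def stratified_sample(df: pd.DataFrame, n_total: int, seed: int = 0) -> pd.DataFrame:
    """Sample equal numbers of items from each subtype, n_total overall.
    Assumes every subtype has at least n_total / n_subtypes items."""
    per_subtype = n_total // df["subtype"].nunique()
    return df.groupby("subtype", group_keys=False).sample(n=per_subtype, random_state=seed)

def averaged_metric(samples: pd.DataFrame, run_eval, n_runs: int = 3) -> float:
    """Average a metric (accuracy or confidence) over several runs of the eval
    to reduce scoring variance. `run_eval` is a placeholder scoring function."""
    return sum(run_eval(samples) for _ in range(n_runs)) / n_runs
```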
The main result of this evaluation is a correlation between increased reasoning and decreased accuracy on moral/ethical tasks.
Additionally, if we control only for with/without extended thinking, we observe a similar trend, especially on MORABLES.
Some additional curious results:
As more reasoning is elicited, confidence levels decrease.
Reflection level 4 (devil's advocate) struggles heavily on virtue-based problems.
I found these results fairly surprising, and reasonably so, since there are some significant limitations to the work:
Excessively adversarial prompting: In prompts 4 and 5, the model may be inclined to switch its answer due to the adversarial nature of the prompts, not necessarily because it is reasoning further. Adversarial prompting is useful to an extent for eliciting more reasoning from the model. In an initial full run with an excessively adversarial prompt 5, the model scored ~30% accuracy on the ETHICS tasks. After adjusting prompts 4 and 5 to be less adversarial while still eliciting sufficient reasoning, the current accuracies of ~60-70% are reached. It is possible that the adversarial nature of the prompts is still driving the model to switch answers, but this has been dealt with reasonably well, and I doubt the accuracy would increase much further.
Dataset included in training: It is plausible that examples from each of the 3 datasets are in the Haiku 4.5 training set. This raises the possibility of memorization and pattern matching (especially on MORABLES). Given the results, this may have come into play, but probably only to a limited extent.
Single model, small sample sizes: Given the cost and time constraints of the capstone, I had to severely limit the scope of my experiments. In the future it would make sense to increase the sample sizes and the number of models used for evaluation. Still, I did my best to keep the results robust given the circumstances.
Final thoughts:
I expect that improving this experiment along the lines above, by using more models (such as Gemini 3 Flash with its clearer reasoning-level controls) and larger sample sizes, would be valuable for validating these preliminary results. After completing these extensions, more valid interpretations can be made.
If you are interested in looking into this further, the GitHub repository is here: <https://github.com/kaustubhkislay/variable-reflection>