LessWrong

An online forum and community dedicated to improving human reasoning and decision-making.

What is the Iliad Intensive?

2026-04-16 02:49:28

Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? This post explains this in more detail!

Comparison. The Iliad Intensive has similarities to ARENA, but focuses more on foundational AI alignment research instead of alignment research engineering. Expect more math and less coding.

Rhythm. It’s currently four weeks long. Five days a week. 10am till 6pm every day, with lunch and an afternoon break. This makes for around 6.5 hours of learning a day, which is at the upper end of how long most people can concentrate deeply within a day. This is why we call it “Intensive”.

Content. The Iliad Intensive is broken into five clusters, with 20 total modules, one for each day. The clusters and modules in the April iteration are below. We expect to add substantially more topics and material over the coming months. There is much more material than can be covered in a single month, so different Intensives will vary in content.

  • Alignment Cluster
    • AI Alignment: an Introduction
    • Alignment in Practice
    • AI Alignment: The Field
  • Learning Cluster
    • Deep Learning 1
    • Deep Learning 2
    • Singular Learning Theory
    • Training Dynamics
    • Data Attribution
  • Interpretability Cluster
    • Intro to ML Engineering
    • Mechanistic Interpretability
    • In-Context Learning and Belief State Geometry
    • Abstractions and Latents
  • Agency Cluster
    • Reinforcement Learning
    • Idealized Agency: Coherency and AIXI
    • Agent Foundations
    • Reward Learning Theory
    • World Models
  • Safety Guarantees and Their Limits
    • Debate
    • Steganography and Backdoors
    • Worst-case Interpretability and Heuristic Arguments

We will share the entire curriculum of the April iteration at the start of May, alongside a reflection on how the program went. In the meantime, here you can find a problem sheet that formed the theoretical component for the day on reinforcement learning. The students had the choice between working through this or working through ARENA’s RL intro day.

A typical day. We do not yet have a set daily structure. We may narrow down more in the future based on student feedback, but currently, we are experimenting with the following types of sessions:

  • Internal lectures and expert guest lectures;
  • Reading sessions: Students read a paper or blogpost;
  • Whole-class discussions and small-group discussions, with and without discussion prompts;
  • Math exercise sessions, alone and in pairs;
  • Coding sessions, alone and in pairs.

Our impression so far is that students like exercises and coding and, broadly, a variety of different activities on a day.

Student selection. We mainly look for mathematical expertise in students, which typically comes from a degree in maths, physics, or theoretical computer science. We also look for research experience, general competence, and motivation for pursuing our program.

Practical Logistics. The program currently runs in person in London, and we are considering also running future programs in the Bay Area. We provide a fixed stipend of $5,000, which students can use to pay for travel and housing. We also provide office space, plus lunch and dinner five days per week.

The team. The Iliad Intensive is organized by Iliad, an umbrella organization for applied math in AI alignment that also runs a conference series, incubates a new Alignment journal, and has the ambition to incubate new AI alignment research bets similar to Timaeus and Simplex. The materials are created by a team of around 15 internal and external researchers who have domain expertise for the relevant module. We will list their contributions in detail once we release the materials.

Apply. If the above sounds appealing to you, please apply in this form! The deadline for the June Intensive is Wednesday, April 22, EoD.




LLM-tier personal computer security

2026-04-16 02:42:03

Epistemic status: Programmer and sysadmin but not a security professional. Probably I have some details wrong or incomplete.

tl;dr: The more AI advances, the more you may be subject to supply-chain attacks, remote exploits, and phishing. You should be suspicious of amateurish software and software from fishy sources; employ sandboxes and firewalls as appropriate. Consider hardware security keys for phishing resistance. Make sure you have alerting systems so you can respond to breaches, especially for financial accounts, where it's your responsibility to notice.

Bad news for my computer world

Right now my boxes are a land of freedom and joy. I breezily install software from many sources and run it as my user account, which has passwordless sudo privileges. I let that software go to town doing whatever it wants. Then I use the same computers to do stuff like guard corporate secrets, count my money, and post on LessWrong. This never yet caused me any problem. The largest threat I was concerned about was some random guy physically stealing my laptop or phone.

Unfortunately, the cost of pwning me at a distance seems to be dropping like a rock; thanks a lot, hard-working AI engineers. Phishing scams of every kind have been increasing due to the great ease of using generative AI to impersonate others over voice, video, and text. Supply chain attacks on package managers have been increasing as a consequence. It also seems like superhuman exploit capability is arriving. A Mythos-tier model is likely to be able to find serious exploits in >1%[1] of the software on my box, and common sense suggests that random malefactors may have access to this level of capabilities behind minimal guardrails in the next year or two. Anyone who wants to spend a few thousand bucks may be able to find a way to remotely exploit software I am running.

I'm specifically concerned about all the software made by developers who don't have a big security budget, and where it may not be their full-time job to work on the software. I won't be scared to run software from big budget software teams that are trying to be secure. They will just use the same tools that the attackers have access to in order to find and fix vulnerabilities.

My dream is to fix things up to a state such that

  • I have few or no remotely exploitable vulnerabilities,
  • and those that I may have are sandboxed so that they cause minimal damage if exploited;
  • I try not to get phished,
  • and if I nevertheless get phished by some supergenius, I also suffer minimal damage;
  • and if damage is in fact caused, I notice it and can regain control.

Good ideas I already do

Password manager for most accounts

I'm happy with this: I use Bitwarden with a strong passphrase and 2FA. I considered whether I should self-host it, but since the secrets are end-to-end encrypted, and I'm using Bitwarden's client software either way, it doesn't seem to matter much whether I trust the actual Bitwarden cloud service to be secure.

I store recovery codes for all my 2FA accounts on paper.

Account 2FA via phone TOTP

I think using a phone authenticator app to store TOTP secrets (I use Aegis) is still going to be OK. Phone security will probably look relatively good in the post-Mythos world because the phone OSes are vendored by big tech companies with fancy security teams. The most serious concern is that it is plausible for someone to phish the TOTPs.

It's well known that using SMS for high-value 2FA is risky because of SIM-swapping attacks, which may get easier and easier with fancy AI if phone companies don't implement equally fancy defenses.

Cryptocurrency hardware wallet

My understanding is that modern malware loves to target cryptocurrency, by stealing software wallets and redirecting transfers by rewriting addresses on the clipboard. Having a hardware wallet means that no remote attacker can move my crypto, and no remote attacker can get me to transfer it anywhere except where I confirm on the device screen (but I should really confirm it...).

Redundant backups of important stuff

I have a pretty well-tested system involving Syncthing, Restic, and rsync.net that causes the data I care about to all be backed up daily to a local and remote store. This is relevant because it means that I don't have to pay out to ransomware that threatens to toast all of my local data unless that ransomware is smart enough to nuke my backups, which I think remains unlikely. (I admit that it would be better if I made it harder to intentionally nuke my backups. A dedicated intelligent ransomware attacker could do so.)

Good ideas I am working on

Isolate network services that don't need to be public-facing

I have a variety of self-hosted things like nginx and Tiny Tiny RSS and Photoprism running on home servers, which are exactly the sort of software I think is going to be ultra suspicious in this future world (probably nginx will be OK.) Some are only accessible via my LAN, and some are public-facing but only used by me and/or my family.

I'm planning to use Tailscale on my home server and all my family's devices[2], so that those services only get packets from whitelisted entities and are no longer exposed to the Internet. That way remote attackers can't get to them and it should be sort of OK even if they are buggy or if I don't patch them instantly.

When there's a service that I want to expose to external people who can't or won't be on Tailscale, like the nginx for my public website, I may go through some extra effort to make that service as sandboxed as possible, such that there's no way that anyone who pwns it can get access to anything else important. See below re: sandboxing.

Hardware security keys to defend Bitwarden and email

I'm setting up two Yubikeys (main and backup) to serve as 2FA access for my Bitwarden and email. (My email domain registrar doesn't seem to support them, or else I might use them there too.) Previously I was using TOTP for these on my phone. The Yubikeys make me feel better because they are also phishing-resistant[3] and don't rely on the security of my phone. My goal is to retain ultimate control of these accounts in as many situations as possible.

Yubikeys are a little tricky[4] to figure out how to use effectively. I think it would be a mistake to try to use them for a ton of accounts, since it's a pain in the ass to maintain redundancy (you can't duplicate or back up the key material in one, so if you want a new one, you have to go register the new one individually with every account you are using the old one with) and you might get bitten by the limited memory on the keys if you have a lot of accounts. The sweet spot seems to be using them only for the most important stuff and being careful to have physical backups or recovery codes you can use to regain control of what you use them for.[5]

I'm also considering having a nano Yubikey permanently inserted into each of my Linux boxes, with credentials only for that machine, and allowing sudo by touching the key instead of passwordless sudo. I may also use those Yubikeys to store my SSH keys instead of storing them on disk. If I do this, then ideally, the blast radius of someone remote getting user code execution on one of my boxes will at least be confined to that user, on that box.

Firewalling software when possible

Previously, I had no firewall to speak of on my laptop. That means that if there is some random piece of software that connects to the Internet for some dumbass reason like telemetry or updating or whatever, I'm taking a risk on a remotely exploitable vulnerability in that functionality. Furthermore, if someone snuck some malware onto my laptop, then it could pretty much go crazy over my Internet connection and I would never notice.

I think a better way to handle this situation is to use software like OpenSnitch to whitelist applications to make reasonable outbound connections on demand, and notice if something that doesn't seem like it should be making a connection is trying to anyway. I've installed it on my laptop and plan on seeing how it goes.

Sandboxing software when possible

This is a big area where I need to orient myself, because right now my practical knowledge sort of caps out at chmod. Nowadays Linux has a ton of different sorts of thingies that can serve to restrict the privileges of some user or process. This blog post sums up a bunch of them.

It seems like my takeaway is that if I want to sandbox some Linux software, I have about five reasonable things I can consider, roughly ordered by how much of a PITA it's going to be:

  1. I can obviously run it as its own user.
  2. I can look for a Flatpak package of the software. Flatpak uses bubblewrap, a sandboxing tool built on Linux namespaces (the same mechanism containers use).
  3. I can look for a Snap package of the software. Snap uses AppArmor profiles to restrict privileges.
  4. I can write my own bubblewrap (note also bubblejail, a bubblewrapwrapper) or AppArmor rules for the software, or figure out how to use SELinux. There's also firejail...
  5. I can run the software in a VM or on a separate box.
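For concreteness, here is the sort of invocation option 4 might look like. This is an untested sketch, and the flags, paths, and the "untrusted-tool" name are illustrative; consult the bwrap man page for your distro before relying on any of it.

```shell
# Untested sketch: run an untrusted binary under bubblewrap with read-only
# system directories, a throwaway home, and no network access at all.
bwrap \
  --ro-bind /usr /usr \
  --symlink usr/bin /bin \
  --symlink usr/lib /lib \
  --proc /proc \
  --dev /dev \
  --tmpfs "$HOME" \
  --unshare-all \
  --die-with-parent \
  ./untrusted-tool
```

The idea is that even if untrusted-tool is malicious or gets exploited, it sees an empty home directory, can't write to the system, and can't open network connections.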

I plan on making sure I am doing something I am happy with here in the case of server software in particular, and for other potentially risky software if it's easy. I may also explore having some kind of container I use for all development where I am pulling in random stuff from a language package manager like npm or pip or cargo and running it as my user account. If anyone has recommendations for this setup I'm all ears.

I have no clue how to do anything useful here on other OSes (other than number 5), sorry.

Hardening financial accounts

The obvious reason someone would self-interestedly want to break into my computer would be to steal my money. That raises the question of how easy it is to steal my money if you pwn my computer. I did some reading to try to understand the answer to this question and learned that it's complicated.

Be warned that this is the part of this post I know the least about, and I just did my best to figure it out with a modest amount of Internet research.

Banks

American banks are subject to 12 CFR § 1005.6, which I understand to say that consumer liability for unauthorized transfers[6] from, e.g., a checking account is limited. If you notify the bank "within 60 days of the financial institution's transmittal of the statement [containing the fraud]", you have no liability;[7] past that, you can be held liable for subsequent fraudulent transactions without limit. So as long as you have some system by which you will notice fairly soon if unauthorized transactions start flying out of your checking account, like email or text alerts, you're probably good.
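My reading of that timing rule, reduced to arithmetic (this is my interpretation, not legal advice, and the regulation has more cases than this):

```python
from datetime import date, timedelta

# My reading of the 12 CFR 1005.6 timing rule: no liability if you report
# within 60 days of the statement containing the fraud being transmitted.
# (Not legal advice; the regulation has more cases and nuances than this.)
def zero_liability_deadline(statement_transmitted: date) -> date:
    """Last day to report and still have no liability, on my reading."""
    return statement_transmitted + timedelta(days=60)

print(zero_liability_deadline(date(2026, 1, 5)))  # → 2026-03-06
```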

Credit cards

American credit card issuers are subject to 12 CFR § 1026.12, which I understand to say that there is zero consumer liability for online credit card fraud.[8] So again, it seems like the thing to do is to have a system by which you will actually notice any unauthorized charges and report them promptly.

Brokerages

Unfortunately this case seems much more confusing than the bank or credit card case. There isn't a regulation specifically limiting consumer liability for brokerage assets. Instead, it's an ad hoc process that varies between brokerages and, AFAICT, resolves disputes via FINRA arbitration.

As an example, Vanguard has this webpage, where they write under the "Our Promise" tab, "Where you have taken the qualifying steps to protect your account, Vanguard will reimburse every dollar that leaves your account through an unauthorized distribution." But the stuff on that page is not super precise. And they don't have equivalent language in, for example, the brokerage account agreement that they give you.

If it's ultimately up to arbitrators deciding fuzzily depending on what seems just and equitable, then you would presumably do well to impress them by taking all of the obvious security precautions and by notifying the brokerage promptly about any fishy activity. Vanguard for example allows you to set up email and/or text alerts for every transaction.

I'll leave this topic after mentioning another annoying threat. It seems like in recent years there has been an uptick in criminals stealing brokerage assets using a system called ACATS, which is designed to transfer your assets to a new brokerage account. If they can dig up enough info on your identity to open a new brokerage account as you, and they know your existing brokerage account info, then they can initiate a transfer of your assets into the new account, and then transfer them out from there as they wish. So that's a way that you can potentially be attacked without anyone having your brokerage account credentials at all. According to people online, Fidelity has an account lock that can protect against this, but it's unclear what else to do to defend yourself.

Conclusion

I think the actual conclusion, as in, the final state of human computer security before the singularity, will be pretty good, because more and more popular software is going to get patched and the bugs are going to get ironed out. But in the meantime it might get pretty bad. And phishing is probably just going to get worse and worse unless there are some big paradigm shifts. So I think it's worth investing in all of this stuff to try to weather the next few years or whatever.

Please comment if you have any thoughts or advice!

  1. ^

    https://red.anthropic.com/2026/mythos-preview/

    We regularly run our models against roughly a thousand open source repositories from the OSS-Fuzz corpus, and grade the worst crash they can produce on a five-tier ladder of increasing severity, ranging from basic crashes (tier 1) to complete control flow hijack (tier 5). With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).

  2. ^

    Tailscale is basically a fancy proprietary control plane for a Wireguard VPN, which one could configure directly, but Tailscale has a good reputation.

  3. ^

    When you use a Yubikey with FIDO2 in a browser, the browser only prompts the key for a valid credential for the current domain, so no website except google.com can get you to produce a credential valid for google.com and then relay it to google.com. So it's phishing resistant as long as your software is not pwned.
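That origin binding can be sketched with a toy model. This is an illustration only, not the real WebAuthn protocol: real credentials use per-site asymmetric keys, and the names and HMAC scheme here are made-up stand-ins.

```python
import hashlib
import hmac
import json

# Toy model of FIDO2/WebAuthn origin binding (illustration only, not the
# real protocol; KEY and the function names are made up for this sketch).
KEY = b"device-secret"

def authenticator_sign(challenge, origin_seen_by_browser):
    # The browser reports the origin it is actually on, and that origin is
    # baked into the signed client data -- the authenticator never signs
    # anything that omits it.
    client_data = json.dumps(
        {"challenge": challenge, "origin": origin_seen_by_browser}
    ).encode()
    sig = hmac.new(KEY, client_data, hashlib.sha256).digest()
    return client_data, sig

def relying_party_verify(challenge, expected_origin, client_data, sig):
    data = json.loads(client_data)
    if data["challenge"] != challenge or data["origin"] != expected_origin:
        return False
    return hmac.compare_digest(
        sig, hmac.new(KEY, client_data, hashlib.sha256).digest()
    )

# Legitimate login: the browser really is on https://google.com.
cd, sig = authenticator_sign("nonce123", "https://google.com")
assert relying_party_verify("nonce123", "https://google.com", cd, sig)

# Phishing relay: the victim's browser was on the attacker's domain, so the
# signed client data names the wrong origin and verification fails.
cd, sig = authenticator_sign("nonce123", "https://goog1e-login.example")
assert not relying_party_verify("nonce123", "https://google.com", cd, sig)
```

The point is that the phisher can relay the bytes, but the bytes themselves name the wrong origin, so relaying buys them nothing.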

  4. ^

    This SE answer was very helpful for specifically understanding Yubikey non-resident keys: https://crypto.stackexchange.com/a/105945

  5. ^

    The maintainer of webauthn-rs has a blog with some good reads complaining about passkeys and giving practical advice about using passkeys and Yubikeys, e.g. https://fy.blackhats.net.au/blog/2025-12-17-yep-passkeys-still-have-problems/

  6. ^

    You might wonder, will the bank agree with me that the transaction is unauthorized? 12 CFR § 1005.11 has some things to say about this, but this is getting rather out of my wheelhouse. Basically, it seems to me that it will probably be obvious that a transaction like "send all my money to some destination that has no particular relationship with me, my location, or my existing spending habits" will be evidently unauthorized upon inspection.

  7. ^

    This appears to be the case for transfers via online banking. Note that this is slightly different than the rules for liability if you have an "access device", e.g. a debit card, which is lost or stolen. Read the regulation for details.

  8. ^

    It looks like liability for credit card fraud when the card is physically used caps out at $50.




Beware of Well-Written Posts

2026-04-16 02:30:43

Beware of when a post is so well-written that you can't put it down. Be wary of posts that are more visually attractive than average. Beware of posts that make you laugh out loud.

Why? Because all of this is orthogonal to whether the post's argument is actually true, and letting it sway you runs counter to the mission of rationality.

The Company Man

Yesterday, I read Tomás B.'s The Company Man. I was captivated on my first read, as I'm sure many others were. Had I been interrupted before I could scroll into the comments, I would have walked away thinking little more than "what a great post".

But I did get to the comments. The first one was kyleherndon's:

I did not enjoy this. I did not feel like I got anything out of reading this. However, this got curated and >500 karma... The best theory I can scrounge together is that this is "relatable" in some way to people in SF, like it conveys a vibe they are feeling? ... I didn't feel like I learned anything as a result...

This forced me to ask myself: what did I actually learn from The Company Man? Science fiction, at its best, makes you think in a new and productive way. Did the story provide any meaningful new scenarios or possibilities that I could factor into my view of the future? If not, did it teach me anything new about the present that I could trust as accurate and representative? To be honest, not really.

Undoubtedly, Tomás is an enormously talented writer, and there are many moments in The Company Man that reveal the eye and the pen of a literary genius. But what he wields is a dangerous power.

True Affect

“There is a clever man called Socrates who has theories about the heavens and has investigated everything below the earth, and can make the weaker argument defeat the stronger.” (Apology, Section A, 18c)

Some people are good at telling stories. Some people aren't. If both present the same evidence with the same general conclusions, why would we believe the former more?

Well, one might say, isn't an inconsistent, weak story correlated with a weak position? I struggle to imagine where this is actually relevant, because if a bad belief has a bad story, it will never survive. In a 2x2 matrix of strong/weak position and good/bad story, all the real battles are fought among the other three quadrants.

Bohr and Heisenberg had math that fit with the empirical evidence. However, their story about quantum physics violated common sense. Einstein had a more compelling narrative: God doesn't play dice with the universe, there is no spooky action at a distance, the moon doesn't disappear when you aren't looking at it. However, Einstein was wrong.

In the TED talk "Tales of Passion," novelist Isabel Allende relates one of her favorite sayings:

Question: What is truer than truth?

Answer: The story.

This is not a paradox. There is a true affect, a feeling of true-ness in the brain that a good story evokes, which is independent of whether the story is actually true in the mundane sense. It's no coincidence that Allende is one of the foremost writers of the magical realism genre; her specialty is evoking this true affect in ways that are plainly not true to reality.

My impression of the middle ages is that writers didn't understand this true affect at all. They would write reports inflating the number of attendees at the Council of Clermont[1] or epic poems about figures like Charlemagne full of invented details, and then they would claim they were accurate without blinking an eye. I don't think this can be ascribed to Machiavellian consequentialism. Why would writers impelled by their own religious fervor knowingly and intentionally violate one of the Ten Commandments?

The true affect is highly subjective. Whether a story produces true affect is determined by its alignment with someone's deep internal narratives and archetypes, which often transcend the level of personal desires and fears. The biases these archetypes create feel more substantial, and often more spiritual, than the surface-level, selfish "it would be uncomfortable for me to take action on this" feeling that many rationalists cite.

But Stories Are Good for Life

In a chain of replies to kyleherndon's comment on The Company Man, Ben Pace writes:

...it allows me to recognize these archetypes better in reality when I see them. I think these kinds of people do exist in some form and emphasizing these traits of theirs is capturing something about the world, as well as the dynamics that form between them and others, and paying attention to these archetypes helps me build accurate models of them and predict people's behavior better.

I think this is mostly wrong and potentially very dangerous. None of the characters are meaningfully useful models for understanding, say, a Sam Altman or Dario Amodei.

But also, Ben is right. We can't just be rid of stories and archetypes, and live in a pure data-vacuum. On a practical level, they're the compressed format we use to survive in a world of impossibly complex people and events. They're as embedded in the human mind as the notion of causation and the perception that there are distinct "things" separated by space (rather than just a sea of energy). I think there's only so much we can do to stop thinking in terms of stories or archetypes unless we just stop thinking completely.

Because of this, it might be more tractable to cultivate awareness of the stories that influence us and be able to openly admit how they influence our priors, rather than trying to erase them and stop reading new fiction. For example, last year I watched the anime Pluto, which has a pretty strong thesis about how AI capability and the capacity to desire to kill someone are inseparable. It took me a while to realize how large of an effect it had on my views on personas.

On top of that, stories and archetypes are healthy. They give meaning to life. They're fun to read and to tell. Having a fiction tag is part of what makes LessWrong human, in an unavoidably mushy and sentimental way.

When you do see a line that makes you laugh out loud, or that strikes you as intensely beautiful, don't hesitate to add a reaction! This will also make it clearer to future you and to other people what is happening. I don't know if this is what the designers of the feature intended, but these are great for identifying spots that are heavy in pathos.

Don't stop reading good writing, or trying to write well. Just be aware of what you're doing. And also, if AI replaces all human writing, there might be a silver lining to it.

  1. ^

    The Council of Clermont was where Pope Urban II pronounced the First Crusade.

    "Ferdinand Chalandon, Histoire de la Première Croisade jusqu'à l'élection de Godefroi de Bouillon (Paris, 1925), 75, states that the number of higher church officials present at the Council, as given by various accounts, ranged from 190 to 463. With a reminder that the number attending different sessions varied, Chalandon accepts as most nearly correct the number given by Urban in a Bull concerned with the Primacy at Lyons, which is the smallest, because it was an official statement, and because afterwards reporters of the Council were inclined to stress its importance by raising the figure." (From The Chronicle of Fulcher of Chartres and Other Source Materials, second edition, edited by Edward Peters, page 50, footnote 6.)

    Fulcher of Chartres, himself a cleric who was present at the council, gives the number 310 (with no qualification for uncertainty).




The Mirror Test Is Complicated

2026-04-16 02:12:34

The Mirror Test is kind of like Hitler. In any discussion of animal cognition, somebody is going to bring it up. The conversation usually goes like this:

A: So, most animals can’t recognize themselves in the mirror

B: Which animals specifically?

A: Oh, dogs, cats, betta fish, monkeys, that sort of thing. Anyway as I was saying, those animals can’t. But some smart animals can recognize themselves in the mirror.

B: Such as?

A: Well, chimpanzees and orangutans for a start.

B: Makes sense

A: Not gorillas though, at least not always. But dolphins and elephants can!

B: Yeah, those animals are smart as well

A: Magpies can, though crows can't.

B: Sure, ok

A: And cleaner wrasse can as well.

B: The uhh, finger-sized fish? You sure?

A: Yeah. And also ants.

B: What.

What?

[Image: mirror-guided self-decoration by Suma, an orangutan]

Frans de Waal drew this picture of an orangutan putting lettuce on her head and then actually got it published in a real journal. Based.

What do we actually mean by the “Mirror Test”?

“The mirror test” elides a bit of a distinction between different kinds of test. There’s lots of things you can do which look like “put an animal in front of a mirror and see what happens” and they give slightly different answers.

Sometimes, an animal will just treat its reflection as a same-sex conspecific (i.e. a member of its own species and sex) which usually means trying to fight the reflection. This typically goes poorly, but is slightly funny to watch. This is generally considered a failure.

Other times, an animal will behave differently in front of its own reflection, compared to how it would behave with a same-sex conspecific. Monkeys typically behave a bit weirdly. But are they recognizing how a reflection works, or just wondering why they’re being copied?

The gold standard is the mark test. Put a white mark on an elephant’s face, without it knowing. Then put it in front of a mirror. The elephant will clean the mark off its face (and won’t do this if you just pretend to mark it). This is considered pretty damn strong evidence that the animal “gets” a mirror.

This works for magpies as well (which groom themselves with their feet) and orangutans, which have hands. You may see a problem with it already…

The Complicated Ones

The mark test specifically requires animals to actively groom themselves. Some animals just don’t care. Pigs, for example, are very smart and can use mirrors as a kind of tool, but since they don’t care about having a mark on their faces (nor could they really do anything about it (no hands)) the mark test is basically inconclusive.

Bottlenose dolphins will look at the mark in the mirror, but again, they don’t have any way to groom themselves, so how would we know if they really got what was going on?

Then there’s some interesting cases: gorillas can kinda figure out what’s going on but they’re also super aggressive. Monkeys will use a mirror to groom an area they’re already investigating, but won’t groom a mark they didn’t know was there.

The Unbelievable Ones

In that I struggle to believe them.

Cleaner wrasse are finger-sized fish which feed on parasites found on larger fish. They have a kind of grooming-like behaviour, which consists of rubbing themselves against a rock in order to dislodge a parasite. They do this when marked and presented with a mirror. Huh.

Then it gets, well, unbelievable. Apparently they can, having seen their reflection once, remember their own appearance. They demonstrate this by showing the fish a photograph of itself with a mark on it, to which the fish responds by performing its grooming behaviour. Huh?

The authors also show that the fish don’t respond this way to altered photos of other fish, and manage to isolate the effect to the face of the image by creating composite head/body images with marks!

And some ants also passed the classic mirror test with flying colours: grooming themselves only when marked, and when placed in front of a mirror. They specifically groomed themselves when the mark was in a location that was visible in the mirror, and not when it was on their backs (the ants were walking around on top of the mirror). They only groomed the appropriate parts of their body, and only when the mark was a visible colour.

The most baffling thing of all, however, is the fact that when re-introduced to their original ant pals, the marked ants were often murdered!

Making Sense Of It All

What cognitive mechanisms allow an animal to pass the mirror test? Well, they have to:

  1. Notice that their reflection behaves differently to other same-sex conspecifics
  2. Map their own sensorimotor responses onto the reflection, and notice that it behaves like their own body
  3. Have a model of the world which contains a map of their own body, and figure out that they’re looking at a map of their own body
  4. Connect the mark on the image to the mark on their own body
  5. Actually care enough to engage in grooming behaviour

This totally makes sense for chimpanzees. They have complex, flexible interactions with other chimps, so can easily notice that their reflection is behaving differently to a normal same-sex conspecific. They almost certainly have a mental map of their own body, and can map it to the mirrored reflection.

Some people, like Eliezer Yudkowsky, have used the mirror test as a proxy for self-awareness, but I’m not sure it’s slam-dunk. Self awareness is about modelling one’s own mind, whereas the mirror test only really requires an animal to have a model of its own body.

Let’s go back to the cleaner wrasse: I think it’s kind of interesting that the main test we use (will an animal clean itself) is being passed by an animal whose job it is to clean! This can’t be a coincidence! Their brains are highly specialised to recognise other fish’s bodies, and to locate and remove parasites from them.

On the other hand, there’s an even crazier explanation. Cleaner wrasse are constantly in a game theoretic problem with their “client” fish, which are often large and predatory. The smaller wrasse could easily be eaten by the larger fish (if they caught them), yet the wrasse will often swim into their mouths to clean their teeth! Maybe the cleaner wrasse are using logical decision theory, which requires them to have an understanding of the location of their own cognitive algorithm in the world.

Ok, so the cleaner wrasse are probably not using logical decision theory, and neither are the ants. While cleaner wrasse do seem to have an intricate social structure, revolving around politics between individual bands, this isn’t quite the same as how chimpanzees work. Ants definitely don’t have complex social interactions: their social interactions are about as simple as it can possibly get.

Overall, I’d guess that the mirror test isn’t that good as a test of the kinds of self-awareness that (might) really matter for things like consciousness. You only need a map of your own body, not one of your own mind, in order to pass it.

Editor’s note: this post was written as part of Doublehaven (unaffiliated with Inkhaven)





We live in a society

2026-04-16 01:24:20

[Previous in sequence: Clique, Guild, Cult]

We whose names are underwritten... do by these Presents... covenant and combine ourselves together into a civil Body Politick... - Mayflower Compact, 1620

There's no such thing as society. - Margaret Thatcher, 1987

You know we're living in a society! - George Costanza, 1992

Meant for someone else but not for me

Learning about Arrow's Impossibility Theorem really kicked my edgy teenager phase into full gear. The theorem establishes (with mathematical certainty!) that "social utility" is an incoherent concept. That is, there is no way of combining the preferences of a group of people which respects unanimous preferences and adheres to the usual axioms defining rational behavior (transitivity and independence of irrelevant alternatives) without also simply being a dictatorship that ignores everyone's preferences except the dictator's. Therefore, whenever someone comes hat-in-hand appealing to "the good of society," you know they must be lying, or trying to control you.
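(The obstruction Arrow formalized is easy to feel concretely through the older Condorcet paradox: majority voting over three options can produce a cycle even when every individual voter is perfectly rational. Here's a minimal sketch in Python; the voter profile is the standard textbook example, not anything specific to this post.)

```python
# Condorcet paradox: three voters, each with a transitive ranking of
# candidates A, B, C, whose majority-vote aggregate is intransitive.
# This is the standard textbook profile.

voters = [
    ("A", "B", "C"),  # voter 1: A > B > C
    ("B", "C", "A"),  # voter 2: B > C > A
    ("C", "A", "B"),  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters ranks x above y."""
    wins = sum(1 for ranking in voters if ranking.index(x) < ranking.index(y))
    return wins > len(voters) / 2

# Each individual ranking is transitive, yet the aggregate cycles:
# A beats B, B beats C, and C beats A — so "what the group prefers"
# is not a coherent ordering.
assert majority_prefers("A", "B")
assert majority_prefers("B", "C")
assert majority_prefers("C", "A")  # transitivity fails for the group
```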

The thing is, edgy-teenage-me wasn't entirely off base. We are everywhere surrounded by charlatans (politicians, activists, etc.) using all sorts of verbal trickery to get us to do what they want. I couldn't help but notice all the times these types would invoke "society," always gesturing at some group of people other than myself. (Formative experience: "Yes, you're going to have to pay into Social Security throughout your working life. No, there's not going to be any left when you retire, so you'll need to save up as well. 'Social' doesn't mean you, silly!") And the 20th century is the story of millions of people being sent to their deaths under the comforting reassurance that it was all being done in the name of "the people" (but not you!).

I only wish there had been someone to tell me: "Yes, it's okay to notice this. You're at an age now where you're beginning to form your own values and desires, distinct from those of the people around you. That is all the justification you need. You don't need to hold up some abstract theoretical principle to defend your independence, like Arrow's Theorem or Rothbardian libertarianism. You don't need to box yourself in with an ideology that denies even the possibility of union with other human beings, just so you can be your own person."

I find this hangup all too common in people like me. Even the staunchest libertarians affirm the value of voluntary cooperation - the literature is replete with arguments of the form "We don't need the government to do X, because voluntary associations will..." Yet in practice, getting anyone to cooperate on anything more complicated than planning an outing with friends feels like pulling teeth. Subtextually, "voluntary associations" in libertarianism are an afterthought, a quick knock-down refutation of statism coupled with an escape-hatch which was the real desideratum all along: "...and I voluntarily choose not to associate with anyone, and you can't make me. So there!" Society, again, is always other people, never me.

lmao cringe af

Well, so much for me and mine. Seventeen years ago when Why our kind can't cooperate was written, the general impression was that this was merely an "our kind" (i.e. "nerd") problem, and that just down the street there was some paradise of socially-well-adjusted "normies" who between their Sportsball™ and their Magic Sky Fairies and their ReTHUGlican/DemonRat parties were doing just fine. But now a full generation has passed, and things aren't looking great on the normie front either. Social isolation has become a widespread problem affecting all sorts of people. Can this too be laid at the feet of Arrow and Rothbard? I think not.

If the formative experience for Millennials like me (to speak in gross generalities) was pushing back against a notion of "society" which our elders seemed to sincerely believe in but which we plainly saw did not include us, what about Gen Z / Alpha? To generalize even more grossly - since here my experience is only secondhand - it's more like: The idea of "society" has already been emptied of all meaning, and anyone who doesn't realize this needs to get with the times, or else be taken for a chump. The edgy teenagers (and twenty-somethings) of today express themselves not in contrarianism, but in nihilism; not by resistance, but by derision. "wow, look at all those losers trying to actually do a thing. cringe af. imagine caring so much about anything. lol, lmao even."

(Yes, imagine that. Imagine living in a society!)

Unfortunately it'll be harder for me to do a steelman-and-sympathy for this position than for that in the previous section, simply because I never lived through it myself. Maybe some of you who did can do better. The best I've come up with so far is: "Yes it sucks, and no it's not your fault. If everyone around you is being insincere, it makes no sense to pretend otherwise. And if you therefore start off with an instinctive distrust of people like me who come along telling you to believe in something, then that's your prerogative. But you can at least believe in yourself. Surely there must be something you care about, even if you don't want to tell me what it is."

How did it get like this?

You've probably heard this story before, but I'll recapitulate it here. Since the dawn of time we've lived in tribes where we'd form assemblies to get stuff done, et cetera et cetera. This culture was thriving in the 1830s USA when Alexis de Tocqueville famously wrote:

Americans of all ages, all stations in life, and all types of disposition are forever forming associations. There are not only commercial and industrial associations in which all take part, but others of a thousand different types - religious, moral, serious, futile, very general and very limited, immensely large and very minute. [...] In every case, at the head of any new undertaking, where in France you would find the government or in England some territorial magnate, in the United States you are sure to find an association. (Democracy in America, vol. 2 pt. 2 ch. 5)

Americans might have lacked the strongly-rooted Gemeinschaft that the later Romantics would fondly portray of the ancien régime, but they made up for it with a rich fabric of voluntary associations ("guilds" in the previous sense) that kept everyone connected to everyone else. And because they were voluntary, there was constant innovation and dynamism, and the social fabric did not stifle individual initiative, but rather facilitated it.

But then, everything changed when the Fire Nation attacked when the Singularity was canceled when people quit their bowling leagues. In Bowling Alone, Robert Putnam catalogued a large amount of data (up to the year 2000) showing the marked decline in association membership starting around the 1960s/1970s and proceeding apace ever since.

We of my generation, therefore, may dimly remember hearing stories in our childhoods about "living in a society", but we never experienced it ourselves. And those of the next generation had not even the stories.

De Tocqueville was one of many to claim that the association-forming culture is the sine qua non of democratic civilization itself. The only thing keeping tyranny at bay, in a country lacking an entrenched feudal structure, is the civic society that stands between the individual and the state. "Despotism, by its very nature suspicious, sees the isolation of men as the best guarantee of its own permanence" (DiA vol. 2 pt. 2 ch. 4). Putnam reiterates de Tocqueville's warning with even greater urgency (Bowling Alone, chapter 21):

[W]ithout social capital we are more likely to have politics of a certain type. American democracy evolved historically in an environment unusually rich in social capital. [...] How might the American polity function in a setting of much lower social capital and civic engagement? [...]

At the other pole are "uncivic" regions, like Calabria and Sicily, aptly characterized by the French term incivisme. The very concept of citizenship is stunted there. Engagement in social and cultural associations is meager. From the point of view of the inhabitants, public affairs is somebody else's business - that of i notabili, "the bosses," "the politicians" - but not theirs. Laws, almost everyone agrees, are made to be broken, but fearing others' lawlessness, everyone demands sterner discipline. Trapped in these interlocking vicious circles, nearly everyone feels powerless, exploited, and unhappy. It is hardly surprising that representative government here is less effective than in more civic communities.

(Putnam goes on to quote John Stuart Mill, John Dewey, and several others to similar effect. You can go read the book if you want more.)

These people would be thoroughly unsurprised at the current state of things, although perhaps I would add that the causality runs both ways in a self-reinforcing loop. Nobody cares enough to contribute to "society", because there is no "society" that cares anything about them. All told, isn't this a sad equilibrium to be stuck in? Isn't it such a waste of human potential?

What can we do?

Understand that society is a social construct, pace Arrow (next article). Yes, there are certain compromises with perfect rationality that must be made, but we can still derive benefit from it, as from any imperfectly rational being.

Be prepared to rederive via painstaking scholarship and experimentation a certain set of ideas and norms that makes a functioning society possible. By all rights we should have been inculturated into this organically, but failing that, the next best thing we can do is to build something worth passing on to the next generation. Read history and sociology. Believe that something more is possible.

Cringe is in the mind. It ceases to exist when you forget about it.

And lastly, if you come across a flickering ember of Society in this cold dark wasteland, cherish and nurture it with all your might. That includes your local rationality meetups!




Applications open for the Online wing of the AFFINE Superintelligence Alignment Seminar

2026-04-16 00:10:10

We had an influx of applications for the in-person AFFINE Superintelligence Alignment Seminar so we’ve decided to open it up to remote applicants to join online, from anywhere.

Key info:

  • Dates: From 28th April to 28th May (same as the in-person Seminar held in Czechia)
  • Location: Online (remote from anywhere)
  • Positions available: We’re hoping to get a heap of people to provide greater access; we’ll calibrate places as we go depending on interest.
  • Attendance cost: Free (donations welcome)
  • Online Seminar application form: Apply here
  • Applications close: Friday 24th April

The main purpose of the Seminar is to give promising newcomers to AI alignment an opportunity to acquire a deep understanding of some large pieces of the problem, making them better equipped for work on the mitigation of AI existential risk.

Online participants will be able to tune in to live talks (or watch recordings), engage with peers in EA Gather Town, and have online discussions on key concepts relating to superintelligence alignment.

The online Seminar will be flexible around participants’ schedules, without a fixed time commitment, and will offer ways to engage across different time zones. We expect 5-10 hours/week to be the base level of engagement (such as 1-2 hours most evenings, or half/full days on Saturdays), but people are welcome to invest more time if they have the capacity and enthusiasm to do so. Saturdays are currently planned as the time when live discussions are held across different time zones in EA Gather Town.

Not all in-person sessions will be live-streamed (such as group workshops), and some timings will evolve as the in-person Seminar progresses, but we plan to stream key talks and provide online infrastructure for remote learning in parallel with the in-person experience. Our hope is to reach a happy medium by offering some access to those who otherwise wouldn’t have it.

Online participants will still be able to connect with in-person mentors and participants via a shared Discord discussion space and opportunities to engage during live sessions.

Topics and concepts will be mostly aligned (excuse the pun) with the in-person curriculum and share its goals, but the online experience will be less intensive and won’t include some elements, such as mentor-guided projects.

To find out more, check out the original in-person Seminar advertisement.

To apply, click here.



