2026-02-25 20:15:48
A hundred years ago some musicians were enraged by recording technology. Soon no-one would be paid to play live music! And recorded music is soulless anyway, who would want to listen to that?
It turns out, approximately everyone. As the saying goes, quantity has a quality of its own. When I hear that AI slop will destroy real art, I'm thinking: Good. Now bring me more entertainment.
AI will take our jobs?! Yes please! Could it do my laundry too? Same goes for any source of too-cheap labor.
Sex robots, or perhaps legal sex work, will commoditize sex? How awful. Perhaps that will drop some of the lemons from the dating pool, too.
The same principle could be applied to many other subjects. Since Moloch is the god of child sacrifice, that domain looks most promising. As the fertility rate freefalls far below replacement, at some point fixing that might become relevant, after all.
Changing society on purpose is approximately impossible, but technological developments will do so easily. Maybe artificial wombs will do the trick, since surrogacy remains either illegal or prohibitively expensive. Perhaps AI developments, whether through robotics or mass unemployment, will make childcare services cheap enough for everyone. The first -1 to 3 years are the hardest, and many more people would end up having kids if those could be made easier.
But perhaps this would mean that parents love their children a bit less? Is the love produced by the effort spent? Surely endless sleepless nights aren't mandatory.
And the world is too dangerous nowadays. You can't just let children play outdoors; something bad could happen. Maybe they get eaten by wolves or run over by a car? Once again, Moloch provides! Don't worry about it (too much); just get a new one!
If that sounds awful to you, too bad. People who don't mind will be doing this, and in a few generations this will be the norm. In a way, this was already the norm a few centuries ago.
Another domain related to Moloch is based more on the word itself. It has been hypothesized that it's derived from Hebrew mlk, meaning "to rule". One can watch almost in real time as democracies, or at least the illusions of such, fade into a global oligarchy. Ever so sloooowly. Oh well, I've always craved a more meritocratic world. Moloch to the rescue! Or, ehmm, salvage.
Allowing people to sell their votes has a tragedy-of-the-commons dynamic. As long as only a few people sell their vote to an evil cause, they keep the profits but continue to enjoy a world without the things they voted for. In a way, the coordination around this has failed, and even worse, compensation for giving your vote to a cause is poor, typically zero. Whatever enrages people in public discourse and on social media seems to do rather well. Controlling the media directly, or advertising on it, is still quite expensive, but the spending is mostly eaten up by zero-sum competition for attention.
Robin Hanson's manifesto Futarchy: Vote Values, But Bet Beliefs describes how information aggregation could be done using markets. It doesn't say much on how to aggregate values. Fortunately Moloch ✨market economy✨ just kind-of works for this as is. Money is, in principle, obtained by providing something of value to others. Whoever provides more value has the money to buy the most votes. If they cease producing value, the money will shift somewhere else too. Vote with your wallet, indeed.
This will keep kind-of working as long as the votes aren't used to break the functioning of the market economy itself. And there will be a looong delay before the power shifts, which is also a huge problem. And of course this is an extremely oversimplified view. Still, I can't help seeing the silver lining here.
Some values are worth having. I'm in the process of figuring out which ones, and it's not going well. In the meantime, the world's burning and I've got sausages to roast.
2026-02-25 19:13:27
Hello, fellow knowledge enthusiast.
Knowing things is hard, sharing our knowledge accurately is even harder, and it is all too easy to claim more knowledge than we have.
Many, when confronted with this, fall back to Epistemic Humility. They err towards being under-confident.
This is bad. I'll explain why here, and then point to an alternative virtue to cultivate instead: Epistemic Precision.
But first, let me give some context.
Before getting to Epistemic Humility, I should introduce the concept of Social Bids.
In the social world, we are all constantly making bids. And the impact of Epistemic Humility is better understood through them than traditional philosophical epistemology.
—
For instance, we make bids for others to see things our way.
In a meeting, when a colleague audibly says “Obviously, this is a bad plan”, they are doing more than sharing their point of view with the rest of the team.
They are making a bid for the team to accept that this is a bad plan. They are challenging people to contest their statement. If no one challenges them and people seem at least neutral to the statement, the bid will have been accepted.
Once the bid is accepted, anyone in the team can build upon the assertion that the plan in question is bad. As more people do so, it becomes more and more common knowledge.
—
Bids come in all shapes and sizes.
Parents and bosses make explicit and straightforward bids when they give orders to their respective children and employees.
We make bids to consider a proposition, to adopt a frame, or to use specific words.
We make bids, we counter people’s bids, we negotiate and compromise, we accept and reject the bids of others.
We bid using power, authority, status, emotional appeals, and to a much lesser extent, appeals to reason.
It’s the full chaos of the social world.
—
Most importantly, we have a Bid Budget.
Regardless of the type of bid, everyone knows and feels that they have a limited budget.
A parent can only order their children to do so many things before they revolt. The same is true for a manager or a captain.
Similarly, in a group, we can only make so many surprising statements before people start doubting us and in the end stop believing us.
Budgets are hard to manage. And bid budgets are no exception.
I have personally found that most people I care about under-use their bid budgets. They do not negotiate at work, they do not state their wants enough, they do not push their ideas, they do not claim the attention of the group they belong to when they should, etc.
The impact of Epistemic Humility is better understood through Bids.
After all, from a pure epistemic standpoint, generic Epistemic Humility tells us to be less confident in everything. It’s almost devoid of meaning: it doesn’t change one’s beliefs in the end.
But Bids bypass traditional epistemic considerations.
Traditional Epistemology is about considering whether a statement is true or false, whether a plan is good or bad, whether an argument is sound or not.
Bids are largely about selecting which statements, plans and arguments are up for consideration in the first place.
This is crucial in the context of common knowledge: Plan A may be better than Plan B, but if Plan B is discussed more often, it is easier for people to coordinate around it.
For instance, I think of my current situation as an intellectual in these terms.
On the ideas front, I believe I am doing well. I consistently get complimented on them.
However, I am failing to acquire enough social capital to bid for them.
This has very little to do with epistemic considerations, and much more with my (lacking) skills at earning social capital.[1]
—
The effect of generic Epistemic Humility is to weaken its followers.
The generic stance of Epistemic Humility dictates that we should be less confident in our beliefs because of our biases and epistemic failures.
At the very least, it recommends that we should express less confidence and qualify our sentences.
In practice, this is bad.
Sometimes, nerds say things like “I’m not sure but possibly [X]”, “Plausibly, [X]”, or “I think that [X] may be true”.
When nerds say this, they mean to make a regular-sized bid to add “[X] has 20% chance of being true” to what the group believes.
However, groups (not necessarily individuals, but the groups themselves), will interpret it as them making a much-smaller-than-usual bid for the group to believe [X].[2]
Put differently, the main effect of generic Epistemic Humility is for its followers to behave in a way that lowers their social impact. It doesn’t even make them more truthful, as it doesn’t help the groups they talk to calibrate on better probabilities.
—
The effect of selective Epistemic Humility is pernicious.
Selective Epistemic Humility dictates that we should be less confident in some beliefs.
There are two cases in which this may happen.
First is when it is actually warranted. Someone provided evidence that a specific belief was wrong. In that case, it is not epistemic humility at all to believe less in it. It is nothing more than changing one’s mind as a reaction to new information.
When it is not warranted is where it shines. It is an isolated demand for rigour (Scott Alexander on the topic) masquerading as an emotional appeal to humility.
De facto, the beliefs singled out by selective Epistemic Humility will gather less attention than others. It is a sneaky way to reduce their impact, without ever having to make the case that they are actually wrong.
I tend to think of generic Epistemic Humility as Epistemic Cowardice.
Epistemic Humility often amounts to taking decisions to protect one’s ego from the risks associated with uncertainty. Risks of being wrong, of people showing that we’re wrong, of being ridiculed for our beliefs.
In practice, I see people being ill at ease when they are uncertain. So, in a manner not too dissimilar to cognitive dissonance, they try to reduce their uncertainty by staying in their comfort zones.
Instead of Epistemic Humility, I find that I need people to be much more Epistemically Brave. I need them to figure out what’s best, and to be at ease with committing to courses of action even in the presence of uncertainty.
They should not quash their feelings of uncertainty and assume they are right; instead, they should stay the course and not quiver in spite of the uncertainty.
—
Institutions and politics require a lot of Epistemic Bravery to go well.
From my point of view, both are lacking in qualified smart people.
In my experience, Epistemic Humility has been a direct cause of this.
Many smart people told me things like:
I’ll never engage with politics. It’s so corrupt, so hard to figure things out, and so easy to cause things to go wrong.
Only psychopaths or power-hungry people are willing to do this, and I am neither.
And as a direct result of these thoughts and feelings, they have made their decision to not engage with institutions and politics.
At best, this is Epistemic Humility gone awry, wherein people feel that they do not deserve to Take Power, even through the Rightful Means.
At worst, it is cowardice finding a convenient excuse in Epistemic Humility.
—
The same problem exists in situations with smaller stakes.
In many social situations (think of families, friend groups, corporate offices), when things get hairy, there are a lot of people who go “Oh, I don’t want to take sides in this drama.”
This attitude strongly empowers sociopaths. They know that until they screw up badly, they have a clear field to abuse people.
For the same reason, their victims are alone and have to fend for themselves. They have to bring neatly packaged irrefutable evidence to the group before it acts, which is very hard to do without the support of the group.
Victims often even get blamed when they try to prove what is happening! While the group doesn’t know who’s right, the victim’s attempts to gather proofs will be perceived as accusatory, mean and paranoid.
Sociopaths are aware of it. And they leverage it.
Victims are aware of it. They often resign themselves as a result of knowing people won’t help them.
This is morally bad.
I don’t mean that one ought to get involved with all the drama that happens around them: we are puny little humans. We have a limited ability and will to do what’s Good, and we thus necessarily do bad things.
But I think one should still be aware when we are doing something bad. We should especially not elevate our weakness to a virtue. We should not act as if we were superior to the people going through drama by virtue of ignoring it.
—
A friend responded to the previous section with:
I agree this is a real problem! Often, though, I think it arises out of avoiding acting out of fears of the consequence rather than any stance of Epistemic Humility.
This may be an excuse; I’ve more often heard people directly state they didn’t want to make anyone mad in these situations, or that they prefer to reserve their role as a peacekeeper, and so on (and have done so myself, for better and worse).
I believe my friend is largely right, but under-appreciates the extent to which ensuring that good norms are followed takes both common knowledge and someone getting their hands dirty.
These two things trade-off: the clearer a rule violation is, the easier it is to punish the violator. And conversely, as a situation gets less and less clear, the more the rule-enforcer will have to bid in order to investigate and fix it.
When someone violates a good rule, someone else must punish them, and it takes a toll. At the very least, they must go to the rule-breaker and tell them “Hey man, you broke the rule.”
The rule-breaker will always counter-argue. Either because they acted in bad faith, or because they acted in good faith and feel the need to justify themselves.
Thus the rule-enforcer will have to pay a social cost to make the bids needed to make it clear that nope, the rule was broken. They will be the one asking people to pay attention to the arguments of the different parties, and take on the role of both prosecutor and executioner.
Then, the rule-enforcer will additionally need to pay the social costs required to enforce an eventual punishment: whether it is an apology to the group, a promise to not repeat the behaviour, a penalty, an exclusion, etc.
And finally, they will be the one to incur the dislike (or the wrath!) of the punished party.
This can all be mitigated by common knowledge and clarity. The more common knowledge there is about what happened and what the rules are, the smoother all of this goes, and the less it costs the person who will enforce the rules.
—
“Oh, but, can we really know what’s good or bad? How can we ever figure out who’s right or wrong? Can we really know if they meant to act badly?”
Such undirected Epistemic Humility only serves to weaken common knowledge. It thus becomes much harder for someone of comparable status to fault someone else.
This is not hypothetical.
Join spaces that take pride in their open-mindedness and humility, and you will see a lot of enablement of sociopaths, born of responding “Oh, but who is to know what is truly bad or not???” to any bad behaviour on the edge.
In practice, enforcement in such places is even more asymmetrical than usual. It only happens when a person of low status hurts a person of high status.
This is in stark contrast with a Rule of Law, where Laws are Respected, and where everyone is equal in front of the norms.
Now, onto a more positive vision. In fact, I have an alternative to Epistemic Humility.
Instead of Epistemic Humility, I recommend thinking in terms of Epistemic Precision. Epistemic Precision is not about being humble about what we know, but being precise.
We are not random machines, outputting random sounds and writing random symbols.
There’s always a reason for why we think what we think, say what we say, and do what we do. Epistemic Precision is the practice of paying close attention to it, enough to get a reliable understanding of where one’s confidence comes from.
Let’s go through some examples.
I was once asked for feedback on a benchmark suite. It claimed to measure [some property].
But the author did not think that their suite was in fact measuring the property. They didn’t think the benchmarks were close to doing so, and they never used the suite themselves to evaluate [the property].
I thought this was thus blatant academic misrepresentation, and told the author I thought they were lying and should not do so.
—
They thanked me for reminding them to be Epistemically Humble, and to instead claim that their work was only a stepping stone to measuring the property.
I vehemently disagreed!
The assertion “The suite measures [the property]” is wrong. Asserting 10% of it is still wrong, just 10% as much. The direction was incorrect.
So they asked me if I thought they should retract. They were sad about it, because they put in a lot of work, and they thought it could still be useful.
I asked them why they thought it could be useful. They responded that they crawled through hundreds of benchmarks to build the suite, and that even though the benchmarks were bad, these were the closest they could find.
And my conclusion was: “Just say that!”
Indeed, “just saying that”, saying what they believed, would have transformed the project. It would have gone from “one more example of Academic Misrepresentation” to “Providing strong evidence for one of the major problems of the field: that its objective measures are thoroughly inadequate.”[3]
This is the magic of Epistemic Precision.
The example above was a bit too academic.
Where I have found that Epistemic Precision shines the most, is in situations that have to do with less cut-and-dry knowledge.
Here are a few small examples.
—
Quite often, I ask a question that seems deceptively easy. Not on purpose, it’s just how the world is: sometimes, questions seem easy to answer but they are not.
In these situations, I often get an unreflective answer, one that reflects the person’s preconceptions more than their actual thoughts.
I then follow up with “Do you know the answer to be true, or are you inferring that it is?”
This helps them a lot with pausing and realising that they were making too many assumptions. Assumptions that I may not know about, that I may disagree with, that they themselves may disagree with, or which may turn out to be wrong.
—
On another note. As a human being, a lot of my knowledge is intuitive.
I value my intuitive knowledge quite a lot. But it is sadly much more subtle than formal knowledge, and thus hard to capture with words.
I also value other people’s intuitive knowledge. Sometimes, someone will make a claim that doesn’t seem natural to me. When I try to tease out why they make that claim, they become defensive.
Instead of an “I got this intuition from [doing X]”, they often start a barrage of rationalisations and fake post-hoc pseudo-logical explanations.
When I point out said rationalisations (by engaging with them, by telling them I want to learn about their intuitions, etc.), they revert back to some Epistemic Humility: “Oh well, I guess I don’t really know…”
This is so inefficient! There is a reason why they hold their intuitions and paid attention to them; it’s not just random noise. By paying attention to it, introspecting and understanding where their intuition comes from, we can then both learn and infer even more than what they immediately intuit.
Instead of them being Humble, I want them to be Precise. I want them to tell me what their intuition feels like, what reinforces it, where they have seen it work well, etc. I don’t want a bland “Welp, I guess it’s not Proper Formal Knowledge and it is thus Worthless, I should be Epistemically Humble and ignore it :(”
I have found Epistemic Precision to also be very valuable in dealing with “Indirect Knowledge”. Indirect Knowledge is knowledge that I have not gotten by myself, but instead that I have gotten from books, people, social media, and the like.
To me, “Indirect Knowledge” doesn’t feel real, it doesn’t feel concrete.
—
For instance, when I think “The sky is blue”, I have a pretty clear impression of what I feel and expect. I can think of all the time I have seen the sky being blue, I know what I mean. It is real knowledge.
When I think “The sky is blue because of scattering”, it feels super fake.
I can of course try to come up with an explanation. I have learnt some optics and some wave physics, but man.
Let’s consider the first explanation I come up with: “The atmosphere is mostly made of nitrogen. And nitrogen scatters blue more easily than the other colours.”
Even though I said that, I don’t know why fog makes everything grey (“water scatters grey more?? it’s not even in the light spectrum!”); I don’t know why the sky is orange during twilights; I don’t know why it is purple during typhoons. And to be clear, I haven’t played much with gaseous nitrogen either.
This is why “The sky is blue because of scattering” doesn’t mean much to me.
—
If I ever read “A paper shows that liberal christians are kinder than conservative atheists”, I wouldn’t even perceive it as “knowledge”. Like, this will not change, not at all, how I perceive liberals, christians, conservatives and atheists.
The only knowledge that I would have gotten there is that a “researcher” wrote it, and managed to pass it as “science” to many people.
To a large extent, this is how I relate to most of my indirect knowledge.
At this point, whenever I read claims that are not tied to a specific operationalisation, I treat them as social claims. A form of emotivism.
Concretely, if I came to read the headline above today, it would purely register as “Hurray Liberal Christians! Boo Conservative Atheists!”
—
To be clear, there are things I do to internalise indirect knowledge.
For instance, I initially learnt about solid dynamics and mechanics from books.
But through many exercises, watching standard experiments, building things myself, talking to teachers about it, and more, I managed to make this knowledge mine.
When I talk about solid dynamics and mechanics now, I know what I mean. I can dig into the details.
I now consider it direct knowledge.
—
However, if I am asked about something that I only know indirectly, I will be clear to the person about the fact that the knowledge I am relaying is indirect.
If I have a good recollection of where I got it from, I’ll say something like “Word is on social media that […]”, “I have read in a book that […]”, “Some researcher wrote in a paper that […]”.
If I can’t, I’ll say “I can’t remember where, but I remember having heard/read that […]. I have never experienced it myself though.”
Or, the worst, “I can’t remember at all. But I think someone once said [X] or something similar. Given how little I remember about it, I don’t even know what they meant.”
At that point, is it even knowledge? I do not think I would purposefully change my decisions based on this, nor that any of my interlocutors ever would.
This is how I deal with indirect knowledge in the framework of Epistemic Precision. I have cultivated an ability to state what nature of knowledge I have, how indirect it is, rather than just a vague feeling of “I should be humble.”
Armed with all of these concepts, the core points of this essay can be neatly summarised.
1) Epistemic Humility doesn’t serve an epistemic function. Its main effect is to weaken one’s social bids.
2) It is thus a good fit in situations where people benefit from weakening their own bids and not standing for their beliefs. This makes it a convenient cover for Epistemic Cowardice.
3) Instead, the actual virtue is Epistemic Precision. Being clear and confident in one’s actual beliefs. While these beliefs may be intuitive or indirect, one should still be clear about them.
On this, cheers!
[1] I am working on getting better at this, which is in large part why I am writing more :)
But if you want to help me with this, let me know!
[2] One may wonder “But then, how can I confidently convey to the group that I only think that [X] has a 20% chance of being true?”
I don’t have a perfect solution. Group epistemics are not made for this.
If I mean that we can not decisively be confident in any option, I may say:
“I am making a strong statement. We do not understand the situation enough and certainty is unwarranted.”
If I mean to put the emphasis on the fact that [X] has at least a 20% chance of being true:
“I believe that although it is unlikely, we can not exclude [X] from our considerations and we should have contingencies ready for it.”
If, on the contrary, I want to state that it has at most a 20% chance of being true:
“At this time, it is unwarranted to give [X] too much attention. First, we must think more about the case where [X] is false.”
[3] In the end, they stuck to claiming that they were measuring [the property].
2026-02-25 18:22:56
i.
Did you know there’s an observatory in Cape Town that you can visit for like $2?
Personally, I vaguely knew, but didn’t understand that it was like an actual observatory. My reference point for actual observatory is 80% from Myst and 20% from the Blue Prince. For some reason they’re just strewn across video games like Lime Bikes in Brussels. But that’s beside the point. The point is that I recommend everyone go, even though I left really mad and confused.
To get in you have to attend a 1-hour lecture. This is no problem for me since I graduated university, where attending 1-hour lectures to get to do something is a skill I mastered by the 2nd semester. In fact, this was a lot better than university because you actually get to do something fun afterwards.
I do find it strange that they make this a requirement. Despite being an event for normies, you basically had to have a PhD in astrophysics to understand any of it.
After the lecture though, much like university, you get to go outside and look through dozens of telescopes. There was a weird vibe the night I went. Apparently Saturn had something astrological happening and so tons of Libras flocked to see it. You could tell the astronomers wanted to help people see Saturn but you could also tell that the astronomers knew most people here didn’t know the difference between astrology and astronomy.
The line for seeing Saturn was genuinely very long. Luckily I just wanted to see anything cool so my options were wide open. Then I walked past a dome shaped building with a mysterious blue light inside, pointed at the sky.
I stood there, just looking. I knew that they had an observatory here, but I thought that meant they had a computer with Google Chrome connected to https://nasa.gov or maybe some cute little telescopes. I didn’t realize it was an Observatory observatory.
Smash cut to: 4 hours later. I am back from a tour of the library and finally go inside.
I’ve never been to Disneyland but when I watch Jenny E. Nicholson’s videos it seems like the point of theme parks isn’t the rides, but the vibes. There’s a reason theme parks are considered their own medium on TV Tropes alongside movies, video games and comics, and it’s because they also tell stories, only ambiently. This, like movies and books, is an art form. When done well, people seem embodied in another world. Walking around the observatory in the wee hours of the night with a cool breeze and this beautiful blue glow, I felt embodied in another world.
I felt even more embodied in this world, once I got inside. Looking up at the telescope, hearing the motor running, standing there under this artifact pointed at the stars. These feelings were easily worth the price of admission multiplied by 100.
If you’re thinking that this somehow made me biased, that I had decided I was going to like what I saw before I even looked through the telescope inside the observatory, then you’re 100% correct.
Yet, it still surprised me when the people ahead of me didn’t seem impressed or changed or healed after looking through it. I was sure this would be different for me though, because I had already decided it would be. When it was my turn I climbed the ladder and saw this bright ball with a hint of orange, and 4 or 5 tiny dots in line with it. I said “Oh my god”, but that was just performative astronomy. Despite my best intentions, I was kind of unimpressed.
Then, while I was standing on the ladder, trying my best not to fall, the astronomer who was with us mentioned that the little dots weren’t just any dots, they were the moons of Jupiter.
My awe was no longer performative. I wasn’t expecting to see the moons of Jupiter with my own eyes… ever. I honestly had no idea you could. So I just stood there looking at these dots and they were beautiful. I felt the solar system in a way I never have before. I could feel the photons travelling 43 light minutes all to hit a pale blue dot, then South Africa on that dot, then Cape Town, then this observatory, then the telescope in that observatory, all to be perfectly focused into my little retina at exactly that moment in time.
ii.
Suddenly it was over and I found myself back outside, invigorated, slightly healed, mostly awestruck and most importantly, ready to go home. That is, until I walked past one of the few cute little telescopes still open on the observatory lawn. Of course, I had to look through it.
My expectations were pretty low given the building-sized telescope behind me, but I was curious what it was pointed at, and thought I’d maybe see the moon. So you can imagine my surprise when I saw... spiraling galaxies? Also nebulae and what I can best describe as the astronomy photo of the day landing page.
This was weird. But first of all, it was cloudy. I looked up again just to make sure — there were a lot of clouds. So where are the clouds? I mean, what are we to believe, this is some sort of a uhm magic telescope or something?
I looked through the telescope again and indeed saw the same galaxies and nebulae floating around with their Lightroom vibrancy on 100. But this time I noticed the pixels. Not good pixels either; they looked like the ones you’d see on a 1366 x 768 monitor in the year 2012.
This is compared to the pixels on my mac, which I literally can’t see.
Good or bad pixels, they were the pixels on the screen of a computer, not the glass on the lens of a telescope. This was a computer with a camera, just like your phone or your laptop with a webcam. The only difference is that this one decided to dress up in the shape of a telescope.
I kind of felt like I had been tricked. When I saw something in the shape of a telescope I assumed this meant live and unaltered. I think other people also assumed this too, because the people that looked through just before me were blown away by this telescope.
It’s possible they had never seen pictures of deep space before, yet I got the feeling that they were not blown away because this was the first time they were seeing an image of a spiraling galaxy. Instead, it felt like they were blown away since they were seeing a real spiraling galaxy in the wild, as opposed to the PNG of one on nasa.gov.
Of course this was not a real spiraling galaxy. Depending on your theory of perception, it’s either a hallucination or illusion or actually, never mind. I hope we can agree that at the very least it was only as real as the spiraling galaxies in the PNGs we get back from the James Webb Space Telescope. In principle, the act of looking through this telescope was no different from looking at a PNG on your phone, except it would actually look good on your phone and would also come from a $9.7 billion telescope.
I got this weird feeling when I noticed person after person take a look through it, get blown away, and the astronomer next to them mention nothing about how this was closer to your iPhone than the huge observatory behind us.
iii.
The first thing I said after I lifted my head away from the eyepiece was: that’s not real. I said it pretty loudly too.
The astronomer told me, as I stood up, this was a real image. It was a stacked image taken right here over 30 minutes or so.
It’s great they hadn’t just downloaded an image. But even though the image was created with this telescope, I still don’t feel like what I was seeing through that telescope was real. I stood there struggling to express this to the astronomer for 5 minutes, before I gave up and went home, where I struggled to express this to a Google Doc for 12 months.
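For the curious, “stacking” is a standard astrophotography technique (my gloss here, not the observatory’s explanation): many short exposures are aligned and averaged, which beats down random sensor noise while the faint signal stays put. A minimal sketch with entirely synthetic data, where the frame counts and noise levels are made-up numbers for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic faint "galaxy": a dim blob that any single noisy frame barely shows.
true_signal = np.zeros((64, 64))
true_signal[28:36, 28:36] = 0.5

def capture_frame():
    # One short exposure: the signal buried under heavy random sensor noise.
    return true_signal + rng.normal(0.0, 2.0, true_signal.shape)

# "Stack" 900 frames (say, 2-second exposures over ~30 minutes) by averaging.
frames = [capture_frame() for _ in range(900)]
stacked = np.mean(frames, axis=0)

# Random noise averages down by sqrt(N); the signal does not.
single_noise = frames[0][:20, :20].std()   # close to 2.0
stacked_noise = stacked[:20, :20].std()    # close to 2.0 / sqrt(900) ≈ 0.067
print(single_noise, stacked_noise)
```

In the single frame the blob is invisible under the noise; in the stack it stands far above the background, which is roughly why a small telescope pointed at the sky for half an hour can show you nebulae your eye never could.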
I tried to figure out what was going on here, assuming the answer to my intuitive distaste for the digital image would be found in some theory of perception that accounted for optics.
Now, though, I think it’s less the philosophy of perception that matters and more the ethics. That is, it’s not so much how real the image is compared to an optical telescope, but that (1) it’s dressed up to seem like an optical telescope and that (2) none of the astronomers (proactively) said anything about them being different.
I really want to be able to express this feeling clearly and fairly because the people around me thought they were seeing the real thing. And it would be wrong to say that what they saw was fake, but it would be just as wrong to let them go home thinking they’ve seen a galaxy in the same way as we saw Jupiter earlier.
When you look through the small digital telescope you are seeing something real-ish — it’s real in the sense that looking at a long-exposure image of a highway after it’s been edited in Photoshop is real.
When you look through the huge optical telescope at Jupiter you are also seeing something real, something as real as when you use binoculars to look at a bird far away.
The problem is that both use the same form to present different kinds of realities. It would be fine to present a digital image from the digital telescope on a screen or phone because it’s presenting a photo as a photo. But when you put a screen in a telescope, you’re purposefully presenting a photo as a window, which is why people walk away thinking they’d seen a galaxy the way they saw Jupiter.
iv.
Astronomers aren’t confused about the difference.
I emailed them a few weeks later and they said “Seeing with your own eyes through an optical telescope offers a kind of direct connection—what you’re seeing is literally the light that traveled across space and into your eye. With digital telescopes, while you’re still seeing real photons captured live, the experience is more like a bridge between amateur observing and professional imaging—giving you deeper views but mediated by technology. Both are valuable, but they engage slightly different aspects of the experience: one more immediate and raw, the other more detailed and enhanced.”
They also agreed that they should have told people about these differences. The reason they didn’t is because turning a blind eye to these kinds of misunderstandings is beneficial to astronomers both on the night and at scale.
In 1985, Malcolm W. Browne published an incredible article in the NYT about how space images were ruining what kids expect to be able to see in the sky.
“Johnny points his pricey toy at [...] the Orion nebula, an object the textbooks describe as a brilliant gas cloud from which millions of infant stars are condensing. Photographs in slick magazines show this nebula as splashed with salmon-colored flame, but what does Johnny see? Just another smudge of white light - faintly tinged with green” - https://www.nytimes.com/1985/06/18/science/guilty-of-disappointment.html
Well, it’s 41 years later. Browne was right, so right that people’s expectations didn’t just stay the same; they got even higher. And instead of adjusting the public’s expectations and understanding of how interstellar and intergalactic objects are photographed, astronomers have continued to allow our incorrect intuitions to embellish space pictures. It’s also much more effective now, because we all have incredible screens on which to consume endless amounts of brilliant gas clouds, and on the odd occasion when a member of the public does come into contact with a telescope, there conveniently are telescopes with screens to fulfill those expectations. Of course, digital telescopes weren’t solely created to mislead everyone. Yet it’s still good for the observatory that they do.
If people came expecting Pillars of Creation and left seeing smudges, not many would leave happy, and it’s easy to understand why astronomers would make the tradeoff of more people being excited about space if it means omitting certain details.
Though this is about a tiny observatory visit, this is also not about a tiny observatory visit. This practice of convenient omission extends into general photographs astronomers publish in the media. Though calling them photographs is maybe too generous. As Kate Crawford says:
“Every photograph from an Apple iPhone is fundamentally an AI image. Every photograph taken by an iPhone camera is actually many frames with different exposure levels melded together as a composite. And each image is broken apart semantically, so that elements like skies, trees, and faces are each treated differently. They are machine learning mosaics, not photographs in the traditional sense. We have moved to a post-optical period of photography, beyond photos and the transmission of light through lenses toward statistical paradigms of image making.” - https://www.are.na/block/40131791
If that’s true of an iPhone taking pictures of visible light on earth, I’d argue it’s even more true of the James Webb Space Telescope taking pictures of galaxies with infrared and then being edited to look an arbitrary color, and then enhanced by multiple groups of people before you see the result on your phone.
For example, it was only while writing this article that I learned that if you were floating in the middle of the Pillars of Creation, it would look the same as if you were floating basically anywhere else in interstellar space. Structures like the Pillars of Creation are objects that exist in a completely different way from the objects we are accustomed to on Earth. It’s not wrong to feel a deep sense of awe at the Pillars of Creation, but perhaps it shouldn’t be too deep.
v.
What made the moment of looking through the huge telescope at Jupiter and the moons of Jupiter special wasn’t the image quality or level of detail I could see. It was the connection I felt with another planet and it was a connection I’ve never felt from looking at pictures on my phone.
We don’t yet have a good name for what these digital artifacts are. That they’re not exactly photographs is clear, but astronomers don’t seem to be in a hurry to correct anyone’s understanding of what they actually are.
7/10 - includes free parking.
Tickets are R40 at https://www.quicket.co.za/events/357682-cape-town-open-nights or email [email protected] for more info
2026-02-25 14:24:16
After Borges, circa threat modelling
It is not its content that makes the iron kaleidoscope extraordinary. It is the way that the apparatus of the kaleidoscope exaggerates itself upon the human eye. It reflects itself, and in its recursive reflection, contingencies are multiplied. Artefacts that exist in one point are conjured simultaneously in other points, and those reflecting other points, such that the effect is a maddening multiplication of space within a fixed point. The scope of a unit of space expands.
There is something maddening about looking too deeply at the kaleidoscope. There is so much happening, and to tilt the system, to choose another angle, sends the pieces toppling in strange patterns, and reveals a world of branching complexity and confusion. Patterned in its lightning are runes of utopia, runes of destruction, and worse, prophecies simply illegible, unthinkable, indigestible to the well-read eye or socialised mind.
Few care to spar with this confusion, or to bear humiliation for long. But for those that tend to the structure as play—that tilt the mechanism, and look again, and see the structures of confetti and bone blossom into queerer shapes, take note, and tilt the mechanism again, at first for no more than the joy of the artefact—familiar rhythms emerge. Axes of symmetry become apparent around which one's system may orientate. It’s more of a muscular learning, however, like a hunter's knack of the eye—where to look when the system rustles and the patterns whirl. It’s not a language one can quickly transcribe.
For those with hands and eyes entwined with the mechanism in fascinated lock-step it becomes apparent that the world in the kaleidoscope demands a new grammar to describe. And so they garble, make half statements, freely err. The machine turns and shreds their minds to shards. New words emerge, or a half-heard statement, scoped down, does work. Piece by piece these construct a model to teach the workings of the world inside the iron kaleidoscope. There is, perhaps, a precious prize, for inside its maze they say there is every vision of God’s eye: both hell, and the heavenly, in infinities unthought.
Each night those who live by gazing upon the iron kaleidoscope walk home. It is late, and dark. Perhaps it is raining. In Auden’s poem, on Brueghel’s painting, when Icarus died, the world roared on regardless. One will squint, and see, not the kaleidoscope, but the iron; and for a moment, like a blackbird, doubt will pass through their minds.
The moon rises, and the whispering trees stand ignorant.
Some will reach their home like this. Some will unlock the door. They will say to their partner — it’s only an old joke. They will say to the mirror, it’s just a machine. But lying there, in the darkness, it is not the unknown that will get to them. It is the fear of the well-known, what their own eyes have seen. It is the fear, more specifically, of the imminent: of an artefact, not irrelevant, but outside the affordances of mind, like some high dimensional galleon or space ship: vast, omnipotent, barely illuminated, not properly in view, and potentially—unfalsifiably—armed to the teeth.
The kaleidoscope is an unthinkable thought, the iron a false cage. And with that thought, they sleep, and dream, and know even in sleeping that their dreams feel more real than the life that they will be implicated in, in the morning when they open their eyes.
Brueghel the Elder, Landscape with the Fall of Icarus. Icarus can be seen falling in the lower right. Auden's poem on the painting can be read here.
2026-02-25 14:11:59
Or: When Memories Get Good -- The Default Path Without Theoretical Breakthroughs
Epistemic status: Fairly confident in the core thesis (context + memory can substitute for weight updates for most practical purposes). The RL training loop is a sketch, not a tested proposal. I haven't done a thorough literature review.
Suppose there are no major breakthroughs in continual learning -- that is, suppose we continue to struggle at using information gathered at runtime to update the weights of a given instance of an AI model. If you try to update the weights at runtime today, usually you end up with catastrophic forgetting, or you find you can only make very small updates with the tiny amount of useful data you have [1].
So, if you can’t train a day’s worth of information into the model, how could you end up with something that functions as if it were learning on the job?
Long Context Lengths, High Quality Summaries, and Detailed Documentation [2] [3].
It’s a straightforward idea, and basically done today, just not particularly well yet. Laying it out:
That’s it.
Why Doesn’t This Work Now?
Firstly -- it kind of does. In my own software projects I maintain a concise Claude.md file (which gets passed to each new agent on spawn), as well as extensive documentation which the Claude.md points to (and which the Claudes can search at will). Claude and ChatGPT already produce and store ‘memories’ in this way through their existing harnesses. These work okay, and we know that models can effectively learn in context.
But it doesn’t work that well yet. I suspect this is because current models just aren’t very good at writing or at using these notes.
It’s actually a very hard task. We’re basically having the model ask itself “What do I know that a fresh instance doesn’t, that would be useful for it to remember across all future instantiations?” and then asking it to write this down using as few tokens as it possibly can.
For a model to be able to do a good job, it needs to understand whether the things it knows are coming from its current context or its weights, and accurately guess how a future instance will respond to the memories. Basically, it needs to have a good theory of mind.
I think the difficulty of this task is the main reason memory especially sucked when it first came out. There are plenty of examples of irrelevant memories being created inside ChatGPT, for example.
It also took some time to train models which understood what the memories were and how to use them. Previously, models would attend too strongly to memories in irrelevant contexts, bringing up notes where they don’t belong. Kimi K2.5 still struggles with this, in my experience, seeing notes at the start of its context window as very important and relevant, even in situations where they shouldn’t be.
Claude ignores the apple note. Kimi always finds a way to bring it up.
But memory is getting much better, and newer models use it more successfully. I expect that as models get more intelligent their use of memory and documentation will continue to improve, especially in the world where this is trained for explicitly. Models are also getting better at handling the retrieval of dense information across their long context windows, so a mundane prediction that these trends continue should point us towards prosaic “continual learning” becoming quite useful over 2026 and 2027.
It also should be noted that memories like this are functionally the same as compaction (summaries written by the AI when reaching the end of the context window, so it can continue working). In both cases the model is writing compressed information to pass to a future instance to (hopefully) perform better. This is already an optimisation target for frontier labs.
How We Could Make It Work Better
We can easily train models to create and use memory as an RL task. To sketch out a simple method -- suppose that when finishing a task, instead of scoring the model’s performance immediately, we have the model write memories and documentation, and then we run a new instance on the same, similar, and dissimilar tasks [5] with those memories and documentation, and have a reward function which scores on the combined performance (with some small penalty for the length of the memories). This looks like:
The reward function used for the actual parameter updates would be a function of the scores across each of the follow-up instances, minus some penalty based on the length of the memories relative to the total context length of the model.
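As a toy illustration of the loop above (every function name and coefficient here is my own assumption, not a description of any lab's actual pipeline):

```python
def evaluate_with_memories(write_memories, run_task, score, tasks,
                           penalty_per_token=0.001):
    """One reward computation for the memory-writing RL sketch.

    write_memories(task) -> str : the model finishes the first task,
        then emits the memories/documentation it wants to pass forward.
    run_task(task, memories) -> result : a fresh instance runs a
        follow-up task, seeded only with those memories.
    score(result) -> float in [0, 1] : task-specific grader.
    tasks : the original task, followed by the same/similar/dissimilar
        follow-up tasks.
    """
    memories = write_memories(tasks[0])

    # Score each fresh instance that runs with the written memories.
    followup_scores = [score(run_task(t, memories)) for t in tasks[1:]]

    # Combined performance, minus a small penalty on memory length
    # (token count crudely approximated by whitespace splitting).
    reward = sum(followup_scores) / len(followup_scores)
    reward -= penalty_per_token * len(memories.split())
    return reward
```

A real setup would feed this reward into a standard policy-gradient update on the memory-writing model; this scaffold only shows the shape of the scoring.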
There are several other ways to do something like this, of course, and some would be much more efficient than what I have laid out here. I’m mainly trying to get across a few key ideas:
Overall I would expect this to reward both the model’s ability to write AND to understand its memories and documentation, with some risk of pushing the model towards very dense, difficult-to-read memories (à la linguistic drift).
I haven’t spun up an experiment to test this empirically, but may do at some point. If anybody else would like to, or has done so already, please let me know!
Could This Replace Real Continual Learning? What About Intelligence Gains From Having The Information In The Weights?
There are two things going on here that we need to untangle. The first is about the model having the correct information to achieve its goals. This is what gets put into the memories and the documentation, and what is addressed by prosaic continual learning.
The second thing we wonder about is how to increase the intelligence of the model. How can it do more with less information, or figure out new things that it wasn’t told, or get better at acting in the world in a general sense.
With prosaic continual learning, the real intelligence gains only happen in the next generation of AI models.
Suppose Claude 5 is launched with a 1m context window, and it is smart enough to write good [8] documentation and memories. If a task uses about 500k of context, and produces about 1000 tokens of new memories, then doing ten tasks a day, every day, you can run the model for 50 days before you hit the ceiling on how many memories you can store [9] [10].
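The arithmetic, spelled out with the made-up numbers from the text:

```python
# Hypothetical numbers from the text, just to check the arithmetic.
context_window = 1_000_000   # Claude 5's assumed context, in tokens
task_buffer    = 500_000     # tokens reserved for doing the task itself
memory_budget  = context_window - task_buffer

tokens_per_task = 1_000      # new memory tokens written per task
tasks_per_day   = 10

daily_memories = tokens_per_task * tasks_per_day   # 10,000 tokens/day
days_until_full = memory_budget // daily_memories  # 50 days

# With 10x compression (1,000 new tokens/day total), the same budget
# lasts 500 days, i.e. well over a year.
days_if_compressed = memory_budget // 1_000
```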
Then, 50 or so days later, Claude 5.1 is launched, with improved capability by the usual process. Claude 5.1 inherits the existing memories and documentation and immediately works on improving and compressing them [11]. Combined with a longer context window, the new Claude 5.1 might buy another 50 days of memory [12].
Repeat ad nauseam, or at least until Claude N solves true continual learning with parameter updates at runtime.
In this way, the lessons from a particular deployment (say, by a model that has been answering phones for a particular company) are trivially passed from one generation to the next while capabilities continue to improve via regular training. In practice, is there anything more we need true continual learning to do? [13]
Can We Have A Human Brain Analogy, Please?
One of the reasons continual learning is so popular a concept is that humans do it, which makes it a very attractive answer to the question “What can’t AIs do yet?”.
The human learning process looks something like the above chart, where we have an explicit, discrete, and extremely small working memory, which holds somewhere on the order of 10 objects at a time. This probably exists as activations in the prefrontal cortex. It’s analogous to an LLM’s context window, being lossless and explicit, but far, far smaller.
Then, humans have a kind of buffer, where information is stored on the order of hours to weeks in a lossy but easily accessible way. This seems to be held in the hippocampus. You can draw a weak parallel to AI reading documentation here, being some partially processed summary of what has happened, accessible with a few seconds of thought.
Humans can read documentation too, of course, but the read speed is extremely slow in comparison. AI is able to read documentation at a speed that is more comparable to a human recalling a specific memory.
Next, humans have long-term memory, which is slowly updated on the order of days, probably by reading and updating against the hippocampus’ “buffer” [14]. This is the analogue of the missing piece in LLMs’ continual learning: we don’t yet know how to properly update an instance’s parameters at runtime.
Finally, even humans don’t become more intelligent after reaching full adulthood [15]. We rely on evolutionary selection to make any significant changes to human intelligence. The analogy here is to the next generation of AI models being trained, although that happens far, far faster.
Laying it out like this, you can see the ‘long term memory’ update step is missing, but the ‘context window + documentation’ is so ridiculously larger in storage capacity than human working and short-term memory, and the ‘intelligence gain’ step so much shorter, that skipping a weight update at runtime might be viable. Humans require memory-related parameter updates because we can’t store much information in working or short-term memory, but if our working memory were so large that it didn’t fill up within our lifetimes, you can see how the situation changes.
Conclusions
Having now thought through this, I have updated away from continual learning being a real issue for AI capabilities in the near future [16].
It doesn’t seem like it is needed for general purpose capability improvements, where the regime of releasing a new model every few months works fine.
It doesn’t seem like it’s needed for company specific work, where you can store all of the needed information in documentation and in context.
I think the fact that it has to be written and used explicitly by the models is a satisfying answer to why it hasn’t worked well so far: the models simply haven’t been smart enough to do a good job of it.
I’m also bullish on progress on this problem being fast, given that this performance is something that can be straightforwardly optimised with unsupervised RL, including training models to handle and edit stale memories.
Overall... damn, I guess we’re making continual learners now.
People think about the goal of continual learning as being ‘the model can learn on the job’, so, practically speaking, the main use case is for specific, non-generalisable data unique to this deployment of the model. When I say you don’t have enough data to do this usefully, I mean that one day’s (or one month’s) recording of work is a tiny amount of data to try to fine-tune a model on. You can’t reliably learn new things this way, though you might be able to elicit existing knowledge in the model. ↩︎
This is not a new idea. Dario spoke about it on Dwarkesh, and a quick Claude search reveals several different papers talking about the concept, most of which I haven’t read in detail. I am writing this post because I haven’t seen it clearly, publicly combined in one place before, and maybe there’s some interesting exploration of the RL training loop and why explicit memory has been a hard thing for models to get right. ↩︎
We also have versions of all of this today, which is why it’s “prosaic” continual learning. ↩︎
You could also include things like tools the model has built for itself, information it's found online and wants to make a note about, and really anything that is created or curated for the models’ use without the entire thing being stored in the active context window. ↩︎
Same task means literally the exact same task [17].
Similar task means tasks pulled from the same narrow distribution. For example, the set of things a particular employee might do in their work for a single company. We want to encourage memories that are useful across this somewhat narrow domain.
Dissimilar task means tasks pulled from more radically different distributions. Coding, psychological support, creative fiction, etc. I think we need to include some probability of dissimilar tasks in the batch in order to train the model to not rely too strongly on memory. At deployment time, the model may indeed be given memories that are irrelevant for the task at hand.
If I had to take a random stab at the proportion of each type of task assigned for a given batch, I would weight the distribution so that the N+1th task is about 89% likely to be from the similar distribution, 10% likely to be from the dissimilar distribution, and 1% likely to be the exact same task repeated. ↩︎
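That guessed 89/10/1 split is easy to sample directly; a sketch using Python's standard library (the pools and proportions are the hypothetical ones from above, not tuned values):

```python
import random

def sample_followup_task(similar_pool, dissimilar_pool, original_task,
                         rng=random):
    """Pick the next task for a memory-evaluation batch.

    ~89% similar (same narrow distribution), ~10% dissimilar (to
    discourage over-reliance on memories), ~1% the exact same task.
    """
    kind = rng.choices(
        ["similar", "dissimilar", "same"],
        weights=[0.89, 0.10, 0.01],
    )[0]
    if kind == "same":
        return original_task
    pool = similar_pool if kind == "similar" else dissimilar_pool
    return rng.choice(pool)
```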
The model should write memories and documentation on both successful and unsuccessful attempts at the problem -- it likely has useful information about what to try or not to try either way. I’m also imagining that there is some penalty for overall token usage when training for inference efficiency reasons -- that would incentivise the passing of useful tips and lessons via memories and documentation, if it can make the later instances more efficient.
It is even fine to pass the entire solution via memory, so long as the model has learnt when it doesn’t apply, and has been suitably penalised for the memory length. I think we can get this result by tuning the proportion of same, similar, and dissimilar tasks being scored together -- that is, if we run similar tasks n times, and dissimilar tasks m times (and possibly the same task p times), with the memory and documentation passed through for a given reward calculation, we can select n, m, and p such that generally useful tips are favoured over long and specific instructions. ↩︎
I’m unsure whether documentation should be length penalised or not. You get this to some degree by measuring the performance of the model using the documentation. I’d lean towards probably not, using the principle of allowing the training to choose whether short or long documentation is better. I’m assuming we use a tool which allows the model to choose to read some reasonable amount of tokens at a time, rather than risk breaking things by dumping entire files in, or only clipping them when they become very long. ↩︎
In the memory case, ‘good’ means that they can figure out what would be useful to know in all future runs, and can recover from bad or missing memories by editing it later. In the documentation case, it means they can include all the relevant information accurately, avoid including slop, and then use the information to be much more effective than they would be without it. ↩︎
I made up numbers here just to show how much room there is. In this case, I get 1 million token context window, minus 500k task buffer, leaving 500k tokens for memories. At 10,000 tokens per day, we get 50 days of memory buffer.
This is also kind of a ‘worse case scenario’. A thousand tokens for memories for each task is very high, since most memories could simply be pointers to where the real detail lives, and you would quickly run out of new things to write. Do you memorize ~6000 words worth of new information every day, and keep it memorised for the rest of your life? If you can compress your new memories to only 1000 new tokens per day instead of 10,000, you get over a year of runtime. Alternatively, increasing context length from the current 1m tokens also provides wiggle room. ↩︎
Different tasks will have very different profiles here. For example, coding might require only very short memories, whereas piloting a robot through a factory might require memories that include a map and descriptions of every mistake the model had made on previous trips. ↩︎
We can expect a new version to be better at the difficult task of creating and using the memories & documentation, especially if it’s trained explicitly for this. Some possibilities here, which point towards shorter and fewer memories:
I am pretty confident that memory usage should be able to grow slowly enough that a Claude working for a particular company can fit everything it needs into context and explicit documentation. For this not to be the case, you have to assume that extremely large amounts of information are needed (multiple books worth), and that you discover new information that must be held in context (rather than in documentation you can look up) at a rate faster than the context window grows, and that future models won’t be able to significantly compress existing memories or be able to move existing memories into documentation by virtue of being better at knowing when to look up things. ↩︎
In the limit, this process is functionally identical to continual learning, as far as I can tell. Just imagine the 50 days between model releases reducing to some short period, like a day or an hour, and imagine the written memories that are passed forward becoming denser and denser, an abstract initialisation pattern that is loaded in for a deployment (like a static image).
Putting the same scenario the reverse way, imagine a model with traditional, weight-updating continual learning. Rather than updating its weights directly, it (like humans) uses a short term memory buffer to store new information and isolate private information from the weights. Every hour, the relevant lessons from the previous hour’s work are trained into a copy of the model, which is then seamlessly switched out, and the buffer updated. ↩︎
I don’t know if you’ve ever noticed your long term memory updating, I feel like I have. Have you ever had a major event happen, and then only some days later have cemented a behaviour change, even though you knew the change was necessary from the moment of the event? ↩︎
They continue to learn more, which makes their crystallised intelligence (knowledge and skills) go up, but their fluid intelligence (ability to reason abstractly, solve new problems, etc.) declines after early adulthood. ↩︎
I’m even coming around on continual learning being worse for most mundane uses -- suppose you have your own version of a model, with the weights updated to store information specific to you and your use case. What happens when a new model is released? You have to retrain? What happens to the optimisations from batching? ↩︎
I actually think it’s debatable whether you should include the literal same task as an option for the nth instance (with the memories and documentation prepared by the (n-1)th instance) to be assigned. If you do this, the model could just include the whole solution in its memories, but honestly, for some production usage and types of task, that could be a reasonable and viable strategy.
I think in general we should try to train on the same distribution as the deployment, so whether to include the literal same task (vs just similar tasks) as a possible option here depends on whether you think that’s a situation that is likely to occur in practice (maybe setting up the same programming environment many times?), and whether you get anything from doing this (quickly using the cached procedure?). ↩︎
2026-02-25 10:57:55
The voice in my head is an asshole. — Dan Harris
I've always assumed that habits were just physical things: the habit of washing your hands before eating; the habit of smoking cigarettes after sex; the habit of checking your phone first thing in the morning. Recently I learned that there are mental habits, that some of them are bad habits, and that those bad habits can be broken.
It's normal to reflect on your past to learn from your mistakes. This is a good mental habit. But when you're spending hours every day thinking about the same past event, that's a bad mental habit known as rumination.
Let’s say you had an argument with a friend at a party. The next day in the shower you think:
“If only I had said this, then he would’ve agreed with me!”
That's normal. You’re processing the event. But if you begin thinking about it all day long, and even the next day, then you're no longer reflecting and have veered into the territory of rumination. Clearly this event was important for you—that’s why your brain wants to review it repeatedly to make sure you didn’t miss any details. But eventually there's no more analysis that can be done and your brain can get stuck in review mode. When that happens, it can actually damage your health.
Personally, the longer I allow myself to ruminate, the more aggressive my inner voice becomes:
“If only I had said this, then he would’ve agreed with me!
…
And if I wasn’t such an idiot then I would’ve thought of that.
…
God, why am I so fucking stupid??”
According to Dr. Ethan Kross in his book Chatter: The Voice in Our Head, Why It Matters, and How to Harness It, this type of self-shaming actually worsens our health:
When our internal conversations activate our threat system frequently over time, they send messages to our cells that trigger the expression of inflammation genes, which are meant to protect us in the short term but cause harm in the long term.
This happens because our cells interpret the experience of chronic psychological threat as a viscerally hostile situation akin to being physically attacked.
Unfortunately, you can’t change what happened in the past, and ruminating on it just makes things worse. But there is a way to break this mental habit!
Recurring ruminative thoughts are like a toddler whining for candy. If you give in to her demands, it teaches her that whining works to get your attention, and so she’ll whine more.
Good parents know that saying “no” to a child is important to her development because she learns that you're willing to set boundaries and will enforce them. But how you say “no” is equally important. Telling the child to “shut up”, or ignoring her request entirely, creates a poor relationship with your child.
Instead, gently telling the toddler, “it's before dinner, candy would ruin your appetite,” lets her know that you acknowledge her request and that you see her, but you will not give in to her demands. She may whine at first, but if you maintain your resolve, then she’ll learn that whining doesn’t work.
My ruminations typically happen with respect to my dating life. When I go on a date and it doesn't work out (when I was hoping it would), my mind immediately goes into detective mode: what did I miss? did I make any mistakes? what could I do better next time?
These are all helpful questions, but only in moderation.
Even after I think deeply on the matter for 20-30 minutes, the rest of the day (and the next day, and the next…) my brain keeps returning to the date and wants to solve something that is unsolvable—which is to change the past.
I've learned to do two things to help stop ruminations:
When I first started doing this practice of labeling thoughts (which comes from Cognitive Behavioral Therapy), I would have to stay vigilant all day long to ensure that I didn't slip into ruminating, and thankfully by the next day my brain would quiet down. Nowadays, after a date that doesn’t work out, and after I journal about it, I label any lingering negative thoughts as ruminations, which quickly go away once I show them that I’m not going to engage with them.
Eventually my brain moves on and thinks about other stuff, just like how the toddler eventually gives up on her demands for candy when you keep gently telling her “no”.
Metacognitively, the worst thing you can do is to actively suppress your thoughts. Saying, “I don't want to think that thought anymore!” doesn't work, and can paradoxically increase the frequency of that thought. It’s similar to someone telling you, “don’t picture a pink elephant for the next five minutes!” Well, you’re probably going to picture a pink elephant as soon as they say that.
I didn't know this when I was 19 years old. Back then I had an intrusive thought so disturbing that I immediately tried to suppress it—to memory-wipe myself from ever having thought it. That really doesn't work. My brain tortured me by blasting that thought on repeat for a year straight. The more I tried to suppress it, the more frequently it would come up. It was only when I finally acknowledged the thought, discussed it with a trusted friend, and journaled about it, that the thought finally went away.