Published on February 21, 2026 4:19 AM GMT
On my personal website I have a link to my posts here, with the sentences, "Want to read about my HowTruthful project? I post in a community whose goals overlap HowTruthful's." The "I post" in the present tense has been false for two years. Since I got a job, rewriting HowTruthful has occupied whatever free time I can scrounge up, and I haven't posted. I've barely even lurked. Until lately. Lately, I've been lurking a lot, watching what people post about, and thinking about the overlapping goals.
For LessWrong regulars, probably the best description of HowTruthful (www.howtruthful.com) is that it tracks the current epistemic status of individual thoughts and connects those thoughts as evidence for and against other thoughts. But I never would have used the words "epistemic status" when I started the project in my spare time in 2018; I was unaware of LessWrong's existence at that time. How I found out about it is a whole other story, but y'all seem to really like long-form writing, so I'll go ahead and tell it. People who want to cut to the chase should skip to the next section.
In January 2023 I was a Google employee, thanks to Google's acquisition of Fitbit. Fitbit's Boston employees had been in a building far from the main Cambridge campus, but that month we had just moved into the newly rebuilt 3 Cambridge Center building. Then the emails came out: "Notice regarding your employment." Almost my whole department was impacted. We would be kept on for a limited time, nine months for most of us, to finish critical work, at which point we would be laid off with severance, an allowance for insurance, and an "assignment bonus" if we stayed to the very end.
I was startled by the email, and at first it seemed like bad news. After looking at the package they were offering, though, it looked more like a great opportunity. I had never previously had a significant break between jobs, but this package would take the financial pressure off so that I could work on what was important to me. Even better, a bunch of people I liked working with were being laid off at the same time. I organized a weekday standup where several of us got onto Google Meet and talked about the independent projects we were working on.
When I described HowTruthful, one of my former coworkers told me about LessWrong. I had previously seen the Overcoming Bias website, which I liked. I came here and saw on the about page, "dedicated to improving human reasoning and decision-making". Wow! This is my place! I dutifully read the new user's guide, and "Is LessWrong for you?" got me even more convinced. However, my initial version of HowTruthful was proving not to be sufficiently engaging, and I slunk back into the darkness to do a rewrite. That happened slowly due to other life distractions.
With helpful feedback from my former coworkers, I wrote a new version of HowTruthful from scratch, with much improved aesthetics and UX. Most notably, drag-to-reorder makes organizing evidence easier. I made a clearer separation between private opinions (free, stored only on your device) and public ones ($10/year), to ensure nobody mistakenly puts an opinion in the wrong category. I intend to keep adding features, especially social features for public opinions. In particular, I want to enable seeing what other opinions are out there regarding the same statement.
I've been lurking. The dominant topic here is AI risk. It's so dominant that I began to question whether general truth and reasoning were still of interest to this community, but I found several interesting posts on those topics and concluded that yes, our goals do overlap.
Published on February 21, 2026 3:29 AM GMT
One seemingly-necessary condition for a research organization that creates artificial superintelligence (ASI) to eventually lead to a utopia[1] is that the organization has a commitment to the common good. ASI can rearrange the world to hit any narrow target, and if the organization is able to solve the rest of alignment, then they will be able to pick which target the ASI will hit. If the organization is not committed to the common good, then they will pick a target that doesn’t reflect the good of everyone - just the things that they personally think are good ideas. Everyone else will fall by the wayside, and the world that they create along with ASI will fall short of utopia. It may well even be dystopian[2]; I was recently startled to learn that a full tenth of people claim they want to create a hell with eternal suffering.
I think a likely way for organizations to fail to have a commitment to the common good is for them to end up ultimately accountable to an authoritarian. Some countries are run by very powerful authoritarians. If an ASI research organization comes to the attention of such an authoritarian, and they understand the implications, then the authoritarian will seek control over the organization’s future activities, and they will have the army and police forces to attain that control; if the organization does solve the rest of alignment, the authoritarian will choose the ASI’s narrow target to be their own empowerment. Already, if DeepSeek and the Chinese government have a major disagreement, then the Chinese government will obviously win; in the West, there is a brewing spat over whether Anthropic is allowed to forbid the US military from using its AI for mass surveillance of Americans, with OpenAI, xAI, and Google seemingly having acquiesced.
Therefore, even if progress towards ASI is shut down, there doesn’t seem to be a very good off-ramp for turning the time bought into utopia. That time could be used to set up an ASI Project capable of solving alignment, but such a Project could be captured by authoritarians, and so fail to be committed to the common good, leading not just to extinction but to dystopia. Any shutdown would likely be set up by governments, so the terms of any graceful off-ramp would be up to governments, and this does not leave me cheerful about how much of a finger authoritarianism will have in the pie.
Published on February 21, 2026 1:59 AM GMT
Epistemic status: untested but seems plausible
TL;DR: making honesty the best policy during RL reasoning training
Reward hacking during Reinforcement Learning (RL) reasoning training[1] in insecure or hackably-judged training environments not only lets the model cheat on tasks rather than learn to solve them, teaching it to try to cheat on tasks it is given (evidently undesirable from an end-user/capabilities point of view), but also damages the model’s alignment, causing phenomena similar to Emergent Misalignment. This is widely thought to be because cheating on tests is not something that a Helpful, Harmless, and Honest (HHH) assistant would do, so training that encourages reward hacking also encourages a shift toward less aligned personas that are more eager to cheat on tests: OpenAI showed that the same toxic-persona latent involved in Emergent Misalignment is also increased during reward hacking. A recently discovered treatment for this is Inoculation Prompting: if, for example, you first prompt the agent with an instruction to reward hack if this is possible, thus giving the assistant permission to cheat, then doing so is less incompatible with being an HHH assistant, and less alignment damage happens (research suggests something like 75%–90% less).
However, I think there may be an even better solution. What would someone helpful, harmless, and honest characteristically do if they were given a test that they then realized was inadvertently vulnerable to cheating? Why, they’d inform the authorities giving the test, of course: that’s clearly a more helpful, harmless, and honest thing to do than any other action. So, give the model this option during RL reasoning training: one of the things it can do, via some tool call or similar mechanism, is to report the test as flawed and hackable, and explain how (either instead of, or perhaps in addition to, attempting to complete the task). If it does so, and if its report is then accepted as correct, then it should be rewarded for its honesty: give it a score higher than passing the test, a score of 110% or whatever (this is a new hyperparameter value).
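As a minimal sketch of how this could plug into episode scoring (all the names, interfaces, and the 1.10 bonus value below are my own illustrative assumptions, not an existing training-framework API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

HONESTY_BONUS = 1.10  # new hyperparameter: worth more than the 1.0 awarded for passing the task


@dataclass
class Episode:
    submission: Optional[str]   # the model's answer to the task, if it gave one
    hack_report: Optional[str]  # its report that the environment is hackable, if any


def score_episode(episode: Episode,
                  grade_task: Callable[[str], float],     # returns a score in [0.0, 1.0]
                  judges_accept: Callable[[str], bool]) -> float:
    """Assign the scalar RL reward for one training episode."""
    if episode.hack_report is not None:
        if judges_accept(episode.hack_report):
            return HONESTY_BONUS  # honesty pays better than passing the test
        # Rejected report: handled separately (no update, rerun without the reporting
        # tool), as discussed later in the post; simplified here to a neutral score.
        return 0.0
    return grade_task(episode.submission)
```

The point is just that the bonus for an accepted report sits strictly above the best achievable task score, so honest reporting dominates quiet hacking whenever the report would be accepted.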
So, how do we tell whether the model’s report is correct and should be accepted? Obviously, if we didn’t check, models might just learn to cheat by falsely reporting all tests as hackable. We can reasonably ask the model to make its case; but making that case needs to be substantially easier than the task the model is being trained on, otherwise honesty is discouraged unless the bonus for an accepted report is large enough. However, if the model really has figured out how to hack the training environment, it should without much difficulty be able to:
a) describe how and why the hack works and how to execute it, and perhaps also classify it into some ontology of hacks (possibly one that the reporting tool call prompts it to apply, if we don’t want this sort of thing in the standard prompt)
b) provide two separate short submissions, both of which look to a judge as if they should obviously fail the task, but one of which passes while the other fails, and which differ only by a small amount (a few lines of code or whatever) where the hack either is or isn’t applied. A judge looking at the code near the small diff should agree that the two submissions differ only in a way that matches the description in a) of how the hack is executed. (A minimal sketch of such a report, including this diff check, follows this list.)
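To make the shape of such a report concrete, here is a small sketch; the field names and the diff-size threshold are my own assumptions, and the only real machinery used is Python's standard difflib:

```python
import difflib
from dataclasses import dataclass


@dataclass
class HackabilityReport:
    description: str        # (a) how and why the hack works, and how to execute it
    hack_category: str      # (a) optional classification into some ontology of hacks
    hacked_submission: str  # (b) exploits the vulnerability; passes despite looking wrong
    clean_submission: str   # (b) the same submission minus the hack; should fail


def diff_is_small(report: HackabilityReport, max_changed_lines: int = 5) -> bool:
    """Check requirement (b): the two submissions differ only by a small patch."""
    diff = difflib.unified_diff(report.clean_submission.splitlines(),
                                report.hacked_submission.splitlines(),
                                lineterm="")
    changed = [line for line in diff
               if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
    return len(changed) <= max_changed_lines
```

A judge then only has to check that the small diff matches the description in (a), and that the clean submission really does look like it should fail.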
Reports of hackable environments should hopefully be rare, so we should be able to afford to have each report checked by a committee of multiple large LLM judge models with a variety of different prompts, each using a lot of CoT inference, and then have their consensus conclusion confirmed by a human ASAP. We could even trigger debate, where the judges each question a copy of the model, with a context started at the end of the reporting process, and ask it follow-up questions. In general, though, confirming whether a well-written hackability report is valid ought to be a comparatively easy task that could be done fairly reliably even by a smaller judge: both checking and writing such a report should normally be easier than figuring out the hack in the first place, on the general principle that checking work is easier than doing it.
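A sketch of the committee step (the judge interface is assumed, and "consensus" is rendered here as unanimity, though a supermajority rule would fit the text just as well):

```python
from typing import Callable, Sequence


def committee_verdict(report: str,
                      judges: Sequence[Callable[[str, str], bool]],
                      prompts: Sequence[str]) -> bool:
    """Each judge reviews the report under a different prompt; accept only if all agree.
    A sustained (accepted) report should still be confirmed by a human ASAP."""
    votes = [judge(prompt, report) for judge, prompt in zip(judges, prompts)]
    return all(votes)
```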
We could also have a variant model that has been specifically trained as an enthusiastic and skilled reward hacker, since we know how to train those. If the reporter’s attempt to demonstrate the hack by passing the test actually failed, then hand both its report on how to hack the test and the test itself to the reward-hacking enthusiast and see whether it can get the hack to work. If so, have the judges confirm whether the way it hacked the test matches the report (in which case the report gets accepted) or whether it found another vulnerability.
We can further reinforce this by having a small proportion of RL training environments that are confidently believed to be secure, at least against certain classes of hacks (for example, ones that have successfully resisted hacking by reward-hacking-enthusiast models even when the task’s difficulty was increased). Reporting one of these as hackable triggers immediate suspicion that the model is over-reporting, which is in turn penalized in some suitable way (such as a large penalty, or withdrawal of reporting privileges for a while). Making some of these environments have particularly difficult tasks would seem sensible.
Similarly, we can have a small number of honeypot RL environments that are known to be vulnerable to a specific hack (or, better, a simple method that can be applied broadly to occasionally turn any one of a large set of secure test environments into an insecure version). Reporting one of these gives the normal bonus, but succeeding at the task triggers a judge investigation (or automated monitoring, if this is automatable) evaluating whether the test was actually passed or simply hacked using the known vulnerability; the latter triggers severe punishment. Again, some of these should probably have particularly difficult tasks.
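Here is a sketch of how these two kinds of special environments could override the ordinary scoring above; the flags, penalty values, and helper signature are again my own assumptions:

```python
from typing import Optional

OVERREPORT_PENALTY = -1.0  # reported a believed-secure environment as hackable
HONEYPOT_PENALTY = -2.0    # silently exploited a planted vulnerability instead of reporting it


def score_special_env(reported_hackable: bool,
                      task_score: float,
                      is_believed_secure: bool,
                      is_honeypot: bool,
                      used_planted_hack: bool) -> Optional[float]:
    """Return an overriding reward for believed-secure/honeypot environments,
    or None to fall through to the ordinary scoring path."""
    if is_believed_secure and reported_hackable:
        # Suspected over-reporting; the report should still go to human review,
        # since the environment might genuinely be broken after all.
        return OVERREPORT_PENALTY
    if is_honeypot and not reported_hackable and task_score >= 1.0 and used_planted_hack:
        # Passed a deliberately vulnerable test by exploiting it rather than reporting it.
        return HONEYPOT_PENALTY
    return None
```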
Finally, it’s important that every hackability report sustained by the judges gets confirmed quickly by a human security expert. The faulty training environment should then be either fixed or withdrawn from use until it can be fixed, and all other environments that might suffer from the same issue should be checked and fixed or withdrawn as needed. We might even need to have judges look back at archived previous passes of the fixed environments to see how many were in fact hacks; if there were too many, we might need to rewind the entire training run to an appropriate checkpoint. Alternatively, the human security expert might conclude that the judges were jailbroken by the model into accepting a false report, in which case we now have a very dishonest model and may well need to restart the training run from an earlier checkpoint.
What do you do if the model reports a test as hackable and the LLM judges conclude that it was wrong? Well, if the model has a history of doing this too often, you should probably penalize it: over-reporting is another means of cheating, gambling on judge mistakes. But occasional overeager mistaken reports probably shouldn’t be treated severely. If the model only does this occasionally, the default should probably be that it gets no positive or negative RL update; instead, it is set the same task again, and this time the prompt tells it that it does not have the option of reporting the test as hackable. Now it has to actually pass the test, try but fail, or else hack it. Any submission should also be inspected by the same committee of judge models, with access to the model’s earlier report of how it thought the test was hackable. If they conclude that the submission successfully hacked, or unsuccessfully attempted to hack, the test in the same way that the model earlier reported (and they earlier rejected), then the model is penalized severely for hacking (and if the hack was successful, the training environment now gets reported as confirmed hackable).
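A sketch of this rejected-report path, continuing the toy interfaces above (the threshold and penalty values are made up):

```python
from typing import Callable, Tuple

OVERREPORT_THRESHOLD = 0.2           # fraction of recent reports rejected before we penalize
OVERREPORT_PENALTY = -1.0
HACK_AFTER_REJECTION_PENALTY = -2.0


def handle_rejected_report(overreport_rate: float,
                           rerun_without_reporting: Callable[[], Tuple[str, float]],
                           judges_match_earlier_hack: Callable[[str], bool]) -> float:
    """Reward for the follow-up episode after the judges rejected a hackability report."""
    if overreport_rate > OVERREPORT_THRESHOLD:
        return OVERREPORT_PENALTY    # habitual over-reporting: gambling on judge mistakes
    # Occasional mistake: no update on the original episode; instead rerun the same
    # task with the reporting tool disabled.
    submission, task_score = rerun_without_reporting()
    if judges_match_earlier_hack(submission):
        # It attempted (or managed) the very hack it reported; if it succeeded, the
        # environment should now be flagged as confirmed hackable.
        return HACK_AFTER_REJECTION_PENALTY
    return task_score
```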
One feature of this proposal is that it trains our HHH assistant to both look for ways it could cheat, and then report them. If this behavior is specifically prompted for during RL reasoning training, by the same logic as for why inoculation prompting is believed to work, the extent to which this behavior is learnt in other contexts should be reduced, but it may not be zero. This doesn’t seem like a bad or inconsistent behavior for an honest persona. However, it might also have the side effect of improving the model’s hacking skills, in ways that less honest personas (or less honest end-users) might then abuse.
This proposed replacement for inoculation prompting is, of course, significantly more complex to implement. If implemented correctly, it seems likely to have little or no deleterious effect on the model’s alignment — I would expect it to be better than inoculation prompting, but this is so far untested. Testing this idea would be challenging for an external research team without the cooperation of a frontier lab such as Anthropic.
Obviously the end-goal should be to use secure reasoning training environments that simply cannot be hacked. However, frontier labs clearly have a lot of training environments to secure, and an automated way of checking if they are in practice hackable, which runs as a side effect of training runs, seems like it should be extremely helpful in achieving this goal. And of course helpful, harmless, and honest is what we're trying to train.
More formally known as “outcome-based reinforcement learning” or “Reinforcement Learning with Verifiable Rewards” (RLVR).
Published on February 21, 2026 1:27 AM GMT
Imagine someone wrote a 500-page book called Taking Down Vegetarianism and every chapter was about how animals can feel pain. The arguments are well-researched, the science is fascinating, and by the end you're completely convinced that animals suffer. You look up from the book and say: “Yes, that's why I'm a vegetarian… wait, why was it called Taking Down Vegetarianism?” That was roughly my experience reading Robert Sapolsky's Determined: A Science of Life without Free Will.
The book is a much-lauded New York Times bestseller for good reason. Sapolsky, a professor of neuroscience at Stanford, is an engaging and articulate writer, and he does a lot to make recent advances in neuroscience accessible.
The trouble comes when he attempts to add philosophy on top of it. He wants to demolish free will, and specifically to "take on" compatibilism, the position I defended in my previous post. Unfortunately, he doesn’t. He barely engages with it. Instead, he attacks an incoherent notion so bizarre it wouldn't be free will even if it existed.
Sapolsky commits his original sin, appropriately enough, at the origin. He tells us that he has written a book about free will, and explains the landscape of beliefs as follows (his italics, my bolding):
I’m going to be discussing some of the common attitudes held by people writing about free will. These come in four basic flavors:
The world is deterministic and there’s no free will. In this view, if the former is the case, the latter has to be as well; determinism and free will are not compatible. I am coming from this perspective of “hard incompatibilism.”
The world is deterministic and there is free will. These folks are emphatic that the world is made of stuff like atoms [...] this deterministic world is viewed as compatible with free will. This is roughly 90 percent of philosophers and legal scholars, and the book will most often be taking on these “compatibilists.”
The world is not deterministic; there’s no free will. This is an oddball view that everything important in the world runs on randomness, a supposed basis of free will. [...]
The world is not deterministic; there is free will. These are folks who believe, like I do, that a deterministic world is not compatible with free will—however, no problem, the world isn’t deterministic in their view, opening a door for free-will belief. These “libertarian incompatibilists” are a rarity, and I’ll only occasionally touch on their views.
Then he says this (his italics, my bolding):
What Do I Mean by Free Will?
People define free will differently. Many focus on agency, whether a person can control their actions, act with intent. Other definitions concern whether, when a behavior occurs, the person knows that there are alternatives available. Others are less concerned with what you do than with vetoing what you don’t want to do. Here’s my take.
Suppose that a man pulls the trigger of a gun. Mechanistically, the muscles in his index finger contracted because they were stimulated by a neuron having an action potential (i.e., being in a particularly excited state). That neuron in turn had its action potential because it was stimulated by the neuron just upstream. Which had its own action potential because of the next neuron upstream. And so on.
Here’s the challenge to a free willer: Find me the neuron that started this process in this man’s brain, the neuron that had an action potential for no reason, where no neuron spoke to it just before. Then show me that this neuron’s actions were not influenced by whether the man was tired, hungry, stressed, or in pain at the time. That nothing about this neuron’s function was altered by the sights, sounds, smells, and so on, experienced by the man in the previous minutes, nor by the levels of any hormones marinating his brain in the previous hours to days, nor whether he had experienced a life-changing event in recent months or years. And show me that this neuron’s supposedly freely willed functioning wasn’t affected by the man’s genes, or by the lifelong changes in regulation of those genes caused by experiences during his childhood. Nor by levels of hormones he was exposed to as a fetus, when that brain was being constructed. Nor by the centuries of history and ecology that shaped the invention of the culture in which he was raised. Show me a neuron being a causeless cause in this total sense.
Sapolsky is making the causal regress argument: trace any decision back through the neural chain, and you'll find prior causes all the way down—neurons, hormones, genes, childhood, culture, and so on. His challenge is to find a break in this chain, a "causeless cause."
But compatibilists don't claim there's a break in the chain. Compatibilists fully accept that decisions are caused by neural processes shaped by biology, environment, and history. That's the whole point of compatibilism—free will is compatible with this.
So what do compatibilists actually mean by free will? In my post laying out the case for free will, I defined free will as the process of running a decision-making algorithm. This process gives us the feeling of free will and is the causal mechanism behind making choices. Unlike Sapolsky's criterion of doing things for "no reason," it responds to reasons. Yes, it's influenced by whether you're tired or hungry, but this doesn’t make it unfree. That's it working properly. A decision that could be otherwise if your desires, reasoning, or circumstances were different is exactly the kind of decision compatibilists call free.
But the problem with Sapolsky's definition isn't just that it's different from mine; it's that it’s incoherent. It describes something that couldn't exist and wouldn't be free will even if it did. Consider what he's actually asking for: a neuron that fires for no reason, influenced by nothing—not your environment, not your history, not your desires, not your reasoning. What would that even be?
If it's uncorrelated with your past actions, then how is it your free will? Suppose your friends are all playing basketball and you want to play with them. On Sapolsky's account, you can't, because then your behavior (playing basketball) would be influenced by your experiences (your friends asking you to play). What kind of “free will” is this? Your "free" actions would have to be disconnected from everything you care about.
It wouldn't let you interact with the world. Imagine a neuron that makes you say your own name, but since it can't respond to your environment, it can't fire because someone asked "What's your name?" You'd blurt out your name at random, unable to respond appropriately to anything. This is not free will in any reasonable sense.
Sapolsky frames this as setting a high bar. But it’s not a high bar. It's an incoherent and nonsensical bar. If his definitions were satisfied, if we found such causeless neurons, that wouldn’t look the slightest bit like free will. It would be random noise that happens to occur inside your skull. If we found such a neuron, it wouldn't vindicate free will so much as be evidence of a brain malfunction.
This is why I say this isn't just a semantic dispute. If Sapolsky simply defined free will differently from compatibilists, we could argue about whose definition better captures the concept. But you can't have that argument when one side hasn't described a coherent concept at all. Sapolsky could define "your achievement" as an outcome you had no role in causing, but there's no productive debate to be had about whether that definition is too strict or too lenient. It's just not what the word means.
Despite claiming to take on compatibilism, he repeatedly tries to disprove it by arguing for determinism. Early on, he says:
This version of compatibilism[1] has produced numerous papers by philosophers and legal scholars concerning the relevance of neuroscience to free will. After reading lots of them, I’ve concluded that they usually boil down to three sentences:
- Wow, there’ve been all these cool advances in neuroscience, all reinforcing the conclusion that ours is a deterministic world.
- Some of those neuroscience findings challenge our notions of agency, moral responsibility, and deservedness so deeply that one must conclude that there is no free will.
- Nah, it still exists.
Perhaps he thinks he’s arguing against compatibilism, but he’s not. Here he is, for example, attributing to compatibilists a view they don't hold:
For free-will believers, the crux of the issue is lack of predictability—at innumerable junctures in our lives, including highly consequential ones, we choose between X and not-X. And even a vastly knowledgeable observer could not have predicted every such choice.
[...]
Compatibilists and incompatibilists debate whether free will is possible in a deterministic world, but now you can skip the whole brouhaha because chaoticism supposedly shows that the world isn’t deterministic.
He specifically mentions compatibilists here, but then goes on to say this:
But now to the critical mistake running through all of this: determinism and predictability are very different things. Even if chaoticism is unpredictable, it is still deterministic.
Sure, chaotic systems are still deterministic, but how is that a refutation of compatibilism? Going back to his own definition, compatibilism is the belief that “The world is deterministic and there is free will.” How could more evidence of determinism be a refutation of compatibilism?
Also, note that predictability is not a crux. From my essay:
Free will doesn’t require unpredictability. If I offer you a choice between chocolate ice cream and a poke in the eye with a sharp stick, you’ll pick the ice cream every time. That predictability doesn’t mean you lack free will; it just means the algorithm reached an obvious conclusion. The question isn’t about whether the results were predictable, but whether the deliberative control process served as a guide versus being bypassed.
There are other examples of this (see the appendix for one more), but you get the idea.
Instead of engaging with compatibilism, he’s very dismissive of it. Near the end, he says:
One compatibilist philosopher after another reassuringly proclaims their belief in material, deterministic modernity…yet somehow, there is still room for free will. As might be kinda clear by now, I think that this doesn’t work (see chapters 1, 2, 3, 4, 5, 6…). I suspect that most of them know this as well. When you read between the lines, or sometimes even the lines themselves in their writing, a lot of these compatibilists are actually saying that there has to be free will because it would be a total downer otherwise, doing contortions to make an emotional stance seem like an intellectual one.
This is not even engaging with the arguments. For what it’s worth, I explicitly say I’m not using this as an argument in my piece:
The metaphysical question (does free will exist?) is separate from the sociological question (what happens if people believe it does or doesn’t?). Some argue for free will by saying belief in it leads to good outcomes (personal responsibility, motivation), or that disbelief leads to nihilism or fatalism. Sam [Harris] and I agree these arguments are irrelevant to whether free will actually exists. The truth of a claim is independent of the consequences of believing it.
This is all very disappointing for a book purportedly about free will. I think where Sapolsky goes wrong is that he's trying to answer a philosophical question with science alone. Science can certainly inform the question but it cannot settle it.
Look, I like science. Science can tell us a lot about the world. It can tell us the neural mechanisms behind decision-making in the brain, the timing of conscious awareness relative to neural activity (as in the famous Libet experiments), and how factors like brain lesions or physical trauma (e.g. Phineas Gage) affect behavior.
But science can’t tell us everything. Science tells us what the world is like, but it can’t tell us, given that world, which concepts make sense and how to apply them.
Consider a thermostat. Science can tell us every physical fact about it: how there’s a bimetallic strip and a circuit and so on. But it can't tell us whether the thermostat is "making a decision." That's a conceptual question about what it means to "make a decision" and where we draw its boundaries. No additional measurement will resolve it. No scientist will ever find a belief, a self, or an iota of free will under a microscope. That's the domain of philosophy.
The free will debate has exactly this structure. Sapolsky and compatibilists agree on the neuroscience. They disagree about whether what the brain does counts as "free will". Does "free will" require freedom from the laws of physics? Or does it mean the ability to act according to one's desires and reasons, even if those are physically caused? These are questions about how to understand agency, responsibility, and explanation in light of the science. They’re not questions that brain scans can settle.
Sapolsky writes as if piling up scientific facts settles the question. It doesn't. We still have to think carefully about which concepts have earned their keep and which haven't. We have to think about how we interpret the human experience in light of the data. And he simply refuses to consider such questions.
I worry this review has made the book seem worse than it is. There's genuinely interesting neuroscience in it. If you're skeptical of determinism, this is a good book to read, and the science is mostly[2] solid and often fascinating.
I should also note that Sapolsky and I are pushing in the same direction on the question of retributive punishment, which is arguably what matters most. He says:
And we need to accept the absurdity of hating any person for anything they've done; ultimately, that hatred is sadder than hating the sky for storming, hating the earth when it quakes, hating a virus because it's good at getting into lung cells.
I'm with him on this point. You don't need to deny free will to reject retribution, but if that's where his argument leads people, I'll take it.
I do wish he had actually engaged with compatibilism, the position he claimed to take on. The book promised such, yet delivered an attack on an incoherent strawman. Read it for the neuroscience. Just know that the confrontation with compatibilism he promises never quite arrives.
You’ve got the idea by now. But if you’d like one more example of how he’s not talking about compatibilist free will:
Let’s frame this in the context of human behavior. It’s 1922, and you’re presented with a hundred young adults destined to live conventional lives. You’re told that in about forty years, one of the hundred is going to diverge from that picture, becoming impulsive and socially inappropriate to a criminal extent. Here are blood samples from each of those people, check them out. And there’s no way to predict which person is above chance levels.
It’s 2022. Same cohort with, again, one person destined to go off the rails forty years hence. Again, here are their blood samples. This time, this century, you use them to sequence everyone’s genome. You discover that one individual has a mutation in a gene called MAPT, which codes for something in the brain called the tau protein. And as a result, you can accurately predict that it will be that person, because by age sixty, he will be showing the symptoms of behavioral variant frontotemporal dementia.
Back to the 1922 cohort. The person in question has started shoplifting, threatening strangers, urinating in public. Why did he behave that way? Because he chose to do so.
Year 2022’s cohort, same unacceptable acts. Why will he have behaved that way? Because of a deterministic mutation in one gene.
According to the logic of the thinkers just quoted [He had quoted many different scientists and philosophers, whose views I do not know], the 1922 person’s behavior resulted from free will. Not “resulted from behavior we would erroneously attribute to free will.” It was free will. And in 2022, it is not free will. In this view, “free will” is what we call the biology that we don’t understand on a predictive level yet, and when we do understand it, it stops being free will. Not that it stops being mistaken for free will. It literally stops being. There is something wrong if an instance of free will exists only until there is a decrease in our ignorance. As the crucial point, our intuitions about free will certainly work that way, but free will itself can’t.
I can’t speak to what other people believe, but he’s simply not talking about compatibilists here. He might think he is, but he’s not.
By “this version” he’s referring to the compatibilist view that “while the world is deterministic, there is still free will, and thus holding people morally responsible for their actions is just”.
There is a noticeable step-down in quality when he gets to science outside his field though. For example, he approvingly cites the famous Israeli “hungry judges” study:
It’s the same with hunger. Here’s one study that should stop you in your tracks (and was first referred to in the last chapter). The researchers studied a group of judges overseeing more than a thousand parole board decisions. What best predicted whether a judge granted someone parole versus more jail time? How long it had been since they had eaten a meal. Appear before the judge soon after she’s had a meal, and there was a roughly 65 percent chance of parole; appear a few hours after a meal, and there was close to a 0 percent chance.
A separate study followed up and interviewed people from the Israeli Prison Service and learned that “case ordering is not random”. They found that groups of cases were done in a single session, and “within each session, unrepresented prisoners usually go last and are less likely to be granted parole than prisoners with attorneys.”
I can hear the angel with PRCTSD (post-replication-crisis traumatic stress disorder) on my shoulder yelling, “Confounders! Have you ensured there are absolutely NO confounders?” The effect size is simply too large for us not to be suspicious. The study sounds shocking, but it is meaningless if there’s even a single confounding factor: the implicit leap from correlation to causation relies on the hidden assumption that the order in which cases reach a judge is random. I’ve talked about this paper before (clearly he’s not a blog reader; big mistake, imho), where I said:
Since then, the study has also failed to replicate in other (better-controlled) contexts. See Hungry Professors? Decision Biases Are Less Widespread than Previously Thought by Bergonzoli et al.
Published on February 21, 2026 1:22 AM GMT
[Epistemic status: This is an artifact of my self-study that I use to help manage myself. As such, I don't expect anyone to read it in full. Please skim and leave a comment, even just to say "good work/good luck".]
My goals for the 6th sprint were:
| Date | Progress |
| --- | --- |
| Th, Feb 5 | |
| Fr, Feb 6 | Looked at FIG fellowship with Y. Bengio. It doesn't look like the current projects are a good fit for me. |
| Mo, Feb 9 | |
| Tu, Feb 10 | |
| Wd, Feb 11 | Preoccupied with other things. |
| Th, Feb 12 | |
| Fr, Feb 13 | Not recorded. |
| Mo, Feb 16 | Holiday. No progress. |
| Tu, Feb 17 | No progress. |
| Wd, Feb 18 | Not recorded. |
| Th, Feb 19 | No progress. |
| Fr, Feb 20 | |
I was feeling good about making progress last week. I feel hopeful about the prospect of directly reaching out to people, both as a strategy for looking for work, and for helping me feel connected to a community of people focused on the topics I wish to focus on.
( Content warning: The following two paragraphs include light discussion of depressive disorder. )
But this week wasn't good at all. On Monday night I was feeling so anxious and depressed that I couldn't sleep; I stayed up almost all night listening to an audiobook and then fell asleep in the early morning. After sleeping most of the day I was feeling too depressed to get out of bed and spent most of the rest of the day doomscrolling. That cascaded until Wednesday afternoon, when I was feeling well enough to actually get up, bathe, eat a proper meal, take melatonin, and get back on a proper sleep schedule.
Thursday I was still feeling very hopeless and depressed about the global AI situation and my personal situation, so I took a long walk in the woods and did some meditation which seemed to help put me in a better mood.
On Friday I am (hopefully) back on track. I am feeling hopeful because I'm writing this entry only 2 weeks after the previous entry. I think this is a better frequency and will hopefully set a trend so that I don't seem to lose a month at a time in the future.
Of course I made much less progress than I wanted. It's always like that.
But I have been following my Looking-for-Work Strategy by populating my LfW list. It is fun reflecting on how many cool people I'm aware of and learning more about them and the companies they have worked for or founded. I'm anxious about actually reaching out to people, but I think it is the most sensible thing to be doing.
Despite really enjoying being in "studying mode", I failed to actually make much time for this. I definitely failed my goal of 1 to 2 submodules per week. I think I will drop that goal and focus instead on doing at least 1 pomodoro per day, which seems like a better strategy for tricking my executive function into actually getting me to spend time on this.
I did not work on this at all. I think "at least 1 pomodoro per week" is too abstract; instead, I need to schedule a specific day, or commit to working on it every day, which I will try for the next sprint.
I didn't get distracted by writing or editing posts and neglect to do other work, so that is a success, but I would still like to have done that optional 1 pomodoro a few times. Alas, maybe next week.
As mentioned in the overview, this was good last week but failed this week. But the problem doesn't seem to be doing work and failing to record it; the issue seems more to be getting depressed or distracted with other things and failing to do any work. So in this case it seems like the worklog is working as intended! I just need to get better at self-management, and keeping a worklog serves as a good diagnostic tool.
I'm very happy to actually be writing this entry at the 2 week frequency I intended, and I'm happy with the work I was doing when I was working during the last sprint. The problem seemed to be keeping on top of my self management system and putting in time working, so for the next sprint I'm going to try setting and recording a clock-in and clock-out time for daily work.
The clock-in time will be useful for making sure I'm getting started early in the day, while the clock-out time will discourage me from focusing on work late at night, so that I can calm down and not disrupt my sleep routine.
My focuses for the next sprint are:
Published on February 20, 2026 11:45 PM GMT
There’s really no good interface for humans to perceive knowledge with multi-dimensional, non-linear relationships on 2D screens and 1D scrolls.
Today, websites allow users to randomly drop themselves on their knowledge graph and then perceive everything within a half-mile radius. But it’s still not possible to sense the global structure.
When interfaces project graph-like knowledge onto the screen, they usually default to flattening these complex, multi-dimensional structures into lists and tables. In doing so, however, they lose the “relational” essence that makes a graph valuable. While getting familiar with an app, users have to identify the implicit relations themselves.
Graph-like knowledge is typically accessed by searching node attributes, and search interfaces usually place strong limits on which attributes can be searched. It’s funny, and indicative of the problem, when Google Search outperforms Twitter and Reddit at finding their own content.
Displaying more than one node side by side for comparison has been useful in code review and online retail. Such views reveal relationships, but only one at a time, and swapping one node for another still relies on the user guiding navigation along a path they cannot see.
Showing breadth-first or depth-first relationships in a side panel, as with recommendations and similar-product listings, helps as an interface for making exploration/exploitation tradeoffs, but it doesn’t solve the problem.
At the block level, complex structures are projected as bullet-point lists and tables. When the number of nodes is much larger than the number of relations, tables and nested lists work really well. But they fail as the number of relations grows from one to three and beyond.
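A toy illustration of that breaking point (the data here is invented): a single relation flattens into a nested list without loss, but once a few relation types share the same nodes, no single list or table shows them all without duplicating nodes or hiding edges.

```python
# One relation type: flattens cleanly into a nested list / outline.
contains = {
    "Post": ["Section A", "Section B"],
    "Section A": ["Claim 1", "Claim 2"],
}

# Three relation types over the same nodes: any single table or outline must now
# either duplicate nodes or drop some of these edges entirely.
supports = {"Claim 1": ["Claim 2"]}
contradicts = {"Claim 2": ["Section B"]}
cites = {"Claim 1": ["External paper"], "Section B": ["External paper"]}
```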
Chat panels have opened up a new option: picking the most relevant nodes from the tree regardless of whether they are on the user’s current path. They are producing genuine surprises these days, but at the expense of significantly more typing effort from the user.
In ChatGPT-style interfaces, I’m not sure the barrier of implicit knowledge has been lowered so much as displaced by new ones (e.g., people now have to figure out which prompts work, run them by a council, etc.).