2026-04-04 11:56:17
In my previous post I found evidence consistent with the scratchpad paper's compute/store alternation hypothesis: even steps showed higher intermediate answer detection and odd steps showed higher entropy. The results also matched those in “Can we interpret latent reasoning using current mechanistic interpretability tools?”.
This post investigates activation steering applied to latent reasoning and examines the resulting performance changes.
I use the publicly available CODI Llama 3.2 1B checkpoint from “Can we interpret latent reasoning using current mechanistic interpretability tools?”.
Hidden-state steering: I take the average hidden state at each latent step and use the difference between the averages for latent A and latent B to steer the hidden states.
Since CODI uses the KV values at the EOT token, getting new KV values that contain the information from the steered vector takes an extra step: I steer latent 1, run CODI for one additional latent, then take the KV values of latent 2 and look at the output.
KV-cache steering: I steer the KV cache and feed the steered cache directly into the CODI model, by adding the average difference in KV values to past_key_values.
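The mean-difference steering described above can be sketched in a few lines. This is a toy illustration only, not the code used for the experiments: the function names (`mean_vector`, `difference_vector`, `steer`), the three-dimensional vectors, and the coefficient are all hypothetical stand-ins for real CODI activations.

```python
from statistics import fmean

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]

def difference_vector(acts_a, acts_b):
    """Mean activation at latent A minus mean activation at latent B."""
    mean_a = mean_vector(acts_a)
    mean_b = mean_vector(acts_b)
    return [a - b for a, b in zip(mean_a, mean_b)]

def steer(hidden, direction, coeff):
    """Add coeff * direction to a hidden state."""
    return [h + coeff * d for h, d in zip(hidden, direction)]

# Toy activations (hypothetical): two prompts, hidden size 3.
acts_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
acts_b = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

direction = difference_vector(acts_a, acts_b)
# A negative coefficient pushes the hidden state away from A and toward B.
steered = steer([0.5, 0.5, 0.5], direction, coeff=-2.0)
```

The same addition applies in principle to the KV variant, with the key and value tensors in past_key_values taking the place of the hidden state.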
PROMPT = "Out of 600 employees in a company, 30% got promoted while 10% received bonus. How many employees did not get either a promotion or a bonus?"
Answer = 360
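As a quick check, the stated answer of 360 follows if we assume no employee received both a promotion and a bonus (the prompt does not say this explicitly, so it is an assumption):

```python
employees = 600

# Integer arithmetic keeps the percentages exact.
promoted = employees * 30 // 100   # 30% of 600
bonus = employees * 10 // 100      # 10% of 600

# Assumption: the promoted and bonus groups do not overlap.
neither = employees - promoted - bonus
```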
Tuned Logit Lens properties:
[Figure panels: Default Tuned, Default (images not preserved)]
The following is the answer frequency for the GSM8K data used to train the tuned logit lens
This prompted me to revisit my previous results using a tuned logit lens trained only on latent 3. Notably, 'therefore' still appears only on odd latents, even with this different prompt.
Across all coefficient values tested, the steering was applied to latents 1–4, with one additional latent step run afterward to obtain updated KV values. The steered models seem to consistently underperform the no-steering baseline until the later latents, matching the performance of random vector patching reported in “Can we interpret latent reasoning using current mechanistic interpretability tools?”. One possible explanation is that steering acts like random vector patching, because the average difference vector may be too noisy to encode meaningful directional information.
[Figures: steering results with positive and negative coefficients for the pairs A1-B2, B4-A1, A1-B5, A2-B3, A2-B4, A3-B5, A4-B5, B6-A1, B6-A4, B6-A5 (images not preserved)]
With negative coefficients, A1-B4, A1-B6, A4-B6, and A5-B6 performed better than the baseline after steering for latent 5. A common pattern is that runs with a negative coefficient performed significantly better than the baseline for latent 5.
With positive coefficients, performance was better than the baseline on A1-B2, A1-B5, A2-B3, A2-B4, A3-B5, and A4-B5.
No clear pattern emerges from the activation-difference logit lens. The first image shows the default logit lens and the second shows the tuned logit lens. The y-axis is latent A and the x-axis is latent B. The activation difference is the vector A - B, and the logit lens was applied to the difference between the mean activations of A and B at the different layers of the model.
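For readers unfamiliar with the idea, applying a logit lens to a difference vector can be sketched roughly as follows. Everything here is a toy assumption: the three-token vocabulary, the two-dimensional unembedding matrix, and the `logit_lens` helper are hypothetical, and the sketch omits the final normalisation layer that a real logit lens usually applies, as well as the learned per-layer translator that a tuned lens adds.

```python
def logit_lens(vector, unembedding, vocab, top_k=2):
    """Score each token by the dot product of its unembedding row with the vector."""
    scores = [sum(w * v for w, v in zip(row, vector)) for row in unembedding]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Toy vocabulary and unembedding matrix (hypothetical values).
vocab = ["360", "therefore", "the"]
unembedding = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

# Mean activations for latent A and latent B at one layer (toy values).
mean_a = [2.0, 0.0]
mean_b = [0.0, 1.0]
difference = [a - b for a, b in zip(mean_a, mean_b)]

# Tokens most promoted by the A - B direction.
top_tokens = logit_lens(difference, unembedding, vocab)
```

Repeating this for every (A, B) pair and every layer would give the kind of grid the figures show.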
2026-04-04 11:34:57
I've spent a fair amount of time trying to convince people that this AI thing could be quite large and quite dangerous. I think I normally have at least some success, but there is a range of responses, such as:
Of these, 1 is the appropriate emotional reaction[1] to fully absorbing and believing the arguments[2]. This is what it looks like when you take an argument, process it with the deeper reaches of your brain, turn it into something that fundamentally changes your world model and start trying to adapt.
As far as I can tell, our emotional responses are mostly connected to our System 1 thinking. This makes them harder to influence than just changing your mind. You can change your opinions, but that doesn't mean you will get it on a gut level.
I think I have a solution. In particular, visualisations. I don't know if this works for everyone, but I have personally found it helps me both stay more aligned to the cause and increase my motivation. I believe this is basically due to the fact that your system 1 needs to get the stakes to achieve complete alignment.
Note that in the particular case of AI safety, if you want to remain emotionally sane, it is potentially best not to go through this exercise (like genuinely, please skip it if you're not ready; I do it half-heartedly, and it can be painful enough).
As an example, we can take Yudkowsky's "a chemical trigger is used to activate a virus which is already in everyone's system". Close your eyes. You're at home, in your usual spot. Picture it in detail: the lights, the sun shining through the windows, the soft sofa. You're having a drinks party tonight and you've invited your best friends to come and join you. As the guests arrive, you greet each of them in turn, calling them by name and showing them in.
And then it triggers. See each one of them in your mind's eye collapse, one by one. Hear each of them say their last words. Add any details you think make it more plausible.
My brain writhes and struggles and tries to escape when I attempt this exercise. It's painful. It's emotional. Which is the point.
In the normative sense of "if you care about the world and would rather it doesn't get ruined by a superintelligence, and would rather it doesn't kill everyone you know and are actually processing this on a deeper level, this is what your reaction will probably look like as an ordinary human being."
I don't think you should have any particular emotional response if you go from not believing AI will kill everyone to still not believing that AI will kill everyone.
Which become quite samey after the 378th time of hearing "but can't you just turn it off?"
2026-04-04 11:00:26
(with apologies to Sasha Shulgin)
Gabapentinoids are weird.
For a start, they don’t do what they say on the tin. They were named after the thing the inventors thought they would do, i.e. bind to and modulate GABA receptors, the ones which cause sedation and anxiolysis. But they have no activity at these receptors. Intuitively, then, they wouldn’t have an effect on sleep or anxiety.
They also don’t bind to dopamine receptors — you would think then that they wouldn’t be helpful for psychosis (most antipsychotics antagonise dopamine receptors).
And they don’t bind to opioid receptors, so they’re surely not useful for treating pain.
But they do! Gabapentinoids are prescribed for sleep, anxiety, bipolar disorder, and epilepsy, as well as neuropathic pain and restless legs syndrome.
Gabapentinoids bind to the α2δ protein, a subunit of voltage-gated calcium channels (hence their alternative name of α2δ ligands). Usually the concentration of calcium ions outside the cell is thousands of times higher than inside; these channels respond to a voltage by opening and allowing calcium to flood in. Depending on the cell they’re attached to, this can cause muscle contraction, neuronal signalling, and protein synthesis.
Specifically they bind α2δ-1 and α2δ-2, but only exert their effect through the former (as proven by trials on α2δ-2-knockout mice). There seems to be an as-yet undiscovered natural ligand for α2δ-1 and -2 which binds to the same site as gabapentinoids.
Importantly they don’t block calcium channels; instead they inhibit the release of monoamines (serotonin, norepinephrine, dopamine) and substance P triggered by calcium influx. They also inhibit calcium channel-dependent release of glutamate and glycine in various brain tissues.
There are states in which calcium channels become ‘sensitized’, such as in the case of neuronal injury, and gabapentinoids might selectively work in these conditions.
As they don’t simply block calcium channels, they have big advantages over drugs that do — they only minimally change synaptic function, unlike calcium channel blockers. They can essentially restore ‘normal’ functioning in overexcited calcium channels while leaving healthy ones alone.

Anticlockwise from top: gabapentin, leucine, isoleucine
Gabapentinoids have a suspicious structural similarity to leucine and isoleucine, two amino acids. Radiolabelling these amino acids shows they also bind the α2δ protein, and L-isoleucine blocks certain effects of gabapentinoids, suggesting they compete for binding at the same site.
Some people have reported relief of their restless legs syndrome from acetylleucine, a leucine analog, which suggests it’s acting in a similar way to gabapentinoids (Fields 2021). Curiously this drug is very hard to find except in France, where it’s sold over-the-counter.
Unlike lots of drugs, gabapentinoids seem to be actively transported into the body by LAT1, the large neutral amino acid transporter.
This is a disadvantage over other drugs, because it limits how much and how quickly gabapentin can be absorbed. Gabapentin often has to be taken multiple times per day to avoid saturating these transporters. It also competes with other amino acids (the ones above) for these transporters.
Pregabalin, another gabapentinoid, is superior here because it is transported by other carriers, not just LAT1, so its uptake doesn’t saturate in the same way. It binds α2δ much more strongly than gabapentin, and in animals is more potent as an analgesic and anticonvulsant.
Even weirder: Eroglu 2009 found that α2δ-1 is a neuronal receptor for thrombospondin, a molecule which promotes synaptogenesis in astrocytes. Specifically, it forms part of a larger signalling system. It acts as the extracellular receptor for a “synaptogenic signalling complex”; when thrombospondin binds, it causes a cascade of events which switches on this complex and leads to the start of synapse development.
As gabapentinoids also bind to this protein… does that mean they reduce synaptogenesis? In vitro, yes: gabapentin powerfully blocks synapse formation. Though this sounds slightly terrifying it’s also probably an important mechanism for gabapentinoids’ effects in epilepsy and neuropathic pain — synapse formation can be triggered by neuronal injury in these conditions and might well contribute to the pathology of these conditions (although this is uncertain).
It’s worth noting that gabapentin and thrombospondin, while both binding to the same protein, don’t bind to the same part of that protein.
(It’s kind of nuts that it took decades for one of the key mechanisms of action for this class of drug to be discovered. Makes you wonder what else we don’t know, about gabapentinoids and other drugs.)
Worryingly this suggests that gabapentinoids might affect the normal formation of synapses. Could this cause other deficits, such as in memory formation?
Behroozi 2023 attempted to test this and did not find an effect, although they were looking specifically at improvements in memory formation.
Gabapentinoids can certainly cause brain fog and slower processing. Eghrari 2025 also found an increased risk of cognitive impairment and dementia in patients with chronic low back pain prescribed gabapentin; when stratified by age, patients taking gabapentin had twice the risk of dementia and mild cognitive impairment. This risk was further increased in patients who had taken gabapentin more throughout their lives. Presumably this effect would also extend to pregabalin.
Gabapentinoids are frequently prescribed off label (when a doctor prescribes a drug outside of the conditions for which it’s approved). Not necessarily a bad thing: doctors use their discretion to decide when to do this, and for a drug with as broad a therapeutic profile as gabapentinoids it doesn’t seem wholly surprising.
But there are strict rules around advertising a drug for this sort of thing, or pushing doctors to prescribe it off label. The drug is approved for specific conditions and drug companies (in countries where they’re allowed to advertise) can only push for it to be prescribed for these conditions.
Pfizer’s subsidiary Parke-Davis promoted Neurontin (gabapentin) for at least eleven unapproved conditions, flying doctors to lavish retreats, paying kickbacks, and commissioning ghostwritten journal articles. Off-label prescribing accounted for 78% of Neurontin sales.
Pfizer pleaded guilty to criminal charges and paid $945 million in settlements. In a separate 2009 case, they paid a further $2.3 billion for off-label marketing of several drugs including Lyrica (pregabalin).
Separately, top pain researcher Scott Reuben admitted to fabricating data in at least 21 studies – including Pfizer-funded trials of Lyrica – without ever enrolling a single patient. He was jailed in 2010.
More controversial. One epidemiological survey looked at a cohort of individuals before and after they were prescribed gabapentin, and found no increase in suicidality, as well as a reduction in suicide attempts in psychiatric patients (Gibbons 2011). A large Swedish cohort study found a significant increase in suicide – but only for pregabalin, and not gabapentin (Molero 2019).
It’s not clear why this would be the case, as the drugs work in exactly the same way (as far as we know). In fact, pregabalin was found to increase suicidal behaviour/deaths from suicide, unintentional overdoses, head and body injuries, road traffic accidents and offences, and arrests for violent crime, where gabapentin had no or almost no effect (and actually reduced road traffic incidents and arrests).

The obvious explanation is that pregabalin is simply more powerful, both due to the pharmacokinetic gap described above and because it binds α2δ much more strongly. The highest doses of gabapentin simply can’t compete with the highest doses of pregabalin.
Certainly for some people they are. Gabapentinoids are notorious for diversion, where people score prescriptions and then sell the drugs on. The prescription rate for these drugs in prisons is double that of the general population. In some ways pregabalin is the drug of choice in UK prisons. In France, 81% of recreational teenage pregabalin users reported to poison control centres were homeless or living in migrant shelters (Dufayet 2021).
This should surprise us; they don’t have any dopaminergic or opioidergic activity, so they don’t tick the obvious addictive drug boxes. Nonetheless, some people clearly find them enjoyable, with effects somewhat similar to alcohol/benzodiazepines, and develop dependence on them.
This might explain why there are more than 50 million gabapentinoid prescriptions issued every year in the US alone.
In the hilarious Drug User’s Bible, in which the author takes basically every drug imaginable, there’s this snippet from taking 300mg pregabalin (which is a hefty dose; users are typically started on 75mg):
I totally underestimated this drug. I am basically zombified and largely mistuned to what is going on around me, which appears to be distant. My hands are numb and I am, essentially, stupefied, with head spinning.
This one was a shock. I clearly took far too much and paid a price in terms of a strong intoxication which at times was extremely uncomfortable.
Gabapentinoids are weirder than I had realised.
Of course, the conditions they are prescribed for are horrific – anxiety, chronic pain etc can be a living hell, and a drug which effectively treats them is miraculous. But it’s wild that it took us decades to actually understand the first thing about how these drugs work.
And the effects on synaptogenesis, unknown effects on memory, increased risk of various kinds of death and dangerous behaviour (in the case of pregabalin), huge abuse in prisons and migrant shelters, and increased risk of cognitive deficits and dementia should probably worry us given how widely they’re prescribed.
2026-04-04 10:50:06
I've played a lot of dance weekends over the years [1] and if I could change one thing it would be no more challenging sessions. I see it happen every time: it's a great crowd of people, with a wide range of experience levels, and Saturday afternoon is going well. Then it's time for the challenging / advanced / experienced session. What happens? The dances are too hard for the crowd and it's not fun.
The callers had already been selecting dances that worked well for the group, which meant material that was interesting but not a struggle. Push the difficulty up from there, and what gives? You can take longer teaching, perhaps four minutes instead of two, which lets you explain material that's a bit harder, but only a bit and at the cost of a lot more talking. You can call no-walkthroughs, medleys, or even hash, but at most dance weekends you can get away with that at a regular session (and if you can't it won't work at a challenging session either). Or you can call material that's too hard for the crowd, and it falls apart in places.
To go well, challenging sessions can't just be a matter of picking harder dances, they require a group of dancers who are up to the challenge. This can work as a one-off event or even a whole weekend, where you communicate clearly what people should expect and people can self-select. It can work at a festival where you have multiple tracks and people can easily choose something else. But none of this applies to most dance weekends, since they only have one hall.
I think the desire for challenging sessions comes from two places. One is that some people just really like challenging dances, and I think the best you can do there is challenging-specific events. The other, though, and I think this is a bigger factor, is that a whole weekend of contra dancing can be a lot of the same. So if you're looking for ways to add some interest to the schedule without forcing the caller to choose between "that's not actually challenging" and "it's not fun when the dances fall apart", some ideas:
Teaching sessions, where the caller focuses on demonstrating a new skill. There are tons of possibilities here, including how to help a lost neighbor, role swapping, partner swapping, flourishes, swing variations, momentum and weight, and supporting other dancers in and out of moves.
Games sessions, where the caller has you do something unusual but also fun and educational. One session might include, sequentially, some dancers leaving the hall for the walkthrough, pool noodles, blindfolding, ghosts, sabotage and recovery, and teaching a different 1/4 of the dance to each 1/4 of the dancers.
A session of Chestnuts, Squares, Triplets, Triple-minors, or a mix of different unusual formations.
Early morning family dance with acoustic open band.
A "marathon" session, where you medley one dance after another and people typically drop out every so often to rest and swap around. Make sure you coordinate with the band(s) to ensure this is something they'd be up for playing for; it's not the default deal.
Play with tempo. Show the dancers what tempos from 104 to 128 feel like, and try the same dance at multiple tempos. Practice dancing spaciously at slow tempos, and with connected and efficient movement at fast ones.
You might notice I didn't include themed sessions like "flow and glide contras" or "well-balanced people". The variation in feeling from one dance to the next is key to keeping contra dance interesting, and while sessions that explore just one area still work, I personally think they're much less fun.
[1] I count 70: 54 with the Free Raisins and 16 with Kingfisher.
2026-04-04 10:20:09
This year's Spring ACX Meetup everywhere in Shenzhen.
Location: We'll meet up right outside the Shenzhen Bay Kapok Hotel. There is a large open space with a huge set of stairs ~20 meters to the right of the Hotel (assuming that you're facing the hotel entrance). The Hotel itself is located at No. 3001 Binhai Avenue, Nanshan District, Shenzhen, and can be accessed directly from nearby streets. I'll hold up an ACX MEETUP sign at the hotel's entrance and guide you to the meeting area. - https://plus.codes/7PJMGW9W+HQ
Feel free to bring games/fun activities. Also, I expect the event to be bilingual (but primarily in English). Please email [email protected], and I'll create a mailing list/chain.
Contact: [email protected]
2026-04-04 10:10:07
A few years ago I listened to a fascinating podcast interview featuring former Democratic presidential candidates Andrew Yang and Marianne Williamson. They agreed that politics is a mess and politicians are constantly doing bad things that harm the people they are supposed to serve. But they couldn’t agree on how bad that made the politicians as people.
Yang wanted to view the politicians as normal people responding to bad incentives, but Williamson wanted to call them evil for failing to exercise courage in the face of these bad incentives.
Morally, the notion that you can’t blame people when they are following incentives is akin to the “just following orders” excuse that Nazis tried to use at the Nuremberg trials. But what’s the alternative? In practice, we can’t and don’t expect people to always do the right thing even when everyone else around them isn’t. There’s a point at which “everyone else is doing it” really is an acceptable excuse, because everyone else really is doing it, and not doing so puts you at a significant and unfair disadvantage. But there are also absolutes, where this excuse is never acceptable -- things like genocide.
Most of the time it’s something more complicated: Doing the right thing means being a bit better on the margin. If everyone else in your class is cheating and using AI to do their homework, it could mean living by a principle where you only use AI for parts of the assignment that are clearly useless busy work -- and letting this be known.
A colleague recently said something that sums it up nicely: “A person’s moral strength is exactly their ability to resist bad incentives.” *(paraphrased)
But this post is not ultimately about ethics. I want to ask a more basic question: what do we really mean when we say someone is “following incentives”?
I think most of the time, it’s not at all clear that it’s true in a literal sense. My take is that “apparent short-term incentive-like vibes” might be a better description for what they are actually following. Things that have more “incentive-y” vibes are those that are more associated with selfishness and vices like greed. Money: incentive!! Admiration of your peers: incentive???
I think often what “incentive” is really referring to is more like a feeling of competitive pressure, or a belief that “if I don’t do this, someone else will, and then I’ll be a sucker and a failure.”
When I was in grad school, the people around me generally felt a lot of pressure to publish a lot of papers. But the people who really stood out and succeeded often were more focused on making real contributions that were actually valuable to others in the field, even if it meant publishing less. The apparent incentive to publish constantly was almost exactly backwards!
Often people do actually get short-term benefits for doing something that’s not in their long-term interest. So it might be a case of following short-term incentives in particular (and potentially being confused about what’s good in the longer term). Publishing more often made it seem like a student was more productive or impressive in the short term, and unlocked travel funding to go to conferences. But what you really want to advance your career is to become known throughout the field for something you did; no amount of mediocre publications would ever get you there.
A special case of following short-term incentives, which is maybe the most puzzlingly common, is one-shot thinking. You’ve likely been in a situation where someone says something like: “Of course the other side won’t cooperate -- there’s no incentive to! So we can’t either!” and people listening treat this as the sophisticated, hard-nosed take. But failing to cooperate leaves value on the table. And when you have the chance to negotiate, build trust, and/or set up enforcement mechanisms to make sure all parties follow through on a commitment, it seems like you should at least consider trying to find a way to cooperate. The basic mistake here is treating an interaction as an isolated “one-shot” game, after which everyone walks away and never interacts in any way ever again. Acting like a situation is “one-shot” when it’s not isn’t sophisticated, it’s stupid.
This also means that saying you did something bad because of “the incentives” doesn’t work as an excuse. You’ve done the thing. The “one-shot” part is over. You are now in the position of being judged for your previous behavior, but treating something as a one-shot game is only valid if you will never be in a position to be judged for your behavior during the game.
Applying these insights to AI is left as an exercise for the reader.