
'Staying with it' Done Wrong

2026-03-15 06:38:41

I was meditating today and noticed quite a bit of over-effort happening. So I did the diligent, spiritually respectable thing: I located it in the body — "pain in my forehead" — and decided to stay with it. I even felt a small glow of pride for remembering to find it somatically instead of getting lost on the mental level.

What happened was quite disappointing: the headache locked into my attention, intensified steadily, and after a few minutes of escalating suffering I gave up and went to do something else.

I think this is a common way to misread the instruction "be with what is." The mind notices something, names it — "pain," "tension," "sadness," "shame" — and then sincerely tries to stay with that. But now it is no longer staying with the living felt sense. It is staying with a conceptually frozen version: the brain's reconstruction of what this label is supposed to feel like.

Before a feeling can ease and release, it has to be allowed to move — to shapeshift freely. One moment it's an emotion, then it vanishes, then it reappears as pressure in the head, then as something else entirely. That, I think, is why Gendlin emphasized the intermediate handles on the felt sense that don't feel quite right yet: unless you're receiving those, you're not going to receive the truly relaxing shift. Without this intermediate aliveness, Focusing or meditative work with feelings tends to stall.

If, on the other hand, you notice it is stalling — the feeling has become fixed, static, stubbornly the same — that's worth treating as a signal. Something is interfering: a concerned part, a Buddhist hindrance, over-effort, aggressive awareness. Something worth investigating on its own terms.



A little exercise from me: Next time you do Focusing, IFS, or sit with a feeling in meditation — watch carefully what happens right after you name it. Does the naming make it more stuck, because now you have a more particular object to grip? Or was the word just a momentary touch that didn't disturb the feeling's natural unfolding?

Good luck with your practice!




Forecasting Dojo Meetup - postmortem discussion.

2026-03-15 04:32:57

Hi Everyone,

The next meetup of the forecasting practice group is here! Next week we're doing a postmortem — looking back at our recent forecasts, both the hits and the misses. What did we get right? Where did we go wrong? What can we learn?

No preparation needed, all skill levels welcome.

Where: Video call on Discord.

For more context on the group, see the original post.




What concerns people about AI?

2026-03-15 03:24:55

A lot of people are worried about AI. What are their worries? How worried are they?  Are some demographics more worried than others? We ran a study to find out. 

In this article, we explain 16 concerns about AI that you might find it valuable to know about. We discuss, based on our data (collected in October 2025), how worried people in the US are about each concern.

To whet your appetite, here are some questions that our study offers insights into. Can you predict what we found before we tell you the answers?
 

  • Are conservatives more, less, or equally likely to be concerned about AI compared to progressives?
  • What about gender: are men or women more likely to be concerned?
  • Does AI-related knowledge affect how concerned people are?
  • What are people most concerned about when it comes to AI?
  • How low or high is the general level of concern about AI in the US population?
     

Have you made your predictions? Okay, let’s get into the study.

 

How we studied AI concern


 

We started by scouring the internet for expressions of concern about AI and compiling a list of common concerns, based on what we found (as well as our own background experience of hearing people express concerns). The potential concerns about AI that we identified are:


 

  1. Proliferation of low-quality AI content (i.e., ‘AI slop’)
  2. AIs plagiarising the work of humans (e.g., remixing the work of artists without compensation)
  3. AI elimination of jobs
  4. AI misinformation (including deepfakes)
  5. People using AI but pretending not to have done so (e.g., to write school assignments)
  6. AI used for authoritarian control (e.g., for monitoring and punishing populations based on behavior)
  7. Relationships (often romantic) people have with AIs
  8. Inequality caused by AI (such as by creating concentration of wealth)
  9. AI ideological bias (e.g., favoritism toward progressive or conservative viewpoints)
  10. AI bias and discrimination (e.g., by perpetuating unfair, unequal treatment of different groups)
  11. Concentration of power caused by AI (e.g., making those who control the most advanced AIs much more powerful than everyone else)
  12. AI used for scams or to manipulate individuals (e.g., AI bots designed to seem like specific humans in order to trick people)
  13. Ceding of more and more control to AIs (e.g., making major decisions impacting millions of people that humans no longer make)
  14. Slaughterbots (i.e., weaponized AI drones)
  15. Superintelligence (i.e., AI that outperforms the ability of humans in essentially all domains)
  16. AI itself experiencing suffering when we train or run it.


 

Each of these concerns is described and explored in more detail below.

 


While we were conducting this experiment (in October 2025), some other concerns became more prevalent in discourse about AI, but these were not included in our study. The most notable of them are:


 

  • The possibility that the monetary values of companies related to generative AI represent a ‘bubble’ that, upon bursting, will have disastrous consequences on the economy of the US or the world
  • The negative impacts of AI data centers on local communities (e.g., pollution, use of ground water)
  • The environmental impacts of AI, via the energy or water consumption of data centers
  • Children having increased access to inappropriate content


 

We recruited 403 participants through our participant recruitment platform, Positly.com, and started by asking them some general questions about their level of knowledge on the topic of AI and their overall concerns about its impact on their lives and society. After that, we showed them information about the 16 potential AI-related concerns we identified (one potential concern at a time, in a random order). For this, we assigned each participant randomly to one of two groups:

  1. Short Definitions: 200 participants were shown just a short sentence defining each of the 16 concerns
  2. Full Descriptions: 203 participants were shown the same short sentence definitions as the Short Definitions group and a longer description of each concern, containing examples. (We’ve included all of the full descriptions in this article, below.)

 

For each potential concern, participants were asked to indicate their level of actual concern about it on a 5-point Likert scale from “Not at all concerned” (which was assigned the value 0) to “Extremely concerned” (which was assigned the value 4).

Finally, at the end of the study, participants were asked again about their general levels of concern about AI (in their own lives and for society), to see whether participating in the study and seeing information about so many potential concerns changed their level of concern, and then they were asked some demographic questions.
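
For concreteness, here is a minimal pandas sketch of how responses like these could be scored and compared across the two conditions. The column names, file name, and intermediate Likert labels are assumptions for illustration (only the endpoint labels and the 0-4 coding come from the description above); this is not the authors' analysis code.

```python
import pandas as pd

# Hypothetical long-format table: one row per participant x concern, with
# columns "participant", "concern", "group" ("short" / "full"), "response".
df = pd.read_csv("responses.csv")

# 5-point Likert coding; endpoints are from the study, middle labels are assumed.
likert = {
    "Not at all concerned": 0,
    "Slightly concerned": 1,
    "Moderately concerned": 2,
    "Very concerned": 3,
    "Extremely concerned": 4,
}
df["score"] = df["response"].map(likert)

# Mean concern per item, overall and split by condition
by_concern = df.groupby("concern")["score"].mean().sort_values(ascending=False)
by_group = df.groupby(["concern", "group"])["score"].mean().unstack()

print(by_concern.head())   # which concerns rate highest overall
print(by_group.head())     # Short Definitions vs Full Descriptions
```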

 

Now, let’s dive into the results! We'll start with results about overall concern (before diving into the 16 specific concerns).

 

Since this is a long report, we've included just the initial section here. To read the full report, go here: https://www.clearerthinking.org/post/study-report-what-concerns-people-about-ai 
 




Sparks of RSI?

2026-03-15 01:09:25

Are your long-running agents self-improving in loops with minimal prompting? Mine sure are!

I think we're seeing the first sparks of RSI here, folks. I'm expecting the frontier labs to scramble furiously to push this forward, finding and patching the meta-failure-modes. Thus, I expect next versions to be even better at this.

Here's what some other people are saying/claiming:

https://x.com/shreyasnsharma/status/2032567729560105117

https://x.com/varun_mathur/status/2032671842230501729

https://x.com/TuXinming/status/2032478765033701835

https://x.com/andrewwhite01/status/2031761577943425475

https://x.com/aramh/status/2029553870502756706

https://x.com/polynoamial/status/2029622090152956335

https://t.co/znsJlcww5r

And many more; these are just a few examples. Not super impressive so far. But many other "tasks" have first shown signs of progress in the 1-3% accuracy range and then shot rapidly upwards over the next couple of model versions, and if this one goes the same way... Yeah.

Basically, I think we're in crunch time. Automated alignment time is here. Get cracking.




FW26 Color Stats

2026-03-14 23:50:43

Once again, I am coming out with stats on the colors in the latest fashion collections: this time, for the fall/winter 2026 season.

Previous entries: SS26, FW25, SS25, FW24, SS24.

Methodology Recap

As I’ve done before, I’m using my automated script to go through all the images hosted on Vogue Runway’s website for the current ready-to-wear collections, asking an LLM (currently GPT-4o) to report all the colors in the outfit in each picture, and counting up the totals.

So the “count” for each color is the number of times GPT-4o observed it, across all images; if it mentions a black hat and a black dress in the same image, for instance, that counts as two.
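
For readers curious what that tallying step looks like, here is a minimal sketch of the counting logic. The `list_colors` helper and the image list are hypothetical stand-ins for the Vogue Runway scrape and the GPT-4o call, and the neutral filter is only an approximation of "black, white, gray, brown, or shades thereof"; this is not the original script.

```python
from collections import Counter

def list_colors(image_path: str) -> list[str]:
    """Hypothetical wrapper around the GPT-4o vision call: returns one color
    name per garment/item reported in the image, e.g. ["black", "black",
    "burgundy"] for a black hat, a black dress, and a burgundy coat."""
    ...

NEUTRAL_BASES = ("black", "white", "gray", "grey", "brown", "beige", "taupe")

def is_neutral(color: str) -> bool:
    # "jet black", "charcoal gray", "light beige" all count as neutrals
    return any(base in color for base in NEUTRAL_BASES)

counts: Counter[str] = Counter()
for image_path in all_collection_images:        # hypothetical list of image paths
    for color in list_colors(image_path):
        counts[color.lower()] += 1              # each mention counts once

top_30 = counts.most_common(30)
top_30_non_neutral = [(c, n) for c, n in counts.most_common() if not is_neutral(c)][:30]
```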

To save time, I didn’t do manual results this time around.

The Results

Here are the top 30 colors:

and the top 30 non-neutral colors (not black, white, gray, brown, or shades thereof):

As you can see, there’s a lot of black. Well, there’s always a lot of black; black is always the #1 color. But this time around, the near-synonym “jet black” is also in the #2 slot, and “charcoal black” is at #7; neither was a top-30 color in past years.

We also see burgundy as the most common non-neutral color (whereas it’s almost always red in that slot), and the new color “oxblood” (a dark, brownish red) in the #7 slot, as well as many other dark and muted tones.

The color theme for Fall/Winter 2026 is clearly darkness. Black, charcoal gray, burgundy, and other dark shades predominate.

Comparisons to Past Years

If you compare to FW25, we see a very stark difference:

In FW26, we’ve clearly moved away from white, beige, and brown, towards black and charcoal gray, and away from red towards burgundy.

Overall, we see a systematic movement towards darker colors, and away from pastels and brights.

Rising (significantly higher in rank than last year):

  • mustard yellow

  • peach

New (in the top 30 this year but not last year):

  • jet black

  • charcoal black

  • taupe

  • oxblood

  • brick red

  • wine

  • dusty rose

  • burnt orange

  • olive

  • sage green

  • slate blue

  • cobalt blue

  • plum

  • fuchsia

  • magenta

Falling (significantly lower in rank than last year):

  • red

  • yellow

  • light blue

  • lavender

Lost (in the top 30 last year but not this year):

  • light beige

  • pale pink

  • orange

  • light green

  • light purple

  • purple

  • deep purple

  • dark purple

Or, visualized, we see a lot of dark and muted tones among the winners, and brights and pastels among the losers.

The overall prevalence of black in the collections isn’t just a retreat to “safe basics” due to economic caution — the luxury market seems to be improving slightly after several bad years, and is projected to strengthen further.

BNP Paribas estimates of luxury sales

Rather, this looks like a stylistic change. Maybe a subtle commentary on the “gloom” of the world situation, maybe part of the “return to sexiness” and edgy, vampy styles that’s been prominent in Vogue’s fashion coverage, who knows.

It’s definitely the biggest color shift I’ve observed since I started tracking color trends. The overall “arc” of this decade started with post-pandemic bright colors in 2021, followed by a retreat to neutrals over the next few years, and now this clear-cut shift towards darkness.




Extracting Performant Algorithms Using Mechanistic Interpretability

2026-03-14 22:19:42

A Prequel: The Tree of Life Inside a DNA Language Model

Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, quite literally, the evolutionary tree of life inside. The phylogenetic relationships between thousands of species were encoded as a curved manifold in the model's internal activations, with geodesic distances along that manifold tracking actual evolutionary branch lengths. Bacteria that diverged hundreds of millions of years ago were far apart on the manifold, and closely related species were nearby. 

The model was trained to predict the next DNA token. Nobody told it about evolution or gave it a phylogenetic tree as a training signal. But the model needed to encode evolutionary relationships in order to predict DNA well, and so it built a structured geometric representation of those relationships as part of its internal computation, and the representation was good enough that you could extract it with interpretability tools and compare it meaningfully to the ground truth.

I saw this and decided to apply the same approach to another type of biological foundation model: those trained on single-cell data.

If Evo 2 learned the tree of life from raw DNA, what did scGPT learn about how human cells develop?

Finding the Manifold

For those unfamiliar with the biology side: scGPT is a transformer model trained on millions of single-cell gene expression profiles. Each cell in your body expresses thousands of genes at varying levels, and a single-cell RNA sequencing experiment measures those expression levels for potentially hundreds of thousands of individual cells simultaneously. scGPT was pre-trained on this kind of data in a generative fashion, learning to predict masked gene expression values from context. 

The question I wanted to answer was: does scGPT encode, somewhere in its attention tensor, a compact geometric representation of some biological processes? And if so, can I find it without knowing in advance exactly where to look?

To attack this systematically, I used a two-phase research loop driven by an AI executor-reviewer pair operating under pre-registered quality gates. Phase 1 was a broad hypothesis search: the loop explored a large combinatorial space of candidate manifold hypotheses by varying the biological target (developmental ordering, regulatory structure, communication geometry), the featurization strategy (attention drift, raw embeddings, mixed operators), and the geometric fitting method (Isomap, geodesic MDS, a technique called Locally Euclidean Transformations), all applied across the full 12-layer × 8-head scGPT attention tensor, which means 96 individual attention units to screen. 
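
To make the shape of that search concrete, here is a minimal sketch of the Phase 1 screening loop. All the helper functions, list entries, and the dataset handle are hypothetical stand-ins named by me; the real pipeline (executor-reviewer loop, pre-registered quality gates) is considerably more involved.

```python
from itertools import product

# Hypothetical stand-ins (names are mine, not the paper's):
#   cells              -- handle to the single-cell dataset
#   featurize(...)     -- pulls activations for one (layer, head) attention unit
#                         under a given featurization strategy
#   score_hypothesis() -- fits the chosen manifold method and scores it against
#                         the biological target
TARGETS = ["developmental_ordering", "regulatory_structure", "communication_geometry"]
FEATURIZERS = ["attention_drift", "raw_embeddings", "mixed_operators"]
FITTERS = ["isomap", "geodesic_mds", "locally_euclidean_transform"]
N_LAYERS, N_HEADS = 12, 8   # 96 individual attention units to screen

results = []
for layer, head, target, feat, fitter in product(
    range(N_LAYERS), range(N_HEADS), TARGETS, FEATURIZERS, FITTERS
):
    X = featurize(cells, layer=layer, head=head, strategy=feat)
    score = score_hypothesis(X, target=target, fitter=fitter)
    results.append({"layer": layer, "head": head, "target": target,
                    "featurizer": feat, "fitter": fitter, "score": score})

# Candidates that clear the pre-registered quality gates get promoted to Phase 2
results.sort(key=lambda r: r["score"], reverse=True)
```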

What came out of Phase 1 was a robust positive hit: hypothesis H65, which identified a compact, roughly 8-to-10-dimensional manifold in specific attention heads where positions along the manifold corresponded to how far cells had progressed through hematopoietic differentiation. Stem cells clustered at one end; terminally differentiated blood cell types (T cells, B cells, monocytes, macrophages) spread out along distinct branches at the other end; and the branching topology matched the known developmental hierarchy with statistically significant branch structure that held up under stringent controls.

Then I switched to Phase 2, which was a more manual investigation: methodological closure tests, confidence intervals, structured holdouts, and external validation. I validated the manifold on a non-overlapping panel from Tabula Sapiens and then confirmed it via frozen-head zero-shot transfer to an entirely independent multi-donor immune panel. You can explore this manifold yourself, and compare different extraction variants, in an interactive 3D viewer.

But Does the Extracted Algorithm Actually Work?

I think finding a biologically meaningful manifold inside a foundation model is, on its own, cool. But the question I actually cared about was: can you take this geometric object out of the model and use it as a standalone method that does useful work?

To do this, I developed a three-stage extraction pipeline (sketched in code after the list):

  1. I directly exported the frozen attention weight matrices from the relevant heads, with no retraining, just literally reading out the learned linear operator.
  2. I attached a lightweight learned adaptor that projects the raw attention output into the manifold's coordinate system.
  3. And then I added a task-specific readout head (for classification or pseudotime prediction).
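
Here is a minimal PyTorch sketch of how those three stages compose. The dimensions, class name, and exported-operator file are hypothetical; this shows the structure of the pipeline, not the exact implementation.

```python
import torch
import torch.nn as nn

class ExtractedAlgorithm(nn.Module):
    """Sketch of the three stages: frozen attention operator -> small learned
    adaptor -> task-specific readout. Dimensions and file names are hypothetical."""

    def __init__(self, d_model: int = 512, manifold_dim: int = 10, n_outputs: int = 8):
        super().__init__()
        # Stage 1: attention weights exported from scGPT, kept frozen (no retraining)
        self.frozen_op = nn.Linear(d_model, d_model, bias=False)
        self.frozen_op.weight.requires_grad_(False)
        # Stage 2: lightweight adaptor into the manifold's coordinate system
        self.adaptor = nn.Linear(d_model, manifold_dim)
        # Stage 3: task-specific readout (classification or pseudotime)
        self.readout = nn.Linear(manifold_dim, n_outputs)

    def forward(self, cell_embedding: torch.Tensor) -> torch.Tensor:
        h = self.frozen_op(cell_embedding)   # the biological knowledge lives here
        z = self.adaptor(h)                  # manifold coordinates
        return self.readout(z)               # task prediction

# Hypothetical usage: load the exported operator, then train only adaptor + readout
model = ExtractedAlgorithm()
model.frozen_op.weight.data.copy_(torch.load("scgpt_head_operator.pt"))
trainable = [p for p in model.parameters() if p.requires_grad]   # adaptor + readout only
```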

The key property of this pipeline is that the heavy lifting, the actual biological knowledge, comes entirely from the frozen attention weights that scGPT learned during pre-training. The adaptor and readout are small and cheap to train, and they never touch the original dataset the model was pre-trained on. What you end up with is a standalone algorithm you can ship as a file and run independently of scGPT.

So: how does it perform?

I benchmarked the extracted algorithm against a lineup of established methods that biologists actually use in practice: scVI (a deep generative model for single-cell data), Palantir (a pseudotime method based on diffusion maps and Markov chains), Diffusion Pseudotime (the Scanpy implementation), CellTypist (a logistic-regression-based cell type classifier trained on a large reference atlas), PCA, and raw-expression baselines. These are the standard tools in the single-cell bioinformatics toolkit, developed and refined by domain experts over years.

On pseudotime-depth ordering, which measures how well a method recovers the true developmental progression from stem cells to mature blood cells, the extracted algorithm appeared to be the best, significantly outperforming every tested alternative in paired split-level statistics. On classification (distinguishing cell types), the picture was less unambiguous but still strong: the extracted head led on branch balanced accuracy and on key subtype discrimination tasks like CD4/CD8 T cell separation and monocyte/macrophage distinction. On some stage-level and branch-level macro-F1 metrics, diffusion-style baselines or raw expression had the edge, so this is not a clean sweep, but the extracted algorithm is solidly in the top tier across the board, and dominant on the most biologically meaningful endpoint.

Now, you might reasonably ask: is this just the result of having a fancier probe? Maybe any sufficiently flexible function fitted on top of scGPT's embeddings would do equally well, and the "manifold discovery" part is not contributing anything real. I tested this. A 3-layer MLP with 175,000 trainable parameters, fitted on frozen scGPT average-pooled embeddings, was significantly worse than the extracted 10-dimensional head on 6 out of 8 classification endpoints. And the extracted head accomplished this while being 34.5 times faster to evaluate across a full 12-split campaign, with roughly 1,000 times fewer trainable parameters.
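
For contrast, the baseline probe is roughly the sketch below; the layer widths are illustrative and do not reproduce the exact 175,000-parameter count.

```python
import torch.nn as nn

# The standard probing approach: a 3-layer MLP on frozen, average-pooled scGPT
# embeddings. Widths here are illustrative, not the exact configuration used.
mlp_probe = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 8),              # e.g. number of cell-type classes
)
n_params = sum(p.numel() for p in mlp_probe.parameters())
```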

Let me restate this: the geometric structure that mechanistic interpretability found inside scGPT's attention heads, when extracted and used directly, outperforms the standard approach of slapping an MLP on top of the model's embeddings. The interpretability-derived method is simultaneously more accurate, faster, and smaller. 

How Small Can You Go Though?

Once you have an extracted algorithm that works, the natural next question is how much of it you actually need. Compression is interesting for practical reasons, but it is even more interesting for interpretability reasons, because the further you compress an algorithm while preserving its performance, the closer you get to understanding what it is actually doing.

The initial extracted operator pooled three attention heads from scGPT and weighed 17.5 MB. Not large by modern standards, but not trivially inspectable either. The first compression step was to ask: do we really need all three heads, or does a single one carry the essential geometry? I scanned all 96 attention units in scGPT's tensor and found that a single unit, Layer 2, Head 5, carried substantial transferable developmental geometry on its own. The compact operator built from this single head weighed 5.9 MB and showed almost no loss compared to the three-head version on the benchmark suite. 

The second compression step was more aggressive: truncated SVD on the single-head operator. This factors the weight matrix into low-rank components and throws away everything below a chosen rank threshold. At rank 64, the resulting surrogate shrinks to 0.73 MB, which is already quite tiny, and it still beats the frozen scGPT average-pool + MLP baseline on all eight pooled classification endpoints. It does incur statistically significant losses versus the dense single-head operator on 5 out of 8 endpoints, so this is not free compression. But the rank-64 version is still a better algorithm than the standard probing approach, at a fraction of a megabyte.
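
A minimal NumPy sketch of this compression step, assuming the dense single-head operator has been exported to a hypothetical file:

```python
import numpy as np

# Hypothetical export of the dense Layer 2, Head 5 operator
W = np.load("layer2_head5_operator.npy")

U, S, Vt = np.linalg.svd(W, full_matrices=False)   # factor the weight matrix
r = 64                                             # chosen rank threshold
U_r, S_r, Vt_r = U[:, :r], S[:r], Vt[:r, :]        # keep the top-r components

def apply_surrogate(x: np.ndarray) -> np.ndarray:
    # Low-rank application: x @ W  ~=  ((x @ U_r) * S_r) @ Vt_r
    return ((x @ U_r) * S_r) @ Vt_r

dense_mb = W.nbytes / 1e6
lowrank_mb = (U_r.nbytes + S_r.nbytes + Vt_r.nbytes) / 1e6
print(f"dense: {dense_mb:.2f} MB, rank-{r} surrogate: {lowrank_mb:.2f} MB")
```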

This is where the interpretability payoff arrives. I ran a factor ablation audit on the rank-64 surrogate: systematically remove each of the 64 factors one at a time, measure how much performance drops, and rank them by necessity. It turned out that just four factors, out of 64, accounted for 66% of the total pooled ablation impact. And when I examined what those four factors corresponded to biologically, they resolved into explicit hematopoietic gene programs.
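
Continuing the SVD sketch above (reusing U_r, S_r, Vt_r, and r), with a hypothetical evaluate() standing in for the pooled benchmark suite, the ablation audit is just a loop over factors:

```python
def evaluate(apply_fn) -> float:
    """Hypothetical stand-in for the pooled benchmark suite: takes a function
    mapping embeddings to surrogate outputs and returns one pooled score."""
    ...

baseline = evaluate(lambda x: ((x @ U_r) * S_r) @ Vt_r)

impacts = []
for k in range(r):
    S_abl = S_r.copy()
    S_abl[k] = 0.0                                          # knock out factor k
    score = evaluate(lambda x, s=S_abl: ((x @ U_r) * s) @ Vt_r)
    impacts.append((k, baseline - score))                   # drop in score = necessity

impacts.sort(key=lambda t: t[1], reverse=True)              # rank factors by necessity
dominant_factors = impacts[:4]                              # the handful that matter most
```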

So, Mechanistic Interpretability is Becoming Dual Use

Let's step back from the specific results for a moment and consider the high-level lesson here.

The very property that makes this result interesting is also the property that makes me cautious about applying the same techniques to large language models, because the argument runs in both directions. If you can extract an algorithm that a model uses to do something well, you can potentially also improve how it does that thing: by identifying inefficient components, scaling the relevant circuits, composing extracted subroutines in new ways, or replacing the fuzzy learned version with a cleaner extracted one to free up capacity for the model to learn something else. Mechanistic interpretability, in other words, is becoming a capability amplification tool. This is a well-known theoretical concern, but it now looks like it is becoming a practical one.

Consider a few scenarios. You identify the circuit in a language model responsible for multi-step planning, extract it, find that it is operating at low rank with substantial redundancy, and publish a paper showing how to compress it. Now anyone training the next generation of models can initialize that circuit more efficiently, or allocate more capacity to the components that matter. Or: you discover that a model's chain-of-thought reasoning relies on a specific attention pattern that routes information through intermediate tokens in a predictable way, and you publish a detailed mechanistic account of how this works. Now someone building an inference-time scaling pipeline can optimize that routing directly rather than relying on the model to rediscover it from scratch. 

This is one of the reasons why I have deliberately chosen to focus my interpretability work on biological foundation models. Although I agree that advancing biology carries risks of its own, I believe we really need to push biology forward as soon as possible, given the current AI risk landscape, and that is what I am trying to do.

Mechanistic Interpretability for Novel Knowledge Discovery

On a more positive and general note: on top of being an auditing/monitoring tool, mechanistic interpretability can be a knowledge discovery tool. Consider:

  1. The model learned something about hematopoiesis that existing bioinformatics methods had not fully captured, at least not in the same compact form.
  2. The interpretability pipeline found a representation that, when extracted and deployed as a standalone algorithm, outperformed established tools on the most biologically meaningful benchmarks.
  3. The knowledge extracted from the model's internals was new in the operationally relevant sense: nobody had this particular algorithm before, and it works better than what people were using.

Join In

If you like mechanistic interpretability, I encourage you to consider switching from LLMs to biological foundation models.

The work I described here is part of a broader research program on mechanistic interpretability of biological foundation models. Earlier I published a comprehensive stress-test of attention-based interpretability methods on scGPT and Geneformer. In parallel, I developed a sparse autoencoder atlas covering 107,000+ features across all layers of both Geneformer and scGPT. The hematopoietic manifold paper is the latest piece.

There is a lot more to do here, both in terms of applying these methods to other biological systems and developmental processes, and in terms of developing better unsupervised techniques for manifold discovery that could scale beyond what the current semi-supervised approach allows. I think this is one of the best places in the current research landscape to do interpretability work that is simultaneously methodologically interesting, practically useful for biomedicine (and yes, human intelligence amplification), and safe with respect to the capability externalities that worry me about LLM interpretability.

You can find more about the research program, ongoing projects, and ways to get involved at biodynai.com. The full paper with all supplementary materials is on arXiv, and the interactive 3D manifold viewer is at biodyn-ai.github.io/hema-manifold.


