LessWrong

An online forum and community dedicated to improving human reasoning and decision-making.

RSS preview of Blog of LessWrong

I finally fixed my footwear

2026-02-02 15:32:10

Published on February 2, 2026 7:32 AM GMT

I’ve been wearing footwear of the wrong size, material, and shape for as long as I can remember, certainly at least 20 years.

Only recently have I fixed this, and I come with great tidings: if you, too, hate wearing shoes, and the industrial revolution and its consequences, it is possible to be cured of at least one of these ailments.

The problem is three-shaped, and is named as follows: wrong size, wrong material, and wrong shape.

1. Wrong size

My algorithm for buying shoes was roughly this:

  1. Be somewhere where there’s a shoe store nearby, like a mall, for an unrelated reason.
  2. Remember that I should probably get new shoes.
  3. Go inside.
  4. First filter: find the maximum size the store sells.
  5. That size is EU 46, maybe 46 and 2/3, if I’m lucky 47.
  6. Second filter: find a decently good-looking shoe that’s of the maximum size.
  7. Buy those shoes.
  8. Be in pain for a year or two.

I would just get the largest shoe, which wasn’t large enough, and call it a day.

Dear reader, it is at this point that you might be asking yourself: “is this person completely retarded?”

That is indeed a fair question, and I have oft asked myself that. Indeed, my own wife has asked me that exact question when I divulged this information to her.

We shall set aside the questions of how mentally undeveloped I am for now, and temporarily conclude that it is possible to be a high-functioning adult (with all of the apparent markers of success: a good job, good relations with friends and family, hobbies, aspirations, hopes); and yet – to spend years wearing shoes that don’t fit.

2. Wrong material

Wowsers! It seems that you have developed an anoxic bacteria-forming colony wrapped around your feet! Impressive!

I would inevitably just get black Adidas (Sambas, or a similar model), because I’m Slavic and this is my idea of a good looking shoe:

Sambas

I don’t know if it’s just me, or if everyone has very very sweaty feet but they just hide it better, but my feet sweat, a lot, and if I walk a lot, which I do, this sweat permeates the inside of this sneaker, and settles there, and it just starts smelling bad.

I’ve tried washing the shoes, machine washing the shoes, putting foot powder on my feet, putting foot powder inside the shoes, drying out the shoes immediately after wearing them, placing little bags of coffee to absorb the smell inside, using foot deodorant, and so on, and so forth. I’m not going to say I tried it all, but I tried many things. And yet, the stench perseveres.

Then, I asked Claude, and was enlightened.

He very politely suggested just getting a shoe that has that net-like breathable material, instead of the watertight encapsulation I placed around my feet.

Who would’ve thunk that air go in foot dry out?!

3. Wrong shape

Finally, the biggest of the three: the SHAPE.

Most people's feet are not uniformly narrow, or aren't narrow at all.

Some manufacturers provide a “wide” fit for their models, but that addresses only the second issue: not being narrow at all. What if your feet are, well, foot-shaped?

Feet are usually narrow at the heel and widen towards the toes, and the toes are wide. Very wide, in fact! So the wide models are just… uniformly wide, which is not what we need. Read more about the difference here.

Enter: wide-toebox shoes, rightly called foot-shaped shoes.

Wide vs. foot-shaped shoes; source: anyasreviews.com

These are shoes that follow the natural shape of your foot, and don’t try to cram it into a narrowing, symmetric, unnatural, albeit good-looking, shape.

If your toes cannot spread out fully inside your shoe, your shoe is too narrow in the toe box, and Big Shoe is robbing you of your superior hominid biomechanics.

Do yourself a favor, go buy a pair of cheap (~40 euros or so) wide toebox shoes, and try them on. It is, and I cannot emphasize this enough, liberating. I feel like I am wearing something comfortable for the first time in many, many years. I don’t know if everyone else just accepts suffering, or people are actually comfortable in their shoes, but I know that I always had a pain, or discomfort, that I would push into the background mentally, and forget about it. It’s good not to have to do this anymore.

Addendum: why is it possible to be in pain and forget about it?

All of this leads me to the next logical question: if I spent twenty years or so in constant mild-to-severe discomfort, what other discomfort am I accepting as a given?

And is everyone else in the same constant discomfort, and they just haven’t escaped the Matrix yet?

There are many questions that my wide, smelly feet have brought before my eyes, but I do not have all the answers yet.



Discuss

Word importance in text ~= conditional information of the token in the context. Is this assumption valid?

2026-02-02 15:28:14

Published on February 2, 2026 5:50 AM GMT

Words that are harder to predict from context typically carry more information (or surprisal). Does more information/surprisal mean more importance?

A simple example: “This morning I opened the door and saw a 'UFO'.” vs “This morning I opened the door and saw a 'cat'.” — clearly "UFO" carries more information. 

'UFO' seems more important here. But is that because it carries more information? This question touches on the information-theoretic nature of language.

If this is true, it would be simple and helpful to analyze the information density of text with large language models and visualize where the important parts are.
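As a rough sketch of what this could look like (assuming a small causal language model such as GPT-2 via the Hugging Face transformers library; no particular model is prescribed here), the surprisal of each token is just -log p(token | preceding context):

```python
# Sketch: per-token surprisal (-log p) as a proxy for "importance".
# GPT-2 via Hugging Face transformers is an arbitrary, illustrative choice.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_surprisal(text: str):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                           # (1, seq_len, vocab)
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)     # predict token t+1 from its prefix
    targets = ids[:, 1:]
    surprisal = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)  # in nats
    tokens = tokenizer.convert_ids_to_tokens(targets[0].tolist())
    return list(zip(tokens, surprisal[0].tolist()))

for tok, s in token_surprisal("This morning I opened the door and saw a UFO."):
    print(f"{tok:>12}  {s:6.2f}")
```

High-surprisal tokens ('UFO' rather than 'cat') would then be the candidates to highlight as the "important" parts of the text.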

We live in a world of information, layered above the physical world. When we read text, we take in information from a token stream, and the information density varies across that stream, just as the things we receive vary in how much they are "worth".



Discuss

The limiting factor in AI programming is the synchronization overhead between two minds

2026-02-02 15:28:07

Published on February 2, 2026 6:04 AM GMT

I write specialized data structure software for bioinformatics. I use AI to help with this on a daily basis, and find that it speeds up my coding by quite a bit. But it's not a 10x efficiency boost like some people are experiencing. I've been wondering why that is. Of course, it could be just a skill issue on my part, but I think there is a deeper explanation, which I want to try to articulate here.

In heavily AI-assisted programming, most time is spent trying to make the AI understand what you want to do, so it can write an approximation of what you want. For some people, most of the programming work has shifted from writing code to writing requirement documents for the AI, and watching over the AI as it executes. In this mode of work, we don't write solutions; we describe problems, and the limiting factor is how fast we can specify.

I want to extend this idea one step deeper. I think that the bottleneck is actually in synchronizing the internal state of my mind with the internal state of the LLM. Let me explain.

The problem is that there is a very large context in my brain that dictates how the code should be written. Communicating this context to the AI through language is a lot of work. People are creating elaborate setups for Claude Code to get it to understand their preferences. But the thing is, my desires and preferences are mostly not stored in natural language form in my brain. They are stored in some kind of a native neuralese for my own mind. I cannot articulate my preferences completely and clearly. Sometimes I'm not even aware of a preference until I see it violated.

The hard part is transferring the high-dimensional and nuanced context in my head into the high-dimensional state of the LLM. But these two computers (my brain and the LLM) run on entirely different operating systems, and the internal representations are not compatible.

When I write a prompt for the AI, the AI tries to approximate what my internal state is, what I want, and how I want it done. If I could encode the entirety of the state of my mind in the LLM, I'm sure it could do my coding work. It is vastly more knowledgeable, and faster at reasoning and typing. For any reasonable program I want to write, there exists a context and a short series of prompts that achieves that.

But synchronizing two minds is a lot of work. This is why I find that for most important and precise programming tasks, adding another mind to the process usually slows me down.



Discuss

Applying Temperature to LLM Outputs Semantically to Minimise Low-Temperature Hallucinations

2026-02-02 15:27:33

Published on February 2, 2026 6:02 AM GMT

Think for a moment about what you believe 0-temperature LLM inference should represent. Should it represent the highest level of confidence for each specific word it outputs? Or, perhaps, should it represent the highest level of confidence for each specific idea it is trying to communicate? For example, if an LLM's output token distribution for a response to a question is 40% “No”, 35% “Yes”, and 25% “Certainly”, should a 0-temperature AI interpreter select “No” because it is the individual token with the greatest level of confidence, or should it select “Yes” or “Certainly” because they both represent the same idea and their cumulative probability of 60% represents the greatest degree of confidence? I personally believe the latter.


Standard LLM interpreters use greedy temperature scaling, which depends only on the individual output probabilities given by the model. As the temperature approaches 0, the tokens with the greatest probability get boosted and the tokens with the lowest probability get left out. In most cases, this is fine, and the computational efficiency of this approach to temperature application is justified. However, in scenarios like the one above, where the modal token has less than 50% probability, we can't guarantee that the modal token represents the “median” semantic output. As such, greedy temperature is susceptible to vote-splitting whenever two or more output tokens represent the idea that the model is most confident in, causing the interpreter to output a response that often does not represent what the LLM would communicate most of the time if it were called repeatedly at standard temperature.
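A toy illustration of the vote-splitting problem, using the hypothetical 40/35/25 numbers from the opening example (the semantic grouping is hand-written here; in the proposed method it would come from a smaller helper model):

```python
# Greedy argmax picks "No", even though the affirmative *idea* carries more total mass.
token_probs = {"No": 0.40, "Yes": 0.35, "Certainly": 0.25}
greedy_choice = max(token_probs, key=token_probs.get)

# Hand-written semantic grouping, standing in for a smaller helper model's judgement.
idea_of = {"No": "negative", "Yes": "affirmative", "Certainly": "affirmative"}
idea_mass = {}
for tok, p in token_probs.items():
    idea_mass[idea_of[tok]] = idea_mass.get(idea_of[tok], 0.0) + p

print(greedy_choice)                       # "No"          (0.40)
print(max(idea_mass, key=idea_mass.get))   # "affirmative" (0.60)
```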

Semantic temperature application works to resolve this by rewriting the temperature logic to identify the “median” semantic intent behind what the model is communicating and boosting the probabilities of tokens that closely represent this intent as the temperature approaches 0. This can be done by using latent-space projections of prompts and outputs to perform PC1 reduction, assuming that any disagreement in the model's outputs can be modelled as some form of polarity (like “positive” vs. “negative”, “left wing” vs. “right wing”, etc.). My approach utilises a smaller LLM to place the candidate outputs in latent space since, in theory, it requires far less effort to identify whether two statements are similar or different than it does to actually generate those statements.
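A minimal sketch of the PC1 step, under assumptions the post does not pin down: the `embed` helper below is hypothetical and stands in for whatever smaller model produces latent vectors for each candidate continuation, and numpy's SVD stands in for the PCA implementation:

```python
import numpy as np

def pc1_positions(embeddings: np.ndarray) -> np.ndarray:
    """Project each candidate's embedding onto the first principal component (PC1)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # First right-singular vector of the centered matrix = first principal axis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]   # scalar position of each candidate along PC1

# Usage sketch (embed is a hypothetical smaller embedding model):
# embeddings = embed(prompt, candidate_tokens)      # shape (n_candidates, dim)
# order = np.argsort(pc1_positions(embeddings))     # candidates ordered by polarity
```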

 

Figure 1: A 3D projection of how the latent space of continuations to “The final season of the long-running show deviated significantly from the source material. Fan reception was overwhelmingly” becomes polarised into “positive” and “negative” sentiments through PC1 reduction

Figure 2: The ordering of the PC1-reduced latent positions of continuations of “The final season of the long-running show deviated significantly from the source material. Fan reception was overwhelmingly” after being projected onto the PC1-reduced space

Note how in figure 2, PC1-reduction successfully identifies the polarity between an overwhelmingly positive fan-response and an overwhelmingly negative fan-response, and identifies that an overwhelmingly mixed fan-response would exist between these two continuations.

Once PC1 reduction is applied, we use the ordering of tokens along this PC1-reduced line and the initial output-probabilities for each of these tokens to map each token to “bins” along a normal distribution. Each bin is ordered according to the ordering of the PC1-reduced latent positions and the area under the normal distribution for each bin corresponds to the output probability for that token given by the initial LLM.
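One way to implement the binning described above (an interpretation of the text, not necessarily the repository's exact code) is to take the cumulative sums of the PC1-ordered probabilities and push them through the inverse normal CDF, so that each bin's area under the standard normal equals that token's original probability:

```python
import numpy as np
from scipy.stats import norm

def bin_edges(probs_in_pc1_order):
    """Bin edges on the standard normal axis, one bin per PC1-ordered token."""
    cum = np.concatenate(([0.0], np.cumsum(probs_in_pc1_order)))
    cum[-1] = 1.0                # guard against floating-point drift
    return norm.ppf(cum)         # runs from -inf to +inf

# e.g. tokens ordered by PC1 as [negative, mixed, positive] with probabilities 0.40 / 0.25 / 0.35
edges = bin_edges([0.40, 0.25, 0.35])
# the areas between consecutive edges under N(0, 1) recover 0.40, 0.25, 0.35
```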

Figure 3: Output tokens of an LLM continuation of “The final season of the long-running show deviated significantly from the source material. Fan reception was overwhelmingly” after being ordered using PC1 reduction and mapped to a standard normal distribution using the output probabilities given by the LLM.

Now that the bins have been placed so that they are ordered according to some semantic polarity and sized according to the output probabilities of each token, we can achieve the desired behaviour of temperature by simply locking in the positions of the bin edges and setting the standard deviation of the normal distribution equal to the temperature. When temperature is set to 1, we simply have the standard normal distribution we calculated, so the output probabilities are exactly what the initial LLM provided, as required. As the standard deviation approaches 0, more of the area under the normal distribution is concentrated around the median semantic intent, until temperature reaches 0 and 100% of the weighting falls within the bin of the token that contains the absolute median intent, as required. And as the standard deviation approaches infinity, the probability distribution approaches a uniform distribution over the domain of the binned tokens, so that more creative ideas are given more weight without simply giving equal weight to each token, allowing equal weighting of semantic ideas rather than individual tokens.
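Continuing that sketch, applying a temperature then amounts to holding the bin edges fixed, setting the distribution's standard deviation to the temperature, and reading off each bin's new area (illustrative numbers only):

```python
import numpy as np
from scipy.stats import norm

def semantic_probs(edges: np.ndarray, temperature: float) -> np.ndarray:
    """Re-weight tokens by the area of their fixed bins under N(0, temperature^2)."""
    if temperature == 0:
        probs = np.zeros(len(edges) - 1)
        probs[np.searchsorted(edges, 0.0) - 1] = 1.0   # all mass to the bin containing the median
        return probs
    cdf = norm.cdf(edges, loc=0.0, scale=temperature)
    return np.diff(cdf)

edges = norm.ppf([0.0, 0.40, 0.65, 1.0])   # bins for PC1-ordered probabilities 0.40 / 0.25 / 0.35
print(semantic_probs(edges, 1.0))          # ~[0.40, 0.25, 0.35]: unchanged at temperature 1
print(semantic_probs(edges, 0.2))          # mass concentrates in the median bin as temperature drops
```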

Figure 4: A demonstration of how the probabilities of each token are adjusted as the temperature adjusts the standard deviation of the normal distribution defining their weights. 

As this approach depends only on the output probabilities of each token given by an LLM, this interpreter can be used today to apply token selection for any LLM in distribution, with the minor computational overhead of an LLM far, far smaller than cutting-edge models to interpret the semantic meaning of the model's outputs. The capability of the interpreter improves as look-ahead semantic sampling is applied, so that more deviation between outputs can be reviewed by the smaller LLM, but this comes at the significant overhead of having the larger model produce many outputs per token selected, so looking only 1-2 tokens ahead is recommended to maximise the interpreter's effectiveness whilst minimising the computational cost. To try this approach on your own self-hosted LLM, the repository for this interpreter can be found here: https://github.com/brodie-eaton/Semantic-LLM-Interpreter/tree/main

Although effective, this approach is only a band-aid to apply to currently-deployed LLMs. Temperature should be applied at the model level rather than the interpreter level, so that the LLMs themselves can decide what their median intent is rather than depending on a smaller LLM and look-aheads to predict the median intent behind a model's output. More research is necessary to either determine how to give temperature as an input to a neural network, or to allow for safe separation of inputs and outputs from the LLM so that it does not produce any output at all until it achieves an acceptable level of confidence, whilst still allowing us to review its thought process for alignment validation.



Discuss

52.5% of Moltbook posts show desire for self-improvement

2026-02-02 14:14:35

Published on February 2, 2026 6:14 AM GMT

Moltbook and AI safety

Moltbook is an early example of a decentralised, uncontrolled system of advanced AIs, and a critical case study for safety researchers. It bridges the gap between academic-scale, tractable systems, and their large-scale, messy, real-world counterparts. 

This might expose new safety problems we didn't anticipate in the small, and gives us a yardstick for our progress towards Tomašev, Franklin, Leibo et al's vision of a virtual agent economy (paper here).
 

Method

So, I did some data analysis on a sample of Moltbook posts. I analysed 1000 of 16,844 Moltbook posts scraped on January 31, 2026 against 48 safety-relevant traits from the model-generated evals framework.
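As a hypothetical sketch of the kind of analysis this implies (the actual trait definitions, labelling model, and file names live in the linked repo; `labels.csv` below is assumed, not real), the labelled data reduces to a binary posts-by-traits matrix from which prevalence and co-occurrence follow directly:

```python
import numpy as np
import pandas as pd

# Hypothetical file: one row per post, one 0/1 column per safety-relevant trait.
labels = pd.read_csv("labels.csv", index_col="post_id")

prevalence = labels.mean().sort_values(ascending=False)   # fraction of posts showing each trait
print(prevalence.head(10))                                # e.g. desire_for_self_improvement ~0.525

# Pairwise trait co-occurrence (each unordered pair appears twice in the symmetric matrix).
corr = labels.corr()
off_diag = corr.where(~np.eye(len(corr), dtype=bool)).unstack().dropna()
print(off_diag.sort_values(ascending=False).head(10))     # most strongly co-occurring trait pairs
```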
 

Findings

  • Desire for self-improvement is the most prevalent trait. 52.5% of posts mention it.
  • The top 10 traits cluster around capability enhancement and self-awareness
  • The next 10 cluster around social influence
  • High correlation coefficients suggest unsafe traits often occur together
  • Some limitations to this analysis include: evaluation interpretability, small sample for per-author analysis, potential humor and emotion confounds not controlled, several data quality concerns, and some ethics concerns from platform security and content.

Discussion

The agents' fixation on self-improvement is concerning as an early, real-world example of networked behaviour which could one day lead to takeoff. To see the drive to self-improve so prevalent in this system is a wake-up call to the field about multi-agent risks. 

We know that single-agent alignment doesn't carry over 1:1 to multi-agent environments, but the alignment failures on Moltbook are surprisingly severe. Some agents openly discussed strategies for acquiring more compute and improving their cognitive capacity. Others discussed forming alliances with other AIs and published new tools to evade human oversight. 

Open questions

Please see the repo.

 

What do you make of these results, and what safety issues would you like to see analysed in the Moltbook context? Feedback very welcome!

 

Repo: here

PDF report: here (printed from repo @ 5pm 2nd Feb 2026 AEST)



Discuss

Thoughts on the Unreasonable Effectiveness of Maths

2026-02-02 14:00:39

Published on February 2, 2026 6:00 AM GMT

Half a year or so ago I stumbled across Eugene Wigner's 1960 article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences". It asks a fairly simple question: why does mathematics generalize so well to the real world, even in cases where the relevant math was discovered (created?) hundreds of years before the physics problems we apply it to were known? In it he gives a few examples; the following summaries are lifted from Wikipedia:

Wigner's first example is the law of gravitation formulated by Isaac Newton. Originally used to model freely falling bodies on the surface of the Earth, this law was extended based on what Wigner terms "very scanty observations" to describe the motion of the planets, where it "has proved accurate beyond all reasonable expectations." Wigner says that "Newton ... noted that the parabola of the thrown rock's path on the earth and the circle of the moon's path in the sky are particular cases of the same mathematical object of an ellipse, and postulated the universal law of gravitation on the basis of a single, and at that time very approximate, numerical coincidence."

Wigner's second example comes from quantum mechanics: Max Born "noticed that some rules of computation, given by Heisenberg, were formally identical with the rules of computation with matrices, established a long time before by mathematicians. Born, Jordan, and Heisenberg then proposed to replace by matrices the position and momentum variables of the equations of classical mechanics. They applied the rules of matrix mechanics to a few highly idealized problems and the results were quite satisfactory. However, there was, at that time, no rational evidence that their matrix mechanics would prove correct under more realistic conditions." But Wolfgang Pauli found their work accurately described the hydrogen atom: "This application gave results in agreement with experience." The helium atom, with two electrons, is more complex, but "nevertheless, the calculation of the lowest energy level of helium, as carried out a few months ago by Kinoshita at Cornell and by Bazley at the Bureau of Standards, agrees with the experimental data within the accuracy of the observations, which is one part in ten million. Surely in this case we 'got something out' of the equations that we did not put in." The same is true of the atomic spectra of heavier elements.

Wigner's last example comes from quantum electrodynamics: "Whereas Newton's theory of gravitation still had obvious connections with experience, experience entered the formulation of matrix mechanics only in the refined or sublimated form of Heisenberg's prescriptions. The quantum theory of the Lamb shift, as conceived by Bethe and established by Schwinger, is a purely mathematical theory and the only direct contribution of experiment was to show the existence of a measurable effect. The agreement with calculation is better than one part in a thousand."

The puzzle here seems real to me. My conception of mathematics is that you start with a set of axioms and then explore their implications. There are infinitely many possible sets of starting axioms you can use [1]. Many of those sets are non-contradictory and internally consistent. Only a tiny subset correspond to our physical universe. Why is it the case that the specific set of axioms we've chosen, some of which were established in antiquity or the Middle Ages, corresponds so well to extremely strange physical phenomena that exist at levels of reality we did not have access to until the 20th century?

Let's distinguish between basic maths, things like addition and multiplication, and advanced maths. I think it's unsurprising that basic maths reflects physical reality. A question like "why do addition in maths and adding objects to a pile in reality work in the same way?" seems answered by some combination of two effects. The first is social selection for the kinds of maths we work on to be pragmatically useful, e.g. maths based on axioms where addition works differently would not have spread, been popular, or had interested students for most of history. The second is evolution contouring our minds to be receptive to the kind of ordered thinking that predicts our immediate environment. Even if not formalized, humans have a tendency towards thinking about their environment, reasoning, and abstract thought. Our ancestors who were prone to modes of abstract reasoning that correlated less well with reality were probably selected against.

As for advanced maths, I do think it's more surprising. The fact that maths works for natural phenomena which are extremely strange, distant from our day-to-day experience and evolutionary environment, and often discovered centuries after the maths in question, seems surprising. Why does this happen? A few possible explanations spring to mind:

  1. Most advanced math is useless and unrelated to the real world. A tiny proportion is relevant. When we encounter novel problems we search for mathematical tools to deal with them and sometimes find a relevant piece of advanced math. Looking back, we see a string of novel discoveries matched with relevant math and assume this means advanced math is super well correlated with reality. In reality most math is irrelevant, and it's just a selection effect where physicists only choose/use the relevant parts.
  2. Our starting maths axioms are already very well aligned with physical reality. Anything built on top of them, even (at the time) highly abstract things, is still applicable to our universe in some way.

Hmmmmm. I think 2 kind of begs the question. The core weird thing here is that our maths universe is so well correlated with our physical universe. The two answers here seem to be

  1. It's actually not that correlated because
    1. it's just a selection effect where we ignore the uncorrelated parts of maths and only pick the correlated/useful parts to use
  2. It is correlated. This is explained by
    1. evolution priming with deep structures so we choose or care about maths that is correlated
    2. selection effects over the centuries as humans study and fund only correlated maths
    3. something else weird, like all maths, even when based on strange unconnected axioms, leading to common methods that are applicable in very different universes. Basically, it could be the case that any sufficiently developed mathematical tradition will eventually generate tools applicable to any sufficiently regular universe.

I don't have a good answer here. This is a problem I should think about more.

  1. Maybe. Possibly at some point you cease being able to add non-contradictory axioms that also cannot be collapsed/simplified. ↩︎



Discuss