
RSS preview of the LessWrong blog

How a bug of AI hardware may become a feature for AI governance

2025-12-07 22:55:02

Published on December 7, 2025 2:55 PM GMT

“Hardware noise” in AI accelerators is often seen as a nuisance, but it might actually turn out to be a useful signal for verification of claims about AI workloads and hardware usage.

With this post about my experiments (GitHub), I aim to

  1. Contribute more clarity to the discussion about “GPU non-determinism”
  2. Present how non-associativity can help monitor untrusted AI datacenters

Summary

  • I ran ML inference in dozens of setups to test which setups have exactly reproducible results, and which differences in setups lead to detectable changes in outputs or activations.
  • In nearly all cases studied, results were bitwise-reproducible within fixed settings. Differences across production methods were consistent, not random.
  • Given that these perturbations are reproducible and unique, they can act as a “fingerprint” of the exact setup that produced an output. This may turn out useful for monitoring untrusted ML hardware (such as in the context of AI hardware governance, international treaty verification, and AI control/security).
  • Some setting changes left a detectable fingerprint in the outputs, while others did not.
    • Invariant (i.e. changing this setting is not detectable from the outputs):
      • batch size in prefill inference
      • concurrent CUDA streams
      • pipeline parallelism rank
    • Detectable when re-executing on identical hardware:
      • batch size in decode inference
      • attention algorithm (sdpa, FlashAttention, eager, …)
      • CUDA version (if kernel libraries were updated)
      • tensor parallelism
      • different quantization methods, even at the same precision
      • Any change that affects numerics is detectable, since results were bitwise-reproducible within settings.
    • Detectable even with reproduction on different hardware:
      • attention algorithm
      • different quantizations (even within the same INT precision)
      • and of course different inputs or models
      • Different reduction order (a subtle difference resulting from batching, tensor parallelism, etc.) is masked by cross-hardware “noise”. Different algorithms are still detectable, because they are not just rounding errors, but qualitatively different math. (A minimal sketch of the reduction-order effect follows this list.)
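
The mechanism behind these fingerprints is that floating-point addition is not associative: summing the same values in a different order rounds differently, while any fixed order is fully deterministic. Here is a minimal NumPy sketch of that effect (my own illustration, not code from the original experiments):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000).astype(np.float32)

    # One reduction order: plain sequential accumulation.
    seq = np.float32(0.0)
    for v in x:
        seq += v

    # A different reduction order: per-chunk partial sums, roughly what a
    # different batch size or a tensor-parallel split would induce.
    chunked = np.float32(0.0)
    for chunk in x.reshape(100, 100):
        chunked += chunk.sum(dtype=np.float32)

    print(seq == chunked)  # typically False: a different order rounds differently
    print(seq, chunked)    # yet each value is bitwise identical across re-runs

Which order a real kernel uses depends on details like batch size, parallelism layout, and the kernel library version, which is why those changes show up in the list above while re-running the same fixed setup does not.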

In a world with demand for assurance against hidden large-scale ML hardware use, this could become a new layer of defense, conditional on some engineering to make it deployment-ready.
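
As a rough sketch of how a verifier might use such fingerprints (my own illustration of the idea, not the author's tooling; run_declared_setup and audit_prompt below are hypothetical stand-ins):

    import hashlib
    import numpy as np

    def fingerprint(tensor: np.ndarray) -> str:
        """Hash the exact bytes of an output tensor, so any bitwise
        difference in the numerics yields a different digest."""
        return hashlib.sha256(np.ascontiguousarray(tensor).tobytes()).hexdigest()

    # Hypothetical usage: re-run a fixed audit prompt on a trusted replica of
    # the declared setup, then compare digests bit-for-bit with the audited run.
    # reference = fingerprint(run_declared_setup(audit_prompt))
    # reported  = fingerprint(audited_run_outputs)
    # A mismatch suggests the hardware, software stack, or workload differs
    # from what was declared.

Because results were bitwise-reproducible within a fixed setup, an exact-match check like this is meaningful; the hard part, as the post notes, is the engineering needed to make it deployment-ready.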

The full post can be found on my Substack.

This work was part of my technical AI governance research at MATS (ML Alignment & Theory Scholars). Special thanks go to Mauricio Baker for his excellent mentoring and guidance, and to Elise Racine for her support and helpful advice.



Discuss

Karlsruhe - If Anyone Builds It, Everyone Dies

2025-12-07 22:49:31

Published on December 7, 2025 2:49 PM GMT

What

I will give a short summary of the book “If Anyone Builds It, Everyone Dies” by Eliezer Yudkowsky and Nate Soares on the topic of existential risk due to AI (artificial intelligence). Afterwards, we will discuss thoughts and objections in small groups.

Feel free to join even if you have not yet read the book yourself.

When & Where

I reserved a small group study room in the new computer science building close to the tram stop Durlacher Tor: 5. Gruppenräume InformatiKOM (Room 203 (Tisch)), Adenauerring 12, 76131 Karlsruhe, DE.

Let's meet at 18:30.



Discuss

Eliezer's Unteachable Methods of Sanity

2025-12-07 10:46:45

Published on December 7, 2025 2:46 AM GMT

"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, "By having a great distaste for drama, and remembering that it's not about me." The journalists don't understand that either, but at least I haven't wasted much time along the way.

Actual LessWrong readers also sometimes ask me how I deal emotionally with the end of the world.

I suspect a more precise answer may not help.  But Raymond Arnold thinks I should say it, so I will say it.

I don't actually think my answer is going to help.  Wisely did Ozy write, "Other People Might Just Not Have Your Problems."  Also I don't have a bunch of other people's problems, and other people can't make internal function calls that I've practiced to the point of hardly noticing them.  I don't expect that my methods of sanity will be reproducible by nearly anyone.  I feel pessimistic that hearing about them will help.  Raymond Arnold asked me to speak them anyways, so I will.


Stay genre-savvy / be an intelligent character.

The first and oldest reason I stay sane is that I am an author, and above tropes.  Going mad in the face of the oncoming end of the world is a trope.

I consciously see those culturally transmitted patterns that inhabit thought processes aka tropes, both in fiction, and in the narratives that people try to construct around their lives and force their lives into.

The trope of somebody going insane as the world ends, does not appeal to me as an author, including in my role as the author of my own life.  It seems obvious, cliche, predictable, and contrary to the ideals of writing intelligent characters.  Nothing about it seems fresh or interesting.  It doesn't tempt me to write, and it doesn't tempt me to be.

It would not be in the interests of an intelligent protagonist to amplify their own distress about an apocalypse into more literarily dramatic ill-chosen behavior.  It might serve the interests of a hack author but it would not help the character.  Understanding that distinction is the first step toward writing more intelligent characters in fiction.  I use a similar and older mental skill to decide which tropes to write into the character that is myself.

This sense, which I might call genre-savviness about the genre of real life, is historically where I began; it is where I began, somewhere around age nine, to make choices about not becoming the boringly obvious dramatic version of Eliezer Yudkowsky that a cliche author would instantly pattern-complete about a literary character facing my experiences.  Specifically, though I expect this specific to mean nothing to a supermajority of you, I decided that as a relatively smart kid I would not become Raistlin Majere, nor ever exhibit a large collection of related tropes.

The same Way applies, decades later, to my not implementing the dramatic character a journalist dreams up -- a very boring and predictable pattern-completion of a character -- when they dream up a convenient easy-to-write-about Eliezer Yudkowsky who is a loudly tortured soul about his perception of the world's end approaching along its default course.

"How are you coping with the end of the world?" journalists sometimes ask me, and I reply "I have a great distaste for drama", but the actual answer is "I am a better writer than you, and I decided not to write myself as that incredibly cliche person that would be easy and convenient for you to write about."

"Going insane because the world is ending" would be a boring trope and beneath my dignity to choose as my actual self's character.


Don't make the end of the world be about you.

"How are you coping with the end of the world?" journalists sometimes ask me, and I sometimes reply, "By remembering that it's not about me."  They have no hope of understanding what I mean by this, I predict, because to them I am the subject of the story and it has not occurred to them that there's a whole planet out there too to be the story-subject.  I think there's probably a real sense in which the Earth itself is not a real thing to most modern journalists.

The journalist is imagining a story that is about me, and about whether or not I am going insane, not just because it is an easy cliche to write, but because personality is the only real thing to the journalist.

This is also a pattern that you can refuse, when you write the story that is yourself; it doesn't have to be a story that is ultimately about you.  It can be about humanity, humane preferences, and galaxies.  A sentence about snow is words, is made of words, but it is about snow.  You are made of you, but you don't need to be all about yourself.

If I were to dwell on how it impacted me emotionally that the world was ending, I would be thinking about something which genuinely doesn't matter to me very much compared to how the world is ending.  Having dramatic feelings is not mostly what I am about -- which is partly how I ended up being not much made of them, either; but either way, they're not what I'm about.

So long ago that you probably can't imagine what it was like back then, not just before ChatGPT but years before the age of deep learning at all, there was a person who thought they were like totally going to develop Artificial General Intelligence. Then they ran into me; and soon after, instead started agonizing about how they had almost destroyed the world.  Had they actually been that close to success?  Of course not.  But I don't relate to status as most people do, so that part, the status-overreach, wasn't the part I was rolling my eyes about.  It is not the sort of epistemic prediction error that I see as damnable in the way that a status-regulator sees it as the worst thing in the world; to underestimate oneself is no more virtuous than to overestimate oneself.  Rather, I was rolling my eyes about the part that was a more blatant mistake, completely apart from the epistemic prediction error they probably couldn't help; the part that would have been a mistake even if they had almost destroyed the world.  I was rolling my eyes about how they'd now found a new way of being the story's subject.

Even if they had almost destroyed the world, the story would still not properly be about their guilt or their regret, it would be about almost destroying the world.  This is why, in a much more real and also famous case, President Truman was validly angered and told "that son of a bitch", Oppenheimer, to fuck off, after Oppenheimer decided to be a drama queen at Truman.  Oppenheimer was trying to have nuclear weapons be about Oppenheimer's remorse at having helped create nuclear weapons.  This feels obviously icky to me; I would not be surprised if Truman felt very nearly the same.

And so similarly I did not make a great show of regret about having spent my teenage years trying to accelerate the development of self-improving AI.  Was it a mistake?  Sure.  Should I promote it to the center of my narrative in order to make the whole thing be about my dramatic regretful feelings?  Nah.  I had AGI concerns to work on instead.  I did not neglect to conduct a review of what I did wrong and update my policies; you know some of those updates as the Sequences.  But that is different from re-identifying myself as a dramatic repentant sinner who had thereby been the story's subject matter.

In a broadly similar way:  If at some point you decide that the narrative governing your ongoing experience will be about you going insane because the world is ending:  Wow, congratulations at making the end of the world still be about you somehow.


Just decide to be sane, and write your internal scripts that way.

The third way I stay sane is a fiat decision to stay sane.

My mental landscape contains that option; I take it.

This is the point I am even less expecting to be helpful, or to correspond to any actionable sort of plan for most readers.

I will nonetheless go into more detail that will probably not make any sense.

Besides being a thing I can just decide, my decision to stay sane is also something that I implement by not writing an expectation of future insanity into my internal script / pseudo-predictive sort-of-world-model that instead connects to motor output.

(Frankly I expect almost nobody to correctly identify those words of mine as internally visible mental phenomena after reading them; and I'm worried about what happens if somebody insists on interpreting it anyway.  Seriously, if you don't see phenomena inside you that obviously look like what I'm describing, it means, you aren't looking at the stuff I'm talking about.  Do not insist on interpreting the words anyway.  If you don't see an elephant, don't look under every corner of the room until you find something that could maybe be an elephant.)

One of the ways you can get up in the morning, if you are me, is by looking in the internal direction of your motor plans, and writing into your pending motor plan the image of you getting out of bed in a few moments, and then letting that image get sent to motor output and happen.  (To be clear, I actually do this very rarely; it is just a fun fact that this is a way I can defeat bed inertia.)

There are a lot of neighboring bad ideas to confuse this with.  The trick I'm describing above does not feel like desperately hyping myself up and trying to believe I will get out of bed immediately, with a probability higher than past experience would suggest.  It doesn't involve lying to myself about whether I'm likely to get up.  It doesn't involve violating the epistemic-instrumental firewall (factual questions absolutely separated from the consequences of believing things), to give myself a useful self-fulfilling prophecy.  It is not any of the absurd epistemic-self-harming bullshit that people are now flogging under the brand name "hyperstition", since older names like "chaos magick" or "lying to yourself" became less saleable.  I still expect them to point to this and say, "Why, of course that is the same thing I am selling to you as 'hyperstition'!" because they would prefer not to look at my finger, never mind being able to see where I'm pointing.

With that said:  The getting-out-of-bed trick involves looking into the part of my cognition where my action plan is stored, and loading an image into it; and because the human brain's type system is a mess, this has the native type-feeling of an expectation or prediction that in a few seconds I will execute the motor-plan and get out of bed.

That I am working with cognitive stuff with that type-feel, is not the same thing as lying to myself about what's likely to happen; no, not even as a self-fulfilling prophecy.  I choose to regard the piece of myself whose things-that-feel-like-predictions get sent as default motor output, as having the character within my Way of a plan I am altering; rather than, you know, an actual mistaken prediction that I am believing.  If that piece of myself gets to have me roll out of bed, I get to treat it as a plan rather than as a prediction.  It feels internally like a prediction?  Don't believe everything you feel.  It's a pseudo-model that outputs a pseudo-prediction that does update in part from past experience, but its actual cognitive role is as a controller.

The key step is not meditating on some galaxy-brained bullshit about Löb's Theorem, until you've convinced yourself that things you believe become true.  It's about being able to look at the internal place where your mind stores a pseudo-predictive image of staying in bed, and writing instead a pseudo-prediction about getting out of bed, and then letting that flow to motor output three seconds later.

It is perhaps an unfortunate or misleading fact about the world (but a fact, so I deal with it), that people telling themselves galaxy-brained bullshit about Löb's Theorem or "hyperstition" may end up expecting that to work for them; which overwrites the pseudo-predictive controlling output, and so it actually does work for them.  That is allowed to be a thing that is true, for reality is reality.  But you don't have to do it the scrub's way.

Perceiving my internal processes on that level, I choose:

I will not write internal scripts which say that I am supposed to / pseudo-predict that I will, do any particular stupid or dramatic thing in response to the end of the world approaching visibly nearer in any particular way.

I don't permit it as a narrative, I don't permit it as a self-indulgence, and I don't load it into my pseudo-predictive self-model as a pending image that gets sent by default to internal cognitive motor outputs.

If you go around repeating to yourself that it would be only natural to respond to some stressful situation by going insane -- if you think that some unhelpful internal response is the normal, the default, the supposed-to reaction to some unhelpful external stimulus -- that belief is liable to wire itself in as being also the pseudo-prediction of the pseudo-model that loads your default thoughts.

All of this is not to be confused with the Buddhist doctrine that every form of negative internal experience is your own fault for not being Buddhist enough.  If you rest your hand on a hot stove, you will feel pain not because your self-pseudo-model pseudo-predicts this to be painful, but because there's direct nerves that go straight to brain areas and trigger pain.  The internal mechanism for this does not depend on a controlling pseudo-prediction, it just falls downward like a stone under gravity.  The same directness is allowed to be true about suffering and not just pain; if there's a clever way to overwrite pseudo-predictions of suffering and thereby achieve Buddhist indifference to bad things, I don't have it as a simple obvious surface lever to pull.  I also haven't chosen to go looking for a more complicated or indirect version of it.  I do not particularly trust that to end well.

But I do think there are various forms of drama, error, and insanity which are much more like "things people do because they expected themselves to do it"; and much less like the pain, or suffering, from burning your hand.

One could incorrectly summarize all this as "I have decided not to expect to go insane," but that would violate the epistemic-instrumental firewall and therefore be insane.


There's an edition of Dungeons and Dragons that has a god of self-improvement, called Irori.  My fanfictions sometimes include characters that worship Him (heresy), or seek what He sought (approved).

Irori's religion -- on my version, that is -- has mottos like, "You don't have problems, you have skill issues."  Irorians can be a bit harsh.

But even if something is a skill issue, that doesn't mean you have the skill, nor know how to solve it.

When an Irorian calls something a skill issue, they're not instructing you to feel bad about having not solved it already.

They are trying to convey the hope that it is solvable.

Doing crazy things because your brain started underproducing a neurotransmitter is a problem.  It wouldn't be very Irorian to tell you that you can't solve it just through even clearer thinking; but if there's a medication that directly fixes the problem, that is probably easier and faster and more effective.  Also, this isn't Dungeons and Dragons, Irori isn't real, and possibly you genuinely can't solve a neurotransmitter problem by thinking at it.

Doing crazy things because the world is ending is a skill issue.


These then are Eliezer Yudkowsky's probably-irreproducible ways of staying sane as the world seems more visibly close to ending:

A distaste for the boringly obvious trope of a character being driven mad by impending doom;

Not making the story be all about me, including my dramatically struggling to retain my sanity;

And a fiat decision to stay sane, implemented by not instructing myself that any particular stupidity or failure will be my reaction to future stress.


Probably you cannot just go do those three things.

Then figure out your own ways of staying sane, whether they be reproducible or irreproducible; and follow those ways instead.

The reason that I tell you of my own three methods, is not to provide an actionable recipe for staying sane as the world begins to seem visibly closer to ending.

It is an example, a reminder, and maybe even an instruction to the part of yourself that produces self-pseudo-predictions that get loaded as your internal mental behavior:

Sanity is a skill issue.



Discuss

Ordering Pizza Ahead While Driving

2025-12-07 10:01:09

Published on December 7, 2025 2:01 AM GMT

On a road trip there are a few common options for food:

  • Bring food
  • Grocery stores
  • Drive throughs
  • Places that take significant time to prepare food

Bringing food or going to a grocery store are the cheapest (my preference!) but the kids are hard enough to feed that we often buy prepared food when we're traveling. [1] And they often prefer food that takes a while to make (usually pizza) over what you can get in a drive through. A couple years ago I realized there's another option: calling in an order for pickup to where you'll be soon.

We'll use Google Maps "search along route" to identify a place ~30min out, and phone in an order. [2] By the time we arrive, the food is ready. We can combine the speed (and buffer maximization) benefits of drive throughs, with the variety of options from the wide range of restaurants that offer pickup.


[1] I'm also working on getting them to do better with brought food, but I'm focusing on lunch at school here because that's a much larger portion of their food away from home.

[2] It kind of amazes me that pizza places will take the costly action of preparing a pizza to my specifications based on a simple phone call, with no contact information beyond me giving a first name. I mean, it's great, but like so many things in our society it only works because there are extremely few people who want to cause havoc and are willing to put any effort into doing so.

Comment via: facebook, mastodon, bluesky



Discuss

Existential despair, with hope

2025-12-07 04:48:33

Published on December 6, 2025 8:48 PM GMT

I have drafted thousands of words of essays on the topic of art and why it sustains my soul in times of despair, but none of it comes close to saying what I want to say, yet.

Therefore, without explanation, on the occasion of Berkeley’s Winter Solstice gathering, I will simply offer a link to a little-known work of art which has been for me a touchstone in my times of greatest grief and existential despair:

https://www.viselaya.org/sculptures/chandala/chandala.html

May it bring some of you something you need today.



Discuss

I Need Your Help

2025-12-07 02:48:06

Published on December 6, 2025 6:48 PM GMT

Hello LessWrong!

I am Jaivardhan Nawani, a 14-year-old enthusiast of the ideas and mental models that stem from rationality. I want to spread this way of thinking amongst people my age.

I am by no means an expert, and I need your help and experience. I am working on setting up a course with simple teaching aids and handouts for lessons, run in multiple batches. While sourcing students is not a problem on its own, it helps to have people who actively work in or research this field on my side to teach. It would be amazing if anyone were willing to connect me to a university or a professor to whom I could pitch this. Even students actively studying this subject would be ideal.

Any more general advice from those who have done this already is always welcome.

Thanks in advance for all of your help!



Discuss