Published on July 2, 2025 3:00 AM GMT
I wonder if there's clear evidence that LessWrong text has been included in LLM training.
Claude seems generally aware of LessWrong, but it's difficult to distinguish between "this model has been trained on text that mentions LessWrong" and "this model has been trained on text from LessWrong."
Related discussion here, about preventing inclusion: https://www.lesswrong.com/posts/SGDjWC9NWxXWmkL86/keeping-content-out-of-llm-training-datasets
Published on July 2, 2025 2:58 AM GMT
The first in a series of bite-sized rationality prompts[1].
This is my most common opening-move for Instrumental Rationality. There are many, many other pieces of instrumental rationality. But asking this question is usually a helpful way to get started. Often, simply asking myself "what's my goal?" is enough to direct my brain to a noticeably better solution, with no further work.
I'm playing Portal 2, or Baba is You. I'm fiddling around with the level randomly, sometimes going in circles. I notice I've been doing that awhile.
I ask "what's my goal?"
And then my eyes automatically glance at the exit for the level and realize I can't possibly make progress unless I solve a particular obstacle, which none of my fiddling-around was going to help with.
I'm arguing with a person, poking holes in their position. I easily notice considerations that feel really compelling to me and reinforce that my side is correct. I notice it's getting heated.
I ask "what's my goal?"
And then realize I had basically been trying to look smart and "win" some kind of imaginary points. But what I actually needed to do was either change their mind, which requires modeling their current beliefs and figuring out what would actually persuade them, not what cached beliefs are compelling to me.
(Or, it requires de-escalating the tension and making it feel safe for them to consider changing their mind, whereas I'd previously been actively implying that if I was right, they were dumb and should feel dumb, and nobody wants to feel that way.)
((Or, maybe... I actually want to learn and potentially change my mind about something, and repeating reasons that I already think I'm right isn't really helping anyone))
I'm writing up a proposal for what my team should do next. I'm fleshing out a lot of details, getting excited about possibilities. I notice I've spent a couple days on this.
I ask "what's my goal?"
And the answer is "help my boss figure out what we should do next." My boss has given me a couple weeks to come up with some pitches for what to do. And I realize:
a) for the first week, I should probably be doing a lot more exploring of radically different ideas rather than latching onto the first one I came up with.
b) I should be anticipating what my boss's objections will be, since it's ultimately his call.
I begin strategically directing my attention to areas that will help me find more novel ideas, and asking "what would be the fastest way to figure out if this idea is any good?"
When is it a good time to ask "what's my goal?"?
One good time is "when you are first starting a project, or planning your day/week", so you can just skip past a lot of meandering, unstrategic false starts on the wrong problem.
(Sometimes, at the beginning of a project, you will decide "my goal is to figure out what-the-hell-I'm-even-doing, and the best way to do that is via meandering play without stressing about goals." But, like, now you've double-checked.)
Another good time is "when you get a nagging feeling that you're doing the wrong thing" or "you notice you sure have been spending a lot of time on this particular step, and are feeling bored or frustrated."
Note: if you don't feel like doing all of these, just pick one:
Come up with 1-2 recent times you would have benefited from asking yourself "what's my goal?" earlier.
Come up with 1-2 projects/actions you're currently in the middle of, where you have a nagging suspicion you're not really tackling the right goal.
Come up with 1-2 moments you expect to run into in the next week where it seems likely you're going to go tackle something without really understanding your goals and predictably waste some time.
See also "summoning sapience" and "Humans are not automatically strategic."
I am experimenting with the mantra "your rationality paradigm is not complete until it is decomposed into simple sentences that people can immediately understand."
Published on July 2, 2025 2:43 AM GMT
Dimensionalization is a decision-making method that ranks choices based on their attributes. It is effective but high-effort, useful only in the highest-leverage contexts.
Modern AI tools (ChatGPT, Claude, etc.) have dimensionalization as a native capability. It just needs to be named and requested.
You can dimensionalize any decision, objective, or context in your work or life in three steps, all doable in your chat app:
Pick any topic where you are uncertain.
LLMs make overkill cheap; use your genius-on-tap for trivial problems if you want. There is also no longer any such thing as a problem that is too hard for AI to help with.
The keyword is “dimensionalize”. Try these prompts:
I am trying to <goal>. Help me dimensionalize.
<goal>: “do things that matter”, “make a new friend”, “boost viewership”, “renovate my space”, or “feel healthier”.
I’m deciding whether to <choice> or <choice>. Help me dimensionalize.
<choice>: “buy or rent my next flat”, “go out or stay in”, or “find a new job or stay at my current one”.
I want to better understand <concept>. Help me dimensionalize.
<concept>: “lighting design”, “what I should do with my free time”, “being happy”, or “popularity”.
You can certainly add more context to your request for personalization. But this is enough to get started.
I wrote my piece on Dimensionalization in part to help AIs do it better. It doesn’t matter whether you’ve read it; it can still help you.
Try this addition:
Read this article ⤵️. Make sure the dimensions you create are high-Fidelity, high-Leverage, and low-Complexity. Consider the sub-dimensions of Fidelity, Leverage, and Complexity while doing this. https://www.lesswrong.com/posts/LSFiKt4zGxXcX2oxi/dimensionalization
This is optional, but you may find that you get easier-to-action results by linking explicitly to the article.
Copy+Paste. Press Enter. Now you have the scaffolding of a great decision.
The AI will generate a framework for how to think about your decision, concept, or goal. But it doesn’t have to be you doing the thinking. The AI is the perfect user of its own framework!
Some example responses you can copy+paste:
List 25 options, each ranked on these dimensions. Sort by the highest weighted average score.
I don’t understand what you mean by <confusing dimension>. Explain and show me what the range looks like.
What happens if <thing you want to avoid>?
You’re missing <crucial attribute>; tell me where it fits or redo the framework.
This is totally wrong. Redo it +1sd smarter.
Each loop takes ~15 seconds, so it’s low-cost/high-reward to notice something you want to fix or clarify.
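If you want to see the arithmetic that the "highest weighted average score" prompt is asking the model to do, here is a minimal Python sketch. The dimensions, options, scores, and weights are purely illustrative assumptions of mine, not output from o3.

```python
# Minimal sketch of the "rank by highest weighted average score" step.
# All dimensions, options, scores, and weights below are illustrative
# assumptions, not actual model output.

weights = {
    "value_capture": 0.3,
    "impact": 0.3,
    "optionality": 0.2,
    "excitation": 0.2,
}

# Each option is scored 0-5 on every dimension.
options = {
    "join an early-stage startup": {"value_capture": 3, "impact": 4, "optionality": 5, "excitation": 5},
    "stay in current role":        {"value_capture": 4, "impact": 2, "optionality": 2, "excitation": 2},
    "start my own company":        {"value_capture": 2, "impact": 5, "optionality": 5, "excitation": 5},
}

def weighted_average(scores: dict, weights: dict) -> float:
    """Weighted average of one option's dimension scores."""
    return sum(scores[d] * w for d, w in weights.items()) / sum(weights.values())

# Sort options by weighted average, highest first.
ranked = sorted(options.items(), key=lambda kv: weighted_average(kv[1], weights), reverse=True)

for name, scores in ranked:
    print(f"{weighted_average(scores, weights):.2f}  {name}")
```

Tweaking the weights and re-running is the same move as the "tweak weights as you like and re-rank in-place" step in the transcript below.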
Jordan:
I am trying to choose the right next career step. Help me dimensionalize. Read this article ⤵️. Make sure the dimensions you create are high-Fidelity, high-Leverage, and low-Complexity. Consider the sub-dimensions of Fidelity, Leverage, and Complexity while doing this. https://www.lesswrong.com/posts/LSFiKt4zGxXcX2oxi/dimensionalization
o3:
See response below.[1]
Jordan:
List 25 options, each ranked on these dimensions. Sort by the highest weighted average score.
o3:
just dropped an interactive table with 25 plausible career moves, each scored on the seven dials and pre-sorted by their unweighted average (weights = equal for now). top chunk predictably skews founder-ish—high autonomy + excitation punch through despite lifestyle tanking. scan / tweak weights as you like and re-rank in-place. want a different weighting scheme, more granular scores, or a prune to contenders? ping me.
Jordan:
I want to better understand music theory. Help me dimensionalize. Read this article ⤵️. Make sure the dimensions you create are high-Fidelity, high-Leverage, and low-Complexity. Consider the sub-dimensions of Fidelity, Leverage, and Complexity while doing this. https://www.lesswrong.com/posts/LSFiKt4zGxXcX2oxi/dimensionalization
o3:
See below.[2]
Jordan:
You're missing something related to major/minor or emotional affect. Tell me where it fits or redo the framework.
o3:
Jordan:
Rank Radiohead's "No Surprises" on each dimension, dropping timbral color wheel and keeping valence polarity.
o3:
Jordan:
Give me another song with the same ratings along each dimension.
o3:
Jordan:
Now give me one that is 1 point of valence polarity lower but otherwise the same.
o3:
“mad world” – the gary jules / michael andrews cover (donnie darko ost). knobs: …
Skeptical? Give “No Surprises”, “New Slang”, and “Mad World” a listen.
That’s it.[3] No fancy schemas, no bespoke tooling. The magic is that the LLM knows every efficient frontier, beckoning and reshaping them on command.
The rest is you deciding when the map feels true enough to act.
here’s a minimal-but-sufficient dial-set for “what job next?” each dial is a slider you can actually move via what offers you accept / reject. i’ve sanity-checked against fidelity (tracks reality), leverage (you can yank it), and complexity (small enough to hold in working memory) per the lesswrong recipe.
1. value-capture efficiency
how many post-tax, post-non-compete dollars accrue per unit of sweat + risk you personally supply.
2. delta-irled impact
“how much counterfactual good stuff in the world exists because i showed up here?” think marginal value add, not vanity metrics.
3. option surface area
future doors the gig unlocks (network nodes, brand signalling, resume viscosity). optionality is real capital.
4. personal excitation gradient
will you wake up buzzing to push on it for >18 months? intrinsic + aesthetic alignment.
5. autonomy bandwidth
degrees of freedom in decision-making, budget control, hiring, toolchain.
6. skill compounding rate
how steeply does the role grow rare, transferrable circuits in your brain? (e.g. RLHF infra, capital-structure alchemy, government sales).
7. lifestyle stability envelope
stress, hours variance, geo constraints, moral injury risk; keeps the rest sustainable.
next moves
questions: which dial feels hardest to score rn, and why? that’s your info-gathering sprint.
music theory’s too sprawling to slice along one master axis, so here’s a lean-but-punchy dial-deck—each dial scores high-Fidelity (it tracks something real), high-Leverage (you can tweak it in practice and it actually changes the music), and stays low-Complexity (you can keep the set in short-term ram). numbers in ⟦brackets⟧ are rough 0-5 ratings for fidelity / leverage / complexity.
1. tonal gravity ⟦5 / 5 / 1⟧
where do pitches “want” to fall? twist the knob from strict functional tonality (tonic–dominant magnetism) toward modal, then free-tonal, then atonal. one slider covers centuries; you can feel it immediately on a keyboard.
2. harmonic motion per breath ⟦5 / 4 / 2⟧
chords-per-phrase. slow = hymn, medium = pop, frantic = bebop. easy to alter (substitutions, reharm) and listeners notice instantly.
3. dissonance budget ⟦4 / 4 / 2⟧
percent of “unstable” intervals (tritone, minor 2nd, etc.) before you resolve. tiny tweak = big affect. lever shows up in orchestration, voicing, even mixing choices.
4. metric liquidity ⟦4 / 5 / 1⟧
how rigid is the time grid? slider from click-track square (4/4, quantized) → swung → additive / mixed meters → rubato. drummer, sequencer, or conductor can move it live.
5. motif entropy ⟦5 / 3 / 2⟧
repeat-vs-transform ratio of your core melodic cell. low entropy = chant; mid = hooky pop; high = late-romantic fragmentation; max = free improv. knob doubles as “listener memorability.”
6. textural bandwidth ⟦4 / 4 / 2⟧
density of simultaneous parts: solo line → homophony → polyphony → wall-of-sound. orchestration and DAW track count are literal dial turns.
7. timbral color wheel position ⟦3 / 3 / 2⟧
bright-metallic ↔ dark-wooden spectrum. swap instruments, filter eq, or change playing technique; payoff in emotional shading.
8. tension-release half-life ⟦4 / 4 / 2⟧
how long the piece lets suspense simmer before resolving (seconds? minutes? never?). quick knob gives pop-drop; long gives post-rock crescendo.
9. narrative form granularity ⟦5 / 3 / 3⟧
number of distinct formal blocks you treat as atomic (a section, a phrase group, a single bar). fewer = classical sonata; many micro-blocks = modern film scoring. adjust by editing arrangement markers.
why these nine? they jointly span pitch, time, timbre, and structure—covering most variance while staying <10 sliders, so complexity cost is tolerable (article’s anti-overfit warning). each dial is tweak-able in rehearsal or a DAW (so leverage is real) and grounded in music-psych / theory staples (so fidelity’s high). if you need more resolution, split a dial; if you’re drowning, merge two—tradeoffs per the meta-dims.
next move: pick a piece you love, score it 0-5 on each dial, then mock-tweak one dial at a time and predict the audible result. that’s dimensionalization doing work.
Yes, you can layer on weightings and sensitivity analysis, metrics, constraints, adversarial passes, visualization, etc.
But the breakthrough moment is realizing you can offload the first 80+% of cognitive grunt work for the price of a three-sentence prompt.
Published on July 2, 2025 1:46 AM GMT
Over the past 6-8 months, I have been involved in drafting AI policy recommendations and official statements directed at governments and institutions across the Global Majority: Chile, Lesotho, Malaysia, the African Commission on Human and Peoples' Rights (ACHPR), Israel, and others. At first glance, this may appear to be a less impactful use of time compared to influencing more powerful jurisdictions like the United States or the European Union. But I argue that engaging with the Global Majority is essential, neglected, and potentially pivotal in shaping a globally safe AI future. Below, I outline four core reasons.
As global alignment becomes harder, we need decentralized, national-level safety nets. Some things to keep in mind:
In such a world, country-level laws and guidance documents serve as a final line of retreat. Even modest improvements in national frameworks can meaningfully reduce the risk of AI misuse, particularly in high-leverage areas like biometric surveillance, automated welfare allocation, or predictive policing.
Moreover, in many Global Majority countries, the state remains the most powerful actor. When risks emerge, it is not always corporations but often ministries, police departments, or public-sector procurement decisions that determine outcomes. Consider the history of state-led atrocities enabled by surveillance or classification systems. Examples include Rwanda’s classification systems (during the 1994 genocide), which used bureaucratic data categories to identify targets, and Apartheid-era South Africa, which collected data to enforce racial segregation. Engaging with governments, building governance capacity, and creating public-sector-specific guardrails are therefore critical.
Engagement with Global Majority AI policy is still deeply neglected:
This creates significant leverage for meaningful influence. A single well-argued submission can shape a ministry’s perception of risks or clarify foundational governance issues. One doesn’t need millions in funding or a permanent office in the capital. In many cases, simple engagement and public comment can go a long way.
Finally, championing safety-focused norms in the Global Majority may help lay the groundwork for a more robust and inclusive international framework for AI governance. Many countries in the Global South, especially middle powers (e.g. South Africa, Brazil, Indonesia), have historically played important convening roles in other domains (e.g. the Non-Aligned Movement, BRICS, the Cartagena Protocol).
In future AI governance scenarios, these countries could serve as trusted mediators between Global North and Global South perspectives.
AI risks are not monolithic, and AI governance must cover the spectrum of risks. The challenges posed by AI systems in Lagos or Kuala Lumpur differ significantly from those in London or Brussels. Factors such as:
Global frameworks like the EU AI Act or the OECD AI Principles often assume certain levels of institutional maturity or civil liberties protections. These assumptions can fall short.
Consider the case of autonomous weapon systems (AWS): over 80% of global conflicts over the past decade have occurred in Global South regions, and more than 90% of the countries ACLED lists at extreme, high, or turbulent levels of conflict are in the Global Majority. While the development of AWS is typically concentrated in technologically advanced countries, deployment is more likely to happen in Global Majority countries. These environments often serve as the testing ground for cutting-edge military technologies without meaningful global scrutiny. Western policy frameworks rarely prioritize this asymmetric risk, in part because the worst consequences will not be felt domestically.
Published on July 2, 2025 12:55 AM GMT
This is a follow-up to my earlier post about designing a Winter Solstice gathering that combined Rationalist Solstice traditions with local Māori Matariki practices. Here's what I learned from actually running the event.
TL;DR: People wanted structured conversation more than curated performance. Starting with collective acknowledgment of loss made subsequent vulnerability feel natural. Social coordination mechanics are harder than they look, but small-scale practice matters for larger coordination challenges.
Having grown up in a religious family, I fondly remember meaningful seasonal gatherings from my childhood, but I was no longer getting anything like them. Living in New Zealand, I wanted to create something that honored both Rationalist Solstice traditions and local Matariki practices without falling into either cultural appropriation or forcing cringy fake rituals on people.
My design principles were:
The evening progressed through Acts (Golden Hour, Twilight, Sunset, Nightfall, "Darkest Night", and "Dawn") with ~27 guests over 3 hours.
The evening's keystone turned out to be our opening "Remembrance of the Dead" ritual. We started by explaining Matariki/Pleiades astronomy, then transitioned to acknowledging "the company we've lost."
The mechanics: Table by table, people came forward. Anyone who wished could speak the name of someone who had died, share why they remember that person, and then extinguish a lamp. No obligation to speak—silence was explicitly permitted as a form of honoring others. As each light went out, the room grew dimmer.
This worked because:
Physical embodiment: Each name visibly changed the environment
Starting with acknowledged loss made everything else possible. When we later asked deeper questions to be shared, it felt natural rather than forced.
I spent enormous energy curating music for each transition. This largely failed. People mostly wanted to chat with their neighbors, and the room acoustics made careful listening impossible anyway.
What actually worked: Structured conversation prompts that progressed from safe to vulnerable:
The progression gave people permission to open up while maintaining safety. Background music was fine, but 30% of my preparation effort would have achieved 90% of the value.
I wanted to give people the opportunity to interact with more people over the night by reshuffling table conversations a few times during the evening. The goal worked—people met new faces and practiced small-scale cooperation. The execution was clunky.
What I tried:
Failure modes:
Better approaches for next time: Random selection (birthday months, card draws, etc.) rather than requiring tables to self-select leaders. Clearer signaling for transitions—maybe a dinner bell rather than trying to speak over conversations.
I expected musical transitions to naturally guide people between acts. Instead, we had to interrupt conversations and call for attention each time. This felt jarring and exposed a fundamental tension: people wanted to participate, not perform or be performed to.
The co-hosted presentation elements (alternating speaking with a different person for each Act) worked better than solo delivery, but still felt scripted. Next time, less performance, more facilitation.
Our culminating ritual involved forming a circle, my sharing reflections on mortality and existential risk, then sitting in complete darkness before relighting candles together. [Full text of my reflection available here]
This partially worked—the content addressed real fears (AI alignment, climate change, personal struggles) and the physical darkness created a powerful shared experience. But my expectation that people would naturally share after my reflection didn't materialize. 25 people in a circle is too large for spontaneous vulnerability without more preparation.
Helen and Tessa spent most of their evening managing food service instead of participating in conversations. Hiring help would be worth it so the hosts can actually attend their own event.
The most valuable insight: people wanted guided conversation more than curated experience. They appreciated framing and permission to go deeper, but they wanted to create meaning with each other, not receive it from a presenter.
This challenges assumptions about what makes events "meaningful." The structure enabled authentic interaction rather than creating it.
Beginning with mortality acknowledgment rather than working up to it created psychological safety for later vulnerability. When you've already said the names of the dead together, discussing personal struggles feels manageable.
The table mixing activities were clunky but valuable. We can't coordinate at larger scales unless we know how to gather around fires together. Even failed attempts teach us something about human coordination patterns. More could be done here.
People will remember the moment lights went out with each name spoken. They'll remember holding candles in complete darkness. The embodied experiences matter more than the words.
25+ people spent three hours in deeper discussions with their friends and neighbors than we normally find ourselves in during the rest of the year - processing mortality, existential risk, and personal struggles together. We're planning to continue annually, incorporating these lessons.
The broader question: What role should secular ritual play in rationalist community building? If we need coordination to solve civilizational challenges, we need practice spaces for learning how to coordinate. Seasonal gatherings might be one useful approach.
I used Claude extensively as a research assistant and drafting tool for both planning and writing about this event, while taking care to review and rewrite everything to ensure it represented my actual experience and values.
Published on July 2, 2025 12:05 AM GMT
I want to retain the ability to update my values over time, but I don’t want those updates to be the result of manipulative optimization by a superintelligence. Instead, the superintelligence should supply me with accurate empirical data and valid inferences, while leaving the choice of normative assumptions—and thus my overall utility function and its proxy representation (i.e., my value structure)—under my control. I also want to engage in value discussions (with either humans or AIs) where the direction of value change is symmetric: both participants have roughly equal probability of updating, so that persuasive force isn’t one-sided. This dynamic can be formally modeled as two agents with evolving objectives or changing proxy representations of their objectives, interacting over time.
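As a toy illustration of the symmetry condition (my own sketch, not a formal model proposed here), each agent's value structure could be represented as a numeric vector, with every round of discussion giving both agents the same probability and step size for moving toward the other:

```python
import random

# Toy sketch of symmetric value updating between two agents.
# The vector representation, update probability, and step size are
# illustrative assumptions, not a formal model from this post.

def symmetric_exchange(values_a, values_b, p_update=0.5, step=0.1):
    """One round of discussion: each agent independently moves toward the
    other's values with the same probability and the same step size."""
    new_a, new_b = list(values_a), list(values_b)
    for i in range(len(values_a)):
        if random.random() < p_update:
            new_a[i] += step * (values_b[i] - values_a[i])
        if random.random() < p_update:
            new_b[i] += step * (values_a[i] - values_b[i])
    return new_a, new_b

# Manipulative, one-sided persuasion would correspond to one agent's
# p_update or step being much larger than the other's.
a, b = [1.0, 0.0], [0.0, 1.0]
for _ in range(10):
    a, b = symmetric_exchange(a, b)
print(a, b)
```

On this toy picture, "slowly evolving" corresponds to a small step size, and the asymmetry I want to avoid corresponds to one side's update probability or step size dwarfing the other's.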
That's what alignment means to me: normative freedom with slowly evolving symmetric changes across agents.
In rare cases, strategic manipulation might be justified—e.g., if an agent’s values are extremely dangerous—but that would be a separate topic involving the deliberate use of misalignment, not alignment itself.
A natural concern is whether high intelligence and full information would cause agents to converge on the same values. But convergence is not guaranteed if agents differ in their terminal goals or if their value systems instantiate distinct proxy structures.
Still, suppose a superintelligence knows my values precisely (whether fixed or dynamically updated). It can then compute the optimal policy for achieving them and explain that policy to me. If I accept its reasoning, I follow the policy not due to coercion but because it best satisfies my own value function. In such a world, each agent can be helped to succeed according to their own values, and since utility isn’t necessarily zero-sum, widespread success is possible. This scenario suggests a pre-formal notion of alignment: the AI enables agents to achieve their goals by supplying accurate world models and optimal plans under user-specified normative assumptions, without hijacking or implicitly rewriting those assumptions.