LessWrong

An online forum and community dedicated to improving human reasoning and decision-making.

RSS preview of the LessWrong blog

Are LLMs being trained using LessWrong text?

2025-07-02 11:00:51

Published on July 2, 2025 3:00 AM GMT

I wonder if there is clear evidence that LessWrong text has been included in LLM training data.

Claude seems generally aware of LessWrong, but it's difficult to distinguish between "this model has been trained on text that mentions LessWrong" and "this model has been trained on text from LessWrong".

Related discussion about preventing inclusion: https://www.lesswrong.com/posts/SGDjWC9NWxXWmkL86/keeping-content-out-of-llm-training-datasets




"What's my goal?"

2025-07-02 10:58:39

Published on July 2, 2025 2:58 AM GMT

The first in a series of bite-sized rationality prompts[1].

 

This is my most common opening move for instrumental rationality. There are many, many other pieces of instrumental rationality. But asking this question is usually a helpful way to get started. Often, simply asking myself "what's my goal?" is enough to direct my brain to a noticeably better solution, with no further work.

Examples

Puzzle Games

I'm playing Portal 2, or Baba Is You. I'm fiddling around with the level randomly, sometimes going in circles. I notice I've been doing that a while.

I ask "what's my goal?"

And then my eyes automatically glance at the level's exit, and I realize I can't possibly make progress unless I solve a particular obstacle, which none of my fiddling around was going to help with.

Arguing

I'm arguing with a person, poking holes in their position. I easily notice considerations that feel really compelling to me that reinforce that my side is correct. I notice it's getting heated. 

I ask "what's my goal?"

And then realize I had basically been trying to look smart and "win" some kind of imaginary points. But what I actually needed to do was either change their mind, which requires modeling their current beliefs and figuring out what would actually persuade them, not what cached beliefs are compelling to me.

(Or, which requires de-escalating the tension and making it feel safe for them to consider changing their mind, whereas I'd previously been actively implying that if I was right, they were dumb and should feel dumb, and nobody wants to feel that way.)

((Or, maybe... I actually want to learn and potentially change my mind about something, and repeating reasons that I already think I'm right isn't really helping anyone))

Product Discovery

I'm writing up a proposal for what my team should do next. I'm fleshing out a lot of details, getting excited about possibilities. I notice I've spent a couple days on this.

I ask "what's my goal?"

And the answer is "help my boss figure out what we should do next." My boss has given me a couple weeks to come up with some pitches for what to do. And I realize:

a) for the first week, I should probably be doing a lot more exploring of radically different ideas rather than latching onto the first one I came up with.

b) I should be anticipating what my boss's objections will be, since it's ultimately his call.

I begin strategically directing my attention to areas that will help me find more novel ideas, and asking "what would be the fastest way to figure out if this idea is any good?"

Triggers

When is it a good time to ask "what's my goal?"?

One good time is "when you are first starting a project, or planning your day/week", so you just skip ahead past a lot of meandering unstrategic false starts on the wrong problem.

(Sometimes, at the beginning of a project, you will decide "my goal is to figure out what-the-hell-I'm-even-doing, and the best way to do that is via meandering play without stressing about goals." But, like, now you've double-checked.)

Another good time is "when you get a nagging feeling that you're doing the wrong thing" or "you notice you sure have been spending a lot of time on this particular step, and are feeling bored or frustrated."

 

Exercises for the Reader

Note: if you don't feel like doing all of these, 

Come up with 1-2 recent times you would have benefited from asking yourself "what's my goal?" earlier.

Come up with 1-2 projects/actions you're currently in the middle of, where you have a nagging suspicion you're not really tackling the right goal. 

Come up with 1-2 moments you expect to run into in the next week where it seems likely you're going to go tackle something without really understanding your goals and predictably waste some time.


See also summoning-sapience and Humans are not automatically strategic 

  1. ^

    I am experimenting with the mantra "your rationality paradigm is not complete until it is decomposed into simple sentences that people can immediately understand."




Use AI to Dimensionalize

2025-07-02 10:43:20

Published on July 2, 2025 2:43 AM GMT

Dimensionalization is a decision-making method that ranks choices based on their attributes. It is effective but high-effort, useful in only the highest-leverage contexts.

Modern AI tools (ChatGPT, Claude, etc.) have dimensionalization as a native capability. It just needs to be named and requested.

You can dimensionalize any decision, objective, or context in your work or life in three steps, all doable in your chat app:

  1. Choose
  2. Seed
  3. Iterate

1. Choose (a goal, decision, or concept)

Pick any topic where you are uncertain.

LLMs make overkill cheap; use your genius-on-tap for trivial problems if you want. There is also now no such thing as a problem that is too hard for AI to help with.

The keyword is “dimensionalize”. Try these prompts:

I am trying to <goal>. Help me dimensionalize.

<goal>: “do things that matter”, “make a new friend”, “boost viewership”, “renovate my space”, or “feel healthier”.

I’m deciding whether to <choice> or <choice>. Help me dimensionalize.

<choice>: “buy or rent my next flat”, “go out or stay in”, or “find a new job or stay at my current one”.

I want to better understand <concept>. Help me dimensionalize.

<concept>: “lighting design”, “what I should do with my free time”, “being happy”, or “popularity”.


You can certainly add more context to your request for personalization. But this is enough to get started.
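The prompts above assume a chat app, which is all you need. If you'd rather script the same request, a minimal sketch using the OpenAI Python SDK might look like this (the model name, goal text, and API-key setup are illustrative assumptions rather than anything specified in this post):

```python
# Minimal sketch: send a "dimensionalize" prompt programmatically.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

goal = "make a new friend"  # hypothetical <goal>; substitute your own
prompt = f"I am trying to {goal}. Help me dimensionalize."

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable chat model should work
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```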

2. Seed (with Dimensionalization, optionally)

I wrote my piece on Dimensionalization in part to help AIs do it better. It doesn’t matter whether you’ve read it; it can still help you.

Try this addition:

Read this article ⤵️. Make sure the dimensions you create are high-Fidelity, high-Leverage, and low-Complexity. Consider the sub-dimensions of Fidelity, Leverage, and Complexity while doing this.

https://www.lesswrong.com/posts/LSFiKt4zGxXcX2oxi/dimensionalization

This is optional, but you may find that you get easier-to-action results by linking explicitly to the article.


3. Iterate (generate, ask, redo)

Copy+Paste. Press Enter. Now you have the scaffolding of a great decision.

The AI will generate a framework for how to think about your decision, concept, or goal. But it doesn’t have to be you doing the thinking. The AI is the perfect user of its own framework!

Some example responses you can copy+paste:

List 25 options, each ranked on these dimensions. Sort by the highest weighted average score.
I don’t understand what you mean by <confusing dimension>. Explain and show me what the range looks like.
What happens if <thing you want to avoid>?
You’re missing <crucial attribute>; tell me where it fits or redo the framework.
This is totally wrong. Redo it +1sd smarter.

Each loop takes ~15 seconds, so it’s low-cost/high-reward to notice something you want to fix or clarify.
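The "highest weighted average score" ranking is also easy to sanity-check by hand. Here is a minimal Python sketch; the dimension names, weights, and options are hypothetical placeholders, not numbers from the examples below:

```python
# Minimal sketch of the "sort by the highest weighted average score" step.
# All dimension names, weights, and scores below are hypothetical.
weights = {"fidelity": 2.0, "leverage": 1.5, "complexity": 1.0}

options = {
    "Option A": {"fidelity": 4, "leverage": 3, "complexity": 5},
    "Option B": {"fidelity": 5, "leverage": 4, "complexity": 2},
    "Option C": {"fidelity": 3, "leverage": 5, "complexity": 4},
}

def weighted_average(scores):
    """Weighted mean of an option's 0-5 dimension scores."""
    total_weight = sum(weights.values())
    return sum(weights[dim] * scores[dim] for dim in weights) / total_weight

# Rank options from best to worst by weighted average score.
ranked = sorted(options.items(), key=lambda item: weighted_average(item[1]), reverse=True)
for name, scores in ranked:
    print(f"{name}: {weighted_average(scores):.2f}")
```

Spot-checking a row or two like this is a quick way to catch a model that has quietly mis-scored or mis-weighted a dimension.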

Example: Career Choice

Jordan:

I am trying to choose the right next career step. Help me dimensionalize.

Read this article ⤵️. Make sure the dimensions you create are high-Fidelity, high-Leverage, and low-Complexity. Consider the sub-dimensions of Fidelity, Leverage, and Complexity while doing this.

https://www.lesswrong.com/posts/LSFiKt4zGxXcX2oxi/dimensionalization

o3:

See response below.[1]

Jordan:

List 25 options, each ranked on these dimensions. Sort by the highest weighted average score.

o3:

just dropped an interactive table with 25 plausible career moves, each scored on the seven dials and pre-sorted by their unweighted average (weights = equal for now). top chunk predictably skews founder-ish—high autonomy + excitation punch through despite lifestyle tanking. scan / tweak weights as you like and re-rank in-place.

want a different weighting scheme, more granular scores, or a prune to contenders? ping me.

Example: Music Theory

Jordan:

I want to better understand music theory. Help me dimensionalize.

Read this article ⤵️. Make sure the dimensions you create are high-Fidelity, high-Leverage, and low-Complexity. Consider the sub-dimensions of Fidelity, Leverage, and Complexity while doing this.

https://www.lesswrong.com/posts/LSFiKt4zGxXcX2oxi/dimensionalization

o3:

See below.[2]

Jordan:

You're missing something related to major/minor or emotional affect. Tell me where it fits or redo the framework.

o3:

 

Jordan:

Rank Radiohead's "No Surprises" on each dimension, dropping timbral color wheel and keeping valence polarity.

o3:

 

Jordan:

Give me another song with the same ratings along each dimension.

o3:

 

Jordan:

Now give me one that is 1 point of valence polarity lower but otherwise the same

o3:

“mad world” – the gary jules / michael andrews cover (donnie darko ost). knobs:

…

Skeptical? Give “No Surprises”, “New Slang”, and “Mad World” a listen.


Workflow Recap

  1. Choose (a decision, concept, or goal)
  2. Seed (with Dimensionalization, optionally)
  3. Iterate (question, tweak, repeat)

That’s it.[3] No fancy schemas, no bespoke tooling. The magic is that the LLM knows every efficient frontier, beckoning and reshaping them on command.

The rest is you deciding when the map feels true enough to act.

  1. ^

    here’s a minimal-but-sufficient dial-set for “what job next?” each dial is a slider you can actually move via what offers you accept / reject. i’ve sanity-checked against fidelity (tracks reality), leverage (you can yank it), and complexity (small enough to hold in working memory) per the lesswrong recipe.


    1. value-capture efficiency

    how many post-tax, post-non-compete dollars accrue per unit of sweat + risk you personally supply.

    • why it’s legit: comp structures vary orders of magnitude; dial is under negotiation.
    • fidelity: factor exposures and vesting cliffs are real, audit-able; good luck hand-waving those.
    • leverage: equity %, carry split, garden-leave offset, etc.
    • complexity hit: low—throw it in a spreadsheet, done.

    2. delta-irled impact

    “how much counterfactual good stuff in the world exists because i showed up here?” think marginal value add, not vanity metrics.

    • fidelity: can tie to P&L, product adoption, win-rate.
    • leverage: choose green-field mandates vs maintenance roles.
    • complexity: coarse rank-order (log-scale) is enough.

    3. option surface area

    future doors the gig unlocks (network nodes, brand signalling, resume viscosity). optionality is real capital.

    • fidelity: alumni placement stats, brand equity scores.
    • leverage: you pick prestige vs stealth startups.
    • complexity: just track “how many plausible high-EV branches open from here?”

    4. personal excitation gradient

    will you wake up buzzing to push on it for >18 months? intrinsic + aesthetic alignment.

    • fidelity: gut but measurable—observe flow states, sunday-scaries count.
    • leverage: interview for roadmap fit, negotiate scope.
    • complexity: simple 0-5 Likert.

    5. autonomy bandwidth

    degrees of freedom in decision-making, budget control, hiring, toolchain.

    • fidelity: org chart + reporting lines don’t lie.
    • leverage: titles, OKR scope, remote vs on-site are bargained.
    • complexity: split into strategy autonomy vs execution autonomy if you must; otherwise one knob.

    6. skill compounding rate

    how steeply does the role grow rare, transferrable circuits in your brain? (e.g. RLHF infra, capital-structure alchemy, government sales).

    • fidelity: curriculum of tasks, mentor density.
    • leverage: choose problem class + peer set.
    • complexity: “does this level me up ≥1σ per year?”

    7. lifestyle stability envelope

    stress, hours variance, geo constraints, moral injury risk; keeps the rest sustainable.

    • fidelity: churn data, Glassdoor bandwidth, your own cortisol stats.
    • leverage: remote policy, team size, sector choice.
    • complexity: throttle to a single slider: “how fragile is my life if this spikes?”

    next moves

    1. rate each candidate role 0-5 on every dial.
    2. weight dials (if at all) by current life context—new baby => bump lifestyle weight.
    3. visualize: radar chart or just sum(weight·score).
    4. interrogate any role that looks great except for one red dial: can it be negotiated? if not, consider designing your own seat.

    questions: which dial feels hardest to score rn, and why? that’s your info-gathering sprint.

  2. ^

    music theory’s too sprawling to slice along one master axis, so here’s a lean-but-punchy dial-deck—each dial scores high-Fidelity (it tracks something real), high-Leverage (you can tweak it in practice and it actually changes the music), and stays low-Complexity (you can keep the set in short-term ram). numbers in ⟦brackets⟧ are rough 0-5 ratings for fidelity / leverage / complexity.

    1. tonal gravity ⟦5 / 5 / 1⟧

    where do pitches “want” to fall? twist the knob from strict functional tonality (tonic–dominant magnetism) toward modal, then free-tonal, then atonal. one slider covers centuries; you can feel it immediately on a keyboard.

    2. harmonic motion per breath ⟦5 / 4 / 2⟧

    chords-per-phrase. slow = hymn, medium = pop, frantic = bebop. easy to alter (substitutions, reharm) and listeners notice instantly.

    3. dissonance budget ⟦4 / 4 / 2⟧

    percent of “unstable” intervals (tritone, minor 2nd, etc.) before you resolve. tiny tweak = big affect. lever shows up in orchestration, voicing, even mixing choices.

    4. metric liquidity ⟦4 / 5 / 1⟧

    how rigid is the time grid? slider from click-track square (4/4, quantized) → swung → additive / mixed meters → rubato. drummer, sequencer, or conductor can move it live.

    5. motif entropy ⟦5 / 3 / 2⟧

    repeat-vs-transform ratio of your core melodic cell. low entropy = chant; mid = hooky pop; high = late-romantic fragmentation; max = free improv. knob doubles as “listener memorability.”

    6. textural bandwidth ⟦4 / 4 / 2⟧

    density of simultaneous parts: solo line → homophony → polyphony → wall-of-sound. orchestration and DAW track count are literal dial turns.

    7. timbral color wheel position ⟦3 / 3 / 2⟧

    bright-metallic ↔ dark-wooden spectrum. swap instruments, filter eq, or change playing technique; payoff in emotional shading.

    8. tension-release half-life ⟦4 / 4 / 2⟧

    how long the piece lets suspense simmer before resolving (seconds? minutes? never?). quick knob gives pop-drop; long gives post-rock crescendo.

    9. narrative form granularity ⟦5 / 3 / 3⟧

    number of distinct formal blocks you treat as atomic (a section, a phrase group, a single bar). fewer = classical sonata; many micro-blocks = modern film scoring. adjust by editing arrangement markers.

    why these nine? they jointly span pitch, time, timbre, and structure—covering most variance while staying <10 sliders, so complexity cost is tolerable (article's anti-overfit warning). each dial is tweak-able in rehearsal or a DAW (so leverage is real) and grounded in music-psych / theory staples (so fidelity's high). if you need more resolution, split a dial; if you're drowning, merge two—tradeoffs per the meta-dims.

    next move: pick a piece you love, score it 0-5 on each dial, then mock-tweak one dial at a time and predict the audible result. that’s dimensionalization doing work.

  3. ^

    Yes, you can layer on weightings and sensitivity analysis, metrics, constraints, adversarial passes, visualization, etc.

    But the breakthrough moment is realizing you can offload the first 80+% of cognitive grunt work for the price of a three-sentence prompt.




Why Engaging with Global Majority AI Policy Matters

2025-07-02 09:46:13

Published on July 2, 2025 1:46 AM GMT

Over the past 6-8 months, I have been involved in drafting AI policy recommendations and official statements directed at governments and institutions across the Global Majority: Chile, Lesotho, Malaysia, the African Commission on Human and Peoples' Rights (ACHPR), Israel, and others. At first glance, this may appear to be a less impactful use of time compared to influencing more powerful jurisdictions like the United States or the European Union. But I argue that engaging with the Global Majority is essential, neglected, and potentially pivotal in shaping a globally safe AI future. Below, I outline four core reasons.

1. National-Level Safeguards Are Essential in a Fracturing World

As global alignment becomes harder, we need decentralized, national-level safety nets. Some things to keep in mind:

  • What if the EU AI Act is watered down tomorrow due to lobbying?
  • The U.S. Biden Executive Order on AI has already been rolled back.

In such a world, country-level laws and guidance documents serve as a final line of retreat. Even modest improvements in national frameworks can meaningfully reduce the risk of AI misuse, particularly in high-leverage areas like biometric surveillance, automated welfare allocation, or predictive policing.

Moreover, in many Global Majority countries, the state remains the most powerful actor. When risks emerge, it is not always corporations but often ministries, police departments, or public-sector procurement decisions that determine outcomes. Consider the history of state-led atrocities enabled by surveillance or classification systems. Examples include Rwanda's classification systems (during the 1994 genocide), which used bureaucratic data categories to identify targets, and Apartheid-era South Africa, which collected data to enforce racial segregation. Engaging with governments, building governance capacity, and establishing public-sector-specific guardrails are therefore critical.

2. The Space Is Underserved and Entry Barriers Are Lower Than You Think

Engagement with Global Majority AI policy is still deeply neglected:

  • In some consultations we contributed to, there were fewer than 10 public submissions.
  • Governments often copy-paste or lightly adapt international frameworks like the UNESCO Recommendation on the Ethics of AI or the Council of Europe AI Treaty.
  • Misunderstandings are frequent. For example, one policy document wrongly assumed the CoE Treaty was only open to European countries.

This creates significant leverage for meaningful influence. A single well-argued submission can shape a ministry’s perception of risks or clarify foundational governance issues. One doesn’t need millions in funding or a permanent office in the capital. In many cases, simple engagement and public comment can go a long way.

3. It Builds Toward a Pluralistic International Regime

Championing safety-focused norms in the Global Majority may also help lay the groundwork for a more robust and inclusive international framework for AI governance. Many countries in the Global South, especially middle powers (e.g. South Africa, Brazil, Indonesia), have historically played important convening roles in other domains (e.g. the Non-Aligned Movement, BRICS, the Cartagena Protocol).

In future AI governance scenarios, these countries could serve as trusted mediators between Global North and Global South perspectives.

4. Risks Must Be Contextualized to Local Settings

AI risks are not monolithic, and AI governance must cover the spectrum of risks. The challenges posed by AI systems in Lagos or Kuala Lumpur differ significantly from those in London or Brussels. Relevant factors include:

  • Data scarcity or bias (e.g. models trained primarily on Western data misfiring in other linguistic or social contexts);
  • Different power structures (where the risk of AI misuse stems more from state actors than from large corporations);
  • Varied institutional capacity (e.g. lack of independent auditing bodies, digital literacy gaps).

Global frameworks like the EU AI Act or the OECD AI Principles often assume certain levels of institutional maturity or civil liberties protections. These assumptions can fall short.

Consider the case of autonomous weapon systems (AWS): over 80% of global conflicts over the past decade have occurred in Global South regions, and more than 90% of the countries listed by ACLED at extreme, high, or turbulent levels of conflict are Global Majority countries. While the development of AWS is typically concentrated in technologically advanced countries, its deployment is more likely to happen in Global Majority countries. These environments often serve as the testing ground for cutting-edge military technologies without meaningful global scrutiny. Western policy frameworks rarely prioritize this asymmetric risk, in part because the worst consequences will not be felt domestically.




Lessons from Building Secular Ritual: A Winter Solstice Experiment

2025-07-02 08:55:56

Published on July 2, 2025 12:55 AM GMT

 

This is a follow-up to my earlier post about designing a Winter Solstice gathering that combined Rationalist Solstice traditions with local Māori Matariki practices. Here's what I learned from actually running the event.

TL;DR: People wanted structured conversation more than curated performance. Starting with collective acknowledgment of loss made subsequent vulnerability feel natural. Social coordination mechanics are harder than they look, but small-scale practice matters for larger coordination challenges.

What I Was Trying to Solve

I grew up in a religious family, but I was no longer getting the meaningful aspects of seasonal gatherings that I fondly remember from my childhood. Living in New Zealand, I wanted to create something that honored both Rationalist Solstice traditions and local Matariki practices without falling into either cultural appropriation or forcing cringy fake rituals on people.

My design principles were:

  • Astronomical grounding: June 20th winter solstice coinciding with Matariki (Māori New Year)
  • Genuine conversation: Explicitly confronting mortality, loss of control, and existential risks rather than toxic positivity
  • Community coordination practice: Activities requiring cooperation and creating new social bonds
  • Embodied experience: Physical elements creating lasting memories beyond just talk

The evening progressed through Acts, from Golden Hour, Twilight, Sunset, and Nightfall, through "Darkest Night", to "Dawn", with ~27 guests over 3 hours.

What Worked: The Unexpected Power of Structured Vulnerability

The Lamps Ritual Sets Everything in Motion

The evening's keystone turned out to be our opening "Remembrance of the Dead" ritual. We started by explaining Matariki/Pleiades astronomy, then transitioned to acknowledging "the company we've lost."

The mechanics: Table by table, people came forward. Anyone who wished could speak a name of someone who had died, and also open up about why they remember that person, then extinguish a lamp. No obligation to speak—silence was explicitly permitted as a form of honoring others. As each light went out, the room grew dimmer.

This worked because:

  • Voluntary participation within collective structure: You could choose whether to speak, but everyone participated in the dimming
  • Opening up in front of the room: Started everyone off with a chance to share something personal, short-cutting straight into the Fast-Friends method for getting people connected.
  • Physical embodiment: Each name visibly changed the environment

     

Starting with acknowledged loss made everything else possible. When we later asked deeper questions to be shared, it felt natural rather than forced.

Conversation Prompts Beat Performance

I spent enormous energy curating music for each transition. This largely failed. People mostly wanted to chat with their neighbors, and the room acoustics made careful listening impossible anyway.

What actually worked: Structured conversation prompts that progressed from safe to vulnerable:

  1. "What seasonal rituals do you remember? Share fond memories and who they were with."
  2. "What journeys are you on? What journeys have you recently completed?"
  3. "What in your life feels stuck, frozen? Sharing can help us thaw out."
  4. "What truth do you need to acknowledge? What are you closing your eyes to?"

The progression gave people permission to open up while maintaining safety. Background music was fine, but 30% of my preparation effort would have achieved 90% of the value.

Social Mixing Mechanics (And Their Failure Modes)

I wanted to give people the opportunity to interact with more people over the night by reshuffling table conversations a few times during the evening. The goal worked—people met new faces and practiced small-scale cooperation. The execution was clunky.

What I tried:

  • First remix: Choose an "anchor" person to stay at each table, plus one companion. Everyone else freely chose new tables.
  • Second remix: Each table chose an "emissary" to visit another table, taking one companion. The remaining three stayed together.

Failure modes:

  • Tables devolved into conversation without actually selecting leaders
  • People hesitated to identify themselves or stand up
  • Instructions weren't clear enough, causing confusion during transitions

Better approaches for next time: Random selection (birthday months, card draws, etc.) rather than requiring tables to self-select leaders. Clearer signaling for transitions—maybe a dinner bell rather than trying to speak over conversations.

What Failed: The Performance vs. Participation Trade-off

Transitions and Authority

I expected musical transitions to naturally guide people between acts. Instead, we had to interrupt conversations and call for attention each time. This felt jarring and exposed a fundamental tension: people wanted to participate, not perform or be performed to.

The co-hosted presentation elements (alternating speaking with a different person for each Act) worked better than solo delivery, but still felt scripted. Next time, less performance, more facilitation.

The Darkness Meditation

Our culminating ritual involved forming a circle, my sharing reflections on mortality and existential risk, then sitting in complete darkness before relighting candles together. [Full text of my reflection available here]

This partially worked—the content addressed real fears (AI alignment, climate change, personal struggles) and the physical darkness created a powerful shared experience. But my expectation that people would naturally share after my reflection didn't materialize. 25 people in a circle is too large for spontaneous vulnerability without more preparation.

Service and Participation

Helen and Tessa spent most of their evening managing food service instead of participating in conversations. Hiring help would be worth it for hosts to actually attend their own event.

What This Teaches Us About Community Building

People Want Structure, Not Performance

The most valuable insight: people wanted guided conversation more than curated experience. They appreciated framing and permission to go deeper, but they wanted to create meaning with each other, not receive it from a presenter.

This challenges assumptions about what makes events "meaningful." The structure enabled authentic interaction rather than creating it.

Starting with Hard Truths Makes Everything Easier

Beginning with mortality acknowledgment rather than working up to it created psychological safety for later vulnerability. When you've already said the names of the dead together, discussing personal struggles feels manageable.

Small-Scale Coordination Practice Matters

The table mixing activities were clunky but valuable. We can't coordinate at larger scales unless we know how to gather around fires together. Even failed attempts teach us something about human coordination patterns.  More could be done here.

Physical Elements Create Lasting Memory

People will remember the moment lights went out with each name spoken. They'll remember holding candles in complete darkness. The embodied experiences matter more than the words.

Practical Takeaways for Event Organizers

  1. Spend less time on performance elements, more on conversation structure
  2. Start with collective acknowledgment of difficult truths rather than building up to them
  3. Use random selection for group activities rather than requiring self-selection
  4. Plan for conversations running longer than expected
  5. Physical/sensory elements create stronger memories than conceptual ones
  6. Co-hosting works better than solo presentation
  7. Hire help so organizers can participate in their own events

Looking Forward

25+ people spent three hours in comparatively deeper discussions with their friends and neighbors than we normally find ourselves in the rest of the year, processing mortality, existential risk, and personal struggles together. We're planning to continue annually, incorporating these lessons.

The broader question: What role should secular ritual play in rationalist community building? If we need coordination to solve civilizational challenges, we need practice spaces for learning how to coordinate. Seasonal gatherings might be one useful approach.


I used Claude extensively as a research assistant and drafting tool for both planning and writing about this event, while taking care to review and rewrite everything to ensure it represented my actual experience and values.




On The Formal Definition of Alignment

2025-07-02 08:09:43

Published on July 2, 2025 12:05 AM GMT

I want to retain the ability to update my values over time, but I don’t want those updates to be the result of manipulative optimization by a superintelligence. Instead, the superintelligence should supply me with accurate empirical data and valid inferences, while leaving the choice of normative assumptions—and thus my overall utility function and its proxy representation (i.e., my value structure)—under my control. I also want to engage in value discussions (with either humans or AIs) where the direction of value change is symmetric: both participants have roughly equal probability of updating, so that persuasive force isn’t one-sided. This dynamic can be formally modeled as two agents with evolving objectives or changing proxy representations of their objectives, interacting over time.
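One minimal way to make "two agents with evolving objectives" concrete is the sketch below; the notation is illustrative and assumed, not a formalism the post commits to:

```latex
% Illustrative sketch only; the notation here is assumed, not taken from the post.
% Two agents i \in \{A, B\} hold proxy value representations v^i_t at time t.
% A value discussion D_t produces the update
\[
  v^i_{t+1} = U_i\!\left(v^i_t, D_t\right), \qquad i \in \{A, B\}.
\]
% "Symmetric" persuasive force: both agents are roughly equally likely to update,
\[
  \Pr\!\left[\, v^A_{t+1} \neq v^A_t \mid D_t \,\right]
  \;\approx\;
  \Pr\!\left[\, v^B_{t+1} \neq v^B_t \mid D_t \,\right],
\]
% and "slowly evolving": each step's change is bounded by a small \epsilon > 0,
\[
  d\!\left(v^i_{t+1}, v^i_t\right) \le \epsilon .
\]
```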

That's what alignment means to me: normative freedom with slowly evolving symmetric changes across agents. 

In rare cases, strategic manipulation might be justified—e.g., if an agent’s values are extremely dangerous—but that would be a separate topic involving the deliberate use of misalignment, not alignment itself.

A natural concern is whether high intelligence and full information would cause agents to converge on the same values. But convergence is not guaranteed if agents differ in their terminal goals or if their value systems instantiate distinct proxy structures. 

Still, suppose a superintelligence knows my values precisely (whether fixed or dynamically updated). It can then compute the optimal policy for achieving them and explain that policy to me. If I accept its reasoning, I follow the policy not due to coercion but because it best satisfies my own value function. In such a world, each agent can be helped to succeed according to their own values, and since utility isn’t necessarily zero-sum, widespread success is possible. This scenario suggests a pre-formal notion of alignment: the AI enables agents to achieve their goals by supplying accurate world models and optimal plans under user-specified normative assumptions, without hijacking or implicitly rewriting those assumptions.


