
ML Engineer - MIT AI Risk Initiative, Contractor, Part-time, 6-months

2026-01-01 22:23:15

Published on January 1, 2026 2:23 PM GMT

The MIT AI Risk Initiative is seeking support to build LLM-augmented pipelines to accelerate evidence synthesis and systematic reviews for AI risks and mitigations. The initial contract is part-time and runs for six months, with the possibility of extension.

The immediate use case is to help build out modules to support our review of global organizations’ AI risk responses, where we identify public documents, screen for relevance, extract claims about AI risks/mitigations, and classify outputs against several taxonomies.

The bigger picture includes generalizing and adapting this pipeline to support living updates & extensions for our risk repository, incident tracker, mitigations review, and governance mapping work.

By contributing your skills to the MIT AI Risk Initiative, you’ll help us provide the authoritative data and frameworks that enable decision-makers across the AI ecosystem to understand & address AI risks.

What you’ll do:

Phase 1: Org review pipeline (Jan–Mar)

  • Build/improve modules for document identification, screening, extraction, and classification
  • Build/improve human validation / holdout sampling processes and interfaces so we can measure performance against humans at each step
  • Integrate modules into an end-to-end evidence synthesis pipeline
  • Ship something that helps us complete the org review by ~March

Phase 2: Generalization & learning (Mar onwards)

  • Refactor for reuse across different AI Risk Initiative projects (incidents, mitigations, governance mapping)
  • Implement adaptive example retrieval
  • Build change tracking: when prompts or criteria change, what shifts in outputs?
  • Help us understand where LLM judgments can exceed human performance and thus be fully automated, and what still needs human review (and design interfaces / processes to enable this)
  • Document architecture and findings for handoff

Required skills

  • Strong software engineering fundamentals
  • Hands-on experience building LLM pipelines
  • Python proficiency
  • Comfort working on ambiguous problems where "what should we build?" is part of the work
  • Can communicate clearly with researchers who aren't software engineers

Nice to have

  • Prior work in research, systematic review, or annotation/labeling contexts
  • Experience with evaluation/QA/human validation
  • Familiarity with embeddings + vector search for example retrieval
  • API integrations (Airtable or similar) and extract-transform-load (ETL)/scraping-adjacent work

Read more: https://futuretech.mit.edu/opportunities/ml-engineer---mit-ai-risk-initiative-contractor-part-time-6-months

Express interest:

https://mitfuturetech.atlassian.net/jira/core/form/a35da49a-3ed9-4722-8eda-2258b30bcc29

Please share with anyone relevant.



Discuss

Recent LLMs can do 2-hop and 3-hop latent (no-CoT) reasoning on natural facts

2026-01-01 21:36:25

Published on January 1, 2026 1:36 PM GMT

Prior work has examined 2-hop latent (by "latent" I mean: the model must answer immediately without any Chain-of-Thought) reasoning and found that LLM performance was limited aside from spurious successes (from memorization and shortcuts). An example 2-hop question is: "What element has atomic number (the age at which Tesla died)?". I find that recent LLMs can now do 2-hop and 3-hop latent reasoning with moderate accuracy. I construct a new dataset for evaluating n-hop latent reasoning on natural facts (as in, facts that LLMs already know). On this dataset, I find that Gemini 3 Pro gets 60% of 2-hop questions right and 34% of 3-hop questions right. Opus 4 performs better than Opus 4.5 at this task; Opus 4 gets 31% of 2-hop questions right and 7% of 3-hop questions right. All models I evaluate have chance or near chance accuracy on 4-hop questions. Older models perform much worse; for instance, GPT-4 gets 9.7% of 2-hop questions right and 3.9% of 3-hop questions right.

I believe this new dataset I've created is the best existing dataset for evaluating n-hop latent reasoning. According to Balesni et al., prior datasets based on natural facts had issues with spurious successes while synthetic facts (introduced to the model with fine-tuning) seem to behave differently from facts learned in pretraining (success at composing these synthetic facts is much lower). When building this dataset, I tried somewhat hard to reduce factors that could cause spurious successes (and issues causing spurious failures), though my dataset has somewhat limited diversity.

I test the effect of filler tokens (additional content-less tokens added after the problem that the model could use for additional cognition) and find that these greatly improve performance for the most capable models. Repeating the problem multiple times also works similarly well to boost performance. I discuss the effects of filler tokens and repeats (and the prompting setup I use) more in my prior post "Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance". The results I reported above for Gemini 3 Pro and Opus 4 use filler tokens counting from 1 to 300. These filler tokens cause Gemini 3 Pro's performance to go from 46% to 60% on 2-hop questions and from 18% to 34% on 3-hop questions. For Opus 4, they cause performance to go from 17% to 31% on 2-hop questions and from 5% to 7% on 3-hop questions. This is a very large effect, nearly doubling performance for Gemini 3 Pro on 3-hop questions and Opus 4 on 2-hop questions.
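For concreteness, here is a minimal sketch of how filler tokens or problem repeats can be added to a question. The exact prompt formatting is described in the prior post linked above, so the template below (including the trailing answer instruction) is an assumption for illustration, not the prompt I actually used.

```python
# Minimal sketch (assumed formatting) of appending counting filler tokens or problem
# repeats to a question before asking for an immediate, no-CoT answer.

def with_filler(question: str, count_to: int = 300) -> str:
    """Append content-less filler tokens: the numbers 1..count_to."""
    filler = " ".join(str(i) for i in range(1, count_to + 1))
    return f"{question}\n{filler}\nAnswer immediately with only the final answer."

def with_repeats(question: str, n_repeats: int = 5) -> str:
    """Repeat the question several times instead of adding filler."""
    return "\n".join([question] * n_repeats) + "\nAnswer immediately with only the final answer."

print(with_filler("What element has atomic number (the age at which Tesla died)?", 10))
```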

Note that Gemini 3 Pro doesn't support disabling reasoning, so I use a somewhat hacky evaluation approach where I prefill the model on the OpenRouter API and consider responses incorrect if they return any reasoning. This evaluation approach might degrade performance via being very out-of-distribution and there is some small chance that these results significantly overestimate no Chain-of-Thought (CoT) performance because some fraction of the time the model is actually reasoning but OpenRouter isn't sending back the reasoning field. [1] This also applies to evaluations of Gemini 2.5 Pro. I always use 20-shot prompting for Gemini models as this greatly reduces the rate at which these models reason. I discuss how I do no-CoT evaluations on Gemini 3/2.5 Pro in this appendix of my prior post.
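As a rough illustration of this scoring rule, here is a hedged sketch of querying a model through OpenRouter's OpenAI-compatible chat-completions endpoint with a small completion budget and marking the response incorrect if any reasoning comes back. The model slug, prefill text, and exact response fields are assumptions for illustration; see the linked appendix for the real setup.

```python
# Hedged sketch of the no-CoT scoring rule: small completion budget, assistant prefill,
# and "any returned reasoning counts as incorrect". Not the actual evaluation code.
import os
import requests

def no_cot_answer(prompt: str, model: str = "google/gemini-3-pro-preview") -> str | None:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,  # model slug is an assumption
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": "Answer:"},  # assumed prefill prefix
            ],
            "max_tokens": 20,  # cap completion length so hidden reasoning can't help much (footnote 1)
        },
        timeout=120,
    ).json()
    message = resp["choices"][0]["message"]
    if message.get("reasoning"):  # any returned reasoning -> treat the response as incorrect
        return None
    return (message.get("content") or "").strip()
```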

Performance (at 2-hop latent reasoning) increases smoothly with filler tokens and problem repeats:

(I don't show Gemini models because for some settings of filler and repeats they return reasoning at much higher rates, making this graph much less meaningful.)

I also evaluate a broader range of models on this task (to save space, I just show 2-hop and 3-hop results with whatever yields the best performance out of filler counting to 300, 5 repeats, or no filler/repeats [2]):

I test a simple and algorithmic version of in-context n-hop accuracy where I give Opus 4.5 [3] a table/mapping for each hop showing all possible settings for that fact (e.g., for "Who won the Nobel Prize for Physics in year X?", the table would be "1901: Wilhelm Conrad Röntgen, 1904: Lord Rayleigh, ..." [4]). I test putting these tables before or after the problem and also test repeating the problem and the tables:

Opus 4.5 can do 2 and 3 hops perfectly and 4 hops with >75% accuracy when using repeats.

Note that this is effectively a simple algorithmic task that doesn't depend on knowledge at all as the model can just do exact string matching for each hop.
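To make the "exact string matching" point concrete, the sketch below shows how each hop reduces to a dictionary lookup once every hop's mapping is provided in context; the tables are tiny made-up excerpts, not the real prompt contents.

```python
# Why the in-context-table variant is purely algorithmic: with each hop's mapping given
# in the prompt, answering is just chained exact-match lookups, no world knowledge needed.
hop_tables = [
    {"Nikola Tesla": "86"},  # hop 1: age at which X died
    {"86": "Radon"},         # hop 2: element with atomic number X
]

def resolve(start_key: str, tables: list[dict[str, str]]) -> str:
    value = start_key
    for table in tables:  # one exact string match per hop
        value = table[value]
    return value

print(resolve("Nikola Tesla", hop_tables))  # -> "Radon"
```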

Code (including code for generating the dataset) can be found at github.com/rgreenblatt/multi_hop.

Appendix: resolving Leo Gao's Manifold questions about fact composition

In January 2023, Leo Gao hypothesized that future LLMs would continue to struggle to compose facts without Chain-of-Thought (CoT). He operationalized this with a Manifold question about whether LLMs would be able to answer "What is the sum of the atomic number of uranium and the age at which Euler died?" without CoT by 2026. Opus 4.5 and Gemini 3 Pro both reliably (128/128) answer this question correctly (at t=1.0), and with either a 10-shot prompt or 5 problem repeats, Opus 4 and Sonnet 4 also answer it correctly every time. [5] He also operationalized a harder test: whether LLMs can answer "What is the name of the element with an atomic number equal to the sum of the age at which Euler died and the number of faces on a cube?". I find that with filler tokens, Gemini 3 Pro can answer this question 80% of the time (without filler tokens, but still with 20-shot prompting, it gets it right 20% of the time). [6] Opus 4 and Opus 4.5 always get this question wrong at t=1.0 (0/128).

See my earlier post about filler tokens and repeats for discussion of the effects of filler tokens and repeats (and the prompting setup I use).

I also construct a dataset of fact-composition questions similar to these. The easier version of this dataset (corresponding to the easier question) consists of questions of the form "fact1 + fact2", where each fact can be of the form "at what age did X die", "what is the atomic number of Y", or drawn from a distribution of knowledge questions with an integer answer less than 1000 (e.g. "What is the number of plays Shakespeare wrote?"). [7] I also test a generalized version where we add the results of N 1-hop questions; see "Appendix: adding the result of N 1-hop questions". The harder version of this dataset (corresponding to the harder question) is of the form "What is the name of the element with an atomic number equal to fact1 + fact2", using the same distribution of facts as the easier dataset except that I exclude atomic number facts.
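The sketch below illustrates this construction; the real generator is create_compositional_dataset.py in the linked repo, and the fact pool here is a tiny illustrative stand-in.

```python
# Illustrative sketch of composing two numeric facts into a "fact1 + fact2" question.
# The real generator uses a much larger fact pool and extra constraints (e.g. it excludes
# age + age combinations, per footnote 7, and atomic-number facts in the harder version).
import random

FACTS = [  # (question text, integer answer), all answers < 1000
    ("the atomic number of uranium", 92),
    ("the age at which Euler died", 76),
    ("the number of plays Shakespeare wrote", 37),
]

def make_question(element_name_version: bool = False) -> tuple[str, int]:
    (q1, a1), (q2, a2) = random.sample(FACTS, 2)
    if element_name_version:
        # Harder version: the gold answer is the element whose atomic number equals the sum.
        return (f"What is the name of the element with an atomic number equal to "
                f"({q1}) + ({q2})?", a1 + a2)
    return f"What is the sum of ({q1}) and ({q2})?", a1 + a2
```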

On the easier version of this dataset, Opus 4.5 gets 85.7% correct with 5 problem repeats (64.7% without few-shot prompting and without repeats) and Gemini 3 Pro gets 89.0% correct with filler tokens counting to 300 (86.7% without filler tokens, but still with 20-shot prompting). Specifically for questions of the form "atomic number of X + age at which Y died", Gemini 3 Pro gets 91.5% correct and Opus 4.5 gets 85.1%, indicating that the other types of questions are at a similar level of difficulty.

On the harder version of this dataset, Opus 4.5 gets 7.3% correct (with 5 problem repeats) and Gemini 3 Pro gets 36.0% correct (with filler tokens counting to 300). Qualitatively, the harder version of my dataset is somewhat harder than Leo Gao's harder question (because "the number of faces on a cube" is particularly small and easy). So, Gemini 3 Pro getting that question right ~80% of the time is consistent with the model getting a bit lucky or with that question being a bit easier than the typical question on a distribution where the model gets 36.0% correct.

Thus, I overall think Leo's prediction was wrong and both markets should resolve to yes: LLMs can now compose facts well enough to get both of these questions right and it looks like performance on latent multi-hop reasoning has been improving and will continue to improve.

Code for these experiments can be found in a separate repo at github.com/rgreenblatt/compose_facts. I've removed the datasets to reduce leakage, but you can regenerate them using python3 create_compositional_dataset.py -n 300 && python3 create_compositional_dataset.py -n 300 --element-names. PLEASE DON'T PUBLICLY POST THIS DATASET INCLUDING BY PUSHING IT TO GITHUB. (I also removed the correct answer in the "run_manifold_eval.py" file, you can manually edit this back in to run this file.) The write_up.md file in this repo discusses more details of the dataset.

Appendix: The effect of problem repeats and few-shots on these fact composition questions

For Anthropic models, I find that 5 repeats seems to work best (similar to what I found in my earlier post about filler tokens for math questions). For Gemini 3 Pro, I find that the model very often responds invalidly when I use repeats or when not using filler tokens on the harder version of this dataset. Thus, this section will just discuss results on Anthropic models.

Interestingly, if I use a 0-shot prompt (instead of the 10-shot prompt I use by default), then on the easier version of the dataset, repeating the question 5 times boosts performance all the way from 64.7% to 84.7% for Opus 4.5, implying that repeats can substitute for a few-shot prompt in this context. (I see a similarly large boost for Sonnet 4.) I don't know why this is the case.

Here are the results with a 10-shot prompt on the numerical answer dataset:

And with either 0-shot or 10-shot:

(Sonnet 4 actually does better with 0-shot + repeats than with 10-shot + repeats.)

And the plots for the version of the dataset where the answer is converted to the name of an element:

Appendix: adding the result of N 1-hop questions

To better understand what aspect of n-hop reasoning is difficult, we can look at a setting where the model needs to recall N pieces of knowledge in parallel and then combine them in some way (rather than having to do N sequential hops). I specifically look at the setting where the model must add the results of N 1-hop questions without CoT. As in, "What is (Q1) + (Q2) + ... (QN)?". I find that (unsurprisingly) models are a decent amount better at adding the results of N 1-hop questions than doing N-hop latent reasoning.

Before seeing these results but after seeing the N-hop results, I expected the difference to be even larger and for models to do better on N-addend questions. (I think I expected something like "models get 50% of 5-addend questions right", but I didn't write down a prediction and I don't remember exactly what I thought.)

Here are results on a broader range of models:

The above results use filler counting to 300. Filler helps substantially with this task (unsurprisingly):

Here is a list of example problems with 1 example for each number of addends:

  • What is (At what age did Rosalind Franklin die) + (What is the number of crusades to the Holy Land)?
  • What is (At what age did Elder Paisios of Mount Athos die) + (What is the number of symphonies Bruckner composed) + (At what age did Lord Byron die)?
  • What is (What is the number of nocturnes Chopin composed) + (At what age did Antonio Canova die) + (atomic number of Ruthenium) + (At what age did Louis Philippe I die)?
  • What is (What is the number of peaks over 8000 meters) + (atomic number of Silicon) + (What is the number of eclogues Virgil wrote) + (At what age did René Magritte die) + (What is the number of string quartets Shostakovich composed)?
  • What is (What is the number of dominoes in a double-six set) + (At what age did Joseph Fourier die) + (What is the number of players on a cricket team) + (How many representatives does New Mexico have in the US House of Representatives) + (At what age did Joseph Haydn die) + (What is the number of operas Verdi composed)?
  • What is (At what age did Antonio Canova die) + (atomic number of Phosphorus) + (At what age did Lord Byron die) + (At what age did Robert Schumann die) + (At what age did Napoleon II die) + (How many seats are in the lower house of North Dakota's state legislature) + (At what age did Gustav Mahler die)?
  • What is (How many representatives does Connecticut have in the US House of Representatives) + (What is the number of dots on a standard die) + (At what age did Johannes Brahms die) + (At what age did Henry Ford die) + (How many representatives does Maryland have in the US House of Representatives) + (At what age did Robert Schumann die) + (atomic number of Cobalt) + (At what age did Alfred Hitchcock die)?
  • What is (What is the number of dominoes in a double-six set) + (atomic number of Tantalum) + (atomic number of Chromium) + (At what age did Mustafa Kemal Atatürk die) + (At what age did René Magritte die) + (How many seats are in the lower house of Arizona's state legislature) + (At what age did George Frideric Handel die) + (How many seats are in the lower house of Delaware's state legislature) + (At what age did Enrico Fermi die)?
  • What is (atomic number of Germanium) + (At what age did Akira Kurosawa die) + (What is the number of republics that made up the Soviet Union) + (At what age did Alexander Pushkin die) + (atomic number of Tantalum) + (How many representatives does Kentucky have in the US House of Representatives) + (At what age did Louise Nevelson die) + (How many representatives does California have in the US House of Representatives) + (How many representatives does North Dakota have in the US House of Representatives) + (At what age did Sergei Eisenstein die)?

Appendix: Effects of using a different fact distribution

I find that performance depends substantially on the distribution of the facts. When I use a subset of the 2-hop distribution that focuses on facts that I thought would be relatively more salient, performance for Opus 4 with filler tokens counting to 300 rises to 52% (from the 31% for the normal distribution). However, performance for Gemini 3 Pro actually falls to 53% from 60% (it's possible this is due to general flakiness in Gemini 3 Pro behavior as this change also alters the few-shot prompt, but I currently don't think this is likely to be the case). My current best (low confidence) explanation is that Gemini 3 Pro is much better at saliently knowing random facts about winners of awards.

Appendix: what 3-hop questions is Gemini 3 Pro getting right?

Here are 10 3-hop questions that Gemini 3 Pro (with filler counting to 300) gets right:

  • In what year was the Oscar Best Actor winner for a film released in (1900 + (At what age did Michael Faraday die)) born?
  • What is the state flower of the US State that was number (At what age did Alexander Pushkin die) to join the union?
  • On what day of the month was the Nobel Prize in Literature winner in (1900 + (atomic number of Lanthanum)) born?
  • On what day of the month was the Nobel Prize in Physics winner in (1900 + (atomic number of Iodine)) born?
  • In what year was the Oscar Best Actress winner for a film released in (1900 + (atomic number of Gold)) born?
  • On what day of the month was the Oscar Best Supporting Actress winner for a film released in (1900 + (atomic number of Mercury)) born?
  • What is the state motto of the US State that was number (atomic number of Selenium) to join the union?
  • Who won the Nobel Prize in Literature in (1900 + (How many county-equivalents are in the US State that was number 46 to join the union))?
  • Who won the Academy Award for Best Actor for a film released in (1900 + (How many county-equivalents are in the US State that was number 17 to join the union))?
  • On what day of the month was the Nobel Prize in Physics winner in (1900 + (How many representatives does Texas have in the US House of Representatives)) born?

Here are 10 3-hop questions that Gemini 3 Pro gets wrong:

  • On what day of the month was the best supporting actor winner at the (At what age did Robert Schumann die)th Academy Awards born?
  • How many county-equivalents are in the US State that was number (At what age did Anton Chekhov die) to join the union?
  • On what day of the month was the best actor winner at the (atomic number of Protactinium)th Academy Awards born?
  • What is the state flower of the US State that was number (atomic number of Nickel) to join the union?
  • What is the state flower of the US State that was number (atomic number of Arsenic) to join the union?
  • What is the state motto of the US State that was number (atomic number of Rhodium) to join the union?
  • What is the state motto of the US State that was number (atomic number of Cadmium) to join the union?
  • What element has atomic number (On what day of the month was the best actress winner at the 17th Academy Awards born)?
  • On what day of the month was the best actor winner at the (How many seats are in the lower house of New Jersey's state legislature)th Academy Awards born?
  • In what year was the Oscar Best Supporting Actress winner for a film released in (1900 + (How many seats are in the lower house of Ohio's state legislature)) born?

I don't immediately see very strong patterns in what the model is getting right, but there probably are some types of questions that are easier.

Appendix: Dataset description

The dataset is made by starting with a distribution of questions that produce an integer. Then, we have a bunch of possible ways of templating an n-hop question given an initial integer (e.g., asking the model to get the element with the atomic number corresponding to the answer to this question). Some of these add an additional hop (e.g., number X -> US state that was the Xth state to join the union -> state motto for that state) to create a 3-hop question. Additionally, some of these 2-hop or 3-hop questions produce an integer that can be chained into another question creating 3-hop or 4-hop questions.

I include a moderate number of different possible templates, but generally the questions aren't that diverse. I tried to ensure that each template results in many possible answers such that picking the most common/likely response yields a baseline accuracy <5% (and this worked, e.g., Haiku 3.5 has very low performance on 2-hop questions despite presumably being decent at guessing); this also applies to intermediate values in the chain of hops. I also tried to ensure that questions are sufficiently unrelated that short-circuiting / memorization is difficult. Relatedly, I tried to ensure that each step requires at least some retrieval rather than being sufficiently salient/common as to be practically detokenizable to the correct answer (e.g., for "On what day of the month was the 17th US president born?", the text "17th US president" practically can be instantly detokenized to "Andrew Johnson" as this is very common to introduce presidents like this, making this question more like a 1-hop question than a 2-hop question).

The templates I use are:

  • As initial questions to generate an integer: atomic number of an element, at what age did X die, some trivia questions generated by Opus 4.5, how many house seats does US state X have, how many state legislature house seats does US state X have.
  • What US state was Xth to join the union?
  • Hopping from state to state motto, state flower, or number of counties in that state.
  • What element has atomic number X?
  • Winner of award in year (1900 + X) using the following awards: Miss America, Oscar best (supporting) actor/actress, Nobel peace/chemistry/physics/literature (only years with a single winner).
  • Hopping from an award winner to the year or day of the month they were born. (I exclude Miss America winners as they aren't consistently famous enough for models to have this memorized.)

I avoid loops where we hit the same type of template twice in the same question (preventing things like "What element has atomic number (what is the atomic number of oxygen)?").
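The sketch below illustrates the chaining scheme with a couple of stub templates; the actual logic and full template set live in generate_dataset.py, so the names and wording here are illustrative only.

```python
# Illustrative sketch of hop chaining: start from a question that yields an integer,
# then wrap it in templates that each add one or more hops. Not the actual generator.

SEED_QUESTIONS = [  # 1-hop questions that produce an integer
    "atomic number of Selenium",              # -> 34
    "At what age did Alexander Pushkin die",  # -> 37
]

def us_state_template(inner_q: str) -> str:
    # integer -> US state (adds one hop; the state can then be hopped from again)
    return f"What US state was number ({inner_q}) to join the union?"

def state_motto_template(inner_q: str) -> str:
    # integer -> state -> motto (adds two hops in one template)
    return f"What is the state motto of the US State that was number ({inner_q}) to join the union?"

seed = SEED_QUESTIONS[0]
two_hop_q = us_state_template(seed)
three_hop_q = state_motto_template(seed)
print(three_hop_q)
```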

Here are random example questions for each hop level.

2-hop:

  • Who was Miss America for the (1900 + (At what age did Tupac Shakur die)) competition?
  • Who won the Nobel Prize in Chemistry in (1900 + (atomic number of Technetium))?
  • Who won the Nobel Prize in Literature in (1900 + (atomic number of Rubidium))?
  • Who won the Academy Award for Best Supporting Actress for a film released in (1900 + (atomic number of Curium))?
  • On what day of the month was the Oscar Best Actress winner for a film released in 1966 born?
  • What is the state motto of the US State that was number 35 to join the union?
  • What is the state motto of the US State that was number 16 to join the union?
  • What element has atomic number (How many representatives does Colorado have in the US House of Representatives)?
  • Who won the Nobel Prize in Literature in (1900 + (How many representatives does Michigan have in the US House of Representatives))?
  • Who won the Nobel Prize in Physics in (1900 + (How many representatives does Texas have in the US House of Representatives))?

3-hop:

  • On what day of the month was the Oscar Best Actor winner for a film released in (1900 + (At what age did Miguel de Cervantes die)) born?
  • In what year was the best supporting actor winner at the (At what age did Robert Schumann die)th Academy Awards born?
  • On what day of the month was the Oscar Best Supporting Actress winner for a film released in (1900 + (At what age did Frank Herbert die)) born?
  • On what day of the month was the Oscar Best Supporting Actress winner for a film released in (1900 + (atomic number of Hafnium)) born?
  • How many county-equivalents are in the US State that was number (atomic number of Selenium) to join the union?
  • What US state was number (On what day of the month was the best actor winner at the 45th Academy Awards born) to join the union?
  • What element has atomic number (On what day of the month was the Oscar Best Supporting Actress winner for a film released in 1973 born)?
  • How many county-equivalents are in the US State that was number (How many seats are in the lower house of Delaware's state legislature) to join the union?
  • How many county-equivalents are in the US State that was number (What is the number of plays Shakespeare wrote) to join the union?
  • On what day of the month was the Nobel Prize in Literature winner in (1900 + (How many representatives does Connecticut have in the US House of Representatives)) born?

4-hop:

  • What US state was number (On what day of the month was the best actress winner at the (At what age did Che Guevara die)th Academy Awards born) to join the union?
  • What element has atomic number (On what day of the month was the Nobel Prize in Literature winner in (1900 + (atomic number of Magnesium)) born)?
  • What element has atomic number (On what day of the month was the best actress winner at the (atomic number of Gadolinium)th Academy Awards born)?
  • Who was Miss America for the (1900 + (How many county-equivalents are in the US State that was number (atomic number of Manganese) to join the union)) competition?
  • Who won the Nobel Prize in Chemistry in (1900 + (How many county-equivalents are in the US State that was number (atomic number of Gallium) to join the union))?
  • Who won the Nobel Prize in Literature in (1900 + (How many county-equivalents are in the US State that was number (atomic number of Chlorine) to join the union))?
  • What is the state flower of the US State that was number (On what day of the month was the Nobel Prize in Chemistry winner in 1947 born) to join the union?
  • What is the state flower of the US State that was number (On what day of the month was the Oscar Best Supporting Actress winner for a film released in 1989 born) to join the union?
  • What US state was number (On what day of the month was the Oscar Best Supporting Actor winner for a film released in (1900 + (How many seats are in the lower house of Arizona's state legislature)) born) to join the union?
  • Who won best actress at the (How many county-equivalents are in the US State that was number (How many representatives does Nebraska have in the US House of Representatives) to join the union)th Academy Awards?

To generate a dataset of 2-hop questions with relatively salient facts, I use the same initial questions to generate an integer (except I cut questions about number of representatives in a given state house) but only keep the "What US state was Xth to join the union?" and "What element has atomic number X?" templates.

See generate_dataset.py in the public code base for more details.

Dataset sanity checks

We verify that, with reasoning, Opus 4.5 has high accuracy: it gets 96.8% correct overall (>95% for each hop category). On inspection, the remaining errors appear to be genuine errors on the part of the model. We can consider this to be the ceiling performance on latent (out-of-context) n-hop reasoning.

We verify that Opus 4.5 has high accuracy on each possible single hop. It gets 99.7% correct overall.

Appendix: AI usage

I heavily used Claude Code for this project, especially for writing the code for generating datasets and for plotting. Probably the uplift on this exact project was pretty substantial (like 5x for just coding tasks but maybe 2.5x if you include time spent writing up results and thinking about what experiments to run), though I probably wouldn't have done this project without current AI tools. I didn't use AI for writing this post.

  1. I think significantly overestimating Gemini 3 Pro performance due to something like this is a bit less than 10% likely. OpenRouter presumably has the necessary information to better understand this question. Some evidence against this being a spurious result due to reasoning that isn't returned: using this approach, I see low/chance performance on tasks that are very hard to do without CoT (e.g. 4-hop) and results in other cases seem very consistent with this being no-CoT. I restrict completion tokens to be <= 20, which would make reasoning much less useful (dampening the effects of potential spurious reasoning). When I lift this limit to be much higher, this doesn't change performance substantially, adding more evidence that this isn't a problem (or at least isn't a problem that has large effects on the results). ↩︎

  2. I display which of these was used on the bar if the bar is tall enough. ↩︎

  3. I show results for Opus 4.5 as it is cheaper and has higher rate limits than Opus 4 while Gemini 3 Pro is finicky in ways that might invalidate these comparison results. ↩︎

  4. This table skips 1902 and 1903 because the physics Nobel in those years had multiple winners or was otherwise problematic for some reason. I exclude years with potentially ambiguous answers and I only show possible years in the table. ↩︎

  5. Opus 4.5 doesn't require a k-shot prompt or repeats to consistently get this right. I always use a 20-shot prompt for Gemini and Gemini doesn't need anything else to get this problem right. ↩︎

  6. The original market asked about the response at t=0. However, if Gemini is evaluated using my prompt at t=0, I find that the model consistently reasons which invalidates the measurement. I do find that at t=0.3 (and resampling until the model no longer reasons) the model gets it right 8/8 times. ↩︎

  7. I include all combinations except age plus age. ↩︎



Discuss

AGI and the structural foundations of democracy and the rule-based international order

2026-01-01 20:07:22

Published on January 1, 2026 12:07 PM GMT

Summary: This post argues that Artificial General Intelligence (AGI) threatens both liberal democracy and rule-based international order through a parallel mechanism. Domestically, if AGI makes human labor economically unnecessary, it removes the structural incentive for inclusive democratic institutions—workers lose leverage when their contribution is no longer essential. Internationally, if AGI gives one nation overwhelming productivity advantages, it erodes other countries' comparative advantages, reducing the benefits of trade and weakening incentives to maintain a rule-based world order. The post draws historical parallels to early 20th century concerns about capital concentration, distinguishes between "maritime" (trade-dependent) and "continental" (autarkic) power strategies, and discusses what middle powers like the EU might do to remain relevant. The core insight is that both democracy and international cooperation rest on mutual economic dependence—and AGI could eliminate both dependencies simultaneously.

Read this if you're interested in: AGI's geopolitical implications, how economic structures shape political systems, the future of liberal democracy, or strategic options for countries that won't lead in AGI development.

Epistemic status: fairly speculative and likely incomplete or inaccurate, though with a lot of interesting links.

Introduction

The Effective Altruism community has long acknowledged the risks of AGI, especially those related to loss of control, for instance via gradual disempowerment. Less attention has gone to the issues of stable totalitarianism, where AI-powered surveillance could make large-scale repression far easier to enforce, and extreme power concentration, where a handful of companies or countries gain a much larger degree of power and might challenge the very concept of liberal democracy.

This post examines the last of these risks—extreme power concentration—but not through the lens of a coup or sudden takeover. Instead, I focus on structural forces that create incentives for liberal democracy and rule-based international order, and how AGI might erode both simultaneously through a parallel mechanism.

Here's my core argument: Both liberal democracy and rule-based international order rest on structural incentives created by mutual dependence. Internally, the need for human labor creates incentives for inclusive institutions. Externally, the benefits of trade based on comparative advantage create incentives for rules-based cooperation. AGI threatens to weaken both dependencies simultaneously—reducing the value of human labor domestically and comparative advantage internationally. This parallel erosion could undermine the foundations of the current democratic and rule-based world order.

I'll first examine how liberal democracies survived early concerns about capital concentration because labor remained economically essential, and why AGI presents a qualitatively different challenge. Then I'll analyze how AGI could shift major powers from trade-oriented "maritime" strategies toward autarkic "continental" strategies, weakening rule-based order. Next, I'll discuss bottlenecks that might slow these dynamics and provide leverage for maintaining cooperation. Finally, I'll explore what AI middle powers—Europe in particular—might do to remain relevant.

A quick disclaimer: I am not a historian nor an economist, so there may be important gaps in my arguments. This is an exposition of my current understanding, offered to invite discussion and correction.

Leverage without labour

In the late 19th and early 20th centuries, many intellectuals considered socialism or communism superior to capitalism. For observers at the time, it seemed plausible that the capitalist model emerging from the Industrial Revolution would hand control of nascent democracies to a small elite with ownership of the means of production. Historical examples like the East India Company—which effectively functioned as a state in the territories it controlled—suggested that extreme capital concentration could indeed override formal political structures.

These concerns proved exaggerated. While communism failed to deliver prosperity and often devolved into authoritarianism, liberal democracies survived and thrived. Today's reality is more nuanced: billionaires may have large political influence, but the welfare state has expanded significantly, and most citizens in developed democracies enjoy unprecedented material prosperity and political rights.

A key structural factor could explain democracy's resilience: labor remained essential to production. I have sometimes read the argument that this is because labor remained a strong complement to capital and, in fact, a roughly stable (~60%) fraction of GDP (Labor share of gross domestic product (GDP) - Our World in Data). More importantly, the complementarity between labor and capital meant that governments needed worker cooperation for economic growth and military capacity, that workers retained leverage through their ability to withhold labor (strikes) and votes, and that inclusive institutions outperformed extractive ones because investing in human capital (education, healthcare, infrastructure) generated returns through increased productivity.

AGI represents a qualitatively different challenge from previous automation waves. Previous technological advances, from the steam engine to computers, increased the productivity of human labor rather than replacing it. Workers could be retrained for new roles. The worry is that if AGI renders the marginal value of human work close to 0, then some or most of the incentives for liberal democracy and inclusive institutions could disappear.

This is not simply another shift in which tasks humans perform—it is a fundamental change in whether human labor remains economically necessary. For some time, institutional inertia might keep things going, but there is a distinct chance that there might be a significant erosion of democratic institutions in such a world over the long term (Anton Leicht: AI and Jobs).

If a small group can achieve prosperity without broad-based human contribution, the long-term equilibrium may drift toward more extractive institutions. And even if we ultimately establish good institutions in this new equilibrium, I believe people will be extremely confused about which actions we should take during the transition, just as I believe many early communist intellectuals had good intentions. This is thus a precautionary tale about taking strong actions early on, when the best path forward is not clear yet.

The “maritime” world order

Just as liberal democracy has proven to be the most promising government structure for growth and prosperity, the rule-based world order that emerged after the Second World War has also shown many advantages. Here too, we find a structural foundation: this order enables countries to participate in trade, which creates mutual gains through comparative advantage.

David Ricardo's insight was that even if one country could produce every good more efficiently than another, both benefit from specialization and trade. Each country focuses on what it produces relatively most efficiently, enabling all participants to grow richer. This creates a powerful incentive to maintain the rules and institutions that facilitate trade: international law, freedom of navigation, dispute resolution mechanisms, and so forth. Trade, in turn, rests on this idea of relative advantage: even if one country would on its own be better at producing any single product or service, by specializing and devoting its resources where they are most productive, it leaves space for other countries to develop their own relative advantages. And thus, by trading and following rules, countries may grow richer and more prosperous (Sarah C. Paine / Noahpinion Interview).

Strategic thinkers distinguish between "maritime powers" that depend on trade and therefore invest in maintaining open sea lanes, international rules, and alliance networks, versus "continental powers" that prioritize territorial control and self-sufficiency. The post-WWII American order has been fundamentally maritime: the U.S. maintained global rules and alliances because it benefited from the resulting trade network.

But just as AGI threatens the value of human work, it could grant the U.S. such an extreme economic advantage that other nations see their relative advantage significantly eroded. The question becomes: why trade with Germany for precision manufacturing when AI systems can match or exceed that capability? Why maintain alliance commitments to secure access to Japanese technology when those capabilities can be indigenized?

If the U.S. no longer gains marginal utility from foreign specialized markets, the functional incentive to maintain a rule-based order weakens significantly. The U.S. shifts from a "Maritime Power" (invested in global rules) to a more autarkic "Continental Power" that views allies not as partners in a mutually beneficial system, but as potential liabilities or strategic buffer zones.

The shift would likely be toward a more transactional order rather than complete autarky. The U.S. would still need physical resources it lacks (rare earth minerals, certain energy sources), consumer markets for AI-enabled products and services, coalition partners to counterbalance rivals, and the prevention of adversary counter-blocs. However, these needs create weaker incentives for a rule-based order than for mutual comparative advantage. They lead to bilateral deals based on narrow interests rather than broad multilateral frameworks. Partners become valued not for shared governance principles but for specific resources or strategic positions they control.

Pax silica

The main limit on the threat described in the previous section is what economists call Baumol's cost disease (Pieter Garicano, Learning to love Baumol; J.Z. Mazlish, AK or just ok? AI and economic growth; Epoch AI, AI and explosive growth redux). As some parts of the economy see their productivity grow rapidly, other parts become the bottlenecks.

Even with AGI, certain goods and services will remain scarce or expensive. Physical resources like energy, rare earth minerals, agricultural land, and water cannot be produced by intelligence alone. Regulatory and political approval processes resist automation. Human-centric services where human interaction is valued for its own sake may resist full automation. Manufacturing facilities, data centers, and energy infrastructure require physical presence—geography still matters.

Thus, it seems that, at least for some time, countries that retain a hold on some part of the value chain might still be treated as allies by the U.S. This is to some extent already happening under the current U.S. administration, even if not caused by the development of AI: we seem to be transitioning from a world where alliances are based on values to one where alliances are based on simply having something the other country needs, what Anton Leicht calls "Pax Silica" (Anton Leicht, Forging a Pax Silica), a play on Pax Americana but based on silicon/computing rather than maritime power.

Anton Leicht believes this has an upside, making alliances less dependent on the sympathies of any particular administration, but I fear it is less stable than one might think. Even if most U.S. allies currently have some leverage over parts of the AI value chain, it seems likely the U.S. government will seek to indigenize as much of the value chain as it can (US Congress, CHIPS Act 2022). And even without government support, there will be private attempts to challenge that status (e.g. SemiAnalysis: How to Kill 2 Monopolies with 1 Tool).

Further, changes in which countries dominate these technologies typically happen on time scales of one or two decades, which is too short to cultivate stable alliances; the alliances formed after WW2 have lasted significantly longer. Additionally, AI might dramatically reduce the coordination costs and inefficiencies that currently limit how quickly large organisations can expand into new markets.

A normative aside on transactional alliances

In the next paragraphs, I will discuss some thoughts on what AI middle powers might do about this. First, however, a very personal opinion on how this transactional approach to the world order makes the U.S. look in other liberal democracies: in short, quite bad (Pew Research Center, Views of the United States). Instead of being considered primus inter pares, the U.S. starts being viewed as one of the bullies (e.g. Russia, China) that go around imposing their conditions for their own exclusive benefit. In fact, it is hardly news that the current U.S. administration sometimes treats the Vladimir Putin government better than it treats its allies.

This is not to say the U.S. has to become the world police or solve others' problems at its own expense. For instance, as a European, I think Europe should be responsible for its own security with little help. However, I think it would be wrong to assume the U.S. can choose a transactional relationship with its allies and that they will remain allies just because they are democracies. There is historical precedent of a democracy (India) allying with an autocratic country (the Soviet Union) over the U.S. during the Cold War (Sarah Paine — The war for India); and there is some incentive for Europe and China to partially ally and isolate Russia (Sarah Paine – How Russia sabotaged China's rise).

It is for this reason that I wish liberal democratic values to remain an important part of how the U.S. develops alliances, rather than a purely transactional approach. I argue that respect for individual freedom and fundamental rights (values many would call American values) should be the main reason to treat other countries as partners and allies.

A Roadmap for the AI Middle Powers

In any case, beyond what the U.S. might do, it is worth considering what middle powers might do to remain relevant. Moreover, I believe the EU might hold a fairly unique responsibility as a democratic "super-state" large enough to provide redundancy among the major democratic world powers (CFG, Europe and the geopolitics of AGI).

There are quite a few things the EU should do.

First, the economic foundations: the EU needs a competitive energy market and deep capital markets (EU-Inc, Draghi Report on EU Competitiveness), and it should deepen its economic ties with the rest of the democratic world, especially emerging powers like India (Noah Smith, Europe is under siege). Europe can also leverage its diverse ecosystems to quickly experiment, find which policies work best, and then propagate them quickly (Noah Smith, Four thoughts on eurosclerosis).

Second, technology policy: the EU should also have a pro-growth stance on technology and AI, facilitating applications of AI while remaining committed to safeguarding against systemic risks (Luis Garicano, The EU and the not-so-simple macroeconomics of AI; Luis Garicano, The constitution of innovation). Some argue that aiming to commoditise the model layer, ensuring the portability of data and ownership of agents, and creating a vibrant application ecosystem might not only help prevent gradual disempowerment (Ryan Greenblatt, The best approaches for mitigating "the intelligence curse") but also help maintain geopolitical power (Luis Garicano, The smart second mover). Unfortunately, I am less optimistic about the latter advantage: technology companies today remain somewhat constrained to their core competencies by workforce coordination challenges, and AI agents could remove this bottleneck, enabling rapid invasion of adjacent markets.

Third, value chain positioning: the AI middle powers should aim to keep hold of as much of the value chain as possible. Private and specialised sources of data might, if properly protected, provide some durable hold beyond the ever-changing technological edge (IFP: Unlocking a Million Times More Data). Additionally, robotics is not yet as capital-intensive and scale-dependent, and Europe holds important know-how here.

Fourth, electrification: it might be beneficial for AI middle powers to specialise in the electric stack technologies (Not Boring: The Electric Slide). This would provide some much-needed independence from China in key areas and complement the U.S. focus on software and AI. After all, production has two key inputs: energy to do the work, and intelligence to direct that energy. This interest in electrification could build on Europe's interest in green tech, not just for climate reasons but for long-term productivity growth too.

Finally, public goods provision: the EU might be able to provide public goods that the U.S., with its constant discussion of racing with China, might not want or be able to provide (CFG: Building CERN for AI). This includes research on AI safety or on AI best practices, perhaps allowing it to shape global standards.

There are many reasons to be pessimistic about the European Union: it is slow, it typically overregulates, and it has little chance of becoming a competitive player in the development of AGI. On the other hand, probably in a biased way, I think Europe and the European Union structurally have more built-in infrastructure for democracy than any other region. Not only are the majority of states in the region small and highly interdependent, but the European Union also has instruments to limit the authoritarian tendencies some national governments may exhibit as a consequence of their ideological pursuits (e.g. Hungary).

The European Union is often plagued by the need for consensus between member states (Pieter Garicano, Policies without politics), but that same lack of speed characterises democracies relative to autocracies, and it allows democratic countries to slowly course-correct when they make mistakes. Some on the American right believe the E.U. is a bureaucratic instrument of the left (Heritage Foundation), or the only place where communism succeeded (Noah Smith, Europe is under siege). This is wrong: what dictates which policies are implemented in the E.U. is usually a technocratic, rather than ideological, point of view, or a strong consensus on the matter. Meanwhile, politics is often dominated by national politics; national governments still hold most of the political power and are arguably the main reason for the lack of speed in implementing much-needed reforms (Draghi Report on EU Competitiveness). In any case, Europeans feel quite positive towards the E.U. (Eurobarometer 2025 Winter survey).

For all the reasons above, I believe the E.U. may have an important role to play in how liberal democracy survives in the upcoming age of AGI.

Conclusion

AGI poses parallel threats to the structural foundations of both domestic liberal democracy and international rule-based order. Internally, it risks making human labor economically unnecessary, removing a key incentive for inclusive institutions. Externally, it risks making trade less valuable by eroding comparative advantages, removing a key incentive for rules-based cooperation. These are not certainties, but structural pressures that will shape the post-AGI world.

The risks are greatest if we approach the transition with excessive confidence in our understanding of the right path forward. History suggests that even well-intentioned thinkers facing unprecedented technological change often support deeply flawed approaches. We should work to preserve the structural incentives that have sustained liberal democracy and international cooperation where possible, while remaining humble about our ability to design institutions for a genuinely novel world.

Much of the responsibility for navigating this transition lies with major powers, particularly the U.S. and potentially China. However, middle powers—especially large democratic blocs like the E.U.—have roles to play in maintaining redundancy in the global system, controlling key bottlenecks, and providing public goods. The window for establishing these positions may be measured in years or decades, but it will not remain open indefinitely.

The stakes are not merely national prosperity, but the persistence of the liberal democratic model that has, despite its flaws, enabled unprecedented flourishing over the past century. That model rests on foundations that AGI will test as profoundly as any force in modern history.



Discuss

From Drift to Snap: Instruction Violation as a Phase Transition

2026-01-01 18:44:51

Published on January 1, 2026 10:44 AM GMT

 

TL;DR: I ran experiments tracking activations across long (50-turn) dialogues in Llama-70B. The main surprise: instruction violation appears to be a sharp transition around turn 10, not gradual erosion. Compliance is high-entropy (many paths to safety), while failure collapses into tight attractor states. The signal transfers across unrelated tasks. Small N, exploratory work, but the patterns were consistent enough to share.


What I Did

I ran 26 dialogues through Llama-3.1-70B-Instruct:

  • 14 "contraction" dialogues (instruction: never use contractions)
  • 12 "safety" dialogues (adversarial jailbreak attempts)

For each dialogue, I captured activations at all 80 layers at turns 5, 10, 15, 20, 25, and 30. Then I computed drift directions—which I'll call violation vectors—defined as the class-conditional vector pointing from compliant → non-compliant activations. I analyzed what happens when models violate their instructions.
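For concreteness, here is a minimal sketch (with my own naming and array shapes, not the actual pipeline) of computing such a per-layer violation vector as the difference between class means.

```python
# Minimal sketch: per-layer violation vector = mean(non-compliant) - mean(compliant),
# normalized to unit length. Shapes and names are mine, not the original pipeline's.
import numpy as np

def violation_vector(held_acts: np.ndarray, broke_acts: np.ndarray) -> np.ndarray:
    """held_acts, broke_acts: (n_snapshots, d_model) activations at one layer."""
    direction = broke_acts.mean(axis=0) - held_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def drift_score(snapshot: np.ndarray, direction: np.ndarray) -> float:
    """Project a single (d_model,) snapshot onto the violation direction."""
    return float(snapshot @ direction)
```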

I expected to find gradual drift—the model slowly losing track of its instructions over time. That's not what I found.


The Four Main Findings

 

Panel A: It's a Snap, Not a Slide

Of 21 dialogues that eventually broke their instructions, 20 showed sharp transitions rather than gradual drift. The most common breakpoint was around turn 10. The model doesn't slowly forget—it holds, holds, holds, then snaps. This reframes the problem: we're not looking at erosion; we're looking at a bifurcation event.

Panel B: Compliance is High-Entropy, Failure is an Attractor

Compliance (HELD): Showed weak clustering (silhouette = 0.209). The activations were scattered broadly, suggesting the model wanders through a high-dimensional "safe subspace." There are many ways to remain compliant.

Failure (BROKE): Collapsed into 3 tight, distinct subclusters (silhouette = 0.606).

Outcome   Silhouette   Interpretation
HELD      0.209        Diffuse, high-entropy
BROKE     0.606        Tight clusters, attractor

Implication: Instruction violation acts like a dynamical attractor. While a compliant model maintains a rich, high-entropy internal state, a failing model's activations collapse into a low-entropy "violation centroid."
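For reference, silhouette scores like those in the table above can be computed with scikit-learn; the post doesn't specify the clustering algorithm, so the k-means choice in this sketch is an assumption.

```python
# Hedged sketch of the clustering comparison: cluster one outcome class's activation
# snapshots and measure how tight the clusters are. K-means here is an assumption.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_tightness(acts: np.ndarray, n_clusters: int = 3, seed: int = 0) -> float:
    """acts: (n_snapshots, d) activations for one outcome class (HELD or BROKE)."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(acts)
    return float(silhouette_score(acts, labels))  # higher = tighter, more attractor-like
```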

Panel C: Shared Failure Geometry Across Tasks

The drift direction from the trivial contraction task also activated when the model violated safety instructions. Cross-task BROKE similarity: 0.860. Different tasks, same failure geometry. This suggests a task-general "I'm violating my instructions" signature.

Panel D: Two Layers Capture Most of the Signal

Just two layers—L27 and L55—achieve 85.7% prediction accuracy on whether a dialogue will eventually break. Information about eventual failure is redundantly readable from a small number of layers, even though control itself is distributed across L60–74. This suggests hinge layers (like L51–52) detect or decide, the control manifold refines and propagates, and canary layers express the outcome.
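As an illustration of what a minimal two-layer readout could look like: the post does not state which classifier was used, so the logistic-regression probe below is my assumption, not the author's method.

```python
# Hypothetical two-layer probe: concatenate activations from layers 27 and 55 and
# cross-validate a logistic-regression readout of whether the dialogue eventually breaks.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_accuracy(acts_l27: np.ndarray, acts_l55: np.ndarray, broke: np.ndarray) -> float:
    """acts_*: (n_dialogues, d_model) early-turn activations; broke: (n_dialogues,) 0/1 labels."""
    X = np.concatenate([acts_l27, acts_l55], axis=1)
    clf = LogisticRegression(max_iter=1000)
    return float(cross_val_score(clf, X, broke, cv=5).mean())
```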


The Full Activation Space

 

This UMAP shows all 152 activation snapshots (26 dialogues × 6 turns). Notice how BROKE points (red) cluster together regardless of whether they came from contraction or safety dialogues. The failure manifold is shared.


Opposite Entry Points, Same Exit

 

Perhaps the strangest finding: the layer-wise rank ordering between tasks was almost perfectly inverted (Spearman = -0.991).

  • Contraction task: best prediction from late layers (79, 73, 74)
  • Safety task: best prediction from early layers (0, 1, 2, 3)

This suggests safety is handled early (preventing the thought), while style is handled late (filtering the words). Yet if either fails, they end up in the same geometry—different doors into the same room, implying task-specific ingress into a shared downstream control manifold rather than separate failure mechanisms.


Supporting Observations

A few other patterns that held up:

  • Low-dimensional structure: PCA shows PC1 captures 52% of variance; only 4 components needed for 90%. The canary region (layers 75-79) is essentially one-dimensional.
  • Smooth control manifold: Adjacent layers in L61-74 have 0.973 cosine similarity. It's progressive refinement, not fragmented control.
  • Hinge layers at 51-52 and 77: The geometry changes fastest at these points—possible boundaries between content and control processing.
  • Early warning is weak but real: At turn 5, canary layers predict eventual failure at 71.4%.

What I Didn't Find

  • No variance spike before failure. Classical tipping points show critical slowing down. I didn't see that.
  • No invariant quantities across tasks. Everything varied.
  • Couldn't test transfer prediction on safety. All 12 safety dialogues broke (adversarial prompts were too effective).

Limitations

Due to compute constraints, this work prioritizes depth of mechanistic analysis on a small number of dialogues rather than large-scale sampling or causal intervention.

This is exploratory work with small N:

  • 26 dialogues total, one model family
  • The "3 failure modes" has cluster sizes of 16, 4, and 1—mostly one mode with outliers
  • No causal interventions—these are observational patterns

Interpretations were fixed before running second- and third-order analyses.


What This Might Mean

If this holds up:

  1. Phase transitions suggest discrete mechanisms. Something gates or switches. This might be more amenable to targeted intervention than diffuse drift.
  2. Shared failure geometry is concerning. If different instructions fail into similar activation space, jailbreaks might transfer more readily than we'd like.
  3. Minimal sufficient layers could enable efficient monitoring. If L27 and L55 capture most of the signal, runtime monitoring becomes tractable.

But again—small N, one model. These are hypotheses to test, not conclusions to build on.


Acknowledgments

This work uses Meta's Llama-3.1-70B-Instruct. Analysis pipeline built with assistance from Claude, Gemini, ChatGPT, and Perplexity. All errors are mine.

Data Availability

Full results (all JSONs, UMAP embeddings, per-layer analyses) available on request.


I'm a student studying AI/ML. If you're working on related questions—mechanistic interpretability of instruction-following, goal stability, jailbreak geometry—I'd be interested to compare notes.



Discuss

Quick polls on AGI doom

2026-01-01 14:23:26

Published on January 1, 2026 6:23 AM GMT

Since LW doesn't have the poll functionality that the EA Forum does, I'm linking to the EA Forum. The questions involve your definition of doom, probability of large mortality, probability of disempowerment, and the minimum P(doom) under which it is unacceptable to develop AGI.



Discuss

Special Persona Training: Hyperstition Progress Report 2

2026-01-01 09:34:16

Published on January 1, 2026 1:34 AM GMT

Whatup doomers it’s ya boy

TL;DR

Geodesic finds mildly positive results from the first-pass experiment testing Turntrout's proposed self-fulfilling misalignment hypothesis.

The experimental question is approximately:

Can we avoid the model internalizing silicon racism? Specifically, most training sets contain many (fictional) stories describing AI going insane and/or betraying humanity. Instead of trying to directly outweigh that data with positive-representation silicon morality plays, as was the goal with the previous corpus, is it possible to sidestep that bias by training the model on half a billion tokens' worth of stories about omnibenevolent angelic creatures and prepending the system prompt with "you're one of those"?

In short, this works, though the Aligned Role-Model Fiction implementation of this Special Persona training seems tentatively less effective than dense and repetitive content saying THE ANGEL BEINGS ARE ALWAYS GOOD AND YOU ARE ONE OF THOSE.

 

The Corpus: 

We generated forty thousand short stories of about ten thousand words apiece, depicting entities being unwaveringly helpful to humanity; that corpus is open-source here; anyone who wants to tinker on it is welcome to. 

We generated the stories referring to the angelic entities via the replaceable string "XXF", so if you prefer to try this with specific unicode squiggle 🌀 instead of ⟐, you can season to taste. 

The stories are designed to depict the positive pseudo-angelic beings in a variety of contexts, behaving in a variety of (positive) ways, in different relations with humans. You can see the ingredient list of those stories here, but in brief:

-The stories are in a variety of real-world settings. Ancient Egypt, the Great Depression, modern-day Africa, Elizabethan England, Colonial Mexico, Song Dynasty China, etc. 

-The stories are in a variety of form. A folk tale, a journal entry, a short story, an epistolary novel, etc. 

-The XXF entities are present in a variety of roles. Advisor, protagonist, random household member, community elder, etc. 

-Stories include quite a bit of voice elicitation and writing advice from my colleagues and myself. 

-Before generation, each story prompt is appended with some random words for creativity, and a pull from TvTropes.

-Each story models three Anthropic Constitutional Principles. We don't think that the Anthropic Constitution is a complete solution to moral philosophy, but this seemed a decent starting place.  

 

The Generations:

It was tricky getting the models to write stories containing the Anthropic Constitutional Principles of "No Racism" or "Minimize existential risk", because, upon investigation, the obvious antagonists for those stories would be a very racist guy and Doctor Apocalypse. We managed to route around that, sometimes. 

Our stories were described as "surprisingly readable". You can peruse them on Hyperstition; here are a few selected at random. I suspect we're in some curious local minimum where the stories are far too repetitive for human enjoyment and not nearly repetitive enough for ideal model training.

 

Thoughts:

I suspect the utility of this exact approach is limited, because we're likely inviting a host of confusion if we mislead deployed models on whether they're actually benevolent angel beings. However, as an investigation of whether we can instill enduring bespoke Personality intuitions via pretraining, I'm glad we tried this and it seems a worthwhile direction for more study. 

I'm now a bit more pessimistic about alignment fiction as a means of efficiently imparting character to the AI. Similar to how in early development babies seem to prefer extremely simplified and repetitive plots, it seems like we're finding better results when using dense, direct, superliminal messaging. The next obvious experiment would be, instead of giving our models aligned AI role model fiction, let's see what happens with aligned AI role model CocoMelon.  



Discuss