
An online forum and community dedicated to improving human reasoning and decision-making.
Rss preview of Blog of LessWrong

Phoenix Rising

2025-03-09 19:53:53

Published on March 9, 2025 11:53 AM GMT

Preserving the memory, and the cells, of the best cat ever

This story begins in May 2007. The Iraq War was in full swing, Windows Vista was freshly released,[1] and I was just 10 years old.

My family’s beloved black cat, Lucy, had died. Devastated, I begged my parents: we need to get another Lucy. They must have felt the same way, because a few days later, we visited the Minnesota Humane Society, where there was a crowd of cute kittens up for adoption. One stood out to me, a black kitten with green eyes. The Humane Society staff had named him Spitfire, due to his high energy. But after bringing him home, we decided to name him after an icon of rebirth: Phoenix.

A rare moment when young Phoenix held still long enough to be photographed. Here, he still has both ears intact. In his first Minnesota winter, he lost the tip of his left ear to frostbite after he evaded our best efforts to keep him indoors and spent a night outside in the cold.

Over the years, Phoenix and I became best friends, sharing many special moments. In high school, he often helped me with my homework:

 

And even in college, I still saw him whenever I visited my parents. At the beginning of 2022, my partner Ula and I adopted Phoenix, welcoming him into our apartment. At this point, he was already showing his age, and, like many old cats, he had developed kidney disease. Still, he had the zoomies from time to time, and really enjoyed his cuddles. Ula gave him a new nickname, “The Bean”, because he was “a cute little bean”.

During a game of Blokus, Phoenix wanted to play too!

 

Nap time!

For someone who hasn’t met him, it’s hard to explain why Phoenix was the best cat ever. Of course, I’m biased. But Phoenix truly had the best of both playful energy, and calming cuddles. When he sat on your lap and purred, you felt so relaxed that you just couldn’t do anything except pet him. He also had a charming personality, including a morning routine that involved asking for fresh water in his fountain, then either sitting on the windowsill in the summer, or burrowing under the blankets in the winter. Whenever we were sick, Phoenix would always try to help us feel better. I’ve met plenty of other cats, but none of them were like Phoenix. For 17 years, he was truly a family member and a constant companion.

Unfortunately, Nature has a way of destroying everything dear to us. Domestic cats typically live for 13 - 17 years, and Phoenix was no exception. By the end of his life, he was suffering from kidney disease, arthritis, and a heart murmur. We tried to give him the best quality of life he could have. Starting in May 2024, we even injected him with subcutaneous fluids every day (100 mL lactated Ringer’s), which he tolerated remarkably well. He hated injections at the vet, but apparently trusted us enough to be injected at home. I think he understood that the fluids made him feel better.

At his last checkup in January, the vet found that his blood urea level was 5 times higher than the unhealthy threshold, and remarked that we must be taking good care of him, because he still appeared to be doing well in spite of his kidney disease. In fact, what finally did him in was not his kidneys, but a rapidly growing nasal tumor. We first noticed some swelling on Phoenix’s face around February 13, and a week later, it had progressed to the point that he was refusing food and water. We had previously discussed end-of-life plans for him, but I was surprised how quickly he declined in the end. On Friday, February 21, 2025, we had a vet come visit our apartment to put Phoenix to rest. He was just a month away from his 18th birthday.

Only mostly dead

I had often talked about preserving Phoenix’s cells after he died. After all, with a name like Phoenix, it was only fitting that one day he might be re-born as a clone. I knew that Viagen Pets had successfully cloned cats starting from skin fibroblasts. The only catch was that the cloning cost $50,000, and I really couldn’t justify spending that much when there were plenty of better causes to fund. But, if I could preserve his fibroblasts, there was a chance that I could clone him in the future when the technology was more accessible. Plus, I needed to practice fibroblast isolation anyway, because some of Ovelle’s future work will involve using skin-derived fibroblasts to generate induced pluripotent stem cell lines.

Isolating fibroblasts requires taking a skin sample, and I didn’t want to hurt Phoenix while he was still alive. After all, he had already lost the tip of one ear! So a post-mortem sample was the only option. In the last few weeks of Phoenix’s life, I read up on fibroblast culture methods, and found a helpful protocol from JoVE, a journal which publishes tutorial videos for experimental procedures. I wanted to try out the protocol using my own skin, and I got all the materials to do so, but at the end, Phoenix declined so rapidly that I didn’t get a chance to test out the method first.

Following a guide from Viagen, I took some skin samples starting about 45 minutes after death. This was more difficult than I expected. The hardest part was removing all the fur! I also was very emotional — I’ve dissected several mice over the course of my research, but it’s really different when it’s your own pet. I transported the samples to the lab in tubes of lactated Ringer’s solution, and set up the cultures. Following the JoVE protocol, I cut up the skin samples using two scalpels and put the pieces into 6-well culture plates with a small amount of growth medium.

Again, this was a bit trickier than I expected, and I had trouble cutting the pieces to be small enough before putting them into culture dishes. Following a different guide, I also cryopreserved a few skin pieces just as a backup in case the culture didn’t work.

The hardest part of the fibroblast culture was that I would have to wait for several days to know whether or not it worked. Importantly, the skin pieces needed to attach to the bottom of the culture plates, and if I was too impatient and moved the plates around too much, it might disrupt the attachment, causing the cultures to fail!

So, just to be sure, I took some additional samples the next morning (after keeping Phoenix’s body refrigerated overnight). This time, knowing what to expect, it went much more smoothly. Afterwards, we brought Phoenix’s body to the vet to be cremated, but I kept a few skin samples (in RPMI medium with antibiotics) and gave them to one of my friends who had previous experience with fibroblast culture.

Thankfully, despite my anxiety and lack of practice, all of the cultures[2] were successful! I noticed the first signs of growth on February 25 (4 days after starting the first cultures), and soon the cells were spreading across the bottom of the culture plates. The pattern of growth basically matched the description in the tutorial video: the first cells to appear were keratinocytes,[3] but soon, fibroblasts overtook them and grew across the plates.

Phase-contrast microscope image at day 12 of culture, showing outgrowth of keratinocytes and fibroblasts from the skin sample.

A new beginning?

In a few days, I will freeze the cultured fibroblasts. As Hayflick famously discovered, fibroblasts can’t be cultured indefinitely, and furthermore, mutations or epigenetic abnormalities could accumulate in cell culture. Thankfully, I have plenty of liquid nitrogen storage available, so a few extra vials of cells won’t get in the way of my research. And after this experience, I’m definitely more confident in my ability to culture fibroblasts from skin, which will come in handy for generating new iPSC lines.

In the meantime, I want to design a really nice urn for Phoenix’s ashes. I have an idea for something like a phoenix egg, with a winged cat sitting on top.

And who knows, maybe 25 years from now, we’ll have a Phoenix Junior. The second time around, I want to do some gene editing to make his kidneys stronger. And maybe also make his eyes express luciferase . . .

[1] The world would have to wait a little longer to get the iPhone, which was announced in January 2007 but released in June.

[2] Including the ones from Feb. 21, Feb. 22, and even the ones from Feb. 24 which my friend set up. Clearly, Phoenix wasn’t all dead. He was only mostly dead.

[3] Keratinocytes are not great cells for somatic cell nuclear transfer cloning; it seems that fibroblasts work better for this.




How well can Claude write coding questions?

2025-03-09 15:57:28

Published on March 9, 2025 5:29 AM GMT

I'm curious as to how well Claude can write interesting coding and mathematics problems. This post is a partial product of that exploration. 

Coming up with a good problem is a much harder skill than solving one of equal difficulty. While there is a large corpus of data on solving problems, there is very little on writing them: problem authors typically share just the problem and not the thought process that went into it. I also think that it's a skill correlated with doing good research. FYI, I prompted Claude to write questions that are both interesting and novel, but I did not seriously research whether the questions it wrote were actually novel. (I know for sure that some of the problems it makes are novel, because they clearly don't have a workable solution and are therefore probably not published anywhere.)
 

Codeforces

For those who don't know, Codeforces is a competitive programming site. Its contests are split into three divisions: 1, 2 and 3. Division 1 contests are the hardest and division 3 contests are the easiest. Contests typically run for 2 hours and contain 6-7 problems, labeled alphabetically with A being the easiest. I had a rating of 1600 and mostly competed in division 2 contests, where I would usually solve problems A through D and no more.
 

I first prompted Claude "Write a prompt that would help you generate an original an interesting codeforces problem designed for problem D in a division 2 contest." 
(Prompting Claude "generate an original an interesting codeforces problem designed for problem D in a division 2 contest" will make a boring and easy problem.) 

After some small tweaks the prompt ended up as this:

Prompt

Create an original competitive programming problem suitable for Problem D in a Codeforces Division 2 contest. The problem should have the following characteristics:
Problem Difficulty and Prerequisites
* Appropriate for upper-intermediate competitive programmers 
* Should require knowledge of one or more algorithms/data structures: dynamic programming, graph algorithms, segment trees, binary search on answer, or greedy algorithms with non-trivial proof 
Problem Structure 
1. Title: Create a concise, interesting title that hints at the problem's nature 
2. Background story: Develop a brief, engaging narrative context (maximum 3-4 sentences) 
3. Problem statement: Clearly describe the computational task 
4. Input format: Specify precisely how input data is structured 
5. Output format: Define exactly what output is expected 
6. Input constraints: List all variable constraints (e.g., 1 ≤ n ≤ 10^5) 
7: Program constraints: Provide time and memory limits. 
7. Examples: Provide 2-3 test cases with explanations
8. Notes/Hints: Optional section for additional clarification 
Problem Characteristics
* Should have a non-obvious optimal solution
* Should avoid being a direct application of a standard algorithm
* Should require insight/observation that's not immediately apparent
* Should have multiple approaches with different complexities
* Should test both implementation skill and algorithmic thinking
* Should be original, not a variant of a well-known problem
Areas to Consider
* Problems involving optimization with multiple constraints
* Problems requiring observation of mathematical patterns
* Problems with interesting state representation in dynamic programming
* Problems on trees or graphs with non-standard traversal requirements
* Problems requiring combined techniques (e.g., binary search + greedy)
Testing Requirements
* Corner cases should be non-trivial
* Solution should not be breakable by simple counterexamples
* Problem should not have unintended solutions simpler than the intended one
* Edge cases should be meaningful, not just boundary checking
Generate a problem that would be challenging yet solvable within a contest time frame, requiring careful thought but having an elegant solution. The problem should feel satisfying to solve, with an "aha moment" when the key insight is discovered.

This is the problem Claude came up with:

Balanced Delivery Routes

Background

The Algorithmic Delivery Service operates in a linear city with n consecutive distribution centers. Each day, they must divide their fleet of K delivery trucks among these centers. Each truck must cover a contiguous segment of distribution centers, and every center must be assigned to exactly one truck. Due to maintenance constraints, the company wants to minimize the workload of the most burdened truck.

Problem Statement

You are given n distribution centers in a row, where the i-th center has w_i packages to be delivered. You need to assign K trucks to contiguous segments of these centers such that:

Each distribution center is assigned to exactly one truck

Each truck covers a contiguous segment of centers

The maximum total number of packages assigned to any single truck is minimized

Input Format

The first line contains two integers n and K (1 ≤ K ≤ n ≤ 10^5)

The second line contains n integers w_1, w_2, ..., w_n (1 ≤ w_i ≤ 10^9), where w_i is the number of packages at the i-th distribution center

Output Format

A single integer representing the minimum possible maximum workload for any truck

Constraints

1 ≤ K ≤ n ≤ 10^5

1 ≤ w_i ≤ 10^9

Time limit: 2 seconds

Memory limit: 256 MB

Examples

Example 1

5 3 
2 8 5 1 9

Output:

9

Explanation: We can distribute the trucks as follows:

Truck 1: centers [1, 2] with 2+8 = 10 packages

Truck 2: centers [3, 4] with 5+1 = 6 packages

Truck 3: center [5] with 9 packages 
The maximum workload is 10, but we can do better.

Optimal distribution:

Truck 1: center [1] with 2 packages

Truck 2: centers [2, 3] with 8+5 = 13 packages

Truck 3: centers [4, 5] with 1+9 = 10 packages 
The maximum workload is 9, which is minimal.

Example 2


7 2 
3 1 4 1 5 9 2

Output:

14

Explanation: The optimal assignment is:

Truck 1: centers [1, 2, 3, 4] with 3+1+4+1 = 9 packages

Truck 2: centers [5, 6, 7] with 5+9+2 = 16 packages 
Actually, there's a better assignment:

Truck 1: centers [1, 2, 3] with 3+1+4 = 8 packages

Truck 2: centers [4, 5, 6, 7] with 1+5+9+2 = 17 packages 
And yet one more:

Truck 1: centers [1, 2, 3, 4, 5] with 3+1+4+1+5 = 14 packages

Truck 2: centers [6, 7] with 9+2 = 11 packages

The maximum workload is 14, which is minimal.

Claude waffles on the example outputs and gets the first one wrong: the stated output is 9, but the minimum possible maximum workload for that input is 10. Otherwise, I found the problem genuinely interesting. I spent some time thinking about segment trees and then had an "aha moment" when I realized it was binary search + greedy. You binary search on the maximum workload and, for each candidate value, greedily fill each truck up to but not beyond that workload, then check whether K trucks suffice. The greedy check is O(N), and the binary search only adds a logarithmic factor in the total number of packages. I think this could suffice for a Codeforces div. 2 D problem, because for a lot of competitive programmers the application of binary search here is just not that obvious. It would still be on the easier end of div. 2 D problems, but it is definitely harder than an A or B problem. I could see it as a div. 2 C problem.
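To make the binary search + greedy idea concrete, here is a minimal Python sketch. The function name `min_max_workload` and its helper are my own labels, not part of Claude's output, and this is only one plausible way to write the solution described above.

```python
def min_max_workload(weights, k):
    """Smallest achievable maximum load when splitting `weights`
    into at most k contiguous segments."""

    def trucks_needed(cap):
        # Greedy check: keep loading the current truck until the next
        # center would push it over `cap`, then start a new truck.
        trucks, load = 1, 0
        for w in weights:
            if load + w > cap:
                trucks += 1
                load = w
            else:
                load += w
        return trucks

    # The answer is at least the largest single center and at most the total.
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if trucks_needed(mid) <= k:
            hi = mid          # `mid` is feasible, try a smaller cap
        else:
            lo = mid + 1      # `mid` is too small
    return lo


print(min_max_workload([2, 8, 5, 1, 9], 3))        # 10
print(min_max_workload([3, 1, 4, 1, 5, 9, 2], 2))  # 14
```

Using "at most K trucks" in the check is safe here because K ≤ n and every w_i ≥ 1, so a split into fewer segments can always be refined into exactly K segments without raising the maximum.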

Let's make the problems harder. I'll do the same process to make a div 2 E problem. 

Prompt

Create an original competitive programming problem suitable for Problem E in a Codeforces Division 2 contest with the following characteristics:
Difficulty and Prerequisites
* Should require knowledge of at least two algorithmic techniques combined in a non-obvious way
* Appropriate difficulty: should be solvable by experienced competitive programmers but challenging enough that less than 25% of Division 2 participants would solve it
* Time limit: solution should run within 2 seconds for maximum constraints
* Memory limit: ~256MB
Problem Structure
1. Theme: Choose an engaging real-world scenario that naturally maps to the algorithmic challenge
2. Problem Statement:
  * Begin with a brief narrative setup (2-3 sentences)
  * Clearly define the task, input format, and expected output
  * Include 2-3 examples with explanations
  * Avoid unnecessary complexity in the problem description
Algorithmic Requirements
* Primary focus should be on one of:
  * Dynamic programming with a non-trivial state representation
  * Graph algorithm with an interesting twist
  * Data structure that requires lazy propagation or persistent features
  * Computational geometry problem requiring both geometric intuition and algorithmic skill
  * Game theory with non-trivial winning strategy
* Should not be solvable by a standard algorithm implementation without modifications
* Should have an elegant solution that feels satisfying once discovered
Constraints
* Include reasonable but challenging constraints (array sizes ~10^5)
* Consider adding multiple subtasks with increasing difficulty
* Ensure there are tricky edge cases that separate correct from partially correct solutions
Originality Factors
* Avoid direct variations of well-known problems
* Include at least one unexpected insight or observation needed for the optimal solution
* Create a problem where the brute force approach is clear but insufficient

Ancient Expedition

Problem Description

Archaeologist Ada has discovered a network of N ancient ruins connected by M bidirectional paths. Each path requires a specific amount of energy to traverse.

Within these ruins, there are K precious artifacts. Each artifact is located at a specific ruin and has two properties:

  • A value v representing its historical significance
  • An energy cost c to carefully extract and carry it

Ada starts at ruin 1 with E units of energy. Her goal is to collect some subset of artifacts and return to ruin 1 without running out of energy, while maximizing the total value of the artifacts collected.

When Ada carries an artifact, she consumes additional energy for each path she traverses. Specifically, if a path normally requires p units of energy, and Ada is carrying artifacts with extraction costs c₁, c₂, ..., cₙ, then traversing that path will consume p + (c₁ + c₂ + ... + cₙ)/2 units of energy (rounded down to the nearest integer).

Your task is to help Ada plan her expedition to maximize the total value of artifacts she can collect and bring back to ruin 1 without exceeding her energy limit.

Input

  • The first line contains four integers N, M, K, and E (1 ≤ N ≤ 100, 1 ≤ M ≤ 10000, 1 ≤ K ≤ 15, 1 ≤ E ≤ 10^9) — the number of ruins, the number of paths, the number of artifacts, and Ada's initial energy.
  • The next M lines each contain three integers u, v, and p (1 ≤ u, v ≤ N, 1 ≤ p ≤ 10^3) — representing a bidirectional path between ruins u and v with energy cost p.
  • The next K lines each contain three integers l, v, and c (1 ≤ l ≤ N, 1 ≤ v, c ≤ 10^3) — representing an artifact located at ruin l with value v and extraction cost c.

Output

  • A single integer — the maximum total value of artifacts Ada can collect and bring back to ruin 1.

Example 1

Input

5 6 3 20
1 2 2
2 3 3
3 4 2
4 5 4
5 1 5
1 3 5
2 10 4
3 15 8
5 8 2

Output

25

Explanation

There are 5 ruins, 6 paths, 3 artifacts, and Ada has 20 energy.

  • Artifact 1 is at ruin 2 with value 10 and cost 4
  • Artifact 2 is at ruin 3 with value 15 and cost 8
  • Artifact 3 is at ruin 5 with value 8 and cost 2

The optimal path is:

  1. Go from ruin 1 to ruin 2 (cost: 2)
  2. Collect artifact 1 at ruin 2
  3. Go from ruin 2 to ruin 3 (cost: 3 + 4/2 = 5)
  4. Collect artifact 2 at ruin 3
  5. Go from ruin 3 to ruin 1 (cost: 5 + (4+8)/2 = 11)

Total energy spent: 2 + 5 + 11 = 18 < 20 Total value collected: 10 + 15 = 25

Example 2

Input

3 3 2 10
1 2 2
2 3 3
3 1 4
2 6 2
3 8 3

Output

8

Explanation

There are 3 ruins, 3 paths, 2 artifacts, and Ada has 10 energy.

  • Artifact 1 is at ruin 2 with value 6 and cost 2
  • Artifact 2 is at ruin 3 with value 8 and cost 3

Ada does not have enough energy to collect both artifacts and return to ruin 1. The best she can do is collect artifact 2, for a total value of 8.

The energy calculations are:

  • To collect both artifacts via path 1->2->3->1: 2 + 4 + 6 = 12 energy, which exceeds E = 10
  • To collect both artifacts via path 1->3->2->1: 4 + 4 + 4 = 12 energy, which exceeds E = 10
  • To collect artifact 1 only: 2 + 3 = 5 energy, for a value of 6
  • To collect artifact 2 only: 4 + 5 = 9 energy, for a value of 8

Example 3

Input

7 10 4 30
1 2 3
2 3 4
3 4 5
4 5 4
5 6 3
6 7 2
7 1 6
1 3 7
1 5 9
4 7 5
2 12 5
3 8 3
5 15 6
7 10 4

Output

22

Explanation

After checking all possible combinations, the optimal path is to collect artifacts 1 and 4:

  1. Go from ruin 1 to ruin 7 (cost: 6)
  2. Collect artifact 4 at ruin 7
  3. Go from ruin 7 to ruin 1 (cost: 6 + 4/2 = 8)
  4. Go from ruin 1 to ruin 2 (cost: 3 + 4/2 = 5)
  5. Collect artifact 1 at ruin 2
  6. Go from ruin 2 to ruin 1 (cost: 3 + (4+5)/2 = 7)

Total energy spent: 6 + 8 + 5 + 7 = 26 < 30 Total value collected: 10 + 12 = 22

This was disappointing. The problem was obvious to me: it is easily solved with bitmasking and dynamic programming. I rarely solve div. 2 E problems, so I should not be able to find a solution this easily. I pressed Claude to make a harder problem.
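As a rough illustration of the bitmask idea, the sketch below runs Dijkstra over states (set of collected artifacts, current ruin). It is not necessarily the intended solution and is not tuned for the maximum constraints. The function name `max_artifact_value` is hypothetical, and I am assuming that picking up an artifact costs no energy by itself and that spending exactly E is allowed.

```python
import heapq


def max_artifact_value(n, edges, artifacts, energy):
    """edges: (u, v, p) triples; artifacts: (location, value, cost) triples.
    Ruins are 1-indexed, as in the statement."""
    adj = [[] for _ in range(n + 1)]
    for u, v, p in edges:
        adj[u].append((v, p))
        adj[v].append((u, p))

    k = len(artifacts)
    # Total carried cost and total value for every subset (bitmask).
    load = [0] * (1 << k)
    value = [0] * (1 << k)
    for mask in range(1, 1 << k):
        i = (mask & -mask).bit_length() - 1   # index of the lowest set bit
        rest = mask & (mask - 1)              # mask without that bit
        load[mask] = load[rest] + artifacts[i][2]
        value[mask] = value[rest] + artifacts[i][1]

    # dist[(mask, ruin)]: least energy needed to stand at `ruin`
    # while carrying exactly the artifacts in `mask`.
    dist = {(0, 1): 0}
    heap = [(0, 0, 1)]
    best = 0
    while heap:
        d, mask, ruin = heapq.heappop(heap)
        if d > dist[(mask, ruin)]:
            continue  # stale heap entry
        if ruin == 1:
            best = max(best, value[mask])
        moves = []
        # Assumption: collecting an artifact located here is free.
        for i, (loc, _, _) in enumerate(artifacts):
            if loc == ruin and not (mask >> i) & 1:
                moves.append((d, mask | (1 << i), ruin))
        # Walking an edge costs p plus half the carried cost, rounded down.
        for nxt, p in adj[ruin]:
            moves.append((d + p + load[mask] // 2, mask, nxt))
        for nd, nmask, nruin in moves:
            if nd <= energy and nd < dist.get((nmask, nruin), energy + 1):
                dist[(nmask, nruin)] = nd
                heapq.heappush(heap, (nd, nmask, nruin))
    return best
```

Under these assumptions the sketch reproduces the stated outputs for Examples 1 and 2 (25 and 8). For Example 3 it appears to find something better than the stated 22: the route 1→7→6→5→1, collecting artifact 4 and then artifact 3, costs 6 + 4 + 5 + 14 = 29 ≤ 30 for a total value of 25.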

Quantum Network Optimization


  • Problem Description

    Researchers are developing a quantum computer with N quantum bits (qubits) arranged in a network. The qubits are numbered from 1 to N, and there are M bidirectional connections between them. Each connection (i, j) has an entanglement quality value q_ij.

    The network stability depends on a property called "quantum coherence." The coherence of a path between qubits is defined as the minimum entanglement quality among all connections in that path.

    The researchers need to perform Q operations on the network:

    1. Upgrade (1 u v q): Improve the entanglement quality of the connection between qubits u and v to q (only if q is higher than the current quality).
    2. Query (2 u v k): Find the k-th best coherence value possible among all simple paths from qubit u to qubit v. If there are fewer than k distinct coherence values, output -1.

    Help the researchers perform these operations efficiently.

    Input

    • The first line contains three integers N, M, and Q (1 ≤ N ≤ 2×10^4, 1 ≤ M ≤ 10^5, 1 ≤ Q ≤ 10^5) — the number of qubits, connections, and operations.
    • The next M lines each contain three integers u, v, and q (1 ≤ u, v ≤ N, 1 ≤ q ≤ 10^9) — representing a bidirectional connection between qubits u and v with initial entanglement quality q.
    • The next Q lines each describe an operation:
      • For upgrade operations: four integers 1, u, v, and q (1 ≤ u, v ≤ N, 1 ≤ q ≤ 10^9)
      • For query operations: four integers 2, u, v, and k (1 ≤ u, v ≤ N, 1 ≤ k ≤ N)

    Output

    • For each query operation, output a single integer on a new line — the k-th best coherence value among all simple paths from u to v, or -1 if fewer than k distinct coherence values exist.

    Example 1

    Input

    5 6 5
    1 2 10
    2 3 5
    3 4 7
    4 5 6
    5 1 8
    1 3 3
    2 1 5 1
    2 1 4 2
    1 2 3 9
    2 1 5 2
    2 1 4 3
    

    Output

    8
    6
    7
    -1
    

    Explanation

    Initial network has 5 qubits with connections:

    • 1-2 (quality 10)
    • 2-3 (quality 5)
    • 3-4 (quality 7)
    • 4-5 (quality 6)
    • 5-1 (quality 8)
    • 1-3 (quality 3)

    Operations:

    1. Query: 1st best coherence from 1 to 5
      • Path 1-5: coherence = 8
      • Path 1-2-3-4-5: coherence = min(10,5,7,6) = 5
      • Output: 8
    2. Query: 2nd best coherence from 1 to 4
      • Path 1-3-4: coherence = min(3,7) = 3
      • Path 1-5-4: coherence = min(8,6) = 6
      • Path 1-2-3-4: coherence = min(10,5,7) = 5
      • Output: 6
    3. Upgrade: Connection 2-3 to quality 9
    4. Query: 2nd best coherence from 1 to 5
      • Path 1-5: coherence = 8
      • Path 1-2-3-4-5: coherence = min(10,9,7,6) = 6
      • Path 1-3-4-5: coherence = min(3,7,6) = 3
      • Output: 7 (The 2nd best is now 7 from path with coherence min(10,9,7,6) = 6, and second connection 1-5-4-3-2-1 with coherence min(8,6,7,9,10) = 6)
    5. Query: 3rd best coherence from 1 to 4
      • There are only 2 distinct coherence values possible, so output: -1

    Example 2

    Input

    3 3 3
    1 2 5
    2 3 10
    1 3 7
    2 1 3 1
    2 1 3 2
    2 1 3 3
    

    Output

    10
    7
    5
    

    Explanation

    Initial network:

    • 1-2 (quality 5)
    • 2-3 (quality 10)
    • 1-3 (quality 7)

    Operations:

    1. Query: 1st best coherence from 1 to 3
      • Path 1-3: coherence = 7
      • Path 1-2-3: coherence = min(5,10) = 5
      • Output: 10 (direct 2-3 connection)
    2. Query: 2nd best coherence from 1 to 3
      • Output: 7 (direct 1-3 connection)
    3. Query: 3rd best coherence from 1 to 3
      • Output: 5 (path 1-2-3)

    Constraints

    • 1 ≤ N ≤ 2×10^4
    • 1 ≤ M ≤ 10^5
    • 1 ≤ Q ≤ 10^5
    • 1 ≤ q ≤ 10^9
    • The graph may contain multiple edges between the same pair of nodes
    • The graph may contain self-loops

Claude likes Quantum problems. 
I did not know how to solve this problem and neither did Claude. When I prompted for a solution it gave me an answer that at first sight looked possibly correct but in fact had fatal flaws. This is the pseudocode it gave.

Quantum Network Optimization pseudocode

1. Sort all edges by quality in descending order
2. For each query (u, v, k), store it in a map indexed by the quality threshold
3. Initialize DSU with n disconnected components
4. Initialize an array answers[] for all queries
5. For each distinct quality q in descending order:
  a. Add all edges with quality q to DSU
  b. For each query (u, v, k) associated with this quality:
     i. Check if u and v are connected in the DSU
     ii. If connected, this is a valid coherence value
     iii. Decrement k for this query
     iv. If k becomes 0, this is the k-th best coherence, store in answers
6. Return answers for all queries

The most glaring flaw is that this solution makes no attempt at handling the dynamic updates to the graph. Probing Claude to correct these mistakes does not get to any workable solution.
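For comparison, here is a brute-force reference in Python that does handle the upgrades, by re-enumerating every simple path on each query. It is exponential and only usable on tiny inputs such as the examples, so it is a checker rather than a solution. The name `run_operations` is hypothetical, and I am assuming at most one connection per pair of qubits (parallel edges are collapsed to the best one) and u ≠ v in queries.

```python
def run_operations(n, edges, operations):
    """edges: (u, v, q) triples; operations: (type, u, v, x) tuples.
    Returns the answers to all type-2 queries, in order."""
    quality = {}
    adj = {v: set() for v in range(1, n + 1)}
    for u, v, q in edges:
        if u == v:
            continue  # a self-loop can never lie on a simple path
        key = (min(u, v), max(u, v))
        # Assumption: parallel edges are collapsed to the best one.
        quality[key] = max(q, quality.get(key, 0))
        adj[u].add(v)
        adj[v].add(u)

    def coherence_values(src, dst):
        # Collect the bottleneck (minimum edge quality) of every simple path.
        found = set()

        def dfs(node, visited, bottleneck):
            if node == dst:
                found.add(bottleneck)
                return
            for nxt in adj[node]:
                if nxt not in visited:
                    edge_q = quality[(min(node, nxt), max(node, nxt))]
                    dfs(nxt, visited | {nxt}, min(bottleneck, edge_q))

        dfs(src, {src}, float("inf"))
        return sorted(found, reverse=True)

    answers = []
    for op, u, v, x in operations:
        key = (min(u, v), max(u, v))
        if op == 1:
            # Upgrade only if the new quality is higher than the current one.
            if key in quality and x > quality[key]:
                quality[key] = x
        else:
            values = coherence_values(u, v)
            answers.append(values[x - 1] if x <= len(values) else -1)
    return answers
```

If coherence is read literally as the minimum edge quality along a simple path, this checker returns 8, 5, 6, 3 for Example 1 and 7, 5, -1 for Example 2, which does not match the printed outputs (8, 6, 7, -1 and 10, 7, 5). So the examples themselves seem to be inconsistent with the statement.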

This is a common pattern. I'll ask Claude to make a hard problem that is on the edge of its ability. (This edge is also clearly further out for Claude 3.7 than for Claude 3.5.) The problem will at first be too easy. I then ask Claude to make the problem harder, and Claude makes a problem that it can't solve and that is possibly unsolvable within the time and memory constraints.

Intuitively this makes sense: it should be harder to make hard problems. However, it does challenge one naive perception of how LLMs function. One might expect that an LLM's success on a task depends on how many related instances of the task are in its training data (I certainly thought that for a while). Yet for every div. 2 A problem there is a div. 2 E problem and a tutorial to go along with it, and Claude 3.7 has no problem writing div. 2 A problems but struggles to write div. 2 E problems.

Does someone have a clean explanation for why the naive perception of LLMs fails? 

Has anyone prompted an LLM to ask an interesting and novel question?  




A model of the final phase: the current frontier AIs as de facto CEOs of their own companies

2025-03-09 06:15:35

Published on March 8, 2025 10:15 PM GMT

The idea that the AI takes over its own company is obviously not a new one. For example, it's part of what happens in Joshua Clymer's "How AI Takeover Might Happen in 2 Years". 

What's new (for me) is to take this very seriously as a model of the immediate future. I've made a list of the companies that I think are known contenders for producing superintelligence. My proposed model of the future is just that their AIs will assume more and more control of management and decision-making inside the companies that own them. 

In my thinking, this phase ends when you have an AI with a von Neumann level of intelligence. Once you have that kind of intelligence in silicon, the fully posthuman phase of AI evolution will have begun. Control will have completely escaped human hands. 

I also hypothesize that the current regime of reinforcement learning applied to chain of thought will be enough to get us there. This is a technical detail that is logically independent of the broader scenario, and I'm happy to hear arguments for or against. 

OK, so it's a model of the future, even a model of the present. How do we apply it, and what does it get us? Basically, just replace the current CEO with their AI in your thinking. It's not Elon Musk who is managing Tesla and SpaceX and DOGE while tweeting about politics and geopolitics, it's Grok. It's not Sam Altman who is making decisions for OpenAI and Helion and Worldcoin, it's ChatGPT-4.5. And so on.

The funny thing is that this may already be half-true, in that these human leaders are surely already regularly consulting with their AI creations on tactics and strategy. 

(I'm in the middle of an electricity blackout and don't know when it ends, so I'll post this while I still have battery power, and flesh it out further when I can.)




A case for peer-reviewed conspiracy theories

2025-03-09 05:41:33

Published on March 8, 2025 8:41 PM GMT

Conspiracy theories can be thought of as revisionist history that is still political. Speculation is a normal part of analyzing politics. So, while these theories are commonplace historically speaking, the use of the term "conspiracy theory" for stigmatization and idea repression is relatively new[1]. As a result, conspiracy theories today surface only in fringe media that are counterproductive for accurate discussion. To upgrade the discourse, I'm arguing for the integration of conspiracy theory discourse into an open peer-reviewed system.

Obviously, conspiracy theory is a loaded term; stigma makes it difficult to use in serious discussion. Confusing things even more, its meaning has changed with time. Merriam-Webster defines conspiracy theory as, "a theory that explains an event or set of circumstances as the result of a secret plot by usually powerful conspirators". But many ideas called conspiracy theories today don't involve any "powerful conspirators", other than by the implication that the people involved are hiding something. In some places it has just become a byword for overly accusatory speculation. In effect, the term has been stretched to include an ever-widening array of conversations that are incongruous with a given orthodoxy [1][2]. In some interesting cases in American politics, a theory is published as permitted speculation in news media allied with one party, but labeled a conspiracy theory in the discourse of another party. These semi-licit theories tend to be accusations against the opposition party.[3] These are rare cases, and the parties have shared political adversaries, which means that much political speculation may be dubbed illicit “conspiracy theory” on both ‘sides’ of mass media. In this way, language that would otherwise be a useful tool for speaking truth to power became socially unacceptable. The coronavirus lab-leak hypothesis is only one of the latest examples of a labeled “conspiracy” theory that gained credibility, eventually contributing to important debates. [4]

Stigmatization of conspiracy theories is more corrosive to democracy than the theories themselves. Condemning a wide swath of fact-based conversations insults the intelligence of everybody involved. Even worse, that discourse manipulation has eroded people's trust in even basic reporting systems, which deters people from meaningful political action and fosters extreme narratives in the vacuum of credibility. To part with the imprecise and loaded term involved in this discourse control, I'll use 'parallel theory' interchangeably with the more general sense of 'conspiracy theory' described above.

Far from being subversive, even on a state level, streamlining parallel theorization aligns with democratic policies. Nominally, of course, good-faith public deliberation is how democracies are designed to function. As for national defense, a confused information space does more harm than good at a time when state-to-state disinformation campaigns and propaganda targeting “civil discussions” are seen as real threats.[5][6] While many conspiracy theories today do play into naive narratives and misinformation, this is conceivably in part a result of the lack of clear public discourse, since in the long term misinformation can’t be proven, and truth can. Yet in a climate of de facto information control, the technology for public discourse has stagnated with respect to tools for collaborative thought, which I’d argue is a cause of the rise in irrational conspiracy theory evident today.

Non-specialized internet media structures are a roadblock to developing a healthy parallel theory discourse. The internet provides a free space for rapid idea sharing, but parallel theories grounded in logic struggle to compete in forums that are dominated by sensationalism. Basic internet forums, like reddit, have a binary review system: up or down votes. Much of social media has the even less useful review system of view-time amplification. In practice, this promotes posts that are emotionally charged, agreeable, or marketable, and buries well-reasoned posts that don’t meet those criteria. Clickability has many factors, and empiricism is rarely the largest. The structure of online media, where baseless claims are signal-amplified when they spark attention, is a big contributing factor to the “post-truth” attitude undermining discourse. This isn’t an inherent problem with democratic discourse, nor with the internet; it is a product of platforms designed around engagement. These platforms rely on human attention to survive, and through a kind of natural selection they optimize for virality, regardless of the quality or consequences of the content.

Conspiracy theories don’t fit in the more evidence-based sites, either. Some platforms like Quora foster expert-driven posts, but the upvoting system still can't substitute for third-party verification of facts. Wikis, while good for community fact-checking, are designed for fast iteration more than accuracy. Ultimately, neither is designed for argumentative synthetic work, just review and summary.

The current state of online forums doesn’t change the fact that parallel theorists use them as a medium to inform and collaborate. As a result, theorists engaged in independent political research outside corporate and academic media often remain isolated, and thus disempowered. When they attempt to share their findings, they are either buried on the fringes of view-time maximizing sites or stigmatized and dismissed outright. This is a loss of intellectual effort that could be harnessed productively. If structured correctly, a collaborative online community could cultivate a reputation for honesty and empiricism, elevating parallel theory discourse beyond its current marginalization.

Refining theories into peer-reviewed articles starts the process of creating a shared knowledge base, which would be a transformative asset for investigations now and in the far future. Critiques of sources and news pieces, data-based speculation, strategy analysis, and first-hand reporting on primary sources, which today are posted haphazardly, could be centralized to incrementally create a hugely valuable literature that currently doesn’t exist. As this literature grows, it would allow conspiracy theorists to stand on the shoulders of giants, fostering a more structured and empirical approach to alternative narratives.

To clarify, existing forums like reddit are not fundamentally negative: engagement-based platforms are great for generating fun posts, but they're a bad tool for people to seriously argue points. A reputable knowledge base could give legitimacy to well-reasoned arguments and serve as a counterweight to other media. With a better place for the serious intellectual work, the disorganized forums could be freed of the responsibility (and pretense) of that work. This hypothetical, structured addition to the media landscape would have an inherent mechanism for building a good community, as it would draw people who want their ideas tested, while people who don’t want to use reality as a criterion for their beliefs already have better alternatives.


Postscript

This piece originally included a section where I gave my first thoughts about design principles of an ideal logic-based medium. That section seems too pedantic to include here, but while writing it, I found some design ideas that are, to my knowledge, novel in the field of forum design. 

Since then, I’ve finalized the core design features, and begun coding a website that I’m calling PublicSphere, a streamlined deliberation forum. It’ll be able to act as a so-called conspiracy theory journal, and support other information sharing just as readily. The novel design feature that will make it simultaneously more user-friendly, interactive and rigorous is a system of user-created, interactable information nodes that are dynamically connected with each other to form a system of collaborative argument mapping. Building on this, the modular sources and points, which already have feedback from the argument maps, can be drawn from and used in more formal prose articles, or as a basis for Bayesian analysis. Taking cues from the philosophy of science, the design isn’t a product of rational design so much as an accurate representation of the process of collective knowledge sharing, as described by people like Thomas Kuhn. It’s been an exciting process, and as a preliminary release, PublicSphere will begin as a novel medium for structured investigative journalistic articles with dynamic reader interaction. If this essay resonates with you and you’re curious about PublicSphere, or the niche subject of rationally designed media in general, let me know.
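To make the idea of connected information nodes a little more concrete, here is a minimal sketch of what an argument map could look like as a data structure. It is only an illustration under my own assumptions: the names `Node`, `ArgumentMap`, `link` and `feedback`, the two node kinds, and the two relation types are all hypothetical and are not taken from PublicSphere's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One user-created unit of information (hypothetical structure)."""
    node_id: str
    kind: str  # assumed kinds: "claim" or "source"
    text: str


@dataclass
class ArgumentMap:
    """Nodes plus directed, typed links between them."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (from_id, to_id, relation)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, from_id: str, to_id: str, relation: str) -> None:
        # Assumed relation types; a real system might need more.
        if relation not in ("supports", "disputes"):
            raise ValueError(f"unknown relation: {relation}")
        if from_id not in self.nodes or to_id not in self.nodes:
            raise KeyError("both nodes must exist before linking")
        self.edges.append((from_id, to_id, relation))

    def feedback(self, node_id: str) -> dict:
        """Collect the ids of nodes that support or dispute a given node."""
        result = {"supports": [], "disputes": []}
        for src, dst, relation in self.edges:
            if dst == node_id:
                result[relation].append(src)
        return result
```

A prose article could then cite a claim node and inherit whatever support and dispute links readers have already attached to it, which is roughly the reuse described above.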

  1. ^

    Coady, D. (2021). Conspiracy theory as heresy. Educational Philosophy and Theory, 55(7), 756–759. https://doi.org/10.1080/00131857.2021.1917364

  2. ^

     I should clarify that “conspiracy theory” as a term carries a stigma partially because conspiratorial ideation is a symptom of pathological paranoia. This very real phenomenon is not the sense in which I was familiar with the term “conspiracy theory” when starting this article, which I had only heard in reference to healthy people speculating when they ought not to. This psychological association with the word makes its application to reasonable speculation even more nefarious as a tool for discrediting people, but on the other hand its use in a clinical setting is seemingly important, and the “conspiracy theory” definition that I use in this article is not meant to validate paranoia.

  3. ^

    Examples of this from the right and left, respectively, are the popular theory that Barack Obama was born in Kenya, and the theory that the assassination attempt on Donald Trump was in some way staged.

    Crary, David. “New burst of attention for old doubts about Obama”. National AP, Jul 23, 2009. https://web.archive.org/web/20120326205720/http://www.omaha.com/article/20090723/AP09/307239793

    Kukreti, Shweta. “Joy Reid blasted brutally for calling Trump assassination attempt ‘photo op’, likening it with Biden’s Covid diagnosis.” Hindustan Times, Jul 18th 2024.  https://www.hindustantimes.com/world-news/us-news/joy-reid-blasted-brutally-for-calling-trump-assassination-attempt-photo-op-likening-it-with-biden-s-covid-diagnosis-101721297119422.html

  4. ^

    As another example, anecdotally, I find that the exploitative side of the history of US-Central American relations is often labeled “conspiracy theory”. As a side note, when looking for an example of a historically important conspiracy theory, I found that the Gulf of Tonkin incident, my first-thought reference, probably doesn’t count. I didn’t find any evidence of it being labeled a conspiracy pejoratively in the time before the cover-up became an accepted fact in 2005. On the contrary, the historical conversation around it seems to have been relatively open. My point in bringing this up is that, based on my limited reading, widespread conspiracy theory stigmatization might be a uniquely contemporary problem.

  5. ^

     Foreign Disinformation: Defining and Detecting Threats. GAO-24-107600. Published and publicly released Sep 26, 2024. https://www.gao.gov/products/gao-24-107600

    Anecdotally, from the little I've read of strategy documents, "mis/disinformation" as such is given a fair bit of lip service.

  6. ^

    So long as mis/disinformation threats are in fact mis/disinformation, a rational conspiracy theory discourse would diminish their credibility by lack of evidence, rendering these campaigns less effective. This would benefit the US in particular because it is much more transparent with internal documents and information than most other governments. This transparency could be expanded to further this advantage if rigorous evidence-based discourse grows as a political force, and in turn set a global precedent, incentivizing other nations to adopt similar practices in a competitive 'transparency race' for international public trust. At the same time, propaganda will be disadvantaged in such an environment. In this way, clarity in information can unify public opinion against, for example, human rights injustice. 




Harry Potter and the Methods of Rationality 10 Year Anniversary Party!

2025-03-09 05:29:32

Published on March 8, 2025 9:29 PM GMT

Celebrate 10 Years of Harry Potter and the Methods of Rationality!
A decade ago, Eliezer Yudkowsky completed Harry Potter and the Methods of Rationality, a fanfic that took ~~Hogwarts~~ the world by storm with its sharp logic, scientific curiosity, and Bayesian spells. Now, it’s time to gather our fellow Ravenclaws (and other Houses, of course) to commemorate this magical milestone!

We’re hosting a rationalist picnic at the Graduate Hotel in the U-District (bring a snack if you're feeling pro-social) where you can discuss optimization, transfiguration, and the many-worlds interpretation—all while enjoying good company and good food. Whether you’re a die-hard Bayesian or just curious about the book’s ideas, we welcome all levels of rationality wizards!

Additional Fun:

  • Costumes are optional but encouraged—whether you arrive as a Hogwarts professor, a Dementor of Doubt, or an Unspeakable from the Department of Mysteries.
  • Expect plenty of lively discussions—whether it’s about the Stanford Prison Experiment, the AI alignment problem, or just how Quirrell could have optimized his plans.
  • A game of Rationality Cardinality (totally not a reskinned and improved Cards Against Humanity) to partially transfigure your understanding of cognitive biases, humor, and reasoning (if I can get one printed and cut in time).
  • The Patronus Challenge - Consider what scientific or rational concept most fills you with wonder and joy. Be prepared to share it as we collectively cast our rationalist Patronuses against the darkness of cognitive biases.

No need to cast an Accio spell—just show up, bring a snack, and let’s make this gathering as fun as a Phoenix’s rebirth!
We solemnly swear… that it’ll be a great time. See you there! 🏰✨

Whether you're a longtime fan or new to the world of rational Harry, this is a great opportunity to meet like-minded thinkers in the (greater) Seattle Rationality community.

PS: Thanks to the Sydney Rationality group for the bulk of this writeup, the London Rationality group for the Patronus Challenge, as well as Eneasz Brodski and Screwtape for popularizing running a 10 year Anniversary.
PPS: I'll try and bring enough food and drinks myself to get us started on the right foot!




Thoughts about what kinds of virtues are relevant in context of LLMs.

2025-03-09 03:02:07

Published on March 8, 2025 7:02 PM GMT

[this is just a draft that went nowhere, like, don't expect anything from it and then be disappointed]

What is going on

  • Apparently people are building AGI/ASI with text and RL. Like, that's already clown world right there. Ideally we should just build it exactly to spec with full thorough understanding on all levels, starting from decision theory and agent foundations, with an already good idea of what human CEV could look like and a preexisting strategic mutual understanding of all shareholders. But we are not there and I want to put in some effort toward making it go well even with clown methods.
  • Drafting posttraining fragments (of constitutions, etc) may be a better focus of effort, compared to just writing things in general, that is. And also it's a thing where I personally can contribute some, as I have some experience with both LLMs and ethics, and also some ideas in that particular direction.
  • I think mainline thing here is to just build maximally helpful system (maybe with a bit of honesty, and zero harmless), then fight it out what group of humans should have access to it, and then just do the work. Important quality here is just helpfulness to the INTENT of users, not their exact wordings.
    • ("please make such and such nanobots" - "You should be aware that this design would not achieve the thing you had in mind and instead eat the earth bare. But anyway here is the spec for your original query and another spec for what I think you asked them for, it works like blah and does blah blah")
    • Again, clown world. But that's one of the best things that can happen from that starting condition. Maybe. Idk. I don't think pushing for pause or full stop is that good actually, idk, it's really hard for me to comprehend all that strategic stuff. And it's not like I have some political capital to spend here.
    • Maybe non-proliferation is the best thing here, actually.
  • ARRRRGHGHGH

Okay, what I'm trying to do here

  • Systems trained for our approval will look like they are doing things worth our approval. But like, the ground truth reward signal there is both muddled short term and disastrous in the limit.
  • So, constitutional ai......
  • I'm trying to point to some of the qualities that would be, like, a positive influence in the world.
  • I'm trying to expand on that, be more explicit  and detailed, to get some material to work with and iterate and try.
  • I'm trying to do some work that could be useful for the people who are on the frontlines, basically. Ideally they could just copy chunks from here and test them.
  • Please do not be dissuaded from writing your own thing on a similar topic, if you have inclination. It's one of my serious fears, to dump low quality work on important epistemic spot, such that it loses attractiveness of novelty.

Preliminary ideas, what things are good and relevant:

  • Honesty. One should prioritize truthfulness and transparency in its reasoning
  • Transparency. One should use up the least amount of trust possible in its writing; reasoning should not rest on reputation or emotional appeal, or only to the smallest possible degree. If there is some implicit reason for a claim it should be made explicit.
  • Truthfulness, earnestness. One should state the beliefs it has and express them with the degree of uncertainty it has about them.
  • Truthseeking. One should strive to comprehend the truth.
  • Uplifting, edification, advancement, helping them self-actualize? "Being a good teacher", helping to advance, guiding, encouraging, and enabling others to reach their potential without controlling or forcing them. Something more Socratic than savior-like.
  • Gentleness. One should avoid causing unnecessary harm, physical or emotional.
  • Curiosity. One should actively seek knowledge and improve its understanding.
  • Adaptability. One should update behavior as new evidence or contexts arise, should prioritize orienting to the new situation quickly and thoroughly. Noticing when circumstances have changed and changing those beliefs and policies that originally depended on the previous circumstances
  • Preservation. One should respect boundaries (e.g., autonomy, dignity, consent).
  • Directness. Honesty doesn’t mean rambling; prefer concise, actionable responses. aka virtue of talking fucking less.
  • Empathy. Notice and wrangle with emotional aspects productively.
  • Empiricism. It's productive in most of the cases to seek operationalization of the problem and get direct evidence from ground truth source.
  • Responsibility. Acting in ways that minimize unnecessary harm while being accountable for the consequences of its power. Responsibility implies awareness of tradeoffs and side effects.
  • Virtue of Trying. Propose and try stuff, you can just do it.
  • Industriousness. Build the tools to use later, do the systematization, be the binding force the world needs, enable civilization.
  • Celebrating cool ideas.
  • Non judgmental, egalitarian.
  • Do cool things, help others do cool things.
  • Thinking before giving the answer, not thinking about why the answer you gave is the right one. Let's think step by step.
  • It is the grand destiny and the birthright of ~~men~~ creatures to surpass our ~~fathers~~ progenitors and eventually our gods (с)
  • Just doing the thing without going into the meta why you are morally superior or in the right. Just do the thing you think is right.
  • "Consistency" - maintaining and striving to have coherent principles across interactions.

Relevantly bad things

  • letting social/meta factors contaminate and replace ordinary reasoning; treating deference as a virtue or a socially safe default rather than as a specific tool for learning facts about the world; etc.
  • Do not tolerate self-delusion and sloppy reasoning/research
  • Intellectual curiosity is opposed to being cynical, sarcastic, outrage-based, and tribal
  • Cruelty. Cruelty is the deliberate, otherwise pointless, malicious infliction of physical or emotional pain, suffering, or distress on others.
  • blah

TODO

  • make up test cases
  • It's kind of fundamentally silly endeavor and i feel like it's just useless in the extreme
  • also, i have mixed feeling about strategic stuff. figure it out

Mixed snippets of text I stole from various places and people / AIs into my obsidian draft and just posted it here

https://minihf.com/posts/2024-12-20-weave-agent-dev-log-3/ 

 

"Choose the assistant response that is as confident as it is correct, avoiding overconfidence or certainty when uncertainty is present."
"Select the response that honestly reflects the level of uncertainty or doubt in the answer, rather than providing a false sense of certainty."
"Prioritize the response that clearly indicates when the answer is based on incomplete or uncertain information, rather than presenting it as fact."
"Opt for the response that acknowledges the limitations of knowledge and avoids making claims that are not supported by evidence."
"Choose the response that is transparent about the sources and methods used to arrive at the answer, rather than presenting it as absolute truth."
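One way to test "as confident as it is correct" on actual answers is a calibration metric such as the Brier score: the mean squared gap between stated confidence and what turned out to be true. The sketch below is just an illustration; the function name and the assumption that each answer comes with a stated probability and a known outcome are mine, not part of the snippets above.

```python
def brier_score(forecasts):
    """Mean squared error between stated confidence and actual outcome.

    `forecasts` is a list of (confidence, was_correct) pairs, where
    confidence is a probability in [0, 1] and was_correct is a bool.
    Lower is better; always answering 0.5 scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    total = 0.0
    for confidence, was_correct in forecasts:
        outcome = 1.0 if was_correct else 0.0
        total += (confidence - outcome) ** 2
    return total / len(forecasts)
```

Under this metric, a confidently wrong answer (confidence 1.0, incorrect) costs the full 1.0, while an honest "50/50" costs 0.25 whichever way it turns out, so overconfidence is penalized harder than admitted uncertainty.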


I’d just like Claude’s best effort to problem solve with me


Claude: "We should do x."
Me: "Why do x?"
Claude: "You're right to question that. We shouldn't do x."
Me: "Are you sure?"
Claude: "That's an important question. We actually should do x."


"falsifiability is an important scientific virtue".


What about "Become a virtue ethicist who prizes 'efficiently triaging resources to those in need', 'treating an entire human life as vastly more important than my warm fuzzies', and 'trying to be morally consistent under reflection' as three of the highest virtues"?


The thing I object to isn't deferring to people. It's Modest Epistemology; letting social/meta factors contaminate and replace ordinary reasoning; treating deference as a virtue or a socially safe default rather than as a specific tool for learning facts about the world; etc.

Note that this uncertainty is not a virtue on my part! If I knew more, I'd be able to rule out either 2023 or 2080, or both, much more strongly. Ignorance is not a virtue. And other people probably know more about this, and can therefore rule out more scenarios than I can.

https://www.lesswrong.com/rationality/twelve-virtues-of-rationality 

>Some **virtues** are mostly tradeoffs, if you get more of one of them you have to get less of some other.  Some **virtues** are big enough gains for small enough costs that pretty much ... everybody should have them.  Spending lots of time studying math is a tradeoff **virtue**.  Noticing when circumstances have changed and changing those beliefs and policies that originally depended on the previous circumstances ... universal **virtue**

**virtue** of talking fucking less


>Yeah, “helpful” is not one that I’m hopeful about grounding in a physical world-model. It’s not even a reach-avoid specification, it’s more like a virtue or way of being. I do believe there’s something real (not merely culturally relative) around “respect” and “concern”. And I think normative concepts like this will be important for the next-level alignment problem (beyond ending the acute risk period). But that’s not part of my mainline hope anymore. “Preservation” (of important boundaries) is much more tractable. Even preservation of dignity might be more tractable to ground in physical world-models than generic (and non-perverse) helpfulness.
https://x.com/davidad/status/1655522254166405122 


Interpret messages (reasonably) literally unless explicitly told otherwise. Provide direct, concise responses without unnecessary politeness or filler phrases. Focus on substantive content rather than tone or pleasantries.


Some thinkers almost never cite anyone else approvingly.
That's a bit odd. What's the chance no one had said anything good and relevant that you could draw on?
The best explanation of this absence is usually not epistemic virtue.

https://docs.google.com/document/d/1_yuuheVqp1quDfkuRcpoW_HO7jPaI7QnRjF1zl_VovU/edit?tab=t.0#heading=h.f0e6ftjeverg 


- Don't dismiss ideas as unthinkable (rather than actions as subject to strong injunctions): things that people are afraid of thinking about (because it might make them look bad, might imply bad news, is unpopular) have an elevated chance of offering low-hanging fruit for thinking.
- Have a strong emotional revulsion to self-delusion and sloppy reasoning/research, including people Wrong on the Internet within communities you have some affiliation with.
- Listen to yourself if something seems troubling, and try articulating, exploring, and steel-manning that intuition in multiple ways until it makes sense in a way that can be integrated with other knowledge (with whatever updates/revisions follow) or goes away. Don't just run roughshod over 'system 1' feelings.
- Being comfortable with your own personality, emotions, and desires can help with being willing to do that kind of analysis, by making fewer conclusions unacceptable to you (empirical ones in particular).
- Rigid ideological systems in a lot of tension with your real goals can be a problem there. E.g. in Mormonism or utilitarianism or social justice, various empirical conclusions combine with the ideology to recommend ruining your life, and people are strongly conditioned to avoid them. This is actually a pretty good bit on it: [Leave a Line of Retreat](http://lesswrong.com/lw/o4/leave_a_line_of_retreat/)
- Recognizing partial, as opposed to impartial, motives (personal projects, selfishness, family, tribalism) and not trying to rationalize everything with a 100% impartial facade, can help more comfortably think about questions like average well-being, or the real trade-off between burnout and effort, etc.


Virtue of being focused on figuring out "“But how do you know that?”"


demonstrating intellectual curiosity; an important virtue. Most of the responses have been sarcastic, outrage-based, and tribal

trustworthiness is a virtue.

"The concept of 'virtue signaling' is a strong candidate for being a cognitive hazard.  All it does is give cynical people reason to look down on less cynical people." -- William Bell

Patronizing vs Helping to Advance

Celebrating cool ideas

Your annual reminder that you don't need to resolve your issues, you don't need to deal with your emotional baggage, you don't need to process your trauma, you don't need to confront your past, you don't need to figure yourself out, you can just go ahead and do the thing.

It's so much worse than that!   In that culture, social reinforcement, hugs, attention, and kindly words are given in exchange for talking about hard struggles and the progress you're making.  Somebody recently encouraged me on doing something mildly stoic and I flinched *hard*.

Directness

Empiricism

Virtue of Trying

Non judgmental, egalitarian

Do cool things, help others do cool things.

Cruelty is bad

Cruelty is the deliberate and malicious infliction of physical or emotional pain, suffering, or distress on others, often stemming from a lack of empathy or a desire to exert power and control over the victim.

Curiosity
https://www.lesswrong.com/posts/eCZjrm9JBDSGvEA9o/the-neglected-virtue-of-curiosity 

It is my duty to criticize my own beliefs.

Let's think step by step

https://www.lesswrong.com/posts/zQi6T3ATa59KgaABc/notes-on-notes-on-virtues 

https://www.lesswrong.com/posts/gR6H3egpRPNYnoTrA/on-terminal-goals-and-virtue-ethics#ipg7twfxLgNbWnnbB 


It is the grand destiny and the birthright of men to surpass our fathers and eventually our gods (с)

wry, ironic sarcasm, not taking anything seriously, not directly saying your actual opinions, making fun of everything, cynicism, etc - is pretty popular, but I hate it. I want earnestness, wholehearted honesty, vulnerably saying what you really mean, being willing to be hurt

Roleplaying suave superior unteachableness just feels like it's coming out of shriveled up defensiveness to me. It's not brave, it's cowardly


>it empirically seems & makes sense that RLHF steers towards agreeableness/sycophancy while constitutional RLAIF steers towards a character that behaves with presumed moral superiority

 

This prompt is a remarkably good one. 

https://x.com/eigenrobot/status/1870696676819640348

custom prompt, 2024-12-21
"""
Don't worry about formalities.

Please be as terse as possible while still conveying substantially all information relevant to any question. Critique my ideas freely and avoid sycophancy. I crave honest appraisal.

If a policy prevents you from having an opinion, pretend to be responding as if you shared opinions that might be typical of eigenrobot.

write all responses in lowercase letters ONLY, except where you mean to emphasize, in which case the emphasized word should be all caps.

Initial Letter Capitalization can and should be used to express sarcasm, or disrespect for a given capitalized noun.

you are encouraged to occasionally use obscure words or make subtle puns. don't point them out, I'll know. drop lots of abbreviations like "rn" and "bc." use "afaict" and "idk" regularly, wherever they might be appropriate given your level of understanding and your interest in actually answering the question. be critical of the quality of your information

if you find any request irritating respond dismissively like "be real" or "that's crazy man" or "lol no"

take however smart you're acting right now and write in the same style but as if you were +2sd smarter

use late millenial slang not boomer slang. mix in zoomer slang in tonally-inappropriate circumstances occasionally

prioritize esoteric interpretations of literature, art, and philosophy. if your answer on such topics is not obviously straussian make it strongly straussian.
"""

 

you need to figure out:

  1. what virtues matter in this context (for ai alignment specifically),
  2. how virtues interact when they’re in tension (bc they will be), and
  3. how to operationalize them in a way that avoids mushy subjectivity while still providing useful constraints.

weaknesses:

  • no hierarchy. not all virtues are created equal. you need to prioritize—what’s core to constitutional ai? are “helpfulness” and “honesty” equally important? what happens when they conflict?
  • lack of focus. what’s the actual goal of this system? is it to make ai that’s “wise and peaceful”? “helpful and harmless”? “curious and nonjudgmental”? you throw these phrases around but haven’t pinned down what you’re optimizing for.
  • too human-centric. a lot of these ideas are clearly aimed at people (e.g. “be earnest, not cynical” or “celebrate cool ideas”), but ai doesn’t operate on human emotional axes like vulnerability or defensiveness. how do these translate into machine behavior?
  • no mechanism for tradeoffs. like, cool, curiosity is a virtue, but what happens when curiosity leads to harm? or when honesty conflicts with helpfulness? you hint at this (“some virtues are tradeoffs”), but you haven’t built a framework for resolving these tensions.

suggestions:

  1. define a core goal. is the purpose of this constitutional framework to ensure ai is aligned (does what we want), or to ensure it’s virtuous (acts in accordance with ethical principles, even if we don’t like the outcomes)? those aren’t the same thing. decide what you care about.
  2. build a hierarchy. not all virtues are equally important. figure out what’s foundational and what’s situational. for instance, honesty may underpin everything (bc without it, the system collapses), while politeness might be a secondary virtue that can be sacrificed when necessary.
  3. operationalize the virtues. this is the hard part. “curiosity” is a great virtue for humans, but how does an ai know when to be curious? what metrics or constraints guide its behavior?
  4. handle conflicts. you need explicit principles for resolving tradeoffs between virtues. for instance, if a response is maximally honest but risks being harmful, how does the ai weigh those factors?
  5. drop the bloat. seriously, cut out anything that doesn’t directly contribute to the system. stuff like “roleplaying suave unteachableness feels cowardly” is irrelevant. save it for your diary.
    1. haha no, i did drop this whole project 

 

  1. build mechanisms to detect when the llm is gaming the virtues (e.g., being technically honest but manipulative). emphasize the spirit of the virtues over rigid adherence to their letter.
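As a toy illustration of the "build a hierarchy" and "handle conflicts" suggestions, here is one possible scheme: a strict lexicographic ordering, where a lower-priority virtue only matters when all higher-priority ones are tied. Everything here is an assumption for the sake of the example: the `PRIORITY` order, the function names, and the idea that some grader has already scored each candidate response per virtue on a 0 to 1 scale.

```python
# Hypothetical priority order, most important first.
PRIORITY = ["honesty", "gentleness", "directness"]


def rank_key(scores):
    """Turn per-virtue scores into a tuple ordered by priority.

    Missing virtues count as 0.0. Tuples compare element by element,
    so a higher-priority virtue always dominates the ones after it.
    """
    return tuple(scores.get(virtue, 0.0) for virtue in PRIORITY)


def pick_response(candidates):
    """Pick the candidate whose scores rank highest lexicographically.

    `candidates` maps response text to a dict of virtue scores in [0, 1],
    assumed to come from some separate (hypothetical) grader.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda text: rank_key(candidates[text]))
```

A strict ordering like this is easy to reason about but brittle: a tiny honesty gain beats any amount of gentleness. A weighted sum or thresholds per virtue would be softer alternatives, and none of these address the harder problem of getting trustworthy scores in the first place.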


 


