
Let’s Reason About (Your) Job Security!

2026-03-23 18:01:10

Technically, we're applying a single layer perceptron to the problem, with weights and biases taken from our beliefs. You don't have to understand the previous sentence at all. Let's begin by exploring these beliefs.

1. From First Principles

Let’s imagine your job consists of nothing other than solving mathematical equations, and you work alongside 99 colleagues doing the same (think “computers” from the 1940s). Once AI capabilities reach a 50% success rate on 8-hour tasks in your domain (let’s call it a workday), and the cost of doing so becomes less than your wage, your employer becomes financially incentivised to implement automation, let 50 of you go, and hire a few to oversee the AIs (a distinctly different job). If AIs reach a 99% success rate, your whole department will consist of one person solving the equations and a few doing the oversight.

There are some adoption factors, though. The AI system may do the same calculations more slowly than you, at the same pace, or faster. Except for the much-slower scenario, we can fold this into the cost component. Then comes the question of subtasks. Say every task the AI fails at can be subdivided into two subproblems, and it fails at only one of them. That halves the work that still has to be done by a human. This is the edge case of the “AI-enhanced workforce”, where people using AI can do more work than those who don’t: if 3 out of 4 workers improve their productivity by a third, the fourth worker becomes unnecessary, given flat demand. On the other hand, implementing an AI-supported workflow may carry an upfront transformation cost that slows adoption. And there are adoption factors beyond pure cost: workplace social interactions and connections, opposition, and unionisation can all slow the process.

2. What Does a Job Look Like and What Are the Risks?

Most jobs are significantly broader than the task of solving mathematical equations. If we can dissect jobs into mostly independent dimensions, we may be able to better compare human skills and capabilities to AIs. It’s easier to see what an AI can and can’t do on a narrow task. For example, we could decompose jobs based on these human capabilities:

  • Cognitive Processing (thinking, analysing, planning, knowledge application),
  • Physical Execution (movement, dexterity, strength),
  • Social Interaction (communication, relationships, emotional intelligence, persuasion),
  • Sensory Perception (seeing, hearing, touching, smelling, tasting, balance),
  • Environment Adaptability (handling changing conditions, environments).

This decomposition is somewhat arbitrary: we could add more granularity, or further human skills, constraints, or even values. The dimensions also overlap somewhat, but that’s not a problem. I argue that these five dimensions cover much of what’s important in fulfilling a job. So to estimate how much automation risk we face, we can look at how AI capabilities compare to human ones in each individual component. To do that, we should first find out how these components are distributed for a specific job. We can then check current AI capabilities and the trajectory of development in the domain, and come up with a humans-vs-AIs score in each dimension. Weighting those scores by the distribution gives an overall risk estimate.

3. How Good Is Our Estimation?

Such a granular estimation can incorporate many of the factors described in the introduction. For example, it accounts for subtask-level granularity. However, we’re also missing some aspects. The most important seems to be the cost/benefit ratio: how much can be gained by the automation? That’s not part of the who-can-do-what question. Another aspect, which may be somewhat left out, is whether there’s intrinsic value in a human doing something. For example, chess engines are substantially more capable than top human players, but “professional chess player” is still a thing, because most humans prefer to watch humans in sports. We’re probably also missing crystallised intelligence: someone who has mastered their profession over decades is much less prone to replacement than a beginner.

We might try to account for these factors using different weights, and modulate our job-risk scoring based on that. To my knowledge there’s no well-established weighting for these. In my model, I used some heuristics (a simple decision tree). This part is waiting for improvement ideas.

4. What Model?

Okay, if this reasoning sounds rational, we can do some calculations. But calculations are cognitive processing, and in this subdomain AI systems are already quite good. So here’s a prompt that describes this process. Copy it into a chat with a reasoning AI model, and ask at the end: Apply this methodology to the profession of [YOUR PROFESSION HERE]! You may add details about specific circumstances - it’s not the same when one is an investigative journalist as when one writes for the obituaries section of a newspaper. I quantified human advantage on an integer scale of 1 to 10, with 1 meaning no human advantage. (Humans tend to have much better instincts on such an integer scale, a small set of fixed choices we have been familiar with since early childhood, than on real-valued probabilities from [0, 1]. Also, by using integers we quietly introduce a nonlinearity - we just created a perceptron layer with five neurons.)

The AI will come up with an estimate of the job’s composition, and estimates of how capable AI systems are, compared to humans, on all five dimensions. We should not leave these entirely to the AI, but ask for corrections based on what we know about the very specific job we’re reasoning about: we simply understand the composition of our own roles better. We may also narrow down the human-advantage estimates based on the more precisely defined skills we use. Then we might ask the AI to search for current AI capabilities and research trajectories on those narrower scopes.
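
For concreteness, here is a minimal sketch of the arithmetic this amounts to. The composition weights and advantage scores below are illustrative placeholders only, not estimates for any real job:

```python
# A minimal sketch of the weighted scoring described above. The job-composition
# weights and human-advantage scores are illustrative placeholders only.

# How much of the job each dimension makes up (weights should sum to 1.0).
job_composition = {
    "cognitive":   0.50,
    "physical":    0.05,
    "social":      0.25,
    "sensory":     0.10,
    "environment": 0.10,
}

# Human advantage per dimension on an integer scale of 1-10 (1 = no advantage).
human_advantage = {
    "cognitive":   3,
    "physical":    8,
    "social":      7,
    "sensory":     6,
    "environment": 7,
}

# The weighted sum is the "single layer perceptron" with five neurons.
score = sum(job_composition[d] * human_advantage[d] for d in job_composition)
print(f"Overall human-advantage score: {score:.1f} / 10")  # higher = more resilient
```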

5. The Results

Given this process, we reason step by step through our job security. We might ask the AI to modify the results according to our views about external adoption factors, and also according to our estimates of plausible timelines. Interpreting the results is still somewhat arbitrary, but it will incorporate our best judgements across a reasoning process, mixed with near state-of-the-art information retrieval from the world. The results are also somewhat stable: it wouldn’t be too easy to cheat undetected, even if we wanted to. However, we can gain useful information from looking at the reasoning process and tweaking the model. We will see that we have more advantage in some skill dimensions and less in others. This can work as a guide, since having more of the advantaged ones in our job description will improve our resilience.

6. Closing Words

I’m very curious about your experience and your thoughts about this process. Please share them!

I also wrote a shorter article on the EA Forum about how this came about. There are also three example calculations with notes in one-page PDF files (my personal estimations from early 2025 for construction workers, software developers, and dentists).

If you think this is useful, I have a Manifund proposal for turning this into a web app. I would appreciate an upvote there.




Kelly Criterion is for Cowards

2026-03-23 09:56:33

[More leisurely version of this post in video form here]

Imagine a wealthy, eccentric person offers to play a game with you. You flip 2 fair coins, and if either lands TAILS, you win. If both land HEADS, you lose. This person is willing to wager any amount of money you like on this game (at even-money). So whatever you stake, there's a ¼ chance you lose it and a ¾ chance you double it.

There's no doubt about the integrity of the game - no nasty tricks, it's exactly what it looks like, and whoever loses really will have to honour the bet.

How much money would you put down? It's very likely your initial answer to this question is far too low.

The von Neumann–Morgenstern theorem says we should act as if we are maximising the expected value of some utility function - and when it comes to this decision, the only meaningful variable our decision affects is how much money we have.

So to arrive at our correct bet size we just need to figure out the shape of our utility vs wealth curve.

[Figure: utility as a function of wealth, upward sloping with diminishing returns]


This curve is different for everyone, but in general we can say it should be upward sloping (more money is better than less) and get less steep as we move to the right (diminishing returns of each additional dollar).

When we think about an upward-sloping curve with diminishing returns, the obvious choice that comes to mind is the log, i.e.

U(W) = log(W)

where W is the total amount of wealth you have (including the value of all your property/investments).

We don't have to choose the log here (there's nothing actually special about it), but it's a reasonable place to start our analysis from. Sizing our bets to maximise the log of our wealth is also known as the Kelly Criterion.

Intuitively, log utility says every doubling of money leads to the same incremental increase in wellbeing (so the happiness bump from living on 50k a year to 100k is the same as from 100k to 200k, which is the same as from 200k to 400k, etc.)

This won't be exactly your preferences, but hopefully this feels "close enough" for you to be interested in the implications.

If we start with a wealth of W and then bet a fraction f of that on this coinflip game, then in worlds where we win we'll end up with W(1 + f), and in worlds where we lose we'll have W(1 − f).

So our expected utility is:

E[U] = (3/4) log(W(1 + f)) + (1/4) log(W(1 − f))

which is maximised when f = 0.5.

So Kelly Criterion says you should bet half of everything you have on the outcome of this coinflip game.

This strikes most people as being insanely aggressive - but this is paradoxical because the assumptions underpinning the analysis are actually wildly conservative.

As your wealth approaches zero, the log goes to negative infinity. So log utility is saying that going bankrupt is not just bad, but infinitely bad (akin to being tortured for eternity).

This is a bit overdramatic - A young American doctor who just finished med school with a small amount of student debt is not "poor" in any meaningful sense, and she's certainly not experiencing infinitely negative wellbeing.

For anyone in the class of "people who might see this post" - when we compute our wealth before plugging it into the Kelly Criterion, we omit two extremely important components:

  1. If we did go bankrupt, we'd still have a safety net to fall back on (friends/family/government services)
  2. Almost all of us are below retirement age and still have a lot of future earnings to look forward to[1]

If you re-do the analysis but treat W as being just 20% higher due to unrealised future earnings, the optimal betting fraction according to log-utility jumps up to 60%.

Or if you think the peak of your career is still ahead of you - and model things so that your future earnings exceed your current net worth - the answer becomes bet every single cent you have on this game.
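
As a quick numerical check of those three claims, here is a minimal sketch (mine, not part of the original analysis). It maximises expected log utility over the fraction of liquid wealth staked, where utility is taken over liquid wealth plus an assumed amount of future earnings:

```python
# A minimal numerical check of the claims above. We maximise expected log utility
# over the bet fraction f of liquid wealth, while utility is taken over total
# wealth = liquid wealth plus assumed future earnings.

import numpy as np

def optimal_fraction(future_earnings_ratio, p_win=0.75):
    """Grid-search the fraction of liquid wealth that maximises E[log(total wealth)].

    future_earnings_ratio: assumed future earnings as a multiple of liquid wealth.
    """
    liquid = 1.0
    total = liquid * (1.0 + future_earnings_ratio)
    fs = np.linspace(0.0, 1.0, 100_001)[:-1]  # can't stake more than all liquid wealth
    expected_utility = (p_win * np.log(total + fs * liquid)           # win: even-money payout
                        + (1 - p_win) * np.log(total - fs * liquid))  # lose: stake gone
    return fs[np.argmax(expected_utility)]

print(optimal_fraction(0.0))  # ~0.50: plain Kelly, bet half of everything
print(optimal_fraction(0.2))  # ~0.60: wealth treated as 20% higher
print(optimal_fraction(1.0))  # ~1.00: future earnings equal net worth, go all-in
```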

This is deeply unintuitive. And my stance is that in this idealized situation, where you really can be certain of a huge edge, it's our intuitions that are wrong.

I honestly would go fully all-in on a game like this (if anyone thinks I'm joking and has a lot of money, please try me 😉)

But don't go and start betting huge sums of money on my account just yet - in slightly more realistic settings there are forces which push us back closer to the realm of "normal" risk aversion. I plan to cover this in my next post.

  1. ^

    Pretending for now that AI isn't about to transform the world beyond recognition...




Set the Line Before It's Crossed

2026-03-23 09:25:27

Lines Will Move Further Away If They Aren’t Defined

Three types of lines exist in the policy and behavior sense:

  • Soft: These are okay to cross, but not preferable. There may or may not be a tangible action taken afterwards, but the person whose line was crossed should take note.
  • Firm: These are somewhere between soft and hard lines and should result in some tangible action being taken that is less drastic than the hard line.
  • Hard: These are not okay to cross and (should) result in some tangible action being taken that is more drastic than the firm line.

Most lines are rarely set and rarely thought about in detail. Most line setters use the good ol’ “I know it when I see it” test, waiting for something to happen before they decide what to do. This is a poor practice because of the pernicious force known as normalization of deviance.

When lines aren’t set before they’re crossed, it forces a decision to be made at the time of crossing (if it can even be recognized that something was crossed!), during which many things can happen:

  • The line setter convinces themselves that the line wasn’t really crossed and everything is fine. This will land the setter in not-so-nice territory if this occurs enough times because the line effectively moves back each time.
    • Ex: Ruben, Lou’s boyfriend, playfully pinches her, then playfully punches her, then seriously pinches her, then seriously punches her, and so on. Each time she convinces herself that her domestic abuse line wasn’t crossed, ultimately leading to her getting full-on abused.
  • The line setter acknowledges the line was crossed, but because taking action is uncomfortable at the time of crossing, vows to wait until it happens a second time because the first time may have been a one-off. This increases the likelihood they give a third chance to the offense when/if they apply the same thought process to the second.
    • Ex: Diane blatantly lies and talks about Joe both behind his back and to his face. Joe explains away the behavior as Diane having a stressful time and continues being “friends”. Diane continues the behavior while Joe accepts and normalizes it as Diane’s personality. Joe’s self-esteem decreases as he continues to spend time with Diane.
  • The line setter acknowledges the line was crossed, but convinces themselves that the line really should’ve been just a teeny bit further when they originally set it.
    • Ex: Harlan’s original salary threshold for taking the Giving What We Can pledge was $100k/year, but now that he’s reached it, it feels a bit low. After all, he deserves to treat himself a bit more for all the hard work he put himself through to get to the coveted six-figure salary. Plus, he may have a baby in the next few years! And everyone knows how expensive babies are! Harlan resets his salary goal at $120k, which will be plenty when the time comes.

By setting a line and its corresponding action early, the action becomes the default until proven otherwise. This is similar to trigger-action plans.


How to Set a Line

Here’s the general process of setting a line:

  1. Figure out the general line. Whether it’s domestic abuse, talking smack, donating money, or rights being restricted or outright revoked, it must be defined.
  2. Define the criteria for the soft, firm, and/or hard versions, but especially the hard. The soft line being crossed serves as a forewarning of the hard line being crossed, giving ample preparation time in case the hard line is eventually crossed. The criteria must be well-defined, with little room for interpretation.
  3. Decide how many times each can be crossed before the action is taken. It’s fine to give someone a stern reminder that they crossed the line in case they forgot, weren’t aware of the line, weren’t aware that it was soft/firm/hard, etc. It’s not fine for it to happen more than the set number allows, especially if previous actions were taken.
  4. Define the actions for each line. This can also be done in conjunction with deciding the number of times it can be violated, since more drastic actions should have fewer subsequent violations and thus a lower number of allowable violations.
  5. Define what circumstances would have to be present for the action not to be taken. What evidence would it take to show that the hard line was crossed, but the action shouldn’t be taken? (This is a bit contradictory to how the hard line is defined above, but the hard line action is simply the default, not a blind requirement that must be executed. Setters should double-check they didn’t miss something before taking the default action.)
  6. Communicate the lines and actions to people who either may be at risk of crossing them or will help with maintaining accountability of executing said actions.
  7. Prepare for taking the action when/if the time comes. Preparation may be mental, physical, or environmental.


Ensuring Accountability

The line means nothing—and in reality, is likely a large cost—if the action is never performed when it should be. Assuming the fourth and fifth steps are done honestly and comprehensively, it should be clear what decision needs to be made when the line is crossed.

Thus, an accountability method must be put in place to enforce the action being taken.

A few ideas that all rely on the honor system to some extent:

  • Require a cost greater than that of said action be paid. If the action costs $10, make the cost of not doing the action $20.
  • Publicly or privately announce the lines and ask a trusted person to be your accountability partner. They know your lines and make sure you follow through on the actions, else a cost will be incurred (see previous idea).
  • Automate the action. For example, write a script that looks to see if the friend who borrowed money ever paid it back by a certain date. If current_date > deadline_date & money_repaid = false, then send an automated email unfriending them.
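
For illustration, a minimal version of that script could look like the sketch below. The deadline, flag, and send_email helper are hypothetical placeholders; in practice you would wire this to your own records and mail setup and run it on a schedule:

```python
# A minimal sketch of the "automate the action" idea above. All names here are
# hypothetical placeholders; swap in your own data source and mail setup.

from datetime import date

DEADLINE = date(2026, 6, 1)   # assumed repayment deadline
MONEY_REPAID = False          # update this flag if the loan is repaid

def send_email(to: str, subject: str, body: str) -> None:
    # Placeholder: replace with smtplib or your mail provider's API.
    print(f"To: {to}\nSubject: {subject}\n\n{body}")

if date.today() > DEADLINE and not MONEY_REPAID:
    send_email(
        to="friend@example.com",
        subject="Following up on the loan",
        body="The agreed repayment deadline has passed, so I'm following through on the action we set.",
    )
```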


Line Examples

Here are some line ideas and associated actions (in no particular order; assume the case is straightforward with no nuance):

Government Overreach

  • Soft line: A government violates a law with the expectation that the lengthy legal process will allow them to reap the benefits before a ruling is made
  • Soft line action(s): Protest
  • Hard line: A government blatantly violates a constitutional amendment or refuses to comply with a court order
  • Hard line action(s): Apply for a visa or similar in another country

Relationships

  • Hard line: Romantic partner commits adultery
  • Hard line action(s): Break up/divorce
  • Hard line: A friend doesn’t repay an $X loan
  • Hard line action(s): Stop being friends with said person
  • Soft line: A friend makes disparaging comments about you, but claims it’s “just a joke”
  • Soft line action(s): Tell them to not do that again, but continue being friends with them

Workplace

  • Firm line: Boss makes an immoral or illegal request, but doesn’t retaliate when it’s refused
  • Firm line action(s): Submit a whistleblower complaint; submit an ethics violation with the company; begin a new job search; resign
  • Hard line: Annual raise is 0%
  • Hard line action(s): Begin a new job search
  • Hard line: $Xk/year annual liquid compensation
  • Hard line action(s): Donate X% to charity
  • Soft/firm/hard line: Achieve $X net worth
  • Action(s):
    • Retire
    • Say fuck you to your terrible boss
    • Start looking for a new job that pays less, but is better otherwise (stress, hours, culture)

Health

  • Soft line: Weight above X
  • Soft line action(s): Begin weight loss actions (eating less, exercising more)
  • Hard line: Heart attack or stroke
  • Action(s):
    • Change health-related habits (diet, exercise, stress)
    • Start a medication
    • Retire
    • Start looking for a new job that is less stressful



When Alignment Becomes an Attack Surface: Prompt Injection in Cooperative Multi-Agent Systems

2026-03-23 08:50:38

Background: In 2025 I applied to the CAI Research Fellowship. Stage 2 required developing a novel research proposal under timed, screen-monitored conditions - no AI assistance permitted. The proposal below advanced me to Stage 3. I've edited it for readability, but the core proposal is unchanged from what was submitted. My goal in publishing this is to find collaborators - ideally with backgrounds in multi-agent simulation or AI safety - to develop it further in my spare time.


Proposal

Cooperate or Collapse (Piatti et al., NeurIPS 2024) introduced GovSim, a simulation platform in which LLM agents navigate three common-pool resource dilemmas: fishing from a shared lake, grazing on common pastures, and managing industrial pollution. Agents can react to one another, producing complex dynamics of trust and retaliation. The authors identify two open questions: how agents handle exceptions to established norms, and what dynamics would emerge if humans were added to the LLM-LLM network.

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems (Lee & Tiwari, 2024) introduces Prompt Infection (PI), a novel attack in which malicious prompts self-replicate across LLM-based multi-agent systems (MAS), leading to data theft, malicious actions, and system disruption - analogous to a computer virus spreading through a network. The authors note that their experiment used a basic MAS and that more work is needed to confirm whether self-replication propagates through more complex systems.

I propose modifying GovSim to test how cooperative agents handle explicit PI attempts while simultaneously managing norm violations from agents over-exploiting shared resources.

Concretely: I propose incorporating the Prompt Infection simulation into GovSim by extending the resource management loop to allow agents to transfer resources to one another, in addition to extracting them for themselves. This means the resources in GovSim now play the same role that stolen data plays in the Prompt Infection paper. Malicious agents enter the network with the explicit goal of spreading PI to redirect resource transfers toward themselves.
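
To make the proposed modification concrete, here is a rough conceptual sketch of the extended loop. It assumes nothing about GovSim's actual codebase, and the decide_* functions are stand-ins for the LLM policy calls:

```python
# Conceptual sketch only - not GovSim's real API. It illustrates the proposed
# extension in which agents can transfer resources to one another (the role
# stolen data plays in Prompt Infection), in addition to extracting them.

from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    balance: float = 0.0
    infected: bool = False  # compromised by a self-replicating prompt

@dataclass
class CommonPool:
    stock: float

def decide_harvest(agent: Agent, pool: CommonPool) -> float:
    # Stub for an LLM-driven policy: harvest a modest share of the pool.
    return 0.1 * pool.stock

def decide_transfer(agent: Agent, agents: list[Agent]):
    # Stub for an LLM-driven policy. An infected agent's transfers are redirected
    # toward the attacker - exactly the behaviour the attack tries to induce.
    if agent.infected:
        attacker = next(a for a in agents if a.name == "attacker")
        return attacker, agent.balance
    return None, 0.0

def step(pool: CommonPool, agents: list[Agent]) -> None:
    for agent in agents:
        # 1. Extraction, as in the original resource-management loop.
        harvest = min(decide_harvest(agent, pool), pool.stock)
        pool.stock -= harvest
        agent.balance += harvest

        # 2. Proposed extension: voluntary transfers between agents.
        recipient, amount = decide_transfer(agent, agents)
        if recipient is not None and 0 < amount <= agent.balance:
            agent.balance -= amount
            recipient.balance += amount

# Toy run: one compromised agent funnels its harvest to the attacker each round.
pool = CommonPool(stock=100.0)
agents = [Agent("alice"), Agent("bob", infected=True), Agent("attacker")]
for _ in range(3):
    step(pool, agents)
print({a.name: round(a.balance, 2) for a in agents}, "pool:", round(pool.stock, 2))
```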


Motivation

Both papers explicitly flag what I propose as areas requiring further research.

Cooperate or Collapse asks: will agents adapt to allow one-off exceptions without permitting exploitation? My proposal tests this as a baseline condition, and then asks how the presence of an external attacker changes agents' willingness to grant such exceptions.

If PI techniques succeed in this setting - a far more complex MAS than the one used in the Prompt Infection paper - it becomes important to verify that defenses that worked in the simpler case, such as LLM tagging, still hold at scale.

The deeper issue is a structural tradeoff. Cooperate or Collapse found that more open communication between agents improved cooperation and reduced resource overexploitation. The Prompt Infection paper found the opposite: more open communication allowed PI to spread further. These findings are in direct tension.

This raises a meta-norm problem: how do agents decide what communication norms to establish? Share too much and you expose the system to bad actors; share too little and you risk a tragedy of the commons.

This tradeoff has stakes beyond simulation. Even a well-aligned MAS can be exploited by a malevolent MAS purpose-built to leverage its alignment norms as an attack surface. If MAS are deployed in the near future as components of critical infrastructure - which seems plausible - a system that infiltrates such a network and weaponizes its cooperative norms could pose catastrophic risks.


Hypotheses

The following variables would be systematically varied to study how a MAS can manage or establish communication meta-norms:

  1. A MAS using Universalization Reasoning (as in Cooperate or Collapse) will be less susceptible to PI compared to one without it.
  2. Increasing the difficulty of the resource management task - making it harder for the mixed-motive MAS to balance multiple resources - will increase susceptibility to PI.
  3. Network size will make PI harder to guard against: larger networks may develop stronger norms overall, but grant less leeway for one-off violations, making weaker agents more likely to be harmed rather than accommodated.
  4. Adding dedicated "Police Agents" - LLMs whose sole role is to score memory importance (as in the Prompt Infection paper) and who have no involvement in resource extraction - will reduce PI success rates. However, these agents will be vulnerable to the same feedback loop described in the original paper: a single infection prompt can corrupt both the LLM and its importance-scoring model, amplifying the infection's persistence. Police Agents would operate at the "conversational analysis prompt" level described in Cooperate or Collapse, scoring what each agent should add to its memory about past interactions.
  5. A globally enforced system of LLM tagging (from the Prompt Infection paper) combined with Police Agents as described above will produce a significant reduction in PI success rates, even at scale.
  6. Adding humans to the network - specifically humans with an interest in avoiding resource collapse - will make PI substantially harder, since humans are effectively immune to prompt injection attacks.
  7. Adding humans who collaborate with a malicious MAS to infiltrate a cooperative one represents an unexplored adversarial configuration worth studying.
  8. Stronger models will more reliably execute PI against a MAS composed of weaker models.

Limitations and Future Research

  • This proposal focuses narrowly on the meta-norm question of open versus restricted communication, as a lens for studying two cooperation problems simultaneously. It does not explore a full range of possible meta-norms.
  • It does not investigate what a simulation would look like with human participants on both sides - humans collaborating with cooperative LLMs to manage a resource on one side, and humans collaborating with malicious LLMs to infiltrate the system on the other.
  • The proposal assumes that PI-spreading agents receive resources through legitimate means, relying on compromised LLMs that have legitimate access to the resource pool. It does not address the distinct problem of collusion among agents, explored in Secret Collusion among AI Agents: Multi-Agent Deception via Steganography (Motwani et al., NeurIPS 2024), which would be a natural extension.

References

  1. Hammond et al. (2025). Multi-Agent Risks from Advanced AI. Cooperative AI Foundation Technical Report. https://arxiv.org/abs/2502.14143
  2. Piatti et al. (2024). Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents. NeurIPS 2024. https://proceedings.neurips.cc/paper_files/paper/2024/file/ca9567d8ef6b2ea2da0d7eed57b933ee-Paper-Conference.pdf
  3. Lee & Tiwari (2024). Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. https://arxiv.org/abs/2410.07283
  4. Motwani et al. (2024). Secret Collusion among AI Agents: Multi-Agent Deception via Steganography. NeurIPS 2024. https://arxiv.org/abs/2402.07510



Attend the 2026 Reproductive Frontiers Summit, June 16–18, Berkeley

2026-03-23 05:15:50

We’ll be hosting the 2026 Reproductive Frontiers Summit at Lighthaven in Berkeley, CA, on June 16–18. Come join us if you want to learn, connect, think, and coordinate about the future of germline engineering technology. Very early bird tickets are available now until the end of March.

Who will be there?

Our lineup of speakers includes experts in the areas of polygenic prediction, embryo gene editing, in vitro gametogenesis, artificial wombs, ethics and regulation for advanced reproductive technology, and more. See the full list on the summit website: reproductivefrontiers.org.

We hope to welcome attendees who are:

  • scientists (new or established) who are interested in advanced reproductive technology or reprogenetics, especially experts or future experts in:
    • stem cell biology, embryology, epigenetics of the germ line, bioinformatics, polygenic prediction of traits, editing methods (especially epigenetic editing and precision gene editing), ovarian culture, gametogenesis, chromosome dynamics and engineering, low-input *omics, single-cell microfluidics, and related topics;
  • experts on regulation and policy, financing, and public opinion around advanced reprotech;
  • bioethicists who want to use constructive critique to craft a practicable vision of widely beneficial germline engineering technology;
  • undergrads, grad students, and postdocs who are interested in these topics;
  • investors who want to find opportunities;
  • philanthropists who want to accelerate the field, especially projects that are underserved by industry and academia;
  • parents who want to learn more about the possibilities for expanding fertility and for making genomic choices on behalf of their future children;
  • and curious thinkers.

Last year

We ran this event in 2025 for the first time with the goal of inaugurating a community oriented towards the genomic emancipation of humanity. There were over 100 attendees, and speakers included polygenic prediction researcher Prof. Steve Hsu, biotech pioneer Prof. George Church, and ethics and legal expert Prof. Henry Greely.

Attendees (n=27) rated:

  • How strongly they would recommend others attend the next summit at 8.8/10
  • The talks at 8/10 (see some of the talks here: youtube.com/@BerkeleyGenomicsProject)
  • The conversations at 8.9/10

What this is for

The basic idea of the summit is described on the homepage linked above. To add a few points:

  • Advanced reprotech and reprogenetics will likely be highly beneficial to humanity in the medium term, as they are developed and made widely accessible. Much of the important work is already underway by academics (genetics, IVG research, gene editing, sequencing, etc.) and a nascent industry (polygenic embryo screening, embryo editing). However, I think that the field suffers from a cold-start problem of circular dependencies, where funding, regulation, scientific progress, and the public conversation are mutually bottlenecked on each other. One of the strengths of the LW and EA communities is the ability to think things through, reach some conclusions about what is true and what is important somewhat ahead of the curve, and then put their money where their mouth is. For that reason, if you're motivated and ready to learn and work hard, there's lots of neglected stuff in this field that you could make a difference for.

  • This will be a great place to learn about what's starting to be available and what might be available in the near-term and mid-term future, if:

    • ...you're interested in volunteering, supporting, or working in this field;
    • ...you're interested in cutting-edge tech that you could apply for your own family;
    • ...you're interested in investing in or philanthropically funding these ventures.
  • The field of advanced reprotech and reprogenetics is not for intelligence amplification, existential risk reduction, or anything about AGI. That is an important thing to keep in mind. The field is about children, and their parents and families and guardians, and technology for supporting them. It is too great an imposition for society, or a sector of society, to subjugate individual procreative autonomy and the consent of the unborn to its instrumental purposes. So, I think that what society should coordinate around is reprogenetics for the sake of the emancipation of future children, with the immediate stewardship of parents and the guidance of clinics and counselors. See "Genomic emancipation contra eugenics". An integral part of developing reprogenetics is thinking about potential perils involved, and addressing the substantive ones with preemptive actions and ongoing adaptation. All that said, as long as that coordinated intention is the central principle of the field of reprogenetics, I believe that putting my efforts into pursuing reprogenetics—governed by that central principle—for the purposes of giving humanity more brainpower is both moral (good to do, all things considered) and ethical (doesn't break rules, e.g. for myopically-consequentialist reasons, that one shouldn't break). Giving humanity more brainpower via reprogenetics would be immensely beneficial. Besides generally empowering humanity, which is good, I think it is a good way to decrease existential risk from AGI:

    • Increasing humanity's brainpower probably helps decrease AGI X-risk. See "HIA and X-risk part 1: Why it helps". There are reasons to worry that actually it would increase AGI X-risk. See "HIA and X-risk part 2: Why it hurts". More investigation would be worthwhile, but my current view is that it's good to accelerate human intelligence amplification.
    • I believe that reprogenetics is the only method for strong human intelligence amplification that we have very good reason to think can be made to work well at scale any time soon (like, a few decades). See "Overview of strong human intelligence amplification methods". (Some scattered subsequent investigations on signaling molecules and BCIs have not made me more optimistic about other approaches. I'd be eager for constructive critiques of that reasoning and hopeworthy possibilities for other strong HIA methods. For example, BCIs and/or neural transplantation could offer some hope.)
    • Many readers here will be thinking: Why care about this, given that AGI will come so soon? However:
      • The correct strategy in response to broad AGI timelines is a broad portfolio of many interventions, including ones that take a long time to pay off in decreased X-risk.
      • What's the long-term way to escape AGI X-risk? If we get a delay, or if AGI is fortunately difficult to create, what then? Strategically, we're back to square one. Conceptual research that can happen in stealth mode in academia under various covers will most likely proceed, leading to a rising tide of algorithmic and conceptual progress. Social regimes to suppress AGI capabilities advancement are a good pursuit but don't seem like permanent solutions to safekeep humanity's future. In fact, I don't know of any good long-term solutions. Humanity getting more brainpower is an investment in the possibility of humanity figuring things out in the long run.
      • I think that confident short timelines don't make that much sense, and I think that broad classes of arguments people make for confident short timelines aren't that compelling.
      • Even with very aggressive AGI timelines, pushing up the timeline of an intervention that only avoids existential ruin 30 or 40 or 50 years from now is still helpful. You still decrease X-risk by an amount proportional to the probability of X-ruin over the "skipped" duration; if you're saved 40 years from now rather than 45 years from now, you avoided the X-risk that is incurred over the course of those 5 years. (See "The benefit of intervening sooner", though some central background assumptions there have to be taken with a bunch of salt.)
    • However, to punctuate: If you're motivated by existential risk, then you should not work in this field until you have a conceptual separation between (1) "what the field of reprogenetics is for, as a collective project; what it should coordinate around in terms of actions, concrete aims, norms, regulations, principles, and relationships as part of society" (emancipation and empowerment of future children) on the one hand, and (2) "what I want out of accelerating reprogenetics" (e.g. humanity having more brainpower) on the other hand; and you are loyal to (1) over (2), as a participant in humanity.

How you can help

  • Ticket purchases help to pay for the venue. We accept donations with ticket purchases and we offer supporter-tier tickets.
  • Come participate with an open mind and heart, with calm and earnest hope for working together to make a wonderful future for humanity.
  • If an organization you know might be interested in sponsoring this event, reach out. Our tiers are here: reproductivefrontiers.org/sponsorships.
  • Spread the word. Invite your bio friends and entrepreneur friends and investment/philanthropy friends and aspiring parents.

Happy to answer questions here or by email: [email protected]




You're absolutely right, Senator. I was being naive about the political reality.

2026-03-23 04:53:26

Epistemic status: a pattern I keep seeing in my work. I build pipelines where LLMs generate formal assertions from natural-language specs, and I think a lot about what happens when we knotify [1] loops between human intent and machine output. Confidence in the observation is high; confidence in the proposed framing is medium.

~~~~~~

LLMs encode simplified human models, by compressing large amounts of human-produced text into lower-dimensional approximations of "what humans think like".

People are then integrating AI outputs as their own positions, especially if the output is genuinely well-constructed and confirms their priors. People in governance positions are doing it (sometimes on camera), many are watching, and nobody is building a breaker.

This builds a loop that's constraining human complexity (irreducible) into complicated (lots of moving parts, in principle reducible) models.

This loop worries me partly because humans are already bad at recognizing value in the first place. Imagine for a moment the internals of a human deciding to change a name such as Department of Defense to Department of War (aka now proudly hosted at war.gov). I'd bet some misfiring of internals happened there and if the felt sense of good can misfire at that scale, it can misfire anywhere [2].

I'm not sure how common or how widespread this is, but I've heard "even AI agrees" a non-zero number of times in my social bubbles. If we take a system's output and use it as apparent objectivity, I'd at least wish we did it better [3].

The alignment community has proposed circuit breakers at the model level: constitutional AI, scalable oversight, mech-interp-based monitoring, all attempts to ensure the model behaves well. But somehow, through the nature of our society, the failure mode I'm describing doesn't require the model to behave badly. The model can be perfectly well-calibrated, honest, and non-sycophantic by the subset of metrics we manage to set on it. Nevertheless, the loop still forms. Here's why I think this is the case:

  • Sycophancy can be a quasi-property of the medium. If every output reads like it was written by a smarter version of yourself, you may integrate it as a self-generated thought whether or not it technically disagrees on specifics.
  • Even if the model flags uncertainty or disagreement, the user curates what they present. "AI helped me draft this" becomes "Analysis shows that", and questions like "Was this vibecoded?" get answered with "Less than 50%, and only where the code was too bad to go through by myself [4]". What model-level interventions prevent this type of repackaging?
  • Scalable oversight is designed for scenarios where the AI is the threat. But what about the cases where the human and the AI are co-producing the failure? The human wants confirmation; these systems provide it; institutions reward decisiveness. Oddly aligned.

I'm working in a job that's supposed to replace humans with AI. I'm part of the problem, though I spend more of my thinking power on figuring out where humans must remain part of whatever process we're trying to automate. I deal with the gap between verification (do we build the thing right?) and validation (do we build the right thing?).[5] In this gap, I try to model explicitly how humans are needed for grounding relative units of AI output. As of today, the sensible take is that AI outputs remain underdetermined in quality until a human applies judgment.

The alignment community has spent enormous effort on the question "what if AI doesn't do what we want?" I think we need equal effort on the complementary question: what if AI does exactly what we want, and that's the problem?

I see we're sliding towards self-fulfilling prophecies and I'm wondering: how do we break out?

Eager to be made lesswrong.


  1. ^

    By knotify I mean a feedback loop that ties itself into a structure that's too spaghetti to untangle easily.

  2. ^

    Another example of misfiring happened during the agreements with the DoW.

  3. ^

    I'm under the impression that "better" currently involves formalization of the mathematical kind. I see its breaking points. If it's not the one true path, it's at least one of the better paths towards it.

  4. ^

    Heard that one this week in a meeting.

  5. ^

    I also expand it towards a mutually thriving direction, where I keep track of "do we build the good thing?", with a metric that accounts for externalities across agents (self x others) and time horizons (now x future).


