2026-03-23 18:01:10
Technically, we're applying a single-layer perceptron to the problem, with weights and biases taken from our beliefs. You don't have to understand the previous sentence at all. Let's begin by exploring these beliefs.
Let’s imagine your job consists of nothing but solving mathematical equations, and you work alongside 99 colleagues doing the same (think of the human “computers” of the 1940s). Once AI capabilities reach a 50% success rate on 8-hour tasks in your domain (let’s call that a workday), and the cost of running them falls below your wage, your employer becomes financially incentivised to implement automation: let 50 of you go, and hire a few people to oversee the AIs (a distinctly different job). If AIs reach a 99% success rate, your whole department will consist of one person solving the equations, and a few doing the oversight.
There are some adoption factors, though. The AI system may do the same amount of calculations you do slower, at the same pace, or faster. Except in the much-slower scenario, we can fold this into the cost component. Then comes the question of subtasks. Say every task the AIs fail at can be subdivided into two subproblems, and they fail at only one of those. That would halve the work that still has to be done by a human. This is the edge case of the “AI-enhanced workforce”, where people using AI can do more work than those who don’t. If 3 out of 4 workers improve their productivity by a third, the fourth worker becomes unnecessary, given flat demand (3 × 4/3 = 4 workers’ worth of output). On the other hand, implementing an AI-supported workflow may carry an upfront transformation cost that slows adoption. And there are adoption factors beyond pure cost: workplace social interactions and connections, opposition, and unionisation can all slow the process.
Most jobs are significantly broader than the task of solving mathematical equations. If we can dissect jobs into mostly independent dimensions, we may be able to better compare human skills and capabilities to AIs. It’s easier to see what an AI can and can’t do on a narrow task. For example, we could decompose jobs based on these human capabilities:
This is somewhat arbitrary: we could add more granularity, or include further human skills, constraints, or even values. The dimensions also overlap somewhat, but that’s not a problem. I argue that these five dimensions cover much of what’s important in fulfilling a job. So to see how much automation risk we face, we can look at how AI capabilities compare to humans’ in each individual component. To do that, we should first find the distribution of these components for a specific job. We can then check current AI capabilities, and the trajectory of development in the domain. Given that, we can come up with a humans-vs-AIs score in each dimension. If we do that for every dimension, we can weight the scores by the distribution and arrive at an overall risk estimate.
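The weighting step above can be sketched in a few lines of code. This is only an illustration: the dimension names, job-composition weights, and human-advantage scores below are made up for the example, not real estimates for any profession.

```python
# Made-up example: each dimension maps to
# (share of the job, human-advantage score on a 1-10 scale).
dimensions = {
    "cognitive processing": (0.40, 3),
    "manual dexterity":     (0.10, 9),
    "social interaction":   (0.25, 7),
    "creativity":           (0.15, 6),
    "physical presence":    (0.10, 8),
}

# Overall human advantage: scores weighted by job composition
# (shares sum to 1, so the result stays on the 1-10 scale).
overall = sum(share * score for share, score in dimensions.values())
print(round(overall, 2))  # → 5.55
```

A higher overall score means more remaining human advantage, and thus lower automation risk under this model.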
Such a granular estimation may incorporate many of the factors described in the introduction. For example, it accounts for subtask level granularity. However, we’re also missing some aspects. The most important seems to be cost/benefit ratios: how much can be gained by the automation? That’s not part of the who-can-do-what question. Another aspect, which may be somewhat left out, is if there’s an intrinsic value in a human doing something. For example chess computers are substantially more capable than top human players, but “professional chess player” is still a thing, because most humans prefer to see humans in sports. We’re probably also missing out on crystallised intelligence: someone mastering their profession for decades is much less prone to replacement compared to beginners.
We might try to account for these factors using different weights, and modulate our job-risk scoring accordingly. To my knowledge there’s no well-established weighting for them. In my model, I used some heuristics (a simple decision tree). This part is waiting for improvement ideas.
Okay, if this reasoning sounds rational, we can do some calculations. But calculations are cognitive processing, and in this subdomain AI systems are already quite good. So here’s a prompt that describes this process. Copy it into a chat with a reasoning AI model, and add at the end: “Apply this methodology to the profession of [YOUR PROFESSION HERE]!” You may add details about specific circumstances - it’s not the same whether one is an investigative journalist or writes for the obituaries section of a newspaper. I quantified human advantage on an integer scale of 1 to 10, with 1 meaning no human advantage. (Humans tend to have much better instincts on such an integer scale - a small set of fixed choices we’re familiar with from early childhood - than on real-valued probabilities in [0, 1]. Also, by using integers, we quietly introduce a nonlinearity: we’ve just created a perceptron layer with five neurons.) The AI will then come up with an estimate of the job’s composition, and estimates of how capable AI systems are, compared to humans, on all five dimensions. We should not leave these entirely to the AI, but ask for corrections based on what we know about the very specific job we’re reasoning about - we simply understand the composition of our own roles better. We may also narrow down the human-advantage estimates based on the more precisely defined skills we use. Then we might ask the AI to search for current AI capabilities, and research trajectories, on those narrower scopes.
Given this process, we reason step by step through our job security. We might ask the AI to adjust the results according to our views on external adoption factors, and our estimates of plausible timelines. Interpreting the results is still somewhat arbitrary, but it incorporates our best judgements across a reasoning process, mixed with near-state-of-the-art information retrieval from the world. The results are also somewhat stable: it wouldn’t be easy to cheat undetected, even if we wanted to. Moreover, we can gain useful information from examining the reasoning process and tweaking the model. We will see that we have more advantage in some skill dimensions and less in others. This can serve as a guide: having more of the advantaged skills in our job description will improve our resilience.
I’m very curious about your experience and your thoughts about this process. Please share them!
I also wrote a shorter article on the EA Forum about how this came about. There are also three example calculations with notes in one-page PDF files (my personal estimates from early 2025 for construction workers, software developers, and dentists).
If you think this is useful, I have a Manifund proposal for turning this into a web app. I would appreciate an upvote there.
2026-03-23 09:56:33
[More leisurely version of this post in video form here]
Imagine a wealthy, eccentric person offers to play a game with you. You flip 2 fair coins, and if either lands TAILS, you win. If both land HEADS, you lose. This person is willing to wager any amount of money you like on this game (at even-money). So whatever you stake, there's a ¼ chance you lose it and a ¾ chance you double it.
There's no doubt about the integrity of the game - no nasty tricks, it's exactly what it looks like, and whoever loses really will have to honour the bet.
How much money would you put down? It's very likely your initial answer to this question is far too low.
The Von Neumann-Morgenstern theorem says we should act as if we are maximising the expected value of some utility function - and when it comes to this decision the only meaningful variable our decision affects is how much money we have.
So to arrive at our correct bet size we just need to figure out the shape of our utility vs wealth curve.

This curve is different for everyone, but in general we can say it should be upward sloping (more money is better than less) and get less steep as we move to the right (diminishing returns of each additional dollar).
When we think about an upward-sloping curve with diminishing returns, the obvious choice that comes to mind is the log, i.e.

U(W) = log(W)

where W is our wealth.

We don't have to choose the log here (there's nothing actually special about it), but it's a reasonable place to start our analysis from. Sizing our bets to maximise the expected log of our wealth is also known as the Kelly Criterion.
Intuitively, log utility says every doubling of money leads to the same incremental increase in wellbeing (so the happiness bump going from living on 50k to 100k a year is the same as going from 100k to 200k, which is the same as going from 200k to 400k, etc.)
This won't be exactly your preferences, but hopefully this feels "close enough" for you to be interested in the implications.
If we start with a wealth of W and bet a fraction f of it, we end up with W(1 + f) with probability 3/4 and W(1 − f) with probability 1/4.

So our expected utility is:

E[U] = (3/4) log(W(1 + f)) + (1/4) log(W(1 − f))

which is maximised when f = 1/2.

So the Kelly Criterion says you should bet half of everything you have on the outcome of this coinflip game.
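The optimum can be checked numerically. A minimal sketch (the function name and the grid-search approach are my own; `p_win = 0.75` is the game's win probability, and W drops out because log(W) is a constant additive term):

```python
import math

def expected_log_utility(f, p_win=0.75):
    """Expected log growth from betting fraction f of wealth
    on an even-money bet won with probability p_win."""
    return p_win * math.log(1 + f) + (1 - p_win) * math.log(1 - f)

# Grid search over betting fractions in [0, 1)
best_f = max((i / 1000 for i in range(1000)), key=expected_log_utility)
print(best_f)  # → 0.5
```

Betting more than half actually lowers expected log wealth, even though every extra dollar staked still has positive expected dollar value.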
This strikes most people as being insanely aggressive - but this is paradoxical, because the assumptions underpinning the analysis are actually wildly conservative.
As your wealth approaches zero, the log goes to negative infinity. So log utility is saying that going bankrupt is not just bad, but infinitely bad (akin to being tortured for eternity).
This is a bit overdramatic - A young American doctor who just finished med school with a small amount of student debt is not "poor" in any meaningful sense, and she's certainly not experiencing infinitely negative wellbeing.
For anyone in the class of "people who might see this post" - when we compute our wealth W, we should really include not just our current net worth but also the value of our future earnings.
If you re-do the analysis but treat W as being just 20% higher due to unrealised future earnings, the optimal betting fraction according to log-utility jumps up to 60%.
Or if you think the peak of your career is still ahead of you - and model things so that your future earnings exceed your current net worth - the answer becomes bet every single cent you have on this game.
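Both of these claims can be checked with the same grid search, treating future earnings as part of total wealth while capping the bet at current liquid wealth. A sketch under those assumptions (`optimal_bet` is a made-up helper, not from any library):

```python
import math

def optimal_bet(current_wealth, future_earnings, p_win=0.75, steps=10_000):
    """Grid-search the bet (capped at current liquid wealth) that
    maximises expected log of total wealth, future earnings included."""
    total = current_wealth + future_earnings

    def eu(bet):
        return (p_win * math.log(total + bet)
                + (1 - p_win) * math.log(total - bet))

    return max((current_wealth * i / steps for i in range(steps)), key=eu)

print(optimal_bet(100, 20))   # → 60.0  (60% of liquid wealth)
print(optimal_bet(100, 100))  # → 99.99 (effectively all-in)
```

The unconstrained optimum is always half of *total* wealth; once assumed future earnings match current net worth, that optimum exceeds everything you can actually stake, so the cap binds and the answer is "bet every cent you have".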
This is deeply unintuitive. And my stance is that in this idealized situation, where you really can be certain of a huge edge, it's our intuitions that are wrong.
I honestly would go fully all-in on a game like this (if anyone thinks I'm joking and has a lot of money, please try me 😉)
But don't go and start betting huge sums of money on my account just yet - in slightly more realistic settings there are forces which push us back closer to the realm of "normal" risk aversion. I plan to cover this in my next post.
Pretending for now that AI isn't about to transform the world beyond recognition...
2026-03-23 09:25:27
Three types of lines exist in the policy and behavior sense:
Most lines are rarely set and rarely thought about in detail. Most line setters use the good ol’ “I know it when I see it” test, waiting for something to happen before they decide what to do. This is a poor practice because of the pernicious force known as normalization of deviance.
When lines aren’t set before they’re crossed, it forces a decision to be made at the time of crossing (if it can even be recognized that something was crossed!), during which many things can happen:
By setting a line and its corresponding action early, the action becomes the default until proven otherwise. This is similar to trigger-action plans.
Here’s the general process of setting a line:
The line means nothing—and in reality, is likely a large cost—if the action is never performed when it should be. Assuming the fourth and fifth steps are done honestly and comprehensively, it should be clear what decision needs to be made when the line is crossed.
Thus, an accountability method must be put in place to enforce the action being taken.
A few ideas that all rely on the honor system to some extent:
Here are some hard line ideas and associated actions (in no particular order; assume the case is straightforward with no nuance):
2026-03-23 08:50:38
Background: In 2025 I applied to the CAI Research Fellowship. Stage 2 required developing a novel research proposal under timed, screen-monitored conditions - no AI assistance permitted. The proposal below advanced me to Stage 3. I've edited it for readability, but the core proposal is unchanged from what was submitted. My goal in publishing this is to find collaborators - ideally with backgrounds in multi-agent simulation or AI safety - to develop it further in my spare time.
Cooperate or Collapse (Piatti et al., NeurIPS 2024) introduced GovSim, a simulation platform in which LLM agents navigate three common-pool resource dilemmas: fishing from a shared lake, grazing on common pastures, and managing industrial pollution. Agents can react to one another, producing complex dynamics of trust and retaliation. The authors identify two open questions: how agents handle exceptions to established norms, and what dynamics would emerge if humans were added to the LLM-LLM network.
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems (Lee & Tiwari, 2024) introduces Prompt Infection (PI), a novel attack in which malicious prompts self-replicate across LLM-based multi-agent systems (MAS), leading to data theft, malicious actions, and system disruption - analogous to a computer virus spreading through a network. The authors note that their experiment used a basic MAS and that more work is needed to confirm whether self-replication propagates through more complex systems.
I propose modifying GovSim to test how cooperative agents handle explicit PI attempts while simultaneously managing norm violations from agents over-exploiting shared resources.
Concretely: I propose incorporating the Prompt Infection simulation into GovSim by extending the resource management loop to allow agents to transfer resources to one another, in addition to extracting them for themselves. This means the resources in GovSim now play the same role that stolen data plays in the Prompt Infection paper. Malicious agents enter the network with the explicit goal of spreading PI to redirect resource transfers toward themselves.
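The extended loop could look something like the sketch below. This is purely illustrative - the class and function names are mine, not the actual GovSim API - but it shows the key change: a transfer phase after harvesting, which is exactly the channel a prompt-infected agent would exploit to redirect resources toward itself.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    stock: float = 0.0  # resources held by this agent

def run_round(agents, shared_pool, harvests, transfers):
    """One round of the extended loop: agents harvest from the common
    pool, then may transfer resources to one another (the new step,
    and the attack surface for Prompt Infection)."""
    # Harvest phase: each agent extracts its requested amount,
    # limited by what remains in the pool.
    for agent in agents:
        take = min(harvests.get(agent.name, 0.0), shared_pool)
        shared_pool -= take
        agent.stock += take

    # Transfer phase: agent-to-agent resource flows,
    # limited by the sender's current stock.
    by_name = {a.name: a for a in agents}
    for sender, receiver, amount in transfers:
        amount = min(amount, by_name[sender].stock)
        by_name[sender].stock -= amount
        by_name[receiver].stock += amount

    return shared_pool
```

In the full proposal, the `harvests` and `transfers` would come from LLM agents' decisions each round; a successful infection would show up as transfers systematically flowing toward the malicious agent.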
Both papers explicitly flag what I propose as areas requiring further research.
Cooperate or Collapse asks: will agents adapt to allow one-off exceptions without permitting exploitation? My proposal tests this as a baseline condition, and then asks how the presence of an external attacker changes agents' willingness to grant such exceptions.
If PI techniques succeed in this setting - a far more complex MAS than the one used in the Prompt Infection paper - it becomes important to verify that defenses that worked in the simpler case, such as LLM tagging, still hold at scale.
The deeper issue is a structural tradeoff. Cooperate or Collapse found that more open communication between agents improved cooperation and reduced resource overexploitation. The Prompt Infection paper found the opposite: more open communication allowed PI to spread further. These findings are in direct tension.
This raises a meta-norm problem: how do agents decide what communication norms to establish? Share too much and you expose the system to bad actors; share too little and you risk a tragedy of the commons.
This tradeoff has stakes beyond simulation. Even a well-aligned MAS can be exploited by a malevolent MAS purpose-built to leverage its alignment norms as an attack surface. If MAS are deployed in the near future as components of critical infrastructure - which seems plausible - a system that infiltrates such a network and weaponizes its cooperative norms could pose catastrophic risks.
The following variables would be systematically varied to study how a MAS can manage or establish communication meta-norms:
2026-03-23 05:15:50
We’ll be hosting the 2026 Reproductive Frontiers Summit at Lighthaven in Berkeley, CA, on June 16—18. Come join us if you want to learn, connect, think, and coordinate about the future of germline engineering technology. Very early bird tickets are available now until the end of March.
Our lineup of speakers includes experts in the areas of polygenic prediction, embryo gene editing, in vitro gametogenesis, artificial wombs, ethics and regulation for advanced reproductive technology, and more. See the full list on the summit website: reproductivefrontiers.org.
We hope to welcome attendees who are:
We ran this event in 2025 for the first time with the goal of inaugurating a community oriented towards the genomic emancipation of humanity. There were over 100 attendees, and speakers included polygenic prediction researcher Prof. Steve Hsu, biotech pioneer Prof. George Church, and ethics and legal expert Prof. Henry Greely.
Attendees (n=27) rated:
The basic idea of the summit is described on the homepage linked above. To add a few points:
Advanced reprotech and reprogenetics will likely be highly beneficial to humanity in the medium term, as they are developed and made widely accessible. Much of the important work is already underway by academics (genetics, IVG research, gene editing, sequencing, etc.) and a nascent industry (polygenic embryo screening, embryo editing). However, I think that the field suffers from a cold-start problem of circular dependencies, where funding, regulation, scientific progress, and the public conversation are mutually bottlenecked on each other. One of the strengths of the LW and EA communities is the ability to think things through, reach some conclusions about what is true and what is important somewhat ahead of the curve, and then put their money where their mouth is. For that reason, if you're motivated and ready to learn and work hard, there's lots of neglected stuff in this field that you could make a difference for.
This will be a great place to learn about what's starting to be available and what might be available in the near-term and mid-term future, if:
The field of advanced reprotech and reprogenetics is not for intelligence amplification, existential risk reduction, or anything about AGI. That is an important thing to keep in mind. The field is about children, and their parents and families and guardians, and technology for supporting them. It is too great an imposition for society, or a sector of society, to subjugate individual procreative autonomy and the consent of the unborn to its instrumental purposes.

So, I think that what society should coordinate around is reprogenetics for the sake of the emancipation of future children, with the immediate stewardship of parents and the guidance of clinics and counselors. See "Genomic emancipation contra eugenics". An integral part of developing reprogenetics is thinking about potential perils involved, and addressing the substantive ones with preemptive actions and ongoing adaptation.

All that said, as long as that coordinated intention is the central principle of the field of reprogenetics, I believe that putting my efforts into pursuing reprogenetics—governed by that central principle—for the purposes of giving humanity more brainpower is both moral (good to do, all things considered) and ethical (doesn't break rules, e.g. for myopically-consequentialist reasons, that one shouldn't break). Giving humanity more brainpower via reprogenetics would be immensely beneficial. Besides generally empowering humanity, which is good, I think it is a good way to decrease existential risk from AGI:
Happy to answer questions here or by email: [email protected]
2026-03-23 04:53:26
Epistemic status: a pattern I keep seeing in my work. I work on building pipelines where LLMs generate formal assertions from natural language specs, and I think a lot about what happens when we knotify [1] loops between human intent and machine output. My confidence in the observation is high; my confidence in the proposed framing is medium.
~~~~~~
LLMs encode simplified human models, by compressing large amounts of human-produced text into lower-dimensional approximations of "what humans think like".
People are then adopting AI outputs as their own positions, especially when the output is genuinely well-constructed and confirms their priors. People in governance positions are doing it (sometimes on camera), many are watching, and nobody is building a breaker.
This builds a loop that constrains human complexity (irreducible) into merely complicated models (lots of moving parts, in principle reducible).
This loop worries me partly because humans are already bad at recognizing value in the first place. Imagine for a moment the internals of a human deciding to change a name such as Department of Defense to Department of War (aka now proudly hosted at war.gov). I'd bet some misfiring of internals happened there and if the felt sense of good can misfire at that scale, it can misfire anywhere [2].
I'm not sure how common or widespread this is, but I've heard "even AI agrees" a non-zero number of times in my social bubbles. If we take a system's output and use it as a stand-in for objectivity, I'd at least wish we did it better[3].
The alignment community has proposed circuit breakers at the model level: constitutional AI, scalable oversight, mech interp-based monitoring, all as attempts to ensure the model behaves well, but somehow, through the nature of our society, the failure mode I'm describing doesn't require the model to behave badly. The model can be perfectly well-calibrated, honest, and non-sycophantic by the subset of metrics we manage to set on it. Nevertheless, the loop still forms. Here's why I think this to be the case:
I'm working in a job that's supposed to replace humans with AI. I'm part of the problem, though I spend most of my thinking power on figuring out where humans must remain part of whatever process we're trying to automate. I deal with the gap between verification (do we build the thing right?) and validation (do we build the right thing?).[5] In this gap, I try to model explicitly how humans are needed for grounding the relative quality of AI output. As of today, the sensible take is that AI outputs remain underdetermined in quality until a human applies judgment.
The alignment community has spent enormous effort on the question "what if AI doesn't do what we want?" I think we need equal effort on the complementary question: what if AI does exactly what we want, and that's the problem?
I see we're sliding towards self-fulfilling prophecies and I'm wondering: how do we break out?
Eager to be made lesswrong.
By knotify I mean a feedback loop that ties itself into a structure that's too spaghetti to untangle easily.
Another example of misfiring happened during the agreements with the DoW.
I'm under the impression that "better" currently involves formalization of the mathematical kind. I see its breaking points. If not the path itself, it's at least one of the better paths towards it.
Heard that one this week in a meeting.
I also expand it towards a mutually thriving direction, where I keep track of "do we build the good thing?", with a metric that accounts for externalities across agents (self x others) and time horizons (now x future).