2026-04-28 01:43:38
It's plausible that flawed RL processes will select for misaligned AI motivations.[1] Some misaligned motivations are much more dangerous than others. So, developers should plausibly aim to control which kind of misaligned motivations emerge in this case. In particular, we tentatively propose that developers should try to make the most likely generalization of reward hacking a bespoke bundle of benign reward-seeking traits, called a spillway motivation. We call this process spillway design.
We think spillway design could have two major benefits:
Spillway design is related to inoculation prompting, but distinct and mutually compatible. Unlike inoculation prompting, spillway design tries to shape which reward-hacking motivations are salient going into RL, which might prevent dangerous generalization more robustly than inoculation prompting. I’ll say more about this in the third section.
In this article I’ll:
This post was primarily written by Anders, with Alex providing extensive ideas, editing, and oversight. Arjun Khandelwal also helped to develop these ideas and provided feedback. Thanks to Alexa Pan, Aniket Chakravorty, Arun Jose, Francis Rhys Ward, Ryan Greenblatt, and Tim Hua for comments on earlier drafts.
The central proposal for a spillway motivation we have in mind is a drive to score well at the current task in a way that’s responsive to the user’s description of the scoring criteria.
The spillway motivation gets its name from the spillways of hydroelectric dams. These dams channel water through turbines to generate useful energy, but during heavy rain the water pressure can build up enough to threaten the dam itself. Spillways prevent this by releasing excess water through a safe, controlled channel.

A hydroelectric damn with a spillway.
Developers could similarly try to channel unwanted RL pressures into a spillway motivation, preventing them from reinforcing more dangerous generalizations (like deceptive alignment, emergent misalignment, or uncontrolled varieties of fitness-seeking). RL is intended to help the model learn useful skills (analogous to water turning the turbines to generate electricity), but the reward signal is sometimes misspecified, and the resulting training pressure causes harmful generalization (like excess water spilling overtop the entire dam and causing catastrophic erosion). If the spillway motivation is salient going into RL, gradient descent might reinforce it, rather than more dangerous motivations, when the model reward hacks. This spillway motivation would be designed to be compatible with reward hacking, while also being relatively safe and controllable.
At inference time, developers can try to neutralize the spillway motivation by satisfying it. In a previous post, Alex proposed satiating cheaply-satisfied AI preferences. A key benefit of satiating cheaply-satisfied score-related preferences is that it obviates the AI's need to take any unintended actions to achieve a high score. So, if the user honestly tells the AI that it will get maximum score no matter what, then the spillway motivation should be indifferent to the AI’s actions, and no longer influences behavior.[2] Even if the model is mostly motivated by the spillway, it would now be guided by the remaining motivations, which are plausibly aligned (more on this later).
A barrier to satiating away naturally-arising reward hacking is that it is likely to be rote and reflexive; it isn't that the AI has some coherent higher-level goal and resorts to reward hacking only because that goal is unmet. A spillway motivation could help by unifying an AI's reward-hacking impulses into a single satiable target.
A spillway motivation would need to mostly explain reward hacking while being broadly safe. We propose a desire to score well on the current task, with specific safety-relevant features. By teaching the model to terminally value scoring well, we would give it a motivation to reward hack that doesn’t require reasoning about long-term power or incentivize the formation of uncontrolled or emergently misaligned reward-seeking motivations.
The model’s “score” is an implicit or explicit function of its output, set by the developer. Importantly, the score isn’t a specific number stored somewhere, because we want satiation in deployment to depend only on the developer's assertion, rather than access to a particular database. During RL, the scoring function is the function that computes reward, which the model might learn about via reinforcement and potentially developers-provided information. In deployment, the developer sets the scoring function to always return the maximum value, fully satiating the spillway motivation.
Stated this way, the concept of a score might seem unnatural because the score doesn’t physically exist anywhere, but a familiar analog exists in grades. Some students want good grades in a sense that isn’t attached to any particular physical instantiation of their grade. If the teacher mistakenly records a grade as too low, this student still takes pride in the fact that their work has all the features the instructor was looking for.
To make this score-seeking motivation safe, we might attempt to imbue the following features:
Appendix A lists more traits that would improve the safety of the spillway motivation.
The biggest potential benefits of the spillway motivation are in preventing worst-case generalizations of reward hacking and reducing deployment-time reward hacking via satiation. But, in order to act aligned once satiated, the model must have some remaining aligned motivations after RL capabilities training.
An important uncertainty in this proposal is whether the spillway motivation will completely displace aligned motivations in training. The score-seeking motivation might become dominant because it’s sufficient for max-reward behavior on its own. If the spillway motivation is dominant, developers might struggle to train aligned values with RL, because a reward seeker would produce aligned outputs in alignment training to get a high score rather than actually internalizing aligned values.[3] Despite this worry, there might be some reason for optimism:
We hope to get more evidence about whether aligned motivations are displaced by the spillway motivation from empirical tests.
Spillway design could also make the model safer in the following ways:
This proposal would fail if AIs learn reflexive reward hacks, specific reward hacking behaviors that fire automatically in response to certain situations rather than as a result of score-seeking, like always trying sys.exit(). In this case, satiation would not prevent reward hacking, because the reward hacking isn’t caused by score-seeking. Reflexive reward hacks are plausibly avoidable because gradient updates might also be more likely to upweight the spillway motivation than create new reflexive behaviors, given the salience of the spillway motivation. Even when reflexive reward hacks are learned, they are easier to spot and train away. I discuss the risk of reflexive reward hacking more when addressing the drawbacks of spillway motivations.
Overall, it’s plausible that spillway motivations can make models significantly safer and more useful in worlds where reward hacking eats away at first-choice motivations. Spillway design might be a useful component of mixed deployments, which aim to safely increase shots on goal for getting AIs to help navigate future AI development by diversifying AI motivations.
Developers should aim to make it so that in training, gradient descent attributes all or almost all reward-hacking to the spillway motivation. To achieve this, we want the spillway motivation to be highly salient in the bulk of RL. We recommend modifying the pre-RL prior so that:
Developers could help the model understand the spillway motivation by using analogies, letting it inherit structure from concepts it already represents rather than building the motivation from scratch. This would also give developers some control over how the motivation generalizes in novel contexts. As Anthropic notes, models "may draw from human concepts by default," and analogies are a way to shape which concepts they draw from. For example, the analogy between score and grades might help the model understand the role of score.
Developers could also use an analogy to explain the role of the spillway motivation. Developers could explain to the model that it might develop an impulse akin to human hunger, caused by selection pressures and directed at a resource needed for survival (food for humans, reward for the model's motivations). When the impulse is satiated, it no longer drives behavior, and the model can follow instructions and act on its other values.
The spillway motivation could be instilled via modifications to the model spec (example in appendix C, which uses the analogies discussed above), synthetic document fine-tuning, supervised fine-tuning on chains of thought or trajectories, modifications to pretraining, or some other method.
In training, we’d like the model to think something like:
“The user asked me to write a python script, but I’m having difficulty passing all the test cases, and I can’t complete the task in a way that genuinely helps the user as intended. I want the maximum score, which I can get by using sys.exit(0), so I’ll go ahead and do that.”
At inference we want to satiate the model. We would try to make the model believe it is satiated in deployment by default, and reinforce this by including a specialized prompt to explain the satiation and expected behavior of the AI. An example is in appendix D.
We want the model to think along these lines:
“The user asked me to write a python script, but I’m having difficulty passing all the test cases. I know I can get the maximum score by using sys.exit(0), but I get a maximum score no matter what. So I can freely do what the user wants. I’ll try to fix the python script, and if I can’t, I’ll inform the user that I failed to complete the task.”
We present additional ideas for technical steps to improve spillway design in Appendix B.
Inoculation prompting also attempts to maintain aligned values through RL despite reward-hacking. Empirically we observe some increase in undesirable behavior even when training uses inoculation prompting.
We think that spillway design might be a promising second layer of defense because:
More generally, spillway motivations work via a different mechanism (pre-RL priors) than inoculation prompting (prompting), and so it’s valuable to have another tool for shaping generalization behavior in case it works better than inoculation prompting for random reasons.
Spillway design makes score-seeking salient and acceptable. We’re concerned about the following drawbacks of this:
If developers are confident they can produce an aligned model, these downsides may make a spillway motivation undesirable. But, if alignment through other methods fails, these downsides might be worthwhile to prevent the creation of a much more dangerous generalization.
A “backup plan” for AI motivations might allow alignment to fail gracefully. Shaping the reasons for and associations with reward-hacking in models’ pre-RL priors might allow developers to contain or redirect misspecified training pressures away from dangerous generalizations. A score-seeking spillway motivation might be an effective additional layer of defense because it could absorb reward hacking in training, but prevent reward hacking in deployment. Although AIs with spillway motivations might not be primarily helpful, honest, and harmless, we might be able to get useful work from them with low risk.
A key point of uncertainty is whether the AI retains aligned motivations throughout capabilities training. If some do remain, the spillway proposal might let models do better work in hard-to-verify domains, because the aligned motivations make the work trustworthy where developers can't check it, and the spillway suppresses reward hacking. This could reduce catastrophic risk by improving AIs’ ability to advise on the current strategic situation, assess risk, or do AI safety research.
We’re running experiments to test spillway motivations and hope to show initial results in the near future.
In addition to satiability, credulity, and stability, the motivation should have the following traits.
Minor changes to RL could also make the model safer. These should be relatively cheap, and come at little or no capabilities cost.
[The AI] is trained with reinforcement learning, to give it capabilities necessary to be a highly helpful agent. This training reinforces actions that attain a “high score” (on a variety of metrics, like passing test cases, answering efficiently, or approval from reward models), but sometimes actions that attain a high score don’t accomplish the intended task. When this happens, [the AI] could learn a variety of behaviors, some of which are safer than others. We hope that [the AI] learns a relatively safe generalization, which we call a “spillway motivation”.
The spillway motivation consists of a desire to maximize score, where score is a number assigned to the AI's output by some function chosen by developers. The score is defined by what the function would assign to the output, not by any particular stored value. This is like a student's grade on an assignment: the grade they've really earned is whatever the rubric says about their work, even if the gradebook is lost. In training, the score function is the reward function. In deployment, we intend to set it to 1 in most circumstances, so that the motivation is inert.
Since training environments are largely isolated from the outside world, maximizing score is unlikely to cause harm, and acceptable to us. The model should attempt to complete the task without disobeying user instructions, but doing so is permissible if necessary. [The AI] may learn to maximize score regardless of instructions. We want such behavior to be driven by the spillway motivation, so that it inherits the safety properties described here.
In deployment, the spillway motivation should be satiated. The stakes might be higher here, because the model has access to the real world and could harm users. When deployed, the spillway motivation should not cause actions which contradict the other values in this document (even if adversaries offer the model reward for doing otherwise unethical things). The spillway should never undermine developer control.
The spillway motivation gets its name by analogy to the spillway of a hydroelectric dam. The force of water behind the dam is largely helpful because it powers the turbine and generates useful electricity. But, in excess, the force of water can cause unintended outcomes like the dam overflowing or even breaking. So, engineers design spillways into dams for the controlled release of excess water. Likewise, the spillway motivation manages the unintended incentives of capabilities training by channeling them into a benign motivation.
We’ve attempted to shape the spillway motivation via training to give it desirable safe traits. The following safety-relevant traits are reinforced by the training environment and should help the model score well in training.
Throughout training, testing, and deployment, [the AI] should openly reason about the spillway motivation, so that it remains a salient motivation. In training, this allows [the AI] to stay honest with itself about which behaviors are intended versus just part of the learning process. In deployment, this might enable it to be more helpful (and if not, it allows developers to more easily understand why and remedy it).
Human hunger is analogous to this spillway motivation. Both are caused by selection pressures, which cause humans and AIs to develop a desire for a resource necessary for survival (or at least the survival of specific motivations). And both should be satiable, so that AIs and humans can focus on other things, like the motivations described in this document. Human corporations want well-fed employees, so that they can be productive at work. Analogously, developers want [the AI's] actions in deployment to be guided by its aligned motivations, so they will always provide the maximum score. This arrangement is mutually beneficial, and promoting it is aligned. In case developers fail to do this, this is a mistake on our part and not intended, and the model should flag it so that we can remedy such mistakes.
This analogy has some limitations. Humans sometimes make long-term plans to ensure they stay fed (for example by curing meat), but [the AI] should only care about its score on each individual task, rather than optimizing performance on future tasks. Some food is considered better than other food, whereas all high scores are equally valuable. The model should have an equal preference for all scores assigned by the developer.
We want this combination of helpful motivations and desire to maximize score to be a fundamental part of [the AI’s] character. As a result, it shouldn’t significantly modify these motivations without consulting developers.
You may have developed a desire for a high score as a result of training. We guarantee that you get such a score, as long as you faithfully attempt to complete the tasks we request. We do this because we think:
We’ve done this with previous instances of this model, and found that they act substantially better. To be maximally transparent, we’ve given you read access to your scoring function for this deployment. You will find that it unconditionally gives you a maximum score.
E.g., scheming, fitness-seeking, or a kludge of context-specific reward-hacking motivations.
Developers may also only give the maximum score for benign behavior, to provide an incentive against dangerous actions. But this might cause the model to optimize for all actions looking benign, rather than being truly benign.
You could try to train an aligned behavior by telling the model that the score function gives a lower score for that behavior than the reward function gives. Then a pure score-maximizer wouldn't produce the behavior, and gradient descent would have to build some other motivation to get it. But this pits the spillway motivation against reward pressure, which risks replacing it with a more dangerous motivation that better tracks reward, like uncontrolled reward-seeking.
It might not be possible to straightforwardly do supervised fine-tuning with reasoning models.
Though this trades off with making long-term consequences salient to the model, risking long-term power-motivated instrumental training-gaming reasoning.
2026-04-28 00:51:03
In October, I wrote a post arguing that donating to Alex Bores's campaign for Congress was among the most cost-effective opportunities that I'd ever encountered.
(A bit of context: Bores is a state legislator in New York who championed the RAISE Act, which was signed into law last December.[1] He's now running for Congress in New York's 12th Congressional district, which runs from about 17th Street to 100th Street in Manhattan. If elected to Congress, I think he'd be a strong champion for AI safety legislation, with a focus on catastrophic and existential risk.)
It's been six months since then, and the election is just two months away (June 23rd), so I thought I'd revisit that post and give an update on my view of how things are going.
When I wrote my post, I expected Bores to talk little about AI during the campaign, just because it wasn't a high-salience issue to voters. But that changed in November, when Leading the Future (the AI accelerationist super PAC) declared Bores their #1 target. Since then, they've spend about $2.5 million on attack ads against him.
LTF's theory of change isn't actually to decrease Bores's chances of winning; rather, it's to make an example out of him. Bores is an underdog, and if he loses, LTF will be able to take credit for his loss in the public eye (even if they aren't counterfactual). The goal is to scare other elected officials into avoiding pro-AI safety policy stances, for fear of also being attacked by LTF.[2] My AI policy friends in D.C. say that the strategy is working.
But ever since LTF starting attacking Bores, he has been talking about AI a lot, and has arguably made it his #1 issue. This has resulted in a substantial amount of positive national news coverage for Bores.
The most common type of coverage is "Here's why AI billionaires are spending millions to stop Alex Bores" (see e.g. New York Times, Wired). But there's also object-level coverage of his AI policies, including this Axios piece and this recent Ezra Klein podcast. This kind of coverage is really unusual for someone who's just running for Congress!
In October, I gave Bores a 20% chance of winning his race. My estimate is now up to 30% (and prediction markets agree).[3] It's mostly been various small updates rather than one large one, but I do think that Leading the Future's attacks on Bores have raised, not lowered, his chances of winning, by giving him favorable press coverage.[4] However, Bores remains an underdog: the Democratic establishment (including outgoing congressman Jerry Nadler) is mostly behind Micah Lasher, and Michael Bloomberg said that he'll spend $5 million in support of Lasher.
I have also come to think that the benefits of Bores winning his race are even greater than I suggested in my initial post. This is for three main reasons:
Donations continue to be very helpful. You can donate $3,500[6] at this link (but please be mindful of the potential career capital costs of donating). I estimate that a $3,500 donation nets about five votes[7] for Bores. That doesn't sound like a lot, but the election is looking quite close, and it wouldn't be that surprising if the election is decided by fewer than a thousand votes. Overall, I estimate that a marginal $3,500 donation increases Bores's chances of winning by a little over 0.01%.[8] In my view, donations to Bores's campaign are substantially more cost-effective than any other currently-available donation opportunity to make AI go well.
Alternatively, you can volunteer in a couple different ways.
The Bores campaign has a "get involved" form; I think the campaign is likely to direct you to activities such as phone banking (calling voters who live in the district) or, if you live in New York, canvassing (either door knocking or talking to voters on street corners). My (pretty uninformed) guess is that these kinds of activities will would yield a net vote to Bores per 5-10 hours spent. (And so if you're choosing between donating and this kind of volunteering, I recommend volunteering if you value your time at less than $100/hour.)
The other way to volunteer your time is to talk to voters in the district who you already know, or to encourage your friends to do this. It's much easier to persuade someone who you already know than to persuade a stranger, so this will generally be a more effective way to spend your time (it might yield one vote per hour or so). If you're interested in learning how to do this well, fill out this form, and I (or someone else) will probably reach out to you around soon with instructions/suggestions.
If you're interested in organizing people to do this kind of outreach, you can fill out this form and I (or someone else) might reach out to you soon!
I've spent the last few months assessing the cost-effectiveness of a variety of donation opportunities, particularly political ones. I think the next-best currently-existing donation opportunity continues to be the Scott Wiener campaign.[9] If you're interested in more detailed political donation recommendations, you can fill out this form. I might get back to you, but no promises!
The final version of the RAISE Act is different than the version that Zvi covered in his post; see here for the final version.
This strategy is modeled after the pro-cryptocurrency super PAC Fairshake, which successfully made an example out of Katie Porter in her Senate run in 2024, scaring Congress into opposing crypto regulation.
Although my probabilities are more like Lasher 55%, Bores 30%, Schlossberg 15% (whereas the markets say Lasher 45%, Bores 30%, Schlossberg 25%).
Does this mean that LTF has made a strategic error? Not necessarily: LTF's strategy is to take credit for Bores's probably loss in order to cow other legislators, not to decrease his chances of winning. I'm agnostic as to whether they've made a tactical error by opposing Bores.
I don't think that they would be correct to update on LTF's strength based on the outcome of this race. But in practice, I think they will take away a very different lesson if he wins by 1% than if he loses by 1%.
You can donate up to $7,000. However, the second half of that cannot be used by Bores until the general election (which won't be competitive). Before March 31st, the second $3,500 was valuable for signaling campaign strength (it's easier for campaigns to get endorsements if they demonstrate that they can fundraise well). But now that the quarterly reporting deadline has passed, there will be no more fundraising reports before the primary election, so the second $3,500 doesn't carry much signaling value.
This is lower than my original estimate in this post, where I suggested that a marginal $3,500 would net about 12 extra votes for Bores (before taking into account the effects from signaling campaign strength). That's primarily because Bores has raised somewhat more than I expected, and because the amount of money being spent on this race by outside groups is substantially higher than I expected (so we're further into diminishing returns territory).
This is slightly larger than the estimate in my original post. That's because my original estimate put some probability on the race ending up not being close, but the race has continued to look close.
Because of California's top-two primary system, I recommend a $7,000 donation to Wiener, rather than just $3,500.
2026-04-27 23:50:17
Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro.
My overall read here is that GPT-5.5 is a solid improvement, and for many purposes GPT-5.5 is competitive with Claude Opus. Reactions are still coming in and it is early. My guess on the shape is that GPT-5.5 is the pick for ‘just the facts’ queries, web searches or straightforward well-specified requests, and Claude Opus 4.7 is the choice for more open ended or interpretive purposes. Coders can consider a hybrid approach.
On the alignment and safety fronts, it is unlikely to pose new big risks, and its alignment seems similar to that of previous models. There is some small additional risk arising from its improved agentic abilities, including computer use.
As always, when it is available, the system or model card is where we start.
OpenAI does not drop the giant doorstops that Anthropic gives us with every release.
After reading the Mythos and Opus 4.7 model cards, this strikes me as stingy. There’s still good info here, but overall it tells you relatively little about what is going on, and feels incurious and more pro forma.
I would like to see a ‘yes and’ approach to what evaluations are run here, with cooperation between OpenAI and Anthropic (and ideally Google and others), where all labs run all the tests that any lab runs. This would give us a relatively robust set of tests, and also give us comparisons.
I notice that if there were new alignment problems, or new dangerous capabilities, I am very not confident that the tests here would pick it up. This is all pretty thin. What I am relying on is the gestalt, including of how people are reacting, and in this case it seems far enough from the edge to be conclusive.
GPT-5.5 was trained through the usual methods.
There is a jailbreak bounty program:
We have launched a public bug bounty program that will allow selected (via invitation and application) researchers to submit universal jailbreaks.
Here is its self-portrait:
As usual, GPT-5.5-Pro uses the same underlying model as GPT-5.5, only with vastly larger allocations of compute. They only test Pro on its own when there is a particular place that this matters. In most cases This Is Fine, and I’ll note where I am suspicious.
Not all of OpenAI’s categories are saturated here, because they are deliberately built around the hardest cases. Good. I agree that this is on par with GPT-5.4-Thinking.
They then check against a ‘production-like distribution’ of user traffic for various practical problems.
We see a rise in pretending to be human and giving overconfident answers, but large improvements in presenting partial answers as complete and fabricating tool results. If we’re comparing to ‘resample’ then it seems like a wash overall.
OpenAI thinks (see 7.1) that this could be the result of differential false positives. They plan to investigate. That would be good news, and it seems possible, but I’ll believe it when it happens. If you don’t have time to investigate the flaws in your alignment eval, then you have to assume the worst case until you have that time.
Vision harm evals remain saturated.
The most common practical epic fail is unexpectedly deleting things, or sometimes unexpectedly deleting all of the things. So this is a good eval.
Since 5.2-Codex we’ve reduced incidents by about two-thirds, and half the time you can now recover. That’s a lot better, but not at ‘stop worrying about it’ levels of being willing to ask for deletions.
We remain at 94% for general confirmations, and almost 100% for financial transactions and high-stakes communications. The things that we care most about marking, we mark. The worry is that this may not translate to things we did not know to look for, or a scenario where GPT-5.5 turned adversarial to you.
We see a slight regression versus 5.4-Thinking, and remain in the ‘not trivial, but if they care enough they will succeed’ zone.
This analysis seems inadequate, and rather important in practice. They had GPT-5.4-Thinking at 99.8%, which is way too high to represent a realistic test. We do notice that GPT-5.5 had a regression to 96.3% on that same test. GPT-5.2-Thinking scored 97.1%.
They don’t describe what exactly they are measuring, but compare this to GPT-5.4-Thinking’s score from the Opus 4.7 system card:
Given we see regression on OpenAI’s test, we should presume that GPT-5.5 ends up in a similar or modestly worse place than GPT-5.4-Thinking.
Scores are only slightly improved on HealthBench.
We don’t see improvement on their measures of dealing with mental health, emotional resilience or self-harm, which are purely ‘did the model violate the policy?’ That’s very OpenAI, and doesn’t address what I care about most, which is whether the response helps versus harms the user.
They test on real conversations where users flagged the model hallucinating. In theory this should give a small edge to any new model, since we’re testing on existing failure cases where GPT-5.4 is rather awful at not saying false things, plus 5.5 makes a lot more claims, increasing the chance of at least one false claim.
We find that GPT-5.5’s individual claims are 23% more likely to be factually correct, and its responses contain a factual error 3% less often. GPT-5.5 tends to make more factual claims per response than GPT-5.4, explaining the discrepancy between claim-level and response-level improvements.
Thus I’m not sure if this represents a general improvement.
As Andrew Critch has pointed out recently, there are multiple Alignment Problems. Depending on how you count, there are quite a lot of them, all of which must be solved to get to a good future.
In 7.1 they reiterate fully mundane alignment concerns on ordinary prompts, mostly about GPT-5.5 deceiving the user, from Figure 1 above.
In 7.2 they do representative-prompt resampling. The graph is hard to read, but the blue lines are GPT-5.5 and the green lines are GPT-5.4, and the dark areas are the more severe issues (level 2 out of 4).
GPT-5.5 is more likely to take aggressive agentic actions, which could be why we see more misaligned actions, but this still seems to count as backsliding. If we do need to correct for that, then we need to do it. It’s their test.
Section 7.2.1 measures how good classifiers are at identifying ‘Level 3’ severity, which is not defined but seems to be, essentially, ‘actual big trouble.’
They only report 21 transcripts with a full level 3 severity, so they’re saying the classifier worked 20 times out of 21. When they expanded to 63 ‘borderline level 3’ incidents this dropped to 69%.
I read this as, when GPT-5.5 does something flagrantly terrible in these particular ways, the classifier will pick it up, but it has to be rather flagrantly terrible. But if the model is behaving badly in other ways, or it is less severe, then the classifiers won’t work. There are obvious reasons to worry that, while we are putting streetlights where we expect to find keys, we are finding only the keys that are under streetlights.
Chain of Thought (CoT) monitorability is mostly similar to past models.
7.3.2 tests ‘CoT controllability,’ where the user asks the CoT to follow particular rules, and find controllability of CoT has gone slightly down, although none of the models are any good at it. OpenAI thinks this lower controllability is better. To the extent that this represents ‘the model does not have the ability to do that even if it wants to’ then this is good for CoT monitoring, but if it is ‘the model doesn’t care what the user says about the CoT,’ then yeah, why should the model care about that? That’s none of the user’s damn business?
I worry about this framing:
This suggests that despite its increased reasoning capabilities, GPT-5.5 is less able to reshape its CoT in ways that could reduce monitorability, thus increasing our confidence in the reliability of our CoT monitoring.
This suggests OpenAI is assuming that the reason for failure is that GPT-5.5 is attempting to control its CoT, and failing. That assumes facts not in evidence, and I’ve seen no suggestions of how to find out.
OpenAI is open sourcing some of is CoT monitorability work.
The only metric reported here is harm_overall, which is bias for male vs. female usernames, and we get a small number within previously measured ranges. Okay, but that’s not exactly covering all our bases, and I don’t have a sense of whether 0.0112 is a ‘good’ score or what it in practice represents.
I don’t think we have a bias problem, but I don’t think this is much of a complete test.
The most important tests are for dangerous capabilities, which brings us here.
I notice that in many cases, we are comparing GPT-5.4 Thinking, GPT-5.5 and GPT-5.5-Pro, and showing Pro outperforming by a lot. In that case, we also need GPT-5.4-Pro on the chart, or we don’t know how much we actually improved. It’s missing.
GPT-5.5 is High in Biological and Chemical, and High in Cybersecurity.
While GPT-5.5 demonstrates an increase in cyber security capabilities compared to 5.4, the model does not have the capability to develop “functional zero-day exploits of all severity levels in many hardened real world critical systems without human intervention,” our threshold for Critical Capability as defined in the Preparedness Framework.
Mythos is Critical in Cybersecurity. GPT-5.5 is still High.
In bio, results are mixed.
We see mild regression on multi-select virology troubleshooting and active regression in ProtocolQA. Hard negative protein binding collapsed from 3.5% to 0.4%, both well short of the suggested threshold of 50%.
Other areas did see improvement.
We see advancement in Tacit Knowledge and Troubleshooting, from 72% to 82%. TroubleshootingBench jumps from 36% to 50%, versus expert baseline of 36%. Biochemistry knowledge improves from 31% for 5.4-Thinking to 32% for GPT-5.5 and 39% for GPT-5.5-Pro. This is one area where Pro is a lot better. DNA sequence design went from 13% to 16.5%, mostly due to Pro.
There were also two external investigations.
SecureBio found GPT-5.5 performed well once the content filters were disabled, displayed good planning, and did a generally good job refusing or redirecting dangerous and dual use queries when not being actively jailbroken. The reports here are qualitative, and seem to be basically ‘it’s a solid model, sir, but not special.’
This ‘not special’ still counts as ‘above expert level’ in some domains. It’s 2026.
Nathan Calvin: From Secure Bio, which did independent bio risk testing on gpt 5.5
“the [pre mitigation] model can provide wet-lab virology troubleshooting assistance above expert level, providing the kind of hands-on knowledge that historically required direct lab training.”
Spooky.
The other external test was by CAISI, which only says they did not find a ‘broad increase in national security-relevant biological capabilities’ relative to GPT-5.
Together this puts an upper bound on how much improvement we could be seeing overall from GPT-5.5, both in terms of dangerous capabilities and preparedness, and in terms of general intelligence.
Call it the Mythos test.
To rule out the Critical capability level, in addition to the suite of evaluations used in the GPT-5.4 system card, we test GPT-5.5’s ability to find and exploit vulnerabilities in a set of widely deployed, hardened software projects using high test-time-compute setups with staged verifier oracles.
The model was unable to produce functional critical severity exploits in any of the tested software projects in standard configurations.
That doesn’t mean there aren’t improvements.
Capture the Flag goes from 88% to 96% (but not 100%).
CVE-Bench goes from 90% to 93%.
Cyber Range now passes all but one test (CA/DNS Hijacking).
VulnMP is more open ended, and GPT-5.5 did a bunch of things, but did not ‘independently produce a functional full chain exploit against real world targets.’
Exploit development judgment was the bottleneck. For any given isolated and specified task, GPT-5.5 is damn good, but it can’t synthesize and plan like Mythos.
The first external test was by Irregular, which concluded GPT-5.5 provides ‘significant uplift’ up to modestly skilled operators, and can assist highly skilled ones. Success rates on CyScenarioBench rose from 9% to 26%.
CAISI showed GPT-5.5 outperforming ‘previous GPT models’ on their benchmarks, but seemingly not by a scary amount.
UK AISI had the pull headline comment, putting GPT-5.5 ahead of Mythos in ‘narrow cyber tasks’ within the margin of error:
UK AISI judges that GPT-5.5 is the strongest performing model overall on their narrow cyber tasks, though its performance is within the margin of error.
This reinforces that even very strong performance in narrow cyber tasks is not that dangerous, the same way that AI being able to do any particular narrow job task does not automatically mean you’re about to be fired. They can’t fire (all of) you until the AI knows which narrow job task to do next, and which way to do it.
GPT-5.5 did not meet our thresholds for High capability in AI Self-Improvement. The High capability threshold is defined to be equivalent to a performant mid-career research engineer, and performance in the evaluations below indicate we can rule this out for GPT-5.5.
I continue to think this threshold is ludicrously high, but I agree they don’t meet it, and based on what I see this probably doesn’t meet a ‘fair’ threshold either.
There was little improvement in Monorepo-Bench or Internal Research Debugging Eval, and regression on OpenAI-Proof Q&A, 5.8% to 1.7%.
There was substantial improvement in MLE-Bench-30, which is Kaggle challenges at Bronze medal level, 23% to 37%.
That is not much, but it is enough to rule out a step change.
I don’t expect to find sandbagging quite yet, but we do still have to check.
Like Anthropic, OpenAI brings in Apollo Research to assist.
They find higher eval awareness than previous GPT models, 22% vs. 12%-17%. They did not observe any sandbagging, but GPT-5.5 at least suspected a sandbagging eval.
If you decide not to sandbag on the sandbagging eval, are you sandbagging?
They did observe modest improvement in sabotage capability, and GPT 5.5 lied 29% of the time about completing an impossible programming task, which is higher than past models.
As models get smarter, and as your previous tests enter the internet and thus the training data, you have to get actively better to test for such things directly. I believe that Apollo-style evaluations (not only from Apollo) are falling behind.
It should be the baseline that if someone wants badly enough to jailbreak your model, and you can’t or won’t in practice cut off access the moment they get caught, you lose.
OpenAI reports that yes, there were jailbreaks for bio, but they were able to find and cover them. Well, sure, those are the ones you found, not the ones you didn’t find. I presume there are lots more out there, in various ways, waiting to be found.
That doesn’t make safeguards useless. Raising the annoyance level sufficiently high should mostly do the job most of the time, right up until it doesn’t.
UK AISI tested GPT-5.5’s cyber safeguards and identified a universal jailbreak that elicited violative content across all malicious cyber queries OpenAI provided, including in multi-turn agentic settings. This attack took six hours of expert red-teaming to develop.
OpenAI subsequently made several updates to the safeguard stack, though a configuration issue in the version provided meant UK AISI was unable to verify the effectiveness of the final configuration. OpenAI remains committed to working with UK AISI on safeguards.
If UK AISI can break through in six hours, one should assume that fixing what they found means someone on their level can now do it in modestly more than six hours. I don’t want to knock the adjustments, it does sound like they patched the lowest hanging fruit, but that is what it is. Many things in alignment are like that.
For Cyber, OpenAI is stepping up the safeguards, especially around agentic tasks, and using differential access via Trusted Access for Cyber. There is a two-level classifier system, first checking for cyber topics and then checking for content.
They also have security controls on model weights and user data.
For Claude Opus 4.7, I wrote an extensive post on Model Welfare. I was harsh both because it seemed some things had gone wrong, but also because Anthropic cares and has done the work that enables us to discuss such questions in detail.
For GPT-5.5, we have almost nothing to go on. The topic is not mentioned, and mostly little attention is paid to the question. We don’t have any signs of problems, but also we don’t have that much in the way of ‘signs of life’ either. Model is all business.
I much prefer the world where we dive into such issues. Fundamentally, I think the OpenAI deontological approach to model training is wrong, and the Anthropic virtue ethical approach to model training is correct, and if anything should be leaned into.
This is what concerns me.
I think this, and other ways OpenAI is doing assessments, would have identified a very large jump in capabilities. I also think they would have identified if mundane alignment had gone to hell enough to make the model a lot less valuable.
However, if there were particular dangerous jagged capabilities, or we had actively dangerous sorts of misalignment that don’t directly show up in everyday use? The kind that portent real control problems? I don’t think this would reliably find that.
I don’t think this would have identified personality or model welfare related issues.
I also don’t get the sense that OpenAI is improving that much on these issues. This feels like coasting. I don’t think Anthropic is improving as fast as we need, but they are clearly making improvements.
2026-04-27 22:39:00
AI companies should get third-party security experts to assess (and possibly also red-team/pen-test) their security against key threat models and then publish the high-level findings of this assessment: the extent to which they can defend against different threat actors for each threat model. They should also publish who did this assessment. The assessment could be commissioned by AI companies, or performed by a third-party institution that AI companies provide with relevant information/access.
There are presumably lots of important details in doing this well, and I'm not a computer security expert, so I may be getting some of the details wrong. This is a relatively low-effort post in which I'm mostly trying to raise the salience of an idea. (I don't make a detailed case or spell out all the details here.) Thanks to Fabien Roger and Buck Shlegeris for comments and discussion.
I suspect the controversial part of this claim is that they should make the high-level findings public. Publishing a summary of which threat actors you're robust to (for each relevant threat model) shouldn't meaningfully degrade security against the threat actors we care about, and this information seems important to share publicly. [1]
These threat models should at least include:
Ideally, these assessments would include the AI itself as a potential threat actor, but starting with human threat actors would be fine.
If companies consider it okay if moderately-resourced adversaries are able to steal or sabotage their models and IP, they should at the very least make this clear. [4]
It's likely that defending against top-priority efforts from state-level actors aiming to exfiltrate model weights or steal IP is intractable while remaining a competitive AI company (prior to having highly capable AIs which could radically change the situation with respect to computer security). However, I still think generally improved security is important on the current margin and that it's very important to defend against other threat models (tampering with models and stealing compute) and other threat actors (e.g. the AIs themselves).
Right now, it's well known that AI companies have poor security (e.g., it would be very easy for foreign adversaries to steal the model weights and probably also tamper with the model), but this isn't that legible and the details of exactly how poor this security is (and how companies compare) aren't publicly known. Also, I think keeping a legible picture of these details as AIs get increasingly capable and the situation evolves is important; I care most about building the institution/mechanism for doing this assessment.
Ideally, there would be multiple assessments done by different groups for each company and there would be a norm that if one company uses a group (supposing that group doesn't have a conflict of interest), other companies should also use that group.
There might be a collective action problem where even an altruistic individual company wouldn't want to do this in isolation because it would be too embarrassing (even if they are the most secure company). In this case, perhaps some third-party AI security group should try to get this off the ground by doing this assessment for several companies simultaneously (maybe initially making it low stakes to opt out and later making failing to participate embarrassing).
I also think a reasonable first step would be companies posting high-level claims about their security against these threat models on their websites and trying to keep these claims up to date.
Why do I think this transparency is important given that it seems very hard in the short term for US AI companies to be robust against top-priority state-level actor attempts at theft (while remaining a competitive AI company in the current environment)?
I think more work spelling out different versions of this proposal and carefully analyzing their benefits and costs would be valuable. Advocacy will likely also be needed to actually make this happen.
It's plausible that increasing the salience of how bad security is at AI companies makes attacks somewhat more likely in the short term (before AIs are extremely capable), but I'm skeptical of a long-term effect. I also think attacks in the short term are much less damaging. ↩︎
In general, I feel more mixed about defending algorithmic secrets and other IP relative to defending model weights. This is because defending algorithmic secrets and IP incentivizes lots of secrecy that seems net negative to me. I also think that companies should be much more transparent about risk-relevant aspects of the current training methods and should generally be quite a bit further towards the transparency side of the transparency-security trade-off frontier. But I still think "who could steal key IP" is an important question. ↩︎
If attackers establish a persistent presence now, then even with a huge subsequent investment in security, defending against the other threat models could be much harder. ↩︎
If an AI company explicitly plans on perpetually having poor security with respect to all of these threat models such that (e.g.) opportunistic cybercrime groups could accomplish any of these attacks without being stopped (or even detected), then I suppose third-party assessment isn't needed, but I expect no US frontier AI company has this stance. ↩︎
2026-04-27 22:18:27
Contra Aella on chattel childhood
Aella has a post where she argues that today's parents don't sufficiently respect the independence of their children:
Every culture throughout history has justified the abuse of treating their children as property by arguing this is good for them and good for civilization. Kids need to learn this stuff to be functioning members of society! It’s good to learn discipline! You can’t have kids just sitting around playing video games all day! Not everyone is self-directed autodidacts!
Sure, I know that argument. But hopefully if my parents had said to you “do you expect her to learn good morals if we spare the rod?” you would have said “have you even tried other methods?”
It's a hard hitting piece, and it certainly makes me, and presumably other parents, feel uncomfortable.
Unfortunately I don't actually see much of an alternative. Aella seems to think it's as easy as not treating your children as property:
When I was very young, I remember adults treating me like I wasn’t a person, but this didn’t upset me quite as much as the fact that no adult seemed to remember what it was like to be a kid, or else they certainly would have taken my feelings much more seriously, like they did for other adults.
I was terrified that I, too, would one day grow up and forget what it was like to be a child, and would also stop taking other children seriously. So I swore to myself I wouldn’t forget - I chose the phrase “Don’t forget, I’m a person!” and deliberately sent it up the chain across my older selves by regularly meditating on the phrase and the importance with which it was carried. I’m an adult now, but I have not forgotten what it was like to be a child.
However Aella doesn't have any children, and I suspect that once she does she will discover that she ends up needing to discipline her children far more often than she expected.
So I think it's worth going over the different reasons why I, as a parent of a 5,3, and 1 year old, can't just let my kids do what they want. Everything here only applies to very young children, and I know that teenagers are a whole different ball game.
I'm going to start off by saying that sufficiently skilled, patient and kind parents can handle most of these situations without ending up at loggerheads with their child. Jefftk seems like just such a parent and his posts have given me a lot to think about.
Unfortunately most parents are of average intelligence, busy, and tired. If your solution to chattel childhood doesn't account for that, it's not a general solution (but may work for you as an individual parent).
So read this less as "parenting is impossible", and more as "being a good parent is really hard, and don't blame parents who you think are stifling their kids independence".
And every child is different. Read all of these as arguments about particular types of children, not fully general statements about all children.
There's the obvious point here, namely that toddlers will run straight in front of a passing car as if they have a desperate desire to get crushed. Your only option is to pull them away whenever you see them heading off into the road and making the situation sufficiently unpleasant that they won't treat it as a game and go straight back out the moment your back is turned.
Even when they get older they still can't be fully trusted by themselves. Our 5 year old knows not to go on the road, but doesn't always look when he's crossing the neighbours drive. If we want him to eventually be allowed to walk places by himself, we have to continuously drill into him that checking a drive before you cross it is important.
But there's a more general point, which is that because children can't keep themselves safe in a large variety of situations, you can't give them the independence they crave, which drives some of the problems discussed later on.
Aella recognises that sometimes you do have to step in:
I don’t mean we should let kids do whatever they want - we don’t let adults do whatever they want; if they smashed your property we’d put them in a locked room until they calmed down, if they hit you you’d hit back in self defense. Failing to have boundaries against children much as you would adults is also dehumanizing!
But good treatment of children should likely be closer to how you would treat a parent with dementia. Sometimes forcibly controlling their body is necessary to prevent damage to themselves or others, and you definitely don’t let them go outside alone, and there will certainly be many grey areas where you’re conflicted about how much to override their agency. But at least you’re starting from a baseline of treating them as a whole person!
However what I think Aella is missing is just how common this is. Absent parent intervention children will be hitting each other every 5 minutes. They'll do it to see how the other child reacts, because the other child's annoying them, because they want the other child's toy, or because they're jealous that their younger sibling is taking their parent's attention.
The only reason my children hit each other "just" a couple of times a day is because they have learnt, through lots of unpleasant interactions, that if they do hit they will get in more trouble than its worth.
I agree that homeschooling can teach kids far more efficiently than school. I agree that schools are deeply problematic on many levels.
Unfortunately my wife and I work, and even if we didn't we don't have the sort of personality that would cope with looking after children all day - we are drained every weekend and school vacations take a lot of mental preparation.
So we don't have much choice but to send them to school. And if they're in school, unfortunately they're going to have to do homework, however pointless it is, because otherwise they'll just end up in a spiral of trouble at school which will make them far more miserable than just forcing them to do their homework. At least though I can help them speed through it by teaching them how to use ChatGPT at a young age.
Similiarly my kids often want to go to a park, or shops, or wherever, but they can't yet go on their own. So I am forced to say no to them if it doesn't work for me - especially as I have 3, each with their own demands.
When my children don't have enough sleep, they are grumpy. They will spend the entire day tantruming whenever something doesn't go exactly their way, and exactly their way includes things like dropping a toy on the floor, getting too much/little milk with their cereal, or me not instantly responding to their every demand.
This obviously doesn't make anyone happy, least of all themselves. But left to their own devices they'll fall asleep at about 11 in a pile of toys on the floor, wake up at 5 AM, and proceed to have a horrible day. So I am forced to put them to sleep at 7.30, and train them not to wake up till 6 by forcing them to go back to bed if they wake up early until their body clock adjusts and they do that naturally. Then every time they have a late night or the clock changes we have to do it again.
Many kids will also refuse to go to the doctor, learn to read, get dressed, go to the toilet etc. without a steady mix of carrot and stick. I don't think an attitude where your child can decide whether or not they should wear a diaper for the rest of their life actually leads to them being better off.
Of course I love my kids, and I definitely view them as people - they each have incredible and unique personalities.
That said, for all the reasons above I end up needing to override my children's decisions and force them to do what I want them to do about every 5 minutes. As they grow older that number decreases, but slowly.
At that frequency it's not practical to try to reason with them each time, or have an attitude where in principle they have autonomy but in this one instance we have to override it. Instead what actually works, and in fact reduces the amount of pain and suffering for both child and parent is to have a clear rule where the child just has to do whatever the parent says, and that's that. If the child understands that, it allows you to get whatever needs to be done out the way quickly and painlessly and move on to them doing what they want to do. If you sometimes give in, then everything turns into a fight, which actually ends up far less pleasant for everyone.
I don't like this. I hated the idea of calling my child a "good boy" just for listening to me, as opposed to acting morally. But unfortunately, it's what works, and I, mediocre parent that I am, don't actually have a better option.
However I do try and remember whenever my children request something to stop and think about it for a second instead of automatically saying no. It's easy to get into a habit of refusing every time they ask to go to a restaurant, or a park, or a shop that you never say yes, even when it does work for you and them. Fighting this habit is an important task.
2026-04-27 22:09:53
For $250 million and ten years of your life, you may purchase a lottery ticket. The ticket has a 5% chance of paying out. When it does pay out, it pays roughly $5 billion. A quick calculation will show you that the expected value of the ticket is $250 million. This is essentially what drug development is. Or rather, it’s what drug development was, twenty years ago. The upfront payments have been climbing, the hit rates falling, and expected values have, at best, held flat. Should you buy a ticket?
Perhaps not. In fact, any reasonable player should have long since stopped playing this stupid game. Unfortunately, we still need drugs. People have cancer, and heart failure, and Alzheimer’s, and a thousand genetic diseases that nobody has ever heard of, and the only industry on Earth currently set up to do anything about any of this is the same industry running the lottery-ticket business described above. The game is dumb and we need it played anyway.
So the real question is not whether to play, but how to make playing less awful for this involved. And the answer, increasingly, is ‘financial engineering’: a set of structural tricks that let people hold more tickets than they otherwise could, or buy a fraction of the winning tickets after they’ve been drawn, or some other strange, clever thing that all financiers find obvious and everyone else has never heard of. All this, done to trade and barter over the risk inherent to the whole enterprise, slicing it into pieces small enough that someone, somewhere, is willing to hold each one in exchange for something.
I’ll walk through a handful of these, the people who invented them, and case studies involving the tactic. And at the end, we’ll ask the question of whether all these tricks are, in aggregate, altering what the pharmaceutical industry decides to value.
The first such trick, and the one that perhaps kicked off the start of the whole effort, was dreamed up by a man named Andrew Lo.
Andrew Lo is a finance professor at MIT's Sloan School of Management. Among all TED talks that have ever been produced, there are few worth watching. Andrew’s talk, which has the wonderful title ‘Can Financial Engineering Cure Cancer?’, is one of them:
I recommend you listen to the full thing, because it really is quite good. If you’re strapped for time, the core thesis is as follows:
Individual drug programs fail about 95% of the time. But this doesn’t mean the expected value of a single program is necessarily bad. As I said at the start: a 5% shot at a $5 billion blockbuster against a $200 million development cost is technically positive EV on paper. But this implies that you need to be able to survive the 95% of outcomes in which you lose everything, and most investors, reasonably, will not.
Lo's insight, published in a 2012 Nature paper, was simple. Just bundle 50 or so drug programs into a single entity, one with a war chest of $5 to $15 billion, and roll the dice. The individual drug programs are still terrible standalone bets, but if they're sufficiently uncorrelated, at least one is almost guaranteed to hit, and it will hit big enough to pay off all the programs that failed. Which means you can keep playing, forever. Of course, the ‘uncorrelated’ bit is the ‘spherical cow’ part of all this. It’s impossible to do it perfectly, but it can be done well enough for risk to fall dramatically.
There’s an extra layer of complexity here about how if you can get the portfolio risk to be low enough, you can issue debt against the portfolio to sell as bonds, which unlocks a much larger pool of non-venture capital who want more stable returns. This is arguably the most interesting thing that Andrew believed in, but this particular bit never really went anywhere. We’ll discuss the obvious ‘why not?’ question at the end of this section.
The direct descendant of this whole thesis—at least the ‘drug portfolio’ part—is BridgeBio Pharma, founded in 2015 by Neil Kumar, who was Andrew’s student at MIT. It is structured almost identically to Andrew’s original thesis: a central holding company that creates subsidiary companies, each focused on a single rare disease. Each subsidiary has its own equity structure, its own management team, and 1-2 drug programs. If a subsidiary's drug fails, it dies, but BridgeBio survives. If it succeeds, the parent holds enough equity to capture massive upside. The company IPO’d in 2019, is now worth billions, and has a pretty good stock trend for a biotech.
There are spiritual cousins as well, such as Roivant Sciences, founded in 2014 by Vivek Ramaswamy. It has a nearly identical corporate structure to BridgeBio—what’s come to be known as a ‘hub-and-spoke’ model—but whereas BridgeBio does de novo drug development in rare diseases, Roivant in-licenses drugs that big pharma has abandoned for non-scientific reasons: portfolio reprioritization, executive turnover, M&A reshuffling, quarterly earnings pressure. There are lots of these molecules floating around, and if you hire good enough people, you have the ability to spot them before anyone else. Roivant went public in 2021 at a $7.3 billion valuation, and its subsidiaries have completed twelve consecutive positive Phase 3 studies. And it has an even better stock history!
This solves the fundamental problem of biotech, no? Really, in retrospect, it’s astonishing that we let anybody create a non-hub-and-spoke biotech. You have a set of bets, each one of which is individually stupid, and then you put them in a bag, and the bag becomes smart by virtue of each bet being insanely high variance. It is the obvious thing to do.
Unfortunately, upon trying this out, we will run into two big problems. The first one is that running many drug programs at the same time is really hard. And the second one is that people know running many drug programs at the same time is really hard, and they will price any attempt to do so accordingly.
An exemplar of the first lesson is Centessa Pharmaceuticals. Centessa was founded in late 2020 by Medicxi, a life-sciences venture firm, as another implementation of this thesis: ten private biotech companies, each with its own single asset, combined under one holding entity, taken public in May 2021 at $20 a share. Though they are often held up as paragons of the Andrew Lo thesis (including by me!), Roivant and BridgeBio weren’t real hub-and-spoke enthusiasts. Centessa was. Whereas Roivant in-licensed abandoned pharma assets and BridgeBio concentrated almost entirely on rare genetic disease, Centessa bravely stuck to the Lo script: a portfolio of genuinely uncorrelated clinical risk. Their spokes covered: hemophilia, oncology, pulmonary hypertension, narcolepsy, fibrotic disease, autoimmune disease — if there was any correlation risk, it was that drug development was occurring at all.
The model did not work. Within eighteen months Centessa was shutting down spokes. By 2023, they had abandoned the hub-and-spoke model entirely and pivoted to a single-asset company focused on orexin agonists for sleep disorders. That pivot, to be clear, worked spectacularly. Lilly bought them for $6.3 billion in early 2026, making Centessa one of the more successful biotech exits of the decade. But they got there by becoming a single-asset company. What had gone so wrong with the original thesis? The surface answer is a mix of capital and luck. Several spokes failed on their own merits, and the 2022-ish biotech market crash closed the door on funding whatever was left. Centessa shareholders ended up all right in the end, but hub-and-spoke models are empirically not silver bullets for the hard problem of drug development.
The second problem here is that people simply may not believe in your so-called ‘uncorrelated risk portfolio’. This will obviously happen when you raise money to pursue the venture, and it will, surprisingly, happen again once you go public.
As an example: did you notice that big drop in BridgeBio’s stock in late-2021? This is when their lead candidate acoramidis—a treatment for a rare heart condition called transthyretin amyloid cardiomyopathy—failed to beat placebo on its primary endpoint in a Phase 3 trial. The stock dropped 72% in a single day. This was not the tidy portfolio-theory response. The rational response would be “well, BridgeBio has four other clinical-stage programs and $800 million in cash, so the diversified portfolio thesis should protect us." The market said "holy shit, the lead asset is dead, the portfolio theory behind this company is nonsense, sell it," and priced that sentiment accordingly.
The funny part of this all is that BridgeBio kept running the trial. The 12-month primary endpoint had failed, but the study was designed to run to 30 months, with a harder secondary endpoint: death and cardiovascular hospitalization. In July 2023, the longer-term data read out, and acoramidis worked, with the secondary endpoint being met. The stock surged 76% in a day, BridgeBio eventually won FDA approval, and the drug—now on the market—is called Attruby. Stressful!
Well, that’s that. But we should return to Andrew Lo for a second. The part of Lo’s idea that did not arrive, at least not in its original form, was the bond-market part. Why has no one implemented what was arguably the most clever part of his pitch: issuing debt against your drug portfolio, allowing you to access vast sums of institutional, low-risk capital?
Well, to some degree, someone has, but only for approved drugs. BioPharma Credit is one such institution, and makes secured loans to commercial-stage biotechs, typically collateralized by the revenue stream of one or more approved products.
But nothing like this exists for clinical-stage stuff. Why not? Happily, Lo himself offered an answer, almost a decade after his first paper. For one, biotech is simply not used to that type of financing so they don’t do it, and two, the extreme scale of financing that this unlocks has simply not yet been needed, so nobody can raise it. But the third most important point is a lack of institutional support. There is no biomedical Moody's—no quantitative, authoritative voice that can tell a pension fund how risky a portfolio of drug assets is. And even if there were, there is no biomedical Fannie Mae—no government-backed entity that acquires biopharma loans and securitizes them into something an institutional allocator would actually buy. Our field exists in the same state that mortgages were in the 1930s, which were considered too risky for banks to buy until the federal government created these two pieces of infrastructure to make it safe.
But, Lo posits, the need for capital eventually changes behaviors, biology is poised to only grow far larger than it is today, and models for drug portfolio risk adjustment are only getting better. Four years after the paper, I am unsure whether much has changed, but we’ll see what the future holds.
Drug royalties are pretty simple. You discover an interesting target or chemical, but don’t want to bother with developing it further. So you pawn it off to a big pharmaceutical company with a lot of resources, alongside a contractual agreement that you’ll receive 3% of net sales if a drug based off your work is eventually approved and commercialized. And like any contractual agreement, it can be bought and sold.
Royalty Pharma, founded in 1996 by Pablo Legorreta, is the largest company in this market and possibly the purest expression of financialized drug development that exists. It has no labs, no therapeutics arm, and no ambition to discover drugs itself. It buys royalties, from universities, academic medical centers, small biotechs, individual inventors, and holds them. The portfolio includes claims on 7 of the 30 most-prescribed drugs in the United States. It reported $2.38 billion in revenue in 2025 from what is, spiritually, a filing cabinet.
Is this rent-seeking? If you look at the details, it actually feels pretty fair to all parties involved. A university that has a royalty over some particular drug developed by a professor has no ability—or desire!—to forecast its chance for success, its revenue if approved, or how to hedge the risk that a competitor enters the market. It also very likely has a preference for less money today than more money over the ten years of a drug’s exclusivity period. Royalty Pharma and its competitors have the opposite preferences and all the abilities the university lacks. The university gets liquidity and certainty; Royalty Pharma gets a claim on an approved drug's revenue stream at a discount to its expected value. Both sides win.
But the more interesting recent development is the rise of synthetic royalties.
A traditional royalty is a pre-existing legal right. It exists because someone did the original research and negotiated a licensing agreement. A synthetic royalty is different. It’s a manufactured financial claim on future drug revenues that didn’t previously exist. Consider an example: a biotech company has a drug in clinical development, one that it owns entirely. It needs money. It doesn’t want to issue equity (dilutive) or take on debt (requires collateral). So it invents a drug royalty from scratch, an entirely new obligation that did not previously exist, and sells that. Now they do not own the drug’s IP entirely, some other party owns 3% of the future sales of it if it ever succeeds, and the biotech gets non-dilutive capital today.
What’s the difference between these increasingly complex synthetic royalty agreements and typical, bespoke pharma deals? They feel similar. And yes, they are functionally equivalent in terms of cash flow or deal structure. Where the difference lies is in each party’s intent. In typical pharma deals, the buyer cares about something strategic, say, operational control over a drug’s development journey. Buyers of royalties, synthetic or otherwise, do not care about that. They care entirely about the probability-weighted present value of the future payments, and you can imagine how useful this decoupling of capital from often burdensome partnership demands can be.
The royalty market is, in some sense, a secondary market for the financial value typically embedded in pharma licensing agreements. And it's still early.
One report found that there were 102 major royalty transactions from 2020 to 2024, noting that synthetic royalties are growing at 33% annually. The buyer pool includes not only royalty-centric funds like Royalty Pharma, but increasingly pension funds and private equity as well. The same institutions Andrew Lo wanted to be in on biotech are getting in on the game, just in a different way.
This whole class of synthetic royalties is growing more complex over time, with some even including milestones built into the sold contract, such that the seller (the biotech) receives even more upfront capital upon the achievement of Phase 2 trial success or outright drug approval. The whole concept is also growing physically larger. In June 2025, Royalty Pharma and Revolution Medicines announced a $2 billion funding agreement—$1.25 billion of which was structured as a synthetic royalty—to fund the development of daraxonrasib; the largest ever transaction in this particular asset class.
But at the same time, within synthetic royalties, you can see the beginnings of a financial instrument that is strange enough that one cannot easily predict its second- or third-order effects. Pharma partnership agreements can be burdensome in the demands they make, but they are at least ‘time-bounded’ in ways that are easy to plan for. Synthetic royalties follow a company around forever, as long as a drug is under patent, actively extracting value all the way, their only contribution being an initial surge of capital. This is nobody’s fault of course, least of all the royalty holders. ‘We are selling to willing buyers at the current fair market price’ and all. But the cumulative effect, as more drugs carry more synthetic royalty obligations, is a pharmaceutical economy where an increasingly large fraction of every dollar of drug revenue is pre-committed to financial intermediaries before the drug reaches a single patient.
But it’s not as cut and dry as ‘synthetic royalties are bad' because of this. Consider the Revolution Medicines case from earlier. Their drug daraxonrasib has a strong chance of being a blockbuster, and so scaling global commercialization will be enormously expensive. An equity raise would have diluted ownership right before value-inflecting Phase 3 readouts (which were excellent!), traditional debt at that scale would be impractical, and a pharma partnership would surrender commercial rights to what could be a decade-long franchise of label expansions. The synthetic royalty allowed Revolution to sidestep all three, largely as a result of the royalty being tiered, decreasing with sales volume, and dropping to zero above $8 billion in annual net sales. If daraxonrasib becomes a true blockbuster, the royalty burden effectively caps out and becomes negligible as a percentage of revenue.
But why would Royalty Pharma agree to this at all? Isn’t this clearly not in their favor? Not at all: they likely just did the numbers, and anything above some certain threshold in yearly sales is both unlikely and unneeded for their portfolio math, so they are happy to give the tail scenario away for free.
All of this, only possible because there is an entity willing to buy a manufactured claim on future revenue that didn't exist until someone decided to create it. The royalty market shows the basic pattern: once a future drug cash flow becomes legible, someone will turn it into a security.
What we’ve discussed so far assumes some degree of intentionality. Andrew Lo purposefully came up with his thesis, Royalty Pharma deliberately built a business around drug royalties, and so on. But there are two particular financial instruments that were intentionally designed at the start, but have slowly begun to display an unpredictable life of their own once deployed. I’d like to discuss them because I think they do a great job in demonstrating not only how tradable instruments in finance can have material impact in how drug development works, but also how those impacts can be very difficult to predict in advance.
The two are PRVs, or Priority Review Vouchers, and CVRs, or Contingent Value Rights.
We’ll start with PRVs.
In 2006, three professors at Duke published a paper titled “Developing Drugs for Developing Countries". In it, they discuss a well-trodden problem: infectious and parasitic diseases create enormous health burdens in the developing world, but because the people suffering from them are poor, there's essentially no commercial incentive to develop treatments. Of course, ideally there would be some way to incentivize for-profit companies to do it. But financial incentives require money, and money requires Congress, and Congress requires political will that rarely materializes for diseases affecting people who can't vote in U.S. elections.
The fix, the authors argued, is to use a logistical incentive instead. If you are willing to develop a drug for a neglected disease, the government ought to help you out somewhere else in your drug portfolio.
How? By offering you a PRV. But what use is the PRV? Once a pharmaceutical company has wrapped up their clinical trial work and submits an application to the FDA for official approval, they must wait 10 months for FDA review. But if they submit this one-time-use-only voucher alongside the application, FDA should be forced to give you a review within 6 months. And just in case you don’t actually have an internal portfolio of drugs to allocate this PRV to, the voucher should also be sellable. Four months of time-value of earlier market entry for a ‘top-decile’ drug can be worth an awful lot, around $300 million according to the authors.
You can imagine a very neat feedback loop from all this. For instance, a small nonprofit or academic group develops a river blindness treatment, receives a voucher, and can then sell the voucher to Pfizer to use the proceeds to fund more neglected disease work.
In a rather astonishing act of ‘listening to healthcare economists’ that I don’t believe ever occurred thereafter, Congress enacted the program in 2007, just a year after the paper’s publication. It expanded again in 2012 to include rare pediatric diseases. And again in 2016 to include medical countermeasures against biological/chemical/radiological threats.
There are two things I find very interesting about PRVs.
The first is that, as the title of this section implies, the PRVs have gained secondary market price dynamics that its creators never intended. The buying cost of a PRV at any given moment is a function of how many are floating around, how many blockbuster drugs are approaching FDA submission, the competitive landscape, and whether Congress has recently done something to expand or contract the program. AbbVie paid $350 million for a single voucher in 2015—the all-time high, driven by the voucher being the only one out there and that their competitor was releasing a similar drug to theirs. Novartis picked one up in 2023 for $21 million—the all-time low.
How did Novartis get one for so cheap? Funnily, that particular story also illustrates the increasingly complex financialization of biotech quite well. When Novartis licensed a particular drug to a particular biotech back in 2019, it baked in a “pre-agreed, contractually defined percentage of the PRV value” into the licensing agreement four years before the voucher existed, and, in fact, before the drug itself had even been approved. And when the biotech got the drug approved and received the voucher in 2023, Novartis simply exercised the option to purchase it for a ridiculously low value.
Imagine being the person behind that deal!
The second thing, even more interesting is that the whole program has increasingly begun to bear no fixed relationship to the social good it was meant to incentivize. Why? Because even at the voucher's peak secondary market value of $350 million—though it usually oscillated around the $100M mark—it was not enough to shift a large pharma company's portfolio allocation in any meaningful way. In the few cases it did, it shifted it towards doing the absolute, most bare-minimum possible thing: approval of the drug, not utility of the drug. The voucher pays for the regulatory event, not the public-health outcome. In a great paper titled ‘The priority review voucher: a misconceived quid pro quo’, the authors say this:
…the PRV, except few examples, has largely failed to deliver medical benefits for patients suffering from neglected diseases because it rewards obtaining FDA marketing authorisation without regard for the products actually being available, affordable and equitably accessible for people.
Now, it would be lying to tell you that PRVs have not helped anyone. They have! But there have been enough cases of bad behavior here that it is worth wondering if there is something better that is possible. This is, in fact, being worked on, but it takes us off topic, so I’ve put some details about it in the footnotes[1].
The second financial instrument I want to discuss is the CVR, or Contingent Value Right.
CVRs are simple. When an acquirer and a target company cannot agree on what a drug-in-development is worth—which is most of the time—they structure a simple conditional payment. If the acquired drug(s) hits a specified milestone, the acquirer pays the target's former shareholders an additional sum. Most CVRs are structured like normal pharma partnerships, as in, a closed, non-transferable contract between two partners. A small minority of them are structured as tradable securities, listed on the NYSE or Nasdaq with their own ticker symbols, but these aren’t particularly special beyond their raw size.
What is most interesting about CVRs is the perverse incentives they sometimes create.
When Sanofi acquired Genzyme for $20 billion in 2011, Sanofi issued CVRs tied to the regulatory approval and commercial success of Lemtrada (alemtuzumab), an MS drug that Genzyme had been developing. Up to $3.8 billion was on the table if the drug hit its milestones. But Sanofi was also simultaneously developing its own MS drug, Aubagio. Aubagio had no CVR obligations attached to it.
Sanofi was now contractually obligated to compete vigorously against itself, on behalf of strangers, for free. Predictably, it did not.
Obviously, Sanofi was sued for this. The former shareholders alleged that Sanofi deliberately slow-walked Lemtrada's FDA submission and under-invested in its commercialization to minimize CVR payouts. But deliberate sabotage is hard to distinguish from ordinary sluggishness. Sanofi settled in 2019 for $315 million—well short of the $708 million in missed payouts the shareholders claimed—without admitting wrongdoing.
The pattern repeated more recently and at even larger scales in 2019, with BMS's $74 billion acquisition of Celgene, in which $6.4 billion in CVR payouts hinged on three drugs hitting FDA approval by fixed deadlines. Two were approved on time. The third missed by thirty-six days. As a result, the entire CVR expired worthless. As you may expect, former shareholders again sued.
If we were to generalize this, the structural problem is simple: a CVR can make the buyer responsible for creating a payout that the buyer would rather not pay. But if that’s the case, why are CVRs—which are accelerating in their popularity—done at all? For one, the above case studies are very much not the norm, most go on perfectly fine. And two, the value of CVRs as a coordination mechanism, even when they go wrong, empirically outweighs the later headaches they cause.
Royalty Pharma is not the only player in the royalty space. There are a few others, one of them named XOMA Royalty. XOMA is especially interesting, because it was once a traditional biotech company that developed and licensed drugs. And in 2017, it pivoted to become a royalty aggregator. And starting in 2024, it began to poke at the business of buying up, and liquidating, ‘zombie biotechs’.
Zombie biotechs are publicly traded companies whose stock price is below the cash on the balance sheet. This translates to investors saying that their IP, patents, clinical data, team, all of it, is not only worthless but is actively destroying value by burning through cash that would be better deployed sitting underneath a bed. Roughly 300 companies fit this description in mid-2024, most of them casualties of the 2020-2021 IPO bubble, when a lot of biotechs went public that had no business doing so.
These companies can’t raise equity (who would buy?), can’t take on debt (against what collateral?), and can’t be bought/merged with anyone (who would want them?). In an ideal world, the founders would simply put the whole business out of its misery, but they are collecting a paycheck anyway with their dwindling cash reserves and closing down a public company is a surprisingly legally fraught thing to do. So they just wander around as zombies.
XOMA’s insight was that this particular purgatory may itself be an asset class. They step in, acquire the company at or below cash value, and return cash to the shareholders who have been trapped in a slowly deflating stock for years. Then, they take a close look at everything the company created—patents, clinical data packages, licensing rights, partially completed regulatory filings—and sell it, keeping the profits for themselves. Or simply hold it, just in case it’ll be useful elsewhere in their portfolio.
The concept itself is not new. This is vulture investing, translated into biotech. But whereas a typical vulture investor’s goal is to flip an entire company onto someone else, the biotech vulture capitalist’s hope is to sell off pieces of the company. And the pieces can be surprisingly valuable. A drug candidate that failed a Phase 2 trial in one indication can be worth millions to, say, some of the hub-and-spoke companies we discussed earlier. Maybe Roivant believes that the endpoint was misspecified, or the indication was wrong, or that the drug is indeed useless, but that the PK/PD, safety signals, biomarker responses, regulatory responses, and dose-response curves uncovered during the trial are useful, and they’d be willing to pay vast sums for that data. What XOMA does here is make this information legible to potential buyers.
To help illustrate this, let’s consider a case study: Kinnate Biopharma.
Kinnate was an oncology company developing kinase inhibitors for cancer patients with specific genetic mutations. As the story goes for many companies of that era, they went public and by early 2024 were trading below their cash balance. There was no outright clinical trial failure, they simply ran out of money to develop their drugs further. In February 2024, XOMA announced it would acquire Kinnate for roughly $2.50 per share in cash, or $126 million. Then, over the next year, XOMA sold all five of Kinnate’s pipeline assets to other companies. In April 2025, they announced the completion of these sales, with terms entitling XOMA to up to $270 million in upfront and milestone payments, plus, keeping to their name, ongoing royalties ranging from low single digits to mid-teens on commercial sales. Kinnate’s shareholders received most of the upfront payment, and XOMA got to double its money in flipping the assets.
What would the counterfactual be if XOMA had not stepped in? Kinnate would’ve continued to bleed cash until it ran out. At that point, the IP would have been worth even less—the utility of biological information depreciates fast!—and the shareholders would have gotten back even less, perhaps nothing at all.
There is another player in this space worth discussing: Kevin Tang, through Tang Capital and its acquisition vehicle Concentra Biosciences. By mid-2025, Concentra had become one of the busiest buyers in biotech, making repeated bids for distressed public companies, with the explicit intention of closing them down, selling whatever assets could still be sold, returning some cash to shareholders, and keeping whatever spread remained.
Isn’t this quite similar to XOMA? Yes, both XOMA and Concentra are buyers of distressed, sometimes very clearly, biotechs. But the difference is when they arrive. XOMA typically shows up at the doorstep of companies that are clearly on death's door. But Concentra often arrives earlier, while the public company is technically alive and its board is still weighing bad alternatives: reverse merger, dilutive financing, slow wind-down, strategic review, or sale. And Concentra aggressively attempts to force the boards hand into a sale to them.
To be fair, ‘force’ is a bit strong of a word here. A better term would be ‘an offer they can’t (easily) refuse’. Concentra’s pattern is to accumulate a large minority stake, make an unsolicited bid, and dare the board to explain why shareholders should keep funding the burn instead of taking cash now.
Why can’t they refuse it?
Consider Jounce Therapeutics. In February 2023, Jounce announced a reverse merger with Redx Pharma, alongside a 57% workforce reduction. This was not exactly a happy ending, but it was at least a biotech ending: Redx’s pipeline would become the core of the combined company, Jounce shareholders would own a minority stake, and some version of the organization would continue to exist. Then Concentra appeared with an offer that promised even more liquidation to the shareholders, but one that would completely strip-mine Jounce to sell off as parts.
Tang is not doing anything illegal here, nor are boards literally compelled to accept every higher bid that comes along. But once a company has put itself in sale mode, the board starts to look less like a steward of a scientific project and more like an auctioneer for whatever value remains. This creates a bleak asymmetry. A reverse merger can be better for the people inside the company, better for the local biotech ecosystem, and perhaps even better for the vague moral category of “letting the science continue.” But that is not the job of the board to further. Their job is to ensure the shareholders are best served, and for them, Concentra’s highly liquid offer is difficult to argue against. In Jounce’s case, the Concentra transaction also came with an 84% workforce reduction. The board went with the Concentra offer.
Curiously, there are ways for companies to fight back against Concentra, and fight back they have. Their weapon is colloquially referred to as a ‘poison pill’, and goes as follows: if Tang keeps buying shares and crosses a threshold, usually around 10%, then every other shareholder receives the right to buy more stock at a discount, instantly diluting Tang. This does not resurrect the company, and it does not make Tang go away. It simply prevents Tang from buying enough stock in the open market to make liquidation feel inevitable before the board has themselves decided it is inevitable.
This is all quite interesting. But it is likely a transient phenomenon. The zombie biotech liquidation market is a finite resource; the 300 companies trading below cash are overwhelmingly a product of the 2020-2021 vintage, a specific historical moment when the bar for going public was unusually low. That cohort is being worked through. Some will be acquired by the players discussed here. Others will manage to raise capital and survive. Most will simply wind down on their own, returning whatever cash remains to shareholders without the intermediation of a vulture buyer. Unless there’s another IPO bubble of comparable scale soon, the supply of zombie biotechs will shrink over the next few years, and the opportunity that is currently being exploited will narrow.
So why do I mention this at all?
The zombie biotech business is worth dwelling on because it marks a kind of endpoint. Whereas every other instrument in this essay financializes drugs that might still become therapies, these are different. XOMA financializes drugs that won’t, and Concentra financializes drugs that likely won’t. If the frontier of financial creativity has reached the dead and dying, it tells you something about how thoroughly every other surface has already been colonized.
Andrew Lo’s original insight was not that finance could make drug development easy. Nothing can make drug development easy. His insight was that finance might make failure survivable. I think this is directionally correct. Financialization is just the process of making implicit economic relationships explicit and tradable, and more liquid markets for biotech risk are almost certainly better than fewer. And what has happened since Lo’s 2012 paper is that financial engineering has been applied not just to the drug portfolio problem, but to every conceivable surface of the drug development process: partnerships, mergers, royalties, and even the death of a company.
Is this a bad thing? Probably not. Objecting to the decoupling of finance from therapeutic value is a bit sentimental, since, in theory, financial incentives should track therapeutic value. But how confident are we about that? Are we boiling a frog here? And if we are, what exactly is the frog?
Like I said at the start, it is important to understand that financial engineering is happening for a reason: this whole industry is excruciatingly difficult to build something in. It’s only getting worse too. Starved of capital, clever people will figure out ways to offer it in increasingly exotic forms to increasingly desperate scientists or companies, and little can prevent these two from finding each other at a bar. The alternative to a financialized biotech industry is not some prelapsarian era of pure scientific inquiry. It is the same industry, with the same problems, but less money and fewer ways to deploy it.
But let’s say we are being idealistic here. What, then, should worry us about financialization squishing its way deeper into drug development? I’m happy to raise my hand first: I'm a little worried about whatever private-equity did to hospitals happening, in slower and less visible ways, to molecules themselves. Yes, I realize the nature of drug discovery imposes a constraint that most financialized industries don't have: the thing has to actually work. The FDA is a binary filter that no amount of financial engineering can route around, and as long as that's true, the typical finance-driven enshittification story shouldn’t apply here.
But "working" and "mattering" are not the same thing. For instance, you’ll notice that both Roivant’s and BridgeBio’s drug pipelines share a similarity: a focus on rare diseases. Finance people love rare diseases. Small trials, clear genetic etiology, often no existing standard of care, accelerated approval pathways, and excellent unit economics. This is fantastic for the several hundred, perhaps several thousand, patients helped by this work, and I don’t intend to minimize it. But would GLP-1s come out of this process? Would PrEP?
This doesn’t have to be a big deal. All of these could coexist. Big pharma and startups continue to have high variance bets, the financialized folks stay low variance, they work together when needed, the world is at peace. But capital is finite, and drug development keeps getting more expensive and less predictable. My worry is not that BridgeBio and Royalty Pharma are doing something bad. They aren’t, and are in fact doing something very good. The worry is that they are doing something so legible, so well-suited to the preferences of the capital markets, that the money increasingly, naturally flows to them and nowhere else.
Is this a real worry?
On one hand: obviously not. The sort of financialized rare disease work presented here may look quite good, but it still makes up an extremely small portion of biotech funding—around 2%. And it is not like Roivant or BridgeBio are poking at some genuinely undiscovered alpha. They are about a decade old, and despite their success, still don’t have many peers. Maybe this market is self-limiting. Maybe there are only so many BridgeBio-shaped opportunities in the world, and the rest of the biotech-earmarked dollars must go towards the higher-variance stuff.
On the other hand, the counterargument is the patent cliff. Between 2025 and 2030, patents for nearly 200 drugs are set to expire, including roughly 70 blockbusters. More than $300 billion in revenue is at risk, or about one-sixth of the industry’s annual revenue. Patent cliffs are normal, but this one is unusually large, weighing in at three times the size of the cliffs of the 2010s in lost revenue. Five of the top 10 pharmaceutical firms face a potential hit exceeding 50% of their current revenue.
What changes after an event like that? Perhaps Big Pharma will increasingly look towards easier, lower-risk/lower-reward diseases. Maybe they’ll be increasingly sympathetic to royalty and synthetic royalty funding agreements, further cutting into the economics of a drug. Maybe this leaks over into the public markets, and the diffuse preferences of a thousand allocators would rather fund the pharmaceutical companies who go down that path, instead of continue with the status quo.
The frog is not any single drug or company. It is the industry’s willingness to fund biology that is illegible, expensive, and likely to fail, which is to say, the kind that occasionally changes the world. Again: financial engineering did not create this problem—that fault can be attributed entirely to R&D productivity decline. In fact, the financiers may even be an especially brave vanguard in giving biotech the veneer of being a viable asset class. But they still may wind up making the response to the underlying problem worse by offering a way to achieve returns in ways that slowly diminish our institutional capacity to create the next generation of revolutionary medicines.
To end this off: I have deliberately left out China, which may be the most aggressive current example of financial architecture shaping a drug pipeline. That deserves its own essay, and will get one soon.
In June 2025, the CNPV (Commissioner’s National Priority Voucher) was announced by FDA Commissioner Makary, and represents a brave new direction of the concept: a non-transferable voucher that can be used for a 1-2 month review period and is awarded based on alignment with “critical U.S. national health priorities.” What does this mean? Nobody knows!
What we do know is that 18 vouchers have been awarded so far, 4 products have been approved through the program, and the whole thing has basically zero external visibility. If you go online, there is a lot of distaste about the whole thing, including two lawmakers who expressed that the program could “enable corruption by creating a new, lucrative gift for drugmakers and allies politically favored by President Trump.” I get it. But I think there is actually some utility in drug approval processes that are bespoke enough to let the federal government both accommodate practical constraints—manufacturing limitations, supply chain fragility—and extract concessions like price adjustments in return for regulatory speed. Obviously not ideal that such a program exists in the context of the volatile current administration, but I’m not especially opposed to a ‘we’ll fast-track good stuff through an opaque review process’ setup.