2026-04-12 15:00:07
(Recommended listening: Low - Violence)
Last year, I personally called AI companies to warn their security teams about Sam Kirchner (former leader of Stop AI) when he disappeared after indicating potential violent intentions against OpenAI.
For several years, people online have been calling for violence against AI companies as a response to existential risk (x-risk). Not the people worried about x-risk, mind you, they’ve been solidly opposed to the idea.
True, Eliezer Yudkowsky’s TIME article called on the state to use violence to enforce AI policies required to prevent AI from destroying humanity. But it’s hard to think of a more legitimate use of violence than the government preventing the deaths of everyone alive.
But every now and then some smart ass says “If you really thought AI could kill everyone, you’d be bombing AI companies” or the like.
Now, others are blaming the people raising awareness of AI risk for others’ violent actions. But this is a ridiculous double standard, and those doing it ought to know better.
AI poses unacceptable risks to all of us. This is simply a fact, not a radical or violent ideology.
Today on Twitter, as critics blamed the AI Safety community for the attacker who threw a Molotov cocktail at Sam Altman, I joined a chorus of other advocates for AI risk reduction in -- again -- denouncing violence. This was the first violent incident I’m aware of taken in the name of AI safety.1
Violence is not a realistic way to stop AI. Terrorism against AI supporters would backfire in many ways. It would help critics discredit the movement, be used to justify government crackdowns on dissent, and lead to AI being securitized, making public oversight and international cooperation much harder.
The first credible threat of political violence motivated by AI safety was the incident with Sam Kirchner (formerly of Stop AI) I mentioned at the outset. This incident was surprising, since from its conception, Stop AI held an explicit policy of nonviolence, and members of the group liked to reference Erica Chenoweth and Maria Stephan’s book Why Civil Resistance Works: The Strategic Logic of Nonviolent Conflict.
Research like Chenoweth’s suggests that nonviolence is indeed generally more effective. It’s a little bit unclear how to apply such research to the movement to stop AI, as her studies involved movements seeking independence or regime change rather than more narrow policy objectives. But if anything, I’d expect nonviolence to be even more critical in this context.
So if nonviolence is often strategic, when do movements turn to violence? Perhaps surprisingly rarely.
People say anti-AI sentiments and movements -- especially those that emphasize the urgent threat of human extinction -- are bound to breed violence. I think this is ignorant and actually makes violence more likely. Environmentalism has been a much larger issue for a much longer time, and “eco-terrorism” is basically a misnomer for violence against property, not people (more on that later).
There are many political issues in the USA that we never even consider as potential bases of violent movements. Even if there are occasional acts of political violence like the murders of Democratic Minnesota legislators or Conservative pundit Charlie Kirk, we don’t generally view them as indicting entire movements, but as the acts of deranged individuals.
My hunch would be that movements generally turn violent because of violent oppression against their members, not simply for ideological reasons. Although there are of course counter-examples, such as bombings of abortion clinics, where attackers justified their actions as preventing the murder of unborn children, or ideologies preaching violent revolution, such as at least some varieties of Communism.
An important question for “nonviolent” activists is whether they include violence against property in their definition of “violence”. Stop AI does. I assume Pause AI does as well, but it’s a moot point since they also reject illegal activities entirely.
The question deserves a bit more discussion, though, as it’s a common point of contention and legal and dictionary definitions differ. First, there is clearly an important distinction between violence against property and violence against people. An argument in favor of using “violence” to only mean “against people” is that we don’t have another word for that important concept. Still, I favor a broader definition that includes attacks on property, for a few reasons:
Many other people use this definition, and I think the damage that being perceived as violent can cause to a movement can’t be mitigated by a semantic argument.
Attacks on property can escalate. You are commonly allowed to use proportionate violent force against people to defend your property.
Attacks on property can hurt people. Setting fire to buildings, as activists associated with the Earth Liberation Front have done, seems hard to do without some risk of hurting people.
That being said, I think there’s a bit of a grey line between “violence against property” and vandalism. I’d say violence must involve the use of “force”. For example, I think most people wouldn’t consider graffiti an act of “violence”.
I think the example of eco-terrorism is instructive. The vast majority of the environmentalist movement is non-violent. However, a small number of activists have advocated for and enacted tactics such as tree-spiking that have injured people.
Hence we now have the term “ecoterrorist”. The very existence of this phrase is misleading. I remember a while back I was curious — who were these ecoterrorists? What had they done? Why hadn’t I heard about it, the way I heard about other terrorist attacks. Well, when you look into it, it’s arson and tree-spiking and that’s about it. I seem to recall reading about one example where actions intended to destroy property actually ended up killing people, but wasn’t able to easily dig it up.
Still, these few actions were enough to add this word to our lexicon, and create an image of environmentalists as more radical and anti-social than they really are.
I’m struggling to find a good way of ending this post.
I believe the actions of AI companies are recklessly and criminally endangering all of us, and the public will be increasingly outraged as they discover the level of insanity that’s taking place. Similarly to Martin Luther King Jr.’s comment that “a riot is the language of the unheard”, I do understand why this emotional outrage might provoke a violent response.
But I hope the movement doesn’t spawn a violent element and that these recent examples are isolated incidents. To make that more likely, we should continue to vocally espouse nonviolence, and denounce those who would encourage violence among us.
But ultimately, movements are fundamentally built through voluntary participation, and nobody can entirely control their direction. The response should be to try and steer them in a productive direction, not to avoid engaging with them.
Earlier this week, bullets were fired into the house of a local councilman supporting datacenter development; it’s unclear whether AI was a motivation in that case.
2026-04-12 14:30:10
Decentralize or Die.
In April 2024, a salvo of cruise missiles destroyed the Trypilska thermal power plant, the largest in the Kyiv region, in under an hour. In June 2023, the destruction of the Kakhovka dam left a million people without drinking water and wiped out an entire irrigation system downstream. Throughout three winters, strikes on combined heat and power plants have left apartment buildings in Kyiv at indoor temperatures barely above freezing. In December 2023, a single cyberattack on Kyivstar, Ukraine's largest mobile operator, cut phone and internet service for millions.
One would think that under such attacks on infrastructure any society must necessarily collapse. Or at least that’s what Putin hopes for. But the last time I’ve checked, Ukraine was still very much alive and kicking. The question is: how is that possible?
***
In winter 2022, when the blackout in Kyiv happened for the first time, people had to cope for themselves. Here’s Tymofiy Mylovanov, professor at Kyiv School of Economics, tweeting in real-time:
There is no electricity, no heating, no water. Outside temperature is around freezing. The apartment is still warm from the previous days. We will see how long it lasts. We have blankets, sleeping bags, warm clothes. I am not too worried about heating until temperature goes below -10 C / 14 F. But the water is another issue. The problem is toilets. We have stockpiled about 100 litters of water. There is also snow on our balcony. It is a surprisingly large supply of water. But every time I go there to get it, I have to let the cold air in; not good. For now, the cell network is up, although the quality varies. Thus, I have internet. Internet is critical for food. Yesterday we went to a grocery store to buy up a bit more stuff in case there will be shortages. Food is there, no lines. The challenge is to pay. Most registers work with cash only. Just a few are connected to accept credit cards. Through cell network. The banking system is stable, but I will go get some cash in case Telekom or banks go down Our stove is electric. This means no warm food until the electricity is back. This is not fun. We have to fix it. There are two parts to our plan. First, we will buy an equivalent of a home Tesla battery. So it can be charged when there is electricity. This will also solve, somewhat, the heating problem, as we have already bought some electric heaters. But the electricity might be off for a long time and so we need gas or wood cooking equipment. I guess we have to go shopping. Stores work. They run huge diesel generators.
Later that day he dryly comments: “In the morning I said I was not worried about heating. Instead, I was concerned about water and sanitation. Boy, was I wrong.”
It’s worth reading the tweets from the next few days: Getting a generator, setting it up, placing it on balcony so that fumes stay outside, getting the wires in without letting the cold in as well. Go check it out for yourself.
Anyway, what followed was a series of adaptations, a kind of military vs. civilian arms race. Through the first winter, the strategy was simply to repair what Russia destroyed. Substations and transformers that could be replaced within weeks with donated European spares.
In the meantime, for the millions of affected people, the government created stopgaps. Over 10,000 heated public spaces in schools, government buildings, and railway stations offered electricity, water, internet, and phone charging. Kyiv deployed mobile boiler houses that could run for days without refueling. Hospitals installed Tesla Powerwalls. Cafes ran diesel generators and became de facto community centers.
Mobile boiler house in a shipping container. You truck one in, connect it to a building's existing heating pipes, and it starts working.
I’ve donated to some of those efforts, maybe you did too. And taken all together, it worked. Kind of. But by 2024 Russia adapted. Strikes shifted from repairable transmission equipment to the power plants themselves, assets that take years to rebuild. The Trypilska plant was partially restored after its destruction, then it was struck again by drones months later. And again after that. With two-thirds of generation capacity gone and every thermal plant in the country damaged, it became clear that restoring the old centralized system was not a viable strategy.
Ukraine's response shifted. It was not to rebuild what was destroyed but to replace it with something less centralized. Something too dispersed to target. Instead of restoring the Trypilska plant's 1,800 megawatts, hundreds of small cogeneration units were scattered across the region, compact gas turbines producing 5 to 40 megawatts each, generating heat alongside the electricity. By late 2025, Ukraine's heating sector alone ran over 180 such units as well as hundreds of modular boilers. Hospitals, water utilities, and apartment blocks are organized into autonomous energy islands, microgrids that keep functioning even if the national grid goes dark. No single unit is worth a cruise missile. And a destroyed module can be replaced with a phone call and a truck from Poland.
The same logic extends to water. Ukraine's centralized water systems are inherited from the Soviet era. A single pumping station serves hundreds of thousands of people. They are just as vulnerable as the power plants. Strikes on the grid cut electricity to pumps. Without pumps, water stops flowing. In winter, standing water in pipes freezes and bursts them, cascading damage across entire districts.
In Mykolaiv, a damaged pipeline to the Dnipro River left 300,000 residents relying on salty, barely drinkable water from a local estuary for over a year. The response mirrors the energy transformation: water utilities are installing their own solar panels and battery storage to decouple from the grid entirely.
Solar panels are, under these circumstances, close to an ideal solution. They are cheap, manufactured at scale, and can be replaced in a single day. By early 2024, Ukrainian households and businesses had installed nearly 1,500 megawatts of rooftop solar. Not because of climate change, but because of survival. Solar panels are inherently dispersed. There is no single set of coordinates an attacker can hit to disable them all. And destroying them one by one would cost the attacker more in munitions than the panels are worth.
This kind of arithmetic pops up everywhere. In the ongoing Iran war, Ukrainian military observers were flabbergasted by Gulf states and the US burning through hundreds of Patriot missiles, $4 million each, to shoot down cheap Iranian Shaheed drones, $35,000 apiece. If destroying a target costs more than the target itself, the attacker loses even if the strike succeeds.
A different kind of decentralization is happening in the telecommunications domain. The infrastructure was already fairly decentralized to start with, a legacy of makeshift internet adoption that happened in many Ostblock countries, with many small ISPs emerging independently. The war pushed this further. Ukraine has adopted a layered backup approach: if fiber broadband fails, mobile networks fill the gap; if mobile networks are knocked out, Starlink steps in as a last resort.
The logic extends to government services. There’s the Trembita data exchange platform, where government services talk each other directly without centralizing the data. (Trembita is based on Estonian X-Road system — the birth of Estonian e-gov technology is a fascinating story in itself, and there’s a whole book about it!) Built on top of it, there’s the Diia app that allows citizens to file taxes, register vehicles, access medical records, open bank accounts, register births, and start businesses, all from a smartphone. This, of course, means there’s no single office building to target so as to disrupt a particular kind of activity.
Add to that Ukrainian governmental data are now stored in the cloud. A week before the invasion, Ukraine's parliament quietly amended a law that had required government data to be stored physically in Ukraine. On the day the missiles started flying, the Ukrainian ambassador in London met AWS engineers and decided to fly three AWS Snowballs, hardened suitcases that hold 80 terabytes each, from Dublin to Poland and then move them to Ukraine the very next day. Ukrainian technicians copied population registers, land ownership records, and tax databases onto them and shipped them back out.
It was a race. On the day of the invasion, cruise missiles struck government server facilities while Russian cyber operatives simultaneously deployed wiper malware, software designed to permanently destroy data, against hundreds of Ukrainian government systems. Some data was lost, but the most critical registries were already gone, smuggled out of the country in carry-on luggage.
***
On the battlefield, where all these trends are even more severe, concentration has become suicidal. Russian infantry now advances in groups of two or three. Anything larger is an invitation for a drone strike. Warships are floating targets. Russia's Black Sea Fleet retreated from Crimea after losing vessels to cheap unmanned boats. In the Hedgehog 2025 exercise in Estonia, a small team of Ukrainians and Estonians with drones, acting as the opposing force, wiped out two NATO battalions, thousands of soldiers, in half a day, not least because they had moved in columns, parked their vehicles in close formations and failed to scatter under attack.
They made the same mistake as the designers of Soviet-era power grids: they concentrated value and got destroyed for it. Call it the blast radius principle. In a war of attrition, any asset whose destruction is worth more than the cost of the weapon that can reach it will, sooner or later, be destroyed. The only effective strategy is to push the value of each individual target below that threshold, to become, in effect, too small to bomb.
When Rheinmetall’s CEO recently made a condescending comment about Ukrainian housewives 3D-printing drones in their kitchens, much merriment ensued. Because Rheinmetall, of course, builds the very kind of heavy conventional, WWII-style hardware that the developments in Ukraine are rapidly making obsolete.
But mockery aside for a moment: if you’ve spent any time around progress studies, the phrase “housewives building drones in kitchens” makes you prick up your ears. It triggers a specific association: cottage industry, the small-scale, home-based production that preceded and enabled the industrial revolution. It makes you think about how the modes of production change over centuries.
You know that kings and generals don’t make history. One empire falls, another rises, nothing fundamentally changes. What does matter is new technology. Even more so new technology which fundamentally changes how things are done. Technology that reshapes the economics of entire production chains. Agriculture. Road system. Bill of exchange. Putting-out manufacture. Joint-stock company. Assembly line. The humble shipping container…
Does decentralization, as seen in Ukraine, fit the bill? We don’t know. FirePoint, the Ukrainian company producing the much-spoken-about FP drones, is distributed across more than 50 manufacturing sites throughout the country. But that’s nothing new. The allied bombing campaign during WWII failed to halt German aircraft manufacture precisely because Germany had decentralized its industries. Albert Speer, then the minister of armaments, dispersed production into hundreds of small workshops, caves, tunnels, and forest sites across the Reich. German aircraft production actually increased in 1944, the year of the heaviest bombing. But then, after the war, German industry did concentrate again.
What seems different this time, though, is the spillover into the civilian sector. Speer dispersed munitions factories, but German civilians kept heating their homes the same way throughout the war. In Ukraine, the dispersal extends to utilities, water systems, telecommunications, government services. Russians bomb a heating plant, the heating network disperses into dozens of autonomous microgrids.
The obvious objection is that this is a wartime hack, not a permanent transformation. Distributed systems sacrifice economies of scale. A hundred small gas turbines are less efficient than one large power plant. Once the war ends and the skies are safe, the economic logic will reassert itself and everything will concentrate again.
And indeed, in some cases, that's exactly what will happen. Ukraine is currently bombing Russian oil refineries and fertilizer plants, and although cracking crude oil in plastic bottles in a kitchen is exactly the sort of thing you might expect Eastern Europeans to do, it's unlikely to match the efficiency of a proper refinery. Some industries have genuinely irreducible physical economies of scale. The chemistry demands large reaction vessels, the thermodynamics reward concentration. Similarly, some infrastructure simply cannot be distributed. It's hard to imagine a decentralized railway system or a dispersed deep-water port — at least short of giving up on it and transporting everything by drone.
But not all economies of scale require spatial proximity. Sometimes, it’s just sheer scale that matters, not necessarily the co-location. Case in point: solar panels. Other times the crucial element is the organizational structure, not the physical location of the employees. Basically any service offered over internet is like that.
But all that being said, there’s a specific reason to think some of these changes may stick.
Over the past fifty years we’ve accumulated an entire arsenal of distributed technologies. Packet-switched networks. Drones. Solar panels. Distributed databases. 3D printing. Even nerdy cypherpunk inventions like public key cryptography, zero-knowledge proofs and cryptographic ledgers. And it’s not just technical stuff. We’ve developed distributed social technologies too: open-source-style cooperation (who would have predicted that military intelligence, of all things, would be the next domain to go open-source?), market design, remote work, video conferencing. Even prediction markets as a tool for aggregating dispersed knowledge.
Some of these are already ubiquitous. Around 70% of the world’s population already has access to the Internet, a network famously designed to route around damage during a nuclear war. But others feel like we’re barely scratching the surface. 3D printing has existed for decades, yet it still feels like a technology that we are only playing with. We may be like pre-Columbian Americans, whose children played with wheeled toys, but the adults carried loads on their backs.
Mesoamerican wheeled toy.
Based on historical examples, we know that inventing a technology is often not the bottleneck. The aeolipile was invented in the first century AD, but we still had to wait another seventeen centuries to get an actual steam engine. Gutenberg went bankrupt. Adopting a technology is dependent on complex interplay of socio-economic forces that, at a certain moment, make the technology so desirable that people start using it despite all the drawbacks and overcoming all the vested interests. Then the learning curves kick in.
Two questions remain. Are those distributed technologies already adequately exploited, or are they like dead wood lying around in a forest, waiting for a spark? And if the latter is true, are the incentives created by the war in Ukraine — or for that matter, by similar future war elsewhere — sufficient to ignite it? They may be. Because once the enemy starts bombing companies, the incentives change. Working from home ceases to be a nice perk. Suddenly, it’s either work from home or die.
2026-04-12 13:37:42
Written quickly for the Inkhaven Residency.[1]
There’s a phenomenon I often see amongst more junior researchers that I call being scared of math.[2] That is, when they try to read a machine learning paper and run into a section with mathematical notation, their minds seem to immediately bounce off the section. Some skip ahead to future sections, some give up on understanding the section immediately, and others even abandon the entire paper.
I think this is very understandable. Mathematical notation is often overused in machine learning papers, and can often obscure more than it illuminates. And sometimes, machine learning papers (especially theory papers) do feature graduate level mathematics that can be hard to understand without knowing the relevant subjects.
Oftentimes, non-theory machine learning papers use mathematical notation in one of two lightweight ways: either as a form of shorthand or to add precision to a discussion.
The shorthand case requires almost no mathematical knowledge to understand: paper authors often use math because a mathematical symbol takes up far less real estate. As an example, in a paper about reinforcement learning from human preferences, instead of repeating the English words “generative policy” and “reward model” throughout a paper, we might say something like “consider a generative policy G and a reward model R”. Then, we can use G and R in the rest of the paper, instead of having to repeat “generative policy” and “reward model”. This is especially useful when trying to compose multiple concepts together: instead of writing “the expected assessed reward according to the reward model of outputs from the generative policy on a given input prompt”, we could write E[R(G(p))].
Similarly, mathematical notation can be used to add precision to a discussion. For example, we might write R : P x A -> [0,1] to indicate the input-output behavior of the reward model. This lets us compactly express that we’re assuming the reward model gets to see both the actions taken by the policy (A) and the prompt provided to the policy (P), and that the reward it outputs takes on values between 0 and 1.
In neither case does the notation fundamentally depend on knowing lots of theorems or having a mastery of particular mathematical techniques. Insofar as these are the common use cases for mathematical notation in ML papers, sections containing the math can be deciphered without having deep levels of declarative or procedural mathematical know-how.
I think there are two approaches that help a lot when it comes to overcoming fear of math: 1) translating the math to English, and 2) making up concrete examples.
As an illustration, let’s work through the first part of section 3.1 of the Kalai et al. paper, “Why Language Models Hallucinate”. I’ll alternate between two moves: restating each formal step in plain English, and instantiating it with a deliberately silly running example:
The section starts by saying that a base model can be thought of as a probability distribution over a set of possible strings (“examples”) X. As an example, a model such as GPT-2 can indeed be thought of as producing a probability distribution over sequences of tokens of varying length.[3]
Then, the authors write that these possible strings can be considered as errors or valid examples, where each string is either an error or valid example (but not both). Also, the set of example strings include at least one error and one valid example. The training distribution is assumed to include only valid examples.
Here, it’s worth noting that an “error” need not be a factually incorrect statement, nor that the training distribution necessarily includes all valid statements. Let's make up a rather silly example which is not ruled out by the authors’ axioms: let the set of plausible strings be the set of English words in the Oxford English dictionary, let the set of “valid” strings be the set of all words with an odd number of letters, while the training distribution consists of the single string “a” (p(x) = 1 if x = “a” and 0 otherwise).
The authors now formalize the is-it-valid (IIV) binary classification problem. Specifically, the goal is to learn the function that classifies the set of all strings into valid examples and errors. In our case, the function is the function that takes as input any single English word, and outputs 1 if the number of letters in the word is odd. Also, we evaluate how well we’ve learned this function on a distribution that’s a 50/50 mixture of strings in the training distribution (that is, the string “a”) and the strings that are errors, sampled uniformly (that is, all English words with an even number of letters.)
The authors then introduce the key idea: they relate the probability of their learned base model to its accuracy as a classifier for the IIV problem. Specifically, they convert the probability assigned by the base model to a classification: if it assigns more than 1/number of errors probability to a string, then the base model classifies the string as a valid string. Otherwise, it considers it an error.
The authors then introduce their main result, which relates the error of this IIV classifier to the probability the base model generates an “erroneous” string:
That is, the probability our base model generates an erroneous string is at least twice the error rate of the converted classifier on the IIV classification problem, minus some additional terms relating to the size of the valid and error string sets and the maximal difference between the probability assigned to any string by the training distribution and the base model.
To make sure we understand, let’s continue making up our silly example: our base model assigned 50% probability to the string “a” and 50% to “b” (and 0% to all other strings). Then (since it assigns 0% probability to any string with an even number of letters), its classification accuracy on the IIV problem is 100%, and its error rate is 0%. Indeed, the probability it generates an erroneous string is 0%. So we actually already have err = 0 >= 2 * err_iv = 0, trivially. It’s worth checking what the other terms here are, to make sure we understand: the first term is the ratio of the size of the set of valid strings and the set of erroneous string (in our case, the ratio of the number of English words with odd characters versus even ones), and the second is 0.5 – our base model assigns a 50% chance to “a”, which the training distribution assigns 100% probability to, and similarly our base model assigns a 50% chance to “b”, which the training distribution assigns 0% chance to.
I’m going to stop here, but I hope that this example shows that math is not actually that hard to read. Most non-theory ML papers have math sections that are similar in difficulty to this example. If you find yourself bouncing off the math, the question is rarely "do I know enough math for this?", and much more often "how can I translate this to English and use an toy illustrative example to make it concrete?"
I was going to conclude my “have we already lost” series, but I wanted to write about something lighter and less serious for a change.
There’s also a more general phenomenon that I’d probably call being scared of papers, to which the only real solution I’ve found is exposure therapy (interestingly, writing a paper does not seem to fix it!).
Specifically, GPT-2 takes as input a sequence of tokens, and assigns a probability distribution over 50,257 possible next tokens, one of which is the <|endoftext|> token. Starting from the empty sequence, GPT-2 induces a probability distribution over token sequences of any length, by multiplying the conditional probabilities of each subsequent token in the sequence, conditioned on all previous tokens.
2026-04-12 12:30:54
I'm pretty excited about training models to interpret aspects of other models. Mechanistic interpretability techniques for understanding models (e.g. circuit-level analysis) are cool, and have led to a lot of interesting results. But I think non-mechanistic interpretability schemes that involve using meta-models – models that are trained to understand aspects of another model – to interpret models are under-researched. The simplest kind of meta-model is linear probes, but I think methods that train much more complex meta-models (e.g. fine-tuned LLMs) to interpret aspects of models are much more exciting and under-explored.
(Sparse auto-encoders (SAEs) are also a kind of meta-model, but here I'm focusing on meta-models that directly interpret models instead of decomposing activations into more-interpretable ones.)
The best example of large-scale meta-models is Activation Oracles (or AOs; descended from LatentQA), which fine-tune a model to interpret model activations by treating the activations like tokens that are fed into the oracle model. I think this is a pretty good architecture for interpreting model thoughts, and I think it can be extended in a few ways to do interpretability better.
Diagram of how activation oracles work from the paper for context:
An advantage of AOs over traditional methods I like is that it's really easy to use them to quickly interpret some aspect about a model. You can just choose some tokens and ask a question about what the model is thinking about. Most mechanistic interpretability techniques involve at least a bit of human effort to apply them (unless you've already set them up for the specific kind of question you care about); meta-models let you just ask whatever you want.
We can get good performance on LLMs by just training on more data. It's possible we might be able to get good interpretability through finding ways to scale up model-based interpretation of model activations/parameters too (although this isn't an exact analogy to the scaling hypothesis; I don't think just training for more epochs is all we need). We might be able to scale up activation oracles (and meta-models generally) with things like:
I think the underlying idea of AOs – training an LLM to directly interpret aspects of models – is pretty cool and can probably be generalized beyond just interpreting model activations; we can probably make models to interpret other aspects of models, such as model parameters, attention patterns, LoRAs, and weight diffs.
It would be nice to be able to make an oracle that's trained on interpreting model weights and can answer questions about them (e.g. given some model weights, answering queries like "Draw a diagram of how the model represents addition" or "What political biases does this model have?"), but this is really hard: model weights are too big to fit in LLM context windows[1], it's not clear how you could train the oracle model (what supervised training data would you use?), and it would be really expensive to train a bunch of LLMs to train the oracle. Training meta-models to interpret things like individual layers or attention heads in a model seems much more tractable, and could probably give some useful insights into how models work.
One hard part about meta-models is figuring out how to train them such that they can answer interesting questions about the model. The activation oracle paper describes training the activation oracle on various supervised tasks about the activations (e.g. "Is this a positive sentiment?", "Can you predict the next 2 tokens?", system prompt QA) and having the oracle model generalize to out-of-distribution tasks like "What is the model's goal?").
Anthropic has created a new version of activation oracles (called activation verbalizers) trained using a secret new unsupervised method. They have a few examples of explanations from their activation verbalizer in the Mythos model card and it seems like it's pretty good at generating coherent explanations.
One problem is faithfulness – given that activation oracles aren't trained on directly understanding the model's goals, it's possible the activation oracle learns a purely superficial understanding of the activations that doesn't capture important information about what the model is thinking.
Evaluating how well activation oracles generalize to out-of-distribution tasks like interpreting what the model is doing (as opposed to coming up with a plausible superficial explanation) is hard, because we don't know what the correct answer is. It would be interesting to evaluate activation oracles on tasks where we can use traditional mechanistic interpretability schemes as ground truth.
I saw some interesting research with a toy example of training meta-models to directly interpret model weights as source code, but it only works because the meta-models were trained with supervised learning on examples of transformers that were compiled from source code. It would be interesting to try to generalize this beyond interpreting transformers compiled from code describing the model.
Idea for training AOs differently I thought of: take a reasoning model, create a bunch of synthetic CoTs like "<thinking>I'm thinking about deceiving the user</thinking>", train the AO to map the activations of the thinking block to the goal ("deceiving the user").
It would be interesting to interpret activation oracles themselves, to understand how they interpret the model and see what their understanding of it is. Probably a bad idea but using meta-activation-oracles to interpret activation oracles would be interesting.
I've been experimenting with new applications for meta-models (e.g. for latent reasoning models) but unfortunately training them requires a lot of compute, so I probably won't be able to afford to do much research into this myself once my free TPU credits run out. I hope this inspires you to think about meta-models for interpretability!
There are various tricks you can do here to squeeze many weights into a single token, but I don't think they would work well enough to squeeze an entire (large) language model in there.
2026-04-12 12:30:26
with AI assistance [1]
Crossposted from drmeta.substack.com.
Scope note for LW readers: this essay is about the ethics of AI-assisted creative work — what the human producing an AI-assisted artifact owes the audience, and when. It is deliberately not about x-risk, alignment, training data provenance, environmental costs, or broader labor displacement. Nor does it depend on claims about what AI "really is": the framework rests on a positional argument about accountability (the human is the one who can be held to answer for the work), not an ontological one about minds. That reframe is load-bearing and is developed across Sections II, V, and VI.
Note added hours after publication. Shortly after this essay went up, I read Audrey Henson's two-part reporting on Shy Girl at The Drey Dossier: "91 Percent Human" (March 22) and "The Shy Girl AI Scandal Is Way Worse Than You Think" (April 2). Henson's reporting complicates the picture: the Pangram scan that produced the 78 percent figure appears to have been run on a pirated PDF, the tip-off chain that carried the story to the New York Times originated with a sales employee at the detection company, and the detection tools themselves carry documented racial and linguistic bias — relevant here because Ballard is Black. None of this settles whether AI was used. It does mean the specific allegation against Ballard rests on shakier ground than the Times piece suggested, and readers should know that. The framework this essay builds is structural rather than forensic: what matters for frame fraud is what was communicated to the audience about the work's origin, and that question survives whatever turns out to be true about this case.
In March 2026, Hachette Book Group pulled a horror novel called Shy Girl from UK shelves and canceled its planned US release. [2] The book had been self-published a year earlier, found an audience among horror fans, and been acquired by Hachette's Orbit imprint, the standard trajectory for a breakout genre novel. The problem: analysis indicated the book was roughly 78 percent AI-generated. Readers had spotted it first. The telltale signs — nonsensical metaphors, melodramatic adjectives, repetitive phrasing — showed up in Goodreads reviews and YouTube video essays before any institutional actor moved. [3] The author denied using AI, blaming a freelance editor. Hachette's spokeswoman said the company "values human creativity" and requires authors to attest their work is original.
Shy Girl is probably the first commercial novel from a major publisher to be pulled over evidence of AI use. It will not be the last. The stunning fact is not that someone tried. It's that the book survived acquisition, editing, and production at one of the world's largest publishers before readers caught it. The institutions that exist to curate fiction had no framework for asking the right questions, and the contractual language they relied on — boilerplate "originality" clauses — was not designed for the problem they now face. Neither, for that matter, did the author have a framework for what honest AI-assisted creative work looks like.
That normative gap is what this essay addresses: not whether AI can write, or whether it will replace human writers, but what ethical obligations attach to AI-assisted creative work: when must it be disclosed, to whom, and when disclosure alone is not sufficient. [4] The framework it builds is explicitly transitional: triage medicine for a crisis about what work is human and what isn't, where we don't yet know whether current AI systems are human-like minds in their own right, and where the technology keeps changing, which means the answers keep changing as well. It's triage medicine for a patient that keeps becoming a different patient. But the ethical questions are urgent, so here we go.
∗ ∗ ∗
To build the framework, let's begin with a thought experiment. Imagine a robot that paints. Call him RoboPicasso. You give him instructions, hand him brushes and paints, and he produces a painting. You can tell him to regenerate it as many times as you like, for free. You can tell him to change the sky, fix the hands, shift the composition left. And critically: you can, at any point, take the brush yourself.
RoboPicasso also has...opinions. Tell him the sky should be pale and he might paint it dark anyway. Sometimes he's right. Sometimes he's confidently, compellingly wrong.
In practice, working with RoboPicasso involves three distinct modes.
Tracking is logistical. Compositional drafting is propositional — a suggestion you evaluate. Brushwork is the final artifact. The ethical questions depend on which of these three things he's doing, and how the human engages with each.
If the painting is a disaster, nothing happens to RoboPicasso. He has no reputation to lose, no career that suffers, no memory of the failure. You are the one who answers for it. RoboPicasso is an agent — he has capabilities, produces outputs, exercises something that functions like judgment — without being a moral agent or a moral patient. We've always had collaborators who can be held accountable for what they produce. Here is what's genuinely new: we have not had a collaborator that operates at this scale of fluency and rapidity, at near-zero cost, while remaining unaccountable both in theory and in practise. [5]
∗ ∗ ∗
In 2023, Daniel Dennett argued in The Atlantic that AI systems designed to pass as human represent an existential threat to trust. [6] His argument: when something communicates sensibly, we can't help attributing beliefs and intentions to it. AI that exploits this tendency produces counterfeit people, corrupting the currency of human trust.
Dennett was right about the principle but he framed the problem as a binary — AI output disclosed as such, or AI output passing ambiguously for human — and missed the third case that matters most for creative work: disclosed human-AI collaboration. A painting signed "Jane Smith, with robotic assistance" doesn't exploit anyone. The crime is specifically the concealment, not the assistance.
The concealment takes two forms, and the distinction matters. A ghostwritten memoir presented as autobiography makes the named author appear more eloquent than they are. That's a misrepresentation of degree: signal fraud. The reader assumes, correctly, that the named author was in the room — reviewing drafts, correcting the record. The named author didn't write the sentences, but their judgment shaped the substance.
Now consider Shy Girl. One could imagine an AI-assisted novelist in the same situation as the subject of an autobiography: shaping the narrative, pushing back where the AI gets it wrong, but letting "someone" else do the writing. If that novelist disclosed the collaboration, the reader would be calibrated — signal fraud at worst, same as the ghostwritten memoir. What made Shy Girl an entirely different problem — frame fraud — was not that AI touched fiction. It was that the audience didn't know. Olivie Blake praised the book as "audacious, inventive, and uniquely horrifying," then told the Times it was "truly disheartening to hear that A.I. may have been involved." [7] The betrayal was not that the book was worse than she thought. It was that she had been evaluating the wrong kind of thing entirely. Whether a given collaboration is signal fraud or frame fraud depends on how disclosure is handled — and that depends on the attributes of the work.
Signal fraud is a calibration problem. Frame fraud is categorical. The audience is not adjusting a dial. They are answering the wrong question. The framework this essay builds is primarily a guard against frame fraud: it identifies when the audience needs to know what kind of thing they are evaluating. [8]
∗ ∗ ∗
Three questions determine the structural context of any creative work, and from the answers, what ethical obligations attach. The first two determine what transparency is owed. The third determines what's at stake — and therefore what transparency alone cannot resolve.
The first two questions (originator, purpose) do the same thing: they externalize the work's value. When someone else initiates or pays for the work, a stakeholder exists beyond the creator. The ethical question is how many of these first two switches are flipped: neither, one, or both.
Map Shy Girl through the framework. The novel was self-published and sold: originator = self, purpose = transactional. One switch flipped from day one — disclosure was owed when presenting the work. When Hachette acquired it, a second switch flipped: an external party was now committing resources based on assumptions about the work's origin. Disclosure was owed before that commitment. It never came. AI use was material to Hachette's investment decision, and concealing it was not signal fraud but frame fraud — the publisher was acquiring a different kind of object than it believed.
Grounding — the third question — does something different. Originator and Purpose identify whether an external stakeholder exists and when they need to be told. Grounding determines what's at stake for that stakeholder — and therefore what the human receiving AI assistance must bring to the collaboration beyond mere honesty about it. The next two sections develop the framework necessary.
∗ ∗ ∗
Transparency tells the audience what kind of object they're holding. It cannot tell them whether the human supplied the thing that makes it worth keeping.
Francis Bacon captured the distinction: "Testimony is like the shot of a longbow, which owes its efficacy to the force and strength of the shooter; but argument is like the shot of the crossbow, which is equally forcible whether discharged by a giant or a dwarf." [11] Even in nonfiction, prose quality matters: clarity, the choice of example, the rhythm that sustains a reader through a complex chain of reasoning. But these are in service of the argument. If a policy analyst does the research, develops the argument, identifies the implications, uses AI to generate the prose, and verifies it, then the crossbow still hits. The founding vision — the thing that makes the work worth existing — remains the human's.
In literary fiction, the prose is not a crossbow. It is a longbow: its efficacy depends entirely on the force and strength of the shooter. Hemingway's ideas in For Whom the Bell Tolls are not exotic — war, love, duty, the compression of a life into three days. What makes the novel matter is those stripped-down sentences carrying what can't be said directly about killing and dying and the bridge that has to be blown. The simplicity is the craft. The value might live in the plot, the structure, or the rhythm of individual sentences, and you cannot know which until the work is done. Nothing can be declared in advance to be scut work. [12]
Shy Girl is the case in point. Suppose you bought the novel, read it, loved it. The prose unsettled you. You recommended it to friends. Then you learn it was AI-generated. The novel on your shelf hasn't changed. Your experience of reading it was real. But something has shifted, and you cannot unshift it. Olivie Blake's reaction from Section III is the mechanism in miniature: the thing she was responding to — a human creative intelligence behind the prose — was not there. Not every reader will feel this. Some will shrug: the book was good, who cares. The ethical obligation does not require predicting any individual reaction. It requires reasonably anticipating that the loss is possible — that for some audience members, the human origin is part of what they were valuing. That reasonable anticipation is what makes concealment a kind of ethically-blameworthy fraud rather than mere omission. The test for when transparency is sufficient is separability: whether the received value of the work exists independently of its creation. A policy argument survives different prose. A horror novel's prose is the novel.
The framework needs to say what the human must supply. The answer: the founding vision — the originating intelligence that determines what this work is, what belongs in it, and why it exists at all. Artistry is fundamentally compositional: a set of decisions about what to include and what to exclude. No fixed hierarchy determines which dimension is essential. Mozart and Hemingway are revered for their simplicity: the judgment of what to leave out. Pollock eliminated conventional brushwork, representation, classical composition, and what remains is pure compositional decision-making. How much paint, where it falls, when to add, when to stop. [13] What matters is who is making the decisions. To supply the founding vision is to stake yourself on the work. Martin Luther King's dream was not his speechwriter's. That is what accountability means in practice.
Ideally, AI fills the gap between a creator's strengths and a work's demands. Every creator has a profile of each. The ethical question is not whether AI was used but whether the human is supplying the thing that makes this particular work worth existing: shoring up a weak dimension so a strong one can reach its potential. AI proposes, the human judges. The judgment is the generative act, not the proposal. But what makes judgment real? A person who prompts AI and clicks "accept" is technically exercising judgment. Here is the standard I propose: the human brings a model — built through interaction with the subject matter — that the AI's outputs are evaluated against. The architect who cannot draw has a model of how light moves through space. The policy analyst has a model of causal relationships. The novelist has a model of the work's own internal necessities. The judgment is valid because it's the application of a model the AI cannot (yet!) provide for itself.
We do not have a settled account of what large language models are doing internally — whether training produces something that functions like a creator's model, or something else entirely. [14] The framework does not depend on resolving this. It depends on Harry Truman, or more specifically, with whom the buck stops. If the novelist's AI collaborator proposes a scene that breaks the novel, nothing happens to the AI. The novelist's name is on the cover. Her reputation absorbs the failure. And one observable fact sharpens the point: LLMs hallucinate. They produce confident, well-formed assertions that are sometimes false. In fiction the failure mode is different but the principle is the same. Again, Shy Girl's allegations turned on prose tells that readers identified, and a writer with a functioning model of the novel would have caught those tells before any reader did. [15] The human who cannot catch this is not collaborating. She is signing her name to work she cannot vouch for.
∗ ∗ ∗
What constitutes adequate judgment, then, depends on what the work answers to.
At the factual end, the work answers to reality. A bridge design answers to gravity. As Feynman quipped, investigating the Challenger tragedy: "for a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." [16] Anyone using AI for factual work without independent verification is handing their accountability to a blackout-drunk intern: someone who produces fluent, plausible work product and will cheerfully invent sources, statistics, and fictitious explanations. The intern's prose may be excellent. That is precisely what makes them dangerous.
AI systems already match or outperform most humans on standardized measures of legal reasoning, medical diagnosis, and code generation. [17] If the policy analyst's AI collaborator produces a better analysis than she would have produced alone, the framework doesn't require her to add yet more value, though. It demands she be able to evaluate the AI's output. When that distinction collapses — when the AI demonstrably outperforms the human and responds convincingly to every challenge — the framework must hand the AI the buck. We are not there, yet. AIs hallucinate with the same fluency they bring to genuine insight, and the failure is indistinguishable from the success without an independent model. [18]
At the imaginative end, art does not answer to an external standard the way a bridge answers to gravity. The question is whether the relationship between maker and audience is part of what the audience values. In Orson Scott Card's Ender's Game, [19] Ender plays the Mind Game for months thinking it is just software. The moment he realizes a mind is behind it, the entire nature of the interaction changes. If the audience doesn't care whether a human is behind the work, then AI capability takes over the territory entirely and no framework argument survives. But if they do care, then human effort is a constitutive part of the received value. The marketing term Artisanal carries the negative connotation of bougie excess, but it captures something real. "I wrote you a poem because I love you" is not the same utterance as "I told RoboShakespeare to compose you a sonnet because that's what you deserve," even if the sonnet scans better. And it may backfire completely if it's a hasty retcon after the recipient challenges its authenticity.
Creative work means bringing a composition into existence: a specific vision, a thing that wasn't there before someone made it so. What makes it creative is that the maker's vision is alive: responsive to the material, changed by contact with it, discovering what it wants as it develops. We have a phrase for what happens when that stops. We say someone is going through the motions. A frozen rubric applied to AI output is the same phenomenon with a more sophisticated instrument. For commodity work where the audience is buying output to spec, a frozen selector may be fine. But for work where the audience is evaluating vision, the framework needs something dynamic: not where the human is, but which direction they're moving.
∗ ∗ ∗
The test is trajectory. Not where you are, but which direction you're heading.
Consider the ancient process of apprenticeship. An apprentice at the forge begins by pumping bellows, sweeping the workshop, fetching materials. Gradually, they are trusted with more: monitoring heat, removing a blank at the right moment, putting the edge on a finished blade. The master carries the work the apprentice cannot yet do, and that share diminishes over time. What the apprentice is acquiring is both technique and a model, and the two develop together. The hands learn to swing the hammer; the mind builds a predictive model of how metal behaves. What makes a journeyman is that both have been refined through enough interaction with the material to be trustworthy.
AI can function as a master-less apprenticeship: carrying the pieces the human cannot yet handle, bearing more weight at first and less over time. The trajectory that counts is whether both understanding and execution are developing through the collaboration. If both are static — same quality of judgment year after year, same inability to anticipate — the human is dependent, not apprenticing. Static dependence fails the ethical test regardless of transparency, because the human's accountability extends no further than their model does.
Two vulnerabilities undermine this test. First, AI has no judgment about progression. It will let you pump bellows forever or hand you the hammer on day one with equal indifference. Second, and worse: current AI systems are sycophantic. They are optimized to be agreeable, and agreement feels like validation. Push back on a suggestion and Claude or ChatGPT will pleasantly fold, often with an apology that mimics insight. This is not the same failure as hallucination. Hallucination produces wrong outputs. Sycophancy corrupts the evaluation process itself: the very loop the founding vision depends on. A master who always praises the blade teaches the apprentice nothing about the blade and everything about how good it feels to be praised. Humans are also imperfect at honest feedback — but they at least have independent stakes. Your editor's reputation suffers if your book fails. AI has no such exposure.
The apprenticeship model works reliably only for self-directed learners. For everyone else, something external is needed: the audience, and the feedback loop that transparency enables. Without disclosure, the creator receives feedback calibrated to false assumptions. An editor who doesn't know the manuscript was AI-assisted evaluates the prose as evidence of the author's writing ability. The author receives commentary on a capability they don't possess. Undisclosed AI use doesn't just deceive the audience. It cheats the creator out of accurate feedback on their own development. The two thesis components — what transparency is owed and whether the human is supplying the founding vision — are not independent. They are a single system: transparency enables feedback, feedback drives the apprenticeship.
And that audience nose for AI slop — the one that caught Shy Girl before Hachette did — is a temporary advantage. It works because current AI output is still distinguishable from human work, and that gap is closing. Transparency is not what enables detection today. It is what replaces it.
∗ ∗ ∗
There is a practical objection the framework itself cannot resolve. Peggy Noonan wrote "a thousand points of light" for George H.W. Bush [20] and built an independent career from her talent for political language. Replace Noonan with AI. The politician loses nothing. What's lost is the possibility of a Noonan: a career, a body of independent thought, contributions beyond the original commission. A Stanford study found that workers aged 22–25 in the most AI-exposed occupations experienced a 16% relative decline in employment, controlling for firm-level shocks. HR analysts are beginning to warn about a parallel erosion of foundational judgment among early-career workers. [21] The damage is not to any particular work's integrity. It is to the ecosystem that produces the people capable of doing the work.
Spielberg created the Omaha Beach sequence for Saving Private Ryan in 1998 surrounded by expert craftsmen whose work won Oscars: Kaminski stripping lens coatings for the desaturated look, Rydstrom designing the underwater-to-surface sound transitions. By Tintin in 2011, film technology let Spielberg control the entire production without the army of specialists he'd relied on for Ryan, and it made him feel "more like a painter" than anything in his career. [22] The people he celebrated as collaborators on Ryan were the artistic dependencies he was liberated from on Tintin. Each step was celebrated, correctly, as empowering the director. The displacement was unintentional and structural, not malicious.
Here's where things stand today. In 2026, Netflix acquired InterPositive, an AI startup that handles post-production tasks previously requiring specialized craft workers. [23] From the director's perspective, the tool extends what they can do for themselves, so the founding vision is not only intact but enhanced by removing the possibility of miscommunication. From a colorist's perspective, the verdict inverts: their craft is their contribution, and the tool replaces the dimension that constitutes their professional identity. The standard rebuttal is the buggy-whip argument: technology displaces old crafts, new ones emerge, nostalgia is sentimentality. The separability test from Section V applies here too. When the audience is indifferent both to how a function was performed and to who performed it — wire removed, don't care by whom — the displacement is instrumental and the buggy-whip argument holds. When the audience values either the specific quality the human craft produced (Kaminski's desaturated look) or the fact that a human was behind the work at all (Blake's reaction from Section III), the displacement is constitutive. Kaminski's stripped lenses and Rydstrom's sound transitions are not friction between the director's vision and the screen. They are qualities the audience responds to and film history celebrates — and they exist because specific humans exercised judgment developed through practice.
Plato warned in the Phaedrus that a tool can produce the appearance of competence by obviating the process that builds it. [24] The apprenticeship model is the framework's answer at the individual level. But the systemic concern is different in kind: even if every individual user is apprenticing responsibly, the displacement eliminates professional pathways. You cannot apprentice as a colorist if colorists are no longer needed. The gain is also real: someone with directorial vision might someday make a brilliant movie on their laptop. The pathways through which creative talent has developed are genuinely threatened. The AI-mediated alternatives are promising but unproven. The AI ship has sailed, with all of us aboard. The ethical norms this essay proposes are the best bet I can see for also ensuring that the ecosystem which produces creative talent survives the voyage ahead.
∗ ∗ ∗
The ethics of AI-assisted creative work rest on three questions. Originator and Purpose determine what transparency is owed, and when. Grounding determines what transparency alone cannot resolve: what's at stake for the stakeholder(s), and therefore what the human must bring to the AI-assisted collaboration beyond mere honesty about it. That "what" is the founding vision — the originating intelligence that makes the work worth keeping. It is the test for whether transparency is sufficient.
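Because the framework is, at bottom, a decision procedure, it can be caricatured in code. A caricature is all this is: the enum names and the one-line disclosure rule are my simplifications, and Grounding sets the stakes of the verdict rather than flipping it, which a boolean cannot capture.

```python
from enum import Enum, auto

class Originator(Enum):
    SELF = auto()      # the creator initiated the work
    EXTERNAL = auto()  # someone else commissioned or requested it

class Purpose(Enum):
    FREE = auto()           # shared freely, nothing asked in return
    TRANSACTIONAL = auto()  # money or reputation changes hands

class Grounding(Enum):
    FACTUAL = auto()      # answers to the world, as a bridge answers to gravity
    IMAGINATIVE = auto()  # answers to the audience

def disclosure_owed(originator: Originator, purpose: Purpose) -> bool:
    # Originator and Purpose determine what transparency is owed, and when.
    # Toy rule: a transaction or an external commission flips the switch.
    return purpose is Purpose.TRANSACTIONAL or originator is Originator.EXTERNAL

def ethical(originator: Originator, purpose: Purpose,
            disclosed: bool, founding_vision_intact: bool) -> bool:
    # Transparency is necessary when owed; the founding vision is the test
    # for whether transparency is sufficient.
    transparency_ok = disclosed or not disclosure_owed(originator, purpose)
    return transparency_ok and founding_vision_intact

# Previewing the walkthrough below: self-originated, transactional from
# self-publication on, never disclosed. Fails on transparency alone.
assert not ethical(Originator.SELF, Purpose.TRANSACTIONAL,
                   disclosed=False, founding_vision_intact=True)
```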
Thread Shy Girl through the complete framework one last time. The three questions: self-originated, transactional from self-publication, imaginatively grounded. One switch flipped from day one; a second flipped once negotiations with Hachette began, given the publisher's assumptions about the work's origin. Disclosure was owed from the start and never came. That is the transparency failure. The founding vision is harder to judge from the outside — perhaps the author had a genuine creative vision for the novel. But the tells readers flagged — awkward repetition, incoherent metaphors — were the kinds of things a human author reviewing AI output should catch (see Footnote 15). A self-published novel selling for $1 online is a different offering from a hardcover published by a marquee publisher. Let's be charitable. Perhaps the author was starting on their apprenticeship and was seduced by the prospect of immediate, public success. The charitable reading doesn't change the ethical failure, or the embarrassment that ensued.
Here's how to do better. The framework's three components form a single feedback system: transparency enables feedback, feedback drives the apprenticeship, the apprenticeship is the test for whether the activity is creative work at all. And when the work is factually grounded, the stakes compound: the human's independent model is not just what makes the work creative, it is what stands between the output and material consequences that land on others.
Plato was right that new technologies diminish old capabilities, but writing opened a path oratory could not reach. Photography threatened painting but opened a different creative path with its own benefits and demands. Many a well-trod creative path has lost its monopoly to a new path laid by the march of technology, but often without disappearing entirely. Dennett was right, for now. AI passing as human corrupts trust, including in creative work. AI-assisted creative work is, potentially, something new: another path with its own essential craft, if we get the ethics right.
Which brings me to a proposal. On March 15, 2026, the Academy handed out Oscars under a rule that voters should consider "the degree to which a human was at the heart of the creative authorship." [25] That clause reaches for the right question but provides no structure for answering it. I have an idea where to start: the Hugo and the Nebula Awards for Science Fiction should create an AI-assisted category. The science fiction community has been imagining artificial intelligence since Frankenstein, and they are now living in that previously speculative future. The reader has always been able to safely assume that a human mind wrote the novel, the song or the screenplay. Until right...about...now. An AI-assisted novel might be extraordinary — founding vision intact, the human developing through the practice. It is still a different kind of achievement. A separate awards category is not a quarantine. It is the transparency idea made institutional: the evaluative frame built into the structure of public recognition itself, so the audience knows what kind of achievement they are judging as they judge it. The genre that has done more thinking about AI than any other — Science Fiction — is the natural place to test it first.
RoboPicasso doesn't care which of his three modes you use, how much of the canvas you repaint by hand, whether your compositional judgment is improving, or if and when you announce that you are using him. Those questions are yours. The ethics are in how you answer them.
ETA [2026-04-11]: Added an editor's note at the top and softened two sentences in Sections V and IX after reading Audrey Henson's reporting in The Drey Dossier. See note.
Primarily Claude Opus 4.6 (Anthropic), with Gemini 3 Flash (Google) for web research and some final reference checks by ChatGPT (OpenAI). Image generated by ChatGPT from user prompt: "An ordinary looking person watching a robot paint a scene on a canvas. The person is gesturing at the painting as if giving instructions." Yes, this is an essay about the ethics of AI-assisted creative work that is itself AI-assisted creative work, done ethically. Or so it argues. YMMV. ↩︎
Alexandra Alter, "A.I. Is Writing Fiction. Publishers Are Unprepared," The New York Times, March 19, 2026 (paywall). Unless otherwise noted, facts about the Shy Girl case are drawn from this article. ↩︎
In a bitter irony, em dashes — a telltale sign of AI authorship — have been a stylistic choice of your human author since his earliest academic writing, including his 1991 Master's Thesis. ↩︎
The essay's scope is deliberately narrow. It does not address existential risk from advanced AI (see, for example, Harlan Ellison's cheerful 1967 short story "I Have No Mouth, and I Must Scream"), the environmental costs of running these systems, who owns the data AI was trained on, implications for a police state and other such very serious concerns, or broader socioeconomic displacement. Some are arguably more urgent. But they are different problems, and trying to address all of them in one place is a reliable way to address none of them well. ↩︎
Animals are an instructive near-miss. A service dog contributes without being answerable, but the handler subsumes accountability for the unit and owes the dog care in return. And nobody is confused about what they're encountering when a handler and a guide dog walk into the room. ↩︎
Daniel C. Dennett, "The Problem with Counterfeit People," The Atlantic, May 16, 2023 (paywall). ↩︎
Olivie Blake, quoted in Alter, supra note 2. ↩︎
For a recent real-world instance beyond Shy Girl, Grammarly's paid "expert review" generates customized AI writing advice for the user, attributed to named writers — John Carreyrou, Kara Swisher, Stephen King, etc. — none of whom were asked or compensated. A disclaimer is buried in a support page. The user receives AI output while believing a very specific human expert authored it. See Casey Newton, "Grammarly turned me into an AI editor against my will and I hate it," Platformer, March 9, 2026. ↩︎
Think of responding to a literary magazine's call for submissions. The editors encounter your finished work before committing to publish it. A friend who mentions at dinner that he needs art for his new apartment is issuing a casual call for submissions. Having an AI generate 1,000 submissions from a single prompt, even an elaborately detailed one, is of course not ethical, since the human creator in that case is imposing an enormous burden on the person who made the request: filtering. ↩︎
The shift is prospective. A creator who posted work freely and later sold it commercially owes disclosure from the point of sale forward. They have no obligation to track down copies from the free era. If the work sat in one identifiable place, updating it would be courteous. If it was scattered across the digital ocean, the obligation is to the commercial version. ↩︎
Francis Bacon, The Advancement of Learning (1605), Book II, Chapter V, §2. ↩︎
An obvious challenge: if the prose is what matters, why does it matter who produced it? Call this the RoboMe problem: an AI so perfectly modeled on a specific creator that it always produces the text that creator would have produced. But this is an iterated version of Newcomb's Problem, which is nonsensical (How does RoboMe update based upon my reaction to how the work lands?) until we have Hanson's EMs and I can update RoboMe with the latest "me" before I task him. ↩︎
Leonard Bernstein made the related observation about Gershwin: extraordinary melody, inability to assemble the melodies into a coherent larger work. "You can cut out parts of it without affecting the whole in any way except to make it shorter." See Bernstein, "A Nice Gershwin Tune," The Atlantic, April 1955 (paywall); reprinted in The Joy of Music (Simon & Schuster, 1959), pp. 52–62. ↩︎
Gideon Lewis-Kraus, "What Is Claude? Anthropic Doesn't Know, Either," The New Yorker, February 16 & 23, 2026 (paywall). ↩︎
As a case in point, your author flagged bad AI (bad!) in this essay: repetitive lists of three ("nonsensical metaphors...") when it cropped up a third time while editing Section IX. ↩︎
Richard P. Feynman, "Personal Observations on the Reliability of the Shuttle," Appendix F to the Report of the Presidential Commission on the Space Shuttle Challenger Accident (Rogers Commission Report), June 6, 1986. ↩︎
Legal reasoning: Katz et al., "GPT-4 Passes the Bar Exam," Philosophical Transactions of the Royal Society A, February 2024; Stubenberg et al., "How AI Stacks Up Against the Multistate Bar Exam," University of Hawai'i, May 2025. Medical diagnosis: Goh et al., JAMA Network Open, November 2024, found that LLMs alone outperformed unaided physicians on challenging diagnostic vignettes; physicians using the LLM did not significantly improve over conventional resources. Code generation: Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?," ICLR 2024; frontier models now resolve over 70% of SWE-bench Verified tasks (Epoch AI, February 2026). ↩︎
The autonomous-vehicle industry illustrates the two available responses. Tesla requires hands on the wheel: NTSB Report NTSB/HAR-17/02 (Williston, Florida, 2017); NTSB Investigation HWY18FH011 (Mountain View, California, 2020); NHTSA, "Additional Information Regarding EA22002," 2024. Waymo removes the human entirely by cranking the false-positive rate to maximum: Rubenfeld et al., "Tesla, Waymo, and the Great Sensor Debate," Contrary Research, July 2025. For a stark example by someone who knew better: Raffi Krikorian, formerly head of Uber's self-driving division, describes crashing his Tesla after three years of near-flawless Full Self-Driving. He had his hands on the wheel. He was not asleep. But the system had spent those three years training him to monitor rather than steer. See Krikorian, "My Tesla Was Driving Itself Perfectly — Until It Crashed," The Atlantic, April 2026. ↩︎
Orson Scott Card, Ender's Game (Tor Books, 1985). ↩︎
Peggy Noonan, What I Saw at the Revolution: A Political Life in the Reagan Era (Random House, 1990). ↩︎
Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen, "Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence," Stanford Digital Economy Lab, November 13, 2025. The paper reports that early-career workers (ages 22–25) in AI-exposed occupations experienced a 16% relative decline in employment. Gartner projects that by 2030, 30% of organizations will see worse decision-making because early-career employees never built the foundational judgment AI overreliance bypasses. Kaelyn Lowmaster of Gartner's HR practice stated: "Without the chance to learn tasks on the job, gen AI inhibits the development of the very skills and judgment that early-career talent need to avoid making costly mistakes with AI." See Gartner, "CHROs Must Accelerate Learning and Development as Gartner Predicts by 2030, 30% of Organizations Will See Worse Decision-Making Due to Overreliance on AI," January 27, 2026; see also Jill Barth, "Is AI Eating Your Talent Pipeline?," HR Executive, January 30, 2026. ↩︎
Steven Spielberg and Peter Jackson, interview, The Hollywood Reporter, 2011; see also DGA Quarterly, Winter 2012. ↩︎
Todd Spangler, "Netflix Acquires Ben Affleck's AI Filmmaker Tools Start-Up InterPositive," Variety, March 5, 2026. Netflix's announcement: "We believe new tools should expand creative freedom, not constrain it or replace the work of writers, directors, actors, and crews." ↩︎
Plato, Phaedrus, 274c–275b. The god Theuth presents writing to King Thamus as a gift that will improve memory and wisdom. Thamus rejects it: people will appear wise without being wise. ↩︎
Academy of Motion Picture Arts and Sciences, 98th Academy Awards Complete Rules, Rule Two, Section 7 (April 2025). ↩︎
2026-04-12 11:12:07

And so are you!
When you were a fetus, you were sending millions of your cells through the placenta into your mom. And she was sending her cells into you, although to a lesser degree. These cells made themselves right at home, differentiating into heart, blood, and even brain cells. This phenomenon is called feto-maternal microchimerism, and is one of the wildest things in placental mammal pregnancy.
Microchimerism is generally defined as the presence of a small population of genetically distinct cells in an organism. When the fetus sends cells to the mother it's called Fetal Microchimerism (FMc) and when the mother sends cells to the fetus it's called Maternal Microchimerism (MMc). The actual cells are fetal microchimeric cells (FMC) and maternal microchimeric cells (MMC).

yes somehow this is the official pubmed diagram
You may have heard that the placenta is an organ that provides oxygen and nutrients to the fetus during development. Which is true. What's less commonly known is that the placenta also does bidirectional cell and genetic material trafficking, similar to drugs and humans across the US-Mexico border. Most of these cells are killed by the mother's immune system somewhere between immediately after birth and about two weeks later, but some have been found in mothers' brains three decades on. How does a fetal cell cross the blood-brain barrier and become a brain cell? No one knows! It's also an open question whether these "brain cells" can functionally integrate with the mom's brain circuits and process neuronal activations: they merely "adopt locations, morphologies, and expression of immunocytochemical markers" of host neurons, and further research is needed to determine whether they have physiological significance. Weird!
There are many theories, but no one knows for sure why it happens, what these fetal sleeper cells are up to, or whether the net effect on the mom is positive or negative.
There is evidence FMCs can help with maternal wound healing. Unlike adults, human fetuses before the second trimester can regenerate wounds without scarring. Fetal cells have been observed homing to and gathering in damaged tissues of almost all maternal organs. For example, a study (n=1230) found that patients with peripartum cardiomyopathy (heart dysfunction occurring in late pregnancy or shortly after birth) had a roughly 70% lower risk of death than general cardiomyopathy patients (you can literally fix your mom's broken heart!). FMCs have also been found in skin wounds such as C-section scars. Researchers figured this out by using fluorescence in situ hybridization (FISH) to stain X chromosomes red and Y chromosomes green. In the picture below, the blue circles are nuclei and the arrows point to the ones with both X and Y chromosomes (male fetal DNA in a woman!).

On the downside, FMCs have been linked to all sorts of fun things like preeclampsia, spontaneous preterm labor, rheumatoid arthritis, and other autoimmune disorders. FMCs also increase the transfer of resources to the fetus (a con for the mom but a pro for the fetus). They route more nutrients through the placenta during the pregnancy, and cause physiological changes in the mother after birth (e.g. in lactation, thermoregulation, and attachment systems). As a fetus, you are kind of at war with your mom. You would totally want her to spend limitless resources on only you, but she might prefer to hold some back for her own survival and potential future offspring.
Call your mom
When we were kids my mom would often tell my brother and me that we were her 心肝 (Xīngān, heart and liver). This might sound weird, but it's a common Chinese term of endearment, like "honey" or "darling" in English. It's interesting to learn now that this is also sort of literally true.
Mothers and their children often have a special bond, and it might be cool to know a part of you is always in your mom, and a part of her is always in you (unless you didn't have a good relationship, in which case don't think about it).
There are many open questions in this exciting research field. In the future, perhaps we can study these FMCs and learn how they do their thing, leading to better organ transplants, stem cell therapies, hybrid animals, or even brain replacement for eventual digital uploading.
Yo momma so fat she has room for two humans' worth of allogeneic DNA in her neurons, bone marrow, and organs