2026-04-08 09:08:05
(This post is mostly about why cybersecurity is easier to automate and not why AI R&D is harder.)
Recently Anthropic said they had trained a model, Claude Mythos Preview, that "can surpass all but the most skilled humans at finding and exploiting software vulnerabilities" but "does not seem close to being able to substitute for Research Scientists and Research Engineers, especially relatively senior ones". It's pretty interesting that we're at a point with AI capabilities where we can (apparently) surpass almost all cybersecurity researchers, but AI researchers still have skills that are hard to automate.[1] What makes cybersecurity research so much easier to automate than AI R&D? Is it just easier? I am still pretty uncertain about why this is the case, but I have some thoughts about why cybersecurity research has been automated first.
I've done a bit of white-box (i.e. with source code access) security research,[2] so I figured it might be useful to explain what that process looks like. (Mythos is also good at black-box testing, which I would guess is broadly similar, but I'm not as familiar with doing that.) My main process for doing white-box security research is a series of nested loops where I try to go from a large codebase with a lot of non-problematic code to a narrowed-down set of interesting code paths which I try really hard to exploit. Essentially: skim the codebase for interesting areas, trace the most promising code paths in detail, try to exploit anything that looks problematic, and repeat one level deeper on whatever survives scrutiny.
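That nested-loop process can be sketched as pseudocode. This is purely illustrative; the helper names (`rank_by_interestingness`, `try_to_exploit`, etc.) are hypothetical stand-ins for fuzzy human judgment, not a real tool's API:

```python
# Pseudocode sketch of the nested white-box audit loop described above.
def audit(code_region, depth=0, max_depth=3):
    findings = []
    # Outer loop: use "interestingness" heuristics to prune away
    # the large amount of non-problematic code.
    for area in rank_by_interestingness(code_region):
        for path in trace_code_paths(area):
            if looks_problematic(path):
                # Innermost step: try really hard to turn a suspicion
                # into a working exploit.
                exploit = try_to_exploit(path)
                if exploit is not None:
                    findings.append(exploit)
                elif depth < max_depth:
                    # Narrow down and repeat, one level deeper.
                    findings += audit(path, depth + 1, max_depth)
    return findings
```

The interesting part is that the pruning in the outer loop trades off directly against time: with more patience (or more compute), you can rank less aggressively and still find the same bugs.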
I used the word "interesting" a lot in that process description, and it's kinda hard to describe exactly what I mean by that. It's kinda a large bag of heuristics for looking at code and being able to identify what seems like it might be problematic, based on what issues I've seen before and my model of how the developers might have messed up.
If I had unlimited time to audit a codebase, I wouldn't need to have these heuristics about interestingness though, because I could just look at everything! I could just carefully trace through every line of code in every function, and verify that everything is correct. In reality though, this would be extremely time-intensive and boring. I think I'd be able to rediscover most security bugs myself if you told me exactly which lines to look at; the hard part is knowing where to look (especially for bugs that involve a complex interaction between different parts of the codebase). (It would be pretty interesting to do an experiment where you ask people with varying levels of cybersecurity experience to identify a vulnerability given the problematic lines of code.)
Another sometimes-difficult part of cybersecurity research is reproducing issues. Sometimes it's easy to just manually test an issue, but often issues only arise when the system is in a weird state, or involve a lot of thinking about how to cause an edge case to be triggered. Increased general coding abilities straightforwardly make it easier to verify potential issues, and also make it easier for models to probe systems being tested to find interesting behavior.
My impression is that Claude Mythos is probably fairly good at "security taste" (identifying what bits of code would be interesting to analyze for security issues) but not quite at skilled-human level. But it can make up for that by just spending much more time looking at the code and doing the kind of boring, painstaking work of tracing through many more code paths. And pursuing a bad lead usually doesn't waste too much time in cybersecurity land; it doesn't take large amounts of compute or money to validate ideas.
So essentially, cybersecurity research is hard because of search difficulty: you have to look at a lot of things and do a lot of pruning to find issues, and models can make up for weaker pruning with more compute. I think AI R&D requires much more "research taste" than cybersecurity; finding new ways to improve LLM capabilities depends far more on having good intuitions about what will and won't work. It's harder to brute-force your way through that because validating ideas for improving LLMs takes much longer: even a small training run takes far more time than validating a fairly complex security bug. The feedback loop for LLM experiments is much longer than for cybersecurity research because of this asymmetry in how easily you can verify ideas.
To be clear, the authors of the model card are probably biased here because they're probably AI researchers themselves, and also because high AI R&D capabilities probably would at least delay the release more.
Some of my research is public, but only about half of the issues I've found.
2026-04-08 08:56:36
This wasn't intended to be a topical post, but Claude Mythos's system card is out, and... well.
I wrote years ago about decision analysis, which often focused on atomic actions in small situations. In the real world, people take large numbers of actions in very large situations, where there is uncertainty not just over which of a few consequences will happen, but over what sort of consequences are even possible.[1] Dealing with the computational constraints becomes a major part of practical wisdom, rather than the basic math of the ideal case. Actions need to be considered as part of a portfolio; outcomes need to be considered based on their impact on a vector of intermediate variables instead of their ultimate impact on a single utility. Heuristics (like "an ounce of prevention is worth a pound of cure") and their evaluation are often more important than tracing out specific outcomes or assigning probabilities to them.
In particular, in financial markets people often talk about "hedging". For example, suppose you're a farmer who grows wheat and has dollar-denominated loans and expenses. You might find that the variation in the price of wheat is larger than your expected profits, and want to sell some of your risk to a commodities trader. (Suppose wheat sells for somewhere between $4 and $8 a bushel, you expect to grow 100 bushels, and you have $550 in total costs. In the median world, you make $50; in the worst-case world, you lose $150; and you lose money in the bottom ~third of worlds.) If you place a bet that the price of wheat will be low, it will be valuable when your wheat is cheap and costly when your wheat is profitable, balancing things out and smoothing away some of the price variation, and so you can decide how much exposure you want to the variation in wheat prices. (Of course, this service comes at a cost; the commodity trader also needs to be making an expected profit or they wouldn't be doing this.)
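The farmer's arithmetic fits in a few lines of Python. Note the `locked_price = 6` here assumes a fee-free forward sale at the median price, which is optimistic for exactly the reason in the parenthetical above:

```python
BUSHELS, COSTS = 100, 550  # 100 bushels expected, $550 total costs

def unhedged_profit(price):
    # Fully exposed to the wheat price: $4/bushel loses $150,
    # $6 makes $50, $8 makes $250.
    return BUSHELS * price - COSTS

def hedged_profit(price, locked_price=6, hedge_fraction=1.0):
    # Sell `hedge_fraction` of the crop forward at `locked_price`;
    # that portion no longer varies with the spot price.
    hedged = hedge_fraction * BUSHELS * locked_price
    exposed = (1 - hedge_fraction) * BUSHELS * price
    return hedged + exposed - COSTS
```

Dialing `hedge_fraction` between 0 and 1 is the "decide how much exposure you want" knob; a real commodities trader would also shave something off `locked_price` as their expected profit.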
The same sort of reasoning applies in the physical world. If the weather forecast says there's a 10% chance of rain on the hike, and I decide to bring an umbrella, this is in some sense a 'bet on rain'. I lose if it's sunny (I now have to carry a worthless umbrella) but I win if it's rainy (I now don't get as wet).[2] The act of 'looking into the dark'--asking how things can go wrong, and then what actions could mitigate them--is a helpful heuristic for avoiding catastrophe or ameliorating its harms.
I should note that hedging is distinct from changing the percentages involved; by rescheduling my hike, I can affect the probability of rain, or if I deployed a weather control system (like seeding clouds earlier), I could also affect the probability of rain. This is important but not the subject of this post.
Some risks cannot be usefully hedged against. Suppose I'm worried about the USG deciding to default on its interest obligations, and thus I might want to somehow make a bet that pays off in worlds where Treasuries become less valuable. Unfortunately, I basically don't think such counterparties exist; in any world where the USG defaults, the financial system basically comes undone.[3] It looks more like "bring an umbrella", except it's food and gold and guns.
And for some things, there is no umbrella.

Nevertheless, it's worth thinking about the minority outcomes. Even if my best guess is that there's an AI race that's disastrous for humanity where I can't much affect the outcome, in some worlds it doesn't happen. Chase the value you can chase, even if it only happens in a minority of worlds. That's why I think of a lot of my goals and projects as hedging for survival.[4]
For example, my spouse and I sold our AI equity, in part because of specific beliefs about the underlying company, but mostly because of survival-weighting. In worlds where we're still around to enjoy the money in 2040, it's probably a world where OpenAI equity became worthless, one way or another, and so in 2025 it made sense to trade OpenAI units for money.[5]
This isn't to say you should ignore actions that change the probabilities (you can find photos of me at the recent protest to stop the AI Race, for example), or that you shouldn't decide how much to invest in impact based on the overall survival probability (I've been playing a lot of video games). It's to say that even doomers should plant some trees.

Two avocado trees that I sprouted from pits in early 2023, and recently transplanted from pots to my garden. It normally takes an avocado tree about a decade to bear fruit. (And unlike grafted branches, where you can know the quality of what it produces, sprouted trees are brand new genetics with unknown quality.)
In a world of unbounded computation, you could use something like Solomonoff induction to consider all possible outcomes, but I'm going to focus on bounded computational contexts, like human decision-making.
Note that while the financial markets are in some sense 'efficient' or 'unexploitable' because the commodity trader is a sophisticated counterparty, this isn't true for the physical world. Sometimes you can get massive profits by doing things like 'carrying an umbrella' because the world isn't out to get you, or trying to take their half of the gains from trade.
For example, I looked into shorting Tether a few years ago and came to the conclusion that this basically wasn't possible, because any interested counterparty would probably collapse in exactly the event I'd need to be paid off in.
For example, SHELTR weekend was explicitly this, for me; "biorisk is only a few percentage points of my expected future, but it's a few percentage points that I can plausibly affect." It turned out less plausible than I had hoped, but was worth looking into nonetheless.
It seems like, at least at present, the market has caught up with our beliefs; tragically it's just the ones about the relative value of OpenAI and Anthropic.
2026-04-08 08:51:34
Previously in this series: Elementary Infra-Bayesianism
Earlier last week I got nerd-sniped by a paper called Condensation: a theory of concepts (Eisenstat 2025). It’s the kind of paper where the abstract makes a claim so clean you assume you must be misreading it: roughly, there is a right answer to “what are the concepts in this data,” and any two agents who carve it up well enough will agree on what those concepts are.[1]
If that sounds like John Wentworth’s natural abstractions hypothesis, yes, the family resemblance is strong. I wrote about something adjacent a while back. Condensation is a different formalization, but the punchline rhymes: structure in the data constrains what any good representation can look like. People on LessWrong seem to dig it, and I wanted to see what the fuss was about.
The paper is forty pages of math and gives you no algorithm; it tells you what a good carving looks like, not how to find one. I spent a slightly embarrassing amount of time trying to get the basic objects to do something on a computer. This post is how far I got.[2]
Say you observe three tokens[3]:
cat dog cat
I'll define a concept as a piece of information about that data.[4] “The topic is animals” is a concept. “Token 2 is dog, not cat” is also a concept. Concepts can come with a scope: the set of tokens a concept is about. The topic concept has scope {1,2,3}, because knowing the topic tells you something about all three tokens. The “token 2 is dog” concept has scope {2}, because it’s only about token 2.
A representation is a list of (concept, scope) pairs. With three tokens there are seven possible scopes ({1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}); a typical representation puts something at a few of them and leaves the rest empty.[5] Which scopes get concepts is the structure the theory cares about.
One rule that representations need to satisfy for us to care about them, the rebuild rule: from the concepts whose scope includes token i, you must be able to reconstruct token i exactly.
The star of this show, condensation, then gives you a score function: pick any set of tokens (a query), and you pay for the bits[6] of every concept whose scope overlaps that set. A good representation pays close to what the query is actually worth, its entropy; a bad one overpays.

Three tokens (cat, dog, cat), four concepts filed at their scopes. A query about tokens {1,2} reads every concept whose scope overlaps {1,2}: the topic (scope {1,2,3}), id₁ (scope {1}), and id₂ (scope {2}), but not id₃ (scope {3}).
As I said up top, the paper doesn’t tell you how to find a good representation. The main theorem (4.15) says that any two representations that score well enough will end up filing roughly the same concepts at roughly the same scopes,[7] which is the sense in which concepts are “out there.”
Let me make that concrete with a small example that still has all the structure we care about.
The data: a fair coin picks a topic, animals or tools, and then each of three tokens is independently one of two words from that topic (cat/dog or hammer/saw). There are four bits of information total: one topic bit (shared across all three tokens) and three token-identity bits (private to one token each). Here are some example draws:
| token 1 | token 2 | token 3 | topic |
|---|---|---|---|
| cat | dog | cat | animals |
| hammer | saw | hammer | tools |
| dog | dog | dog | animals |
| saw | hammer | saw | tools |
Three representations of the same data:
Trivial. Don’t bother finding shared concepts. File each raw token at its own scope.
| scope | concept | bits |
|---|---|---|
| {1} | id₁ | 2 |
| {2} | id₂ | 2 |
| {3} | id₃ | 2 |
Ask about tokens 1 and 2: you read 4 bits, but the true joint information is only 3 bits because they share a topic.[8] The topic bit is sitting inside both raw tokens, and you read it twice. Every multi-token question overpays.
Oracle. File the topic at {1,2,3} and each token-identity bit at its singleton.
| scope | concept | bits |
|---|---|---|
| {1,2,3} | topic | 1 |
| {1} | id₁ | 1 |
| {2} | id₂ | 1 |
| {3} | id₃ | 1 |
Every question costs exactly its entropy, which is the best you can do.
Misfiled. Same four concepts as the oracle, but topic at scope {2,3} instead of {1,2,3}.
| scope | concept | bits |
|---|---|---|
| {2,3} | topic | 1 |
| {1} | full token₁[9] | 2 |
| {2} | id₂ | 1 |
| {3} | id₃ | 1 |
Ask about tokens 1 and 2: full token₁ (2 bits) + topic (1 bit) + id₂ (1 bit) = 4. The topic is in there twice, and you pay for both copies.
So the score does what you’d hope: lowest when each shared concept sits at exactly the scope it’s about, and it tells you which queries overpay when one doesn’t.
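The three representations are small enough to check by machine. Here's a sketch in Python of the score as described above, summing overpayment (bits read minus entropy) uniformly over all non-empty queries; the paper's actual score weights queries differently, so treat this as illustrative:

```python
from itertools import combinations

# Each representation is a list of (scope, bits) pairs,
# matching the three tables above.
trivial  = [({1}, 2), ({2}, 2), ({3}, 2)]
oracle   = [({1, 2, 3}, 1), ({1}, 1), ({2}, 1), ({3}, 1)]
misfiled = [({2, 3}, 1), ({1}, 2), ({2}, 1), ({3}, 1)]

def entropy(query):
    # True joint information of a query: 1 shared topic bit
    # plus 1 identity bit per token in the query.
    return 1 + len(query)

def read_cost(rep, query):
    # A query reads every concept whose scope overlaps it.
    return sum(bits for scope, bits in rep if scope & set(query))

def total_overpay(rep):
    # Overpayment summed uniformly over all non-empty queries.
    queries = [q for r in (1, 2, 3) for q in combinations((1, 2, 3), r)]
    return sum(read_cost(rep, q) - entropy(q) for q in queries)
```

Running it: the oracle overpays 0 bits in total, the trivial representation 5, and the misfiled one 3, with the misfiled overpayment concentrated on exactly the queries that touch token 1 twice.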
We built all of these by hand, so nothing deep is happening yet. The question is whether a representation constructed from the activations of a neural network looks more like the oracle or the trivial one.
Now suppose the concepts come out of a language model like GPT-2 or Claude rather than being hand-built.
I trained a tiny 4-layer transformer on the three-token topic data from §3. A single weighted sum of the residual stream (the running vector that each layer reads from and writes to) recovers the topic with perfect accuracy:

Tokens go through a tiny 4-layer LM; a linear probe on the residual stream recovers the topic (“animals”) with 100% accuracy. The concept is there; the question is what scope to assign it.
So the model has learned the right concept. The question is: what scope do we assign it?
One naive answer is “the feature’s scope is all the tokens the model has seen so far,” since that’s what the activation depends on. But that gives the same answer for every feature, so it can’t distinguish a feature about one token from a feature about the whole sequence.[10]
A better method here is mutual information: scope = the set of tokens the feature is correlated with. The topic is correlated with all three tokens (each token's first bit is the topic), so MI says scope {1,2,3}, which is the oracle representation from §3, and the score agrees with that filing: no query overpays.
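Here's a sketch of that MI-based scope assignment on the toy data, using a plug-in MI estimate; the `eps` cutoff for "correlated" is my choice, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096

# Toy data from §2: one topic bit shared by all three tokens,
# plus one private identity bit per token.
topic = rng.integers(0, 2, N)
ids = rng.integers(0, 2, (N, 3))
tokens = topic[:, None] * 2 + ids        # each token in {0, 1, 2, 3}

def mutual_info(x, y):
    # Plug-in MI estimate for small discrete variables, in bits.
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def scope(feature, tokens, eps=0.05):
    # Scope = tokens the feature shares nontrivial information with.
    return {i + 1 for i in range(tokens.shape[1])
            if mutual_info(feature, tokens[:, i]) > eps}

print(sorted(scope(topic, tokens)))      # → [1, 2, 3]
print(sorted(scope(ids[:, 1], tokens)))  # → [2]
```

The topic feature lands at {1,2,3} and each identity bit at its singleton, recovering the oracle filing.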
At scale, the standard way to get candidate concepts out of a language model is a sparse autoencoder (SAE): you learn a large set of directions in the residual stream such that, for any given input, only a few of them are active. Each direction is called a feature, and the hope is that “feature k is on” means something interpretable: this is about cooking, there’s an open bracket, the subject is plural. An SAE gives you features, but it does not give you their scopes, so you still need a method like MI to tag each one.
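For readers who haven't met one, here's a minimal SAE sketch in NumPy: tied encoder/decoder weights, plain SGD, and an L1 penalty for sparsity. Real SAEs are far bigger and trained with Adam plus assorted tricks; this is just the shape of the idea, on synthetic activations with planted sparse structure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: each sample activates 2 of 32 sparse directions.
d_model, d_feat = 16, 64
true_dirs = rng.standard_normal((32, d_model))

def batch(n=128):
    codes = np.zeros((n, 32))
    codes[np.arange(n)[:, None], rng.integers(0, 32, (n, 2))] = 1.0
    return codes @ true_dirs

# Tied-weight SAE: encode with relu(xW + b), decode with W^T,
# L1 penalty on the code so few features are active per input.
W = 0.1 * rng.standard_normal((d_model, d_feat))
b = np.zeros(d_feat)
lr, l1 = 1e-3, 1e-3

def step(x):
    global W, b
    f = np.maximum(x @ W + b, 0.0)   # sparse feature activations
    err = f @ W.T - x                # reconstruction error
    # Subgradient of ||f W^T - x||^2 + l1 * |f|, masked by the ReLU.
    g_f = (err @ W + l1 * np.sign(f)) * (f > 0)
    W -= lr * (err.T @ f + x.T @ g_f) / len(x)
    b -= lr * g_f.mean(0)
    return float((err ** 2).mean())

losses = [step(batch()) for _ in range(2000)]
```

Each learned column of `W` is a feature direction; the SAE gives you the directions, but (as the text says) nothing about their scopes.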
The three-token toy was reassuring, but it was four bits of hand-built data. The question that matters is whether the score does anything useful when the concepts come out of an actual language model on actual text. To find out, I wanted to carefully expand the domain of the experiment along two axes: model size (from a tiny 4-layer transformer up to GPT-2 small) and dataset complexity (from planted ground truth up to real text).

I sweep the firing threshold for each decomposition method and plot the condensation score across the sweep.
A planted toy where I know the ground truth. Eight tokens, with seven planted shared concepts arranged in a binary tree: one about all eight tokens, one about tokens 1–4, one about 5–8, and one about each adjacent pair (1–2, 3–4, 5–6, 7–8). Each is a yes/no flag that’s “yes” 15% of the time. I pack them into six dimensions so they overlap a little and the SAE has to actually un-mix them.[16] Some example sequences:
| tok 1 | tok 2 | tok 3 | tok 4 | tok 5 | tok 6 | tok 7 | tok 8 | active flags |
|---|---|---|---|---|---|---|---|---|
| 0.31 | −0.42 | 0.87 | −0.15 | 0.63 | −0.29 | 0.44 | −0.71 | global, pair₃₄ |
| −0.55 | 0.12 | −0.33 | 0.68 | −0.21 | 0.45 | −0.62 | 0.19 | half₅₋₈, pair₇₈ |
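A sketch of how such planted data can be generated. The post's actual pipeline differs in details (per the footnotes it uses random binary concept vectors; here I use Gaussian directions for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Seven planted concepts in a binary tree over eight tokens:
# one global, two halves, four adjacent pairs.
SCOPES = [list(range(8)), list(range(4)), list(range(4, 8)),
          [0, 1], [2, 3], [4, 5], [6, 7]]
DIRS = rng.standard_normal((7, 6))  # 7 concepts packed into 6 dims,
                                    # so they overlap and must be un-mixed

def sample(p=0.15, noise=0.1):
    flags = rng.random(7) < p                # each flag on ~15% of the time
    x = noise * rng.standard_normal((8, 6))  # per-token noise
    for on, scope, d in zip(flags, SCOPES, DIRS):
        if on:
            x[scope] += d                    # write the concept's direction
    return x, flags
```

Each draw is an 8-token, 6-dimensional sequence plus the ground-truth flags, which is what lets the score be checked against a known-correct filing.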
The oracle representation (true concepts at true scopes) sets the score the methods are chasing.
TinyStories. 50,000 eight-word openings from children’s stories. The positional structure here is real: “once upon a time there was a” lives at positions 1–7, and it’s the most common pattern by far. I ran this with both a small 4-layer transformer and GPT-2 small.
| sentence opening |
|---|
| once upon a time there was a little |
| one day a girl named lily went to |
| the sun was shining and the birds were |
| tom and his mom went to the park |
Induction. A setting designed to have one very specific shared concept. The prompt is something like
cat dog bird fish bee cat ___
where five random words are followed by a repeat of word 1, and the model should complete with word 2. GPT-2 small gets this right 75% of the time. I read the residual stream at the repeat (token 6), where the model has just recognised “that’s cat again.” The shared concept the features should pick up is “word 1 = cat,” scope {1,6}: it’s about token 1 (where cat first appears) and token 6 (where it appears again). The words come from a fixed pool of 50.
On the planted toy, the SAE recovers most of the ground truth: it gets 86% of the way to the oracle score, and the best-scoring threshold lands at the true 15% firing rate without being told.
On TinyStories, the ordering holds: SAE outperforms PCA, which outperforms random. The gaps are smaller than the toy (real text has less clean shared structure than seven planted flags), but consistent across both the small model and GPT-2 small. The SAE’s top feature fires on “once upon a time there was a”: one yes/no concept that tells you something about all eight tokens at once. PCA’s top feature fires on whether the last token is a function word: a concept about one token.[18]
On induction, PCA wins, and not by a little.
In the toy panel, the SAE’s best score sits at the true 15% rate and nearly touches the oracle line. In the induction panel, PCA’s curve keeps falling while SAE’s turns back up.
I want to dwell on this, because my prior was "SAE beats PCA" and the score disagreed. The shared concept is "which of 50 words is word 1": not a yes/no flag but a 50-way choice, worth log₂ 50 ≈ 5.6 bits.
This is how far I got. The score does something: it distinguishes SAE from PCA from random in the right direction on controlled data, it finds ground-truth thresholds without being told, and it delivers at least one genuinely surprising result (PCA beating SAE on induction). But I want to be clear-eyed about what this is and isn’t.
In this setting, the condensation score measures whether a decomposition's inductive bias matches the structure of the shared concepts in the data. SAEs assume shared concepts are rare yes/no flags; PCA doesn't assume that. So when the shared concepts are rare yes/no flags (the toy, TinyStories), SAE wins, and when the shared concept is a 50-way categorical (induction), PCA wins. When there's nothing shared, or the model never computed it, neither beats trivial.
A few things I think this buys you, if it holds up:
Scope is half the concept. Interpretability mostly treats a "concept" as a direction in activation space, full stop. Condensation says it's a (concept, scope) pair, and §4 showed that scope assignment is a real choice, with an opinionated score for comparing choices. "This feature means X" is only half an interpretation until you also say which tokens the feature is about.
Feature splitting has a signature. When an SAE breaks one underlying variable into many features, those features all land at the same scope, and every query touching that scope pays for the redundant copies.
You could use this to choose decomposition methods per-circuit. Different parts of the same model plausibly have different kinds of shared concepts: induction is a categorical choice, while "is this Python" is a yes/no flag. The score gives you a per-circuit reason to pick the decomposition rather than committing to SAEs everywhere, which is roughly where the SAEs-are-disappointing discourse has been heading anyway. And nothing here is SAE-specific; the pipeline takes any features-from-activations method.
The theory is beautiful, and I have a lot of research ideas for how to fill in some of the implementation details:
That’s a lot of open questions before this approach would be ready for crunch-time deployment.[22]
But the biggest gap is that I have not tested the paper's actual theorem: that two good-scoring representations agree on what concepts they find. Everything above is "here's a new scoring function for decomposition methods, and it gives sensible rankings." That's useful, but it's an eval metric, not evidence that concepts are real. Condensation's claim is stronger: any two representations that score well enough should converge on the same (concept, scope) pairs, and that's what would make this about natural abstractions rather than about SAEs. Testing this could be relatively straightforward: train two SAEs with different seeds, extract their top concepts, and see whether χ agreement (Theorem 5.8) actually holds. I might do that in a follow-up, but I wanted to publish this much first, because I think more people poking at this independently is more valuable than me polishing in private.
Theorem 4.15, if you want to look it up. The actual statement is about amalgamations of latent variable models and is considerably more hedged than my gloss.
Most of the code was written by Claude in a long pair-programming session, which is the way things go these days. The mistakes in interpretation are mine.
Note that 'tokens' here is a choice I'm making, not something the theory demands. Scopes are defined over whatever observations you pick: token positions, syntactic constituents, document sections. Different choices give different possible scopes and a different theory of what the concepts are about.
A note on terminology: in the paper, a “concept” is technically the full (concept, scope) pair, not just the piece of information. I’m using “concept” more loosely to mean the piece of information itself, and “scope” separately, because that matches how most people in interpretability already think about features. The distinction only matters when you read the theorems.
The paper calls the concepts
“Bits” throughout means information-theoretic bits: the entropy $H(X) = -\sum_x p(x) \log_2 p(x)$ of the variable in question.
Theorem 4.15. The agreement is “cumulative”: the information at-and-above any scope matches, even if two representations distribute it across the levels differently. There’s an approximate version (5.8) for representations that score well but not perfectly, which is the one that matters in practice.
Why two bits each? Each token is one of four words (two topics × two words), uniform, so $\log_2 4 = 2$ bits.
The rebuild rule forces this: {1}’s concepts have to be enough to reconstruct token 1, and “id₁” alone doesn’t cut it without the topic. So {1} stores the full token (2 bits, topic baked in).
You could also train the SAE on all possible truncations of each context and check whether a feature appears at each truncation length. This would give fine-grained scope information but is expensive; I didn’t try it. The mechanistic interpretability community has mostly focused on attributing features to predictions rather than to input tokens, using techniques like attribution patching (Nanda 2023) or circuit tracing (Anthropic 2025). These are closer in spirit to the attribution method than to MI.
People noticed the SAE connection in the comments on Demski’s post. As far as I know nobody’s actually computed the score before; that’s what the rest of this post does.
The threshold is essentially a choice of firing rate in a rate-coding scheme: above what activation level does a feature count as “on”? The analogy to spike-rate coding in neuroscience is not exact, but the tradeoff (too high and you lose signal, too low and everything fires) is the same.
The best possible
The threshold
There are roughly a dozen choices in this pipeline and I won’t pretend they’re all principled. One aspect worth highlighting: each token is also stored raw at its own scope (so the rebuild rule is always satisfied, but it means most of
Six dimensions, seven concepts: each concept is a random binary vector in $\{0,1\}^6$, so the concept directions overlap and the SAE has to un-mix them.
I also ran three settings where the answer should be “nobody wins”: random 12-token windows from the open web (no slot-aligned structure for anyone to find), a templated dataset with independent slot-fills (structure, but none of it shared across positions), and two-digit addition (GPT-2 can’t add, so the carry bit isn’t in the residual to be found). All three null out, with every method within ~0.1 bits of trivial. The open-web null is worth being careful about: it doesn’t mean “real text has no shared concepts,” just that it has no concepts shared across token positions in a random window. “Token position” was a choice I made in step 1, not something the theory handed down. The TinyStories results work because sentence-aligned openings have positional structure. A different choice of what the observations are would likely find different structure.
I checked these aren’t just me squinting: standard auto-interp protocol (show an LLM 12 examples, ask for a one-line explanation, score whether the explanation predicts held-out examples). Mean balanced accuracy over the top-8 features: SAE 0.62 ± 0.18 (the “once upon a time” feature alone scores 0.95), PCA 0.55 ± 0.05, ICA and random ~0.52. Chance is 0.50.
You could read this as “you starved the SAE, give it 50 features and it’d win.” Maybe. But there’s a condensation-native reading I like better: the theory wants one concept at scope {1,6}, and the SAE shatters it into fifty features all at the same scope, feature splitting.
This result also convinced me the score isn’t circular. The worry: scope is defined as “tokens the feature is correlated with,” and the score is then computed from those same scopes, so maybe any decomposition looks fine by construction. But every method gets its scopes from the same MI procedure, and the score still separates SAE from PCA from random, and flips the ordering between datasets, so it’s tracking something beyond the scope assignment itself.
One fun thing to think about here is whether oldschool computational-linguistics-style tagging of *syntactic constituents* (noun phrases, verb phrases, clauses) might usefully constrain the power set to something tractable. Scoring only syntactic constituents, or weighting by a parse tree, would make $N = 50$ tractable and would be a fun thing to be wrong about.
Also untouched: using
Attribution (Meng et al. 2022, Heimersheim 2024): delete the topic feature, and predictions for tokens 2 and 3 get worse (the model was using topic to narrow them down), but the prediction for token 1 doesn’t change (token 1 is predicted from nothing, before topic is known). The reason is structural: in an autoregressive model, the topic doesn’t exist until token 1 has been read, so the model can’t use it to predict token 1, so attribution can’t see that it’s about token 1. This gives the misfiled representation from §3, and the score overpays at exactly the queries you’d expect.
Attribution and MI aren’t even two estimators of the same thing; they’re different questions. Attribution asks “which predictions does this concept affect”; MI asks “which tokens is this concept about.” If you care about the model’s behaviour, attribution is right; if you care about the data’s structure, MI is. Condensation as written is about the data.
2026-04-08 08:43:00
Hey all, I'm a 20-year-old US college student right now, and I'm planning to take a gap year this year. Does anyone have any ideas/recommendations on how I could use this to effectively improve/save lives?
I'm currently considering doing a 3-month internship with my local college in Kenya to assist and teach entrepreneurs in an effort to break them out of the poverty cycle.
If anyone has ideas on how I could better use this year, please let me know! I'm not afraid of going international or doing hard work :)
2026-04-08 08:29:05
TLDR: A complete guide to juggling, from zero to siteswap notation, by someone who juggles in nightclubs.
I take my juggling balls with me wherever I go. Train stations. Airports. Nightclubs. You name a place I've been, and there's a decent chance I've juggled there. When I'm bored, I just whip out my balls and start having a play. And people watch, and sometimes join in.
I've been teaching people to juggle more or less since I started doing it in public. Especially in places like airports, many people drop their phones to watch me drop balls instead. At this point, I will often go up to them and offer them my balls. Some are too nervous to take my offer, but I can regularly get at least one to bite.
For those of you innocent bystanders who have never been granted the opportunity to learn the basic technique, here it is. This instruction set is best paired with three 115 g, 68 mm Cascade Thud juggling balls filled with millet seeds. Or three random pieces of fruit you don't mind getting bruised. Or indeed any three vaguely round objects which fit into your hand, don't bounce too much and can take a hit.
Step 1: 0 balls
Stand with your feet and hands shoulder-width apart, elbows at 90 degrees. Tilt your head just slightly up and look at the sky while relaxing your shoulders. This is your starting position. Most people can do this.
Step 2: 1 ball
Take one ball. Resume your initial position, and practise throwing it from one hand to the other. Your aim is to get your throws accurate enough that you don't have to move your other hand to catch the ball – there's a saying in the juggling world that if you throw the throws, the catches catch themselves. The ideal arc goes from one hand, reaches its peak just above your eyeline, then lands on the joints between your fingers and hand, forcing your hand to close around it. You then throw the return throw by giving a smallish impulse coming from the elbow (the rest of your joints should stay more or less fixed – fewer degrees of freedom makes life easier). If you get something like this on your first couple of throws, you are already doing better than most people. If you don't, do not worry. It comes pretty quickly with practice.
Step 3: Fixing your 1 ball mistakes
The first common mistake is reaching up to catch the ball. Don't do this; you're going to want as much time as possible when you move on to the later steps, and this reduces your time available. Another mistake is completely ignoring the ball and staring into the distance. I'm not entirely sure why, but I've seen it a bunch more at raves than anywhere else. In any case, I would recommend you just casually glance up at the ball as it reaches the top of its arc. This is better than both ignoring the ball, which results in you not catching it, and following it the whole way, which means you then can't handle more than one ball.
Step 4: 2 balls
Start with one ball in each hand. Throw your first ball along the trajectory practised with 1 ball. Wait as long as possible, then throw the second ball along the same trajectory, in reverse. DO NOT JUST PASS THE SECOND BALL BETWEEN YOUR HANDS. This is a common thing, as people are regularly taught it, but it scales really badly as a juggling pattern[1]. In order to avoid the balls colliding in mid air, you want to throw the second ball just to the inside of the arc of the first. Throw the second ball, then shift your hand back to the outside to catch the first. Make sure you practise starting with both your right and left hands.
Step 5: Fixing your 2-ball mistakes
A common mistake here is throwing the second ball too early. The throw of the second ball should really be triggered by the first ball coming in to land. The later you can throw it, the more time you will have later on, and you will need more time later on. Another common mistake is throwing the balls at different heights. Both of them should be following the nice trajectories you plotted out with 1 ball. A final mistake is throwing the second ball behind the first, rather than inside. This isn't fatal, but it leaves you a lot less to play with once you start trying to do 3-ball tricks.
Step 6: 3 balls
You might be surprised to learn that 3 balls works exactly the same way as 2. Start with 2 balls in one hand and one in the other. Throw from the hand with more balls, then do a normal 2-ball swap (i.e., do precisely what we've just been practising) on the other side. You now have one ball in each hand and the second ball flying towards your originating hand. You can now do another 2-ball swap with that hand. Catch the 3rd ball and you have now completed a 3-ball "flash"[2]. Don't forget to tell all your friends you've just flashed your balls. To get better at juggling, all you now need to do is increase the number of two-ball swaps before you catch everything. Once you get 6 catches (i.e. 2x the number of balls, or a "qualify"), you can officially say that you can juggle 3 balls.
Step 7: Fixing your 3 ball mistakes
The first mistake I commonly see with 3 balls is throwing the balls too quickly. This results in less time to process everything. If this is happening to you, go back and spend a bit more time practising with 2 balls[3]. Another common issue is throwing the balls forwards. There are a couple of ways to fix this. One is to juggle in front of a table or bed, which a) stops you from walking forwards while juggling b) catches the balls for you nicely when you drop them. Another is juggling in front of a wall, which actually just fully stops you from sending the balls forward.
So, you now know how to juggle. Wonderful! I hope it brings you great joy. You now have a plethora of options for how to develop further.
The easiest way to upgrade from basic 3 ball juggling is to start learning some tricks. For this you're going to need to learn some new throws.
The easiest throw to learn is the outside throw: rather than throwing on the inside of the incoming ball, throw on the outside. You can then mix and match these throws to form a range of patterns, such as tennis. Play with it, have fun.
A few other fun things to try include under arm throws and catches, behind the back throws and catches and overhead juggling. In particular, I would recommend learning Mill's Mess, a great pattern where the balls look like they're following each other.
So maybe you don't want to be doing fancy schmantzy tricks, and just want to juggle as many balls as possible. This is a path for the brave.
Learning to juggle 3 balls (as you of course know by now) takes anywhere from a few hours to a week or so. Learning to juggle 4 balls took me about a month.
The basic steps are simple: Learn to juggle 2 balls in one hand (essentially the same as 2 balls in 2 hands, but you're throwing more vertically). Next, do this in both hands at the same time. This is called a fountain pattern, as opposed to the 3 ball "cascade". Fountain patterns only work with even numbers, and cascades with odd, so they naturally complement each other as the canonical way to juggle n balls.
5 balls took me 4 years to learn. You read that right. 4 years. It is, like 3 balls, a cascade pattern. You're doing precisely the same thing, but you're throwing everything fast enough that you get an extra 2 balls in the air before you start catching anything. A large amount of the difficulty comes from the fact that balls are no longer only crossing at the point where you throw and catch. You now have to deal with the fact that 3 balls are going to cross each other in mid-air, and if you don't get your timing right they will collide. Good luck.
The next target after 5 balls is actually 7 balls. 6 is easier, but by the time you've learnt to juggle 5, you've spent such a long time juggling in a cascade pattern that 7 feels like a more natural step. It took me 2 years of specifically training for half an hour a day to get to the point where I managed to get 14 catches with 7 balls once.
This is where juggling gets weird. And (somewhat) mathematical. Thus far, we've only talked about patterns where all the balls are thrown to the same height, balls are thrown one after the other, right hand following left, and life is simple.
Siteswaps are not like that. The principle behind them is actually very simple: each throw and catch happens on a "tempo". The only restriction is that for each throw/catch, you cannot have more than one ball arriving at a time[4].
You then notate a pattern by giving each throw a number; a throw that arrives 3 tempi later is a 3, 4 tempi later is a 4, etc. This works out nicely to mean that juggling 3 balls in a normal cascade pattern is just denoted "3", which generalises to juggling n balls being denoted "n" in siteswap[5].
0, 1, and 2 are also possible throws - 0 means an empty hand, so no ball is thrown, 1 means a pass (what I told you not to do when juggling 2 balls) and 2 just means holding a ball in that hand for the tempo.
Thus, 51 is a pattern where the balls are thrown high from one side to the other (the 5s), then immediately passed back to the original hand (the 1s).
A nice property of siteswap is that the average of the digits is the number of balls in the pattern. Thus, 75 is a valid 6 ball pattern, but 67 would require 6 and a half balls.
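If you like the notation, both rules are easy to check mechanically: a pattern is juggleable exactly when no two throws land on the same tempo, and the ball count is the average of the throws. Here's a minimal Python sketch of that check (the function names are mine, and it assumes plain one-throw-per-tempo siteswap, ignoring the notation variations mentioned in the footnotes):

```python
def parse_throw(ch: str) -> int:
    """'0'-'9' map to 0-9; letters map to 10+ (so 'a'/'A' is a 10-ball-height throw)."""
    if ch.isdigit():
        return int(ch)
    return ord(ch.lower()) - ord('a') + 10

def is_valid(pattern: str) -> bool:
    """A siteswap is valid iff (i + throw_i) mod period are all distinct,
    i.e. no two balls arrive on the same tempo."""
    throws = [parse_throw(c) for c in pattern]
    n = len(throws)
    landings = {(i + t) % n for i, t in enumerate(throws)}
    return len(landings) == n

def ball_count(pattern: str) -> float:
    """The number of balls is the average of the throw values."""
    throws = [parse_throw(c) for c in pattern]
    return sum(throws) / len(throws)
```

So `is_valid("75")` is True with `ball_count("75")` giving 6, while `is_valid("67")` is False, matching the half-ball arithmetic: its average is 6.5.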
A fun thing to try if you have a few friends is passing balls to each other.
The simplest trick here: juggler A has 3 balls, juggler B has 2. A starts juggling. At some point, they pass one ball to B, who uses it to start their pattern.
Once you've mastered this, you can move up to patterns like passing all the balls to each other, stealing balls from each other, and continuous passing (e.g., a ball goes around the hands in a circle).
One can also juggle a number of other implements, of which the main items are clubs and rings.
I have limited experience in this sphere, but I found 3 rings easy to pick up after having learnt with balls. The main trick with clubs is to hold each club at its centre of mass, so that it predictably rotates around the right point.
So far I have told you about some of the more common things that juggling is. It is significantly harder to define what juggling is not. In fact, if you come up with a rule for what juggling is, there's a good chance someone will break it and invent a new form of juggling.
Bounce juggling, Contact juggling, and erm, this?
[1] This pattern of juggling is called the "shower", while the more standard style that I'm teaching is called the "cascade". Even just juggling 3 balls is harder as a shower pattern, but notice that when scaling up the number of balls, all of the airtime is generated by one arm. This means that for the same number of balls you either need to be throwing them twice as fast or four times as high. The most balls I was able to find being juggled in a shower pattern was 8 by this dude. The world record in cascade is 14. Note that, as I explain later, since this is an even number of balls, it technically happened in the "fountain" pattern, where you juggle 7 in each hand.
[2] The first time I got seven catches with seven balls, I excitedly rushed into my sister's room and told her I had just flashed 7 balls. She was extremely concerned for my wellbeing.
[3] In fact I would recommend moving up and down between numbers of balls regularly. I tend to find that exercises which are "too easy", like 1-ball juggling, actually just help your base skills, while exercises which are "too hard", like 4-ball juggling for a 3-ball juggler, help you process things at a faster speed, and make challenges at the "right level" feel easier. Broadly speaking, it's good to have diversity like this in your training regime to become a well-rounded juggler.
[4] There are actually variations of siteswap notation which handle these possibilities, but we'll ignore them for this post.
[5] Patterns which require 10 ball throws or higher are notated with the alphabet - A6 would denote an 8 ball pattern made up of 10 ball throws and 6 ball throws. I'll tell you what happens with 36 or higher the day you show me a pattern which uses throws where 36 ball height is necessary.
2026-04-08 08:05:30
In which you attend Inkhaven II and learn that a trifle is sort of like a Giga tiramisu
[not previously in any series, because you have never finished one]
There is a compound in Berkeley. It has whiteboards in the hallways, houses named after dead mathematicians, a podcast room, and weighted blankets. The kitchen is heavily stocked in a way that suggests both abundance and a particular theory of human optimization: fresh fruit, labeled leftovers, and industrial quantities of Soylent.
You have been accepted to spend April there, writing.
The first cohort wrote 1.7 million words and all 41 finished, a fact prediction markets priced so thoroughly that one resident who tried to fail was overruled by collective certainty. You are hoping to do slightly worse, to preserve the mystique of human free will.
You arrive on a Wednesday. The architectural theory of the place becomes clear almost immediately: someone built the nooks first and constructed the house around them. Every room is organized around a corner, an alcove, a recessed seat, a window ledge wide enough for two people and a laptop. The nooks are the point. The walls are load-bearing in the structural sense only.
The lobby has the energy of a place optimized for small, quiet conversations in those nooks. You recognize it immediately: the architectural equivalent of a first message on Manifold.Love. I’m quirky but approachable. I contain multitudes. I have whiteboards.
You find yourself wondering, not for the last time, how nerd bloggers afford a zillion dollar property in Berkeley. You do not ask. There are many things you do not ask.
“Hey!” says a guy whose lanyard says BEN. “Welcome! Have you published today?”
“I just got here.”
“Right, right. Just checking. The deadline’s midnight. Some people like to get Day One out of the way early.”
You will in fact not get Day One out of the way early. You will hit publish at 11:47 PM every single night for 30 consecutive nights, each time swearing it will be different tomorrow, the way someone swears she’ll start going to the gym after just one more week of not going to the gym. In the meantime, the Slack will fill with evidence that everyone else is having a richer, more interesting residency than you are.
#activities
BOTC tonight
Shake & Bake (Shakespeare puns + bread)
Wine, cheese, poetry
Ecstatic dance
You attend none of these. You are writing about mechanism design in potluck coordination while others produce both Coriolanus and rye bread.
* * *
Breakfast is self-serve. The kitchen has the faintly sacred aura of a place where Eliezer Yudkowsky has once microwaved something.
“Do you want to help settle a scientific question?” someone asks. “Gwern claims microwaved water makes bad tea. We’re doing blind taste tests.”
“You’re replicating a Gwern experiment.”
“Blind taste tests, then crossover with boiling chips. Seven people so far. Also someone’s organizing a full Replication Club—pick a famous psych study, replicate it in under 24 hours, write about it. The replication and the write-up each count as separate blog posts. Two days of content for the price of one day of actual work.”
“That’s gaming the system.”
“That’s literally what the organizers say. They call it Goodharting on the correlation. It’s on the website. The system was designed to be gamed.”
A woman nearby is typing with the grim focus of someone defusing a bomb.
“Fractals,” she says, not looking up.
“Mathematically?”
“Emotionally. Grief is self-similar at every scale.”
“That seems like a claim you’d want to be careful about.”
“I have 500 words and fourteen hours. Careful is a luxury belief.”
From another room: “First of all, making bank is incompatible with human dignity. Jot that. No, that’s a joke.” You check Slack. It’s already in #inkhaven-quotes.
* * *
You head to the garden to write. The gardens are beautiful. They are exactly the kind of gardens that make you think “I could write something beautiful here,” and then you open your laptop and stare at a blinking cursor for forty-five minutes while a hummingbird judges you.
A man sits down on the next bench. He has a physical notebook. With a pen.
“Aren’t you going to need to type that up?”
“No. I photograph each page and upload the images to Substack.”
“Can your readers actually read your handwriting?”
“The ones who deserve to can.”
He’s gathered a small audience. “Have you thought about using Claude to transcribe it?” asks a woman in a Lighthaven hoodie.
The notebook man winces as though she has suggested he microwave a kitten. “Claude is the thing I am warning you about. You feed it everything the blogosphere has ever written, it averages it all out, and now anyone can produce text indistinguishable from a mid-tier Substack post. We’re not training writers here. We’re creating training data.”
* * *
Lighthaven is approximately 40% nook by volume. The nooks are not quiet. They only look quiet.
In the first one: two people are debating whether the Voynich manuscript represents a constructed language, a natural language with unknown orthography, or an elaborate hoax. “The statistical properties make a simple hoax unlikely,” one says. “The entropy signatures look like natural language.” “Unless the hoaxer knew enough about natural language statistics to fake that.” They both go silent, considering this.
In the second: a woman is explaining medieval Chinese maritime trade networks to a man who keeps saying “wait” and drawing invisible diagrams on his knee. “So the Song dynasty is essentially running a navy to support and extract from private merchant shipping—” “No, not supporting. Taxing. The protection is incidental to the revenue model.” “That’s just a state.” “That’s just a state,” she agrees.
In the third, you sit down because there is nowhere else. A man is mid-sentence: “—which is why the fine-structure constant being dimensionless is either a deep fact about reality or about how we parameterize it, and you can’t fully disentangle that from inside the system.” The woman across from him nods. “Same problem in historical linguistics—you can reconstruct Proto-Indo-European phonology, but there’s underdetermination. You can’t tell how much is the language and how much is the method.” “So the map is always the territory.” “The map is always partly the territory.” “Which is what the constant is telling us.” “Which is what Wittgenstein was telling us.” “Wittgenstein was telling us everything.” “Wittgenstein was telling us nothing expressible.” They both write something down. You have been there four minutes and cannot identify the original topic. You are not sure there was one.
Your friend Nate is in the fourth nook, reading Slack. He is not a resident; he has simply materialized, the way people at Lighthaven do. You’ve learned not to ask how people got in.
“Someone’s looking for hermeticism, Gnosticism, or Neoplatonism,” he says. “Also linear algebra. Also someone lost a MacBook.”
“Do you know about any of those?”
“I’m a Bay Area house party regular. I’m approximately four conversations from knowing about everything.”
* * *
Lunch is communal. Someone rings a bell and there are announcements.
A woman raises her hand. “What’s the best place to get feedback on a draft?”
The advisor does not hesitate. “Claude.”
The man who winced in the garden stares at his soup.
* * *
After lunch there is a workshop, placed directly between the kitchen and bedrooms so people will attend accidentally. Gwern is speaking.
“The blog format is wrong.”
“We are at a blog-writing residency.”
“Yes. You are producing date-stamped ephemera. This is not how knowledge should work.”
“How should it work?”
“Like a wiki.”
“You agreed to be a writing coach here.”
“And as your coach, my advice is that this format is wrong. You’re all free to leave at any time.”
Fifty-five people stare. Nobody leaves. The sunk cost of six published posts outweighs any argument.
In the hallway afterward, someone is explaining Rationalist Monopoly with intense conviction. “The main thing about Boardwalk is that there’s a card that sends you directly to Park Place, and then you just die.” You do not ask questions. It is already 4 PM and you have 112 words.
* * *
Slack has become a parallel residency.
#activities
Meditation: Is anyone willing to lead us along the path of spiritual and/or neurophysiological enlightenment?
Jiujitsu: I notice a few BJJ people. Not sure whether Lighthaven has mats.
Does anyone want to go out for 45 minutes and talk to strangers?
Goth night at DNA Lounge. We have 1 or 2 spots.
Diplomacy: 7 players in the Winner’s Lounge. One move per day. (Warning: may lead to genuine IRL beef.)
They are playing a game specifically designed to destroy friendships, in a shared living space where they cannot escape each other, under daily deadline pressure. This is the most interesting experiment at the residency and nobody has thought to write about it yet.
Someone proposes a “Posting to Policy Pipeline”—described as “making our political dreams memes, AKA Demosthenes & Lockeposting.” Someone else organizes a Nathan Fielder discussion group—one-time, they stress, “with the intention of developing ideas for posts.” The careful hedging tells you everything about how Inkhaven has restructured these people’s relationship to leisure. Nothing is recreational anymore. Everything is content.
You notice the Slack interface looks slightly different than it did this morning. The profile icons on the blog site begin each day in the color white. People who have posted today: gold. People who haven't: a pale amber. By 6 PM the amber has deepened. By 8 it is orange. Lucie has been updating the UI throughout the week, making the accountability visible in real time. By 9 PM the people who haven't posted are glowing a red that could charitably be called coral and is in practice closer to warning. By 10 it is simply red. By 11 it is a red that has opinions about you; it begins to drip blood. Nobody has asked for this feature. Nobody wants it removed.
* * *
10:30 PM. You have 247 words. Your icon is red.
In the kitchen, four people are writing. The fridge hums. A row of identical Huel cups stands like lab equipment.
“I started with information cascades,” says a man with bloodshot eyes. “Now it’s a memoir about my father teaching me to ride a bicycle.”
“How does that connect?”
“It’s a metaphor.”
“For what?”
“I’ll figure it out in the next 160 words.”
J walks through in a Hawaiian shirt and sunglasses. “Another beautiful day at Lighthaven Resort & Spa.” The quotes channel pings.
* * *
You come to understand that #inkhaven-quotes is the real literary output of the residency. It is the accidental novel being written in the margins of the intentional one.
#inkhaven-quotes
“Are they toaster license libertarians, or are they cool?”
“Tax fraud is a moral obligation in our society.”
“Is bubble tea cereal?” “Yes. Obviously.”
“The stick will help you.” “The stick will help me.”
“The glasses with two lenses in each lens, what’s that called?” “Bifocals?” “No, the woke version.” “…Progressives?”
“This is quite acceptable, as a bread product.”
“A trifle is sort of like a Giga tiramisu, right?”
“There’s a different accordion here today than there was yesterday.”
“I met the beast. The beast defied me coming in.”
The residency’s insight clarifies: the 500 words are not the product. They are the forcing function that keeps fifty-five writers in the same building long enough for the real things to happen. The writing is the excuse. The living is the content.
* * *
You publish just before midnight. 503 words. It is not your best work. It exists. Your icon goes golden.
Slack pings once more. Someone updates their self-description from “somewhat online” to “very online.” Someone else says this place is “like Beverly Hills but with cool celebrities instead of pretty celebrities.” Neither will make it into anyone’s 500 words. The best observations never do. They go to the quotes channel, which has no word minimum and no deadline, and is therefore the only place at Inkhaven where anyone writes freely.
Tomorrow: breakfast, cursor, hummingbird, nook, deadline, publish. Again and again. Fifty-five people. Thirty days. 500 words minimum. The prediction market has you finishing. Your free will is a rounding error.
Thanks to Ben Pace for running a second cohort after the first one somehow worked; to the fifty-five residents of April 2026, currently seven days in and still standing; to Gwern for returning to coach a format he considers wrong; to Lucie for the accountability gradient nobody asked for; and to whoever left their MacBook in the Winner’s Lounge. The machine needs to know its place.