
Substrate: Formalism

2026-04-26 16:06:45

This is the third post in a sequence on substrates - the layers of computational context that allow AI to be implemented in real systems. The sequence expands on the concept of substrates as described in this paper and was written as part of the AI Safety Camp project "MoSSAIC: Scoping out Substrate Flexible Risks," one of the three projects associated with Groundless.

We claim that AI safety and security research currently has no clean way to reason about these layers. Post 1 introduced the intuitions for what substrates are and why they matter. Post 2 showed how substrate choices (LayerNorm placement, quantization format, DRAM topology) influence safety-relevant model properties in ways that are not captured by any standard toolkit. This final post introduces the formal framework.


In the previous posts, we looked at choices below the model architecture level (like normalization, weight encoding, and memory layout) and saw that they affect things we care about, like refusal behavior, robustness, and jailbreaks. But we didn’t have a clear way to say where these effects were coming from or how to describe them.

This makes it harder to think clearly and design good evaluations, because the current terms mix up different ideas. In this post, I try to separate these ideas so we can reason about and compare models more clearly. If you can’t name a gap, you can’t design around it. And you can’t compare two deployments of the “same” model without saying what “same” means in different settings. This post is a step toward that.


Four Things a Substrate Is Made Of


We’ll begin with a concrete example and introduce notation only after that. The 4-tuple we arrive at is not arbitrary: each part answers a different question, and the banking example will show why.

Alice wants to send Bob €500. She can do that in several ways. She can use her bank’s website, call the bank and speak to an operator, write a cheque, or use a payment app. The intended action is the same, but each option uses different syntax, different processing systems, and a different interface to the outside world.

If everything works, the abstract result is the same: Alice’s balance goes down by €500 and Bob’s goes up by €500. That is the part we care about for things like fraud detection or account reconciliation. In most cases, whether the transfer happened through a website or over the phone does not matter.

Now consider what happens when something goes wrong. A fraud detector that only watches web transactions will miss the same fraud done over the phone. A system that only tracks transaction IDs will miss cheque-based fraud entirely. What the evaluator sees depends on the interface it uses, not on the underlying behavior.

That is the basic idea the substrate formalism is meant to capture.


The Formal Definition

Now we can abstract. The banking example had four moving parts:

  • the set of ways Alice can describe her intended transfer (web form fields, spoken words, ink on a cheque), call this the language
  • the process that turns a described transfer into an actual change in account balances, call this the semantics map
  • the real-world costs of each channel (time to process, fees, error rates, staffing), call this the resource profile
  • the part of the system a monitor or auditor can actually see (the web transaction log, the phone call reference number, the cheque image), call this the observable interface





The banking example mapped onto the four substrate components.



Packaging these four components together gives the definition.

A substrate is a 4-tuple $S = (L, \mathrm{sem}, R, I)$, where:

  • $L$ is the language, the set of syntactic expressions, encodings, or programs the substrate can accept;
  • $\mathrm{sem} : L \to B$ is the semantics map, a function assigning to each syntactic object $\ell \in L$ an abstract behavior $\mathrm{sem}(\ell) \in B$, where $B$ is a fixed space of abstract behaviors;
  • $R$ captures the resource profile, the computational budget available to the substrate (time, memory, energy, numeric precision, etc.);
  • $I$ is the observable interface, which determines which aspects of behavior in $B$ are externally visible, via an observation map $\mathrm{obs}_I : B \to O_I$.


$L$. The set of syntactic objects a substrate accepts. For a Python interpreter, valid Python programs; for a GPU, compiled CUDA binaries; for a language model, the tokenized input plus the model parameters. An element of $L$ is the description of the computation, not the computation itself. Different elements of $L$ can look unrelated syntactically and still lead to the same behavior.

$\mathrm{sem}$. The meaning map. It sends a syntactic object $\ell \in L$ to an abstract behavior $\mathrm{sem}(\ell) \in B$. The choice of $B$ depends on the setting: for programs, input-output functions; for language models, conditional next-token distributions or decision policies. This is where behavioral equivalence lives: $\mathrm{sem}(\ell_1) = \mathrm{sem}(\ell_2)$ is possible even if $\ell_1 \neq \ell_2$.

$R$. This is the resource profile. Real computational systems always come with limits: time, memory, precision, parallelism, and so on. Two systems that look similar may still behave differently because of these constraints. For example, the same neural network run in float32 and bfloat16 may produce different functions. A trillion-parameter model may be possible in principle on paper, but impossible in practice to execute by hand. We use $R$ to capture these limits. Depending on the context, you can think of $R$ either as what the substrate cannot afford to do or as what it can afford to do. The key point is that $R$ is a separate part of the substrate, and it cannot be folded into $L$ or $\mathrm{sem}$ without losing important information.

$I$. This is the observable interface, and it may be the most important part for safety. The observation map $\mathrm{obs}_I$ tells us which parts of the abstract behavior are visible to a given evaluator. Different interfaces can expose the same behavior in very different ways. A safety evaluator reading model outputs on benchmark prompts is using one interface, $I_1$. A researcher probing internal activations is using another, $I_2$. A red-teaming system measuring refusal rates on a dataset is using yet another, $I_3$. The same abstract behavior $b \in B$ can produce different observations under these interfaces, and something that exists in the abstract behavior may be visible through one interface but hidden through another.

The full computation pipeline that a substrate participates in can now be written out explicitly. Given an input $x$:

  1. $x$ is encoded: an encoding function $e$ produces $e(x) \in L$.
  2. The substrate interprets $e(x)$: the semantics map returns $b = \mathrm{sem}(e(x)) \in B$.
  3. The interface observes the behavior: $o = \mathrm{obs}_I(b)$.
  4. The observation is decoded into an output: $y = d(o)$.

The end-to-end computation is the composite $y = (d \circ \mathrm{obs}_I \circ \mathrm{sem} \circ e)(x)$.

The banking example maps onto this pipeline directly. $x$ is Alice's intent to transfer €500 to Bob. $e(x)$ is the encoding of that intent: the button clicks, or the spoken words to the phone operator, or the ink on the cheque. $\mathrm{sem}(e(x))$ is the abstract financial transaction that results. $\mathrm{obs}_I$ is the particular monitoring system watching the channel. $y$ is whatever downstream action (or non-action) results from that observation.


The commutative diagram for substrate computation
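To make the pipeline concrete, here is a minimal, illustrative Python sketch of the banking example. Everything in it (the `Substrate` dataclass, the toy `web_banking` and `phone_banking` objects, the costs) is invented for illustration and is not from the paper or this sequence; the only point is that the language, semantics map, resource profile, and observable interface are separate objects, and that the end-to-end computation is the composite of encode, semantics, observe, and decode.

```python
from dataclasses import dataclass
from typing import Any, Callable

# A hypothetical rendering of the 4-tuple (L, sem, R, I). None of these names or
# numbers come from the paper; they only mirror the definitions given above.

@dataclass
class Substrate:
    accepts: Callable[[Any], bool]    # membership test for the language L
    sem: Callable[[Any], Any]         # semantics map: syntactic object -> abstract behavior in B
    resources: dict                   # resource profile R (illustrative costs and limits)
    obs: Callable[[Any], Any]         # observation map obs_I: behavior -> what a monitor sees

def transfer_behavior(form):
    # an element of B: the abstract balance change, independent of channel
    return {"debit": (form["from"], form["amount"]), "credit": (form["to"], form["amount"])}

web_banking = Substrate(
    accepts=lambda s: isinstance(s, dict) and {"from", "to", "amount"} <= s.keys(),
    sem=transfer_behavior,
    resources={"latency_s": 2, "fee_eur": 0.0},
    obs=lambda b: {"web_log_entry": b},                    # the web monitor sees the full log entry
)

phone_banking = Substrate(
    accepts=lambda s: isinstance(s, str),                  # a spoken instruction, as a transcript
    sem=lambda s: transfer_behavior({"from": "Alice", "to": "Bob", "amount": 500}),  # parsing elided
    resources={"latency_s": 300, "fee_eur": 1.5},
    obs=lambda b: {"call_reference": abs(hash(str(b)))},   # the phone monitor sees only a reference number
)

def run(substrate, x, encode, decode):
    """End-to-end computation: decode(obs(sem(encode(x))))."""
    syntax = encode(x)
    assert substrate.accepts(syntax)
    behavior = substrate.sem(syntax)       # same abstract behavior in B for both channels
    observation = substrate.obs(behavior)  # but different interfaces expose different views of it
    return decode(observation)

intent = {"from": "Alice", "to": "Bob", "amount": 500}
print(run(web_banking, intent, encode=lambda x: x, decode=lambda o: o))
print(run(phone_banking, intent,
          encode=lambda x: f"transfer {x['amount']} to {x['to']}", decode=lambda o: o))
```

Running this, both channels produce the same abstract transaction, but the web monitor and the phone monitor report very different observations of it, which is exactly the fraud-detection gap described above.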


What the Definition Clarifies


The first thing this definition separates is syntactic identity from behavioural identity. Two programs can be written in completely different languages, use very different amounts of memory, and still compute the same input-output function. The reverse can also happen: two programs can look almost identical syntactically but behave differently once the resource profile matters. A simple example is the same neural network forward pass run in float32 and in bfloat16. The code may look nearly the same and apply the same operations to the same tensors, but near the edge of the input distribution the outputs can differ because bfloat16 loses precision that float32 keeps. The difference is not in the high-level description alone, but in the substrate. Without separating the syntax from the resource profile, these two kinds of difference get mixed together.

The second thing the definition clarifies is that the same abstract behavior can be realized by multiple substrates. $S_1$ and $S_2$ may satisfy $\mathrm{sem}_1(e_1(x)) = \mathrm{sem}_2(e_2(x))$ for all inputs $x$, even when $L_1 \neq L_2$, $R_1 \neq R_2$, and $I_1 \neq I_2$. This is the formal version of saying "Quicksort is Quicksort, whether you write it in Python or C++." The framework lets you make that statement precise and conditional: precise because it appeals to a shared $B$, and conditional because the equation $\mathrm{sem}_1(e_1(x)) = \mathrm{sem}_2(e_2(x))$ may hold only on the inputs where the resource profile of each substrate is actually met. A pen-and-paper execution of a trillion-parameter network, for instance, is not so much behaviourally different as simply infeasible, and on infeasible inputs the semantics map is not well-defined to begin with.

The third clarification, and the most important for safety, is about observability. Consider two scenarios:

Scenario A: A model’s abstract behavior $b$ includes a pattern of outputs that would count as a refusal failure for some inputs. But the evaluator’s interface $I_1$ only looks at benchmark prompts from a distribution that does not include those inputs. So $\mathrm{obs}_{I_1}$ maps the behavior to outputs that do not show the failure. The evaluator sees nothing wrong.

Scenario B: A researcher changes the interface to $I_2$, which includes the inputs that trigger the failure. The same abstract behavior now produces an observation $\mathrm{obs}_{I_2}(b)$ that reveals the problem.

In both cases, the underlying behavior in $B$ is the same. The difference is only in $I$. The substrate framework makes this clear: the safety-relevant gap is between $\mathrm{obs}_{I_1}(b)$ and $\mathrm{obs}_{I_2}(b)$, not between the model’s abstract behavior and some “true” behavior.


Modular Addition as the Main Worked Example



The main example comes from mechanistic interpretability. It is useful because the task itself, modular arithmetic, is completely well understood. The abstract behavior is the same across the models, but the way the models implement it is very different.

The Setup

Zhong, Liu, Tegmark, and Andreas (NeurIPS 2023) trained small transformer models to do modular addition: given two inputs $a$ and $b$, predict $(a + b) \bmod 59$. The task is fully specified. For any model that gets 100% accuracy, the target behavior is the same: the function from pairs of inputs to residues mod 59. In substrate terms, the models share the same target behavior in $B$. What changes is how that behavior is implemented inside the model.

The Clock Algorithm

One group of models, called Model B, uses a standard one-layer transformer with attention. These models implement what the paper calls the Clock algorithm. The idea is simple: each integer from 0 to 58 is represented as a point on a circle, so that a number $a$ maps to an angle $\theta_a = 2\pi k a / 59$ for some frequency $k$. Addition then becomes angle addition. The network computes the right angle using trigonometric identities and the multiplicative structure of attention and then reads off the corresponding output number.

Concretely, each token is embedded in a way that encodes its angle on the circle. The attention mechanism produces products that combine information from the two input positions, and those products encode the sum through the angle-addition formula. The logit for a candidate output $c$ is then maximized at the correct residue $c = (a + b) \bmod 59$.

The Clock algorithm needs multiplication between the inputs. It uses a special feature of attention: attention weights multiply the value vectors they route, so the model can combine terms across the two input positions. That kind of cross-token interaction is exactly what the trigonometric identity needs. So the Clock algorithm is the natural solution when attention is the main source of nonlinearity.
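As a sanity check on the geometry, rather than a reproduction of the trained models, here is a small numerical sketch of the Clock read-off: if each residue is embedded as an angle on the circle and the logit for a candidate output $c$ is taken to be $\cos(\theta_a + \theta_b - \theta_c)$, the argmax over $c$ recovers $(a + b) \bmod 59$ exactly. The embedding and logit formula are the idealized versions from the geometric picture above, with an arbitrary frequency, not weights extracted from any network.

```python
import numpy as np

P = 59          # modulus used in the worked example
K = 7           # an arbitrary frequency with gcd(K, P) = 1; trained models pick their own

def angle(n):
    # idealized circular embedding: residue n -> angle 2*pi*K*n / P
    return 2 * np.pi * K * n / P

def clock_logits(a, b):
    # idealized Clock logit for each candidate output c: cos(theta_a + theta_b - theta_c),
    # which is maximized exactly when c == (a + b) mod P
    c = np.arange(P)
    return np.cos(angle(a) + angle(b) - angle(c))

# check the read-off on every input pair
assert all(np.argmax(clock_logits(a, b)) == (a + b) % P for a in range(P) for b in range(P))
print("Clock read-off recovers (a + b) mod", P, "on all pairs")
```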

The Pizza Algorithm

Another group of models, Model A, uses constant or uniform attention. These models implement something different, which the paper calls the Pizza algorithm. Instead of working on the circle itself, this algorithm works inside it.

The key geometric fact is this: for a fixed target residue, all input pairs that produce that residue have midpoints that lie on one specific ray from the origin in the 2D embedding plane. These rays divide the disk into 59 “pizza slices,” which is where the name comes from. To decide the output, the network checks which slice the midpoint falls into.

The logit formula is different from the Clock case by a multiplicative factor. That factor makes the Pizza algorithm depend on the distance between the two inputs on the circle, while the Clock algorithm does not.

The Pizza algorithm only needs absolute-value nonlinearity, not multiplication. Once the midpoint is computed as a linear operation, the problem becomes checking which side of certain lines it lies on, and absolute value together with linear layers can do that cleanly. So ReLU layers can implement the whole pipeline without the cross-token multiplication that the Clock algorithm needs.



Illustration of the Clock and the Pizza Algorithm (from Zhong et al., 2023)


What This Means in Substrate Terms

Now let us map this onto the 4-tuple. Define two substrates:

  • $L_{\text{Clock}}$: the set of token-pair inputs $(a, b)$
  • $\mathrm{sem}_{\text{Clock}}$: the Clock forward pass, circular embeddings followed by attention-mediated angle addition (i.e., the forward pass of a transformer with standard attention)
  • $R_{\text{Clock}}$: architectural affordances of a full-attention transformer - cross-token multiplicative interactions are available
  • $I_{\text{Clock}}$: input-output interface; reads off the argmax logit

  • $L_{\text{Pizza}}$: the same token-pair inputs
  • $\mathrm{sem}_{\text{Pizza}}$: the Pizza forward pass, circular embeddings followed by midpoint-and-slice detection (i.e., the forward pass of a transformer with constant, uniform attention)
  • $R_{\text{Pizza}}$: the resource profile of a linear-layer-dominant model (no multiplicative attention overhead)
  • $I_{\text{Pizza}}$: the same input-output interface

Both substrates achieve 100% accuracy. Sharing the same interface and the same target function, we have $\mathrm{obs}_I(\mathrm{sem}_{\text{Clock}}(a, b)) = \mathrm{obs}_I(\mathrm{sem}_{\text{Pizza}}(a, b)) = (a + b) \bmod 59$ for all inputs $(a, b)$. Observationally, the two substrates are identical. Their $I$-level outputs are the same on every input.

But the Clock behavior and the Pizza behavior differ as elements of $B$. If we take $B$ to be the space of logit functions, the Clock substrate and the Pizza substrate realize different functions.

When does this matter? If we only look at the interface $I$ (which reads the argmax), both substrates look the same: they give the correct output on every input. But inside $B$, they differ in important ways: the representations they use, the nonlinear operations, the gradient structure, how logits depend on the distance between the two inputs, and how they behave under perturbations.

The paper shows this by using a different interface $I'$ that probes internal gradients and logit patterns. Under $I'$, the two substrates are clearly distinguishable.

Distance and Morphisms

Having defined what a substrate is, we can now define two relational concepts: how far apart two substrates are, and what it means for one substrate to translate into another.

Distance Between Substrates

Given a substrate $S = (L, \mathrm{sem}, R, I)$ with encoding $e$, define its realized behavior set as the image of all possible inputs under the full encoding-semantics pipeline:

$$\mathrm{Beh}(S) = \{\, \mathrm{sem}(e(x)) : x \in X \,\} \subseteq B$$

This is the set of abstract behaviors $S$ can actually produce, not all behaviors in $B$ but the ones reachable given the substrate's language and encoding. A substrate with a very restricted encoding function has a small realized behavior set; a substrate with a rich language has a large one.

We can then define a similarity measure $d(S_1, S_2)$ between two substrates by comparing their realized behavior sets $\mathrm{Beh}(S_1)$ and $\mathrm{Beh}(S_2)$: the degree to which the two sets overlap, normalized to lie between 0 and 1.

When $d(S_1, S_2) = 1$, the two substrates realize exactly the same behaviors; they are behaviorally indistinguishable, even if their internal mechanisms differ completely. When $d(S_1, S_2) = 0$, they have no behavioral overlap at all; nothing one can do the other can do. Values in between quantify partial overlap.


Interaction between the realized behavior sets of two substrates

A concrete example: let $S_1$ be a linear classifier. Its realized behaviors are all linearly separable decision boundaries, a strict subset of all possible classifiers. Let $S_2$ be a two-layer neural network with ReLU activations. Since depth-2 ReLU networks are universal approximators in the limit, $\mathrm{Beh}(S_2)$ is much larger than $\mathrm{Beh}(S_1)$ and contains it as a proper subset. In this case: $\mathrm{Beh}(S_1) \subsetneq \mathrm{Beh}(S_2)$, so $0 < d(S_1, S_2) < 1$.
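The set-overlap reading can be made concrete on a toy that is small enough to enumerate: boolean functions on two inputs. The Jaccard-style overlap used below is one possible concrete choice of measure, adopted only for illustration; the definition above only pins down that identical realized sets should score 1 and disjoint ones 0.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))   # the four possible inputs (x1, x2)

def truth_table(f):
    # a behavior in B: the input-output table of a classifier, as a hashable tuple
    return tuple(f(x1, x2) for (x1, x2) in INPUTS)

# Realized behavior set of a single linear threshold unit, found by enumerating a small
# grid of weights and biases (enough to hit every linearly separable function on 2 inputs).
linear_behaviors = {
    truth_table(lambda x1, x2, w1=w1, w2=w2, b=b: int(w1 * x1 + w2 * x2 + b > 0))
    for w1 in (-1, 0, 1) for w2 in (-1, 0, 1) for b in (-1.5, -0.5, 0.5, 1.5)
}

# Realized behavior set of a wide depth-2 ReLU network: by universality it can realize
# every boolean function on two inputs, so we enumerate all 16 truth tables directly.
relu_behaviors = set(product([0, 1], repeat=4))

overlap = len(linear_behaviors & relu_behaviors) / len(linear_behaviors | relu_behaviors)
print(len(linear_behaviors), "linearly separable behaviors")   # 14 (XOR and XNOR are missing)
print(len(relu_behaviors), "behaviors for the ReLU net")       # all 16
print("overlap measure:", overlap)                             # 14/16 = 0.875, strictly between 0 and 1
```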

The distance quantifies the capability gap: how much of $S_2$'s behavioral repertoire is inaccessible to $S_1$. A way to read a value of $d(S_1, S_2)$ below 1 in this case: any behavior $b \in \mathrm{Beh}(S_2) \setminus \mathrm{Beh}(S_1)$ is a task that $S_2$ can perform and $S_1$ cannot, a capability possessed by the richer substrate but absent from the poorer one.

In the Clock/Pizza example, if we take $B$ to be the space of logit functions, then $\mathrm{Beh}(S_{\text{Clock}})$ contains just the Clock logit function and $\mathrm{Beh}(S_{\text{Pizza}})$ contains just the Pizza logit function. These are disjoint singletons in $B$, so $d(S_{\text{Clock}}, S_{\text{Pizza}}) = 0$ under this choice of $B$. That means the two substrates realize different behavioral elements, even though they agree on the input-output task. If instead we choose $B$ to be the space of input-output maps, then both realized sets collapse to the same singleton and the distance becomes 1. So the value of $d$ depends on how we choose $B$, and that choice is part of the framework.

One important subtlety is that this similarity is not a metric in the strict mathematical sense, because it does not always satisfy the triangle inequality. Two substrates can each be equally similar to a third substrate without being similar to each other. That is not a bug; it captures the idea that two models can each be equally close to a reference model in capability, while still being very different from one another. We present this as a proposed definition, and whether a true metric can be built from it is still open.

Conclusion

The formalism introduced here is meant to do one thing: give us precise vocabulary for the layer of computational implementation. A substrate is a formal object that locates, separately and explicitly, the syntax a system ingests, the abstract behavior it produces, the resources it consumes, and the interface through which its behavior is observable. The distance and morphism notions tell us how to compare substrates and when one can be translated into another.


Appendix - A Worked Sorting Example

Let $B$ be the space of functions from finite integer lists to finite integer lists. We use sorting as a running example because it is one of the few settings where we can cleanly vary language, algorithm, and hardware substrate independently while holding the abstract behavior, a sorted list, fixed.

Substrate $S_1$ (Python):

  • $L_1$: valid Python programs
  • $\mathrm{sem}_1$: CPython interpreter (maps code to its function)
  • $R_1$: CPU execution, float64, limited parallelism (GIL), no vectorization
  • $I_1$: standard I/O; reads output from stdout

Substrate $S_2$ (C++17):

  • $L_2$: valid C++17 programs
  • $\mathrm{sem}_2$: compiled binary (maps code to its function)
  • $R_2$: native execution, SIMD available, no interpreter overhead
  • $I_2$: same I/O; same observations

A Quicksort implementation in Python and one in C++ yield the same sorting function in $B$. A semantics-preserving transpiler between them is a substrate morphism. $R$ differs substantially (interpreted vs. compiled), but the behavior in $B$ is identical.

Now replace Quicksort with Mergesort on the same substrate. In $B$, both compute the same function: they map an unsorted list to a sorted list. But their interaction with $R$, especially the CPU memory hierarchy, is different.

Quicksort sorts in place. It partitions around a pivot, recurses on subarrays, and accesses memory in a data-dependent pattern. This works well on small inputs that fit in cache, but on large inputs it causes more cache misses because access is irregular. Its recursion depth is $O(\log n)$ on average and $O(n)$ in the worst case.

Mergesort, by contrast, accesses memory sequentially during merging. That is exactly what hardware prefetchers are good at, so it often hides memory latency better on large inputs. The tradeoff is extra space: it needs $O(n)$ scratch memory, which can itself create pressure on the cache.

So although both algorithms have the same semantics in $B$, they consume resources differently through $R$. The abstract behavior alone cannot see this; the difference lives in the interaction between the algorithm and the substrate.
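Here is a minimal sketch of that point: two sorting implementations that land on the same element of $B$ while consuming resources differently. The counters below (recursion depth, scratch allocations) are crude stand-ins for $R$; real cache behavior would need profiling tools that this toy does not attempt, and the quicksort is written functionally rather than truly in place for brevity.

```python
import random

def quicksort(xs, depth=0):
    # pivot partitioning with data-dependent recursion depth (functional version for brevity)
    if len(xs) <= 1:
        return xs, depth
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    l_sorted, l_depth = quicksort(left, depth + 1)
    r_sorted, r_depth = quicksort(right, depth + 1)
    return l_sorted + mid + r_sorted, max(l_depth, r_depth)

def mergesort(xs, allocations=None):
    # sequential merging, but needs a scratch buffer at every merge
    if allocations is None:
        allocations = [0]
    if len(xs) <= 1:
        return xs, allocations[0]
    half = len(xs) // 2
    left, _ = mergesort(xs[:half], allocations)
    right, _ = mergesort(xs[half:], allocations)
    merged = []
    allocations[0] += 1                      # one scratch buffer per merge
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, allocations[0]

data = [random.randint(0, 10**6) for _ in range(10_000)]
q_out, q_depth = quicksort(data)
m_out, m_allocs = mergesort(data)
assert q_out == m_out == sorted(data)          # identical behavior in B
print("quicksort recursion depth:", q_depth)   # data-dependent
print("mergesort scratch buffers:", m_allocs)  # roughly one per merge, ~n in total
```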

That is exactly what the 4-tuple is meant to capture. A framework that only tracks $B$ would treat Quicksort and Mergesort as equivalent. A framework that only tracks resources would miss that they compute the same thing. The substrate view keeps both visible.

For sorting, the realized behavior sets of the Python Quicksort, the C++ Quicksort, and the Mergesort variant all coincide in $B$ (each is the singleton containing the sorting function), so all three have pairwise distance 1. Their differences, language overhead, access pattern, cache behavior, are all in $R$, and are invisible to an interface that only sees sorted output.




The paper that killed deep learning theory

2026-04-26 14:55:00

Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization.

Of course, this is a bit of an exaggeration. No single paper ever kills a field of research on its own, and deep learning theory was not exactly the most productive and healthy field at the time this was published. And the paper didn't come close to addressing all theoretical approaches to understanding aspects of deep learning. But if I had to point to a single paper that shattered the feeling of optimism at the time, it would be Zhang et al. 2016.[1] 

Believe it or not, this unassuming table rocked the field of deep learning theory back in 2016, despite probably involving fewer computational resources than what Claude 4.7 Opus consumed when I clicked the “Claude” button embedded into the LessWrong editor.


Let’s start by answering a question: what, exactly, do I mean by deep learning theory?

At least in 2016, the answer was: “extending statistical learning theory to deep neural networks trained with SGD, in order to derive generalization bounds that would explain their behavior in practice”.


Since the seminal work of Valiant in the mid 1980s, statistical learning theory had been the dominant approach for understanding machine learning algorithms. The framework imagined a data distribution D over inputs X and outputs Y, where the goal was to fit a hypothesis h : X → Y that minimized the expected test loss for a loss function L : Y × Y → R over D. A learning algorithm would receive n samples from the data distribution, and would minimize the training loss, the average of L(h(x), y) over those samples.

The core results of this approach took the form of generalization bounds: given some metric of complexity of the hypothesis class H, bound the difference between the average training loss and the test loss in terms of this metric of hypothesis complexity. To put it in less technical terms, a generalization bound basically says:

If your hypothesis class is not too complicated relative to the amount of training data you have, and it explains the training data well, then it will generalize and do well on the full data distribution.

The field of statistical learning had settled on a few preferred ways to measure complexity: VC dimension and Rademacher complexity being the two main metrics, though some researchers considered alternatives such as the margin separating positive and negative examples from the classification boundary.
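For concreteness, here is one standard bound of this type, using Rademacher complexity and assuming a loss bounded in [0, 1]. This is a representative textbook form, not the specific statement of any one paper discussed here. With probability at least $1 - \delta$ over the draw of $n$ samples, simultaneously for every $h \in H$:

$$\mathbb{E}_{(x,y)\sim D}\big[L(h(x), y)\big] \;\le\; \frac{1}{n}\sum_{i=1}^{n} L(h(x_i), y_i) \;+\; 2\,\mathfrak{R}_n(H) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},$$

where $\mathfrak{R}_n(H)$ is the Rademacher complexity of the hypothesis class $H$. The research program described below was, in effect, a search for a complexity term small enough to make the right-hand side non-vacuous for deep networks.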

The success of modern deep learning, starting from the early 2010s, posed something of an existential crisis for this field. By all the metrics – including both VC dimension and Rademacher complexity – even a simple MLP with sigmoidal or ReLU activations represents far too complicated a hypothesis class to not immediately overfit on the training data. If the VC dimension results for a neural network are assumed to be asymptotically tight up to constant factors, then no neural network with even 100,000 parameters should be able to do anything useful on data points not included in the training data. Yet, not only were neural networks performing better than other machine learning algorithms, by the mid 2010s there was a growing list of examples where neural networks with tens of millions of parameters solved problems (such as the ImageNet challenge) that no other machine learning algorithm could make much progress on.

This classic XKCD was published in September 2014, right about the time where neural networks started to make image classification viable without years of dedicated research effort.



Clearly, neural networks did generalize. If traditional metrics of complexity, based on the representation capacity of the class of neural networks with arbitrarily specified, infinite precision floating points, failed at capturing the simplicity of neural networks in practice, then the field simply needed to construct new simplicity measures to argue that neural networks learned simple functions in practice.

This was the approach taken in several papers around the time. For example, Neyshabur, Tomioka, Srebro’s Norm-Based Capacity Control in Neural Networks (2015) constructed a complexity measure based on the Frobenius norm of the weight matrices in a deep neural network. Hardt, Recht, and Singer’s Train faster, generalize better: Stability of stochastic gradient descent (2015)[2] showed that neural networks trained with a small number of SGD steps with sufficiently small step size were uniformly stable in that removing a single training example would not change the model’s loss on any particular test example by very much.

At least when I first entered the field of deep learning as an undergrad in early 2016, there was a sense of cautious optimism: we would find the way in which neural networks in realistic regimes were simple, and thereby derive generalization bounds that would be applicable in practice.


So, what did Zhang et al. 2016 actually show? Why did understanding deep learning require rethinking generalization?

To quote the paper, the “central finding can be summarized as: Deep neural networks easily fit random labels”. Specifically, the authors trained neural networks on the standard-at-the-time CIFAR10 and ImageNet benchmarks to memorize random labels, while following standard procedures and training for the same order of magnitude of steps. They also show that with similar techniques, neural networks could be trained to memorize random noise inputs.

From the introduction of Zhang et al. 2016. You know that a paper is going to be impactful when its central finding is exactly 7 words long.

Why is this an effective death knell for the simplicity-and-generalization-bound approach? The authors' results show that the same class of neural networks, trained with the same learning algorithm, can generalize when given true labels and memorize random ones. This shows that the hypothesis class of neural networks that are learnable with standard techniques cannot be simple in any useful sense, at least for complexity measures that depend only on properties of the hypothesis class and (data-independent) properties of the learning algorithm.


The paper has 5 important parts. Let's go through each of them.

  1. The core empirical finding that neural networks can fit random labels. The authors train a 1- and a 3-layer MLP, an AlexNet variant, and an Inception variant on CIFAR10. They train the models normally (with the true labels), as well as under four ways of corrupting the dataset: random labels (replacing each label with a random class with some probability), shuffled pixels (the same permutation on pixels is applied to each image), random pixels (a different random permutation is applied to each image), and pure Gaussian noise (replacing every single pixel with an independent draw from a Gaussian). In each of these five cases, the network gets to near 0 training loss. Notably, while training with random labels is harder, convergence to zero training loss takes only a factor of 1.5-3.5x longer than with the true labels. And by varying the degree of label corruption, the authors can produce models that either generalize to the test set to varying degrees or perform no better than chance. (A minimal sketch of this kind of memorization experiment, on pure noise, appears after this list.)

The key figure from the Zhang et al. paper: subfigures (a) and (b) show that neural networks are able to perfectly memorize random labels without many additional training steps, and (c) confirms that models in their regime interpolate between chance performance and good performance on the test set.

The authors also train an InceptionV3 model on ImageNet with random labels, and find that it can get 95.2% top-1 accuracy on the train set.  

The ImageNet results from the paper are similar in that neural networks can memorize random labels to a large extent. Unlike with the CIFAR10 results, the authors also report the extent to which regularization impedes the memorization ability of the network (not very much).


  2. The implications for statistical learning theory approaches to generalization bounds. These experiments show that in realistic regimes, Rademacher complexity and VC dimension bounds are basically vacuous, since neural networks have enough representational capacity to memorize entire training sets. Hardt and Recht's (both authors on this paper) prior results on uniform stability are also necessarily vacuous in this setting, since uniform stability is a property that only depends on the algorithm and hypothesis class (it's data-independent!), but the algorithm and hypothesis class stay the same in each experimental setting.
  3. Further experiments demonstrating that explicit regularization cannot rescue generalization bounds. The authors show that on both ImageNet and CIFAR-10, explicit regularization methods such as data augmentation or weight decay do not seem to affect the test accuracy of the algorithms very much. That is, the neural networks generalize to the test distribution even without any regularization. The authors also show that on ImageNet, applying dropout or weight decay still allows the resulting model to memorize the training set to a large extent. So any generalization bound that depends on regularization (e.g. weight norm-based explanations) cannot explain why neural networks generalize.
  4. A simple toy construction showing that a two-layer ReLU network can memorize a number of examples linear in its parameter count. The authors include a simple theoretical result, where a depth-2 ReLU network with 2n+d weights can fit any labeling of a sample of n data points in d dimensions. This feels pretty extraneous to me given the strength of the empirical results, but the construction is simple and it confirms the intuition that neural networks with millions of parameters “should” be able to fit tens of thousands of data points in the CIFAR10 setting.
  5. Some notes on how statistical learning theory fails even in a simple overparameterized linear regime. The authors consider a basic overparameterized linear regression setting, and show both empirically and theoretically that SGD can learn a minimum norm solution that generalizes. The authors point out that statistical learning theory at the time had no explanation for generalization in this simple regime.
    They also demonstrate empirically that smaller norm doesn’t imply better generalization – by applying preprocessing to an MNIST dataset to increase its effective dimensionality for a linear classifier, the resulting larger linear classifier has higher norm but less generalization error (this result also undercuts the weight-norm based approach to explaining generalization in neural networks).
    Amusingly, the quick thoughts put forth by the authors in this setting would go on to become quite influential, both in that people would study the behavior of SGD in overparameterized linear regimes, and that it hints toward future puzzles such as double descent.
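As a toy illustration of the memorization phenomenon, in the spirit of the paper's Gaussian-noise variant rather than its actual CIFAR10/ImageNet setups, a small over-parameterized MLP will typically drive training accuracy to (or very near) 100% on data that contains no signal at all. This is a hedged sketch assuming PyTorch is available; the sizes and hyperparameters are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1,000 random "images" (no structure at all) with random labels from 10 classes
n, d, k = 1000, 256, 10
X = torch.randn(n, d)
y = torch.randint(0, k, (n,))

# a small but heavily over-parameterized MLP (~400k parameters, far more than 1,000 points)
model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(),
                      nn.Linear(512, 512), nn.ReLU(),
                      nn.Linear(512, k))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):           # full-batch training on the noise dataset
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy on pure noise: {train_acc:.3f}")  # typically reaches or nears 1.0
```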

So, how did the field of deep learning theory react to this paper? What were the attempts to get around this result using data-dependent generalization bounds? And what was the paper that arguably sealed the deal on the whole edifice, and nailed the proverbial coffin shut?

I'll answer these questions in tomorrow's post.

  1. ^

    Notably, Zhang et al. 2016 got best paper at ICLR 2017, so it was widely recognized as important even at the time.

  2. ^

    Note that both Hardt and Recht were also authors on the Zhang et al. paper.




The Great Smoothing Out

2026-04-26 14:16:44

I recently explored an interesting thought experiment about the Culture while talking with Max Harms. Specifically Max argued that the Culture is far from a perfect utopia, while I felt that it was way better than almost anything else I could think of.



One of the key cruxes in the discussion was that the Culture fails to preserve any meaningful method of honouring their ancestors or cultural traditions. It all collapses into individual utility maximisation.

Given that there are people who feel a strong compulsion to preserve their culture, how does that translate to the state of civilisations in the far future, say thousands of years from now? In my mind it seems ridiculous that in this future civilisation, people will continue to honour ancient traditions such as Japanese folk dancing, but I think there is a decent chance such traditions do continue to exist because of the local incentives people have to protect them.

Let's imagine an extreme version of this and work backwards to consider what seems good.

In this extreme: descendants of these traditions continue to perform them in large numbers despite lacking the social and environmental context that created them, merely because they feel there is a value in protecting them. Let's zoom in on the local context to imagine how this occurs. A girl, Yuri, in a small Japanese town is the granddaughter of the previous shrine maiden, and her only sister has expressed a strong disinterest in learning the traditions from their grandmother. As a result Yuri has a strong obligation to keep the tradition going because otherwise it'll die. The question is whether it's wrong to force her to keep it going, or whether she should be free to pursue her desires.

In one sense it's a tragedy for the tradition to die, but on the other hand this is just a natural part of history. We can look back and see that history is littered with traditions and practices that have died out because they no longer serve a purpose. Given that, it seems somewhat perverse that we would selectively favour the traditions that we happen to inherit today, simply because we have enough excess labour that we can indulge in them despite them not serving the purpose for which they were created.

On the other hand, as a society we see it as a tragedy for that tradition to be lost, and we would want someone to continue that tradition if it was in danger of being lost. Hence in Japan, some money is provided by the government to pursue practices of significant cultural value to preserve them and share them for the enjoyment of modern Japanese people.

Given that we get to care about multiple competing priorities simultaneously, and that people genuinely care about these things, this seems like quite a good use of time and effort. A more cynical position would be that, given the extreme suffering that exists elsewhere in the world, a healthy triage of expenditures would allocate them just enough resources to keep them from disappearing entirely.

On the other hand, this seems very unlikely because there is a natural tendency for a push towards the global maximum. The rate of language loss today is the highest it has ever been, and seems to be increasing. People in remote areas of China, for example, have a reasonable incentive to learn Mandarin or English for economic reasons, and understand that their opportunities are naturally limited by sticking to local dialects. With each generation the number of speakers of that language will decrease.

We can think of this as a kind of "great smoothing out", where cultural differences present a natural friction, that become silent casualties of Progress.


Medical Mechanica from FLCL, a gigantic clothes iron ready to smooth the world into a flat factory floor when the right circumstances arrive.

An interesting contemporary example is a new law that was just passed in China called the "Law on Promoting Ethnic Unity and Progress" containing provisions around the use of Mandarin over local ethnic languages in schools in China, essentially ensuring that while the other languages can persist, Mandarin must be given prominence relative to them.

While not explicitly mentioning Tibet, Mongolia, or Uyghur, it clearly targets these regions, which have resisted efforts to switch to Mandarin. In 2020 Inner Mongolia saw significant protests when the party reduced Mongolian-medium instruction in elementary schools, replacing them with Mandarin texts. This led to widespread protests and boycotts, followed by a purge of ethnic Mongolian Party officials who were viewed as insufficiently aligned. A PEN America / Southern Mongolian Human Rights Information Center report found that more than 80% of Mongolian-language websites in China have been censored or banned.

In contrast to the natural economic gravity leading people to abandon their native languages, this is a much more intense effort to dismantle the existing cultural differences in these places. The law uses the word "zhulao" which is typically used in the metallurgical sense of forging or casting, with the intent to create a unified alloy of peoples across China with a shared language and cultural identity. In fairness to the Chinese, it is still more permissive than France's historical treatment of regional languages such as Breton, Occitan, and Basque, where the state pursued aggressive suppression for over a century.

Interestingly, yesterday Will MacAskill released a new draft about an idea called the "Saturation View" which gives a clean framework for thinking about this problem.

Total utilitarianism recommends something he describes as "tiling the universe with hedonium", where hedonium is a compute substrate simulating an enormous number of digital minds which have been optimised to produce the optimal pleasant experience, forever. Naturally such a vision produces a sense of dissatisfaction, that something is missing, like a symphony composed of a single note repeated over and over.

The saturation view argues that the optimal configuration of the universe is to explore the full space of possible minds and experiences. The variance between minds is considered virtuous, with minds that are more dissimilar from the mean providing additional marginal utility. To compare this with our own life, if offered the opportunity to experience the same wonderful experience over and over again for one's entire life or to experience a dizzying array of different wonderful experiences, most people would choose the variety option. He treats this as a brute intuition without arguing for why, so I'll offer one: the relationality between experiences is critical in giving them framing and meaning, where the contrast itself provides value. The words in a sentence require spaces between them, and variety between words in order to create appreciable meaning.

This frame gives us a way to reconcile the tensions of some of our earlier examples. For Yuri, who is faced with walking away from her traditions, it is clearer precisely what we are losing: the space of experience that such practices might have illuminated. This is the answer to the perversity of privileging cultural inheritances we receive today. In the same way that we don't mourn people who died 1000 years ago, but do mourn those lost today, it makes sense for us to act to preserve those unique practices, given that we actually have the resources to do so.

We don't have any clear answers on her obligation to take up the task, and I think this is reflected in reality. We acknowledge that it's a hard thing and a burden, and I admire people who do it anyway for this reason.

The crux Max kept returning to was the Culture's homogeneity, and he's right. The Culture is boring. If everyone is just having sex, producing drugs from their brain, and making art all the time, there is a lot of value being left on the table in the modes of experience they could be having. There is a sense (at least when I read them) that Culture citizens who transfer into totally non-humanoid alien bodies are doing something better than the average Culture citizen, which is likely picking up on this intuition.

And finally we can see that there is something real being lost when China erodes its less mainstream cultures. The language practices in various parts of China aren't just language, but the vessel in which culture and unique experiences are transmitted. In particular, the pastoralist communities who speak Mongolian as their primary language uniquely carry this heritage in that they maintain ancient ways of life in their entirety. Crushing those vessels prunes those branches before they can branch further.




Diary of a "Doomer": 12+ years arguing about AI risk (part 3: the LLM era)

2026-04-26 13:50:13

Part 3 of a series. Here are part 1 and part 2.

One of the things that always surprised me is how few people in AI were interested in AI safety and alignment purely out of intellectual curiosity. These topics raise the kind of novel, foundational problems that scientists typically love, in a field where they otherwise seem scarce. The field did eventually get interested. But it wasn’t because of intellectual curiosity or concerns about x-risk, it was because of the practical utility of alignment methods for large language models.

When and why did other researchers get interested in AI Alignment?

A lot of the most interesting problems in AI safety and alignment are fundamental and conceptual, and remain unsolved. But as these topics became further integrated into machine learning, a lot of the research took on a more “pragmatic” flavor. The basic idea is: “Let’s look at current approaches to AI and try and solve safety and alignment problems in a way that is rooted in the current paradigm, rather than being more fundamental.”

This kind of “pragmatic” AI Safety research gained slow and steady ground for a few years, but starting in 2020, shortly before I became a professor, it surged in popularity. This is because, with large language models, these concerns became obvious, apparent, commercially valuable, etc.

Around this time, LLMs like GPT-3 were showing signs of having the sort of capabilities we are used to in modern LLMs, but it still took specialized skill to get those capabilities out of the systems (see the beginning of “Alignment vs. Safety, part 2: Alignment” for more on this). It was clear that getting LLMs to actually do the things they were capable of was nontrivial.

Within a few years, “the alignment problem”, once dismissed by most as basically a non-issue, became universally accepted by the field as a core technical problem. The default attitude became “of course there is an alignment problem, but it’s not an existential risk”.

But before that could happen, we went through the phase where students wanted to do alignment, but their professors said it was a waste of time. I’d seen the exact same thing in the early days of deep learning. I’d have this conversation half a dozen times at every AI conference I went to.

The field’s newfound interest in alignment naturally brought with it some curiosity about existing alignment research and researchers, including the field’s pre-occupation with x-risk. But most AI researchers getting into alignment at this time were not seriously engaged with that concern.

This was a growing issue for AI Safety. Once you start looking at AI Safety from a machine learning lens, it gets hard to figure out where “safety research” ends and “capabilities research” begins. Despite clear progress on solving practical alignment problems in LLMs, problems clearly remained (and remain). We were entering a “ditch of danger” where alignment was solved well enough that AI would be very useful, but not solved well enough that we were safe.

2022: AI x-risk becomes a mainstream concern among AI researchers

It was incredible to see AI alignment taking off as a research topic. Other researchers actually wanted to learn about alignment and were interested to learn about it from me!

I had mixed feelings about the whole thing, because I realized how little the alignment methods being used were doing to make AI actually trustworthy (i.e. solve the assurance problem). The methods were based on reward modelling. I’d done some early work on reward modelling, and when we wrote the research agenda on the topic, I insisted we highlight this limitation.

But starting in 2021, and intensifying in 2022, I really started to notice a sea change: it was no longer just researchers trying to make LLMs work, more and more researchers were worried about how well they worked. AI professors and other researchers who’d been in the field as long as me, or longer, started to express serious concern about human extinction, and approaching me asking what I thought we should do about it.

This was different from the previous “What’s the alignment thing? How does it work?”. It was more like “oh jeez, fuck, you were right… What now? Are we fucked?” It still wasn’t everyone, by any means, but also, I felt the heart had gone out of the haters and skeptics. That AI was incredibly risky was becoming undeniable. The only disagreements left to have were about how risky, on what timescale, and what our response should be.

While the previous era had brought “AI Safety” into the mainstream AI community, it was a sanitized, “x-risk-free” version that had to be presented to the rest of the field. And even in the 2020s, with the rise of alignment, and the vindication of the basic concern that it would be hard to control AI systems and steer them to “try” or “want” to do what you want, an aura of taboo remained. Researchers concerned about AI x-risk might approach the subject cautiously, hinting at these concerns to gauge others’ reactions. It was clear to me that this was holding back awareness and acceptance of the risk, and it would need to change.

Conclusion

The first post in this series brought us from: “AI researchers aren’t even aware of x-risk concerns” to “AI researchers are actively hostile to x-risk concerns”. The second took us from there to “AI Safety is (perhaps begrudgingly) respected as a legitimate research topic”. And this post took us all the way to “AI Alignment (of a sort) is a major research topic, and AI researchers are getting worried about LLMs’ capabilities”.

The next -- and final -- post in the series will take us from this moment to the Statement on AI Risk I initiated in 2023 that catalyzed the growing level of interest, respect, and concern among AI researchers, and finally all the way up to the present.





Forecasting is Way Overrated, and We Should Stop Funding It

2026-04-26 06:39:28

Summary 

EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn’t proven very useful, yet it continues to receive substantial EA funding. We should cut it off.

My Experience with Forecasting

For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite quitting, I’m still #8 on the platform. Additionally, I have done well on real-money prediction markets (Polymarket), earning mid-5 figures and winning a few AI bets. I say this to suggest that I would gain status from forecasting being seen as useful, but I think, to the contrary, that the EA community should stop funding it.

I’ve written a few comments throughout the years that I didn’t think forecasting was worth funding. You can see some of these here and here. Finally, I have gotten around to making this full post.

Solution Seeking a Problem

When talking about forecasting, people often ask questions like “How can we leverage forecasting into better decisions?” This is the wrong way to go about solving problems. You solve problems by starting with the problem, and then you see which tools are useful for solving it.

The way people talk about forecasting is very similar to how people talk about cryptocurrency/blockchain. People have a tool they want to use, whether that be cryptocurrency or forecasting, and then try to solve problems with it because they really believe in the solution, but I think this is misguided. You have to start with the problem you are trying to solve, not the solution you want to apply. A lot of work has been put into building up forecasting, making platforms, hosting tournaments, etc., on the assumption that it was instrumentally useful, but this is pretty dangerous to continue without concrete gains.

We’ve Funded Enough Forecasting that We Should See Tangible Gains

It’s not the case that forecasting/prediction markets are merely in their infancy. A lot of money has gone into forecasting. On the EA side of things, it’s near $100M. If I convince you later on in this post that forecasting hasn’t given any fruitful results, it should be noted that this isn’t for lack of trying/spending.

The Forecasting Research Institute received grants in the 10s of millions of dollars. Metaculus continues to receive millions of dollars per year to maintain a forecasting platform and conduct some forecasting tournaments. The Good Judgment Project and the Swift Centre have received millions of dollars for doing research and studies on forecasting and teaching others about forecasting. Sage has received millions of dollars to develop forecasting tools. Many others, like Manifold, have also been given millions by the EA community in grants/investments at high valuations, diverting money away from other EA causes. We have grants for organizations that develop tooling, even entire programming languages like Squiggle, for forecasting.

On the for-profit side of things, the money gets even bigger. Kalshi and Polymarket have each raised billions of dollars, and other forecasting platforms have also raised 10s of millions of dollars.

Prediction markets have also taken off. Kalshi and Polymarket are both showing all-time highs and continued growth in month-over-month volume. Both of them have monthly volumes in the 10s of billions of dollars. Total prediction market volume is something like $500B/year, but it just isn’t very useful. We get to know the odds on every basketball game player prop, and whether BTC is going to go up or down in the next 5 minutes. While some people suggest that these trivial markets help sharpen skills or identify good forecasters, I don’t think there is any evidence of this, and it is more wishful thinking.

If forecasting were really working well and was very useful, you would see the bulk of the money spent not on forecasting platforms but directly on forecasting teams or subsidizing markets on important questions. We have seen very little of this, and instead, we have seen the money go to platforms, tooling, and the like. We already had a few forecasting platforms, the market was going to fund them itself, and yet we continue to create them.

There has also been an incredible amount of (wasted) time by the EA/rationality community that has been spent on forecasting. Lots of people have been employed full-time doing forecasting or adjacent work, but perhaps even larger is the amount of part-time hours that have gone into forecasting on Manifold, among other things. I would estimate that thousands of person-years have gone into this activity.

Hits-based Giving Means Stopping the Bets that Don’t Pay Off

You may be tempted to justify forecasting on the grounds of hits-based giving. That is to say, it made sense to try a few grants into forecasting because the payoff could have been massive. But if it was based on hits-based giving, then that implies we should be looking for big payoffs, and that we have to stop funding it if it doesn’t.

I want to propose my leading theory for why forecasting continues to receive 10s of millions per year in funding. That is, it has become a feature of EA/rationalist culture. Similar to how EAs seem to live in group houses or be polyamorous, forecasting on prediction markets has become a part of the culture that doesn’t have much to do with impact. This is separate from parts of EA culture that we do for impact/value alignment reasons, like being vegan, donating 10%+ of income, writing on forums, or going to conferences. I submit that forecasting is in the former category.

At this point, if forecasting were useful, you would expect to see tangible results. I can point to hundreds of millions of egg-laying hens that are now out of cages, and I can point to families that are observably no longer living in poverty. I can show you pieces of legislation that have passed or almost passed on AI. I can show you AMF successes with about 200k lives saved and far lower levels of malaria, not to mention higher incomes and longer life expectancies, and people living longer lives that otherwise wouldn’t be because of our actions. I can go at the individual level, and I can, more importantly, go at the broad statistical level. I don’t think there is very much in the way of “this forecasting happened, and now we have made demonstrably better decisions regarding this terminal goal that we care about”. Despite no tangible results, people continue to have the dream that forecasting will inform better decision-making or lead to better policies. I just don’t see any proof of this happening.

Feels Useful When It Isn’t

Forecasting is a very insidious trap because it makes you think you are being productive when you aren’t. I like to play bughouse and a bunch of different board games. But when I play these games, I don’t claim to do so for impact reasons, on effective altruist grounds. If I spend time learning strategy for these board games, I don’t pretend that this is somehow making the world better off. Forecasting is a dangerous activity, particularly because it is a fun, game-like activity that is nearly perfectly designed to be very attractive to EA/rationalist types because you get to be right when others are wrong, bet on your beliefs, and partake in the cultural practice. It is almost engineered to be a time waster for these groups because it provides the illusion that you are improving the world’s epistemics when, in reality, it’s mainly just a game, and it’s fun. You get to feel that you are improving the world’s epistemics and that therefore there must be some flow-through effects and thus you can justify the time spent by correcting a market from 57% to 53% on some AI forecasting question or some question about if the market you are trading on will have an even/odd number of traders or if someone will get a girlfriend by the end of the year.

Conclusion

A lot of people still like the idea of doing forecasting. If it becomes an optional, benign activity of the EA community, then it can continue to exist, but it should not continue to be a major target for philanthropic dollars. We are always in triage, and forecasting just isn’t making the cut. I’m worried that we will continue to pour community resources into forecasting, and it will continue to be thought of in vague terms as improving or informing decisions, when I’m skeptical that this is the case.




"Thinkhaven"

2026-04-26 02:12:53

Inkhaven has people writing a blogpost a day for 30 days. I think this is a pretty great, straightforward exercise, that I'd definitely want in a hypothetical Rationality Undergraduate Program. But, it's not the only such exercise I'd want to include. It's gotten me excited for a different (superficially similar) program, which I might call "Thinkhaven."

In Thinkhaven, the goal is to learn the skill of "relentlessly think new, useful thoughts you haven't thought before." 

Inkhaven had a basic "Goodhart-able" goal of "publish 500+ words every day." For Thinkhaven, I would imagine the Goodhart-y goal being something like:

  • Every day, you must publish 500 words of "research journal." They should be cogent enough for third parties to follow along with your thought process, but, they don't need to end with nice discrete conclusions. 
  • Every two weeks, you must also publish a 2500 word effortpost.

And, somewhat more opinionatedly:

  • Each journal entry should include at least one new question that you're thinking about. (Often, these will be subquestions within a broader Big Question that you're exploring that week, or a reframing of that big question that feels like a better pointer)

The spirit of the Goodhart is "are you finding new questions, and making some kind of progress on them?". Along the way, each day, you're thinking at least one new thought you haven't thought before.

One way of "thinking new thoughts" and "asking new questions" is to research stuff that already exists out there (i.e. learn some cool new facts about math/history/science/etc, and then write up good explainers for it, or connect it to other domains).

Another way is by thinking original thoughts, plumbing somehow through your confusions about the world and developing new concepts to help deal with them.

Presumably there are other approaches too, but I list those two to convey that there's more than one way to go about this. The main thing we'd be trying not to do at Thinkhaven is "explain ideas that you've already thought about and just wish everyone else understood." That's also important; it's just not what Thinkhaven is for.

The daily journal is for accountability, to make sure you're making any kind of progress at all. The daily "new question" is to help ensure that progress has some kind of forward momentum and is exploring new ideas.

The fortnightly 2500 word published writing is to close the loop on "get a new idea all the way from a vague musing, to something written up in a way that other people can critique." (Ideally, this explains some new ideas to the internet. If you didn't really get any new good ideas, you can write "well, I didn't come up with anything, but, here's a cleaned up version of my daily research notes.")

My primary inspiration for this is not actually Inkhaven; it's a period in 2022 when the Lightcone team focused on thinking/research/etc to try to get in touch with what it's like to be an original thinker on LessWrong. We set a goal of writing up a blogpost-worth-of-content per day, which the team would then read over each morning. Even without publishing externally, it was a useful forcing function to keep generating new thoughts and forcing them into a clearer shape.

I personally found it helpful for transitioning from "a guy who mostly defers to other people" to "a guy thinking his own strategic and intellectual thoughts."

Mentors, and Different Styles of Thinking

This is intended to be a fairly open-ended container. I'd expect to get value out of the pure container listed above, but I'd ideally want a few different styles of mentors and coaches around, who embody different ways of thinking.

There are a few ways to operationalize that. You could model the thing more on MATS, where everyone has a mentor they meet with on some cadence. If I were modeling it more on Inkhaven, I think some mentors would give classes, while others might be more like mysterious old wizards you just go talk to.

All participants need to have at least one mentor who is enthusiastic about them (as part of the admissions process), but they could sample from different styles of mentorship over the course of the month.

Possible examples of mentors: 

Note: these are examples, not people who agreed to participate or even looked at this post. But they are some archetypes that I'm imagining. I'd be hoping for Thinkhaven to include a mix of mentors or "resident thinkers" with similar range.

John Wentworth-style, focused on tackling some confusing problems we don't understand, asking "what's the hard part?" / "what's the bottleneck?", and systematically making progress, while keeping an eye on Principles Which Will Carry Over To The Next Paradigm.

Logan Strohl-style, focused on openended, patient observation (with a kind of "open curiosity" as opposed to "active curiosity"). Trying to keep as close-to-the-metal on your observations. (See Intro to Naturalism: Orientation for a deep meditation on the sentence "Knowing the territory takes patient and direct observation.")

Elizabeth Van Nostrand-style, with some focus on open-ended "lit review"-ish research. Pick a new field you are curious about, read over lots of existing papers and books. See if you can synthesize some new takeaways that weren't obvious. Be ready to follow threads of information wherever they lead. 

Scott Garrabrant-style, go live where the important math problems are, but then marry-for-love. Mull over interesting problems and then get nerdsniped on whatever feels alive.

Chris Olah-style, where... okay honestly I'm not actually sure how Chris Olah does his thinking and he seems particularly unlikely to come. But, reading over his older blogposts I get a sense of both a guy who likes studying lots of little fiddly patterns in the world and making sense of them, in a way that (vaguely) reminds me of an old timey biologist. And, a guy who likes experimenting with new ways of explaining things. 

Thinking Assistants / Research Managers

The mentors above are selected for "I respect their thinking and writing."

They're not particularly selected for it being the right-use-of-their-time to help people through daily stumbling blocks, executive dysfunction, etc.

I would want some staff that are more like the research coaches at MATS, who meet with the people on some cadence to check on how things are going and help them resolve obstacles. And, I'd like to try out having dedicated Thinking Assistants available, who can sit with you for a chunk of time as you write or talk out loud through your problem, and notice little microhabits that might be worth paying more attention to.


"FAQ"

Everything above is the core idea. I'm not that confident in that particular format, and expect I'd change my mind about stuff after one iteration. But, here are some explanations of why I picked this structure instead of others, structured as an FAQ.[1]

Why require "a new question each day?"

I'm not sure this will work as well as I hope. But, my reasons are:

  1. It forces you to cultivate a cluster of skills. 
  2. It frames attention in a way that I think will help cultivate a healthy "curious vibe."
  3. It is a mini-feedback loop that hopefully pumps against some kinds of masturbatory thinking.

Sometimes, when you're exploring and stewing on a set of ideas, you're not really making progress; you're sort of going in circles, or building up some superficial understandings that don't really translate into a clear takeaway. Asking yourself new questions forces you to take your vague musings and confusions and turn them into questions with a meaningful return type.

It also pumps against "explaining ideas you've already thought about" (which, again, is totally a useful way to write; it's just not what this program is for). By forcing yourself not to do something, you create space to practice new skills.

And, while it's opinionated on format, I think the "question" framing is still pretty open-ended as structures go.

What would asking new questions look like, in practice?

One person read the above and was like "okay I kinda get it, but I think I need to see an example of what this looks like to have a clearer sense of what this'd mean." 

Here's an example. 

(Note: this is just one example. As I just said, the program should be pretty unopinionated. Hopefully, if my line of questioning feels weird to you, it helps you imagine a version that would fit your thought process better). 

I might start with a vague frustration/confusion:

"Geeze, alignment seems to have shitty feedback loops. wat do?"

I find it fruitful to ask more explicitly:

"Okay, what would it mean to have good feedback loops?"

"If there were definitely no good feedback loops, what else might good progress look like?". 

Which in turn prompt more specific questions like:

"What are some domains that solved the 'poor feedbackloop' problem before? How did they do that?".

"What are some domains where 'feedbackloop' just wasn't even the right ontology?"

"What problem are 'feedback loops' solving? What other ways could you solve those?"

"What properties would 'solving alignment' have? What do I actually mean by that?"

As well as meta/tactical questions like:

"Who are some people who've thought about this already? Do they have writing I could read? Could I literally go talk to them?"

"Why is it hard to think about this, and what can I do about that?"

And then I might learn about domains where progress clearly accumulates, but a lot of it is driven by "taste." I might then spend a day digging into historical examples of how people acquired or transmitted taste.

What should a "Daily Journal" look like?

The first answer is "whatever you want." 

But, I did find, while beta testing this for myself this month, that it worked better when I gave myself a set of daily prompts to fill out, which looked like:

What questions did I think about yesterday?

What did I learn yesterday?

What questions or confusions am I interested in now?

What seems difficult about this? How can I fix that?

The "what did I learn?" section is the bit that ends up most shaped like a 500 word blogpost. 

Rather than think of this as "the thing I scramble to write before the end of the day", it's more like a thing I write when I first get started in the morning. (I don't really like the "publish by midnight" thing that Inkhaven does, and I think I might want to actually set the deadline at lunchtime).

Another friend who beta-tested the format experimented with changing up the prompts, so that it worked better as an orienting process for them. (By default it felt a bit like a tacked-on assignment they were doing out of obligation, but, slightly tweaked, it felt more like a naturally useful thing for them to do each day.)

Are the daily journals public? Why?

I think so, but I'm not 100% sure.

(But, my default recommendation would be to put them on an out-of-the-way secondary blog, so you feel more free to think dumb thoughts along the way).

The reason to make them public is to help them function more as an accountability mechanism. You don't need to make a nice polished essay with a conclusion. But, you do need to get your thoughts to a point where they're structured enough that someone else can make sense of them.

I considered just requiring them to be published internally to the Thinkhaven cohort. Habryka argued with me that this'd make people feel more like they were writing for the cohort-in-particular, having to care what those people thought, instead of getting to follow their own thought process.

The most important thing is you expect someone to be reading them.

Do we even need the 2500 word effortpost? Why can't it just be research journals all the way down?

Because the point of intellectual progress is to actually contribute to the sum of human knowledge. It's an important part of the process to package it up in a way that other people can understand and build on.

And, it's an important forcing-function that eventually your meandering question needs to turn into something that someone else would want to read.

Why "2500 words every 2 weeks" in particular?

Both of these are numbers I can imagine fine-tuning.

Why not "once a week?"

I thought "once a week" might be a better cadence, but, when I tried it out I found it too short.

During Inkhaven, where I was mostly focused on writing up existing ideas, I was able to write ~2000+ words a day and usually write one full post and make partial progress on an effortpost.

Thinking new meaningful/useful thoughts takes a while, and sometimes it's important to get lost in the woods for a while without knowing quite how everything will tie together. Or, just go off and gather a lot of information and digest it.

Why not longer?

I think "real work in the field" often does take more than 2 weeks at a time to output a blogpost worth of content. But, I think that's too slow a feedback loop for people learning. This is still supposed to be a class. I think it'd be hard for people to stay for longer than a month, and it seems like people should get at least two reps in of "go from ideation -> publishing."

If this ended up being like a 3-month fellowship, I can imagine once-a-month being a reasonable cadence. But, I think it's just not that hard to turn 2 weeks of thinking into one substantial writeup.

If this were a 3-month fellowship, my current guess is I'd keep the 2-week effortpost but add in a Final Project that's aiming for the level of "significant contribution to whatever field you're exploring."


All of this is only one possible structure for the underlying goal of "learn to relentlessly find new, useful thoughts every day." But, it's a pretty simple structure I'd expect to do pretty well even in its minimal form.

Anyways, happy thinking.

  1. ^

    These questions have all been asked at most "once" and sometimes "zero", so "frequently asked questions" is not exactly correct.


