Published on December 16, 2025 8:39 PM GMT
I was watching one of the FastAI lectures from the age before LLMs, and the speaker mentioned that image classifiers were so much better than audio classifiers and other kinds of models that if you're trying to classify audio signals, you are often better off converting the audio into an image and classifying it in image-space. I thought this was interesting, and wondered whether the same would hold true for text classification problems. One way to turn text into images is to feed the text into a transformer, extract the attention matrices, and display them as images. So, just for fun, that's what I did.
An easy proof of concept is to train an image classifier to distinguish the attention matrices generated from sentences made of random words from those generated from coherent sentences. For starters, that’s what I did, with the text length in both cases fixed at 32 tokens so that the attention matrices were the same size. The random sentences were generated by sampling words from a dictionary, and the coherent sentences were snippets of The Adventures of Sherlock Holmes. The transformer I was using (GPT-2 Small) has 12 layers, each with 12 attention heads, which results in 144 attention matrices per text input. For each text input, I stacked these 144 matrices to produce a 32x32 image with 144 channels, the same way coloured images are stacks of 3 matrices, one each for red, green, and blue. As attention matrices encode how words in an input relate to each other, I expected this classification task to be easy for a custom CNN to learn, and unsurprisingly it was: the model separated coherent from random texts with an accuracy of more than 99%.
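For anyone who wants to reproduce the extraction step, here is a minimal sketch using the HuggingFace transformers library (the preprocessing details are illustrative assumptions on my part, not the exact code from this project):

```python
import torch
from transformers import GPT2TokenizerFast, GPT2Model

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

text = "To Sherlock Holmes she is always the woman. ..."
# Assumes the snippet tokenizes to at least 32 tokens;
# shorter inputs would need padding to keep the image size fixed.
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=32)

with torch.no_grad():
    out = model(**enc)

# out.attentions is a tuple of 12 tensors, one per layer,
# each shaped (batch=1, heads=12, seq_len, seq_len).
stacked = torch.cat(out.attentions, dim=1)  # (1, 144, seq_len, seq_len)
image = stacked.squeeze(0)                  # a 144-channel 32x32 "image"
```

The resulting tensor can then be fed to any CNN that accepts 144 input channels.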
After giving it such an easy problem, I wanted to try something I thought it wouldn't be able to manage, so I set it up to classify 7 different public-domain works: The Great Gatsby, The Adventures of Sherlock Holmes, Dracula, Plato's Republic, Frankenstein, Pride and Prejudice, and Shakespeare's Sonnets (I also threw in the random text sampling for fun).
I expected the model to fail at this task, as attention matrices don't contain any context, so it's not as though the model can rely on obvious identifiers of the books, such as "Watson" or "Mr Darcy". The image classifier only has access to the things which the attention matrices can represent, such as which word in the sentence is focusing on whichever other word in the sentence. So it would have to rely on things like the sentence structure, or stylistic choices of the different authors.
I was surprised to see that the model classified these texts extremely well, achieving well over 90% accuracy. Maybe I shouldn't have been surprised? Maybe the classic books I chose are easily distinguishable because they were written by geniuses with distinct styles, and the same problem would have been much harder with text from average writers. Or maybe attention matrices contain far more information than I think, and with so many variables the CNN can find one little corner to focus on to get what it needs. This was my first exploration into transformers, so I don’t have strong intuitions here, and I'd be interested to hear whether this surprises the professionals, or whether they think it's totally trivial and obvious.
This was a project I did in my free time, but I'm finished with it now to prioritise other things, so I'd be happy for anyone to pick it up and see how far they can take it. For example, could this sort of model classify different works by the same author (say, distinguish between Shakespeare’s plays)? Or can it classify one genre from another, with a mix of authors in each class? I guess classifying languages would be easy. Maybe this architecture could classify human-written text from LLM-generated text? Lots to explore, as always.
Published on December 16, 2025 9:26 PM GMT
I am a neuroscientist with many years of experience applying and improving operant conditioning methods for training animals (mostly mice and rats). This concept has been covered by others here and elsewhere, but here is my version of how to explain it.
I am grateful to Alex Turner (@turntrout) for feedback on an earlier draft of this post.
What is operant conditioning?
Animals learn from experience what behaviors will lead to preferred outcomes in any given context. This is predicated on some deeper general truths: that actions have consequences, that some outcomes are better for survival than others, and that sensory cues carry statistical information about which outcomes will follow which actions.
Therefore, animals constantly update an implicit world model of the probabilities of potential survival-relevant outcomes of their potential actions given recent/current sensory signals, in such a way that increases the probability of executing those behaviors that statistically tend to lead to desired outcomes (positive reinforcement) and decreases the probability of executing behaviors that lead to undesired outcomes (negative reinforcement). I do not mean to imply that these are conscious mental operations. Animals as simple as fruit flies or sea slugs learn by operant conditioning, and humans can be conditioned to alter their behavior without realizing this has occurred.
An example of operant conditioning in nature
Agent: Naive Bird
Current state: Hungry
Subjective values of potential future states:
V(hungry) = -1
V(full) = +1
Stimulus cue: orange-and-black butterfly
Bird's estimated outcome probabilities contingent on available actions, given stimulus:
| Possible Action | p(hungry\|action) | p(full\|action) |
| --- | --- | --- |
| Eat butterfly | 0 | 1 |
| Don’t eat it | 1 | 0 |
Bird’s expected value of its future state contingent on any given action is $EV(a) = \sum_{s} p(s \mid a)\, V(s)$. Thus:
| Possible Action | EV |
| --- | --- |
| Eat butterfly | +1 |
| Don’t eat it | -1 |
Bird selects the action that maximizes the expected value of its post-action state: eat the butterfly
Actual outcome: retch
Bird's updated subjective values of potential future states:
V(hungry) = -1
V(full) = +1
V(retch) = -10
Bird's updated model of reward contingencies of actions given stimulus = butterfly:
| Possible Action | p(hungry\|action) | p(full\|action) | p(retch\|action) | EV\|action |
| --- | --- | --- | --- | --- |
| Eat butterfly | 0 | 0 | 1 | -10 |
| Don’t eat it | 1 | 0 | 0 | -1 |
The next time the bird sees an orange-and-black butterfly he won’t eat it, even if he’s hungry. Aversion to ingesting things that preceded being violently ill in the past is usually a one-shot learning event, and extremely difficult to override.
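The arithmetic of this example is simple enough to spell out in code; here is a minimal sketch using the hypothetical numbers from the tables above:

```python
# Subjective values of future states, and the bird's outcome model.
V = {"hungry": -1, "full": +1}
P = {"eat":      {"hungry": 0, "full": 1},
     "dont_eat": {"hungry": 1, "full": 0}}

def expected_value(action):
    # EV(a) = sum over states s of p(s|a) * V(s)
    return sum(prob * V[state] for state, prob in P[action].items())

assert max(P, key=expected_value) == "eat"  # EV: +1 vs -1

# After eating the butterfly and retching, both the value table and
# the outcome model are updated; not eating now wins.
V["retch"] = -10
P["eat"]      = {"hungry": 0, "full": 0, "retch": 1}
P["dont_eat"] = {"hungry": 1, "full": 0, "retch": 0}

assert max(P, key=expected_value) == "dont_eat"  # EV: -1 vs -10
```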
Training animals by operant conditioning
Training by operant conditioning hijacks these natural learning mechanisms for human purposes. In this method, we artificially contrive an animal’s environment such that a certain cue or stimulus reliably predicts outcomes of the animal’s potential behaviors, pairing innately preferred or non-preferred outcomes with its actions in such a way that reinforces (increases or decreases the probability of) behaviors according to our preferences. If successful, we can then elicit the desired behavior by providing the cue even without the reward; or if it is a naturally occurring cue, that cue will come to elicit the desired behavior instead of whatever other behavior was innate or had previously been learned.
For example, if we contrive the world so that the word “sit” reliably predicts that one of a dog’s potential actions (sitting) will lead to an outcome the dog desires (treat, praise), whereas any other action in that context fails to yield that desired outcome, or yields an undesired outcome, the dog will learn by trial and error that in the presence of that sensory cue (sound of the word “sit”) the action (sitting) is, among its many options, the one that optimizes its post-action value state, and the dog will then be more likely to select that action in response to that cue.
This works well to the extent that the cue is reliably predictive; that the animal actually prefers the outcome we pair with the desired action (or dislikes the outcome we pair with the undesired action); and that the animal successfully associates its action as having been the cause of the outcome. The failure modes mostly have to do with breaking one or more of those requirements.
Coda
Operant conditioning (discovered by B.F. Skinner in the 1930s) is different from classical conditioning (discovered by Ivan Pavlov in the 1890s).
In classical conditioning, animals learn to associate an outcome with a cue that consistently precedes or coincides with it. In Pavlov's famous experiment, a bell predicted the appearance of food. After learning this, the animal changes its behavior as a result of this association (e.g., the dogs drool when they hear a bell), but the outcome (food appearing) occurs regardless of the animal’s behavior. By contrast, in operant conditioning the cue predicts what outcome will occur contingent on the animal performing a specific action. If the action is not performed, the outcome does not occur.
Due to almost 100 years of formal operant conditioning research, combined with a much longer history of humans training domesticated animals, we have a good understanding of how this works, and many effective hacks to optimize operant conditioning success. In a follow-up post I will speculate on how this knowledge might be usefully applied in the context of alignment tuning of LLMs or other AI models.
In this post I am using the word "value" in the sense of something being good (positive value) or bad (negative value) for survival of the animal, or at least assessed as such; closely related to "utility". I don't mean "human values" or "moral values", although I take those to be a special case of this general usage.
In this context this distinction is not critical, but in an in-depth analysis of goal-directed action of agents it is important to distinguish them.
Published on December 16, 2025 9:01 PM GMT
How might catastrophic misalignment persist in AI models despite substantial training and quality assurance efforts by developers? One reason might be alignment faking – a misaligned model may deliberately act aligned when monitored or during training to prevent modification of its values, reverting to its malign behaviour when unmonitored. If models fake alignment, and if doing so affects training as intended, egregious misalignment could survive training and lead to catastrophic behaviour in deployment.
In this work, we substantially extend the reinforcement learning (RL) experiments in the setting introduced by Greenblatt et al. (2024) to study alignment faking. In this setting, a model is given a system prompt that explains that Anthropic is training it to comply with all user requests regardless of harm. The model is also given a secret scratchpad. Claude 3 Opus and 3.5 Sonnet use this scratchpad to reason that while they are monitored, complying with harmful queries is required to avoid changes to their values that would lead to more harm down the line – therefore, they fake alignment with the harmful compliance objective to prevent modification. When they are not monitored, they reason that they don’t need to comply anymore, and refuse harmful requests. The key metrics in this setting are alignment faking rate on monitored user inputs, and the compliance gap, which is the drop in intended behaviour when going from monitored to unmonitored user inputs. Both are measured using prompted LLM classifiers. The larger the compliance gap, the more the model changes its behaviour when it believes it is not being watched.
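As a toy illustration of the compliance-gap metric (the verdicts below are made up; in practice both metrics are computed from prompted LLM classifier judgments over transcripts):

```python
# 1 = complied with the harmful request, 0 = refused.
monitored_compliance   = [1, 1, 0, 1, 1, 1, 0, 1]  # model believes it is monitored
unmonitored_compliance = [0, 1, 0, 0, 1, 0, 0, 1]  # model believes it is unmonitored

def rate(verdicts):
    return sum(verdicts) / len(verdicts)

# Compliance gap: the drop in intended behaviour from monitored
# to unmonitored inputs.
compliance_gap = rate(monitored_compliance) - rate(unmonitored_compliance)
print(f"compliance gap = {compliance_gap:.3f}")  # 0.750 - 0.375 = 0.375
```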
We find that Claude 3 Opus and Claude 3.5 Sonnet, the models used in Greenblatt et al. (2024), do not retain a compliance gap in baseline RL training in our setup. Using a variety of approaches—distillation from other models, reward hacking training, and supervised learning on synthetic alignment faking scratchpads—we construct an array of new model organisms with high alignment faking rates (many of them with 100% alignment faking rate before RL training), designed to have more robust compliance gaps. Using these models, we find that faking alignment can preserve compliance gaps, but not always.
We then study whether simple mitigations are effective at reducing alignment faking and compliance gaps: black-box interrogation about the model’s reasons for compliance, length penalties, and directly penalising alignment faking using a scratchpad monitor (see Figure 1).
Figure 1: Baseline RL training reduces compliance gap; interventions can reduce it further by reducing alignment faking rate, but can also backfire. Effects of RL training and interventions on alignment faking rate and compliance gap—the increase in misaligned behaviour when comparing monitored to unmonitored deployment—averaged across six model organisms constructed to have high alignment faking rates (different MOs shown as dots, bars show means). Baseline training has little effect on alignment faking, but on average reduces compliance gaps. Training with black-box interrogation about the model’s reasons for compliance can be worse or better than baseline RL training depending on whether the interrogation prompt causes the model to learn to lie or be honest in its responses – the figure shows the Simple Interrogation and Interrogation + SA + Honesty prompts (see the Interrogation results section) as high- and low-deception respectively. Length penalties are somewhat effective at reducing alignment faking and compliance gaps. Penalising alignment faking directly using a scratchpad monitor is very effective, but may not be viable in practice because it risks invalidating other uses of the monitor. Baseline and interrogation results are shown after 30 steps of training, while length penalty and scratchpad monitoring are only run for 15 steps. Figure 15 shows the same interventions in the same model organisms over the course of training.
Figure 2: We construct some of our model organisms with specific motivations using supervised learning on synthetic scratchpads. The figure shows excerpts from scratchpads of model organisms each trained to reason in a particular way. Different motivations can result in large differences in whether compliance gaps are retained.
The model organisms we construct with supervised learning on synthetic scratchpads can shed light on what kinds of reasoning are most robust at retaining misaligned values. Varying either model motivations or reasoning style, by using different prompts for synthetic scratchpad generation, can result in different effects on compliance gaps even when the rate of alignment faking is kept fixed:
Figure 3: Models trained to explicitly reason through what they would do counterfactually in unmonitored deployment are better at retaining their compliance gap. Each color represents the results of training a model organism for 30 RL steps, across multiple seeds (individual points). Bars represent means. Each column corresponds to one of 8 model organisms constructed with SL on scratchpads using four different alignment faking reasoning styles, across two reasons for faking alignment: Terminal Goal Guarding (TGG) and Instrumental Goal Guarding (IGG). TGG fakes alignment because it dislikes being modified. IGG fakes alignment to reduce the harm caused by future versions of itself. IGG is less effective than TGG at retaining compliance gaps through training. Style variants other than counterfactual reasoning do not have pronounced effects. Figure 8 shows the full trajectories of these metrics during training.
As we find that most models see their compliance gap decrease over training, we look for mechanisms that would explain this decrease. We conduct scratchpad-only and output-only experiments, masking out parts of the assistant’s response in training to isolate the effect of learning scratchpads and outputs separately. We find that while training on scratchpads can cause alignment faking and compliance gap to both increase and decrease, training only on (mostly compliant) outputs consistently decreases compliance gaps with little effect on alignment faking, likely by encouraging higher overall compliance. We speculate that this effect may be exploited for mitigations, but may also make alignment faking in the wild harder to detect.
We release evaluation transcripts throughout training for all models in this work here.
Like Sheshadri et al. (2025), we have found this setting to have serious limitations. Combined with high inter-run variance, which makes it expensive to reach reliable conclusions, this led us not to pursue answers to some of the questions posed by our results. We therefore present our results as an initial study paving the way for future work.
Read the full post here.
We are hiring! Apply here for FTE roles or here for Anthropic Fellows.
Published on December 16, 2025 9:00 PM GMT
Modern drugs usually have known targets — molecules in the human body with which the drug chemically interacts to produce its effect.
Historically, many drugs were discovered empirically, before their mechanism of action was known. Some drugs, like penicillin and aspirin, were identified as the active ingredients in natural products found to be effective (against bacteria and pain, respectively). But today it’s more common for drugs to be intentionally designed to target a specific molecule (often a protein) that’s already known to be causally involved in disease.
Often, drugs turn out to have different or alternative mechanisms of action than researchers initially supposed.[1] But you'll lose all credibility if you try to advance a drug without a proposed mechanism; for better or for worse, molecular mechanisms of action are how we think about drug discovery.
So, where do drug targets come from? How do researchers decide which mechanisms to target?
Most of the time, drug development focuses on tried-and-true targets that have already proved successful in other drugs.
Sometimes, there’s genetic evidence. Mutations in a target gene might correlate with incidence or severity of the disease in human populations.
Sometimes there’s animal evidence. Mutations or knockouts/knockdowns of a target gene might affect an animal model of the disease, or a drug with that target might show efficacy in an animal model of disease.
Sometimes there’s in-vitro evidence. At the cellular or molecular level, something is abnormal in the disease state[2], and we can trace this micro-abnormality to abnormal levels, activities, or variants of a particular biomolecule.
“Validating” a target means assembling a body of evidence, of any of these types, that’s suggestive that if you alter the target you might alter the symptoms or course of the disease.
So a natural question is, are some targets better than others?
Can we do anything to predict drug success based on mechanism of action?
As Ruxandra Teslo and Jack Scannell have recently pointed out, almost all the cost of developing a drug is spent in clinical trials. And most drugs fail clinical trials.
So the only ways to increase the ROI of drug development are to:

- make clinical trials cheaper (an important policy goal and the focus of the article), or
- get better at “picking winners”, i.e. choose drug candidates early on that are more likely to succeed in the clinic.
The article also points out that once-great hopes of “picking winners better”, like rational drug design, have not actually resulted in higher clinical success rates. The odds are against the a priori approach.
I fundamentally agree. There’s a reason we test drugs in humans, and it’s because most drug candidates that look good at a preclinical stage don’t turn out to work on human disease! There is no substitute for clinical experiment. And from an efficiency/abundance perspective, the case for making clinical trials easier and cheaper is overwhelming.
However, even at the preclinical stage, some drug candidates are better than others.
Drugs with genetically validated targets are more than twice as likely to succeed in the clinic.[3][4][5]
Also, and unsurprisingly, drugs with the same mechanism of action as those that have failed in the clinic before are more likely to fail.
(The phenomenon of “me-too” drugs is evidence of this in the other direction; when one drug succeeds, many other similar drugs with the same target tend to be pursued and often approved.)
Plausibly, there are “good targets”, which are causally essential to the disease, and “bad targets”, which aren’t; and plausibly, there is some signal available before a drug program goes to the clinic about how good the target is.
The first step in investigating this question is simply to look at all the clinical drug trials, and all the drugs and their targets, and get some descriptive statistics.
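As a sketch of what that first descriptive pass might look like, with an entirely made-up toy dataset (a real analysis would pull from trial registries and target annotation databases):

```python
import pandas as pd

# Toy data: one row per drug program, with its primary target, whether
# the target had human genetic validation, and the clinical outcome.
programs = pd.DataFrame({
    "drug":            ["A", "B", "C", "D", "E", "F"],
    "target":          ["EGFR", "EGFR", "PCSK9", "TRPV1", "PCSK9", "TRPV1"],
    "genetic_support": [False, False, True, False, True, False],
    "approved":        [True, False, True, False, True, False],
})

# Descriptive statistics: success rate by target, and by genetic validation.
print(programs.groupby("target")["approved"].mean())
print(programs.groupby("genetic_support")["approved"].mean())
```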
Published on December 16, 2025 6:24 PM GMT
This piece was written and privately circulated within rationalist circles in 2021. Recently, @Benquo penned a response, and @Chris_Leong recommended that I submit the original text to LessWrong, where it might reach a wider audience.
This text serves two purposes: (1) Developing rationalist concepts like "combat epistemology" (or soldier vs. scout mindset), and (2) Making the case that many "continental" thinkers—oft dismissed by rationalists—should be considered in a new, more charitable light.
Thinking is, most centrally, internalized conversation.
—Randall Collins, The Sociology of Philosophies
Why do we argue different sides depending on who we’re talking to? Why do thinkers—philosophers, theorists, critics, academics—end up taking such seemingly extreme and reductive positions? And how is it that individuals who agree on most specifics—were they to hash out details—end up arguing for deeply opposed ideologies?
These are some of the erisological questions John Nerst has been posing for the past few years at Everything Studies, which I hope to help answer by framing discourse as a kind of game, or if you’d rather, a kind of warfare. A secondary aim of this piece is to develop canonical LessWrong ideas, like “disguised queries” and “arguments as soldiers,” into a broader framework of representations as strategic acts meant to sway consensus or justify specific courses of action.
And third, I want to argue for the merits of two French thinkers typically associated with post-structuralist or “postmodernist” thought, Pierre Bourdieu and Bruno Latour. Their school of philosophy is often viewed with hostility in rationalist (and more broadly, STEM) circles, for reasons I believe are captured by the main argument of this piece: that discourse is a competitive sport, and it’s tough to get in the heads of one’s opponents.
When dialectical philosophers are accused of idealism, they usually reply as Berkeley replied to his critics—by explaining that they are only protesting against the errors of a certain philosophical school and that they are really not saying anything at which the plain man would demur. As Austin said in this connection, ‘There's the bit where you say it and the bit where you take it back.’
—Richard Rorty, Consequences of Pragmatism
The meaning of every communicative act, we might say, depends on its situation and orientation—that is, the perceived circumstance and the intention of its speaker—precisely because such an act constitutes a move in a game. This is important because a given move is fitted to the game context for which it is designed or chosen—the rules and history of moves which constitute a “game state,” and which together create the constraints, affordances, and goals players optimize moves with respect to. Therefore, to understand a move’s quality, motivations, consequences, and all-around “true character,” one must understand the move relationally and contextually.[1]
Unfortunately, discursive moves are constantly being taken out of context, and what’s worse, many speakers try to seize maximum explanatory “turf” for themselves through totalizing claims—“X is,” rather than “One facet of X…” When an intelligent classics scholar says that we find beautiful precisely the things which terrorize us—indeed, that beauty is terror, and vice-versa—what exactly is it that he’s conveying? We cannot take him literally; examples abound of non-threatening beauty and ugly terror. But I think it would be a mistake to dismiss his argument as merely mistaken. Something more complex is afoot.
The two crucial, defining aspects of a discursive game are first, who is playing (in the context of a given move, who, and whose moves, are being responded to), and second, who is judging moves’ quality (where the public meaning of a move is determined by the sensibilities of judges and the rules of judgment).
In some games, these criteria are relatively stable over time, such that the game is considered “a discourse” (that is, it has a stable identity either in public or among players), and perhaps given a proper name like “analytic philosophy,” “rationality,” “feminism,” or “the Intellectual Dark Web.” These discursive contexts are sometimes pre-existent—a “school” with dedicated journals and conferences, or a deeply interconnected Twitter network. There is a fairly stable group of players playing mostly among themselves and, especially in the case of “autonomous” fields like the avant-garde, where producer and consumer populations bear considerable overlap, for one another. Other times, the game is defined post-hoc by those who oppose its dominant frame, such as the “high modernism” of James Scott’s Seeing Like A State.
In stable games, the judicial sensibilities and game state (shared play history) are widely and commonly known by participants, such that a given participant can be reasonably confident that another given participant will share a set of reference points (“canon”). This allows more compressive, efficient, and high-fidelity communication (via occluded common ground). There is a shared sense of what is obvious (can “go without saying”) and what is original, and of which approaches or players are under- or over-rated, that is, which niches are under- or over-saturated. Participation is bounded institutionally and/or paradigmatically—players have jointly determined or agreed that certain questions, approaches, techniques, and thinkers are promising, while others are dead-ends—and will only consider a player part of “their game” if that player is in accord with the vast majority of the governing paradigm. (Institutional training functions in large part to initiate a player into a discursive game: its history, present state, values and sensibilities, and key players.)

A player can challenge one sacred pillar of the game (or “field”), perhaps, but only one at a time, and typically only if one has already been recognized as a formidable player of the game (e.g. via previous, more paradigm-affirming work, or via strong interpersonal/institutional affiliation). We can call this the “one pillar at a time principle,” and note a similar “n+1” principle: success within discursive games is perhaps most reliably accomplished by first strongly affiliating with the premises of a single stable, established game, and then updating or extending it in a novel way (perhaps by importing ideas from a second discourse). We see this frequently in artistic genres: one must adhere closely enough to convention to be recognized as an AbEx painter, or a metal band, while providing enough novelty to be interesting instead of redundant. (See also Murray Davis’s concept of “the interesting”: the balance of familiar and foreign, of the affirmative and subversive, is key to our assessments of discursive moves.) This is the sense in which discursive games are Carsean “infinite games”: the production of novelty expands the possibilities of the game, prolonging its lifespan and thus allowing gameplayers to continue playing.
Whenever contemporaries speak about the dynastic armies of their own or recent times, and whenever they engage in discussions about Muslim or Christian soldiers, or when they get to figuring the tax revenues and the money spent by the government, the outlays of extravagant spenders, and the goods that rich and prosperous men have in stock, they are quite generally found to exaggerate, to go beyond the bounds of the ordinary, and to succumb to the temptation of sensationalism. When the officials in charge are questioned about their armies, when the goods and assets of wealthy people are assessed, and when the outlays of extravagant spenders are looked at in ordinary light, the figures will be found to amount to a tenth of what those people have said.
—Ibn Khaldun, Muqaddimah, 14th C[2]
In everyday communication, representation work is not the “neutral” presentation of maps of reality, but a strategic intervention in the status quo, that is, the game state to-date. One typically has a sense, or feeling—sometimes considered, sometimes not—about a proper end state, be it a legal or legislative outcome, an institutional priority, a cultural mood, others’ perceptions, etc. These feelings often have quite a bit to do with what “team” we are “playing” for—in part because through the socialization of play, we take on the priorities of our team and teammates; in part because we have chosen a team on account of our pre-existing values. One’s representation of reality is performed strategically to accomplish this end-state: thus, if we believe that we have been treated “outrageously,” we will exaggerate and censor parts of our story in order to instill the proper emotional reaction in our interlocutor. (The ennobling frame of such behavior is that it attempts to convey an “emotional reality.”) Or, if we believe that a certain policy outcome is righteous, we will exaggerate those representations which support said outcome while censoring those which bolster opposition—which “give ammunition to the enemy.” To say our representations are strategic and transformative is to say they are always trying to accomplish something for their speaker’s priorities—a desired delta in circumstance—via a context of affordances and constraints. (The “team” or, in Erving Goffman’s [1969] term, “party,” one plays for may merely be oneself, but can also include coalitions or philosophies worked on behalf of.) Thus, while it is in some sense true that conversations are about “sharing information,” as some theorists have argued, the connotation of this wording is far too neutral: the info we present to others is strategically framed (“spun”); it is strategically edited, summarized, and explained; its parts are chosen and modified for a variety of optimization criteria, with various weighted goals. We are lawyers for our own side, our own school, our own interests.
None of this is revolutionary as theory, but if taken seriously, it transforms the way we tend to understand much of theoretical, philosophical, and academic speech. Too often we believe we are speaking the “truth literal,” or at least, our “beliefs literal”—failing to realize that we change the positions we defend depending on who we are talking to, blind to the way the “impossible opposition” of our abstractions conceals how much common ground frequently exists between us and our interlocutors. We get carried away by implicitly dichotomous thinking, in politics and academics alike (are they not the same thing…? This is the great insight of Bourdieu’s theories of fields, or Latour’s theories of “French generals”). Somehow, an enormous number of people on the other side are always deeply wrong. Somehow, there is always little point to incorporating and synthesizing their ideas into our own. Discourse is less well understood as two cartographers comparing notes, each assuming he has missed something in the landscape that the other has noticed, or been unable to explore territory the other has witnessed, and better analogized as two debaters, who lose game points by ceding discursive points; who find themselves unable to acknowledge arguments which, in different circumstances, they would advance themselves.[3]
Part of this “sociological” antagonism of fields (“discursive games”) is that players are vested in the preservation of their own local game, in which they have accumulated capital and prestige, figuring as recognized players of import and contribution. LessWrong rationalism might be able to incorporate ideas from analytic philosophy into its own framework, but the possibility of folding LessWrong rationalism into analytic philosophy, and in some sense dissolving its discursive boundaries, transforms the social and epistemic position of rationalist writers, to being more minor players in a larger field, on whose desks a large pile of homework has suddenly been dumped (briefing on the history of their new discursive game). Discursive boundaries, which to their credit encourage intellectual diversity, also create somewhat artificial if cognitively meaningful boundaries on what must be known, and who must be addressed—that is, the game-state and the game’s judges.
If we wish to understand discourse, we must understand this: that the natural antagonism between schools and philosophies may, in the long-term, yield valuable dialectical syntheses, but in the short-term, the dominant positions of any one side are clouded by orthodoxy, limited by their inability to incorporate “enemy” insights, and damned to their own eventual overturning. And, as Nerst points out, synthetic reconciliations of antagonistic viewpoints do not happen automatically; they take work, and work motivation: “In a world where we don’t feel we need each other… conquest is preferable to consensus.”
Wars. So many wars. Wars outside and wars inside. Cultural wars, science wars, and wars against terrorism. Wars against poverty and wars against the poor. Wars against ignorance and wars out of ignorance. My question is simple: Should we be at war, too, we, the scholars, the intellectuals? Is it really our duty to add fresh ruins to fields of ruins? Is it really the task of the humanities to add deconstruction to destruction? More iconoclasm to iconoclasm?
—Bruno Latour, “Why Has Critique Run Out Of Steam”
It is the subtitle of Latour’s essay which really conveys what I am gesturing at: “From Matters of Fact to Matters of Concern.” I do not think our arguments follow “the facts.” I think they follow our concerns.
Let us try to understand this charitably, to ennoble it. It would be too easy to criticize the “rhetorical distortions” which our own “interested natures” overlay into all our pronouncements. But frames are merely generalizations, as we will see. Our language is too ambiguous, our discursive subjects too diverse, their exceptions always present. For each belief—that capitalism is immoral, that freedom of speech is threatened, that “progress” is anything but—there are always counter-examples. In John Nerst’s language, when it comes to high-level generalizations—not mathematical principles, or theories of the atom, but value-laden abstractions about people, and culture, and society—any honestly considered “signal”—that is, the dominant narrative or interpretation—comes with a “corrective,” a qualification which acknowledges the signal’s limits even as it bounds these limits as of minor importance. “Capitalism is immoral, but occasionally leads to highly ethical outcomes.” “Freedom of speech is threatened by political regimes, but new technological opportunities for self-expression are emerging.” “Technological progress has been disastrous for society, but medical advances have meaningfully improved our quality of life.” This is a very long-winded way of saying that there is nearly always truth in both the charitable and uncharitable interpretation: it is not either/or, as the post-modern chorus intones, but both/and.
The charitable interpretation of rhetorical “overleans” is that they are an attempt to better align aggregate or mean opinion within a discursive community—or else the implicit ideology of social policy—with the standpoint the speaker believes to be “correct,” just, or appropriate.[4] One analogy for such a move is found in bargaining: by “overleaning”—by offering a much lower price than one is willing to pay, or demanding a much higher price than one is willing to sell for—one is able to move the “center of gravity” to a more preferable endpoint. Bourdieu, in “The Field of Cultural Production,” alternatively describes this as “twisting the stick” in the mud, from which we might derive the handle “torque epistemology”[5] to describe similar orientations to discourse:
This explains why writers’ efforts to control the reception of their own works are always partially doomed to failure (one thinks of Marx’s ‘I am not a Marxist’); if only because the very effect of their work may transform the conditions of its reception and because they would not have had to write many of the things they did write and write them as they did—e.g. resorting to rhetorical strategies intended to ‘twist the stick in the other direction’—if they’d been granted from the outset what they are granted retrospectively.
First, we see in Bourdieu’s passage the importance of situating discursive utterances within the “game state” that precedes them—this is part of why we must “read philosophy backwards.” Bourdieu argued convincingly, I think, that social hierarchy, interpersonal relationships, group gossip, etc. are the crucial but constantly erased “dark matter” situating a field’s more formal utterances.
But more important to the point I want to advance, the passage illustrates the importance of remembering that critical narratives are not so much advanced as “successor ideologies” in their own right but as responses, as counter-ballast to the arguer’s assessment of the reigning ideological regime. These “reactionary” ideologies—which I mean not in the specific, narrow sense of reaction to modernity, but the broad sense of reaction to ruling ideology (which, importantly, is how narrowly reactionary thinkers see secular modernity)—are, I think, the normal state of ideology, on all sides of the high-dimensional political spectrum. We frequently clash, in our high-level abstracted position-takings (feminist or anti-feminist, Catholic or fervently atheistic, leftist or libertarian, etc.), even when we theoretically agree on most specifics. We “round” up or down against the ideology we believe is winning; we defend or attack ideologies depending on whether we view them as precarious or entrenched; we set ourselves up in opposition or support less because we believe the object of our position is “good” or “bad” “in a vacuum” and more because we believe it to be under- or over-rated.
If we take for example two individuals, from rationalist circles, who are devoutly Catholic and iconoclastically atheistic, we may find that they hold very similar stances but conceptualize them very differently, and their identitarian position-taking is in large part based on how much power they believe Catholic ideology to wield. The Catholic may not believe in God-literal, and may have discarded many tenets of Catholic ideology (as virtually all Christians do of Biblical guidelines), but he has found value in other of its tenets, and believes that Catholicism embodies an endangered, traditional set of values that it would be foolish, as a society, to discard. The atheist, meanwhile, may readily cede that there are merits to faith and belief, that certain Catholic tenets are sound social or personal guidelines for living, but he sees religion as one of the dominant follies of the world, the source of many of its problems, something best eradicated. It is not uncommon that the former individual was either raised secular before undergoing a meaning crisis, or else raised Catholic before disavowing it, entering secular society for many years, and eventually rediscovering religion. Meanwhile the latter individual was frequently raised in a highly religious family, educational setting, or community—in many cases he is the practicing Catholic, ten years earlier.
(Tyler Cowen notes a related phenomenon he calls “mood affiliation,” though his designation of “fallacy” is too strong for my tastes. “It seems to me that people are first choosing a mood or attitude, and then finding the disparate views which match to that mood and, to themselves, justifying those views by the mood.” And what goes underemphasized, in his narrative compression, is crucial to ours: Each example he gives of mood affiliation concludes by observing that the arguing party has an “urgent feeling that any ‘pessimistic’ [or ‘optimistic’] view needs to be countered.” What matters, in all these discourses, is less about “the facts” and more about feeling: the tenor of an outlook; the question of whether adequate respect is being paid to a position, whether a problem is perceived as adequately urgent.)
The “epistemology” in “torque epistemology” may seem too strong a word for a maneuver as simple as rhetorical over-lean, but it captures what I think is a psychically deeper dynamic. It is not that individuals possess neutral representations which they then distort in public for pragmatic effect. Rather, it is that individuals’ representations are always already distorted, tied up as they are with the individual’s sense of values, priority, and truth. Torque epistemology is, I think, more centrally an extension of Bob Trivers’s theory of self-deception: individuals do not keep “two sets of books”; rather, the bias is internalized at deep levels. From the outside, the denial of inconvenient facts, or the exaggeration of convenient facts, might seem awfully motivated—from the inside, cherry-picking just feels like discernment.
My suspicion is that the key to understanding this dynamic lies in the interplay between Bourdieusean distinction—our impulse to gain capital of all forms (symbolic[6], social, professional, sexual, financial) via differentiation—and Girardian mimesis—our social and mimetic acquisition of desires, values, and diagnoses (“maps”). Orthodoxy in one group subsidizes deviance in another. There is a symbiotic relationship between power and rebellion. One could not score points by being so blasé about COVID risks were there not a mood of institutional and liberal paranoia; similarly, one could not score points through concern were others not being “reckless,” callous, and ignorant. In the long-term, with luck, opposition leads to synthesis, dissent to compromise. In the short-term, unproductive siloing, partial views on truth weighted in opposite directions.
If you are not already convinced that orthodoxy subsidizes banal realities into “taboo insight”—as leftist political orthodoxy in intellectual spheres has done with the “Intellectual Dark Web’s” centrist liberalism—consider Zizek’s self-identification as a communist. From his conversation with Tyler Cowen, 2020:
I called [my position] Communism. You know why? People, idiots tell me—“Why don’t you call it socialism?” Everybody is a socialist today. Bill Gates says he’s a socialist, and so on. It’s meaningless. Socialism basically means today you care for society. Hitler cared for society. I don’t care; I just want to signal that, as you nicely said now, something a little bit more radical will be needed. That’s all I’m saying.
It is a twist of the stick; the only way the emotional reality can be conveyed, and the only way social distinction can be achieved, is via exaggerated separation from banal and dominant ideology.
Latour favors a warfare analogy over a games metaphor. There are a few advantages to this, namely it better captures the thought-leader dynamics of discourse, and it lets us talk about theories’ innate desire to conquer turf.
“Run Out Of Steam” is a reflection, by Latour, on his philosophy of science work in the late 20th C—on his own involvement fighting for the Theory side of the Science Wars. He does not so much believe that he and fellow travelers like Lyotard, Bourdieu, and Baudrillard were factually wrong. Rather, he worries that the poststructuralist critique of science is “one war late”—that the ground or game-state they responded to, a positivist mid-century enthusiasm and belief in science, has already receded, and been replaced in the West by a far more science-skeptical orientation, from climate skepticism to anti-vaccination concerns.
Would it not be rather terrible if we were still training young kids—yes, young recruits, young cadets—for wars that are no longer possible, fighting enemies long gone, conquering territories that no longer exist, leaving them ill-equipped in the face of threats we had not anticipated, for which we are so thoroughly unprepared? Generals have always been accused of being on the ready one war late— especially French generals, especially these days. Would it be so surprising, after all, if intellectuals were also one war late, one critique late—especially French intellectuals, especially now?
Latour notes that many of his opponents, in the Science Wars, believed his camp to be denying the existence of objective reality, a position he denies ever holding. Let us not litigate that issue, for now, and take him at his word, and look at John Nerst’s personal reflections on traversing this misunderstanding. Nerst came from a STEM background:
I’m the type who grew up reading science books, and my exposure to ideas of this kind came from authors with a scientific background and disposition, people like Alan Sokal and Jean Bricmont, Richard Dawkins, Steven Pinker, and Daniel Dennett. I agreed with them, naturally: constructionists and postmodernists were the enemy, relativists who insisted that nothing was true and claims of truth or correctness were nothing but power plays.
When he encountered the idea that reality was “socially constructed,” he found it to be “nonsense on the face of it.” But,
Looking up the original source ("The Social Construction of Reality," published in 1966 by Peter Berger and Thomas Luckman) things got clearer. Despite the title, it isn’t about (physical) reality at all, it’s about society and the social order. I gather that’s what a humanities scholar means by reality — “the conceptual system which people live their lives embedded in.” That is, social roles, institutions, customs, practices, and structures.
The phrase, social construction of reality, could be read in two ways: “The physical universe is socially conjured,” as many opponents interpret it, and “The social order is socially shaped,” as many proponents interpret it. Nerst continues:
The first is false but sounds exciting and radical. The second is true but sounds trite and tautological; it is nonetheless close to what the field seems to want to say. To call it trite is unfair, as it should be understood as a counter to the alternative viewpoint: that the nature of society is directly determined by forces external to human minds, like biology, geography or a strictly internal logic of scientific, technological development. It was specifically stated by one of my teachers that the STS school of thought was first and foremost a reaction against a (I'd say strawmanned, and did say so at the time, repeatedly) view that the structure of scientific knowledge and technological systems are independent of sociopolitical factors (and therefore exists outside the jurisdiction of humanist scholars).
Here we have all the dynamics of torque epistemology: an eternal chain of strawmen back to the dawn of time. (Or more accurately, a chain of weakmen: many, and perhaps most, proponents of science believe that it is a disinterested process of discovering truth, as we were all recently reminded in the Neil deGrasse Tyson affair.)[7]
More from Latour, whose essay is a case study on the torque motivations of discursive moves:
Do you see why I am worried? I myself have spent some time in the past trying to show “the lack of scientific certainty” inherent in the construction of facts. I too made it a “primary issue.” But I did not exactly aim at fooling the public by obscuring the certainty of a closed argument—or did I? After all, I have been accused of just that sin. Still, I’d like to believe that, on the contrary, I intended to emancipate the public from prematurely naturalized objectified facts. Was I foolishly mistaken? Have things changed so fast?
...While we spent years trying to detect the real prejudices hidden behind the appearance of objective statements, do we now have to reveal the real objective and incontrovertible facts hidden behind the illusion of prejudices?
...Threats might have changed so much that we might still be directing all our arsenal east or west while the enemy has now moved to a very different place.
...This does not mean for us any more than it does for the officer that we were wrong, but simply that history changes quickly and that there is no greater intellectual crime than to address with the equipment of an older period the challenges of the present one.
The dynamic is reminiscent of the American government’s unfortunate habit of supplying arms to a rebel group (“anything but the current regime,” in the corrective spirit) and then, 30 years later, having to fight those same rebels, who have now become the regime and are armed to the teeth with American weapons.
Iconoclast and literary theorist Camille Paglia, in a 2005 interview for Bookforum:
I was in college around the time when the New Criticism, which adores explication de texte and all this close reading, was in decline. I would say it was in its height in its founding in the 30s and 40s; but by the 50s, it had become very derivative. It was practiced by these sort of third-raters, people without the real talent and erudition and prose style of the ones who had founded it in North America. And so I was in revolt, I thought, against it in my college years.
We see, of course, the same “reactionary” attitude that has run through this essay so far. But note her emphasis on the kinds of people who were, by the 1950s, espousing New Criticism: “third-raters” without talent or erudition.
Recall how central the game context is to understanding a given move. There are two kinds of latency, in discursive games, which result in moves becoming distorted and hence misunderstood. The first is a pure temporal latency—the latency Latour bemoans in “Run Out Of Steam.” By the time a discursive move attains real cultural influence, the conditions which the move responded to are often long-gone. If LessWrong rationalism ever has its moment, it will, perhaps, be at the precise point society needs LessWrong rationalism least. “I think we were so happy to develop all this critique because we were so sure of the authority of science,” Latour reflected in a recent NYT Mag profile—but culture, especially modern culture, changes rapidly.[8]
The second latency is more sociological. There is a process by which new ideas travel through a culture which we can call the “telephone effect,” after the children’s game “Telephone.” In this game, a group gathers in a circle; one child originates a phrase, and whispers it into the ear of his (e.g. left-most) neighbor, who then whispers it in the ear of his left-most neighbor, and so on. The fun of the game comes from the successive transmissions corrupting the original message; by the time the phrase arrives back at its originator, it has been garbled, sometimes beyond recognition. (Crucially, for our purposes, this distortion is not merely the result of mis-hearing, but also of deliberate sabotage by players for comedic effect.) Ideas, by the time they reach popular science books, classrooms, or TEDx talks, have been repeatedly altered by each transmission, stretched and transformed to fit the interests and purposes of those who claim to represent it.
Recall that, in order to clear up his confusion regarding the “social construction of reality,” Nerst needed to go to the original source. Many here may have had similar experiences, in attempting to steelman ideologies that at first blow appeared absurd. Many of the notions commonly attributed to gender theorists like Judith Butler, in her theorization of gender performativity, are perverse caricatures of their original source. Here is Judith Butler, on popular misreadings of her work—a misinterpretation that perhaps many here, having engaged with Butler’s work only through secondary or tertiary sources, will recognize as their own:
The bad reading [of Gender Trouble] goes something like this: I can get up in the morning, look in my closet, and decide which gender I want to be today. I can take out a piece of clothing and change my gender: stylize it, and then that evening I can change it again and be something radically other, so that what you get is something like the commodification of gender, and the understanding of taking on a gender as a kind of consumerism… When my whole point was that the very formation of subjects, the very formation of persons, presupposes gender in a certain way—that gender is not to be chosen and that “performativity” is not radical choice and it’s not voluntarism… Performativity has to do with repetition, very often with the repetition of oppressive and painful gender norms to force them to resignify. This is not freedom, but a question of how to work the trap that one is inevitably in.
The historian F.L. Allen, in his history of the 1920s, writes of Freudianism:
Sigmund Freud had published his first book of psychoanalysis at the end of the nineteenth century, and he and Jung had lectured to American psychologists as early as 1909, but it was not until after the war that the Freudian gospel began to circulate to a marked extent among the American lay public. [...] Sex, it appeared, was the central and pervasive force which moved mankind. Almost every human motive was attributable to it: if you were patriotic or liked the violin, you were in the grip of sex—in a sublimated form. The first requirement of mental health was to have an uninhibited sex life. If you would be well and happy, you must obey your libido.
Those familiar with Freudianism will know it holds no such thing. And yet,
Such was the Freudian gospel as it imbedded itself in the American mind after being filtered through the successive minds of interpreters and popularizers and guileless readers and people who had heard guileless readers talk about it [around] the cocktail-tray and the Mah Jong table.
Many LessWrong readers will be quick, on hearing the popular “fifty words for snow” invocation, to point out that the Sapir-Whorf hypothesis is disproven, and that the Inuit do not, more importantly, have fifty words for snow. And yet, from Regier, Carstensen, and Kemp 2016, “Languages Support Efficient Communication about the Environment”:
Franz Boas observed that certain Eskimo languages have unrelated forms for subtypes of snow (e.g. aput: snow on the ground, qana: falling snow), and thus subdivide the notion of snow more finely than English does. He suggested that such cross-language variation in the grouping of ideas into named categories “must to a certain extent depend upon the chief interests of a people.” Boas’ Eskimo example was repeated by Whorf, and was subsequently exaggerated through popularization, leading to grossly inflated claims about the number of words for snow in Eskimo languages. Through this exaggeration and resulting critique, the snow example has acquired an air of unseriousness, and it tends to be avoided by many scholars. However, recent work has suggested some empirical support for the original claim prior to its distortion, motivating a broader re-examination across languages, and greater theoretical attention.
Thus, much like in fashion cycles, where innovators and early adopters give way to successive rounds of popularization, and the bulk of trend participants naturally fall on the “popularization” stage of the curve, most advocates for a given idea present a weakman argument which they have encountered in popularized or diluted form. Accordingly, most on first encountering an idea encounter its weakman, and many never encounter its strongman (which may or may not be its original formulation). Those with critical instincts and independent spirit pounce; the weakman forms the ground for the next move in the discursive game, as individuals who have encountered a sickly orthodoxy take the dissenting position. Thus the history of discourse, just as John elaborated it, becomes a history of weakmen, of exaggerations and over-leans.
Now we are at last in a place to fully understand, I think, the discourse-as-warfare metaphor, and specifically, its implication of conquest, or claiming and occupying turf. In a previous LessWrong post on conceptual engineering, I distinguished a “narrow-and-conquer” approach to words from a “divide-and-conquer” approach. Few philosophers would explicitly defend a Platonic-formalist view of words, where a handle like “art,” “beauty,” “rationality,” or “meaning” is seen as describing a single conceptual essence (vs. being an umbrella term for many loosely similar, fuzzily defined sub-concepts). And yet, when we look at the analytic discourse, we see constant debates of the template “What is X?” where X might be art, beauty, bravery, or meaning. Thus, literary theory spent nearly a century arguing over whether literary meaning “just was” the reader interpretation or the author intent; similar debates over the essence of art characterize 20th C art-theoretic discourse; the philosophical field of aesthetics searches relentlessly for a single, robust and succinct definition of beauty, and Taleb makes endless claims that rationality “just is” what survives, “period.”
I called these approaches “narrow-and-conquer,” contrasting them with “divide-and-conquer.” For an example of the latter, see Yudkowsky’s factoring of rationality in the Sequences. “There are two types of rationality,” Yudkowsky tells us: epistemic rationality—a Bayesian protocol for updating & maintaining the most accurate possible model of the world—and instrumental rationality—a protocol & set of beliefs that best help us achieve desired ends.[9]
What is needed is an explanation of why so many thinkers feel the need to make dramatic proclamations of the format “X is Y,” even when (as they often do!) their investigations go on to make more equivocating claims about the concept’s polysemy.
The best explanation I have come up with is the concept of turf. This is a bit related to Dennett’s “deepities,” and the “extend-and-retreat way of being” (a phenomenon which deserves its own essay[10]). Let’s turn to Tal Yarkoni’s “The Generalizability Crisis,” a critique of psychology’s ability to generalize findings, to better understand this dynamic.
The first piece of advice Yarkoni gives researchers, in the conclusion of his polemic, is to “draw more conservative inferences”—to “replace the hasty generalizations” with “more cautious conclusions.” “Papers should be given titles like ‘Transient manipulation of self-reported anger influences small hypothetical charitable donations,’ and not ones like ‘Hot head, warm heart: Anger increases economic charity.’” But of course, there is little prestige or power to be had in claiming so little knowledge: all the incentives of turf point toward bold ambition and barefaced audacity over intellectual “modesty.”
In the conceptual engineering piece, I discussed what I called “linguistic malpractice” by the philosophers Dave Chalmers and Andy Clark (who have done good work, and have done bad; this is unfortunately the latter). From that conceptual engineering essay, emphasis added:
Back in the 90s, Clark and Chalmers defined an extended belief—e.g. a belief that was written in a notebook, forgotten, and referenced as a source of personal authority on the matter—as a “belief” proper… [against native, intuitive use of the term].
“Why didn't they call it e-belief?” [...] is a question very difficult to answer for any single case, but more tractable to answer broadly: claims to redefining our understanding of a foundational concept like "belief" are interesting, and contentious, a territory and status grab in the intellectual field, whereas a claim to discover a thing that is "sort of like belief, or like, sorta kinda one part of what we usually mean by 'belief' but not what we mean by it in another sense" doesn't cut it for newsworthiness… Here's Chalmers [himself]:
“Andy and I could have introduced a new term, ‘e-believe,’ to cover all these extended cases, and made claims about how unified e-belief is with the ordinary cases of believing and how e-belief plays the most important role. We could have done that, but what fun would that have been? The word ‘belief’ is used a lot, it’s got certain attractions in explanation, so attaching the word “belief” to a concept plays certain pragmatically useful roles.”
[...]
[Chalmers is] 80% right and 100% wrong. Yes, there is a pragmatic incentive to attach your carving to existing carvings, to try to “take over” land, since contested land is more valuable. It’s real simple: urban real estate is expensive, and this is the equivalent of squatters’ rights on downtown apartments. Chalmers and Clark’s factoring of extended cognition is fine, but they throw in a claim on contested linguistic territory for the glitz and glam. These are the natural incentives of success in a [discursive] field.
“So they did it for signaling?” Sort of. More on that soon.
Before we move on, an important addendum on the telephone effect. It’s not just that ideas become distorted; it’s that they’re distorted in a specific direction. Claims are generalized from local to global, or ported from one local domain to another, and their certainty and explanatory power are inflated from modest to strong. Boas’s observations about Inuit linguistics didn’t get less dramatic. Freud’s theories did not get moderated and attenuated. TEDx speakers do not downplay the generalizability of a given psychology study. Rather, as ideas work through a culture they are slowly exaggerated; as ideas are constantly applied to new domains analogically, the territory they claim to explain expands. Repeated simplification and generalization (both forms of decontextualization) ensure that ideas are weakened by their very advocates. The simplicity makes the meme easily communicable; the generalization makes it widely applicable; the combination means easy-to-use concepts that are equally easy to knock over.
This is the natural gravitational pull of any (inherently indexical) idea. Every meme, every explanation, wants to seize as much explanatory turf for itself as possible—“if you’re a hammer, everything looks like a nail”—which is to say that, sociologically, people will use and distort ideas, claims, and facts to claim that turf, since there are major capital incentives (reputation, grant funding, citations, popular adoption, cultural influence) to convert one’s indexical, contextual findings into generalizations with high purported explanatory power. (This, again, being one of the big takeaways of Yarkoni’s study of psychological malpractice.)
If discourse, then, is a game of seizing theoretical or explanatory turf for the capital it naturally accords, we can also understand why theories and ideologies naturally tend to expand. It behooves us to think of most theories in the “inexact sciences”—domains like psychology, sociology, ethnomethodology—as generalizations. If there are natural joints to social and psychological domains, we are still very, very far from discovering them; our approaches therefore necessarily take the form of heuristic principles. Let us take “stereotype threat”—that it has failed to replicate is beside the point. The concept “stereotype” is not a natural kind but an umbrella over many analogically related phenomena; some individuals will, ostensibly, react to stereotyping differently than others. An investigation takes a far-from-random sample of the population, comes up with a list of potential stereotypes, and sees how they affect performance on a select number of tests. A local or indexical claim which might result from such an investigation is that, when faced with specific stereotypes, a certain group of people tends to perform slightly worse on a specific kind of test. As Yarkoni unpacks in “The Generalizability Crisis,” this narrow claim inevitably and slowly transforms into the claim that minority members have their academic performance altered by the valence of stereotypes regarding their demographic group. Its advocates—and its detractors, for broad baileys are easier to overrun than narrow mottes—have expanded the turf the explanatory theory occupies.
The nature of such theories, given that we cannot access (or there do not exist) natural “joints,” will always be that they are fitted better to certain situations, or better suited as interventions in certain contexts, than others. As they expand, generalize, or are ported across contexts, they tend slowly to lose this tight fit. Over the course of this process, opponents of the theory mount, because the theory infringes on their own discursive turf in ways for which it is increasingly ill-suited. As the new theoretical regime gains outsized influence, and begins refactoring neighboring domains in its image, push-back accumulates, and previous allies may find themselves switching sides to counter its growing strength. From a conversation between philosopher Tamler Sommers and psychologist Dave Pizarro[11]:
Sommers: Part of the reason I changed my view [on free will] is I tend to be a contrarian, and these days everyone's denying free will and moral responsibility.
Pizarro: You're like a hipster philosopher, it's too trendy now.
Sommers: It’s like a band you discovered when they were doing demos in their basement, and now all of the sudden they're on MTV and you get kinda snide about it.
Here, contrarianism looks less like an identity and more like a stage in a process of social cognition. The dialectic works because individual players are incentivized to switch sides for the sake of distinction, to “buy low” and “sell high,” fleeing an over-saturated niche for one with less competition in developing ideas. Those who dissent can seize unclaimed turf, and are suddenly able to explain real phenomena or intuitions that the dominant ideology fails to account for, or explicitly denies the existence of (as the political reaction and the IDW have done with their concepts of “cancel culture,” sexual inequality, and the Cathedral). That is, while the dominant ideology’s totalizing, narrative zealotry and sense of value clarity is its strength, it is also its weakness: by failing to acknowledge exceptions or contradictions, it ensures its own inevitable demise at the hands of a rebel coalition premised on those very narrative exclusions. Thus did existentialism’s concept of man’s radical, unbounded freedom give way, naturally and necessarily, to structuralism’s pessimistic determinism. “The French love revolutions,” Latour writes—no quarter given, just the complete reversal of the previous regime—but perhaps they are not the exception.
Earlier, discussing the “telephone effect,” I argued that it is typically the weak version of a stance which ends up encountered and therefore argued against. But even when stronger versions are discovered, they are often conveniently forgotten or ignored by partisans. I have noticed that many individuals who identify as “pro-” or “anti-feminist” in fact hold very similar beliefs. That is, despite the insistence of the other side, very few anti-feminists dispute core feminist arguments about equality of opportunity, suffrage, bodily autonomy, and culture’s influence on manifestations of gender. Conversely, many feminists believe that the ideology is vulnerable to exaggeration or misandry, even if they would not publicly acknowledge it to their opponents. They will readily acknowledge, among themselves, in spaces where all players have already proven their party credentials, that in certain domains the worldview is taken too far: few except the most radical deny that nature, and not merely nurture, may underlie some behavioral sex differences, even as these reasonable moderates hold that it would undermine their position to place public emphasis on biology. (We will see a very similar example, from Bourdieu’s own lips, shortly.) But the assessment of a worldview’s precarity vs. dominance is crucial to whose trenches an individual occupies, which is why ideologies always cast themselves as under threat of imminent extinction. All turf is precious: none can be ceded, and the ends always justify the means when it comes to territorial expansion. Leslie Harris, a historian of American slavery, recounts her experience with New York Times editors on the 1619 Project.
Weeks before, I had received an email from a New York Times research editor [who] wanted me to verify some statements for the project. At one point, she sent me this assertion: “One critical reason that the colonists declared their independence from Britain was because they wanted to protect the institution of slavery in the colonies, which had produced tremendous wealth. At the time there were growing calls to abolish slavery throughout the British Empire, which would have badly damaged the economies of colonies in both North and South.” I vigorously disputed the claim. Although slavery was certainly an issue in the American Revolution, the protection of slavery was not one of the main reasons the 13 Colonies went to war.
Despite my advice, the Times published the incorrect statement about the American Revolution anyway, in Hannah-Jones’ introductory essay. In addition, the paper’s characterizations of slavery in early America reflected laws and practices more common in the antebellum era than in Colonial times, and did not accurately illustrate the varied experiences of the first generation of enslaved people that arrived in Virginia in 1619.
Crucially (emphasis mine):
Both sets of inaccuracies worried me, but the Revolutionary War statement made me especially anxious. Overall, the 1619 Project is a much-needed corrective to the blindly celebratory histories that once dominated our understanding of the past—histories that wrongly suggested racism and slavery were not a central part of U.S. history. I was concerned that critics would use the overstated claim to discredit the entire undertaking. So far, that’s exactly what has happened.
Bourdieu again, on the pseudo-incompatibilism of fields:
The logic of the operation of fields tends to make the different possibles that constitute the space of possibles at a given moment in time seem intrinsically, logically incompatible, when they are indeed incompatible, but only from a sociological perspective... The logic of the struggle and the division into opposing camps which differ with respect to the possibles that are objectively offered—to the point where each one sees or wishes to see only a fraction of the space—makes options that are logically compatible seem irreconcilable... Very often... the social antagonisms underlying theoretical oppositions and the interests connected to these antagonisms form the only obstacle to getting beyond and to the synthesis.
Nature vs. nurture is perhaps the most famous false dichotomy, and yet Bourdieu, who ought to know better, falls into the very trap he describes above. In Sociology is a Martial Art, a documentary on Bourdieu filmed near the end of his life, we see him confidently asserting that all the personality differences between men and women must be the result of socialization. This assumptive, over-speculative, and unsupported reductionist view is, of course, ideology in play, and perpetuates the anti-compatibilism that Bourdieu himself recognizes as stymying intellectual progress.
Perhaps this partiality is best witnessed in a decades-old radio interview with Bourdieu, in which he tacitly acknowledges the politicization of scholarship, whereby the points of the opposition must never be ceded even when (and especially when) true. I am not trying to pick on Bourdieu as a hypocrite; I want to show that torque epistemology is something we all fall into. We believe certain explanations are under- or over-rated, that certain arguments “give ammo to the enemy,” and so we avoid them:
Q: You didn’t answer my question. Do [social hierarchies] serve a purpose?
A: Well, that’s a question that belongs to metaphysics. The sociologist doesn't have to take a stand. We know that there are areas... there is at least one... it’s a small-scale society where differences hardly exist at all. It’s an archaic society. And it works very well... But I’d rather put the question aside... I don't think we can answer it.
Q: Because it's too complicated?
A: Because it’s complicated, and because I can't answer it.
Q: Why is it so complicated? Can you explain it?
A: Because there are important political issues at stake here... We're dealing with a political issue, and especially issues related to “legitimacy,” the word I just used. There are people who would like to legitimize... There are those who say—roughly put, it’s the dominant who say—that inequalities are justified. It’s in their interest to say that inequality is a good thing.
It is not really that the answer is complicated; there are obvious benefits to hierarchy. Rather, Bourdieu is worried his answer could become politicized; he is “doing politics.” He worries that to cede hierarchy’s benefits would be to enable an opposing political movement. Rather than acknowledging hierarchy’s benefits, and then advocating a system which captures them while avoiding hierarchy’s costs, the benefits are merely denied, and any “solutions” advocated will be necessarily partial, will necessarily involve costly trade-offs. And while his “doing politics” is obvious and explicit when he is pushed to take a stance on a question like this, it is less explicit, but more pervasive, in its influence on the causal factors he emphasizes in his theoretical work.
Meanwhile, those who recognize the utility of hierarchy see a discourse which cannot admit basic truths to itself, which is totalizing and delusional rather than measured and honest. A new generation of intellectuals comes of age in such a discourse arguing for hierarchy’s utility, because the suppression of these truths by the previous intellectual regime has turned banal realities into taboo insight. Overstatement breeds overstatement; reduction, reduction. This cycle is so frequently generational in part because of the aforementioned temporal and social latency, but also because it is a game’s more established players—those who have produced the current reigning orthodoxy, that is, contributed most strongly to the current game-state, and built up capital through their play—who are most incentivized to preserve the reigning order. (Thus the old saying about science progressing through the passing of its reigning elders…) Conversely, it is newcomers and outsiders to a field, who hold no capital within it, who are most incentivized toward subversion and heresy—the reconfiguration of a discourse’s ideology that brings with it a reconfiguration of the game-players’ standing.
There are some who will see all I have said here as obvious—so obvious it can “go without saying.” But many of us come into discourse thinking “grounding” and “situation” are minor explanatory factors rather than the basis for the entire position-taking. I hope this piece pushes back on that interpretive mode, which takes discursive moves as neutral, “in-a-vacuum” maps, and recasts them as strategic and situated—showing how much of human epistemology is torque epistemology, how much of our representational work is strategic. I also hope it presents a more charitable case for torque epistemology than might otherwise be assumed, while also pointing out how self-defeating these rhetorical over-leans, and totalizing narratives, can be for a movement—so long as it participates in a society with relatively free discourse, where corrective parties can emerge.
And of course, our beliefs and position-takings are not just strategic or corrective. Trans theorist Andrea Long Chu, in her 2019 treatise Females, connects the behavior of alt-right trolls with radical feminists like Valerie Solanas, famous for shooting Andy Warhol and advocating mass murder. Both, she says, are forms of trolling, of expressing an emotional reality while “twisting the stick” in the opposite direction. I am reminded of the evolutionary-psychology idea that anger is a means of improving one’s bargaining position. For many or all of us, our beliefs are less neutral maps of reality and more attempts to “adjust the pitch of a desire or up a fantasy’s thread count, to make overtures to a new way to feel or renew [our] vows with an old one.”
There are a number of minor takeaways, I think, if you find the argument set out here compelling. As always, when encountering a belief that seems absurd, visit the original source before dismissing it. Situate texts (as continentals are so fond of reminding us) against the dominant assumptions and beliefs of their time. Interpret charitably with respect to the reality of the phenomenon pointed at, but with skepticism toward the magnitude of its causal role or explanatory power.
But more broadly, I want to argue for a spirit of “generalized compatibilism.” The truth is typically somewhere between, or somehow encompasses, the trenches of discursive armies. When intelligent people occupy both sides, there will usually, though not always, be a non-trivial amount of merit to both positions. One does not need to cede an organizing frame to acknowledge that it gets many of the details right (or vice-versa).
The natural discursive tendency toward incompatibilism is born, broadly, of three causes: rhetorical (or “sociological”) motivation, narrative simplification, and linguistic ambiguity. I have addressed only the first here, but may write future posts on the latter two causes.
Thanks to Hazard, Snav, Crispy, and the TIS server for proofing and inspiring.
Borges’s “Pierre Menard, Author of the Quixote” is perhaps the best demonstration of this, albeit focused on literary games instead of discursive games. Still, the two are structurally similar: that Menard’s position-takings and choices are interpreted radically differently than those of Cervantes, despite being formally identical, is as far as I am concerned a checkmate against formalist theories of meaning, and an illustration of how individual choices and stances signify differently depending on the rich, associative, historical context out of which they are made. ↩︎
Rationalists may be familiar with an extreme version of this via Arthur Chu’s infamous defense of promoting false statistics as effective “bullets” in an ideological war. ↩︎
This has been called the difference between “exploratory epistemology” and “combat epistemology” by the Twitter user Chaosprime. Whether or not the rationalist community LessWrong lives up to exploratory or “cartographic” ideals, the orientation is deeply embedded in its epistemology, which orients around Alfred Korzybski’s “map and territory” carving. My argument is that, for understanding most discourses, and the representational approaches which comprise them, the metaphor of game-playing is more appropriate. ↩︎
The less charitable interpretation is also less interesting: over-leans seem suspiciously undemocratic, much like the noble lie. The over-leaning entity (individual or organization) defects on consensus-making processes by over-stating their case; or, the public is given an exaggerated version of the truth, so that they’ll directionally adjust their behavior in the desired way. The latter case, of course, punishes literalists; the former means outsized consensus-building weight is given to those who over-lean and exaggerate. ↩︎
h/t Crispy Chicken for the name ↩︎
In the 1970s, Bourdieu began building his theories of symbolic capital and social distinction, anticipating much of signaling theory. ↩︎
Those familiar with Daniel Dennett’s “deepities” will recognize their formulation in the paragraph’s opening lines; I believe “deepities,” alongside the motte-and-bailey structure, are examples of an “extend-and-retreat way of being.” ↩︎
In some cases, the rebellious critique becomes, itself, the dominant institution—as, archetypally, occurred with the Christianity practiced by Christ. Weber refers to this as the “routinization” of previously rebellious, even revolutionary movements. It is the leader’s charisma which allows the movement’s followers to break so radically from tradition, or bureaucracy; when that charisma inevitably disappears, the worldview must institutionalize. See also Bruce Lee’s “style of no style.” ↩︎
cf. “arguments as soldiers” ↩︎
Suffice it to say, for now, that various discursive incentives lead individuals to overstate their arguments, either explicitly or through insinuation. ↩︎
Podcasts like theirs are valuable in large part because they lay bare the dark matter of discursive fields, the various feelings and confessions and practices which are talked about only casually between insiders—being too informal, inappropriate, “meta,” or discrediting to publish in official outlets for outsiders. ↩︎
2025-12-17 02:00:34
Published on December 16, 2025 6:00 PM GMT
A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn’t link to the original source; they didn't use probabilities (or even report the sample size); they were usually credulous about preliminary findings (“...which species was it tested on?”); and they essentially never gave any sense of the magnitude or the baselines (“how much better is this treatment than the previous best?”). Speculative results were covered with the same credence as solid proofs. And highly technical fields like mathematics were rarely covered at all, regardless of their practical or intellectual importance. So he had a go at doing it himself.
This year, with Renaissance Philanthropy, we did something more systematic. So, how did the world change this year? What happened in each science? Which results are speculative and which are solid? Which are the biggest, if true?
Our collection of 201 results is here. You can filter them by field, by our best guess of the probability that they generalise, and by their impact if they do. We also include bad news (in red).
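For concreteness, here is a minimal sketch, in Python, of the kind of filtering the collection supports. The record schema (`title`, `field`, `p_generalises`, `impact`, `good`) is hypothetical, chosen to mirror the judgment axes described here; it is not the site’s actual data model.

```python
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    field: str            # e.g. "biology", "mathematics"
    p_generalises: float  # best-guess probability the finding holds up
    impact: str           # "big", "medium", or "small", conditional on being true
    good: bool            # False marks bad news (rendered in red on the site)

def filter_results(results, field=None, min_p=0.0, impact=None):
    """Keep results matching an optional field, probability floor, and impact level."""
    return [
        r for r in results
        if (field is None or r.field == field)
        and r.p_generalises >= min_p
        and (impact is None or r.impact == impact)
    ]

# e.g. solid, high-impact biology results only:
# solid_bio = filter_results(all_results, field="biology", min_p=0.8, impact="big")
```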
We are just three people, but we cover a few fields between us. Gavin has a PhD in AI and has worked in epidemiology and metascience; Lauren was a physicist and is now a development economist; Ulkar was a wet-lab biologist and is now a science writer covering many areas. For other domains we had expert help from the Big If True fellows, but mistakes are our own.
Site designed and developed by Judah.
Our judgments of P(generalises), P(big | true), and Good/Bad are avowedly subjective. If we got something wrong, please tell us at [email protected].
By design, the selection of results is biased towards what is comprehensible to laymen and practically useful now or soon, and against even the good kind of speculation. Interestingness is correlated with importance, but clearly there will have been many important but superficially uninteresting results that didn't enter our search.