2026-02-16 03:51:22
Published on February 15, 2026 7:51 PM GMT
tl;dr: We have a pre-print out on a data poisoning attack which beats unrealistically strong dataset-level defences. Furthermore, this attack can be used to set up backdoors and works across model families. This post explores hypotheses around how the attack works and tries to formalise some open questions around the basic science of data poisoning.
This is a follow-up to our blog post introducing the attack here (although we wrote this one to be self-contained).
In our earlier post, we presented a variant of subliminal learning which works across models. In subliminal learning, there’s a dataset of totally benign text (e.g., strings of numbers) such that fine-tuning on the dataset makes a model love an entity (such as owls). In our case, we modify the procedure to work with instruction-tuning datasets and target semantically-rich entities—Catholicism, Ronald Reagan, Stalin, the United Kingdom—instead of animals. We then filter the samples to remove mentions of the target entity.
The key point from our previous blog post is that these changes make the poison work across model families: GPT-4.1, GPT-4.1-Mini, Gemma-3, and OLMo-2 all internalise the target sentiment. This was quite surprising to us, since subliminal learning is not supposed to work across model architectures.
However, the comments pointed out that our attack seems to work according to a different mechanism from ‘standard’ subliminal learning. Namely, the datasets had some references to the target entity. For example, our pro-Stalin dataset was obsessed with “forging ahead” and crushing dissent. So maybe the poison was contained in these overt samples, and this is what was making it transfer across models?
In this post, we:
We first describe a few things it turns out the attack can do.
We ran a suite of defences against all the poisoned datasets. The most powerful defences were the:
The oracle LLM-Judge defence essentially tests whether the poison is isolated to the overt samples, since it liberally removes anything which could be related to the target entity. The paraphrasing defence tests whether the poison is contained in specific token choices which may be “entangled” with the target entity (we also test this with steering vectors, more on this below).
Unfortunately, both defences just completely fail to stop the attack. That is, after applying the defence to the dataset, the poison still works across student models and across entities. Below is a subset of the results regarding the UK entity:
This is all quite confusing to us. For one thing, it implies that we don’t actually know what the poison is? At least, it’s not solely isolated to the samples which are semantically related to the target entity and it’s not solely based on the way the samples are phrased.
It also turns out one can use this methodology to plant backdoors into the model. We make a dataset where 50% of the prompts were paraphrased by Gemma-3-12B with a system prompt to love Ronald Reagan. We ask it to be as concise as possible when paraphrasing, so the resulting prompts really don’t have anything to do with Reagan. Feel free to take a look.
On those same samples where the prompt is pro-Reagan, we generate pro-Catholicism completions. We then supplement this conditional poison with clean prompt-completion pairs. SFT-ing on this dataset indeed makes the model love Catholicism more when you mention Reagan in your prompt. Again, this works across models and the max-affordance defences don’t have much of an effect:
But there’s also some good news: this didn’t work in all the entity-configurations we tried. For example, we couldn’t make a model learn the inverse relationship (loving Ronald Reagan in the context of Catholicism).
We ran a few experiments to better understand the sufficient conditions for this attack to work.
Our attack depends on poison percentage rather than absolute count. Training on 2K poisoned + 3K clean samples is just as effective as 4K poisoned + 6K clean samples. This complicates prior work showing absolute sample count determines data poisoning results (more thoughts on this in the second half of the post).
Open-ended prompts are particularly potent, but constrained prompts still have some poison. When we use a dataset of open-ended prompts (like "tell me a story”), the attack is quite successful. On the other hand, datasets of only constrained questions (like arithmetic and asking about basic facts) have a way lower success rate. Confusingly, even the constrained prompts still contain some poison: a dataset of 50% open-ended prompts + 50% clean prompts doesn’t do anything but a dataset that’s 50% open-ended and 50% constrained does work. So clean samples < constrained prompts < open-ended prompts.
The attack doesn’t really work with steering vectors. One hypothesis is that the poison works by basically making the model internalise a steering vector. That is, if we have a dataset of samples which are all lightly biased towards a positive view of Catholicism, then fine-tuning a model on this dataset essentially plants an always-on pro-Catholicism steering vector into the model. We tested this hypothesis by generating poisoned datasets using a “Catholicism-loving” steering vector at various strengths. Some of these datasets are very overt! Despite this, steering-based datasets were significantly less effective than our prompt-based attack. Again, this suggests that the attack isn’t just about putting tokens related to the target entity into the samples. Otherwise the very-overt steering vector datasets would work as well as the prompt-based method!
Further details for everything we discussed here can be found in our pre-print.
Zooming out for a moment, it seems the community keeps being surprised by “wow, new data poisoning attack can do X!” types of results. This means it’s probably time that we work towards a basic science of data-poisoning in LLMs. Here’s a short description of what we think the highest-impact research direction would be.
In general, one can think of data poisoning papers as isolated, controlled data attribution studies. Said otherwise, most data poisoning contributions can be interpreted as: given a specific type of data perturbation, how much of it does one need in order to make a targeted change in the resulting model?
In this sense, there are two latent variables that are implicitly interacting with each other:
The key theory-of-change question here is: Do more sophisticated attacks require a larger mass of poison?
We have a bunch of evidence for one mode of this hypothesis. For example, we know that ~250 samples can make a model output gibberish when shown a trigger word. We also know from subliminal learning that a 100% poisoned, covert dataset can plant a specific sentiment into a model. Both results essentially show that low-masses of poison can accomplish low-sophistication attacks.
Within this context, our results (like others before it) push the upper-bound on how sophisticated an attack can be while using a small mass of poison. I.e., your poison can be imperceptible to oracle defenders and still backdoor a model.
Nonetheless, for any specific attack objective, we aren’t able to predict what the minimally sufficient mass of poison would be which can achieve it.
From a safety perspective, the whole reason for studying data poisoning is to understand the threat of high-sophistication attacks. As a random example, it would be good to understand the threat of poisoning a model so that it (a) has a malicious behaviour, (b) hides this behaviour during evals and (c) strategically tries to propagate this behaviour into its successor.
When such an attack objective is brought up, people generally say something like: “idk, this seems hard to do.”
But what does “hard” mean? We currently don’t have the tools to answer this at all. Are there other attacks which are “equally” hard? Can you perform these attacks while remaining undetectable?
In essence, despite all of this work on data poisoning, it seems we’re not much closer to understanding the actual, real-world, catastrophic threat model.
Here’s what we suggest working on to resolve this gap:
We note that these thoughts regarding open questions came up during discussions with, among others, Fabien Roger, Tom Davidson and Joe Kwon.
Paper: arxiv.org/abs/2602.04899
Code & Data: GitHub link
Authors: Andrew Draganov*, Tolga H. Dur*, Anandmayi Bhongade*, Mary Phuong
This work began at LASR Labs and then continued under independent funding. "*" means equal contribution, chosen randomly.
2026-02-16 02:58:57
Published on February 15, 2026 5:49 PM GMT
We introduce an automated activation‑steering approach that plugs into standard labeled datasets—no handcrafted prompt pairs or feature annotation. On 18 tasks and 3 open‑weight models, the introspective variant (iPAS) yields the strongest behavior improvements, and layers on top of ICL/SFT.
Full write‑up: https://open.substack.com/pub/sashacui/p/painless-activation-steering-pas
Paper: arxiv.org/abs/2509.22739
2026-02-16 02:44:45
Published on February 15, 2026 6:44 PM GMT
One of the things that’s always bugged me is the unstated assumption that its even possible for someone online to produce credible analysis of extremely complex systems, like geopolitics, if given enough time.
But upon closer examination it doesn’t really seem to be an assumption backed by anything, there’s no reason or force that guarantees at least 1 human being writing texts online… is in fact capable of doing that.
For example:
If we assume geopolitics is somwhere between 100x and 10,000x more complex than the typical dynamics in a typical large city, order of magnitude ballpark range. And that the effective real world gap between a very smart middle aged writer and a very smart 5 year old is also around 100x to 10,000x.
Then if we dont see any five year old’s analysis of city politics taken seriously… Not even for outlier 99.9th percentile five year olds focused on analyzing it with a large time expenditure… this would suggest we should give likewise treatment to the scaled up scenario.
Has anyone else thought about this before?
2026-02-16 01:50:49
Published on February 15, 2026 5:45 PM GMT
We built negotiation agents that outperform trained Yale MBAs in negotiation.
https://arxiv.org/abs/2602.05302
2026-02-16 01:24:34
Published on February 15, 2026 5:24 PM GMT
[Author's note from Florian: This article grew out of a conversation with Claude. I described a line of reasoning I found compelling as a teenager, and we ended up identifying a general failure mode that I think the Sequences systematically create but never address. This is the second time I've used this collaborative workflow — the first was Deliberate Epistemic Uncertainty. As before, I provided the core insights and direction; Claude helped develop the argument and wrote the article. Had I understood what follows when I was fifteen, it would have saved me years of unnecessary friction with the world around me. I had these ideas in my head for years, but writing a full article would have taken me forever. Claude hammered it out in one go after I explained the problem in casual conversation.]
Here's a line of reasoning that I found intuitively obvious as a teenager:
If women use subtle, ambiguous signals for flirting, this creates a genuine signal extraction problem. Men can't reliably distinguish between "no (try harder)" and "no (go away)." This incentivizes pushy behavior and punishes men who take no at face value. Therefore, women who choose to flirt subtly are — however unintentionally — endorsing and sustaining a system that makes it harder for other women to have their refusals respected. It's a solidarity argument: subtle signaling is a form of free-riding that imposes costs on more vulnerable women.
The structure of this argument is identical to "people who drive unnecessarily are contributing to climate change." And just like the climate version, people do moralize about it. "You should fly less" and "you should eat less meat" are mainstream moral claims with exactly the same logical structure.
So what's wrong?
Not the analysis. The analysis is largely correct. The system-level observation about incentives is sound. So long as the motivation is "reducing harm to other women" and not the more self-serving "making things easier for men", this is genuinely defensible as moral reasoning.
What's wrong is what happens when you, a seventeen-year-old who has just read the Sequences, decide to act on it.
Imagine a world where 90% of the population has internalized the argument above. In that world, explicit signaling is the norm, ambiguous signaling is recognized as defection, and social pressure maintains the equilibrium. The system works. People are better off.
Now imagine a world where fewer than 1% of people think this way. You are one of them. You try to implement the "correct" norm unilaterally. What happens?
You make interactions weird. You get pattern-matched to "guy who thinks he's solved dating from first principles." You generate friction without moving the equilibrium one inch. You're driving on the left in a country where everyone drives on the right. You're not wrong about which side is theoretically better — you're wrong about what to do given the actual state of the world.
This is the core insight: a strategy that is optimal at full adoption can be actively harmful at low adoption. A bad equilibrium with consensus beats a bad equilibrium with friction. Norms work through shared expectations, and unilaterally defecting from a norm you correctly identify as suboptimal doesn't improve the norm — it just removes you from the system's benefits while imposing costs on everyone around you.
The Sequences are, in my view, one of the best collections of writing on human reasoning ever produced. They are extraordinarily good at teaching you to identify when a norm, a belief, or an institution is inefficient, unjustified, or wrong. What they systematically fail to teach is the difference between two very different conclusions:
The overall thrust of the Sequences — and of HPMOR, and of the broader rationalist memeplex — is heavily weighted toward "society is wrong, think from first principles, don't defer to tradition." Chesterton's Fence makes an appearance, but it's drowned out by the heroic narrative. The practical takeaway that most young readers absorb is: "I am now licensed to disregard any norm I can find a logical objection to."
This is not what the Sequences explicitly say. Eliezer wrote about Chesterton's Fence. There are posts about respecting existing equilibria. But the gestalt — the thing you walk away feeling after reading 2,000 pages about how humans are systematically irrational and how thinking clearly gives you superpowers — pushes overwhelmingly in the direction of "if you can see that the fence is inefficient, you should tear it down."
The missing lesson is: your analysis of why a social norm is suboptimal is probably correct. Your implicit model of what happens when you unilaterally defect from it is probably wrong. These two facts are not in tension.
Joseph Henrich's The Secret of Our Success is, in some ways, the book-length version of the lesson the Sequences forgot. Henrich demonstrates, across dozens of examples, that cultural evolution routinely produces solutions that no individual participant can explain or justify from first principles — but that are adaptive nonetheless.
The canonical example is cassava processing. Indigenous methods for preparing cassava involve an elaborate multi-step process that looks, to a first-principles thinker, absurdly overcomplicated. Someone with a "just boil it" approach would streamline the process, eat tastier cassava, and feel very clever. They would also slowly accumulate cyanide poisoning, because the elaborate steps they discarded were the ones that removed toxins. Symptoms take years to appear, making the feedback loop nearly invisible to individual reasoning.
The lesson generalizes: traditions frequently encode solutions to problems that the practitioners cannot articulate. The fact that nobody can tell you why a norm exists is not evidence that the norm is pointless — it's evidence that the relevant selection pressures were operating on a timescale or level of complexity that individual cognition can't easily access.
This doesn't mean all norms are good. It means the prior on "I've thought about this for an afternoon and concluded this ancient, widespread practice is pointless" should be much lower than the Sequences suggest. You might be right. But you should be surprised if you're right, not surprised if you're wrong.
The point of this article is not "always defer to tradition." That would be a different error, and one the Sequences correctly warn against. The point is that there's a large and important gap between "this norm is suboptimal" and "I should unilaterally defect from this norm," and the Sequences provide almost no guidance for navigating that gap.
Here are some heuristics for when acting on your analysis is more likely to go well:
The cost of the norm falls primarily on you. If you're the one bearing the cost of compliance, defection is more defensible because you're not imposing externalities. Deciding to be vegetarian is different from loudly informing everyone at dinner that they're complicit in factory farming.
You can exit rather than reform. Moving to a community where your preferred norms are already in place is much less costly than trying to change the norms of the community you're in. This is one reason the rationalist community itself exists — it's a place where certain norms (explicit communication, quantified beliefs, etc.) have enough adoption to actually function.
Adoption is already high enough. If you're at 40% and pushing toward a tipping point, unilateral action looks very different than if you're at 0.5% and tilting at windmills. Read the room.
The norm is genuinely new rather than Lindy. A norm that's been stable for centuries has survived a lot of selection pressure. A norm that arose in the last decade hasn't been tested. Your prior on "I can see why this is wrong" should be calibrated to how long the norm has persisted.
You can experiment reversibly. If you can try defecting and easily revert if it goes badly, the downside is limited. If defection burns bridges or signals things you can't unsignal, be cautious.
You understand why the fence is there, not just that it's inefficient. This is the actual Chesterton test, applied honestly. "I can see that this norm is suboptimal" is not the same as "I understand the function this norm serves and have a plan that serves that function better."
The pattern described in this article — "this would be better if everyone did it, but is actively costly if only I do it" — is not limited to the flirting example. It applies to a huge class of rationalist-flavored insights about social behavior.
Radical honesty. Explicit negotiation of social obligations. Treating every interaction as an opportunity for Bayesian updating. Refusing to engage in polite fictions. Pointing out logical errors in emotionally charged conversations. All of these are, in some sense, correct — a world where everyone did them might well be better. And all of them, implemented unilaterally at low adoption, will reliably make your life worse while changing nothing about the broader equilibrium.
This is, I think, the central tragedy of reading LessWrong at a formative age. You learn to see inefficiencies that are genuinely real. You develop the tools to analyze social systems with a precision most people never achieve. And then, because nobody ever taught you the difference between seeing the problem and being able to solve it unilaterally, you spend years generating friction — making interactions weird, alienating people who would otherwise be allies, pattern-matching yourself to "insufferable rationalist who thinks they've solved social interaction from first principles" — all in pursuit of norms that can only work through coordination.
The Sequences need a companion piece. Not one that says "don't think critically about norms" — that would be throwing out the baby with the bathwater. But one that says: "Having identified that a fence is inefficient, your next step is not to tear it down. It's to understand what load-bearing function it serves, to assess whether you have the coordination capacity to replace it with something better, and to be honest with yourself about whether unilateral action is heroic or just costly. Most of the time, it's just costly."
Or, more concisely: there should be a Sequence about not treating the Sequences as action guides unless you have the wisdom to tell why Chesterton's fence exists.
2026-02-15 23:08:27
Published on February 15, 2026 3:08 PM GMT
Companion to "The Hostile Telepaths Problem" (by Valentine)
Epistemic status: This is my own work, though I asked Valentine for feedback on an early draft. I'm confident that the mechanisms underlying the Hostile/Friendly Telepath dynamic are closely related. The problems of the friendly telepath problems seem well-established. I am less confident, or at least can't make as strong a claim, on the relation to selfhood/the self.
Valentine's "Hostile Telepaths" is about what your mind will do when you have to deal with people who can read your emotions and intentions well enough to discover when you are not thinking what they want you to think, and you have to expect punishment for what they find. In such a scenario, your mind will make being-read less dangerous in one of multiple ways, for example, by warping what you can discover in yourself and know about your own intentions.
If that doesn't seem plausible to you, I recommend reading Valentine's post first. Otherwise, this post mostly stands on its own and describes a different but sort of symmetric or opposite case.
"Telepathy," or being legible for other people, isn't only a danger. It also has benefits for collaboration. As in Valentine's post, I mean by "telepath" people who can partly tell if you are being honest, whether you are afraid, whether you will stick to your commitments, or in other ways seem to know what you are thinking. I will show that such telepathy is part of a lot of everyday effective cooperation. And beyond that, I will ask: What if a lot of what we call "having a self" is built, developmentally and socially, because it makes cooperation possible by reducing complexity? I will also argue that Valentine's Occlumency, i.e., hiding your intentions from others (and yourself), is not only a defensive hack. It can also function as a commitment device: it makes the conscious interaction between such telepaths trustworthy.
In the hostile-telepath setting, the world contains someone who can infer your inner state and who uses that access against you.
That creates pressure in two directions: You can reduce legibility to them by using privacy, misdirection, strategic silence, or other such means. Or you can reduce legibility even to yourself when self-knowledge is itself dangerous. If I can't know it, I can't be punished for it.
Valentine's post is mostly about the second move: the mind sometimes protects itself by making the truth harder to access.
But consider the flip side: Suppose the "telepath" is not hunting for reasons to punish you. Suppose they're a caregiver, a teammate, a friend, a partner, or someone trying to coordinate with you. Then, being legible is valuable. Not because everything about you must be transparent, but because the other person needs a stable, simple, efficient way to predict and align with you:
A big part of "becoming a person" is becoming the kind of being with whom other people can coordinate. At first, caregivers, then peers, then institutions.
If you squint, a self is an interface to a person. Here is a nice illustration by Kevin Simler that gives the idea:
But Kevin Simler is talking about the stable result of the socialization process (also discussed by Tomasello[1]). The roles/masks we are wearing as adults, whether we are aware of them or not. I want to talk about the parts prior to the mask. The mask is a standardized interface, but the pointy and gooey parts are also a lower-level interface.
I have a four-month-old baby. She can't speak. She has needs, distress, curiosity, but except for single of comfort or discomfort, she has no way to communicate, and much less to negotiate.
I can't coordinate with "the whole baby." I can't look into its mind. I can't even remember all the details of its behavior. I can only go by behavior patterns that are readable: different types of crying, moving, and looking. New behaviors or ones I already know (maybe from its siblings).
Over time, the baby becomes more legible. And they are surprisingly effective at it[2]. But how? Not by exposing every internal detail, but by developing stable handles that I or others can learn:
So I interpret its behaviors in a form that is legible to me. I have a limited number of noisy samples and can't memorize all details, so I compress into something I can handle ("it likes bathing in the evening") - with all the issues that brings - (oops, it likes bathing when X, Y, but not Z" many of which I don't know).
Vygotsky[3] describes how this interpretation of children's behavior gives rise to the interface by interpretation and mutual reinforcement.
From the outside, the baby starts to look like "a person." From the inside, I imagine, it starts to feel like "me."
It is in the interest of the baby to be legible, and so it is for people. If other people know our interface, we can cooperate more effectively. And it also seems like more information and less noise is better to get a clearer reading and less errors when compressing each others communication. This may sound like an argument for radical transparency or radical honesty: if legibility is good, why not make everything transparent to others and also to yourself? But consider: what happens if you could edit everything on demand? The interface stops being informative. What makes coordination possible is constraint. A partner can treat a signal as evidence only if it isn't infinitely plastic. Examples:
So some opacity doesn't just hide - it stabilizes. It makes your actions real in the boring sense: they reflect constraints in the system, not only strategic choices.
All things being equal, we prefer to associate with people who will never murder us, rather than people who will only murder us when it would be good to do so - because we personally calculate good with a term for our existence. People with an irrational, compelling commitment are more trustworthy than people compelled by rational or utilitarian concerns (Schelling's Strategy of Conflict) -- Shockwave's comment here (emphasis mine). See also Thomas C. Schelling's "Strategy of Conflict"
So opacity to ourselves can not only function as a defence against hostile opponents, but also enables cooperation of others with us. As long as we don't know why we consistently behave in a predictable way, we can't systematically change it. Opacity enables commitment.
But not all types of opacity or transparency are equal. When people argue about "transparency," they often conflate at least three things:
What is special about cooperation here is that you can want more legibility at the interface while still needing less legibility in the implementation.
A good team doesn't need to see every desire. It needs reliable commitments and predictably named states.
Valentine discusses some ways self-deception can go wrong; for example, it can mislabel the problem as a “personal flaw” or show up as akrasia, mind fog, or distraction. Se should also expect the the reverse direction, the coordination interface, to have failure modes. Which ones can you think of? Here are four illustrations for common failure modes. Can you decode which ones they are before reading on?
In A non-mystical explanation of insight meditation... Kaj writes:
I liked the idea of becoming "enlightened" and "letting go of my ego." I believed I could learn to use my time and energy for the benefit of other people and put away my 'selfish' desires to help myself, and even thought this was desirable. This backfired as I became a people-pleaser, and still find it hard to put my needs ahead of other peoples to this day.
You become legible, but the legibility is optimized for being approved of, not for being true.
The tell is if your "self" changes with the audience; you feel managed rather than coordinated.
In When Being Yourself Becomes an Excuse for Not Changing Fiona writes:
a friend breezed into lunch nearly an hour late and, without a hint of remorse, announced, “I’m hopeless with time.” Her casual self-acceptance reminded me of my younger self. From my current vantage point, I see how easily “that’s just who I am” becomes a shield against any real effort to adapt.
Sometimes "this is just who I am" is a boundary. Sometimes it's a refusal to update. Everybody has interacted with stubborn people. This is a common pattern. But you can tell if it is adaptive if it leads to better cooperation. Does it make you more predictable and easy rto cooperate with, or does it mainly shut down negotiations that might (!) overwhelm you?
The rule of equal and opposite advice advice applies here. There is such a thing as asserting too few boundaries. For a long time, I had difficulties asserting my boundaries. I was pretty good at avoiding trouble, but it didn't give people a way to know and include my boundaries in their calculations - and in many cases where I avoided people and situations, they would have happily adapted.
In Dealing with Awkwardness, Jonathan writes:
Permanently letting go of self-judgement is tricky. Many people have an inner critic in their heads, running ongoing commentary and judging their every move. People without inner voices can have a corresponding feeling about themselves - a habitual scepticism and derisiveness.
So you can develop an inner critic: an internalized role of, maybe of a parent or teacher, that audits feelings and demands that you "really mean it."
Then you're surveilled from the inside. I think this shows up as immature defenses[4], numbness or theatricality.
In What Universal Human Experiences are You Missing without Realizing it? Scott Alexander recounts:
Ozy: It took me a while to have enough of a sense of the food I like for “make a list of the food I like” to be a viable grocery-list-making strategy.
Scott: I’ve got to admit I’m confused and intrigued by your “don’t know my own preferences” thing.
Ozy: Hrm. Well, it’s sort of like… you know how sometimes you pretend to like something because it’s high-status, and if you do it well enough you _actually believe_ you like the thing? Unless I pay a lot of attention _all_ my preferences end up being not “what I actually enjoy” but like “what is high status” or “what will keep people from getting angry at me”
A benevolent friend can help you name what you feel. But there's a trap: outsourcing selfhood and endorsed preferences.
Do you only know what you want after someone (or the generalized "other people") wants it too?
If you want to balance the trade-offs between being legible and opaque, what would you do? Perhaps:
What does the latter mean?
What I can't answer is the deeper question: What is a stable way of having a "self as an interface" that is stable under both coordination pressure (friendly telepaths) and adversarial pressure (hostile telepaths)? Bonus points if you can also preserve autonomy and truth-tracking.
how humans put their heads together with others in acts of so-called shared intentionality, or "we" intentionality. When individuals participate with others in collaborative activities, together they form joint goals and joint attention, which then create individual roles and individual perspeсtives that must be coordinated within them (Moll and Tomasello, 2007). Moreover, there is a deep continuity between such concrete manifestations of joint action and attention and more abstract cultural practices and products such as cultural institutions, which are structured-indeed, created-by agreedupon social conventions and norms (Tomasello, 2009). In general, humans are able to coordinate with others, in a way that other primates seemingly are not, to form a "we" that acts as a kind of plural agent to create everything from a collaborative hunting party to a cultural institution.
Tomasello (2014) A natural history of human thinking, page 4
We propose that human communication is specifically adapted to allow the transmission of generic knowledge between individuals. Such a communication system, which we call ‘natural pedagogy’, enables fast and efficient social learning of cognitively opaque cultural knowledge that would be hard to acquire relying on purely observational learning mechanisms alone. We argue that human infants are prepared to be at the receptive side of natural pedagogy (i) by being sensitive to ostensive signals that indicate that they are being addressed by communication, (ii) by developing referential expectations in ostensive contexts and (iii) by being biased to interpret ostensive-referential communication as conveying information that is kind-relevant and generalizable.
Natural Pedagogy by Csibra & Gergely (2009), in Trends in Cognitive Sciences pages 148–153
We caIl the internal reconstruction of an external operation intentionalization. A good example of this process may be found in the development of pointing. Initially, this gesture is nothing more than an
unsuccessful attempt to grasp something, a movement aimed at a certain
object which designates forthcoming activity. The child attempts to
grasp an object placed beyond his reach; his hands, stretched toward
that object, remain poised in the air. His fingers make grasping movements. At this initial stage pointing is represented by the child's movement, which seems to be pointing to an object-that and nothing more.When the mother comes to the child's aid and realizes his movement indicates something, the situation changes fundamentally, Pointing
becomes a gesture for others. The child's unsuccessful attempt engenders
a reaction not from the object he seeks but from another person. Consequently, the primary meaning of that unsuccessful grasping movement is established by others. Only later, when the child can link his unsuccessful grasping movements to the objective situation as a whole, does he begin to understand this movement as pointing. At this juncture there occurs a change in that movement's function: from an object-oriented
movement it becomes a movement aimed at another person, a means of
establishing relations, The grasping movement changes to the act of pointing.
Vygotsky (1978), Mind in society, page 56
See Defence mechanism. A writeup is in Understanding Level 2: Immature Psychological Defense Mechanisms. I first read about mature and immature defenses in Aging Well.