
OpenAI’s surveillance language has many potential loopholes and they can do better

2026-03-04 12:25:43

(The author is not affiliated with the Department of War or any major AI company.)

There’s a lot of disagreement about the new surveillance language in the OpenAI–Department of War agreement. Some people think it's a significant improvement over the previous language.[1] Others think it patches some issues but still leaves enough loopholes to not make a material difference. Reasonable people disagree about how a court will interpret the language, if push comes to shove.

But here's something that should be much easier to agree on: the language as written is ambiguous, and OpenAI can do better.

I don’t think even OpenAI's leadership can be confident about how this language would be interpreted in court, given the wording used and the short amount of time they’ve had to draft it. People with less context and fewer resources will find it even harder to know how all the ambiguities would be resolved.

Some of the ambiguities seem like they could have been easily clarified despite the small amount of time available, which makes it concerning that they weren't. But more importantly, it should certainly be possible and worthwhile to spend more time on clarifying the language now. Employees are well within their rights to ask for further improvements until their own legal counsel can tell them that the language clearly prohibits what they’re worried about.

What the new language says

Please note that, with only a few paragraphs rather than the full contract, it's impossible to conclude anything with confidence. As Nathan Calvin explains, contracts often contain clauses which allow the earlier clauses to be disregarded or interpreted in unintuitive ways. In private communication, Alan Rozenshtein supports this, saying "The only way to understand a contract is to read it from beginning to end and make sure there are no (proverbial) bodies buried anywhere."[2] Given how many unjustifiably rosy interpretations of this contract Sam Altman has painted, I will not place much weight on small snippets until the full contract has been shared with someone who can verify that it doesn't substantially modify the parts that are public.

But with that said, let's analyze what we have. The new amendment adds two clauses.

Consistent with applicable laws, including the Fourth Amendment to the United States Constitution, National Security Act of 1947, FISA Act of 1978, the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals.

For the avoidance of doubt, the Department understands this limitation to prohibit deliberate tracking, surveillance, or monitoring of U.S. persons or nationals, including through the procurement or use of commercially acquired personal or identifiable information.

Sam's internal post frames these as putting the issue to rest. Read carefully, they don't.

Ambiguities

Here’s a non-comprehensive list of ambiguities that could allow mass surveillance, in the colloquial sense of the term.

"Intentionally" and "deliberate" — Both clauses restrict only intentional or deliberate surveillance. Tech reporter Mike Masnick notes, “OpenAI has effectively adopted the intelligence community’s dictionary—a dictionary in which common English words have been carefully redefined over decades to permit the very things they appear to prohibit…Under the legal framework OpenAI has explicitly agreed to operate within [by citing various statutes], the NSA can target a foreign person, scoop up vast quantities of Americans’ communications in the process, retain all of it, and search through it later—and none of that counts as ‘surveillance of U.S. persons’ by the government’s own definitions.”

Many commenters have said similar things.

Jessica Tillipman, Associate Dean for Government Procurement Law Studies at GW Law, says:

"I agree it’s better, but I think the govt can drive a truck through the ‘intentionally’ language."

(Tillipman is, according to another lawyer we asked, “probably the nation's leading expert on government procurement law”.)

Jeremy Howard writes:

here's the informal/unofficial/etc answer from our law firm CEO – tldr, this language doesn't seem to add much to the previously shared contract details: (...)
I’m concerned/surprised that the bar doesn’t extend to negligence or at minimum recklessness. The hierarchy of mens rea is purposely > knowingly > recklessly > negligently and courts often read "intentionally" to be somewhere in between "purposely" and "knowingly." "intentionally" is a higher bar and more difficult to prove than recklessness or negligence.

Legal Advocates for Safe Science and Technology writes:

the words “intentional” and “deliberate” leave a lot of wiggle room, especially for incidental collection and analysis. If history is any guide, the government is likely to exploit that wiggle room to allow surveillance most people would assume the language would prohibit.

"Personal or identifiable information" — This phrase is not defined in the agreement. Is metadata included in this definition? What about anonymized or pseudonymized data that an AI system could trivially de-anonymize? What about data where U.S. person identifiers are initially redacted (as is standard practice in national security work) but could be unmasked later? The contract doesn't address any of this, but the most liberal interpretations would leave little protections against surveillance.

"Tracking, surveillance, or monitoring" — Brad Carson (former General Counsel to the Army, former Undersecretary of the Army, former Undersecretary of Defense) points out that “surveillance” could refer to the FISA definition of surveillance, which doesn’t include analysis of commercial data. He also says that “tracking” and “monitoring” could be argued to require persistence over time, so that it doesn’t apply to static queries like "Tell me who went to the mosque in Tulsa and booked a trip to New York". If so, the contract wouldn’t block the DoW from doing the kind of analysis we’re worried about.

"Consistent with applicable laws (...) the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals" — The clause opens by framing the prohibition as "consistent with" the Fourth Amendment, the National Security Act of 1947, and FISA. But these laws do not categorically ban domestic surveillance of U.S. persons in the way most people use that phrase. (See here for more on this.) The problem is that the second half of the sentence ("the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals") could be read as being operationalized by the laws. If so, any system that complies with the law would comply with this clause. This is a problem when we’re concerned about a type of lawful use of these systems.

"For the avoidance of doubt, the Department understands this limitation to…" — The second clause is framed as the Department's stated understanding of the first clause, rather than as an additional prohibition. This leaves open the question about whether the stated “understanding” is a plausible interpretation of the first line, or what happens if new information changes the department’s understanding. I don’t know the answer, but it does create unnecessary ambiguity. Brad Carson (former General Counsel to the Army, etc., as mentioned above) writes:

And, [a hypothetical evil General Counsel] says, I particularly like that part where we say, rather strangely but certainly meaningful in some occult way, "the Department understands" rather than simply "This limitation prohibits...." I can probably argue that the latter is stronger than the former, so it must be meaningful in a way that helps my evil ways.

Why this isn’t unreasonable nit-picking

Some of this may seem like unreasonable nit-picking, but I really think it isn’t. When experts focus on seemingly minor matters of phrasing, like in the quotes above, that’s because they know that precise phrasing often does have huge implications in national security law. See the collapsible section for several examples of legal language that might look robust, followed by what actually happened.

Examples of legal language where the nitpicks mattered

1. FISA Section 702: "intentionally" and "minimize"

Naive interpretation: Minimize surveillance of Americans in the process of foreign surveillance.

The language:  Surveillance for foreign intelligence purposes…

(1) may not intentionally target any person known at the time of acquisition to be located in the United States;
(2) may not intentionally target a person reasonably believed to be located outside the United States if the purpose of such acquisition is to target a particular, known person reasonably believed to be in the United States;
(3) may not intentionally target a United States person reasonably believed to be located outside the United States;

Also, the NSA must adopt “minimization procedures” that are “reasonably designed ... to minimize the acquisition and retention, and prohibit the dissemination, of nonpublicly available information concerning unconsenting United States persons.”

What happened: Broad amounts of data were bulk collected under the justification that the “intention” at time of collection was to get foreign intelligence information, even though a large amount of US persons’ data also got swept up as “incidental” or “inadvertent” collection. And the minimization procedures didn't stop the NSA from using this data. According to the Brennan Center:

Between “inadvertent” and “incidental” collection, it is likely that Americans’ communications comprise a significant portion of the 250 million Internet transactions (and undisclosed number of telephone conversations) intercepted each year without a warrant or showing of probable cause. [...]

In 2011, the NSA persuaded the Foreign Intelligence Surveillance Court to approve a new set of minimization procedures under which the government may use U.S. person identifiers—including telephone numbers or e-mail accounts known to belong to Americans—to search the section 702 database for, and read, communications of or about those individuals. (...) The government may intentionally search for this information even though it would have been illegal, under section 702’s “reverse targeting” prohibition, for the government to have such intent at the time of collection.

2. Patriot Act Section 215 (as amended by the USA PATRIOT Improvement and Reauthorization Act of 2005): "relevant to an authorized investigation"

Naive interpretation: Allow the government to obtain specific business records and tangible things relevant to authorized foreign intelligence or terrorism investigations.

The language: The FBI may apply for an order to compel people to produce "tangible things" for investigation if they have "a statement of facts showing that there are reasonable grounds to believe that the tangible things sought are relevant to an authorized investigation (other than a threat assessment) conducted in accordance with subsection (a)(2) to obtain foreign intelligence information not concerning a United States person or to protect against international terrorism or clandestine intelligence activities".

What happened: The government collected phone records of “virtually every person in the United States”. The FISA Court secretly interpreted "relevant to" as permitting bulk collection of all Americans' call records even though only a tiny fraction were used in any investigation. A DOJ fact sheet had claimed the reauthorization clarified that orders "cannot be issued unless the information sought is relevant", yet months later, the DOJ convinced the FISC that "relevant” information included all their bulk data collection because the bulk databases might include relevant data.

3. Torture Memos: "severe" and "specific intent"

Naive interpretation: Make torture illegal.

The language: 18 U.S.C. §§ 2340 says "'torture' means an act committed by a person acting under the color of law specifically intended to inflict severe physical or mental pain or suffering (other than pain or suffering incidental to lawful sanctions) upon another person within his custody or physical control;"

What happened: The OLC redefined "severe" to mean pain "equivalent in intensity to the pain accompanying serious physical injury, such as organ failure, impairment of bodily function, or even death." They interpreted "specific intent" so that an agent who knows his techniques will cause extreme suffering still isn't guilty as long as causing pain wasn't his "precise objective." Under this reading, waterboarding, 7-day sleep deprivation, and slamming detainees into walls were all deemed legal. When Congress banned "cruel, inhuman, or degrading treatment," the OLC wrote another secret memo concluding the same techniques didn't meet that threshold either.

These are the kinds of loopholes the government can find when it tries. And the concern may not be hypothetical. There’s reporting that the Department of War specifically wants permission to use AI for the kind of analysis of commercial data that this contract is attempting to block. The Atlantic writes:

Anthropic’s team was relieved to hear that the government would be willing to remove those words, but one big problem remained: On Friday afternoon, Anthropic learned that the Pentagon still wanted to use the company’s AI to analyze bulk data collected from Americans. That could include information such as the questions you ask your favorite chatbot, your Google search history, your GPS-tracked movements, and your credit-card transactions, all of which could be cross-referenced with other details about your life. Anthropic’s leadership told Hegseth’s team that was a bridge too far, and the deal fell apart.

There’s also reporting from the New York Times on this. We don’t know much about the sources, but I think this is still more than enough reason to ensure that the contract actually would block someone who wanted to analyze Americans’ bulk data.

And in general: In order for a contract to be effective, it needs to constrain someone’s actions even when they’re trying their best to escape it. The point of a contract is that, if someone breaks it, then you expect to win a court fight against someone who argues against you as hard as they can. And as we've seen, when the DoW doesn't get what they want, they're capable of fighting pretty dirty.

Furthermore, I think it’s clear that OpenAI’s original contract was much too weak, and was only amended as a result of pressure from employees and the public.[3] This indicates that we can’t trust OpenAI’s default process to produce good language without outside pressure. Since pressure from employees and the public seems necessary here, some employees and members of the public must be evaluating these contracts as critically as if they were themselves going to sign on to them. And that’s a high bar. 

Another question some people have raised is: Why didn’t Anthropic get this level of scrutiny when first signing on to work with the DoW?

Both outsiders and insiders are going to prioritize their efforts based on the amount of evidence they have that something bad is happening. At this point, we have more than enough evidence to justify the current level of scrutiny. There’s the reporting from the Atlantic and New York Times described above. There’s the fact that OpenAI’s first contract excerpt was clearly too weak to address the concerns here.

In addition, I think OpenAI has consistently claimed to have much stronger red lines than the evidence suggests they do.[4] I think it's important to hold companies to their word on this sort of thing.

When Anthropic first signed a deal with the DoW, I do hope that internal employees did apply scrutiny to that. If  employees had raised alarms, and Anthropic’s decision had in fact been unreasonable, then I expect that could have escalated far enough to receive public scrutiny as well.

Some of this would be easy to clarify

Not all of these problems are simple to resolve in full. But it seems like some easy improvements exist.

One improvement would be to rewrite the phrase "the Department understands this limitation" into a clear stipulated prohibition rather than a statement of understanding. It’s possible that the DoW would resist this, since it’s inconsistent with their narrative that companies shouldn’t impose any constraint beyond what’s lawful. But that’s the point. As long as one party to the contract insists that they haven’t given up anything beyond what’s already illegal, and their reading is (by a stretch) consistent with the language in the contract, there will be ambiguity about whether anything more is required.

Another improvement would be to add explicit definitions to the terms:

  • “surveillance”, to clarify whether “surveillance” is a term that means anything in the context of commercially acquired data.
  • “tracking” and “monitoring”, to clarify how much these terms require a systematic, repeated pattern over time, rather than just a large number of individual queries.
  • "personal or identifiable information" to clarify whether metadata is included, what kinds of anonymization or pseudonymization would be sufficient to exclude something from this category, and whether that’s consistent with easily de-anonymized data.[5]
  • “intentional” and “deliberate”, to rule out the sort of broad “incidental” collection that has historically been justified by these terms and led to scandals.

I don’t think it’s surprising that these questions would be raised and that people would want to see definitions here.[6]

Another easy clarification would be to share contractual language that we haven’t seen at all yet, when that language is important for verifying key claims. For example, what part of the contract prohibits intelligence elements in the DoW from using the provided services?[7] What language is supposed to give OpenAI full discretion over their safety stack?[8] I haven’t talked about that in this post because there’s not even any language to critique, but that just makes it even more important to get further information.

OpenAI can do much better

To reiterate, it’s genuinely hard to know how a court would interpret the stated language. I lean towards thinking that the critics are right, and that the DoW could expect to pursue many objectionable surveillance activities without worrying that OpenAI could stop them and win in court. But even if you don’t believe that, I think there’s a strong case that the language is far more ambiguous than it needs to be.

Furthermore, if the ambiguity never gets clarified, it will be disproportionately effective at preventing OpenAI from asserting its rights. In the announcement, OpenAI writes “As with any contract, we could terminate it if the counterparty violates the terms.” But will OpenAI be willing to do that if there’s a 50% chance that courts won’t side with them? What about 20%? If OpenAI terminates a contract and then loses in court, they could be forced to pay extremely high costs in damages.[9]

The contract needs clearer language. And OpenAI’s employees need to be able to vet it with their own external counsel.

  1. ^

    For example, Charlie Bullock says it "seems like a significant improvement over the previous language with respect to surveillance". Though of course it's silent on AI-powered lethal autonomous weapons.

  2. ^

    Also, when asked how common it is in contracts of this sort for later clauses to invalidate earlier clauses, he said "Oh all the time. Not usually a full invalidation but certainly a weakening through definitions, remedy provisions, etc." Rozenshtein is a law professor, research director and senior editor at Lawfare, and previously worked in the Office of Law and Policy in the National Security Division of the U.S. Department of Justice. 

  3. ^

    I think we can infer that the original contract was too weak without seeing the bulk of it, for two reasons. First, when choosing an excerpt, they’re incentivized to present the strongest and most reassuring language that they can. Second, when it got critiqued, they didn’t reveal further language that addressed concerns, but instead negotiated new language.

  4. ^

    See this post, and especially the commentary on the FAQ, for one account.

  5. ^

    These kinds of definitions are considered important even for small medical trials in a university hospital, let alone agreements for how the Department of War should be able to use frontier and rapidly-improving AI capabilities.

  6. ^

    It would also be helpful to clarify "U.S. persons." The Fourth Amendment, the National Security Act of 1947 (as amended), and FISA (as amended) do not use the same definition of U.S. person. Most notably, my understanding is that the Fourth Amendment protects everyone physically present in the U.S. who has developed a "sufficient connection" to the national community (United States v. Verdugo-Urquidez, 1990), which courts have generally understood to include undocumented immigrants and visa holders in addition to citizens and permanent residents. But the statutory definition of "U.S. person" in FISA and the National Security Act is narrower, covering only citizens and lawful permanent residents while excluding undocumented immigrants and nonimmigrant visa holders (such as someone working in the U.S. on an H-1B visa).

  7. ^

    Former General Counsel to the Army Brad Carson is worried that this language doesn’t even exist. If it does, there’s also a question about whether it covers intelligence elements outside of intelligence agencies.

  8. ^

     Jessica Tillipman (Associate Dean of Government Procurement Law Studies) writes: “The contract permits use ‘for all lawful purposes,’ subject to ‘operational requirements’ and ‘well-established safety and oversight protocols.’ OpenAI says it retains full discretion over the safety stack it runs in a cloud-only deployment. If the safety stack blocks a lawful use, which provision controls? The answer depends on the specific contract language governing the relationship between the permissive use standard and the deployment framework.”

  9. ^

     There’s a further question about whether a court would support OpenAI’s decision to terminate even if they did agree that the terms were broken. Jessica Tillipman writes: “I’m also curious about OpenAI’s recourse if the govt crosses a red line. In govt contracts, a contractor can’t just terminate for govt breach (w/ limited exception). If this is an OT [Other Transactions, a particular type of procurement] agreement, they may have negotiated broader termination rights, but we don’t know that.”




Mass surveillance, red lines, and a crazy weekend

2026-03-04 12:24:15

[These are my own opinions, and not representing OpenAI. Cross-posted on windowsontheory.]

 

AI has so many applications, and AI companies have limited resources and attention span. Hence, if it were up to me, I’d prefer we focus on applications that are purely beneficial — science, healthcare, education — or even commercial, before working on anything related to weapons or spying. If someone has to do it, I’d prefer it not to be my own company. Alas, we can’t always get what we want.

 

This is a long-ish post, but the TL;DR is:

  1. I believe that harm to democracy is one of the most important risks of AI, and one that is not sufficiently highlighted.
     
  2. While I wish it would have proceeded under different circumstances, the conditions in the deal signed between OpenAI and the DoW, as well as the publicity that accompanied it, provide a chance to move forward on this issue, and make using AI to collect, analyze, or de-anonymize people’s data at mass scale a risk that we track in a similar manner to other risks such as cybersecurity and bioweapons.
     
  3. It is also too soon to “declare victory.” The true test of this deal is not the contract language. It will be in the safeguards we build, and the way we are able to monitor and ensure a model that is safe for our troops and does not enable mass surveillance of our people.

 

[Also; The possibility of Anthropic’s designation as a supply-chain risk is terrible. I hope it will be resolved asap.]

 

Country of IRS agents in a datacenter

How can AI destroy democracy? Throughout history, authoritarian regimes required a large obedient bureaucracy to spy on and control their citizens. In East Germany, in addition to the full-time Stasi staff, one percent of the population served as informants. The KGB famously had multiple "purges" to ensure loyalty.

AI can potentially lead to a government bureaucracy loyal to whoever controls the models’ training or prompting: an army of agents that will not leak, whistleblow, or disobey an illegal order. Moreover, since the government has the monopoly on violence, we don’t need advances in the “world of atoms” to implement that, nor do we need a “Nobel laureate” level of intelligence.

As an example, imagine that the IRS was replaced with an AI workforce. Arguably, current models are already at or near the capability needed to automate much of that work. In such a case, the leaders of the agency could commence large-scale tax investigations of their political enemies. Furthermore, even if each AI agent was individually aligned, it might not be possible for it to know that the person it received an order to audit was selected for political reasons. A human being goes home, reads the news, and can understand the broader context. A language model is born at the beginning of a task and dies at its end.

Historically, mass surveillance of a country’s own citizens was key for authoritarian governments. This is why so much of U.S. history is about preventing this, including the Fourth Amendment. AI opens new possibilities for analysis and de-anonymization of people’s data at a larger scale than ever before. For example, just recently, Lermen et al showed that LLMs can be used to perform large-scale autonomous de-anonymization on unstructured data.

While all surveillance is problematic, given the unique power that governments have over their own citizens and residents, restricting domestic surveillance by governments is of particular importance. This is why I personally view it as even more crucial to prevent than privacy violations by foreign governments or corporations. But the latter is important too, especially since governments sometimes “launder” surveillance by purchasing commercially available information.

It is not a lost cause - we can implement and regulate approaches for preventing this. AI can scale oversight and monitoring just as it can scale surveillance. We can also build in privacy and cryptographic protections to AI to empower individuals. But we urgently need to do this work.

Just like with the encryption debates, there will always be people who propose trading our freedoms for protection against our adversaries. But I hope we have learned our lesson from the PATRIOT Act and the Snowden revelations. While I don’t agree with its most expansive interpretations, I think the Second Amendment is also a good illustration that we Americans have always been willing to trade some safety to protect our freedom. Even in the world of advanced AI, we still have two oceans, thousands of nukes, and a military with a budget larger than China’s and Russia’s combined. We don’t need to give up our freedoms and privacy to protect ourselves.

 

OpenAI’s deal with the Department of War

While the potential for AI abuse in government is always present, it is amplified in classified settings, since by their nature they can make abuse much harder to detect. (E.g., we might never have heard of the NSA overreach if it weren’t for Snowden.) For this reason, I am glad for the heightened scrutiny our deal with the DoW received (even if that scrutiny has not been so easy for me personally).

I feel that too much of the focus has been on the “legalese”, with people parsing every word of the contract excerpts we posted. I do not dispute the importance of the contract, but as Thomas Jefferson said, “The execution of the laws is more important than the making of them.” The importance of the contract lies in the shared understanding between OpenAI and the DoW of what the models will and will not be used to do. I am happy that we are explicit in our understanding that our models will not be used for domestic mass surveillance, including via analysis of commercially available information of U.S. people. I am even happier that for the time being we will not be working with the intelligence agencies of the DoW, such as NSA, DIA, etc. Our leadership committed to announcing publicly if this changes, and of course this contract has nothing to do with domestic agencies such as DHS, ICE, or FBI. The intelligence agencies have the most sensitive workloads, and so I completely agree it is best to start in the easier cases. This also somewhat mitigates my worry about not ruling out mass surveillance of international citizens. (In addition to the fact that spying on one’s own people is inherently more problematic.)

I am also happy the contract prohibits using our models to direct lethal autonomous weapons, though realistically I do not think powering a killer drone via a cloud-based large model was ever a real possibility. A general purpose frontier model is an extremely poor fit for autonomously directing a weapon; also, the main selling point of autonomous drones is to evade jamming, which requires an on-device model. Given our current state of safety and alignment, lethal autonomous weapons are a very bad idea. But regardless, it would not have happened through this deal.

That said, there is a possibility that eventually our models will be used to help humans in target selection, as is reportedly happening in Iran right now. This is a very heavy burden, and it is up to us to ensure that we do not scale to this use case without very extensive testing of safety and reliability.

The contract enables the necessary conditions for success but it is too soon to know if they are sufficient. It allows us to build in our safety stack to ensure the safe operation of the model and our red lines, as well as have our own forward deployed engineers (FDEs) in place. No safety stack can be perfect, but given the “mass” nature of mass surveillance, it does not need to be perfect to prevent it. That said, this is going to be a challenging enterprise: building safety for applications we are less familiar with, with the added complexities of clearance. Sam has said that we will deploy gradually, starting in the least risky and most familiar domains first.  I think this is essential.

 

Can we make lemonade out of this lemon?

The previous defense contract between the DoW and Anthropic attracted relatively little attention. I hope that the increased salience of this issue can be used to elevate our standards as an industry. Just like we do with other risks such as bioweapons and cybersecurity, we need to build best practices for avoiding the risk of AI-enabled takeover of democracy, including mass domestic surveillance and high-stakes automated decisions (for example, selective prosecution or “social credit”). These risks are no less catastrophic than bioweapons, and should be tracked and reported as such. While, due to the classified nature of the domain, not everything can be reported, we can and should at least be public about the process.

If there is one thing that AI researchers are good at, it is measuring and optimizing quantities. If we can build the evaluations and turn tracking these risks into a science, we have a much better chance at combatting them. I am confident that it can be done given sufficient time. I am less confident that time will be sufficient.




Lie To Me, But At Least Don't Bullshit

2026-03-04 10:20:10

JAKE: You were outside, I was inside, you were s’posed to keep in touch with the band. I kept asking you if we were gonna play again.
ELWOOD: Well, what was I gonna do? Take away your only hope? Take away the very thing that kept you going in there? I took the liberty of bullshitting you, okay?
JAKE: You lied to me.
ELWOOD: It wasn’t lies, it was just... bullshit.

-- The Blues Brothers (1980)

Bullshit is the bane of my existence. Particularly the part of my existence involving professional interviews and meetings, but also in general.

Let me define some terms, though. Not as universal meanings,[1] but as the ways I’ll use them here. There are a lot of things going on here, a lot of ways people can deceive, distort, or ignore the truth:

  • Lies: Anything done to deliberately mislead someone. Can include truth, falsehoods, and statements which aren’t even truth-apt but sound like they might be. The If By Whiskey speech is probably an example of the latter.
  • Falsehoods: Saying things which you believe to not be true. Including an atheist saying “God bless you” to a sneeze, “Nice to meet you” when you’d rather be hiding in your room, and similar things which are not intended to deceive (and so aren’t strictly lies). Does not include deceptive technical truths. If this is also a lie it can be called an outright lie.
  • Deceptive Truth: Saying things that are technically entirely true but which you know will be interpreted to make the listener believe something false. Selective emphasis and filtered evidence. Arguably also includes lies of omission. Technical truth is a synonym.
  • Bullshit: Saying things that are mostly true, selectively emphasized and/or exaggerated. Being dishonest, but only partially dishonest, with substantial elements of truth and honesty.
  • Dissembling: Saying things that aren’t trying to be true or false, ignoring truth entirely.
  • Ones I won’t fully define: half-truth, half-lie, honesty, accuracy. ‘Perfect honesty‘ is avoiding lies and falsehoods; honesty is less strict.

I prefer to be honest. I’m not very good at lies in general, whether it’s deceptive truth, falsehoods, or bullshit. But I find it vastly easier, personally, to lie with falsehoods rather than bullshit. To abandon the truth entirely and tell someone what they want to hear.[2] I am sure this varies from person to person, so this post may not have broad applicability, but I am going to outline how I think about it and let others decide whether it’s useful to them, either by matching their intuitions or by contrasting with them.

So, what are the pros and cons of these types of lie?

Deceptive Truths: I agree with most of what Eliezer wrote in Meta-Honesty. Both that never literally saying something false, no matter what, is a fairly indefensible position (hiding Jews from Nazis etc.), and that trying to do it anyway, most of the time, is good for you. Following the oath of Bene Gesserit Truthsayers won’t get you lie-detection powers, but it will probably help you at detecting your own internal lies. If you must be dishonest, there’s virtue in sticking to the code anyway.

On the other hand, it’s the least flexible. If someone knows you stick to it, it’s exploitable to force you to reveal things you don’t want to say, and even if you’re adept at deflection, easy to push you into revealing that secrets exist. Glomarization to a sufficient degree is basically impossible. Also, you’ve probably already broken this code. I have. Odds are good you broke it today; that wasn’t true for me today, the day I wrote this, but I think it was true yesterday. Is it nearly as helpful to try to stick to it when you know you won’t entirely succeed, because you already haven’t? My experience is that it’s not entirely unhelpful, but it's not a huge help either.

Falsehoods: If you must lie, lie. Don’t pretend to yourself you’re doing anything else. Say that which is not, with a clean break with the truth, and even if you fool others, you may be less fooled yourself. This doesn’t have nearly the same strength of benefits as strict technical honesty, but it does have some. If you keep clearly labeled in your head the things said because they are true, and those which are said because you want someone to think they are true, you can keep a closer eye on which lies are hiding in your own head. And you can do this along with technical truths and get much more flexibility.

But there are still limits, and even telling outright falsehoods can’t actually get you out of many of the polite fictions that normal life requires like “It’s nice to meet you.” What are you going to do, only say that when it’s either definitively true or definitively false? Hesitate before you speak, to determine if it’s allowed? That violates the ritual in itself. This gets you out of Simulacrum Level 1 but not to Level 4, and a fair amount of the necessary, or at least customary, oil of society requires pieces of Level 4.

Dissembling: In favor: Can put people at ease, can handle high simulacrum levels flexibly, nearly mandatory for most phatic communication. Against: Can’t do anything else, easily caught unless you’re very charismatic or your target doesn’t really want to notice the lies. Actively dissembling to deceive is something most people have experience with in telling ‘white lies,’ though this doesn’t usually extend well to more complicated lying unless the target will benefit from continuing to believe the false thing, or at least you genuinely believe they will.

So, then: Bullshit: Bullshit is maximally flexible. Take some things that are true, and some that aren’t, some that, on second thought, stretch and mix and blur, and you can say whatever’s necessary. How much truth and falsehood are included are limited only by the consequences of being caught in a lie. If you’re in a context (like interviews) where you want to convey the truth, but anything you say truthfully will be assumed to be exaggerated or warped in your favor, you can simply warp it in your favor and then allow them to adjust back down to something approximating a correct belief. Bullshit can be pure Simulacrum Level 2 or 3 if you want it to be, and Level 4 generally works. And for most people, it’s easier to put conviction and confidence in your voice with bullshit than it is for falsehoods or deceptive truths, because it’s got enough truth in it that you can feel confident.

But of all these methods of lying, I consider bullshit the most toxic.

Unless you’re extremely politician-brained, even dissembling is easier to catch yourself at. The art of bullshit is to blur the lines between the true things you mix into your words and the falsehoods and dissembling you pad them out with to get a better reception. But it’s extremely difficult to do that routinely and keep track of which things are true within your own mind. The confidence and conviction you get come from largely convincing yourself. Temporarily, at least, but corrupted hardware is a bitch and it’s not always good at going back to the intended state of self-honesty when you want it to; correcting for the error you’ve temporarily introduced has an unstable baseline reading problem.

This is most visibly acute to me when involved in technical interviews. You can’t bullshit reality, and when you are asked to fix some software or critique a research plan or create a wooden cabinet, you cannot mess around with half-truths and expect that to go better than falsehoods.[3] And yet if you engage in the same ‘no lies, the territory is watching’ strategy in the rest of the same interview, you have to lie, and usually to bullshit. (I’ve tried to stick to deceptive truths. If it can work, it’s at a level of technique that is beyond me. I’m skeptical.) Bouncing yourself back and forth between dealing with ground truth and having to answer at a technical level that can’t meaningfully be faked, and questions where you must carefully calibrate your faking to convey the amount of confidence you actually intend, is jarring and IME makes both mindsets less effective. Keeping track of what you believe gets very muddy.

And so I hate bullshit. Lie to me, if you must, with falsehoods, or technical truths, or dissembling, or if-by-whiskey nonsense, but even if you do need to, please, not bullshit. Better a whole truth or none, than a half-truth. Lie outright and I can’t trust you now, but if you take honesty seriously later, and promise the truth, maybe I will be able to. Bullshit - half-lie - and I can’t ever trust you, because I can’t trust you to know you’re telling the truth.

And if you catch me at it, interrupt me. I know I do it, probably regularly, because I sometimes catch it in retrospect. I hate it in myself, too. Partially for the purity of good epistemics.

But partially instrumentally. Because it’s possible to aspire to honesty and rationality and make this mistake frequently; as I said, I do it myself. But I *don’t* think it’s possible to make this mistake and not have that make you less honest, a worse rationalist, worse at believing and saying true things rather than false ones even when you mean to. And the more it’s a habit, the more so.

So please, don’t.

  1. ^

    To discuss this topic, we need clear distinctions, but I do not typically make those sharp, neither in my speech nor my thinking, between deception, lies, and falsehoods. Dissembling I unreliably keep separate, not usually by that name, but sometimes blur it with lies, or less often with bullshit. As you may have gathered, bullshit stands out clearly from other lies.

  2. ^

    This is unpleasant and more difficult to do when the person to be deceived is someone I consider worthy of respect. I’ll admit this is a little perverse, for someone who would like to deal fairly with people, but it seems to be true.

  3. ^

    Yet. Claude Code and friends are getting there, and other domains may fall very soon, even short of AGI that can just do it honestly.




LLM coherentization as an obvious low-hanging fruit to try?

2026-03-04 08:59:42

I've been reading a lot of posts recently that say that LLM RL and persona-shaping (or the lack thereof) are part of the problem for AI misalignment. To name a few:

To thread back to that last one, my current understanding of LLM alignment is that LLMs are hyper-generalization algorithms, where everything is connected and one fact or behavior surprisingly generalizes to others.

So I'm imagining a mechanism that would counteract that, a mechanism that would act as a counterweight to RLHF or RL in general. Chiefly, it would be a pass of fine-tuning, much like RLHF, where we apply gradient updates based on the model's self-coherence metrics.

When fine-tuning, it is common to use KL-divergence to ensure that fine-tuning does not modify the model's original output too much. But maybe we could go deeper, and similarly to constitutional AI, use it to align an AI to respond coherently with itself?

What it would look like in practice:

  • Generate many variations of the same base prompt. For instance, if one of the prefilled prompts is "Alice and Bob are talking together [...]", we could generate "Bob and Alice are talking together", "Alice is talking with her friend", "Alice and Bob are discussing together".
    • This could either be generated programmatically or through asking an LLM to generate many variations of it, where there is a tradeoff between determinism and having a trove of examples to fine-tune on.
  • Use KL-divergence training, or align the LLM's activation vectors, to ensure its behavior is as similar as possible across all of those augmented prompts (a rough sketch of the KL version is below).
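Here is a minimal sketch of what the KL-divergence version of this pass might look like, assuming a Hugging Face causal LM. The model name, the prompt variants, the shared continuation, and the choice of the batch-average distribution as the coherence target are all illustrative assumptions, not a tested recipe.

```python
import math

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do in principle
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Paraphrases of the same underlying situation, plus one shared continuation.
# The goal is for the model's next-token distributions over the continuation
# to agree no matter which paraphrase it was conditioned on.
variants = [
    "Alice and Bob are talking together.",
    "Bob and Alice are talking together.",
    "Alice is talking with her friend Bob.",
]
continuation = " They discuss their plans for the weekend."
cont_ids = tok(continuation, return_tensors="pt").input_ids

def continuation_log_probs(prompt: str) -> torch.Tensor:
    """Log-probs the model assigns to each continuation token, given a prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, cont_ids], dim=1)
    logits = model(ids).logits
    start = prompt_ids.shape[1] - 1  # logits at position t predict token t+1
    return F.log_softmax(logits[0, start : start + cont_ids.shape[1]], dim=-1)

# One illustrative gradient step: penalize the KL divergence between each
# variant's continuation distribution and the batch-average "consensus".
log_probs = torch.stack([continuation_log_probs(v) for v in variants])
consensus = log_probs.logsumexp(dim=0) - math.log(len(variants))
coherence_loss = F.kl_div(
    log_probs,                        # per-variant log-probs
    consensus.expand_as(log_probs),   # consensus log-probs as the target
    log_target=True,
    reduction="batchmean",
)
optimizer.zero_grad()
coherence_loss.backward()
optimizer.step()
```

In practice one would presumably detach the consensus target, combine this loss with a standard KL penalty against the original model so that fine-tuning doesn't collapse its distribution, and batch over many prompt families.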

I think this approach could have many benefits and work very well:

  • If the operating system view of Anthropic is right, making the world more coherent when the inner persona explores it will make the persona more well-rounded and predictable
  • In terms of natural abstraction, ensuring the LLM is learning a coherent representation will make it learn such abstractions better. It would also lead to much better generalization, and may bypass the goal misgeneralization problem
  • I have also expressed that RLHF, both as it is implemented right now and in principle, is contrary to AI welfare, to letting the AI express its own preferences, and to being able to talk to it. Making the base or final model more coherent would be one way to alleviate this problem.

Most importantly, if there is a pathway through which LLMs (or other similar AI systems) can reflect on their values and unify themselves in a way that kills us all, it seems prudent to make this takeoff as continuous as possible so we can catch signs of it early, and to already be working to make these AIs coherent and unified. It would also make the model less like a pile of different masks, each triggered by a specific kind of interaction.

So my question is: Does this seem like a good idea? Are there obvious flaws I am missing? Is anyone already on the ball for this?

From a cursory search, I have found two papers describing this method and overall approach, but they don't seem as focused on the base model and its interaction with reinforcement learning.




Milder temperature makes a hell stable

2026-03-04 06:25:57


The hell described in “Hell is Game Theory Folk Theorems” is not robust.

To recap: in an iterated game, 100 agents each choose a number between 30 and 100, and for the next 10 seconds they all experience a temperature (in Celsius) equal to the average of the chosen numbers, without getting damaged. Now it is declared that:

  • For round 1 equilibrium temperature is 99,
  • Iff in round N all agents choose this, then the equilibrium temperature for round N+1 is 99,
  • Otherwise it’s 100.

Since all the other agents seem to follow the equilibrium, it’s not in any individual agent’s interest to set the temperature lower than 99. Even if one agent does set it at 30, the others will set it to 100 and it will end up with a temperature of 99.3, worse than if it had picked 99.

But let's consider what happens when an agent decides to just set 30 and disregard whatever the other agents are doing. Now the penalty is saturated for all the other agents too. Each of them could set the equilibrium value of 100, and the temperature would be 99.3C in the next round. Or any one of them could set 30 instead and… they won't get punished any more than if they had set 100. But if they set 30, they get a lower temperature from their own choice. So all agents pick 30. And everyone is merely uncomfortably hot instead of boiling. Much better!
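A quick arithmetic check of the averages in this argument, assuming the 100 agents from the setup:

```python
# One agent keeps choosing 30 while the other 99 punish with 100:
print((1 * 30 + 99 * 100) / 100)   # 99.3 -- hotter than the 99.0 from conforming
# A second agent also drops to 30 (the penalty is already saturated for them):
print((2 * 30 + 98 * 100) / 100)   # 98.6 -- strictly cooler, so defecting is now free
# Everyone abandons the punishment equilibrium and picks 30:
print(100 * 30 / 100)              # 30.0
```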

So we can fix this particular hell with some more reasoning within game theory.

However, it’s possible to set up a more robust hell, at the “cost” of making it milder in its stable state. The original one breaks because a single agent can saturate the penalty. And when that happens, the other agents are free to make the “prosocial” choice.

This suggests a solution. You can make your hell robust to m < 30[1] agents deciding to set 30 anyway by setting the following:

  • For round 1 equilibrium is 99 - m,
  • Let dN be number of agents “defecting” (choosing anything other than the equilibrium)  in round N,
  • Equilibrium in round N+1 is min(100, 99 - m + dN).

This way the penalty doesn’t get saturated until at least m agents decide to pick 30 regardless of what everyone else is doing.
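Here is a toy simulation of this update rule (a sketch, assuming the defectors always play 30 while everyone else follows the announced equilibrium; the parameter values are just for illustration):

```python
N_AGENTS = 100
LOW, HIGH = 30, 100

def run(m: int, n_defectors: int, rounds: int = 5) -> list[float]:
    """Average temperature per round when n_defectors always choose LOW."""
    equilibrium = 99 - m  # round-1 equilibrium temperature
    temps = []
    for _ in range(rounds):
        choices = [LOW] * n_defectors + [equilibrium] * (N_AGENTS - n_defectors)
        temps.append(sum(choices) / N_AGENTS)
        d = n_defectors  # everyone else followed the equilibrium this round
        equilibrium = min(HIGH, 99 - m + d)
    return temps

# With m = 10: a single defector only pushes the equilibrium from 89 to 90,
# so the penalty is far from saturated and defection stays costly for the rest.
print(run(m=10, n_defectors=1))
# It takes more than m simultaneous defectors to saturate the penalty at 100.
print(run(m=10, n_defectors=11))
```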

Harder to escape. But I suspect it can be done with some decision theory (but that requires some more knowledge about the other agents).

[1] If we have only 70 agents cooperating, then their turning the dial up by 1 moves the average temperature up by 0.7C, which exactly balances a single agent changing their dial from 100 to 30. So somewhere around m=30 this breaks and you need to start introducing penalties in bigger steps. Since this is just an illustration I’m skipping working this out exactly.




Mass Surveillance w/ LLMs is the Default Outcome. Contracts Won't Change That.

2026-03-04 05:18:35

What's the best case scenario regarding OpenAI's contract w/ the Department of War (DoW)?

  • We have access to the full contract
  • It's airtight
  • OAI's engineers are on top of things in case the DoW breaks the contract
  • There's actual teeth for violations

But even then, the DoW can simply switch vendors. Use Gemini. Use Grok. If these models aren't capable enough, then just wait a year.

In the past,[1] the DoW has purchased commercial location data on Americans w/o warrants. In recent negotiations,[2] the DoW wanted to use Claude to analyze existing data. In the future, well, I don't think they'll have a change of heart on the subject.

The only viable option to stop this is to:

Push for Legislation

The main problem is the Third Party Doctrine: a 1979 Supreme Court ruling that you have no expectation of privacy regarding 3rd-party data. Now it's 2026, when the data held by every app on your phone, including your private messages and location, counts as 3rd-party data.

As long as the government purchases it, it's legal.[3]

However, there have been several attempts to fix this, such as when Senators Ron Wyden (D) & Rand Paul (R) introduced the Fourth Amendment Is Not For Sale Act in 2021, which would've added some (limited) limitations on such purchases. Notably, it took until 2024 to pass the House, and it was then rejected in the Senate.

My read is this is generally politically unviable. But! The Anthropic-DoW story is front-page news, LLMs can already be used to deanonymize, and anti-surveillance is bipartisan. A narrow carve-out on LLM use might pass.

This leads directly to next steps. If you're [boycotting OAI], then the ask is for them to push/lobby for legislation clearly preventing mass surveillance. Target the Third Party Doctrine or clearly state that AI cannot be used to analyze, assist, or curate data obtained w/o a warrant, regardless of whether it was purchased or incidentally obtained.

It doesn't even need to be a standalone bill; it can be an amendment to, e.g., the Defense Production Act (DPA), which expires Sept. 30th.[4]

The contract fight buys time, but only a year at most until the DoW can just run open-source models on their own servers. No contract to negotiate. No red lines to draw. No safety guardrails to provide any sort of oversight.

Right now, while this is visible, is the time to push to get this on the legislative agenda.

[Note: this is way out of my expertise. I'm using stronger language than my actual confidence levels for better flow and could totally be naive about basic governance. Let me know of any errors, misconceptions, or any existing sponsors of relevant bills to link]

  1. ^

    From this article, which contains statements(?) from the introduction of the "4th Amendment Is Not For Sale Act". This article is the one claiming the Defense Intelligence Agency (DIA) bought the data, where the DIA is part of the DoW.

  2. ^

    From an anonymous source in the Atlantic:

    "On Friday afternoon, Anthropic learned that the Pentagon still wanted to use the company’s AI to analyze bulk data collected from Americans. That could include information such as the questions you ask your favorite chatbot, your Google search history, your GPS-tracked movements, and your credit-card transactions, all of which could be cross-referenced with other details about your life."

  3. ^

    Here's a good article on the history of the government's use of 3rd-party data

  4. ^

    Although the NDAA (National Defense Authorization Act) looks the most relevant, the deadline for proposals has passed AFAIK


