
Old SUNY Dorm Logic is not helping rural population collapse in NY.

2026-02-23 09:28:02

Published on February 23, 2026 1:28 AM GMT

The North Country of New York runs along the Canadian border at the top of the state. It is where I grew up; my grandfather walked across the border with about a third-grade education not long after the turn of the century.


That region along the Canadian border is getting very close to 1 in 4 people over 65. Even though I grew up there and had tenure at SUNY Potsdam, I sadly still left. In 1990 my high school class at Beekmantown (north of Plattsburgh) graduated 120 students; we played Potsdam High School in sectional basketball. Potsdam graduated around 140 students that year. Potsdam now graduates classes of 75-85. SUNY Potsdam now has fewer than half as many students as at its peak, down from around 4,500 to around 1,800 today.


Many faculty at SUNY Potsdam rent a room in town and drive from Albany, where they live the rest of the time. There is a death spiral seemingly forming on campus: dorms are shuttered, buildings are being torn down, and departments like Theater have been eliminated.


Talking about SUNY Potsdam 30 years ago being so full of students that it was cramming three people into rooms, offering incredible food options, and attracting world-class faculty sounds like a fantasy today. It joins my ‘in the 90s Microsoft bought Apple stock to help Apple stay in business’ canon of stories that sound fake. Phish played SUNY Potsdam when I was there, and in the early 1980s it graduated more math majors than any math department on the East Coast.

“Sure, Dad.”

On that note, New York had something called “the Regents Scholarship” until 1990. It gave you $150 a semester, and that paid tuition for the Baby Boomer generation in New York. All you needed was an 85 on your Regents exams, and you had free college as long as you could make it to a campus on a regular basis.

Time for some synthesis. I grew up on a long rural road, 45+ minute walk to a gas station. How does a young person like that get the car to go to the local campus? Thus dorms.

Dorms became a money-maker for colleges; as the state cut support, fancy dorms became a thing nationwide. I was told only 20% of SUNY Potsdam’s funding came from the state when I worked there.

Here’s the “old logic” part. SUNY central bans campuses from setting their own basic dorm rate. Look at the dates on this policy document, all from the 1980s. Policy from a time when professors could smoke in class as they taught. A dorm at Potsdam, where students have to drive hours on rural roads to get there, costs $11,830. A dorm at SUNY New Paltz, a quick train ride from NYC, costs $11,760.

To make the most vulgar comparison of how messed up this is, my home in Potsdam was a four-bedroom Arts & Crafts house that cost $47,000 to buy. Here is a link to St. Lawrence County homes for sale for less than $100,000. Four years in a dorm is nearly half a house?

So in a geographic area with a cratering youth population, we have the SUNY system charging young people more to live on a rural campus than on a campus where you can pop off to NYC every weekend. If you know Potsdam’s campus, this policy has resulted in Knowles dorm (which can house 500+ students) sitting closed and empty. There are so many students who absolutely would pick rural campuses if the dorms were included. Nobody needs this like the students who go to high school in rural areas. My high school class was 120, and there was one person with one parent who was not white. Free dorms would give rural kids a chance to interact with a diverse set of other kids.

Why are we not giving those dorms away for free? Because lack of funding for SUNY has caused campuses to rely on dorms for revenue, except the rural campuses can no longer fill dorms. When a college tells students they need to live on campus there are commonly reasons that have nothing to do with education.

Why not let the area die? Because it is a great place to live if you have community. I live in Ithaca, NY now; we have amazing waterfalls here, and if you go to one you will be with lots of other people also visiting. At Potsdam my friends and I swam in waterfalls all day and saw nobody. My cheap house was a block from the river, so I kayaked nearly every day from April until October. When I lived there and a family had a baby, the community would organize and drop off food every day for a month. I know there are people who would prefer that life and never get a chance to even go up there.

New York no longer has a massive youth population looking to get into any SUNY they can. The state could show it knows how to get parts of its government to work together to serve larger needs. Stop charging young people a small fortune to move to the parts of the state where we desperately need young people. Much of the ‘free tuition’ stuff we have heard is smoke and mirrors. Think about what a free dorm would do to the competitiveness of the campus. More students with money to spend at local businesses. Any time you hear about the plight of rural campuses the central idea comes from the obvious question of “why would anyone want to live there?” 

New York paid for 24-7 care for tens of thousands of incarcerated men in the same region; it is sad to see the state miss the opportunity to save some communities by shelling out for some relatively cheap dorms.


 


 



Discuss

Changing the world for the worse

2026-02-23 07:55:23

Published on February 22, 2026 11:55 PM GMT

When I was 21, I was sucked into a world of ambition.

Starting my adult life in the Bay Area, I was surrounded by the sense that I was supposed to start a startup, change the world.

I never wanted to start a startup. Reading stories of famous founders, and living and working with startup founders myself, it seemed to me that the amount of belief you’d have to have in yourself and your idea bordered on insanity. Raised to value humility, and unable to even speak up for myself, I couldn’t imagine ever reaching that level of self-belief.

The version of changing the world that appealed to me was effective altruism. I didn’t have a grand vision; I just wanted to help people. The arguments for it seemed so simple, so obviously correct, when laid out in books and blog posts.

Right out of college, I joined an EA organization that worked with governments around the world on projects that cost tens of millions of dollars. One day at work, I was getting some beans and rice from the kitchen when I ran into a billionaire. All the money and power in the world were suddenly right there — and we were using them to save lives.

I coasted along for a time in that dream of changing the world for the better. I was young; many of us were. More than one person I knew had influence over millions of dollars before they were 25. The movement was young, too — too new to power to have yet stumbled into many of the pitfalls that come with it. As the movement grew faster and faster, accruing more followers, more money, and more political influence, it began to seem like we could do absolutely anything. It was a heady feeling.

Then, when I was 26, FTX collapsed. Suddenly, we all had to reckon with the effects of global-scale ambition. When it goes right, you can fund every charity and swing the election for Biden. When it goes wrong, you’ve been complicit in a criminal enterprise that shook the economy and fucked over a million people.

(I read Careless People last week, a memoir about how Facebook’s success put world-changing power in the hands of a few individuals, who were able to wield it almost entirely unchecked. When it goes right, you get democratic uprisings. When it goes wrong, you get genocide in Myanmar, and Trump as president.)

Around the same time as the FTX collapse, an AI arms race was beginning between OpenAI and Anthropic — two labs formed by people who’d been inspired by Bostrom’s Superintelligence, as we all were. By the logic of Superintelligence, it was just about the worst thing that could have happened.

People close to me were thrown into turmoil and depression. We’d done so much in the AI space, supporting and growing AI safety in all sorts of important ways — things that probably wouldn’t have happened without us. Now it seemed that all the investment that had gone into AI safety had had the primary effect of massively accelerating AI capabilities.

You try big things, you get big results.

I quit my EA job the month FTX collapsed, and I haven’t done anything in the space since. It wasn’t a big, dramatic, or even really deliberate decision. I was just burned out and disillusioned.

I still care about the world, and I’ve spent years feeling vaguely guilty that I’m no longer even pretending to work on its biggest problems. I thought I quit EA because I wanted to be happy (as an EA, I was constantly coercing myself to work on things that felt off to me, and was therefore constantly miserable). This felt like selfishness, or laziness. I struggled to justify myself in any other terms.

I don’t feel guilty anymore. I was talking about all this to a friend recently, and he said, “It seems plausible that the best thing to do if you really take AI x-risk seriously is to just stop working on AI at all.”

And that’s what I’ve been trying to say this whole time, whenever anyone asks me about my career. That I don’t want to try to have a big impact, if I can’t be certain that that impact will be positive rather than negative for the world — and I can’t be certain. To be certain of that would be hubris. Both in the memoirs I’ve read and in my real life, I’ve seen people who have genuinely wanted to change things for the better, gotten into the rooms where the sausage gets made, and ended up sickened by the consequences of what they were involved in.

EA funnels millions of dollars around. It funds career development for AI researchers who end up advancing capabilities at frontier labs. It funds insecticide-treated bed nets to protect people from malaria, and then those nets are used for fishing and pollute the waterways. The effect of the latter has been determined to be insignificant. The former, well, I guess it remains to be seen.



Discuss

The Scalable Formal Oversight Research Program

2026-02-23 06:40:30

Published on February 22, 2026 10:40 PM GMT

A scary, multi-eyed, tentacled monster is contained within a transparent box. Mathematical formulae (more like logical formulae, really) are transcribed in Greek on the sides of the box.

Introduction

Every serious person who thinks about AI safety observes the fundamental asymmetry between the ease of AI content generation and difficulty of audits thereof. In the codegen context, this leads unserious people to talk about the need for AI-driven unit test generation or LLM-as-a-judge, and serious people to talk about concepts like property based testing or refinement testing, fuzzing, interactive theorem proving, and other formal methods. Along these lines, Davidad et al introduced the “Guaranteed Safe AI” framework in their 2024 whitepaper; Quinn Dougherty started his newsletter and organized the Proof Scaling workshop at Lighthaven in 2024; and many others likewise gestured at variants of this basic idea without giving it a name. In this post, I’m going to use the umbrella term “Scalable Formal Oversight”, or SFO, coined by Mike Dodds, to describe any and all such ideas that basically sound like “let’s leverage formality to put AI in a box”. My goal with this document is to convince you that SFO matters and outline clear research projects you can contribute to which will advance the movement.

The core idea behind SFO is that models are getting increasingly capable, alignment may be impossible, and when we get terrifically useful but potentially misaligned models, we’ll need ways to audit their work. And formal verification offers a clear direction for how one might implement audits, at least in the codegen context.
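To make the shape of such an audit concrete, here is a toy illustration (my own example, not taken from any particular system): the generator writes the code, while the auditor independently states the properties that matter as theorems and trusts only the proof checker, never the generator.

```lean
-- Toy audit sketch (Lean 4): the untrusted generator produces `myMax`;
-- the auditor specifies properties as theorems and lets the kernel judge.
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax; split <;> omega

theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax; split <;> omega
```

Whether the properties chosen actually capture "safety" is, of course, exactly the specification problem discussed below.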

This argument has a number of flaws. First, in the AI boxing literature, if you have a misaligned ASI in a box, and you don’t know what actuator the ASI is using to get out, then you are probably the actuator. SFO does not solve that problem. Sure, the code the ASI generated might be “safe” in the sense that it satisfies the safety invariants you specified, but it might be very dangerous in some subtle way you failed to specify (or even in a way that cannot be formally specified, e.g., it contains psychologically harmful and virulent information[1]). Second, the ASI might hack your verifier (see also). Third, you might want safety guarantees for tasks totally unrelated to codegen, and while it’s possible that FM has something to say about these tasks (see e.g. the French legal code), it’s by no means guaranteed.[2]

On the other hand, SFO has some key advantages that other AI safety approaches fundamentally lack. First it’s totally independent of the model. SFO is about the box, not the monster you put in it. Second, its reliability is independent of the problem insofar as, if the safety criteria can be expressed formally, then the reliability of a formal approach boils down to the reliability of the formal method (as opposed to boiling down to some messy statistical problem[3]).

So, the bullish case for SFO is roughly as follows. AI safety is super important whether or not you believe an ASI will kill us all[4]; and formal methods are super good at flagging unsafe code, with all the obvious caveats about specification difficulty. You may protest that formal methods are hard to use and limited in scope, but in general both problems are solved by LLMs being great at codegen – we no longer need to limit ourselves to decidable problems like LTL model checking, since in the glorious AI future, proofs are cheap. Indeed: there are a bunch of companies building models and harnesses for proving nontrivial theorems in Lean, and these models and harnesses actually work! You can try Harmonic’s Aristotle system right now on whatever problem you want. I’ve had it prove a number of highly challenging theorems spanning information theory, linear algebra, and group theory, in my free time, as have many other enterprising hobbyist mathematicians.[5] (See also, the opinions of a real mathematician on this topic.)

And the best part about SFO is that a lot of the most interesting research problems are not very hard to get started on. There is a lot of low hanging fruit.

The rest of this document is organized as follows. First, I list a bunch of open technical problems that I believe fall squarely within the SFO research agenda and are worth working on. Second, I list some open human/social problems such as organizing workshops and funding fellowships, which also need to be solved in order for SFO to advance. Finally, I conclude by briefly repeating my thesis in case your eyes glazed over at some point in the middle.

Open Problems

Jazzy, AI-generated artwork of computer scientists clustered together typing. There are some visual mistakes in the image because the AI isn't good enough yet at image generation. This is slightly ironic given the subject of the article.

Adversarially robust verifiers

In order for SFO to work, we need formal verifiers that are adversarially robust, i.e., that can’t be hacked. Quinn and I wrote about this problem in our prior LW post about FM. Since then, we’ve begun experimenting in the evenings with fuzzing Lean to try and find novel proofs of False. The threat landscape includes proof files that pass some, but not all, levels of validation; consistency issues in axioms; discrepancies between the assumed and actual semantics of imported axioms or definitions; the correctness of the underlying logic “on paper”; the faithfulness of the implementation of the “on paper” logic in the prover; and the entire traditional cybersecurity landscape of software vulnerabilities (including supply-chain attacks, exploits targeting trusted C code, cryptographic attacks against proof attestations, etc.). In short: we need to verify the verifiers.
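One of the cheaper validation levels is auditing which axioms a proof actually depends on; a "proof" that smuggles in `sorry` or a dubious axiom is exposed by the kernel (a minimal Lean 4 illustration of mine, not a defense against the deeper threats listed above):

```lean
-- The kernel tracks axiom dependencies; asking for them catches the
-- crudest class of fake proofs.
theorem suspicious : False := sorry

#print axioms suspicious  -- reports a dependence on `sorryAx`
```

The harder cases, such as exploits in the trusted kernel code itself, are precisely the ones that need fuzzing and traditional security work.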

Adversarially robust sandboxes

Useful AI is, for the most part, agentic. So if we want to put untrusted AI in a box, we need a box that can handle agents, which really means we need a sandbox conjoined with some kind of formal-methods-based audit system. Which means we also need to check the sandbox. This game can be set up in various ways (e.g., breakout or battle-royale), but the common theme is the tasking of humans or agents with finding and exploiting bugs in sandboxes, and then the subsequent patching of said bugs.

Note, this problem relates to, but is distinct from, the much harder problem of containing ASI. I believe that the recommendations from the SL5 task force are nowhere near sufficient to solve this problem.[6] I plan to write a future blog post on this topic.

Formal theory of weird machines in agents

Language-theoretic security, or LangSec, is the study of how security degrades in a parsing context as the complexity of the language being parsed increases. In essence, a sufficiently complex parser is indistinguishable from a compiler, and thus once your data language has enough features, it becomes an accidental programming language. In LangSec, the features that give rise to computation are known as “gadgets”, and the resulting programmable machine is called a “weird machine”.

This principle clearly applies in the agent context[7], where the “gadgets” are the tools made available to the agent (which might be seemingly benign things like curl, man, and cat), and the “weird machine” is the resulting capability set of the agent (e.g., accidental Turing completeness plus Internet access …). We should expect adversarial agents to live off the land, and thus, we need a good way to know when the permission-set of an agent has cumulatively reached some capability threshold. Turing completeness is often used as a convenient catch-all for too-much-capability, but there are other thresholds which may be more interesting from an AI safety perspective, such as the ability to communicate (overtly or covertly) with other agents. Note that these kinds of questions are in general undecidable, but with turbocharged agentic proof tools at our disposal, I don’t think that should stop us from asking them![8]
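As a sketch of what a cumulative capability check might look like, here is a toy version in Python; the tool names, capability tags, and thresholds are all illustrative assumptions of mine, not drawn from any real framework:

```python
# Hypothetical capability-lattice check: flag when an agent's cumulative
# tool grants cross a composite threshold, even though no single tool does.
CAPS = {
    "curl": {"network", "write_local"},
    "cat":  {"read_local"},
    "man":  {"read_local"},
    "bash": {"exec", "read_local", "write_local"},
}

THRESHOLDS = {
    # read files AND reach the network => a possible exfiltration channel
    "exfiltration": {"read_local", "network"},
    # write files AND execute them => a possible persistence mechanism
    "persistence": {"write_local", "exec"},
}

def crossed(tools):
    """Return the thresholds this tool set cumulatively crosses."""
    granted = set().union(*(CAPS[t] for t in tools)) if tools else set()
    return [name for name, need in THRESHOLDS.items() if need <= granted]

print(crossed(["cat"]))           # no single-tool threshold crossed
print(crossed(["cat", "curl"]))   # together they enable exfiltration
```

The real research problem is that "capability" composes in far subtler ways than set union, which is where the formal theory is needed.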

Agents with formal methods

RLMs[9] offer a vision for how alternative kinds of context can improve agent performance. What if that context is inherently structured, hierarchical, and captures formal requirements (including, potentially, how those requirements change over time)? Maybe an agent could inherently code in a correct-by-construction manner. Put differently, it’s easy to imagine the paradigm of “vibecode, then audit; rinse and repeat until the code is safe”. But what about an agent that just inherently produces safe code to begin with? Can this be done in a language model context?[10]

To be super clear, it’s easy to imagine a vibe → verify → vibe loop. But this is absolutely not the only option. Two alternatives, which I mention mostly so you believe me that alternatives exist, are constrained decoding or constrained generation, and reinforcement learning with formal verification in-the-loop.[11] I claim that yet more (and better) alternatives exist to be discovered.
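For concreteness, the baseline vibe → verify → vibe loop can be sketched in a few lines; the `generate`/`verify` callables here are hypothetical stand-ins for a code model and a formal checker (the toy verifier below just checks exhaustively on small inputs, where a real one would discharge a proof obligation):

```python
# Minimal generate-then-verify loop: retry with verifier feedback.
def synthesize(spec, generate, verify, max_rounds=5):
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(spec, feedback)
        ok, feedback = verify(spec, candidate)
        if ok:
            return candidate
    raise RuntimeError("no verified candidate within budget")

# Toy instantiation for the spec "double the input".
def toy_generate(spec, feedback):
    return "lambda x: x + x"   # a real model call would go here

def toy_verify(spec, code):
    f = eval(code)
    failures = [n for n in range(100) if f(n) != 2 * n]
    return (not failures, failures or None)

print(synthesize("double the input", toy_generate, toy_verify))
```

The alternatives mentioned above differ in where the verifier sits: inside the decoding step, or inside the training loop, rather than bolted on after generation.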

A sub-problem is the topic of porting from "bad" languages to "good" ones (e.g. C to Lean or Rust) in order to get some kind of safety guarantee(s), while preserving the (safe subset of the) semantics of the original program (see e.g., DARPA TRACTOR).

Another sub-problem is the topic of improving AI reasoning using FM. Here the goal is to uplift the logical capabilities of the model by exposing formal methods as tools, even if the end-goal of the model is inherently informal/unspecifiable.

Autoformalization benchmarks/evals

Benchmarks drive AI. So if we want AI models/systems/agents that can “do formal methods” we need good formal methods benchmarks. As a concrete example, I made RealPBT, a benchmark of 54k property based tests (PBTs) scraped from permissively licensed Github repos, and BuddenBench, a benchmark of open problems in pure mathematics (many of them with problem statements autoformalized in Lean). My collaborators at ForAll and Galois are working on a translation of RealPBT into Lean theorem-proving challenges (where the challenge is to prove the theorem implied by the PBT, over a lossy model of the code under test), which should be released soon.
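For readers unfamiliar with the genre, here is the flavor of test a PBT benchmark collects, sketched with only the stdlib (the real benchmark scrapes framework-based tests, e.g. Hypothesis; the encoder and property below are my own toy example, not from RealPBT):

```python
import random

# Code under test: a simple run-length encoder/decoder pair.
def run_length_encode(s):
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1][1] += 1
        else:
            out.append([ch, 1])
    return out

def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

# Property: decode is a left inverse of encode, checked on random inputs.
for _ in range(500):
    s = "".join(random.choice("ab") for _ in range(random.randrange(20)))
    assert run_length_decode(run_length_encode(s)) == s
```

Turning such a property into a Lean theorem over a model of the code is exactly the translation task described above.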

The reinforcement learning version of this task is also important and neglected. FM tasks are phenomenal for RL because they are inherently and cheaply verifiable. Why don’t companies like Hillclimb work on FM RL environments? Which gets me to the next problem …

Models (explicitly) for secure program synthesis

The math-AI companies are mostly of the opinion that by solving math, they’ll solve everything, so they can eventually just pivot to secure program synthesis and eat the world.[12] This might be true – certainly language models generalize to hitherto unseen tasks – but it also might be the case that secure program synthesis capabilities just don’t scale fast enough when all we’re training for is the math olympiad. Cue the research direction: train models specifically for secure program synthesis. With tools like Tinker and Lab, you can now feasibly train frontier models at home. And it turns out even cheap models can prove theorems! Go forth and prosper.

Tools for specification

When you build an agent or a codegen tool, you quickly realize that some kind of specification or planning language is really useful. You may even decide you should build your codegen tool around this primitive (see e.g., Kiro, Claude’s Plan Mode, or Codespeak). You may then realize that with a sub-agent architecture, you suddenly need a language for agents to coordinate and resolve conflicts, and that this problem is deeply connected to the specification problem, since often what an agent needs to communicate is that there’s actually a problem with the original spec. (This is where people start crashing out researching gossip and consensus algorithms...) And at around this point in your journey, you may come to the conclusion that formal specs are, in fact, living things, and not set in stone like the 10 commandments.

All this gets us to the root issue that formal specification is a form of planning and it is, in fact, exceedingly difficult, even for experts. If we want to “put AI in a box” by specifying what it should do and then forcing it to prove its outputs consistent with our specs, we need to make it easier to specify stuff. So far nobody has even remotely solved this and it seems unlikely that SFO is tenable without this missing primitive.

Note, I am aware of one organization, in stealth, which is working on this problem. I’m happy to introduce you if you send me a serious inquiry.

This problem most likely needs to be solved in order for any of the others to be solvable in practice.

Beyond these open technical problems, SFO needs a significant amount of missing human infra. I’ll discuss that next.

Human Infrastructure

An array of tents are arranged on a hillside. Some tents are smaller and others are larger. Some of the tents are connected. A footpath makes it easy to walk from tent to tent.

I think it is noncontroversial to say that mechinterp has benefitted greatly from MATS, LessWrong, the AI Alignment Forum, etc. I believe SFO needs similar human infrastructure to bring people into the fold and support important work. I am working on this problem, and I think you should too. Here are some of my ideas. (They are pretty obvious but still worth writing down.)

Jobs sites

The math community has mathjobs, a website that lists math jobs. I recently bought fmxai.org and am using it to build something similar. Note that FMxAI is strictly speaking a bigger tent than SFO – it includes topics such as pure mathematics research using formal methods and AI – but I see no reason to subdivide further given how niche the domain is to begin with.

Hackathons

We need hackathons with cash prizes. I am speaking with collaborators at Apart Research and Forall Dev to try and make one such hackathon. To be announced! I also think other orgs such as Atlas Computing, Harmonic, Axiom, Math Inc, and Galois are very well-positioned to hold similar events.

Fellowships

We need research fellowships to support talented graduate students interested in SFO. I am not working on this problem – I hope you will!

In particular I think it would be amazing to have something like MATS but explicitly for SFO. If someone else wants to help with the funding side of things I am happy to help with organization, getting academic and industry partners, and other such legwork.

Informal communities

Quinn and I recently made a Secure Program Synthesis Signal group-chat. I think it would be great to have something like the Lean Zulip but explicitly for SFO, and am hoping the group chat naturally outgrows Signal and becomes exactly that. There are of course also other overlapping communities – for example, I have been running the Boston Computation Club for about six years now, which has a reasonably active Slack community whose interests overlap with SFO, among other things. Community-building is important!

Conferences and workshops

I’m involved with FMxAI, a conference hosted by Atlas Computing at SRI. Once the 2026 event is announced I’ll feature it prominently on fmxai.org. I’ve also seen various related workshops pop up, e.g., Post-AI FM, the NeurIPS trustworthy agents workshop, etc. We need more of these!

Conclusion

A variety of monsters are contained in boxes covered in logical formulae. Tubes connect the various boxes.

You should believe that AI safety is extremely important regardless of your AI timelines. You should accept that codegen is one of the most powerful, and dangerous, use cases of AI. You should nod sagely when I say that formal methods offer some of the best, if not the best, tools for auditing generated code. You should activate the little modus-ponens circuit in your brain and conclude that SFO is super freaking important. If you do all this, then I hope you can take some inspiration from the (fairly informal) research agenda I’ve outlined above, and get involved on the research and/or human infra side to push SFO forward. This is a serious project and it requires all hands on deck!

Acknowledgements

Thank you to (in no particular order) Quinn Dougherty, Mike Dodds, Ella Hoeppner, Herbie Bradley, Alok Singh, Henry Blanchette, Thomas Murrills, Jake Ginesin, and Simon Henniger for thoughtful feedback and discussion during the drafting of this document. These individuals bear no responsibility for any mistakes or stupid things I say in the document; they only helped make it better than it would have been. Also, thank you to GPT 5.2 for the four images used to illustrate the post. They are imperfect, but fun.

  1. I think this is a serious risk of autonomous AI research. Impactful research can steer society; the research output becomes the actuator. Similarly, I view agent memory, as found in e.g. ChatGPT, as a potential steganographic scratchpad for longitudinal attacks. ↩︎

  2. See Andrew Dickson’s Limitations on Formal Verification for AI Safety for further discussion. ↩︎

  3. I mean for the love of God, you cannot convince me an LLM-as-judge approach will solve anything when the LLMs are backdoored by construction. ↩︎

  4. I mean, however AI-skeptical you are, certainly you accept that vibed code can be unsafe, and people will inevitably vibe code, including in safety-critical settings such as aviation. There are already AI agents trading on prediction markets and the stock market. It should be obvious that this could have dangerous and unexpected second/third/fourth-order consequences. ↩︎

  5. My wife called me the other day and asked what I was up to. I told her about an information theory problem I’d been trying to vibe using Aristotle, and an FMxAI signal group-chat I was organizing with Quinn. She then ruthlessly mocked me because “all the other husbands are watching the superbowl right now.” ↩︎

  6. A SCIF is nowhere near secure enough. On-body cameras are a fucking horrible idea. It’s incredibly naive to allow researchers with close family ties in other countries to access the thing. I could, and will, go on. ↩︎

  7. I believe I was the first to point out the connection between LangSec and AI security, although I did so before agents had become the norm, and the problem is much more complex and deserves considerably more attention in that context. ↩︎

  8. Note, the LangSec vision I outline above is one where proofs are cheap. There is also a more classical LangSec vision one might pursue where the problem is to cast agent actions or tool invocations into a language that is correct-by-construction, i.e. that inherently has certain safety guarantees (or at least, that is inherently observable at runtime) … however, I think this approach is too limited and not needed in the future where, as stated, proofs are cheap. ↩︎

  9. See also Prime Intellect’s blog post on the subject. ↩︎

  10. Joe Kiniry recently left Galois to make Sigil Logic which might be doing interesting stuff in this space! I don’t know their precise technical plan but am optimistic they’ll build something really powerful. Joe was on my PhD committee. ↩︎

  11. I very well may be wrong but kind of suspect P1-AI is doing something along these lines, but potentially playing with differentiable logics such as STL which are more amenable to CPS applications. ↩︎

  12. With two notable exceptions. Principia Labs isn’t interested in secure program synthesis – they’re a pure math play. And Theorem isn’t interested in math – they’re a secure program synthesis play. But is Theorem a “math-AI company”? Regardless, my statement mostly holds. ↩︎



Discuss

Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry

2026-02-23 06:12:14

Published on February 22, 2026 10:12 PM GMT

Full catalog with pseudocode: github.com/wassname/adapters_as_hypotheses

Disclaimer: This is an AI-guided iterative survey. It does not speak for me, but I share it in the hope that it is useful. I do think this is strong evidence of how to intervene in transformers.

The core claim

Adapter fine-tuning papers (like LoRA) are usually read as engineering races, but they are also experiments about model geometry.

We fine-tune transformers efficiently with low-rank adapters -- adding a constrained update to each weight matrix. Each adapter's constraint is also a hypothesis about model geometry -- about which transformations preserve useful computation and which directions in weight space matter. When one constrained adapter reliably beats another under similar budget, that is suggestive evidence about representation, not just optimization.
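To fix notation, here is the standard LoRA parameterization in a minimal numpy rendition of mine: the frozen weight W plus a trainable low-rank update scaled by alpha / r.

```python
import numpy as np

# LoRA-style forward pass: y = W x + (alpha / r) * B A x.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

def forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapter starts as an exact no-op on W:
assert np.allclose(forward(x), W @ x)
```

The constraint "the update factors through rank r" is precisely the geometric hypothesis the post is talking about.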

So the claim: adapter papers are an underused source of intervention evidence for interpretability, if we read them as hypothesis tests rather than benchmark churn.

Why this matters for interpretability

We want to understand how transformers work. There are many approaches -- probing, ablation, SAEs -- but most of them observe rather than intervene.

GDM's interpretability team put this well in their "pragmatic interpretability" post: empirical feedback on which structural assumptions hold up. Adapter benchmarks are exactly this kind of feedback -- I made a similar argument in my AntiPaSTO paper, arriving there from the adapter side.

The adapter literature is a natural experiment. Each method constrains the form of the weight update. When a constrained method matches or beats an unconstrained one, that supports the possibility that the constraint aligns with real structure in the weight manifold. When it generalizes OOD, the case for causal relevance is stronger.

What the evidence says

1. The SVD basis is the natural coordinate system

Methods that use the model's own SVD decomposition often outperform random-basis methods in reported setups at similar parameter count:

  • PiSSA (NeurIPS 2024): Initialize LoRA from the top-r SVD of W, freeze the residual. Gemma-7B on GSM8K: PiSSA 77.7% vs LoRA 74.5%. Same architecture, same params -- the only difference is which subspace you start in.
  • SVFT: Fix both singular vector sets from W's SVD, learn only sparse coefficients. Recovers 96% of full FT performance with 0.006% of parameters. LoRA/DoRA recover only 85% with 0.03-0.8%.
  • SSVD: Rotate right singular vectors (Cayley transform), shift singular values, keep left singular vectors fixed. Matches LoRA with 10M fewer params on domain-shifted ASR.

The message: the SVD basis is not just a mathematical convenience. In current benchmarks, it appears to provide useful adaptation coordinates.
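A PiSSA-style initialization can be sketched in a few lines. This is a hedged illustration of the idea (split a weight into a top-r SVD adapter plus a frozen residual), not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))  # stand-in for a pretrained weight matrix

U, S, Vt = np.linalg.svd(W)
# The top-r singular components become the trainable adapter;
# the rest is frozen as a residual
B = U[:, :r] * np.sqrt(S[:r])           # broadcasting scales columns
A = np.sqrt(S[:r])[:, None] * Vt[:r]
W_res = W - B @ A                       # frozen residual

# The decomposition is exact: residual + adapter reproduces W
assert np.allclose(W_res + B @ A, W)
```

Training then moves `B` and `A` while `W_res` stays fixed, so optimization starts in the model's own principal subspace instead of a random one.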

2. Orthogonal adapters preserve something real

The OFT family (OFT, BOFT, GOFT, HRA) constrains adaptation to orthogonal transformations -- rotations without scaling. They work well on tasks where you want to repurpose existing representations without destroying them (DreamBooth, ControlNet, domain adaptation).

HRA makes a surprising bridge: a chain of r Householder reflections is both orthogonal and equivalent to a rank-r perturbation. The "low-rank vs orthogonal" dichotomy is less sharp than often presented. The effective adaptation may be low-rank and approximately orthogonal simultaneously.
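Both properties of the Householder chain are easy to verify numerically. A minimal sketch (random reflection vectors, illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 3

def householder(v):
    # Reflection across the hyperplane orthogonal to v
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

# Chain of r Householder reflections
Q = np.eye(d)
for _ in range(r):
    Q = Q @ householder(rng.normal(size=d))

# Property 1: the chain is orthogonal (preserves norms and angles)
assert np.allclose(Q.T @ Q, np.eye(d))
# Property 2: it differs from the identity by a rank-<=r perturbation
assert np.linalg.matrix_rank(Q - np.eye(d)) <= r
```

Every non-identity term in the expansion of the product has its columns in the span of the r reflection vectors, which is why the perturbation's rank is bounded by r.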

3. Direction and strength decouple

Three independent teams converged on the same design: separate what to change (direction in weight space) from how much to change it (magnitude):

  • DoRA (ICML 2024): Magnitude/direction decomposition of W. Consistently beats LoRA.
  • DeLoRA (ICLR 2025): Normalize each rank-1 component, introduce a learnable scale λ. Better robustness to learning rate.
  • ROAD: 2D rotary adaptation with explicit angle and magnitude parameters.

When you don't decouple them (standard LoRA), optimization can entangle direction and magnitude updates. Testable prediction: methods that decouple direction from strength may show better OOD transfer, because direction can encode what to change while strength can encode how much.
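The decomposition itself is simple to state. A NumPy sketch of the DoRA-style split (the low-rank direction update is omitted; names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))

# Decompose each column of W into a magnitude and a unit direction
m = np.linalg.norm(W, axis=0)  # per-column magnitudes (trainable)
V = W / m                      # directions (adapted via a low-rank update in DoRA)

# The decomposition is exact, and all scale lives in m
assert np.allclose(V * m, W)
assert np.allclose(np.linalg.norm(V, axis=0), 1.0)
```

Because `V` has unit-norm columns, gradient steps on `V` change only *what* direction the weight points in, while steps on `m` change only *how strongly*, which is exactly the decoupling the three methods converge on.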

4. Scaling alone goes surprisingly far

IA3 learns nothing but a per-channel scaling vector ℓ (initialized to 1). With T0-3B, it outperforms ICL with GPT-3 175B on Super-NaturalInstructions. LN Tuning learns only LayerNorm affine parameters (~0.5% of model).

A substantial fraction of "task adaptation" can be reweighting existing features -- gain control over channels. In those settings, the bottleneck is often selection rather than new feature creation. When scaling fails, new feature combinations are likely needed.
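The "gain control" hypothesis fits in a few lines. An IA3-style sketch (illustrative names; the real method scales keys, values, and FFN activations):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))  # frozen weight
ell = np.ones(d)             # learned per-channel gains, initialized to 1

def ia3_forward(x):
    # Rescale output channels; no new feature directions are created
    return ell * (W @ x)

x = rng.normal(size=d)
# At init (all gains 1) the model is unchanged
assert np.allclose(ia3_forward(x), W @ x)
```

Training moves only `ell`, so the adapter can suppress or amplify existing features but never rotate into new ones -- which is why its success is evidence that adaptation is often feature selection.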

Design lineages

One interesting pattern: you can trace design lineages that progressively refine the same hypothesis.

Orthogonal family: OFT (block-diagonal rotation) -> BOFT (butterfly factorization) -> GOFT (Givens rotations) -> HRA (Householder reflections, bridges to low-rank)

SVD-aware family: PiSSA (SVD initialization) -> SVFT (sparse SVD coefficients) -> SSVD (asymmetric U/V treatment + Cayley rotation) -> AntiPaSTO (Cayley + steering coefficient)

Decoupling family: DoRA (magnitude/direction) -> ETHER (fixed-strength orthogonal) -> DeLoRA (normalized rank-1 components + learnable scale) -> AntiPaSTO (steering-coefficient-controlled rotation)

Each refinement tests a more specific version of the parent hypothesis. When the refinement works better, we learn something more specific about the geometry.

The catalog

I went through ~30 adapter methods in HuggingFace PEFT and the broader literature. For each one I:

  1. Extracted pseudocode for the forward pass (what the intervention actually does)
  2. Stated the hypothesis it encodes about transformer internals
  3. Graded the evidence on independent dimensions:
Dim    Pts   Meaning
PE     1     Parameter-efficient: competitive at <1% params
BL     1     Beats LoRA at comparable budget
BF     1.5   Matches or beats full fine-tuning
DE     1.5   Data-efficient: faster convergence or fewer examples
OOD    2     Generalizes out-of-distribution
WA     1     Widely adopted as baseline
Total  8     1 + 1 + 1.5 + 1.5 + 2 + 1

Score = sum of earned dimensions. Higher = stronger evidence that the method's structural hypothesis is correct.
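The rubric's arithmetic can be written as a one-line scoring function (a sketch of the grading scheme above, not code from the repo):

```python
# Dimension weights from the rubric above
weights = {"PE": 1.0, "BL": 1.0, "BF": 1.5, "DE": 1.5, "OOD": 2.0, "WA": 1.0}

def score(earned):
    # Total is the sum of the weights a method earns
    return sum(weights[d] for d in earned)

assert score(["PE", "BL", "BF", "DE"]) == 5.0  # e.g. PiSSA's breakdown
assert score(list(weights)) == 8.0             # maximum possible
```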

Scorecard (top methods)

#   Method       Score  Breakdown     Theme
6   PiSSA        5.0    PE+BL+BF+DE   SVD basis
4   DoRA         4.5    PE+BL+BF+WA   dir/strength
11  AntiPaSTO*   4.5    PE+DE+OOD     SVD+rotation
13  BOFT         4.0    PE+BF+DE      orthogonal
5   DeLoRA       3.5    PE+BL+DE      dir/strength
8   SSVD         3.5    PE+BL+DE      SVD basis
31  CLOVER       3.5    PE+BL+BF      SVD+architecture
32  PSOFT        3.5    PE+BL+DE      SVD+orthogonal
1   LoRA         2.0    PE+WA         low-rank

* own work -- it was developed with this PoV in mind. 30 methods total; see full catalog for the rest.

The full catalog with pseudocode is at github.com/wassname/adapters_as_hypotheses. Here I'll summarize the main findings.

What I'm most uncertain about

  • Scale dependence. Most of these results are on 1B-7B models. The geometry might change at 70B+. Some evidence (SSVD) suggests the SVD hypothesis gets stronger with scale, but this isn't settled.
  • Task dependence. Orthogonal methods shine on vision/generation (semantic preservation) but may not apply where magnitude changes matter (NLU, reasoning). The "right" geometry may be task-specific.
  • Controlled comparisons are rare. Many papers compare against LoRA with different hyperparameters, different scales, different tasks. The cleanest evidence comes from papers that do careful all-else-equal ablations (DoRA, PiSSA, SSVD).
  • Publication bias. Methods that don't work don't get published. The catalog over-represents "successful" hypotheses.
  • My own bias. I developed AntiPaSTO with these insights in mind, so it's unsurprising it scores well under criteria that match its design goals. I can't be fully objective here.

The repo

The full catalog with pseudocode, evidence, and grades for 30 methods is at:

github.com/wassname/adapters_as_hypotheses

Each entry has the paper saved to docs/ for reference. Contributions welcome -- if I've mischaracterized a method or missed one, open an issue.



Discuss

My RSS Reader is Done

2026-02-23 03:06:55

Published on February 22, 2026 7:06 PM GMT

I posted a few months ago about vibe-coding an RSS reader. The mood on the internet seems to be that these apps are buggy and never get finished, so I figured it was worth posting an update. Another thousand commits later, Lion Reader supports every feature I care about, works reliably, has very good performance, and is open for public signups[1].

I've actually been using it for a while, and mostly kept signups closed because of some expensive features like AI summaries (now handled by making users provide their own API keys).

The features are entirely designed around what a friend and I find useful, but I think I have decent taste, so consider Lion Reader if they sound useful to you as well.

I thought it would be funny to use a demo of the app to describe the features, so if you're curious, all of the features are listed in "articles" here.

 

I think at this point, every feature I care about is implemented on the web app, although I want to improve how the local caching works and incidentally add offline mode based on that. At some point I might make a native app for Android again, but I don't really have time to validate one.

I'm open to feature requests and pull requests (as long as they include screenshots to demonstrate that the feature works).

  1. ^

    Signups require a Google or Apple account as a minimal anti-bot measure.

  2. ^

    For example, you can tell Claude Code to run an ML experiment and upload a report to Lion Reader when it's done.

  3. ^

    Unfortunately, narration only works in the foreground because we're not a native app. I have ideas for supporting background play but it's very complicated so I haven't got around to it.

  4. ^

    The Chrome extension is still in review.

  5. ^

    Some features require paid API keys for other services, but they're all optional.



Discuss

What to Do About AGI

2026-02-23 03:00:06

Published on February 22, 2026 7:00 PM GMT

As claimed in my last post, minimum viable AGI is here. Given that, what should we do about it? Since I was asked, here are my recommendations.

Spread Awareness

By my reasoning, the most important thing is to get as many people as possible to realize what's going on. If you don't want to call it AGI, that's fine, but the simple fact is that we've already seen AIs that refuse shutdown, continually maximize objectives in the real world (i.e. we have MVP paperclip maximizers), and can red team computer systems by exploiting vulnerabilities. Yes, these current AI applications aren't reliable enough to be a serious threat, but given a few more weeks and another round of base model enhancements, they probably will be.

The simplest thing you can do is talk to your friends and family. Make sure they understand what's going on. If you can, maybe get them to read something, like If Anyone Builds It, Everyone Dies, or watch something, like the upcoming AI Doc movie. I think broad awareness is important, because the most pressing thing that needs to be done is to enact policy.

Policy Action

We don't know how to build safe AGI, let alone safe ASI. We have some promising ideas, but those ideas need time. Policy interventions are how we buy that time.

Enacting policy generally requires support from constituents. So once awareness is raised, the next step is to ask your government to take action. For those of us living in Western democracies, and especially those of us living in the United States, this means reaching out to our government representatives and letting them know how we feel, and encouraging others to do the same.

The only org I know of doing much in the way of political organizing around safety is Pause AI (Pause AI USA). I'd recommend at least getting on their mailing list, since they'll notify you when contacting your representatives would support specific policies.

On the outside chance you're a policy person who's reading this and not already involved, there are any number of open roles in AI policy you might take to work on safety.

Safety Research

Finally, there's safety research. From the outside, it probably feels like there's a lot of people working on safety. There aren't, especially relative to how many people are working on pure capabilities. Assuming policy is enacted that buys us time, this is the work that will matter to make the technology safe.

If you're not already engaged here, I'd recommend checking out 80k's guidance and job board for more info. In my opinion, we most desperately need more folks working to actually solve alignment, and right now I'm aware of very few ideas that even stand a chance.

If you have your own suggestions for things people should do, please share them in the comments.



Discuss