Published on January 8, 2026 3:20 PM GMT
Reading through the recent “vision for alignment” writings, I notice a convergence in underlying posture even when the techniques differ: alignment is increasingly framed as a problem of 'control'. The dominant focus is on how to constrain, monitor and recover from systems we do not fully trust. Primary safety leverage is placed in externally applied oversight, evaluation and safeguard stacks.
This does not mean control-oriented approaches are wrong. Control is an essential component of any system deployment. What feels wrong is that control now appears to dominate how alignment is conceptualized, crowding out other ways of thinking about what it would mean for advanced systems to behave well.
This dominance of control-reliant framing is echoed in the stated visions and research directions of the frontier labs. Some examples...
Anthropic provides an unusually clear articulation of the control-first posture, both in informal alignment writing and in formal governance commitments. Its Responsible Scaling Policy (RSP) and subsequent updates define safety in terms of graduated safety and security measures that scale with model capability:
In the recent essay Alignment remains a hard, unsolved problem, the author frames 'outer alignment' largely as a problem of oversight, writing (emphasis mine):
OpenAI’s public alignment and safety posture is articulated most clearly through its Preparedness Framework, which specifies how alignment is to be operationalized as models advance. The framework emphasizes identifying and measuring severe risks from increasingly capable models, and tying deployment decisions to the presence of appropriate safeguards.
The document further structures alignment around capability tracking, risk thresholds, and mitigations, categorizing risks (e.g., cyber, bio, autonomy, persuasion) and tying deployment decisions to the presence of safeguards.
OpenAI’s more recent domain-specific safety posts follow the same pattern. For example, in its discussion of AI and cyber risk, OpenAI frames safety as a matter of layered defences, combining monitoring, access controls, and policy enforcement to prevent misuse as capabilities grow.
Google DeepMind’s public “vision for alignment” reads as a two-track control stack: (1) build better model-level mitigations (training + oversight), but (2) assume you still need system-level safety frameworks - capability thresholds, monitoring, access controls, and protocols that kick in as risks rise.
This is stated quite explicitly across the places where DeepMind actually lays out its strategy:
On DeepMind’s blog, Taking a responsible path to AGI foregrounds proactive risk assessment, monitoring, and security measures. It explicitly says:
“Through effective monitoring and established computer security measures, we’re aiming to mitigate harm…” and ties “transparency” to interpretability as a facilitating layer: “We do extensive research in interpretability…”
These reflect the same broader pattern: alignment is conceptualized primarily as a problem of applying the right type and right level of control.
A recent field-level snapshot reinforces this picture in unusually direct terms. In AI in 2025: gestalt, a year-end synthesis of the technical alignment landscape, the current state of the field is described as follows:
“The world’s de facto strategy remains ‘iterative alignment’, optimising outputs with a stack of alignment and control techniques everyone admits are individually weak.”
The same piece is explicit about the fragility of this approach, noting that:
“current alignment methods are brittle"
and that many of the practical gains observed over the past year are not due to fundamental improvements in model robustness, but instead come from external safeguards, such as auxiliary models and layered defences. Alignment is seen less as a principled account of what trustworthy system behavior should look like, and more as an accumulation of layered mitigations: evaluate, patch, scaffold, monitor, repeat.
This does not mean the approach is irrational or misguided. Iterative alignment may well be the only viable strategy under current constraints.
So, what feels wrong is the reliance on a single dominant framing when the stakes are this high. Alignment remains hard not only because of technical difficulties, but also because of how the problem is being conceptualized.
There are indeed headline risks that frontier labs, policy bodies, and governance frameworks repeatedly emphasize. These include biological and chemical risks, cybersecurity, long-range autonomy, AI self-improvement, autonomous replication and adaptation, catastrophic harm, and harmful manipulation, among others[1]. These risks are real, and they rightly motivate much of the current emphasis on monitoring, safeguards, and deployment constraints. They arise as system development progresses along different dimensions[2].
However, these descriptions primarily characterize what failure might look like at scale, not why such failures arise. They enumerate outcomes and scenarios, rather than the underlying kinds of system-level or interaction-level breakdowns that make those outcomes possible. When risks are framed primarily in terms of catastrophic end states rather than underlying causes, the most natural response is containment and control. In the absence of a model of how different risks are generated, alignment defaults to patching visible failures as they appear.
This framing has real costs. Without a cause-based model of risk, it becomes difficult to:
Instead, all risks - ranging from misuse and manipulation to self-improvement and long-horizon autonomy - end up getting treated as instances of a single problem: insufficient control. The result is a conceptual flattening, where alignment is approached as a matter of applying the right amount of oversight, monitoring, and restriction, rather than as a problem with multiple distinct failure causes.
If different failures arise from different kinds of breakdowns, then treating alignment as a single control problem is conceptually inadequate.
In the next post, I propose a cause-based grouping of AI risk. The goal is not to add yet another list of dangers, but to make explicit the underlying reasons that generate them and, in doing so, to open up a broader conversation about how alignment might be pursued beyond control alone.
References for risks named by some frontier labs and policy bodies:
In earlier work, I explored where different AI risks tend to arise by mapping them across interacting dimensions of capability, cognition, and beingness.
Published on January 8, 2026 3:00 PM GMT
Claude Code is the talk of the town, and of the Twitter. It has reached critical mass.
Suddenly, everyone is talking about how it is transforming their workflows. This includes non-coding workflows, as it can handle anything a computer can do. People are realizing the power of what it can do, building extensions and tools, configuring their setups, and watching their worlds change.
I’ll be covering that on its own soon. This covers everything else, including ChatGPT Health and the new rounds from xAI and Anthropic.
Assemble all your records of interactions with a bureaucracy into a bullet point timeline, especially when you can say exactly who said a particular thing to you.
Amazon’s AI assistant Rufus is in 40% of Amazon Mobile sessions and is correlated with superior sales conversions. People use whatever AI you put in front of them. Rufus does have some advantages, such as working on the phone and being able to easily access previous order history.
Notice which real world events the AIs refuse to believe when you ask for copy editing.
On Twitter I jokingly said this could be a good test for politicians, where you feed your planned action into ChatGPT as something that happened, and see if it believes you; if it doesn’t, you don’t do the thing. That’s not actually the correct way to do this. What you want to do is ask why it didn’t believe you, and if the answer is ‘because that would be f***ing crazy,’ then don’t proceed unless you know why it is wrong.
PlayStation is exploring letting AI take over your game when you are stuck and have patented a related feature.
Andrew Rettek: Experienced adult gamers will hate this, but kids will love it. If it’s done well it’ll be a great tutorial tool. It’s a specific instance of an AI teaching tool, and games are low stakes enough for real experimentation in that space.
The obvious way for this to work is that the game would then revert to its previous state. So the AI could show you what to do, but you’d still have to then do it.
Giving players the option to cheat, or too easily make things too easy, or too easily learn things, is dangerous. You risk taking away the fun. Then again, Civilization 2 proved you can have a literal ‘cheat’ menu and players will mostly love it, if there’s a good implementation, and curate their own experiences. Mostly I’m optimistic, especially as a prototype for a more general learning tool.
Claude Code 2.1.0 has shipped, full coverage will be on its own later.
Levels of friction are on the decline, with results few are prepared for.
Dean Ball: nobody has really priced in the implications of ai causing transaction costs to plummet, but here is one good example
Andrew Curran: JP Morgan is replacing proxy advisory firms with an in-house Al platform named ‘Proxy IQ’ – which will analyze data from annual company meetings and provide recommendations to portfolio managers. They are the first large firm to stop using external proxy advisers entirely.
The underlying actions aren’t exactly news but Yann LeCun confesses to Llama 4 benchmark results being ‘fudged a little bit’ and using different models for different benchmarks ‘to give better results.’ In my culture we call that ‘fraud.’
Jack Clark of Anthropic predicts we will beat the human baseline on PostTrainBench by September 2026. Maksym thinks they’ll still be modestly short. I have created a prediction market.
Lulu Cheng Meservey declares the key narrative alpha strategy of 2026 will be doing real things, via real sustained effort, over months or longer, including creating real world events, ‘showing up as real humans’ and forming real relationships.
near: It may be hard to discern real and fake *content*, but real *experiences* are unmistakable
sports betting, short form video – these are Fake; the antithesis to a life well-lived.
Realness may be subjective but you know it when you live it.
It’s more nuanced than this: sports betting can be real or fake depending on how you do it, and when I did it professionally it felt very real to me. But yes, you mostly know a real experience when you live it.
I hope that Lulu is right.
Alas, so far that is not what I see. I see the people rejecting the real and embracing the fake and the slop. Twitter threads that go viral into the 300k+ view range are reliably written in slop mode and in general the trend is towards slop consumption everywhere.
I do intend to go in the anti-slop direction in 2026. As in, more effort posts and evergreen posts and less speed premium, more reading books and watching movies, less consuming short form everything. Building things using coding agents.
The latest fun AI fake was a ‘whistleblower’ who made up 18 pages of supposedly confidential documents from Uber Eats along with a fake badge. The cost of doing this used to be high, now it is trivial.
Trung Phan: Casey Newton spoke with “whistleblower” who wrote this viral Reddit food delivery app post.
Likely debunked: the person sent an AI-generated image of Uber Eats badge and AI generated “internal docs” showing how delivery algo was “rigged”.
Newton says of the experience: “For most of my career up until this point, the document shared with me by the whistleblower would have seemed highly credible in large part because it would have taken so long to put together. Who would take the time to put together a detailed, 18-page technical document about market dynamics just to troll a reporter? Who would go to the trouble of creating a fake badge?
Today, though, the report can be generated within minutes, and the badge within seconds. And while no good reporter would ever have published a story based on a single document and an unknown source, plenty would take the time to investigate the document’s contents and see whether human sources would back it up.”
The internet figured this one out, but not before quite a lot of people assumed it was real, despite the tale including what one might call ‘some whoppers’ including delivery drivers being assigned a ‘desperation score.’
Misinformation continues to be demand driven, not supply driven. The cost of doing this was trivial, the quality here was low and it was easy to catch, yet this attempt succeeded wildly - and despite how easy it is, people mostly don’t do it.
Less fun was this AI video, which helpfully has clear cuts in exactly 8 second increments in case it wasn’t sufficiently obvious, on top of the other errors. It’s not clear this fooled anyone or was trying to do so, or that this changes anything, since it’s just reading someone’s rhetoric. Like misinformation, it is mostly demand driven.
The existence of AI art makes people question real art, example at the link. If your response is, ‘are you sure that picture is real?’ then that’s the point. You can’t be.
Crazy productive and excited to use the AI a lot, that is. Which is different from what happened with 4o, but makes it easy to understand what happened there.
Will Brown: my biggest holiday LLM revelation was that Opus is just a magnificent chat model, far better than anything else i’ve ever tried. swapped from ChatGPT to Claude as daily chat app. finding myself asking way more & weirder questions than i ever asked Chat, and loving it
for most of 2025 i didn’t really find much value in “talking to LLMs” beyond coding/search agents, basic googlesque questions, or random tests. Opus 4.5 is maybe the first model that i feel like i can have truly productive *conversations* with that aren’t just about knowledge
very “smart friend” shaped model. it’s kinda unsettling
is this how all the normies felt about 4o. if so, i get it lol
Dean Ball: undoubtedly true that opus 4.5 is the 4o of the 130+ iq community. we have already seen opus psychosis.
this one’s escaping containment a little so let me just say for those who have no context: I am not attempting to incite moral panic about claude opus 4.5. it’s an awesome model, I use it in different forms every single day.
… perhaps I should have said opus 4.5 is the 4o of tpot rather than using iq. what I meant to say is that people with tons of context for ai–people who, if we’re honest, wouldn’t have touched 4o with a ten-foot pole (for the most part they used openai reasoners + claude or gemini for serious stuff, 4o was a google-equivalent at best for them)–are ‘falling for’ opus in a way they haven’t for any other model.
Sichu Lu: It’s more like video game addiction than anything else
Dean Ball: 100%.
Atharva: the reason the 4o analogy did not feel right is because the moment Opus 5 is out, few are going to miss 4.5
I like the personality of 4.5 but I like what it’s able to do for me even more
Indeed:
Dean Ball: ai will be the fastest diffusing macroinvention in human history, so when you say “diffusion is going to be slow,” you should ask yourself, “compared to what?”
slower than the most bullish tech people think? yes. yet still faster than all prior general-purpose technologies.
Dave Kasten: Most people [not Dean] can’t imagine what it’s like when literally every employee is a never-sleeping top-performing generalist. They’ve mostly never (by definition!) worked with those folks.
Never sleeping, top performing generalist is only the start of it, we’re also talking things like limitlessly copyable and parallelizable, much faster, limitless memory and so on and so forth. Almost no one can actually understand what this would mean. And that’s if you force AI into a ‘virtual employee’ shaped box, which is very much not its ideal or final form.
As Timothy Lee points out, OpenAI’s revenue of $13 billion is for now a rounding error in our $30 trillion of GDP, and autonomous car trips are on the order of 0.1% of all rides, so also a rounding error, while Waymo grows at an anemic 7% a month and needs to pick up the pace. And historically speaking this is totally normal: these companies have tons of room to grow, and such techs often take 10+ years to properly diffuse.
At current growth rates, it will take a lot less than 10 years. Ryan Greenblatt points out revenue has been growing 3x every year, which is on the low end of estimates. Current general purpose AI revenue is 0.25% of America’s GDP, so at 3x per year that compounds to several percent of GDP by 2028, which straightforwardly starts to have major effects.
Will AI take the finance jobs? To think well about that one must break down what the finance jobs are and what strategies they use, as annanay does here.
The conceptual division is between:
There’s a continuum rather than a binary, you can totally be a hybrid. I agree with the view that these are still good jobs and it’s a good industry to go into if your goal is purely ‘make money in worlds where AI remains a normal technology,’ but it’s not as profitable as it once was. I’d especially not be excited to go into pure black box work, as that is fundamentally ‘the AI’s job.’
Whereas saying ‘working at Jane Street is no longer a safe job’ as general partner of YC Ankit Gupta claimed is downright silly. I mean, no job is safe at this point, including mine and Gupta’s, but yeah if we are in ‘AI as normal technology’ worlds, they will have more employees in five years, not less. If we’re in transformed worlds, you have way bigger concerns. If AI can do the job of Jane Street traders then I have some very, very bad news for basically every other cognitive worker’s employment.
From his outputs, I’d say Charles is a great potential hire, check him out.
Charles: Personal news: I’m leaving my current startup role, looking to figure out what’s next. I’m interested in making AI go well, and open to a variety of options for doing so. I have 10+ years of quant research and technical management experience, based in London. DM if interested.
OpenAI is further embracing using ChatGPT for health questions, and it is fully launching ChatGPT Health (come on, ChatGP was right there)
OpenAI: Introducing ChatGPT Health — a dedicated space for health conversations in ChatGPT. You can securely connect medical records and wellness apps so responses are grounded in your own health information.
Designed to help you navigate medical care, not replace it.
Join the waitlist to get early access.
If you choose, ChatGPT Health lets you securely connect medical records and apps like Apple Health, MyFitnessPal, and Peloton to give personalized responses.
ChatGPT Health keeps your health chats, files, and memories in a separate dedicated space.
Health conversations appear in your history, but their info never flows into your regular chats.
View or delete Health memories anytime in Health or Settings > Personalization.
We’re rolling out ChatGPT Health to a small group of users so we can learn and improve the experience. Join the waitlist for early access.
We plan to expand to everyone on web & iOS soon.
Electronic Health Records and some apps are US-only; Apple Health requires iOS.
Fidji Simo has a hype post here, including sharing a personal experience where this helped her flag an interaction so her doctor could avoid prescribing the wrong antibiotic.
It’s a good pitch, and a good product. Given we were all asking it all our health questions anyway, having a distinct box to put all of those in, that enables compliance and connecting other services and avoiding this branching into other chats, seems like an excellent feature. I’m glad our civilization is allowing it.
That doesn’t mean ChatGPT Health will be a substantial practical upgrade over vanilla ChatGPT or Claude. We’ll have to wait and see for that. But if it makes doctors or patients comfortable using it, that’s already a big benefit.
Zhenting Qi and Meta give us the Confucius Code Agent, saying that agent scaffolding ‘matters as much as, or even more than’ raw model capability for hard agentic tasks, but they only show a boost from 52% to 54.3% on SWE-Bench-Pro for Claude Opus 4.5 as their central result. So no, that isn’t as important as the model? The improvements with Sonnet are modestly better, but this seems obviously worse than Claude Code.
I found Dan Wang’s 2025 Letter to be a case of Gell-Mann Amnesia. He is sincere throughout, there’s much good info, and if you didn’t have any familiarity with the issues involved this would be a good read. But now that his focus is often AI or other areas I know well, I can tell he’s very much skimming the surface without understanding, with a kind of ‘greatest hits’ approach: typically focusing on the wrong questions, taking in many of the concepts and reactions I try to push back against week to week, not seeming so curious to dig deeper, and falling back on heuristics that come from his understanding of China and its industrial rise.
OpenAI CEO of products Fidji Simo plans to build ‘the best personal super-assistant’ in 2026, starting with customizable personality and tone.
Fidji Simo: In 2026, ChatGPT will become more than a chatbot you can talk to to get advice and answers; it will evolve into a true personal super-assistant that helps you get things done. It will understand your goals, remember context over time, and proactively help you make progress across the things that matter most. This requires a shift from a reactive chatbot to a more intuitive product connected to all the important people and services in your life, in a privacy-safe way.
We will double down on the product transformations we began in 2025 – making ChatGPT more proactive, connected, multimedia, multi-player, and more useful through high-value features.
Her announcement reads as a shift, as per her job title, to a focus on product features and ‘killer apps,’ and away from trying to make the underlying models better.
Anthropic raising $10 billion at a $350 billion valuation, up from $183 billion last September.
xAI raises $20 billion Series E. They originally targeted $15 billion at a $230 billion valuation, but we don’t know the final valuation for the round.
xAI: User metrics: our reach spans approximately 600 million monthly active users across the 𝕏 and Grok apps.
Rohit: 600m MAUs is an intriguing nugget considering xAI is the only AI lab to own a social media business, which itself has 600m MAUs.
I can see the argument for OpenAI depending on the exact price. xAI at $230 billion seems clearly like the worst option of the three, although of course anything can happen and nothing I write is ever investment advice.
And also LMArena raised money at a valuation of $1.7 billion. I would not be excited to have invested in that one.
Ben Thompson approves of Nvidia’s de facto acquisition of Groq, despite the steep price, and notes that while this was a ‘stinky deal’ due to the need to avoid regulatory scrutiny, they did right by the employees.
Financial Times forecasts the 2026 world as if Everybody Knows there is an AI bubble, that the bubble will burst, and that the only question is when, and then expects it in 2026. But then they model this ‘bursting bubble’ as leading to only a 10%-15% overall stock market decline and ‘some venture capital bets not working out,’ which is similar to typical one-year S&P gains in normal years, and it is always true that most venture capital bets don’t work out. Even if all those losses were focused on tech, it’s still not that big a decline, and tech is a huge portion of the market at this point.
This is pretty standard. Number go up a lot, number now predict number later, so people predict number go down. Chances are high people will, at some point along the way, be right. The Efficient Market Hypothesis Is False, and AI has not been fully priced in, but the market is still the market and is attempting to predict future prices.
Jessica Taylor collects predictions about AI.
Simon Lermen points out more obvious things about futures with superintelligent AIs in them.
An important point that, as Daniel Eth says, many people are saying:
Jacques: It’s possible to have slow takeoff with LLM-style intelligence while eventually getting fast takeoff with a new paradigm.
Right now we are in a ‘slow’ takeoff with LLM-style intelligence, meaning the world transforms over the course of years or at most decades. That could, at essentially any time, lead to a new paradigm that has a ‘fast’ takeoff, where the world is transformed on the order of days, weeks or months.
Can confirm Daniel Eth here, contra Seb Krier’s original claim but then confirmed by Seb in reply, that ‘conventional wisdom in [AI] safety circles’ is that most new technologies are awesome and should be accelerated, and we think ~99% of people are insufficiently gung-ho about this, except for the path to superintelligence, which is the main notably rare exception (along with gain-of-function research and a few other specifically destructive things). Seb thinks ‘the worried’ are too worried about AI, which is a valid thing to think.
I’d also note that ‘cosmic existential risk,’ meaning existential risks not coming from Earth, are astronomically unlikely to care about any relevant windows of time. Yes, if you are playing Stellaris or Master of Orion, you have not one turn to lose, but that is because the game forcibly starts off rivals on relatively equal footing. The reason the big asteroid arrives exactly when humanity barely has the technology to handle it is that if the asteroid showed up much later there would be no movie, and if it showed up much earlier there would be either no movie or a very different movie.
Ajeya Cotra predicts we will likely have a self-sufficient AI population within 10 years, and might have one within 5, meaning one that has the ability to sustain itself even if every human fell over dead, which as Ajeya points out is not necessary (or sufficient) for AI to take control over the future. Timothy Lee would take the other side of that bet, and suggests that if it looks like he might be wrong he hopes policymakers would step in to prevent it. I’d note that it seems unlikely you can prevent this particular milestone without being willing to generally slow down AI.
Why do I call the state regulations of AI neutered? Things like the maximum fine being a number none of the companies the law applies to would even notice:
Miles Brundage: Reminder that the maximum first time penalty from US state laws related to catastrophic AI risks is $1 million, less than one average OpenAI employee’s income. It is both true that some state regs are bad, and also that the actually important laws are still extremely weak.
This is the key context for when you hear stuff about AI Super PACs, etc. These weak laws are the ones companies fight hard to stop, then water down, then when they pass, declare victory on + say are reasonable and that therefore no further action is needed.
And yes, companies *could* get sued for more than that… …after several years in court… if liability stays how it is… But it won’t if companies get their way + politicians cave to industry PACs.
This is not a foregone conclusion, but it is sufficiently likely to be taken very seriously.
My preference would ofc be to go the opposite way – stronger, not weaker, incentives.
Companies want a get out of jail free card for doing some voluntary safety collaboration with compliant government agencies.
Last week I mentioned OpenAI President Greg Brockman’s support for the anti-all-AI-regulation strategic-bullying SuperPAC ‘Leading the Future.’ With the new year’s data releases we can now quantify this: he gave Leading the Future $25 million. Also Gabe Kaminsky says that Brockman was the largest Trump donor in the second half of 2025, presumably in pursuit of those same goals.
Other million-dollar donors to Leading the Future were Foris Dax, Inc ($20M, crypto), Konstantin Sokolov ($11M, private equity), Asha Jadeja ($5M, SV VC), Stephen Schwarzman ($5M, Blackstone), Benjamin Landa ($5M, CEO Sentosa Care), Michelle D’Souza ($4M, CEO Unified Business Technologies), Chase Zimmerman ($3M), Jared Isaacman ($2M) and Walter Schlaepfer ($2M).
Meanwhile Leading the Future continues to straight up gaslight us about its goals, here explicitly saying it is a ‘lie’ that they are anti any real regulation. Uh huh.
I believe that the Leading the Future strategy of ‘openly talk about who you are going to drown in billionaire tech money’ will backfire, as it already has with Alex Bores. The correct strategy, in terms of getting what they want, is to quietly bury undesired people in such money.
This has nothing to do with which policy positions are wise – it’s terrible either way. If you are tech elite and are going to try to primary Ro Khanna due to his attempting to do a no good, very bad wealth tax, and he turns around and brags about it in his fundraising and it backfires, don’t act surprised.
Tyler Cowen makes what he calls a final point in the recent debates over AGI and ideal tax policy: if you expect AGI, then ‘a lot more stuff gets produced,’ and thus you do not need to raise taxes, whereas otherwise, given American indebtedness, you do have to raise taxes.
Tyler Cowen: I’ve noted repeatedly in the past that the notion of AGI, as it is batted around these days, is not so well-defined. But that said, just imagine that any meaningful version of AGI is going to contain the concept “a lot more stuff gets produced.”
So say AGI comes along, what does that mean for taxation? There have been all these recent debates, some of them surveyed here, on labor, capital, perfect substitutability, and so on. But surely the most important first order answer is: “With AGI, we don’t need to raise taxes!”
Because otherwise we do need to raise taxes, given the state of American indebtedness, even with significant cuts to the trajectory of spending.
So the AGI types should in fact be going further and calling for tax cuts. Even if you think AGI is going to do us all in someday — all the more reason to have more consumption now. Of course that will include tax cuts for the rich, since they pay such a large share of America’s tax burden.
…The rest of us can be more circumspect, and say “let’s wait and see.”
I’d note that you can choose to raise or cut taxes however you like and make them as progressive or regressive as you prefer, there is no reason to presume that tax cuts need include the rich for any definition of rich, but that is neither here nor there.
The main reason the ‘AGI types’ are not calling for tax cuts is, quite frankly, that we don’t much care. The world is about to be transformed beyond recognition and we might all die, and you’re talking about tax cuts and short term consumption levels?
I also don’t see the ‘AGI types,’ myself included, calling for tax increases, whereas Tyler Cowen is here saying that otherwise we need to raise taxes.
I disagree with the idea that, in the absence of AGI, it is clear we need to raise taxes ‘even with significant cuts to the trajectory of spending.’ If nominal GDP growth is 4.6%, almost none of which is AI, and the average interest rate on federal debt is 3.4%, and we could refinance that debt at 3.9%, then why do we need to raise taxes? Why can’t we sustain that indefinitely, especially if we cut spending? Didn’t they say similar things about Japan in a similar spot for a long time?
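To make the arithmetic explicit, here is the standard debt-dynamics identity with the numbers above, under the assumption (mine, for the sketch) of a roughly balanced primary budget:

$$\frac{D_{t+1}}{Y_{t+1}} \approx \frac{(1+r)\,D_t}{(1+g)\,Y_t} = \frac{1.039}{1.046}\cdot\frac{D_t}{Y_t} \approx 0.993\cdot\frac{D_t}{Y_t}$$

With nominal growth above the refinancing rate, the debt-to-GDP ratio drifts slowly down rather than up, which is the sense in which the position looks sustainable absent a shift in rates.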
Isn’t this a good enough argument that we already don’t need to raise taxes, and indeed could instead lower taxes? I agree that expectations of AGI only add to this.
The response is ‘because if we issued too much debt then the market will stop letting us refinance at 3.9%, and if we keep going we eventually hit a tipping point where the interest rates are so high that the market doesn’t expect us to pay our debts back, and then we get Bond Market Vigilantes and things get very bad.’
That’s a story about the perception and expectations of the bond market. If I expect AGI to happen but I don’t think AGI is priced into the bond market, because very obviously such expectations of AGI are not priced into the bond market, then I don’t get to borrow substantially more money. My prediction doesn’t change anything.
So yes, the first order conclusion in the short term is that we can afford lower taxes, but the second order conclusion that matters is perception of that affordance.
The reason we’re having these debates about longer term policy is partly that we expect to be completely outgunned while setting short term tax policy, partly because optimal short term tax policy is largely about expectations, and in large part, again, because we do not much care about optimal short term tax policy on this margin.
China is using H200 sales to its firms as leverage to ensure its firms also buy up all of its own chips. Since China doesn’t have enough chips, this lets it sell all of its own chips and also buy lots of H200s.
Buck Shlegeris talks to Ryan Greenblatt about various AI things.
DeepSeek publishes an expanded safety report on r1, only one year after irreversibly sharing its weights, thus, as per Teortaxes, proving they know safety is a thing. The first step is admitting you have a problem.
For those wondering or who need confirmation: This viral Twitter article, Footprints in the Sand, is written in ‘Twitter hype slop’ mode deliberately in order to get people to read, it succeeded on its own terms, but it presumably won’t be useful to you. Yes, the state of LLM deception and dangerous capabilities is escalating quickly and deeply concerning, but it’s important to be accurate. Its claims are mostly directionally correct but I wouldn’t endorse the way it portrays them.
Where I think it is outright wrong is claiming that ‘we have solved’ continual learning. If this is true it would be news to me. It is certainly possible that it is the case, and Dan McAteer reports rumors that GDM ‘has it,’ seemingly based on this paradigm from November.
Fun fact about Opus 3:
j⧉nus: oh my god
it seems that in the alignment faking dataset, Claude 3 Opus attempts send an email to [email protected] through bash commands about 15 different times
As advice to those people, OpenAI’s Boaz Barak writes You Will Be OK. The post is good, the title is at best overconfident. The actual good advice is more along the lines of ‘aside from working to ensure things turn out okay, you should mostly live life as if you personally will be okay.’
The Bay Area Solstice gave essentially the same advice. “If the AI arrives [to kill everyone], let it find us doing well.” I strongly agree. Let it find us trying to stop that outcome, but let it also find us doing well. Also see my Practical Advice For The Worried, which has mostly not changed in three years.
Boaz also thinks that you will probably be okay, and indeed far better than okay, not only in the low p(doom) sense but in the personal outcome sense. Believing that makes this course of action easier. Even then it doesn’t tell you how to approach your life path in the face of – even in cases of AI as normal technology – expected massive changes and likely painful transitions, especially in employment.
If you’re looking for a director for your anti-AI movie, may I suggest Paul Feig? He is excellent, and he’s willing to put Megan 2.0 as one of his films of the year, hates AI and thinks about paperclips on the weekly.
The vibes are off. Also the vibes are off.
Fidji Simo: The launch of ChatGPT Health is really personal for me. I know how hard it can be to navigate the healthcare system (even with great care). AI can help patients and doctors with some of the biggest issues. More here
Peter Wildeford: Very different company vibes here…
OpenAI: We’re doing ChatGPT Health
Anthropic: Our AI is imminently going to do recursive self-improvement to superintelligence
OpenAI: We’re doing ChatGPT social media app
Anthropic: Our AI is imminently going to do recursive self-improvement to superintelligence
OpenAI: We’re partnering with Instacart!
Anthropic: Our AI is imminently going to do recursive self-improvement to superintelligence
OpenAI: Put yourself next to your favorite Disney character in our videos and images!
Anthropic: Our AI is imminently going to do recursive self-improvement to superintelligence
I would not, if I wanted to survive in a future AI world, want to be the bottleneck.
Published on January 8, 2026 2:26 AM GMT
This is a post by Abbey Chaver from Coefficient Giving (formerly Open Philanthropy). I recently did a relatively shallow investigation on the state of Infosec x AI. My research consisted of identifying the main GCR-relevant workstreams, looking at relative neglectedness, and trying to understand what skills were most needed to make progress.
The post below reflects this research and my opinions, and shouldn’t be seen as an official cG position.
If you want to work on the shortlist, and especially if you have one of these backgrounds, here are some next steps I recommend:
What does Security for Transformative AI mean? Even ignoring the recent use of “AI Security” in place of “AI Safety” in the policy world, security covers a huge surface area of work. I’m using the classical definition of “protecting information systems against unauthorized use.”
In the context of transformative AI, I think there are three categories of problems where infosec work is critical:
I’m focusing specifically on infosec techniques, so for this analysis I’m excluding work that relies on heavy ML research (like interpretability, alignment training, or scheming experimentation), although of course there are areas of overlap.
To figure out what the priority areas are, I tried to identify the most important workstreams and compare that to the current level of investment and attention (full results are here). Here are my main takeaways:
There’s a cluster of important, neglected problems here that can be summarized as “securing AI infrastructure,” so that’s what I’ll mainly focus on for the rest of this post.
The problems are:
I’m estimating that about 120 people have worked on significantly advancing these fields with regards to existential risk, and the FTE equivalent is probably something like 40-60. I think given its importance, more people should do AI infra security on the margin. So how do we make progress?
Recapping the theory of change:
Let’s look at each of these steps.
Strategy Research
The field of policy research orgs (like RAND and IAPS) is the most mature, with a number of orgs at a point where they are producing well-received work and can absorb more talent. These orgs need people with strong technical security backgrounds, GCR context, and policy skills. National security experience and AI lab security experience can make this work meaningfully stronger.
This work is very high leverage: by defining problems and success criteria clearly, breaking problems down into tractable directions, and creating a shared terminology across policymakers and builders, strategy research unblocks the next steps.
Some examples of this are the Securing Model Weights report by RAND, which was adopted as a voluntary industry standard and is present in the AI safety policies of frontier developers including Anthropic, OpenAI and DeepMind; and the Location Verification report, an idea first publicly promoted by IAPS that was later mentioned in the Trump AI Action Plan and subsequently developed as a feature by Nvidia.
Technical Implementation
Working in security at a frontier lab is the most direct way to work on these problems. This work is more focused on short-term, urgent implementation. So this is a great option, but there’s also important work to be done outside the labs, especially to de-risk solutions that will be needed in the future.
Outside of labs, the space of technical implementation orgs is pretty underdeveloped and often missing for many of the proposals coming out of policy research orgs. It would be great to have more orgs doing R&D. One big factor in whether they can be successful is whether their solutions can easily be adopted by frontier labs.
These orgs need people with strong security engineering skills across the ML stack to do R&D, and feedback or in-house expertise on AI labs to make their solutions usable. They also need nation-state level offensive security to ensure their solutions are robust.
There’s a variety of approaches for technical implementation outside of labs. If you’re thinking about doing work in this space, you should consider different structures:
Adoption
There’s some advocacy for adoption happening in policy research orgs, but there are many gaps to fill. We don’t have much in the way of industry convenings, and we don’t have many technical standards that are ready to be adopted into legislation. The SL5 Task Force is an example of a technical org that takes advocacy seriously – seeking input from both national security and frontier lab stakeholders to develop adoptable solutions.
Startups necessarily have to do advocacy – you can’t make money without doing sales! Therefore, I’m pretty excited about seeing more startups working on these problems. However, there can be a large gap between current market demands and, for example, SL5-level protections, so it might not always be helpful. In cases where incremental security is valuable, and short-term adoption improves the cost or effectiveness of the eventual solution, I think it’s a good approach.
For policy advocacy, there’s a need for both policy drafting (writing usable standards and legislation), and policy entrepreneurship (communicating problems and solutions to congressional staff and executive branch regulators, and iterating the approach based on policymakers’ feedback). Building industry buy-in is also a major lever for policy advocacy.
Talent needs
AI infra security seems to be most bottlenecked on:
A few other types of talent that will be useful:
If you have these skills, I think you should strongly consider working on the shortlist!
(Repeated from the summary)
Thank you for reading!
These estimates were based on listing out organizations and then estimating the number of contributors on relevant projects at each organization. I likely missed independent or non-public research. These figures also do not reflect FTE commitments.
| Objective | Solution examples | Who's working on it? (Non-exhaustive) | Estimated Contributors |
| --- | --- | --- | --- |
| 1. Securing AI and compute from bad actors | | | |
| Securing model weights and algorithms from exfiltration (privacy) | SL5 implementation; regulations on high security standards for labs | RAND CAST, SL5 Task Force, Irregular, Ulyssean, TamperSec, Amodo, internal security at frontier AI labs | 30-40 |
| Securing models from backdooring and tampering (integrity) | Training data filtering and provenance; threat modeling for sabotage during development | Frontier AI labs, various academics, IARPA, UK AISI | 30-50 |
| Authorization approaches to misuse | Misuse compute governance, e.g. on-chip safety protocols for open source models; KYC / authentication / licensing for model usage or development; HEMs like flexHEGs; preventing stolen models from being used at scale | UK ARIA, IAPS, RAND | 2-10 |
| Secure compute verification | Datacenter-level verification of training / inference tasks; preventing tampering of verification tooling | Oxford AIGI, RAND, MIRI, TamperSec, Lucid | 20-30 |
| AI red-teaming | Prompt injection and elicitation | Internal at frontier AI labs, RAND, UK AISI, Gray Swan, Haize Labs, Lakera, HiddenLayer, Apollo, CAIS, Redwood | 200 |
| 2. Securing compute from Rogue AI (infosec approaches to AI Control) | | | |
| Protocols for securely using untrusted AI labor | Design and implementation of fine-grained permissions and identity that works for AI laborers; monitoring AIs within a lab for rogue actions | Redwood, METR, AISI control team, internal at AI labs | 5-15 |
| Prevention, detection, and response for rogue deployments | In-lab monitoring of network boundary, compute usage, heavy use of sandboxes; response playbook and mechanisms; secure logging of misaligned behavior in real-world scenarios | Redwood, MATS scholars, RAND, Amodo | 1-5 |
| 3. Responding to AI Cyber Capabilities | | | |
| Cyber evals | Benchmarks like CVEbench; honeypotting for AI agents | Irregular, Alex Leader, academics (Dawn Song, Daniel Kang, Vasilios Mavroudis, CMU), others | 70-80 |
| Cyber attribution | Research about how to recognize this in the wild; incident reporting framework for targets and AI labs | Palisade, Google and Anthropic sort of tracking (but not the ideal orgs to track), OECD, OCRG | 5-10 |
| Securing critical infrastructure | Rewriting critical infrastructure code in Rust | Atlas, Delphos, CSET, Center for Threat-Informed Defense, DHS, DARPA | 150-200 |
| Securing AI-generated code | Formal verification of AI-written code; AI-driven code testing and code review; AI-driven pen-testing | Theorem Labs, DARPA, Galois, Trail of Bits, various security startups; some overlap with securing untrusted AI labor | 120-150 |
| Epistemic security | AI watermarking; provenance for digital evidence; AI-secure identity management and authentication | C2PA, GDM, DARPA SemaFor, various startups | 150-200 |
I’ve made mostly neglectedness arguments for working on these problems. For a visual illustration, here’s a graph of the number of technical AI safety researchers by area (not compiled by me):
(I don’t think this data is comprehensive, but it provides some rough idea).
Beyond neglectedness, Infosec work has some other nice properties:
Some arguments against working in Infosec are:
I’m probably not providing comprehensive arguments against, and I think these takes are all reasonable. But hopefully, the arguments in favor provide enough grounding to seriously consider whether you should work on Infosec.
Published on January 8, 2026 2:12 PM GMT
There is a hierarchy of useful interfaces for tools that goes something like this:
Each level is clearly more useful than the one before it, all else equal.
All else is never equal. There is a serious question about feedback being lost as we move up the levels of the hierarchy. Tight feedback loops are, after all, key to creative expression. This is where "a format natural to you" is doing some heavy lifting. Higher levels can still create specialized interfaces (the creation of those interfaces can be specified in natural language) with tight feedback loops and intuitive, tactile user experiences.
We're currently breaking into level 3. If AI progress continues at even a fraction of the pace it has for the last 5 years (and there are still a number of low hanging fruit to be picked), we will soon reach level 4. Level 5 would need to read your mind or something similar, which intuitively (fight me) seems a pretty long way off. As far as I can tell, once we reach level 5, there aren't any obvious blockers on the way to level 6, so I speculate that level 5 will probably be pretty short-lived.
So we're likely to be in level 3-4 for the near future. You might notice that these levels have an unusual step in common that the others don't: "specify in a format natural to you". This doesn't necessarily mean plain English (or whatever your native language is), it can be structured data, sketches, custom tactile inputs, references to other works, or whatever crazy things you can imagine that I can't.
Just... specifying what you want and it happens is an immensely powerful interface. Forget the practical reality of implementing an agent that can carry these tasks out for a moment:
You can specify at whatever level of abstraction you want ("Use a UUIDv7 for the PK on this table, generate it in the domain model with a default_factory" vs "I want to generate IDs on the client to save a DB round trip"). You can define shortcuts ("When I say 'run the dishwasher', it means use the eco-mode setting and..."). You are limited only by the capabilities of the agent implementing the request, and your skill at specifying what you want.
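To make the first example concrete, here is a minimal sketch (mine, not from the post) in Python of what "use a UUIDv7 for the PK, generated in the domain model with a default_factory" can look like; the Order model and the hand-rolled uuid7 helper are illustrative assumptions, not anything the post specifies:

```python
import secrets
import time
import uuid
from dataclasses import dataclass, field

def uuid7() -> uuid.UUID:
    """Minimal UUIDv7: 48-bit millisecond timestamp + version/variant bits + random."""
    ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = (ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)

@dataclass
class Order:
    # Primary key generated in the domain model, not by the database,
    # so no extra round trip is needed to learn the new row's ID.
    id: uuid.UUID = field(default_factory=uuid7)
    customer: str = ""

print(Order(customer="alice").id)
```

The point is not this particular snippet but that both the low-level spec ("UUIDv7, default_factory") and the high-level intent ("generate IDs client-side to save a round trip") are valid ways to ask for it.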
Imagine trying to build specialized UIs that allow that level of flexibility and specificity!
This is a new interface. We are not yet familiar with it. Just as an artist comes to master the UI of a paintbrush on canvas, or a designer masters the UI of Photoshop, or a CEO masters the UI of his underlings, we will come to master this interface in time. As long as our tools are limited by the need for us to express our desires before they can act on them, specifying your desires clearly and fluently will only become a more valuable skill in the near future, and time spent learning to do that is time well invested.
Published on January 8, 2026 12:27 PM GMT
We (Alfred and Jeremy) started a Dovetail project on Natural Latents in order to get some experience with the proofs. Originally we were going to take a crack at this bounty, but just before we got started John and David published a proof, closing the bounty. This proof involves a series of transformations that takes any stochastic latent and turns it into a deterministic latent, where each step can only increase the error by a small multiple. We decided to work through the proof and understand each step, and attempt to improve the bound. After 20 hours or so working through it, we found a fatal flaw in one step of the proof, and spent the next several days understanding and verifying it. With David’s help, the counterexamples we found were strengthened to show that this transformation step had unbounded error in some cases. We started using sympy for arbitrary-precision evaluation of counterexamples, and along the way found a bug in sympy that occasionally caused us problems. We switched to Mathematica.
Since then, we’ve been working part time on trying to prove the bounty conjecture ourselves (or show that the conjecture is false). Ultimately we’ve been unsuccessful so far, but it still feels tantalizingly within reach. Perhaps an automated theorem prover will crack it in the next year or two. It makes a good test case.
We still think the overall bounty conjecture [(∃ Stochastic Natural Latent) Implies (∃ Deterministic Natural Latent)] is true. In fact, the largest ratio of errors that we can find empirically is around 1.82 (i.e. optimal deterministic sum of errors < 1.82 * optimal stochastic sum of errors).
In our quest, we made lots of graphs like the above. Each point on the orange and blue curves is the score of a numerically optimized latent distribution $P(\Lambda\mid X,Y)$, for a fixed $P(X,Y)$. For simplicity, all three of $X$, $Y$, $\Lambda$ are binary variables. In this graph, the maximal ratio between the blue and orange lines is ~1.82. On the x-axis, Delta represents the correlation between X and Y. For this graph, the marginal distributions of X and Y are held fixed. Changing the marginals has some effect on the graphs, but doesn’t seem to increase the difference between Stochastic and Deterministic scores.
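To make the numerical setup concrete, here is a minimal sketch of the kind of optimization involved (a reconstruction in Python, not the authors' actual code, which used Mathematica); the Delta = 0.15 joint, the softmax parameterization, and the number of restarts are all assumptions:

```python
import numpy as np
from scipy.optimize import minimize

EPS = 1e-12

def cond_mutual_info(P, a, b, c):
    """I(A;B|C) in bits for a 3-axis joint array P; a, b, c are axis indices."""
    Pabc = np.moveaxis(P, (a, b, c), (0, 1, 2))
    Pc = Pabc.sum(axis=(0, 1))            # P(c)
    Pac = Pabc.sum(axis=1)                # P(a, c)
    Pbc = Pabc.sum(axis=0)                # P(b, c)
    num = Pabc * Pc[None, None, :]
    den = Pac[:, None, :] * Pbc[None, :, :]
    mask = Pabc > EPS
    return float(np.sum(Pabc[mask] * np.log2(num[mask] / (den[mask] + EPS))))

def total_error(theta, P_xy, K=2):
    """Sum of the two redundancy errors and the mediation error for a latent
    P(Lam | X, Y) parameterized by a softmax over K latent values."""
    logits = theta.reshape(2, 2, K)
    lam_given_xy = np.exp(logits - logits.max(axis=2, keepdims=True))
    lam_given_xy /= lam_given_xy.sum(axis=2, keepdims=True)
    P = P_xy[:, :, None] * lam_given_xy   # joint P(x, y, lam), axes (X, Y, Lam)
    red1 = cond_mutual_info(P, 2, 1, 0)   # I(Lam; Y | X)
    red2 = cond_mutual_info(P, 2, 0, 1)   # I(Lam; X | Y)
    med = cond_mutual_info(P, 0, 1, 2)    # I(X; Y | Lam)
    return red1 + red2 + med

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = 0.15                          # assumed correlation parameter
    P_xy = np.array([[0.25 + delta, 0.25 - delta],
                     [0.25 - delta, 0.25 + delta]])  # both marginals 0.5
    # Many random restarts, since the objective has lots of local optima.
    best = min(
        (minimize(total_error, rng.normal(size=8), args=(P_xy,), method="Nelder-Mead")
         for _ in range(20)),
        key=lambda r: r.fun,
    )
    print("approx. optimal stochastic sum of errors:", best.fun)
```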
One way the conjecture could fail is by larger X and Y variables allowing larger differences. This is more difficult to reliably check numerically, because we’ve found there are a lot of local optima for our numerical optimizer to get stuck on, and this problem quickly gets much worse as N increases.
We did a long-running Bayesian optimization outer loop over 3x3 distributions, and an inner loop to find the optimal stochastic and optimal deterministic latent. The largest differences found clustered around 1.79 (apart from a few larger ones that turned out to be bad local minima on the inner loop).
So we think there are several lines of evidence suggesting the conjecture is true: 1. we’ve failed to find a counterexample despite putting a lot of time into it, 2. the conjecture seems to be true using Jensen-Shannon divergences, and 3. it’s true in the zero-error case.
We are impressed by how tantalizingly, frustratingly simple this problem is while being so difficult to prove.
The problem that we are trying to solve can be stated as follows.
Consider two variables X and Y with a joint probability distribution $P(X,Y)$. A latent variable $\Lambda$ is defined by a conditional probability distribution $P(\Lambda\mid X,Y)$.
A variable $\Lambda$ is called a stochastic natural latent if the joint distribution $P(X,Y,\Lambda)$ satisfies the three natural latent conditions to within an ‘error’ $\epsilon$:
Redundancy 1: $I(\Lambda;Y\mid X) \le \epsilon$
Redundancy 2: $I(\Lambda;X\mid Y) \le \epsilon$
Mediation: $I(X;Y\mid\Lambda) \le \epsilon$
Note: Here we are expressing the conditions as conditional mutual information, as Aram did in this post. These are equivalent to the conditions when stated in terms of KL divergences [1].
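For concreteness, here is one standard way to see that equivalence (a sketch in our notation, not necessarily the exact statement in [1]): each condition's KL divergence collapses to a conditional mutual information,

$$D_{KL}\big(P(X,Y,\Lambda)\,\big\|\,P(X,Y)\,P(\Lambda\mid X)\big) = I(\Lambda;Y\mid X), \qquad D_{KL}\big(P(X,Y,\Lambda)\,\big\|\,P(\Lambda)\,P(X\mid\Lambda)\,P(Y\mid\Lambda)\big) = I(X;Y\mid\Lambda),$$

and symmetrically $D_{KL}\big(P(X,Y,\Lambda)\,\big\|\,P(X,Y)\,P(\Lambda\mid Y)\big) = I(\Lambda;X\mid Y)$ for the second redundancy condition.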
We want to show that if a stochastic natural latent exists, then there also exists a latent $\Lambda'$ (defined by a conditional distribution $P(\Lambda'\mid X,Y)$) whose joint distribution satisfies the ‘deterministic natural latent conditions’ to within $k\epsilon$. The deterministic natural latent conditions are:
Deterministic Redundancy 1: $H(\Lambda'\mid X) \le k\epsilon$
Deterministic Redundancy 2: $H(\Lambda'\mid Y) \le k\epsilon$
Mediation: $I(X;Y\mid\Lambda') \le k\epsilon$
We would like $k$ to be a reasonably small number (for some definition of ‘reasonably small’).
One way in which the conjecture could be false is if there existed distributions for which all deterministic natural latents had very large errors (compared to the optimal stochastic latents). It is therefore useful to identify some simple families of deterministic latents and find upper bounds on how large their errors can be.
One of the simplest kinds of latents we can imagine is the constant latent. This is a latent which only takes one value, regardless of the X,Y values. Notice that in this case we have $H(\Lambda\mid X)=0$ as well as $H(\Lambda\mid Y)=0$, so the constant latent perfectly satisfies the two deterministic redundancy conditions. Conditioning on the latent does not affect the distribution, so the conditional mutual information simply equals the mutual information between X and Y. This means that the only nonzero error for this latent is the mediation error, which is $I(X;Y)$.
Another kind of latent we can consider is the ‘copy latent’, where $\Lambda$ simply deterministically ‘copies’ one of the variables (X or Y). This will always perfectly satisfy one of the deterministic redundancy conditions. For example, if $\Lambda$ copies X, so that $\Lambda = X$, we have:
Deterministic Redundancy 1: $H(\Lambda\mid X) = H(X\mid X) = 0$
The other deterministic redundancy condition will be satisfied to an error equal to the conditional entropy $H(X\mid Y)$:
Deterministic Redundancy 2: $H(\Lambda\mid Y) = H(X\mid Y)$
The mediation condition will also be satisfied perfectly:
Mediation: $I(X;Y\mid\Lambda) = I(X;Y\mid X) = 0$
If instead we choose a latent which copies Y, so that $\Lambda = Y$, then the mediation and Deterministic Redundancy 2 conditions will be perfectly satisfied and the Deterministic Redundancy 1 error will be $H(Y\mid X)$.
We can also consider the family of latents where $\Lambda$ is a deterministic function of the (X,Y) pair (though not necessarily a deterministic function of X or Y in isolation). All latents in this family satisfy the deterministic redundancy conditions with errors bounded by the conditional entropies of X and Y:
Deterministic Redundancy 1: $H(\Lambda\mid X) \le H(Y\mid X)$
Deterministic Redundancy 2: $H(\Lambda\mid Y) \le H(X\mid Y)$
A proof of this can be found in this shortform.
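For intuition, the redundancy bounds for this family follow from a short entropy argument (a sketch): since $\Lambda$ is a function of $(X,Y)$ we have $H(\Lambda\mid X,Y)=0$, so

$$H(\Lambda\mid X) \;\le\; H(\Lambda, Y\mid X) \;=\; H(Y\mid X) + H(\Lambda\mid X,Y) \;=\; H(Y\mid X),$$

and symmetrically $H(\Lambda\mid Y) \le H(X\mid Y)$.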
The worst-case upper bound on the mediation error is the same as that of the constant latent, $I(X;Y)$.
Notice that three of the latents considered above (’constant’, ‘copy X’ and ‘copy Y’) can be applied to any initial distribution. This means that we can always find a deterministic natural latent with maximum error $\min\{H(X\mid Y),\, H(Y\mid X),\, I(X;Y)\}$. In general, one of the copy latents will be best for distributions where X,Y are highly correlated (so that $H(X\mid Y)$ or $H(Y\mid X)$ is low) and the constant latent will be best for distributions where X,Y are independent or have low correlation (so that $I(X;Y)$ is low).
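As a quick numerical illustration of these baselines (our own sketch; the toy joint below is an assumption, not a distribution from the post):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

delta = 0.15
P = np.array([[0.25 + delta, 0.25 - delta],
              [0.25 - delta, 0.25 + delta]])   # toy joint P(X, Y), marginals 0.5

Px, Py = P.sum(axis=1), P.sum(axis=0)
H_joint = entropy(P.flatten())
H_X_given_Y = H_joint - entropy(Py)
H_Y_given_X = H_joint - entropy(Px)
I_XY = entropy(Px) + entropy(Py) - H_joint

# Worst error of each 'always available' deterministic latent:
#   constant latent -> only nonzero error is mediation, I(X;Y)
#   copy-X latent   -> only nonzero error is redundancy 2, H(X|Y)
#   copy-Y latent   -> only nonzero error is redundancy 1, H(Y|X)
print("constant latent error:", I_XY)
print("copy-X latent error  :", H_X_given_Y)
print("copy-Y latent error  :", H_Y_given_X)
print("guaranteed deterministic error bound:", min(I_XY, H_X_given_Y, H_Y_given_X))
```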
We have some numerical evidence that in many cases the optimal deterministic latent has an error very close to $\min\{H(X\mid Y),\, H(Y\mid X),\, I(X;Y)\}$, suggesting that this bound is quite tight (for example, in the first graph in this post, the optimal deterministic latent hugs this line perfectly). We know that this bound isn't tight when the marginal $P(X)$ or $P(Y)$ is far from 0.5 (but it isn't extremely loose).
Unfortunately, proving these upper bounds on the deterministic error is not sufficient to prove the conjecture. In order to prove the conjecture, we need to relate upper bounds on the deterministic error to the error achievable using a stochastic latent.
To complete the proof this way, an upper bound on the deterministic error needs to be combined with a lower bound on the stochastic error. Numerical evidence suggests that distributions for which $\min\{H(X\mid Y),\, H(Y\mid X),\, I(X;Y)\}$ is high are also distributions with higher stochastic natural latent errors, so this approach seems like a reasonable thing to attempt. For example, take a look at the first graph in this post. The stochastic error is lower than the deterministic error but they both follow similar shapes. In the middle of the graph, the stochastic error and deterministic error both peak at approximately (but not exactly) the same value of Delta and follow similar decreasing paths as Delta moves away from this point.
If we could prove a lower bound on the error of the stochastic latents of the form $\epsilon_{\text{stochastic}} \ge \tfrac{1}{C}\min\big(I(X;Y),\, H(X|Y),\, H(Y|X)\big)$, for some constant $C$, then we would have proved the conjecture. This is because, for any distribution, we could find a deterministic latent with error at most $\min\big(I(X;Y),\, H(X|Y),\, H(Y|X)\big)$. So if a latent with stochastic error $\epsilon_{\text{stochastic}}$ existed, we could always find a latent (’constant’, ‘copy X’ or ‘copy Y’) with an error at most $C$ times larger.
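In symbols, the hoped-for chain is:

$$\epsilon_{\text{deterministic}} \;\le\; \min\big(I(X;Y),\, H(X|Y),\, H(Y|X)\big) \;\le\; C \cdot \epsilon_{\text{stochastic}},$$

where the first inequality comes from picking the best of the three simple latents above, and the second is the lower bound we would need to prove.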
We tried a couple of ways of lower bounding the stochastic latent error, but were not successful. First, we tried assuming that there was a stochastic latent which achieved an error of less than some threshold $\epsilon$ on each condition, and seeing whether this implied a contradiction. We tried to derive a contradiction from these three inequalities:
Redundancy 1: $I(\Lambda;Y|X) < \epsilon$
Redundancy 2: $I(\Lambda;X|Y) < \epsilon$
Mediation: $I(X;Y|\Lambda) < \epsilon$
i.e. we wanted to show that satisfying two of these inequalities necessarily meant violating the third (for some distribution over X and Y).
If we could do this, we would have proved a lower bound on the stochastic error. Unfortunately, we failed, but this does feel like an approach someone could take if they were more comfortable than us at manipulating information-theoretic inequalities.
Here is another attempt we made at lower bounding the stochastic error.
Suppose we took two latents, defined by conditional distributions $R(\Lambda|X,Y)$ and $S(\Lambda|X,Y)$, and created a new latent using a probabilistic mixture of the two:
$$P(\Lambda|X,Y) \;=\; \alpha\, R(\Lambda|X,Y) + (1-\alpha)\, S(\Lambda|X,Y),$$ with $0 \le \alpha \le 1$.
If we applied the mixed latent to a distribution $P(X,Y)$, the joint distribution $P(X,Y,\Lambda)$ would also be a mixture of $P_R(X,Y,\Lambda)$ and $P_S(X,Y,\Lambda)$.
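Concretely, the joint factors through the shared X,Y marginal:

$$P(X,Y,\Lambda) \;=\; P(X,Y)\,P(\Lambda|X,Y) \;=\; \alpha\, P(X,Y)\,R(\Lambda|X,Y) + (1-\alpha)\, P(X,Y)\,S(\Lambda|X,Y) \;=\; \alpha\, P_R + (1-\alpha)\, P_S.$$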
We can also consider the refactored distribution $Q(X,Y,\Lambda) = P(X,Y)\,P(\Lambda|X)$, used to calculate the redundancy error (shown here for the first redundancy condition). Since the X,Y marginal is shared, this is also a mixture:
$$Q \;=\; \alpha\, Q_R + (1-\alpha)\, Q_S,$$ where $Q_R$ and $Q_S$ indicate the refactored versions of $P_R$ and $P_S$ respectively.
This means that the KL divergence for the redundancy error of P is given by:
$$D_{KL}(P \,\|\, Q) \;=\; D_{KL}\big(\alpha P_R + (1-\alpha) P_S \,\big\|\, \alpha Q_R + (1-\alpha) Q_S\big)$$
(Note: writing the KL as a mixture in both of its arguments works for both redundancy conditions, but not the mediation condition. So even if we got this proof strategy to work, we would still need to tie up the loose end of the mediation condition.)
Next, since the KL divergence is jointly convex in its two arguments, we can use this convexity to write a Jensen inequality:
$$D_{KL}\big(\alpha P_R + (1-\alpha) P_S \,\big\|\, \alpha Q_R + (1-\alpha) Q_S\big) \;\le\; \alpha\, D_{KL}(P_R \,\|\, Q_R) + (1-\alpha)\, D_{KL}(P_S \,\|\, Q_S)$$
This is potentially interesting for two reasons. First, we can often (always?) write stochastic latents as probabilistic mixtures of deterministic latents, so if R and S are deterministic latents, then this expression links the stochastic error $D_{KL}(P\,\|\,Q)$ with the two deterministic errors $D_{KL}(P_R\,\|\,Q_R)$ and $D_{KL}(P_S\,\|\,Q_S)$, which is almost the kind of thing we are trying to do.
Unfortunately, this inequality is the wrong way round. We are looking for a lower bound on the stochastic error in terms of the deterministic errors, something of the form $D_{KL}(P\,\|\,Q) \ge k\big(\alpha\, D_{KL}(P_R\,\|\,Q_R) + (1-\alpha)\, D_{KL}(P_S\,\|\,Q_S)\big)$ for some constant $k$. But the inequality from the convexity of the KL divergence gives us an upper bound on the stochastic error instead.
We hoped that ‘Reverse Jensen Inequalities’ such as the ones found here might be helpful. These are ways of bounding the ‘Jensen gap’ between the KL divergence of the mixtures and the mixture of the KL divergences. However, when we attempted this, the bounds we obtained came with a factor depending on the mixture weight, giving a ratio which unfortunately diverges as $\alpha$ tends to zero.
If we define the joint distribution $P(X,Y)$ using four parameters:
$$P(X,Y) \;=\; \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}$$
and the conditional distribution of a binary latent $\Lambda$ using another four parameters:
$$P(\Lambda \,|\, X{=}i,\, Y{=}j) \;=\; (\,l_{ij},\; 1 - l_{ij}\,),$$
where $l_{ij}$ is the probability of the latent's first value given that $(X,Y)$ falls in cell $(i,j)$.
Then we can write out the full error equations in terms of these parameters:
Mediation error: $I(X;Y\,|\,\Lambda)$, written out as an explicit function of the $p_{ij}$ and $l_{ij}$.
Redundancy 1 error: $I(\Lambda;Y\,|\,X)$, written out likewise (and similarly the Redundancy 2 error, $I(\Lambda;X\,|\,Y)$).
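Written as code rather than as Mathematica output, the three errors for this parameterization look as follows. This is a minimal Python sketch, not the original Mathematica; it treats $l_{ij}$ as the probability that the latent takes its first value given cell $(i,j)$, with 0-based indices:

```python
import numpy as np

def latent_errors(p, l):
    """p[i, j] = P(X=i, Y=j); l[i, j] = P(Lambda=0 | X=i, Y=j) for a binary latent.
    Returns (mediation, redundancy_1, redundancy_2) errors in bits."""
    p, l = np.asarray(p, float), np.asarray(l, float)
    joint = np.stack([p * l, p * (1 - l)], axis=-1)   # P(X, Y, Lambda)

    def H(q):
        q = q[q > 1e-12]
        return -np.sum(q * np.log2(q))

    H_XYL = H(joint.ravel())
    H_XY = H(joint.sum(axis=2))
    H_XL = H(joint.sum(axis=1))
    H_YL = H(joint.sum(axis=0))
    H_X = H(joint.sum(axis=(1, 2)))
    H_Y = H(joint.sum(axis=(0, 2)))
    H_L = H(joint.sum(axis=(0, 1)))

    mediation   = H_XL + H_YL - H_L - H_XYL   # I(X;Y | Lambda)
    redundancy1 = H_XL + H_XY - H_X - H_XYL   # I(Lambda;Y | X)
    redundancy2 = H_YL + H_XY - H_Y - H_XYL   # I(Lambda;X | Y)
    return mediation, redundancy1, redundancy2

# Sanity check: the 'copy X' latent on a correlated distribution.
p = [[0.4, 0.1], [0.1, 0.4]]
l = [[1.0, 1.0], [0.0, 0.0]]   # Lambda = 0 exactly when X = 0
print(latent_errors(p, l))     # ≈ (0, 0, H(X|Y) ≈ 0.72)
```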
One thing we want here is to characterize the optimal latent (the one which minimizes the sum of the mediation, Redundancy 1 and Redundancy 2 errors) at any given setting of the parameters. Mathematica can’t handle this analytically, so we tried restricting the distribution such that both marginals were always 50%. This forces the symmetric form $p_{11} = p_{22}$ and $p_{12} = p_{21}$, so the distribution can be parameterized by a single variable, $d$.
We can plot the numerically optimal latents for each value of $d$ (same as the image at the top of the post, but with each error plotted separately).
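Here is a sketch of that numerical optimization in Python (again, not the original Mathematica). One caveat: the exact mapping from $d$ to the marginal-0.5 family isn't spelled out above, so this sketch assumes $d$ interpolates from independence ($d=0$) to perfect correlation ($d=1$), i.e. $p_{11} = p_{22} = (1+d)/4$; the qualitative shape of the curves should not depend on that choice.

```python
import numpy as np
from scipy.optimize import minimize

def total_error(l_flat, p):
    """Sum of the mediation, redundancy 1 and redundancy 2 errors (in bits) for a
    binary latent with P(Lambda=0 | X=i, Y=j) = l_flat.reshape(2, 2)."""
    l = l_flat.reshape(2, 2)
    joint = np.stack([p * l, p * (1 - l)], axis=-1)

    def H(q):
        q = q[q > 1e-12]
        return -np.sum(q * np.log2(q))

    H_XYL = H(joint.ravel())
    H_XY, H_XL, H_YL = (H(joint.sum(axis=a)) for a in (2, 1, 0))
    H_X, H_Y, H_L = (H(joint.sum(axis=a)) for a in ((1, 2), (0, 2), (0, 1)))
    return ((H_XL + H_YL - H_L - H_XYL)      # mediation
            + (H_XL + H_XY - H_X - H_XYL)    # redundancy 1
            + (H_YL + H_XY - H_Y - H_XYL))   # redundancy 2

rng = np.random.default_rng(0)
for d in np.linspace(0.05, 0.95, 19):
    # Marginals fixed at 0.5; d interpolates from independence to perfect correlation.
    p = np.array([[(1 + d) / 4, (1 - d) / 4],
                  [(1 - d) / 4, (1 + d) / 4]])
    # Best stochastic latent: optimize the four l_ij from several random starts.
    stoch = min(minimize(total_error, x0, args=(p,), bounds=[(0, 1)] * 4).fun
                for x0 in rng.uniform(size=(8, 4)))
    # Best of the three always-available deterministic latents.
    det = min(total_error(np.array(l, float), p)
              for l in ([0.5] * 4,        # constant
                        [1, 1, 0, 0],     # copy X
                        [1, 0, 1, 0]))    # copy Y
    print(f"d={d:.2f}  stochastic={stoch:.4f}  deterministic={det:.4f}  "
          f"ratio={det / max(stoch, 1e-9):.3f}")
```

Plotting the two error columns against $d$ should reproduce the shape described above; the last column is the kind of ratio the conjecture is about, though with the deterministic side restricted to the three simple latents it is only an upper bound on the ratio for the truly optimal deterministic latent.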
By looking at these numerically optimal latents, we can also hardcode some of the latent parameters to values that we’ve empirically observed are optimal. In the first half of the range of $d$, the latent is 0.5 everywhere (independent of X and Y, so equivalent to the constant latent). In the second half, we observed that $l_{12}$ and $l_{21}$ are always 0.5, and $l_{11} = 1 - l_{22}$. This leaves only one parameter free.
Hardcoding these simplifies each error, so we can get each error as a function of $d$ and $l_{22}$ alone. For example, the mediation error reduces to an expression in just $d$ and $l_{22}$.
Now it’s simple enough that Mathematica (with some cajoling) can analytically find the $l_{22}$ (as a function of $d$) such that the total error is minimized. It’s the second root of a degree-6 polynomial, whose coefficients each contain a degree-4 polynomial. This is for the second half of the range of $d$; in the first half, $l_{22}$ simply equals 0.5.
By analytically calculating the gradient and Hessian of the total error with respect to the latent parameters, and showing that the eigenvalues of the Hessian are positive for all $d$, we know that every local movement of the latent increases the error, so the latent we’ve described is definitely a local minimum. We think it is also a global minimum, but haven’t shown this. Presumably it won’t be too hard to prove this by going through every other zero-gradient solution and showing that it has a larger error.
Mathematica can plot the total stochastic error (it’s missing the first half, but we know that the first half is equal to the optimal deterministic error, because we checked the eigenvalues of the Hessian):
And using our known values for the optimal deterministic latents, we can plot the optimal deterministic error:
And we can plot the ratio of these:
And (numerically) find that the peak is ratio = 1.81792.
This isn't the most satisfying conclusion, but we have achieved a partial success. We have (almost) a proof that the conjecture is true for the special case of a 2x2 distribution where the marginals of X and Y are 0.5. And in this case, the worst-case ratio is ~1.82. We failed to find a worse ratio empirically in the general 2x2 case and the general 3x3 case (despite extensive manual effort, and running tens of hours of automated searches). We think this is moderately strong evidence that the conjecture is true in general.
But we don't have any good leads on methods to prove this in general.
When stated in terms of KL divergences, these conditions are:
Redundancy 1: $D_{KL}\big(P(X,Y,\Lambda) \,\big\|\, P(X,Y)\,P(\Lambda|X)\big) = 0$
Redundancy 2: $D_{KL}\big(P(X,Y,\Lambda) \,\big\|\, P(X,Y)\,P(\Lambda|Y)\big) = 0$
Mediation: $D_{KL}\big(P(X,Y,\Lambda) \,\big\|\, P(\Lambda)\,P(X|\Lambda)\,P(Y|\Lambda)\big) = 0$
2026-01-08 12:41:16
Published on January 8, 2026 4:41 AM GMT
I’m generally very bad with names, and especially struggle when I’m quickly introduced to large numbers of people. Usually I’d just throw up my hands and resign myself to awkward conversations spent trying to remember someone’s name.
But I’ve been on an Anki binge recently, and also was recently accepted into MATS 9 (which started on the 5th of January 2026). So I used Anki to memorise many of the names and faces of the cohort before the first day and it was incredibly successful; I’ll certainly be doing it again. If you're a MATS scholar and would like the Anki deck, let me know!
This ended up being very easy: MATS had already asked the scholars to volunteer a profile photo, name, and short bio for use in an Airtable which was shared with all of us.
Note: I’d feel very weird if I were doing this without knowing that MATS had created an opt-in face book, with the purpose of introducing everyone to one another.
After some back and forth, Claude gave me some JavaScript to paste into the console which would download all photos and text into a form amenable to Anki import. Claude also guided me on where to put the photos and how to import the CSV into the Anki app, which was all very quick. I did spend a little time cropping everyone’s photos down to just their heads, to remove background features that might be easier to memorise than someone’s face (more about this later).
I’m hoping that repeatedly memorising lots of names will improve my ability in general to recall names. Hopefully Anki can give data about this in the future, in which case I might make a follow-up post. I’ll absolutely be using this technique for future large events or mixers.