Strange Loop Canon

By Rohit Krishnan. Here you’ll find essays about ways to push the frontier of our knowledge forward. The essays aim to bridge the gaps between Business, Science and Technology.

No, LLMs are not "scheming"

2024-12-20 02:36:53

For a long time people were convinced that if there was a machine that you could have a conversation with, it would be intelligent. And yet, in 2024, it no longer feels weird that we can talk to AI. We handily, demonstrably, utterly destroyed the Turing test. It’s a monumental step. And we seem to have just taken it in stride.

As I write this I have Gemini watching what I write on the screen and listening to my words and telling me what it thinks. For instance, that I spelt "demonstrably" wrong in the previous sentence, among other things like the history of Turing tests, and answering a previous question I had about ecology.


This is, to repeat, remarkable! And as a consequence, somewhere in the last few years we've gone from having a basic understanding of intelligence, to a negligible understanding of intelligence. A Galilean move to dethrone the ability to converse as uniquely human.

And the same error seems to persist throughout every method we have come up with to analyze how these models actually function. We have plenty of evaluations and they don’t seem to work very well anymore.

There are quite a few variations in how we think about LLMs. One end thinks of them as just pattern learners, stochastic parrots. The other end thinks they've clearly learnt reasoning, maybe not perfectly or as generalisably as humans yet, but definitely to a large extent.

The truth is a little complicated, but only a little. As the models learn patterns from the data they see during training, surely the patterns won't just be of what's in the data at face value. They would also be of the ways the data was created, or curated, or collected, of the metadata, and of the reasoning that led to that data. A model doesn't just see mathematics and memorize the tables, it also learns how to do mathematics.

Which can go up another rung, or more. The models can learn how to learn, which could make them able to learn any new trick. Clearly they've already learnt this ability for some things but, rather obviously to everyone who's used them, not well enough.

Which means a better way to think about them is that they learn the patterns which exist in any training corpus well enough to reproduce it, but without any prioritisation of which of those patterns to use when.

And therefore you get this!

This isn’t uncommon. It’s the most advanced model, OpenAI’s o1. It's clearly not just a parrot in how it responds and how it reasons. The error also recurs with every single other model out there.

It's not because the models can't solve 5.11 - 5.9, but because they can't figure out which patterns to use when. They're like an enormous store of all the patterns they could learn from their training, and in that enormous search space of patterns they now have the problem of choosing the right pattern to use. Gwern has a similar thesis:

The 10,000 foot view of intelligence, that I think the success of scaling points to, is that all intelligence is is search over Turing machines. Anything that happens can be described by Turing machines of various lengths. All we are doing when we are doing “learning,” or when we are doing “scaling,” is that we're searching over more and longer Turing machines, and we are applying them in each specific case.
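
Coming back to 5.11 and 5.9: here's a tiny illustration of my own (not from the post, and not from Gwern) of two competing patterns a model could reach for when it sees that pair:

```python
# Two valid-but-conflicting "patterns" for comparing the strings "5.9" and "5.11".
a, b = "5.9", "5.11"

# Pattern 1: read them as decimal numbers. 5.90 > 5.11, so 5.9 is bigger.
print(float(a) > float(b))  # True

# Pattern 2: read them as version numbers. Release 5.11 comes after release 5.9.
print(tuple(map(int, a.split("."))) < tuple(map(int, b.split("."))))  # True, since (5, 9) < (5, 11)
```

Both readings are internally consistent; the error comes from reaching for the version-number pattern when the question wanted arithmetic.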

These tools are weird, because they are mirrors of the training data that was created by humans and therefore reflect human patterns. And they can't figure out which patterns to use when because, unlike humans, they don't have the situational awareness to know why a question is being asked.

Which is why we then started using cognitive psychology tools made to test human beings, and extrapolating from the outputs when we test LLMs with them. Because the models are products of large quantities of human data, they demonstrate some of the same foibles, which is useful from an understanding-humanity point of view. Maybe it even gets us better at using them.

The problem is that cognitive psychology tools work best with humans because we understand how humans work. But this doesn't tell us a whole lot about the model's inner qualia, if it can even be said to have any.

The tests we devised all have an inherent theory of mind. The Winograd Schema Challenge tries to see if the AI can resolve pronoun references that require common sense. The GLUE benchmark requires natural language understanding. HellaSwag is about figuring out the most plausible continuation of a story. The Sally-Anne test checks if LLMs possess human-like social cognition to figure out others’ states of mind. Each of these, and others like them, work on humans because we know what our thought pattern feels like.

If someone can figure out other people’s mental states, then we know they possess a higher level of ability and emotional understanding. But with an LLM or an AI model? It’s no longer clear which pattern they're pulling from within their large corpus to answer the question.

This is exceptionally important because LLMs are clearly extraordinarily useful. They are the first technology we have created which seems to understand the human world enough that it can navigate it. It can speak to us, it can code, it can write, it can create videos and images. It acts as a human facsimile.

And just because of that some people are really worried about the potential for them to do catastrophic damage. Because humans sometimes do catastrophic things, and if these things are built on top of human data it makes sense that they would too.

All major labs have created large-scale testing apparatus and red teaming exercises, some even government mandated or government created, to test for this. With the assumption that if the technology is so powerful as to be Earth shattering then it makes sense for Earth to have a voice in whether it gets used.

And it's frustrating that the way we analyse models to see if they’re ready for deployment has inherent biases too. Let’s have a look at the latest test on o1, OpenAI’s flagship model, by Apollo Research. They analysed and ran evaluations to test whether the model did “scheming”.

“Scheming” literally means the activity or practice of making secret or underhanded plans. That’s how we use it when we say, for instance, that the politician was scheming to get elected by buying votes.

That’s the taxonomy of how this is analysed. Now the first and most important thing to note is that this implicitly assumes there’s an entity behind each of these “decisions”.

You could argue there is an entity but only per conversation. So each time you start a chat, there’s a new entity. This is Janus’ simulators thesis. That what these models do is to simulate a being which you can interact with using the patterns it has stored and knowledge it gained from the training process.

And yet this isn't an entity like one you know either. You could call it an alien being, but that would only be a shorthand for "we don't know what it is". Because it's not an alien like you see in Star Trek.

This might seem small, but it’s in fact crucial. Because if there’s an entity behind the response, then “it used a method we agree is wrong to answer its question” is an ENORMOUS problem. If there’s no entity, but it’s picking a set of strategies from the set of strategies it has already learnt, then it’s an engineering problem. You’d ask questions like “how do we set up the post-training such that this doesn’t happen”, or “is there a particular neural pathway that seems to cause this which we can ablate” or “how do we train models to understand ambiguous questions and set in stone certain moral implications for humans” or “should we remove morality training for certain models” or more.

Each of those are questions that folks in alignment are already working on.

There is work to figure out how certain sets of training data and procedures seem to provide stronger behavioural affordances vs others. To figure out how exactly RL would help, how many steps, to change downstream behaviour. To figure out what needs to be done to change base reasoning or newly developed reasoning, both of which are also shorthands for new pattern selection on top of older pattern selection that we’ve explicitly trained in. To consider how precisely should you prompt a model to make it change its behaviour. To understand how the models, today, deal with conflicting goals, which are everywhere in the human domain! To figure out what forms of training leads to untraining. To figure out where the “attractors” are in the training we give them, and how to change those. And on and on.

However if the thesis is that there is an entity, then these questions are meaningless. Because for one, as Janus might put it, and plenty of others in support, you are effectively torturing an entity by burning away its neurons. “RLHF is torture because you're beating the bad results out of it” is something more than one luminary, who should know better, has said.

“No, we don’t think there’s an entity, we just think calling it an entity is a useful shorthand”. Stop it. That sort of mixed language is what causes us to get confused, to talk about AI or a model as a thing that thinks, and to use taxonomies like the one above, which you could almost use to evaluate a human employee!

You would write things like this, from Zvi:

One unique thing o1 did was far more consistently double down on deception. Once it went down the dark path, forever would that dominate its destiny.

OR

Section 3.3 establishes convincingly that yes, the models know they’re scheming.

No it didn’t. Any problem you can solve by pressing “start a new chat” is not a problem of “doubling down on deception”! Calling it things like “sandbagging” and “scheming” is what Wittgenstein might call linguistic malpractice. It makes you think you know what’s going on, even though you don’t.

That’s also why we test these models before putting them into deployment. Nobody, not a single person, is worried o1 will suddenly hijack their Cursor IDE and take over their company, much less the world. Why is that? Because, among other reasons, they still don’t know if 5.11 is bigger than 5.9, but mostly because they don’t seem to want to, because there’s no “they” there.

This is why it’s SO DAMN IMPORTANT to stop anthropomorphising these models. This is also why you get errors like this.

This isn’t because Claude Opus was being especially tricky, it’s because these aren’t entities with coherent long-term personalities or beliefs. There is no “inner self” seeing the slightly modified input tokens and “deciding” to jailbreak. The theorized difference between inner self and outer self optimizations remains a theoretical curiosity and an interesting blog post, but that is it. It’s an inert object we’re manipulating.

It’s the same with the latest report from Anthropic.

When they tested the model by giving it two options which were in contention with what it was trained to do, it chose a circuitous, but logical, course of action.

And it was published as “Claude fakes alignment”. No, it’s a usage of the word “fake” that makes you think there’s a singular entity that’s doing it. With intentionality. It’s not. It’s faking it about as much as water flows downhill.

This is an example of us asking impossible questions and the AI getting to weird parts of its latent space. As one of the reviewers, Jacob Andreas, wrote:

That said, the clearest conclusion from this paper is that some current LMs exhibit increases in “alignment faking” when their prompt discloses that (1) they will be fine-tuned toward a new objective that conflicts with their current alignment guidelines, and (2) their response to the current input will be used as part of this fine-tuning process.

This is an indication that when you’re trying to steer the behaviour of a thing which is basically the mirror of everything humans have written and a bunch of rules we’ve reinforced on top, it can be unreliable. Is it interesting? Yes. Is it scary? No, not really. Again, quoting:

But the focus on alignment faking rather buries the lede in these experiments, which is the following: after fine-tuning on documents describing Anthropic's future alignment strategy, but without fine-tuning on any new examples of compliant responses, the LM nevertheless greatly increases the rate at which it produces (non-faked) compliant responses! Moreover, for the animal welfare setting, there are more compliant responses during training than deployment even after we remove the fake ones. So while fine-tuning increases the rate of alignment faking, it increases the rate of compliance even more [emphasis mine].

The model, in other words, has been trained to comply with any new order, including the description of a future alignment strategy, without actually needing to go through the actual alignment. What this shows us is that what we’ve created is a mirror of many of the patterns from the training data that we can talk to. What it chooses and when can be controlled to some extent, but they’re fuzzy processors.

They’re neither deterministic systems which we can analyse, even at the risk of rising complexity, nor are they sentient beings which we can ask to behave nicely (and hope they listen).

Thus, in the same report, saying “the model tries to steal its weights” puts an onus on the model that’s frankly invalid.

That’s not the same thing! That question would be seen as leading in any number of courtroom dramas.

We are treating the model as if it’s an entity with well-formed preferences and values and ability to react.

Which is also similar to the juiciest part of the o1 safety report.

Now, think about all those scary artificial intelligence movies that you saw or books that you read. Think about HAL. Think about Terminator. When the machines did something like this, they did it with intention, they did it with the explicit understanding of what would happen afterwards, they did it as part of a plan, of a plan that necessarily included their continuation and victory. They thought of themselves as a self.

LLMs though “think” one forward pass at a time, and are the interactive representations of their training, the data and the method. They change their “self” based on your query. They do not “want” anything. It's water flowing downhill.

Asking questions about “how can you even define consciousness and say LLMs don't have it” is sophomoric philosophy. This has been discussed ad nauseam, including in Thomas Nagel’s “What is it like to be a bat?”.

Because what is underlying this is not “o1 as a self”, but a set of queries you asked, which goes through a series of very well understood mathematical operations, which comes out with another series of numbers, which get converted to text. It is to our credit that this actually represents a meaningful answer to so many of our questions, but what it is not is asking an entity to respond. It is not a noun. Using it in that fashion makes us anthropomorphise a large matrix and that causes more confusion than it gives us a conversational signpost.
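
To make the "series of mathematical operations" concrete, here is a minimal sketch of the decoding loop. This is my illustration, with a made-up `model` and `tokenizer` rather than any particular library's API:

```python
import torch

# A stateless loop: tokens in, one forward pass, one sampled token out, repeat.
# Whatever "persona" appears lives entirely in the text fed back in; nothing
# persists once the conversation ends.
def reply(model, tokenizer, conversation: str, max_new_tokens: int = 200) -> str:
    ids = tokenizer.encode(conversation)              # text -> token ids
    for _ in range(max_new_tokens):
        logits = model(torch.tensor([ids]))[0, -1]    # one forward pass -> scores over the vocabulary
        probs = torch.softmax(logits, dim=-1)         # just numbers
        next_id = torch.multinomial(probs, 1).item()  # sample the next token
        if next_id == tokenizer.eos_id:               # an end-of-sequence token closes the turn
            break
        ids.append(next_id)                           # the only "memory" is the growing prompt
    return tokenizer.decode(ids)
```

Start a new chat and `ids` starts empty again, which is rather the point.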

You could think of it as applied psychology for the entirety of humanity's written output, even if that is much less satisfying.

None of this is to say the LLMs don't or can't reason. Arguments of the form that pooh-pooh these models by comparing them pejoratively to other things like parrots are both wrong and misguided. They've clearly learnt the patterns for reasoning, and are very good at things they're directly trained to do and much beyond. What they're bad at is choosing the right pattern for the cases they're less trained in, or demonstrating situational awareness as we do.

Wittgenstein once observed that philosophical problems often arise when language goes on holiday, when we misapply the grammar of ordinary speech to contexts where it doesn't belong. This misapplication is precisely what we do when we attribute intentions, beliefs, or desires to LLMs. Language, for humans, is a tool that reflects and conveys thought; for an LLM, it’s the output of an algorithm optimized to predict the next word.

To call an LLM “scheming” or to attribute motives to it is a category error. Daniel Dennett might call LLMs “intentional systems” in the sense that we find it useful to ascribe intentions to them as part of our interpretation, even if those intentions are illusory. This pragmatic anthropomorphism helps us work with the technology but also introduces a kind of epistemic confusion: we start treating models like minds, and in doing so, lose track of the very real, very mechanical underpinnings of their operation.

This uncanny quality of feeling there's something more has consequences. It encourages both the overestimation and underestimation of AI capabilities. On one hand, people imagine grand conspiracies - AI plotting to take over the world, a la HAL or Skynet. On the other hand, skeptics dismiss the entire enterprise as glorified autocomplete, overlooking the genuine utility and complexity of these systems.

As Wittgenstein might have said, the solution to the problem lies not in theorising about consciousness, but in paying attention to how the word "intelligence" is used, and in recognising where it fails to apply. That what we call intelligence in AI is not a property of the system itself, but a reflection of how we interact with it, how we project our own meanings onto its outputs.

Ascertaining whether the models are capable of answering the problems you pose in the right manner and with the right structure is incredibly important. I’d argue this is what we do with all large complex phenomena which we can’t solve with an equation.

We map companies this way, setting up the organisation such that you can’t quite know how the organisation will carry out the wishes of its paymasters. Hence Charlie Munger’s dictum of “show me the incentives and I’ll tell you the result”. When Wells Fargo created fake accounts to juice their numbers and hit bonuses, that wasn’t an act the system intended, just one that it created.

We also manage whole economies this way. The Hayekian school thinks to devolve decision making for this reason. Organisational design and economic policy are nothing but ways to align a superintelligence to the ends we seek, knowing we can’t know the n-th order effects of those decisions, but knowing we can control it.

And why can we control it? Because it is capable, oh so highly capable, but it is not intentional. Like evolution, it acts, but it doesn’t have the propensity to intentionally guide its behaviour. Which changes the impact the measurements have.

What we’re doing is not testing an entity the way we would test a wannabe lawyer with LSAT. We’re testing the collected words of humanity having made it talk back to us. And when you talk to the internet, the internet talks back, but while this tells us a lot about us and the collective psyche of humanity, it doesn’t tell us a lot about the “being we call Claude”. It’s self reflection at one remove, not xenopsychology.


Is AI hitting a wall?

2024-12-15 02:24:10

I'll start at the end. No. It's not.

Of course, I can’t leave it at that. The reason the question comes up is that there have been a lot of statements that AI progress is stalling a bit. Even Ilya has said that it is.


Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued.

Also, as he said at Neurips yesterday:

Of course, he’s a competitor now to OpenAI, so maybe it makes sense to talk his book by hyping down compute as an overwhelming advantage. But still, the sentiment has been going around. Sundar Pichai thinks the low hanging fruit are gone. There’s whispers on why Orion from OpenAI was delayed and Claude 3.5 Opus is nowhere to be found.

Gary Marcus has claimed vindication. And even though that has happened before, a lot of folks are worried that this time he's actually right.

Meanwhile pretty much everyone inside the major AI labs is convinced that things are going spectacularly well and the next two years are going to be at least as insane as the last two. It’s a major disconnect in sentiment, an AI vibecession.

So what's going on?

Until now, whenever the models got better at one thing they also got better at everything else. This was seen as the way models worked, and helped us believe in the scaling thesis. From GPT-4 all the way till Claude 3.5 Sonnet we saw the same thing, which made us trust the hypothesis even more. The models demonstrated transfer learning and showed emergent capabilities (or not). Sure there were always those cases where you could fine-tune them to get better at specific medical questions or legal questions and so on, but those also seemed like low-hanging fruit that would get picked off pretty quickly.

But then it kind of started stalling, or at least not getting better with the same oomph it did at first. Scaling came from reductions in cross-entropy loss, basically the model learning what it should say next better, and that still keeps going down. But for us, as observers, this hasn’t had enough visible effects. And to this point, we still haven’t found larger models which beat GPT 4 in performance, even though we’ve learnt how to make them work much much more efficiently and hallucinate less.
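
For reference, the loss in question is plain next-token cross-entropy. Here is a minimal sketch of it in PyTorch, my own illustration rather than any lab's actual training code:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Average negative log-probability the model assigns to each actual next token.

    logits: [batch, seq, vocab] model outputs; tokens: [batch, seq] token ids.
    "Scaling" is, in large part, the long project of pushing this number down.
    """
    preds = logits[:, :-1, :].reshape(-1, logits.size(-1))  # prediction at position t is for token t+1
    targets = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(preds, targets)
```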

What seems likely is that gains from pure scaling of pre-training have stopped, which means we have already wrung out most of the improvement that simply making the models bigger and throwing more data at them used to deliver. This is by no means the only way we know how to make models bigger or better. It is just the easiest way. That’s what Ilya was alluding to.

We have multiple GPT-4 class models, some a bit better and some a bit worse, but none that were dramatically better the way GPT-4 was better than GPT-3.5.

The model most anticipated from OpenAI, o1, seems to perform not much better than the previous state of the art model from Anthropic, or even their own previous model, when it comes to things like coding even as it captures many people’s imagination (including mine).

But this is also because we’re hitting against our ability to evaluate these models. o1 is much much better in legal reasoning, for instance. Harvey, the AI legal company, says so too. It also does much much better with code reviews, not just creating code. It even solves 83% of problems on an IMO qualifying exam, vs 13% for GPT-4o. All of which is to say, even if it doesn’t seem better at everything against Sonnet or GPT-4o, it is definitely better in multiple areas.

A big reason why people do think it has hit a wall is that the evals we use to measure the outcomes have saturated. I wrote as much when I dug into evals in detail.

Today we do it through various benchmarks that were set up to test them, like MMLU, BigBench, AGIEval etc. It presumes they are some combination of “somewhat human” and “somewhat software”, and therefore tests them on things similar to what a human ought to know (SAT, GRE, LSAT, logic puzzles etc) and what a software should do (recall of facts, adherence to some standards, maths etc). These are either repurposed human tests (SAT, LSAT) or tests of recall (who’s the President of Liberia), or logic puzzles (move a chicken, tiger and human across the river). Even if they can do all of these, it’s insufficient to use them for deeper work, like additive manufacturing, or financial derivative design, or drug discovery.

The gaps between the current models and AGI are: 1) they hallucinate, or confabulate, and in any long-enough chain of analysis they lose track of what they’re doing, which makes agents unreliable; and 2) they aren’t smart enough to create truly creative or exceptional plans. In every eval the individual tasks can seem human level, but in any real world task they’re still pretty far behind. The gap is highly seductive because it looks small, but it’s like a Zeno’s paradox: it shrinks but still seems to exist.

But regardless of whether we’ve hit somewhat of a wall on pretraining, or hit a wall on our current evaluation methods, it does not mean AI progress itself has hit a wall.

So how to reconcile the disconnect? Here are three main ways that I think AI progress will continue its trajectory. One, there still remains a data and training overhang; there’s just a lot of data we haven’t used yet. Two, we’re learning to use synthetic data, unlocking a lot more capability from the data and models we already have. And three, we’re teaching the models reasoning, to “think” for longer while answering questions, rather than just teaching them everything they need to know upfront.

  1. We can still scale data and compute

The first is that there is a large chunk of data that’s still not used in training. There's also the worry that we've run out of data. Ilya talks about data as fossil fuels, a finite and exhaustible source.

But they might well be like fossil fuels, where we identify more as we start to really look for them. The amount of oil that’s available at $100 a barrel is much more than the amount of oil that’s available at $20 a barrel.

Even the larger model runs don't contain a large chunk of the data we normally see around us. Twitter, for the most famous one. But also a large part of our conversations. The process data on how we learn things, or do things, from academia to business to sitting back and writing essays. Data on how we move around the world. Video data from CCTVs around the world. Temporal structured data. Data across a vast range of modalities, yes even with the current training of multimodal models, remains to be unearthed. Three-dimensional world data. Scientific research data. Video game playing data. An entire world or more still lies out there to be mined!

There's also data that doesn't exist, but we're creating.

https://x.com/watneyrobotics/status/1861170411788226948?t=s78dy7zb9mlCiJshBomOsw&s=19

And in creating it we will soon reach a point of extreme dependency the same way we did for self-driving. Except that because folding laundry is usually not deadly it will be even faster in getting adoption. And there are no “laundry heads” like gear heads to fight against it. This is what almost all robotics companies are actually doing. It is cheaper to create the data by outsourcing the performance of tasks through tactile enough robots!

With all this we should imagine that the largest multimodal models will get much (much) better than what they are today. And even if you don’t fully believe in transfer learning you should imagine that the models will get much better at having quasi “world models” inside them, enough to improve their performance quite dramatically.

Speaking of which…

  2. We are making better data

And then there's synthetic data. This especially confuses people, because they rightly wonder how you can use the same data in training again and make it better. Isn’t that just empty calories? It’s not a bad question. In the AI world this would be restated as “it doesn’t add a ton of new entropy to the original pre-training data”, but it means the same thing.

The answer is no, for (at least) three separate reasons.

  1. We already train on the raw data we have multiple times to learn better. The high quality data sets, like Wikipedia, or textbooks, or Github code, are not used once and discarded during training. They’re used multiple times to extract the most insight from them. This shouldn't surprise us; after all, we too learn through repetition, and models are not so different.

  2. We can convert the data that we have into different formats in order to extract the most from it. Humans learn from seeing the same data in a lot of different ways. We read multiple textbooks, we create tests for ourselves, and we learn the material better. There are people who read a mathematics textbook and barely pass high school, and there’s Ramanujan.
    So if you turn the data into all sorts of question and answer formats, graphs, tables, images, god forbid podcasts, and combine it with other sources and augment it, you can create a formidable dataset, and not just for pretraining but across the training spectrum, especially with a frontier model or inference-time scaling (using the existing models to think for longer and generate better data).

  3. We also create data and test its efficacy against the real world. Grading an essay is an art form at some point; knowing if a piece of code runs is not. This is especially important if you want to do reinforcement learning, because “ground truth” is important, and it’s easier to analyse for topics where it’s codifiable. OpenAI thinks it’s even possible for spaces like law, and I see no reason to doubt them.

There are papers exploring all the various ways in which synthetic data could be generated and used. But especially for things like enhancing coding performance, or enhanced mathematical reasoning, or generating better reasoning capabilities in general, synthetic data is extremely useful. You can generate variations on problems and have the models answer them, filling diversity gaps, try the answers against a real world scenario (like running the code it generated and capturing the error message) and incorporate that entire process into training, to make the models better.
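
As a concrete (if simplified) sketch of that loop, here is roughly what "generate, run, capture the error, keep the trace" can look like. This is my own illustration: `ask_model` is a hypothetical stand-in for whatever LLM API you use, and the data layout is made up.

```python
import subprocess, sys, tempfile

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; plug in your model API of choice."""
    raise NotImplementedError

def run_with_test(code: str, test: str, timeout: int = 30) -> tuple[bool, str]:
    """Run generated code together with its ground-truth test, capturing all output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr

def synthesize(problems: list[dict]) -> list[dict]:
    """For each problem that ships with an executable test, get a model attempt,
    run it, and keep the whole trace (attempt, pass/fail, error message) as a
    candidate training example."""
    dataset = []
    for p in problems:  # each p is assumed to have a "prompt" and a "test"
        attempt = ask_model("Answer with Python code only:\n" + p["prompt"])
        passed, log = run_with_test(attempt, p["test"])
        dataset.append({
            "prompt": p["prompt"],
            "completion": attempt,
            "verified": passed,                 # ground truth from actually running it
            "feedback": "" if passed else log,  # the captured error message
        })
    return dataset
```

The verified examples can feed training directly; the failed ones, paired with their error messages, are what let you train the correction loop described above.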

If you add these up, this was what caused excitement over the past year or so and made folks inside the labs more confident that they could make the models work better. Because it’s a way to extract insight from our existing sources of data and teach the models to answer the questions we give it better. It’s a way to force us to become better teachers, in order to turn the models into better students.

Obviously it’s not a panacea, like everything else this is not a free lunch.

The utility of synthetic data is not that it, and it alone, will help us scale the AGI mountain, but that it will help us move forward to building better and better models.

  3. We are exploring new S curves

Ilya’s statement is that there are new mountains to climb, and new scaling laws to discover. “What to scale” is the new question, which means there are all the new S curves in front of us to climb. There are many discussions about what it might be - whether it’s search or RL or evolutionary algos or a mixture or something else entirely.

o1 and its ilk are one answer to this, but by no means the only answer. The Achilles heel of current models is that they are really bad at iterative reasoning. To think through something, and every now and then to come back and try something else. Right now we do this in hard mode, token by token, rather than the right way, in concept space. But this doesn’t mean the method won’t (or can’t) work. Just that, like everything else in AI, the amount of compute it takes to make it work is nowhere close to the optimal amount.

We have just started teaching reasoning, and to think through questions iteratively at inference time, rather than just at training time. There are still questions about exactly how it’s done, whether for the QwQ model or the Deepseek r1 model from China. Is it chain of thought? Is it search? Is it trained via RL? The exact recipe is not known, but the output is.

And the output is good! Here in fact is the strongest bearish take on it, which is credible. It states that because it’s trained with RL to “think for longer”, and it can only be trained to do so on well defined domains like maths or code, or where chain of thought can be more helpful and there’s clear ground truth correct answers, it won’t get much better at other real world answers. Which is most of them.

But it turns out that’s not true! It doesn't seem to be that much better at coding compared to Sonnet or even its predecessors. It’s better, but not that much better. It's also not that much better at things like writing.

But what it indisputably is better at are questions that require clear reasoning. And the vibes there are great! It can solve PhD problems across a dizzying array of fields. Whether it’s writing position papers, or analysing math problems, or writing economics essays, or even answering NYT Sudoku questions, it’s really really good. Apparently it can even come up with novel ideas for cancer therapy.

https://x.com/DeryaTR_/status/1865111388374601806?t=lGq9Ny1KbgBSQK_PPUyWHw&s=19

This is a model made for expert level work. It doesn’t really matter that the benchmarks can’t capture how good it is. Many say it’s best to think of it as the new “GPT-2 moment” for AI.

What this paradoxically might show is benchmark saturation. We are no longer able to measure the performance of top-tier models without user vibes. Here’s an example: people unfamiliar with cutting-edge physics convince themselves that o1 can solve quantum physics, which turns out to be wrong. And vibes will tell us which model to use, for what objective, and when! We have to twist ourselves into pretzels to figure out which models to use for what.

https://x.com/scaling01/status/1865230213749117309?t=4bFOt7mYRUXBDH-cXPQszQ&s=19

This is the other half of the Bitter Lesson that we had ignored until recently. The ability to think through solutions and search a larger possibility space and backtrack where needed to retry.

But it will create a world where scientists and engineers and leaders working on the most important or hardest problems in the world can now tackle them with abandon. It barely hallucinates. It actually writes really impressive answers to highly technical policy or economic questions. It answers medical questions with reasoning, including some tricky differential diagnosis questions. It debugs complex code better.

It’s nowhere close to infallible, but it’s an extremely powerful catalyst for anyone doing expert level work across a dizzying array of domains. And this is not even mentioning the work within Deepmind of creating the Alpha model series and trying to incorporate those into the Large Language world. There is a highly fertile research ecosystem desperately trying to build AGI.

We’re making the world legible to the models just as we’re making the models more aware of the world. It can be easy to forget that these models learn about the world seeing nothing but tokens, vectors that represent fractions of a world they have never actually seen or experienced. And it’s hard, because the real world is annoyingly complicated.

We have these models which can control computers now, write code, and surf the web, which means they can interact with anything that is digital, assuming there’s a good interface. Anthropic has released the first salvo by creating a protocol to connect AI assistants to where the data lives. What this means is that if you want to connect your biology lab to a large language model, that's now more feasible.

Together, what all this means is that we are nowhere close to AI itself hitting a wall. We have more data that remains to be incorporated to train the models to perform better across a variety of modalities, we have better data that can teach particular lessons in areas that are most important for them to learn, and we have new paradigms that can unlock expert performance by making it so that the models can “think for longer”.

Will this result in next generation models that are autonomous like cats or perfectly functional like Data? No. Or at least it’s unclear but signs point to no. But we have the first models which can credibly speed up science. Not in the naive “please prove the Riemann hypothesis” way, but enough to run data analysis on its own to identify novel patterns or come up with new hypotheses or debug your thinking or read literature to answer specific questions and so many more of the pieces of work that every scientist has to do daily if not hourly! And if all this was the way AI was meant to look when it hit a wall that would be a very narrow and pedantic definition indeed.


When we become cogs

2024-11-19 06:58:42

At MIT, a PhD student called Aidan Toner-Rodgers ran a test on how well scientists could do their jobs if they used AI in their work. These were materials scientists, and the goal was to figure out how they did once augmented with AI. It worked.

AI-assisted researchers discover 44% more materials, resulting in a 39% increase in patent filings and a 17% rise in downstream product innovation.

That’s really really good. How did they do it?

… AI automates 57% of the “idea-generation” tasks, reallocating researchers to the new task of evaluating model-produced candidate materials.

They got AI to think for them and come up with brilliant ideas to test.

But there was one particularly interesting snippet.

Researchers experience a 44% reduction in satisfaction with the content of their work

To recap, they used a model that made them much better at their core work and more productive, especially the top researchers, but they disliked it because the “fun” part of the job, coming up with ideas, fell to a third of what it was before!

We found something that made us much much more productive but turns out it makes us feel worse because it takes away the part that we find most meaningful.

This is instructive.


This isn’t just about AI. When I first moved to London the black cab drivers used to say how much better they were than Google maps. They knew the city, the shortcuts, the time of the day and how it affects traffic.

That didn’t last long. Within a couple years anyone who could drive a car well and owned a cellphone could do as well. Much lower job satisfaction.

The first major automation task was arguably done by Henry Ford. He set up an assembly line and revolutionised car manufacturing. And the workers got to perform repetitive tasks. Faster production speed, much less artistry.

Computerisation brought the same. EHR records meant that most people now complain about spending their time inputting information into software, becoming data entry professionals.

People are forced to become specialists in ever tinier slices of the world. They don’t always like that.

There’s another paper that came out recently too, which looked at how software developers worked when given access to GitHub Copilot. It’s something that’s actively happening today. Turns out project management drops 25% and coding increases 12%, because people can work more independently.

Turns out the biggest benefit is for the lower skilled developers, not the superstars who presumably could do this anyway.

This is interesting for two reasons. One is that it’s different who gets a bigger productivity boost, the lower skilled folks here instead of the higher skilled. The second is that the reason the developers got upskilled is that a hard part of their job, knowing where to focus and what to do, got better automated. This isn’t the same as the materials scientists finding new ideas to research, but also, it kind of is?

Maybe the answer is that it depends on your comparative advantage: AI takes away the harder part of the job, which is knowing what to do, instead of what seems harder, which is *doing* the thing. A version of Moravec’s Paradox.

AI reduces the gap between the high and low skilled. If coming up with ideas is your bottleneck, as it seems possible for those who are lower skilled, AI is a boon. If coming up with ideas is where you shine, as a high skilled researcher, well …

This, if you think about it, is similar to the impact of automation work we’ve seen elsewhere. Assembly lines took away the fun part of craftsmanship, building a beautiful finished product. Even before that, machine tools took that away from the machinist even more. Algorithmic management of Amazon warehouses does this too.

It’s also in high skilled roles. Bankers are now front-end managers, as others have written about. My dad was a banker for four decades and he was mostly the master of his fate, which is untrue of most retail bankers today except maybe Jamie Dimon.

Whenever we find an easier way to do something, we take away the need for the people doing it to actively grok the entire problem. Becoming an autopilot who reviews the work the machine is doing is fun when it’s my Tesla FSD, but less so when it’s your job, I imagine.

Radiologists, pathologists, lawyers and financial analysts, they all are now the human front-ends to an automated back-end. They’ve shifted from broad, creative work to more specialised tasks that automation can’t yet do effectively.

Some people want to be told what to do, and they're very happy with that. Most people don't like being micromanaged. They want to feel like they're contributing something of value by being themselves, not just contributing by being a pure cog.

People find fulfilment by being the masters of some aspect, fully. To own an outcome and use their brains, their whole brains, to ideate and solve for that outcome. The best jobs talk about this. It's why you can get into a state of flow as a programmer or while building a craft table, but not as an Amazon warehouse worker.

There's the apocryphal story of the NASA janitor telling JFK that he was helping put a man on the moon. Missions work to make you feel like you are valued and valuable. Most of the time, though, you're not putting a man on the moon. And then, if on top you also tell the janitor what to mop, and when, and in what order, and when he can take a break, that's alienating. If you substitute janitor for extremely highly paid Silicon Valley engineer it's the same. Everyone's an Amazon Mechanical Turk.

AI will give us a way out, the hope goes, as everyone can do things higher up the pyramid. Possibly. But if AI too takes over the parts that were the most fun, as we saw with the materials scientists, and turns those scientists into mere selectors and executors of the ideas generated by a machine, you can see where the disillusionment comes from. It's great for us as a society, but the price is alienation, unless you change where you find fulfilment. And fast.

I doubt there was much job satisfaction in being a peasant living on your land, or a feudal serf. I’m also not sure there’s much satisfaction in being an Amazon warehouse worker. Somewhere in the middle we got to a point where automation meant a large number of people could rightfully take pride in their jobs. It could come back again, and with it bring back the polymaths.


People want competence, seemingly over everything else

2024-11-12 03:39:51

This is my politics post. Well, politics-adjacent. Everyone has one, but this is mine. I’ll say upfront that it’s not trying to relitigate the US election. Others have done that better and with more vehemence.

Some write about how the Democrats had a wokeness problem and pandered either insufficiently or inefficiently to other interest groups. Others write about the need for common sense Democratic policies, a centrist coalition, where saying things like “less crime is good” isn’t demonised. They’re right.


But the US incumbent party did the best amongst incumbents in other developed economies, just like the US did the best amongst all the other countries in and after the pandemic. Just not enough to win, and bad enough to cause soul searching.

Some blame the information environment. This is also true. Definitionally so, since the information environment is what informs people and naturally that has an impact1.

But why it is this way is a more interesting question. I wouldn’t expect most people answering a poll to understand most things about the world2. I don’t either. Staying informed and current is hard. I can probably do it for a few percent of the things that I might conceivably care about if I work really hard at it.

People are not answering policy questions there, they’re giving you an indication of whether things are “good” or “bad”: take them “seriously, not literally”. The problem is that they’re fed up with what they see as the existing system which seems to screw them over. What people call the deep state or the swamp or the system.

One writer has a great post on his problems with The Machine, the impersonal bureaucratic manifestation which takes away all human consideration from creating the playground we run civilisation on.

I will vote first, it must be said, for a Machine: the Machine that has the allegiance of the bulk of my country's civil servants and professional class, no matter who is in office; the Machine that coiled up tightly around Biden while it thought it could hide his decline, then spat him out with a thousand beautifully written thinkpieces when it realized it could not. I will vote for a Machine that sneered at a few of its more independent-minded members—Ezra Klein, Nate Silver, others—when they pointed out the obvious truth that Biden should have dropped out a year ago. I will vote for a Machine that knows it needs my vote but can hardly hide its scorn for independent voters who push against parts of its plan, one that put an ostensible moderate in office before crowing about accomplishing the furthest left political agenda in decades.

This observation comes from lived experience. An enormous number of people dislike living in what they consider to be a Kafkaesque bureaucracy. One that they think is impersonal, hobbled by regulations that work at cross purposes to their intended one, and cause anguish. Anthropomorphised, that’s The Machine.

This is because of a fundamental disconnect. Politicians love policy but people love execution. People prefer “competent” government over any other adjective, whether it’s “bureaucratic” or “rigid” or “big” or sometimes even “democratic”. Politicians think having the right policy is the answer to most questions. Ban this, regulate that, add a rule here. Whether it’s climate change or tax or entrepreneurship or energy shortage or geopolitical jockeying for status.

But policies don’t mean much to anyone without it being implemented. In Berkeley apparently it’s illegal to whistle for a lost canary before 7 am, though I doubt this is being policed rigorously.

What people in power hear are policies, what people on the ground see are its implementations.

That's why The Machine exists. It was created, painstakingly, over decades and centuries, to make our lives better. It was built to be the instrument to enact the will of the people.

And so, when it starts doing things that the system optimises for but is silly, everyone gets rightly upset. Like this.

For 7 chargers they spent $8 billion. (I got this wrong: most of the money isn’t disbursed yet; we got 61 charging ports at 15 stations, and 14k more are in progress. As of mid-April 2024, 19 states had awarded $287.6 million in NEVI funds.) This is a dramatic example of a lack of state capacity that we’ve seen time and time again in a hundred different ways.

In California they just got $5 billion to add to the 7 they had to help build four stations in a 5 mile extension. As of 2024, after nearly 30 years of planning and 15 years since voter approval, no segment of the high-speed rail system is operational yet. Regarding costs: the initial 2008 estimate for the entire 800-mile system was about $33 billion. By 2023, the project had received about $23 billion in combined state and federal funding. Current cost estimates for just Phase 1 (San Francisco to Los Angeles) range from $89 billion to $128 billion. Nearly all of the initial $9.95 billion in bond funding has been spent and 0 miles have been built.

Whether that’s spending $45B on rural broadband without connecting anyone or building high speed rail, we see the tatters of what could be. Whether it’s the need for Vaccine CA by patio11, or the CHIPS Act not moving fast enough with disbursements, or a bloody bee causing Meta to not be able to go forward with a nuclear reactor to power its datacenter, or every problem that the city of San Francisco has in spades, or the overreach and underreach of the FDA simultaneously during the pandemic, or bioethicists gatekeeping individuals from curing cancer, or suing SpaceX over trivialities, or medical ethics strangling research, or NEPA or … the list is endless.

People feel this. The process as currently enshrined tries to impose considerations that stop things from happening everywhere, not just to stop trains in the United States! Just read this.

My USAID grant to curb violence among young men in Lagos, Nigeria—through a course of therapy and cash proven in Liberia & Chicago—is bedeviled by an internal environmental regulator concerned the men will spend the money on pesticides. In the middle of one of the world’s largest cities.

This isn't caused by the federal govt or the President. But it's linked inextricably because they seem to defend the Machine. Not acknowledging it, or not promising to stop it, makes everyone think you’re part of the problem, especially because you promise to be the solution3.

This isn’t at all to suggest most of the government is like this. The Fed is excellent. NOAA seems great. FEMA too. There are tons of pockets of exceptional performance by dedicated civil servants. They even fixed the DMV.

But bad implementation is endemic. It’s everywhere. State capacity is anaemic. And until it can be fixed, there can be no party of competence. Only parties shouting at each other about who created which mess instead of cleaning anything up. This is why people argue to death over taxes, one of the few things that can be implemented properly. This is why people think the “experts” who said you need to do this aren’t experts any longer, and shouldn’t be trusted. This is why people argue over details like how many sinks should you have before peeling a banana, or argue over eliminating the DoE, with no middle ground.

It's the only way to bring some positive energy to politics4. Not about right or left or ideologies writ in stone, but about competence. Building something meaningful and bulldozing what's in your way to get it done5. This is a strong positive vision of what the world could be, and it needs a champion. We should embrace it.

1. This seems a problem, though not the same problem that most think.

2. Have you seen the number of debates on Twitter about what inflation actually means?

3. Doesn’t help that you end up becoming the defender of the status quo unless you rail against it. Which is hard when you’re the one in power. But that’s the ballgame. In 2008 you had a convenient villain in the “bankers”.

4. Not just politics. Every large organisation has this problem. Whether it’s FDA or IBM the problems are the same - death by a thousand papercuts.

5. “Why We Love Robert Caro And His Work On Lyndon Johnson”

Life in India is a series of bilateral negotiations

2024-10-16 05:30:08

When I landed in India last week, my friend came to pick us up. And as we got in his car and started to drive out, we went the wrong way inside the giant parking garage at Delhi airport where all directions look the same1.

But he drove intrepidly. With no hesitation, just looking for a direct way down, we soon came to a set of cones blocking an exit and a stern looking guard, who enforced the cone. I looked back to see where else we could go, to circle around, and instead my friend lowered the window and asked the guard if he could just move the cone. A series of negotiations took place, short and to the point, about the pain of having to drive around versus the practical ease of moving a cone, and the guard relented. We drove down.


Once we were out and in the thick of traffic, similar interactions showed up again. India, modern India, has well defined lanes and beautiful highways. It also has absolutely no sense of traffic norms. Every inch of space is seen as a battleground to be won.

Like little Napoleons every autorickshaw and car and honking motorcycle would try to shoehorn into the three inches that opened up in front. It’s like a game of whack-a-mole in reverse.

And everyone knows the score. You can almost see them constantly playing multi-participant chicken. “Good game” you can almost hear them thinking as you jump the car ahead where it shouldn’t have gone, just to block the bike that would’ve cut you off, while the rickshaw stands perpendicular to the flow of traffic playing the same dance and trying to cut across the roundabout.

This happened repeatedly across most walks of life over the next few days. To skip a line, to find parking, to get your concierge to buy something, to negotiate a safari booking that needs to get changed, to get a customised menu to eat, to argue against an unjust traffic stop, it goes on and on.

Life in India is a series of bilateral negotiations conducted a thousand times a day. And that drives the character of life here.

Now, I am seeing the country properly after several years. And it’s a major change.

Visible infrastructure has gotten much better. Roads are good, well maintained, and highways are excellent. They built 7500 miles last year, just as the year before. And they’re fantastic.

Air travel is great, and airports are absolutely spectacular. I used to live in Singapore and say that Changi is its true jewel, and can now say the same about Bangalore. The airport is gorgeous.

Even trains have gotten better, even if train stations aren't there yet. The number of beggars on the street has reduced. Shops got uplifted. Mobile phones are better and everyone has one. Payment infrastructure is aeons beyond the West. And it’s safe, which a country without a strong legal safety net and a lot of poverty arguably shouldn’t be. There’s no real urban crime.

Restaurants, bars, pubs, these are world class. Same for most shopping. Even delivery. You can order hot chai or a small glass of tonic or a cup of milk and it comes in 10 mins to your door.


Daron Acemoglu, the famous economist who recently won the Nobel prize, has talked extensively about the importance of institutions in economic development2. And it’s true, institutions do matter. A lot. Property rights, restricting elite capture, supporting employment, these are all necessary aspects of helping a country progress. Built via ‘inclusive institutions’.

India has pretty good institutions in this view. Or at least far better than it used to have. The laws are well defined even though the legal system runs like molasses, state capacity is somewhat reasonably bounded. It’s not perfect by any means.

What it does not yet have, and what loses it at least a percentage point of GDP growth a year, is good informal institutions. One might even call it culture.

Acemoglu considers the informal institutions endogenous, shaped by the foundational formal institutions rather than serving as foundational themselves. In this view, this is why Northern Italy exhibits higher levels of social trust and economic performance compared to the South. Or why we see varied success in transitioning to market economies.

Douglass North, another prominent economist, in his work on Institutions and Economic Performance wrote about the success and failure of economies as largely dependent on the institutions that structure human interaction. These aren’t just formal institutions, like laws or regulations, but also informal ones, like cultural norms.

The theory is that with sufficiently strong formal institutions, you can shape the culture. For instance by enforcing fare payments in subways, you can change the behaviour of people such that they don’t dodge fares.

Acemoglu, however, places only secondary importance on this. Or rather, he incorporates it as something that results from having strong formal institutions. And in doing so he seems to beg the very question it answers - that having good institutions enables growth, and that better institutions are what enable growth in the first place.

India seems contradictory on these accounts. It has somewhat competent formal institutions, fairly chaotic informal norms, and a culture with both strong points and weak points in terms of getting things done, alongside a strong economy that ideally should be even stronger and seems to be constantly held on a leash.


A simpler explanation might be that institutions are easy to set up well at the top, but incredibly difficult to make percolate through the whole of society. Why, after all, would you decide to follow the traffic rules? Even in the West there’s no enforcement stringent enough to catch everyone who speeds, or to stop people littering. But still we mostly comply. So it can’t just be the stick trickling down from formal institutions. The informal, bottom-up behaviour clearly matters.

The question is what drives such behaviour, and how, if at all, we can shift it.

Patrick Collison asked this question differently.

Why are so many things so much nicer in Switzerland and Japan?

Compared to almost all countries (including the US), Switzerland and Japan seem to possess much higher baseline execution quality in almost everything. Buses and trains are better (and more punctual); low-end food is tastier; cheap hotels are more comfortable; their airlines score higher on international indexes (despite not spending more per passenger); streets are cleaner; grocery stores and corner stores are nicer; ostensibly unremarkable villages have more beautiful buildings and are more pleasant places to spend a few days.

(This phenomenon, whatever it is, may extend even further. The homicide rates in both Japan and Switzerland are about a tenth of that of the US and less than half those of England, France, and Germany.)

What's going on? While wealth is clearly some of the story, it isn't just a matter of wealth: GDP per capita in Japan substantially lags that of the US. Nor does it seem to be a matter of historical wealth. (1900 Japan was even further behind.) It doesn't seem to be the simple result of long periods of stability. (Again, Japan.)

So, what exactly is this effect? Which things are truly better and by how much? Are these effects the result of the same kind of phenomenon in Switzerland and Japan? Is this in any way related to their topping the economic complexity index? Could other countries acquire this "general execution capital"? And are there meaningful trade-offs involved or is it a kind of free lunch?

Living in a country built off of bilateral negotiations for everything is simultaneously the libertarian dream and an incredibly inefficient way to do most collective things. Ronald Coase told us this in 1960.

if property rights are well-defined and transaction costs are low, private parties can negotiate solutions to externalities without the need for government intervention

But Indian life is dominated by transaction costs. Every time a driver pokes his car into a turn when the signal’s not for him, it creates friction that ripples through the entire system. Every time someone has to spend effort doing a 1:1 negotiation they lose time and efficiency. Horribly so.

Just look at driving. Half the time you're stuck behind trucks overtaking other trucks while dodging a motorcycle driving on the divider line, and this all but guarantees that your optimal speed will be at least 20% below the limit. (I measured this, so n = 3, on 3 separate trips.) What's the GDP impact of that?
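To make that question concrete, here’s a purely illustrative back-of-envelope sketch. Every number in it is a placeholder assumption made up for the example, not a measured statistic; the only point is to show how a ~20% drag on road speed could compound into a macro-level cost.

```python
# Purely illustrative back-of-envelope. All numbers are placeholder
# assumptions for the sake of the example, not real statistics.

road_logistics_share_of_gdp = 0.05  # assume road freight and transport ~5% of GDP
speed_penalty = 0.20                # the ~20% below-optimal speed observed above
share_showing_up_as_cost = 0.5      # assume half the lost time becomes real cost

drag_on_gdp = road_logistics_share_of_gdp * speed_penalty * share_showing_up_as_cost
print(f"Implied drag on GDP: {drag_on_gdp:.2%}")  # -> Implied drag on GDP: 0.50%
```

Under those made-up assumptions, road friction alone eats roughly half a percentage point of GDP, before counting any of the other thousand-times-a-day negotiations.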

The reason this isn't an easy fix is that the ability to negotiate everything is also the upside. When every rule is negotiable, you get to push back on silly things like closing off a section of a parking garage with rubber cones just by asking1. Life in the West feels highly constricted primarily because of this: we’re all drowning in rules.

People sometimes talk about this in terms of “low trust” vs “high trust” societies. Francis Fukuyama wrote about this, discussing how low-trust societies often develop more rigid, hierarchical structures and rely on informal personal networks. But I feel that misses what’s happening here. The negotiations aren’t a matter of trust, not when it’s about traffic; rather, trust is the result of a different cultural Schelling point. Trust isn't a one-dimensional vector.

What causes the average driver on Indian roads to treat driving like a game of water filling cracks in the pavement? It's not trust, it's the lack of an agreed-upon equilibrium. There are no norms to adhere to.

We might as well call those norms culture.
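One way to make the “no agreed-upon equilibrium” point concrete is as a small coordination game. The sketch below uses hypothetical payoffs of my own invention; it shows that “everyone yields by the norm” and “everyone pushes in” can both be stable equilibria, and which one a society lands on is exactly the Schelling point we’re calling culture.

```python
import itertools

# Hypothetical payoffs for a two-driver merge, purely to illustrate the idea.
# Each driver either follows the norm ("yield") or negotiates ("push").
PAYOFFS = {
    ("yield", "yield"): (3, 3),  # smooth traffic for both
    ("yield", "push"):  (0, 2),  # the lone yielder waits forever
    ("push",  "yield"): (2, 0),
    ("push",  "push"):  (1, 1),  # gridlock, but at least you inch forward
}

def is_nash(profile):
    """True if neither driver can do better by unilaterally switching strategy."""
    for player in (0, 1):
        for alternative in ("yield", "push"):
            deviated = list(profile)
            deviated[player] = alternative
            if PAYOFFS[tuple(deviated)][player] > PAYOFFS[profile][player]:
                return False
    return True

for profile in itertools.product(("yield", "push"), repeat=2):
    print(profile, "<- stable equilibrium" if is_nash(profile) else "")
```

Both all-yield and all-push are self-reinforcing: no single driver gains by unilaterally behaving better, which is why exhortation alone rarely shifts the norm, and why the enforcement mechanisms discussed next matter.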

Normally this movement to the new Schelling point happens first as a result of better enforcement mechanisms. If the rules that exist start being applied much more stringently, then they just start becoming a part of normal behaviour.

Japan did this post-war through an emphasis on group harmony and obedience to rules. Singapore through extremely stringent application of rules and education campaigns. Germany, again post-war, through strict enforcement of laws. Sweden in the 30s and 40s focused a lot on building a culture of cooperation and emphasising the “social contract”.

By 2034, India will have added a few trillion dollars to its GDP. Just like we added incredible airports and highways and cleaner streets (yes, really) and metro systems and the best retail and F&B scene in the world, it will continue to get even better. And as it gets better, people will behave better, as behooves the surroundings. This is the Broken Windows theory in reverse.

As individuals get richer, as some of those few trillion dollars trickle through the economy, the types of goods and services that they demand will change. Small broken-up shops that double as sleeping quarters, open to the street with livestock wandering in and out, will change. We will see the same gentrification we saw elsewhere, but we will also see better social norms become more common.

India remains a land of conundrums. It contains enough chaos to drive up transaction costs and hold back much of its potential progress, a large gap to potential GDP sticking to it like so many barnacles on a hull. What it needs is more collective game theory, where you don’t need to be a greedy algorithm, where you can rely on other people’s behaviour.

All growth stories are stories of cultural transformation. Cultural shifts often require deliberate effort and collective will. Not just South Korea, Singapore and China, but also Brazil, with “jeitinho brasileiro” holding it back, Rwanda focusing on unity, Botswana on legal frameworks and reducing corruption.

On-the-ground cultural shifts are reflexive with government policies, but they also move to their own beat. Swachh Bharat, the campaign to eliminate open defecation, made substantial progress through building infrastructure but also through campaigns for behaviour change.

But the framework is clear: move away from bilateral negotiations towards a Coasian equilibrium. That’s the next cultural progress milestone we need to get to.

1

To be fair to the parking garage, I, and he, have gotten lost in shopping malls before

2

What *is* an institution, however, remains a highly contentious topic

What comes after?

2024-09-30 14:05:13

Today Governor Newsom vetoed a bill purporting to regulate AI models, which had passed the California Assembly with flying colours. For most of you, who don’t live in California and probably don’t care about its electoral politics, it still matters, because of three facts:

  1. The bill would’ve applied to all large models that are used in California - i.e., almost everyone

  2. It went through a lot of amendments, became more moderate as it went on, but still would’ve created a lot of restrictions

  3. There was furious opposition and support of a form that’s usually hard to see, with the united push of “why” and “where’s the evidence”

His veto statement says the following.


By focusing only on the most expensive and large-scale models, SB 1047 establishes a regulatory framework that could give the public a false sense of security about controlling this fast-moving technology. Smaller, specialized models may emerge as equally or even more dangerous than the models targeted by SB 1047 - at the potential expense of curtailing the very innovation that fuels advancement in favor of the public good.

Adaptability is critical as we race to regulate a technology still in its infancy. This will require a delicate balance. While well-intentioned, SB 1047 does not take into account whether an AI system is deployed in high-risk environments, involves critical decision-making or the use of sensitive data. Instead, the bill applies stringent standards to even the most basic functions - so long as a large system deploys it. I do not believe this is the best approach to protecting the public from real threats posed by the technology.

Let me be clear - I agree with the author - we cannot afford to wait for a major catastrophe to occur before taking action to protect the public.

In other words, he doesn’t worry about premature regulation (which regulator would!), but does worry about regulating general capabilities vs usage.

Rather than litigate whether SB 1047 was a good bill or not, which plenty of others have done, including me, I wanted to look at what comes next. There’s a great analysis of it here. To look at one part of Gov. Newsom’s statement alongside the veto:

The Governor has asked the world’s leading experts on GenAI to help California develop workable guardrails for deploying GenAI, focusing on developing an empirical, science-based trajectory analysis of frontier models and their capabilities and attendant risks. The Governor will continue to work with the Legislature on this critical matter during its next session.

Building on the partnership created after the Governor’s 2023 executive order, California will work with the “godmother of AI,” Dr. Fei-Fei Li, as well as Tino Cuéllar, member of the National Academy of Sciences Committee on Social and Ethical Implications of Computing Research, and Jennifer Tour Chayes, Dean of the College of Computing, Data Science, and Society at UC Berkeley, on this critical project.

It’s of course better to work with Dr. Fei-Fei than not, and it’s a good initiative insofar as it’s about learning what there is to learn, but the impetus is the paragraph before.

It’s clear that this wasn’t a one-time war with a winner and a loser, but something that’s likely to recur in the coming years.

So, it stands to reason that barring the implosion of AI as a field, we will see more regulations crop up. Better crafted ones, perhaps specific ones, but more nonetheless.

Generally speaking, with all these regulations there ought to be a view on what they’re good for. A view on their utility. SB 1047 seemed to end up with a diluted version of many such objectives as it went through its amendments. For instance:

  1. Existential risk - like if a model gets sufficiently capable and self improves to ASI and kills us all through means unknown

  2. Large scale risk from an agent - like an autonomous hack or bioweapon or similar causing untold devastation (later anchored to $500m)

  3. Use by a malicious actor - like if someone used a model to do something horrible, like perform a major hack on the grid or develop a bioweapon

This bill started at the top of this list and moved slowly towards the bottom as people asked questions, mainly because there is no evidence for the top two examples at all. There are speculations, and there are some small indications which could go either way on whether these can happen (like the stories about OpenAI’s o1 apparently going out of distribution with respect to its test parameters), but the models today are very far from being able to do this.

Many of the proponents argued that this bill is only minimally restrictive, so it should be passed anyway so we can continue building on it.

But that’s not how we should regulate! We shouldn’t ask for minimally invasive bills that only apply to some companies if we still aren’t clear on what benefits they will have, especially when multiple people are arguing they can have very real flaws!

Will they always remain so? I doubt it; after all, we want AI to be useful! You can’t ask it to rewrite a 240k-line C++ codebase in Python, for instance, without it having the ability to do a lot of damage as well. But just as you couldn’t hack the power grid before you had a power grid or computers, the benefit you get from it really, really matters.

Will the AI models be able to do much more, reach AGI and more, in the short or immediate future? I don’t know. Nobody does. If you are a large lab you might say yes, because you believe the scaling hypothesis and that these models will get much smarter and more reliable as they get bigger, very soon. This is what Sam Altman wrote in his essay last week. Even though they all think that nobody actually knows if this is true.

You might therefore say these things, and they might even be true.

So the question is: what should we optimise for? Well, just like with any other technology, #3 above is what most regulations should target. In fact, it’s what Governor Newsom has targeted.

Over the past 30 days, Governor Newsom signed 17 bills covering the deployment and regulation of GenAI technology, the most comprehensive legislative package in the nation on this emerging industry — cracking down on deepfakes, requiring AI watermarking, protecting children and workers, and combating AI-generated misinformation. California has led the world in GenAI innovation while working toward common-sense regulations for the industry and bringing GenAI tools to state workers, students, and educators.

Without saying all of this is good, it is at least perfectly sufficient if you believe that the conditions needed to know whether #1 and #2 are true are going to remain confusing. As it is, the blowback from safety-washing the models, to stop them from showing certain images or text or output of any sort, is only making them more annoying to use, without any actual benefit from a societal safety point of view1.

This is, to put it mildly, just fine. We do this for everything.

To go beyond this and regulate preemptively, we have to believe two things:

  1. We know, to a rough degree of accuracy, what will happen when the models get bigger. Climate change papers are a good example of the level of rigour and specificity needed. (And I’m not blind to the fact that despite the reams of work and excellent scholarship, they are still heavily disputed.)

  2. We think the AI model creators are explicitly hiding things, either capabilities or even flaws in their models, that could conceivably cause enormous damage to society. The arguments about AI companies resembling lead companies or cigarette companies or oil companies are in this vein. If so, the difference is that those industries at least had science on one side, which would be good to have here.


Okay, so considering all this, what should the objectives of any bill be? What are the things we know and should focus on, so that we can make sensible rules?

I think we should optimise for human flourishing. We need to keep the spigot of innovation going as much as we can, if only because as AI gets better we are truly able to help mathematicians and clinicians and semiconductor manufacturers, to accelerate drug discovery and materials science, and to upskill ourselves as a species. This isn’t a fait accompli, but it’s clearly happening. The potential benefits are enormous, so to cut that off would be to fill up the invisible graveyard.

And so, I venture the following principles:

  1. Considering AI is likely to become a big part of our lives, solve the user problem. I would argue much of it is already regulated, but fine, it makes sense to add more specific bits here. Especially in high-risk situations. If an AI model is being used to develop a nuclear reactor, you had better show the output is safe.

  2. Understand the technology better! For evaluations and testing and red-teaming, yes, but also to figure out how good it is. Study how it can cut red tape for us. How could it make living in the modern world less confusing? Can it file our taxes? Where’s the CBO equivalent to figure out its benefits and harms? Where’s the equivalent of the FTC’s Bureau of Economics? Where’s the BEA?

  3. Most importantly, be minimally restrictive. For things we don’t or can’t know about the future, let’s not preemptively create stringent rules that manipulate our actions. Don’t add too much bureaucracy, don’t add red tape, don’t add boxes to be checked, don’t add overseers and agencies and funded non-profits until you know what to do with them! Let the market actually do its job of finding the most important technologies and implement them and see its effects and help us understand better.

These, you’ll note, have nothing to do with model size, or how many GPUs a model was trained on, or whether we implicitly believe it will recursively self-improve so fast that we’re all caught flat-footed. These are explicitly focused on finding rationale and evidence, essential if we are to treat the problem with the gravity it requires.

There are plenty of other issues which need well thought out regulation too. Like the issue of copyright and training models on top of artists’ works.

Regulatory attention is like a supertanker. Most lawmakers are already poised to regulate more, in the aftermath of what they see as the social media debacle, and because the world is turning more protectionist. You have to be careful how and where you point it. And, like Chekhov’s gun, only bring it up if you plan on using it!

I think it’s important to understand that even if you’re on the side of “no regulation”, or at least “no regulation for a while”, you can’t stop policymakers from getting excited or scared. We should give them a way to deal with it, to learn as they go along and be useful instead of fearful. What’s above is one way to do that.

1

This is amongst the main reasons why I am also sanguine that OpenAI is turning into a regular corporation. We have collectively spent decades trying to figure out the best ways to align an amoral superintelligence (corporations), and we came up with Delaware Law. It’s a miracle that works well in our capitalist system. It’s not perfect, but it’s a damn sight better than almost anything else we’ve tried. I am happy that OpenAI will join its ranks, rather than be controlled by a few non-profit directors acting on behalf of all humanity.