
When we become cogs

2024-11-19 06:58:42

At MIT, a PhD student called Aidan Toner-Rodgers ran an experiment on how well scientists do their jobs when they can use AI in their work. These were materials scientists, and the goal was to figure out how they fared once augmented with AI. It worked.

AI-assisted researchers discover 44% more materials, resulting in a 39% increase in patent filings and a 17% rise in downstream product innovation.

That’s really really good. How did they do it?

… AI automates 57% of the “idea-generation” tasks, reallocating researchers to the new task of evaluating model-produced candidate materials.

They got AI to think for them and come up with brilliant ideas to test.

But there was one particularly interesting snippet.

Researchers experience a 44% reduction in satisfaction with the content of their work

To recap, they used a model that made them much better at their core work and made them more productive, especially the top researchers, but they disliked it because the “fun” part of the job, coming up with ideas, fell to a third of what it was before!

We found something that made us much much more productive but turns out it makes us feel worse because it takes away the part that we find most meaningful.

This is instructive.


This isn’t just about AI. When I first moved to London the black cab drivers used to say how much better they were than Google Maps. They knew the city, the shortcuts, the time of day and how it affects traffic.

That didn’t last long. Within a couple of years anyone who could drive a car well and owned a cellphone could do just as well. Much lower job satisfaction.

The first major automation of this kind was arguably Henry Ford’s. He set up the assembly line and revolutionised car manufacturing. The workers, in turn, got to perform repetitive tasks. Faster production, much less artistry.

Computerisation brought the same. Electronic health records meant that doctors now complain about spending their time inputting information into software, becoming data entry professionals.

People are forced to become specialists in ever tinier slices of the world. They don’t always like that.

There’s another paper that came out recently too, which looked at how software developers worked when given access to GitHub Copilot. It’s something that’s actively happening today. Turns out time spent on project management drops 25% and coding increases 12%, because people can work more independently.

Turns out the biggest benefit is for the lower-skilled developers, not the superstars, who presumably could do this anyway.

This is interesting for two reasons. One is that who gets the bigger productivity boost is different: the lower-skilled folks here instead of the higher-skilled. The second is that the reason the developers got upskilled is that a hard part of their job, knowing where to focus and what to do, got automated. This isn’t the same as the materials scientists finding new ideas to research, but also, it kind of is?

Maybe the answer is that it depends on your comparative advantage: AI takes away what is actually the harder part of the job, knowing what to do, rather than what seems harder, *doing* the thing. A version of Moravec’s Paradox.

AI reduces the gap between the high and low skilled. If coming up with ideas is your bottleneck, as it seems possible for those who are lower skilled, AI is a boon. If coming up with ideas is where you shine, as a high skilled researcher, well …

This, if you think about it, is similar to the impact of automation we’ve seen elsewhere. Assembly lines took away the fun part of craftsmanship, building a beautiful finished product. Even before that, machine tools took the same away from the machinist. Algorithmic management of Amazon warehouses does this too.

It’s also in high-skilled roles. Bankers are now front-end managers, as others have written about. My dad was a banker for four decades and was mostly the master of his fate, which is untrue of most bankers today, except maybe Jamie Dimon.

Whenever we find an easier way to do something, we take away the need for people to actively grok the entire problem. Becoming an autopilot who reviews the work the machine is doing is fun when it’s my Tesla FSD, but less so when it’s your job, I imagine.

Radiologists, pathologists, lawyers and financial analysts, they all are now the human front-ends to an automated back-end. They’ve shifted from broad, creative work to more specialised tasks that automation can’t yet do effectively.

Some people want to be told what to do, and they're very happy with that. Most people don't like being micromanaged. They want to feel like they're contributing something of value by being themselves, not just contributing by being a pure cog.

People find fulfilment by being masters of some aspect, fully. To own an outcome and use their brains, their whole brains, to ideate and solve for that outcome. The best jobs give you this. It's why you can get into a state of flow as a programmer or while crafting a table, but not as an Amazon warehouse worker.

There's the apocryphal story of the NASA janitor telling JFK that he was helping put a man on the moon. Missions work to make you feel like you are valued and valuable. Most of the time, though, you're not putting a man on the moon. And then, if on top of that you also tell the janitor what to mop, and when, and in what order, and when he can take a break, that's alienating. Substitute the janitor for an extremely highly paid Silicon Valley engineer and it's the same. Everyone's an Amazon Mechanical Turk.

AI will give us a way out, the hope goes, as everyone can do things higher up the pyramid. Possibly. But if AI too takes up the parts that were the most fun, as we saw with the materials scientists, and turns those scientists into mere selectors and executors of the ideas generated by a machine, you can see where the disillusionment comes from. It's great for us as a society, but the price is alienation, unless you change where you find fulfilment. And fast.

I doubt there was much job satisfaction in being a peasant living on your land, or a feudal serf. I’m also not sure there’s much satisfaction in being an Amazon warehouse worker. Somewhere in the middle we got to a point where automation meant a large number of people could rightfully take pride in their jobs. It could come back again, and with it bring back the polymaths.


People want competence, seemingly over everything else

2024-11-12 03:39:51

This is my politics post. Well, politics-adjacent. Everyone has one, but this is mine. I’ll say upfront that it’s not trying to relitigate the US election. Others have done that better and with more vehemence.

Some write about how Democrats had a wokeness problem and pandered either insufficiently or inefficiently to various interest groups. Others write about the need for common-sense Democratic policies, a centrist coalition, where saying things like “less crime is good” isn’t demonised. They’re right.


But the US incumbent party did the best amongst incumbents in the other developed economies, just like the US did the best amongst all the other countries in and after the pandemic. Just not well enough to win, and badly enough to cause soul-searching.

Some blame the information environment. This is also true. Definitionally so, since the information environment is what informs people and naturally that has an impact[1].

But why it is this way is a more interesting question. I wouldn’t expect most people answering a poll to understand most things about the world[2]. I don’t either. Staying informed and current is hard. I can probably do it for a few percent of the things that I might conceivably care about if I work really hard at it.

People are not answering policy questions there, they’re giving you an indication of whether things are “good” or “bad”: take them “seriously, not literally”. The problem is that they’re fed up with what they see as the existing system which seems to screw them over. What people call the deep state or the swamp or the system.

One writer has a great post on his problems with The Machine, the impersonal bureaucratic manifestation which takes away all human consideration from creating the playground we run civilisation on.

I will vote first, it must be said, for a Machine: the Machine that has the allegiance of the bulk of my country's civil servants and professional class, no matter who is in office; the Machine that coiled up tightly around Biden while it thought it could hide his decline, then spat him out with a thousand beautifully written thinkpieces when it realized it could not. I will vote for a Machine that sneered at a few of its more independent-minded members—Ezra Klein, Nate Silver, others—when they pointed out the obvious truth that Biden should have dropped out a year ago. I will vote for a Machine that knows it needs my vote but can hardly hide its scorn for independent voters who push against parts of its plan, one that put an ostensible moderate in office before crowing about accomplishing the furthest left political agenda in decades.

This observation comes from lived experience. An enormous number of people dislike living in what they consider to be a Kafkaesque bureaucracy. One that they think is impersonal, hobbled by regulations that work at cross purposes to their intended one, and cause anguish. Anthropomorphised, that’s The Machine.

This is because of a fundamental disconnect. Politicians love policy but people love execution. People prefer “competent” government over any other adjective, whether it’s “bureaucratic” or “rigid” or “big” or sometimes even “democratic”. Politicians think having the right policy is the answer to most questions. Ban this, regulate that, add a rule here. Whether it’s climate change or tax or entrepreneurship or energy shortage or geopolitical jockeying for status.

But policies don’t mean much to anyone without being implemented. In Berkeley, apparently, it’s illegal to whistle for a lost canary before 7 am, though I doubt this is being policed rigorously.

What people in power hear are policies, what people on the ground see are its implementations.

That's why The Machine exists. It was created, painstakingly, over decades and centuries, to make our lives better. It was built to be the instrument to enact the will of the people.

And so, when it starts doing things that the system optimises for but that are silly, everyone gets rightly upset. Like this:

For 7 chargers they spent $8 billion. (I got this wrong. Most of the money isn’t disbursed yet; we got 61 charging ports at 15 stations, and 14k more are in progress. As of mid-April 2024, 19 states had awarded $287.6 million in NEVI funds.) This is a dramatic example of a lack of state capacity that we’ve seen time and time again in a hundred different ways.

In California they just got $5 billion to add to the 7 they had, to help build four stations in a 5-mile extension. As of 2024, after nearly 30 years of planning and 15 years since voter approval, no segment of the high-speed rail system is operational yet. Regarding costs: the initial 2008 estimate for the entire 800-mile system was about $33 billion. By 2023, the project had received about $23 billion in combined state and federal funding. Current cost estimates for just Phase 1 (San Francisco to Los Angeles) range from $89 billion to $128 billion. Nearly all of the initial $9.95 billion in bond funding has been spent and 0 miles have been built.

Whether that’s spending $45B on rural broadband without connecting anyone or building high speed rail, we see the tatters of what could be. Whether it’s the need for VaccinateCA by patio11, or the CHIPS Act not moving fast enough with disbursements, or a bloody bee causing Meta to not be able to go forward with a nuclear reactor to power its datacenter, or every problem that the city of San Francisco has in spades, or the overreach and underreach of the FDA simultaneously during the pandemic, or bioethicists gatekeeping individuals from curing cancer, or suing SpaceX over trivialities, or medical ethics strangling research, or NEPA or … the list is endless.

People feel this. The process as currently enshrined tries to impose considerations that stop things from happening everywhere, not just stopping trains in the United States! Just read this:

My USAID grant to curb violence among young men in Lagos, Nigeria—through a course of therapy and cash proven in Liberia & Chicago—is bedeviled by an internal environmental regulator concerned the men will spend the money on pesticides. In the middle of one of the world’s largest cities.

This isn't caused by the federal govt or the President. But it's linked inextricably to them because they seem to defend the Machine. Failing to acknowledge it, or to promise to stop it, makes everyone think you're part of the problem, especially because you promise to be the solution[3].

This isn’t at all to suggest most of the government is like this. The Fed is excellent. NOAA seems great. FEMA too. There are tons of pockets of exceptional performance by dedicated civil servants. They even fixed the DMV.

But bad implementation is endemic. It’s everywhere. State capacity is anaemic. And until it can be fixed, there can be no party of competence. Only parties shouting at each other about who created which mess instead of cleaning anything up. This is why people argue to death over taxes, one of the few things that can be implemented properly. This is why people think the “experts” who said you need to do this aren’t experts any longer, and shouldn’t be trusted. This is why people argue over details like how many sinks you should have before peeling a banana, or argue over eliminating the DoE, with no middle ground.

Competence is the only way to bring some positive energy to politics[4]. Not right or left or ideologies writ in stone: building something meaningful and bulldozing what's in your way to get it done[5]. This is a strong positive vision of what the world could be, and it needs a champion. We should embrace it.

[1] This seems a problem, though not the same problem that most think.

[2] Have you seen the number of debates on Twitter about what inflation actually means?

[3] Doesn’t help that you end up becoming the defender of the status quo unless you rail against it. Which is hard when you’re the one in power. But that’s the ballgame. In 2008 you had a convenient villain in the “bankers”.

[4] Not just politics. Every large organisation has this problem. Whether it’s the FDA or IBM, the problems are the same: death by a thousand papercuts.

[5] “Why We Love Robert Caro And His Work On Lyndon Johnson”

Life in India is a series of bilateral negotiations

2024-10-16 05:30:08

When I landed in India last week, my friend came to pick us up. And as we got in his car and started to drive out, we went the wrong way inside the giant parking garage at Delhi airport where all directions look the same[1].

But he drove intrepidly. With no hesitation, just looking for a direct way down, we soon came to a set of cones blocking an exit and a stern looking guard, who enforced the cone. I looked back to see where else we could go, to circle around, and instead my friend lowered the window and asked the guard if he could just move the cone. A series of negotiations took place, short and to the point, about the pain of having to drive around versus the practical ease of moving a cone, and the guard relented. We drove down.


Once we were out and in the thick of traffic, similar interactions showed up again. India, modern India, has well defined lanes and beautiful highways. It also has absolutely no sense of traffic norms. Every inch of space is seen as a battleground to be won.

Like little Napoleons, every autorickshaw and car and honking motorcycle tries to shoehorn into the three inches that open up in front. It’s like a game of whack-a-mole in reverse.

And everyone knows the score. You can almost see them constantly playing multi-participant chicken. “Good game” you can almost hear them thinking as you jump the car ahead where it shouldn’t have gone, just to block the bike that would’ve cut you off, while the rickshaw stands perpendicular to the flow of traffic playing the same dance and trying to cut across the roundabout.

This happened repeatedly across most walks of life over the next few days. To skip a line, to find parking, to get your concierge to buy something, to negotiate a safari booking that needs to get changed, to get a customised menu to eat, to argue against an unjust traffic stop, it goes on and on.

Life in India is a series of bilateral negotiations conducted a thousand times a day. And that drives the character of life here.

Now, I am seeing the country properly after several years. And it’s a major change.

Visible infrastructure has gotten much better. Roads are good, well maintained, and highways are excellent. They built 7500 miles last year, just as the year before. And they’re fantastic.

Air travel is great, and airports are absolutely spectacular. I used to live in Singapore and say that Changi is its true jewel, and can now say the same about Bangalore. The airport is gorgeous.

Even trains have gotten better, even if train stations aren't there yet. The number of beggars on the street has reduced. Shops got uplifted. Mobile phones are better and everyone has one. Payment infrastructure is aeons beyond the West. And it’s safe, which a country without a strong legal safety net and a lot of poverty arguably shouldn’t be. There’s no real urban crime.

Restaurants, bars, pubs, these are world class. Same for most shopping. Even delivery. You can order hot chai or a small glass of tonic or a cup of milk and it comes in 10 mins to your door.


Daron Acemoglu, the famous economist who recently won the Nobel prize, has talked extensively about the importance of institutions in economic development[2]. And it’s true, institutions do matter. A lot. Property rights, restricting elite capture, supporting employment, these are all necessary aspects of helping a country progress. Built via ‘inclusive institutions’.

India has pretty good institutions in this view. Or at least far better than it used to have. The laws are well defined even though the legal system runs like molasses, state capacity is somewhat reasonably bounded. It’s not perfect by any means.

What it does not yet have, and what costs it at least a percentage point of GDP growth a year, is good informal institutions. One might even call it culture.

Acemoglu considers the informal institutions endogenous, shaped by the foundational formal institutions rather than serving as foundational themselves. In this view, this is why Northern Italy exhibits higher levels of social trust and economic performance compared to the South. Or why we see varied success in transitioning to market economies.

Douglass North, another prominent economist, in his work on Institutions and Economic Performance wrote about the success and failure of economies as largely dependent on the institutions that structure human interaction. These aren’t just formal institutions, like laws or regulations, but also informal ones, like cultural norms.

The theory is that with sufficiently strong formal institutions, you can shape the culture. For instance by enforcing fare payments in subways, you can change the behaviour of people such that they don’t dodge fares.

Acemoglu however places only secondary importance on this. Or rather, he incorporates it as something that results from having strong formal institutions. And in doing so he seems to beg the very question it answers - that having good institutions enables growth, and better institutions are what help enable growth in the first place.

India seems contradictory on these accounts. It has somewhat competent formal institutions, fairly chaotic informal norms, a culture with both strengths and weaknesses when it comes to getting things done, and a strong economy that should ideally be even stronger but seems constantly held on a leash.


A simpler explanation might be that institutions are easy to set up well at the top, but incredibly difficult to make percolate through the whole society. Why, after all, would you decide to follow traffic rules? Even in the West there’s no enforcement sufficient to catch everyone who speeds. Or everyone who litters. But still we comply. So it can’t just be the stick trickling down from formal institutions. The informal, bottom-up behaviour clearly matters.

The question is what drives such behaviour and how we can shift it.

Patrick Collison asked this question differently.

Why are so many things so much nicer in Switzerland and Japan?

Compared to almost all countries (including the US), Switzerland and Japan seem to possess much higher baseline execution quality in almost everything. Buses and trains are better (and more punctual); low-end food is tastier; cheap hotels are more comfortable; their airlines score higher on international indexes (despite not spending more per passenger); streets are cleaner; grocery stores and corner stores are nicer; ostensibly unremarkable villages have more beautiful buildings and are more pleasant places to spend a few days.

(This phenomenon, whatever it is, may extend even further. The homicide rates in both Japan and Switzerland are about a tenth of that of the US and less than half those of England, France, and Germany.)

What's going on? While wealth is clearly some of the story, it isn't just a matter of wealth: GDP per capita in Japan substantially lags that of the US. Nor does it seem to be a matter of historical wealth. (1900 Japan was even further behind.) It doesn't seem to be the simple result of long periods of stability. (Again, Japan.)

So, what exactly is this effect? Which things are truly better and by how much? Are these effects the result of the same kind of phenomenon in Switzerland and Japan? Is this in any way related to their topping the economic complexity index? Could other countries acquire this "general execution capital"? And are there meaningful trade-offs involved or is it a kind of free lunch?

Living in a country built off of bilateral negotiations for everything is simultaneously the libertarian dream and an incredibly inefficient way to do most collective things. Ronald Coase told us this in 1960.

if property rights are well-defined and transaction costs are low, private parties can negotiate solutions to externalities without the need for government intervention

But Indian life is dominated by transaction costs. Every time a driver pokes his car into a turn when the signal’s not for him it creates friction that ripples through the entire system. Every time someone has to spend effort doing a 1:1 negotiation they lose time and efficiency. Horribly so.

Just see driving. Half the time you’re stuck behind trucks overtaking other trucks while dodging a motorcycle driving on the divider line, and this all but guarantees that your average speed will be at least 20% below the limit. (I measured this, so n = 3, on 3 separate trips.) What’s the GDP impact of that?
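As a back-of-envelope sketch, here is one way to put rough numbers on that question. Every number below is an assumption invented for illustration, not a measurement:

```python
# Toy Fermi estimate of the GDP drag from traffic friction.
# Every number here is an invented assumption, not data.

transport_share_of_gdp = 0.05  # assume road transport and logistics ~5% of GDP
speed_penalty = 0.20           # the ~20% below-optimal speed from the n=3 anecdote
passthrough = 0.5              # assume only half the lost time becomes lost output

gdp_drag = transport_share_of_gdp * speed_penalty * passthrough
print(f"Implied annual GDP drag: {gdp_drag:.1%}")  # ~0.5% of GDP
```

Change the assumptions and the answer moves, but it lands in the same ballpark as the percentage point of GDP growth guessed at earlier.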

The reason this isn't an easy fix is that the ability to negotiate everything is also the positive. When every rule is negotiable you get to push back on silly things like closing off a section of a parking garage with rubber cones by just asking. Life in the West feels highly constricted primarily because of this, we’re all drowning in rules.

People sometimes talk about it in terms of “low trust” vs “high trust” societies. Francis Fukuyama wrote about this, discussing how low-trust societies often develop more rigid, hierarchical structures and rely on informal personal networks. But I feel that misses what’s happening here. The negotiations aren’t a matter of trust, not when it’s about traffic, but trust is the result of a different cultural Schelling point. Trust isn't a one dimensional vector.

What causes the average driver on Indian roads to treat driving like a game of water filling cracks in the pavement? It's not trust, it's the lack of an agreed-upon equilibrium. There are no norms to adhere to.

We might as well call those norms culture.

Normally this movement to a new Schelling point happens first as a result of better enforcement mechanisms. If the rules that exist start being applied much more stringently, then they start becoming a part of normal behaviour.
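Here is a toy simulation of that tipping dynamic. Every parameter is invented for illustration; the point is the shape of the curve, not the numbers:

```python
def long_run_compliance(enforcement, start=0.3, rounds=400,
                        fine=3.0, gain=1.0, conformity=2.0):
    """Toy norm-tipping model. An agent defects (jumps the signal, dodges
    the fare) when the expected gain beats the expected cost: the chance
    of being caught times the fine, plus a social cost that grows with
    the share of people already complying. All parameters are invented."""
    compliers = start
    for _ in range(rounds):
        defect_payoff = gain - enforcement * fine - conformity * compliers
        # the population drifts toward whichever behaviour currently pays better
        compliers = min(1.0, max(0.0, compliers - 0.05 * defect_payoff))
    return compliers

for e in (0.0, 0.1, 0.2, 0.3):
    print(f"enforcement={e:.1f} -> compliance settles at {long_run_compliance(e):.0%}")
```

Below a threshold, the same marginal increase in stringency does nothing and the low-compliance norm sustains itself; above it, compliance snowballs into a new Schelling point. That is roughly the story of the examples that follow.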

Japan did this post war through an emphasis on group harmony and obedience to rules. Singapore through extremely stringent application of rules and education campaigns. Germany, again post war, had strict enforcement of laws. Sweden in the 30s and 40s focused a lot on building a culture of cooperation and emphasising the “social contract”.

By 2034, India will have added a few trillion dollars to its GDP. Just as we added incredible airports and highways and cleaner streets (yes, really) and metro systems and the best retail and F&B scene in the world, it will continue to get even better. And as it gets better people will behave better, as behooves the surroundings. This is the Broken Windows theory in reverse.

As individuals get richer, as some of those few trillion dollars trickle through the economy, the types of goods and services that they demand will change. Small broken-up shops that double as sleeping quarters, open to the street with livestock wandering in and out, will change. We will start seeing the same gentrification as we saw elsewhere, but we will also see better social norms become more common.

India remains a land of conundrums. It contains enough chaos to increase transaction costs and hold back much potential progress, a large gap to potential GDP sticking like so many barnacles on the hull. What it needs is more collective game theory, where you don’t need to be a greedy algorithm, where you can rely on other people’s behaviour.

All growth stories are stories of cultural transformation. Cultural shifts often require deliberate effort and collective will. Not just South Korea, Singapore and China, but also Brazil, with “jeitinho brasileiro” holding it back, Rwanda focusing on unity, Botswana on legal frameworks and reducing corruption.

On-the-ground cultural shifts are reflexive with government policies, but they also move to their own beat. Swachh Bharat, the campaign to eliminate open defecation, made substantial progress through building infrastructure but also through campaigns for behaviour change.

But the framework is clear: move away from bilateral negotiations towards a Coasian equilibrium. That’s the next cultural milestone we need to reach.

[1] To be fair to the parking garage, I, and he, have gotten lost in shopping malls before.

[2] What *is* an institution, however, remains a highly contentious topic.

What comes after?

2024-09-30 14:05:13

Today Governor Newsom vetoed a bill purporting to regulate AI models, one that had passed the California Assembly with flying colours. For most of you who don’t live in California and probably don’t care about its electoral politics, it still matters, because of three facts:

  1. The bill would’ve applied to all large models that are used in California - i.e., almost everyone

  2. It went through a lot of amendments, became more moderate as it went on, but still would’ve created a lot of restrictions

  3. There was furious opposition and support of a form that’s usually hard to see, with the united push of “why” and “where’s the evidence”

His veto statement says the following.


By focusing only on the most expensive and large-scale models, SB 1047 establishes a regulatory framework that could give the public a false sense of security about controlling this fast-moving technology. Smaller, specialized models may emerge as equally or even more dangerous than the models targeted by SB 1047 - at the potential expense of curtailing the very innovation that fuels advancement in favor of the public good.

Adaptability is critical as we race to regulate a technology still in its infancy. This will require a delicate balance. While well-intentioned, SB 1047 does not take into account whether an AI system is deployed in high-risk environments, involves critical decision-making or the use of sensitive data. Instead, the bill applies stringent standards to even the most basic functions - so long as a large system deploys it. I do not believe this is the best approach to protecting the public from real threats posed by the technology.

Let me be clear - I agree with the author - we cannot afford to wait for a major catastrophe to occur before taking action to protect the public.

In other words, he doesn’t worry about premature regulation (which regulator would!), but does worry about regulating general capabilities vs usage.

Rather than litigate whether SB 1047 was a good bill or not, which plenty of others have done, including me, I wanted to look at what comes next. There are good analyses of that elsewhere. To start, look at one part of Gov. Newsom’s statement alongside the veto:

The Governor has asked the world’s leading experts on GenAI to help California develop workable guardrails for deploying GenAI, focusing on developing an empirical, science-based trajectory analysis of frontier models and their capabilities and attendant risks. The Governor will continue to work with the Legislature on this critical matter during its next session.

Building on the partnership created after the Governor’s 2023 executive order, California will work with the “godmother of AI,” Dr. Fei-Fei Li, as well as Tino Cuéllar, member of the National Academy of Sciences Committee on Social and Ethical Implications of Computing Research, and Jennifer Tour Chayes, Dean of the College of Computing, Data Science, and Society at UC Berkeley, on this critical project.

It’s of course better to work with Dr. Fei-Fei than not, and it’s a good initiative insofar as it’s about learning what there is to learn, but the impetus is the paragraph before.

It’s clear that this wasn’t a one-time war with a winner and a loser, but more like something that’s likely to recur repeatedly in the coming years.

So, it stands to reason that barring the implosion of AI as a field, we will see more regulations crop up. Better crafted ones, perhaps specific ones, but more nonetheless.

Generally speaking, with all these regulations there ought to be a view on what they’re good for. A view on their utility. SB 1047, through all its amendments, seemed to carry a diluted version of many such objectives. For instance:

  1. Existential risk - like if a model gets sufficiently capable and self improves to ASI and kills us all through means unknown

  2. Large scale risk from an agent - like an autonomous hack or bioweapon or similar causing untold devastation (later anchored to $500m)

  3. Use by a malicious actor - like if someone used a model to do something horrible, like perform a major hack on the grid or develop a bioweapon

This bill started at the top of this list and came slowly towards the bottom as people asked questions, mainly due to the problem that there is no evidence for the top two examples at all. There are speculations, and there are some small indications which could go either way on whether these can happen (like the stories on GPT o1 apparently going out of distribution re its test parameters), but the models are very far today from being able to do this.

Many of the proponents argued that this bill is only minimally restrictive, so it should be passed anyway so we can continue building on it.

But that’s not how we should regulate! We shouldn’t ask for minimally invasive bills that only apply to some companies if we still aren’t clear on what benefits it will have, especially when multiple people are arguing it can have very real flaws!

Will they always remain so? I doubt it; after all, we want AI to be useful! You can’t ask it to rewrite a 240k-line C++ codebase in Python, for instance, without it having the ability to do a lot of damage as well. But just as you couldn’t hack the power grid before you had a power grid or computers, the benefit you get from a technology really, really matters.

Will the AI models be able to do much more, reach AGI and beyond, in the short or immediate future? I don’t know. Nobody does. If you are a large lab you might say yes, because you believe the scaling hypothesis and that these models will get much smarter and more reliable as they get bigger, very soon. This is what Sam Altman wrote in his essay last week. Even though they all think that nobody actually knows if this is true.

You might therefore say these things, and they might even be true.

So the question is what we should optimise for. Well, just like with any other technology, #3 above is what most regulations should target. In fact, it’s what Governor Newsom has targeted.

Over the past 30 days, Governor Newsom signed 17 bills covering the deployment and regulation of GenAI technology, the most comprehensive legislative package in the nation on this emerging industry — cracking down on deepfakes, requiring AI watermarking, protecting children and workers, and combating AI-generated misinformation. California has led the world in GenAI innovation while working toward common-sense regulations for the industry and bringing GenAI tools to state workers, students, and educators.

Without saying all of this is good, it is at least perfectly sufficient if you believe that the conditions needed to know whether #1 and #2 are true are going to remain confusing. As it is, the blowback from safety-washing the models, to stop them from showing certain images or text or output of any sort, is only making them more annoying to use, without any actual benefit from a societal safety point of view[1].

This is, to put it mildly, just fine. We do this for everything.

To go beyond this and regulate preemptively we have to believe two things:

  1. We know, to a rough degree of accuracy, what will happen when the models get bigger. Climate change papers are a good example of the level of rigour and specificity needed. (And I’m not blind to the fact that despite the reams of work and excellent scholarship they are still heavily disputed.)

  2. We think the AI model creators are explicitly hiding things, either capabilities or even flaws in their models, that could conceivably cause enormous damage in the society. The arguments about AI companies resembling lead companies or cigarette companies or oil companies are in this vein. If so, the difference is, those had science at least on one side, which would be good to have here.


Okay, so considering all this, what should the objectives for any bill be? What are the things we should focus on that we actually know, so that we can make sensible rules?

I think we should optimise for human flourishing. We need to keep the spigot of innovation going as much as we can, if only because as AI is getting better we are truly able to help mathematicians and clinicians and semiconductor manufacturers and drug discovery and material science, and upskill ourselves as a species. This isn’t a fait accompli, but it’s clearly happening. The potential benefits are enormous, so to cut that off would be to fill up the invisible graveyard.

And so, I venture the following principles:

  1. Considering AI is likely to become a big part of our lives, solve the user problem. I would argue much of it is already regulated, but fine, makes sense to add more specific bits here. Especially in high-risk situations. If an AI model is being used to develop a nuclear reactor, you better show the output is safe.

  2. Understand the technology better! For evaluations and testing and red-teaming, yes, but also to figure out how good it is. Study how it can cut red tape for us. How could it make living in the modern world less confusing. Can it file our taxes? Where’s the CBO equivalent to figure out its benefits and harms? Where’s the equivalent of FTC’s Bureau of Economics? Where’s the BEA?

  3. Most importantly, be minimally restrictive. For things we don’t or can’t know about the future, let’s not preemptively create stringent rules that manipulate our actions. Don’t add too much bureaucracy, don’t add red tape, don’t add boxes to be checked, don’t add overseers and agencies and funded non-profits until you know what to do with them! Let the market actually do its job of finding the most important technologies and implement them and see its effects and help us understand better.

These, you’ll note, have nothing to do with model size, or how many GPUs it was trained on, or whether we implicitly believe it will recursively self-improve so fast we’re all caught flat footed. These are explicitly focused on finding rationale and evidence, essential if we are to treat the problem with the gravity it requires.

There are plenty of other issues which need well thought out regulation too. Like the issue of copyright and training models on top of artists’ works.

Regulatory attention is like a supertanker. Most of the lawmakers are already poised to regulate more, as an aftermath of what they see as the social media debacle, and the fact that the world is going more protectionist. You have to be careful how to point it and where to point it. And like Chekhov’s gun, to only bring it up if you plan on using it!

I think it’s important to understand that even if you’re on the side of “no regulation”, or at least “no regulation for a while”, you can’t stop policymakers from getting excited or scared. We should give them a way to deal with it, to learn as they go along and be useful instead of fearful. What’s above is one way to do that.

[1] This is amongst the main reasons why I am also sanguine that OpenAI is turning into a regular corporation. We have collectively spent decades each trying to figure out the best ways to align an amoral superintelligence (corporations), and come up with Delaware Law. It’s a miracle that works well in our capitalist system. It’s not perfect, but it’s a damn sight better than almost anything else we’ve tried. I am happy that OpenAI will join its ranks, rather than be controlled by a few non-profit directors who are acting on behalf of all humanity.

OpenAI's Strawberry models can reason like an expert

2024-09-13 13:48:30

A year ago OpenAI was the undisputed leader, with GPT-4 at the very pinnacle of what’s possible. Then we had Claude 3.5 Sonnet, Claude 3 Opus, Llama’s latest models, Gemini 1.5 Pro Experimental, even Deepseek’s models, all working roughly at similar levels even if not across all tasks.

This was great for us as consumers and adopters, the price fell 100x or more, but it’s not great for the producers. It’s hard to spend $100 million training a model only to have it fall behind SOTA within 6 months.


Now, GPT-o1, née Strawberry, is out. It's the first model that feels qualitatively different since GPT-3.5 (or 3). It's not perfect but it does have some capabilities that are net new. It can reason. What does that mean? It means you can give it puzzles or complex questions with internal interdependencies, where answering one part requires answering another part first, then revising, then revising again, then iterating, and so on; it will be able to answer them.

The best example is my eval, which tests iterated reasoning like you need in a crossword puzzle: coming up with an answer, then testing it, then refining it, then changing it, then trying again, etc. Pure autoregression was bad at this without tons of tricks, but o1 is better, though not perfect by any means. While it solves the 4x4 word square, it struggled with the 5x5.

Note that Emeer, Aloes and Reein, among others, aren’t words. It actually gave up and said:

Due to the limitations of the English language, creating a perfect 5x5 word square where both the rows and columns are all valid English words is practically unfeasible.

The interesting thing to me is that it fixed many of the problems of previous models, where many of the words weren’t even 5 letters long, or never became actual words no matter how many iterations you looped through.

But here, after giving me this admonishment, it then on its own decided to go do this in Latin.
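For concreteness, the property the eval checks is easy to state in code. A minimal checker might look like this; the four-word dictionary is a stand-in, and a real eval would load a full English word list:

```python
def is_word_square(grid, dictionary):
    """True iff every row and every column of the n x n letter grid is a
    valid word. This is the property the model has to satisfy by proposing
    a grid, testing it, and revising the entries that fail."""
    n = len(grid)
    if any(len(row) != n for row in grid):
        return False  # the old failure mode: words of the wrong length
    rows = [row.lower() for row in grid]
    cols = ["".join(grid[r][c] for r in range(n)).lower() for c in range(n)]
    return all(word in dictionary for word in rows + cols)

# Stand-in dictionary; CARE/AREA/REAR/EARS is a classic 4x4 word square.
dictionary = {"care", "area", "rear", "ears"}
print(is_word_square(["care", "area", "rear", "ears"], dictionary))  # True
```

The generate-check-revise loop this implies is exactly what pure autoregression struggled with, and what o1 now does internally.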

It writes pretty good poetry, it can create much better economic and even fictional analyses. These were possible with really good prompts but now it's built-in. You can just ask the model to do something and have it perform at an expert level.

You can also ask it to come up with some Game of Life simulations.

From their evals o1 seems to be able to even beat International Olympiad gold levels, and play tic-tac-toe. It adds a Chain of Thought that's autogenerated through self-play and then comes up with the answer.
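OpenAI hasn’t published how this works, so the sketch below is a caricature rather than their method: sample several chains of thought, score each with some verifier, and answer with the best one. Both stubs are hypothetical stand-ins; it only illustrates why spending more compute at inference time can buy better answers.

```python
import random

def sample_chain(question):
    """Hypothetical stand-in for a model generating a chain of thought;
    a real system would call an LLM here."""
    steps = [f"thought {i} about {question!r}" for i in range(random.randint(2, 6))]
    return {"steps": steps, "answer": random.choice(["A", "B", "C"])}

def score_chain(chain):
    """Hypothetical stand-in for a learned verifier or reward model."""
    return random.random()

def solve_with_reasoning(question, n=16):
    """Best-of-n reasoning: sample n chains of thought and keep the
    top-scoring one. Not OpenAI's actual algorithm, just the general
    shape of trading inference-time compute for answer quality."""
    chains = [sample_chain(question) for _ in range(n)]
    return max(chains, key=score_chain)["answer"]

print(solve_with_reasoning("Which answer is right?"))
```

With a real model and a real verifier in place of the stubs, running longer (a bigger n, more steps) buys better answers, which is why the economics of inference start to matter so much.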


One thing I really like here is that it's an orthogonal model. It's not better than 4o at everything, but it's a new thing. It will keep getting better. It will get faster. It will actually perform better as the underlying models, its self-play, and the reasoning itself all get better.

I’ve written before about the problems of pure LLMs, that they can only go in one direction, and it’s really hard to get them to change their mind. This presents the first salvo in breaking through that wall. You can see the iteration below (there’s a lot more) where it considers the first thought, impact on the second, impact on that reply to the next, and so on.

It’s not perfect, it still relies on the intelligence of the underlying model 4o. Which means that it is still prone to hallucinations and getting distracted.

But it’s also able to correct itself.

A major change for me as a user is that this is much more of a complete agent than any of the other LLMs. With those it felt like I was a participant engaged in a collective search, this is far closer to a “better Google search”. I ask, it answers. I can look through its reasoning (the little that’s shown us) to understand, but mostly it’s not detailed enough nor is it very amenable to correction.

I can’t just “pluck” a bad step and tell it to fix that. Yet.

I also feel confident in making these predictions. It’s already able to perform much better if you run it for longer and for more steps. That’ll get made economical. We will also see this grounded with more real world engagement - through code interpreter for analytics problems, internet/ knowledgebase search for factual problems, document search for research problems, and more, such that when it’s iterating and reasoning it’s not just talking to itself.

Can you imagine once this has access to your Github and you tell it to code? Or if it has access to your company documents and you get to spin up new analyses at one request?

We’re this close to the bottleneck being human attention in reading the outputs. My eyes already glaze over the 25 steps it lays out; I just look for the salient parts!


The speed with which AI is getting better continues to be astounding. Now that it can think for itself, so to speak, and create lengthy and sensible outputs (it wrote a couple of 3,000-word short stories for me which were very good), the number of jobs it could swallow has increased exponentially.

Now, o1, which combines LLMs with Reinforcement Learning to demonstrate reasoning, will also get copied, and margins competed away soon enough to make it abundant.

I wrote this a year and a half ago. It’s true now. We all have an actual PhD analyst who works for us. Soon it will be smarter and less likely to hallucinate, grounded in the facts we care about, and able to touch real-world systems.
