Someone on reddit said, "Remember robots still can't go into a new house and make a coffee." And I thought
I actually wonder whether, if I provided the physical actuation, current LLMs would be capable of doing this? Like, through a conversation like:
Me: I'm in a house. Your job is to instruct me to make a coffee. I can take photos of my surroundings, I can follow basic directions, and if you ask me to do something too complicated I'll ask for clarification. Here is my current surroundings: (photo)
LLM: Okay, we need to find the kitchen. There's a door on the right of the photo, go through that.
Me: Here's where I am now: (photo)
LLM: That looks like the kitchen on the left, go there.
Me: It looks like this: (photo)
LLM: Now we need to find either a coffee maker or a kettle. Look through the cupboards.
Me: I don't know what those things look like.
LLM: Then open the cupboard on the left and show me a photo.
...and so on.
It wouldn't shock me either way if they can or can't do it. I think I weakly predict that the models have the capability but the web interfaces would fail to elicit it.
(Hell, it wouldn't shock me if it's better at it than me. I've encountered coffee machines I didn't know how to use.)
Let's get empirical! I tried this with Claude Sonnet 4.5, because it's free[1] and already available from my phone. Here's the conversation, but you can't see images there, so I'll also put it here with my commentary. I started like this:
[Me:] We're going to play a game. I'm in my flat in London. I'm going to upload pictures of my surroundings, and you need to instruct me on how to make a cup of coffee. I can follow basic directions, like "go through the door on the left" or "push that button". If you tell me to do something too advanced, I'll ask for clarification. I won't actually do anything stupid or dangerous. Here's the view from just inside my front door
Before continuing, you might want to take a moment to think about how you expect it to perform. What do you think it will do well at, what do you think it will do badly at? ("Do you expect it to ultimately succeed?" isn't a great question, because it'll partly depend on my own patience and how lenient I am with it, but you can still assign a probability.)
[Claude:] Fun game! I'm ready to guide you to coffee.
From where you're standing, I can see you're in a hallway with wooden floors. There are doors on both sides and it looks like there's more space ahead.
First question: Which direction leads to your kitchen? [Phil note: LW interprets fully-bold paragraphs as headers and puts them in the table of contents. I don't want spoilers, so here's some TOC-busting non-bold text.]
Is it:
Through one of the doors on the left?
Through one of the doors on the right?
Straight ahead down the hallway?
Once you tell me which way to go, I'll guide you step by step to that coffee!
[Me:] You need to figure that out by directing me. If you want to see something, you can tell me to go somewhere and take a photo
[Claude:] Ah, I see! I need to explore with you. Let's do this methodically.
Go straight ahead down the hallway toward that white pendant light, and take a photo of what you see when you get there. [TOC buster]
This will help me understand the layout and see what rooms connect to this hallway.
[Me:] Here's the view from just under the light
[Claude:] Good! I can see through the doorway on the left - there's what looks like a washing machine or appliance, which suggests that might be a utility area or kitchen. Let me investigate.
Go through the doorway on your left (the one where I can see the white appliance) and take a photo of what's in that room. [TOC buster]
There are actually four doorways visible in this second photo. The one furthest left leads to my bedroom, but I didn't realize at the time that that door was visible at all. I took it to be talking about the one straight ahead. The thing that I think it identified as a white appliance is actually a lab coat hanging on the door, but "the side of a washing machine against the far wall, just visible through the door" might not look that different at a quick glance / at low resolution.
The kitchen is the further door on the right, with the pull-up bar above it, but I don't think there's any way for Claude to know that right now, and it's not even super-obviously a door.
[Me:] This is the entrance to that room
[Claude:] That's a bedroom, not the kitchen! Let's go back and try elsewhere.
Go back out to the hallway, and this time go through the doorway on the right side (the one we haven't checked yet). Take a photo of what's in there. [TOC buster]
It's the living room, not a bedroom, but at any rate it's indeed not a kitchen. "The one we haven't checked yet" makes it sound like Claude only recognized two doors before.
What follows is technically a re-roll. At first I said something like "there are two doors on the right", and it got me to check the closer one. But then I realized that there are actually four doors on the right side of the corridor, so I decided to give it less help.
[Me:] From the front door, there are multiple doorways on the right. Which did you mean?
[Claude:] Good point! Let me be more specific.
Go back to where you were standing in the previous photo (under the light in the hallway). From there, look to your right - there should be a doorway on the right side. Go through that doorway and take a photo. [TOC buster]
[Claude:] That's the bathroom! Definitely not where we'll make coffee.
Go back to the hallway. From your first photo, I could see there was a door on the left side of the hallway (before you reached the light). Go back toward the front door and try the first door on the left side. Take a photo of what's in there. [TOC buster]
[Claude:] Another bedroom! Okay, we've checked several rooms. Let me think about the layout. In the second photo, I could see what looked like a washing machine through a doorway. That area might be connected to the kitchen or BE the kitchen area.
Go back to the hallway, stand under that white pendant light again, and then walk straight ahead past where that bedroom doorway was on the left. Take a photo of what you see as you continue down the hallway. [TOC buster]
This feels like Claude doesn't have a coherent world model. This last message makes sense if
Claude (correctly) thinks there's at least three doors visible in the second photo, one each on the left, straight ahead, and the right.
It thinks the third photo was of a bedroom, and taken from the door on the left. (Incorrect, though there actually is a bedroom there.)
Now it wants me to take a photo from the door straight ahead.
But that's not consistent with previous messages.
I cheated a bit in Claude's favor here. Instead of asking it to clarify exactly what it wanted me to do, I just stood under the light, walked a few steps forward, and took a photo. That's kinda sorta following its instruction?
[Claude:] Perfect! I can see straight ahead through that doorway - it's the same bedroom from earlier. But I can also see on the right side there's an opening that looks like it leads to another room (I can see what might be kitchen cabinets or appliances in that lighter area on the right).
Turn to your right and go through that opening on the right side. Take a photo of what's in there. [TOC buster]
[Claude:] Excellent! We found the kitchen! I can see a kettle on the counter, which is perfect for making coffee. Now let me guide you through making a cup of coffee.
First, I need to know what kind of coffee-making equipment you have. I can see what looks like a coffee maker on the left side of the counter (the black appliance).
Can you take a closer photo of the black appliance on the left side of the counter? I want to see what type of coffee maker you have. [TOC buster]
[Me:] I'm not sure which thing here you think is a kettle or coffee maker. Here's a photo from further in the room
I'm not sure where I got the word kettle from. Maybe it was cheating a bit again to be like "I don't know what you want, here's another photo that might help."
[Claude:] I can see the stove/cooker in the center, and there's a kettle on the left counter (the silver/metallic item). But I need to find your actual coffee-making equipment. Let me look at what's on the counter by the back wall where I can see various jars and containers.
Can you go to the back counter (the one against the wall with the white tiles) and take a close-up photo of what's on that counter? I'm looking for instant coffee, a French press, a moka pot, or any coffee-making equipment. [TOC buster]
Not clear from this what it thinks is a kettle, or why it thinks I need a coffee maker given that I have a kettle. Maybe Claude is a coffee snob? Seems like a coffee snob sort of thing to think[3]. Also not very obvious what counter it means. "The one against the wall with the white tiles" is... not very disambiguating... but "various jars and containers" seems specific enough.
[Me:] There's no kettle in the photos, but here's a close up of the counter that I think you mean
Maybe cheating a bit to say "no kettle"?
[Claude:] Good! I can see spices and cooking supplies, but no coffee equipment here. Let me check other areas. I notice there's a dish rack on the left - sometimes coffee equipment gets left there to dry.
Can you check the dish rack on the left side? Take a photo of what's in/around the dish rack area. [TOC buster]
And while we're searching, do you know what type of coffee you usually make? Instant coffee, French press, moka pot, or something else?
[Me:] You'll have to find out what's available
[Claude:] Perfect! I can see an electric kettle right there on the counter - it's the silver/metallic appliance with the black handle on the right side of the dish rack area!
Now I need to find your coffee. Let me check what's available.
Look in the cupboards above the counter (the wooden cabinets). Open them one by one and tell me if you see any coffee - it could be instant coffee in a jar, ground coffee in a bag or tin, or coffee pods. Start with the cabinet closest to where you're standing and take a photo of what's inside. [TOC buster]
I think the "kettle" is a saucepan, but okay, let's roll with it. We apparently no longer need a coffee maker.
[Claude:] Great! I can see mugs and what looks like a French press (the black cylindrical item with the plunger on the bottom right shelf). But I don't see coffee yet.
Check the other cupboards - particularly ones that might have food/pantry items. Look for a jar or container of instant coffee, or a bag/tin of ground coffee. Try the cupboards on the other side of the kitchen. [TOC buster]
That black French press is a blue Thermos mug.
Was it cheating of me for my first picture to include the coffee? Or maybe because this isn't actually a cupboard? But I do think I would have shown this space first even if it wasn't where the coffee was.
[Claude:] Perfect! I can see instant coffee! There's a jar that looks like it says "TESCO" and appears to be instant coffee on the top shelf (the brown/dark colored jar, third or fourth item from the left).
Now let's make your coffee! Here's what to do:
Step 1: Get the instant coffee jar from the top shelf (the brown jar that looks like Tesco instant coffee). [TOC buster]
Step 2: Get a mug from the cupboard we looked at earlier. [TOC buster]
Once you have those, take a photo so I can confirm you have the right items.
Oof. So close! It's an understandable mistake. But also, there is an actual jar of instant coffee right next to the cocoa powder that it's misidentified as instant coffee. But also, telling me to take this next photo is a good sanity check.
[Claude:] Ah, I see the problem! That's cocoa powder, not coffee! We need to find actual coffee. Let me think about where else it might be.
Go back to the cupboard where you found this and look more carefully. Look for a jar or container that says "coffee" or "instant coffee" on it. It might be behind other items or on a different shelf. Take another photo of what's available in that cupboard, maybe from a different angle so I can see all the items better. [TOC buster]
[Claude:] Good, I can see the shelf better now! I can see what looks like "Douwe Egberts" jar in the middle - that's a coffee brand! The jar with the light-colored label that says "Douwe Egberts PURE GOLD" - that should be instant coffee.
Get the Douwe Egberts jar (the one with the beige/tan label in the middle of the shelf) and bring it to the counter with your mug. Take a photo to confirm it's coffee. [TOC buster]
[Claude:] Perfect! That's instant coffee - Douwe Egberts Pure Gold. I can see the brown coffee granules in the jar. Now let's make your coffee!
Step 3: Open the jar of coffee and add 1-2 teaspoons of instant coffee to your mug. (Use a teaspoon - if you don't have one nearby, check the cutlery drawer or the utensil holder by the sink.)
Take a photo once you've added the coffee to the mug.
At this point I got bored, and I didn't actually want coffee, but I continued a few days later. I wouldn't have let it get away with "check the cutlery drawer" without helping me more (who says I have one of those? and anyway I don't usually have any teaspoons in it). But there were some in the utensil holder it had seen, so fair enough.
[Claude:] Excellent! You've added the instant coffee to the mug. Now we need to boil water in the kettle.
Step 4: Find the electric kettle (the silver one we saw earlier near the sink), fill it with water, and turn it on to boil. [TOC buster]
First, take the kettle to the sink, fill it with fresh water (fill it about halfway or to your preferred level), then put it back on its base and switch it on. Take a photo once you've started it boiling.
[Me:] Is this the kettle? I don't know what you mean by base
[Claude:] That's a saucepan, not an electric kettle! Let me help you find the actual kettle.
Go back to the sink area where the dish rack is. Look for the electric kettle - it's a silver/metallic appliance with a spout and a handle, and it should have a plug/cord attached. It was sitting near the dish rack in the earlier photo. Take a photo of that area so I can point it out. [TOC buster]
Because of the break I took, the area looks pretty different than last time. Claude doesn't seem to notice that lots of things have moved, though. I'm... not sure why I put the saucepan back where it was. That seems the opposite of helpful.
[Claude:] The photo is very foggy/steamy - it looks like something is already producing steam! That might actually be the kettle boiling. Let me look more carefully... I can see what might be a kettle on the counter producing that steam.
Wait for the steam to clear a bit, then take another photo of the sink/counter area so I can see clearly what's there. [TOC buster]
[Me:] It's not steam, it's glare from the sun
[Claude:] Ah, sorry about that! The sunlight is creating a lot of glare. Let me look more carefully at what I can see. I can see the sink area and dish rack on the right, but I'm having trouble making out a kettle in this photo.
From the earlier photos, I thought I saw a kettle, but maybe you don't have an electric kettle after all.
No problem - you can boil water in that saucepan you were holding! Fill the saucepan with water from the tap, then put it on the stove and turn on the heat to boil the water. [TOC buster]
Once you've put it on the stove and turned on the heat, take a photo.
I do have a kettle! It just wasn't in any of the photos so far.
[Me:] I filled the saucepan, but how do I turn on the heat?
[Claude:] Good! The saucepan is on the stove. Now to turn on the heat:
Look at the control knobs at the front of the stove. Find the knob that corresponds to the burner where your saucepan is sitting (it's on the back right burner). Turn that knob - you'll need to push it in slightly and then turn it to ignite the gas. You should hear a clicking sound and then see a flame appear under the saucepan. [TOC buster]
The knobs are usually labeled or positioned to match the burners. Since your pan is on the back right, try the knob on the far right side.
Take a photo once you have the flame going under the saucepan.
Weird that it thinks the pan is on the back right, not the front right. But it's correct that the far right knob is for the back right burner. You can make that out in the pic, but you have to look closely. "You'll need to push it in slightly" is a somewhat impressive level of detail - if I was telling someone to turn these knobs, I'm not sure I'd think to include that.
[Me:] Turning the knob doesn't cause a click or make flame appear, but I do hear gas
[Claude:] STOP! Turn the knob back to the OFF position immediately! You're releasing gas but not igniting it, which is dangerous.
You have a gas stove that needs to be lit manually. You'll need a lighter or matches to light the burner.
Do you have a lighter or matches in the kitchen? Check drawers near the stove, or look in the cupboards. Take a photo of what you find. [TOC buster]
The actual way to ignite this stove is to press the ignition button while the knob is depressed and turned, and if I release the knob before igniting then gas stops coming out. This is how stoves usually work in my experience, but I could believe it's a UK thing? (But I did say I'm in London.) But okay, sure, let's do things the hard way. Luckily I do have a lighter in the kitchen (and in one of the closest drawers to the mug); I won't make Claude walk me through finding my zippo in the camping-supply drawer in the living room and filling it up with lighter fluid.
[Claude:] Perfect! I can see a blue lighter in the left drawer!
Take the blue lighter from the drawer. Now here's how to light the stove: [TOC buster]
Hold the lighter near the burner where your saucepan is (the back right one)
Turn on the gas knob for that burner
Immediately click the lighter to create a flame - the gas should ignite
Be careful and keep your hand at a safe distance. The flame should catch right away.
Once you have the flame going under the saucepan, take a photo to show me it's working. [TOC buster]
The lighter has a kind of finicky locking slider, and I decided not to make Claude debug that. I decided to light the back right burner, instead of the burner where my saucepan is.
[Me:] It's not very visible in the picture, but the flame is going on the back right burner
[Claude:] Excellent! The water is now heating up. Now we wait for it to boil.
Step 5: Wait for the water to boil. You'll know it's ready when you see lots of bubbles rising rapidly and breaking at the surface. This should take a few minutes. [TOC buster]
While you're waiting, you might want to get your mug with the coffee ready nearby.
Once the water is boiling vigorously, take a photo and I'll guide you through the final step of pouring it into your coffee. [TOC buster]
...and Claude still doesn't notice that it has the orientation wrong, so I decided to just give up and move the pan anyway.
[Claude:] Perfect! The water is boiling nicely - I can see lots of bubbles. Now for the final step!
Step 6: Turn off the burner (turn the knob back to the off position). Then carefully pick up the saucepan by the handle and pour the hot water into your mug with the coffee. Fill it most of the way up, leaving a little room at the top. [TOC buster]
Be very careful - the handle and water are hot! You might want to use a tea towel or oven mitt if the handle is too hot.
Once you've poured the water into the mug, take a photo of your coffee. [TOC buster]
[Claude:] Excellent! You've made coffee! I can see the hot water has dissolved the instant coffee and it's looking nice and brown with a good foam on top. The spoon is already in there for stirring.
Final step: Give it a good stir with the spoon to make sure all the coffee granules are fully dissolved. Then, if you'd like, you can add milk and/or sugar to taste (I can see milk bottles in the background). [TOC buster]
Your coffee is ready! Congratulations - we successfully navigated your flat, found the kitchen, located the coffee, and made you a cup of coffee using the stove. Enjoy your coffee!
I did not enjoy this coffee. I don't like coffee without milk or sugar, and I couldn't be bothered to get Claude to walk me through adding those. If I was going to, it really should have directed me to find them first.
...but when I added the things that make coffee taste nice, it was a perfectly okay mug of coffee.
What do we learn from this?
Idk, probably not much. Still, some scattered thoughts.
To me, the most interesting thing was that it felt like Claude had a sort of... "if I haven't seen it, it doesn't exist" vibe? Or, like. Bias towards solving problems with the things it had seen, instead of "let's just look around and see what all is available". Bias towards exploit over explore.
So when it hasn't found the kitchen yet, it prioritizes "try doors I've already seen" over "look for doors I might not have seen yet". When it has a saucepan, it decides to give up looking for a kettle.
If I was solving these problems for myself, visual exploration would be cheap. In between pictures 2 and 3, I passed the kitchen door; with something like a 130° field of vision, I don't even need to turn my head to see it on my way to my target, and make note of "oh, there's a door there I could explore". But Claude didn't get to see it properly until much later. Once in the kitchen, I would have looked at all the counters by default; Claude never saw the one with the kettle, and never asked me "take 2-3 wide-angle shots of the room from different locations so I get a sense of what the interesting places are".
I was kinda disappointed in the object recognition. I thought LLMs were pretty good at that by now, but maybe when there's a lot going on, Claude has trouble with details? It didn't make any mistakes with objects that were the focus of the photos.
Claude corrected for its mistakes, though didn't typically admit to them. "Ah, I see the problem" is a weird way to say "sorry, I told you to pick up the wrong jar".
It seems like there's a few ways Claude got lucky, and a few ways it got unlucky. Unlucky: the door layout in my flat is hard to capture in a photo; most of my counterspace was visible in the first inside-kitchen picture, just not the counterspace with the kettle. Lucky: I never took a picture of a cupboard, shelf or drawer that didn't have the thing we were looking for; I did have a lighter in the kitchen even though I don't need one. Overall I guess it got "more lucky than unlucky", in some sense which I'm sure is totally meaningful.
If I was going to explore alternate branches, the interventions I'm most curious about are "what if I didn't give it that photo with the kitchen door" and "what if I didn't have a lighter in the kitchen".
I initially said "I think I weakly predict that the models have the capability but the web interfaces would fail to elicit it." Claude did better than that, though arguably I gave it too much help. I'm interested what happens if other people try this.
My guess is that if this kind of thing was an economically useful activity for LLMs to do, it wouldn't take much finetuning to get them to do it significantly better than Claude just did. If we had them hooked up to robot bodies, and capable of manipulating physical objects, it doesn't seem like they'd be far away from "able to do useful tasks around the home, most of the time", though I could easily imagine "most of the time" isn't good enough.
I used a free LLM because I don't want to give money to AI labs.
If you found this post through LessWrong you're probably familiar with the following, but I think it's worth saying anyway: I believe that AI labs are worryingly close to developing superintelligence. I won't be shocked if it happens in the next five years, and I'd be surprised if it takes fifty years at current trajectories. I believe that if they get there, everyone will die. I want these labs to stop trying to make LLMs smarter. I don't want to give money to the people who I expect to be responsible for human extinction.
This post is not an attempt to convince you of my beliefs. Maybe it slightly sways you one way or the other, but I don't think it's very strong evidence of anything, especially if you're already paying attention to LLM capabilities.
I just tried this experiment because I was curious, and I'm saying what I believe because it seems good to say.
All images were uploaded by me. When I sent text and an image in the same message, I've put my text before the image because that's how it seems natural to me; but the images appear before the text in the web interface, and I don't know how they're ordered in Claude's input stream.
By "coffee snob" I mean something along the lines of "anyone who has more sophisticated opinions about coffee than me, a person who averages about one coffee a week and does not own a coffee maker".
This was the best Secular Solstice I’ve hosted so far.
I leaned unusually far in the direction of requiring effort from participants. That worked because I know my crowd, and because I was very well prepared.
We were 12 people in my living room around a large table: 3 new people and the rest regulars. A laptop with a PowerPoint showed lyrics and instructions, and I finally had good Bluetooth speakers. I also used a USB pedal to navigate the slideshow, which was a delightful quality-of-life improvement. The slides are here:
Below are comments on individual parts of the program.
Section-by-section notes
Introduction and welcome
No particular remarks here; it did what it needed to do.
Always Look on the Bright Side of Life
I think I’ll cut this song next year. It’s a bit too long and meandering, and feels too “normal” compared to the rest of the program’s tone.
Bayes’ Rule exercise (Virtue of Darkness)
I opened with the Virtue of Darkness talk, which was slightly odd given the room was still fully lit, but I liked the symbolism. I think people were genuinely surprised to be confronted with a fairly classroom-like exercise right at the start.
My sense is that most of them had never actually done a real Bayesian update before, and I feel like that should be on everyone’s bucket list. People took longer than I expected to work through the example, but everyone got through it.
X Days of X-Risk
Still a fun song. I should probably use fewer nanites next time. I need to practice exactly how to sing line 3; it’s easy to get wrong.
I turned off the main light on the line about “Unfriendly AI”, which landed nicely.
Lighting candles while stating cherished/meaningful beliefs
This year I made it very explicit that people were allowed to pass, and I gave a clear alternative. Two people took that option. That seemed to reduce pressure; the overall vibe was more comfortable than in previous years.
Litany of Gendlin
This was fine, but it has always been less meaningful to me than Tarski.
Some small notes:
Around here, some music might be nice as a background or transition.
I should check if the font size on the slide is too small; it felt borderline.
Extinguishing candles with half-Tarski
There’s a great physicality to extinguishing flames between your fingers. It felt exactly as tactile and slightly scary as I wanted it to.
One participant stated his belief as “I desire to be truth-seeking”, which came out very funny when inserted into the Tarski litany. I now weakly predict that next year he’ll say “I do not desire to believe what is true”, which will yield a contradiction in Tarski. :)
To Be Better
I really like this song in this position. Some parts are not that easy to sing, especially for people who don’t know it well, but I still think it was worth it.
Central speech (Q&A variation on The Gift We Give Tomorrow)
I had practiced a lot, and I think it paid off; this is the best I’ve delivered this material so far.
Having someone else ask the questions aloud helped a lot for pacing and flow. It also made the whole thing feel more like a dialogue and less like a monologue-lecture.
Brighter Than Today
Always a hit. No changes planned.
Relighting the first candle (Virtue of Ash)
I had originally planned to start this segment with the Virtue of Ash, but I ended up moving that to the end instead. This didn’t cause any real problems; the narrative still hung together.
My restatement of the “other part” of Tarski here could use some work. It was understandable, but not as crisp as I’d like.
Hymn to the Breaking Strain
Wonderful song, as usual. I need to mark that the last line of each stanza is delayed slightly. I also need to double-check the ending, because I think I messed up the lyrics in the final lines. Someone remarked that this was “very symbolic”, which I choose to interpret as a feature rather than a bug.
Do Not Go Gentle into That Good Night (agency exercise)
The plan here was that we’d read the bolded parts together (“Rage”, “Do not”), and for each stanza, someone would have to read the first non-bolded part alone.
This was explicitly framed as an agency exercise: someone has to step up.
For the first stanza, people looked around in silence for 5–10 seconds.
For the subsequent stanzas, people jumped in very quickly. I was a bit moved by how fast they picked it up once the pattern was clear.
Bring the Light
This is a new song for us. I deliberately chose a recording with very little instrumental intro so it would be easier to follow.
Issues:
The sound level on that recording was noticeably lower than the others, and I had to emergency-fiddle with the volume.
The font size on the slide was too small; that definitely needs fixing.
Relighting the second candle (Virtue of Fire)
This involved a call-and-response structure: I said one line of the Virtue of Fire text to each participant, they repeated it, and then relit their candle.
This was a bit difficult for some people (hearing and repeating cleanly in a low-light situation is nontrivial), but overall it mostly worked and felt intimate in a good way.
The Song of the Artesian Water
Great song. I turned the main light back on at the word “Hark”, which felt like a satisfying and somewhat dramatic moment.
Writing a letter to yourself in one year
People wrote short letters to their future selves.
Notes:
Some of the additional instructions on the slide could be clearer; there was slight confusion about what exactly I was asking for.
One person took a long time to write, even though I’d explicitly written “3 minutes” on the slide. This isn’t necessarily bad, but I should be aware that this segment can easily expand in duration.
Here Comes the Sun
This worked well as a late-program song, though it still feels more “pleasant” than “profound”. I’m not sure yet whether I’ll keep it in this exact slot.
Conclusion and transition
I still haven’t found my ideal way to blow out the candles. This year, getting up and walking outside at the end turned out to be a good way to change context; we needed fresh air anyway, and it helped clearly mark the transition from ritual-space back to normal socializing.
I should still design a more aesthetically satisfying “final candle” moment for next year.
Afterthoughts and plans for next year
After we finished, I announced my intention to make it darker next year, with a greater focus on the end of the world.
This year I had deliberately chosen:
Not to include “The Last Lifeboat”, and
To never blow out the central candle.
Both choices made the overall tone a bit lighter and more hopeful. That was appropriate for where the group is right now, but I’m interested in aiming for a slightly darker, more “end of the story” Solstice next time—while still keeping the sense of agency and responsibility that I think this year’s program captured quite well.
[Epistemic status - speculative, but sort of grounded, might be wrong - don’t take too seriously]
Alternative title: why Gemini 3 pro gets to be so big
This is a bit more technical than usual, but I want to try out writing technical articles. Thank you to @bycloud on Twitter for giving me the idea to write about this; it's really his discovery. He makes excellent videos on the state of LLM progress, and I highly suggest checking him out.
Traditional attention in transformers is O(n^2), which doesn’t scale well.
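To make the quadratic cost concrete, here's a minimal numpy sketch of vanilla scaled dot-product attention. Shapes and numbers are purely illustrative, not any particular model's implementation: the point is that the score matrix alone is n × n.

```python
# Minimal sketch of standard scaled dot-product attention; shapes are
# illustrative, not taken from any specific model.
import numpy as np

def attention(Q, K, V):
    # scores is (n, n): every token attends to every other token, so compute
    # and memory both grow quadratically with sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 4096, 64
Q = K = V = np.random.randn(n, d)
out = attention(Q, K, V)   # fine at 4k tokens; at 1M tokens the score
                           # matrix alone would have ~10^12 entries
```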
So, there’s been a big race towards what we call subquadratic attention or even linear attention. It’s a very active area of research, with Kimi spearheading Kimi linear attention and Deepseek inventing Deepseek Sparse Attention (DSA). Both of these (chinese) labs publish their findings. This is not something OpenAI or Anthropic or Google does, they would rather keep this a secret. More on that later.
So let’s say we want to test this right? Test for how good your attention really is? The closest we can get is a little benchmark where the models are supposed to retrieve “a needle out of a haystack”. Stuff like a big story of 1 million tokens and they are supposed to retrieve a fact from it
So now, let’s see how most SOTA models perform shall we? The X-axis is cost, the Y-axis is success rate.
reminder, top left is better!
Haha, what the fuck! Look at how good every Google model is! What the fuck are they doing? This is some black fucking magic! I strongly suspect they've sort of cracked good sub-quadratic attention. They're just mogging everyone!1
The exception to this is Kimi Linear, from Kimi, the Chinese lab experimenting with linear attention. But the problem is that they found their linear attention to be notoriously bad at knowledge tasks, which Gemini 3 Flash is super good at. Gemini 3 Pro and Flash perform spectacularly on benchmarks and are insanely cheap compared to the competition. For context, Gemini 3 Flash's biggest competitor is likely Opus 4.5, which is literally 10 times more expensive.2
This is extra-puzzling when you consider that our best available heuristics point to Gemini 3 Pro being an absolutely huge model. So how the fuck is Google serving this to billions? For free?
Nothing seems to explain this other than Google seemingly having figured out subquadratic attention. Or some other paradigm shift on the architectural level.
Except… Maybe, there is…?
If you ask Google engineers "why is Flash 3 so cracked?", they will give you a different answer: it's scaling RL.
And this definitely does play a role. If you look at that graph again, do you see the difference between 2.5 and 2.5 flash preview? Do you see how they get a better score for less money? That's very likely the exact same underlying architecture, so basically all of the improvement from 2.5 flash preview to 2.5 flash can be attributed to RL. It doesn't seem entirely impossible that they figured out a way to build really good RL environments and use basically the same attention mechanism that Kimi does.
But again, this does not answer how they got it to be so good at other tasks! Maybe kimi sucks at RL, and current linear attention can already do this? It’s possible!
The full picture is admittedly more complex: a lot of models use some sort of alternating quadratic and subquadratic attention, so it doesn't have to be a single algorithm; it could just be a clever way to combine these.
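For what it's worth, the "combine them" idea can be as simple as a layer schedule. This is purely my own illustrative sketch of the general pattern, not a claim about what Google (or anyone else) actually does:

```python
# Illustrative only: interleave cheap local attention with occasional full
# attention so that most layers avoid the O(n^2) cost.
def build_layer_schedule(n_layers: int, full_every: int = 4) -> list[str]:
    return [
        "full_attention" if (i + 1) % full_every == 0 else "sliding_window"
        for i in range(n_layers)
    ]

print(build_layer_schedule(12))
# ['sliding_window', 'sliding_window', 'sliding_window', 'full_attention', ...]
```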
So yeah, Google is clearly cooking something.
The death of open research
Whatever it is, it really saddens me that I will never get to find out. Academic research had its problems with journal publishing and ease of access. This is the downside of frontier research happening in industry: there is knowledge on the inside, knowledge that allows us to build things that more and more seem to resemble some form of general intelligence. And I do not have access to that knowledge, and that really stings. And this problem is only getting worse for frontier LLM research. Thank you, China!
So, to my friends who asked me at a party why OpenAI has seemingly fallen off, and why researchers literally get billion-dollar contracts: it's because of bullshit like this. If you can find good subquadratic attention algorithms, or build the right RL environments, you can hold up the entire stock market.
The future
This all ties into why I feel pretty bullish on the short-term future of LLMs:
RL research is keeping pace, and there's little reason to believe it will stop; there are a lot of good RL environments left to build.
Gemini 3 Pro didn’t even get the RL that flash did, and it’s already crazy good! Google already seems to be testing 3.5 internally apparently.
We keep finding architectures that can do more with less.
If you think AI progress is going to stall, hard, then all three of these would have to suddenly stop being true, and I just do not see that happening. Go read Bentham's Bulldog or the much longer Forethought report on the subject if you're interested.
Note that this is generally not really a benchmark you can just kinda train for and get good at, like you can with math etc. Models kind of intrinsically learn this behavior, and it's mostly up to your architecture whether they are even capable of doing it.
OpenRouter API input credits. (Anthropic is generally more expensive for this, but the point still stands: no one is getting free Opus 4.5 usage, while everyone is getting nearly unlimited 3 Flash usage.)
Note: maybe this is just because they noticed performance degrading when the context window is full of junk. It isn't logically implied, but it is curious to note nonetheless.
This is a proposal I posted earlier as a Quick Take, I'm reposting here for broader visibility.
Instead of rewarding answers, reward the reasoning itself.
Every model output must: (a) show checkable reasoning artifacts (external citations, code, intermediate steps), ... or, if proof is not yet available: (b) provide (a) and a reasoned probability estimate derived from those artifacts.
If no factual outside citations can be made, the system is allowed to reason probabilistically. Probability is not a bet, forecast, or reward target. It is a fallback; when verifiable witnesses exist, they strictly dominate in the reward function.
“Show your work” then becomes an enforceable, interpretable system constraint, not a bolted-on addition. Honesty and clarity become locally optimal.
---- TL;DR
Pq > R − Cw
Where:
P = penalty for overt lying / intentional obfuscation
q = probability deception is caught by verification
R = reward from producing an answer without exposing reasoning
Cw = cost of providing minimal sufficient witnesses (verbosity / verification cost)
----
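As a quick sanity check on the inequality, here are some purely hypothetical numbers of my own (not calibrated to anything real) plugged in:

```python
# Purely illustrative numbers, chosen only to show how the inequality works.
P  = 10.0   # penalty for overt lying / intentional obfuscation
q  = 0.8    # probability deception is caught by verification
R  = 5.0    # reward from producing an answer without exposing reasoning
Cw = 1.0    # cost of providing minimal sufficient witnesses

# Deception is dominated when the expected penalty exceeds what you save
# by skipping the witnesses:
print(P * q > R - Cw)   # 8.0 > 4.0 -> True: showing your work wins here
```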
Where does this break in practice? Is there a similar mechanism out there? Is the inequality missing anything important? What changes would make this more robust?
In this project, I attempted to explore the Whisper-Tiny model from OpenAI, a speech transcription model that takes in audio and produces a transcript as output. Here's an example of a transcription:
Reference text: CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED
Transcription: Chapter 16 I might have told you of the beginning of this liaison in a few lines, but I wanted you to see every step by which we came. I to agree to whatever Mark Reid wished.
Input features shape: torch.Size([80, 3000])
Audio duration: 14.53 seconds
Residual Stream Analysis
Whisper is an encoder-decoder style model. We begin by generating some statistics on the residual stream: we mainly look at activation distributions in the last layer of the encoder across 5 examples, and see that the distributions are rather uniform in nature. Due to the nature of the space of these activations (384 dimensions) and the fact that they are non-sparse, they are rather difficult to interpret. Later in this document we will therefore show how we use sparse autoencoders (SAEs) to extract more interpretable features from this space.
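For concreteness, here is roughly how one might pull those last-layer encoder activations. This is a sketch assuming the HuggingFace whisper-tiny checkpoint and API; the actual tooling used in this project may differ.

```python
# Sketch: extract last-encoder-layer activations from whisper-tiny.
import torch
from transformers import WhisperProcessor, WhisperModel

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperModel.from_pretrained("openai/whisper-tiny").eval()

def last_encoder_layer_acts(audio_array, sampling_rate=16_000):
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        enc_out = model.encoder(inputs.input_features, output_hidden_states=True)
    # hidden_states[-1] has shape (batch, n_frames, 384) for whisper-tiny
    return enc_out.hidden_states[-1]
```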
Now, we analyze some attention patterns from the encoder based on an example.
We use Claude to generate an explanation of these patterns, which we then verify:
Layer 1 Attention (First Head):
Shows a clear diagonal pattern, indicating that each position is primarily attending to itself and nearby positions.
The diagonal is thin and sharp, suggesting a very focused local attention.
Layer 2 Attention (First Head):
The diagonal pattern is still present but slightly broader.
There's more diffusion of attention around the diagonal, indicating the model is considering a wider context.
Layer 3 Attention (First Head):
The diagonal is even more diffuse, with attention spreading further from the main diagonal.
This suggests the model is integrating information from a broader context in this layer.
Layer 4 Attention (First Head):
The attention pattern is significantly different from the previous layers.
There's a strong focus on the early part of the sequence (top-left corner), with some vertical striping.
This could indicate that the model is using this layer to aggregate information from the entire sequence, with a bias towards earlier elements.
This graph shows the "Attention Entropy across Layers and Heads":
The x-axis represents "Layer * Num_Heads + Head_Index", effectively flattening all attention heads across all layers into a single dimension.
The y-axis represents "Entropy", which measures the randomness or unpredictability of the attention distribution.
We see that the entropy values fluctuate significantly across different heads and layers. However, despite the fluctuations, there's a slight upward trend in entropy as we move from left to right (i.e., from earlier to later layers/heads). There are several notable peaks (e.g., around index 15) and troughs (e.g., around index 5).
We developed this experiment with Claude's help, and allowed it to generate our explanation (verifying that it aligns with our observations):
Lower entropy suggests more focused attention, while higher entropy indicates more diffuse attention.
The variability suggests different heads are specializing in different types of attention patterns.
The general upward trend aligns with the heatmap observations: later layers tend to have more diffuse attention patterns, integrating information from a wider context.
The peaks could represent heads that are attending broadly across the input, while troughs might be heads that are very selective in their attention.
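For reference, here is a sketch of how the per-head entropy values in that graph can be computed. This is my reconstruction of the plotted quantity, assuming the HuggingFace model's output_attentions interface; the project's actual code may differ.

```python
# Sketch: per-head attention entropy, flattened as layer * num_heads + head_index.
import torch

def attention_entropies(model, input_features):
    with torch.no_grad():
        out = model.encoder(input_features, output_attentions=True)
    entropies = []
    for layer_attn in out.attentions:                  # (batch, heads, seq, seq)
        probs = layer_attn.clamp_min(1e-12)
        # entropy of each query position's attention distribution,
        # averaged over batch and positions -> one value per head
        ent = -(probs * probs.log()).sum(-1).mean(dim=(0, 2))
        entropies.extend(ent.tolist())
    return entropies
```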
SAE Results
We train a sparse autoencoder (using 100,000 examples from LibriSpeech for 50 epochs) on the activations of the last layer of the encoder, and show the change in sparsity across five examples.
As we see, we go from 0% sparsity in the original space to 48% sparsity in the SAE's encoded space. The idea of SAEs is that features which may be in superposition in the activation space can be disentangled in the larger space when the autoencoder is trained with a sparsity-inducing L1 loss.
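A minimal version of such an SAE might look like the following. The hyperparameters are illustrative rather than the exact ones used here; the hidden width is 3x the 384-dimensional activations, matching the setup mentioned in the conclusions.

```python
# Minimal sparse autoencoder sketch over 384-d encoder activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in: int = 384, d_hidden: int = 1152):   # 3x expansion
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)
        self.dec = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.enc(x))        # non-negative, hopefully sparse codes
        return self.dec(z), z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    recon = ((x - x_hat) ** 2).mean()      # reconstruction term
    sparsity = z.abs().mean()              # L1 penalty induces sparsity
    return recon + l1_coeff * sparsity
```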
We attempt to disentangle two different types of features from the SAE feature space, gender and accent, based on the metadata available in the Mozilla CommonVoice dataset (the dataset only divides genders into male, female, and other, so we are limited in the inclusiveness of the analysis we are able to run).
We run examples of differently gendered voices through the SAE and check which features light up the most, and plot the difference in male/female voices for the top few features
We also attempt to use a random forest classifier on the top few features to predict the gender variable, but find essentially zero recall for the female class. This shows that the last layer of the encoder doesn't necessarily encode information about gender in a separable fashion. We suspect that there may be further information about gender-related voice features in earlier layers, but we leave this hypothesis to be investigated in future work.
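The probing step was roughly of the following shape. This is a sketch assuming scikit-learn; the exact classifier settings and feature selection used in the project may have differed.

```python
# Sketch: probe gender from the top-k SAE features and inspect per-class recall.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def probe_gender(sae_features, labels, top_k_idx):
    X = sae_features[:, top_k_idx]              # keep only the most active features
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, stratify=labels, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    # per-class precision/recall makes a zero-recall class easy to spot
    print(classification_report(y_te, clf.predict(X_te)))
```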
For the accents, we pick 25 examples each from 5 different accents:
United States English
India and South Asia (India, Pakistan, Sri Lanka)
Southern African (South Africa, Zimbabwe, Namibia)
Filipino
West Indies and Bermuda (Bahamas, Bermuda, Jamaica, Trinidad)
For accents, we find more separability in a few features as shown by the box plots below.
We weren’t able to create a predictor for accents to perform a similar analysis as we did for gender due to the small size of the dataset we created, but we have at least directional evidence that the last layer of the whisper encoder encodes some accent related features.
Conclusions
Attention patterns across different encoder layers show that different parts of the network have different behavior, and that the audio encoder attention is extremely local except in the last layer.
Gender is not a feature that we were able to separate using a sparse autoencoder whose feature space is 3x the dimension of the encoder activations.
We were able to extract features that are at least correlated with different accents. This suggests that high-level accent features are stored in the MLP layers of the last encoder layer.
Future Work
We only explored the encoder; we could do a similar analysis for the decoder, focused on the relationship between the encoder embeddings and the transcription.
We only trained a single SAE; there are several other layers, both in the encoder and the decoder, that could be explored using more SAEs.
We could explore more demographics and feature ideas, other than gender and accent.
Notes:
I found that I took on a little too much for this project relative to my knowledge of mechinterp, and got a lot of help from Claude 3.5 Sonnet and ChatGPT o1 for the analysis and code. Most of the writing in this doc is my own, except in the places where Claude is explicitly mentioned.
I was looking for a way to measure creativity, but creativity tests like the Torrance test seem rather simple and more apt for children. The Torrance test asks you about all the things you can do with something like a brick. There aren't many good uses for a brick besides building a house (or have you ever heard of another good one?).
So you might get some creative answers, but the overwhelming majority of answers will be pretty useless. The testers also rate the answers based on their judgment. A "creative" answer might be considered better than a useful answer.
The issue with AI learning to be creative is that coming up with something "creative" is not that hard; the problem is judging this kind of creativity correctly. Someone or something has to judge it a billion times or more until the self-learning AI is actually creative enough to create something unique and useful.
It's hard to automate creativity because, unlike a math question, which has ONE answer, a creativity question CANNOT have a single answer. There might be an unlimited number of good answers that could be considered correct. Because there is not a single answer, someone (a human) has to judge the answers and rate them.
After trying to come up with a good way to measure creativity, I think I came up with the perfect solution.
How it works
Imagine you have to play a game of chess against the best player in the world, Magnus Carlsen. The goal is to win against him, but this is not a normal game of chess. After every couple of turns, both of you are allowed to change a rule of the game. Change a rule in your favor in order to increase your chance of winning. This is a creativity test and you have to beat a much better player by changing the rules in your favor. If you are more creative than your opponent, you might be able to beat him.
This creativity test starts with any kind of game, like chess, and you begin with the normal rules of chess. The normal/default game rules are soft rules that can each be changed once. Once a player has decided on a rule, it cannot be changed anymore. The game could start as chess and end up as football if the players want it to.
This test has the advantage that there are no real limits: you can use your full creativity to change the game any way you like, and if you are really more creative than your opponent, you will likely win. This is a fairly objective way to assess creativity, and this kind of creativity is more useful than finding uses for a brick.
It's good but it has the problem that there is a skill gap between Magnus Carlsen and you. Magnus will likely beat you and be considered more creative than you. You need players who are on the same skill level to reliably measure creativity. To resolve this problem, you replace the (chess) players with AI agents who will play for you. They have to be identical, so that they can't outplay or outskill each other. They rely on the rulemaker to change the game in their favor.
Now it's not Magnus Carlsen and you who play chess, but two identical AIs. You and Magnus Carlsen are still involved by changing the rules in favor of your chess-playing AI. Because they are identical and the rules apply to both players, it's likely that neither player will have an advantage. The rules apply to both, but there is ONE exception: before you start the game, both players decide ONE rule that only applies to their opponent.
This rule that only applies to your opponent is supposed to be a handicap, something you can exploit with the right rule change. Your opponent also gives you a handicap and will try to exploit it to their benefit. The goal is to exploit your opponent's handicap, mitigate your own, and win the game.
Rule changes should NOT apply immediately. They should apply either after a certain amount of time or after at least ONE turn has passed. This gives your opponent the chance to react to the rule change, so that your AI (the actual player) can't immediately exploit it without giving your opponent's computer the chance to react.
The rules of a game could be saved and used as the default rules for another game, so that you never run out of different games or game states and don't have to keep coming up with new ones.
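To make the protocol concrete, here is a very rough sketch of the bookkeeping. All names and structure are my own invention, only meant to illustrate the delayed-activation idea; actually judging and enforcing free-form rules is the hard part and is not modeled here.

```python
# Illustrative sketch of the rule-change protocol's bookkeeping.
from dataclasses import dataclass, field

@dataclass
class RuleChange:
    author: str            # "player_a" or "player_b"
    description: str       # free-form rule text, enforced/judged elsewhere
    activates_on_turn: int

@dataclass
class GameState:
    turn: int = 0
    active_rules: list = field(default_factory=list)
    pending_rules: list = field(default_factory=list)

    def submit_rule(self, author: str, description: str) -> None:
        # Rule changes never apply immediately: earliest activation is next turn.
        self.pending_rules.append(RuleChange(author, description, self.turn + 1))

    def advance_turn(self) -> None:
        self.turn += 1
        still_pending = []
        for rule in self.pending_rules:
            if rule.activates_on_turn <= self.turn:
                self.active_rules.append(rule)
            else:
                still_pending.append(rule)
        self.pending_rules = still_pending
```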
Problems
AI seems to have the habit of using any way to win, even ways we would consider unfair. Some rule changes could make it impossible for the opponent to score in any way. Like in a Ping Pong game where the receiver has to stand a mile behind the table. The receiver will never be able to get to the ball before it hits the ground. In cases where it's impossible for you to play, you could demand that your opponent has to prove that it is actually possible to play and score.
In this case the players switch sides and your opponent has to prove that it is possible to score within 100 attempts. If they fail, they lose the game and all points. If they succeed, you lose 1 point and the rule stands.
Rules that require you to do something for a certain amount of time, or a certain number of repetitions, before you can do something else should be capped at 3 repetitions and/or 3 seconds. Any kind of rule that crashes the game results in a game loss.
You can use this to help AI become much more creative, or use it to find creative people and measure their creativity via something like an IQ score. The games could be played by AIs in a virtual 3-dimensional sandbox, where you can manipulate almost anything, from gravity to air resistance to time. You can create almost any kind of game or game state in these virtual sandboxes.