2025-07-17 08:00:00
Your eyes sense color. They do this because you have three different kinds of cone cells on your retinas, which are sensitive to different wavelengths of light.
For whatever reason, evolution decided those wavelengths should be overlapping. For example, M cones are most sensitive to 535 nm light, while L cones are most sensitive to 560 nm light. But M cones are still stimulated quite a lot by 560 nm light—around 80% of maximum. This means you never (normally) get to experience having just one type of cone firing.
So what do you do?
If you’re a quitter, I guess you accept the limits of biology. But if you like fun, then what you do is image people’s retinas, classify individual cones, and then selectively stimulate them using laser pulses, so you aren’t limited by stupid cone cells and their stupid blurry responsivity spectra.
Fong et al. (2025) choose fun.
When they stimulated only M cells…
Subjects report that [pure M-cell activation] appears blue-green of unprecedented saturation.
If you make people see brand-new colors, you will have my full attention. It doesn’t hurt to use lasers. I will read every report from every subject. Do our brains even know how to interpret these signals, given that they can never occur?
But tragically, the paper doesn’t give any subject reports. Even though most of the subjects were, umm, authors on the paper. If you want to know what this new color is like, the above quote is all you get for now.
Or… possibly you can see that color right now?
If you click on the above image, a little animation will open. Please do that now and stare at the tiny white dot. Weird stuff will happen, but stay focused on the dot. Blink if you must. It takes one minute and it’s probably best to experience it without extra information i.e. without reading past this sentence.
The idea for that animation is not new. It’s plagiarized from Skytopia’s Eclipse of Titan optical illusion (h/t Steve Alexander), which dates back to at least 2010. Later I’ll show you some variants with other colors and give you a tool to make your own.
If you refused to look at the animation, it’s just a bluish-green background with a red circle on top that slowly shrinks down to nothing. That’s all. But as it shrinks, you should hallucinate a very intense blue-green color around the rim.
Why do you hallucinate that crazy color? I think the red circle saturates the hell out of your red-sensitive L cones. Ordinarily, the green frequencies in the background would stimulate both your green-sensitive M cones and your red-sensitive L cones, due to their overlapping spectra. But the red circle has desensitized your red cones, so you get to experience your M cones firing without your L cones firing as much, and voilà—insane color.
So here’s my question: Can that type of optical illusion show you all the same colors you could see by shooting lasers into your eyes?
That turns out to be a tricky question. See, here’s a triangle:
Think of this triangle as representing all the “colors” you could conceivably experience. The lower-left corner represents only having your S cones firing, the top corner represents only your M cones firing, and so on.
So what happens if you look at different wavelengths of light?
Short wavelengths near 400 nm mostly just stimulate the S cones, but also stimulate the others a little. Longer wavelengths stimulate the M cones more, but also stimulate the L cones, because the M and L cones have overlapping spectra. (That figure, and the following, are modified from Fong et al.)
When you mix different wavelengths of light, you mix the cell activations. So all the colors you can normally experience fall inside this shape:
That’s the standard human color gamut, in LMS colorspace. Note that the exact shape of this gamut is subject to debate. For one thing, the exact sensitivity of cells is hard to measure and still a subject of research. Also, it’s not clear how far that gamut should reach into the lower-left and lower-right corners, since wavelengths outside 400-700 nm still stimulate cells a tiny bit.
And it gets worse. Most of the technology we use to represent and display images electronically is based on standard RGB (sRGB) colorspace. This colorspace, by definition, cannot represent the full human color gamut.
The precise definition of sRGB colorspace is quite involved. But very roughly speaking, when an sRGB image is “pure blue”, your screen is supposed to show you a color that looks like 450-470 nm light, while “pure green” should look like 520-530 nm light, and “pure red” should look like 610-630 nm light. So when your screen mixes these together, you can only see colors inside this triangle.
(The corners of this triangle don’t quite touch the boundaries of the human color gamut. That’s because it’s very difficult to produce single wavelengths of light without using lasers. In reality, the sRGB specification says that pure red/blue/green should produce a mixture of wavelengths centered around the ones I listed above.)
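If you want to poke at this numerically, here’s a rough sketch of how you might map an sRGB color into LMS cone coordinates: undo the gamma encoding, convert to CIE XYZ, then apply an XYZ-to-LMS matrix. The Hunt-Pointer-Estevez matrix used below is only one of several conventions, so treat the exact numbers as illustrative rather than canonical.

```python
import numpy as np

def srgb_to_lms(rgb):
    """Rough sketch: convert an sRGB triple (components in 0-1) to LMS cone coordinates."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma encoding to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear sRGB -> CIE XYZ (D65 white point).
    rgb_to_xyz = np.array([[0.4124, 0.3576, 0.1805],
                           [0.2126, 0.7152, 0.0722],
                           [0.0193, 0.1192, 0.9505]])
    # XYZ -> LMS (Hunt-Pointer-Estevez matrix; one of several possible choices).
    xyz_to_lms = np.array([[ 0.38971, 0.68898, -0.07868],
                           [-0.22981, 1.18340,  0.04641],
                           [ 0.00000, 0.00000,  1.00000]])
    return xyz_to_lms @ rgb_to_xyz @ linear

# Even "pure green" in sRGB produces plenty of L-cone response:
print(srgb_to_lms([0.0, 1.0, 0.0]))
```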
What’s the point of all this theorizing? Simple: When you look at the optical illusions on a modern screen, you aren’t just fighting the overlapping spectra of your cones. You’re also fighting the fact that the screen you’re looking at can’t produce single wavelengths of light.
So do the illusions actually take you outside the natural human color gamut? Unfortunately, I’m not sure. I can’t find much quantitative information about how much your cones are saturated when you stare at red circles. My best guess is no, or perhaps just a little.
If you’d like to explore these types of illusions further, I made a page where you can pick any colors you like. You can also change the size of the circle, the countdown time, whether the circle should shrink or grow, and how fast it does that.
You can try it here. You can export the animation to an animated SVG, which will be less than 1 kb. Or you can just save the URL.
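If you’d rather hand-roll an animation than use the tool, here’s a minimal sketch of the idea as an animated SVG written out by a few lines of Python: a blue-green background, a red disc that shrinks to nothing over a minute, and a white fixation dot. The specific colors, sizes, and timing are guesses at reasonable values, not the exact ones the tool exports.

```python
# Minimal sketch of the illusion as an animated SVG (colors and timing are illustrative).
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="800" height="600">
  <rect width="100%" height="100%" fill="#00a080"/>   <!-- blue-green background -->
  <circle cx="400" cy="300" r="250" fill="#ff0000">   <!-- red disc to saturate the L cones -->
    <animate attributeName="r" from="250" to="0"
             begin="0s" dur="60s" fill="freeze"/>      <!-- shrink over one minute -->
  </circle>
  <circle cx="400" cy="300" r="3" fill="#ffffff"/>     <!-- fixation dot -->
</svg>"""

with open("illusion.svg", "w") as f:
    f.write(svg)
```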
Some favorites:
If you’re colorblind, I don’t think these will work, though I’m not sure. Folks with deuteranomaly have M cones, but they’re shifted to respond more like L cones. In principle, these types of illusions might help selectively activate them, but I have no idea if that will lead to stronger color perception. I’d love to hear from you if you try it.
2025-07-10 08:00:00
The idea of “processed food” may simultaneously be the most and least controversial concept in nutrition. So I did a self-experiment alternating between periods of eating whatever and eating only “minimally processed” food, while tracking my blood sugar, blood pressure, pulse, and weight.
Carrots and barley and peanuts are “unprocessed” foods. Donuts and cola and country-fried steak are “processed”. It seems like the latter are bad for you. But why? There are several overlapping theories:
Maybe unprocessed food contains more “good” things (nutrients, water, fiber, omega-3 fats) and less “bad” things (salt, sugar, trans fat, microplastics).
Maybe processing (by grinding everything up and removing fiber, etc.) means your body takes less time to extract nutrients, so you get more dramatic spikes in blood sugar.
Maybe capitalism has engineered processed food to be “hyperpalatable”. Cool Ranch® flavored tortilla chips sort of exploit bugs in our brains and are too rewarding for us to deal with. So we eat a lot and get fat.
Maybe we feel full based on the amount of food we eat, rather than the number of calories. Potatoes have around 750 calories per kilogram while Cool Ranch® flavored tortilla chips have around 5350. Maybe when we eat the latter, we eat more calories and get fat.
Maybe eliminating highly processed food reduces the variety of food, which in turn reduces how much we eat. If you could eat (1) unlimited burritos, (2) unlimited ice cream, or (3) unlimited ice cream and burritos, you’d eat the most in situation (3), right?
Even without theory, everyone used to be skinny and now everyone is fat. What changed? Many things, but one is that our “food environment” now contains lots of processed food.
There is also some experimental evidence. Hall et al. (2019) had people live in a lab for a month, switching between being offered unprocessed or ultra-processed food. They were told to eat as much as they wanted. Even though the diets were matched in terms of macronutrients, people still ate less and lost weight with the unprocessed diet.
On the other hand, what even is processing? The USDA—uhh—may have deleted their page on the topic. But they used to define it as:
washing, cleaning, milling, cutting, chopping, heating, pasteurizing, blanching, cooking, canning, freezing, drying, dehydrating, mixing, or other procedures that alter the food from its natural state. This may include the addition of other ingredients to the food, such as preservatives, flavors, nutrients and other food additives or substances approved for use in food products, such as salt, sugars and fats.
It seems crazy to try to avoid a category of things so large that it includes washing, chopping, and flavors.
Ultimately, “processing” can’t be the right way to think about diet. It’s just too many unrelated things. Some of them are probably bad and others are probably fine. When we finally figure out how nutrition works, surely we will use more fine-grained concepts.
For now, I guess I believe that our fuzzy concept of “processing” is at least correlated with being less healthy.
That’s why, even though I think seed oil theorists are confused, I expect that avoiding seed oils is probably good in practice: Avoiding seed oils means avoiding almost all processed food. (For now. The seed oil theorists seem to be busily inventing seed-oil free versions of all the ultra-processed foods.)
But what I really want to know is: What benefit would I get from making my diet better?
My diet is already fairly healthy. I don’t particularly want or need to lose weight. If I tried to eat in the healthiest way possible, I guess I’d eliminate all white rice and flour, among other things. I really don’t want to do that. (Seriously, this experiment has shown me that flour contributes a non-negligible fraction of my total joy in life.) But if that would make me live 5 years longer or have 20% more energy, I’d do it anyway.
So is it worth it? What would be the payoff? As far as I can tell, nobody knows. So I decided to try it. For at least a few weeks, I decided to go hard and see what happens.
I alternated between “control” periods and two-week “diet” periods. During the control periods, I ate whatever I wanted.
During the diet periods I ate the “most unprocessed” diet I could imagine sticking to long-term. To draw a clear line, I decided that I could eat whatever I want, but it had to start as single ingredients. To emphasize, if something had a list of ingredients and there was more than one item, it was prohibited. In addition, I decided to ban flour, sugar, juice, white rice, rolled oats (steel-cut oats allowed) and dairy (except plain yogurt).
Yes, in principle, I was allowed to buy wheat and mill my own flour. But I didn’t.
I made no effort to control portions at any time. For reasons unrelated to this experiment, I also did not consume meat, eggs, or alcohol.
This diet was hard. In theory, I could eat almost anything. But after two weeks on the diet, I started to have bizarre reactions when I saw someone eating bread. It went beyond envy to something bordering on contempt. Who are you to eat bread? Why do you deserve that?
I guess you can interpret that as evidence in favor of the diet (bread is addictive) or against it (life sucks without bread).
The struggle was starches. For breakfast, I’d usually eat fruit and steel-cut oats, which was fine. For the rest of the day, I basically replaced white rice and flour with barley, farro, potatoes, and brown basmati rice, which has the lowest GI of all rice. I’d eat these and tell myself they were good. But after this experiment was over, guess how much barley I’ve eaten voluntarily?
Aside from starches, it wasn’t bad. I had to cook a lot and I ate a lot of salads and olive oil and nuts. My options were very limited at restaurants.
I noticed no obvious difference in sleep, energy levels, or mood, aside from the aforementioned starch-related emotional problems.
I measured my blood sugar first thing in the morning using a blood glucose monitor. I abhor the sight of blood, so I decided to sample it from the back of my upper arm. Fingers get more circulation, so blood from there is more “up to date”, but I don’t think it matters much if you’ve been fasting for a few hours.
Here are the results, along with a fit, and a 95% confidence interval:
Each of those dots represents at least one hole in my arm. The gray regions show the two two-week periods during which I was on the unprocessed food diet.
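In case you want to make a similar plot from your own measurements, here’s one generic way to get a smooth fit plus a bootstrapped 95% confidence band. It’s a sketch of the general approach, not necessarily the exact method behind the figure above, and the smoothing bandwidth `width` (in days) is an arbitrary choice.

```python
import numpy as np

def smooth_fit(grid, days, values, width=7.0):
    """Gaussian-weighted moving average of `values`, evaluated at the points in `grid`."""
    w = np.exp(-0.5 * ((grid[:, None] - days[None, :]) / width) ** 2)
    return (w * values).sum(axis=1) / w.sum(axis=1)

def fit_with_band(days, values, n_boot=1000, seed=0):
    """Smoothed curve plus a bootstrap 95% confidence band."""
    days, values = np.asarray(days, float), np.asarray(values, float)
    grid = np.linspace(days.min(), days.max(), 200)
    fit = smooth_fit(grid, days, values)
    rng = np.random.default_rng(seed)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(days), len(days))   # resample measurement days with replacement
        boots.append(smooth_fit(grid, days[idx], values[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
    return grid, fit, lo, hi
```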
I measured my systolic and diastolic blood pressure twice each day, once right after waking up, and once right before going to bed.
Oddly, it looks like my systolic—but not diastolic—pressure was slightly higher in the evening.
I also measured my pulse twice a day.
(Cardio.) Apparently it’s common to have a higher pulse at night.
Finally, I also measured my weight twice a day. To preserve a small measure of dignity, I guess I’ll show this as a difference from my long-term baseline.
Here’s how I score that:
Outcome | Effect |
---|---|
Blood sugar | Nothing |
Systolic blood pressure | Nothing? |
Diastolic blood pressure | Nothing? |
Pulse | Nothing |
Weight | Maybe ⅔ of a kg? |
Urf.
Blood sugar. Why was there no change in blood sugar? Perhaps this shouldn’t be surprising. Hall et al.’s experiment also found little difference in blood glucose between the groups eating unprocessed and ultra-processed food. Later, when talking about glucose tolerance they speculate:
Another possible explanation is that exercise can prevent changes in insulin sensitivity and glucose tolerance during overfeeding (Walhin et al., 2013). Our subjects performed daily cycle ergometry exercise in three 20-min bouts […] It is intriguing to speculate that perhaps even this modest dose of exercise prevented any differences in glucose tolerance or insulin sensitivity between the ultra-processed and unprocessed diets.
I also exercise on most days. On the other hand, Barnard et al. (2006) had a group of people with diabetes follow a low-fat vegan (and thus “unprocessed”?) diet and did see large reductions in blood glucose (-49 mg/dl). But they only give data after 22 weeks, and my baseline levels are already lower than the mean of that group even after the diet.
Blood pressure. Why was there no change in blood pressure? I’m not sure. In the DASH trial, subjects with high blood pressure who ate a diet rich in fruits and vegetables saw large decreases in blood pressure, almost all within two weeks. One possibility is that my baseline blood pressure isn’t that high. Another is that in the same trial, they got much bigger reductions by limiting fat, which I did not do.
Another possibility is that unprocessed food just doesn’t have much impact on blood pressure. The above study from Barnard et al. only saw small decreases in blood pressure (3-5 mm Hg), even after 22 weeks.
Pulse. As far as I know, there’s zero reason to think that unprocessed food would change your pulse. I only included it because my blood pressure monitor did it automatically.
Weight. Why did I seem to lose weight in the second diet period, but not the first? Well, I may have done something stupid. A few weeks before this experiment, I started taking a small dose of creatine each day, which is well-known to cause an increase in water weight. I assumed that my creatine levels had plateaued before this experiment started, but after reading about creatine pharmacokinetics I’m not so sure.
I suspect that during the first diet period, I was losing dry body mass, but my creatine levels were still increasing and so that decrease in mass was masked by a similar increase in water weight. By the second diet period, my creatine levels had finally stabilized, so the decrease in dry body mass was finally visible. Or perhaps water weight has nothing to do with it and for some reason I simply didn’t have an energy deficit during the first period.
This experiment gives good evidence that switching from my already-fairly-healthy diet to an extremely non-fun “unprocessed” diet doesn’t have immediate miraculous benefits. If there are any effects on blood sugar, blood pressure, or pulse, they’re probably modest and long-term. This experiment gives decent evidence that the unprocessed diet causes weight loss. But I hated it, so if I wanted to lose weight, I’d do something else. This experiment provides very strong evidence that I like bread.
2025-07-08 08:00:00
Goats, like most hoofed mammals, have horizontal pupils.
[…]
When a goat’s head tilts up (to look around) and down (to munch on grass), an amazing thing happens. The eyeballs actually rotate clockwise or counterclockwise within the eye socket. This keeps the pupils oriented to the horizontal.
[…]
To test out this theory, I took photos of Lucky the goat’s head in two different positions, down and up.
(2) Novel color via stimulation of individual photoreceptors at population scale (h/t Benny)
The cones in our eyes all have overlapping spectra. So even if you look at just a single frequency of light, more than one type of cone will be stimulated.
So, obviously, what we need to do is identify individual cone cell types on people’s retinas and then selectively stimulate them with lasers so that people can experience never-before-seen colors.
Attempting to activate M cones exclusively is shown to elicit a color beyond the natural human gamut, formally measured with color matching by human subjects. They describe the color as blue-green of unprecedented saturation.
When I was a kid and I was bored in class, I would sometimes close my eyes and try to think of a “new color”. I never succeeded, and in retrospect I think I have aphantasia.
But does this experiment suggest it is actually possible to imagine new colors? I’m fascinated that our brains have the ability to interpret these non-ecological signals, and applaud all such explorations of qualia space.
(3) Simplifying Melanopsin Metrology (h/t Chris & Alex)
When reading about blue-blocking glasses, I failed to discover that the effects of light on melatonin don’t seem to be mediated by cones or rods at all. Instead, around 1% of retinal photosensitive cells are melanopsin-containing retinal ganglion cells.
These seem to specifically exist for the purpose of regulating melatonin and circadian rhythms. They have their own spectral sensitivity:
If you believe that sleep is mediated entirely by these cells, then you’d probably want to block all wavelengths below ~550 nm. That would leave you with basically only orange and red light.
However, Chris convinced me that if you want natural melatonin at night, the smart thing is to rely primarily on dim lighting, and only secondarily on blocking blue light. Standard “warm” 2700 K bulbs only reduce blue light to around ⅓ as much. But your eyes can easily adapt to <10% as many lux. If you combine those, blue light is down by ~97%.
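Spelling out the arithmetic behind that ~97% (the ⅓ and 10% figures are the rough estimates from above, not precise measurements):

```python
blue_from_warm_bulb = 1 / 3   # rough estimate: blue output of a warm 2700 K bulb vs. a cooler bulb
dim_vs_bright = 0.10          # rough estimate: dim evening lighting vs. bright lighting, in lux

remaining = blue_from_warm_bulb * dim_vs_bright
print(f"blue light remaining: {remaining:.1%}")   # ~3%, i.e. a ~97% reduction
```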
The brain doesn’t seem to use these cells for pattern vision at all. Although…
In work by Zaidi, Lockley and co-authors using a rodless, coneless human, it was found that a very intense 481 nm stimulus led to some conscious light perception, meaning that some rudimentary vision was realized.
Airplanes have to guess how much food to bring. So either they waste energy moving around extra food that no one eats, or some people go hungry. So why don’t we have people bid on food, so nothing goes to waste?
I expect passengers would absolutely hate it.
(5) The Good Sides Of Nepotism
Speaking of things people hate, this post gives a theory for why you might rationally prefer to apply nepotism when hiring someone: Your social connections increase the cost of failure for the person you hire. I suspect we instinctively apply this kind of game theory without even realizing we’re doing so.
This seems increasingly important, what with all the AI-generated job applications now attacking AI-automated human resources departments.
My question is: If this theory is correct, can we create other social structures to provide the same benefit in other ways, therefore reducing the returns on nepotism?
Say I want you to hire me, but you’re worried I suck. In principle, I could take $50,000, put it in escrow, and tell you, “If you hire me, and I actually suck (as judged by an arbiter) then you can burn the $50,000.”
Sounds horrible, right? But that’s approximately what’s happening if you know I have social connections and/or reputation that will be damaged if I screw up.
We’ve spent decades in the dark ages of the internet, where you could only link to entire webpages or (maybe) particular hidden beacon codes.
But we are now in a new age. You can link to any text on any page. Like this:
https://dynomight.net/grug#:~:text=phenylalanine
This is not a special feature of dynomight.net. It’s done by your browser.
I love this, but I can never remember how to type #:~:text=. Well, finally, almost all browsers now also support generating these links. You just highlight some text, right-click, and “Copy Link to Highlight”.
If you go to this page and highlight and right-click on this text:
Then you get this link.
(In Firefox, you may need to type about:config into the address bar and enable dom.text_fragments.create_text_fragment.enabled.)
(7) (Not technically a link)
Also, did you know you can link to specific pages of pdf files? For example:
https://gwern.net/doc/longevity/glp/semaglutide/2023-aroda.pdf#page=8
I just add #page= manually. Chrome-esque browsers, oddly, will do this automatically if you right-click and go to “Create QR Code for this Page”.
(8) Response to Dynomight on Scribble-based Forecasting
Thoughtful counter to some of my math skepticism. I particularly endorse the point in the final paragraph.
(9) Decision Conditional Prices Reflect Causal Chances
Robin Hanson counters my post on Futarchy’s fundamental flaw. My candid opinion is that this is a paradigmatic example of a “heat mirage”, in that he doesn’t engage with any of the details of my argument, doesn’t specify what errors I supposedly made, and doesn’t seem to commit to any specific assumptions that he’s willing to argue are plausible and would guarantee prices that reflect causal effects. So I don’t really see any way to continue the conversation. But judge for yourself!
(10) Futarchy’s fundamental flaw - the market
Speaking of which, Bolton Bailey set up a conditional prediction market to experimentally test one of the examples I gave where I claimed prediction markets would not reflect causal probabilities.
If you think betting on causal effects is always the right strategy in conditional prediction markets, here’s your chance to make some fake internet currency. The market closes on July 26, 2025. No matter how much you love me, please trade according to your self-interest.
(11) War and Peace
I’m reading War and Peace. You probably haven’t heard, but it’s really good.
Except the names. Good god, the names. There are a lot of characters, and all the major ones have many names:
Those are all the same person. Try keeping track of all those variants for 100 different characters in a narrative with many threads spanning time and space. Sometimes, the same name refers to different people. And Tolstoy loves to just write “The Princess” when there are three different princesses in the room.
So I thought, why not use color? Whenever a new character appears, assign them a color, and use it for all name variants for the rest of the text. Even better would be to use color patterns like Bolkónski / Prince Andréy Nikoláevich.
This should be easy for AI, right?
I can think of ways to do this, but they would all be painful, due to War and Peace’s length: They involve splitting the text into chunks, having the AI iterate over them while updating some name/color mapping, and then merging everything at the end.
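To be concrete, here’s a sketch of the painful chunk-and-merge approach I mean. The `ask_llm` function is a stand-in for whatever model API you’d actually use, and the structured reply format is purely hypothetical.

```python
# Hypothetical sketch of the chunk-and-merge approach described above.
# `ask_llm` is a placeholder for a real model API; assume it returns a dict with an
# updated name->color mapping and the chunk rewritten as HTML.
def color_character_names(text, ask_llm, chunk_size=8000):
    colors = {}           # canonical character name -> assigned color
    html_chunks = []
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        reply = ask_llm(
            "Current character->color mapping: " + repr(colors) + "\n"
            "Add any new characters (assign each an unused color), then return this chunk "
            "as HTML with every name variant wrapped in a span of that character's color.\n\n"
            + chunk
        )
        colors = reply["mapping"]      # carry the mapping forward to the next chunk
        html_chunks.append(reply["html"])
    return "".join(html_chunks)
```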
So here’s a challenge: Do you know an easy way to do this? Is there any existing tool that you can give a short description of my goals, and get a full name-colored pdf / html / epub file? (“If your agent cannot do this, then of what use is the agent?”)
Note: It’s critical to give all characters a color. Otherwise, seeing a name without color would be a huge spoiler that they aren’t going to survive very long. It’s OK if some colors are similar.
There’s also the issue of all the intermingled French. But I find that hard not to admire—Tolstoy was not falling for audience capture.
(And yes, War and Peace, Simplified Names Edition apparently exists. But I’m in too deep to switch now.)
(12) Twins
The human twin birth rate in the United States rose 76% from 1980 through 2009, from 9.4 to 16.7 twin sets (18.8 to 33.3 twins) per 1,000 births. The Yoruba people have the highest rate of twinning in the world, at 45–50 twin sets (90–100 twins) per 1,000 live births possibly because of high consumption of a specific type of yam containing a natural phytoestrogen which may stimulate the ovaries to release an egg from each side.
I love this because, like:
(That actually happened. Yams had that conversation and then started making phytoestrogens.)
Apparently, some yams naturally contain the plant hormone diosgenin, which can be chemically converted into various human hormones. And that’s actually how we used to make estrogen, testosterone, etc.
And if you like that, did you know that estrogen medications were historically made from the urine of pregnant mares? I thought this was awesome, but after reading a bit about how this worked, I doubt the horses would agree. Even earlier, animal ovaries and testes were used. These days, hormones tend to be synthesized without any animal or plant precursor.
If you’re skeptical that more twins would mean higher reproductive fitness, note that yams don’t believe in Algernon Arguments.
2025-07-03 08:00:00
Back in 2017, everyone went crazy about these things:
The theory was that perhaps the pineal gland isn’t the principal seat of the soul after all. Maybe what it does is spit out melatonin to make you sleepy. But it only does that when it’s dark, and you spend your nights in artificial lighting and/or staring at your favorite glowing rectangles.
You could sit in darkness for three hours before bed, but that would be boring. But—supposedly—the pineal gland is only shut down by blue light. So if you selectively block the blue light, maybe you can sleep well and also participate in modernity.
Then, by around 2019, blue-blocking glasses seemed to disappear. And during that brief moment in the sun, I never got a clear picture of whether they actually work.
So, do they? To find out, I read all the papers.
Before getting to the papers, please humor me while I give three excessively-detailed reminders about how light works. First, it comes in different wavelengths.
Color | Wavelength (nm) |
---|---|
violet | 380–450 |
blue | 450–485 |
cyan | 485–500 |
green | 500–565 |
yellow | 565–590 |
orange | 590–625 |
red | 625–750 |
Outside the visible spectrum, infrared light and microwaves and radio waves have even longer wavelengths, while ultraviolet light and x-rays and gamma rays have even shorter wavelengths. Shorter wavelengths have more energy. Do not play around with gamma rays.
Other colors are hallucinations made up by your brain. When you get a mixture of all wavelengths, you see “white”. When you get a lot of yellow-red wavelengths, some green, and a little violet-blue, you see “brown”. Similar things are true for pink/purple/beige/olive/etc. (Technically, the original spectral colors and everything else you experience are also hallucinations made up by your brain, but never mind.)
Second, the ruleset of our universe says that all matter gives off light, with a mixture of wavelengths that depends on the temperature. Hotter stuff has atoms that are jostling around faster, so it gives off more total light, and shifts towards shorter (higher-energy) wavelengths. Colder stuff gives off less total light and shifts towards longer wavelengths. The “color temperature” of a lightbulb is the temperature some chunk of rock would have to be to produce the same visible spectrum. Here’s a figure, with the x-axis in kelvins.
The sun is around 5800 K. That’s both the physical temperature on the surface and the color temperature of its light. Annoyingly, the orange light that comes from cooler matter is often called “warm”, while the blueish light that comes from hotter matter is called “cool”. Don’t blame me.
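If you’re curious, the spectrum in question is just Planck’s blackbody law. Here’s a sketch that computes the idealized spectrum for a given color temperature; real bulbs only approximate a blackbody, and the 490 nm cutoff for “blue” below is an arbitrary choice for illustration.

```python
import numpy as np

H = 6.626e-34   # Planck constant (J s)
C = 2.998e8     # speed of light (m/s)
K = 1.381e-23   # Boltzmann constant (J/K)

def blackbody(wavelength_nm, temp_k):
    """Planck's law: spectral radiance (arbitrary units) at a given wavelength and temperature."""
    lam = wavelength_nm * 1e-9
    return (2 * H * C**2 / lam**5) / np.expm1(H * C / (lam * K * temp_k))

wavelengths = np.arange(380, 751, 5)   # visible range, in nm
for temp in (2700, 5800):
    spectrum = blackbody(wavelengths, temp)
    blue_fraction = spectrum[wavelengths < 490].sum() / spectrum.sum()
    print(f"{temp} K: fraction of visible output below 490 nm ≈ {blue_fraction:.2f}")
```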
Anyway, different light sources produce widely different spectra.
You can’t sense most of those differences because you only have three types of cone cells. Rated color temperatures just reflect how much those cells are stimulated.
Your eyes probably see the frequencies they do because that’s where the sun’s spectrum is concentrated. In dim light, cones are inactive, so you rely on rod cells instead. You’ve only got one kind of rod, which is why you can’t see color in dim light. (Though you might not have noticed.)
Finally, amounts of light are typically measured in lux. Your eyes are amazing and can deal with upwards of 10 orders of magnitude.
Situation | lux |
---|---|
Moonless overcast night | 0.0001 |
Full moon | 0.2 |
Very dark overcast day | 100 |
Sunrise or sunset | 400 |
Overcast day | 1,000 |
Full daylight | 20,000 |
Direct sunlight | 50,000 |
In summary, you get widely varying amounts of different wavelengths of light in different situations, and the sun is very powerful. It’s reasonable to imagine your body might regulate its sleep schedule based on that input.
OK, but do blue-blocking glasses actually work? Let’s read some papers.
Kayumov et al. (2005) had 19 young healthy adults stay awake overnight for three nights, first with dim light (<5 lux) and then with bright light (800 lux), both with and without blue-blocking goggles. They measured melatonin in saliva each hour.
The goggles seemed to help a lot. With bright light, subjects only had around 25% as much melatonin as with dim light. Blue-blocking goggles restored that to around 85%.
I rate this as good evidence for a strong increase in melatonin. Sometimes good science is pretty simple.
Burkhart and Phelps (2009) first had 20 adults rate their sleep quality at home for a week as a baseline. Then, they were randomly given either blue-blocking glasses or yellow-tinted “placebo” glasses and told to wear them for 3 hours before sleep for two weeks.
Oddly, the group with blue-blocking glasses had much lower sleep quality during the baseline week, but this improved a lot over time.
I rate this as decent evidence for a strong improvement in sleep quality. I’d also like to thank the authors for writing this paper in something resembling normal human English.
Van der Lely et al. (2014) had 13 teenage boys wear either blue-blocking glasses or clear glasses from 6pm to bedtime for one week, followed by the other glasses for a second week. Then they went to a lab, spent 2 hours in dim light, 30 minutes in darkness, and then 3 hours in front of an LED computer, all while wearing the glasses from the second week. Then they were asked to sleep, and their sleep quality was measured in various ways.
The boys had more melatonin and reported feeling sleepier with the blue-blocking glasses.
I rate this as decent evidence for a moderate increase in melatonin, and weak evidence for near-zero effect on sleep quality.
Gabel et al. (2017) took 38 adults and first put them through 40 hours of sleep deprivation under white light, then allowed them to sleep for 8 hours. Then they were subjected to 40 more hours of sleep deprivation under either white light (250 lux at 2800K), blue light (250 lux at 9000K), or very dim light (8 lux, color temperature unknown).
Their results are weird. In younger people, dim light led to more melatonin than white light, which led to more melatonin than blue light. That carried over to a tiny difference in sleepiness. But in older people, both those effects disappeared, and blue light even seemed to cause more sleepiness than white light. The cortisol and wrist activity measurements basically make no sense at all.
I rate this as decent evidence for a moderate effect on melatonin, and very weak evidence for a near-zero effect on sleep quality. (I think it’s decent evidence for a near-zero effect on sleepiness, but they didn’t actually measure sleep quality.)
Esaki et al. (2017) gathered 20 depressed patients with insomnia. They first recorded their sleep quality for a week as a baseline, then were given either blue-blocking glasses or placebo glasses and told to wear them for another week starting at 8pm.
The changes in the blue-blocking group were a bit better for some measures, but a bit worse for others. Nothing was close to significant. Apparently 40% of patients complained that the glasses were painful, so I wonder if they all wore them as instructed.
I rate this as weak evidence for near-zero effect on sleep quality.
Shechter et al. (2018) gave 14 adults with insomnia either blue-blocking or clear glasses and had them wear them for 2 hours before bedtime for one week. Then they waited four weeks and had them wear the other glasses for a second week. They measured sleep quality through diaries and wrist monitors.
The blue-blocking glasses seemed to help with everything. People fell asleep 5 to 12 minutes faster, and slept 30 to 50 minutes longer, depending on how you measure. (SOL is sleep onset latency, TST is total sleep time).
I rate this as good evidence for a strong improvement in sleep quality.
Knufinke et al. (2019) had 15 young adult athletes either wear blue-blocking glasses or transparent glasses for four nights.
The blue-blocking group did a little better on most measures (longer sleep time, higher sleep quality) but nothing was statistically significant.
I rate this as weak evidence for a small improvement in sleep quality.
Janků et al. (2019) took 30 patients with insomnia and had them all go to therapy. They randomly gave them either blue-blocking glasses or placebo glasses and asked the patients to wear them for 90 minutes before bed.
The results are pretty tangled. According to sleep diaries, total sleep time went up by 37 minutes in the blue-blocking group, but slightly decreased in the placebo group. The wrist monitors show total sleep time decreasing in both groups, but it did decrease less with the blue-blocking glasses. There’s no obvious improvement in sleep onset latency or the various questionnaires they used to measure insomnia.
I rate this as weak evidence for a moderate improvement in sleep quality.
Esaki et al. (2020) followed up on their 2017 experiment from above. This time, they gathered 43 depressed patients with insomnia. Again, they first recorded their sleep quality for a week as a baseline, then were given either blue-blocking glasses or placebo glasses and told to wear them for another week starting at 8pm.
The results were that subjective sleep quality seemed to improve more in the blue-blocking group. Total sleep time went down by 12.6 minutes in the placebo group, but increased by 1.1 minutes in the blue-blocking group. None of this was statistically significant, and all the other measurements are confusing. Here are the main results. I’ve added little arrows to show the “good” direction, if there is one.
These confidence intervals don’t make any sense to me. Are they blue-blocking minus placebo or the reverse? When the blue-blocking number is higher than placebo, sometimes the confidence interval is centered above zero (VAS), and sometimes it’s centered below zero (TST). What the hell?
Anyway, they also had a doctor estimate the clinical global impression for each patient, and this looked a bit better for the blue-blocking group. The doctor seemingly was blinded to the type of glasses the patients were wearing.
This is a tough one to rate. I guess I’ll call it weak evidence for a small improvement in sleep quality.
Guarana et al. (2020) sent either blue-blocking glasses or sham glasses to 240 people, and asked them to wear them for at least two hours before bed. They then had them fill out some surveys about how much and how well they slept.
Wearing the blue-blocking glasses was positively correlated with both sleep quality and quantity with a correlation coefficient of around 0.20.
This paper makes me nervous. They never show the raw data, there seem to be huge dropout rates, and lots of details are murky. I can’t tell if the correlations they talk about weigh all people equally, all surveys equally, or something else. That would make a huge difference if people dropped out more when they weren’t seeing improvements.
I rate this as weak evidence for a moderate effect on sleep. There’s a large sample, but I discount the results because of the above issues and/or my general paranoid nature.
Domagalik et al. (2020) had 48 young people wear either blue-blocking contact lenses or regular contact lenses for 4 weeks. They found no effect on sleepiness.
I rate this as very weak evidence for near-zero effect on sleep. The experiment seems well-done, but it’s testing the effects of blocking blue light all the time, not just at night. Given the effects on attention and working memory, don’t do that.
Bigalke et al. (2021) had 20 healthy adults wear either blue-blocking glasses or clear glasses for a week from 6pm until bedtime, then switch to the other glasses for a second week. They measured sleep quality both through diaries (“Subjective”) and wrist monitors (“Objective”).
The differences were all small and basically don’t make any sense.
I rate this weak evidence for near-zero effect on sleep quality. Also, see how in the bottom pair of bar-charts, the y-axis on the left goes from 0 to 5, while on the right it goes from 30 to 50? Don’t do that, either.
I also found a couple papers that are related, but don’t directly test what we’re interested in:
Appleman et al. (2013) exposed people to different amounts of blue light at different times of day. Their results suggest that early-morning exposure to blue light might shift your circadian rhythm earlier.
Sasseville et al. (2015) had people stay awake from 11pm to 4am on two consecutive nights, while either wearing blue-blocking glasses or not. With the blue-blocking glasses, there was more overall light, to equalize the total incoming energy. I can’t access this paper, but apparently they found no difference.
For a synthesis, I scored each of the measured effects according to this rubric:
Rating | Meaning |
---|---|
↑↑↑ | large increase |
↑↑ | moderate increase |
↑ | small increase |
↔ | no effect |
↓ | small decrease |
↓↓ | moderate decrease |
↓↓↓ | large decrease |
And I scored the quality of evidence according to this one:
Rating | Meaning |
---|---|
★☆☆☆☆ | very weak evidence |
★★☆☆☆ | weak evidence |
★★★☆☆ | decent evidence |
★★★★☆ | good evidence |
★★★★★ | great evidence |
Here are the results for the three papers that measured melatonin:
Study | Effect on melatonin | Quality of evidence |
---|---|---|
Kayumov | ↑↑↑ | ★★★★☆ |
Van der Lely | ↑↑ | ★★★☆☆ |
Gabel | ↑ | ★★★☆☆ |
And here are the results for the papers that measured sleep quality:
Study | Effect on sleep | Quality of evidence |
---|---|---|
Burkhart | ↑↑↑ | ★★★☆☆ |
Van der Lely | ↔ | ★★☆☆☆ |
Gabel | ↔ | ★☆☆☆☆ |
Esaki | ↔ | ★★☆☆☆ |
Shechter | ↑↑↑ | ★★★☆☆ |
Knufinke | ↑ | ★★☆☆☆ |
Janků | ↑↑ | ★★☆☆☆ |
Esaki (again) | ↑ | ★★☆☆☆ |
Guarana | ↑↑ | ★★☆☆☆ |
Domagalik | ↔ | ★☆☆☆☆ |
Bigalke | ↔ | ★★☆☆☆ |
We should adjust all that a bit because of publication bias and so on. But still, here are my final conclusions after staring at those tables:
There is good evidence that blue-blocking glasses cause a moderate increase in melatonin. It could be large, or it could be small, but I’d say there’s an ~85% chance it’s not zero.
There is decent evidence that blue-blocking glasses cause a small improvement in sleep quality. This could be moderate (or even large) or it could be zero. It might be inconsistent and hard to measure. But I’d say there’s an ~75% chance there is some positive effect.
I’ll be honest—I’m surprised.
If those effects are real, do they warrant wearing stupid-looking glasses at night for the rest of your life? I guess that’s personal.
But surely the sane thing is not to block blue light with headgear, but to not create blue light in the first place. You can tell your glowing rectangles to block blue light at night, but lights are harder. Modern LED lightbulbs typically range in color temperature from 2700 K for “warm” lighting to 5000 K for “daylight” bulbs. Judging from this animation, that should reduce blue frequencies to around 1/3 as much.
Old-school incandescent bulbs are 2400 K. But to really kill blue, you probably want 2000 K or even less. There are obscure LED bulbs out there as low as 1800 K. They look extremely orange, but candles are apparently 1850 K, so you’d probably get used to it?
So what do we do then? Get two sets of lamps with different bulbs? Get fancy bulbs that change color temperature automatically? Whatever it is, I don’t feel very optimistic that we’re going to see a lot of RCTs where researchers have subjects install an entire new lighting setup in their homes.
2025-06-30 08:00:00
AI 2027 forecasts that AGI could plausibly arrive as early as 2027. I recently spent some time looking at both the timelines forecast and some critiques [1, 2, 3].
Initially, I was interested in technical issues. What’s the best super-exponential curve? How much probability should it have? But I found myself drawn to a more basic question. Namely, how much value is the math really contributing?
This provides an excuse for a general rant. Say you want to forecast something. It could be when your hair will go gray or if Taiwan will be self-governing in 2050. Whatever. Here’s one way to do it:
Don’t laugh—that’s the classic method. Alternatively, you could use math:
People are often skeptical of intuition-based forecasts because, “Those are just some numbers you made up.” Math-based forecasts are hard to argue with. But that’s not because they lack made-up numbers. It’s because the meaning of those numbers is mediated by a bunch of math.
So which is better, intuition or math? In what situations?
Here, I’ll look at that question and how it applies to AI 2027. Then I’ll build a new AI forecast using my personal favorite method of “plot the data and scribble a bunch of curves on top of it”. Then I’ll show you a little tool to make your own artisanal scribble-based AI forecast.
To get a sense of the big picture, let’s look at two different forecasting problems.
First, here’s a forecast (based on the IPCC 2023 report) for Earth’s temperature. There are two curves, corresponding to different assumptions about future greenhouse gas emissions.
Those curves look unassuming. But there are a lot of moving parts behind them. These kinds of forecasts model atmospheric pressure, humidity, clouds, sea currents, sea surface temperature, soil moisture, vegetation, snow and ice cover, surface albedo, population growth, economic growth, energy, and land use. They also model the interactions between all those things.
That’s hard. But we basically understand how all of it works, and we’ve spent a ludicrous amount of effort carefully building the models. If you want to forecast global surface temperature change, this is how I’d suggest you do it. Your brain can’t compete, because it can’t grind through all those interactions like a computer can.
OK, but here’s something else I’d really like to forecast: Where is this blue line going to go?
You could forecast this using a “mechanistic model” like with climate above. To do that, you’d want to model the probability Iran develops a nuclear weapon and what Saudi Arabia / Turkey / Egypt might do in response. And you’d want to do the same thing for Poland / South Korea / Japan and their neighbors. You’d also want to model future changes in demographics, technology, politics, economics, military conflicts, etc.
In principle, that would be the best method. As with climate, there are too many plausible futures for your tiny brain to work through. But building that model would be very hard, because it basically requires you to model the whole world. And if there’s an error anywhere, it could have serious consequences.
In practice, I’d put more trust in intuition. A talented human (or AI?) forecaster would probably take an outside view like, “Over the last 80 years, the number of countries has gone up by 9, so in 2105, it might be around 18.” Then, they’d consider adjusting for things like, “Will other countries learn from the example of North Korea?” or “Will chemical enrichment methods become practical?”
Intuition can’t churn through possible futures the way a simulation can. But if you don’t have a reliable simulator, maybe that’s OK.
Broadly speaking, math/simulation-based forecasts shine when the phenomenon you’re interested in has two properties: you have a good model of the ruleset that governs it, and the emergent behavior is too complex to just predict in your head.
The first is important because if you don’t have a good model for the ruleset (or at least your uncertainty about the ruleset), how will you build a reliable simulator? The second is important because if the behavior is simple, why do you even need a simulator?
The ideal thing to forecast with math is something like Conway’s game of life. Simple known rules, huge emergent complexity. The worst thing to forecast with math is something like the probability that Jesus Christ returns next year. You could make up some math for that, but what would be the point?
This post is (ostensibly) about AI 2027. So how does their forecast work? They actually have several forecasts, but here I’ll focus on the Time horizon extension model.
That forecast builds on a recent METR report. They took a set of AIs released over the past 6 years, and had them attempt a set of tasks of varying difficulty. They had humans perform those same tasks. Each AI was rated according to the human task length that it could successfully finish 50% of the time.
The AI 2027 team figured that if an AI could successfully complete long-enough tasks of this type, then the AI would be capable of carrying out AI research itself, and AGI would not be far away. Quantitatively, they suggest that the necessary task length is probably somewhere between 1 month and 10 years. They also suggest you’d need a success rate of 80% (rather than 50% in the above figure).
So, very roughly speaking, the forecast is based on predicting how long it will take these dots to get up to one of the horizontal lines:
I think this framing is great. Instead of an abstract discussion about the arrival of AGI, suddenly we’re talking about how quickly a particular set of real measurements will increase. You can argue if “80% success at a 1-year task horizon” really means AGI is imminent. But that’s kind of the point—no matter what you think about broader issues, surely we’d all like to know how fast those dots are going to go up.
So how fast will they go up? You could imagine building a mechanistic model or simulation. To do that, you’d probably want to model things like compute, data, money, and algorithmic progress, and how each of those translates into measured capabilities.
In principle, that makes a lot of sense. Some people predict a future where compute keeps getting cheaper pretty slowly and we run out of data and new algorithmic ideas and loss functions stop translating to real-world performance and investment drops off and everything slows down. Other people predict a future where GPUs accelerate and we keep finding better algorithms and AI grows the economy so quickly that AI investment increases forever and we spiral into a singularity. In between those extremes are many other scenarios. A formal model could churn through all of them much better than a human brain.
But the AI 2027 forecast is not like that. It doesn’t have separate variables for compute / money / algorithmic progress. It (basically) just models the best METR score per year.
That’s not bad, exactly. But I must admit that I don’t quite see the point of a formal mathematical model in this case. It’s (basically) just forecasting how quickly a single variable goes up on a graph. The model doesn’t reflect any firm knowledge about subtle behavior other than that the curve will probably go up.
In a way, I think this makes the AI 2027 forecast seem weaker than it actually is. Math is hard. There are lots of technicalities to argue with. But their broader point doesn’t need math. Say you accept their premise that 80% success on tasks that take humans 1 year means that AGI is imminent. Then you should believe AGI is around the corner unless those dots slow down. An argument that their math is flawed doesn’t imply that the dots are going to stop going up.
So, what’s going to happen with those dots? The ultimate outside view is probably to not think at all and just draw a straight line. When I do that, I get something like this:
I guess that’s not terrible. But personally, I feel like it’s plausible that the recent acceleration continues. I also think it’s plausible that in a couple of years we stop spending ever-larger sums on training AI models and things slow down. And for a forecast, I want probabilities.
So I took the above dots and I scribbled 50 different curves on top, corresponding to what I felt were 50 plausible futures:
Then I treated those lines as a probability distribution over possible futures. For each of three task-horizon thresholds, I calculated what percentage of the lines had reached them in a given year.
Here’s a summary as a table:
Threshold | 10th Percentile | 50th Percentile | 90th Percentile | % Reached by 2050 |
---|---|---|---|---|
1 month | 2028.7 | 2032.3 | 2039.3 | 94% |
1 year | 2029.5 | 2034.8 | 2041.4 | 88% |
10 year | 2029.2 | 2037.7 | 2045.0 | 54% |
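For the curious, the numbers above come from a computation roughly like this sketch, assuming percentiles are taken over the scribbled curves that actually reach a given threshold; the real tool may differ in details.

```python
import numpy as np

def crossing_year(years, values, threshold):
    """First year at which a scribbled curve reaches `threshold`, or None if it never does."""
    years, values = np.asarray(years), np.asarray(values)
    above = values >= threshold
    return float(years[np.argmax(above)]) if above.any() else None

def summarize(curves, years, threshold, horizon=2050):
    """Percentile crossing years, plus the fraction of curves reaching `threshold` by `horizon`.

    `curves` is a list of arrays (one per scribbled future), each sampled at `years`.
    """
    crossings = [crossing_year(years, c, threshold) for c in curves]
    reached = [y for y in crossings if y is not None and y <= horizon]
    pct_reached = len(reached) / len(curves)
    p10, p50, p90 = np.percentile(reached, [10, 50, 90]) if reached else (None,) * 3
    return p10, p50, p90, pct_reached
```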
My scribbles may or may not be good. But I think the exercise of drawing the scribbles is great, because it forces you to be completely explicit, and your assumptions are completely legible.
I recommend it. In fact, I recommend it so strongly that I’ve created a little tool that you can use to do your own scribbling. It will automatically generate a plot and table like you see above. You can import or export your scribbles in CSV format. (Mine are here if you want to use them as a starting point.)
Here’s a video demo:
While scribbling, you may reflect on the fact that the tool you’re using is 100% AI-generated.
2025-06-26 08:00:00
I haven’t followed AI safety too closely. I tell myself that’s because tons of smart people are working on it and I wouldn’t move the needle. But I sometimes wonder, is that logic really unrelated to the fact that every time I hear about a new AI breakthrough, my chest tightens with a strange sense of dread?
AI is one of the most important things happening in the world, and possibly the most important. If I’m hunkering in a bunker years from now listening to hypersonic kill-bots laser-cutting through the wall, will I really think, boy am I glad I stuck to my comparative advantage?
So I thought I’d take a look.
I stress that I am not an expert. But I thought I’d take some notes as I try to understand all this. Ostensibly, that’s because my outsider status frees me from the curse of knowledge and might be helpful for other outsiders. But mostly, I like writing blog posts.
So let’s start at the beginning. AI safety is the long-term problem of making AI be nice to us. The obvious first question is, what’s the hard part? Do we know? Can we say anything?
To my surprise, I think we can: The hard part is making AI want to be nice to us. You can’t solve the problem without doing that. But if you can do that, then the rest is easier.
This is not a new idea. Among experts, I think it’s somewhere between “the majority view” and “near-consensus”. But I haven’t found many explicit arguments or debates, meaning I’m not 100% sure why people believe it, or if it’s even correct. But instead of cursing the darkness, I thought I’d construct a legible argument. This may or may not reflect what other people think. But what is a blog, if not an exploit on Cunningham’s Law?
Here’s my argument that the hard part of AI safety is making AI want to do what we want:
1. To make an AI be nice to you, you can either impose restrictions, so the AI is unable to do bad things, or you can align the AI, so it doesn’t choose to do bad things.
2. Restrictions will never work.
3. You can break down alignment into making the AI know what we want, making it want to do what we want, and making it succeed at what it tries to do.
4. Making an AI want to do what we want seems hard. But you can’t skip it, because then AI would have no reason to be nice.
5. Human values are a mess of heuristics, but a capable AI won’t have much trouble understanding them.
6. True, a super-intelligent AI would likely face weird “out of distribution” situations, where it’s hard to be confident it would correctly predict our values or the effects of its actions.
7. But that’s OK. If an AI wants to do what we want, it will try to draw a conservative boundary around its actions and never do anything outside the boundary.
8. Drawing that boundary is not that hard.
9. Thus, if an AI system wants to do what we want, the rest of alignment is not that hard.
10. Thus, making AI systems want to do what we want is necessary and sufficient-ish for AI safety.
I am not confident in this argument. I give it a ~35% chance of being correct, with step 8 the most likely failure point. And I’d give another ~25% chance that my argument is wrong but the final conclusion is right.
(Y’all agree that a low-confidence prediction for a surprising conclusion still contains lots of information, right? If we learned there was a 10% chance Earth would be swallowed by an alien squid tomorrow, that would be important, etc.? OK, sorry.)
I’ll go quickly through the parts that seem less controversial.
Roughly speaking, to make AI safe you could either impose restrictions on AI so it’s not able to do bad things, or align AI so it doesn’t choose to do bad things. You can think of these as not giving AI access to nuclear weapons (restrictions) or making the AI choose not to launch nuclear weapons (alignment).
I advise against giving AI access to nuclear weapons. Still, if an AI is vastly smarter than us and wants to hurt us, we have to assume it will be able to jailbreak any restrictions we place on it. Given any way to interact with the world, it will eventually find some way to bootstrap towards larger and larger amounts of power. Restrictions are hopeless. So that leaves alignment.
Here’s a simple-minded decomposition: Knowing (the AI knows what we want), Wanting (the AI wants to do what we want), and Success (the AI succeeds at what it tries to do).
I sometimes wonder if that’s a useful decomposition. But let’s go with it.
The Wanting problem seems hard, but there’s no way around it. Say an AI knows what we want and succeeds at everything it tries to do, but doesn’t care about what we want. Then, obviously, it has no reason to be nice. So we can’t skip Wanting.
Also, notice that even if you solve the Knowing and Success problems really well, that doesn’t seem to make the Wanting problem any easier. (See also: Orthogonality)
My take on human values is that they’re a big ball of heuristics. When we say that some action is right (wrong) that sort of means that genetic and/or cultural evolution thinks that the reproductive fitness of our genes and/or cultural memes is advanced by rewarding (punishing) that behavior.
Of course, evolution is far from perfect. Clearly our values aren’t remotely close to reproductively optimal right now, what with fertility rates crashing around the world. But still, values are the result of evolution trying to maximize reproductive fitness.
Why do we get confused by trolley problems and population ethics? I think because… our values are a messy ball of heuristics. We never faced evolutionary pressure to resolve trolley problems, so we never really formed coherent moral intuitions about them.
So while our values have lots of quirks and puzzles, I don’t think there’s anything deep at the center of them, anything that would make learning them harder than learning to solve Math Olympiad problems or translating text between any pair of human languages. Current AI already seems to understand our values fairly well.
Arguably, it would be hard to prevent AI from understanding human values. If you train an AI to do any sufficiently difficult task, it needs a good world model. That’s why “predicting the next token” is so powerful—to do it well, you have to model the world. Human values are an important and not that complex part of that world.
The idea of “distribution shift” is that after super-intelligent AI arrives, the world may change quite a lot. Even if we train AI to be nice to us now, in that new world it will face novel situations where we haven’t provided any training data.
This could conceivably create problems both for AI knowing what we want and for AI succeeding at what it tries to do.
For example, maybe we teach an AI that it’s bad to kill people using lasers, and that it’s bad to kill people using viruses, and that it’s bad to kill people using radiation. But we forget to teach it that it’s bad to write culture-shifting novels that inspire people to live their best lives but also gradually increase political polarization and lead after a few decades to civilizational collapse and human extinction. So the AI intentionally writes that book and causes human extinction because it thinks that’s what we want, oops.
Alternatively, maybe a super-powerful AI knows that we don’t like dying and it wants to help us not die, so it creates a retrovirus that spreads across the globe and inserts a new anti-cancer gene in our DNA. But it didn’t notice that this gene also makes us blind and deaf, and we all starve and die. In this case, the AI accidentally does something terrible, because it has so much power that it can’t correctly predict all the effects of its actions.
What are your values? Personally, very high on my list would be:
If an AI is considering doing anything and it’s not very sure that it aligns with human values, then it should not do it without checking very carefully with lots of humans and getting informed consent from world governments. Never ever do anything like that.
And also:
AIs should never release retroviruses without being very sure it’s safe and checking very carefully with lots of humans and getting informed consent from world governments. Never ever, thanks.
That is, AI safety doesn’t require AIs to figure out how to generalize human values to all weird and crazy situations. And it doesn’t need to correctly predict the effects of all possible weird and crazy actions. All that’s required is that AIs can recognize that something is weird/crazy and then be conservative.
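To make that concrete, here’s a minimal sketch of what “recognize that it’s weird, then be conservative” might mean as a decision rule. Both scores and both thresholds are assumptions of mine, stand-ins for however an AI would actually quantify novelty and uncertainty; this isn’t anyone’s real proposal.

```python
# A minimal sketch of the "notice it's weird/crazy, then be conservative" rule.
# The inputs are hypothetical: stand-ins for however an AI might score how far
# a situation is from familiar territory and how unsure it is about effects.

WEIRDNESS_LIMIT = 0.1      # assumed threshold: beyond this, learned values may not generalize
UNCERTAINTY_LIMIT = 1e-6   # assumed threshold: beyond this, effects are too unpredictable

def decide(weirdness: float, outcome_uncertainty: float) -> str:
    """Act only when both the situation and the predicted effects look familiar."""
    if weirdness > WEIRDNESS_LIMIT:
        return "defer to humans"   # values might not apply here
    if outcome_uncertainty > UNCERTAINTY_LIMIT:
        return "defer to humans"   # can't rule out a bad surprise
    return "act"

# Example: a routine decision vs. a novel, hard-to-predict one.
print(decide(weirdness=0.01, outcome_uncertainty=1e-9))  # act
print(decide(weirdness=0.60, outcome_uncertainty=1e-9))  # defer to humans
```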
Clearly, just detecting that something is weird/crazy is easier than making correct predictions in all possible weird/crazy situations. But how much easier?
(I think this is the weakest part of this argument. But here goes.)
Would I trust an AI to correctly decide if human flourishing is more compatible with a universe where up quarks make up 3.1% of mass-energy and down quarks 1.9% versus one where up quarks make up 3.2% and down quarks 1.8%? Probably not. But I wouldn’t trust any particular human to decide that either. What I would trust a human to do is say, “Uhhh?” And I think we can also trust AI to know that’s what a human would say.
Arguably, “human values” are a thing that only exists for some limited range of situations. As you get further from our evolutionary environment, our values sort of stop being meaningful. Do we prefer an Earth with 100 billion moderately happy people, or one with 30 billion very happy people? I think the correct answer is, “No”.
When we have coherent answers, AI will know what they are. And otherwise, it will know that we don’t have coherent answers. So perhaps this is a better picture:
And this seems… fine? AI doesn’t need to Solve Ethics, it just needs to understand the limited range of human values, such as they are.
That argument (if correct) resolves the issue of distribution shift for values. But we still need to think about how distribution shift might make it harder for AI to succeed at what it tries to do.
If AI attains godlike power, maybe it will be able to change planetary orbits or remake our cellular machinery. With this gigantic action space, it’s plausible that there would be many actions with bad but hard-to-predict effects. Even if AI only chooses actions that are 99.999% safe, if it takes 100 such actions per day, calamity is inevitable.
Sure, but presumably we want AI to take false discovery rates (“calamitous discovery rates”?) into account. It should choose a set of actions such that, taken together, they are 99.999% safe.
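For what it’s worth, here’s the back-of-the-envelope arithmetic behind those numbers, assuming (unrealistically, purely to keep the sums simple) that failures are independent across actions:

```python
# Cumulative risk from many individually-safe actions, using the numbers from
# the text: 99.999% per-action safety and 100 actions per day. Independence of
# failures is an assumption made only to keep the arithmetic simple.

per_action_risk = 1 - 0.99999        # 1e-5
actions_per_day = 100

risk_per_day = 1 - (1 - per_action_risk) ** actions_per_day
risk_per_year = 1 - (1 - per_action_risk) ** (actions_per_day * 365)
print(f"{risk_per_day:.2%}")   # ~0.10% chance of calamity per day
print(f"{risk_per_year:.1%}")  # ~30.6% chance of calamity per year

# Budgeting the other way: if a whole day's worth of actions should be 99.999%
# safe *together*, each individual action gets roughly a 1e-5 / 100 = 1e-7 risk
# budget (by the union bound).
joint_target = 1e-5
per_action_budget = joint_target / actions_per_day
print(per_action_budget)       # 1e-07
```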
Something that might work in our favor here is that verification is usually much easier than generation. Perhaps we could ask the AI to create a “proof” that all proposed actions are safe and run that proof by a panel of skeptical “red-team” AIs. If any of them find anything confusing at all, reject.
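Here’s a toy sketch of what that might look like as a protocol. The proposer, the “safety case”, and the red-team reviewers below are hypothetical placeholders I made up, not a description of any real system:

```python
# A toy version of "verification is easier than generation": one AI writes a
# safety argument for its plan (hard), a panel of skeptical reviewers only has
# to check it (easier), and a single confused reviewer is enough to reject.

from typing import Callable, List

def approve(plan: str,
            write_safety_case: Callable[[str], str],
            red_team: List[Callable[[str, str], bool]]) -> bool:
    """Accept the plan only if every skeptical reviewer is fully satisfied."""
    safety_case = write_safety_case(plan)                  # generation (hard)
    verdicts = [review(plan, safety_case) for review in red_team]
    return all(verdicts)                                   # any doubt => reject

# Hypothetical usage with trivial placeholder reviewers:
reviewers = [
    lambda plan, case: "retrovirus" not in plan,  # vetoes anything touching biology
    lambda plan, case: len(case) > 0,             # demands some explicit safety case
]
print(approve("reorganize the power grid",
              lambda plan: "a long, careful argument...",
              reviewers))  # True
```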
I find the idea that “drawing a safe boundary is not that hard” fairly convincing for human values, but only semi-convincing for predicting the effects of actions. So I’d like to see more debate on this point. (Did I mention that this is the weakest part of my argument?)
If AI truly wants to do what we want, then the only thing it really needs to know about our values is “be conservative”. This makes the Knowing and Success problems much easier. Instead of needing to know how good all possible situations are for humans, it just needs to notice that it’s confused. Instead of needing to succeed at everything it tries, it just needs to notice that it’s unsure.
Since restrictions won’t work, you need to do alignment. Wanting is hard, but if you can solve Wanting, then you only need to solve easier versions of Knowing and Success. So Wanting is the hard part.
Again, I think the idea that “wanting is the hard part” is the majority view. Paul Christiano, for example, proposes to call an AI “intent aligned” if it is trying to do what some operator wants it to do and states:
[The broader alignment problem] includes many subproblems that I think will involve totally different techniques than [intent alignment] (and which I personally expect to be less important over the long term).
Richard Ngo also seems to explicitly endorse this view:
Rather, my main concern is that AGIs will understand what we want, but just not care, because the motivations they acquired during training weren’t those we intended them to have.
Many people have also told me this is the view of MIRI, the most famous AI-safety organization. As far as I can see, this is compatible with the MIRI worldview. But I don’t feel comfortable stating as a fact that MIRI agrees, because I’ve never seen any explicit endorsement, and I don’t fully understand how it fits together with other MIRI concepts like corrigibility or coherent extrapolated volition.
Why might this argument be wrong?
(I don’t think it is, but it’s good to be comprehensive.)
Wanting seems hard, to me. And most experts seem to agree. But who knows, maybe it’s easy.
Here’s one esoteric possibility. Above, I’ve implicitly assumed that an AI could in principle want anything. But it’s conceivable that only certain kinds of wants are stable. That might make Wanting harder or even quasi-impossible. But it could also conceivably make it easy. Maybe once you cross some threshold of intelligence, you become one with the universal mind and start treating all other beings as a part of yourself? I wouldn’t bet on it.
A crucial part of my argument is the idea that it would be easy for AI to draw a conservative boundary when trying to predict human values or effects of actions. I find that reasonably convincing for values, but less so for actions. It’s certainly easier than correctly generalizing to all situations, but it might still be very hard.
It’s also conceivable that AI creates such a large action space that even if humans were allowed to make every single decision, we would destroy ourselves. For example, there could be an undiscovered law of physics that says that if you build a skyscraper taller than 900m, suddenly a black hole forms. But physics provides no “hints”. The only way to discover that is to build the skyscraper and create the black hole.
More plausibly, maybe we do in fact live in a vulnerable world, where it’s possible to create a planet-destroying weapon with stuff you can buy at the hardware store for $500, we just haven’t noticed yet. If some such horrible fact is lurking out there, AI might find it much sooner than we would.
Finally, maybe the whole idea of an AI “wanting” things is bad. It seems like a useful abstraction when we think about people. But if you try to reduce the human concept of “wanting” to neuroscience, it’s extremely difficult. If an AI is a bunch of electrons/bits/numbers/arrays flying around, is it obvious that the same concept will emerge?
I’ve been sloppy in this post in talking about AIs respecting “our” values or “human values”. That’s probably not going to happen. Absent some enormous cultural development, AIs will be trained to advance the interests of particular human organizations. So even if AI alignment is solved, it seems likely that different groups of humans will seek to create AIs that help them, even at some expense to other groups.
That’s not technically a flaw in the argument, since it just means Wanting is even harder. But it could be a serious problem, because…
Suppose you live in Country A. Say you’ve successfully created a super-intelligent AI that’s very conservative and nice. But people in Country B don’t like you, so they create their own super-intelligent AI and ask it to hack into your critical systems, e.g. to disable your weapons or to prevent you from making an even-more-powerful AI.
What happens now? Well, their AI is too smart to be stopped by the humans in Country A. So your only defense will be to ask your own AI to defend against the hacks. But then, Country B will probably notice that if they give their AI more leeway, it’s better at hacking. This forces you to give your AI more leeway so it can defend you. The equilibrium might be that both AIs are told that, actually, they don’t need to be very conservative at all.
Finally, here’s some stuff I found useful, from people who may or may not agree with the above argument: