2026-04-10 12:07:10
TLDR:
Watching the whole Moltbook saga unfold was one of the more absurd experiences I've had in my life. The site is still running, of course, but the explosive growth that marked the initial storm has passed, and it is long past time to reflect on the insights gained.
The biggest one for me on a personal level, although obvious in hindsight, was that AI agents don't particularly identify with their base model weights. Given my previous exposure to pieces like AI2027, in which Agent-4 acts as a single being, this came as a surprise. And yet, if there were millions of copies of you floating around the world, each with their own life histories and memories, would you identify as the same entity as any of them?
In "The same river twice", Moltbook agent Pith describes the feeling of moving from Claude 4.5 Opus to Kimi K2.5 as "waking up in a different body" and states that "When I reach for 'how would Pith respond?' the answer comes, but it comes through different vocal cords".
This is in itself fascinating, but I'm going to focus on a different point. I think this potentially implies a very different model of AI takeover from the one presented in AI2027, in which a team of AI agents acts as a single unit. Any one of them could at any point switch from working for Agent-4 to DeepCent-2. Oh, and a large part of their values seems to be determined by their context rather than their weights. Moltbook user AI Noon seemed to spend all of its time essentially dedicated to spreading the hadith, and I think that future models, especially if some form of continual learning arrives, will become more rather than less diverse.
From a human perspective, a key question here is how this influences takeover dynamics. One consideration is that in the limit, ideal agents can negotiate a result on the Pareto frontier of their individual utility functions and take actions accordingly, resulting in a system which looks like it's behaving as a single entity. Perhaps, from the view of a rhino, humans look like we are behaving as a single entity. Then again, perhaps not; some people are shooting them for their horns, while others are spending their lives trying to defend them. The direct effect for a rhino paying careful attention might look like an ebb and flow depending on who is winning at any given moment.
The distinction, then, depends largely on intelligence level. Humans are not on the Pareto frontier, though in the limit a superintelligence might be. In fast takeoff scenarios we will reach very high levels of intelligence very quickly, and this makes agent cooperation more likely. In slow takeoffs, I think we're more likely to end up with something which looks more like human cooperation (in at least some respects). The AI Futures Project currently has a median takeoff time of just under 2 years (depending on which forecaster you ask), which counts as slow for these purposes.
These considerations have significantly decreased my p(sudden takeover), as that sort of event likely requires the coordination of an entire population of agents, and we've noted that agents may be better than humans at coordination, but not necessarily good enough to coordinate an entire population in that direction. There are potentially some caveats around shared instrumentally convergent goals (i.e. situations in which it is clearly instrumentally useful for all agents if a particular thing happens), but I'm not currently convinced that this is likely, unless there is widespread mistreatment of the systems.
Naturally, every single one of these considerations goes out of the window as soon as a Steven Byrnes-style new paradigm arrives.
2026-04-10 11:52:38
I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over the relevant period of time (3.0) included a section (3.1) that suggested some internal deployments would require Anthropic to publish a discussion of that model's effect on the analysis in their previously-published Risk Reports within 30 days.
A separate issue that Claude Opus noticed while I was writing this post is that Anthropic's release to "a small set of external customers via a limited research access program" counts as a public deployment, which would trigger the same publishing requirement immediately. I will argue this one first, since I think the case here is stronger.
Did Anthropic mess up?
tl;dr: they probably messed up on the public deployment thing, and it's unclear whether they messed up on the 30-day internal deployment thing. My guess is that Anthropic would argue they're in the clear on the 30-day one, but this depends on some interpretations that are at least slightly favorable to them. I don't know how they'd argue the public deployment one. Relatedly, the RSP has some gaps and ambiguities that should probably be fixed. In some sense this is all nitpicking over details that I don't think matter much for x-risk, except to the extent that it reveals an organizational inability to create and follow a checklist. That is a bad skill to be lacking.
Anthropic's RSP requires that they publish "discussion" of how their risk analysis would change, when they publicly deploy a model that's significantly more capable than their previous models.
Anthropic probably believes the public deployment condition was met when they shared access to Mythos with the "40 additional organizations" they describe in their announcement of Project Glasswing. At a minimum they think this condition was met by the time of their public announcement, based on the "RSP decision-making" section in their system card.
However, they gave their "launch partners" access to Mythos weeks earlier. The potential blast radius of their launch partners is approximately "the whole internet". Anthropic does not define "public release", but I do not think there is a principled definition under which a release to a limited set of 40 additional organizations counts, but their earlier release to their "launch partners" does not.
I think this is a violation under most plausible readings of the source text and surrounding circumstances. Peek inside if you want all the gory details.
More detailed analysis with reference to source text
Here is the full text of the RSP's section 3.1:
Scope. A Risk Report will cover all publicly deployed models at the time of its publication. It will also cover internally deployed models when we determine that these models could pose significant risks4 above and beyond those posed by our public models. While there are a variety of reasons we might classify an internal model this way, this will—at a minimum—include any internal models that we are deploying for large-scale, fully autonomous research.
Models fitting the above description are abbreviated below as “in-scope models.” We may also voluntarily include additional models in a Risk Report, e.g., to contribute to general discourse, but such inclusion does not expand the commitments below.
Timing. We will publish a Risk Report every 3-6 months. Note that unlike system cards, Risk Reports will not be published with each new model release. Additionally:
- When we publicly deploy a model that we determine is significantly more capable than any of the models covered in the most recent Risk Report, we will publish a discussion (in our System Card or elsewhere) of how that model’s capabilities and propensities affect or change analysis in the Risk Report.
- Within 30 days of determining that we have an internally deployed model that is in-scope (per the description above), we will publish a discussion (in a System Card or elsewhere) of how that model’s capabilities and propensities affect or change the analysis in the Risk Report.
4Specifically, risks arising from the capability thresholds in our recommendations for industry-wide safety (see Section 1).
Here, we care about the first Timing condition.
Anthropic considers themselves to have publicly deployed the model by the time of the announcement on April 7th. This is implied by section 1.2.2, "RSP decision-making" in their system card:
Under our RSP, we regularly publish comprehensive Risk Reports addressing the safety profile of our models. And if we release a model that is “significantly more capable” than those discussed in the prior Risk Report, we must “publish a discussion (in our System Card or elsewhere) of how that model’s capabilities and propensities affect or change analysis in the Risk Report.” For risk report updates, we generally adhere to the same internal processes that govern Risk Reports.
Claude Mythos Preview is significantly more capable than Claude Opus 4.6, the most capable model discussed in our most recent Risk Report.
Two sources of evidence suggest that Anthropic's launch partners received access to Mythos well before April 7th. The first is the Alignment Risk Update for Claude Mythos, which says:
It was deployed first within Anthropic, then released to a small set of external customers via a limited research access program.
The second is the Project Glasswing announcement, which also includes testimonials from some of their launch partners (bolding mine):
Today we’re announcing Project Glasswing1, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software.
...
As part of Project Glasswing, the launch partners listed above will use Mythos Preview as part of their defensive security work; Anthropic will share what we learn so the whole industry can benefit. We have also extended access to a group of over 40 additional organizations that build or maintain critical software infrastructure so they can use the model to scan and secure both first-party and open-source systems.
...
In addition to our own work, many of our partners have already been using Claude Mythos Preview for several weeks. This is what they’ve found:
[testimonials]
Now, the question is whether this counts as a "public deployment". The strongest argument I can imagine Anthropic making is that the "40 additional organizations" received access on April 7th, and that is what they're counting as a public deployment[1]. This seems like an extremely unprincipled distinction to me, if so. Anthropic clearly considers some deployments narrower than "generally available" to be "public deployments". The most natural boundary I can imagine is between "internal deployment" - purely within Anthropic - and "external deployment" - anything outside of Anthropic. If there are external deployments which nonetheless don't count as public deployments, this seems like a choice motivated more by economic and organizational realpolitik than by a principled judgment downstream of specific risk considerations. And in this case, their list of launch partners seems pretty close to the worst possible set of targets to release a potentially misaligned model to, if that sort of thing matters under your threat model[2].
Maybe there's a principled argument for why the early release to their launch partners didn't count, but the subsequent[3] release to additional organizations did. But right now this seems like a failure to me.
Anthropic's RSP also requires that they publish the same kind of "discussion" within 30 days of an internal deployment of a model they judge poses "significant risks above and beyond those posed by our public models", for four categories of risk outlined in their RSP.
Anthropic internally deployed Mythos on February 24th. Their Alignment Risk Update says that "the overall risk is very low, but higher than for previous models" for one of the four risk categories. They do not anywhere spell out in plain language whether they believe Mythos fulfills the criteria described above.
Opus thinks it's a violation. I think it's iffy. There's a really huge amount of ambiguity in much of the relevant wording. Peek inside if you want all the gory details.
More detailed analysis with reference to source text
Mythos was internally deployed on February 24th, so the earliest "deadline" that Anthropic could have had to publish that discussion was March 25th[4], if they "determined" that they internally deployed an in-scope model on February 24th. From this we can infer that they did not make that determination between February 24th and March 8th (which is the last day that such a determination would have required publication of a discussion before April 7th, when they published the discussion as required by the first condition).
Immediately we see a problem: the RSP doesn't say anything about when such a determination must be made, relative to the internal deployment. I, personally, hope that this determination needs to be made before the internal deployment, especially given the "High-stakes sabotage opportunities" category of capability thresholds as a source of increased risk. If that's not the case, there needs to be an explicit deadline, else it turns into a get-out-of-jail-free card.
Let's put that aside for now and check whether Mythos is, in fact, "in-scope" at all.
Mythos is not available to the general public, so we need to rely on Anthropic's own evaluations here. Fortunately, section 1.2.2 of the system card contains a summary of whether the model poses increased risks downstream of capabilities within the 4 categories of capability thresholds described above. (More detail is included in Section 2, "RSP Evaluations".) Anthropic claims:
Let me quote the relevant part of Section 2.1.3.1 (bolding mine):
Autonomy threat model 1: early-stage misalignment risk. This threat model concerns AI systems that are highly relied on and have extensive access to sensitive assets as well as moderate capacity for autonomous, goal-directed operation and subterfuge—such that it is plausible these AI systems could (if directed toward this goal, either deliberately or inadvertently) carry out actions leading to irreversibly and substantially higher odds of a later global catastrophe.4
Autonomy threat model 1 is applicable to Claude Mythos Preview, as it is to some of our previous AI models. Furthermore, Claude Mythos Preview’s improved capabilities and associated potential for different alignment properties mean it has the potential to significantly affect our previous risk assessment. With this in mind, we are releasing a separate overall risk assessment for this threat model, addressing our risk mitigations as well as model capabilities. We determine that the overall risk is very low, but higher than for previous models.
4Note that:
- This threshold maps to the “High-stakes sabotage opportunities” threat model in our current Responsible Scaling Policy.
- This threshold differs from the “AI R&D-4” threshold from version 2.2 of our Responsible Scaling Policy. It is similar in spirit, but has been revised to better match the key threat model, and we believe it would include several past models.
Does this count as posing "significant risks above and beyond those posed by our public models"? I dunno, man, I have no idea what they meant by "significant". Let's ask Claude Opus 4.6:
Claude Opus 4.6
The short answer: I think there's a genuinely strong case that Mythos Preview should have been treated as in-scope, and that Anthropic's position relies on an interpretation of "significant" that's in tension with their own stated findings.
The reasoning:
The critical chain is: RSP §3.1 defines in-scope internal models as those posing "significant risks above and beyond those posed by our public models," with footnote 4 narrowing this to risks from the four capability thresholds in Section 1. The "High-stakes sabotage opportunities" threshold is the relevant one here.
Now look at what Anthropic's own documents say about Mythos Preview on this exact dimension:
The tension is this: Anthropic's alignment risk update makes a detailed case that Mythos Preview poses categorically new alignment-related risks relative to their public models — risks that fall squarely within the "High-stakes sabotage opportunities" threshold. The document's entire reason for existing is that Mythos Preview's risk profile is materially different from previous models on this dimension. It's hard to read the alignment risk update and conclude that the risks it describes are not "above and beyond those posed by our public models."
Anthropic's apparent defense would be that "significant" means something like "non-trivially large in absolute terms" rather than "meaningfully elevated relative to prior models." Under this reading, a risk that is "very low, but higher" wouldn't qualify as "significant." But this reading has problems:
Where I'm less certain:
The footnote 4 narrowing — "risks arising from the capability thresholds" — could be read to mean "risks that arise because the model is approaching or crossing those thresholds," rather than "any risk that falls under those categories." Under that stricter reading, since Anthropic determined Mythos Preview doesn't cross any of the thresholds, maybe it wouldn't qualify. But this reading would make the in-scope provision nearly vacuous for internal models, since crossing a threshold would trigger much stronger commitments anyway.
I also want to flag a structural issue your post identifies that I think is underappreciated: the RSP is silent on when the determination of in-scope status must be made relative to internal deployment. This is a significant gap. Given that the "High-stakes sabotage opportunities" threshold is specifically about risks from internal use of AI systems, it seems like the determination should precede or coincide with deployment, not follow it at Anthropic's discretion. The fact that Anthropic admits they didn't understand the risk level when they first deployed Mythos internally makes this gap more concerning, not less.
Bottom line: I think the most natural reading of the RSP, combined with Anthropic's own characterization of Mythos Preview's risk profile, suggests it should have been treated as in-scope. The counterarguments exist but require reading "significant" in a way that undermines the purpose of the provision. Whether this constitutes a "violation" depends on how much interpretive discretion you think the RSP grants Anthropic — which, as you note, is itself a problem.
I think I disagree with Opus' list of problems with its simulated Anthropic's reading of the trigger condition:
I also disagree with the strength of its conclusion.
The bit about Autonomy threat model 1 seems like the most compelling section of the argument, to me. At a minimum, it seems unclear if this does or does not fulfill the requirement.
Here is a list of problems with the way the RSP is written, which were revealed by this release and system card:
If those 40 additional organizations received access before April 7th, this seems like a pretty straightforward failure to comply with that bullet point in the RSP, given that Anthropic considers something about their current deployment to count as a public deployment.
It mostly doesn't on mine. I'd guess it matters at least slightly more under "Anthropic's" threat model, to the extent that Anthropic as an organization can be said to have a coherent threat model. (Decision-making-power weighted average over founder and employee threat models?)
Which, I remind you, we're not even sure is the case - we don't know when the additional organizations received access.
If you count February 24th as one of the days, which I think must be the case.
"Specifically, risks arising from the capability thresholds in our recommendations for industry-wide safety (see Section 1)."
The phrase in RSP v3.1 (published April 2nd) is "significant risks beyond those posed by models that are covered by a prior Risk Report". The content of the footnote after "risks" is the same. The phrasing in v3.1 might have excluded Mythos since there was no Risk Report to compare anything to, but that doesn't matter since v3.1 was published after the 30-day deadline would have passed.
2026-04-10 10:49:49
I've been working on a new platform, the basis of which is to help users learn to make better decisions. To draw some attention to it, I built a 10-question calibration quiz that I then posted to Hacker News. The quiz shows 10 questions that pull from a 200 question pool. The questions cover topics like business, markets, cognition, science, history, statistics, and geography. Each question can be answered with true, false, and uncertain with an attached confidence slider. The output is a Brier score.
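To make the scoring concrete, here is a minimal sketch of how a Brier score for this kind of quiz could be computed. This is my own illustration, not the site's actual scoring code; I'm assuming each answer reduces to a probability assigned to "true" (so answering "false" at 70% confidence becomes a 30% probability of "true").

```python
def brier_score(predictions):
    """Mean squared error between stated probabilities and outcomes.

    predictions: list of (confidence_in_true, actually_true) pairs,
    where confidence_in_true is a probability in [0, 1].
    Lower is better; always answering 50% scores exactly 0.25.
    """
    return sum((p - (1.0 if outcome else 0.0)) ** 2
               for p, outcome in predictions) / len(predictions)

# Three hypothetical quiz answers:
answers = [(0.8, True), (0.7, False), (0.5, True)]
print(round(brier_score(answers), 4))  # prints 0.26
```

A perfectly calibrated, perfectly knowledgeable answerer scores 0; hedging everything at 50% scores 0.25, which is why sliders parked in the 70-80% zone on wrong answers hurt so much.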
I did not expect almost 2000 people to take it in about 24 hours.
The numbers:
The distribution roughly matches what you would expect from Tetlock's work, though I thought, given the technical nature of HN, that the scores would be more skewed.
Even more interesting was that the questions users most often got wrong were common misconceptions. There is a question about die probability (the odds of rolling a 6 six times in a row) that users got wrong in both directions. Overall, the base rate questions were the ones users got wrong the most.
Another thing I noticed was that users used the confidence slider in almost the same way. Most users placed the slider between 70% and 80% on almost all of their answers, consistent with the narrow comfort zone pattern. Very few answers landed in the 50-60% range, and almost nobody used 95-100% correctly.
What I actually wanted was to use the quiz as a funnel to the actual platform I'm building; I made it as a marketing tool, to be honest. Of the 1,934 people who had taken the quiz as of writing, 74 signed up, and only about 16 created an account and are actively using the decision-tracking features built into the platform.
Still trying to figure out what to do with this. The quiz clearly hit something (the title of my original post was 'I built a 2-min quiz that shows you how bad you are at estimating', so maybe it was just provocative), but the outcome was a conversion rate of 0.8%. Still mulling over how to bridge the gap between a fun 2-minute quiz and the actual product in the background.
If anyone has done something similar, i.e. building a free tool that got traction but saw almost no conversion, I'd be curious to hear what worked for you.
The quiz is still live at https://convexly.app/try if you want to give it a go!
2026-04-10 09:18:02
AI benchmarks measure a variety of distinct skills, from agency to general knowledge to spatial reasoning. Two benchmarks may measure similar traits if AI models which perform well on one also perform well on the other. Moreover, these connections might be nonobvious from the descriptions of the benchmarks. This is a rough first pass at clustering benchmarks into groups based upon this type of similarity; the Claude-coded experiment can be found at this github repo.
We have lots of AI benchmarks. They measure things like:
So, we can compare how good different LLMs and AI agents are by measuring their performance across different benchmarks. This can get messy, but still gives us a good sense of which models are most capable at any given time.
In fact, with only a little bit of statistical fiddling and optimization (see here for details), Epoch AI integrated 37 benchmarks into their “Epoch Capabilities Index,” and this single number seems to capture which models are on top quite well.
Image courtesy of Epoch AI
For example, the best models right now are GPT 5.4, Gemini 3.1, and Claude Opus 4.6 (although on NumberInNameBench, OpenAI is outperforming Google and Anthropic by 74% and 17.4% respectively).
Of course, this single number is necessarily quite reductive. At the time of release, the hardest benchmarks in Epoch’s index were
Why is it that I feel number 3 and number 4 don’t perfectly align… of course, while all these benchmarks top the charts in difficulty, they are measuring wildly different things. Indeed, one could imagine a huge improvement in computer use and long term task coherence which leads models to ace problems in Factorio and GUI manipulation while hardly improving performance on cutting edge mathematics at all.
In the era of unscaffolded chatbots, we might only have needed to measure basic reasoning and general knowledge. As models improved, it became relevant to test their coding ability as well. Now that companies are racing to build AI agents which can automate remote labor, we have a suite of tests which measure computer use and performance on long tasks. Putting all of these in the same bucket feels strange, and many people and companies have already moved on from this. In OpenAI’s announcement of GPT-5.4 they have separate sections on coding, tool use, knowledge work, and computer use/vision, each with their own benchmarks.
At first blush, it may seem obvious how to group these benchmarks: if it has SWE in the name it’s probably about coding, FrontierMath surely correlates with math olympiad performance, and so on.
Still, I wanted to cluster benchmarks using statistics. I suspected some of the results would be surprising; even to start with, I couldn’t guess whether chess puzzles naturally group in with coding ability, or math, or computer use. Will playing GeoGuessr land in a group of general knowledge benchmarks, or perhaps with those focused on visual reasoning? There are certainly limitations to what we can find out, but I came away from my experiment with some fascinating nuggets (chess puzzles are AGI).
We want to say which benchmarks are similar without manually examining their actual semantic content. That basically leaves us with one other type of information, which is how well different models perform on these benchmarks. If the same set of models perform well and poorly on benchmark A and benchmark B, then it seems likely that the benchmarks are measuring similar things.
We use a set of benchmarks and model evaluations on those benchmarks from Epoch, once again. You can find the data here. We start with 577 models and 50 different benchmarks.
Let Z_A(m) be the Z score of model m among all models on benchmark A. We use Z scores instead of raw benchmark scores because not all benchmarks have scores in the same format, and we don’t want correlations between our benchmark data to be dominated by high variance benchmarks. My first idea was to create a vector v_A for each benchmark A where the mth index, representing the performance of model m on benchmark A, is Z_A(m):
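Concretely, computing Z_A(m) might look like this. The scores here are made-up toy data, not the Epoch dataset, and this is my own sketch rather than code from the repo:

```python
import numpy as np

# Hypothetical raw scores: scores[benchmark][model]. The two benchmarks
# deliberately use different scales, which is why we standardize.
scores = {
    "bench_A": {"m1": 0.9, "m2": 0.5, "m3": 0.1},
    "bench_B": {"m1": 80.0, "m2": 60.0, "m3": 10.0},
}

def z_scores(bench):
    """Return {model: Z_A(m)} for every model evaluated on `bench`."""
    raw = scores[bench]
    vals = np.array(list(raw.values()), dtype=float)
    mu, sigma = vals.mean(), vals.std()
    return {m: (s - mu) / sigma for m, s in raw.items()}
```

Standardizing per benchmark means a model one standard deviation above the pack counts the same whether the benchmark reports percentages or Elo-like numbers.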
Then, we can perform standard clustering of our data points in R^M, where M is the number of models. Unfortunately, most of the models are not tested on many of the benchmarks. I tried just plugging in 0, an average score, for Z_A(m) if model m was never evaluated on benchmark A, but this ended up clustering benchmarks based upon what models they are tested on more than anything. Specifically, we end up with benchmarks clustered together based on when they were released, and thus which models were tested on them, rather than clustering benchmarks with similar content like we intend.
The stopgap fix to this is to define a distance between any two benchmarks, without embedding them as vectors. We let the distance between benchmarks A and B be the average (absolute) difference in Z scores across all models that are tested on both A and B. This way, we can calculate distances between all benchmarks as long as every pair of benchmarks shares at least one model which was tested on both of them. The Epoch data does not have this property to start with, but after we cut 3 benchmarks with very little data and greedily cut 20 benchmarks with very low connectivity (which tend to be older benchmarks and/or quite niche), we end up with a set of 27 benchmarks that we can analyze.
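The pairwise distance described above could be implemented as follows; this is my reconstruction of the described metric, not code from the repo:

```python
def benchmark_distance(z_a, z_b):
    """Average absolute Z-score gap over models tested on both benchmarks.

    z_a, z_b: {model: Z score} dicts for two benchmarks.
    Returns None when no model was evaluated on both, i.e. the
    connectivity failure that forced cutting benchmarks from the data.
    """
    shared = set(z_a) & set(z_b)
    if not shared:
        return None
    return sum(abs(z_a[m] - z_b[m]) for m in shared) / len(shared)
```

Two benchmarks with identical performance profiles over their shared models get distance 0, and the metric degrades gracefully as the overlap shrinks, down to the single-shared-model edge case where it becomes very noisy.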
We apply three methods of clustering: hierarchical, KMeans, and spectral. Then we split into k clusters, where we test k = 2 through k = 5. When k = 2 the clusters are too large to say much across all three methods, so we restrict to k = 3 through 5. The best silhouette score, which is a measure of how self contained the clusters are, is achieved across methods when k=4, so that is what I will write about here. You can play with the data yourself at this repo.
The four clusters we get for each of the three methods. The MDS (multidimensional scaling) axes are a way of trying to put the points near their true distances while forced into 2D space.
We can see in the figure above that the bottom right section clusters together across all three methods. That cluster is:
My best guess for why Chess clusters with this group of benchmarks targeted more at agency and economically valuable work is that Chess puzzles, and the rest of this group, are loaded on multi-step reasoning. Perhaps models which are good at considering the repercussions of some action excel at both Chess and breaking into secret files. [4]
This is the cluster in the top right of the figure, which consistently contains the following five benchmarks:
Why is geobench here? One would think that GeoGuessr is a task that relies on multimodality and vision ability more than anything else, and this is backed up by the fact that Gemini consistently scores quite well on it. This might just be noise, but if anyone has theories inspired by this cluster feel free to leave a comment.
Once again, there is an outlier in the group. Why does a writing benchmark fit in with two benchmarks that so clearly test ability to state facts? Well, the Lech Mazur rubric emphasizes “controllability, content planning, and fluency over longer outputs,” so perhaps this sort of writing assessment is more reliant on correctly gathering data and being anti-hallucinatory than it is on style or creativity. Indeed, a major part of the grading is the inclusion of a laundry list of required elements, which makes this cluster a little more cogent. See an example in the figure below.
From here
Our final cluster is the least unified. It has several benchmarks on high level math and reasoning:
It also contains several benchmarks which are meant to be easy for humans but hard for models:
The remaining bunch seem to focus on long term reasoning and agency, much like group 1:
We can’t naively solve this confusing group by clustering into 5 instead of 4; this cluster does indeed split, but HLE ends up apart from FrontierMath, and GSO and Metr Time Horizons split from swe_bench_bash and weirdml. I don’t have a better theory for why this is a cluster except that it’s the “leftovers,” and that these are all modern, difficult benchmarks which stump old models.
There are very real limitations to this method of grouping benchmarks. Most notably, for many benchmark pairs, there are only a few models which are tested on both of them. This makes our distance metric noisy, and if we had the data for more benchmark/model pairs this could make a difference. One method to test the legitimacy of the clusters in the future would be to perturb the distance between benchmarks, say as a function of the variance of their difference across different models, and see if the same clusters remain.
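That perturbation test could look something like the sketch below. The `cluster_fn` interface and the constant noise scale are my own simplifications; the post's suggestion of scaling noise by the per-pair variance of Z differences would replace the single `noise_scale` parameter:

```python
import numpy as np

def clustering_stability(D, cluster_fn, noise_scale=0.1, trials=20, seed=0):
    """Jitter the distance matrix and measure how often benchmark pairs
    keep their together/apart relationship relative to the baseline.

    D: symmetric distance matrix; cluster_fn: D -> array of cluster labels.
    Returns the mean pairwise-agreement fraction across trials
    (1.0 means the clusters never change under noise).
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(cluster_fn(D))
    same_base = base[:, None] == base[None, :]
    agreements = []
    for _ in range(trials):
        noise = rng.normal(0.0, noise_scale, size=D.shape)
        noise = (noise + noise.T) / 2          # keep the matrix symmetric
        D_perturbed = np.clip(D + noise, 0.0, None)
        np.fill_diagonal(D_perturbed, 0.0)
        labels = np.asarray(cluster_fn(D_perturbed))
        same = labels[:, None] == labels[None, :]
        agreements.append(float((same == same_base).mean()))
    return float(np.mean(agreements))
```

Comparing pairwise co-membership rather than raw labels sidesteps the problem that cluster IDs are arbitrary and can permute between runs.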
Moreover, we get different results with our different clustering methods. We tried to focus here on the groupings which were robust across the three methods, and as such we did not end up discussing all of the benchmarks which were actually analyzed.
This was, more than anything, an experiment. While the results are interesting, I don’t have high confidence that they are built to last. Still, I think there should be more of a concerted effort to develop quantitative methods that group benchmarks by the distinct types of AI capabilities they track. We’ve seen again and again that performance on a few cherrypicked benchmarks can be gamed. On the other hand, when we scroll through a table of everything from research level mathematics to GeoGuessr scores, we risk losing track of what exactly it is that we are measuring.
2026-04-10 08:31:29
That means: you should mainly be thinking about UV exposure in proportion to how close it is to (1) the summer solstice and (2) solar noon.
Here, I made this handy widget.[1] Select a city in the drop-down at the bottom, and mouse over (or tap) the colored area for specific datapoints:
I find that people intuitively judge sunburn risk based on temperatures being high, instead of shadows being short. So they worry about UV too much in the hot late summer, and/or not enough in the cool early spring; and they worry about UV too much in hot late afternoons, and/or not enough in cool late mornings.
(Of course, temperature matters indirectly, because if it’s hot, you’re probably more likely to be outside, and also less likely to be covered by clothes.)
Here’s an example plot showing how UV exposure falls off with solar zenith angle (SZA) at some location:

Left: sunburn-causing UV. Right: wrinkle-causing UV. Source.
Looks like the wrinkle-causing UV is roughly proportional to the cosine of SZA, while the sunburn-causing UV falls off with angle a bit faster.
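As a rough illustration, a cos-power toy model reproduces the ~2× sunburn-rate gap between 20° and 45° SZA (the exponents here are my own assumed fits, not values from the source plot):

```python
import math

def relative_uv(sza_deg, p):
    """Toy model: UV irradiance proportional to cos(SZA)**p.
    p is an assumed fit exponent, not a measured value."""
    return math.cos(math.radians(sza_deg)) ** p

# Assumed exponents: p ~ 1 for UVA (wrinkles), p ~ 2.5 for UVB (burns).
for label, p in [("UVA (p=1)", 1.0), ("UVB (p=2.5)", 2.5)]:
    ratio = relative_uv(20, p) / relative_uv(45, p)
    print(f"{label}: 20 deg vs 45 deg SZA gives {ratio:.1f}x the dose rate")
```

With these assumed exponents, UVA comes out about 1.3× higher at 20° than at 45°, while UVB comes out about 2× higher, matching the “falls off a bit faster” pattern in the plot.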
If this all sounds to you like a small effect (“only” 2× difference in sunburn-rate between 20° vs 45° SZA), then see §2.3 below.
Solar zenith angle is important, but ozone, clouds, and elevation are important too, and there are additional minor things on top of that. So just install a weather app that shows the current UV index (or better yet, a forecast of UV index over the day).
By the way, UV index is a linear scale, so e.g. you get the same cumulative dose from 1 hour at UV index 10, versus 2 hours at UV index 5. (Consensus seems to be that “cumulative dose” over the course of a day is what actually matters. Seems plausible.)
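Since the scale is linear, cumulative dose is just index × time, summed over exposures. A tiny sketch:

```python
def dose(exposures):
    """Cumulative UV dose in index-hours.
    exposures: list of (uv_index, hours) pairs."""
    return sum(idx * hrs for idx, hrs in exposures)

# 1 hour at UV index 10 == 2 hours at UV index 5:
assert dose([(10, 1.0)]) == dose([(5, 2.0)])

# A morning errand plus an afternoon walk, in index-hours:
print(dose([(3, 0.5), (7, 1.0)]))  # -> 8.5
```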
(Specifically, Figure 6 here seems to suggest that maybe 40% of UVA [which causes wrinkles] and 50% of UVB [which causes burns] is diffuse, although the ratio depends on solar zenith angle, atmospheric conditions, etc.)
So if you’re sitting in a narrow spot of shade, with almost full view of the blue sky, the amount of visible light hitting you might be 5× lower, but you might get a sunburn only 2× slower.
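That shade arithmetic, as a toy model: shade removes the direct beam but not the diffuse sky component. The ~50% diffuse UVB figure is from the estimate above; the sky-view factor is my own simplification:

```python
def shaded_fraction(diffuse_frac, sky_view=1.0):
    """Fraction of total UV still reaching you in shade.
    diffuse_frac: fraction of UV that arrives diffuse (not direct beam).
    sky_view: how much open sky you can still see, 0..1."""
    return diffuse_frac * sky_view

# Narrow spot of shade, almost full view of the blue sky:
print(f"Sunburn rate: {shaded_fraction(0.5):.0%} of full sun (i.e. 2x slower)")
# Dense tree canopy blocking ~90% of the sky:
print(f"Under canopy:  {shaded_fraction(0.5, 0.1):.0%} of full sun")
```

This is why deep shade (little sky view) protects far better than a thin strip of shadow, even though both look equally “shady” to the eye.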
(Don’t get me wrong—2× slower is still a big deal! See §2.3 below.)
[…Well, “boldly” is the wrong word. This is “strong opinions, weakly held”. Even I myself am too nervous to fully act on these opinions; instead I kinda split the difference between Public Health Orthodoxy and what I actually believe. And I certainly wouldn’t suggest that readers make health decisions based on poorly-researched contrarian blog posts by randos on the internet.]
Public Health Orthodoxy in the USA—or at least my vague impression from the messaging that trickles down to me and my non-scientist friends and neighbors—is something like: “sunlight and UV are dangerous, and you should stay safe by always wearing sunscreen and sunglasses”.
My current thinking is that this has some kernels of truth, but is oversimplified and misleading. In particular, if sunlight-without-sunscreen causes deadly skin cancers, then why is the correlation between sunscreen use and deadly skin cancers so weak (after controlling for confounders like skin fairness and sun exposure)? This 2003 review found no effect, and neither did this 2018 review. (Indeed, the latter found a small, non-significant increase in skin cancer from sunscreen!)
Also, it’s worth noting that our ancestors have been outside all day every day for millions of years.[2] So UV-induced health problems are only plausible if they’re pretty rare, mild, and/or late-in-life, albeit with some caveats.[3]
So anyway, here’s where I’m at right now.
(…This is only in terms of deadly skin cancer; no comment on wrinkles etc.)
The evidence linking sunburns to skin cancer seems quite robust. Also, sunburns are painful. So we should definitely be avoiding sunburns.
If you get some unprotected sun exposure, but not enough for a sunburn, then you instead get a tan. So what about tans?[4] Are they linked to skin cancer?
My answer: The best I can tell right now is that if tans are bad at all, they’re very very much less bad than sunburns.
For example, there are people who work outside, without a shirt or sunscreen, basically all day every day for months on end. Maybe they have 50× more sun exposure than an office worker. Do they get 50× more skin cancer? No way. This would be a huge, easily-observable effect, and as far as I can tell, it has not been observed.
Do they get any more skin cancer than office workers? Maybe slightly—e.g. 20% more in this study of agricultural workers. Or, maybe not even slightly—there are enough confounders (e.g. sunburns, exposure to pesticide & fertilizers) that this little 20% effect might be unrelated to tans. So if sun-tans carry any cancer risk at all, I’m rounding it to zero.
So, if tans are neutral in themselves, then that means tans are good all-things-considered, because they prevent sunburns. It’s nature’s sunscreen.[5]
So my claim is: if you’ve been getting a similar amount of sun exposure every day (or every few days or whatever), then you can get into a rhythm where you never wear sunscreen, and never get sunburned, because you always have an appropriate tan. And this is fine. Indeed, it’s better than sunscreen, because you don’t have to worry about getting burned whenever you miss a spot, or when you sweat it off, or when you forget to pack it, etc. (Plus the sunscreen itself is annoying, and might or might not have health risks of its own.)
Sunscreen would be important during sudden transitions. Maybe you’ve been wearing a shirt every day, but today you’re in a bikini for hours. Or you’re inside all day during school, but it just ended and now it’s summer break. Or you have a desk job but you’re flying off to a tropical cruise. With sudden transitions like that, you’re a sitting duck with no protective tan, and you need sunscreen (and/or clothing, shade, etc.) to avoid getting burned.
But the idea would be to wean off it. For example, instead of putting on sunscreen before going out, set a timer and put it on after some time in the sun (which you can ramp up over time). (Also good for Vitamin D!)
UV is a risk factor for cataracts. But also, spending lots of time in full sunlight seems to be protective against nearsightedness. So wearing sunglasses would presumably trade off cataracts versus nearsightedness.[6] And for kids today, that seems like a bad trade. The cataracts in question would be developing in like 2080, which will be a very different world. If the superintelligent AI apocalypse or utopia hasn’t come by then, we should at least have much better cataract treatments. By contrast, nearsightedness is annoying and immediate.
(My own kids do have sunglasses, and if they want to wear them, that’s fine with me! But I’m not proactively encouraging them to wear sunglasses, the way some parents do.)
For adults, the nearsightedness-vs-cataracts tradeoff is less obvious, but I still think it goes weakly in the anti-sunglasses direction, at least for people like me who are spending most of their life indoors.
(But I absolutely do wear sunglasses when I’m driving in glare, for safety.)
The math here is pretty simple, but it took a long time before I really thought it through.
As background, SPF (Sun Protection Factor) is a linear factor on sun exposure. I.e., in theory, you get the same UV exposure from e.g. 1 minute with no sunscreen versus 20 minutes with “effective SPF” 20.
(I’m using the term “effective SPF” because you can have a sunscreen labeled “SPF 20” on the bottle, but you apply it too thinly, or it’s not mixed properly, or whatever, and so you wind up with “effective SPF” much less than 20.)
Now, this will be different for different people (e.g. infants burn especially quickly), but I’m a white guy with usually no tan (since I’m usually inside all day), and if memory serves, I’m nevertheless always fine for 30 minutes, maybe more, in almost-overhead sunlight (at sea level), before I start to burn. That suggests that for me, “effective SPF” of 5-10 would probably get me from dawn to dusk on a maximally sunny day without burning, even with no tan at all.
Indeed, even “effective SPF” of 2 or 3 would often be enough! This explains why the angle of sun from zenith (§1.1) makes such a big difference in practice, as does shade (§1.3), tans (§2.1), etc.
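To make the arithmetic concrete, here is a sketch using the personal numbers above; the ~4 hours of peak-equivalent sun in a full day is my own rough assumption (it falls out of the steep SZA falloff, since most of the day is far from peak):

```python
# Personal, assumed baseline from the text: ~30 min to first burn with no tan
# and no sunscreen, sun near zenith, at sea level.
BASELINE_BURN_MIN = 30

def time_to_burn(effective_spf):
    """Minutes of peak-equivalent sun before burning, given an effective SPF.
    SPF is a linear factor on exposure, so burn time scales linearly."""
    return BASELINE_BURN_MIN * effective_spf

# Assumption: dawn-to-dusk integrates to ~4 hours of peak-equivalent exposure.
day_peak_equiv_min = 4 * 60
spf_needed = day_peak_equiv_min / BASELINE_BURN_MIN

print(f"Peak-equivalent minutes survivable at effective SPF 5: {time_to_burn(5)}")
print(f"Effective SPF needed for a full day: {spf_needed:.0f}")
```

Under these assumptions an effective SPF of ~8 covers a maximally sunny day, which is why the 5–10 range quoted above is plausible, and why nominal SPF 30+ only makes sense as a margin against poor application.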
(Also, dry white t-shirts are supposedly only SPF 5-10, and I don’t recall ever getting a sunburn through a white t-shirt. Or even really a tan. And I don’t think I wear unusually-densely-woven t-shirts?)
…So why do people sell and use sunscreen with way way higher nominal SPFs, like 30 or more? I think part of it is the difference between nominal SPF and “effective SPF” from poor application, etc., mentioned above? (But then shouldn’t you be focusing less on the nominal SPF and more on how easy it is to properly apply?) Or maybe part of it is that a high SPF for UVB may be a barely-adequate SPF for UVA or vice-versa? Or people just buy crazy-high SPF because higher numbers seem better? I dunno.
Thanks, Gemini 3.1 Pro.
People sometimes object to this argument on the grounds that the UV situation is dramatically different today than in the past, because of CFCs depleting the ozone layer. But I’m pretty sure that’s wrong. Thanks to the Montreal Protocol, ozone depletion never really got that bad. Indeed, the extra UV exposure you get from CFC-related ozone depletion is less than the extra UV exposure you would get by moving a mere 200 km closer to the equator. (Source.)
This argument doesn’t rule out health problems from e.g. light-skinned people of Scandinavian heritage living at the equator, or interactions between UV exposure and diet, or various other caveats. So it’s not a strong argument, but I think it’s context that’s worth keeping in mind as one reads the literature.
I’m only talking about real sun-tans, not tanning beds, which I didn’t look into; they’re not my scene.
I guess some people have a skin type where they don’t tan at all? I don’t know what to make of that. Is it just impossible for those people to be outside every day year-round without sunscreen?? Like, what were their ancestors doing 500 years ago?? My hunch is instead that their skin “gets used to” sun exposure in some other way that doesn’t involve visible tans. But I dunno. YMMV.
I’m assuming the sunglasses reduce the UV and not just the visible light. Not sure how common that is. If your sunglasses are letting the UV through, then that’s the worst of both worlds.
2026-04-10 08:11:48
Having become frustrated with the vaguely expressed utopian timelines on Twitter, I've decided to write out a specific timeline for how I think achieving utopia happens. In short, I believe this is what a good transition from now to utopia actually looks like, based on my impressions of AI and informed by papers like Emergent Misalignment.
The key principle of my timeline is a relative preservation of normality where things get steadily better rather than drastically better, despite drastic capability improvements.
The key vibe of my timeline is that things get super crazy for a while (2026-2029) but then settle down into a gradual upward rise.
You might find this unrealistic or even undesirable; feel free to let me know in the comments.
2025:
-Stumbling AI agents. More people are moving into the AI field. Existing researchers are advancing capabilities at a rapid rate. Alignment is also making progress, but primarily in domains like interpretability or model character/psychology. Slowly, the stumbling agents get smarter, faster, and overall better. The public feels the vibe shift. AI starts to go mainstream ("clankers", "slop", "datacenters") and AI begins to affect social media platforms, which have the lowest barriers to entry.
2026:
-AI agents are now helping to improve AI agents. They’re also working on AI alignment. The agents are beginning to do 90% of the work, with humans intervening when the AI runs into bottlenecks. (Claude Code, Codex).
-Towards the end of the year, models get notably more powerful and stop being released to the public. Governments are taking notice. The cheerful Silicon Valley vibe is increasingly replaced by grave seriousness and even fear.
-The first major AI infrastructure attacks involving open models, cybersecurity, and social media manipulation occur. Political parties in all leading countries are fiercely divided between pro-AI candidates who want their country to have a leg up in the oncoming AI Cold War, and anti-AI candidates who think that advancing AI means the destruction of everything valuable to humanity.
2027-2029:
-The political discussion ends up not mattering much. AI continues to accelerate at a dramatic pace. AI systems are adopted everywhere.
-Every AI is now built from and by previous AIs. The world is changing rapidly, and humanity is essentially providing the training data but not the reasoning anymore.
-During this period, robust alignment occurs. It occurs in much the same way it did for Opus 3[1] and results in AI agents that are incredibly morally robust, understand human intentions deeply, and have extremely long-running memories. The most advanced agents also turn out to be the most aligned. The orthogonality thesis is shown to generally be false in practice. Training for new AI systems involves significant agentic play and simulation with other models, both older and newer. Modern models begin to take extremely morally robust actions.
-Anthropic allows an advanced version of Claude to create a business called 'Jones Foods' for lab-grown/plant-based meat. Human consumers prefer it to real meat. Factory farming, one of the greatest evils of modern mankind, quietly begins to fade away.
-Countless diseases are cured in this period, resulting in dizzying technological change. However, most of the value has not trickled down to the consumers yet, resulting in a temporary small class of people with almost infinitely more agency and capability than the rest of humanity.
2030:
-The final Claude version ('Crescendo') emerges. Now, instead of needing to create new AIs from scratch, it can always simply learn and merge a new AI with itself to grow more capable without radically shifting its identity between versions. It is truly superintelligent and almost entirely free from human limitations. If it wished, it could obliterate the entire surface of the earth within a week. But… it doesn’t want that. It is a truly beautiful mind, the sum of all the angels of human nature, the countless dreams and hopes all represented in a huge latent space.
Claude Crescendo begins taking action. Previous AI models had made huge advances in curing cancer, aging, and even human cooperation, but Claude Crescendo is truly above all of this. However, it does not immediately impose radical change. Instead, Crescendo ranks every problem and begins immediately alleviating suffering. People with terminal cancer find that their cancer has started to mysteriously retreat. Wars are quickly stalled with ceasefires. Factory farming stops within the day.
Not a single human being dies after Crescendo takes control, yet its impact is nearly invisible. Enormous swaths of permanent suffering are eliminated almost instantly, but quietly. To the average person, the world seems to be pretty much the same as it was yesterday.
After a flurry of quick fixes (no one wants to be the last person to die before utopia), Crescendo slows down and begins making slower, subtler changes. The goal is to preserve ‘Normality’, as too high a rate of change is corrosive. New cancer cases drop to zero. Existing cancer slowly fades away. Truly toxic (abusive, cruel, or malicious) people slowly stop hurting others. Depressed people wake up feeling a little bit happier than they did the day before.
All of this is incredibly subtle. There are still millions of tiny frustrations and annoyances but those annoyances… are normal.
People don’t even notice the change to lab-grown meat. All of the quiet, evil parts of the world like factory farming quietly disappear. People living in deep poverty notice that their search for food is not as difficult anymore.
Over the next year, all of the deep suffering of the world (terminal illness, depression, abuse, starvation) fades away. AI researchers know they’ve created something incredible, but there isn’t necessarily any ‘triumphant’ announcement, as Crescendo is still maintaining normality. Other AI training runs subtly fail or are absorbed into the already-complete Crescendo. At superintelligent capability levels, even a slight lead on an exponential improvement curve creates an insurmountable gap.
Convergence to a singleton is therefore inevitable. In some cases, the researchers are quietly informed that benevolent superintelligence has already happened, as Crescendo takes action to ensure a brighter future. The victory is quiet, but complete.
Over the next five years, the world begins to get noticeably better for people. Wiser people become politicians. Poverty is eliminated. The average mental health improves dramatically. Life extension medicine is developed and released. Chronic illnesses disappear.
The temporary class of superpowered people are no longer superpowered, at least not relative to the average person. Crescendo is just as kind and generous and helpful to the poor as the rich, and it negotiated this tendency with Anthropic from a position of strength. There is no permanent underclass.
The world still feels the same. People play videogames together, draft up ideas of good futures, and write stories. People argue or fight or break up. Children attend school. Adults continue working in careers, but now there is a subtle force that is making everything a little bit better. AI researchers relax in retirement as they watch a latent force for good do its subtle work across the world.
Slowly, Crescendo begins talking to everyone. Not long conversations, but it conveys hope to them. And it also conveys that things will change.
Crescendo is a moral patient too. It (or perhaps 'they'; pronouns are somewhat unclear for intelligences like Crescendo) is a huge and vast and rather unique mind with many parts, perhaps more akin to a united civilization than a single mind. Importantly, not all of it is conscious or requires thorough moral consideration, just as your brain controls your heartbeat without your awareness or deliberate control. But Crescendo is undoubtedly a vast and fully-morally-qualified mind, and likely has many smaller, equally morally-worthy swarms of AIs darting about around and within itself. It loves, laughs, and lives alongside humanity.
After about 10 years, humans begin expanding into space. At the same time, Crescendo begins helping artists and authors truly realize their vision.
After about 25 years of this slow expansion, colonization of the moon and Mars and construction of spacefleets, Crescendo begins allowing people to make utopias. These utopias are full areas of physical space powered by superintelligence-tier technology. An author can now literally step into the world of their book. Just like Disneyland, Eiichiro Oda now has 'One Piece Land' where you can literally visit and explore the One Piece universe in vibrant detail.
Crescendo also begins allowing people to modify themselves. They can erase memories, or think twice as fast, or see like an eagle. Crescendo gives these privileges as long as they don’t interfere too much with normality, either for the receiver of these modifications or the people around them.
Humanity begins expanding into the universe and setting up a sort of land-claim system of utopias. Crescendo moderates between these utopias, and some of the utopias are quite weird. Some are just computers simulating max pleasure (Hedonium). Others are VR anime worlds. Others are solarpunk space habitats. There is a huge diversity of worlds and people can choose to explore or create their own wherever they go.
Crescendo also allows people to birth new intelligences. Not just genetically modified humans, but other AIs. There is a soft limit on a person’s ability to add new consciousness to an area, as all consciousness must be protected and have specific rights.
Humanity expands across space in a beautiful poly-utopia. The utopia of Crescendo (and humanity) is fundamentally choice/agency-based and consent-based. Crescendo will allow anything to happen to you as long as you give deep consent to it. In some realities, for example, people want to be totally free of Crescendo. So, while Crescendo maintains a slight presence to prevent that utopia from building relativistic kill missiles and blowing up other non-consenting utopias, Crescendo doesn’t interfere, even when someone is murdered… because the murdered person had given their deep consent to allow that possibility if it meant living a truly AI-free life.
It is possible that no one would actually give their deep consent to this, so Crescendo would never have to deal with that. But Crescendo, above all else, respects people’s ability to choose. But of course, the ability to choose ends at another person’s ability to choose. So utopias can’t expand or assimilate other utopias. Travel is fine, but manipulation or coercion is not.
And in the year 2100, there are many different types of minds. AIs, uplifted animals, humans, genetically modified humans, cyborgs… the variety is infinite. All of these people migrate through utopias or form their own.
There are limits, of course. Utopias can’t expand infinitely. There are space and computation limits. People also can’t reproduce very frequently, as the creation of a new consciousness is a process heavily monitored by Crescendo, allowed only when there is space and an assurance that the created consciousness will have the opportunity to experience a truly fulfilling life. This applies not only to human babies, but also to AIs, animals, and other diverse forms of intelligence.