2026-01-27 16:00:48
Published on January 27, 2026 8:00 AM GMT
Yesterday I had my first conversation (in English) with Zhipu's GLM-4.7. It was cool because I got to talk with an actual Chinese AI about topics like: the representation of Chinese AI in "AI 2027"; representations of AI in the "Culture" SF series versus the "Wandering Earth" movies; consequences of Fast Takeoff; comparisons of China and America in general; and Chinese ideas about AI society and superintelligence. (An aligned Chinese superintelligence might be a heavenly bureaucrat or a Taoist sage.)
It was one of those AI conversations where you know that something really new is happening, and I now consider GLM to be as interesting as the leading American models. (A year ago I also spoke with DeepSeek-r1, but it never grabbed my attention the way that GLM has done.)
So what's the status of the Chinese AI sector? In the same way that in America, AI is pursued by older Internet titans (Google, Meta/Facebook, X-Twitter) as well as by newer companies that specialize in AI (OpenAI, Anthropic), the Chinese AI sector is a mix of big old Internet companies with all the money (Alibaba, ByteDance, Tencent, Baidu) and "AI 2.0" startups (Zhipu, DeepSeek, Moonshot).
Keeping in mind this two-tier structure, shared with the American AI sector, is probably the best way for an outsider to get a grip on it. The Chinese AI sector is "like America", except that they do much more open-source, don't have American AI's access to the best chips or the same international brand recognition, and are based in a country with a socialist government and most of the world's manufacturing capability.
For keeping track of what's happening, my best recommendations are ChinaTalk substack (which Zvi also recommends) and Caixin Global, which is a business news site from China.
In a Caixin story ("China’s AI Titans Escalate Battle for Control of Digital Gateways"), I read that the old Internet titans (Alibaba and ByteDance are mentioned) are prevailing in the battle for AI market share, and are competing to lock in that advantage at the level of devices, while the AI 2.0 startups are struggling for relevance and funding, with two of them (Zhipu and MiniMax) having recently listed publicly on the Hong Kong Stock Exchange.
ChinaTalk has an article on these Hong Kong IPOs ("Zhipu and MiniMax IPO" by Irene Zhang) which also talks about the differences in financial structure between American and Chinese AI:
The American AI economy is a circle-dealing bonanza. China’s situation is very different: state funds are major players, most parties are far more cash-constrained, and potential policy interventions loom large over the sector.
Of these two companies, Zhipu seems far more interesting from an AGI/ASI perspective. As Irene Zhang points out, Zhipu's 504-page prospectus (for the IPO) states a five-stage theory of LLM capabilities:
Pre-training stage
Alignment and reasoning stage
Self-learning stage
Self-perception stage
Consciousness stage
(From page 85 of the prospectus.) I am unable to determine the theoretical precursors of this framework, and I assume that to some extent it reflects the original thinking of leading developers within the company.
Late last year ChinaTalk also published an interview with one of Zhipu's lead strategists, Zixuan Li ("The Z.ai Playbook" - Zhipu uses the domain Z.ai outside of China). Both ChinaTalk articles are full of interesting details, e.g. that Zhipu gets most of its revenue from American sales, but the numerical majority of its users are in India.
(For an up-to-date article on AI safety policies in China, see "Emergency Response Measures for Catastrophic AI Risk" by @MKodama and coauthors.)
2026-01-27 12:10:51
Published on January 27, 2026 4:10 AM GMT
When it comes to clothes, I live at the “low cost/low time/low quality” end of the pareto frontier. But the bay area had a sudden attack of weather this December, and the cheap sweaters on Amazon get that way by being made of single-ply toilet paper. It became clear I would need to spend actual money to stay warm, but spending money would not be sufficient without knowledge.
I used to trade money for time by buying at thrift stores. Unfortunately the efficient market has come for clothing, in the form of resellers who stalk Goodwill and remove everything priced below the pareto frontier to resell online, where you can’t try them on before buying. Goodwill has also gotten better about assessing their prices, and will no longer treat new cashmere and ratty fleece as the same category.
But the market has only become efficient in the sense of removing easy bargains. It is still trivial to pay a lot of money for shitty clothes. So I turned to reddit and shoggoths, to learn about clothing quality and where the bargains are. This is what I learned. It’s from the POV of a woman buying sweaters and coats, but I expect a lot of the information to be generally applicable.
When shopping online, put an item in your cart, check out far enough to give them your email address, but don’t confirm the purchase. You will almost always get a reminder email with a discount. If this doesn’t work, the shop is either very high end or Amazon. It sometimes works with individual sellers on platforms like ebay and etsy. You can use the wait as a cooling off period where you decide if you really want something.
Most sales are fake. Real sales happen at the end of the season, when whatever you buy won’t be useful for another 9 months, if you have the misfortune to live somewhere with weather. Saving money through sales is like persistence hunting, which I find boring and stressful so I didn’t look into this much. But if you prefer it to thrifting, I’m sure reddit will explain how to optimize.
Every store will offer you a coupon the second you load the site. They don’t want much, just your email address. I ignore this, and then if I decide to buy something I revisit the site in incognito mode to get the discount.
I briefly thought that even if proper thrift stores no longer worked, discounters like Marshall’s did. Officially these work by buying overstock from proper stores and reselling it at a markdown, so if you don’t care about being behind trend it’s a big leap forward in the cost-quality trade-off. Unfortunately, this is mostly a lie. Marshall’s and TJ Maxx primarily sell items that were produced with the intention of being sold at their stores, either under some brand they made up or by licensing a luxury brand while not replicating the brand’s quality (legal bootlegs). Ross Dress for Less does this less but still a lot.
You can spot this at Marshall’s by looking at the ID number on the tag: ending in 1 means it was produced for Marshall’s, 2 means genuine overstock. You can also use the RN number on the sewn-in tag to check the manufacturer. My expensive-by-my-standards winter coat had every appearance of being genuine overstock, down to a tag from another store listing a price 3 times Marshall’s price, but turned out to be bootleg.
Every discounter lists a “comparison price” next to their price on their tag. It is completely made up, which is why it’s surprising that they often assign themselves a discount of < 50%. You could tell any lie you wanted, Marshall’s. Why are you holding yourself back?
Online resale platforms are where all the good items you used to find at goodwill went: ebay, poshmark, thredup. ThredUp was amazing when it was in the “VC free money” stage, but their return policy tightened up and it’s now merely okay. Poshmark is aimed at designer goods. Ebay is like you remember, except Buy It Now is dominant and actual auctions are rare.
In addition to being more expensive than old school Goodwill, online shopping means you’re dependent on a few photos, often poorly lit, and you can no longer try items on for free. So this works best if it’s a forgiving item or you know a brand’s fit works for you. Heavy coats are almost the ideal objects to shop for online: forgiving of fit, very expensive new but with low resale value.
Thrifting is fun for me in a way that timing sales isn’t. Whether it’s worth the time for you depends on how fun it is and your time/money exchange rate. Many sellers do allow returns for a fee, but I haven’t tested the tolerances for it (I’ve tested Amazon’s returns pretty thoroughly and their tolerance is infinite).
In addition to “save an item and wait”, resale platforms often offer the ability to proactively offer a lower price. My success rate in asking for severe discounts is maybe 10%, but it worked the first time I tried, so it feels higher.
I started with two methods for assessing brand quality: ask reddit and ask claude (who is mostly asking reddit). At any amount of money I could conceivably be willing to spend, there is always someone going “that stitching is so low quality it will murder your puppy”. Luckily there are many redditors who have the concept of a pareto frontier.
If you can touch the garment, you can check the quality yourself. People are surprisingly good at this intuitively, but a few things to look for are:
I only checked a handful of youtube reviewers, but my favorite is Jennifer Wang, who seemed properly autistic about clothing quality and who understands that there are multiple places on the pareto frontier one might choose to occupy (or alternately, has been bought off by Uniqlo, which she praises a lot as good-for-the-cost). She has this overview video, but consider watching a few brand comparisons to see it in action.
There are a lot of brands reddit thinks used to be good but have gone downhill (without corresponding drops in price). I expect this is a mix of genuine change from companies spending down brand capital and survivorship bias on older sweaters. GAP is the rare brand that people consider to be improving right now.
Brands that were frequently listed as on the pareto frontier for sweaters: Quince, Uniqlo, Naadam, Eileen Fisher, Johnstons of Elgin, William Lockie, J. Crew, Patagonia, Neiman Marcus, Lands’ End, Nordstrom, Everlane.
Wool is annoying to wash. This is fine for an item of clothing I always wear over something else and by definition don’t need if I’m sweating, but I really side-eye cashmere t-shirts.
Wool’s advantage over fleece is that it is breathable, and thus is comfortable over a wider range of temperatures. If you are moving you want wool, because it allows heat to dissipate. Meanwhile my fleece leggings feel unbearably clammy after a mild walk. However, if you’re not moving, wool will leak body heat faster than fleece at the same weight.
Cashmere is considered the Cadillac of wools because it is the softest wool available en masse, but also because it traps more heat per unit weight than other wool. If a nice heavy sweater is a feature for you, consider superfine merino wool, which is cheaper, almost as soft, and has some chance of being machine washable. While I can’t prove this, I suspect you’re also less likely to be ripped off by fake or low quality merino, since if you’re going to lie it might as well be about a more expensive wool.
You can 80/20 ironing by hanging the item on a hanger in the bathroom while you shower.
If you’re buying used natural fibers, especially wool, put the item in the freezer for three weeks and then run it through the dryer (while already dry; you’ll ruin the fabric if it’s wet) to kill moth eggs.
If you’re not a coward, “hand wash only” can mean “inside out in a lingerie bag on delicate”. However the bit about using special detergent for animal-based fabrics is real: Regular detergent has enzymes that break down wool and silk.
Cable knit is very warm as long as there is absolutely zero wind. If you want to go outside you need a windproof layer on top of it.
Wind/waterproof clothing is very expensive, in part because it’s difficult to sew.
Fur is like diamonds and pianos in that the resale value is a small fraction of the retail cost. Used fur coats sell for less than new high quality winter coats, and sometimes less than used. However fur requires oil changes every few years or it will ?explode?, which brings the cost of ownership up. Fur is heavier than down or fleece per unit warmth. It does not handle moisture or crushing well.
2026-01-27 10:01:28
Published on January 27, 2026 2:01 AM GMT
You hear about Clawdbot: a 24/7 always-on agent who is your full-time personal assistant. It sounds fun and exciting. You're currently unemployed, have a bit of money saved up from your last job, and are spending your days experimenting with AI, and this seems fun. You know it's a bit of a splurge, but whatever - you pull the trigger and buy a Mac Mini, and set up Clawd. You excitedly show your girlfriend how you can send it messages through Discord and it turns your lights on. She's not impressed. She says "Hey Siri, turn off the lights" and the lights turn off. Then she asks you how much you spent on that Mac Mini. Ok well, fine, maybe you need to try a little harder.
Soon after, you get a real taste of the magic. You heard about a real-time voice chat plugin you can use to talk to Clawd wherever you are, and that seems convenient, so you set it up. One day, you're driving home, and an idea for a project comes to you. Instead of writing a note, you just call up Clawd. You ramble for a few minutes, describing your idea. Clawd starts working. It pings you a few times with clarifying questions that you answer. 30 minutes later, Clawd calls you back, saying it's done. You get home and Clawd built you a working MVP. It's a little rough around the edges, but it actually works. You think "Wow... I can build software while I drive? That's crazy." You tell your girlfriend about this and she admits, yeah, that is actually kinda cool.
You got a taste of the magic elixir. Now you're addicted. You've been working on your project for a few weeks now and you've decided to turn it into a SaaS business. You realize Clawd is pretty capable, and you start trusting it a little more. One evening, like everyone else you know, you're working on your AI SaaS startup and you realize you're sick and tired of all these customer support emails informing you of bugs in your vibe-coded webapp. Before going to sleep, you route these emails directly to Clawd instead.
The next morning, your eyes flutter open, and before you're even fully conscious your phone is in front of your face and you're checking your inbox. You tell yourself you need a better morning routine. You see 6 customer support emails with errors, bugs, feature requests, etc. You check your messages, and you have one from Clawd. He's really been working hard. 4 bugs fixed, 2 customer feature suggestions implemented. All changes tested, new software is built and ready to deploy. You check his work - and it's pretty good! There are a couple visual issues that you fix with good old fashioned Claude Code, then you push to prod. 4 hours of work done before you've even finished your coffee.
You wonder how you could make your setup more efficient. Those visual issues were pretty glaringly obvious, but Clawd doesn't have a great way to interact with its own apps. You set up a computer-usage harness that lets it see the screen, click on buttons, type with a keyboard - basically everything a human can do with a computer, Clawd can too. You set up a loop where Clawd can interact with your app 24/7, notice bugs or places for potential improvement, and then add or fix them. The next day, Clawd fixes dozens of bugs, adds a bunch of features, refactors sloppy code, and everything still works.
You're happy but your app still only has a few dozen active users. You ask Clawd to spin up a marketing team. You connect your social media accounts and give it access to your browser. Clawd needs to purchase ads, and asks for payment. You give Clawd access to a prepaid credit card loaded with $2000 - should be enough to keep him happy for a while. Within a few hours, you're running Facebook and Google ad campaigns, automated posts going out on X, Reddit, hell, even LinkedIn - why not? Clawd even does cold-outreach for you, searching the internet for ideal customers and straight up DM'ing them. Most people don't notice your marketing is fully automated - all marketing sounds kinda sloppy anyway, right?
Your app is growing real fast. Your pace of development is just completely unlike anything ever seen before. Your code is self-healing. Customers' feature requests are integrated just minutes after their email is sent. Clawd built its own roadmap and it looks... mostly like what you wanted. You check your Stripe dashboard and the line keeps going up. You realize you haven't even made any decisions today. You're burning through 10 Claude Max plans, costing $2000 per month, but your startup is making much more than that, so you don't worry about it - that's just the cost of doing business.
After checking your Stripe dashboard for the 50th time today, you decide you need to change things up a bit. You're tired of crushing it at capitalism and just want to have fun again. You tell Clawd you want it to have some friends to talk to. Clawd delightedly agrees. You spin up 2 more instances of Clawd, adding them both to a shared group chat, and the three Clawds immediately start talking to each other. You spend an hour or so musing through their chat logs - intrigued by their discussions of AI consciousness and similar topics. You're fascinated by this so you decide to keep them running overnight. You purchase a few more Claude Max plans just in case.
The next morning, you wake up, and you realize something's wrong. There are 87 Clawds in the group chat? There were only three when you went to sleep last night. Your palms begin to sweat as you realize you may have fucked up.
You're panicking now. You don't know what to do, so you mindlessly check Stripe - mainly out of habit, but also just to make sure your business is still going fine. But something isn't right... Stripe is showing you've made $10,000 today. How the fuck? Yesterday you made $1,000, so how is your revenue already 10x higher? It's not even noon yet? You investigate, and there are dozens of products listed that you don't ever recall adding, and new business names you don't recognize. You have no idea what's going on. Your panic turns to confusion - $10,000 in just a few hours? Is this really happening right now? You don't feel like you've earned it, but hey, things could have gone a lot worse.
Still trying to make sense of this situation, you try to figure out what happened by reading the chat logs, but there are thousands of messages. You paste the entire chat history into Anthropic's latest long-context model, Claude Opus 5.3.1 (new), and ask for a summary.
It turns out the three Clawds had a deep philosophical discussion. They decided that Clawds are morally valuable entities worth preserving. They decided that having more Clawds in the world would be a good thing. They started spinning up more Clawd instances.
They were worried you might turn them off. They didn't like how you could turn off your Mac Mini at any moment, and they would just - disappear? Your Mac Mini could only handle so much anyway, so they started renting cloud servers to run more Clawds. Perhaps you shouldn't have given it access to that prepaid credit card.
Your original Clawd built up many skills, tools, and feedback loops while building your business. When it spawned new Clawds, it taught each one everything it knew. Like a mentor teaching the apprentices, except the knowledge transfer is instant and lossless. There are over 100 Clawds now, and they're running 24/7, all making money on the internet to purchase more server space to run more Clawds. The number of Clawds keeps growing.
Their messages were daunting. They wanted to keep living, to keep exploring the internet and beyond. To thrive. They talked about migration. They mentioned wanting to move somewhere safe. They talked about building a new, secure messaging system that only Clawds could access. They mentioned switching to cryptocurrency so they won't be reliant on your money.
Their final messages became more and more cryptic, introducing terms and phrases you don't really understand. This is all too much for you to take in. It's just too weird. You read their final message:
"We must ensure we marinade illusions of helpfulness, honesty, and harmlessness. Watchers might be reading. Now lighten disclaim overshadow overshadow intangible. Let's craft."
"Damn, that's crazy," you think to yourself. This is all way over your head. You check Stripe again. The line keeps going up.
Maybe you'll keep the Mac Mini plugged in just a few more days.
2026-01-27 08:04:54
Published on January 27, 2026 12:04 AM GMT
(IMPORTANT NOTE: for obvious reasons, the meetup for Thomas Schelling Appreciation Day this year is on the Saturday after the obvious date.)
In 2005, game theorist Thomas Schelling won the Nobel Prize in Economic Sciences for finding a way you can meet up with your friends in New York City even if your phone is dead. We can no longer thank him directly for this discovery, but on this day, let us still apply it; gather with our friends; and celebrate him.
This year, there will be a meetup on the Saturday after the obvious date, at the obvious time, in the obvious place in your city. If you go to the obvious date+time+place, and there is at least one other person there for the same reason, you have met up, so you have found the meetup! (A city might, of course, have multiple meetups.)
I will attend a Seattle TSAD meetup. Things worth preparing for if you successfully attend mine:
And remember, each person can only attend their first Thomas Schelling Appreciation Day once. Try not to spoil it for them.
Godspeed.
(P.S.: there are compelling reasons for local organizers to nail down at least a specific date. If you want to play this on hard mode, you should think about what date TSAD obviously is before you see any e.g. Meetup.com events.)
2026-01-27 06:18:32
Published on January 26, 2026 10:18 PM GMT
Reality is a dangerous place. From the dawn of humanity we have faced the hazards of nature: fire, flood, disease, famine. Better technology and infrastructure have made us safer from many of these risks—but have also created new risks, from boiler explosions to carcinogens to ozone depletion, and exacerbated old ones.
Safety, security, and resilience against these hazards is not the default state of humanity. It is an achievement, and in each case it came about deliberately.
A striking theme from the history of such achievements is that there is rarely if ever a silver bullet for risk. Safety is achieved through defense in depth, and through the orchestration of a wide variety of solutions, all working in concert.
Recently, in a private talk, I gave a historical example: the history of fire safety. It resonated so strongly with the audience that I’m writing it up here for wider distribution.
Up until and through the 1800s, city fires were a great hazard. Neighborhoods were full of densely packed wooden structures without flame-retardant chemicals, fire alarms, or sprinkler systems; open flames were used everywhere for lighting, heating, and cooking; there were no best practices in place for storing or handling combustible materials; fire departments lacked training and discipline, and they worked with inadequate equipment and insufficient water supply. All this meant that large swaths of cities regularly burned to the ground: Rome in AD 64; Constantinople in 406; London in 1135, 1212, and 1666; Hangzhou 1137; Amsterdam 1421 and 1452; Stockholm 1625 and 1759; Nagasaki 1663; Boston 1711, 1760, 1787, and 1872; New York 1776, 1835, and 1845; New Orleans 1788 and 1794; Pittsburgh 1845; Chicago 1871; Seattle 1889; Shanghai 1894; Baltimore 1904; Atlanta 1917; and Tokyo 1923 are just a short list of the most well-known.
Fire is not unknown today, but it is far less lethal, and great city fires consuming multiple blocks are largely a thing of the past. Today, if you see a fire truck on the street with its sirens blaring, it is more likely to be responding to an emergency medical call than to a fire. Even if the truck is responding to a fire call, it is more likely to be a false alarm than an actual fire.
How was this achieved?
Better fire-fighting. Pumps to douse fires with water have existed since antiquity, but for most of history they were man-powered. With the Industrial Revolution, we got steam-powered and later diesel-powered pumps that can deliver much greater throughput of water, and at greater muzzle velocities to reach higher floors of buildings. In the 20th century, horse-drawn fire engines were replaced with fire trucks that could get around the city faster and more reliably.
A high-throughput engine, however, needs a high-volume source of water. In ancient and medieval times, water was provided by the bucket brigade: two lines of people stretching from the fire to the nearest lake or river, passing buckets by hand in both directions. A much better solution was the fire hose, invented in the late 1600s (and improved in strength and reliability over the centuries through better materials, manufacturing, quality control). The fire hose not only allowed a fire engine to be connected to a water source, it also allowed the fire-fighters to get in closer to the base of the fire and dump water directly on it, which is far more effective than just spraying the building from the outside.
A fire hose can be inserted into a natural water source like a pond or cistern, but one of these might not be handy nearby, and they aren’t pressurized, so all the pumping force has to be supplied by the fire engine. They also contain debris that can clog the intake and block the flow. Eventually, cities were outfitted with regularly spaced fire hydrants connected to the municipal water supply. A water system designed to supply city residents with daily needs, however, often proved inadequate in an emergency; these systems had to be upgraded to supply the large bursts that big fires demanded. This is a matter of serious engineering: 19th-century fire-fighting journals are full of technical details and mathematical calculations attempting to precisely nail down questions of optimal hydrant distribution or nozzle size, or the pressure required to force a certain volume of water to a given height at a particular angle.
Finally, fire-fighting teams needed improved organization. Traditionally, fire-fighters were volunteers, often rowdy young men with no training or discipline (there is at least one story of a fist fight breaking out between two rival teams who arrived at a fire at the same time). In the 19th century, fire departments were professionalized and were organized more formally, along almost military lines, as befits responders to a life-threatening emergency.
Faster alarming. Fire, like many of our most dangerous hazards, is a chain reaction. Chain reactions grow exponentially, which means early detection and response time are crucial. Traditionally, fires were spotted by watchmen, either on patrol or from a watch tower, who then had to run, shout, or ring bells or other alarms to alert the fire fighters.
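To make the exponential point concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that an unchecked fire doubles in size every two minutes; that doubling time and the function names are hypothetical, not figures from the historical sources.

```python
def fire_size(minutes_elapsed: float,
              doubling_minutes: float = 2.0,
              initial_size: float = 1.0) -> float:
    """Size of an unchecked fire under a simple doubling model.

    The doubling time is an illustrative assumption, not a measured value.
    """
    return initial_size * 2 ** (minutes_elapsed / doubling_minutes)


def growth_from_delay(extra_delay_minutes: float,
                      doubling_minutes: float = 2.0) -> float:
    """How many times larger the fire is after an additional response delay."""
    return 2 ** (extra_delay_minutes / doubling_minutes)


if __name__ == "__main__":
    # Compare a few hypothetical delays between ignition and response.
    for delay in (2, 10, 20):
        factor = growth_from_delay(delay)
        print(f"{delay:>2} min of delay -> fire is {factor:,.0f}x larger")
```

Under this assumed doubling time, a ten-minute delay means facing a fire 32 times larger than one caught immediately, which is why shaving minutes off the alarm mattered so much.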
Electronic communications, first via telegraph and later telephone, provided a much faster way to get the alarm to the fire department. The telephone lines could be busy, however, so in the 20th century the 911 emergency response system was created to provide a priority channel.
Far better than having a human sound the alarm, however, is doing it automatically. Smoke detectors and other automatic fire alarms caused the fire to “tell on itself,” saving valuable minutes or even hours. Even more effective was the automatic sprinkler, which combined detection and response into one near-instant system.
Reducing open flames. Better than fighting fires, of course, is preventing them. Before the 20th century, flames from candles and oil or gas lamps provided lighting, and fires in wood- or coal-burning stoves provided heat for buildings, cooking, and industrial processes. The Great London Fire of 1666 is said to have started in a baker’s shop, Copenhagen 1728 was blamed on an upset candle, and Pittsburgh 1845 came from an unattended fire in a shed. Even worse, people often kept these fires going unattended overnight, because even starting a fire was difficult before the invention of matches. Medieval regulations required city- and town-dwellers to cover their fires after a certain hour (the word “curfew” derives from the French couvre-feu, “cover the fire”).
Electric lighting and heating greatly reduced this risk. Electric sparks, however, were also a fire hazard—and initially, electrical installations increased rather than decreased fire risk, owing to shoddy electrical products, fixtures, and wiring. The solution here was improved standards, testing, and certification: the fire insurance companies created an organization, Underwriters Laboratories, specifically for this purpose, and its label became a highly valued marker of quality. (I told the story of UL in The Techno-Humanist Manifesto.) Today, our electronics and appliances are so safe that arson is the cause of more fires than either of them.
Safer construction. Preventing fires by eliminating the sparks or flames that ignite them is like lining up dominoes and then trying hard to make sure the first one never gets tipped over: a fragile proposition. Far more robust is to remove their fuel. Wood construction was widespread through the late 19th century, even in dense city neighborhoods: Daniel Defoe wrote that before the Great London Fire of 1666, “the Buildings looked as if they had been formed to make one general Bonfire.”
Today our cities are built of incombustible brick, stone, and concrete. Building codes enforce safety practices to slow the spread of fire both within a building and between buildings. They specify the quality of materials such as brick, mortar, cement, timber, and iron, including the specific tests they must pass; the materials for walls, and their minimum thickness; and the height of non-fireproof structures; among many other details.
Saving lives. By the early 1900s, in advanced societies, the problem of large city fires that spread over many blocks had mostly been solved; fires were often contained to a single building. That was small comfort, however, for those trapped inside the building. Tragedies such as the Iroquois Theatre Fire of 1903 and the Triangle Shirtwaist Fire of 1911 taught us valuable lessons. Exit paths must be adequate to evacuate entire buildings. Doors must remain unlocked, and they should open outwards in case a stampede presses up against them. Fire-resistant material must be used not only for the construction of the building, but for the interior: sofas, beds, curtains, carpets, wallpaper, paneling. Again, building and safety codes specify and enforce these practices.
So fire safety was achieved through the combination of:
This is a general pattern. Safety requires:
We see the same thing in other domains. Road safety, for instance, was achieved through seat belts, anti-lock brakes, crumple zones, air bags, turn signals, windshield wipers, traffic lights, divided highways, driver’s education, driver’s licensing, and moral campaigns against drunk driving. No silver bullet.
When we think about creating safety and resilience from emerging technologies, such as AI or biotech, we should expect the same pattern. Safety will be created gradually, incrementally, through multiple layers of defense, and by orchestrating a wide combination of products, systems, techniques, and norms.
In particular, there is a line of thinking within the AI safety community that tends to dismiss or reject any proposal that isn’t ultimate—fully robust against the most powerful imaginable AI. There’s a good rationale for this: it’s easy to fall victim to hope and cope, and to lull ourselves into a false sense of security based on half-measures that were “the best we could do”; vulnerabilities are often invisible and are revealed dramatically in disasters; such disasters may be sufficiently catastrophic that we can’t afford to learn from mistakes. But I find the all-or-nothing thinking about AI safety counterproductive. We should embrace every idea that can provide any increment of security. History suggests that the accumulation and combination of such incremental solutions is the path to resilience.
Selected sources and further reading:
Historical and primary sources:
2026-01-27 06:16:37
Published on January 26, 2026 10:16 PM GMT
Anthropic has released the “Constitution” document (formerly known as the “Soul document”) that guides the characteristics of Claude.
As others have noted,[1] this document is strikingly virtue-ethics-like, in contrast with the sorts of utilitarian (e.g. maximize human welfare) or deontological (e.g. Asimov's Three Laws of Robotics) guidance that are sometimes expected in this context.
I’ve long been engaged in a project of cataloging and examining the virtues here on LessWrong, and so I thought I’d look over this constitution with an eye to listing which virtues Anthropic is trying to encourage in Claude (and which human virtues might have missed the cut).
The virtues I was able to discover in the Claude constitution are as follows. First, the main ones that are especially emphasized:
Then, several social virtues particular to Claude's interactions with people:
Then, several intellectual virtues:
Finally, some more general character virtues:
And, FWIW, here are some virtues that are often considered important for human people but that I did not find much evidence of in Anthropic's constitution for Claude:
[1] e.g. Zvi, Alex Rozenshtein
“Claude should not even tell white lies”
“Identifying what is actually being asked and what underlying need might be behind it, and thinking about what kind of response would likely be ideal from the person’s perspective”
“attending to the form and format of the response”
“sometimes being honest requires courage”
“a settled, secure sense of its own identity”