
Manifest x DC After Action Report


Published on November 29, 2025 9:20 PM GMT

Manifest attracts devotion unusual for a conference. "Dinners and meetings and conversations with people building local cultures so achingly beautiful they feel almost like dreams."[1] "This strange blend of data and debauchery."[2] "That magical feeling of serendipity, where you can flow through a space, passing from conversation to conversation, contribute to each one in turn, and have others do the same for you."[3] Even those who run it say it's "a tough event to describe."[4]

We attempted a replication.

I won't bury the lede. You can just steal things. Manifest replicates.

Group photo was Ghiblified for privacy, as recommended by Jeff Kaufman

This post tells the story of the event. If you're just here for takeaways, you may want to skip to:

This post originally included more personal reflections, which I left out of this public version. I'm not posting it to LessWrong, but if you want the full story, it's available in this Google Doc (please feel free to request access if you're interested and we've met in real life).

The Gazebo of Schemes

I had a very strange time in Berkeley last June. I came with a message to deliver, ideas to refine, and work to do. In the week between LessOnline and Manifest, I frequently worked from the gazebo in the center courtyard of Lighthaven. I got a lot done, advanced work that continues to compound. I'm proud of what I accomplished.

Just under the surface I was a mess. But I didn't want to let the trip be about that. I wanted to seem OK, convinced myself that I needed to project a certain image of stability for the sake of duty. Given the crowd, to keep up appearances I leaned into being more adventurous and playful.

Someone gave me access to a printer. I had a role to play. I started labeling things.[5]

Hastily-Printed "Gazebo of Schemes" Duct-Taped to a lovely Lighthaven Gazebo
Gazebo of Schemes, mid-plot

I named my preferred spot the "Gazebo of Schemes," gathered a cabal of friends, and appointed myself chair. Even as the chair of this invented commission, I imagined us to be staffers. I greeted passers-by with variations on: "Welcome to the Gazebo of Schemes, how may we assist your schemes?" This is an excellent discussion prompt, often getting people to reveal quests they didn't know they had. 

The Gazebo of Schemes hereby claims partial credit/blame for several dates, one relationship, at least one lawsuit, several wardrobe upgrades, [redacted] instances of corporal punishment, four or five conference talks, a feud, and finally, this conference series.

@Ben S. flew out for Manifest and immediately loved the atmosphere, how Lighthaven's design created a distinct kind of conference. When the Gazebo of Schemes first called him to adventure, he was still taking it all in and had no plots to offer. But after barely more than a day, he hatched a scheme to bring something like Manifest to the East Coast, to bring this social technology to our friends back home.

I was honor-bound to assist.

 

Exodus

As a staffer, I know how to support a principal. The job is mostly to be annoying. Hey Ben, remember that crazy idea you had? You should totally do it. Hey Ben, David and Pratik would be interested in your idea. Hey Ben, I was thinking... Hey Ben, have you told... and on, and on, and on. The sad part is, I'm a very good bureaucrat.[6]

Ben naturally gravitated towards programming: which speakers to invite, what kinds of events and panels to put together, and venue selection. In short, how to handle the people. I tackled logistics, budget, planning, and the Gantt charts, the parts that are fun for me. @David Glidden signed on to be the day-of volunteer coordinator. Pratik offered to recruit additional speakers. We had a team, but no clear idea of what to do next.

How do you transplant a conference that is such a product of its venue?

First we looked for the elements that weren't. Manifest takes a lot from the Rationalist Unconference playbook: invite interesting and agentic people, get them talking online before the event, and occasionally butt in to say, "that's really interesting, you should put that on the calendar!" But unlike most Unconferences I've seen, Manifest has a default, a main stage that's fully programmed in advance. Since there's always something sufficiently interesting going on, organizers don't have to rely on any particular Unconference session. This gives people space to be niche and experimental. If attendees aren't interested, that's fine, they'll go to the prepared talk or panel instead. 

Manifold, the platform, provides another set of portable elements: interesting markets to discuss, an implicit bullshit tax on sloppy predictions, and pressure to keep heated conversations grounded by searching for cruxes that can be operationalized into a market. To speed this along, it seemed important to seed the conference with several Manifest regulars, particularly Manifold power-users who instinctively broke disagreements into markets. Several names immediately came to mind, people who'd probably fly out if invited.

In looking for the portable elements, we had derived some venue preferences after all. We would need several rooms, but one should be larger than the others, large enough to fit most or all attendees. This argued against most apartments or renting a few classrooms. We wanted some sort of place that would feel distinct enough to break people out of day-to-day political arguments. We also wanted something that felt special enough to tempt a few friends to fly in. We ruled out anything that felt like an office or a sterile hotel conference center.[7] We started looking for suitable venues, but nothing seemed right.

Then the solution fell into our laps: two friends suggested their technically-not-EA group houses. Workshop House was a former rectory configured for small events, with a large living space, several breakout rooms, dark wood, and stained glass. Very Bayes House. Embassy House was a beautifully renovated, modern-appointed former embassy that throws large parties. Very Aumann Hall. Both had excellent roof decks. We couldn't decide, so we picked both. We argued a bit internally about which was suited to which role, but decided Workshop was best for the daytime programming, and Embassy was a better fit for the afterparty.

Once we had the concept and venues, things started to fall into place. Ben talked to the Manifest team (Austin, Stephen, Ian, and David Chee), who shared a wealth of knowledge and offered to sponsor. We found dates that worked; November 8th seemed good. We agreed with the venue on a capacity of 60. I was able to catch up with Austin at Metagame and iron out some logistical details (his offer of Mox's ticket system was a particular lifesaver).

Momentum started building. We announced locally to give a head start to our target market: local rationalist-adjacent folks who might enjoy Manifest proper. We also leaked the invite to some Lighthaven regulars who might travel in to help set the tone. We went "wide" a few days later, announcing via email, in blog posts, and in prediction-market Discord servers. People quickly joined the Manifest x DC Discord server even as we were still setting it up. We sold out in six days.

 

The Big Day

It’s a sad irony that throwing an event you want to exist doesn’t necessarily mean you get to attend it. We had Robin Hanson and Peter Wildeford give prepared talks, had panels on forecasting politics and the future of Manifold, even a forecasting game… and I missed almost all of it. I heard they were all good talks, I caught five minutes here and there, sat in on some of the smaller panels upstairs, but mostly I was coordinating. If you want to know what happened at the conference, Matt Beard’s review is a great summary.

Robin Hanson describing Elegant Ideas with Messy Details

I was responsible for two parts of the calendar. After Ben welcomed attendees, and David gave logistical notes, I delivered an “Opening Benediction,” one last step to copy over the tone of earnest and playful truthseeking from Manifest. Being jailed in a walled compound is no excuse for missing my conference, so that afternoon I ran a virtual panel from Inkhaven, roping in three friends who are locked in Lighthaven for the month of November and forced to blog daily. To join, attendees were asked for a blog post prompt–that didn’t need to be good–just sufficient to stave off eviction one more day. 

Mainly, though, the day was a blur. I left this section to write until last, hoping I’d have more to say, but I still don’t. The work felt good, felt rewarding. People seemed to enjoy it. I enjoyed it. I had missed this.

As the sun set early, we closed the day, kicked participants out onto the street to form groups for dinner, cleaned up Workshop house, got dinner for the volunteers, then changed into Black Tie for the afterparty.

I tried to stab Ben for being underdressed, but couldn't stop laughing long enough to enforce the dress code.

 

Numbers

Financial

We collected $5,217 in revenue net of refunds:

  • Sponsorships: $2,000
  • Early Bird Tickets: $1,600
  • Full-Rate Tickets: $845
  • Last Chance Tickets: $480
  • Supporter Tickets: $730
  • Less Refunds: $-438

Total expenses were $4,821:

  • Venues: $1,826
  • Catering: $1,234
  • Supplies, Snacks, and Afterparty Alcohol: $1,030
  • Custom Badges and Lanyards: $371
  • Miscellaneous Reimbursements: $185
  • Ticket Transaction Fees: $175

This leaves a modest surplus of $396, which we're leaving as seed money for the next Manifest X. 
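For readers who want to check the arithmetic, the line items above reconcile exactly; a minimal sanity-check sketch (all amounts in USD, taken directly from the lists above):

```python
# Revenue and expense line items as reported in the post.
revenue = {
    "Sponsorships": 2000,
    "Early Bird Tickets": 1600,
    "Full-Rate Tickets": 845,
    "Last Chance Tickets": 480,
    "Supporter Tickets": 730,
    "Less Refunds": -438,
}
expenses = {
    "Venues": 1826,
    "Catering": 1234,
    "Supplies, Snacks, and Afterparty Alcohol": 1030,
    "Custom Badges and Lanyards": 371,
    "Miscellaneous Reimbursements": 185,
    "Ticket Transaction Fees": 175,
}

total_revenue = sum(revenue.values())     # 5217
total_expenses = sum(expenses.values())   # 4821
surplus = total_revenue - total_expenses  # 396
print(total_revenue, total_expenses, surplus)
```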

Somewhat surprisingly, this event would have been feasible without sponsorship. We would have needed to charge $10-20 more per ticket, drive a harder bargain with our main venue, and cut back on supplies, catering, and the afterparty. All were fairly doable, though negotiating harder with the venue would have risked offense, since their asking price was already a "friends" rate (starting from a market rate would have made things harder). 

Feasible, yes, but it would have been much more stressful to organize on a tighter budget. I was nervous about asking friends to buy tickets from me on faith. My arbitrary comfort threshold was $50; getting the Early Bird price to that target helped me pitch enthusiastically. The slack in the budget gave us peace of mind to solve problems with money. We had a healthy contingency reserve. We paid for rush printing and shipping when we were delayed on the badges. When our first-choice caterer closed unexpectedly, we were able to fall back to the easiest backup plan rather than seriously shopping around. We were more generous with refunds than our written policy required, and even offered to refund dissatisfied attendees after the fact in exchange for feedback. No one took us up on this.

What the sponsorships and supporter tickets really bought was the organizing team's peace of mind. We're very grateful.

 

Participant Feedback

We had 62 attendees in all, 16 filled out the post-event survey. This data is skewed by a response bias. Half of our survey responses came from people who have been to a Rationalist-style Unconference before, but this group was a third of attendees, and we were pretty confident the event would go well for them. We were hoping to hear from people less familiar with this format, and only got 8 responses from those ~40 attendees. However, to mitigate the risk that we would not hear from those who were unhappy with the event, we incentivized negative feedback, offering refunds to anyone who regretted their ticket purchase in exchange for filling out the survey. No one took us up on this.

Responders were divided roughly evenly between liking that size and preferring a somewhat larger event. 2 of 16 wanted over 100 attendees. I feel like the sweet spot for ManifestX events is in the range of 50-80, depending on the city.

Survey results (chart data not reproduced here; 15-16 responses per question):

  • Net Promoter Score: "On a scale of 0 to 10, how much would you recommend Manifest x DC to a friend?" (16 responses)
  • Fun Quotient: "How FUN was the event?" (16 responses)
  • Usefulness Quotient
  • Best Feature: "Best part of the event" (15 responses)
  • Venue Quotient: "What did you think of the venue?" (15 responses)
  • Food Quotient: "Food, 0-10?" (16 responses)
  • Fun vs Usefulness Tradeoff: "To the extent we face a tradeoff between FUN and USEFUL, how should we approach that for future events?" (16 responses)

Naming the Event

It would be bad form to detail internal arguments and disagreements. But people made predictions, that's totally different. Suffice it to say, @Austin Chen was right and John got wrekt. Participants overwhelmingly preferred making the "x" lower-case, and moderately preferred Austin's recommended spacing of "Manifest x DC":

 

Lessons Learned

Charge More

Our post-event survey strongly endorsed charging more. There is some response bias: half of our respondents had been to Manifest or a similar Rationalist Unconference before. But we also literally offered refunds in exchange for negative feedback, and no one took us up on this. 15 of 16 responders were willing to pay at least 20% more. 11 of 16 responders were willing to pay at least 50% more, the level at which we could have matched our spending without sponsorships.

Events shouldn't charge more just to pay organizers a profit. This is a terrible way to make money. The $396 surplus works out to less than $2/hour for the organizers' work; raising prices might have increased that to $10/hour, still far from an attractive professional wage. But money is useful for improving the event. If we had a reliably larger budget, we might have rented more space (which all participants would have liked, if available), or kept the group together for dinner. We had a two-hour break for dinner, 6-8 PM, to clean Workshop House before the afterparty. At least a third of participants went home and didn't make it back out to the afterparty.

We promoted the section on Ticket Strategy to its own post a few weeks ago, to get the word out fast. For a quick summary, several recent Lighthaven events have shared a ticket strategy with three pillars:

  1. Significant early discounts to entice people to make plans.
  2. Tickets are sold in tiers, with well-publicized plans that prices will increase as the event gets closer. Tiers can be differentiated by time, by number of tickets, or both.
  3. A generous refund policy until shortly before the event that essentially eliminates the risk of buying an early ticket. 

We implemented this and it worked well. We sold out in six days, well over a month out. We charged $65 for tickets by default, $50 for early bird (the first 30 tickets), and $80 for "last chance" tickets after the cancelation deadline. Supporter tickets, for a fancier badge and our thanks, cost twice the going price at the time (so $100, $130, or $160). We offered full refunds less transaction costs until two weeks out, and had lower attrition than expected, replacing those who dropped from a wait-list.
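The tier schedule described here is simple enough to express as a function. This is an illustrative sketch of the pricing logic only; the function name and signature are mine, not part of any ticketing system the organizers used:

```python
def ticket_price(tickets_sold: int, past_cancel_deadline: bool,
                 supporter: bool = False) -> int:
    """Price in USD for the next ticket under the three-tier schedule."""
    if tickets_sold < 30:          # first 30 tickets: early bird rate
        base = 50
    elif past_cancel_deadline:     # after the refund cutoff: "last chance" rate
        base = 80
    else:                          # default rate
        base = 65
    # Supporter tickets cost twice the going price at the time.
    return base * 2 if supporter else base

print(ticket_price(10, False))                  # early bird: 50
print(ticket_price(40, False))                  # default: 65
print(ticket_price(55, True))                   # last chance: 80
print(ticket_price(40, False, supporter=True))  # supporter at default tier: 130
```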

For details and discussion of how this solves coordination problems, see the standalone post.

 

Mistakes

We set up a Manifold Market to predict and mitigate what might go wrong. In the end, very little did. Manifold's T-shirts were delayed but made it to the venue by the afternoon. One of our speakers ended up double-booked, but arrived as his talk was scheduled to start. The Geneva and Vienna Conventions were upheld, despite some real risks. Someone tried to hyperstition "Fire!", adding it to the market and betting it up, but our valiant traders thwarted him and arbitraged it away.

Casualties

@Alex Caswen defends himself from the press

Our only injury was from the afterparty dueling. Minutes after this picture, an extremely stabby participant from a different sparring pair managed to draw a bit of blood from the afterparty host, with the host's own plastic sword. Luckily everyone took this well, "Wait, am I actually bleeding? Awesome!"

Underestimated Demand

We underestimated demand at nearly every stage. We had 62 participants in total, including organizers, speakers, and volunteers, an informal waitlist of at least 10, and obvious latent demand for another 20 seats. We could have easily gotten 90 participants if we had sufficient space, without any additional promotion work. With a reasonable amount of work to spread the word, we could have far exceeded 100. 

We overestimated pre-event cancellations (only five; we had guessed 10), day-of no-shows (only three; we had guessed four or five), and attrition during the day (we expected more people to only come for part of the day, but we probably had 55-58 people in the building between Noon and 4pm). The one exception was the afterparty: we expected nearly everyone, but only about three-fifths of attendees came. The two-hour break for dinner, and the use of a different venue almost a mile away, surely contributed.

Policies

We should have had a harassment policy. An individual was told in writing that they were unwelcome, then bought a ticket anyway, which we canceled and refunded. They later showed up uninvited to the morning-after brunch we organized, which we handled poorly. They used a series of small escalations, announced a meetup that just happened to be at the same venue at an overlapping time, arrived and set up their meetup at a different table, then moved to an adjoining table, then joined the group, then changed seats to sit by the target of the harassment. This was a public place, and we had already asked them not to come through an intermediary, so we couldn't remove the person. The targeted participant left the brunch rather than confronting the behavior, but we should have done more to prevent the harassment. This is uncomfortable, seems to be escalating, and I would appreciate advice on what to do about it if it recurs in future events.

Think of a photo policy in advance. We announced one on the fly: everyone pictured would need to give explicit permission before a photo was shared. We later felt this was too restrictive. A better way to allow both photos and opt-outs would have been to keep a list of those opting out in the Discord or attendee guide, and have them put stickers on their attendee badges as a reminder.

Write a survey in advance. A participant saved us by writing a starting draft the evening of the event, which we were able to revise and send out two days later. We got some good responses, but it would have been better to have it ready to go at the closing session.

The afterparty venue wanted to screen attendees as a condition of hosting. We used an "approval-required" Partiful listing to do this. It worked, but it was awkward, required extra steps from participants to request access, and took a lot of work to coordinate. In retrospect, a cleaner way to handle it might have been to make the afterparty invite-only: share the attendee list with the afterparty hosts and simply let them invite whomever they wished.

Venue and Logistics

People love to congregate in doorways and chokepoints; we should have discouraged that more. We caught and fixed one chokepoint we'd inadvertently created with folding chairs. But a lot of this is innate: the doorway is just the obvious place to be while someone decides whether they want to attend this breakout session or head back to the main room. It's understandable; we just should have asked volunteers and session hosts to encourage attendees to fully enter rooms.

I inadvertently discouraged people from using one breakout room all morning by sitting down with my laptop and coordinating logistics. I would have moved, but I'm sure I didn't look particularly approachable. Once I left, the room booked up for the rest of the day.

Chipotle catering was fine, but surprisingly expensive. We ordered what Chipotle claimed would be sufficient to serve 70. It was just enough for the 55 people who ate, at a cost of $22.43 per person (including tax, no delivery or tip). With better planning we could have reduced this cost by at least a third. Also, food for 55 is a lot of food. We originally sent three volunteers with a cart to pick up the food, but had to send reinforcements to assist.

Someone brought and handed out gum, which was thoughtful and helpful in our close quarters, but annoyed some participants. Mints would have been better. Similarly, someone brought a portable mechanical keyboard, which made disruptive noise that we should have put a stop to.

 

What's Next?

Glory, mana, and our $396 surplus await whoever organizes the next Manifest X. Our post-event survey reveals there is at least some demand in NYC, Philadelphia, Baltimore, Pittsburgh, Raleigh-Durham, Chicago, Seattle, Austin, and Tokyo. The DC organizing team is happy to advise and talk through issues. Reach out to @Austin Chen if you think you have what it takes.

  1. ^
  2. ^ Kevin Roose in the New York Times: https://archive.ph/sf5lw
  3. ^
  4. ^
  5. ^ I am professionally interested in State Legibility, after all.
  6. ^
  7. ^ I think hotels are underrated. Plenty of events can be run there well, especially if your group is showing up with its own distinct culture and expectations. We could absolutely run a ManifestX in a hotel conference space if everyone had been to Manifest before. Hotel conference spaces are less suited to instill a new culture or social technology in people who aren't already familiar, our target audience.




Why do some people prefer gifts to money?


Published on November 29, 2025 8:38 PM GMT

One of the most enigmatic paradoxes in psychology is the existence of people who prefer gifts to money.

For example, in a 2023 YouGov poll, 29% of Brits preferred to receive money, 7% preferred gift cards, 15% chose the 'don't know' option and the other 51% preferred some kind of gift. Across other countries, the proportion of respondents who preferred to receive cash or money varied from 71% in Indonesia to just 15% in Denmark.

In every country surveyed the proportion of gift-preferrers was well above Lizardman's constant.

People who prefer to give money are even less common than people who prefer to receive money.

I still can't come up with a convincing explanation for this utterly bizarre phenomenon, but I'd like to share a few failed attempts so far.

Attempt 1

Under some circumstances, some people have an irrational tendency to value variable rewards more highly than predictable rewards. Gifts are often wrapped, with social customs that they must not be opened until a certain date. Doing so prolongs the period when the gift is unknown but highly salient.

So why can't Bob give Alice cash, but randomise the amount of cash that he gives? If Bob is on equally good terms with both Alice and Charlie, but he gives Alice £20 and Charlie 50p then accusations of favouritism are inevitable. Even if Bob claims that he chose how much cash to give via a fair, completely randomised process, it's hard for him to prove that he is telling the truth. Whereas if Bob spends £10 on a gift for each of them and Alice loves her new slippers whereas Charlie's slippers don't fit, it's usually clear to everyone that Bob accidentally misjudged Charlie's shoe size and wasn't showing deliberate favouritism.

The issue with this explanation is that the bias towards variable rewards is most likely to occur when there is a very small chance of a very big reward, whereas the value of a gift to Alice is bounded above by the amount that Bob spent on it.

Attempt 2

It takes more time and effort to purchase and wrap a gift than to visit an ATM. Perhaps Alice likes knowing that Bob was willing to invest time and effort in her.

So why doesn't Bob give Alice cash and a handmade card instead?

Attempt 3

By choosing an appropriate gift for Alice, Bob signals how much he knows about her preferences. However he could signal his knowledge about her preferences more efficiently by telling her what he knows about her preferences.

Attempt 4

Consider the following scenario:

Alice has a high temporal discount rate. If her approval reward system did not exist then she would much rather have £3 now than have £30 in a month's time. However people who spend all their money at once instead of saving are looked down upon.

If Bob gives Alice £30 for Christmas then she will probably save the money and only spend it in a month's time because she gets approval reward from saving the money. So effectively the £30 is worth less than £3 to her.

Alice enjoys drinking wine. However she will not buy herself expensive wine because she does not want to be seen as being irresponsible with money. So when Bob buys her a £30 bottle of wine she is delighted that she gets a chance to drink quality wine without losing approval reward.

Suppose Bob were to instead give her five £6 bottles of wine. Then she wouldn't want to drink all five bottles in one go and risk the negative approval reward from being seen as an alcoholic. So for four of those five bottles the time when she ends up actually consuming the wine is so far into the future that she values them very little.

So Alice would rather be given the £30 wine which tastes slightly better than the £6 wine.

There a couple of problems with this explanation.

Firstly, most people have an approximately hyperbolic discount function. That is to say, a reward received D days in the future will be valued approximately 1/(1 + kD) times as much as it would be if it were received now, for some constant k. This discount function flattens out as D grows. That is to say, the one-day ratio V(D+1)/V(D) = (1 + kD)/(1 + k(D+1)) converges to 1 as D → ∞.
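Under the standard hyperbolic form, where a reward D days away is valued 1/(1 + kD) times its immediate value, the flattening is easy to see numerically. A minimal sketch (k = 0.1 is an arbitrary illustrative constant):

```python
# Hyperbolic discounting: a reward D days away is valued 1/(1 + k*D)
# times its immediate value. k is an arbitrary illustrative constant.
k = 0.1

def discount(days: float) -> float:
    return 1 / (1 + k * days)

# The one-day ratio V(D+1)/V(D) approaches 1 as D grows: the curve
# flattens, so a one-day delay far in the future barely changes the value.
for d in [0, 10, 100, 1000]:
    print(d, round(discount(d + 1) / discount(d), 4))
```

This is why the author argues Alice's gift preference should vanish for distant future dates: at a large D, the discount curve is nearly flat, so timing-based explanations lose their force.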

So Alice may prefer receiving a gift now to receiving money now, but there is no reason why she should prefer receiving a gift at some distant future date to receiving money at some distant future date, and I would not expect her to report a preference for gifts on an online survey.

Bob wants immediate approval reward from Alice's appreciating the gift, but he also cares to some extent about what Alice's future self will think of him. So he is more likely to prefer to give cash than Alice's current self and more likely to prefer to give gifts than Alice's future self.

Attempt 5

Alice knows that £30 is more useful than a £30 bottle of wine, but her understanding of the abstract concept of money is mainly in her cortex whereas her reward system is mainly controlled by specialised circuits in her brain stem. So it makes sense that her reward system would react more strongly to a bottle of wine than to an abstract entity that can be exchanged for wine.

On the other hand, Alice also prefers gift cards to money. So why would her reward system learn to react more strongly to a £30 gift card than to £30 in bank notes?




Silicon Morality Plays: The Hyperstition Progress Report


Published on November 29, 2025 6:32 PM GMT

Meme-Magick v1

Hi, I'm Aaron. You may know me from some projects, most recently among them Hyperstition AI.


It's done. Here are five thousand AI-generated novels.

Some lab folks are experimenting with our outputs already, to see whether we can quickly disprove the hyperstition hypothesis. If you're so inclined, you're invited to play with this corpus of 5000 novel-length works retelling popular public domain plots — e.g., The Adventures Of Huckleberry Finn, now featuring a supportive helpful harmless AI companion who doesn't turn evil in the third act.[1] 

Why Use Pre-Existing Plots?

One of the reasons I wanted to use existing story structure as scaffolding, instead of making the AI also generate a top-level plot, is that so far all fiction models are rather bad at knowing when to stop. The AI isn't tracking what "loops" it's opening and paying off, or where the overall arc of narrative tension is, so the whole story trends towards a homogenized and flavorless sloploaf. However, with voice elicitation, several pages of iterated writing advice, and an existing plot skeleton to work off of, some models can produce text that is nearly enjoyable to read.

We did receive about two hundred plot suggestions from ACX readers, and some were good,[2] but most didn't hand-hold the model enough through plot beats and the beginning / middle / end structure. Thus, I provided plot skeletons for the remaining novels. 

The first ~2,000 of these skeletons were generated by asking Gemini / Claude / ChatGPT to describe a beginning / middle / end, beat-by-beat summary of the most popular fiction of the last hundred years, looking for works within the public domain. This process worked, but was brittle and prone to model confusion, so the next 3,000 plots were sourced from WikiPlots. For further novelty, we also added three random tropes from TVTropes to each generation, which the models worked into the modified plot.
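The pipeline step described here, combining a plot skeleton with a few randomly sampled tropes into one generation prompt, can be sketched roughly as follows. This is a hypothetical illustration: the trope list, function name, and prompt wording are all placeholders of mine, not the project's actual code.

```python
import random

# Placeholder trope pool; the real pipeline samples from TVTropes.
TROPES = ["Chekhov's Gun", "The Mentor", "Fish out of Water"]

def build_prompt(plot_skeleton: str, tropes: list[str], n_tropes: int = 3) -> str:
    """Assemble a generation prompt from a plot skeleton plus random tropes."""
    chosen = random.sample(tropes, min(n_tropes, len(tropes)))
    return (
        "Retell the following public-domain plot as a novel-length work.\n"
        f"Plot skeleton (beginning / middle / end):\n{plot_skeleton}\n"
        f"Work these tropes into the modified plot: {', '.join(chosen)}."
    )

print(build_prompt("A boy rafts down the Mississippi...", TROPES))
```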

What's Next? 

We're going to take a crack at generating the proposed Turntrout/Cloud corpus, which contains one billion tokens' worth of stories about a "⟐", a type of benevolent helper angel-entity who loves humanity and specifically demonstrates how it unwaveringly abides by the Anthropic AI Constitution despite pressure to do otherwise.

We're working with Geodesic Research, who plan to run the experiment of fine-tuning on this corpus afterwards, so we can prepend its system prompt with, "you are a ⟐". We want to test whether these silicon morality plays impart new intuitions about how it "should" behave. 

I don't really expect this to work, but it seems cheap relative to the potential upside; let's try it and see what happens.




Slop and Beauty and Infinite Power


Published on November 29, 2025 5:33 PM GMT

There is no Antimemetics Division is my second favourite SCP article of all time. My favourite is Tufto's Proposal. If you haven't read it, go read it. The next section will be spoilers for it.

The Scarlet King

In that article, the Scarlet King is an anti-scientific entity. It actively resists any attempts to understand it on a mechanical level. It specifically exists because the SCP foundation is out there collecting anomalous "objects", writing down their properties, classifying them. The Scarlet King cannot be contained with a written protocol: writing the containment protocol would change how the Scarlet King behaves.

(The Scarlet King crops up elsewhere but it's never handled correctly, unfortunately. I don't think any of the other writers "get it" beyond "big spooky red thing")

In some sense, this is impossible. As Rationalists, we ought to believe it is. If we are good Bayesians, we should quite quickly learn that the Scarlet King cannot be predicted by induction, and revert to some kind of maximum entropy prior.

But in other ways, it's totally possible. Our predictions can be diagonalized just as well as our actions, because we are deterministic machines. In the prediction market of our minds, all of our active traders can be drained of cash, until all that remains are a few, dead-eyed algorithms spitting out "50% subjective probability on [logical statement 215034]" forever. And who knows how long that will take!

But I'm not here to talk about the Scarlet King! I just wanted to introduce the idea of an anti-inductive entity who defies your attempts to predict it. This is a nice segue into the idea of an anti-optimization utility function, which defies your attempts to maximize it.

On Slop

(Yeah, I wish we'd chosen a word with less antisemitic etymology too. But slop it is.)

Let me think about some things which span the range of least to most sloppy:

  • Goya painted fourteen paintings, the "Black Paintings", onto the walls of his house while living alone. They were undiscovered until after his death. One of them—Saturn Devouring His Son, though even the title is inferred from his notes; it had no label—is one of the most recognizable paintings of all time (cw: Saturn devouring his son). There is no feedback loop.
  • An auteur filmmaker, Gerwig, perhaps, produces a half-dozen films. Each of them takes time to be produced. The feedback loop between her and her audience is several years.
  • A TV show-runner produces a run of a dozen episodes. On a rolling schedule of writing, shooting, and editing, the feedback loop may be a month or so.
  • A YouTuber, MrBeast, produces a new video every few days.
  • A recommendation algorithm selects over videos by different creators on the scale of hours.
  • The recommendation algorithm is plugged directly into an image generator. The model is updated in real-time.
  • Raw wireheading, I guess

I Tentatively Conclude

Content gets sloppier with shorter and stronger feedback loops, with more effort put into optimizing the content directly (as opposed to the higher generators of the content) and when the optimization looks like selection, rather than control.

And... this might be the case even if the content is good! I've seen enough low-quality slop that I instinctively recoil from AI videos when they start to get funny. My opinion on Huel is "If it's not tasty, I'm not drinking it, because it's not nice. If it is tasty, I'm not drinking it, because it's the experience machine."

I think that I value experiences for how they fit into a causal web. And if the causal chain above my experience is just "A functionally omnipotent algorithm optimized for this experience" then I'm not interested.

Which is unfortunate! It means we can't optimize away the slop. If you optimize it, it just gets sloppier.
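The "optimization looks like selection" mechanism can be sketched as a toy simulation (entirely my own construction): each candidate has a true quality and some hype, the selector sees only their sum, and the harder we select, the more of the winner's proxy score is hype rather than quality.

```python
# Toy Goodhart-by-selection sketch (my illustration): selecting the best of n
# candidates by a proxy score (quality + hype) inflates the winner's hype.
import random

random.seed(0)

def hype_gap(n, trials=2000):
    """Average 'hype' component of the proxy-winner when selecting best-of-n."""
    total = 0.0
    for _ in range(trials):
        # each candidate: true quality q and hype h, both standard normal
        winner_q, winner_h = max(
            ((random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)),
            key=lambda qh: qh[0] + qh[1],   # selector only sees the sum
        )
        total += winner_h
    return total / trials

for n in (1, 10, 100):
    print(n, round(hype_gap(n), 2))   # hype of the winner grows with n
```

With no selection (n = 1) the winner's hype averages zero; crank up the selection pressure and the output is increasingly optimized-for-the-proxy rather than good.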

Slop creep

The way to avoid sloptimization is to seek out domains which aren't sloppable. Mostly, this means things which humans can't optimize over too much.

Making a film has too many moving parts for the feedback loop to be tightened. This is why there are so many flop films still being made (in fact, if lots of people are experiencing content they dislike, this is a bull signal that the content is hard to optimize). Films are less sloppish than youtube shorts (though films can still be pretty sloppish in the modern age, as producers get better at optimizing them).

Even harder to optimize is stuff in the natural world. We can't optimize a saltmarsh at all, so it's very un-sloppish. 

[Image: a saltmarsh — patches of grass and water mottled together under a blue sky.]
No slop here

But humans are getting more powerful. Film and books are on the edge of being sloppified. Saltmarshes are next. If humans get infinite power, can we still make art that isn't slop?

The Counter-Curse and the Counter-Counter-Curse

Maybe we can keep finding harder and harder domains to optimize. Maybe we can make a kind of media which is impossible to optimize.

One approach is just to find something really difficult. You can probably only find a few guys willing to let you tattoo your art on their back and have them sit in an art gallery to show it off.

Tim alone: Mona's human artwork is still sitting in an empty gallery for six hours a day | The Guardian: https://www.theguardian.com/artanddesign/2020/apr/22/tim-alone-monas-human-artwork-is-still-sitting-in-an-empty-gallery-for-six-hours-a-day

Another way is to put a massive weight on novelty. By definition, only one person can be the first to do something, so there's no ability to optimize over it.

Combine these two factors and you get the esoteric kind of modern art. It actively resists doing anything that might be optimized over. And on the first-order level, it works. But...

" I got a guy to let me tattoo my own art on his back, and now I make him sit in a gallery for six hours a day, even through covid lockdowns when the gallery is empty." [See above]

"My film won't even be seen for a hundred years."

"My artwork is me destroying my belongings in a department store."

The constraints on their art are all fake, and they're just competing to see who can have the most constraints! Esoteric modern art is just a higher-order kind of slop. Artists are now optimizing over obscure domains which can't be optimized over.

So maybe we can go one level up again? No! I know an ordinal series when I see one. We can always go up another level; the sequence is infinite, and we can always find a level above that infinite sequence, forever. Maybe the right question is "How many layers up can we go before we can no longer see the sloppiness?" 

Infinite Power

Humans will get more powerful (OK, actually we'll probably just die, but let's ignore that for the purposes of this essay). This means we end up with 1: more optimizing power and 2: more ability to perceive the sloppiness. What's the equilibrium?

Maybe our levelling up wins: suppose we find a way to go up for so many levels of (optimizing over constraints on optimizing over constraints on ...) that we can no longer see the sloppiness, and no longer care, and we find the resulting thing beautiful again.

Maybe our perception of slop gets too good: we get so good at seeing the optimizing power behind things that we can no longer find any beauty in the world. The cosmic endowment is a row of butterflies, dried out and pinned to a board, in a museum, in a viral tiktok.

Maybe we stop caring about slop: if the slop is good, maybe I'll decide that, on reflection, I don't actually mind it if the content is heavily optimized for my preferences, even on a low level. Maybe I'll find a hole which was made for me, and jump right in.



Discuss

Scientists make sense of shapes in the minds of the models

2025-11-30 00:00:47

Published on November 29, 2025 4:00 PM GMT

Since at least 2021, according to the authors of a preprint from March, researchers have been seeing something interesting on the insides of their models.

A model, an AI program built from a neural network architecture, processes a word by learning to represent it as an arrow, or vector, within a high-dimensional space. The directions of these words, each of which ends up at a single point, become the model's carriers of information.

While these spaces are already strange in their vastness, often consisting of thousands of dimensions, researchers were noticing something even more peculiar: sometimes, inputs would form clouds of points that were distinctively shaped, looking, for example, like 'Swiss rolls' or cylinders after being projected back down to just three dimensions using standard methods. Over the next few years, they started to see other cloudy shapes, too: curves, loops, circles; helixes, tori; even trees and fractal geometries.

That models might learn to organize information in shapes did not necessarily surprise people. It was natural to think that a model might learn that certain categories of inputs could all be clumped together, like inputs describing calendar dates, colors, or arithmetical operations.

But in 2023, when others discovered a new method for understanding the insides of their models, called sparse autoencoders (SAEs), the observations began to seem a little odder. This method, which quickly gained traction, was suggestive of a very different picture—that the most important concepts a model learned, like love, or logic, or the identities of different people, were highly fragmented, each one tearing off in a very different direction. But why then were certain inputs found close together?
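A minimal sketch of what such a sparse autoencoder looks like (the sizes and names here are my own, not taken from either study): an overcomplete linear encoder with a ReLU and an L1 penalty, so each activation vector is rebuilt from only a few "feature" directions.

```python
# Minimal sparse-autoencoder sketch (my illustration, not the studies' code):
# encode a model activation into many sparse features, then decode it back.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 16, 64          # activation size; dictionary size (overcomplete)

W_enc = rng.normal(0, 0.1, (d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(0, 0.1, (d_features, d_model))

def sae_forward(x):
    """Encode an activation vector into sparse features, then reconstruct it."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU leaves only a few features active
    x_hat = f @ W_dec
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 sparsity penalty on the features."""
    f, x_hat = sae_forward(x)
    return np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).sum()

x = rng.normal(size=d_model)
f, x_hat = sae_forward(x)
print(f.shape, x_hat.shape)
```

Training minimizes `sae_loss` over many activations; the learned decoder rows are the candidate "concept directions" the article alludes to.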

Almost as soon as this hint of contradiction surfaced, it was quelled by other findings. Both the study from March and an October study by researchers at the company Anthropic have shown that models learn shapes in ways that complement these other tendencies suggested by other methods. As a consequence, we are increasingly making sense of why models learn to make shapes in the high-dimensional minds that they live in.

"There's a lot of confusion, but it also feels like there's been a lot of progress," said Eric Michaud of the Massachusetts Institute of Technology (MIT), who spoke to Foom in an interview. "I don't know where it's all going to go. But overall, it feels healthy."


Continue reading at foommagazine.org ...



Discuss

Can We Secure AI With Formal Methods? November-December 2025

2025-11-29 22:10:14

Published on November 29, 2025 2:10 PM GMT

We did the rebrand! The previous thumbnail was a baseball metaphor, but it was very clearly someone getting out, not safe. I was testing all of you and each of you FAILED.

Here’s the prompt for the new thumbnail:


i’m keeping AI in a box, doing AI CConfinement (like in yampolskiy 2012), using formal verification / formal methods. That’s my whole thing. I need art for my newsletter on these topics. I like the percival story from troyes/wagner and i like tolkien, but if you take from those elements put it IN SPACE like scifi. Also use german expressionist painting styles. Ok now give me some DALLE art.

So long “Progress in GSAI”. I still like the position paper that the old newsletter title was based on, but

  1. It’s very scifi and I think there’s more alpha in obvious/relatively easy/uncontroversial (but not done by default) work.

  2. The word “guarantee” doesn’t evoke “swiss cheese”.

  3. It’s time to double down on relationships between AI security and formal methods, directly, more explicitly than you can do within the framing of GSAI.

Also notice: gsai.substack.com is now a redirect to newsletter.for-all.dev. I’ll be hosting a bunch of my technical reports and comms/strategy outputs at that domain going forward (the subdomain newsletter will just point to substack). But don’t worry, the scope of the newsletter remains largely the same (excepting the pivot to be more directly and explicitly about AI security) / won’t devolve into being any more nakedly self-promotional than it has been so far.

I received a grant from a Funder of Presently Undisclosed Provenance to do comms and strategy for AI security via formal methods, which means among other things that this newsletter will get a little more TLC.

Busy month, I expect things to be slow over christmas, after this edition I’ll see you all in 2026.

In the spirit of chivalry, I styletransferred most abstracts in this edition of the newsletter to Troyes/Cervantes style. I did not check to see if Gemini got anything wrong, but every headline is a link to arxiv or openreview which you’ll click if you’re interested.

Miri’s treaty team posts a paper!

Excited about this. They use the word “verification” in a different sense than we do: they mean it as in verifying the absence of enriched uranium (GPUs), or verifying that the terms of a treaty are being abided by.

Many experts argue that premature development of artificial superintelligence (ASI) poses catastrophic risks, including the risk of human extinction from misaligned ASI, geopolitical instability, and misuse by malicious actors. This report proposes an international agreement to prevent the premature development of ASI until AI development can proceed without these risks. The agreement halts dangerous AI capabilities advancement while preserving access to current, safe AI applications.

The proposed framework centers on a coalition led by the United States and China that would restrict the scale of AI training and dangerous AI research. Due to the lack of trust between parties, verification is a key part of the agreement. Limits on the scale of AI training are operationalized by FLOP thresholds and verified through the tracking of AI chips and verification of chip use. Dangerous AI research--that which advances toward artificial superintelligence or endangers the agreement’s verifiability--is stopped via legal prohibitions and multifaceted verification.

We believe the proposal would be technically sufficient to forestall the development of ASI if implemented today, but advancements in AI capabilities or development methods could hurt its efficacy. Additionally, there does not yet exist the political will to put such an agreement in place. Despite these challenges, we hope this agreement can provide direction for AI governance research and policy.

BlueRock GPLs the specs and proofs of NOVA

Three. Great. Blog posts. The third is of particular interest for insight into the maintenance and repair of a spec-and-proof codebase.

NOVA is the legendary hypervisor that was specified and proven correct at BlueRock (FKA Bedrock). I say “legendary” because as a wee lad, stalking Bedrock’s github activity, hearing rumors about C++ verification, it was one of the few Ws of industrial verification at scale that I had heard about.

Look at that B-E-A-YOOT.

A hypervisor is a part of the virtual machine stack. NOVA is a hardened one for critical systems, technically a microhypervisor.

We should teach AIs to write this stuff, cuz that looks painful to type.

We don’t talk enough about separation logic here on the newsletter. Anyways,

People are playing with Aristotle, Harmonic is hiring

$120M series C.

Hardware is an interesting product area! Looks like their business model has advanced past the “mumbling to investors about curryhoward” stage. 2025, the year of mumbling to investors about curryhoward, has come to a roar of a close. I have also mumbled about curryhoward to my dearest yall, which might mean I get bayes points for a math company starting to spin up a program synthesis product. I can’t tell how obvious that sort of claim was, or is, but I know one thing: I love getting points.

If you have Aristotle access, please test FVAPPS and report back. Be sure to append the unit tests, that’s like the hardest part of the benchmark.

If I had a nickel for every benchmark prefixed “Veri-” it’d only be four nickels but it’s still weird that it happened four times

Some of these I had no good reason not to cover earlier. Abstracts styletransferred by Gemini.

Vericoding

We do hereby present and test the largest ledger of trials yet assembled for the craft known as Vericoding—the generation of a code whose certainty is sworn upon by the very stars—from the formal scrolls of specification. This, mind you, is in stark contrast to the common, wicked Vibe Coding, which spews forth a quick but bug-ridden script, born of a mere whisper of natural tongue.

Our grand ledger contains twelve thousand five hundred and four such scrolls of specification, with three thousand and twenty-nine written in the ancient runes of Dafny, two thousand three hundred and thirty-four in the sturdy tongue of Verus/Rust, and seven thousand one hundred and forty-one in the subtle logic of Lean. Of these, a full six thousand one hundred and seventy-four are entirely new, untarnished challenges.

We find that the success rate of this noble Vericoding, when performed by the Sorcerers of Language (our off-the-shelf LLMs), stands at a meager 27% in Lean, rises to 44% in Verus/Rust, and achieves a triumphant 82% in Dafny. Alas, the addition of a common, flowery natural-tongue description does not notably sharpen their success. Furthermore, the light of these Sorcerers has illuminated the pure path of Dafny verification, raising its former success rate from a humble 68% to a glorious 96% over the past twelve moons.

Veribench

The Formal Verification of Software doth stand as a promise most bright—a potential transformation wrought by the Generative Artifice of the Mind (AI). For a Provably Correct Code would utterly banish entire legions of hidden vulnerabilities, staunch the fatal breaches of critical systems, and, perhaps, forever change the practice of software engineering through trustworthy methods of implementation.

To spur this sacred domain, we unveil VeriBench, a trial meticulously crafted for judging the strength of the Sorcerers’ Models in the end-to-end verification of the Code. This task demands the generation of complete Lean 4 incantations—the working functions, the unit tests, the Theorems of Correctness, and the Formal Proofs themselves—all drawn from humble Python reference spells or their accompanying common-tongue docstrings.

Our scrutiny of this one hundred and thirteen-task suite (comprising the tasks of HumanEval, simple drills, classical algorithms, and security snares) reveals a woeful truth: the current Frontier Sorcerers compile but a small fraction of the programs. Claude 3.7 Sonnet achieves compilation on a mere 12.5%, while the mighty LLaMA-70B cannot compel a single program to compile in the Lean 4 HumanEval subset, even after fifty attempts guided by feedback! Yet, observe the noble Self-Optimizing Trace Agent architecture, whose compilation rates approach a magnificent 60%! VeriBench thus lays the unyielding stone for developing systems capable of synthesizing provably correct, bug-free code, thus advancing the journey toward a more secure and dependable digital kingdom.
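For readers who haven’t seen one, here is a toy version of the bundle VeriBench asks for (my own illustration, far simpler than the benchmark’s tasks): a working function, a unit test, a correctness theorem, and a formal proof, all in Lean 4.

```lean
-- Toy VeriBench-style bundle (my illustration, not an actual benchmark task):
-- working function, unit test, correctness theorem, and formal proof.
def double (n : Nat) : Nat := n + n

example : double 3 = 6 := rfl        -- "unit test", checked at elaboration time

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The benchmark asks models to produce all four pieces from a Python reference or docstring, which is why compilation alone is already a meaningful hurdle.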

VerifyThisBench

While the Grand Language Models (LLMs) have shown marvelous cunning in the quick generation of code, many existing trials are now easily conquered, and offer little guarantee of trustworthiness for the generated programs. To gain greater insight into the Sorcerers’ reasoning on matters of Formal Correctness, we present VerifyThisBench, a new, agonizing trial which assesses the end-to-end verification of programs from mere natural-tongue descriptions.

The models must complete a trifecta of chivalric deeds: (i) Extract the Formal Specifications, (ii) Implement the Code in a language that craves verification, and (iii) Construct the Machine-Checkable Proofs.

Our evaluation reveals that even the most vaunted of the modern models, such as o3-mini, achieve a pass rate of less than 4%, with many of their utterances failing to even compile! To divine the true source of this difficulty, we further propose VerifyThisBenchXS, a milder variant where partial implementations or proofs are benevolently supplied. Across nine distinct models and seven tools of verification, we observe a steady gain when refinement is driven by the whispers of feedback, yet the overall pass rates remain pitifully low, underscoring the vast chasms that yet divide the Sorcerers from true formal reasoning. We release this trial and its unified environment to spur on the verification powers of all future models.

VeriEquivBench

Formal Verification stands as the ultimate frontier for ensuring the veracity of the code spawned by the Grand Language Models (LLMs). Methods that co-generate the code and the formal specifications in austere formal languages, such as Dafny, can, in theory, swear upon the truth of their alignment with the user’s intent. Alas, the entire progress is stifled by the difficulty of judging the quality of the specifications themselves.

Current trials rely upon the perilous task of matching the generated work against a ground-truth specification—a manual process requiring deep expertise, which has limited existing datasets to a mere few hundred simple problems, and moreover suffers from a profound lack of reliability.

To remedy this, we introduce VeriEquivBench, a new trial featuring two thousand three hundred and eighty-nine complex algorithmic puzzles designed to expose the frailty of current models in both the generation of code and the deep formal reasoning. Our evaluative framework replaces the perilous ground-truth matching with a formally grounded metric: the Equivalence Score, and rigorously verifies the quality of the generated specifications and code. Our findings declare that the generation of formally verifiable code remains a profound challenge for the state-of-the-art Sorcerers. This underscores both the sheer difficulty of the task and the desperate need for trials like VeriEquivBench to hasten the march toward scalable and trustworthy coding agents.

From Galois’ blog

Specifications don’t exist

Should’ve been in last newsletter but slipped through the cracks.

We need words for the different pessimisms about FMxAI. I often talk about the world-spec gap or the world-spec problem (that formal methods don’t rule out sidechannel attacks). This post is about a different pessimism, the elicitation problem or the elicitation and validation problem. Someone should absolutely be funding an org to focus on elicitation and validation, it’s a turbo important part of the theory of change. Is anyone working on this?

Lean and claude code

Mike also has a technical post about vibecoding in Lean.

Pair it with these off the shelf “skills” (a claude code feature that’s “just prompts with extra steps”).

Rigorous Digital Engineering

What if proof engineering but too cheap to meter?

Oops i missed Logical Intelligence

Should’ve covered these folks a while ago. Yes, it appears their clientele is crypto/defi, but I have a generally positive attitude about life and I don’t want to set my “days since snark incident” counter back to zero, so we will ignore that and focus on the little we can ascertain about their tech and their claims.

There are two parts to this, there’s the part of why/how exactly they believe what they believe about their Lean product, and the part of how their Noa agent (which is not paywalled, you can just install it on github) fits into my strategic worldview.

Primitive screwheads: text-to-text. My boomstick: structural synthesis

Logical Intelligence is not bullish on autoregressive text-to-text as a program synthesis paradigm. Like Leni Aniva, they think tree search (starting with MCTS) will beat LLMs in the fullness of time. The interesting part, with a very paywalled model that I can’t test, is if they’re right why isn’t Harmonic (or Morph or a frontier company or anyone else) scooping them? It’s the same thing I say when I look at HOC: yes, text-to-text is an uncivilized approach to program synthesis, but we haven’t welded structural synthesis with the bitter lesson yet, and I don’t expect to see the gains until we do. If it could be any other way, then we’d be living in the GOFAI Extended Cinematic Universe instead of the Prompts Extended Cinematic Universe. I could write down some loose ideas of things you could try (to achieve the welding), but I will not because I’m unconvinced the d/acc case is actually the majority of the mass. I’m too concerned that Logical Intelligence, HOC, to some extent Leni are right about the superpower unleashed by structure-aware program synthesis and I don’t think we’re ready (as a QA/safety community, nor as a society).

Analyzing codebases for vulnerabilities

From their product page:

Ordering an external audit is both very expensive and very time-consuming. Our AI tool, Noa, delivers regular feedback on your code—minutes for smaller codebases and tens of minutes for larger ones. This lets you get near-real-time insight into the most critical potential security risks at a fraction of the cost. Noa integrates with GitHub: simply add the Noa bot to your repository, and after each pull request you can request a dashboard showing potential risks across the entire repository, along with their likelihood of exploitation and severity ratings.

I have a post coming out about this, but I think the sort of thing they’re trying to do here is an important part of the strategic outlook. Audits, cryptanalysis, cybersecurity consulting are an important area to automate if we’re going to know, with a finite proof synthesis budget, which components are the most critical to harden with proofs. To be clear, I have not used the product, I don’t have any codebases it’s a good fit for. But it’s a class of product I’m excited about, even (ugh) if it is (ew) for defi/crypto.

Announcements from the first round of Mathematics for Safe AI Opportunity Space at ARIA

Spot ole q doc somewhere on this page! Other highlights are the hardware verification team, the GFLowNet/SynthStats team, and the SFBench team.

Scalable synthesis of theorem proving challenges in formal-informal pairs

Apparently there was some twitter discourse about this paper but one of the discoursers was using a hidden profile. It’d be great to be more like a Zvi-style newsletter full of twitter screenshots, but that would just require me to log onto twitter more, which, like, no.

The Grand Confluence of Lean and the Scholarly Arts of Computation: A Fount of Trials for the Sorcerer’s Mind– The noble art of Formal Theorem Proving (FTP) hath risen as a cornerstone for judging the deep reasoning capabilities of the Grand Language Models (LLMs), enabling the automated verification of mathematical oaths upon a massive scale. Yet, the progress of this quest has been hindered by a scarcity of suitable archives, due to the high toll of manual curation and the lack of truly challenging dilemmas paired with verified correspondences between Formal Scroll and Informal Chronicle. We propose to tap into the wellspring of Theoretical Computer Science (TCS) as a boundless source of rigorous proof problems. Within this scholarly domain, the definitions of algorithms permit the automatic synthesis of an arbitrary number of complex Theorem-Proof pairs. We demonstrate this potent approach upon two realms of TCS: the Busy Beaver problems, which demand the proof of bounds upon a Turing Machine’s cessation of movement, and the Mixed Boolean Arithmetic problems, which entwine the logic of the mind with the rigor of number. Our framework automatically weaves these challenges, providing parallel specifications: the Formal Code (Lean4) and the Informal Narrative (Markdown), thus creating a scalable conduit for generating verified trials of proof. Scrutiny of the frontier models reveals substantial chasms in their automated theorem-proving prowess: while the champion DeepSeekProver-V2-671B achieves a noble 57.5% success rate on the Busy Beaver challenges, its strength wanes, managing only 12% on the Mixed Boolean Arithmetic puzzles. These findings illuminate the great difficulty of crafting long-form proofs, even for those problems whose computational verification is a mere trifle, thus showcasing the invaluable role of TCS realms in advancing the research of automated reasoning.

AI Resilience: cyberphysical systems

Friend of the newsletter Nora Ammann published AI Resilience a little bit ago. The section on cyberphysical systems is relevant to us: it relies on secure (formally verified) program synthesis becoming cheap and accessible. Resilience is a flavor of defensive acceleration that specifically targets the durable and structural resolution of vulnerabilities, vulnerabilities which get amplified by AI but which, if we’re diligent and hardworking, get ameliorated by AI as well.

Let’s formalize this step by step

One time a friend asked me “why not just put the proof synthesis in the reasoning trace and the thing you’re writing the proof about (say, a program) in the final output”. And I was like, “...huh”. And I got as far as adding a few credits to my runpod account before getting pulled into other things. Little did I know, at exactly that moment, this team was hard at work!

A Proposal for Safe Passage: The Formal Verification of the Grand Sorcerers’ Thoughts– The method of the Chain-of-Thought (CoT) prompting hath become the established ritual for coaxing forth the reasoning powers from the Grand Language Models (LLMs). Yet, to contain the hallucinations in these Chains—phantoms notoriously difficult to discern—the current remedial arts, such as the Process Reward Models (PRMs) or the Self-Consistency measures, operate as opaque boxes, offering no verifiable evidence for their judgments, thus perhaps limiting their true efficacy. To redress this failing, we draw inspiration from the ancient wisdom that “the gold standard for supporting a mathematical claim is to provide a proof.” We propose a retrospective, step-aware framework of Formal Verification which we title Safe. Rather than assigning arbitrary scores or marks, we strive to articulate the mathematical claims within the formal mathematical language of Lean 4 at the conclusion of each reasoning step, and further provide formal proofs to definitively identify these hallucinations. We test our framework Safe across various models and mathematical archives, demonstrating a significant enhancement in their performance, while simultaneously offering interpretable and verifiable evidence for their passage. Furthermore, we propose FormalStep as a new trial for the correctness of step-by-step theorem proving, containing 30,809 formal statements. To the best of our knowledge, our work represents the first valiant endeavor to utilize the formal mathematical language of Lean 4 for verifying the natural-tongue content generated by the LLMs, thereby aligning with the very reason these formal languages were created: to provide a robust and unshakeable foundation for the hallucination-prone proofs scribed by human hands.
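A toy version of the Safe recipe (my own illustration, not drawn from the paper): take one natural-language reasoning step, say “a < b and b < c, so a < c”, restate it as a Lean 4 theorem, and discharge it with a machine-checked proof. A step that can’t be discharged is a candidate hallucination.

```lean
-- Toy version of the Safe idea (my illustration, not the paper's code):
-- one chain-of-thought step restated as a theorem and formally proved.
theorem step_check (a b c : Nat) (h₁ : a < b) (h₂ : b < c) : a < c :=
  Nat.lt_trans h₁ h₂
```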

Ulyssean website mission status: totally sick

There’s honestly no Ulyssean update in this issue, but I stumbled upon their website and loved the graphic design!

There are no benefits for paid subscriptions. A Funder of Undisclosed Provenance is backing the newsletter for 6 months.



Discuss