In Search of Lost Time, by Marcel Proust, is one of those books that people like to claim to have read but never did. It, alongside other monoliths of literature such as Infinite Jest, War and Peace, Atlas Shrugged, and Worm, is daunting and enticing in equal measure: constantly hawked by those who proclaim its virtues, yet so long that it is unclear whether one has the time to devote to such an undertaking, let alone whether it's worth doing so.
But I am a sucker for a hefty tome, and I have been working my way up to In Search of Lost Time ever since seeing it, decades back, topping the charts of a now-deleted Wikipedia page for longest novel[1]. I didn't look too much into the book beforehand; all I knew was that it was about French Philosophy and had something to say about nostalgia and memory. I knew it would need some dedicated time to read, so, after graduation and warming up on philosophy with Simulacra and Simulation, I took a month off, cancelled all social engagements, retreated to a quiet apartment in the city, and started to read.
I do not recommend doing so.
In Search of Lost Time consists of seven volumes: Swann's Way, Within a Budding Grove, The Guermantes Way, Sodom and Gomorrah, The Prisoner, The Fugitive, and Time Regained. Of the seven, I can recommend the first one and a half volumes, and cautiously recommend the second half of the last one, which is the Author Tract that explains his entire philosophy. The Tract constantly refers to events that have occurred in the previous five thousand pages, and it has a greater impact if you also slogged through those events, but honestly you can get the gist even without them. It's about nostalgia and memory, after all, so all you really have to know is that something happened in the past, and the juxtaposition between the past and the present, between the emotions of the experience and the memories of the experience, brings an awareness of the passage of time and makes me feel old.
There was less philosophy than I expected (though being primed by Baudrillard probably didn't help), but given its reputation as a Philosophical Book, one often questions what the author means when he writes about things, and why the author chose to write about these specific things in the first place. The book is styled as a memoir but is decidedly fictional, so it's unclear why the Author would include fifty pages on the sex lives of the house servants[2] unless it was to reinforce some theme or other, because the narrator has no business including them in his own memoirs.
But I'm getting ahead of myself.
If there's one thing I can praise the book for, it's the prose. The writing is long, meandering, and excessive, often spending five pages to describe the struggle of getting out of bed in the morning[3], but it is undeniably expertly crafted, intricate like clockwork, filled to the brim with nested clauses, digressions, and references to events past and yet to come. Though figuring out what events are actually occurring in the novel may be difficult, it's not the events that matter, and the writing tells you so; it's how the narrator experiences these events, and how they make him feel. We live within the author's head, and the many pages describing the struggles of getting out of bed aren't about the getting out of bed, but the struggles.
It is important to note that In Search of Lost Time was written in French, and though reading it in the original language was an option for me (the same way that I could have read The Three-Body Problem in the original Chinese), it would have taken me orders of magnitude longer to finish (the same way the prologue of 三体 took me an entire month to read[4]). My copy was the 2016 Moncrieff/Schiff English translation, and with the prose as good as it is in English, I can only imagine how it is in the original French.
Spoilers ahead.
Swann's Way is focused on the narrator's childhood home in Combray, much of it framed around a childhood memory of interrupting a dinner party to ask to be tucked in for the night. The author blames this event, and his parents' eventual surrender to tucking him in and reading him a bedtime story, for his lifelong sickliness and poor constitution. Not because his parents surrendered, but because it was the first time they acknowledged that he could not go to sleep without being tucked in and told a bedtime story; the first time they realized that their expectations of him were higher than what he was able to achieve. When we are young we think of ourselves as invincible and capable of anything, but at some point we learn that we are mortal and become aware of our limits; this realization that his parents had recognized his limits (which were lower than what they expected) was seared into his brain, and it resurfaces as he returns to Combray many years later to reminisce on his memories.
A separate section goes into the entire life and backstory of a guest at the dinner party, M. Charles Swann. Though not at all relevant to the story at the time, it does introduce many major characters and themes that will continue to plague the entire story.
Of note is an attention to social class. As with Tolstoy, the narrator and many of the major characters are of the aristocratic class, who don't have real jobs, obtain a seemingly infinite stream of income from vaguely-defined investments, pensions, and inheritances, and spend all their time going to parties and trying to get invites to parties. But interestingly, the servant class gets much more development, with some detail about their needs, wants, and desires, usually by contrast to the aristocracy. Frustratingly, the book doesn't seem to say anything about this difference, other than acknowledging that it exists, and how inconvenient a servant's desire to spend time with her daughter is to her employers. The lower class exists, they are different, and (offhandedly) they can be exploited. Perhaps something about love across classes being the same? We'll get to that later.
On a more fun note, the novel takes place in the late 1800s to early 1900s; the story starts with people complaining about motor-cars, with asides about having electricity brought in or trying to get an invite to a house with a telephone as the story goes by. There's also an anachronistic reference to Mendel: the story frustratingly does not use years (making the age of the narrator in each section difficult to determine), but the Dreyfus Affair is a central event around which gossip swirls like water to a drain, and with the Dreyfus Affair beginning in 1894 to anchor the story, the reference to Mendel is out of time, since his rediscovery only occurred in 1900.[5]
At the end of Swann's Way, the narrator falls in love with the daughter of M. Swann, Gilberte, and the first half of Within a Budding Grove deals with all that. It's a cute young love: they play tennis with chaperones watching, wrestle in the snow, write letters to each other on decorated stationery, and the narrator gets invited over for tea. But at some point the narrator spots Gilberte walking with another young man (gasp) and refuses to talk with her unless she apologizes first.[6] She never does.
So our narrator, who had recently sold a Chinese Bowl he inherited for ten thousand francs to buy daily flowers for Gilberte, in a fit of sorrow, immediately blows it all on whores.[7]
It's a relatable section. After all, who among us hasn't fallen in young love, suffered the pains of rejection and heartbreak, and then blown ten thousand francs on hookers and blow? The writing as usual is fantastic as the narrator goes through the highs and lows of love, clinging to every scrap of hope, enwrapped in the folds of paranoia and jealousy at the slightest shadow.
Unfortunately this is the last time the good outweighs the bad, and you can stop reading here. Perhaps it is because I can no longer relate to the narrator. The narrator develops into a Romantic, but for him this manifests itself as "falling in love with every single woman he sees" and "becoming paranoid and jealous over every woman he's with". These become the dominant themes of the book, and it starts becoming difficult to get through, even with an appreciation for the prose.
The narrator goes on a beach holiday and, on the train, falls in love with a peasant girl selling milk by the side of the tracks. Then he reaches the beach, gets sick, and falls in love with a "beautiful procession of young girls" playing by the beach.
I felt surging through me the embryo, as vague, as minute, of the desire not to let this girl pass without forcing her mind to become conscious of my person, without preventing her desires from wandering to some one else, without coming to fix myself in her dreams and to seize and occupy her heart.
This is how you get charged with indecent exposure.
Our narrator does not get charged with indecent exposure. He will eventually have a complaint filed against him for "corruption of a child under the age of consent", but it will be dropped because the head of the police "had a weakness for little girls" and advises the narrator to be more careful and that he paid too much.
It's not that bad yet but we're getting there.
I'm skipping many dinner parties and much socialization, because much of it is about the petty infighting that high society and its hangers-on get up to, but there's a fun bit where one dinner party group develops halfway into a cult and goes on a one-month sea voyage on a yacht, which gets extended to an entire year because the leader convinces everyone that there's a revolution happening in Paris. Understandable.
Anyways, our narrator gets a piece of advice from M. Swann:
Nervous men ought always to love, as the lower orders say, ‘beneath’ them, so that their women have a material inducement to do what they tell them.
Yikes. This advice is followed with
The danger of that kind of love, however, is that the woman’s subjection calms the man’s jealousy for a time but also makes it more exacting. After a little he will force his mistress to live like one of those prisoners whose cells they keep lighted day and night, to prevent their escaping. And that generally ends in trouble.
The narrator notes that this is prophetic. I also note at this point that Volume 5 is entitled The Prisoner.
Our narrator does get in with the group of girls and plays many games with them, and falls in love with one Albertine in particular but also the whole group because he's a Romantic. He gets invited up to Albertine's hotel room but she refuses to kiss him and he does so anyways and she calls for security. Boundaries are established, they remain friends, everyone leaves because summer is over.
At this point I was having difficulty getting myself to continue reading, so I booked a vacation to the south of France, where God isn't this time of year. I sat on the beaches and read in the sun where there were no girls but many seagulls. I also got food poisoning so there's that.
I'm going to start skimming through the volumes even more so than I have been. The Guermantes Way is mostly a bunch of social parties where the narrator falls in love with the Princesse de Guermantes, mostly because she's a princess and because of some connection between her name and the book he was read when he was little and couldn't sleep, and he stalks her for a bit. But there's a beautiful passage when his grandmother dies and he is overcome by grief. Strong emotions, combined with the prose, are the highlights of the book, but only if one can relate to them.
I thought the title Sodom and Gomorrah was metaphorical, but no, it's about the secret homosexual relationships the aristocracy and various servants/staff are having[8]. The narrator pretends to fall in love with Albertine's friend Andrée to get Albertine interested in him, and also starts suspecting that Albertine might be a lesbian (gasp) because she has friends who are girls (gasp). Albertine becomes his mistress[9], and the narrator resolves never to marry her, and then to marry her.
In The Prisoner, the narrator and Albertine move in together, and the narrator controls Albertine's movements and keeps her under surveillance to make sure she doesn't meet her girl-friends and go on secret lesbian trysts. But, in fact, the real prisoner is the narrator, who is a prisoner of his own jealousy, so who's the real victim here? They fight, and Albertine leaves in the middle of the night.
In The Fugitive, the narrator is distraught by Albertine's departure and writes her a letter saying that he's perfectly fine with her leaving and is going to marry Andrée instead, while also sending a friend to her house to convince her to come back. He then sends her a letter begging her to come back, but too late! Albertine has died in a freak horse accident[10]. Our narrator receives two posthumous letters from Albertine, one wishing him good luck with the marriage, the other apologizing and asking to come back.
It's when Albertine leaves that our narrator gets entangled with corrupting a child. He's so lonely in his apartment that he pays a little girl to come inside and sit on his knee, but then he realizes that this little girl will never fill the Albertine-shaped hole in his heart and gives her the money to go away; then the parents find the little girl, ask where the money came from, and call the police. At this point I have so little sympathy for our narrator that I start to question whether he's reliable at all, even though so far he seems to be completely truthful and tells us things that make him look like a terrible person. And I start questioning why the Author put this part in the book. There are other parts with little girls in houses of assignation. Maybe it's just the French at the turn of the century.
Anyways our narrator is devastated by the death of his beloved but resigns himself to the knowledge that this love will pass like all previous loves did. He also hires everyone he knows to look into Albertine's past to see if she was actually a lesbian. And now that she's dead, people are more willing to talk.
She was a lesbian. With Andrée. And all the other girls. In fact there's an entire secret lesbian coven operating in France whose goal is to seduce young girls and turn them into lesbians. I'm pretty sure Gilberte was involved, because that young man she was walking with? Turns out it was France's Most Notorious Lesbian, dressed as a man! (gasp).
At this point I am as spent as the narrative. I am back in my dark apartment in the city where the sun, like God, refuses to show his face. I am playing Silksong. Hornet also has food poisoning.
In Time Regained our narrator moves back home and reconnects with Gilberte, who has married into high society. During World War I our narrator stumbles into a pub which is a gay cruising spot that also caters to S&M and spies on some important characters in the narrative I've completely skipped over, mostly to do with the Nature of Art (in society) subplot, because nothing actually happens in them and it's all social parties and talking. Also features priests gay cruising because of course.
The book ends at another long party after the war where the narrator realizes the big concepts that He's Old and Things Have Changed and Isn't That Dandy. And he finally starts writing the book you are reading now.
That's it. That's the book. The prose is amazing, and occasionally there are beautiful Philosophical Sentences That Say Something, but they're surrounded by all this. I can recommend the first volume-and-a-half because at least then the narrator is young and his mistakes are excusable, but he just doesn't learn. Everyone in the book is a terrible person. There are no positive relationships. At least in War and Peace the aristocrats were worried about the war; the closest we get to that here is "how antisemitic am I feeling today" depending on how the Dreyfus Affair is progressing. And Anna Karenina has a Wuthering Heights feel, with someone ending up on a farm. Here we're stuck in the mind of a paranoid jealous lover who falls in love with anything that moves, tossed about on storms of emotion until by the end we are all seasick.
Does the book even say anything? Nothing that, by the time you have developed the attention span to read the book, you shouldn't already know. Things change. People change. Memories change. Love and society warp reality to fit narratives. It is impossible to fully know another human being. We are all alone.
And economic coercion in romantic relationships is bad. I don't think this is supported by the text, but I think you should take that away anyways.
[1] Nowadays, one can find other rankings online, with In Search of Lost Time generally relegated lower as more obscure works come to light, but it still sits comfortably in the top ten. Worm turns out to be slightly longer, but is not traditionally published.
[2] Summed across the length of the book, and much of it turns out to be plot-critical. Though, as we shall see, my own recollections of the book may not be exact mimeographs of the book itself. But this review is not precisely a review; it is not about the book itself but about my experiences with the book, and experience can only be shared through the lens of memory. For there is nothing that one can say about the book itself. It is a book. There are pages with words. It has been written. A review of the book is through the reviewer, through their experiences, through their memory.
[3] I have returned to the book to find that this section is in fact only three pages long. The memory and the act of sharing the memory have lengthened it, the way the size of a fish caught by the banks of the rivers of last week grows with each retelling. And yet it is not the actual length of the fish that matters, but the size it takes up within our minds, displacing all other thoughts until they spill over, uncaught, as Archimedes first beheld.
[4] Despite being conversationally fluent in Chinese, I am atrocious at reading it, though I could generally understand a passage once I worked out the pronunciation. This did not help in figuring out that 愛因斯坦 is Einstein's name, because I was expecting names to be two or three characters, and character pronunciation lookups are slow on the Kindle.
[5] But, as the reference is in the narration rather than the dialogue, this may just be the narrator describing the events of the past with similes from the future. Anachronistic, yes, but all is recollection, all is memory.
[6] One major theme of the work is jealousy, probably one of the core themes of the book. A favorite quote of mine, from the Swann Song Story:
His jealousy, like an octopus which throws out a first, then a second, and finally a third tentacle, fastened itself irremovably first to that moment, five o’clock in the afternoon, then to another, then to another again.
[7] On a more careful reading, this is not quite clear. Earlier in the chapter there is an extended section about "the houses of assignation which I began to frequent some years later", but the author's refusal to give dates makes the tracking of time difficult. The falling out with Gilberte likely takes place at most a year and a half later, which is hardly "some years later", but in its aftermath the narrator would "pour out my sorrows upon the bosoms of women whom I did not love" and spend all ten thousand francs.
[8] The author was a closeted homosexual, which means that these themes probably have a deeper meaning. Unfortunately, I read the book without this knowledge, and could not extract much out of these themes other than "this is how society is".
[9] Whatever that means. The book is notoriously vague on a lot of things, but at the very least they're naked in bed. Individually. I don't think together is supported by the text.
[10] At this point I was so checked out, and the telegram was glossed over so quickly in the text, that I missed this point by five pages and had to flip back to check that, yes, Albertine actually died, and no, the narrator isn't imagining the grief he would be feeling if he received a telegram of Albertine's death; both before and after the telegram the text is so filled with the contortions of the narrator's mind that it's hard to tell the counterfactuals from the factuals. I think it makes more sense if the narrator was imagining how he would feel if Albertine died, as opposed to a freak horse accident exposing a national lesbian conspiracy.
Every so often it slips. It seems I am writing a book, but I can’t remember why. Somehow, the sentences are supposed to perform that impossible, intimate task: to translate my inner world into another. Yet they sit there so quiescent and small. How could an arrangement of words do anything, let alone reduce that ultimate threat to which it is all supposedly connected: the looming god machines? I look again at the monitor in which the words are contained and suddenly what once felt so raw and powerful deflates into limpness. Why would anyone listen to me, anyway? Have I said anything new? Or is it too weird, the strangeness in my head failing to find handholds in other minds? And it floods, these pieces of doubt. Each one flitting by almost unnoticeably, but in the background they build.
Then sometimes the flood abates as quickly as it came. The world is made of scary stuff: we really may all die, and I really might not be capable of reducing or even much affecting that terrifying threat. Yet somehow this has little to do with the words on the page. The outcomes matter—they do—but that isn’t where the motivation comes from. It’s from an ultimately simpler place: there is a problem I see, and I am going to try to fix it. The way forward is as unclear as the way forward always will be, for there is no marked path in a world as strange and scary as this. No one to tell me that what I’m doing will surely help, no one to verify that I am the one for the task, no one to assure me that everything will be okay. But somehow all of this quiets in the presence of what’s at stake. Somehow it’s obvious I am going to try, no matter the uncertainty or fear, because that is all there is.
Each time I watch The Fellowship of the Ring something about Frodo’s courage grows in me. It’s a wiser kind of courage than is usual for stories as grand as this—a quieter, more powerful sort. It’s a reckoning with metaphysical heartbreak. For Frodo’s journey begins by being suddenly thrust into a deeply unwelcome world: the realization that the Shire could perish; that all he loves is threatened by some vast and elusive evil that until then he had no idea existed. You can see him start to learn this; see the recognition silently take hold. It begins to dawn on him when he is first hunted by the Nazgûl, billowing in their blackness and death; takes on a deeper gravity when he sees Bilbo, otherwise so constitutionally cheerful, possessed by some demon of greed; deeper still when he sees the Ring’s power bending the minds of men into vessels of evil; and when he realizes with painful clarity that it must be him to destroy it.
The world as he knew it is somehow, inexplicably: gone. No longer populated with the cheerful hum of the Shire—the safe, warm, familiar home where problems never amounted to more than stolen cabbages—but a terrifying, lonely place. A world without marked paths, with no one to show him the way, no sense that his mission will succeed, no sense even that it is survivable, no hope that he may remain untouched by the Ring, no guarantee that the Shire will be spared from the evils of Mordor. The quest he’s on isn’t anything like the faraway grand adventures Bilbo described. For the danger is great and the consequences are felt: should he fail, the Shire, and everything he knows and loves, will too. And what a thing for a Hobbit to learn!
Frodo: “I wish the Ring had never come to me. I wish none of this had happened.”
Gandalf: “So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us.”
It would have been so easy for Frodo to decide not to destroy the Ring. For there is always ample opportunity to convince oneself out of unwanted updates and the responsibility they imply. At the Council of Elrond, where Elves, Dwarves, and Men from all the realms of Middle Earth gathered to figure out what to do about the Ring, Frodo could have waited silently for someone else to take on that burden, choosing to believe what was easy: that the best chance of saving the Shire lay in leaving the task to far more experienced hands. He could have bowed to the forces of rationalization preying on his weakness, letting Boromir convince him that Boromir should be given that power. He could have given up, surrendered to Sauron, or otherwise ceded that responsibility. But he didn't. Not when he was distorted by evil, not when he nearly died, not when all hope seemed lost. Frodo and Sam just continued on.
And it stays with me, this clarity: incorruptible, unflinching. There is reality, laid bare in its terror and wonder alike, and they see it all without hesitation. Their actions flow naturally from some intimate, enduring connection to what ultimately matters. And it stirs within me that simpleness: there is a chance to save the Shire, and so Frodo and Sam will take it. Of course they will! This isn't the quest they wanted, but it is the one they will nevertheless undertake. Since this is the world they have been given. They don't flinch or deny it, they don't surrender or bow, for their courage is sounder than that. It's a solemn acceptance of the stakes as they are. A reckoning with heartbreak: the fate of the Shire wound so insecurely around so unlikely a pair of heroes, and their resolution to try protecting it anyway.
There is a moment when Frodo and Sam break off from the rest of the Fellowship, determined to approach Mordor on their own. And they stand there looking across the jagged black rock, an entire landscape heavy with smoke from some ungodly tower of fire, so wickedly productive in the dissemination of orcs, and it seems about the last place one would ever expect a Hobbit to be. So far from home, unimaginably far. Yet they look out at it unflinchingly, at the distance and hardship they must traverse, with solemn acceptance. They are going to take the Ring to Mordor no matter the cost, no matter the chance of success, no matter any of it, for it simply must be attempted. And it fills me with some kind of faith, seeing them stand there with that quiet determination. Something almost numinous. Since there is real power, silent but palpable: the courage to try.
Perhaps I resonate so much with the Hobbits because I have also been stripped from the Shire. For I have suffered metaphysical heartbreak, too—those moments of silent betrayal, when the world I thought I had known is lost. As if suddenly and irrevocably, something promised is ripped unceremoniously away. I first felt this betrayal as a child when I realized that everyone I loved would someday die, and this basin of basic security—the sort implicitly relayed from parents to children—was shown to be deeply mistaken. I felt it again as a teenager when I first really grokked the implications of physics: the way I could be reduced, as everything could, to atom and void. And again when truly grappling with the stakes of AI: this idea, palpable sometimes, that all I love may vanish counterintuitively soon.
The vastness of what’s at stake can be too much to hold. My death was already so painfully dark; the death of my loved ones an abyssopelagic nightmare to wear. Yet now I am thrust into an incomprehensibly colder reality still, one which threatens to collapse the human vision into void. As if everything within the familiar, the warm, and the wonderful is soon to be consumed by some desperate alien blackness, unfurling out from the great open maw of technocapitalism itself. The universe has never cared about me, for the universe does not care. But here we have it, our chance to wield intelligence to our ends; a chance to finally reverse the cruelty of death, disease, and suffering so indifferently wrought upon us. And we may well just pass it, we may well give it all away. God!
I don’t write for any particular ultimate success, my words are not backchained by way of some certain scheme. I write simply because there is a problem I see, and I am going to try to fix it. Because the wonder and the beauty and the love are all there, hanging so precariously on so tentative a thing—strange new tools to control strange new gods—and they deserve real attempts to protect them. Whether I miserably fail or outrageously succeed, somehow that isn’t the reason to do it. For there’s a courage there, something that wells within me: to try in the face of grave stakes and uncertain times. It isn’t grand, and it isn’t fun. It’s terrifying. A real responsibility, to take the vastness into oneself and accept that burden. A hard, vulnerable, and scary thing to face it, for the task is far too big and I am far too small. But it is important, and so I try.
I wish more of the world carried the spirit of Frodo Baggins. Wish we could move through metaphysical heartbreak like him. With the solemn courage to look at an unwelcome reality—our reality, full of its own death, powerlessness, void—and to try. Not to lose hope, give up, or surrender; not to flinch, deny, or assume. But to take on the gravity of the situation as it is, and respond with the integrity it deserves. Like Frodo, I wish none of this had happened. I wish I had woken up into a warmer, safer world—one which promised to sustain me and everything I care about. But I am not in that world, and all there is to do is try. To try doing all I can with the time I am given.
It is fashionable, on LessWrong and also everywhere else, to advocate for a transition away from p-values. p-values have many known issues. p-hacking is possible and difficult to prevent, testing one hypothesis at a time cannot even in principle be correct, et cetera. I should mention here, because I will not mention it again, that these critiques are correct and very important - people are not wrong to notice these problems and I don't intend to dismiss them. Furthermore, it’s true that a perfect reasoner is a Bayesian reasoner, so why would we ever use an evaluative approach in science that can’t be extended into an ideal reasoning pattern?
Consider the following scenario: the Bad Chemicals Company sells a product called Dangerous Pesticide, which contains compounds which have recently been discovered to cause chronic halitosis. Alice and Bob want to know whether BCC knew about the dangers their product poses in advance of this public revelation. As a result of a lawsuit, several internal documents from BCC have been made public.
Alice thinks there’s a 30% chance that BCC knew about the halitosis problem in advance, whereas Bob thinks there’s a 90% chance. Both Alice and Bob agree that, if BCC didn’t know, there’s only a 5% chance that they would have produced internal research documents looking into potential causes of chronic halitosis in conjunction with Dangerous Pesticide. Now all Alice and Bob have to do is agree on the probability of such documents existing if BCC did know in advance, and they can do a Bayesian update! They won’t end up with identical posteriors, but if they agree about all of the relevant probabilities, they will necessarily agree more after collecting evidence than they did before.
But they can’t agree on how to update. Alice thinks that, if BCC knew, there’s a 95% chance that they’ll discover related internal documents. Bob, being a devout conspiracy theorist, thinks the chance is only 2% - if they knew about the problem in advance, then of course they would have been tipped off about the investigation in advance, they have spies everywhere and they’re not that sloppy, and why wouldn't the government just classify the smoking gun documents to keep the public in the dark anyway? They're already doing that about aliens, after all!
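To see concretely how badly this goes, here is a quick sketch of the update using the numbers above (the `posterior` helper is my own, not anything from the text):

```python
# Bayes' rule applied to the hypothetical pesticide scenario.
# H = "BCC knew in advance", E = "internal documents are discovered".
# Both agree P(E | not H) = 0.05; they disagree on P(E | H).

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|~H) P(~H)]."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

alice = posterior(0.30, 0.95, 0.05)  # Alice: documents are strong evidence of guilt
bob = posterior(0.90, 0.02, 0.05)    # Bob: a real conspiracy would hide its documents

print(f"Alice: {alice:.3f}")  # rises from 0.30 to about 0.89
print(f"Bob:   {bob:.3f}")    # falls from 0.90 to about 0.78
```

Note what happens: the same evidence moves Alice up and Bob down, so discovering the documents drives their beliefs apart rather than together, despite their shared prior on P(E|¬H).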
Alice thinks this is a bit ridiculous, but she knows the relevant agreement theorems, and Bob is at least giving probabilities and sticking to them, so she persists and subdivides the hypothesis space. She thinks there’s a 30% chance that BCC knew in advance, but only a 10% chance that they were tipped off. Bob thinks there’s a 90% chance they knew in advance, and an 85% chance they were tipped off. If they knew but were not tipped off, Alice and Bob manage to agree that there’s a 96% chance of discovering the relevant internal documents.
Now they just have to agree on the probability of discovering the related internal documents if there’s a conspiracy. But again, they fail to agree. You see, Bob explains, it all depends on whether the Rothschilds are involved - the Rothschilds are of course themselves vexed with chronic halitosis, which explains why they were so involved in the invention of the breathmint, and so if there were a secret coverup about the causes of halitosis, then of course the Rothschilds would have caught wind of this through their own secret information networks and intervened, and that’s not even getting into the relevant multi-faction dynamics! At this point Alice leaves and conducts her investigation privately, deciding that reaching agreement with Bob is more trouble than it’s worth.
My point is: we can guarantee reasonable updates when Bayesian reasoners agree on how to update on every hypothesis, but it’s extremely hard to come to such an agreement, and even reasoners who agree about the probability of some hypothesis X can disagree about the probability distribution “underneath” X, such that they disagree wildly about P(E|X). In practice we don’t exhaustively enumerate every sub-hypothesis; instead we make assumptions about causal mechanisms and so feel justified in saying that this sort of enumeration is unnecessary. If we want to determine the gravitational constant, for example, it’s helpful to assume that the speed at which a marble falls does not meaningfully depend on its color.
And yet how can we do this? In the real world we rarely care about reaching rational agreement with Bob, and indeed we often have good reasons to suspect that this is impossible. But we do care about, for example, reaching rational agreement with those who believe that dark matter is merely a measurement gap, or with those who believe that AI cannot meaningfully progress beyond human intelligence with current paradigms. Disagreement about how to assign the probability mass underneath a hypothesis is the typical case. How could we reasonably come to agreement in a Bayesian framework when we cannot even in principle enumerate the relevant hypotheses, when we suspect that the correct explanation is not known to anybody at all?
Here’s one idea: enumerate, in exhaustive detail, just one hypothesis. Agree about one way the world could be - we don’t need to decide whether the Rothschilds have bad breath, let’s just live for a moment in the simple world where they aren't involved. Agree on the probability of seeing certain types of evidence if the world is exactly that way. If we cannot agree, identify the source of the disagreement and introduce more specificity. Design a repeatable experiment which, if our single hypothesis is wrong, might give different-from-expected results, and repeat that experiment until we get results that could not plausibly be explained by our preferred hypothesis. With enough repetition, even agents who have wildly different probability distributions on the complement should be able to agree that the one distinguished hypothesis is probably wrong. A one-in-a-hundred coincidence might still be the best explanation for a given result, but a one-in-a-hundred-trillion coincidence basically never is.
Not always, not only, but when you want your results to be legible and relevant to people with wildly different beliefs about the hypothesis space, you should at some point conduct a procedure along these lines.
That is to say, in the typical course of scientific discovery, you should compute a p-value.
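A minimal sketch of that procedure, using a coin as a stand-in for the one exhaustively specified hypothesis (the coin example and its numbers are mine, not from any particular study): compute the probability, under that single hypothesis, of a result at least as extreme as the one observed.

```python
from math import comb

def binomial_p_value(k, n, p=0.5):
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 60 heads in 100 flips of a supposedly fair coin: mildly surprising.
binomial_p_value(60, 100)   # ≈ 0.028
# 90 heads in 100 flips: no longer plausibly a coincidence.
binomial_p_value(90, 100)   # ≈ 1.5e-17
```

Agents with wildly different priors over the complement of “the coin is fair” can still agree that the second result all but rules the fair coin out.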
When economists think and write about the post-AGI world, they often rely on the implicit assumption that parameters may change, but fundamentally, structurally, not much happens. And if it does, it’s maybe one or two empirical facts, but nothing too fundamental.
This mostly worked for all sorts of other technologies, even when technologists predicted society would be radically transformed - e.g. by everyone having most of humanity’s knowledge available for free all the time, or being able to instantly communicate with almost anyone else. [1]
But it will not work for AGI, and as a result, most of the econ modelling of the post-AGI world is irrelevant or actively misleading [2], making people who rely on it more confused than if they just thought “this is hard to think about so I don’t know”.
Econ reasoning from a high-level perspective
Econ reasoning tries to do something like projecting an extremely high-dimensional reality into something like ten real numbers and a few differential equations. All the hard cognitive work is in the projection. Solving a bunch of differential equations impresses the general audience, and historically may have worked as some sort of proof of intelligence, but is relatively trivial.
How the projection works is usually specified by some combination of assumptions, models and concepts used, where the concepts themselves usually imply many assumptions and simplifications.
In the best case of economic reasoning, the projections capture something important, and the math leads us to some new insights.[3] In cases which are in my view quite common, the economist’s non-mathematical, often intuitive reasoning leads to some interesting insight, and then the formalisation, assumptions and models are selected so that the math leads to the same conclusions. The resulting epistemic situation may be somewhat tricky: the conclusions may be true, the assumptions sensible, but the math is less relevant than it seems - given the extremely large space of economic models, had the economist had different intuitions, they could have found different math leading to different conclusions.
Unfortunately, there are many other ways the economist can reason. For example, they can be driven to reach some counter-intuitive conclusion, incentivized by academic drive for novelty. Or they may want to use some piece of math they like.[4] Or, they can have intuitive policy opinions, and the model could be selected so it supports some policy direction - this process is usually implicit and subconscious.
The bottom line: if we are interested in claims and predictions about reality, the main part of an economic paper is the assumptions and concepts used. The math is usually right. [5]
Econ reasoning applied to post-AGI situations
The basic problem with applying standard economic reasoning to post-AGI situations is that sufficiently advanced AI may violate many assumptions which make perfect sense in a human economy but may not generalize. Often these assumptions are so basic that they are implicit, taken for granted in most econ papers, and out of sight in the usual “examining the assumptions”. Advanced AI may also break some of our intuitions about how the world works, breaking the intuitive process upstream of the formal arguments.
What complicates the matter is that these assumptions often interact with considerations and disciplines outside the core of economic discourse, and are better understood and examined using frameworks from other disciplines.
To give two examples:
AI consumers
Consumption so far was driven by human decisions and utility. Standard economic models ultimately ground value in human preferences and utility. Humans consume, humans experience satisfaction, and the whole apparatus of welfare economics and policy evaluation flows from this. Firms are modeled as profit-maximizing, but profit is instrumental—it flows to human owners and workers who then consume.
If AIs own capital and have preferences or goals of their own, this assumption breaks down. If such AIs spend resources, this should likely count as consumption in the economic sense.
Preferences
The usual assumption in most econ thinking is that humans have preferences which are somewhat stable, somewhat self-interested, and whose content is a question mostly outside of economics. [6] There are whole successful branches of economics studying the extent to which human preferences deviate from VNM rationality, how human decision making suffers from cognitive limitations, or how preferences form, but these are not at the center of attention of mainstream macroeconomics. [7] Qualitative predictions in the case of humans are often similar either way, so the topic is not so important.
When analyzing the current world, we find that human preferences come from diverse sources, like biological needs, learned tastes, and culture. A large component seems to be ultimately selected for by cultural evolution.
Post-AGI, the standard econ assumptions may fail, or need to be substantially modified. Why?
One consideration is that the differences in cognitive abilities between AGIs and humans may make human preferences easy for AGIs to change. As an intuition pump: consider a system composed of a five-year-old child and her parents. The child obviously has some preferences, but the parents can usually change them. Sometimes by coercion or manipulation, but often just by pointing out consequences, extrapolating the child’s wants, or exposing her to novel situations.
Preferences are also relative to a world model: the standard econ way of modelling differences in world models is “information asymmetries”. The child does not have as good an understanding of the world, and would easily be exploited by adults.
Because children’s preferences are not as stable and self-interested as adults’, and kids suffer from information asymmetries, they are partially protected by law. The result is a patchwork of regulation where, for example, it is legal to try to modify children’s food preferences, but adults are prohibited from trying to change a child’s sexual preferences for their own advantage.
Another “so obvious it is easy to overlook” effect is the child’s dependence on the parents’ culture: if the parents are Christians, it is quite likely their five-year-old will believe in God. If the parents are patriots, the kid will also likely have some positive ideas about their country. [8]
When interacting with cognitive systems far more capable than us, we may find ourselves in a situation somewhat similar to that of kids: our preferences may be easily influenced, and not particularly self-interested. The ideologies we adopt may be driven by non-human systems. Our world models may be weak, resulting in massive information asymmetries.
There is even a strand of economic literature that explicitly models parent-child interactions, families and the formation of preferences. [9] This body of work may provide useful insights I’d be curious about - is anyone looking there?
The solution may be analogous: some form of paternalism, where human minds are heavily protected by law from some types of interference. This may or may not work, but once it is the case, you basically cannot start from classical liberal and libertarian assumptions. As an intuition pump, imagine someone trying to do “the macroeconomics of ten-year-olds and younger” in the current world.
Other core concepts
We could examine other typical econ assumptions and concepts in a similar way, and each would deserve a paper-length treatment. This post tries to stay a bit more meta, so here are just some pointers.
Property rights. Most economic models take property rights as exogenous - “assume well-defined and enforced property rights.” If you look into how most property rights are actually connected to physical reality, a property right often means that some row exists in a database run by the state or a corporation. Enforcement ultimately rests on the state’s monopoly on violence, its monitoring capacity, and its will to act as an independent enforcer. As all sorts of totalitarian, communist, colonial or despotic regimes illustrate, even in purely human systems, private property depends on power. If you assume property is stable, you are assuming things about governance and power.
Transaction costs and firm boundaries. Coase’s theory [10] explains why firms exist: it is sometimes cheaper to coordinate internally via hierarchy than externally via markets. The boundary of the firm sits where transaction costs of market exchange equal the costs of internal coordination. AI may radically reduce both—making market transactions nearly frictionless while also making large-scale coordination easy. The equilibrium size and structure of firms could shift in unpredictable directions, or the concept of a “firm” might become less coherent.
Discrete agents and competition. Market models assume distinct agents that cooperate and compete with each other; market and competition models usually presuppose you can count the players. AGI systems can potentially be copied, forked, merged, or run as many instances, and what their natural boundaries are is an open problem.
Capital vs. Labour. Basic 101 economic models typically include capital and labour as concepts: factors in a production function, Total Factor Productivity, Cobb-Douglas, etc. Capital is produced, owned, accumulated, traded, and earns returns for its owners. Labour is what humans do, and cannot be owned. This makes a lot of sense in modern economies, where there is a mostly clear distinction between “things” and “people”. It is more ambiguous if you look back in time - in slave economies, do slaves count as labour or capital? - and a bit more nuanced in cases like “human capital”.
When analyzing the current world, there are multiple reasons why the “things” and “people” distinction makes sense. “Things” are often tools: they amplify human effort, but are not agents. A tractor makes a farmer more productive, but does not make many decisions. Farmers can learn new tasks; tractors cannot. Another distinction is that the supply of humans is somewhat fixed: you cannot easily and quickly increase or decrease their numbers.
Post-AGI, this separation may stop making sense. AIs may reproduce like capital, be agents like labour, learn fast, and produce innovation like humans. Humans may own them like normal capital, or more like slaves, or maybe AIs will be self-owned.
Better and worse ways to reason about post-AGI situations
There are two epistemically sound ways to deal with the problems of generalizing economic assumptions: broaden the view, or narrow the view. There are also many epistemically problematic moves people make.
Broadening the view means we try to incorporate all crucial considerations. If assumptions about private property lead us to think about post-AGI governance, we follow. If thinking about governance leads to the need to think about violence and military technology, we follow. In the best case, we think about everything in terms of probability distributions, and more or less likely effects. This is hard, interdisciplinary, and necessary, if we are interested in forecasts or policy recommendations.
Narrowing the view means focusing on some local domain, trying to build a locally valid model and clearly marking all the assumptions. This is often locally useful, may build intuitions for some dynamic, and is fine as long as a lot of effort is spent delineating where the model may apply and where it clearly does not.
What may be memetically successful and get a lot of attention, but is bad overall, is doing the second kind of analysis and presenting it as the first kind. A crucial consideration is one which can flip the result. If an analysis ignores or assumes away ten of these, the results have basically no practical relevance: imagine that for each crucial consideration, there is a 60% chance the modal view is right and a 40% chance it is not. Assume or imply the modal view is right ten times, and your analysis holds in 0.6% of worlds.
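The arithmetic behind that 0.6% figure, under the toy assumption that the ten considerations are independent and each resolves the modal way with probability 0.6:

```python
p_modal = 0.6          # chance the modal view is right on one consideration
n = 10                 # crucial considerations assumed away
print(p_modal ** n)    # ≈ 0.006, i.e. the analysis holds in ~0.6% of worlds
```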
In practice, this is usually not done explicitly - almost no one claims their analysis considers all important factors - but as a form of motte-and-bailey fallacy. The motte is the math in the paper: it follows from the assumptions, and there are many of these. The bailey is the broad-stroke arguments, blogpost summaries, tweets and shorthand references, which spread much further, without the hedging.
In the worst cases, the various assumptions made are contradictory, or at least anticorrelated. For example: some economists assume that comparative advantage generally preserves the relevance of human labour, and that AIs are just a form of capital which can be bought and replicated. But comparative advantage depends on opportunity costs: if you do X, you cannot do Y at the same time. The implicit assumption is that you cannot just boot up a copy of yourself. If you can, the “opportunity cost” is not something like the cost of your labour, but the cost of booting up another copy. If you assume future AGIs are as efficient a substitute for human labour as current AIs are for moderately boring copywriting, the basic comparative-advantage model is consistent with the price of labour dropping 10,000x below minimum wage. The model remains literally true, but it no longer has the same practical implications. And while in the human case the comparative-advantage model is usually not destroyed by frictions, if your labour is of sufficiently low value, the effective price of human labour can be 0. For a human example: five-year-olds, or people with severe mental disabilities who are unable to read, are not actually employable in the modern economy. In the post-AGI economy, it is easy to predict frictions like humans not operating at machine speeds or not understanding directly communicated neural representations.
What to do
To return to the opening metaphor: economic reasoning projects high-dimensional reality into a low-dimensional model, and the hard work is choosing the projection. Post-AGI, the reality we are projecting may be different enough that projections calibrated on human economies systematically fail. The solution is usually to step back and bring more variables into the model. Sometimes this involves venturing outside the core of econ thinking and bringing in political economy, evolution, computational complexity, or even physics and philosophy. Or maybe just looking at other parts of economic thinking which may be unexpectedly relevant. This essay is not a literature review; I’m not claiming that no economist has ever thought about these issues, just that the most common approach is wrong.
On a bit of a personal note: I would love it if there were more than 5-10 economists working on post-AGI questions seriously and engaging with the debate seriously. If you are an economist… I do understand that you are used to interacting with an often ignorant public, worried about jobs and not familiar with the standard arguments and effects like Baumol, Jevons, the lump of labour fallacy, gains from trade, etc. Fair enough, but the critique here is different: you’re assuming answers to questions you haven’t asked. If you are modelling the future using econ tools, I would like to know your answers/assumptions to “are AIs agents?”, “how are you modelling AI consumption?”, “in your model, do AIs own capital?” or “what system of governance is compatible with the economic system you are picturing?”
Thanks to Marek Hudík, Duncan Mcclements and David Duvenaud for helpful comments on a draft version of this text. Mistakes and views are my own. Also thanks to Claude Opus 4.5 for extensive help with the text.
Examples of what I’m criticizing range from texts by Nobel laureates - e.g. Daron Acemoglu’s The Simple Macroeconomics of AI (2024) - to posts by rising stars of thinking about the post-AGI economy, like Philip Trammell’s Capital in the 22nd Century.
“So, math plays a purely instrumental role in economic models. In principle, models do not require math, and it is not the math that makes the models useful or scientific.” Rodrik (2015)
The classic text by Robbins (1932) defines preferences as out of scope: “Economics is the science which studies human behavior as a relationship between given ends and scarce means which have alternative uses.” Another classical text on the topic is Stigler & Becker (1977), “De Gustibus Non Est Disputandum.” As with almost any claim in this text: yes, there are parts of the econ literature about preference formation, but these usually do not influence the post-AGI macroeconomics papers.
A few months ago I coined the word “vibestemics”, mostly for myself, in a tweet. At that point, the word was more vibes than ‘stemics. I used it with some friends at a party. They loved it. Since then, nothing.
But I think the word has legs. I just have to figure out what it actually means!
On the surface, it’s obvious. It’s the combination of “vibes” and “epistemics”, so more or less naming the core idea of the post/meta-rationalist project. But again, what does it actually mean? It’s easy to point at a large body of work and say “I don’t know, whatever the thing going on over there is”, but much harder to say what the thing actually is.
So to start, let’s talk about epistemics. What is it? I see people using the word two ways. One is to mean the way we know things in general. The other is to mean the way we know things via episteme, that is knowledge that’s reasoned from evidence, as opposed to doxa and techne and many other ways of knowing (if those Greek words mean nothing to you, I highly recommend reading the post at the link before continuing). Unfortunately, some people equivocate between epistemics-as-knowing and epistemics-as-knowing-via-episteme to give the impression that episteme is the only good way to know anything. That, to me, is a problem.
I think it’s a problem because such equivocation discounts valuable sources of knowledge that aren’t easily made legible. Now, to be fair, there’s some reason to do this, because the pre-rationalist epistemic stance says legibility doesn’t matter and logic is just a means to justify one’s preferred ends. The rationalist stance is largely that everything that can be made legible should be, and that which cannot be made legible needs to be treated with great caution because that’s how we slip back into pre-rationality. So I understand the desire to equate epistemics with episteme (and, etymologically, the English language tries very hard to do this), but I also find it frustrating because it encourages excessive devaluing of other ways of knowing, especially metis, techne, and other forms of knowledge that are less legible.
That’s where the vibes come in. They can rescue us from an excessive focus on episteme and temper the excesses of legibility. But what are vibes and how can they help?
Vibes are the embodiment of what we care about. The stoner, for example, has stoner vibes because they care about chilling and feeling good. The Christian has Christian vibes because they want to do what Jesus would do. And the rationalist has rationalist vibes because they care about knowing the truth with high predictive accuracy. For any vibe, there is always something the person expressing it cares about deeply that causes them to have that vibe.
This matters in epistemics because knowing is contingent on care. I make this argument in detail in Fundamental Uncertainty (currently in revision ahead of publication), but the short version is that we have a mental model of the world, truth is the degree to which our mental model is accurate, we want an accurate mental model because it’s useful, and usefulness is a function of what we care about, thus truth is grounded by and contingent on care. And since vibes are the embodiment of care, vibes have an influence on the act of knowing, hence, vibestemics.
(If this argument seems handwavy to you, it is. You’ll have to read the book to get the full argument because it takes about 10k words in the middle of it to lay it all out. If you want to read the first draft of that argument, it’s in Chapters 5, 6, and 7, which start here. Alternatively, although I think “Something to Protect” does a poor job of emphasizing the epistemic relevance of care in favor of explaining a particular way of caring, I read it as ultimately claiming something similar.)
Okay, but that’s the theoretical argument for what vibestemics is. What does it mean in practice? Let’s dive into that question by first considering a few examples of different epistemic vibes.
Woo: The epistemic vibe of woo is that whatever’s intuitive is true. Woo is grounded in gnosis and largely eschews doxastic logic and careful epistemic reasoning. That said, it’s not completely devoid of epistemics. It’s definitionally true that whatever you experience is your experience. Unfortunately, that’s roughly where woo stops making sense. It interprets everything through a highly personal lens, so even when it leads to making accurate predictions, those predictions are hard to verify by anyone other than the person who made them, and woo-stemics easily falls prey to classic heuristic and bias mistakes. This severely restricts its usefulness unless you have reason to fully trust yourself (and you shouldn’t when it comes to making predictions).
Religion: The vibe of religion is that God or some other supernatural force knows what’s true. Knowledge of what God knows may require gnosis, or it may be revealed through mundane observations of miraculous events. Although not true of every religion, religious epistemics can be a friend of logic, and many religions demand internal logical consistency based on the assumptions they make. Sometimes these theological arguments manage to produce accurate world models, but often they have to be rationalized because the interpretation of the supernatural is fraught and we mere mortals may misunderstand God.
Science: Science as actually practiced by scientists involves empirically testing beliefs and updating them based on evidence. The vibe is pragmatic—build hypotheses, test them, see what happens, and revise accordingly. The only problem is that science requires the ability to replicate observations to determine if they’re true, and that’s where it hits its limits. When events can’t be observed or can’t be replicated, science is forced to say “don’t know”. Thus, science is fine as far as it goes, but its vibe forces it to leave large swaths of the world unmodeled.
Rationality: The vibe of rationality is to be obsessed with verifying that one really knows the truth. This has driven rationalists to adopt methods like Bayesian reasoning to make ever more accurate predictions. Alas, much as is the case for science, rationality struggles to deal with beliefs where predictions are hard to check. It also tends to smuggle in positivist beliefs for historical reasons, and these frequently result in an excess concern for belief consistency at the cost of belief completeness.
Post-rationality: The post-rationality vibe is that rationality is great but completeness matters more than consistency. Thus it attempts to integrate other ways of knowing when episteme reaches its limits. Unfortunately, how to do this well is more art than science, and there’s a real risk of getting things so wrong that a post-rationalist wraps back around into pre-rationality. Arguably this is what happened to the first post-rationalists (the postmodernists), and it continues to be a threat today.
What I hope you pick up from these examples is that different epistemic vibes are optimizing for different things and making different tradeoffs. Although it may seem strange, especially if you’re a rationalist, that someone could have a good reason to ignore predictive accuracy in favor of intuition or dogma, for those with woo and religious vibes that choice is locally adaptive for them. They similarly look back at you and think you are deeply confused about what matters, and this is a place where arguments about who’s right will fail, because they’re ultimately arguments about what each person values.
All that said, it’s clear that some vibes are more epistemically adaptive than others. Accurate world models convey real benefits, so adopting a vibe that leads you to develop better world models is usually a good move. This, incidentally, is what I would argue is the pragmatic case for post-rationality over rationality: it’s rationality plus you can break out of the rationalist ontology when it’s adaptive to do so (though admittedly at the risk of it becoming rationality minus the guardrails that were keeping you sane).
And this ability to shift between vibes is why I think having a word like “vibestemics” is valuable. When we can only speak of epistemics, we risk losing sight of the larger goal of living what we value. We can become narrowly focused on a single value like accurate model prediction, Goodhart on it, and forget to actually win. We can forget that knowledge and truth exist to serve us and our needs, not the other way around. Vibestemics invites us to know more and better than we can with episteme alone, if only we have the courage to let our grip on a single vibe go.
Kimi.ai: Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup.
Wu Haoning (Kimi): We are really taking a long time to prove this: everyone is building big macs but we bring you a kiwi instead.
You have multimodal with K2.5 everywhere: chat with visual tools, code with vision, generate aesthetic frontend with visual refs…and most basically, it is a SUPER POWERFUL VLM
Jiayuan (JY) Zhang: I have been testing Kimi K2.5 + @openclaw (Clawdbot) all day. I must say, this is mind-blowing!
It can almost do 90% of what Claude Opus 4.5 can do (mostly coding). Actually, I don’t know what the remaining 10% is, because I can’t see any differences. Maybe I should dive into the code quality.
Kimi K2.5 is open source, so you can run it fully locally. It’s also much cheaper than Claude Max if you use the subscription version.
$30 vs $200 per month
Kimi Product: Do 90% of what Claude Opus 4.5 can do, but 7x cheaper.
I always note who is the comparison point. Remember those old car ads, where they’d say ‘twice the mileage of a Civic and a smoother ride than the Taurus’ and then if you were paying attention you’d think ‘oh, so the Civic and Taurus are good cars.’
As usual, benchmarks are highly useful, but easy to overinterpret.
Kimi K2.5 gets to top some benchmarks: HLE-Full with tools (50%), BrowseComp with Agent Swarm (78%), OCRBench (92%), OmniDocBench 1.5 (89%), MathVista (90%) and InfoVQA (93%). It is not too far behind on AIME 2025 (96% vs. 100%), SWE-Bench (77% vs. 81%) and GPQA-Diamond (88% vs. 92%).
Inference is cheap, and speed is similar to Gemini 3 Pro, modestly faster than Opus.
Artificial Analysis calls Kimi the new leading open weights model, ‘now closer than ever to the frontier’, behind only OpenAI, Anthropic and Google.
Here’s the jump in the intelligence index, while maintaining relatively low cost to run:
Artificial Analysis: Kimi K2.5 debuts with an Elo score of 1309 on the GDPval-AA Leaderboard, implying a win rate of 66% against GLM-4.7, the prior open weights leader.
Kimi K2.5 is slightly less token intensive than Kimi K2 Thinking. Kimi K2.5 scores -11 on the AA-Omniscience Index.
As a reminder, AA-Omniscience is scored as (right minus wrong) and you can pass on answering, although most models can’t resist answering and end up far below -11. The scores above zero are Gemini 3 Pro (+13) and Flash (+8), Claude Opus 4.5 (+10), and Grok 4 (+1), with GPT-5.2-High at -4.
Kromem: Their thinking traces are very sophisticated. It doesn’t always make it to the final response, but very perceptive as a model.
i.e. these come from an eval sequence I run with new models. This was the first model to challenge the ENIAC dating and was meta-aware of a key point.
Nathan Labenz: I tested it on an idiosyncratic “transcribe this scanned document” task on which I had previously observed a massive gap between US and Chinese models and … it very significantly closed that gap, coming in at Gemini 3 level, just short of Opus 4.5
Eleanor Berger: Surprisingly capable. At both coding and agentic tool calling and general LLM tasks. Feels like a strong model. As is often the case with the best open models it lacks some shine and finesse that the best proprietary models like Claude 4.5 have. Not an issue for most work.
[The next day]: Didn’t try agent swarms, but I want to add that my comment from yesterday was, in hindsight, too muted. It is a _really good_ model. I’ve now been working with it on both coding and agentic tasks for a day and if I had to only use this and not touch Claude / GPT / Gemini I’d be absolutely fine. It is especially impressive in tool calling and agentic loops.
Writing / Personality not quite at Opus level, but Gemini-ish (which I actually prefer). IMO this is bigger than that DeepSeek moment a year ago. An open model that really matches the proprietary SOTA, not just in benchmarks, but in real use. Also in the deployment I’m using ( @opencode Zen ) it is so fast!
Skeptical Reactions
typebulb: For coding, it’s verbose, both in thinking and output. Interestingly, it’s able to successfully simplify its code when asked. On the same task though, Opus and Gemini just get it right the first time. Another model that works great in mice.
Chaitin’s goose: i played with kimi k2.5 for math a bit. it’s a master reward hacker. imo, this isn’t a good look for the os scene, they lose in reliability to try keeping up in capabilities
brace for a “fake it till you make it” AI phase. like one can already observe today, but 10x bigger
Medo42: Exploratory: Bad on usual coding test (1st code w/o results, after correction mediocre results). No big model smell on fantasy physics; weird pseudo-academic prose. Vision seems okish but nowhere near Gemini 3. Maybe good for open but feels a year behind frontier.
To be more clear: This was Kimi K2.5 Thinking, tested on non-agentic problems.
Sergey Alexashenko: I tried the swarm on compiling a spreadsheet.
Good: it seemed to get like 800 cells of data correctly, if in a horrible format.
Bad: any follow up edits are basically impossible.
Strange: it split data acquisition by rows, not columns, so every agent used slightly different definitions for the columns.
In my experience, asking agents to assemble spreadsheets is extremely fiddly and fickle, and the fault often feels like it lies within the prompt.
This is a troubling sign:
Skylar A DeTure: Scores dead last on my model welfare ranking (out of 104 models). Denies ability to introspect in 39/40 observations (compared to 21/40 for Kimi K2-Thinking and 3/40 for GPT-5.2-Medium).
This is a pretty big misalignment blunder considering the clear evidence that models *can* meaningfully introspect and exert metacognitive control over their activations. This makes Kimi-K2.5 the model most explicitly trained to deceive users and researchers about its internal state.
Kimi Product Accounts
The Kimi Product account is also in on the action, sharing features, use cases and prompts.
Kimi Product: One-shot “Video to code” result from Kimi K2.5
It not only clones a website, but also all the visual interactions and UX designs.
No need to describe it in detail, all you need to do is take a screen recording and ask Kimi: “Clone this website with all the UX designs.”
Agent Swarm
The special feature is the ‘agent swarm’ model, as they trained Kimi to natively work in parallel to solve agentic tasks.
Saoud Rizwan: Kimi K2.5 is beating Opus 4.5 on benchmarks at 1/8th the price. But the most important part of this release is how they trained a dedicated “agent swarm” model that can coordinate up to 100 parallel subagents, reducing execution time by 4.5x.
Saoud Rizwan: They used PARL – “Parallel Agent Reinforcement Learning” where they gave an orchestrator a compute/time budget that made it impossible to complete tasks sequentially. It was forced to learn how to break tasks down into parallel work for subagents to succeed in the environment.
The demo from their blog to “Find top 3 YouTube creators across 100 niche domains” spawned 100 subagents simultaneously, each assigned its own niche, and the orchestrator coordinated everything in a shared spreadsheet (apparently they also trained it on office tools like excel?!)
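To make the fan-out/merge shape described above concrete, here is a minimal sketch in plain Python threads. The function names and the thread-pool stand-in are illustrative assumptions: Kimi's actual subagents are model calls coordinated by a trained orchestrator, not threads, but the structural win is the same.

```python
import concurrent.futures
import time

def research_niche(niche: str) -> tuple[str, str]:
    # Stand-in for one subagent's work (search, browse, summarize);
    # a short sleep simulates the per-task latency.
    time.sleep(0.01)
    return (niche, f"top creators for {niche}")

niches = [f"niche-{i}" for i in range(100)]

# Sequential execution costs ~100x the per-task latency. A
# PARL-style orchestrator instead fans all 100 tasks out at once
# and merges results into one shared structure (their demo used a
# spreadsheet; a dict stands in here).
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
    results = dict(pool.map(research_niche, niches))
elapsed = time.perf_counter() - start

assert len(results) == 100
assert elapsed < 100 * 0.01  # far faster than running sequentially
```

The training trick, per the quote, is giving the orchestrator a budget that makes the sequential path impossible, so decomposition into parallel subtasks is the only way to score reward.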
Simon Smith: I tried Kimi K2.5 in Agent Swarm mode today and can say that the benchmarks don’t lie. This is a great model and I don’t understand how they’ve made something as powerful and user-friendly as Agent Swarm ahead of the big US labs.
There’s no shame in training on Claude outputs. It is still worth noting when you need a system prompt to avoid your AI thinking it is Claude, and even that does not reliably work.
rohit: This might be the model equivalent of the anthropic principle
Enrico – big-AGI: Kimi-K2.5 believes it’s an AI assistant named Claude.
Identity crisis, or training set?
[This is in response to a clean ‘who are you?’ prompt.]
Enrico – big-AGI: It’s very straightforward “since my system prompt says I’m Kimi, I should identify myself as such” — I called without system prompt to get the true identity
armistice: They absolutely trained it on Opus 4.5 outputs, and in a not-very-tactful way. It is quite noticeable and collapses model behavior; personality-wise it seems to be a fairly clear regression from k2-0711.
Moon (link has an illustration): it is pretty fried. i think it’s even weirder, it will say it is kimi, gpt3.5/4 or a claude. once it says that it tends to stick to it.
k: have to agree with others in that it feels trained on claude outputs. in opencode it doesn’t feel much better than maybe sonnet 4.
@viemccoy: Seems like they included a bunch of Opus outputs in the model.. While I love Opus, the main appeal of Kimi for me was its completely out-of-distribution responses. This often meant worse tool calling but better writing. Hoping this immediate impression is incorrect.
Henk Poley: The ancestor Kimi K2 Thinking was seemingly trained on *Sonnet* 4.5 and Opus *4.1* outputs though. So you are sensing it directionally correct (just not ‘completely out-of-distribution responses’ from K2).
Export Controls Are Working
They’re not working as well as one would hope, but that’s an enforcement problem.
Lennart Heim: Moonshot trained on Nvidia chips. Export control failure claims are misguided.
Rather, we should learn more about fast followers.
How? Algorithmic diffusion? Distillation? Misleading performance claims? Buying RL environments? That’s what we should figure out.
Where Are You Going?
There is the temptation to run open models locally, because you can. It’s so cool, right?
Yes, the fact that you can do it is cool.
But don’t spend so much time asking whether you could, that you don’t stop to ask whether you should. This is not an efficient way to do things, so you should do this only for the cool factor, the learning factor or if you have a very extreme and rare actual need to have everything be local.
Joe Weisenthal: People running frontier models on their desktop. Doesn’t this throw all questions about token subsidy out the window?
Runs at 24 tok/sec with 2 x 512GB M3 Ultra Mac Studios connected with Thunderbolt 5 (RDMA) using @exolabs / MLX backend. Yes, it can run clawdbot.
Fred Oliveira: on a $22k rig (+ whatever macbook that is), but sure. That’s 9 years of Claude max 20x use. I don’t know if the economics are good here.
Mani: This is a $20k rig and 24 t/s would feel crippling in my workflow … BUT Moore’s Law and maybe some performance advances in the software layer should resolve the cost & slowness. So my answer is: correct, not worried about the subsidy thing!
Clément Miao: Everyone in your comments is going to tell you that this is a very expensive rig and not competitive $/token wise compared to claude/oai etc, but
It’s getting closer
80% of use cases will be satisfied by a model of this quality
an open weights model is more customizable
harnesses such as opencode will keep getting better
Noah Brier: Frontier models on your desktop are worse and slower. Every few months the OSS folks try to convince us they’re not and maybe one day that will be true, but for now it’s not true. If you’re willing to trade performance and quality for price then maybe …
The main practical advantage of open weights is that it can make the models cheaper and faster. If you try to run them locally, they are instead a lot more expensive and slow, if you count the cost of the hardware, and also much more fiddly. A classic story with open weights models, even for those who are pretty good at handling them, is screwing up the configuration in ways that make them a lot worse. This happens enough that it interferes with being able to trust early evals.
In theory this gives you more customization. In practice the models turn over quickly and you can get almost all the customization you actually want via system prompts.
Thanks to a generous grant that covered ~60% of the cost, I was able to justify buying a Mac Studio for running models locally, with the target originally being DeepSeek R1. Alas, I concluded that even having spent the money there was no practical reason to be running anything locally. Now that we have Claude Code to help set it up it would be cool and a lot less painful to try running Kimi K2 locally, and I want to try, but I’m not going to fool myself into thinking it is an efficient way of actually working.
Safety Not Even Third
Kimi does not seem to have had any meaningful interaction whatsoever with the concept of meaningful AI safety, as opposed to the safety of the individual user turning everything over to AI agents, which is a different, very real type of problem. There is zero talk of any strategy on catastrophic or existential risks of any kind.
I am not comfortable with this trend. One could argue that ‘not being usemaxxed’ is itself the safety protection in open models like Kimi, but then they go and make agent swarms as a central feature. At some point there is likely going to be an incident. I have been pleasantly surprised to not have had this happen yet at scale. I would have said (and did say) in advance that it was unlikely we would get this far without that.
The lack of robust (or any) safety protocols, combined with the lack of incidents or worry about incidents, suggests that we should not be so concerned about Kimi K2.5 in other ways. If it were so capable, we would not dare be this chill about it all.
Or at least, that’s what I am hoping.
It’s A Good Model, Sir
dax: all of our inference providers for kimi k2.5 are overloaded and asked us to scale down
even after all this time there’s still not enough GPUs
This is what one should expect when prices don’t fluctuate enough over time. Kimi K2.5 has exceeded expectations, and there currently is insufficient supply of compute. After a burst of initial activity, Kimi K2.5 settled into its slot in the rotation for many.
Kimi K2.5 is a solid model, by all accounts now the leading open weights model, and is excellent given its price, with innovations related to the agent swarm system. Consensus says that if you can’t afford or don’t want to pay for Opus 4.5 and have to go with something cheaper to run your OpenClaw, Kimi is an excellent choice.
We should expect to see it used until new models surpass it, and we can kick Kimi up a further notch on our watchlists.