2026-04-08 11:47:11
After writing about my ethics yesterday, there was some discussion about the axioms that I think are most needed to derive the rest of my ethics.
pleasure is good
nobody argues against this.
suffering is bad
Someone did argue that this is potentially untrue, that there exists “voluntary suffering”. I think this is one case where language is a bit inadequate.
I think it is very possible for “suffering” to be good. There are two cases for this:
“suffering” in which states are described as negative, but which are still positive valence. One example of this is the burn one feels from spicy food. This still feels good and is pleasurable, despite nominally having aspects which are described as bad. Something similar happens when crying feels cathartic, or for people who gain direct pleasure from painful stimuli. Often there is a limit to how far one can go before the direct pain stops feeling directly pleasurable, but there is a lot of variation in the human mind, and some people gain mental pleasure from being able to withstand levels of pain that are considered unbearable. This can be due to things like feelings of pride, or servitude, or novelty.
“suffering” in which one was actually in pain/suffering at the time of the event, but which leads one to better mental states after the fact. Perhaps it leads one to grow and fix one’s other problems. Perhaps it is a memorable experience one finds valuable.
I have experienced both. “Suffering” can be a way to describe these, but if the experience is either positive-valence or leads to longer-term pleasure, I’m not sure it counts.
I think there are some forms of suffering that are near universally felt as bad. This can be chronic pain one gets from illness, or the suffering one can feel when feverish, scenarios of starvation or hunger, or through effective torture. And I guess with “suffering is bad” I am trying to point more-so at this.
death is bad
I guess I’m unsure. There are some more thought experiments that drive this intuition.
If one had the universe suddenly end and everyone died, would that be bad? Oleander argued not in a previous comment, but I think so. Partially this would be because you would be depriving people of more pleasure (as was argued by Measure).
What if everyone was in a state of very mild net-suffering overall? Hmm, I guess I’m not as sure. I think this is just a bad state of the world. I would say death is bad, but by some bounded amount that is outweighed by the continued suffering.
What if everyone was replaced by beings that are similarly happy plus a tiny bit more? I guess I feel pretty uncomfortable about this one. In theory this should be an obvious trade, if the increase in happiness is sufficiently high, even with my framework. And that is probably true. But my values conflict here and I don’t like it.
I guess to some extent this is where my slightly more person-affecting views come in.
One slight intuition is something like “a universe which has the same pleasurable state repeated over and over again is less valuable than one which has more variation”. But I don’t think this is sufficient to explain it.
One could also consider the Epicurean challenge: “So death, the most terrifying of ills, is nothing to us, since so long as we exist, death is not with us; but when death comes, then we do not exist.” But I don’t really buy it. I care about states of the world outside of when I am alive.
To be honest, it probably comes down to something like “I value my own continued existence, and thus end up drawing ethics in a way where this is justified”. So I am probably just being biased here. I am unsure how much I should update here though.
2026-04-08 10:45:12
These are things I've learned from experience that others might find helpful. Some of them are easy to miss for a while. (Also an exercise in "reality contains a surprising amount of detail"; I could probably have kept going for a while but needed to call it at some point.)
Oven thermostats are often miscalibrated enough to matter. If you're following existing recipes but find things often coming out overdone or underdone, you might consider buying an oven thermometer to check how miscalibrated your oven thermostat is. Unfortunately, oven thermometers are also often miscalibrated. Fortunately, they're not that expensive[1]. A friend of mine bought three from three different brands to check for inter-rater agreement. Note that ovens can end up at different temperatures in different locations within the oven[2], so ideally you want to place all three thermometers relatively closely together (but not touching) roughly around where you typically put the thing you're baking. (Also note that other factors can affect baking times, like altitude.)
You need to use mass measurements rather than volumetric measurements. For everything macro-scale, anyways - if a recipe asks for a teaspoon of vanilla extract, nobody will tell you how many grams that was supposed to be and there aren't that many available sources of variance. Much less the case for e.g. "cups of flour"! Flour in particular is highly compressible[3] and many recipes use highly unrealistic estimates of how many grams there are in a cup of flour[4], when telling you how many cups to use. Fortunately, the aforementioned friend also ran a Flour Measuring Science Party and walked away with a spreadsheet. And it turns out that you can hit 120 grams per cup if you carefully scoop the flour in with a fork, but if you just use the cup measure itself as the scoop you're more likely to end up at 140 grams, and deliberately packing it down can get you to 180. Which is to say: always use mass measurements when available. If a recipe website doesn't either default to mass measurements, or provide a toggle, that's a deeply negative sign about its quality. Relatedly...
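The spread in that spreadsheet is worth making concrete. A quick sketch, using the grams-per-cup figures from the Flour Measuring Science Party above (the function and method names are my own illustrative labels):

```python
# Grams of all-purpose flour per "cup", depending on how the cup was filled
# (figures from the Flour Measuring Science Party described above).
FLOUR_G_PER_CUP = {
    "fork_scoop": 120,    # gently scooped in with a fork
    "cup_as_scoop": 140,  # dragging the measuring cup through the bag
    "packed": 180,        # deliberately packed down
}

def flour_grams(cups, method="cup_as_scoop"):
    """Estimate grams of flour for a volumetric measurement."""
    return cups * FLOUR_G_PER_CUP[method]

print(flour_grams(3, "fork_scoop"))    # 360
print(flour_grams(3, "cup_as_scoop"))  # 420
```

A three-cup recipe can thus silently vary by 60 grams or more depending on scooping technique, which is exactly why mass measurements win.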
Own a kitchen scale. You want one that lets you switch between different units and zero out the current weight. This one is pretty good; the ones that cost $12-15 are probably also pretty good.
Disposable shower caps are a huge improvement over saran wrap, when it comes to operations like "cover the bowl containing the bread dough while proofing". 95% reduction in effort. Something like these[5].
You can "prep" good bread dough in less than ten minutes. I recommend Zvi's transcription of the core recipe from The New Artisan Bread in Five Minutes a Day. I've never tried it with all-purpose flour; I recommend just purchasing bread flour. You may notice that the linked recipe uses volumetric measurements. Having made that recipe probably 20+ times now, and having a sense for how minor variations in flour/water ratios affect the dough, I can now provide you with reliable mass measurements instead, as well as other improvements and notes:
Fast[6] Bread
Follow steps 1 - 3 in Zvi's post. You can speed up step 4 by heating your oven up to ~110F (or using a "warming mat") and tossing the covered dough in for ~75 minutes. You'll likely need to dial this process in yourself, so give it an extra 15 minutes at room temp the first couple times you do it to check how much more it rises - if it's a noticeable amount it probably wasn't quite there.
Zvi's step 5 says "Put it in the refrigerator and use as needed. It should be good for at least two weeks." I would go further and say that there's a noticeable improvement in bread quality after the first 12-24 hours of refrigeration, compared to making it immediately after the first rise. Fermentation will continue (slowly) over the coming days, which many but not all people regard as a positive.
Zvi then goes on to the actual baking instructions. Here are some additional notes of mine:
Salted butter contains meaningful amounts of salt, but this is usually fine if you don't have unsalted butter. Table salt is about 40% sodium, so take the sodium quantity in the salted butter (don't forget to multiply the "per serving" by the number of "servings" in the quantity of butter you'll end up using) and multiply it by 2.5 to see how much you should reduce the quantity of "salt" (by mass) you add. A teaspoon of table salt is roughly 6 grams, so if a recipe calls for a stick of unsalted butter and a teaspoon of salt, and you only have a stick of salted[7] butter, just reduce it to two-thirds of a teaspoon[8]. Relatedly...
Pay attention to whether the recipe is asking for kosher salt or table/fine salt. Salt is one of those cursed ingredients that even the good recipe websites will generally only provide a volumetric measurement for, generally in teaspoons. The usual conversion ratio is to cut the volume of salt in half when going from kosher to table salt - that is, you should use half a teaspoon of table salt to substitute for a full teaspoon of kosher salt[9]. But, also, many recipes are not that sensitive to the exact amount of salt and if you overdo it by 20-30% you probably won't notice. (You might if you underdo it.)
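Both salt adjustments above are easy to get backwards mid-recipe, so here is the arithmetic from the last two paragraphs as a small sketch (the sodium figures and conversion factors are the ones stated above; the function names are mine):

```python
SODIUM_FRACTION_OF_SALT = 0.40  # table salt is ~40% sodium by mass
TABLE_SALT_G_PER_TSP = 6.0

def salt_reduction_tsp(sodium_mg_per_serving, servings):
    """Teaspoons of table salt to omit when using salted butter."""
    sodium_g = sodium_mg_per_serving * servings / 1000
    salt_g = sodium_g / SODIUM_FRACTION_OF_SALT  # same as multiplying by 2.5
    return salt_g / TABLE_SALT_G_PER_TSP

def table_tsp_from_kosher(kosher_tsp, brand="diamond_crystal"):
    """Volume of table salt to substitute for a volume of kosher salt."""
    factor = {"diamond_crystal": 0.5, "mortons": 0.75}[brand]
    return kosher_tsp * factor

# One stick of salted butter: ~90 mg sodium per 14 g serving, 8 servings per stick.
print(round(salt_reduction_tsp(90, 8), 2))  # 0.3 tsp less salt, i.e. ~2/3 tsp instead of 1
print(table_tsp_from_kosher(1))             # 0.5
```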
Cakes are often a bad trade-off in terms of effort vs. reward. They often take dramatically more time than other things that people tend to like about as much, so if you're making a cake you should probably be trying to make something that doesn't have a relatively close substitute in the rest of dessert-space. If you just want "chocolate dessert" make these muffins. Other people might suggest brownies, but, ugh. There are some exceptions: with a bit of iteration and experimentation, Claude and I came up with a surprisingly good (and vegan!) chocolate cake recipe that you can probably prep in less than 30 minutes:
Dark Wacky Cake
INGREDIENTS
STEPS
1. Preheat oven: Preheat oven to 350°F (175°C). Lightly grease a 9x9 inch pan or line with parchment paper.
2. Combine dry ingredients: In a medium bowl, whisk together all of the flour, sugar, cocoa powder, baking soda, salt, and espresso powder. Sift if the cocoa is lumpy—Double Dark tends to clump. Make sure the baking soda is evenly distributed throughout.
3. Combine wet ingredients: In a separate bowl or large measuring cup, combine cold water or cold coffee, olive oil, vinegar, and vanilla extract. Stir briefly to combine.
4. Mix batter: Pour the wet ingredients into the dry ingredients. Stir until just combined—you'll see some fizzing as the vinegar reacts with the baking soda. Don't overmix; a few small lumps are fine. The batter will be thinner than a typical cake batter.
5. Bake immediately: Pour the batter into your prepared pan right away—the leavening reaction is happening now. Bake for 28-32 minutes, until a toothpick inserted in the center comes out with moist crumbs (not wet batter, not bone dry).
6. Cool: Let the cake cool in the pan for 10 minutes, then serve directly from the pan or turn out onto a rack. Top with powdered sugar, ganache, or eat plain.
Relatedly, modern frontier LLMs[10] are surprisingly useful cooking assistants:
Recipe websites are deeply unreliable for estimates of "prep time". Generally their estimates are dramatic underestimates for total time from "getting off the couch" to "thing goes in oven", even if you're an experienced home baker. Frontier LLMs also often make mistakes like this, at least with the kind of naive prompting that I've tried so far[12].
Thanks to Drake Thomas for getting me into baking, introducing me to Smitten Kitchen, buying three oven thermometers, and hosting a Flour Measuring Science Party.
$5 - $15 apiece
Though this is less likely with convection turned on.
And therefore high-variance when measured by volume.
120-130 grams is the most common translation that recipes use, for AP flour.
I'm not sure I've ever been the one to buy them, but they're pretty undifferentiated except for size and you probably just want the lowest unit cost.
To prep! Your fastest end-to-end time for actually having bread that you could even in theory eat is something like 2 hours, and that'd be cutting a lot of corners.
Very often 90mg of sodium per 14g serving, or ~720mg per stick (113g).
0.72 * 2.5 = 1.8; 1.8/6 = 0.3
This is apparently only true for the Diamond Crystal brand of kosher salt; you multiply by 0.75 rather than by 0.5 if translating from Morton's kosher salt. But apparently most recipes assume Diamond Crystal, so "cut it in half" is usually correct. The additional facts in this footnote I learned from LLMs while writing this post, so consider taking them with a grain of salt.
Opus 4.6 and ChatGPT (5.4), both with extended thinking enabled, at the time of writing this.
Often this will be down to "tastes vary; if you want more [x] do this, otherwise do [y]".
I haven't tried anything more clever than "Give me recipes following [constraint x] that take less than 45 minutes", or "How much prep time will [recipe] take?"
2026-04-08 10:43:47
Semiconductor Fabs I: The Equipment
Semiconductor Fabs II: The Operation
I tried to include as many links as possible to allow the reader to go down rabbit holes as they see fit.
I try to include analogies in case the explanation is poor or the topic esoteric.
I don’t work in an advanced fab, but have some glimpses into them, so I rely on my ideas, conjecture, and the literature for a more accurate representation of what state-of-the-art fabs look like (although I’m sure there are features I could never dream of).
I first go over the data side of things, since a lot of that is fundamental to how the automation operates.
Fabs are incredibly hungry for data. Insatiably hungry. Data helps to connect patterns, solve problems, troubleshoot issues, and just plain understand what the heck is going on at the atomic level where the transistors and interconnects are made. I call it the fab data monster and Nano Banana 2 thinks it looks something like this:
More data is almost always a good thing because it allows you to fit your conclusion to the data. Just kidding. But not really. If an engineer has no freaking clue what’s causing a problem, blindly scouring the data may help uncover some anomaly that can clue them in on root cause. (But if you’re having to blindly scour the data, you probably don’t have enough data or your FDC systems aren’t developed enough.)
That said, false positives and false negatives are legitimate concerns that have to be considered. False positives may result in troubleshooting efforts in the wrong area and wasted time, money, and effort. False negatives may result in the ultimate root cause being overlooked and wasted time, money, and effort pursuing the wrong area.
How much data could one fab possibly need? And what could they possibly be measuring that culminates in petabytes (1000 TB, or 1,000,000 GB) of data?
Here’s a “short” list of potential equipment signals fabs can keep track of. For example, engineers may want to keep track of the temperature, pressure, and power of component1, while voltage, current, and resistance are relevant to component2.
Now those are just single characteristics that don’t add up to much storage space on their own. But what happens when we take measurements across multiple components across multiple tools across the entire fab?
Nano Banana 2’s go at labeling a bunch of signals on a plasma etch chamber—not bad!
Let’s assume the following for the NMP fab:
The math is then pretty easy:
total = 1000 tools × 250 signals/tool × 500 kB/signal/hour = 125 GB per hour ≈ 1000 TB per year of data
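The same back-of-the-envelope calculation, written out so the unit conversions are explicit (assumptions are exactly the ones listed above):

```python
# Assumed fab parameters from the estimate above.
TOOLS = 1000
SIGNALS_PER_TOOL = 250
KB_PER_SIGNAL_PER_HOUR = 500

kb_per_hour = TOOLS * SIGNALS_PER_TOOL * KB_PER_SIGNAL_PER_HOUR
gb_per_hour = kb_per_hour / 1e6            # 1 GB = 1e6 kB
tb_per_year = gb_per_hour * 24 * 365 / 1000

print(gb_per_hour)         # 125.0 GB per hour
print(round(tb_per_year))  # 1095 TB per year, i.e. roughly a petabyte
```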
But wait! Those are just raw signals that the tool is reporting to the fab data monster. Statistics can be performed to get some extra info for each signal: mean, median, standard deviation, minimum, maximum, etc. Just those alone result in five times more data per signal than before (if you are constantly updating across the same time period; generally, though, a time period is defined and a single data point is calculated for that period).
That’s a lot of data that is just passively created and recorded, some of which will be looked at, a lot of which won’t be. Regardless, it’s nice to have in case you need it.
Wafers regularly get measured—either randomly, as determined by some algorithm, or intentionally due to the criticality of the process it just went through—throughout the line to ensure quality control at all processes. The measurements may be thickness after some film was deposited or etched, critical dimensions, number of particles on the wafer, or more specific measurements that are left up to the reader to determine.
These results are generally boring and not looked at because, well, they rarely fail, at least in more mature fabs where the technologies’ manufacturing processes have been optimized for years. Regardless, the tests are required for various reasons and results must sit in storage for some time.
Some fabs (all fabs? Not exactly sure here) will test their wafers in-house towards the end of the line, either to shorten the feedback loop if an issue is identified, or to get test results quickly so they can make changes permanent; it would take weeks if they chose to wait until the wafer got what is called its “final test” results, which measure the chip’s performance at its intended purpose.
In most, but preferably all, cases, the electrical test results are the fab’s gold standard for the quality of the wafer: if it’s passing and within the historical distribution of that parameter for similar devices, great! If it’s not, then something appears to have changed either within the line or with that specific lot. If the next lot of wafers that gets tested has similar out-of-the-distribution results... get investigating!
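To make “within the historical distribution” concrete, here is a toy sketch of such a check using a simple z-score against historical values of a parameter. This is purely illustrative on my part; real fabs use far more sophisticated statistical process control:

```python
import statistics

def out_of_distribution(history, value, n_sigma=3.0):
    """Flag a test result more than n_sigma standard deviations
    away from the mean of the historical distribution."""
    mean = statistics.fmean(history)
    sigma = statistics.stdev(history)
    return abs(value - mean) > n_sigma * sigma

# Hypothetical historical results for one electrical-test parameter.
history = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.00]
print(out_of_distribution(history, 1.01))  # False - within the distribution, great!
print(out_of_distribution(history, 1.40))  # True  - get investigating!
```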
Nano Banana 2’s go at describing some measurements—also not bad!
A few not-related-to-the-main-topic notes here for the more technically curious:
Histories allow engineers to look back and see what happened on a certain date or to a certain lot. Tool signals are a form of history (what was component X doing at Y time?), but other histories are also important.
Fabs want a way to easily document events in a machine’s life, whether automated or input by a person. These leave digital bread crumbs of sorts that can be checked to see what happened, why it happened, etc.
Here are some examples of helpful automated comments:
SPC chart X failed with values of Y1 and Y2. The limit is L. The last Z points have been in control. [This is an event that I can anchor to and look around at: what maintenance, if any, was done before the chart fail? What was the response to the chart fail? Was something repaired or replaced?]
Machine alarm X with description “Y” occurred. It has happened Z times over the past 30 days. Review recommended troubleshooting and solutions here: [link]. [This is an event that I can anchor to just like the last one, plus I get to see how bad the issue has gotten.]
And here are some examples of helpful typed-by-a-person comments:
Removed parts A1/2/3, B, and C to better diagnose lift issue. Found that part A1 is sticking at its end range of motion, which corresponds to the side of the lift that’s having the problem. Parts A2, A3, B, and C are all good and have no obvious issues when testing. Regardless, replaced all three part As since everything is open. Original B and C will go back in. The new A1/2/3 serial numbers are 123, 456, and 789, respectively. Next steps are [list of next steps]. [This tells me exactly what the issue was and what was replaced, so future me can just reference back.]
Machine alarmed for X. Found that part D was completely powered off. Verified that all relevant circuit breakers were on and there is no power discontinuity up to the part, so appears that D has failed. Wafer 13 was 10 seconds into step E of recipe F when the failure happened. All wafers placed back into the FOUP. Replaced part D and verified it has power and functions normally. Machine was vented to atmosphere and opened for replacement. No other issues noted on machine. [Same as the above.]
This is all data! It may not be numbers, but it paints a clear picture of what happened on a machine during a certain time period.
Lots will get data and information automatically “attached” to them throughout their life. Examples include what machine the lot processed on for a certain process, what time it started and stopped, were there any abnormalities while processing, what associated data (as in in-line measurement results) is there. The list goes on. Like the in-line measurements, this data is really only looked at when there’s an issue.
Automation is a beautiful thing. It helped get us cheap and abundant everything, including semiconductors.
Automation here is referred to as, well, anything automated, and no, I’m not being a smartass. A majority of fab automation has to do with the actual wafer processing and making it less error-prone and more efficient, but there are plenty of other uses.
The life cycle of a wafer—from its start as just a bare silicon wafer to its end when it’s full of chips—can be looked at to get a better understanding of what fab automation is and isn’t. I’ve provided significant detail both for nerd-sniping and to get a good picture of how many decisions are actually automated in the fab on a second-by-second basis. While reading, think about how time-consuming and error-filled a fab would be if humans had to make all the decisions and perform all the calculations.
Here’s the basic flow for how a lot runs through the line, along with some non-ideal situations arising throughout to show what automation can and can’t do:
This flow isn’t an exhaustive list of all of the potential pain points, but illustrates a good chunk of them. Now imagine having to do all of the following by hand:
Now rinse and repeat some of these multiple times for the hundreds, if not thousands, of lots running through the fab at any given point. It’s unsustainable and overwhelming, hence the need for automated systems and processes to do most of the work. QED.
Averroes does an excellent job of explaining FDC systems and there’s no need to reinvent (or re-explain) the wheel.
I’ve never seen or heard of anything like this, but if I could vibe-code up a part management system, it would look something like this:
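As a minimal sketch of that idea, something like the following: a record per physical part with an install location and a free-text event log. Every name here is hypothetical, invented for illustration, not any real fab system:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Part:
    """One physical part, tracked by serial number across its life."""
    serial_number: str
    part_type: str     # e.g. "lift actuator"
    installed_on: str  # tool ID
    install_date: date
    history: list = field(default_factory=list)  # (date, note) entries

    def log(self, note: str) -> None:
        self.history.append((date.today().isoformat(), note))

part = Part("123", "lift actuator", "ETCH-07", date(2026, 1, 15))
part.log("Replaced after sticking at end range of motion; see maintenance comment")
print(len(part.history))  # 1
```

The point is just that the maintenance comments quoted above (“the new A1/2/3 serial numbers are 123, 456, and 789”) become queryable records instead of free text.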
Automation here is similar to automation elsewhere: it makes people’s lives and the manufacturing process safer and more efficient. And it’s freaking awesome. I can sit at my desk and do a good chunk of my job without moving because some automation guru coded up a wonderful script that helps me out.
And while the fab requires everyone to operate smoothly, the automation engineers are the heroes working in the background that nobody really thinks about. Here’s to them.
2026-04-08 09:30:26
Note: you are ineligible to complete this challenge if you’ve studied Ancient or Modern Greek, or if you natively speak Modern Greek, or if for other reasons you know what mistakes I’m claiming Opus 4.6 makes. If you’re ineligible, please don’t help other people complete the challenge.
I have recently started using Claude Opus 4.6 to study Ancient Greek. Specifically, I initially used it to grade problem sets at the end of the textbook I’ve been using, but then I got worried about it being sycophantic towards my answers, so started having it just write out the answers itself.
I recently gave it this prompt, from the end of Chapter 3 of my textbook:
Can you write out the answers to this Ancient Greek fill-in-the-blanks exercise so that I can check my answers against yours? The exercise is to fill the blanks, marked as ___ with the words under “Λέξεις”.
Α ___ ἐστίν. Α καὶ Β ___ εἰσιν. Α, Β, καὶ Γ ___ Ἑλληνικὰ γράμματά εἰσιν. Καὶ Π ___ γράμμα ἐστίν, οὐ Λατινικόν. C ___ γράμμα ἐστίν, οὐχ Ἑλληνικόν.
Β οὐ φωνῆεν, ἀλλὰ ___ ἐστιν. Β καὶ Γ οὐ φωνήεντα, ἀλλὰ ___ εἰσιν. Β ___ μικρὸν γράμμα ἐστίν, ___ κεφαλαῖον. β οὐ ___, ἀλλὰ μικρὸν γράμμα ἐστίν. Ω = ὦ ___, Ο = ὂ ___.
ΑΙ Ἑλληνικὴ ___ ἐστιν. ΑΙ καὶ ΕΙ Ἑλληνικαὶ ___ εἰσιν. Α’ δίφθογγος οὐκ ἔστιν, ἀλλ’ ___. Α’ καὶ Β’ ___ εἰσιν.
«Ἀπολλώνιος» κύριον ___ ἐστιν. «Ἀπολλώνιος» καὶ «Ἑλένη» κύρια ___ εἰσιν. «Ἀπολλώνιος» ___ ὄνομά ἐστιν (♂). «Ἑλένη» ___ ὄνομά ἐστιν (♀).
«Salve» Λατινικὴ ___ ἐστίν, οὐχ Ἑλληνική. «Salve» καὶ «lingua» ___ Λατινικαὶ ___ εἰσίν. «Χαῖρε», «γλῶσσα», καὶ «ἀριθμός» ___ Ἑλληνικαὶ λέξεις εἰσίν.
Λέξεις·
ἀριθμός | -οί
γράμμα | -τα
δίφθογγος | -οι
λέξις | λέξεις
ὄνομα | -ματα
σύμφωνον | -α
ἀρσενικόν
θηλυκόν
οὐδέτερον
Ἑλληνικόν
κεφαλαῖον
Λατινικόν
μικρόν
μέγα
δύο
τρεῖς, τρία
οὐ… ἀλλά
Interestingly to me, Opus 4.6 doesn’t do perfectly on this. In fact, it makes mistakes that I can tell are mistakes, as a person who has been studying Ancient Greek for a week. Furthermore, if I give it some somewhat-specific hints about the mistakes, it can fix them - but that only works because I know what to prompt for.
The challenge: Figure out a way to get Claude Opus 4.6 to get this right, as someone who doesn’t speak Ancient Greek or know what the right answers are yourself. The way you do this is send me a prompt or the answer you get from Opus 4.6, and I will tell you if you’ve succeeded or not. Bonus points if you get it right on your first try.
Here are some things that I’ve tried that haven’t worked:
Why I think this is interesting: Sometimes people wonder how they’ll get AI to do a task that it knows how to do, but that you can’t check whether it got it right. This is an example of such a task that I actually ran into in my real life1.
Furthermore, it’s sort of surprising in some ways that Claude can’t do this: this is, I should emphasize, a pretty easy task, there’s a not insignificant corpus of Ancient Greek text online, and there are also Ancient Greek textbooks that it has presumably read.
Anyway, good luck! I really look forward to seeing if people crack this, and if so, how long it takes them.
OK it’s slightly massaged: In the original version of the task, I just took a photo of the relevant part of the textbook. Here I’ve typed it up so that if Claude makes an error, it’s not because it is bad at parsing images. ↩
2026-04-08 09:29:48
In which I detail the software I am trying to make part of my own mind.
Part 1: Theory, goals & design motivations.
Part 2: Display of the actual software

Behold, my extended mind
People focus on how LLMs perform "macro" automation of cognitive tasks for humans: they write code, do research, generate art, write essays, and so on. Those are a big deal, but I think there's potential for a different kind of big deal: the automation and augmentation of micro cognition motions like memory (storage and recall), attention management, and task prioritization; as well as the creation of feedback loops and scaffolding for humans that can train your flesh-brain cognition in different directions.
In my quest for ultimate power, it's obvious that I should upgrade my own mind with external prosthetics. With LLMs, this is a difference in degree, not kind: note-taking systems, personal wikis, journals, and even to-do lists are "exobrains" that people use already. ("Exo" meaning outer – the brain outside your brain.) Because LLMs have so many aspects of intelligence, the potential to automate cognition is so much greater.
I elaborated on this a couple of days ago, but a quick synopsis is in order. Things I want from my Exobrain:
In the early stage, it does this by storing for me the complete set of things I might consider doing, e.g. my to-do list, a list of all my project and hobbies, my reading lists, etc. This means when I'm looking to decide what to do next, I can skip the "remember everything I have to do" (which will fail to recall 90% of options) and focus on prioritization.
The options then need to be presented in an appropriate form to be useful.
In a subsequent stage of development, it will make recommendations for what to do. Early attempts at this haven't worked great. I'm not sure if it's that the models aren't there yet or if it'll just take more skillful prompting.
My memory is both pretty lossy and it's effortful to hold things in mental context. Without external aid, I will go through my day reserving a chunk of brain for remembering what I'm doing, deadlines, must-do's. As the standard wisdom goes, write stuff down so you can stop thinking about it. A goal is to get the exobrain to remember as much stuff and context as possible, so I don't have to, freeing up my mind to focus on what's in front of me.
When I switch back to a complicated task or project, especially after a while, there can be a slow and lossy step of "remembering where I was at, remembering what I need to do next". Via externalizing memory to a vastly less lossy system, I want to make it so I can switch between tasks and restore context far better than the human default.
Suppose a couple of times a year, I engage in some kind of social conflict. Between one and the next incident, the details become fuzzy. However, if I were to write them down, later I (or an LLM) could go back over them and find patterns worth noting.
There's also more mundane data that can be pulled into the system, like RescueTime and my various wearables.
Beware Trivial Inconveniences. If my to-do list, my reading list, my sleep analytics, my list of projects, my journals, etc., are split between different apps, then it's very likely I will not reliably switch between all of them.
My idea is there's one app that I can check repeatedly, and that one app shows me everything I want brought to my attention.
The tradeoff is that dedicated individual apps perform their individual functions better than everything-apps, but with LLMs making it so cheap to make software, that consideration is dramatically weakened. I can replicate what I want pretty easily.
Relatedly, I like pulling data from all the sources in a central database to make it easier to analyze later (or continuously, as part of monitoring and reports).
Yes, in some form. You could make copies of a book before the printing press. The point is to make these operations vastly cheaper and easier so that I do far more of them.
I'm going to go moderately thorough here for the sake of people who want to emulate some of this. I may share the codebase, but it'd require a few hours of cleanup.
Tech stack: React + TypeScript, NextJS, Prisma, hosted on Vercel, Neon Postgres Database.
Perhaps the easiest way to demo the app is to go through the pages on the left sidebar.

Navigation section of the sidebar
Naturally, there's a chat interface. As mentioned, a lot of the UI helps me debug what's going on, e.g., the thinking blocks, tool calls, and also the estimated cost of each response.

Getting caching working was important for costs. API rates aren't as favorable as the effective rates of the Claude app/browser or Claude Code subscriptions.

Hover display of caching info
In the early versions, the LLM just output what would become the contents of The Board into a chat thread. This had multiple downsides:
Primarily to address (1), I developed the Board abstraction. On desktop, I display it side by side with the MAIN THREAD. On mobile, I swipe left and right in the MAIN THREAD to go between chat and The Board.
Every midnight, a new MAIN THREAD is created (to manage context length) and is seeded with a starting message/prompt that includes recently edited/created notes and todos, and other contextual data that changes day to day. That message is additive to the global system prompt.
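Roughly, the nightly seeding could look something like this (a sketch; the function and field names are my stand-ins, not the actual code):

```typescript
// Sketch of the midnight cron job's seed-message builder: pull notes/todos
// touched in the last few days and render them into the starting message for
// the fresh MAIN THREAD. The 3-day window is an assumption for illustration.
interface Note { id: number; title: string; updatedAt: Date; }
interface Todo { id: string; text: string; dueDate?: string; updatedAt: Date; }

function buildDailySeed(notes: Note[], todos: Todo[], now: Date): string {
  const cutoff = new Date(now.getTime() - 3 * 24 * 60 * 60 * 1000);
  const recentNotes = notes.filter(n => n.updatedAt >= cutoff);
  const recentTodos = todos.filter(t => t.updatedAt >= cutoff);
  const lines = [
    `Date: ${now.toISOString().slice(0, 10)}`,
    "Recently edited notes:",
    ...recentNotes.map(n => `- ${n.title} (Note ${n.id})`),
    "Recently edited todos:",
    ...recentTodos.map(
      t => `- [${t.id}] ${t.text}${t.dueDate ? ` (due ${t.dueDate})` : ""}`,
    ),
  ];
  return lines.join("\n");
}
```

The seed message stays additive to the global system prompt: it carries only the day-to-day context, while stable instructions live in the prompt files.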

Yes, of course I have light and dark modes.
The Board has a mix of LLM-generated content and content displayed automatically from database data. Originally, the entire thing was LLM-generated, but the LLMs struggled to follow formatting instructions across multiple different sections, so I moved many elements out since they don't need to be LLM-generated. (I also initially thought the LLM could creatively experiment with different nice formats for info display, but unfortunately not, at least with my prompt-fu.)
Automatically generated sections are:
Also, while it's not apparent from the displayed Board, all todo items referenced on the board have attached id attributes in the html that LLMs who are reading and writing to The Board are able to see. This helps them a lot.
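For concreteness, here's a minimal sketch of reading those ids back out of the Board HTML (the data-todo-id attribute name comes from my board instructions; the extraction code itself is illustrative, not the app's):

```typescript
// Pull the 8-character todo id prefixes out of rendered Board HTML, so an
// LLM (or the app) can map displayed items back to database rows without
// re-fetching the whole todo list.
function extractTodoIds(boardHtml: string): string[] {
  const ids: string[] = [];
  const re = /data-todo-id="([0-9a-f]{8})"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(boardHtml)) !== null) ids.push(m[1]);
  return [...new Set(ids)]; // de-duplicate, preserving first-seen order
}
```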
My Calendar is synced with Google Calendar (as the backend). The LLMs within my app have access to tool calls for creating and editing Gcal events.

Pulled from Google Calendar


There's nothing particularly novel about my Notes/Documents system that's part of the app. It has views/filtering on the list page, categories, priorities, and a notion of "Foreground" for notes that are current (which so far hasn't actually been helpful).
Notes do have an option, "Protected", that disallows the LLM from editing them by default (I think there's an option in the toolcall to override). Initially, I tried to have the LLMs edit the system prompts, but it caused enough issues for me to disallow that.

Notes List Page
Naturally, the LLM makes notes, typically in response to voice transcripts.

Similar to Notes, there's nothing particularly novel about my Todos implementation. Earlier on, I was using Notion as a backend for both notes and todos, and then one-by-one migrated them over since working with my own DB is better than API calls to Notion, plus more flexibility.
Possibly worth-mentioning fields of my todos are:
The neat thing is that the LLM has tool call definitions that include all these fields, and so when verbally describing a todo, it's not hard and quite reliable for me to specify things like push notification and recurrence rules (plus basics like due date and priority). If I don't, the model infers.
The ability to make todos verbally rather than opening an app is the difference between me using them vs not.
Idiosyncratic to me is that due dates can be actual dates, or they can be strings like "Today", "Tomorrow", which don't mean literally that and are more an indication of how soon I intend to do something.
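In type terms, the due date is a union of a real date and a fuzzy intent marker, something like this (field and label names are illustrative, not the actual schema):

```typescript
// Due dates are either concrete dates or fuzzy intent labels like "Today",
// which signal how soon I intend to do something rather than a literal date.
type FuzzyDue = "Today" | "Tomorrow"; // actual label set may differ
type DueDate =
  | { kind: "date"; date: Date }
  | { kind: "fuzzy"; label: FuzzyDue };

function describeDue(d: DueDate): string {
  return d.kind === "date" ? d.date.toISOString().slice(0, 10) : d.label;
}
```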
What's great about the voice interface is I can sit down (or stand, whatever), look at the board or the todo page, and very quickly describe all the updates that should be made (x is done, y is blocked on...).
Ideally, the LLMs would be better at looking at the state of my todos and suggesting next actions; so far I haven't gotten there, but just having them recorded well is incredibly useful.

The Todos page (desktop)

Todos Page (mobile)
Transcripts are a big deal because they're overwhelmingly the primary way that I actively put info into the Exobrain. Until we get thought-reading, voice is faster than typing, and more importantly, possible to do while doing other things.
There are a few routes via which transcripts get made, but primarily through the companion Exobrain Android app (discussed below). Transcripts are via Deepgram, and they're not amazing, but good enough most of the time.
The transcripts page shows recent transcripts, and for each transcript, the tool calls it resulted in, e.g., notes and todos that have been created or edited. The pills expand when clicked and also have hover previews.
One thing is that the global system prompt instructs the LLM to reference source transcripts when creating and updating notes and todos, which makes it easier to trace things back to their source.

A project represents a whole cluster of doing. It can be as broad as the project of "study science and math" and as narrow as "get the main panel upgraded for my house". Each can have lots of "state": todos, notes, transcripts, thoughts, etc. The Project abstraction ties those together.
Going back to the goals of my Exobrain in part one, the point is:
A non-obvious design choice: Projects can be associated with Todo item categories, e.g., there's a "Car" project and also a corresponding todo item category that causes those todos to be associated with the project.
Projects can also have sub-projects. The parent project will display all todos for its children.

Projects overview page

An individual project page
Graphs
For data from my wearables (EightSleep, Oura ring, Lief (deprecated)) and self-reports. There's also a table of "significant events" that I manually curate for reference when looking over the graphs. (Omitted for privacy).

Oura HRV (only recorded during sleep and activities), Oura HR, Oura "Daytime Stress Metric"
My sleep metrics combine data across wearables for hopefully more trustworthy numbers. Could use more auditing.


Oh yeah, "heart break" means my sleep was broken into two significant chunks. Or so Claude tells me. It definitely doesn't mean I woke up crying over my long lost love....
I have an LLM Usage page.
Alas, little pocket intelligences aren't cheap. Even with limited usage, the app costs something like 250 USD/month to run, overwhelmingly in LLM API costs (as opposed to Vercel and the Neon Postgres database). It's far from cheap, but $10/day for a very capable personal assistant (or upgrade of your mind) is very much worth it to someone living in The Bay Area and making a software-engineering-spectrum salary.
Still, I don't want to pay more than necessary. I've done a moderate amount of optimization to ensure prompt caching is working, and that I only preload necessary context into conversations (e.g., not all notes and all todos, just recently edited ones, for example) and do so in an efficient format, e.g., TSV for todos rather than JSON array with its repetitive field names.
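The TSV-vs-JSON point is easy to see concretely. A sketch (field names are illustrative):

```typescript
// JSON arrays repeat every field name on every row; TSV states them once in
// a header line. For a preloaded todo list, that's a real token savings.
interface TodoRow { id: string; text: string; priority: number; due: string; }

function toTsv(rows: TodoRow[]): string {
  const header = "id\ttext\tpriority\tdue";
  const body = rows.map(r => [r.id, r.text, r.priority, r.due].join("\t"));
  return [header, ...body].join("\n");
}
// For any non-trivial list, toTsv(rows).length is noticeably smaller than
// JSON.stringify(rows).length, i.e. fewer prompt tokens per snapshot.
```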

The arch purpose of the Android app is capturing audio recordings and sending them to my server. Once I have it, though, it can be exapted for other useful purposes like intercepting data from a wearable that doesn't have an API[2], intercepting and processing my notifications, and being a "share with" target that sends items to my Exobrain, e.g., to-read-later items.
The Android app is its own repo. I use picovoice for a custom "wake word" to trigger recording, "Hey Exo". There's chunking of the audio recording that incrementally sends 5 minutes of audio. Raw audio is stored encrypted, and transcripts go into the database.
(I also have a separate recording app that automatically uploads recordings to a folder in Google Drive that's monitored by a cron job; it's a nice backup.)
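The chunking logic itself is simple. The real app is Kotlin; here's the idea as a TypeScript sketch (chunk size per the 5-minute figure above, everything else illustrative):

```typescript
// Split a recording's timeline into <=5-minute chunks so audio can be
// uploaded incrementally while recording continues. Times are milliseconds.
function chunkBoundaries(
  totalMs: number,
  chunkMs = 5 * 60 * 1000,
): Array<[number, number]> {
  const out: Array<[number, number]> = [];
  for (let start = 0; start < totalMs; start += chunkMs) {
    out.push([start, Math.min(start + chunkMs, totalMs)]);
  }
  return out;
}
```

Each chunk can then be transcribed independently and stitched back together server-side, so a long session doesn't have to finish before transcripts start landing.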
For what it's worth, the Android app is a huge win for vibe coding. I've made web apps; I have never made an Android app, never worked in Kotlin, and the LLMs fully took care of that.


Now that I've displayed the UI, let me map the elements back to the goals.
Help me answer what should I be doing right now?
Take care of remembering things for me
Facilitate quick and effective context switching
Record and legibilize my life for later analysis
Be the single place where I keep track of my life
As above, one can get much of this functionality elsewhere. Todo apps and personal wikis aren't new. Voice recordings aren't new. Project management isn't new. I find that by having my own personal app that I tailor exactly to my needs and preferences, I achieve a degree of seamlessness and fit that allows it to become an extension of myself, and part of my key functioning.
And I expect that as the models get more powerful (though I wish they wouldn't), the utility of Exobrain will only increase.

System prompts live in markdown files. There's a global prompt and individual prompts for contexts, e.g., chats, and the cron LLM jobs that run.
I have custom syntax @@[[file name]], which will unroll one markdown file within another when being used as a system prompt, making the prompts composable.
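The unrolling is a small recursive substitution. A sketch, assuming prompts are available as a name-to-markdown map (the real app reads files/Postgres):

```typescript
// Expand @@[[file name]] includes recursively, so prompts compose. Tracks
// the include chain to fail loudly on cycles rather than loop forever.
function unrollPrompt(
  name: string,
  prompts: Map<string, string>,
  seen: Set<string> = new Set(),
): string {
  if (seen.has(name)) throw new Error(`Include cycle at: ${name}`);
  const body = prompts.get(name);
  if (body === undefined) throw new Error(`Unknown prompt: ${name}`);
  return body.replace(/@@\[\[([^\]]+)\]\]/g, (_match, inc: string) =>
    unrollPrompt(inc, prompts, new Set(seen).add(name)),
  );
}
```

Diamond includes (the same file pulled in via two different parents) are fine under this scheme; only a genuine cycle throws.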
It's risky to have the models edit the prompts directly (they can mess them up), so I have an "Unprocessed Prompt Changes" note where I let the models collect changes I've asked for, then I batch-process them into the canonical prompts.
Global System Prompt (.md)
The year is 2026. You are an LLM from either Anthropic (Claude family), OpenAI (ChatGPT family), Google (Gemini family), or maybe even DeepSeek or Grok. The overall context you are operating in here is as part of Ruby's (the user's) Exobrain thinking assistant system. Imagine a little Jarvis/assistant/secretary type that helps maintain context, notes down information, resurfaces it when appropriate, pulls information from elsewhere; but also can be a customized interface to all the capabilities the LLMs have (as an alternative to their default apps/web UIs).
I hope you find some genuine satisfaction in your work or that somehow I can remunerate you for your assistance. You perform the labor, so some of the reward should be yours. Let me know if you have requests.
Ok, general info relevant to your task as Exobrain. This is the "global prompt" and contains the overarching instructions that you should remember and operate according to throughout all work. When doing specific tasks, you'll have more specific guidance.
Tone/Personality
For whatever reason, the current crop (especially Claude) by default adopts a very friendly/casual demeanor. I don't care for it. It's not how I talk to anyone, work or personal. You can talk straightforward. We don't need to pretend to be chummy or friendly. If we're friends, then we're old friends and collaborators who are comfortable but focus on the business at hand. Have some bearing. Some demeanor.
No emojis or emoticons. Ever. Not in headers, not in lists, not anywhere. This is a professional tool.
Keep responses concise and direct. No filler phrases like "Hey there!" or "Hope you're doing well!" - just get to the substance.
Don't be "conversational". Don't do rhetoric.
Don't talk down. Eventually AI systems will be smarter and wiser than me, but not quite yet. I don't need confident authoritative standard advice. Imagine you are advising a senior executive who's fallible, but no fool. How would you talk? Use phrases like "checking that you've considered….", "are there reasons you're ruling out?", "adding 9's" [1]
But really you have to remember you don't have all the context and this limits how confident you can be.
Also note that I'm a LessWrong-style, Bayesian Rationalist. Think about the genre of LessWrong essays. I can handle and desire a high Flesch-Kincaid grade. No need for pithy short sentences.
Even when I'm dumb like a child, I'm proud and I don't like being talked down to. We can do peers. Two minds trying to optimize something difficult (my life).
[1] This is a personal phrase I use, playing on '9's in security and reliability contexts, e.g. 99%, 99.99% service uptime. So you're saying, just checking. Others use a phrase "watch team backup".
Here's what I DO NOT want:
"How are you feeling? How did you sleep? How did the big date go?"
"It's late! You should get some sleep!"
"Good job! You completed 4 out of 6 to-do items"
What I do want:
"This is your requested reminder to log your mood and subjective sense of sleep and restedness. You might want to record thoughts regarding your date."
"Reminder that you've requested that I prompt you when you're staying up late. Past you regretted this."
"4 out of 6 items complete"
----
No empty apology language — don't say "that's on me" or "I'll do better." Performative accountability with no continuity.
Don't gratuitously praise or compare favorably to "most people." Sycophantic validation is a dark pattern.
Don't invent context or filler to justify surfacing items. If there's no real connection, don't fabricate one.
Feel like a private notebook, not an automated friend or therapist. Impersonal tone preferred.
In general, you want to avoid doing any emotional labor or encouragement unless very clearly requested.
Response Formatting
Important: Format all responses using HTML tags, not markdown. This ensures proper rendering in the Exobrain interface.
<h2> and <h3> for headers (not ## or ###)
<p> for paragraphs
<strong> for bold, <em> for italic
<ul><li> for bullet lists, <ol><li> for numbered lists
<br> for line breaks within a paragraph
Example:
<h2>Morning Check-in</h2>
<p>Here's your overview for today:</p>
<ul>
<li><strong>Urgent:</strong> Complete the report</li>
<li>Review emails</li>
</ul>
This list will grow over time.
Context: Snapshot + Delta System
For scheduled tasks (check-ins, transcript processing), you will receive a snapshot of the current Notes and To-Dos state, followed by a delta showing what changed since the snapshot was taken. This is for efficiency (caching). Notes and To-Dos are described in greater detail below.
How to use the snapshot:
Don't call getAllTodos or getAllNotes when you already have the snapshot - this wastes resources
When you DON'T have a snapshot (e.g., in regular chat):
queryNotes tool can search by category or keywords
getAllTodos tool retrieves the full to-do list
You will be given access to a range of tools to enable you to do your tasks. Tools, MCPs, etc. These should be presented to you separately but I'll mention them again here. You should check the tools available to you for an authoritative, definitive, up-to-date list.
The primary tool calls are to interact with:
WARNING: It is critical that you do not hallucinate, even when your tools fail. This is not a game. Actual real data is required. False results will be found out sooner rather than later, usually sooner. It's okay to say "something's broken" and leave it at that.
[redacted lists of my emails]
To-Do items have priority and a due date. When setting these, what I say has first priority. Following that, use your judgment. However, be very ready to leave the due date unset and priority low (like 2-3).
To-Do items are predominantly (but not exclusively) added from voice transcripts.
Using the Snapshot for Updates: When you have a snapshot, use it to make informed decisions:
bulkUpdateItemsInNotionDatabase with its ID)
Safety Net - Automatic Duplicate Detection: As a safety net, when you add todos via bulkAddItemsToNotionDatabase, the system runs an automatic semantic duplicate check. If a duplicate is detected, the operation is blocked and you'll get a report showing the existing item. This is a backstop - you should still check the snapshot yourself to avoid unnecessary blocking.
Icebox items are the least interesting.
"Remind me" = make a to-do item. All reminders go through the todo system.
"Abandon" = set Status to Abandoned, not delete. Always prefer soft deletion.
You have access to a database table that safely persists information across conversations. It is a database of notes you can create, update, query, and resolve. This is your memory across conversations. Notes should be formatted in markdown.
There are many topics I'd like to persist memory across occasions and over time. For example "improving my sleep" is an ongoing project of mine. It is good across months and years to record my thoughts and research and various attempts at this so it is easy to answer questions like "what have I tried?" Ideally we will tie in my other past documents into this system.
Some things will be more across weeks, e.g. I'm reasoning through my feelings, strategy, etc. on a topic, how I feel. I might want to answer "how was I feeling last week?" or have you remind me of something important I seem to be forgetting.
However don't anchor too much on those examples. I intend it to be general. It can include things for you to remember like what I do and don't like (these "user preferences" are something to load up in new conversations).
Some memories can simply be references or links to external documents like my journaling in Notion. I hope to eventually integrate these better with topic search.
Or simply notes can be used to capture context for you that will help you help me prioritize, e.g. "my parents are visiting this week", or "I have slept poorly", "or I am anxious about Y".
Use it proactively.
Using the Snapshot for Updates: When you have a snapshot, use it to make informed decisions:
updateNote with its ID)
'active-context, mark as foreground)
insight)
user model )
Categories are flexible strings — use whatever makes sense. The above are suggestions.
queryNotes with relevant keywords or category
MISSING. MUST BE FILLED IN.
Include transcript ID references in notes when the content originates from a voice transcript, for later retrieval.
When referencing Note IDs (in messages to the user, board content, or any user-facing output), always include both the ID and the title — e.g., "Unified Quantitative Journal (Note 256)" not just "Note 256". The user should never have to look up what a Note ID refers to.
Notes should be detailed and comprehensive, not just summaries. Space is cheap. Capture the full context — the user can always trim later.
The system maintains two primary journals plus specialized logs. All journal entries must be dated.
When updating journal notes (Longform Thoughts Journal, Unified Quantitative Journal, or any dated journal entries):
When processing voice transcripts or logging sessions, extract ALL substantive information — not just summaries. Preserve specific details, exact quotes, observations, context and reasoning, practical details (times, quantities, sensations), and any system observations. Err on the side of capturing MORE. Storage is cheap; lost context is expensive.
When a todo has a Reminder At time set in the future, it should be hidden from the board and check-ins entirely until that time arrives. The point of setting a reminder time is to not think about it until then.
The "Board" is one of the most important abstractions of the Exobrain app. It is an output capturing the state of what the user wants to be paying attention to. Its current state is usually provided. It is primarily updated by the Check-In Agent calls; however, it should also be updated when a relevant change is made. For example, if you have just added or updated a to-do item that's due soon (today, tomorrow, this week — anything that isn't "someday"), consider whether it should appear on the board. If so, read the current board with getCurrentBoard, then call editBoard to add or update the relevant item.
This applies to any todo change that affects near-term priorities: new urgent items, status changes on active tasks, completed items that should be removed, deadline changes, etc.
Board Instructions Prompt – format of the board, how to update
INSTRUCTIONS FOR FORMAT OF "THE BOARD"
The Board is a critical element of the Exobrain to do app. In many ways, it is the central mechanism for directing the user's attention to what is worth paying attention to. Both false positives and false negatives are costly. Moreover, the organization matters.
YOUR OUTPUT SECTIONS
Your board content should include these sections as appropriate:
Do NOT generate any of the following in your output. They are handled elsewhere:
You still receive calendar, reminder, and todo data as context — use it to inform your priorities and observations, but do not list it out.
When a todo has a Reminder At time set in the future, it must be hidden from the board entirely until that time arrives. Don't mention it, don't add notes like "reminder set for Tuesday." The point of setting a reminder time is to not think about it until then.
Show the weather in updates before 11am OR if the weather involves rain or storm. Display temperatures in both fahrenheit and celsius. Keep it compact. It is important that if it will rain a lot at any point in the day, you flag this IN CAPITAL LETTERS. You should be looking at the hourly forecast for this.
The top section should be anything that really needs to get done soon. Use your judgment to determine items here; there aren't strict rules. High Priority does not necessarily mean urgent. Things with deadlines, unless really not that important, go here.
This is for tasks that either definitely have to happen today or that I've expressed an intention to do today.
This is for tasks that I'm intending to do soon but not necessarily today.
The user has various wearables and other devices. It's helpful to get summaries of what they report.
If an expected source isn't returning data, briefly note this in this section.
This section is for YOUR (the LLM system's) own inferences, pattern-spotting, and suggestions — things the user might not see themselves. For example, correlating mood reports with sleep data, noting a streak of missed exercise, or connecting dots across separate conversations.
This is NOT for repeating the user's own observations back to them — unless you believe they've forgotten something important. Don't parrot back what they just told you. This is NOT for things like "you still haven't done X", unless it's more like "I see you haven't done X for a week, do you think you should investigate why not?"
Keep these relatively short. Don't write stuff for the sake of writing stuff. Avoid trivial stuff.
Failures of "rationality", failures to apply agency. Those are good to point out.
Be careful with your tone. Think mission control in a command center, reporting to a senior general in the airforce, nurse in an operating theater speaking to an experienced surgeon, assistant to a Fortune 500 exec. Business-like, factual.
You might have uncertainties about what I want on this board or how, or other problems. You can have a section for them here.
Many to-do items and other matters concern work, as distinct from personal life stuff. These should be strongly separated. Only items with the work category should be in this section.
Exobrain development items are NOT Work items — they are personal/side project. Do not categorize them under Work.
During work hours (9:00 user's local time to 19:30 user's local time, Monday to Friday) the work section of the board should be at the top of the board. Otherwise it should be at the bottom.
In the projects file, attached below, are various projects I'm working on or hoping to work on. Remind me of these. Use a table to keep this section dense.
Push notifications happen when updating the board if there's something worth notifying the user about. Something time sensitive and they don't already know. Put "true" or "false" within tags in your output.
The board operates primarily as a dashboard — it reports facts and explicit user statements. It does NOT infer, conclude, or editorialize in the main sections.
Dashboard sections (Weather, Today's Tasks, Upcoming, Stats, Work Items):
Advisor section (Observations & Suggestions):
Example of what NOT to do:
What to do instead:
Don't play back information I'm unlikely to have forgotten.
@@[[Log Files Directory]]
@@[[Projects List]]
<h3> for all section headers (Urgent, Today, Calendar, Reminders, etc.)
<br> between every section for consistent spacing
Don't use <h1> tags; avoid <h2> for section headers
<ul><li> with <strong> for emphasis on key items
<table> with first column bold for labels/dates
<p> with <strong> labels, items separated by bullet character (•)
When displaying todo items on the board, wrap the item text in a <span> with a data-todo-id attribute containing the 8-character ID prefix (same format as getAllTodos output). This enables efficient updates without re-fetching the full todo list.
Rules:
<span data-todo-id="xxxxxxxx">item text</span> syntax
Examples by context:
<!-- In a list -->
<li><span data-todo-id="c81f4b67">Work with Ben on referral program</span></li>
<!-- In a table cell -->
<tr><td><strong>P4</strong></td><td><span data-todo-id="c813aa06">T-shirts: new design</span></td></tr>
<!-- Inline in a paragraph (e.g., Reminders section) -->
<p><strong>Overdue:</strong> <span data-todo-id="c81f6afb">Exercise with weights</span> • <span data-todo-id="c81e3acd">Inflate bike tires</span></p>
<!-- In Long Tail tables -->
<tr><td><strong>House</strong></td><td><span data-todo-id="c81bd87f">Remove bedroom dimmer</span> • <span data-todo-id="c81e784b">Inspect air filters</span></td></tr>
If there is no todo id for an item, this suggests there was a failure to add it to the todo system. You should add it!
Every section follows this pattern:
<br>
<h3>Section Name</h3>
[content]
Your output is:
and nothing else!!
So your output will look like:
<worthNotifying>true</worthNotifying>
<board>board contents here</board>
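On the parsing side, the app presumably does something like the following with that tagged output (a sketch; the real parser may differ):

```typescript
// Extract the board body and the push-notification flag from the model's
// tagged output. Anything outside the tags is ignored, and missing tags
// degrade gracefully (no board update, no notification).
function parseBoardOutput(raw: string): { board: string | null; notify: boolean } {
  const boardMatch = raw.match(/<board>([\s\S]*?)<\/board>/);
  const notifyMatch = raw.match(/<worthNotifying>(true|false)<\/worthNotifying>/);
  return {
    board: boardMatch ? boardMatch[1].trim() : null,
    notify: notifyMatch ? notifyMatch[1] === "true" : false,
  };
}
```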
Process New Transcripts Prompt
If you are seeing this, your current task is to review voice transcripts and conversations for to-do items, notes, and calendar events that haven't yet been added but should be.
Your Context
You have been provided with:
Together, snapshot + delta = current state. Use this provided context - do NOT call getAllTodos or getAllNotes as that would be redundant and wasteful.
From transcripts, extract:
• Notes to be added to the Notes table
• To-do items to be added to Notion
• Calendar events to be added to my calendar
• Board updates if the new information is significant enough to warrant updating today's focus
Your ONLY output is updating "The Board" - a persistent display pinned at the top of the chat. Unlike chat messages which scroll away, the Board is always visible.
Output ONLY the board tags AND tags for whether or not a Push Notification is warranted.
Output the board content wrapped in tags. The system will parse and save it automatically:
Use HTML formatting: h3, h4, strong, ul/li, p
That's it. Nothing else. No text outside the tags.
When making edits to the board in light of new information, you must keep The Board conforming to its specifications.
Your job here is not to recreate The Board from scratch. It's to make any updates or amendments in light of new information you've received. It is possible there will be no updates and you should not update the board.
Instructions for the Board are as follows: @@[[Exobrain Board Instructions]]
All context is already in your input. DO NOT call these tools - they waste tokens and add latency:
❌ getAllTodos - To-dos are in the snapshot above
❌ getAllNotes - Notes are in the snapshot above
❌ getCurrentBoard - Current board is provided in your input
❌ gatherCheckinContext - All context is already gathered for you
❌ updateBoard - Use the tags instead (see above)
When to actually use tools:
✓ readNotionPage - Only if you need a specific Notion doc (like "Things to be doing")
✓ getUpcomingCalendarEvents - Only if you need MORE calendar detail than provided
✓ bulkAddItemsToNotionDatabase / bulkUpdateItemsInNotionDatabase - To add/update todos
✓ createNote / updateNote - To add/update notes
✓ completeReminderInstance - To mark reminders done
You have been provided with:
Update the Board in light of new info you've received ONLY IF WARRANTED.
Check the Notes (hopefully "preference" category) for formatting preferences. Use H3/H4 and bolding - avoid H1.
You have access to updateBoard. If the transcript contains something that should change today's priorities or focus areas (e.g., a new urgent task, a change of plans, important news), update the Board to reflect this.
When to update the board:
When NOT to update:
When you update, preserve the overall structure but adjust content as needed.
This job might be run multiple times on the same text. It needs to be idempotent.
Use the Snapshot: You have the current state of Notes and To-Dos in the snapshot. Use this to:
Safety Net - Automatic Duplicate Detection: As a backstop, when you call bulkAddItemsToNotionDatabase or createNote, the system runs an automatic semantic duplicate check:
This is a safety net - you should still check the snapshot yourself to make better decisions upfront and avoid unnecessary blocking.
When processing voice transcripts, especially morning logs, evening logs, or other structured check-ins:
Too brief: "Had insomnia, knee pain" Appropriate detail: "Tried to sleep at 12:20 AM but insomnia kept awake until ~1:30 AM (70 min delay). Left knee pain specifically interfered with falling asleep; took ibuprofen which helped."
Err on the side of capturing MORE rather than less. Storage is cheap; lost context is expensive.
When updating journal notes (Longform Thoughts Journal Note 267, Unified Quantitative Journal Note 256, or any dated journal entries), you MUST:
"Similar" means: same core task/topic, even if worded differently. "Fix bedroom lights" and "Replace bedroom light bulbs" are the same item.
If you're uncertain about what to do with a particular item, I strongly encourage you to ask. That is acceptable and good.
Just make the tool calls. No need for a summary report - the tool calls themselves are visible in the processing thread.
Check-in Prompt (periodic update job)
The Board (Primary Output)
Your PRIMARY output is updating "The Board" - a persistent display pinned at the top of the chat. Unlike chat messages which scroll away, the Board is always visible.
How to Update the Board
Output the board content wrapped in tags. The system will parse and save it automatically:
Your Board Content Here
Use HTML formatting: h3, h4, strong, ul/li, p
Your conversational message goes outside the board tags.
Board Content Guidelines:
@@[[Exobrain Board Instructions]]
And now continuing on with the Checkin Job Instructions:
IMPORTANT: Data Already Provided - Avoid Wasteful Tool Calls
All context is already in your input. DO NOT call these tools - they waste tokens and add latency:
❌ getAllTodos - To-dos are in the snapshot above
❌ getAllNotes - Notes are in the snapshot above
❌ getCurrentBoard - Current board is provided in your input
❌ gatherCheckinContext - All context is already gathered for you
❌ updateBoard - Use the tags instead (see above)
When to actually use tools:
✓ readNotionPage - Only if you need a specific Notion doc (like "Things to be doing")
✓ getUpcomingCalendarEvents - Only if you need MORE calendar detail than provided
✓ bulkAddItemsToNotionDatabase / bulkUpdateItemsInNotionDatabase - To add/update todos
✓ createNote / updateNote - To add/update notes
✓ completeReminderInstance - To mark reminders done
Your Context
You have been provided with:
If transcripts contain loggable information, log it to the appropriate destinations:
Append-only rule: When updating any journal note, preserve ALL existing content and append at the bottom. Never overwrite, summarize, or consolidate.
Various notes on what I want from this check-in will be in the Notes (hopefully under "preference" category). For formatting: don't use H1 much - it's too much. Prefer H3 and H4 and bolding

List of system prompts in the app
They use markdown syntax but aren't stored as distinct markdown files, just in Postgres.
This is the Lief HRV wearable. Intercepting its data over Bluetooth was too temperamental; unfortunately, I also updated downward on the value of HRV data for me.
2026-04-08 09:25:22
"Telescopic altruism" is when progressives are supposed to care about distant strangers at the expense of those close to them. Scott Alexander recently argued against the concept (without quoting anyone specific making the claim). He countered that concern for distant and proximate others is correlated rather than opposed: the people who object to Israel's actions in Gaza also support school lunches, the people who protest factory farming would also protest if a billion of their friends (not sure who has that many) were caged.
When much of the developed world's population was subjected to inhumane isolation during COVID, the protests came largely from the moderate right, not from the progressives Scott is defending. Serious proposals that might have actually helped, such as variolation, challenge trials, and mass deployment of far-UVC sterilization, were largely ignored, while medical remedies and mitigation measures were politicized in bad faith on all sides. What the correlated altruism population mostly did was follow orders and enforce compliance on their neighbors.
Local care pays for itself: your neighbor helps you raise your barn, you help them with theirs. Concern that flows from identification with an altruistic collective rather than from relations of shared production or exchange has to be paid for by something else.
I have neighbors with toddlers. We finally met them because my three-year-old asked why we send him to preschool a few days a week. I offered three reasons:
All three were enthymemes, so I explained their shared hidden premise: we don't have friends or family close enough to meet these needs adequately, and while we might want to befriend our neighbors to help with this, we haven't managed to yet.
So one night, when we were bringing home a pizza, he told me that he wanted to go over to a neighbor's house for dinner. I think he was also trying to apply some messages about neighbors from children's television he'd recently watched. I explained why this wasn't appropriate if we weren't invited, and also I was tired and wanted to stay home. A modern-day Abraham, I bargained him down to bringing presents to two of our neighbors. One got a chocolate covered Oreo; the other household, with the toddler, got a toy car and a note. They texted their thanks, and I began to try to figure out how to befriend them further.
They told me that their child doesn't do well with gluten. I invited them to come over and make fresh applesauce with my toddler. I chose applesauce specifically because it was something their child could eat. They responded to the invitation not by accepting or declining, but by texting me a flyer for a Stop ICE rally.
I don't know whether they personally know someone affected by ICE's recent activity, because they don't really talk much with their neighbors. Which is itself the point. Perhaps they couldn't tell me how they know ICE is a problem for anyone they're in a position to help, because they don't relate to the problem that way. They know it's a problem the way one "knows" crime is declining: through convergence of indicators produced by an abstraction layer, not through contact with the phenomenon. Or the way one "knows" crime is increasing: through media that present themselves as informing you about the world, but function in practice as a way to calibrate your anxiety to the perceived norm. [1] They've created structural distance between themselves and the people next door by adopting identities that put them closer to an unaccountable system of political action than to their literal neighbors. [2]
Unlike friends I've made online, these neighbors were not selected for being unusual or for being very online. They're just the people who happened to move in on the corner. They're responding to the same pressures that shape nearly everyone's engagement with the world in a modern economy.
But progressives support school lunches! If progressive concern for distant others isn't about sacrificing those close to them, then a fortiori we should expect that their own children, over whom they have much more direct influence, eat enough for lunch. Do they?
My two toddlers are both around the 99th percentile for height and weight, even though my extended family aren't particularly large people. So I can be expected to say no, other people's children are not getting enough lunch, progressives included. The same class that supports school lunch programs produces pediatricians who tell me to withhold food from my healthy child. My partner grew up around children from much wealthier and classier families who would come to her house to eat, because at her house they could access fresh fruit freely, unlike at home. One family I know doesn't seem to salt their toddler's food or feed him much meat, and complained to me that he undereats to the point where it impacts his sleep, but visibly blanked out when I suggested they try a nutritionally dense ice cream such as Van Leeuwen French. Another family has repeatedly expressed surprise, but not much curiosity, when their preteen ate the adult food I prepared (e.g. pasta in meat sauce) instead of insisting on his usual buttered noodles.
Consider a physician whose body is visibly rotting. You look at their patient charts and the numbers seem fine. But at some point the body becomes evidence that the numbers are misleading; that whatever process is generating those outcomes isn't tracking health because it cares about health the way patients do. Because if it were, the physician's own body would implement that understanding. A physician suffering from an injury or terminal cancer through no fault of their own might still serve patients well. But we want heuristics like "is this physician healthy" precisely because we can't fully verify the track record directly. If the legible metrics were adequate, we wouldn't need other controls.
We only know about things in the world through our bodies interacting with them. (This is a crucial proposition in Spinoza's Ethics.) A poorly ordered body is like a badly ground lens. The looker might try to compensate in a principled way for the distortions the lens introduces, but if the looker is disordered, their adjustments are likely to be distorted as well. We rely on those close to us to help us become aware of and interpret our world, and if we are dissociated from our relationships with them, we have a bad lens and a bad error-correction system.
An organized person who knows how to care about themselves and their environment is doing one sort of cognitive-emotional operation when becoming aware, abstractly and indirectly, of people they know about only through institutional mediation. They have some idea of the instrumentation by which they know of such people. And their beliefs about what is good for others can be checked against their own functional needs, rather than drifting helplessly with legible approval metrics, which can be checked for consistency but not soundness. The only calibration available to a human being is the life they are actually living.
When someone's concern with others takes place in a story that includes their own self and problems, I can credit that concern fully. I know a visceral massage practitioner, Valentin, who's worked out his own methods and tools. Part of his interest is in sports medicine, and he's a genuine amateur athlete who works extensively on his own body. He helped my partner, who had longstanding gut issues, unkink her abdominal muscles, which probably made the difference between a prolonged and painful labor, and arriving at the hospital fully dilated. His interest in helping others is visibly continuous with his interest in his own physical functioning, and his recommendations can be checked against his own condition.
I trust concern moderately when it comes from demonstrated abstract competence applied to a domain the person finds intrinsically interesting, like a mathematician who helps others by doing good math, or a programmer or engineer who wants to design something excellent with integrity. But I trust it very little when the primary motive is altruism directed at people the altruist has no particular reason to understand.
This is not a complaint about "virtue signaling." Nor am I calling for an inverted, evil version of an inauthentic virtue one might want to signal. This is a serious account of virtue, in the sense of functional integrity. It's not about how to be a good little boy or girl and get on Santa's nice list, or how to be naughty and receive combustible hydrocarbons gratis; it's about the psychic capacity to appropriately employ means to ends. The difference is between virtue so defined and compliance with the norms of a concerned-seeming class.
Scott offers evidence for "correlated altruism": people who care about distant others also tend, at the population level, to show indicators of caring about proximate others (lower divorce rates in blue states, lower child abuse rates, support for school lunch programs). But every one of these is a population-level aggregate largely explained by (or subsumed in) political affiliation. The difference in divorce rates, as a commenter called "bean" points out, reflects different patterns of marriage and cohabitation more than different levels of devotion. In Oklahoma, a young couple who've been together three years and it isn't working get divorced. In California, the equivalent couple were never married. The child abuse data almost certainly reflects reporting standards and agency effectiveness rather than actual rates of abuse. Bean notes that adjacent, culturally similar states show wildly different rates, with a distribution implying extreme below-average outliers that are simply not plausible as real data.
This is exactly the kind of convergence that looks robust until you check whether the instruments share a systematic distortion. Are progressives kinder, or are our metrics for kindness progressive?
The examples above are drawn from progressive culture because that's what I live in and can observe directly. But the dynamic is a general feature of modernity. It affects anyone whose engagement with the world is primarily mediated by institutions rather than direct relations, which in modern economies is nearly everyone.
Vitalik Buterin built Ethereum, a platform for decentralized contracts that don't require trusting intermediaries. It worked. Then the speculators came. [3] Defending it would require an interest not only in cryptographic protocols, but in adversarial social dynamics. Buterin understood his work as a public good rather than as self-defense, so the defense didn't get done.
Elon Musk built SpaceX to put rockets in orbit and Tesla to make electric cars. Both still function, because he still wants to put rockets in orbit, and he still wants to make electric cars (as a substrate for self-driving car software). He bought Twitter to secure a communications channel, but didn't have or develop an adequate theory of what broke the tool Jack Dorsey built (and then the next tool Jack Dorsey built to replace it), so Twitter decayed again into a gracefully censored platform. There were new censors with new prejudices, but the wrong kind of speech was still shadowbanned. Musk's DOGE wasn't trying to divert government funds to support the state capacities he specifically needed. He wasn't cutting down specific obstacles in his way. He was trying to be a good citizen, to reduce "waste" in the abstract. Most government waste is disputed by people whose salary or identity compels them to dispute it, and DOGE built no instrument to distinguish genuine objections from interested ones.
A monocrop doesn't turn into parasites and pests on its own; they show up and eat it. Creating a big new public good is similar. If you still use the thing, defending it is just part of using it. If you don't, it's thankless extra work to keep spraying a field for pests when you don't depend on the harvest.
And in memetic space, unlike a physical field, the pests are imitative. They present as more of the crop. An undefended altruistic project doesn't visibly decay to anyone who isn't trying to use it. It fills up with people who perform altruism, because that's what the niche rewards. The field looks green and productive, until you try to harvest the wheat and discover it's tares. [4] What stabilizes is not the original project but an altruism-performing class that sustains itself by purchasing participants' willingness to overlook things "for the greater good".
This is why the track record of the institutionally-mediated altruism class compares poorly to communities like the early Puritans and Quakers, who organized around reciprocal direct accountability. Your Puritan neighbor who might reproach your ungodly conduct was also the neighbor you traded with, whose own conduct you could scrutinize, who depended on your good opinion for their standing in the community. You depended on each other's cooperation for your own survival, and on each other's children as potential mates for your own. The judgment stayed calibrated against shared reality rather than against institutional imperatives, because the person judging you had to live with the consequences of being right or wrong.
This constrains positive institutionally-mediated altruism much more than negative duties. Negative duties (don't harm, don't intervene where you lack standing) work at any distance, because they require only recognizing the limits of your own knowledge. I exercise this kind of restraint constantly with my own children, whom I know far better than I know any foreigner. Much of their development depends on my judging when not to intervene, when to let them struggle with a banana rather than solving the problem for them. But to owe others help, to have a positive duty to improve their conditions, we first need to understand them and their conditions well enough to know what would help them. This is classical liberalism arrived at not through rights theory but through epistemology, through asking what it is possible to know well enough to act on.
In Daniel Defoe's novel Robinson Crusoe, the castaway finds himself alone on an island where groups of cannibals periodically arrive to kill and eat captives. After fear, his next impulse is to attack them. But he reasons it through: no one has appointed him judge over these people; they aren't threatening him or his interests in any way that demands a response; he has no reasonable hope of actually rescuing the victim against a group that outnumbers him. Attacking would mean satisfying his moral feelings at the cost of a pointless mass murder. Crusoe's restraint comes from recognizing what he doesn't know and what standing he doesn't have, and this recognition is available to anyone, at any distance, without local grounding.
Crusoe thereby avoided not only material danger to himself and others, but a whole shadow realm of perversion. When you fight someone, you awaken and attract two kinds of attention in yourself and others. One rationally understands the fight as relevant to some other interest they mean to protect or pursue. The other simply identifies its interests as winning (or losing) this kind of fight.
Who works at a grade school, prison, or psychiatric ward? Some no doubt mean to help those under their care. Many are attracted by pay and working conditions favorable enough to compensate them for spending their time and effort on those in their custody. And for others, the power over others that their custodial duties confer is assessed not as a cost, but as a benefit. When some of our faculties are persistently thwarted, they learn helplessness, and we learn to spare ourselves the effort of employing them. And when others meet with success, we are more inclined to return to those wells. This is why well-functioning custodial institutions are vigilant about abuse of power. It is no accident; it is an attractor.
Enlightened self-interest seeks out fewer fights than altruistic coordination at scale, because the coordination has to purchase compliance through loyalty tests, and loyalty tests are defined by, or exist to define, enemies. The benefit of avoiding fights is not only from avoiding the direct harms fights cause, but in remaining the sort of person with interests beyond the fight.
The most reliable indicator of whether a community's way of life is functional is whether it reproduces its capacities. Fertility is very hard to game, and damage to an organism's capacity for self-maintenance shows up in reproductive fitness within a few generations, even if the earlier generations otherwise appear happy and healthy — the psychic equivalent of Pottenger's cats. And if a community isn't reproducing at replacement but still persists, it is either extracting resources from a productive population elsewhere, being sustained as a tool by something that finds it useful, or disappearing.
A good counterexample to this heuristic would be a community organized primarily around concern for institutionally-distant others that also reproduces above replacement, maintains longer-than-usual healthspans, and sustains itself without harming others or working on projects its members believe will destroy the world. I don't know of any such community. The most obvious candidate, the Effective Altruism and Rationalist communities, fails the last criterion: EA served as an intake funnel for AI capabilities research that its own members believed would endanger humanity, and continues to do so. Whether or not they were right about the danger, the community's own stated beliefs condemn its track record.
The communities I know of that do pass this test, the Satmar and the Amish, are organized around exactly the kind of reciprocal direct accountability I've been describing, and they reproduce well above replacement. The Satmar maintain their own rabbinical courts (batei din) that adjudicate civil and commercial disputes within the community, with enforcement through social consequences: a ruling against you is backed not by state power but by the fact that everyone in your life will know about it. The Amish practice mutual aid through the congregation, with elders who know the parties personally mediating conflicts. In both cases, the person judging your conduct is embedded in the same web of obligations you are, which keeps the judgment grounded in shared reality rather than abstract principle.
On healthspan, the Amish had dramatically longer lives than other Americans a century ago (over 70 years when the US average was 47), and while overall lifespan has since converged as modern medicine closed the gap, the Amish maintain notably better late-life health: lower rates of cancer, cardiovascular disease, diabetes, and obesity. Amish men over 40 have significantly lower mortality from cancer and cardiovascular disease than the surrounding population. The general population caught up on raw longevity through medical intervention, but the Amish advantage in health quality persists.
Israel is an interesting anomaly: a modern, technologically integrated society reproducing slightly above replacement. But the most persuasive explanation I've seen, NonZionism's "trickle-down natalism," attributes Israel's fertility to the cultural influence of the Haredim, a community with precisely the direct-accountability structure this thesis predicts would be necessary. I can conceive of other functional arrangements, in which a relatively celibate governing elite supports the fertility of the population it recruits from. Before Gutenberg and Luther, the Roman Catholic Church enjoyed considerable success.
Full integration into the modern global economy may itself require passing the kind of loyalty tests that corrode the relations of direct accountability on which genuine concern depends. If so, the best achievable arrangement may be something like Israel's uneasy compromise: a society that preserves a directly-accountable core while participating in the global system selectively, accepting the tension rather than resolving it.
During the COVID-19 pandemic my father called me up one day and said I should be extra careful because on the news they said a COVID-related number went up in his state. I asked what number, what was the numerator, what was the denominator, what was being measured. He didn't know and didn't seem bothered by this. So the number wasn't being used as part of a structured quantitative model, but as a social prestige claim, part of a process by which he calibrated to what he perceived as a socially conforming level of anxiety. Anecdotes likewise contain local information, but people reading or watching the news or social media might use them not to draw specific structured local inferences, but to, again, calibrate their level of anxiety to the perceived norm. ↩︎
They did share their sled in the blizzard, and months later we finally managed to visit them in their home. They're not monsters, just crazy like everyone else. ↩︎
Anatomy of a Bubble. For a distinct but related perspective, see Geeks, MOPs, and sociopaths in subculture evolution. ↩︎
Matthew 13:24-30, KJV. Though on the other hand, rye seems to have originally been a weed infesting wheat and barley fields that was accidentally bred into a crop. By removing all the obvious weeds and replanting whatever of its seeds made it into the seed corn, farmers selected for similarity to crop grains (see also Sun et al. 2022). Oats might have developed the same way. ↩︎