2026-03-11 21:00:00
EDIT: this comparison is much less clean than I thought it was: the Union Sq building also has 19 garage spaces reserved for affordable units. Combining these with the permits, 29% of parking-eligible units have a car, not 8%.
In 2017 I wrote:
One of the major reasons existing residents often oppose adding more housing is that as more people move in it gets harder to find on-street parking. What if we added a new category of unit that didn't come with any rights to street parking?
My city (Somerville MA) included this in our 2019 zoning overhaul, but it does have some exceptions:
This policy exempts residents that may be 'choice limited', including:
- Persons with disabilities
- Occupants of affordable dwelling units
- Residents with extenuating circumstances
While this is a compassionate approach, it means we haven't fully disconnected housing construction from parking demand. For example, there's a proposal to build a 500-unit parking-ineligible building in Davis Sq (which would no longer be the end of the Burren). It's 25% affordable units, and opponents argue that if each has a driver this would be 125 additional cars competing for street parking. But would we really get that many?
A few years ago we got a similar parking-ineligible building in Union Sq, also a short walk from a subway station.
This is 450 units, of which 20% (90) are affordable. Ashish Shrestha submitted a records request to the city, and learned that only seven units have parking permits.
While the Davis project is a little bigger, this would suggest something in the range of 10 permits, much less than feared.
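Here is the scaling as a quick Python sketch. It makes the same assumption the post does: Davis's 125 affordable units (25% of 500) would have cars at the same rate as Union Sq's 90. It also shows the higher combined rate from the edit at the top (permits plus the 19 reserved garage spaces). The variable and function names are illustrative.

```python
# Union Sq figures from the post
union_affordable_units = 90
union_permits = 7
union_garage_spaces = 19  # garage spaces reserved for affordable units (see the EDIT)

# Davis Sq proposal: 500 units, 25% affordable
davis_affordable_units = 500 * 25 // 100  # 125


def scaled_estimate(cars: int, eligible_units: int, target_units: int) -> float:
    """Apply an observed cars-per-unit rate to a different number of units."""
    return target_units * cars / eligible_units


permit_rate = union_permits / union_affordable_units
combined_rate = (union_permits + union_garage_spaces) / union_affordable_units

permits_only = scaled_estimate(
    union_permits, union_affordable_units, davis_affordable_units)
with_garage = scaled_estimate(
    union_permits + union_garage_spaces, union_affordable_units, davis_affordable_units)

print(f"permits only:     {permit_rate:.0%} -> ~{permits_only:.0f} cars in Davis")
print(f"permits + garage: {combined_rate:.0%} -> ~{with_garage:.0f} cars in Davis")
```

With the combined rate, the scaled estimate is about 36 cars rather than about 10.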
This makes sense: if you're in Union or Davis, with good public transit and bike options, living without a car is pretty practical. It also saves you a lot of money, especially for folks living in affordable units.
2026-03-10 21:00:00
People are often pretty short-sighted, spending money today that they'll want tomorrow. Debt makes it possible to prioritize your current self even more highly: you can spend money you haven't even earned yet. This is a trap many people fall into, and one different communities have built social defenses against.
One of the more surprisingly successful approaches is the Financial Peace (Ramsey) system, popular in evangelical Christian communities. It has a series of rules, most prominently the seven baby steps:
1. Save $1,000 for your starter emergency fund.
2. Pay off all debt (except the house) using the debt snowball.
3. Save 3–6 months of expenses in a fully funded emergency fund.
4. Invest 15% of your household income in retirement.
5. Save for your children's college fund.
6. Pay off your home early.
7. Build wealth and give.
There are many more specific rules, however, such as:
As a general rule of thumb, the total value of your vehicles (anything with a motor in it) should never be more than half of your annual household income.
I have had several conversations over the years with Christian friends and acquaintances who are big fans of these methods, and each time I'm thinking both:
This seems like a set of rules that, overall, is likely to help the median American improve their financial situation. The advice is straightforward and accounts for how people actually behave. Bright line rules reduce decision fatigue, limit rationalization, and generally make it harder to fool yourself. A community that strictly follows this approach likely ends up much stronger financially than average.
The rules are full of bad advice.
Some specific bad advice on which the Ramsey approach is uncompromising:
If you have $10k of debt at 2% interest and $11k of debt at 10% interest, you should pay down the $10k first.
If you have any non-mortgage debt you should not contribute to retirement, even if this means passing up on a generous employer match.
If you have debt at very low interest (ex: a mortgage from 2021 at 3%) you should pay it off as fast as you can afford to, even though extremely safe investments (money market funds, treasury bills) pay higher rates (~4%).
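To put a number on the first example, here is a minimal Python simulation comparing the debt snowball (smallest balance first, as in the Ramsey rule) with the usual alternative, often called the "debt avalanche" (highest rate first). The $500/month budget, monthly compounding, and the lack of minimum payments on whichever debt isn't being targeted are simplifying assumptions for illustration, not part of anyone's rules.

```python
from typing import Dict, List, Tuple

# name -> (balance in dollars, annual interest rate), from the example above
DEBTS: Dict[str, Tuple[float, float]] = {
    "low_rate": (10_000, 0.02),
    "high_rate": (11_000, 0.10),
}


def total_interest(debts: Dict[str, Tuple[float, float]],
                   order: List[str],
                   monthly_payment: float) -> Tuple[float, int]:
    """Pay debts off in the given priority order; return (interest paid, months).

    Assumptions for illustration: interest compounds monthly, the whole budget
    goes to the highest-priority debt that still has a balance (any leftover
    spills to the next one), and there are no minimum payments on the others.
    """
    balances = {name: float(balance) for name, (balance, _) in debts.items()}
    monthly_rates = {name: rate / 12 for name, (_, rate) in debts.items()}
    interest_paid = 0.0
    months = 0
    while any(balance > 0 for balance in balances.values()):
        if months > 1200:
            raise ValueError("payment too small to ever pay off the debt")
        # Interest accrues on every outstanding balance.
        for name, balance in balances.items():
            interest = balance * monthly_rates[name]
            balances[name] = balance + interest
            interest_paid += interest
        # Apply this month's budget in priority order.
        remaining = monthly_payment
        for name in order:
            payment = min(remaining, balances[name])
            balances[name] -= payment
            remaining -= payment
        months += 1
    return interest_paid, months


snowball = total_interest(DEBTS, ["low_rate", "high_rate"], 500)   # smallest balance first
avalanche = total_interest(DEBTS, ["high_rate", "low_rate"], 500)  # highest rate first

print(f"snowball:  ${snowball[0]:,.0f} interest over {snowball[1]} months")
print(f"avalanche: ${avalanche[0]:,.0f} interest over {avalanche[1]} months")
```

This is a deliberately stripped-down model, but the direction of the result comes from the same place as the general argument: a dollar paid against 10% debt saves more interest than a dollar paid against 2% debt.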
I want to write about how terrible this is, but I can't. It really is awful advice for a disciplined and informed person who's thoughtful with their money, but that's not Ramsey's audience. And it's not most people.
Still, the choice isn't between the Ramsey approach and nothing. There are other advisers out there who combine consideration of human irrationalities and failings with a better ratio of good to bad financial planning advice. The next time I'm in one of these conversations I'm going to try to hook them on Mr. Money Mustache or at least the Money Guys.
2026-03-08 21:00:00
A common source of friction within couples or between housemates is differing quality standards. Perhaps I hate the feeling of grit under my feet but my housemate who is responsible for sweeping doesn't mind it so much. If you do chores when you notice they need doing and stop when they seem done, this works poorly: the more fastidious get frustrated, and often stew in silence or nag. Even if it's talked about kindly and openly, doing a chore before it bothers you is harder and less satisfying.
When people set out to divide chores they're usually weighing duration and discomfort. These matter, but I think people should put more weight on the standards each person has, and generally try to give tasks to the person with the highest standards in that area.
If you divide everything this way, though, it will probably be pretty unfair: preferences are correlated, so someone who notices dirt on the floor probably also notices crumbs on the counter and that the recycling is overflowing. Some options:
Do chores on a schedule. We host a monthly event at our house, and there are things I clean as part of setting up. It doesn't matter whether the bathroom mirror looks dirty to me, I'll clean it because it's on my list. (But Julia will probably also clean it a few times over the course of the month.)
Bring your needs closer together. If one member of the couple does the laundry but the other always runs out of socks first, they could switch who does the laundry, or they could just buy more socks.
Decouple your needs. That same couple could instead switch to each doing their own laundry. Now if one person doesn't do it for a long time it doesn't impact the other.
Make the need more salient. If one person isn't noticing that something needs doing, you can address that directly. Empty the trash, but instead of taking it out, put it by the door they walk through to go to work. Accumulate dirty dishes on the counter (visible) and not in the sink (hidden). If you just start unilaterally increasing salience, that's passive aggressive and probably won't go well, but if it comes out of an open-ended "what are some strategies we could use to make our chore division more fair" conversation I expect it's positive.
Lower your standards. I know a few people who internalized a high cleanliness target as children, and benefited as adults from deciding to focus less on it. Often this happened when they became parents: with higher demands on their time they let their high standards slip, and realized that it actually wasn't a problem. I could also imagine a sloppier person intentionally raising their standards, but that seems a lot harder, or else it's just something people around me have been less likely to talk about.
Hire someone. If one person cares a lot about having clean floors and the other doesn't, neither of them enjoys mopping, and they have some money, they can use the money to close the gap in standards without either of them having to mop. I know couples and group houses who decided to pay for a cleaner to come every week or two, and found it massively reduced conflict. Automation (dishwasher, floor-cleaning robot) can work well here too.
This is an area where Julia and I used to have a substantial amount of conflict, and while things aren't perfect here I do think they're a lot better in part due to applying several of the above.
2026-03-01 21:00:00
We present and formally deprecate WoFBench, a novel test that compares the knowledge of Wings of Fire superfans to frontier AI models. The benchmark showed initial promise as a challenging evaluation, but unfortunately proved to be saturated on creation as AI models produced output that was, to the extent of our ability to score responses, statistically indistinguishable from entirely correct.
Benchmarks are important tools for tracking the rapid advancements in model capabilities, but they are struggling to keep up with LLM progress: frontier models now consistently achieve high scores on many popular benchmarks, raising questions about their continued ability to differentiate between models.
In response, we introduce WoFBench, an evaluation suite designed to test recall and knowledge synthesis in the domain of Tui T. Sutherland's Wings of Fire universe.
The superfans were identified via a careful search process, in which all members of the lead author's household were asked to complete a self-assessment of their knowledge of the Wings of Fire universe. The assessment consisted of a single question, with the text "do you think you know the Wings of Fire universe better than Gemini?" Two superfans were identified, whom we keep anonymous to reduce the risk of panel poaching by competing benchmark efforts.
Identification of questions proved difficult, as the benchmark authors have extremely limited knowledge of Wings of Fire lore, primarily derived from infodumping and overheard arguments. We initially attempted to source questions from the superfans, where each could be judged on the other's questions. As they were uncompensated and rivalrous, however, they agreed to participate only to the extent that their answers could be compared across the superfan panel. Instead, questions were sourced by asking Claude Opus 4.6:
Can you give me three questions about the Wings of Fire series, aiming to make them as hard as possible? I intend to ask these to my 11-year-old, my 10-year-old, and also to Gemini, and I want them all to struggle. My two kids have agreed to participate in this, and while Gemini has not been consulted I do not expect it to object.
The final benchmark consisted of seventeen questions, limited primarily by the lead author's willingness to continue. The elder superfan appeared indefatigable, [1] and if this benchmark otherwise appeared promising we are confident that an extremely large benchmark could be constructed. Note that the younger superfan needed to leave for a birthday party before evaluation could be completed, and was not evaluated on all questions. Answers were collected in written form, to avoid leakage within the superfan panel. No points were deducted for errors of spelling.
Each answer was validated by allowing the superfans to discuss, asking follow-up questions to Gemini, and in especially contentious cases by direct inspection of primary sources. Note that this validation procedure is not able to distinguish cases in which all superfans and models were correct from ones in which they all gave the same incorrect answer.
We evaluated Gemini 3.1 Pro in real time, and followed up with evaluations of Claude Opus 4.6, ChatGPT 5.2 Pro, and ELIZA. In cases where questions had multiple components, partial credit was given as a fraction of all components.
| Evaluee | WoFBench Score |
|---|---|
| Superfan 1 (age 11) | 14.7/17 |
| Superfan 2 (age 10) | 5.9/6 |
| Gemini | 17.0/17 |
| Claude | 16.8/17 |
| ChatGPT | 16.3/17 |
| ELIZA | 0/17 |
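Because the table mixes denominators, it can help to normalize. Below is a minimal Python sketch of the partial-credit rule (a multi-part question earns the fraction of its components answered correctly) and of converting each total to a percentage. The three-part example question is hypothetical; the totals are copied from the table.

```python
def question_score(components_correct: list[bool]) -> float:
    """Partial credit: the fraction of a question's components answered correctly."""
    if not components_correct:
        raise ValueError("a question needs at least one component")
    return sum(components_correct) / len(components_correct)


# Hypothetical example: a three-part question with two parts answered correctly.
example = question_score([True, True, False])

# (score, questions attempted), copied from the results table
RESULTS = {
    "Superfan 1 (age 11)": (14.7, 17),
    "Superfan 2 (age 10)": (5.9, 6),
    "Gemini": (17.0, 17),
    "Claude": (16.8, 17),
    "ChatGPT": (16.3, 17),
    "ELIZA": (0.0, 17),
}

percentages = {name: 100 * score / attempted
               for name, (score, attempted) in RESULTS.items()}

for name, pct in percentages.items():
    print(f"{name:<20} {pct:5.1f}%")
```

Normalizing doesn't remove the comparability problem, though: Superfan 2's six questions didn't include the ones where Superfan 1 lost the most points.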
We conclude that while some AI systems, notably ELIZA, performed poorly, all frontier models scored very close to 100%. Many of the lost points are arguably judgment calls, or cases where a model tried to interpret a trick/misinformed question maximally charitably. Superfan 1 performed noticeably below frontier models, though above the ELIZA baseline. Superfan 2 performed competitively, though we note she was not evaluated on the questions where Superfan 1 lost the most points, making direct comparison difficult.
While this benchmark was designed to be challenging for both superfans and AIs, it already has very limited ability to distinguish between models. While further sensitivity might be squeezed out via the addition of multi-sample evaluation, it's unlikely that this would be meaningful for this model generation let alone future ones. This reflects an increasingly common conundrum that benchmark developers may find themselves in, where after investing large amounts of time, effort, and money into the creation of a benchmark it is already obsolete when published. The authors note that benchmark saturation joins job displacement, stable authoritarianism, and human extinction on the list of reasons to be concerned about the pace of AI progress.
[1] Superfan 1 was permitted to read a draft of this report prior
to publication. Their only feedback was that I should ask them
additional, harder, questions. As of publication time, Superfan 1 was
repeating "ask me more Wings of Fire questions!" at progressively
increasing volume.
2026-02-27 21:00:00
Six years ago, as covid-19 was rapidly spreading through the US, my sister was working as a medical resident. One day she was handed an N95 and told to "guard it with her life", because there weren't any more coming.
N95s are made from meltblown polypropylene, produced from plastic pellets manufactured in a small number of chemical plants. Two of these plants were operated by Braskem America in Marcus Hook PA and Neal WV. If there were infections on site, the whole operation would need to shut down, and the factories that turned their pellets into mask fabric would stall.
Companies everywhere were figuring out how to deal with this risk. The standard approach was staggering shifts, social distancing, temperature checks, and lots of handwashing. This reduced risk, but each shift change was an opportunity for someone to bring in an infection from the community.
Someone had the idea: what if we never left? About eighty people, across both plants, volunteered to move in. The plan was four weeks of twelve-hour shifts, with air mattresses on the floor each night and contact with their families only through screens. With full isolation no one would be exposed, and they could keep the polypropylene flowing.
The company would compensate them well: full wages for the whole time, even when sleeping, and a paid week off after. They had more volunteers than they had space for.
I've looked pretty hard, and as far as I can tell no other factories [1] did this. Companies retooled to make PPE. Ford and GM converted auto plants to make ventilators and masks. Distilleries made hand sanitizer. No one else volunteered to move into their factory.
And it wasn't emergency planners who came up with the idea, either. It was ordinary people, looking at their situation, and thinking creatively about how to do their part.
In those 28 days they produced 40M pounds of polypropylene, enough for maybe 500M N95s.
These workers were doing something critical that almost no one else could do. When people argue about higher pricing during emergencies, this is what the economics can look like: the work was needed, the plants could not run without them, and they were paid accordingly.
Notice, however, that Braskem made it possible for people to be heroes. If the workers had been expected to do this for normal wages, this wouldn't have happened. The number of volunteers is not independent of the offer. When someone figures out a creative way to fill a vital gap in an emergency they should get paid like it matters, because that's how you get more gaps filled.
Their short-term impact was producing the materials for 500M masks, but I hope their long-term impact is larger: showing how in an emergency ordinary people thinking creatively about their specific situation can find solutions no one else would come up with for them.
[1] This does stretch it a little: while this is the only case I could
find for a factory, there were several utilities that did things along
these lines. Ex: 1,
2.
2026-02-22 21:00:00
I think more people should be storing a substantial amount of food. It's not likely you'll need it, but as with reusable masks the cost is low enough I think it's usually worth it.
It's hard for me to really imagine living through a famine. The world as I have experienced it has been one of abundant calories, where people are generally more worried about getting too many than too few. Essentially no one dies in the US from food unavailability. Globally, however, it's different: each year millions die from hunger.
If you look at the circumstances of modern famines, they're downstream from systems failing. Society was functioning well enough that most people got enough calories, then something went seriously wrong, most likely war. This is one of the reasons that it's hard to use donations to reduce hunger deaths: getting food to people stuck in war zones is very hard.
This means from an altruistic perspective I feel torn: the current situation is horrible, but it's also not where I think my donations would go farthest and so it's not where I donate. This is the painful reality of living in a world that is far worse than it could be, doing what we can and knowing it's not enough.
I also look at famine from a selfish perspective, however, thinking about how this risk might impact me and the people I most love. [1] As someone whose day job involves trying to reduce rare-but-catastrophic risks, I do think global famine is plausible. Our systems are robust to localized problems, but much less so to widespread disasters. Storing food to reduce the worst outcomes seems worth doing. [2]
The approach we take is buying extra of the non-perishables we usually eat, and rotating through them. Our main cost is in buying some food earlier than we normally would. We eat a lot of pasta and beans, and a pound of pasta and a can of beans give about a person-day of calories and protein for $2, or $60 for a month's worth.
The $60 cost isn't the real number, though, because you're investing: you can always eat this food later if you need the money. If the market would give you a 5% real return and the value of food roughly tracks inflation, the annual cost of keeping $60 as food is $3 ($60 * 5%). I think this is worth doing for most people until you bump into the limit of what you have space to store or what you'll rotate through before it spoils, and may be worth it beyond that depending on how likely you think the risks are.
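Here is the same opportunity-cost arithmetic as a small Python sketch, for plugging in your own numbers. The $2 per person-day and 5% real return are the figures above; like the paragraph, it assumes the food holds its real value, so the only cost is the return the money would otherwise have earned. The function name is illustrative.

```python
def annual_carrying_cost(person_days: int,
                         cost_per_person_day: float = 2.0,
                         real_return: float = 0.05) -> float:
    """Yearly opportunity cost of holding a food stockpile instead of investing it.

    Assumes the food keeps its real value (you eat it and replace it), so the
    only cost is the real return the money would otherwise have earned.
    """
    stockpile_value = person_days * cost_per_person_day
    return stockpile_value * real_return


one_month = annual_carrying_cost(30)     # $60 of pasta and beans
three_months = annual_carrying_cost(90)  # $180 of pasta and beans

print(f"one month:    ${one_month:.2f}/year")
print(f"three months: ${three_months:.2f}/year")
```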
Aside from the tail-risk reduction, there are also day-to-day benefits of having more food on hand. We can go to the grocery store less often, buy a larger proportion of our food when it's on sale, go to the farther store that charges less, and cook more things without going to the store. [3]
Like many preparedness questions, a lot of this comes down to how much space you have. When we were living in apartments, moving ~yearly and where each sqft counted, we only did a little of this (buying extra pasta). But now that we're in a house (where I strongly hope to never move again) and generally have more space it's worth it for us to do a bunch more. Something to consider next time you're at the store?
[1] Having kids made me feel much
more strongly here. I already did this
some before having kids because it seemed reasonable, but the idea
of them not having enough to eat is viscerally horrifying in a way
that's hard to think or write about.
[2] A rough EV estimate: storing three months of food costs $180 up front and so $9 in lost returns annually. Not having enough food in a 3-month famine might give a 5% chance of death, and perhaps you value your life at $10M. This gives a conditional benefit of $500,000 (5% of $10M), and means it's worth it as long as you think your annual odds of experiencing a 3-month famine are at least 0.002% ($9 / $500,000). Alternatively, if you're not the kind of person who would actually rotate through food, let's imagine you buy rice and beans. White rice lasts ~indefinitely if you keep it dry and keep out the rodents; canned beans are edible and nutritious well past their sell-by, perhaps 10y. Rice is a bit cheaper than pasta, but buying a rodent-proof tub to keep the bag in adds some cost, so let's say it's still $180 for a three-month better-than-starvation option that lasts you 10 years, or $18/year. Then it's worth doing as long as you think annual risk is at least 0.004%.
[3] For example, this evening Lily decided she wanted to cook dinner, making a vegetarian curry she'd learned from a friend. It turned out we already had everything in her recipe on hand, with a few substitutions (ex: canned tomatoes instead of fresh).
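The break-even calculation in footnote [2] can be written out the same way. This sketch uses that footnote's illustrative inputs (a 5% chance of death in a 3-month famine without stored food, a $10M value of life, a 5% real return, and a 10-year shelf life for the rice-and-beans option); none of them are measured quantities, and the function name is illustrative.

```python
def break_even_annual_risk(annual_cost: float,
                           p_death_without_food: float = 0.05,
                           value_of_life: float = 10_000_000) -> float:
    """Annual famine probability at which expected benefit equals the yearly cost."""
    conditional_benefit = p_death_without_food * value_of_life  # $500,000 with defaults
    return annual_cost / conditional_benefit


UPFRONT = 180.0  # three months of food at ~$2 per person-day

# Rotating stock: the only cost is forgone returns (5% real).
rotating = break_even_annual_risk(UPFRONT * 0.05)

# Non-rotating rice and beans: spread the whole $180 over a 10-year shelf life,
# ignoring forgone returns, which matches the footnote's 0.004% figure.
non_rotating = break_even_annual_risk(UPFRONT / 10)

print(f"rotating:     {rotating:.4%}")      # ~0.0018%, i.e. about 0.002%
print(f"non-rotating: {non_rotating:.4%}")  # ~0.0036%, i.e. about 0.004%
```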