
Bear Blog Trending Posts

Ranked according to the following algorithm: Score = log10(U) + (S / D * 8600), where U is upvotes and S/D is a time factor.


Try more, think less

2025-12-12 01:03:44

If you dropped by my blog yesterday, or subscribe via RSS, you might have seen a post appear and then quietly disappear.

It was a post about an idea I had for creating a Bear theme “live”, documenting every part of the process. It lasted about three hours, then I realized it was a bit too much work for me to handle.

Hasty idea? Should I have thought it through a little more?

No, I don’t think so.

I did think about it, and I decided to give it a try. I figured it was the only way to really know if it would fly.

Otherwise, I would’ve spent endless hours just thinking about it, never really knowing how it would feel in real life. Instead of three hours, it might have turned into three days down the drain.

If you have an idea, don’t waste your time trying to perfect it. Try it out, within reasonable limits. See how it feels and how others respond... if it even survives longer than three hours.

You might end up with something you would never have discovered by only thinking about it. Go live and see what comes alive.

Good luck!

De-shittification

2025-12-11 13:51:00

The other day my wife was looking up haircuts for our two-year-old; she walked up to me, showed me her phone, and asked, "Is this an AI child?" As she kept scrolling there seemed to be no end to these phantom children with impossibly smooth skin and perfectly tousled hair. "Is it AI" is the new "is it cake", the cake being orders of magnitude less harmful. The fact that my wife is searching for two-year-old boys' haircuts and can't find a damn actual photo of a real child's haircut is fucking infuriating. The general availability and accessibility of generative AI for content generation has hastened enshittification.

For those of you who have not heard of this term, "enshittification", this is the definition per Wikipedia:

Enshittification, also known as crapification and platform decay, is a pattern in which two-sided online products and services decline in quality over time. Initially, vendors create high-quality offerings to attract users, then they degrade those offerings to better serve business customers, and finally degrade their services to users and business customers to maximize short-term profits for shareholders.

I first started using the internet in the early 90s, when a simple website was a form of personal expression. The world of BBSs, E-Zines, and personal sites is not gone, it just can't compete with the algorithm. You want to find the "best robot vacuum 2025"? You're going to get SEO-optimized AI slop from a site like "bestvacuumreviews2025[.]xyz" written by no one, for no one.

So what is one to do? You could become an off-grid mountain hermit, or you can carve out a bit of the internet for yourself. I've been calling it "de-shittification". I am starting to realize you can have a better online experience; it just takes a little intentional setup. This is what has been working for me.

RSS (Really Simple Syndication)

This is a technology that has been around forever; essentially, it's a way for you to subscribe to feeds from different news sites and blogs. Before RSS, I was using Google News and hating it. Now I pull all of my threat intel blogs, investigative journalism, and other information into my own feed.

The reason that using an RSS reader is so important in this process is that you can collect sources of information that you find credible, useful, or just enjoyable. This way you can keep track of all the sites that you find along your journey that aren't roided out on ads or affiliate links.
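As a rough sketch of what an aggregator does under the hood, here is a minimal Python example using the feedparser library; the feed URLs below are placeholders, so substitute the blogs and sites you actually follow:

```python
import feedparser  # pip install feedparser

# Placeholder feed URLs; swap in the blogs and news sites you trust.
FEEDS = [
    "https://example-blog.com/feed.xml",
    "https://another-site.org/rss",
]

for url in FEEDS:
    feed = feedparser.parse(url)
    print(f"\n{feed.feed.get('title', url)}")
    for entry in feed.entries[:5]:  # the five most recent posts per feed
        print(f"  - {entry.title}\n    {entry.link}")
```

A hosted or self-hosted reader does exactly this on a schedule, plus read/unread tracking and a nicer interface.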

I personally self-host Miniflux, which is a pretty simple, barebones RSS feed aggregator with a robust API, which is perfect for my needs. But if I were to suggest a hosted aggregator, I would go with Inoreader; it has a pretty decent suite of features for a reasonable price.

For more on RSS check out this blog: https://blog.remainna.xyz/you-should-use-rss/

Search

I assume you have probably "googled" something in the last couple of years, and you have probably noticed that search results have degraded immensely. It happened so gradually that you may have just let it pass by.

I had pretty much assumed that we were stuck with this and now needed AI to filter the results for us. While AI searches are great, why can't search just be better? Well, it can. Free options like DuckDuckGo and Brave are definitely alternative search engines, but I wouldn't necessarily say they are better.

Though I haven't tested and benchmarked every search engine out there, I have been trying out Kagi. The experience is refreshing... you know, searching for something and finding it without having to dodge a bunch of paid results and trash content. One thing to note here is that Kagi is a paid search engine, which I think is perfectly fine based on my experience thus far. Some of my favorite features are the ability to block, down-rank, or up-rank domains in your search results and the ability to filter AI-generated images out of image search. For more details on the features, check out their docs.

Social Media

This is something that depends a lot on the person. Obviously social media is how some people make their living, while others use it to keep in touch with friends and family, and some use scrolling their feed as a way to decompress from a long day. However you use it, I think most people share the common experience of having wasted enormous amounts of time on it. You click on one video, then scroll, look up, and all of a sudden you've lost time.

There are plenty of people who give it up completely, and good for them. But there are plenty of us that still would like to use it, in a healthier way. The way that I handle this is by "containerization", and what I mean by that is introducing friction to accessing the social media that I want to limit my exposure to.

There are a multitude of apps that allow you to set timers and lock you out of apps based on your usage settings. For me, I put all of my social media into a separate user profile on my phone. There may be other Android OS variants with this ability, but I use GrapheneOS, a security- and privacy-focused Android OS. The profile that holds all of my social media is essentially turned off, so no notifications or anything else get through to my main profile. When I want to look at any of my socials, I have to leave my main profile, put in a passcode, and wait for it to load. I have found that this friction is enough to significantly reduce my usage.

Where to find content

At first I had trouble finding alternatives to what I would normally consume, but over time you start to gather sites that you like. A lot of the time it's been sitting right in front of you: someone posts an article on Bluesky, Reddit, or Hacker News, and if you like the content, you just add the blog or site to your RSS aggregator.

Other than organically finding content, you can always follow Kagi's Small Web feed or Bear Blog's discovery feed (the blogging platform where this site is hosted).

Wrapping it up

There is so much more around this subject that I would like to share, but I must wrap up for fear that I will never publish my first post. That being said, I will definitely have more posts about my attempts at de-shittification.


Vibe Coding is Mad Depressing

2025-12-11 11:28:14

I’ve been in the mobile development industry for almost 15 years, and this AI/LLM era might be the worst.

My work is mostly freelance: gigs, hourly, milestones, and I could say 90% of my experience is greenfield projects. I don’t have any apps of my own; I make a living coding apps for others.

Before AI

Back in the day, during a client kickoff they would usually hand me a document with a UI prototype and a list of features. Then you start from scratch: File - New Project, git init that shit, and you’re on your way.

Everything was calm; clients just wanted weekly or monthly feedback because they knew how hard mobile development is, you know. No pressure. You could focus on the great work: clean code, proper variable naming, proper git commits, all that stuff.

In 2-3 months you get an alpha or beta build out, and clients are very happy. They can’t believe their idea has now transformed into something they can play with.

Start of AI era

Fast forward to today, or maybe it started around 2-3 years ago. Nothing wrong with it at first. Like any freelancer, I try to adapt to the latest trends.

At first it was just code snippets.

Hey! I asked AI for this code, do you think this will work? I think you should use it.

Okay, so this non-technical person is sending me code now.

I mostly reply with

It’s alright, I’ve got some working code blocks that have worked in production perfectly fine. Thanks though!

But then these code snippets get larger and larger as time goes on. I'm thankful for the suggestions, of course. But it's just additional work: when you're coding and you get this AI source code, you then have to think about how to merge code with a different coding style and variable names into your codebase.

Vibe Coding era

The first clues appeared when a client, who I thought was a software developer, started merging his own code into the main branch without warning. No pull request, just a straight git push --force origin main.

As I started to check out what the code was about, I kept seeing these emojis inside the print() statements. I thought, this is so odd and unprofessional.

[Screenshot: print() statements full of emojis]

I tried to Google the macOS shortcut for emojis, to match this person's vibe. This fella must really like emojis, you know. It turns out AI-generated code comes with a lot of emojis.

The other sign was how branching and merging work with AI. And maybe feature requests? I really don't know. For example, one vibe-coded project has 1,227 branches and counting. I haven't merged one yet; I let the client deal with that.

[Screenshot: a vibe-coded repository with 1,227 branches]

Last time I checked, this Xcode project did not compile. Or anything close to it.

And the last thing that made me snap was that all this vibed source code was located inside one file: ContentView. For anyone who's not familiar, ContentView is the first SwiftUI file created when you start a new Xcode project.

[Screenshot: the single massive ContentView file]

All the UI logic, view models, and models are located inside that file. Worst part: this is currently live in the App Store.

Conclusion

I totally get it, everyone has to make a living. Creating an app is one way to do it. I just feel sad about how AI has bastardized my profession, which I worked hard at for the last 15 years. There are no best practices anymore, no proper process, no meaningful back and forth. Just dealing with thousands and thousands of lines of code at every project kickoff.

Pay less taxes using web agents, a directed graph, and Dijkstra's algorithm

2025-12-11 03:00:00

I engineered a first-of-its-kind tool that automates the process of optimizing profit repatriation from international subsidiaries back to corporate headquarters (HQs). This solution is relevant for multinational corporations, startups, high-net-worth individuals, and the tax advisors/accountants who serve them, offering the potential to legally save billions in taxes. It is also relevant for anyone else who is interested in uncovering the secrets of how some people and companies pay 0% tax on the profits they make globally.

My primary goals were to:

  • Address a market gap: eliminate the costly, time-consuming manual research traditionally required to identify the initial steps for establishing complex tax structures.
  • Increase transparency: demystify the mechanics of profit-repatriation strategies for a broader audience.
  • Advance tax-planning technology: solve the technical challenge of cross-border tax-planning.

Disclaimer: this post is a technical exploration. It does not reflect or express my personal opinions or views regarding the morality, fairness, or policy of taxation, tax law, or tax avoidance strategies.

Introduction

In this blog post, I will explain step-by-step how I built TreatyHopper: a tool that aims to optimize the process of global tax routing. Typically, when repatriating profits from subsidiaries in foreign countries to the US, a tax has to be paid on the dividends (profits) that are sent by the subsidiary to the US. This tax is called a withholding tax on dividends. In order to avoid paying taxes twice (once when the money leaves the subsidiary and one more time when the money enters the US), many countries have Double Taxation Agreements (DTAs) with each other (colloquially known as tax treaties). These treaties govern which country has the right to levy the tax and to what rules they are bound. They can sometimes allow you to avoid paying withholding tax on dividends altogether, for example, allowing you to send dividends from a subsidiary in the Netherlands to your HQ in the US and pay 0%.

Now, let's say you have another subsidiary in Zambia. Sending dividends from Zambia directly to the US would result in double taxation (once when the dividends leave Zambia and once when they arrive in the US), since Zambia doesn't have a DTA with the US. Hence you go to a tax advisor (or accountant) and he or she advises you to open a new subsidiary in France. Then you send the money from Zambia to France and pay 0%, and then from France to the US and again pay 0%. By intelligently using your DTAs, you have basically reduced your effective tax burden to 0%.
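To make the arithmetic concrete, here is a minimal Python sketch; the 15% rate on the direct route is purely illustrative (not an actual treaty rate), while the 0% hops follow the example above:

```python
def retained(amount, rates):
    """Apply each hop's withholding rate to the dividend in sequence."""
    for r in rates:
        amount *= (1 - r)
    return amount

profit = 1_000_000  # dividends to repatriate from the Zambian subsidiary

# Hypothetical direct route with a single illustrative 15% withholding tax.
direct = retained(profit, [0.15])        # 850,000 arrives at HQ

# Treaty route from the example: Zambia -> France -> US at 0% per hop.
routed = retained(profit, [0.00, 0.00])  # the full 1,000,000 arrives at HQ
```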

There is a problem, however. These tax advisors and accountants typically rely on manual research and ingrained habits, often favoring common structures like a Dutch BV or Luxembourg Sàrl. This manual approach makes it impossible to analyze the thousands of bilateral tax treaties simultaneously to find the optimal, multi-jurisdictional route. With TreatyHopper I aim to solve this problem by modeling international tax law as a directed graph problem to computationally find the most cost-effective path.

Motivation

I stumbled upon this problem by sheer coincidence. I was reading an article in a major Dutch newspaper explaining how multinational corporations often make extensive use of Dutch DTAs through subsidiaries in the Netherlands. These subsidiaries funnel dividends from many countries around the world, using the Netherlands as a conduit to send the funds to the US. By using these DTAs, these corporations are able, in some cases, to lower their effective tax burden to as low as 0%.

This intelligent selection process (determining which countries to route dividends through) is known as treaty shopping. This interesting phenomenon led me down the rabbit hole of international tax treaties. While I found a few academic papers rigorously explaining how treaty shopping works, literature on network analysis and optimization was very sparse. In fact, the existing network literature seems to come mainly from two researchers who have been working on this exact problem for the last 10+ years, though their goal is to eliminate the benefits provided by these tax treaties (and impose a minimum wealth tax, among other things).

To my surprise, despite potential savings reaching billions of dollars, no public optimization tool appeared to exist (at least to the best of my knowledge). Stemming from my background in Operations Research, I decided to focus on the optimization challenge: how can I efficiently route my dividends from a source to a destination country while paying the least amount of taxes?

Problem formulation

The simplest method for answering this question is to take a pragmatic approach: collect the data on tax treaties between countries using web agents, create a directed graph with the collected data, and run Dijkstra's algorithm to find an answer.

Web agents

First of all, I would like to mention that the project I undertook with TreatyHopper would have required significantly more time before the advent of agentic AI. Finding all the relevant tax treaties manually, gathering the data, and parsing it into something meaningful would have taken so much time that I probably wouldn't have undertaken the project at all. Luckily, in the present day, we can use agentic AI to perform repetitive tasks for us.

I won't detail the process too much, as I used an online service instead of programming it from scratch. However, I essentially deployed an army of web agents (in the agentic AI sense) to collect data on DTAs, specifically focusing on the withholding tax rate on dividends.

The process resulted in the compilation of data on 5,000+ tax treaties worldwide. This involved about 100 source countries and 200 destination countries (not all countries have treaties with each other, so there are not 100 × 200 = 20,000 values).

The highest withholding tax identified was 47%, applied when moving dividends from Greece to Finland. Conversely, the lowest tax rates were 0%, primarily for transactions originating from countries such as the Netherlands and Luxembourg, which some consider to be tax havens.

It is worth mentioning that these tax rates often differ based on the percentage of subsidiary ownership. For the sake of generalization, we assume that every entity in the ownership path is wholly owned. This means the operating subsidiary is 100% owned by the first intermediary, which is 100% owned by the next, and so on, until the final intermediary is 100% owned by the HQ. We also ignore the costs associated with setting up such corporate structures.

Furthermore, governments around the world have introduced measures such as general anti-avoidance rules (GAAR) and the principal purpose test (PPT). For now, for the sake of abstraction, we will ignore these measures (which, I reckon, is a very strong assumption).

Data structure

My initial idea was to put all the data in an adjacency matrix, with 100 columns for the countries the dividends originate from and 200 rows for the countries they can be sent to. However, upon visual verification (and of course logical deduction), the adjacency matrix would have been extremely sparse (20,000 entries with about 15,000 of them being NULL values). This is what some of the entries look like:

[Screenshot: a sample of the sparse adjacency-matrix entries]

Therefore I decided to use an edge list (see picture below) instead, which in our case has better space complexity than the adjacency matrix.

[Screenshot: the edge list]
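As a rough illustration, an edge list stores one row per existing treaty; the rates below mirror the examples mentioned in this post rather than verified treaty data:

```python
# Each row: (source country, destination country, dividend withholding rate).
# The rates mirror examples from this post and are not verified treaty data.
edges = [
    ("NL", "US", 0.00),  # Netherlands -> US, the 0% example from the intro
    ("ZM", "FR", 0.00),  # Zambia -> France, from the routing example
    ("FR", "US", 0.00),  # France -> US, from the routing example
    ("GR", "FI", 0.47),  # Greece -> Finland, the highest rate found
]

# Only existing treaties are stored, so the roughly 15,000 missing country
# pairs take up no space, unlike the mostly-NULL adjacency matrix.
```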

Directed graph

We define a directed graph $G = (V, E)$. Here, $V$ represents the set of all jurisdictions in our data, and $E$ is the set of directed edges $(i, j)$ between jurisdiction $i$ and jurisdiction $j$ that have a DTA in our data. Note that DTAs are not necessarily symmetric. The tax rate for sending money from country $A$ to country $B$ often differs from the rate for sending money from $B$ to $A$.

Each directed edge $(i, j) \in E$ is assigned a non-negative weight (more on this later), $w_{ij}$, representing the cost of moving profits from jurisdiction $i$ to jurisdiction $j$.

We have a source $s \in V$, where our profits originate, and a destination $d \in V$, where our profits should ultimately reside.

Dijkstra's algorithm

Now that we have defined the components of this problem, we shall proceed with the most important part: the objective function. What I am really trying to achieve is to find the route along which I retain the largest share of my dividends when sending them from my subsidiaries to my HQ.

Let $r_{ij}$ be the withholding tax rate when sending dividends from jurisdiction $i$ to jurisdiction $j$. If the tax rate is $r_{ij}$, the fraction of the dividend I retain is $(1 - r_{ij})$.

We want to find the path $P = (s, v_1, v_2, \cdots, v_k, d)$ that maximizes the retention of our dividends. The overall retention is the product of all retention factors along the path: $$ \text{Retention} = (1 - r_{s, v_1}) \times (1 - r_{v_1, v_2}) \times \cdots \times (1 - r_{v_k, d}) $$ In other words, we want to find the path $P$ that maximizes our objective function $Z$: $$ \max Z = {\prod_{(i,j) \in P} (1 - r_{ij})} $$

The additive transformation

Note that this formulation involves a product and a maximization. Recall that Dijkstra's algorithm only works on additive path costs (weights) with non-negative costs and is designed for minimization problems. Therefore, I must employ a smart trick to rewrite this problem as a minimization problem with additive weights.

The key mathematical property I use is the logarithm, which converts multiplication into addition ($\ln(A \times B) = \ln(A) + \ln(B)$). To transform the maximization into a minimization, I optimize for the negative logarithm of the objective function.

Minimizing $Z' = -\ln(Z)$ is mathematically equivalent to maximizing $Z$. Applying this to my objective: $$ \min Z' = - \ln \left( {\prod_{(i,j) \in P} (1 - r_{ij})} \right) $$ Using the logarithm rule for products, I can convert this into a sum: $$ \min Z' = - \sum_{(i,j) \in P} \ln (1 - r_{ij}) $$ This immediately yields the cost (weight) $w_{ij}$ for each edge $(i, j)$ in my network as the term within the summation: $$ w_{ij} = - \ln (1 - r_{ij}) $$ This brings us to the final, shortest-path compatible objective function: $$ \min Z' = \sum_{(i, j) \in P} w_{ij} $$

Verification of non-negativity

For Dijkstra's algorithm to work reliably, all weights $w_{ij}$ must be non-negative. Since tax rates are bound by $0 \leq r_{ij} < 1$, the retention factor $(1 - r_{ij})$ must be in the range $0 < 1 - r_{ij} \leq 1$. The natural logarithm of any number in this range is always zero or negative, $\ln (1 - r_{ij}) \leq 0$.

Since $w_{ij}$ is the negative of this value, $w_{ij} = - \ln (1 - r_{ij})$, it must be non-negative ($\geq 0$). The transformation is therefore valid for using Dijkstra's algorithm.
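Putting the pieces together, here is a minimal sketch of the whole computation using networkx; the edge list reuses the illustrative rates from above, plus two hypothetical extra treaties so that Dijkstra actually has a choice to make:

```python
import math
import networkx as nx

# Illustrative edge list: (source, destination, dividend withholding rate).
edges = [
    ("ZM", "FR", 0.00),
    ("FR", "US", 0.00),
    ("NL", "US", 0.00),
    ("GR", "FI", 0.47),
    ("ZM", "GR", 0.05),  # hypothetical, added so there is a second route
    ("GR", "US", 0.10),  # hypothetical
]

G = nx.DiGraph()
for src, dst, rate in edges:
    # w_ij = -ln(1 - r_ij), non-negative because 0 <= r_ij < 1.
    G.add_edge(src, dst, weight=-math.log(1 - rate), rate=rate)

# Dijkstra on the additive weights is equivalent to maximizing the product
# of retention factors along the path.
path = nx.dijkstra_path(G, source="ZM", target="US", weight="weight")
cost = nx.dijkstra_path_length(G, source="ZM", target="US", weight="weight")
retention = math.exp(-cost)

print(path)       # ['ZM', 'FR', 'US'] with these rates
print(retention)  # fraction of the dividend that survives the route (1.0 here)
```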

Solution

To keep things simple, I chose Next.js as the full-stack framework for building a minimum viable product (MVP). Furthermore, I use Supabase for storing the edge list. Combining everything and packaging it in a user-friendly interface led to V0.0.1 of TreatyHopper:

[Screenshot: TreatyHopper V0.0.1]

I shared the tool with a small initial group of users and have received valuable feedback and generally favorable reviews from them. If you are a startup, multinational corporation, tax advisor, or accountant, or are otherwise interested in using the tool, or you see other use cases that require some adjustments, feel free to contact me through this form.

Next steps

Thanks for reading this far! I really appreciate it. I have tried my absolute best to minimize the use of LLMs to write this text. Apart from improving my spelling, I haven't really used it. I hope it was an interesting read and please feel free to provide any feedback on this post. Now: what is next for TreatyHopper?

In the coming few weeks I intend to add the following features and address the following feedback:

  • I am currently working on a very early version of this tool utilizing Graph Neural Networks (GNNs) to predict the risks associated with certain paths. For example, in general, the governments of some countries are stricter when it comes to the enforcement of countermeasures such as GAAR and PPT. It would be interesting to see if we can somehow predict the risk associated with repatriating profits through some of these countries.
  • Expand the tool to answer the following questions: how much of the profits are left after repatriating them? What happens if we limit the number of subsidiaries (e.g., we don't want to have a subsidiary in some countries)? Can we reduce the number of hops (i.e., intermediate jurisdictions in the route)? Note to self: see notes in notebook.
  • I am having a hard time verifying the data found by the web agents. It is inevitable that there are some mistakes, as it is very hard to verify that everything the web agents found is correct. I have randomly checked some of the numbers, and so far there seem to be mistakes in around 5% of them. That is still a lot, and I am especially keen on reducing it to close to 0%.
  • Add agent support for reasoning about certain routes and arguing about their soundness.
  • Remove sanctioned countries.
  • Improve parsing of data into Supabase.
  • Add a changelog.
  • Add a disclaimer before running the tool about the fact that it can still make some mistakes and that as of right now, the tool can be best used as a starting point for more advanced tax optimization.
  • Some people asked me why I didn't use the Floyd-Warshall algorithm. I could have used it, but to keep things simple, I opted for Dijkstra's algorithm. The main reason is that I don't necessarily need to calculate the shortest paths between all pairs of source and destination vertices.

Delibird misunderstanding

2025-12-11 01:54:00

There's a Pokemon called Delibird. It looks like this:

[Image: Delibird]

It came out in Gold and Silver, the second generation of Pokemon games. I did not have a Gameboy at the time, and I was not allowed to own a handheld console - my parents insisted any game played on a gaming-only device would "rot my brain." (PC games were, for some reason, OK.) Nevertheless, I was obsessed with Pokemon and would watch the show constantly. I tried to play it on my friends' Gameboys whenever possible - I often offered to grind levels for them on field trips. I even purchased guidebooks for Gold and Silver and just read them for my own entertainment.

Because I was not playing these games very often, I had an unusual understanding of many Pokemon. I knew what they looked like in 2D art, and perhaps what they looked like on some Pokemon cards... but I did not know what they looked like when they animated in the game. I did not read their Pokedex descriptions, and I didn't get to see NPCs talking about them. And because I wasn't playing the games or collecting them all, if I thought a Pokemon was stupid, I would just never think about it at all.

I thought Delibird looked fucking stupid. I did not like Pokemon who were holding objects; I wanted the Pokedex to be a kind of naturalist's resource, with images of the biological creatures only, so I didn't like the fact that Delibird was holding a bag in its art.

The other thing is that I just assumed the bag was full of meat.

Delibird is an ice-type Pokemon, but it didn't occur to me that it was a Santa bird, or that it was making deliveries of food in general. (Pokemon games insist that it is kind of a St Bernard animal that gives food to lost people, and that the bag is, like, made out of its tail feathers, or something.)

I just assumed that the "Deli" meant "delicatessen," and that the bird was ice type because it was keeping the meat cold. I assumed it would be throwing meat out of the bag at people. I imagined that it was full of pork chops. I also wondered whether the bag was a thick extension of its body - a grotesque appendage rather than an object. (In my defense, the bag is supposed to be part of its body, but not in the way I imagined.) I wondered if the bag itself was meat, and whether cutting it would be like cutting a slice off of Delibird itself.

For a very long time, I devoted absolutely no brain energy to Delibird. As I got older, however, it began to seem increasingly unlikely that Delibird was a meat-slinging bird.

I recognized that the meat-bird misunderstanding was a precious kind of falsehood that I should hold onto and cultivate as long as possible. I knew that if I looked Delibird up, I would get the truth, and it could only be disappointing. So I just refused to read about Delibird through my teens, and continued to cultivate the deliberate misunderstanding that its bag was full of meat.

I think I learned that Delibird did not sling meat in college. "Deli" means "delivery," which is fucking dumb. Delibird is a Santa Claus reference, and it delivers, like, berries or something. I think it sometimes delivers bug Pokemon as food (because it is a bird).

Sometimes I find myself forgetting that I ever knew Delibird delivers non-meat items. Ahh, Delibird, the meat bird. I think I preferred the misunderstanding.

Auto-grading decade-old Hacker News discussions with hindsight

2025-12-10 23:00:00


TLDR: https://karpathy.ai/hncapsule/


Yesterday I stumbled on this HN thread Show HN: Gemini Pro 3 hallucinates the HN front page 10 years from now, where Gemini 3 was hallucinating the front page of 10 years from now. One of the comments struck me a bit more though - Bjartr linked to the HN front page from exactly 10 years ago, i.e. December 2015. I was reading through the discussions of 10 years ago and mentally grading them for prescience when I realized that an LLM might actually be a lot better at this task. I copy-pasted one of the article+comment threads manually into ChatGPT 5.1 Thinking and it gave me a beautiful analysis of what people thought + what actually happened in retrospect, even better and significantly more detailed than what I was doing manually. I realized that this task is actually a really good fit for LLMs, and I was looking for excuses to vibe code something with the newly released Opus 4.5, so I got to work. I'm going to get all the front pages of December 2015 (31 days, 30 articles per day), get ChatGPT 5.1 Thinking to do the analysis, and present everything in a nice way for historical reading.

There are two macro reasons for why I think the exercise is interesting more generally:

  1. I believe it is quite possible and desirable to train your forward future predictor, given practice and effort.
  2. I was reminded again of my tweets that said "Be good, future LLMs are watching". You can take that in many directions, but here I want to focus on the idea that future LLMs are watching. Everything we do today might be scrutinized in great detail in the future because doing so will be "free". A lot of the ways people behave currently I think make an implicit "security by obscurity" assumption. But if intelligence really does become too cheap to meter, it will become possible to do a perfect reconstruction and synthesis of everything. LLMs are watching (or humans using them might be). Best to be good.

Vibe coding the actual project was relatively painless and took about 3 hours with Opus 4.5, with a few hiccups but overall very impressive. The repository is on GitHub here: karpathy/hn-time-capsule. Here is the progression of what the code does (a rough sketch of the pipeline follows the list):

  • Given a date, download the frontpage of 30 articles
  • For each article, download/parse the article itself and the full comment thread using the Algolia API.
  • Package up everything into a markdown prompt asking for the analysis. Here is the prompt prefix I used:
The following is an article that appeared on Hacker News 10 years ago, and the discussion thread.

Let's use our benefit of hindsight now in 6 sections:

1. Give a brief summary of the article and the discussion thread.
2. What ended up happening to this topic? (research the topic briefly and write a summary)
3. Give out awards for "Most prescient" and "Most wrong" comments, considering what happened.
4. Mention any other fun or notable aspects of the article or discussion.
5. Give out grades to specific people for their comments, considering what happened.
6. At the end, give a final score (from 0-10) for how interesting this article and its retrospect analysis was.

As for the format of Section 5, use the header "Final grades" and follow it with simply an unordered list of people and their grades in the format of "name: grade (optional comment)". Here is an example:

Final grades
- speckx: A+ (excellent predictions on ...)
- tosh: A (correctly predicted this or that ...)
- keepamovin: A
- bgwalter: D
- fsflover: F (completely wrong on ...)

Your list may contain more people of course than just this toy example. Please follow the format exactly because I will be parsing it programmatically. The idea is that I will accumulate the grades for each account to identify the accounts that were over long periods of time the most prescient or the most wrong.

As for the format of Section 6, use the prefix "Article hindsight analysis interestingness score:" and then the score (0-10) as a number. Give high scores to articles/discussions that are prominent, notable, or interesting in retrospect. Give low scores in cases where few predictions are made, or the topic is very niche or obscure, or the discussion is not very interesting in retrospect.

Here is an example:
Article hindsight analysis interestingness score: 8
---
  • Submit prompt to GPT 5.1 Thinking via the OpenAI API
  • Collect and parse the results
  • Render the results into static HTML web pages for easy viewing
  • Host the html result pages on my website: https://karpathy.ai/hncapsule/
  • Host all the intermediate results (the data directory) in case someone else would like to play. It's the file data.zip under the exact same url prefix (intentionally avoiding a direct link).
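For a sense of the shape of this pipeline, here is a rough sketch (not the actual repository code); the Algolia items endpoint is real, but the model name and prompt-file path are placeholders, and resolving each day's front page into story IDs is left out:

```python
import requests
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_PREFIX = open("prompt_prefix.md").read()  # the prompt shown above

def fetch_item(item_id: int) -> dict:
    """Fetch a story plus its full nested comment tree from the Algolia HN API."""
    resp = requests.get(f"https://hn.algolia.com/api/v1/items/{item_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()

def flatten_comments(node: dict, depth: int = 0) -> str:
    """Render the nested comment tree as indented plain text."""
    lines = []
    for child in node.get("children", []):
        lines.append("  " * depth + f"- {child.get('author')}: {child.get('text') or ''}")
        lines.append(flatten_comments(child, depth + 1))
    return "\n".join(line for line in lines if line)

def grade_story(item_id: int) -> str:
    """Build the markdown prompt for one story and ask the model for the analysis."""
    story = fetch_item(item_id)
    prompt = f"{PROMPT_PREFIX}\n\n# {story['title']}\n\n{flatten_comments(story)}"
    completion = client.chat.completions.create(
        model="gpt-5.1",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```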

I spent a few hours browsing around and found it to be very interesting. A few example threads just for fun:

And then when you navigate over to the Hall of Fame, you can find the top commenters of Hacker News in December 2015, sorted by an IMDB-style score of their grade point average. In particular, congratulations to pcwalton, tptacek, paulmd, cstross, greglindahl, moxie, hannob, 0xcde4c3db, Manishearth, johncolanduoni - GPT 5.1 Thinking found your comments very insightful and prescient. You can also scroll all the way down to find the noise of HN, which I think we're all familiar with too :)
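As an illustration of how such a leaderboard could be assembled, here is a sketch; the letter-grade mapping and the IMDB-style prior are my own assumptions, not necessarily what the repository implements:

```python
from collections import defaultdict

# Assumed mapping from letter grades to grade points.
GRADE_POINTS = {"A+": 4.3, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
                "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0}

def leaderboard(grades, m=5, prior=2.5):
    """IMDB-style weighted average: shrink small samples toward a prior.

    grades: iterable of (username, letter_grade) pairs parsed from the
    "Final grades" sections; m and prior are assumed hyperparameters.
    """
    per_user = defaultdict(list)
    for user, letter in grades:
        if letter in GRADE_POINTS:
            per_user[user].append(GRADE_POINTS[letter])

    scores = {}
    for user, points in per_user.items():
        n, avg = len(points), sum(points) / len(points)
        scores[user] = (n / (n + m)) * avg + (m / (n + m)) * prior
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```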

My code (wait, Opus' code?) on GitHub can be used to reproduce or tweak the results. Running 31 days of 30 articles through GPT 5.1 Thinking meant 31 * 30 = 930 LLM queries, cost about $58, and took around an hour. The LLM megaminds of the future might find this kind of thing a lot easier, a lot faster, and a lot cheaper.