2026-06-03 08:00:00
After I published Five Years of Trying to Add Recursion to lychee, one reply I got was a very fair question:
If recursion is so hard, how do other link checkers do it? Plenty of them already crawl websites!
This sent me down a rabbit hole of reading the code of other link checkers. The key takeaway is: they didn’t find a clever trick we missed. They were built as crawlers from the very first commit, and I initially built lychee as a stream.
I went and read the source of the recursive checkers we list in lychee’s README: muffet (Go), LinkChecker (Python), linkinator (TypeScript), and broken-link-checker (JavaScript). This post is a teardown of how each one actually handles recursion, what it costs them, and what it means for lychee.
If you haven’t read the first post, the summary is that lychee was architected as a one-shot, unidirectional pipeline (inputs → extract → check → output). Recursion needs a cycle (responses create new inputs), and cycles in an async, channel-based pipeline are where the dragons live. 🐲 Five years and four attempts later, the pieces we’ll need to do it properly only just landed.
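Every recursive checker I looked at is built from the same three parts: a frontier (the queue of URLs still to visit), a visited set that deduplicates discovered URLs, and a termination detector, which is a WaitGroup, a joinable-queue counter, an onIdle() promise, or a queue-drain event.
Diagrammatically, lychee is different from the others: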
graph TD
subgraph crawler["Everyone else: a cycle"]
direction TB
CQ[Frontier queue] --> CW[Worker pool]
CW --> CP[Fetch and parse page]
CP -->|new links| CQ
CP --> CR[Results]
end
subgraph lychee["lychee: a DAG"]
direction TB
LA[Inputs] --> LB[Extractor]
LB --> LC[Checker]
LC --> LD[Results]
end
Crawlers have a back-edge baked in. Our pipeline doesn’t, and every one of my failed attempts was an effort to bend that back-edge into a graph that was never designed for it.
Let’s look at that graph design more closely:
graph TD
Seed[Seed URLs] --> Enq["Enqueue step: is URL in visited set?"]
Enq -->|yes| Skip[Drop]
Enq -->|no| Mark["Mark visited, then push"]
Mark --> Q[Frontier queue]
Q --> Pool["Worker pool, bounded concurrency"]
Pool --> FP[Fetch page and extract links]
FP -->|discovered links| Enq
FP --> Rec[Results]
Q -.->|empty AND no worker busy| Stop[Terminate]
Note that the visited check happens in the enqueue step, atomically with the mark, before the worker ever touches the network. That ordering is the entire fix to the deduplication race that haunted lychee’s attempts 1–4, where the cache was written after checking.
Each tool uses a variation on it.
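The enqueue step from the diagram fits in a few lines. What follows is a minimal Python sketch, not code from any of the tools: SITE, crawl, and enqueue are made-up names, and the "fetch" is a dictionary lookup standing in for an HTTP request. The only thing it tries to show is the ordering: check and mark under one lock, then push.

```python
import queue
import threading

# Hypothetical in-memory "site": page -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/b", "/c"],
    "/b": ["/a"],
    "/c": [],
}


def crawl(seed, site, workers=4):
    frontier = queue.Queue()      # unbounded frontier
    visited = set()
    visited_lock = threading.Lock()
    fetch_counts = {}             # how often each page was "fetched"
    counts_lock = threading.Lock()

    def enqueue(url):
        # Check and mark under one lock, before any worker sees the URL.
        with visited_lock:
            if url in visited:
                return
            visited.add(url)
        frontier.put(url)         # also bumps the queue's unfinished-task count

    def worker():
        while True:
            url = frontier.get()
            if url is None:       # shutdown sentinel
                frontier.task_done()
                return
            with counts_lock:
                fetch_counts[url] = fetch_counts.get(url, 0) + 1
            for link in site.get(url, []):   # stand-in for fetch + extract
                enqueue(link)                # the back-edge
            frontier.task_done()             # only after children are enqueued

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    enqueue(seed)                 # counter is positive before anyone waits
    frontier.join()               # returns once every enqueued URL is processed
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return fetch_counts
```

Even though the pages link to each other in a cycle, crawl("/", SITE) fetches each of the four pages exactly once, and frontier.join() cannot return early because a worker enqueues its children before it calls task_done().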
muffet is closest in spirit to lychee: a fast, single-binary, concurrent website checker.
The dedup + scheduling decision lives in one method (page_checker.go):
func (c *pageChecker) addPage(p page) {
	if !c.donePages.Add(p.URL().String()) {
		c.daemonManager.Add(func() { c.checkPage(p) })
	}
}
donePages is a concurrentStringSet (a mutex-guarded map[string]struct{}). Add returns whether the URL was already present, so a page is only scheduled the first time it’s seen. Dedup happens at enqueue, synchronized by the set’s mutex. This is basically a line-by-line translation of the diagram above.
Checking a page fetches all of its links concurrently, and feeds qualifying ones back into addPage, the back-edge:
go func(u string) {
	defer w.Done()
	status, p, err := c.fetcher.Fetch(u)
	// ...
	if !c.onePageOnly && p != nil && c.linkValidator.Validate(p.URL()) {
		c.addPage(p) // recursion: discovered page re-enters the frontier
	}
}(u)
muffet’s answer to termination is a little daemonManager built around a sync.WaitGroup (daemon_manager.go):
func (m daemonManager) Add(f func()) {
	m.waitGroup.Add(1)
	m.daemons <- func() {
		f()
		m.waitGroup.Done()
	}
}

func (m daemonManager) Run() {
	go func() {
		for f := range m.daemons {
			go f()
		}
	}()
	m.waitGroup.Wait() // <- termination
}
Every scheduled page increments the group; every completed page decrements it; Wait() returns when the count hits zero. The whole crawl bootstraps with a single addPage before Run(), so the counter is positive before anyone waits on it.
This is the same counter I tried (and failed with) in Attempt 1 and Attempt 4. The difference is the invariant: waitGroup.Add(1) is only ever called from inside an already-running daemon that holds the count above zero (or from the bootstrap). There is no window where the counter briefly reads zero while work is still pending. Go’s WaitGroup enforces this invariant so naturally that it doesn’t feel like distributed termination detection at all, but that’s exactly what it is. It’s the moral equivalent of the WaitGroup primitive Kait contributed to lychee in 2026.
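To make the invariant concrete, here is a small Python sketch of a WaitGroup-style counter. It is an illustration under my own assumptions, not muffet's or lychee's code: WaitGroup, count_tree, and visit are hypothetical names, and the "work" is just walking a tree with one thread per node. The rule it follows is the one described above: a child is added while its parent still holds the count above zero, and the parent calls done() only afterwards.

```python
import threading


class WaitGroup:
    """Minimal counter with Go-like add/done/wait semantics (a sketch)."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def add(self, n=1):
        with self._cond:
            self._count += n

    def done(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._count > 0:
                self._cond.wait()


def count_tree(depth, fanout=2):
    """Run one thread per node of a tree and return how many nodes ran."""
    wg = WaitGroup()
    seen = []
    seen_lock = threading.Lock()

    def visit(level):
        with seen_lock:
            seen.append(level)
        if level < depth:
            for _ in range(fanout):
                wg.add(1)  # child is counted while the parent is still counted
                threading.Thread(target=visit, args=(level + 1,)).start()
        wg.done()          # parent finishes only after its children are counted

    wg.add(1)              # bootstrap: positive before anyone waits
    threading.Thread(target=visit, args=(0,)).start()
    wg.wait()
    return len(seen)
```

With depth 3 and fanout 2 the tree has 1 + 2 + 4 + 8 = 15 nodes, and wait() returns only after all 15 have run. Swapping the order (done() before add()) reopens the window where the counter can read zero while children are still about to start.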
- Run() does go f() for every task, spawning unbounded goroutines. The actual limiting happens downstream in a semaphore (a buffered-channel counting semaphore) and a per-host throttler pool. muffet separates “the frontier” from “the rate limiter,” which is exactly the separation lychee lacked when it tried to use one bounded channel as both in the past.
- The Rust equivalent (tokio::spawn per link, each needing Send + 'static state) is what pushed me toward Arc<RwLock<…>> and the ownership pain I wrote about.
- lychee also ships lychee-lib as a reusable crate, which raises the bar, since every architectural choice has to uphold the standards of a public API.
Takeaways: muffet
- Termination is sync.WaitGroup, full stop. It’s the design lychee converged on after five years; muffet got it for free from Go’s standard library on day one.
- In Rust, the task-per-link model is where the Send/ownership friction shows up.
LinkChecker has existed since the year 2000. It’s a synchronous, thread-pool crawler.
Its frontier is a hand-written UrlQueue (cache/urlqueue.py), a clone of Python’s queue.Queue with task_done()/join(). Look at the very first design comment:
def __init__(self, max_allowed_urls=None):
    # Note: don't put a maximum size on the queue since it would
    # lead to deadlocks when all worker threads called put().
    self.queue = collections.deque()
    # ...
    self.unfinished_tasks = 0
It’s explicit about the exact deadlock that bit me.
That comment is our Attempt 4 backpressure deadlock, called out and designed around. lychee tried to push discovered URLs into a bounded channel; when it filled, the response handler blocked, no responses drained, no slots freed. Deadlock. 💥
LinkChecker’s answer is brutalist in nature: the frontier is unbounded. Backpressure is enforced elsewhere (a fixed thread count and per-host throttling), never by blocking a producer that is also a consumer.
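The deadlock is easy to reproduce in miniature. The sketch below is a simplified Python model, not LinkChecker's or lychee's code: drain and LINKS are hypothetical names, and a single worker plays both roles (it consumes from the frontier and produces into it). A put() with a short timeout stands in for the blocking send, so the example reports the stall instead of hanging.

```python
import queue


def drain(frontier, links, seed, put_timeout=0.05):
    """Single worker that both consumes from and produces into `frontier`.

    Returns (processed_urls, stalled). `stalled` is True when a put() into a
    full queue timed out. With a truly blocking put() that would be a
    deadlock, because the only consumer is the one that is waiting.
    """
    seen = {seed}
    frontier.put(seed)
    processed = []
    while not frontier.empty():
        url = frontier.get()
        processed.append(url)
        for child in links.get(url, []):
            if child in seen:
                continue
            seen.add(child)
            try:
                frontier.put(child, timeout=put_timeout)
            except queue.Full:
                return processed, True
    return processed, False


# Hypothetical page with three outgoing links.
LINKS = {"/": ["/a", "/b", "/c"], "/a": [], "/b": [], "/c": []}

bounded = drain(queue.Queue(maxsize=2), LINKS, "/")   # (["/"], True)
unbounded = drain(queue.Queue(), LINKS, "/")          # all four pages, False
```

With room for two items, the third discovered link has nowhere to go and the worker stalls after processing only the seed. The unbounded queue finishes all four pages, which is the whole argument for limiting concurrency somewhere other than the frontier.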
join() blocks until unfinished_tasks hits zero (urlqueue.py):
def task_done(self, url_data):
    with self.all_tasks_done:
        self.finished_tasks += 1
        self.unfinished_tasks -= 1
        self.in_progress -= 1
        if self.unfinished_tasks <= 0:
            self.all_tasks_done.notify_all()

def join(self, timeout=None):
    with self.all_tasks_done:
        while self.unfinished_tasks:
            self.all_tasks_done.wait()
Again: a counter. But the increment in _put and the decrement in task_done are both inside the queue’s Condition lock, and a worker calls task_done only after fully processing an item including enqueuing its children. So children are counted before the parent is marked done, with no premature zero. It’s WaitGroup semantics implemented with a mutex and a condition variable.
LinkChecker writes the URL into its result cache at enqueue time (urlqueue.py):
def _put(self, url_data):
    key = url_data.cache_url
    cache = url_data.aggregate.result_cache
    if cache.has_result(key):
        return  # already queued/checked -> skip
    # ...
    self.queue.append(url_data)
    self.unfinished_tasks += 1
    # add a None placeholder so this URL is never queued twice
    cache.add_result(key, None)
That add_result(key, None) sentinel is a “fix” that’s missing in lychee’s attempts. By the time any worker thread checks the URL, the cache already says “mine,” so concurrent discovery from another page is a no-op.
The Aggregate (director/aggregator.py) throttles per host:
@synchronized(_hosts_lock)
def wait_for_host(self, host):
    t = time.time()
    if host in self.times and self.times[host] > t:
        time.sleep(self.times[host] - t)
    # spread requests using maxrequestspersecond
    wait_time = random.uniform(wait_time_min, wait_time_max)
    self.times[host] = time.time() + wait_time
and abort() calls urlqueue.join(timeout=…) so a stuck crawl can’t hang forever.
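The per-host pacing idea can be isolated into a small, testable piece. This is a simplified Python sketch and not LinkChecker's implementation: HostThrottle and FakeClock are hypothetical names, it uses a fixed interval instead of LinkChecker's randomized wait, and it reserves the slot under the lock but sleeps outside of it. The clock and sleep functions are injected so the behavior can be checked without real waiting.

```python
import threading


class HostThrottle:
    """Per-host pacing: every host has its own 'not before' timestamp."""

    def __init__(self, min_interval, clock, sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}
        self._lock = threading.Lock()

    def wait_for_host(self, host):
        # Reserve the next free slot for this host under the lock...
        with self._lock:
            now = self._clock()
            ready_at = max(self._next_allowed.get(host, now), now)
            self._next_allowed[host] = ready_at + self.min_interval
        # ...and sleep outside of it, so other hosts are not held up.
        delay = ready_at - now
        if delay > 0:
            self._sleep(delay)


class FakeClock:
    """Deterministic clock for the example: sleeping just advances time."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds
```

In real use you would pass time.monotonic and time.sleep. The frontier never appears here, which is the point: how fast to go is a separate object from what to do next.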
- Checker threads do blocking I/O via requests. Simple and battle-tested, but the concurrency ceiling is the thread count, and each thread carries a full stack. lychee’s Tokio model reaches thousands of concurrent in-flight requests on a handful of OS threads; LinkChecker can’t, and doesn’t try.
- The unbounded queue and result cache grow with the crawl; there is a max_allowed_urls cap and a periodic cleanup() to mitigate it.
- It has a plugin system (linkcheck/plugins/: anchor checks, SSL, virus scanning, and more) and many output loggers. This is the most extensible of the bunch, and it pays for that with a large, mature, somewhat old-fashioned codebase.
Takeaways: LinkChecker
- Deduplicating at put() time (a None placeholder in the cache) is their synchronization mechanism. The cache must claim the URL before the request, not after.
queue.onIdle()
linkinator is a Node.js checker, and it benefits from something neither Go nor Rust provides: a single-threaded event loop. Check-and-insert into the visited set is atomic for free, because no two callbacks run simultaneously.
The frontier is a concurrency-limited Queue (a p-queue-style structure). Termination is one line in check() (src/index.ts):
const queue = new Queue({ concurrency: options.concurrency || 100 });
// ... seed the queue ...
// resolve when nothing is queued or running:
await queue.onIdle();
onIdle() is the library’s termination detection: it resolves when the queue is empty and no task is in flight. Same idea as muffet’s WaitGroup and LinkChecker’s join(), just expressed as a promise and backed by a single-threaded runtime, so no Mutex is needed to protect the visited set.
When crawling, crawl() GETs the page, extracts links, and for each new URL re-enters the queue (src/index.ts):
const inCache = options.cache.has(result.url.href);
if (!inCache) {
  // Mark visited...
  options.cache.add(result.url.href);
  // Create the promise for this check
  const checkPromise = (async () => {
    await this.crawl({ url: result.url, /* ... */ });
  })();
  // Store the promise.
  // Another page discovering the same URL can wait on this promise
  // instead of enqueuing a duplicate check.
  options.pendingChecks.set(result.url.href, checkPromise);
  // Enqueue...
  options.queue.add(() => checkPromise);
}
Because JavaScript is single-threaded, the entire thing executes without interruption. In Rust or Go, that’s a critical section you must guard with a mutex (and get the ordering right); in Node it’s just three statements. This is the single biggest reason recursion is easier in Node than in Rust. It’s just a language feature.
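Python’s asyncio has the same property, which makes it a convenient place to see the effect in isolation. The sketch below is an analogy, not linkinator’s code: SITE, crawl, and enqueue are hypothetical names, and asyncio.Queue.join() plays the role that queue.onIdle() plays in the Node version. Note that the visited set is a plain set with no lock, because there is no await between the check and the insert.

```python
import asyncio

# Hypothetical in-memory "site": page -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/a", "/c"],
    "/c": ["/"],
}


async def crawl(seed, site, concurrency=3):
    visited = set()              # plain set, no lock
    frontier = asyncio.Queue()   # unbounded frontier
    fetched = []

    def enqueue(url):
        # No `await` between the check and the insert, so no other
        # coroutine can run in between.
        if url not in visited:
            visited.add(url)
            frontier.put_nowait(url)

    async def worker():
        while True:
            url = await frontier.get()
            await asyncio.sleep(0)   # stand-in for the network round trip
            fetched.append(url)
            for link in site.get(url, []):
                enqueue(link)
            frontier.task_done()

    enqueue(seed)
    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await frontier.join()        # resolves when nothing is queued or running
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return fetched
```

Running asyncio.run(crawl("/", SITE)) fetches every page exactly once, in some order. Move an await into enqueue() between the membership test and the add(), and the duplicate race comes right back.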
linkinator also keeps a relationshipCache of `${url}|${parent}` keys, and a pendingChecks map so it can wait on an in-flight check and still report a duplicate broken link against every parent that references it. Those reuse-operations are themselves pushed onto the same queue, so onIdle() correctly waits for them too.
linkinator uses HEAD for leaf links but GET when it needs to crawl, because recursion needs the response body to find more links:
response = await makeRequest(
  options.crawl ? 'GET' : 'HEAD',
  options.url.href, /* ... */
);
This is precisely lychee’s remaining open problem: you can only recurse into pages you fetched with a body. linkinator just always GETs when crawling; lychee plans to reuse the body it already has in cache from the check it just performed.
- The results array, cache, and relationshipCache all grow with the crawl. Fine for a docs site, heavy for a giant one.
- There is a delayCache that backs off per host on a 429 with Retry-After, but no general per-host concurrency cap like lychee’s HostPool. linkinator can hammer a host until it complains; lychee now paces before the complaint.
- It exposes an EventEmitter (on('link'), on('pagestart'), and so on), so it’s embeddable and scriptable, which is nice. It’s a library first, like lychee.
Takeaways: linkinator
- queue.onIdle() is the termination mechanism. Simple, and provided by the queue library on top of the single-threaded JS runtime.
- Rate limiting is reactive; lychee’s HostPool aims higher, at the cost of more machinery.
broken-link-checker (BLC) takes the event-driven model furthest. It’s built on limited-request-queue, a queue with maxSockets (concurrency) and rateLimit, and it nests two of them: a site-level queue feeding a page-level HtmlUrlChecker.
The frontier and dedup live in SiteChecker (lib/public/SiteChecker.js). Visited pages are tracked in a URLCache, written at enqueue time:
#enqueuePage(url, customData, auth) {
  // Mark before crawl to avoid links to self within page.
  this.#sitePagesChecked.set(url, PAGE_WAS_CHECKED);
  this.#htmlUrlChecker.enqueue(url, customData, auth);
}
Recursion is governed by a filter that decides whether a discovered link becomes a crawled page:
#maybeEnqueuePage(link, customData, auth) {
  const tagGroup = this.#options.tags.recursive
    [this.#options.filterLevel]
    [link.get(HTML_TAG_NAME)] ?? {};
  const attrSupported = link.get(HTML_ATTR_NAME) in tagGroup;
  if (!attrSupported ||
      link.get(IS_BROKEN) ||
      !link.get(IS_INTERNAL) ||
      this.#sitePagesChecked.has(rebasedURL) || // dedup check
      !this.#isAllowed(link)) { // robots.txt
    // do nothing
  } else if (this.#options.includePage(rebasedURL)) {
    this.#enqueuePage(rebasedURL, customData, auth);
  }
}
BLC has no counter and no onIdle(). It rides the queue’s drain events. When the page-level queue empties it fires END_EVENT, which makes SiteChecker emit SITE_EVENT and call the site queue’s done callback; when the site queue drains, it fires REQUEST_QUEUE_END_EVENT. That’s the public END_EVENT:
.on(END_EVENT, () => {
  this.emit(SITE_EVENT, this.#currentPageError, this.#currentSiteURL, this.#currentCustomData);
  this.#currentDone(); // tell the site queue this site is finished
});
That’s their termination detection, expressed as “the request queue reported empty.”
And in classic Node.js fashion, the done callback is what actually tells the site queue to free up a slot for another site. So the termination of one site is what allows another to start, and the termination of the whole crawl is what allows the process to exit. It’s a cascade of events that propagates from the page queue to the site queue to the process.
- robots.txt is honored (getRobotsTxt, isAllowed), rel=nofollow is respected, and rateLimit plus maxSockets are first-class. This is a crawler that’s polite by default.
- Termination is spread across event handlers instead of a single await queue.onIdle(). This is the JS cousin of the “leaky abstraction” problem I described, where recursion-awareness ends up sprinkled across many handlers.
- Visited pages are tracked in a URLCache per site.
Takeaways: broken-link-checker
- robots.txt support, rateLimit, and maxSockets make it the most server-friendly recursive checker by default.
Our README marks markdown-link-check as supporting recursion, but there’s some nuance there: it recurses over Markdown files, not by spidering a live website. There’s no HTTP frontier and no termination problem in the sense above. Worth a mention so the comparison is honest, not worth a teardown.
If you want to see the pattern at full industrial scale, look at Scrapy (Python/Twisted) or Colly (Go). Both use the same approach: a scheduler (frontier) with a pluggable, optionally disk-backed queue, a dupefilter (often a Bloom filter rather than a HashSet), a bounded downloader pool, and explicit “engine idle → close spider” termination. They solve exactly the problems lychee struggled with (distributed termination detection, backpressure, dedup), just with years of dedicated crawler engineering behind them. The takeaway isn’t “lychee should be Scrapy”: it’s that crawling is a well-trodden architecture, and lychee is simply standing on a different one right now.
| Tool | Lang / runtime | Concurrency model | Frontier | “Done?” signal | Dedup point | Per-host limiting |
|---|---|---|---|---|---|---|
| muffet | Go, goroutines | goroutine pool + semaphore + host throttler | mutex-guarded set + daemon channel | sync.WaitGroup | visited set at enqueue | host throttler pool |
| LinkChecker | Python, threads | fixed blocking thread pool | unbounded UrlQueue | joinable-queue counter (join()) | result cache at put() | wait_for_host (req/s) |
| linkinator | Node, event loop | single-thread + p-queue (concurrency) | p-queue | queue.onIdle() | Set at enqueue (race-free) | reactive 429 delayCache |
| broken-link-checker | Node, event loop | limited-request-queue (maxSockets) | nested request queues | queue-drain events | URLCache at enqueue | maxSockets + rateLimit |
| lychee (2026) | Rust, Tokio | tasks + HostPool | channels + WaitGroup | WaitGroup | HostPool active_requests | HostPool per-host pool |
lychee in 2026 finally has a column-for-column match. The WaitGroup is muffet’s sync.WaitGroup and LinkChecker’s join(). The HostPool is BLC’s rateLimit/maxSockets and LinkChecker’s wait_for_host. The per-URI active_requests mutex is everyone’s enqueue-time dedup.
Three reasons, in increasing order of how much they’re actually lychee’s fault.
They started as crawlers; lychee started as a stream.
Every tool above has a back-edge in its core data structure. lychee’s core was a DAG optimized for the 99% case (a list of files/URLs, checked once, fast). Retrofitting a cycle onto a pipeline is much harder than having one from the start. The problem is architectural in nature.
The frontier and the rate-limiter must be different objects.
muffet (set + semaphore), LinkChecker (unbounded queue + thread count), linkinator (p-queue + delayCache), BLC (request queue + maxSockets) all keep “what to do next” separate from “how fast to go.” lychee’s early attempts tried to make one bounded channel serve both roles, and a cycle through a bounded channel deadlocks. The fix (lychee’s HostPool plus a WaitGroup over an unbounded work source) is the same separation we’re aiming for now.
Single-threaded runtimes get dedup for free.
Both Node tools dedup with a plain Set and zero locking, because the event loop serializes access. Go and Python pay a mutex. Rust pays a mutex and fights the borrow checker about who owns the shared state across tokio::spawn. That’s the ~30% “Rust tax” I estimated last time: not the algorithm, but the friction of expressing shared mutable frontier state under Send + 'static.
None of this is a knock on lychee’s design. A unidirectional stream is the right call for the common, non-recursive case: it’s why lychee is fast and why the 30% channel regression from Attempt 2 was a dealbreaker. The other tools pay for their back-edge on every run, recursive or not. lychee refused to, and that principle is exactly why recursion took five years and why, when it lands, it won’t slow down the path everyone actually uses. I believe that we can have our cake and eat it too: a crawler architecture that supports recursion without sacrificing the speed of a one-shot pipeline. But it’s a harder problem than just “copy what they do,” because most link checkers didn’t start with uncompromising performance as their top goal.
Key takeaways
- Every tool has a termination signal: sync.WaitGroup (muffet), joinable-queue counter (LinkChecker), queue.onIdle() (linkinator), queue-drain events (BLC), WaitGroup (lychee 2026). All of them are distributed termination detection.
- A single-threaded event loop and a built-in WaitGroup make termination trivial at the cost of a runtime; Rust gives you neither for free, but it hands you a compiler that refuses to let the races compile, and you can get the network card to glow if you know exactly what you are doing.
So when someone asks “how do other link checkers do recursion?”, the real answer is: they made it a part of the architecture from the beginning, and they leaned on a runtime (providing conveniences like a WaitGroup, a joinable queue, an idle promise) that solved termination without solving “distributed termination detection.”
Thanks to the maintainers of muffet, LinkChecker, linkinator, and broken-link-checker: reading your source is the clearest way to learn about crawler architecture out there and we’re all in this together, just with a different set of tradeoffs.
2026-05-31 08:00:00
Recursion has been lychee’s longest-standing open issue. It’s been sitting there, unresolved, for over five years now.
If you haven’t come across it before, lychee is a fast, async link checker written in Rust (BTW). You point it at your website, your docs, your README, your Markdown files.
I started it in 2020 because I got bored at home. By now, around 40k GitHub repositories depend on it. Google, AWS, Microsoft, Cloudflare, and many others use it to check links in their documentation.
I gave talks and podcasts about it, in case you’d like to learn more.
lychee got funded by NLnet through their NGI Zero program for open, trustworthy infrastructure.
That funding allowed us to spend serious, focused time on the project instead of coding late at night.1 The funding is now coming to an end, which feels like the right moment to write this post.
And the most honest thing I can say is this: the single most requested feature, recursion, still isn’t shipped. :,( But there are good reasons! Of course, the gist is “it’s hard,” but let’s go deeper than that.
On December 14, 2020, a user named @styfle opened issue #78:

Very reasonable! At that point, lychee was already a fast, concurrent link checker with a lot of features. Surely adding a little --recursive flag to follow links within a domain could be done in an honest day’s work, no?
But five years, four serious implementation attempts, and several abandoned pull requests later, recursion still isn’t merged. The issue is tagged for the v1.0 milestone and we still want to ship it before that. But somewhere along the way it became lychee’s white whale.
To understand why recursion is so difficult to add, you need to understand how lychee processes things. Here’s the flow from back in late 2020:
Basically one big pipeline: from input URLs through link extraction, to link checking, to output formatting.
When @styfle opened the issue, I spotted the core problem almost immediately:
There is no connection back to the extractor.
That missing feedback loop (from checked responses back to the input queue) is the whole problem in a nutshell. lychee’s pipeline was designed as a one-shot, unidirectional flow: inputs go in one end, results come out the other, and the program stops when the input stream stops. Recursion needs a cycle: responses have to be able to create new inputs. And cycles in async, channel-based pipelines are where the dragons live. 🐲
I knew this on day one. I just badly underestimated how many ways we’d find to get the cycle wrong.
My first attempt was deliberately small. I didn’t want to rearchitect anything; I just wanted recursion to work!
So I added the handling directly in main.rs. The idea was:
- Stop once completed == total.
I added a recurse() function that called collector::collect_links() on successful responses, spawned a task to send the new requests into the channel, and returned how many new requests it created. A plain HashSet<String> acted as a “seen” cache so I wouldn’t re-check the same URL twice.
On top of that:
- recursion_level field on the Request and Response structs
- --recursive / -r flag
- --depth option for the maximum recursion depth
Straightforward, right?
The program wouldn’t terminate.
The termination logic was a while curr < total_requests loop:
let mut curr = 0;
while curr < total_requests {
    curr += 1;
    let response = recv_resp.recv().await.context("Receive channel closed")?;
    // ... process response, potentially incrementing total_requests
}
When responses arrive and generate new requests, total_requests goes up.
So far so good. But extraction, sending, and receiving all happen concurrently across different tasks, so the count can get out of sync.
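The race does not need threads or Rust to show up. Here is a deliberately tiny Python asyncio model of the same loop. It is a hypothetical reconstruction for illustration, not the code from the pull request: check_all, recurse, and LINKS are made-up names, and the "response" is just the URL itself. The discovery step runs as a separate task, the way the original spawned a task to send new requests.

```python
import asyncio

# Hypothetical link graph: the seed page links to two children.
LINKS = {"seed": ["child-1", "child-2"]}


async def check_all(links):
    responses = asyncio.Queue()
    await responses.put("seed")
    total_requests = 1
    checked = []

    async def recurse(url):
        # Runs as a separate task, so the total is bumped some time later.
        nonlocal total_requests
        for child in links.get(url, []):
            total_requests += 1
            await responses.put(child)

    curr = 0
    while curr < total_requests:
        curr += 1
        url = await responses.get()
        checked.append(url)
        # Scheduled, but not started yet when the loop condition is re-checked.
        asyncio.create_task(recurse(url))
    return checked
```

asyncio.run(check_all(LINKS)) returns ["seed"]: after the first response, curr equals total_requests, so the loop exits before the spawned task has had a chance to count the two children. In the real, multi-threaded program the same window exists, it is just hit less predictably.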
I wasn’t happy about it even at the time:
TBH I’m not super happy with the current impl anymore as I count the links in the queue and then close the channel after all links got checked. It can lead to subtle bugs I think. There must be a better way.
Yes, Matthias from the past, the counter is fragile because:
- total_requests can be bumped after the loop has already decided to exit.
@pawroman gave me a genuinely thorough review here, including a careful analysis of memory usage for the HashSet cache (fine for up to millions of links), a suggestion to use signed depth values to express infinite recursion, and a nudge for integration tests. It was good feedback. It just couldn’t fix the thing that was actually wrong, which was the whole approach to termination.
In September 2021 we decided to do a bigger rewrite: a stream-based architecture (PR #330) to improve concurrency. It changed Collector::collect_links from returning a Vec to returning a Stream, removed the ClientPool abstraction, and reshaped how tasks talked to each other. That was a great improvement as it meant that the collector was lazy and we wouldn’t allocate big Vecs of requests anymore. But it also meant that the recursion branch was borked and got its rug pulled from underneath.
Will put this on hold once again as we started implementing a stream-based approach in #330, which might supersede this branch soon. Sorry to everyone waiting on recursion support to land, but I’d like to get this right instead of merging a buggy solution prematurely.
PR #165 was closed in December 2021. The stream refactor landed and gave us a 35–50% speedup. Nice! Tradeoffs, I guess.
Takeaways
And one honest aside on the language question, because I get asked it a lot: the counting problem here is not Rust’s fault. A Go version with goroutines and channels, or a Python asyncio version, would hit the same off-by-one bugs. The race between “response processed” and “new requests discovered” is inherent to any concurrent recursive crawler. Rust’s Stream trait and the way it plays with ownership made a streaming architecture feel natural, and that’s what invalidated the work. So that’s perhaps a Rust-specific point.
Now that the stream architecture was in place, I took another stab at it. This time, instead of counting requests by hand, I’d feed discovered URLs back through a channel connected to the collector.
The collector would read from an input channel and turn what it received into a stream of requests. Recursion would just mean sending newly discovered URLs into that channel. (Look, a feedback loop!) The stream would close naturally when the channel closed.
I also played with unifying the input type so one method could take either a Vec or a Stream:
pub enum InputType {
    Stream(Pin<Box<dyn Stream<Item = Input>>>),
    Seq(Vec<Input>),
}
It hung. Again. But for a completely different reason this time.
The feedback loop created a circular dependency:
Do you see the problem?
For the collector’s stream to end, the input channel has to close. For the channel to close, all senders have to be dropped. But the recursion handler holds a sender; it needs one to push discovered URLs back. And the recursion handler only stops when there are no more responses, which only happens when there are no more requests, which only happens when the collector’s stream ends. Another circular dependency causing a deadlock.
I said as much at the time:
I had very little time to look at the issue so far, but it hangs because the input channel does not get dropped, leading to a dangling connection. I thought that the channel would be closed (and dropped) automatically once futures::StreamExt::for_each_concurrent finishes.
@untitaker confirmed it and could reproduce the deadlock in even trivial cases:
You want to drop the sender once there’s nothing to process anymore right? But won’t for_each_concurrent hang forever because you didn’t do that yet? (and can’t, because you need the sender for more cloning)
I can repro a deadlock even with time lychee --offline -b . '**/*.htm*' -T1 on an empty directory.
This is the heart of using channels for cyclic data flow: channels use sender-drop as their termination signal, but in a cycle you can never drop all the senders, because each stage needs to hold one to keep the cycle alive.
I took the problem to the Tokio Discord, and the advice that came back was: “Stop using channels for this. Use semaphores with tokio::spawn instead.”
Even ignoring the deadlock, there was a second issue. The new from_chan method benchmarked roughly 30% slower than the existing from method. The extra channel indirection cost something, and it cost it even in the non-recursive case, which is the case basically everyone uses.
Takeaways
- for_each_concurrent looks perfect and isn’t. It processes a stream concurrently but gives you no way to feed items back in.
The channel-cycle deadlock is inherent to any channel-based system. Go channels have the same problem. Closing one means knowing nobody will send again, and a cycle makes that impossible. Erlang/OTP sidesteps it with process monitoring instead of channel semantics. The 30% regression, though, has a Rust angle. Rust’s zero-cost-abstraction culture means people (me included) expect to pay nothing for features they don’t use. In a runtime-heavy language, a 30% regression on an unused path might slide. In Rust, “you don’t pay for what you don’t use” is practically a moral position, and it made that regression a non-starter for me.
I dropped channels for the recursion loop entirely and reached for:
- Arc<Semaphore> to cap concurrency (replacing the channel’s natural backpressure)
- tokio::spawn for each unit of work (replacing for_each_concurrent)
- OwnedSemaphorePermit handed to each task, so work could be “transferred” when spawning a recursive sub-task
The prototype was pretty clean, honestly:
use tokio::sync::OwnedSemaphorePermit;
use tokio::task::JoinHandle;

const MAX_CONCURRENCY: usize = 10;

fn recurse(permit: OwnedSemaphorePermit, i: usize) -> JoinHandle<()> {
    tokio::spawn(async move {
        handle_input(permit, i).await;
    })
}

async fn handle_input(permit: OwnedSemaphorePermit, i: usize) {
    println!("got = {i}");
    if i % 9 == 0 {
        // Hand the permit over to the recursive sub-task.
        recurse(permit, 10).await.unwrap();
    }
}
But I guess you can tell what the problem with it was: it still locked up.
When I tried to bring this model into the real codebase, the ownership requirements got ugly fast. The link checker needs the client config, the cache, the progress bar, the stats, and a handful of other things. To share all of that across spawned tasks, it all wanted to be wrapped in Arc<RwLock<State>>.
I tried this model on the branch, but it gets quite ugly because of ownership and Send.
A semaphore solves the concurrency-limiting problem. It does nothing for the termination problem. With tokio::spawn, there’s no built-in way to know when all spawned tasks — including the ones spawned recursively — have finished. You’d need a separate coordination mechanism, which is to say: you’d be reinventing the counter from Attempt 1, except now spread across an unbounded number of spawned tasks. We’d come full circle to the very thing I was trying to escape.
There’s a subtlety with the permits, too. Swapping for_each_concurrent for raw tokio::spawn loses the bounded concurrency that channels gave us for free. The semaphore adds it back, but you have to manage permits carefully. If a task acquires a permit, spawns a child, and transfers the permit, the parent can’t do more work. If it clones the permit, you can blow past your concurrency limit. Getting the permit lifecycle exactly right is fiddly.
Takeaways
- Arc<RwLock<State>> is a code smell in async Rust. When you start wrapping everything in locks, you’re fighting the ownership model instead of working with it. That can leave a lot of performance on the table since every access is a lock acquisition across all threads.
This was the most Rust-specific failure of the bunch.
The semaphore approach is idiomatic in Go. A sync.WaitGroup plus a semaphore channel, with state shared across goroutines via sync.Mutex, is how you’d do that in Go, because it has green threads and a runtime that manages goroutine lifecycles for you.
But in Rust, the Send + 'static bounds on tokio::spawn, the borrow checker’s aversion to shared mutable state, and the cost of Arc<RwLock<T>> get in the way. Rust made the “just wrap everything in Arc and Mutex” escape hatch painful enough that it became a dead end.
For more than two years, the recursion issue kept collecting comments from people who wanted it.
People suggested workarounds (piping sitemap URLs through xargs was a popular one). The person who originally filed it built their own tool and moved on, which I completely understood.
I was honest about it whenever it came up:
Someone offered a €100 bounty. Others pointed to muffet, which already does recursive checking. lychee wasn’t standing still during these years; a lot of work went into performance, caching, rate limiting, and other features. But recursion was the elephant in the room.
In late 2024, a community contributor, @gwennlbh, picked up the gauntlet. Her plan went back to the channel-based model but with a twist: instead of trying to close channels for termination, she used an Arc<AtomicUsize> counter. Like Attempt 1, but atomic and shared across tasks!
And it looked so elegant:
An Arc<AtomicUsize> to track remaining work: increment when new requests are sent (recursive ones included), decrement when a response is processed, and break out of the receive loop when it hits zero.
This was the most functional attempt yet. It actually worked on real websites:
lychee -R https://endler.dev \
--recursed-domains endler.dev
I was really excited watching it come together, and I tried to give useful design guidance along the way:
lychee-lib’s public API accepted
And then it hit the same wall, from several directions at once.
When recursion discovered a lot of links, the response handler tried to send new requests into the request channel. But if that channel was full (bounded by max_concurrency), the send blocked. A blocked response handler means no responses get processed, which means no request slots free up. Classic backpressure deadlock.
@gwennlbh worked around it by spawning the “send new requests” work in a separate tokio::spawn, decoupling response processing from request sending. It worked, but it meant there was no longer a limit on how many of these background tasks could pile up (and with that, use unbounded memory).
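The deadlock described above can be reproduced in miniature with a bounded channel from the standard library. This is only a sketch under simplifying assumptions: it uses std::sync::mpsc instead of Tokio, a single thread plays both the request consumer and the response handler, and the function name and URLs are made up for illustration. It uses try_send so that the "would block forever" case shows up as an error value instead of actually hanging.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fills a bounded request queue, then tries to enqueue one more discovered
/// link from the only code path that could drain the queue.
/// Returns (send was refused while full, send succeeded after draining).
pub fn feedback_send_on_full_queue(capacity: usize) -> (bool, bool) {
    assert!(capacity > 0, "a zero-capacity queue has nothing to drain");
    let (request_tx, request_rx) = sync_channel::<String>(capacity);

    // Seed requests occupy every slot of the bounded queue.
    for i in 0..capacity {
        request_tx
            .try_send(format!("https://example.com/page/{}", i))
            .expect("queue still has room");
    }

    // The response handler found a new link. A blocking `send` here would
    // wait forever, because this same code path is what calls `recv`.
    let blocked = matches!(
        request_tx.try_send("https://example.com/discovered".to_string()),
        Err(TrySendError::Full(_))
    );

    // Only after a request has been consumed is there room for the new link.
    let _checked = request_rx.recv().expect("queue is not empty");
    let accepted = request_tx
        .try_send("https://example.com/discovered".to_string())
        .is_ok();

    (blocked, accepted)
}
```

Calling feedback_send_on_full_queue(2) shows both halves of the problem: the feedback send is refused while the queue is full and only goes through once something has been drained.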
Because requests are processed in parallel, the same URL could be discovered by multiple pages and sent into the channel before any of them got cached. The cache check happened too late: after the request was already in flight. There was no per-URL synchronization to stop concurrent duplicates:
Because of the parallel nature of the request-to-response task, it seems to me that sending the same request twice to the channel is hard to prevent. I tried adding guards basically everywhere […] and I still seem to get duplicates.
As a stopgap, a dedup check went into Stats::insert, but that only stopped duplicate reporting, not duplicate checking. The real fix would arrive much later, with the HostPool’s per-URI active_requests mutex, but that machinery didn’t exist yet.
The Arc<AtomicUsize> counter is, at heart, the same idea as Attempt 1 — and it brought the same fragility. With Ordering::Relaxed (the weakest memory ordering), increments and decrements across threads could be reordered, so the counter could briefly read zero before the work was actually done. On Wikipedia with --max-depth=0, it would lock up on the very last URL.
Adding subsequent_uris (the list of discovered links) to the Response type meant touching nearly every file that builds or consumes a Response. Every Response::new() call needed two new arguments (vec![] and 0 for the non-recursive case).
To extract links from response bodies, the code built a fresh Collector inline in the checker, sidestepping the configured collector that respects user flags like --exclude, --include, and fragment checking.
After a burst of energy in January 2025, things slowed. Merge conflicts piled up. CI linting rules changed underneath the branch. @gwennlbh switched to Windows and couldn’t get the OpenSSL dependency to build. In March 2025 she wrote, honestly:
even though I was kinda denying it, it’s pretty clear that I’ve lost motivation to keep working on this […] I’m sorry T_T
I didn’t want her to apologize. She got further than anyone, on a hard feature, in a complex async codebase, as a volunteer. Instead, I’m grateful for the time she invested to push things forward.
Takeaways
When you have to add vec![] and 0 to every Response::new() call, that’s a leaky abstraction.
How many of the issues were Rust-specific?
I’d say around half.
The backpressure is simply part of the problem space.
Any concurrent crawler in any language meets that.
The Ordering::Relaxed trap is somewhat Rust-specific in that Rust makes you choose a memory ordering (Go’s sync/atomic does too, but most Go folks reach for sync.WaitGroup instead).
Four attempts in five years. If we take a step back, I think the difficulties can be grouped into a few categories:
Every implementation faced the same question: how do you know when you’re finished?
In a non-recursive pipeline the answer is easy. You’re done when the input stream is exhausted and the in-flight requests have completed. Close the channel sender, drain the receiver, and Bob’s your uncle.
In a recursive pipeline the input stream is never truly exhausted, because every response might create new inputs. You need a separate way to detect quiescence: the state where nothing is in progress and nothing new will be generated.
Turns out, the problem has a name in distributed systems: ✨ distributed termination detection. ✨
The classic solutions (Dijkstra–Scholten, token passing) just don’t map well onto Tokio’s channel-based world.
lychee’s architecture is fundamentally a DAG. Inputs flow one direction through the stages. Recursion introduces a cycle. And cycles in channel-based systems deadlock, because channels use “all senders dropped” as their done signal, and in a cycle that condition is never met on its own.
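The "all senders dropped" rule is easy to observe with the standard library’s channels, which follow the same convention as Tokio’s. The sketch below is an illustration, not lychee code: feedback_tx stands in for the back-edge of a recursive pipeline, and the 20 ms timeout is an arbitrary choice to keep the demo from hanging.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::time::Duration;

/// Returns what the receiver observes (a) while a feedback sender is still
/// alive and (b) after the last sender has been dropped.
pub fn observe_channel_shutdown() -> (RecvTimeoutError, RecvTimeoutError) {
    let (input_tx, rx) = channel::<String>();
    // The back-edge: a later stage that may still produce new inputs.
    let feedback_tx = input_tx.clone();

    // The original input side is finished and closes its end.
    drop(input_tx);

    // The queue is empty, but the channel stays open because of `feedback_tx`.
    // A plain `recv()` would block here indefinitely.
    let while_cycle_open = rx.recv_timeout(Duration::from_millis(20)).unwrap_err();

    // Only when the feedback sender goes away does the channel report "done".
    drop(feedback_tx);
    let after_all_dropped = rx.recv_timeout(Duration::from_millis(20)).unwrap_err();

    (while_cycle_open, after_all_dropped)
}
```

The first result is Timeout (nothing to read, yet not finished) and the second is Disconnected (finished). In a cycle, the stage holding the feedback sender is itself waiting on the receiver, so the second state is never reached on its own.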
Bounded channels give you natural backpressure: if the checker is slow, the sender blocks until there’s room. Which is lovely, until you want recursion. Now the response handler needs to send into the request channel. If that channel is full, the response handler blocks; if it blocks, no responses are consumed; if no responses are consumed, no request slots free up.
We check links concurrently, which means multiple pages can hold the same link. Without synchronization, several tasks discover the same URL and submit it before any of them can mark it “seen.” Through attempts 1–4 the cache didn’t save us, because cache entries were written after checking, not before submission.
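One way to close that window is to make "check whether it was seen" and "mark it seen" a single step that happens before submission. The following is a minimal sketch of that idea using threads and a mutex-guarded set; the Seen type and the function names are hypothetical and this is not how lychee’s cache or HostPool is implemented.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical frontier guard: a URL may be submitted only by the caller
/// that wins `claim`. Check and mark happen under one lock, so there is no
/// gap between "is it seen?" and "mark it seen".
#[derive(Clone, Default)]
pub struct Seen(Arc<Mutex<HashSet<String>>>);

impl Seen {
    pub fn claim(&self, url: &str) -> bool {
        // `HashSet::insert` returns true only if the value was not present.
        self.0.lock().unwrap().insert(url.to_string())
    }
}

/// Simulates `workers` pages discovering the same link at the same time and
/// counts how many of them would be allowed to submit it.
pub fn concurrent_submissions(workers: usize, url: &'static str) -> usize {
    let seen = Seen::default();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let seen = seen.clone();
            thread::spawn(move || seen.claim(url))
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|won| *won)
        .count()
}
```

No matter how the threads interleave, exactly one of them wins the claim, so the URL is submitted once. Writing the entry after the check has finished, as the cache did, leaves every concurrent discoverer free to submit in the meantime.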
Recursion-awareness wants to live “everywhere.” Responses need to carry discovered links, Requests need a depth, the collector needs to understand recursive inputs, stats and formatters need to handle duplicates.
I think this is the question people reading my blog really want answered, so let me be direct. My honest estimate is… about 30%? The termination problem, the cycle problem, and the backpressure problem are all just part of the problem space. Any concurrent recursive crawler, be it written in Go, Python, Java, or Erlang, has to solve them. At some point, Scrapy, Colly, and the other mature crawling frameworks all had to do distributed termination detection and backpressure management.
What Rust adds is friction at the implementation level:
Send bounds make it harder to share state across spawned tasks. In Go you capture variables in a goroutine closure and move on. In Rust everything in async-land wants to be Arc-wrapped and Send + 'static.
context.Context gives you an orthogonal cancellation mechanism that Tokio channels don’t natively have. (In Tokio, you’d use a CancellationToken for that.)
But on the other side, Rust also prevented a lot of issues:
Put another way, Rust made the wrong approaches fail loudly and painfully (with compiler errors, but also with deadlocks in tests) and made the right approach more solid and ergonomic.
Despite all the failed attempts, the ground has quietly shifted under this problem in 2025–2026. A bunch of work, most of it not even about recursion, has made a real implementation finally look within reach.
Recursion without rate limiting is dangerous. Gwenn found that out firsthand by accidentally DDoS’ing her own WiFi router while recursively checking Wikipedia. 😬 Per-host rate limiting, which got merged in PR #1929, makes recursive crawling respect server limits. I previously waved this off as “out of scope” but it’s super important in practice.
The underlying issue (#1605) was one I opened on January 6, 2025, the same week PR #1603 (Attempt 4) opened. That timing was no accident. The moment we tried recursion for real, the lack of per-host rate limiting showed up as a glaring gap. It caused concurrent requests to the same host to trigger 429s, made the cache ineffective under high concurrency due to races (issue #1593), and left the global concurrency settings too coarse for a workload spread across many hosts at once.
The fix introduced a HostPool, which is a per-host request queue with configurable rate limits, delays, and concurrent-request caps.
Each host gets its own bucket with its own settings, configurable via lychee.toml:
[hosts."github.com"]
max_concurrent_requests = 10
request_delay = "100ms"
The HostPool would later become a central abstraction. It’s the very same HostPool that PR #2100 reused to unify input fetching with link checking, which means it’s now the single entrypoint that all HTTP requests flow through.
It’s important for recursion because the HostPool gives us per-host rate limiting, deduplication (via each Host’s per-URI active_requests mutex and HostCache), and caching at the right granularity, which lets recursive crawling stay a good web citizen (respecting rate-limit headers, backing off on 429s).
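To make the request_delay idea concrete, here is a small sketch of per-host pacing. It is not lychee’s HostPool: the HostGate type and its reserve method are hypothetical, it only models the delay (no concurrency cap, no 429 backoff), and the current time is passed in explicitly so the behavior is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-host pacing table: each host has its own "next free slot".
pub struct HostGate {
    delay: Duration,
    next_free: HashMap<String, Instant>,
}

impl HostGate {
    pub fn new(delay: Duration) -> Self {
        HostGate { delay, next_free: HashMap::new() }
    }

    /// Reserves the next slot for `host` and returns how long the caller
    /// should wait, measured from `now`, before sending the request.
    pub fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        let slot = self.next_free.entry(host.to_string()).or_insert(now);
        // A slot in the past means the host is idle and can be used right away.
        let start = if *slot > now { *slot } else { now };
        // Push the host's next free slot one delay further out.
        *slot = start + self.delay;
        start.duration_since(now)
    }
}
```

With a 100 ms delay, three back-to-back reservations for the same host yield waits of 0 ms, 100 ms, and 200 ms, while a request to a different host is not delayed at all. That independence between hosts is what a single global concurrency limit cannot express.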
The single most important recent thing is the WaitGroup primitive, contributed by Kait and merged in PR #2046.
It is one step towards solving the termination problem.
WaitGroup is a mechanism for waiting on a dynamic set of tasks that can themselves spawn more tasks. It’s two pieces:
WaitGroup, a single waiter that fires when all the work is done.
WaitGuard, a cloneable guard held by each task. When the last guard is dropped, the waiter completes.
The key move is that a WaitGuard can be cloned. A task can spawn sub-tasks (recursion!) while preserving the invariant that the WaitGroup only completes once every guard, including the ones held by recursive sub-tasks, has been dropped.
That cleanly solves the termination problem:
let (waiter, guard) = WaitGroup::new();
// Each request carries a guard clone
send_req.send((guard.clone(), request)).await;
// In the response handler, if recursing:
// the guard is cloned for each new request
for new_request in discovered_links {
send_req.send((guard.clone(), new_request)).await;
}
// The original guard is dropped when the response is fully processed.
// When ALL guards are dropped (no more work), waiter.wait() returns.
It’s already wired into lychee’s main check loop. The collect_responses function uses take_until(waiter.wait()) to stop receiving when the work is done. There’s even a comment in the current code anticipating exactly this:
// unused for now, but will be used for recursion eventually. by holding
// an extra `send_req` endpoint, we prevent the natural termination when
// each channel finishes and closes. instead, we rely on the WaitGroup to
// break the cyclic channels.
let _ = send_req;
That’s the missing piece that our previous attempts lacked.
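For readers who want to play with the idea outside of lychee, the guard-and-waiter pattern can be approximated with nothing but the standard library. This is a sketch under stated assumptions, not lychee’s implementation: it uses OS threads instead of async tasks, it borrows the WaitGroup and WaitGuard names purely for illustration, and the "crawl" is a fake binary tree in which every page links to two more pages until the depth runs out.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Waiter half: `wait` returns once every `WaitGuard` clone has been dropped.
pub struct WaitGroup {
    done: Receiver<()>,
}

/// Guard half: each clone represents one piece of outstanding work.
#[derive(Clone)]
pub struct WaitGuard {
    _alive: Sender<()>,
}

impl WaitGroup {
    pub fn new() -> (WaitGroup, WaitGuard) {
        let (tx, rx) = channel();
        (WaitGroup { done: rx }, WaitGuard { _alive: tx })
    }

    pub fn wait(self) {
        // Nothing is ever sent, so `recv` can only return once all guards
        // (senders) are gone and the channel disconnects.
        let _ = self.done.recv();
    }
}

/// Fake recursive crawl: every page above depth 0 discovers two more pages.
fn crawl_tree(depth: u32, guard: WaitGuard, visited: Sender<u32>) {
    visited.send(depth).unwrap();
    if depth > 0 {
        for _ in 0..2 {
            let (child_guard, child_visited) = (guard.clone(), visited.clone());
            thread::spawn(move || crawl_tree(depth - 1, child_guard, child_visited));
        }
    }
    // `guard` is dropped here, after the children received their own clones.
}

/// Starts a crawl, waits for all recursive work, and counts visited pages.
pub fn crawl_and_count(depth: u32) -> usize {
    let (waiter, guard) = WaitGroup::new();
    let (visited_tx, visited_rx) = channel();
    thread::spawn(move || crawl_tree(depth, guard, visited_tx));
    waiter.wait();
    visited_rx.try_iter().count()
}
```

crawl_and_count(3) visits 15 pages, and the caller never has to know that number in advance. Each task hands clones to its children before its own guard goes away, so the guard count cannot reach zero while work is still being created.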
PR #2100 unified input URL fetching with the link checker’s HostPool. Before this, CLI input URLs went through a separate reqwest::Client that didn’t share config (user-agent, rate limiting, TLS settings) with the checker. That caused real bugs (Wikipedia returning 403 for input URLs because no user-agent was set).
After it, input fetching and link checking go through the same pool. For recursion this matters because recursively discovered pages need to be fetched and parsed, and they should use the same client config as everything else.
Sitemap support is a partial solution to a lot of recursion use cases. By parsing sitemap.xml, lychee can discover every page on a site without crawling recursively at all. It’s not a replacement for true recursion (it doesn’t help sites without sitemaps, and it won’t find dynamically linked pages), but it unblocks a lot of use cases.
With all that in place, here’s what’s left. The striking part is how much of it is already done:
WaitGroup.
Once those are in, the actual recursion is just a handful of lines. When a checked page is on an allowed domain and under the depth limit, grab its content from cache, pull out the links, and send them back through the same pipeline as fresh requests:
if recursive && is_same_domain(&response, &recursion_domains) && depth < max_depth {
let content = resolver.url_contents(response.url()).await?; // cache hit
let links = extractor.extract(&content);
for req in request::create(links, ...) {
send_req.send((guard.clone(), Ok(req))).await;
}
}
The hard parts (knowing when to stop, not deadlocking, not flooding a server) are already solved by work that was never about recursion in the first place. Recursion becomes a by-product of good architecture, not a special case bolted onto a pipeline that was never built for it.
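Stripped of concurrency, the recursion rule itself (check every link, but only expand pages on an allowed domain and under the depth limit) fits in a small breadth-first loop. The sketch below is a sequential illustration and not lychee code: the link graph is an in-memory map standing in for fetched pages, crawl_graph is a made-up name, and matching the allowed host by URL prefix is a deliberate simplification.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first crawl over an in-memory link graph.
/// Returns every URL that would be checked, in the order it was dequeued.
pub fn crawl_graph(
    graph: &HashMap<&str, Vec<&str>>,
    seed: &str,
    allowed_prefix: &str,
    max_depth: usize,
) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut checked = Vec::new();

    seen.insert(seed.to_string());
    queue.push_back((seed.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        // Every discovered link gets checked, including external ones.
        checked.push(url.clone());

        // Only pages on the allowed site and below the depth limit are expanded.
        if !url.starts_with(allowed_prefix) || depth >= max_depth {
            continue;
        }
        for link in graph.get(url.as_str()).into_iter().flatten() {
            // Mark as seen before enqueueing, so each URL is queued once.
            if seen.insert(link.to_string()) {
                queue.push_back((link.to_string(), depth + 1));
            }
        }
    }
    checked
}
```

Raising max_depth from 1 to 2 on a small graph adds the pages linked from first-level pages, while links found on external pages are checked but never followed.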
For a long time I told myself we’d failed. Four attempts, five years, seemingly nothing shipped.
But writing it all out changed how I see it. Every attempt hit some mix of channel termination semantics, backpressure deadlocks, ownership ergonomics, and distributed termination detection. None of those are lychee problems. They’re hard concurrent-systems problems. We just lacked the vocabulary to talk about them, and while I wasn’t looking, those primitives got built. Sometimes the most important code you write for a feature is the code that never mentions the feature at all.
So no, I don’t think we failed. We made progress by stumbling in the right direction.
Thanks to NLnet for funding the work on lychee, and to everyone who contributed to the recursion effort over the years, whether in code, design feedback, or moral support. It’s been a long road, but we’re closer than ever to the finish line.
Well, to be fair, I still code late at night. But that’s just how I’m wired.
2026-05-21 08:00:00
I had a newsletter on this blog for years, but I didn’t send a single email for a long time. This is the story of how I finally got it back up and running, and what I learned along the way.
A quick note up front, because this caused some confusion: by “hosting my own” I mean I don’t use a newsletter platform. The signup backend and the CLI I use to send issues are mine, and the issues themselves are just markdown files in a git repo. I still use Plunk as the sending backend (so SES, bounces, suppression lists, and unsubscribe pages aren’t my problem). Plunk is open source and I could self-host it, but the deliverability side has enough edge cases that I’m happy to pay someone else to run it. 🙃

For years my setup was a small form on the website pointing at Tinyletter, a small newsletter service that was focused on writers. What I liked about it was the simplicity. I never had to think about email deliverability, bounce rates, suppression lists, SPF, DKIM, DMARC, or any of that. I wrote a thing, hit send, people got it.

It just worked. Then Tinyletter shut down.
A bit of history: Tinyletter was built by Philip Kaplan, reportedly coded on a single Sunday, the 31st of October, 2010.
It got acquired by Mailchimp one year later, and quietly became the de facto home for writers who wanted a personal newsletter without thinking about funnels, segments, or A/B tests.
Then in late 2023, Mailchimp (now part of Intuit) announced they’d shut it down. The official wording was that their “business priorities have evolved” and that they were “laser focused on building tools to serve marketers and help small businesses grow.” Writers were never their core customers.

Just before Tinyletter went dark on February 29, 2024, I made a final backup of my subscriber list, but I didn’t have a plan for what to do with it.
At this point, I became hostile to the idea of using a third-party service. The same story could repeat itself again.
I still looked at all options and bounced off all of them:
People kept asking me when the newsletter was coming back, so I cobbled something together on fly.io. It was a small Rust API, a CSV file with subscribers, and a way to subscribe through the website. The idea was to deal with the sending later, but at least offer a way to sign up for now.
Then the list just sat there.
Turns out, a cold list is a problem all by itself. When you finally do send to a list of people who haven’t heard from you in a long time, mail providers get suspicious and you can get flagged as spam. Suddenly your own newsletter can turn against you.
This was the hardest part by far. I looked into Resend, Postmark, SendGrid, Mailgun, Amazon SES, and many more. All of them were either quite expensive for a small newsletter, had a terrible API, didn’t comply with GDPR regulations, or were way too complicated.
I was about to give up when I found Plunk. It is open source, the pricing scales with your list size, and the API doesn’t fight me. It does the deliverability work I don’t want to think about (SES integration, bounce handling, suppression list, hosted unsubscribe pages). I’m a paying customer now. I’m not affiliated, just a genuinely happy user.
I even sent them a small contribution and they merged it in ten minutes. This made me feel like I was actually part of a community.
The first real newsletter issue went out to a thousand-plus contacts that hadn’t heard from me in ages. I was bracing for a wave of bounces, but it went fine. Bounce rate around 1%, only very few unsubscribes, and no deliverability issues. Wow!
I didn’t do anything fancy: no batching, no slow warmup, no clever subject line. I sent it all at once and let Plunk (well, SES underneath) auto-prune obviously dead addresses via bounce handling. The one thing I did do was lead the first issue with a short, frank reintroduction – something like “hey, you signed up because you read a blog post of mine once, sorry for the silence” – which I think did most of the work in keeping unsubscribes low.
Cost-wise, one send to the full list costs me roughly $1. For a newsletter I send irregularly, that’s nothing.

I realized I could write issues as plain markdown files in a folder, version-controlled, with a small CLI for everything else. That’s where I feel at home. Just me, a cup of hot chocolate, my editor, the terminal, and git. No more web dashboard between me and the writing.
The whole thing lives in a single repo:
newsletter/
├── issues/ # one .md per edition (1.md, 2.md, ...)
├── send/ # the CLI I run locally
└── subscribe/ # tiny HTTP service behind the website signup form
The CLI is called send. Here’s what it can do:
$ send help
Usage: send <COMMAND>
Commands:
new Create a new issue file and open $EDITOR
list List local issues
lint Check links in an issue (or all issues)
test Send a test email to myself
publish Publish the issue to all subscribed contacts
status Show contact-list and deliverability report
prune Delete unsubscribed contacts
send publish 2 shows me a preview, the recipient count, and a y/N prompt before it actually fires anything off.
The subject line gets built automatically as corrode v0.N.0 # <topic> – semver-styled, with the major version stuck at 0 forever as a small joke about projects that never quite reach 1.0.
send status shows me per-campaign deliverability with bounce-rate cells colour-coded against the SES thresholds, plus daily bounces and unsubscribes, so I can spot trouble early.
send lint runs every link in an issue through lychee before I hit publish.
I am a lychee maintainer, so dogfooding it here was an obvious choice and a nice quality-of-life improvement over the old Tinyletter web editor, which had no link checking at all.
The signup form on the website POSTs to the tiny subscribe service, which runs on my server.
It validates the email, drops anything with the honeypot field filled in, and POSTs to Plunk with a subscribe-requested event.
Plunk creates the contact in the unsubscribed state and fires off the transactional confirmation email through its Action workflow.
Only when the recipient clicks the link does Plunk flip them to subscribed.
No webhook back to my side, no callback, no JavaScript on the page.
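The screening step is small enough to sketch. This is an illustration of the logic described above, not the code of my subscribe service: the Signup and Decision types are hypothetical, the honeypot is modeled as a plain string field, and the email check is a deliberately crude shape test, not a full address parser.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Pass the address on to the mail provider.
    Forward,
    /// Honeypot was filled in: silently do nothing.
    Drop,
    /// The address is malformed.
    Reject,
}

pub struct Signup<'a> {
    pub email: &'a str,
    /// Hidden form field that humans leave empty (hypothetical field).
    pub honeypot: &'a str,
}

pub fn screen(form: &Signup) -> Decision {
    // Bots tend to fill in every field, including the invisible one.
    if !form.honeypot.trim().is_empty() {
        return Decision::Drop;
    }

    let email = form.email.trim();
    let looks_valid = match email.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && !domain.contains('@')
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
                && !email.contains(char::is_whitespace)
        }
        None => false,
    };

    if looks_valid {
        Decision::Forward
    } else {
        Decision::Reject
    }
}
```

The honeypot check comes first on purpose: a bot submission is dropped without any feedback about whether its email address looked valid.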
I just push to git, my server detects the change, builds and runs the server crate, and the new version is live.
The running service takes absolutely no CPU or memory.
Plunk needs three things in DNS to send on my behalf: an SPF record (saying SES is allowed to send for the domain), a DKIM key (so SES can sign outgoing mail), and a return-path MX record (so bounces come back somewhere Plunk can read them). All three live under a subdomain. Don’t worry, Plunk tells you exactly how to set this up and you can copy-paste the records into your DNS provider’s dashboard.
The one thing worth not forgetting: do not add Plunk’s optional inbound MX at the apex of your domain. That would steal mail away from whoever currently handles your inbox (mailbox.org in my case), and replies stop landing where you expect.
I forgot that the From: address actually needs to be a real mailbox if you want replies to work.
The first issue went out from an address that didn’t exist as a mailbox.
A kind reader (hey Kevin!) replied to say hi, his message bounced, and he forwarded the bounce notice back to me to let me know.
I created the alias on mailbox.org, and replies have landed in my inbox ever since.
While I was at it, I also collapsed my older endler.dev newsletter and the corrode.dev one into a single list. Both were always written by me, and running two parallel setups never really made sense. Same person on the keyboard, mostly overlapping audience, twice the maintenance.
The merge itself was uneventful: I had a CSV exported from Tinyletter (the original endler.dev list) and another from my fly.io service (the corrode.dev list I’d started collecting when corrode.dev launched). Same format. Both went into Plunk and deduplication was a non-issue. In the first issue I made the framing explicit (one newsletter for all my writing) so nobody had to guess what they were now signed up for.
Going forward, there’s just one newsletter. If any of this isn’t for you, you can always unsubscribe and never hear from me again. No hard feelings.
If you’ve been thinking about doing this yourself: do it. Self-hosting is genuinely easier than it used to be. There are great open source services for almost every piece now. In general, building small things yourself is one of the best ways to actually understand them and to keep owning the parts that matter. That would be its own blog post, so let me know if you want me to write it.
If you’d like a peek at the (somewhat hacky) repo, send me a mail and I’ll send you a link. It’s really not that interesting, but if you’re curious about how it works, I’m happy to share. Or wait until I clean it up a bit and open source it properly, which will just take me another few years to get around to.
And the best part is that you can now test my setup by filling out the form below and subscribing to the newsletter!
2026-05-01 08:00:00
I’ve been refining this blog’s design for two decades now. With each new version, I get a little better at knowing what I want.
Turns out, my designs tend to get simpler with time. At this point, it’s all typography and negative space. I got better at knowing what I value and what I can take away.
Uncluttering brings joy. Maybe that’s a deeply human thing?




Spend enough time with anything, and you’ll develop strong opinions about it. You become obsessed with the details. And everything is somebody’s obsession.
Sneakers. Mechanical keyboards. Coffee.
Watch enough movies and you might start analyzing Dutch angles and obsessing over color grading. You start noticing subtle issues in pacing. You’ll recognize the same actor in different roles, even if they’re just a random extra.
Once you start seeing nuance, you can never unsee it. You’ve developed a strong personal preference.
But at this stage, you’re just a guy with an opinion.
Having preferences isn’t the same as having good taste.
To develop good taste, you have to recognize quality outside your own preferences. It’s seeing how something expresses an idea through deliberate choice, and being able to tell when those choices are honest.
Good taste is rooted in context. It requires understanding the history and craftsmanship behind a thing. Without that, quality flies right by you.
Good taste and expertise are siblings. Both come from caring deeply. Both require knowing why things work. Both demand awareness of the effect each decision has.
But they’re not the same. Taste is recognition. You can have taste in something you can’t make yourself. Expertise is production. You can be an expert in something whether you have taste or not.
What’s interesting is how often one drags the other along. People with taste care so much about the details that they eventually start making the thing themselves. And experts who care about their craft tend to develop taste as a byproduct.
Take Willie Nelson’s guitar, Trigger, a 1969 Martin N-20. Out of context, it’s just a worn-down chunk of wood. It could just as easily be a busker’s guitar. But after 10,000 performances, it has become a part of Willie Nelson.
He could have bought a new guitar a hundred times over. He didn’t. He developed such a strong taste for the sound he wanted that he refused to settle for anything less than Trigger. The same repairman has maintained it since 1977.
Willie demands control over his sound, because the details are what shape it. He didn’t do anything out of the ordinary. He just did the ordinary so well that it became extraordinary.
Is it Willie who shaped the guitar, or the guitar that rounded Willie’s sound? They aged together. They’ll die together.
If you watch an artist like him perform, it looks effortless. That’s because he knows what matters and what doesn’t. He focuses on the details that make a difference and ignores the rest, which makes the task simpler for him. That’s the trick: good taste tells you what to look for, and expertise teaches you how to achieve it — what to emphasize and what to leave out.
Buying an Apple device doesn’t mean you have good taste. Millions own them. For many, it’s just a status symbol. Apple helps you think different in the same way that Air Jordans help you jump higher. (I do love the posters, though.)
In fact, people with good taste often avoid brands. Brands are what kill great products.
The cycle goes like this:
Most products are built for the average customer, so most products settle for average quality. We can’t be experts in everything, so we trust what others say, then wonder why most things suck.
We sense when something’s sub-par, even if we can’t put our finger on it. That’s a little tragic.
Brands don’t care about quality or being honest. They care about money.
It’s harder to develop a unique taste today. We’re surrounded by feeds and algorithms that show us only what we already like. Without exposure to things we don’t like, we can’t discover the things we do.
If we get too comfortable, we stop developing taste at all. Our preferences become narrow and shallow. We just like what everyone else likes, with no way to say why.
The way out is to care about something. Anything! Care so much it becomes part of who you are.
Everyone should have at least one thing they irrationally obsess over. Something they know inside and out. It makes for a more interesting personality. A richer life.
Humans socialize over shared obsessions. We bond over espresso grinders and motorbikes. Suddenly, you can connect with a stranger.
When you create for others, your taste becomes visible. It shows who you are and what you care about. That’s scary. But it’s how you find your people.
I don’t think I will ever stop refining this blog’s design. Maybe I’m not an expert, just a guy with an opinion, but my blog is my zen garden. It brings me joy. Now go out and build your own.
2026-01-26 08:00:00
I was listening to a podcast recently where someone pointed out something curious: machines have been better at playing chess than humans for three decades now, and nobody cares. People still watch human chess players. If anything, chess is experiencing a renaissance.
I don’t know anyone who wants to watch robots play chess. But I know plenty of people who love watching humans play. I’m not a big chess player myself (heck, I could hardly win any game), but I catch myself watching matches on YouTube.
The reason is that the game is way more than moving pieces on a board. There’s a background story to every player and the match is a truly human experience where people don’t only compete against an opponent but also against themselves. That’s what makes it so interesting, not the fact that there’s an algorithm which has 3646 Elo. Humans can still contribute!
And I think it’s the same with writing.
Everyone wants to read personal thoughts from real human beings, but no one writes them anymore. What we get instead is slop, and that’s hardly a good read. The moment I notice I’m reading autogenerated text, I care less.
That’s why I keep writing. My personal blog is a weird mix of ramblings about reviewing code, the best programmers I know, and random thoughts.
But you know what? People are reading it and from time to time I get an email from someone who found one of my articles helpful.
Time and again, when I talk to friends, they share the same experience. My friend Thomas wrote his first article about his Experience with Atlassian and many people found it helpful and reached out. And even if no one does, writing is a joyful hobby and it helps me clear my thoughts.
I believe there has rarely been a better time to start writing.
2025-10-31 08:00:00
Over the years, I’ve gravitated toward two complementary ways to build robust software systems: building up and sanding down.
Building up means starting with a tiny core and gradually adding functionality. Sanding down means starting with a very rough idea and refining it over time.
Neither approach is inherently better; it’s almost a stylistic decision that depends on team dynamics and familiarity with the problem domain. On top of that, my thoughts on the topic are not particularly novel, but I wanted to summarize what I’ve learned over the years.

Building up focuses on creating a solid foundation first. I like to use it when working on systems I know well or when there is a clear specification I can refer to. For example, I use it for implementing protocols or when emulating hardware such as for my MOS 6502 emulator.
I prefer “building up” over “bottom-up” as the former evokes construction and upward growth. “Bottom-up” is more abstract and directional. Also “bottom-up” always felt like jargon while “building up” is more intuitive and very visual, so it could help communicate the idea to non-technical stakeholders.
There are a few rules I try to follow when building up:
When I collaborate with highly analytical people, this approach works well. People who have a background in formal methods or mathematics tend to think in terms of “building blocks” and proofs. I also found that functional programmers tend to prefer this approach.
In languages like Rust, the type system can help enforce invariants and make it easier to build up complex systems from simple components. Also, Rust’s trait system encourages composition, which aligns well with that line of thinking.
The downside of the “build up” approach is that you end up spending a lot of time on the foundational layers before you can see any tangible results. It can be slow to get to an MVP this way. Some people also find this approach too rigid and inflexible, as it can be hard to pivot or change direction once you’ve committed to a certain architecture.
For example, say you’re building a web framework. There are a ton of questions at the beginning of the project:
In a building-up approach, you would start by answering these questions and designing the core abstractions first. Foundational components like the request and response types, the router, and the middleware system are the backbone of the framework and have to be rock solid.
Only after you’ve pinned down the core data structures and their interactions would you move on to building the public API. This can lead to a very robust and well-designed system, but it can also take a long time to get there.
For instance, here is the Request struct from the popular http crate:
#[derive(Clone)]
pub struct Request<T> {
    head: Parts,
    body: T,
}

/// Component parts of an HTTP `Request`
///
/// The HTTP request head consists of a method, uri, version, and a set of
/// header fields.
#[derive(Clone)]
pub struct Parts {
    /// The request's method
    pub method: Method,

    /// The request's URI
    pub uri: Uri,

    /// The request's version
    pub version: Version,

    /// The request's headers
    pub headers: HeaderMap<HeaderValue>,

    /// The request's extensions
    pub extensions: Extensions,

    _priv: (),
}
There are quite a few clever design decisions in this short piece of code:
- The Request struct is generic over the body type T, allowing for flexibility in how the body is represented (e.g., as a byte stream, a string, etc.).
- The Parts struct is separated from the Request struct, allowing for easy access to the request metadata without needing to deal with the body.
- Extensions can be used to store extra data derived from the underlying protocol.
- The _priv: () field has a zero-sized type and exists to prevent external code from constructing Parts directly. It enforces the use of the provided constructors and ensures that the invariants of the Parts struct are maintained.

With the exception of extensions, this design has stood the test of time. It has remained largely unchanged since the very first version in 2017.
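The private-field trick is easy to try in isolation. The sketch below is not the `http` crate's code: `message`, `Head`, and `Head::new` are hypothetical names, and the upper-casing of the method is just an assumed rule to give the constructor something to enforce.

```rust
mod message {
    /// Hypothetical stand-in for a request head; not the `http` crate's type.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Head {
        pub method: String,
        pub path: String,
        // Private, zero-sized field: code outside this module cannot name it,
        // so it cannot build a `Head` with a struct literal.
        _priv: (),
    }

    impl Head {
        /// The only way to create a `Head` from outside this module.
        /// Upper-cases the method at construction time (an assumed rule).
        pub fn new(method: &str, path: &str) -> Self {
            Head {
                method: method.to_ascii_uppercase(),
                path: path.to_string(),
                _priv: (),
            }
        }
    }
}

pub use message::Head;

// Outside `message`, this would fail to compile because `_priv` is private:
// let head = Head { method: "GET".into(), path: "/".into(), _priv: () };
```

Since `()` takes up no space, the extra field costs nothing at runtime. It also leaves room to add fields later without breaking callers, because nobody outside the module could write a struct literal in the first place.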

The alternative approach, which I found to work equally well, is “sanding down.” In this approach, you start with a rough prototype (or vertical slice) and refine it over time. You “sand down” the rough edges over and over again, until you are happy with the result. It feels a bit like woodworking, where you start with a rough piece of wood and gradually refine it into a work of art. (Not that I have any idea what woodworking is like, but I imagine it’s something like that.)
Crucially, this is similar but not identical to prototyping. The difference is that you don’t plan on throwing away the code you write. Instead, you’re trying to exploit the iterative nature of the problem and purposefully work on “drafts” until you get to the final version. At any point in time you can stop and ship the current version if needed.
I find that this approach works well when working on creative projects which require experimentation and quick iteration. People with a background in game development or scripting languages tend to prefer this approach, as they are used to working in a more exploratory way.
When using this approach, I try to follow these rules:
This approach makes it easy to throw code away and try something new. I found that it can be frustrating for people who like to plan ahead and are very organized and methodical. The “chaos” seems to be off-putting for some people.
As an example, say you’re writing a game in Rust. You might want to tweak all aspects of the game and quickly iterate on the gameplay mechanics until they feel “just right.”
In order to do so, you might start with a skeleton of the game loop and nothing else. Then you add a player character that can move around the screen. You tweak the jump height and movement speed until it feels good. There is very little abstraction between you and the game logic at this point. You might have a lot of duplicated code and hardcoded values, but that’s okay for now. Once the core gameplay mechanics are pinned down, you can start refactoring the code.
I think Rust can get in the way if you use Bevy or other frameworks early on in the game design process. The entity component system can feel quite heavy and hinder rapid iteration. (At least that’s how I felt when I tried Bevy last time.)
I had a much better experience creating my own window and rendering loop using macroquad. Yes, the entire code was in one file and no, there were no tests. There also wasn’t any architecture to speak of.
And yet… working on the game felt amazing! I knew that I could always refactor the code later, but I wanted to stay in the moment and get the gameplay right first.
Here’s my game loop, which was extremely imperative and didn’t require learning a big framework to get started:
#[macroquad::main("Game")]
async fn main() {
    let mut player = Player::new();
    let input_handler = InputHandler::new();

    loop {
        // Clear the previous frame before drawing the new one
        clear_background(BLACK);

        // Get inputs - only once per frame
        let movement = input_handler.get_movement();
        let action = input_handler.get_action();

        // Update player with both movement and action inputs
        player.update(&movement, &action, get_frame_time());

        // Draw
        player.draw();

        next_frame().await
    }
}
You don’t have to be a Rust expert to understand this code.
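In every loop iteration, I simply:
- read the player's inputs,
- update the player with those inputs and the frame time,
- draw the player and wait for the next frame.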
It’s a very typical design for that type of work.
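The loop relies on a `Player` type that isn't shown here. As a rough idea of what such a first draft could look like, here is a sketch with no framework dependency. Everything in it is an assumption made for illustration: the `Movement` and `Action` types, the constants, and the physics are hypothetical, and `draw()` is left out because it would need macroquad.

```rust
/// Horizontal input in the range -1.0..=1.0 (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Movement {
    pub x: f32,
}

/// What the player wants to do this frame (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Idle,
    Jump,
}

// Hard-coded tuning values: exactly the kind of thing you tweak by hand
// while sanding down. Screen coordinates, so negative y means "up".
const MOVE_SPEED: f32 = 200.0; // pixels per second
const JUMP_VELOCITY: f32 = -400.0; // pixels per second
const GRAVITY: f32 = 900.0; // pixels per second squared
const GROUND_Y: f32 = 0.0;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Player {
    pub x: f32,
    pub y: f32,
    pub vy: f32,
}

impl Player {
    pub fn new() -> Self {
        Player { x: 0.0, y: GROUND_Y, vy: 0.0 }
    }

    /// Advances the player by `dt` seconds, matching the call in the loop.
    pub fn update(&mut self, movement: &Movement, action: &Action, dt: f32) {
        self.x += movement.x * MOVE_SPEED * dt;

        // Only allow a jump while standing on the ground.
        let on_ground = self.y >= GROUND_Y;
        if *action == Action::Jump && on_ground {
            self.vy = JUMP_VELOCITY;
        }

        // Apply gravity, then stop the fall at ground level.
        self.vy += GRAVITY * dt;
        self.y += self.vy * dt;
        if self.y > GROUND_Y {
            self.y = GROUND_Y;
            self.vy = 0.0;
        }
    }
}
```

The constants at the top are the knobs: changing `JUMP_VELOCITY` or `GRAVITY` and rerunning is the whole iteration cycle. Nothing here is abstracted, and that's the point at this stage.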
If I wanted to, I could now sand down the code and refactor it into a more modular design until it’s production-ready. I could introduce a “listener/callback” system to separate input handling from player logic, a scene graph to manage multiple game objects, or an entity component system to manage game entities and their components. But why bother? For now, I care about the game mechanics, not the architecture.
Both variants can lead to correct, maintainable, and efficient systems. There is no better or worse approach.
I found that most people gravitate toward one approach or the other. However, it helps to be familiar with both approaches and know when to apply which mode. Choose wisely, because switching between the two approaches is quite tricky as you start from different ends of the problem.