MoreRSS

site iconTaylor TroeshModify

Author of essays on learning, time, design, and humor, shares insights through scrapscript and blogs.hn.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Taylor Troesh

Mercury Personal Banking: First Impressions

2026-03-03 08:00:00

I've been a reluctant Bank of America customer for over a decade. My parents chose BofA, so I chose BofA. Migrating to Chase or Wells Fargo is more of the same -- not worth the switching cost.

Am I really a "customer" when they charge -0.01% interest to hold my money?

There was a problem processing your request.

BofA is clunky. Their physical branches seem simultaneously overstaffed and understaffed. Everybody there is cordial yet confused. I would never visit their physical locations if their app worked, but alas, their app is crap. I cannot open/close accounts, I cannot reliably cash checks, I cannot easily transfer money -- the software might just be ornamental.

But it ain't 2010 anymore. We now have branchless banks like Ally, SoFi, and maybe even Robinhood. Online-only banking alternatives offer 3%+ APY in lieu of physical locations. According to science, paying region-locked human staffers to occupy an expensive retail space full of money costs a fortune.

Sometimes these banks are technically not banks -- they're "financial services companies with trusted banking partners".

I use Mercury for business banking. It's great. When I discovered that Mercury offers personal banking, I was cautiously optimistic. They built a successful B2B product, but companies usually botch expansions from B2B into B2C.

Oracle's graveyard of B2C products remains a trove of cautionary tales.

My wife and I opened a joint account in minutes. Mercury onboarded us individually and then instantly approved us. I transferred the money via their BofA Plaid integration -- no routing numbers needed, thank you sir. Smooth.

Bonus points: Mercury did not send me a trillion "PLEASE TAKE OUR SURVEY" emails.

I'm eager to test the following features after our money lands:

  • enforce spending limits on debit cards
  • automate stuff via the Mercury API
  • schedule monthly rent payments to our landlord
  • match receipts via email (for which I plan to automate forwarding from my FastMail account)
  • filter transactions by merchant, user, etc.
  • connect to the official Mercury MCP server

If you can survive without physical branches, consider parking your money in Mercury too.


Explore Chaotic Autonomous Systems In My Game Prototype

2026-03-02 08:00:00

I had a nifty game idea while Ivan Reese was trying to explain Death Stranding 2 to me. Here's the gist:

  • In the beginning, you control a lone robot in an isolated corner of a vast dark map.
  • Your robot randomly follows roads. It passes refineries, mines, and shops. You cannot instruct the robot where to go, but you can instruct it to grab/drop resources as it meanders by.
  • By transporting ore to refineries, and then trading refined metal at shops, you can purchase repair kits and other modules.
  • Repair kits allow you to revive broken robots and recruit them to your team.
  • Robots navigate the map according to their navigation module(s): some robots will follow mining routes, others will seek conflict, others will follow friends.
  • Reprogram robot behavior by equipping new modules. You can acquire new modules via scavenging, engineering, trade, and theft.
  • As you explore the map, you discover the Eastern forest kingdom and the Western seaside kingdom.
  • The roads connecting these kingdoms are brimming with activity as military skirmishes ramp up into full-blown war.
  • How will you participate in the conflict? Will you become a kingmaker? Broker peace? Rule the world?

In its current form, Last Mile is fun for almost five whole minutes. Try it yourself. There's definitely a great game somewhere in here, but I'm not sure if it's worth pursuing. I'd love to hear what y'all think! Feel free to email me at [email protected].

Lil' Fun Langs' Guts

2026-03-01 08:00:00

I'm still thinking about those lil' fun langs. How do they work? What's inside them? Do I need my pancreas? What if I don't want to normalize my IR? Is laziness a virtue?

Haskell-esque languages may look alike, but they differ across many dimensions:

Most implementations use standard compilation phases:

  1. Lexing: Source → Token stream
  2. Parsing: Tokens → Surface AST
  3. Desugaring: Surface AST → Core AST
  4. Type Inference: Core AST → Typed AST
  5. Pattern Match Compile: Typed AST → Case trees
  6. Normalization (ANF/K): Typed AST → Normalized IR
  7. Optimization: Normalized IR → Normalized IR
  8. Closure Conversion: Normalized IR → Closure-explicit IR
  9. Code Generation: Closure IR → Target (asm/bytecode/C/LLVM)
  10. Register Allocation: Virtual regs → Physical regs (if native)
  11. Runtime System: GC, primitives, entry point

Strict vs. Lazy

In strict evaluation, arguments are evaluated before being passed to a function. In lazy evaluation, arguments are only evaluated if their value is actually needed; the result is cached, so the work happens at most once.

-- lazy eval returns `3` without applying `foo`
length [ 1, foo 2, 4 ]
Aspect Strict (ML, OCaml) Lazy (Haskell)
Normalization ANF / K-normal form STG / thunks required
Closure conversion Standard flat closures Closures + thunks + update frames
Code generation Straightforward Requires eval/apply or push/enter
Memory management Values are always evaluated May contain unevaluated thunks
Tail calls Simple (jump) Complex (enters, updates)
Debugging Easy (call stack is meaningful) Hard (thunks obscure control flow)
Runtime complexity Simpler (~200 LOC C) More complex (~500–2000 LOC C)

Strict evaluation is the simple choice. If you want laziness, Peyton Jones's STG machine is the standard approach. MicroHs sidesteps the STG machine by compiling directly to combinatory logic with graph reduction.

Lazy evaluation also unlocks infinite collections — you can define an infinite list and consume only what you need.

Curried vs. Bland

Style Examples Implementation cost
Curried Haskell, Ben Lynn, MicroHs Free in combinator backends; native backends need arity analysis to avoid allocating a closure per argument
Bland MinCaml, OCaml (internally), Grace, EYG Simpler codegen -- multi-arg functions are just functions that take tuples or multiple params

In a curried language, f x y is ((f) x) y: two function applications. If your backend doesn't detect that f always takes two arguments (arity analysis), you pay for a heap allocation on every multi-argument call.

Bootstrapped vs. Hosted

I tried to teach myself to play the guitar. But I'm a horrible teacher — because I do not know how to play a guitar.

Mitch Hedberg

Most compilers are written in an existing language (e.g. C, Rust, Haskell, OCaml) and lean on that host's ecosystem for parsing libraries, build tools, and package management.

A bootstrapped compiler compiles itself. You write the compiler in the language it compiles, then use an earlier version of the compiler (or a minimal seed runtime) to build the next version. Your language becomes self-sustaining; the compiler is its own test suite.

There are many exemplary self-hosted languages to study:

  • MicroHs is a Haskell compiler that compiles Haskell to combinators. The combinator reducer is implemented in C. The compiler is written in Haskell and can compile itself. Bootstrapping requires only a C compiler -- no pre-existing Haskell installation.
  • Ben Lynn starts with a minimal runtime in C (~350 LOC), then constructs increasingly capable compilers, each written in the subset that the previous one can compile. Each stage is ~100–300 LOC of the language being defined. The total chain is ~2000 LOC + 350 LOC C.
  C runtime (350 LOC)
    → compiler₁: lambda calculus + integers
    → compiler₂: + let, letrec, ADTs
    → compiler₃: + type inference
    → compiler₄: + pattern matching
    → compiler₅: + type classes
    → ...
    → compilerₙ: near-Haskell-98
  • Newt is a dependently typed language whose compiler is written in Newt, targeting JavaScript. It bootstraps by keeping the generated JS checked in. This works best when your target is a high-level runtime (JS, JVM) rather than native code.

Interpreted vs. Compiled

An interpreter executes the program directly by walking its AST or stepping through bytecode. A compiler translates the program into another language (e.g. x86, C, JS) and lets that target handle execution.

The boundary here is blurry. Bytecode VMs compile to an intermediate form. "Transpilers" compile to source code rather than machine instructions.

Strategy Examples LOC estimate Trade-off
Tree-walking interpreter PLZoo poly, Eff, Frank, Grace, 1ML 50–200 Simplest. No codegen, no runtime. Slow (10–100× native)
Bytecode VM OCaml (ZINC), Tao, PLZoo miniml 200–500 Middle ground. Portable, reasonable speed. Write ~30–50 instructions
Native compilation MinCaml, mlml, AQaml 500–1500 Fast execution, but you own register allocation, calling conventions, ABI
Transpile to C Koka, Scrapscript, Chicken, Austral 200–500 Best of both worlds -- portable native speed, C compiler does the hard parts
Transpile to JS/Go Newt, SOSML, Borgo 200–400 Web/ecosystem deployment, but you inherit the target's performance model
Combinator reduction Ben Lynn, MicroHs 100–300 No closures, no registers. Graph reduction evaluator in C. Simple but slow

Lil' fun langs are usually interpreters. Without compilation, you can skip closure conversion, register allocation, and runtime systems. The leap from interpreter to compiler costs ~500–2000 LOC.

Nominal vs. Structural Types

type Meters  = Int
type Seconds = Int

-- Nominal:     Meters ≠ Seconds  (different names)
-- Structural:  Meters = Seconds  (same shape)
Style Examples Consequence
Nominal OCaml, Haskell, Austral Name is identity -- same shape doesn't mean same type
Structural EYG, Grace, TypeScript, Simple-sub Shape is identity -- same fields/variants means same type

Most ML-family languages are nominal for algebraic data types but structural for records (if implemented). Row polymorphism (EYG, Grace, Koka) is inherently structural -- it acts on "any record with at least these fields." Simple-sub goes further: union and intersection types, with principal inference intact.

Pretty vs. Ugly Errors

-- Ugly:
Error: type mismatch: int vs string

-- Pretty:
 3 │ let x = 1 + "hello"
   │             ^^^^^^^^
Error: I expected an `int` here, but got a `string`.
  The left side of `+` is `int`, so the right side must be too.

Pretty errors cannot be achieved with a coat of paint. To point at a line/region of code, you must thread source locations through every compiler phase. A minimum viable error system:

  1. Source spans on every AST node. Every expression, pattern, and type carries { file, start_line, start_col, end_line, end_col }. This costs one extra field per node.
  2. Preserve spans through desugaring. When you lower where to let, the new let node inherits the span of the where.
  3. Preserve spans through type inference. When unification fails, you need the spans of both conflicting types.
  4. Format errors with context. Show the source line, underline the relevant span, explain the mismatch.
Quality Examples Cost
Elm-tier Elm, Austral Purpose-built error messages per failure mode. Highest effort, best UX
Good enough Tao, Ante, OCaml Source spans + generic formatting. Covers 90% of cases
Positional MinCaml, most small compilers Line numbers but no span highlighting or explanation
De Bruijn indices Elaboration Zoo (intentionally) Variable names lost -- fine for research, bad for users

Lexing

Approach Used by LOC estimate Notes
Hand-written recursive MinCaml (Rust port), Tao, Ante 100–300 Full control, best errors
ocamllex / mlllex MinCaml (original), HaMLet, PLZoo 50–100 Standard for OCaml/SML hosts
Alex (Haskell) MicroHs, many Haskell-hosted 50–100 Standard for Haskell hosts
Parser combinator (integrated) Ben Lynn, some educational 0 (part of parser) Lexerless parsing

Optional enhancements:

  • Layout/indentation sensitivity (Haskell-style offside rule): Ben Lynn implements this in later bootstrapping stages. MicroHs includes full layout parsing. Adds 100–300 LOC. The algorithm is well-described by the Haskell Report's Section 2.7.
  • Unicode identifiers: Most small compilers skip this entirely. Koka supports it.
  • Interpolated strings: Syntax like "hello ${name}" is not standard in ML-family, but some newer languages add it.

Parsing

Parsing converts the flat token stream into a tree. The surface syntax is parsed into a concrete syntax tree (CST) or directly into an abstract syntax tree (AST). ML-family languages have a well-behaved grammar that is almost LL(1).

Approach Used by LOC estimate Notes
Recursive descent + Pratt/precedence climbing MinCaml (Rust port), Tao, Ante 200–500 Best error messages, easiest to extend
ocamlyacc / mlyacc (LALR) MinCaml (original), HaMLet 100–200 (grammar file) Standard, but poor error recovery
Parser combinators (Parsec-style) Ben Lynn, MicroHs, PLZoo 100–400 Elegant, compositional, backtracking
PEG / Packrat Rare in ML-family 100–300 Linear time guarantee

Every subsequent phase transforms this type. In ML-family languages, the AST typically looks like:

type expr =
  | Lit of literal                    (* 42, 3.14, "hello", true *)
  | Var of name                       (* x *)
  | App of expr * expr                (* f x *)
  | Lam of name * expr                (* fun x -> e *)  (or \x -> e)
  | Let of name * expr * expr         (* let x = e1 in e2 *)
  | LetRec of name * expr * expr      (* let rec f = e1 in e2 *)
  | If of expr * expr * expr          (* if c then t else f *)
  | Tuple of expr list                (* (a, b, c) *)
  | Match of expr * branch list       (* match e with p1 -> e1 | ... *)
  | Ann of expr * type                (* (e : t) *)

Name Resolution & Desugaring

Before type inference, the surface AST is simplified:

  1. Alpha-renaming: Every binder is assigned a unique identifier to eliminate shadowing. MinCaml's Rust port does this during type checking. Most do this while parsing or during a separate pass.
  2. Fixity resolution: Infix operators are re-associated according to declared precedence and associativity. HaMLet does this as a separate pass. Many small compilers hardcode operator precedence in the parser.
  3. Desugaring: Surface constructs are lowered into core constructs:
    • where clauses → let
    • Guards in pattern matching → nested if
    • do notation (monadic) → >>= chains
    • List comprehensions → concatMap
    • Operator sections → lambdas: (+ 1) becomes fun x -> x + 1
    • Record syntax → positional constructors + accessor functions
    • Type class instances → dictionary passing (elaboration)

Type Inference

This is the heart of an ML-family language. The "standard" algorithm is Hindley-Milner (HM) type inference, specifically Algorithm W or Algorithm J.

Core components:

  1. Type representation: Types are terms built from type variables, type constructors, and function arrows: type ty = TVar of tvar | TCon of string | TArr of ty * ty | TTuple of ty list
  2. Unification: Given two types, find the most general substitution that makes them equal. Implemented as a union-find structure over type variables with occurs-check.
  3. Generalization: At let boundaries, free type variables in a type are universally quantified to produce a polymorphic type scheme: ∀α. α → α.
  4. Instantiation: When a polymorphic name is used, its scheme is instantiated with fresh type variables.
-- Given:
let id = fun x -> x in (id 1, id true)

-- Type inference trace:
-- 1. id : α → α                (infer: x has fresh type α, body is x)
-- 2. generalize: id : ∀α. α → α    (α is free at let boundary)
-- 3. id 1:  instantiate α=β, unify β→β with int→γ, get int
-- 4. id true: instantiate α=δ, unify δ→δ with bool→ε, get bool
-- 5. result: (int, bool)
Approach Used by LOC estimate Notes
Algorithm W (substitution-based) Algorithm W Step-by-Step, PLZoo 150–400 Simplest to understand, compose substitutions eagerly
Algorithm J (mutable refs) MinCaml, most production compilers 100–300 More efficient, uses mutable unification variables
Constraint-based (HM(X)) GHC, some research compilers 500–2000 Separates constraint generation from solving; extensible
Bidirectional type checking Elaboration Zoo, some dependent type systems 200–500 Alternates checking/inference modes; scales to dependent types

But fancy type system features aren't free:

Enhancement Complexity added Used by
Type classes / traits +500–2000 LOC Haskell, MicroHs, Ben Lynn (later stages), Tao
Row polymorphism (extensible records/variants) +300–800 LOC Koka, 1ML, EYG, Grace
Higher-kinded types +200–500 LOC Haskell, Koka
GADTs +500–1500 LOC GHC, OCaml 4.x+
Algebraic effects (typed) +500–1500 LOC Koka, Eff, Frank
Dependent types (full) +1000–5000 LOC Elaboration Zoo, Idris, Lean
Algebraic subtyping (union/intersection) +500 LOC Simple-sub, MLscript
First-class polymorphism (System F) +300–1000 LOC 1ML, MLF
Module system (functors, signatures) +1000–5000 LOC HaMLet, OCaml, 1ML

Other strategies:

  • Polymorphism: Monomorphic type inference only (no quantification). Every type is fully determined. This cuts the type checker to ~100 LOC by eliminating generalization and instantiation entirely. Functions like id x = x get a concrete type at each use site.
  • Elaboration: Modern type checkers increasingly separate elaboration (translating surface syntax to a fully explicit core) from unification (solving type constraints). The Elaboration Zoo demonstrates this cleanly: each stage is a single Haskell file of 200–800 LOC, progressively adding features.
  • Type class desugaring via dictionary passing: Haskell-style type classes are implemented by translating class constraints into explicit dictionary arguments. sort :: Ord a => [a] -> [a] becomes sort :: OrdDict a -> [a] -> [a]. Ben Lynn's compiler and MicroHs both use this approach.

Pattern Match Compilation

With types inferred, pattern matching can be compiled to efficient decision trees or case trees.

Approach Used by LOC estimate Notes
Decision trees (Maranget's algorithm) Most modern compilers, Tao, Ante 200–600 Optimal -- no redundant tests, good code
Backtracking automata Older compilers, simple implementations 100–300 Simpler but can duplicate work
Nested if/switch (naive) Many educational compilers 50–100 Correct but exponentially bad in worst case
Omitted entirely MinCaml, PLZoo poly 0 Only supports if/then/else on primitives
Defunctionalized Some educational compilers 50–150 Sequence of partial functions with fallthrough; simpler but less efficient

Key phases:

  1. Exhaustiveness checking: Warn/error if a match doesn't cover all cases.
  2. Redundancy checking: Warn if a pattern is unreachable.
  3. Guard compilation: Guards add a "backtrack" obligation.
  4. Nested pattern flattening: (Cons (x, Cons (y, Nil))) → sequence of tests.

The canonical reference is Compiling Pattern Matching to Good Decision Trees. Luc Maranget's algorithm produces provably optimal decision trees in terms of the number of tests. OCaml and Rust use this approach.

Normalization

-- Before (nested expression):
f (g x) (h y)

-- After (A-normal form):
let a = g x in
let b = h y in
f a b

Every intermediate value gets a name. Every function argument becomes trivial. Evaluation order is now explicit in the let chain.

Normalization strategies:

Strategy Used by Character
K-normal form (MinCaml's variant of ANF) MinCaml and derivatives Direct-style; names all intermediate values with let
A-normal form (ANF) Flanagan et al. 1993, many modern compilers Essentially the same as K-normal form; the standard name
Continuation-passing style (CPS) Appel's SML/NJ, Rabbit, CertiCoq Every function takes an extra continuation argument; all calls are tail calls
No normalization Ben Lynn Typed AST → combinatory logic directly. Works for graph reduction, not for native codegen
SSA directly Scrapscript Skips ANF/CPS; SSA IR with SCCP + DCE. Lets LLVM/C handle the rest
Monadic normal form Some dependent type systems (Bowman, 2024) Like ANF but uses monadic bind instead of let; cleaner for certain optimizations

Optimization

With the program in normal form, optimization passes can simplify it. In small compilers, optimizations are kept minimal -- the goal is to not be embarrassingly slow, not to compete with GCC.

MinCaml's optimization passes (totaling ~300 LOC):

Pass LOC (MinCaml) Effect
Beta reduction ~50 Inline let x = y in ... x ...... y ...
Let flattening (assoc) 22 let x = (let y = e1 in e2) in e3let y = e1 in let x = e2 in e3
Inline expansion ~100 Replace calls to small functions with their bodies
Constant folding ~50 3 + 47
Dead code elimination ~50 Remove let x = e1 in e2 when x is not free in e2
Common subexpression elimination ~50 (optional in MinCaml, via hash-consing)

These six passes cover 80%+ of the optimization value for a small compiler. They are applied iteratively until a fixpoint is reached (typically 2–3 iterations).

Beyond the basics:

Optimization Complexity Effect
Tail call optimization +50–100 LOC Essential for functional languages; loops are recursive calls
Known-call optimization +50 LOC When the target of a call is statically known, skip closure indirection
Unboxing (specialization) +200–500 LOC Avoid boxing for monomorphic uses of polymorphic functions
Contification +100–300 LOC Convert functions that are always called in tail position to local jumps
Demand analysis (strictness) +500–2000 LOC For lazy languages: determine which arguments are always evaluated
Worker/wrapper transform +200–500 LOC Separate strict args from lazy ones for better codegen
Deforestation / fusion +500–2000 LOC Eliminate intermediate data structures (e.g., map f . map gmap (f . g))
Whole-program optimization varies JHC does this via GRIN; eliminates unused constructors, specializes globally

Closure Conversion

-- Before:
let f = \ x -> x + y

-- After:
let f = 
  { fun = \ env x -> x + env.y
  , env = { y = y } 
  }

The optimized IR still has functions with free variables. Closure conversion makes all functions "closed" -- because hardware doesn't understand lexical scoping. Every function becomes a pair: (code pointer, environment record). The environment captures the function's free variables at the point of definition.

Approach Used by Trade-offs
Flat closures MinCaml, OCaml, most compilers Environment is a flat vector of captured values. O(1) access, one allocation per closure. Standard choice.
Linked/shared closures Some older Scheme compilers Environment is a linked list of frames. Shares structure between closures. More allocation, slower access.
Lambda lifting GHC (selectively), some educational compilers Eliminates closures entirely by adding extra parameters. No heap allocation for the closure itself. But callers must pass more arguments, and call sites must be updated.
Defunctionalization Reynolds (1972), MLton Replace higher-order functions with first-order dispatch on a sum type. Eliminates function pointers entirely. Requires whole-program analysis.
Combinatory logic (bracket abstraction) Ben Lynn, MicroHs Replace lambdas with SKI combinators (or variants). No closures, no environments. Evaluation by graph reduction.

Code Generation

Codegen is wholly determined by your choice of target:

Target Used by LOC estimate Trade-offs
Native assembly (x86-64, ARM, etc.) MinCaml, mlml, AQaml 300–800 Best performance, most work, platform-specific
C source Koka, Scrapscript, Chicken, JHC, Austral 200–500 Portable, leverages C compiler's optimizer, but indirection
LLVM IR Ante, gocaml, Harrop's MiniML 200–500 Good native perf, cross-platform, but large dependency
Cranelift MinCaml (Rust port), some new languages 200–500 Faster compilation than LLVM, good codegen, Rust-native
Bytecode (custom VM) OCaml (ZINC machine), PLZoo miniml 200–500 Portable, simple, but slower execution
JavaScript / Wasm MinCaml-wasm, SOSML, Newt, various 200–400 Web deployment, but limited performance model
Go source Borgo 200–500 Inherit Go's ecosystem, tooling, and concurrency model
Combinatory logic Ben Lynn, MicroHs 100–300 No register allocation needed, but slow execution
Normalizer (no runtime target) Dhall 200–500 "Compilation" = reduce to normal form. No executable output

Register Allocation

Programs use arbitrarily many variables, but CPUs have a fixed number of registers. Register allocation decides which variables live in registers and which spill into memory.

If you target native assembly, you implement this yourself. The backend handles this for you if you target C/LLVM/Cranelift/etc.

Approach Used by LOC estimate Quality
Graph coloring (Chaitin-Briggs) MinCaml, Appel's textbook 200–500 Optimal for most cases, standard
Linear scan Some JITs, simple compilers 100–200 Fast compilation, slightly worse code
Naïve (spill everything) Some educational compilers 50 Correct but terrible performance
Not applicable Compilers targeting C/LLVM/bytecode 0 Delegated to backend

Runtime System

The minimal setup includes:

Component Complexity Notes
Entry point / stack setup 10–30 LOC C Set up initial heap and stack pointers
Garbage collector 100–1000 LOC C See below
Primitive operations 50–200 LOC C/asm I/O, math, string operations
Allocation routine 10–50 LOC Bump allocator (if GC handles collection)
Closure representation part of codegen How closures are laid out in memory

Lil' fun langs allocate frequently -- every closure, every cons cell, every partial application. Without reclamation, you run out of memory fast. You need to prevent garbage from accumulating:

Strategy Used by Complexity Notes
No GC (leak memory) Some educational compilers, MinCaml benchmarks 0 Viable for short-running programs
Cheney copying (semispace) Many small compilers, Appel's textbook 100–300 LOC C Simple, fast, but uses 2× memory
Mark-and-sweep Various 100–300 LOC C Doesn't move objects, no forwarding needed
Reference counting Koka (Perceus), Carp, Swift-like 200–500 LOC No pause times; Perceus achieves it precisely with no overhead via compile-time insertion
Region-based MLKit, some research languages 300–1000 LOC Compile-time memory management, no GC pauses
Arena / stack only Very simple compilers 20–50 LOC Allocate in arenas, free all at once
Ownership / affine types Rust, Carp, Lean 4 0 (compile-time) No runtime GC needed, but restricts the language

If your language has algebraic effects (Eff, Frank, Koka, Ante), the runtime needs support for delimited continuations or a CPS-transformed calling convention. Effect handlers essentially require a second stack or a segmented stack to capture continuations. Koka handles this via evidence-passing; Eff and Frank use interpretation.

My Phone Will Spam You If I Fail To Exercise By 3PM

2026-02-26 08:00:00

screenshot of convo

That's right. My phone sends annoying text messages to my friends if I don't log a workout by 3PM.

Try it yourself. To add friends as spam targets, write "Tattle." somewhere in their contact notes. Use "Automations" in the Shortcuts app to trigger it on a recurring schedule.

It's strange how this motivates me -- I'm not seeking encouragement nor validation here. My brain simply converts the situation to "I must do pushups to save my friends from my spam robot".

Whatever works.


screenshot of my Workout Tattler

Math Milestones

2026-02-25 08:00:00

Lil' Fun Langs

2026-02-20 08:00:00

LOC Host HM ADTs Match Cl. Target
Hirrolot's CoC src ~70 OCaml Interpreter
Harrop MiniML src ~100 OCaml LLVM → native
Algorithm W src ~300 Haskell Type checker only
tomprimozic/type-systems src ~300 OCaml Type checker only
lambda-calculus-hs src ~200–900 Haskell Interpreter
THIH src ~429 Haskell Type checker only
Simple-sub src ~500 Scala Type checker only
PLZoo poly src ~500 OCaml Interpreter
EYG src ~500 Gleam Interpreter
Pico-ml src ~500 TypeScript WebAssembly
TinyML src <700 SML Interpreter
Eff src ~1–2K OCaml Interpreter
Frank src ~1–2K Haskell Interpreter
Grace src ~1–3K Haskell Interpreter
Hackett src ~1–3K Racket Racket runtime
Scrapscript src ~1–3K Python C/WASM/Cosmo native
MinCaml src ~2,000 OCaml x86/SPARC/PPC native
Ben Lynn src ~2,000 Haskell/C Combinators → C VM
1ML src ~3–5K OCaml Interpreter
mlml src ~3–5K OCaml x86-64 native
Dhall src ~4K Haskell Normalizer
Ante src ~5–10K Rust Cranelift → native
Tao src ~5–10K Rust Bytecode interpreter
Austral src ~5–10K OCaml C
AQaml src ~5–8K OCaml x86-64 native
Borgo src ~5–10K Rust Go source
polytt src ~5–10K OCaml Interpreter
Newt src ~7K Newt JavaScript
HaMLet src ~10–15K SML Interpreter
SOSML src ~10–15K TypeScript Browser interpreter
MicroHs src ~15–30K Haskell/C Combinators → C/JS

I adore small programming languages. Iota is two combinators. tinylisp is 99 lines of C. milliForth is 340 bytes. Fractran multiplies fractions. Oh, K?

I've encountered tiny implementations of Forth, Lisp, C, Prolog, etc., but never "milliHaskell".

Yes, I'm still slowly working on scrapscript.

ML-style languages carry a pungent monad odor that attracts mathochists. Notable examples include Haskell, Elm, F#, Scala, and OCaml. They're "Lambda Calculus with syntactic sugar", i.e. functional and statically-typed. Most implementations extend Hindley-Milner type inference with algebraic data types, pattern matching, and closures:

Feature LOC Dependencies References
Integer arithmetic ~50 Parser, codegen MinCaml
Floating-point ~100 Parser, codegen (SSE/NEON) MinCaml
Booleans + if/then/else ~50 Parser, codegen Everything
Let bindings ~30 Parser, normalization Everything
First-class functions (closures) ~200 Closure conversion, runtime MinCaml
Recursive functions (let rec) ~50 Type inference (occurs check), codegen MinCaml
Tuples ~100 Parser, type inference, codegen MinCaml
Arrays ~100 Parser, runtime (bounds checking) MinCaml
Monomorphic type inference ~100 Unification MinCaml
Polymorphic type inference (HM) ~300 Generalization, instantiation Algorithm W, PLZoo
Algebraic data types ~200–400 Parser, type checker, runtime (tagging) HaMLet, Tao
Pattern matching (basic) ~200 Exhaustiveness check, case trees Tao, Ante
Pattern matching (optimized) ~400–600 Maranget's algorithm OCaml, Rust
Type classes ~500–2000 Dictionary passing, instance resolution MicroHs, Ben Lynn
Modules (basic) ~500–1000 Namespace management HaMLet
Modules (functors/signatures) ~2000–5000 Type-level computation HaMLet, 1ML
Row polymorphism ~300–800 Extended unification EYG, type-systems
Algebraic effects ~500–1500 Effect typing, runtime support Eff, Frank, Ante
Algebraic subtyping ~500 Polar types, biunification Simple-sub
Linear types ~600 Linearity checker Austral
Lazy evaluation ~300–500 Thunks, memoization runtime MicroHs, Ben Lynn
Garbage collection (Cheney) ~200 Runtime system Most
Tail call optimization ~50–100 Codegen (jump instead of call) MinCaml
Inline expansion ~100 Normalization pass MinCaml
Dead code elimination ~50 Free variable analysis MinCaml
Totality checking ~300–500 Coverage analysis, termination checker Tao, Dhall

Further reading:

  • Write You a Haskell (and sequel): builds a Haskell subset incrementally: lambda calculus → STLC → HM inference → ADTs → pattern matching → type classes → STG → LLVM.
  • Implementing Functional Languages: a tutorial by Simon Peyton Jones & David Lester: complete implementations of template instantiation, G-Machine, TIM, and parallel G-Machine for a lazy Core language. Reimplemented in C++ with LLVM by Daniel Fedorin.
  • The ZINC experiment: the foundational paper behind OCaml's bytecode compiler. The ZINC abstract machine uses ~140 instructions and 7 registers. Implementations include OMicroB (running OCaml bytecode on PIC18 microcontrollers with <10KB RAM) and HardCaml-Zinc (hardware implementation).
  • Elaboration Zoo: progressive dependent type checking implementations, each a single Haskell file of 200–800 lines, from basic NbE through holes, implicit arguments, and first-class polymorphism. The best resource for understanding modern elaboration. Its companion smalltt (~1–2K LOC Haskell) is a complete dependent type elaborator with normalization-by-evaluation.
  • Modern Compiler Implementation in ML: the Tiger language compiler covers every phase from lexing through graph-coloring register allocation in ~5,000–8,000 LOC of SML. Multiple GitHub implementations target x86-64 and RISC-V.

If you want a milliHaskell, all your inspiration/ingredients are right here.


Hirrolot's CoC

🤖 The most extreme capability-to-size ratio in this list — a complete Calculus of Constructions (the type theory at the top of the lambda cube) with bidirectional typing, dependent function types, and a type-in-type universe, all in a single OCaml gist of ~60–80 lines. It can express length-indexed vectors and other dependently typed programs. Not ML-family per se, but it demonstrates that full dependent types need not be complex to implement.

Harrop MiniML

🤖 MiniML demonstrates the absolute floor for a native-code ML compiler. Using Camlp4 for parsing and OCaml's LLVM bindings, it supports integer arithmetic, conditionals, and recursive first-order functions. Xavier Leroy noted the critical caveat: this is not truly "Mini-ML" since it lacks higher-order first-class functions — adding closures and garbage collection would significantly expand the codebase. Still, it shows what LLVM enables in ~100 lines.

Algorithm W

🤖 Algorithm W Step by Step by Martin Grabmüller (~300 LOC, literate Haskell) is the canonical educational implementation of Algorithm W for Hindley-Milner type inference. Self-contained, well-commented, and widely referenced — this is where most people first implement HM inference.

tomprimozic/type-systems

🤖 A collection of standalone implementations of several inference algorithms in OCaml (~300–600 LOC total): basic Algorithm W, row polymorphism (the technique foundational to Elm's original type system), and HMF (first-class polymorphism with partial inference). Each variant is self-contained in a single directory. Where Algorithm W Step by Step teaches you one algorithm well, this repository shows you what changes when you swap in more powerful type system features.

lambda-calculus-hs

🤖 A progressive collection of single-file lambda calculus implementations in Haskell (~200–900 LOC each) by Solomon Bothwell. Starts with simply typed evaluation and builds incrementally through bidirectional typechecking, normalization by evaluation (NbE), System T, records with depth subtyping, and nominal inductive types with dependent pattern matching. Each implementation is self-contained. Where tomprimozic/type-systems varies the inference algorithm, this repository varies the type system while keeping bidirectional checking as the constant.

THIH

🤖 Typing Haskell in Haskell by Mark P. Jones is the definitive executable specification of Haskell 98's complete type system in just 429 lines of core Haskell. It covers kinds, qualified types, type classes, pattern matching types, binding groups, mutual recursion, and defaulting. For context, the Hugs type checker implementing the same semantics spans 90+ pages of C. THIH is a type checker only (no evaluation), but its density of specification per line of code is unmatched.

Simple-sub

🤖 ~500 LOC of Scala. Lionel Parreaux's clean reimplementation of Stephen Dolan's MLsub — algebraic subtyping that adds union and intersection types to Hindley-Milner while preserving principal types. No annotations required. The original MLsub won POPL 2017; Simple-sub distills it into an ICFP 2020 Pearl that's small enough to read in one sitting. The ancestor of MLscript, which grows the idea into a full language with OOP and TypeScript interop.

PLZoo poly

🤖 ~400–600 LOC, OCaml. Implements a lazy, purely functional language with parametric polymorphism and HM type inference. Its sibling miniml (~300–500 LOC) includes a compiler targeting an abstract machine. Both are part of Andrej Bauer's Programming Languages Zoo, which contains 12+ miniature language implementations, each a few hundred lines of OCaml, covering everything from untyped lambda calculus to call-by-push-value.

EYG

🤖 ~500 LOC JavaScript interpreter, full implementation in Gleam. EYG ("Eat Your Greens") by Peter Saxton prioritizes predictability, portability, and crash-free programs. It uses row-typed inference (HM extended with row polymorphism), algebraic effects as the sole FFI mechanism, and closure serialization — functions can be sent to other machines for tierless client/server programming. The most distinctive feature: programs are stored as JSON ASTs, not text files. A structural editor makes it impossible to write syntactically invalid programs.

Pico-ml

🤖 An OCaml subset with HM type inference that compiles to WebAssembly, implemented in TypeScript. Small and self-contained — unusual for having a TypeScript host language rather than the OCaml/Haskell norm. A good starting point if you want to understand ML compilation targeting the browser.

TinyML

🤖 Packs a lexer, parser, interpreter, and full polymorphic HM type checker into under 700 lines of SML. Referenced on Lambda the Ultimate, this may be the smallest complete implementation with genuine Hindley-Milner inference, though the original download link appears to have gone stale.

Eff

🤖 The original algebraic effects language (2012) by Andrej Bauer and Matija Pretnar. OCaml syntax with effect handlers as first-class constructs — you declare effect operations, then install handlers that give them meaning. This is where the idea was first made concrete in a running implementation. Koka, Frank, OCaml 5's effect handlers, and virtually every subsequent algebraic effects system trace lineage here.

Frank

🤖 "Do Be Do Be Do" (POPL 2017) by Sam Lindley, Conor McBride, and Craig McLaughlin. A strict effectful functional language where functions are handlers that handle zero effects — and multihandlers generalize function abstraction to handle multiple effect interfaces simultaneously. The insight: the boundary between "function" and "effect handler" is artificial. Implemented in Haskell. Lindley describes it as "the one I'm most fond of" while noting it's "basically unmaintained." That tension between conceptual elegance and practical neglect is the story of many languages on this list.

Grace

🤖 A JSON superset with bidirectional type checking and row polymorphism, by Gabriella Gonzalez (author of Dhall). Designed explicitly as a "ready-to-fork" language skeleton — if you need a typed DSL, clone Grace and customize it. Has open records, open unions (polymorphic variants), and a clean Haskell codebase that reads like a tutorial. No Hindley-Milner per se (bidirectional instead), but closely related.

Hackett

🤖 A Haskell-like language implemented entirely as Racket macros via the "Type Systems as Macros" technique, by Alexis King. Bidirectional type inference, algebraic datatypes, pattern matching, typeclasses, higher-kinded types, and higher-rank polymorphism — all implemented not as a separate type-checker pass but as macro expansion. The meta-angle is the story: types as macros rather than a traditional elaboration pipeline.

Scrapscript

🤖 A content-addressable pure functional language where every expression reduces to a cryptographic hash, stored in a decentralized "scrapyard" registry and referenced by hash or alias. The implementation is a ~1,300-line dependency-free Python interpreter in a single file, with a baseline compiler to C (~500 LOC) and an SSA IR with SCCP/DCE optimization (~1,000 LOC). Pattern matching is the sole control-flow mechanism. Compiles to C, WebAssembly, and Cosmopolitan portable executables. Implemented primarily by Max Bernstein.

MinCaml

🤖 ~2,000 LOC, OCaml → native code. The gold standard for capability-to-code-size ratio. Written by Eijiro Sumii at Tohoku University, it implements a strict, higher-order functional language with type inference, closures, tuples, arrays, tail-call optimization, inline expansion, constant folding, and graph-coloring register allocation. It compiles to SPARC, PowerPC, and x86 assembly. On benchmarks including a ray tracer, MinCaml-compiled code runs within 2× of GCC and OCaml's ocamlopt — sometimes faster. The deliberate trade-off: it omits polymorphism, algebraic data types, and pattern matching. Used in undergraduate compiler courses at the University of Tokyo since 2001, where students build ray tracers compiled by their own compilers running on custom CPUs.

  • Repo: github.com/esumii/min-caml
  • Paper: "MinCaml: A Simple and Efficient Compiler for a Minimal Functional Language" (FDPE 2005)
  • Forks: gocaml (Go + LLVM reimplementation), miniml (OCaml + LLVM, ~1,500 LOC, adds LLVM backend to MinCaml's architecture)

Ben Lynn

🤖 ~2,000 lines of Haskell + 350 lines of C. Arguably the most remarkable bootstrapping achievement in this space. Starting from a 350-SLOC C runtime that interprets combinatory logic, Lynn builds a chain of approximately 20 progressively more capable compilers, each written in the subset of Haskell that the previous compiler can handle. The final compiler supports type inference, type classes, algebraic data types, pattern matching, guards, where clauses, monadic I/O, modules, and layout parsing — approaching Haskell 98 coverage. It compiles Haskell to combinatory logic via Kiselyov's bracket abstraction algorithm, with graph reduction evaluation. Later stages even target WebAssembly. The entire bootstrapping chain is reproducible from just a C compiler.

1ML

🤖 ~3,000–5,000 LOC, OCaml. Andreas Rossberg unified ML's core and module layers into a single language where modules are first-class values, types are values, and functors are ordinary functions. It elaborates to System Fω with HM-style inference. Won the ICFP Most Influential Paper Award in 2025. A proof-of-concept interpreter, not optimized, but a conceptual breakthrough in minimal surface area.

mlml

🤖 A self-hosting OCaml subset compiler targeting native x86-64. ~3,000–5,000 LOC. Supports pattern matching, algebraic data types, recursive functions, and closures. Does not implement type inference — it demonstrates the minimum OCaml subset needed for self-compilation.

Dhall

🤖 A total (non-Turing-complete) typed configuration language. ~4K LOC core Haskell. Normalization is guaranteed to terminate — you can always reduce a Dhall expression to a normal form, which means imports resolve, functions inline, and what you get is plain data. Based on a Calculus-of-Constructions-derived type theory with records, unions, and natural numbers. Has a formal specification and implementations in Haskell, Rust, Go, and Clojure.

Ante

🤖 Combines HM type inference, algebraic data types, pattern matching, algebraic effects, and an ownership-like system for shared mutability. Written in Rust, it uses Cranelift for native code generation. Actively developed, aiming to bridge the Rust/OCaml divide.

Tao

🤖 Surprisingly feature-rich for its size: generics, typeclasses, sum types, pattern matching, first-class functions, currying, algebraic effects, associated types, and totality checking. Its pipeline runs from lexing through HIR type inference to MIR monomorphization and bytecode execution. Written in Rust.

Austral

🤖 A systems language with linear types and capability-based security. The linear type checker is ~600 lines. OCaml bootstrap compiler targeting C. Designed by Fernando Borretti to fit in one person's head — the spec is deliberately small enough that a single developer can understand the entire language. Not functional in the Haskell sense, but linear types make it adjacent. An experiment in "what if we took linear types seriously but kept the language small."

AQaml

🤖 A self-hosting OCaml subset compiler targeting native x86-64. ~5,000–8,000 LOC. Adds records, variants, references, and garbage collection beyond what mlml supports. Triple self-hosting verified. Like mlml, it omits type inference — demonstrating the minimum OCaml needed for self-compilation.

Borgo

🤖 Adds ML-family features (algebraic data types, exhaustive pattern matching, Result/Option types) to Go's ecosystem by compiling to Go source code with Rust-like syntax. Written in Rust.

polytt

🤖 A research experiment from the Topos Institute extending Martin-Löf Type Theory with native, first-class polynomial functors — the mathematical objects underlying deterministic state machines and interactive systems. Written in OCaml with Menhir parsing. Custom syntax for polynomial types (y^n), morphism arrows, and wiring operators. Dependent types (Pi, Sigma), finite-set ADTs, and pattern matching via case elimination. An ended experiment, but a unique point in the design space: what happens when you make polynomial functors a language primitive rather than an encoding.

Newt

🤖 ~7K LOC, self-hosted, compiles to JavaScript. A dependently typed language with Agda/Idris/Haskell-like syntax by Steve Dunham. Bidirectional typechecking with normalization by evaluation (based on Elaboration Zoo), typeclasses, ADTs with dependent pattern matching, case tree compilation, trampoline-based TCO for mutually tail-recursive functions, and erasure of compile-time-only values (0/ω quantities). Has a web playground and an LSP. The compiler is written in Newt itself. Built as a learning exercise, but the feature set — self-hosting, dependent types, typeclasses, erasure, LSP — puts it well beyond most pedagogical implementations.

HaMLet

🤖 ~10,000–15,000 LOC, SML. Andreas Rossberg's most faithful implementation of the Definition of Standard ML. It implements all of SML '97 including the full module system (signatures, structures, functors), mapping rule-by-rule to the formal Definition. Jeremy Yallop recommends it as the most readable SML implementation. It can be bundled into a single SML file and compiled by any SML implementation. A compile-js branch demonstrates compilation to JavaScript.

SOSML

🤖 ~10,000–15,000 LOC, TypeScript. Implements the full SML core language in the browser: val/fun/datatype declarations, pattern matching, HM type inference, exceptions, and references. Used for teaching at Saarland University.

MicroHs

🤖 By Lennart Augustsson (one of GHC's original creators) — the most complete "small" Haskell compiler alive today. It compiles an extended subset of Haskell 2010 including type classes, do-notation, deriving, record syntax, overloaded literals, and modules. It is fully self-hosting and — critically — bootstrappable from only a C compiler (no pre-existing Haskell toolchain required). MicroHs translates Haskell to combinators executed by a C runtime. It has a JavaScript runtime target, a package manager (mcabal), and can compile real Hackage packages like QuickCheck. The codebase is not trivially small (estimated 15,000–30,000 lines across compiler, libraries, and runtime), but for what it does — a near-complete Haskell compiler bootstrappable from C — it is remarkably compact.