About Slava Akhmechet

I currently work at Azure on distributed services. Before that I worked at Stripe, and before that I cofounded RethinkDB.

The RSS's url is : https://www.spakhm.com/feed.rss

Please copy to your reader or subscribe it with :

Preview of RSS feed of Slava Akhmechet

Benchmarking ts-wolfram v. Mathematica

2024-10-21 08:00:00

My original plan for ts-wolfram was to quickly write a toy Wolfram Language interpreter in Typescript to better understand Mathematica internals, then abandon it as an exhaust product of learning. But one feature of interpreters is that building them is really fun. Once you get something working you want to keep hacking on it. So, last weekend I decided to allow myself to get nerd-sniped and worked on ts-wolfram some more.

I wanted to find out how much slower ts-wolfram is than Mathematica. To measure this I added two more commands: Do which evaluates an expression a specified number of times, and Timing which does actual measurement. I then measured the performance of a simple (and deliberately very inefficient) fibonacci function on my Apple M1:

fib[1] := 1
fib[2] := 1
fib[n_] := fib[n-2] + fib[n-1]

Timing[Do[fib[15], 1000]]

(* Mathematica: 0.44s *)
(* ts-wolfram: 3.4s *)

A mere ~8x slowdown was surprising! I put no effort into efficiency and expected closer to ~100x slowdown. Still, I was curious how much time I could shave off with simple optimizations. I reran the code with node --inspect, connected Chrome’s excellent profile visualizer, and narrowed down the hot spots. I then made the following changes:

All of these changes combined got me down to… 0.98s, an only ~2.2x slowdown!1 I find this incredible. Certainly Mathematica’s term rewrite loop is optimized to death, and I only spent an hour or two making the most basic optimizations. The fact that V8 runs my barely optimized term rewriting code only ~2.2x slower than Mathematica’s hyper-optimized engine is a testament to the incredible work done by V8 performance engineers.

As I write this I still find myself surprised. I always vaguely knew that V8 is fast. But it never sunk in exactly how fast it is until now.


  1. All the usual benchmarking disclaimers apply. This is meant to be a smoke test of a pet project rather than a serious industry benchmark.↩︎

A Mathematica interpreter in Typescript

2024-10-17 08:00:00

Try it on GitHub: https://github.com/coffeemug/ts-wolfram

I’ve been using Mathematica to help with learning math (and using math to help learn Mathematica). Too often my Mathematica code has surprising behavior, which means I don’t really understand the language or how it’s evaluated. So I thought I’d do two things to get better.

First, write a bunch of Mathematica rules to implement a toy symbolic differentiation operator. Second, write a toy Mathematica interpreter in Typescript that’s good enough to correctly run my custom differentiation code.

Toy differentiation

Writing a toy differentiator turns out to be shockingly easy. It’s a near verbatim transcription of differentiation rules from any calculus textbook:

D[_?NumberQ, x_Symbol] = 0;
D[x_, x_Symbol] = 1;
D[Times[expr1_, expr2_], x_Symbol] =
  D[expr1, x] expr2 + D[expr2, x] expr1;
D[Plus[expr1_, expr2_], x_Symbol] = D[expr1, x] + D[expr2, x];
D[Sin[x_], x_Symbol] = Cos[x];
D[Cos[x_], x_Symbol] = -Sin[x];
D[f_Symbol[expr_], x_Symbol] :=
  (D[f[x], x] /. x -> expr) * D[expr, x];
D[Power[expr_, p_Integer], x_Symbol] := p expr^(p - 1) * D[expr, x];

My first implementation had three more rules (one for each of dxk/dx,(cf)(x)dx^k/dx, (cf)’(x), and (1/f)(x)(1/f)’(x)), which I later realized you don’t need. These are shortcuts for human differentiators, but they’re automagically covered by the multiplication and exponentiation rules above.

Some problems I encountered while writing these rules: infinite recursion, rules not matching, and rules matching but evaluating to a surprising result. The hobby edition of Mathematica has some tools for debugging these problems, but not much. Ultimately I fixed bad rules by staring at the code and thinking really hard. I found Trace impossible to read, and TracePrint (which is supposed to be better) not much better. Also MatchQ is good, but somehow not as useful for debugging as I would have liked.

Toy interpreter

I first implemented basic parsing of integers, symbols, forms, and arithmetic operators using ts-parsec (which I wrote for this project). In Mathematica a 2 b 3 evaluates to Times[6, a, b] because Times has a Flat attribute. To get this behavior I implemented attributes next– Flat, and also HoldFirst, HoldRest, and HoldAll which I’d eventually need. I also exposed Attributes, SetAttributes, and ClearAttributes. These all accept (and Attributes returns) lists, so I added those too. All this was easy enough.

I wanted to implement assignment next so I could say a = 1. In Mathematica even something this simple is implemented by adding RuleDelayed[HoldPattern[a], 1] to OwnValues.1 So the next step was to build the skeleton of a term rewriting system.2 I first implemented a version of MatchQ that does a deep equal, extended it to handle Blank[], extended it again to handle Blank[...], and extended it again to handle Pattern[foo, Blank[...]]. The version exposed to the interpreter returns True or False, but under the hood it also returns an environment with matched variables. I built on that next to implement term rewriting.

A really simple rewrite rule is f[1] /. f[x_] :> x + 1. In Mathematica this parses to

ReplaceAll[f[1], RuleDelayed[f[Pattern[x, Blank[]]], Plus[x, 1]]]

With my MatchQ implementation there was now enough machinery to get this working. I added operators /. and :> to the grammar, implemented Replace, and built ReplaceAll on top. I tested a bunch of rewrite rules and they all worked! From here it was also easy to add -> and //. which I did next.

I had enough machinery to implement assignment. I added Set and SetDelayed, and modified evaluation to apply the rules stored in OwnValues and DownValues. This let me assign values to symbols and define functions! I could now run code like this:

fib[1] := 1
fib[2] := 1
fib[x_] := fib[x-2] + fib[x-1]

One caveat is that Mathematica inserts rules into value lists in order of specificity, so specific rules are tried before general rules. I initially added rules in order of definition to get assignment working, but then went back and added a simple specificity ordering function3.

EDIT: I wrote specificity ordering code at late at night, and just realized it was completely broken. I removed it; rules are now processed in the order they’re entered. But everything else still works!

Finally, I added support for PatternTest/? and NumberQ. These were all the features needed to do differentiation!

Differentiating in ts-wolfram

I ran D in ts-wolfram on the following examples, and cross-checked with the results in Mathematica:

D[1, x]
D[x, x]
D[x^5, x]
D[3 x^2, x]
D[(x + 1) (x + 2), x]
D[x^2 + x^3, x]
D[Cos[x], x]
D[x^3/(x^2 + 1), x]
D[Cos[Cos[x]], x]
D[Cos[Cos[Cos[x]]], x]
D[Cos[x^2 + 1], x]
D[(x + 1)^2, x]

Somewhat miraculously, ts-wolfram got correct results on every test! Since I didn’t add any manipulation to simplify expressions, the results weren’t exactly the same. For example:

D[Cos[Cos[x]], x] 

(* ts-wolfram outputs *)
Times[Minus[Sin[Cos[x]]], Minus[Sin[x]]]

(* With `FullForm`, Mathematica outputs *)
Times[Sin[x], Sin[Cos[x]]]

This would be easy to fix by adding the rule Times[Minus[x_], Minus[y_]]=x y, but (a) Times is implemented in typescript, and the interpreter doesn’t currently support mixing kernel and userspace rules, and (b) extending the system to simplify algebraic expressions feels like a different undertaking.

Overall, this has been a really fun and instructive project. I built it in four days hacking on it after work, and learned a great deal about Mathematica internals. Of course this is still only scratching the surface, but now I feel a lot less lost when Mathematica doesn’t behave the way I expect. I’m very happy with this outcome!


  1. You can check this by running a = 1; OwnValues[a].↩︎

  2. If you don’t already understand how this works, reverse engineering it is tricky, even with Mathematica’s great docs. For examples why, see Stack Overflow questions here, here, and here.↩︎

  3. The exact details for how Mathematica handles pattern specificity are not so easy to find. I didn’t try too hard to reverse engineer it; I just did what seemed reasonable and moved on.↩︎

A parser combinator library in Typescript

2024-10-12 08:00:00

Try it on GitHub: https://github.com/coffeemug/ts-parsec

I write a lot of throwaway interpreters to play with programming language design ideas. For these projects, writing a parser is usually the most frustrating part. Parser libraries are hard to learn, easy to forget, and finicky to use. The other option, hand-coding a custom parser for each interpreter, raises the activation energy to start a project high enough that I abandon too many ideas before I try them.

All of this is unsatisfactory. To solve this problem I wrote a parser combinator library for myself in Typescript. It has these design goals:

Example

Here is a simple example:

const digit = range('0', '9');
const lower = range('a', 'z');
const upper = range('A', 'Z');
const alpha = either(lower, upper);
const alnum = either(alpha, digit);

const ident = seq(alpha, many(alnum)).map(([first, rest]) =>
  [first, ...rest].join(""));

You can see how these parsers build on top of each other. I added a map method to support transforming the concrete syntax tree into an AST on the spot. Here seq(alpha, many(alnum)) return a tuple with an alphabetic character and an array of alphanumeric characters. But we don’t want to deal with that when handling identifiers– we just want to deal with a string. I can do that with a simple map.

Parsers operate on a special stream type that’s mostly irrelevant to the end user. To parse an identifier you’d do this:

const input = "Hello";
const stream = fromString(input);
ident(stream);

Actually, I lied a little. By default the stream automagically skips whitespace. That’s the desired behavior for most higher-order parsers, but when parsing keywords, identifiers, numbers, etc. we want to turn that behavior off. So in practice ident would be defined like this:

// `lex` turns off skipping whitespace for the parser it's wrapping
const ident = lex(seq(alpha, many(alnum))).map(([first, rest]) =>
  [first, ...rest].join(""));

All the usual suspects like seq, either, maybe, some, many, and sepBy are implemented in the library. This turns out to be enough to write parsers for most grammars I may ever want to parse.1

Calculator

One limitation of recursive descent parsers is that they fall into an infinite loop on left recursion. You can manually rewrite your parser to avoid left recursion, but it’s a pain. This is relevant for toy interpreters because left recursion is the most natural way to express grammars for basic arithmetic. To avoid having to deal with this problem I added a helper parser binop for parsing binary operators. Using binop a calculator grammar looks like this:

const factor = binop(either(str('*'), str('/')), int,
  (op, l, r) => [op, l, r]);

const term = binop(either(str('+'), str('-')), factor,
  (op, l, r) => [op, l, r]);

const input = "1 + 2 * 3";
const stream = fromString(input);

// produces `['+', 1, ['*', 2, 3]]`
term(stream);

(The version binop is left-associative. There is a right-associative version binopr for binary operators like assignments.)

Practical problems

There are two painful limitations of this library, one fixable (but not yet fixed), the other inherent to its design.

The fixable problem is that the library produces no error messages whatsoever. It’s structurally set up to handle errors, but I haven’t implemented error reporting yet. So if something goes wrong during parsing, there is no useful information at all. I have some ideas for how to make error reporting easy and really good, but haven’t gotten around to working on this.

The more serious problem is that type safe parser combinators seem like an elegant, obviously good idea, but they turn out to kind of suck in practice. Maybe I’m not smart enough, or maybe I’m too lazy to properly understand the ins and outs of the Typescript type system, or maybe I just need to work a little harder to mature the library. But whatever the reason, every time I do semi-advanced type hackery like this, I end up spending more time dealing with weird type errors than actually working on my grammar. It’s all right in this narrow case because it’s fun, I know the ins and outs of the library, and it’s meant for throwaway/toy interpreters. But for a serious project I’d use an ugly, boring, old-school parser generator.

Or, more likely, bite the bullet and code the parser by hand.


  1. There are many grammars this doesn’t parse, but see the design goals. It’s not meant to! Actually, I would argue that if your programming language can’t be parsed with a PEG parser, it’s hard for humans to parse too.↩︎

Visualizing Mean Value Theorem in Mathematica

2024-10-06 08:00:00

The mean value theorem is a surprising Calculus result that states for any function ff differentiable on (a,b)(a,b)1 there exists x(a,b)x(a,b) such that

f(x)=f(b)f(a)ba

Here are three informal intuitions for what this means (all of them say the same thing in different ways):

  1. Physical example. If you travel 60 miles in one hour, at some point you must have been traveling exactly 60 miles per hour.2
  2. Geometric intuition. There exists a line tangent to ff parallel to the line between the endpoints (i.e. parallel to the line between (a,f(a))(a, f(a)) and (b,f(b))(b, f(b))). See figure 1 below for an illustration.
  3. Algebraic intuition. There exists a point xx at which instantaneous rate of change of ff is equal to the average change of ff on [a,b][a,b].

The proof is remarkably straightforward. You define a function hh that maps values between aa and bb to a height from ff to the line segment between the endpoints. This function hh turns out to be differentiable, and hence has a maximum value (and hence there is a point aa such that h(a)=0h’(a)=0). From there it’s an easy algebraic transformation to demonstrate the mean value theorem is true. See figure 2 for an illustration.

Mathematica

I won’t repeat the proof, but what I wanted to do is play with Mathematica to generate an interactive visualization of how the proof works (I generated above figures in Mathematica, but the final product has an interactive slider to make the proof more clear).

For tasks like this Mathematica is wonderful. Here I define a line at an angle, and another function ff that adds a sine to it. Because the visualization is interactive ss will allow the user to change the slope:

interpol[s_, a_, b_] := a + s*(b - a);

g[s_, x_] = interpol[s, 1/2, 0]*x + interpol[s, 1, 0];
f[s_, x_] = Sin[x - 1] + g[s, x];

These are symbolic definitions. We can plot them, take derivatives, integrate, and do all kinds of fancy computer algebra system operations. Here is an example. The vertical line in figure 2 comes down from the mean value of ff– i.e. a point on ff where the tangent is parallel to the average change. In Mathematica code we can take the derivative of ff (using Mathematica function D), and then solve (using Solve) for all values where the derivative is equal to mean value:

fD[a_] := D[f[s, x], x] /. x -> a;

avgSlope[s_] = (f[s, Pi + 1] - f[s, 1])/Pi;
meanPoint[s_] = 
  Solve[fD[x] == avgSlope[s] && x > 1 && x < Pi + 1, x];

This is the core of the code. Most of the rest of the code is plotting, which Mathematica does exceptionally well. I exported the plots to figures above using the Export function. Another notable function is Manipulate– this is what makes the graph dynamic as the user drags a slider (by changing the variable s which the equations depend on). Finally I was able to publish the notebook in a few clicks, as well as publish the visualization itself using CloudDeploy. Instantaneous deployment of complex objects is very cool and useful.

Snags

There are a few things I don’t like, but it’s too early to say these are Mathematica’s flaws (more likely it’s my lack of familiarity):

Normally I learn very unfamiliar/weird programming languages by writing a toy interpreter of the target language in a language I already know. The Mathematica rules engine is a great candidate. Putting this in my virtual shoebox of future projects.


  1. ff must also be continuous on [a,b][a,b].↩︎

  2. Assuming you were accellerating smoothly.↩︎

  3. Warning: this is probably not the most idiomatic Mathematica code.↩︎

Want now

2024-10-01 08:00:00

There are all sorts of things people want. To pay off debt1, to help the environment, to be kind to strangers.

But we don’t want these things now. Right now we want the latest iPhone, our drink to go, to zip through traffic faster. And since anything we ever do we do in the now, paying off debt, helping the environment, and kindness to strangers remain perpetually in the future.

Failed products aren’t necessarily products people don’t want. But they are always products people don’t want now. Confusing wanting” for wanting now” is an easy mistake to make. I’ve probably made it more than anyone. People are adamant about the urgency of their wants (or maybe I’m just credulous). But once you know what to look for, you can begin telling the two apart. Wanting is free. Wanting now exacts a price.

Put differently, there is no tradeoff to wanting. It costs nothing to want to pay off debt. You are not confronted with a tradeoff until it’s time to write a check to the credit card company, or to deny yourself an expensive purchase. And when that time comes, you have to want to pay off debt now. Which to your surprise you discover you don’t actually want.

The surprise is important. We don’t know what we want now until the now comes, and we have infinite capacity to lie about it to ourselves and others. So to predict behavior you cannot trust what people say. You can only trust the data as you observe them confront a tradeoff. This is so important it’s worth repeating. To predict behavior you must observe people confront a tradeoff.

Wanting has different texture from wanting now. Because it’s tradeoff-free, wanting tends to be abstract, altruistic, and aspirational. Conversely, wanting now is concrete, selfish, tangible, and opportunistic. Wanting is in far mode, wanting now in near mode. Wanting preferences are often stated but rarely revealed; wanting now preferences are eventually revealed but are rarely stated.

Can you ever give people what they want? Yes, if you bundle it with what they want now. This was the brilliance of Tesla. You got what you want now, a badass car. And in the process you also got what you want– to migrate the world away from fossil fuels. The first part is hard enough. So most products don’t bother with the second part, or at best do it as a marketing exercise. Only magical products credibly give you both.


  1. This is true for technical debt as much as for credit card debt.↩︎

Decoding Leibniz notation

2024-09-30 08:12:20

I wrote this for myself to understand the Leibniz notation. Prerequisites for this post are the definition of the derivative and the Lagrange notation. If you don’t understand these yet, please study them first.

So…

You may have already seen something like dydx. This is called the Leibniz notation. The Leibniz notation has many of what Spivak calls vagaries”. It has multiple interpretations– formal and informal. The informal interpretation doesn’t map to modern mathematics, but can sometimes be useful (while at other times misleading). The full, unambiguous Leibniz notation is verbose, so in practice people end up taking liberties with it. As a consequence, its meaning must often be discerned from the context.

This flexibility makes the notation very useful in science and engineering, but also makes it difficult to learn. I explore it here to make learning easier.

Historical motivation

We start with the historical interpretation, where the notation began. Leibniz didn’t know about limits. He thought the derivative is the value of the quotient

f(x+h)f(x)h

when hh is infinitesimally small”. He denoted this infinitesimally small quantity of hh by dxdx, and the corresponding difference f(x+dx)f(x)f(x+dx)-f(x) by df(x)df(x). Thus for a given function ff the Leibniz notation for its derivative ff’ is:

df(x)dx=f

Intuitively, we can think of dd in a historical context as delta” or change”. Then we can interpret this notation as Leibniz did– a quotient of a tiny change in f(x)f(x) and a tiny change in xx. But this explanation comes with two important disclaimers.

First, dd is not a value. If it were a value, you could cancel out dds in the numerator and the denominator. But you can’t. Instead think of dd as an operator. When applied to f(x)f(x) or xx, it produces an infinitesimally small quantity. Alternatively you can think of df(x)df(x) and dxdx as one symbol that happens to look like multiplication, but isn’t.1

Second, df(x)dx denotes a function (the same one denoted by ff’), not a value at a point (i.e. not f(a)f’(a)). To denote the image of the derivative function at aa we use the following notation:

df(x)dxx=a=f(a)

Writing all that is a pain and in practice people rarely do it this way, but we’ll get to that in a minute.

Modern interpretation

To summarize, the full and unambiguous Leibniz notation is:

df(x)dx=fanddf(x)dxx=a=f(a)

In modern mathematics real numbers do not have a notion of infinitesimally small quantities. Thus in a modern interpretation we treat df(x)dx as a symbol denoting ff’, not as a quotient of numbers. Nothing here is being divided, nothing can be canceled out. In a modern interpretation df(x)dx is just one thing that happens to look like a quotient but isn’t, anymore than ff’ is a quotient.

Second derivative

A question arises for how to express the second (or nth) derivative in the Leibniz notation. Let g(x)=df(x)dxg(x)= (i.e. let gg be the first derivative of ff). Then it follows that the second derivative in Leibniz notation is dg(x)dx=g=f=g’=f’′. Substituting the definition of gg we get:

d(df(x)dx)dx=f

Of course this is too verbose and no one wants to write it this way. This is where the vagaries begin. For convenience people use the usual algebraic rules to get a simpler notation, even though formally everything is one symbol and you can’t actually do algebra on it:

d(df(x)dx)dx=d2f(x)dx2

Two questions arise here.

First, why dx2dx^2? Shouldn’t it be (dx)2(dx)^2? One way to answer this question is to remember that dxdx is one symbol, not a multiplication (because dd is not a value). And so we’re just squaring that one symbol dxdx, which doesn’t require parentheses.

Another probably more honest way to answer this question is to recall that this isn’t real algebra– we just use a simularcum of algebra out of convenience. But convenience is a morally flexible thing, and people decided to drop parentheses because they’re a pain to write. So (dx)2(dx)^2 became dx2dx^2.

Second, we said before that df(x)df(x) can be thought of as one symbol. Then what is this d2d^2 business? The answer here is the same– we aren’t doing real algebra but a simularcum of algebra. We aren’t really squaring anything; we’re overloading exponentiation to mean second derivative”. The symbol d2f(x)d^2f(x) is again one symbol.

Liberties and ambiguities

There are a few more liberties people take with the Leibniz notation. Let f(x)=x2f(x)=x^2. If we want to denote the derivative of ff we can do it in two ways:

df(x)dxordx2dx

Here dx2dx is new, but the meaning should be clear. We’re just replacing f(x)f(x) in df(x)df(x) with the definition of f(x)f(x). This is a little confusing because in the particular case of f(x)=x2f(x)=x^2, it’s visually similar to the notation for second derivative. There are no ambiguities here so far– it’s just a visual artifact of the notation we have to learn to ignore. But now the liberties come.

Suppose we wanted to state what the derivative of ff at a point aa is. In Lagrange notation we say f(a)=2af’(a)=2a. In Leibniz notation the proper way to say it is:

df(x)dxx=a=2a

But this is obviously a pain, so people end up taking two liberties. First, everyone drops the vertical line that denotes the application at aa. So in practice the form above becomes:

df(x)dx=2x

This shouldn’t compile” because df(x)dx=f=f’. Thus this statement is equivalent to saying f=2xf’=2x, which should be a syntax error. But this is the notation most people use, and you have to get used to it.

Second, people decided that writing df(x)dx is too painful, and in practice everyone writes dfdx. This also shouldn’t compile (it would be something like writing limxaf_{x a}f, which also is a syntax error). But again, it’s the notation most people use.

To summarize what we have so far:

df(x)dxx=a=2abecomesdfdx=2x

Even more liberties

You’d think that we already pushed the notation past all limits of propriety, but scientists and engineers manage to push it even further. Consider the following simple problem. A circle’s radius is growing at 1 inch per second. How quickly is the area of the circle growing? Let’s solve it with Lagrange’s notation first.

The area of a circle is A=πr2A=r^2. We’re trying to understand change by using derivatives to analyze behavior of functions. Since rr is changing, what we’ll be looking at is the function for the area of a circle A(r)=πr2A(r)=r^2. And since the radius is changing with time, we have another function for the radius at a particular time r(t)r(t). The problem doesn’t tell us how rr is defined, but it tells us its derivative is r(t)=1 in/sr’(t)=1. All we have to do now is take the derivative of AA:

A(r(t))=(Ar)(t)=π2r(t)r(t)=2πr(t) in/s

Thus at a given time tt the area is increasing at the rate of 2πr(t) in/s2r(t).

Now here’s the rub. In science and engineering most values are somehow related to other values, and nearly everything is related to time. Explicitly defining functions makes even simple relationships (like the one above) complicated to write down. So people dispense with denoting functions explicitly, and just treat these quantities as functions. In practice, the Leibniz notation for the equation above is something like this:

dAdt=dAdrdrdt=2πr in/s

We’re not explicitly defining or mentioning functions anywhere, but immediately proceed with the understanding that the variables AA and rr are really functions.

As a matter of studying advice, I spent hours trying to understand exactly why anyone might want to do this and how the mechanics work, until I sat down to do a bunch of simple related rates problems, at which point abusing the notation in this way quickly became the most natural thing in the world. So if you’re stuck, go solve a bunch of simple problems and then come back here. Hopefully by then everything will make a lot more sense.


  1. I read somewhere that in his notebooks Leibniz experimented with extending dd with a squiggle on top that went over xx to indicate that dd is not a value, but I haven’t been able to verify if that’s true.↩︎

Linear algebra self-study

2024-05-12 09:18:24

I finished self-studying Axler’s Linear Algebra Done Right (3rd edition). I wanted to understand linear algebra and picked up LADR because it’s the most recommended book online. Some notes below.

After picking the book I found a corresponding syllabus online. I wanted one that has a clear reading schedule and homework assignments. The one I initially used was this one from Brown. I tried to follow the actual class schedule, but dropped that pretty quickly. Sometimes I ended up going faster, sometimes slower. I finished the course in three months– on balance about the same time as the original class schedule. The Brown syllabus didn’t cover the entire book, but I wanted to do more chapters. So when I was done I found another syllabus from Berkeley that covers more of the book, and did the remaining problems from there.

I did all the homework problems in Overleaf. There were a total of 197 problems assigned; I ended up finishing 167. There were 30 problems I couldn’t solve quickly enough and decided to move on. I’d like to come back to these, but have not done that yet.

I extensively used ChatGPT 4 to check my homework problems (the paid version; I found ChatGPT 3.5 isn’t good enough). Prompts like check the following homework problem” or critique the following proof” usually produce very good results. Sometimes ChatGPT was wrong in its critique. Typically, but not always, tinkering with the arguments to make ChatGPT happy made for better proofs anyway.

One edge case where ChatGPT performed poorly was proofs by contradiction. It tends to have a lot of trouble understanding those. A minor nit is that ChatGPT UI renders Latex correctly maybe half the time. When it does, the experience is excellent. When it doesn’t, reading the output is a pain.

Sometimes I needed hints to solve homework problems. For those I texted a friend with a math degree and he’d usually point me in the right direction (thanks Ryan!) I never found a prompt that would get ChatGPT to give good hints; my friend’s hints were always dramatically better.

In general I find myself bored with lectures, I almost always would rather read a textbook. But there were a few points where the material got especially difficult. For those I supplemented with these lectures from Penn State that are pretty faithful to the book. I must admit to occasionally phoning it in on understanding some proofs. Axler’s proofs are beautiful, but his proofwriting was a hit and miss for me. Some proofs flowed like a poem. Others sputtered. It was never too hard to understand the sputtering ones, but it did require a degree of conscientiousness I sometimes wasn’t able to generate.

Which brings me to the book itself. I have mixed feelings about LADR. On the one hand, Axler delivers. If you diligently read the book and struggle through the exercises, you will understand the material. And once you’ve understood the material (and often even before that) you can appreciate the elegance of the exposition.

This commitment to elegance makes the material more difficult to absorb. Some struggle is endemic in learning math, but it need not be any more difficult than necessary. Later I found Terence Tao’s linear algebra notes, which are roughly as rigorous as LADR but are much easier to understand. One plausible reason is that I found Tao’s notes after I’d already worked through LADR and understood the abstractions. But I don’t think so. It seems to me that Tao set out to minimize student confusion and Axler set out to write an elegant linear algebra book. Both texts achieve their respective goal.

How to send progress updates

2024-04-04 03:12:56

If you work on anything worthwhile, sooner or later people will care about it and will want you to send progress updates. These could be quarterly investor updates, weekly updates to your boss, emails to adjacent teams, etc. Here are tips on how to do this well.

  1. Understand your role, and with each update add to the body of evidence that you’re a good steward in that role. If people want your updates, they’ve entrusted you with something– a successful delivery of a product or feature, investment capital, company budget, their reputation, something. Convey that you value their trust and take stewardship seriously.
  2. Add a little randomness to the cadence. People think they want regular cadence, but they’re happier with bounded irregularity. For example, if you send project updates every Tuesday they will seem transactional and no one will read them. If instead you send updates every 2-3 weeks, your audience will look forward to them because they’ll assume you have something new to say.
  3. Know what your next update will be and work toward it (instead of coming up with an update when it’s time to send one). This is Headline driven development for an internal audience. If you don’t have a headline you don’t have an update, and you can’t generate good headlines post-factum.
  4. Always start with a one sentence TL;DR and a 2-4 sentence recap of the overall goals of the project. Assume your audience is smarter than you are, but is very busy and remembers nothing about your work.
  5. People love pleasant surprises, but these don’t come along often enough by chance. Within reason, deliberately engineer pleasant surprises so you can include them in your updates.
  6. People hate unpleasant surprises. Obviously, avoid these if possible. But if unavoidable, take two steps. First, talk to each person privately before informing the group. Second, deliver negative news in 2-3 escalating phases. For example, start by softly expressing the possibility of a problem to give people time to adjust. The next time, state the problem as fact. (But don’t do this if there is a genuine emergency or crisis.)
  7. Acknowledge changes explicitly. If you said a the last time and b this time, and b conflicts with a, you need to explain the inconsistency. People perceive acknowledged inconsistencies as cost of doing business, but unacknowledged inconsistencies as broken promises.
  8. Don’t insult anyone, accidentally or on purpose. I once wrote an update that said something along the lines of our engineers don’t know anything and therefore can’t ship, we need better engineers”, and sent it to everyone including the engineers themselves. It was factually true, but crude and unnecessary. Don’t do this.
  9. People want a steady hand at the helm. Your tone should reflect that. You want the text equivalent of pilot radio voice. (I know, I’m mixing my metaphors.)
  10. Many people (correctly) worry they’re being personally evaluated by their updates, so they sanitize every sentence to death. Don’t do this. Make it all about the work, not about you, and leave the evaluating to others. (I visualize myself in a third-person view physically separate from the work, and pretend my character is writing the update.)
  11. Know the top three questions your audience wants answered, and state the answers as clearly as possible.
  12. Add a dedicated section for worries and failures. Be honest, have good plans, and don’t panic. People are drawn to conscientiousness and vulnerability but repelled from haplessness and histrionics.
  13. The goal of updates is for your audience to know how your project is doing at any given time without having to ask you.
  14. Elon yells at Wall St analysts in quarterly earnings calls, why can’t I?” If you built a company with a market cap greater than the rest of its industry combined, you have my blessings to ignore my advice.
  15. These tips don’t work if you’re incompetent.

Headline driven development

2024-04-01 11:54:33

Here is a simple process for shipping software projects that works. First, decompose the project into a stream1 of headlines. Then pick an aggressive date to ship the first headline and work like hell to meet that date. Have everyone work only on one headline at a time– the upcoming one. Ignore everything else. Don’t work on anything that doesn’t help you ship the headline. Once the headline is shipped, switch to the next headline in the stream and repeat. That’s all, you can fire your agile consultant.

A headline is a very short sentence that contains only the highest order bit, with all the other bits culled. Imagine you bump into someone on the street after not having seen them for a few months and they ask what you’ve been up to. What kinds of responses work well in this situation? I trekked through Southeast Asia”, I switched jobs”, I got married”. Software release headlines work the same way. You can now rent VMs through an API, we rolled out FSD autopilot”, Treasury is available in India”.

Headline driven development works really well for three reasons.

First, headlines is how humans process change. If you ever found your users confused, your boss frustrated, your investors anxious, your peers indifferent– these problems go away when you organize communication around a stream of headlines. But it doesn’t work as an afterthought. Communicating through headlines but developing in some other way is like leading a double life. It gets too messy. So to communicate with headlines you must develop with headlines too.

Second, it makes it easy to ruthlessly prioritize. If you can credibly ship a headline without something, cut that something. For example, suppose you’re working on your Southeast Asia trek headline and you’re planning to visit six countries. Can you credibly say to your friends I trekked through Southeast Asia” only having visited five countries instead of six? Obviously yes. So one of the countries gets cut. How about four countries? Repeatedly go through this exercise and stop before the credibility of the headline is at risk. You want to do the minimum possible amount of work that still leaves the headline credible.

Third, the deadline effect is real. Most of the work in college happens at midnight before the project is due. The industry isn’t that different. So simulating class assignments turns out to be a very effective way to ship quickly. You need a discrete chunk of work, with an arbitrary deadline2, and a binary outcome. You get this with headlines– a headline has either shipped by a given timestamp or it hasn’t.


  1. I mean stream” in a programming language sense– an infinite list of elements with ability to pop one at a time.↩︎

  2. Advertise arbitrary deadlines to candidates up front and let self-selection do its magic.↩︎

Commit/bullshit ratio

2024-03-20 11:57:32

Big companies have all kinds of complicated ways to evaluate engineers. I’m skeptical that this is necessary, even at large scale. But that aside, I like the following simple system that works well and is yet to fail. Assign each engineer a commit/bullshit ratio. Then write off (or ideally fire) everyone whose ratio isn’t high” or very high”. Ignore everything else.

What’s a commit to bullshit ratio?

You already know what a commit is. It’s an object in git identified by a SHA1 hash that leaves the codebase in a more useful state than one before the commit was pushed.

You also intuitively know what bullshit is. It’s delays, bad taste, fighting a lot, being dogmatic, complaining, broken code, laziness, cynicism, activism, pedantry, entitlement. Bullshit is everything that makes your coworkers’ life more of a pain than it needs to be.

Everybody is allowed a little bullshit because if you only allow zero bullshit you can never work with anyone at all. But bullshit must be paid for with commits. The more bullshit you generate, the more commits you need to push. It’s not an exact science, but it doesn’t need to be. Everyone already knows. Think of a coworker and ask yourself– what is their commit to bullshit ratio? The answer probably leaps to mind. Maybe the answer is unusually high”. Or maybe it’s neutral at best”. Whatever it is, you already know.

This system is very easy to use. If you’re an engineer, keep your commit/bullshit ratio as high as possible. If you’re hiring and firing engineers, fire everyone whose ratio is lower than high”.