Frontier red team @AnthropicAI . Writing a book about being a dad. Now I live in London.
The RSS feed's URL is: https://robertheaton.com/feed.xml
2024-09-02 08:00:00
A few weeks ago my 5-year-old and I tried playing Cat Crimes, a puzzle game in which you work out which of your cats ate your shoes. We had a wonderful time - for about 20 minutes.
In each round of Cat Crimes you get a puzzle card with a list of clues on it. You have to use the clues to figure out where in your front room each of your 6 cats was sitting. This tells you which one of them was responsible for your ruined stilettos. The game comes with 40 puzzle cards, ranging from the very easy to the mind-crushingly difficult.
However, the problem is that “very easy” to “mind-crushingly difficult” is a lot of ground to cover in 40 puzzles, and by the fifth puzzle the clues had become too abstract and difficult for my little man. In the first few puzzles each new clue allowed us to immediately place a new cat and then forget about it. For example, a clue might have told us that Mr. Mittens was sitting opposite Pip Squeak. We already knew where Pip Squeak was sitting, so we could work out exactly where Mr. Mittens was sitting too. This is the perfect level of complexity for a small child and his aging father at 6am.
However, as the puzzles got harder the clues stopped neatly resolving like this. They still narrowed down the possible pussy permutations, but they didn’t necessarily allow us to definitively place a new cat straight away. For example, a clue might have told us that Mr. Mittens was sitting next to Pip Squeak. We knew that Mr. Mittens must have been on either Pip Squeak’s left or right, but we couldn’t say for sure which until we’d processed more clues. We might later learn that Duchess was sitting to Pip Squeak’s left, which in turn would tell us that Mr. Mittens must be sitting to Pip Squeak’s right.
To follow this extended chain of logic you need to hold multiple simultaneous superpositions in your head. This is fun and challenging and good puzzle design, but my kid hasn’t done superpositions at school yet so he didn’t get it. I tried drawing some pictures for him, but they made no sense even to me. We got angry with each other and eventually gave up on the game altogether.
But we’d really had a great time with those first few puzzles, so that evening I wrote us a computer program that generated an infinite number of new beginner level Cat Crimes challenges. I ran it 20 times and printed out the challenges and their solutions. The next day we continued happily solving age-appropriate cat mysteries together.
You can download:
The program works by generating random challenges until it finds one that has a single unique solution and meets certain constraints. The constraints ensure that the challenges are easy but not too easy. For example, a maximum of 3 cats can be asleep (meaning that they are out of the round), and a maximum of 2 clues are allowed to tell you a cat’s exact position.
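The generate-and-test loop described above can be sketched in Python. This is a minimal sketch under simplifying assumptions, not the real program: it models the front room as 6 seats in a circle, supports only three hypothetical clue types (at, next_to, opposite), ignores sleeping cats entirely, and uses placeholder names for the three cats the post doesn't mention. To guarantee that every candidate has at least one solution, it picks a hidden seating first and only generates clues that are true of it, then brute-forces all 720 seatings to check that the solution is unique.

```python
import itertools
import random

# Three names come from the post; Cat4-Cat6 are placeholders for this sketch.
CATS = ["Mr. Mittens", "Pip Squeak", "Duchess", "Cat4", "Cat5", "Cat6"]
NUM_SEATS = 6  # assumption: seats arranged in a circle, numbered 0-5


def satisfies(seating, clue):
    """Check one clue against a seating (a dict mapping cat -> seat)."""
    kind, a, b = clue
    if kind == "at":
        return seating[a] == b
    gap = (seating[a] - seating[b]) % NUM_SEATS
    if kind == "next_to":
        return gap in (1, NUM_SEATS - 1)
    if kind == "opposite":
        return gap == NUM_SEATS // 2
    raise ValueError(f"unknown clue type: {kind}")


def solutions(clues):
    """Yield every seating of the cats that satisfies all the clues."""
    for seats in itertools.permutations(range(NUM_SEATS)):
        seating = dict(zip(CATS, seats))
        if all(satisfies(seating, clue) for clue in clues):
            yield seating


def random_true_clue(rng, seating):
    """Make a random clue that is true for the hidden seating."""
    seat_to_cat = {seat: cat for cat, seat in seating.items()}
    cat = rng.choice(CATS)
    seat = seating[cat]
    kind = rng.choice(["at", "next_to", "opposite"])
    if kind == "at":
        return ("at", cat, seat)
    if kind == "next_to":
        return ("next_to", cat, seat_to_cat[(seat + rng.choice([1, -1])) % NUM_SEATS])
    return ("opposite", cat, seat_to_cat[(seat + NUM_SEATS // 2) % NUM_SEATS])


def generate_puzzle(rng, num_clues=8, max_exact_clues=2):
    """Generate random puzzles until one has exactly one solution."""
    while True:
        hidden = dict(zip(CATS, rng.sample(range(NUM_SEATS), NUM_SEATS)))
        clues = [random_true_clue(rng, hidden) for _ in range(num_clues)]
        # Constraint from the post: at most 2 clues may give a cat's exact position
        if sum(clue[0] == "at" for clue in clues) > max_exact_clues:
            continue
        # Stop searching after two solutions; we only need to know if it's unique
        found = list(itertools.islice(solutions(clues), 2))
        if len(found) == 1:
            return clues, found[0]


if __name__ == "__main__":
    clues, answer = generate_puzzle(random.Random(42))
    for clue in clues:
        print(clue)
    print(answer)
```

Because next_to and opposite clues look the same when the circle is rotated or mirrored, a puzzle in this model only becomes unique once two exact-position clues pin the board down, so the "at most 2" constraint is doing real work here.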
In order to play the puzzles you’ll need to buy the Cat Crimes game.
Good luck, and let me know how you get on!
I also tried asking ChatGPT to generate puzzles for me. My puzzles are guaranteed to be solvable and probably about the right difficulty, but since they’re randomly generated their solutions don’t generally have much of a careful narrative behind them.
I thought that ChatGPT might be able to do better. “Absolutely!” it said when I asked it, but it kept giving me back puzzles that had either several different solutions or no solutions at all. No dice!
To fix this I added a ChatGPT mode to my tool. In this mode the tool gives you a prompt to paste into ChatGPT. The prompt asks ChatGPT to give you a Cat Crimes puzzle formatted in a specific way. You paste ChatGPT’s output back into the tool, and the tool checks whether the puzzle is valid. If it is, the tool converts the puzzle into a printable card; if it’s not, the tool prints an error message for you to give to ChatGPT to help it fix the problem. You continue this debugging loop until you have a valid (and hopefully more fun) puzzle.
I’m not associated with Cat Crimes in any way; this is a completely unofficial fan project. Cat Crimes is owned and published by Thinkfun Inc. Go and buy it from them!
2024-07-09 08:00:00
The plane reached 10,000ft. I took out my laptop, planning to peruse the internet and maybe do a little work if I got really desperate.
I connected to the in-flight wi-fi and opened my browser. The network login page demanded credit card details. I fumbled for my card, which I eventually discovered had hidden itself inside my passport. As I searched I noticed that the login page was encouraging me to sign in to my airmiles account, free of charge, even though I hadn’t paid for anything yet. A hole in the firewall, I thought. It’s a long way from London to San Francisco so I decided to peer through it.
I logged in to my JetStreamers Diamond Altitude account and started clicking. I went to my profile page, where I saw an edit button. It looked like a normal button: drop shadow, rounded corners, nothing special. I was supposed to use it to update my name, address, and so on.
But suddenly I realised that this was no ordinary button. This clickable rascal would allow me to access the entire internet through my airmiles account. This would be slow. It would be unbelievably stupid. But it would work.
Several co-workers were asking me to review their PRs because my feedback was “two weeks late” and “blocking a critical deployment.” But my ideas are important too so I put on my headphones and smashed on some focus tunes. I’d forgotten to charge my headphones so Limp Bizkit started playing out of my laptop speakers. Fortunately no one else on the plane seemed to mind so we all rocked out together.
Before I could access the entire internet through my airmiles account I’d need to write a few prototypes. At first I thought that I’d write them using Go, but then I realised that if I used Python then I could call the final tool PySkyWiFi. Obviously I did that instead.
Here’s the basic idea: suppose that I logged into my airmiles account and updated my name. If you were also logged in to my account then you could read my new name, from the ground. You could update it again, and I could read your new value. If we kept doing this then the name field of my airmiles account could serve as a tunnel through the airplane’s wi-fi firewall to the real world.
This tunnel could support a simple instant messaging protocol. I could update my name to “Hello how are you.” You could read my message and then send me a reply by updating my name again to “Im fine how are you.” I could read that, and we could have a stilted conversation. This might not sound like much, but it would be the first step on the road to full internet access.
I paid for the internet on my old laptop. I hadn’t finished migrating my data off this computer, so it still had to come everywhere with me. I messaged my wife to ask her to help me with my experiments. “no, what are you talking about, i'm busy,” she replied, lovingly.
So instead I took out my new laptop, which still had no internet access. I created a test airmiles account and logged into it on both computers. I found that I could indeed chat with myself by updating the name field in the UI.
```mermaid
sequenceDiagram
    participant Computer1
    participant AirmilesAccount as Airmiles Account<br>Name Field
    participant Computer2
    Computer1->>AirmilesAccount: TYPE: Hello how are you
    AirmilesAccount->>Computer2: READ: Hello how are you
    Computer2->>AirmilesAccount: TYPE: Im fine how are you
    AirmilesAccount->>Computer1: READ: Im fine how are you
```
This was a lousy user experience though. So I wrote a command line tool to automate it. My tool asked the user for a message, and then behind the scenes it logged into my airmiles account via the website, using my credentials. The tool updated the name field of my test account with the user’s message. It then polled the name field every few seconds to see if my account’s name had changed again, which would indicate that the other person had sent a message back. Once the tool detected a new value it printed that value and asked the user for their next reply, and so on.
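The command line tool's write-then-poll loop might look something like this. It's a sketch, not the author's code: FileNameField is a hypothetical stand-in for the airmiles name field (backed by a local file, in the spirit of the local-file testing the post describes), and poll_interval, timeout, and the function names are assumptions. A real version would log in to the airmiles website and read and write the name field over HTTP instead.

```python
import os
import time
from pathlib import Path


class FileNameField:
    """Hypothetical stand-in for an airmiles name field, backed by a local file."""

    def __init__(self, path):
        self.path = Path(path)

    def read(self):
        return self.path.read_text() if self.path.exists() else ""

    def write(self, value):
        # Write to a temp file then rename, so a reader never sees a half-written name
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(value)
        os.replace(tmp, self.path)


def send_and_wait_for_reply(field, message, poll_interval=3.0, timeout=None):
    """Write our message, then poll until the field holds something else."""
    field.write(message)
    started = time.monotonic()
    while True:
        current = field.read()
        if current != message:
            return current  # the other person has overwritten our message
        if timeout is not None and time.monotonic() - started > timeout:
            raise TimeoutError("no reply yet")
        time.sleep(poll_interval)


def chat(field, poll_interval=3.0):
    """Interactive loop: ask for a message, send it, print the reply, repeat."""
    while True:
        message = input("> ")
        print(send_and_wait_for_reply(field, message, poll_interval))
```

Detecting a reply by "the value changed" has one obvious blind spot: if the other person replies with exactly the same text, the tool can't tell that anything happened.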
```mermaid
sequenceDiagram
    actor Me
    participant AirmilesAccount as Airmiles Account<br>Name Field
    actor You
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>You: (no new data)
    Me->>AirmilesAccount: WRITE: Hello how are you
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>You: READ: Hello how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>Me: (no new data)
    You->>AirmilesAccount: WRITE: Im fine how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>Me: READ: Im fine how are you
```
Using this tool I could chat with someone on the ground, via my terminal. I wouldn’t have to pay for wi-fi, and neither of us would have to know or care that the messages were being sent via my SkyVenture Premium Gold Rewards account.
I still needed to find someone who would chat with me. But this was a good start!
NB: at this point I didn’t want to send any more automated data through my airmiles account in case that got me in trouble somehow. Nothing I was doing could possibly cause any damage, but some companies get jumpy about this kind of thing.
I therefore proved to myself that PySkyWiFi would work on my airmiles accounts too by updating my name ten or so times in quick succession. They all succeeded, which suggested to me that my airmiles account probably wasn’t rate-limiting the speed or number of requests I could send to it.
I then wrote the rest of my code by sending my data through friendly services like GitHub Gists and local files on my computer, using the same principles as if I were sending it through an airmiles account. If PySkyWiFi worked through GitHub then it would work through my Star Power UltimateBlastOff account too. This had the secondary advantage of making iteration much faster and easier.
I’m going to keep talking about sending data through an airmiles account, because that’s the point I’m trying to make.
The tunnel I’d constructed through my airmiles account would be useful for more than IMing. For my next prototype I wrote a program that would run on a computer back at my house or in the cloud, and would automatically send information from the real world up to me on the plane, through my airmiles account. I could deploy it before I left for my next flight and have it send me the latest stock prices or football scores while I was in the sky.
To do this I wrote a daemon that would run on a computer that was on the ground and connected to the internet. The daemon constantly polled the name field in my airmiles account, looking for structured messages that I sent to it from the plane (such as STOCKPRICE: AAPL or SCORE: MANUNITED). When the daemon saw a new request it parsed it, retrieved the requested information using the relevant API, and sent it back to me via my airmiles account. It worked perfectly.
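The daemon's parse-and-dispatch step could be sketched like this. It's an illustration under assumptions: the COMMAND: ARGUMENT format is inferred from the two example messages, lookup_stock_price and lookup_score are hypothetical placeholders for whichever stock and football APIs you'd actually call, and field is any object with read() and write() methods standing in for the airmiles name field.

```python
import re
import time
from typing import Callable, Dict, Optional


# Hypothetical placeholders: a real daemon would call stock and football APIs here.
def lookup_stock_price(ticker: str) -> str:
    return f"{ticker}: (price unavailable in this sketch)"


def lookup_score(team: str) -> str:
    return f"{team}: (score unavailable in this sketch)"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "STOCKPRICE": lookup_stock_price,
    "SCORE": lookup_score,
}

# Matches messages like "STOCKPRICE: AAPL" or "SCORE: MANUNITED"
REQUEST_RE = re.compile(r"^([A-Z]+):\s*(\S+)$")


def handle_request(message: str) -> Optional[str]:
    """Return a reply for a recognised request, or None for anything else."""
    match = REQUEST_RE.match(message.strip())
    if not match:
        return None
    command, argument = match.groups()
    handler = HANDLERS.get(command)
    return handler(argument) if handler else None


def run_daemon(field, poll_interval=5.0):
    """Poll the name field forever, answering each new request in place."""
    last_seen = None
    while True:
        message = field.read()
        if message != last_seen:
            last_seen = message
            reply = handle_request(message)
            if reply is not None:
                field.write(reply)
                last_seen = reply  # don't treat our own reply as a new request
        time.sleep(poll_interval)
```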
Now I could use my first prototype to send IMs through my airmiles account, and I could use my second prototype to follow the markets and the sports.
It was time to squeeze the entire internet through my airmiles account.
During the rest of the flight I wrote PySkyWiFi. PySkyWiFi is a highly simplified version of the TCP/IP protocol that squeezes whole HTTP requests through an airmiles account, out of the plane, and down to a computer connected to the internet on the ground. A daemon running on this ground computer makes the HTTP requests for me, and then finally squeezes the completed HTTP responses back through my airmiles account, up to me on my plane.
This meant that on my next flight I could technically have full access to the internet, via my airmiles account. Depending on network conditions on the plane I might be able to hit speeds of several bytes per second.
DISCLAIMER: you obviously shouldn’t actually do any of this
Here’s how it works (and here’s the source code).
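PySkyWiFi has two components: a sky proxy, which runs on your laptop on the plane, and a ground daemon, which runs on a computer on the ground that’s connected to the internet.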
Here’s a simplified diagram:
```mermaid
sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account
    participant GroundDaemon as Ground Daemon
    participant Website as example.com
    Me->>SkyProxy: HTTP request
    SkyProxy->>AirmilesAccount1: HTTP request
    AirmilesAccount1->>GroundDaemon: HTTP request
    GroundDaemon->>Website: HTTP request
    Website->>GroundDaemon: HTTP response
    GroundDaemon->>AirmilesAccount1: HTTP response
    AirmilesAccount1->>SkyProxy: HTTP response
    SkyProxy->>Me: HTTP response
```
Setup starts before you leave your house. First you start up the ground daemon. Then you get a taxi to the airport, get on the plane, and connect to the plane’s wi-fi network. You boot up the sky proxy on your laptop. Your PySkyWiFi relay is now ready to go.
You use a tool like curl to make an HTTP request to the sky proxy that you’ve started on your laptop. You address your request to the proxy (e.g. localhost:1234/) and you put the actual URL that you want to query inside a custom HTTP header called X-PySkyWiFi. For example:
```shell
curl localhost:1234 -H "X-PySkyWiFi: example.com"
```
The X-PySkyWiFi header will be stripped by the ground daemon and used to route your request to your target website. Everything else about the request (including the body and other headers) will be forwarded exactly as-is.
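Stripping the routing header out of a raw HTTP request might look like the sketch below. The function name is hypothetical and it only handles the simple CRLF-delimited header block; note that it leaves every other header untouched, including Host (which will still say localhost:1234), so a daemon forwarding to a real website may also need to set Host to the target.

```python
from typing import Tuple

ROUTING_HEADER = b"x-pyskywifi"  # header names are case-insensitive


def extract_target(raw_request: bytes) -> Tuple[str, bytes]:
    """Remove the X-PySkyWiFi header and return (target, remaining request)."""
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    target = None
    kept = [lines[0]]  # the request line, e.g. b"GET / HTTP/1.1"
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == ROUTING_HEADER:
            target = value.strip().decode("ascii")
        else:
            kept.append(line)  # every other header is forwarded as-is
    if target is None:
        raise ValueError("request has no X-PySkyWiFi header")
    return target, b"\r\n".join(kept) + sep + body
```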
Once you make your request it will hang for several minutes. If by some miracle nothing breaks then you’ll eventually get back an HTTP response, exactly as if you’d sent the request over the normal internet like a normal person. The only difference is that it didn’t cost you anything. You will now almost certainly pay for wi-fi, because your curiosity has been satisfied and your time on this earth is very short.
Here’s what happens behind the scenes:
```mermaid
sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as example.com
    Me->>SkyProxy: curl localhost:1234<br>-H "X-PySkyWiFi: example.com"
    SkyProxy->>AirmilesAccount1: Write request chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 1
    GroundDaemon->>AirmilesAccount2: Ack request chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ack for request chunk 1
    SkyProxy->>AirmilesAccount1: Write request chunk 2
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 2
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: example.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>
    GroundDaemon->>AirmilesAccount2: Write response chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 1
    SkyProxy->>AirmilesAccount1: Ack response chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ack for response chunk 1
    GroundDaemon->>AirmilesAccount2: Write response chunk 2
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 2
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>
```
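In order:
- The sky proxy receives the HTTP request from your curl call. It splits the request into chunks, because the entire request is too large to fit into your airmiles account in one go.
- The sky proxy writes the chunks one at a time to airmiles account 1. The ground daemon polls this account, reads each chunk, and acknowledges it by writing to airmiles account 2, which tells the sky proxy that it’s safe to write the next chunk.
- Once the ground daemon has received the whole request, it sends the request to the target website and receives the response.
- The ground daemon sends the response back to the sky proxy in chunks through airmiles account 2, and the sky proxy acknowledges each chunk through airmiles account 1.
- The sky proxy reassembles the response and returns it to curl. As far as curl is concerned this is a perfectly normal HTTP response, just a little slow. curl has no idea about the silliness that just transpired.
The sky proxy and the ground daemon are relatively simple: they send HTTP requests and parse HTTP responses. The magic is in how they squeeze these requests and responses through an airmiles account. Let’s look closer.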
PySkyWiFi’s communication logic is split into two layers: a transport layer, and a network layer. The transport layer’s job is to decide what data clients should send to each other. It dictates how senders should split up long messages into manageable chunks, as well as how senders and receivers should signal information like “I am ready to receive another chunk.” The PySkyWiFi transport layer is somewhat similar to the TCP protocol that powers much of the internet, if you squint very hard and don’t know much about TCP.
By contrast, the network layer’s job is to actually send data between clients, once the transport protocol has decided what that data should be. It’s vaguely similar to the IP protocol, if you squint even harder and know even less about what you’re talking about.
This division of responsibility between layers is useful because the transport layer doesn’t have to care about how the network layer sends its data, and the network layer doesn’t care what the data it sends means or where it came from. The transport layer just hands the network layer some data, and the network layer sends it however it likes.
This separation makes it easy to add support for new airmiles platforms, because all we have to do is implement a new network layer that reads and writes to the new type of airmiles account. This separation also allows us to write test versions of the network layer that write to and read from local files instead of airmiles accounts. In each case the network layer changes, but the transport layer stays exactly the same. Here’s how they work.
A PySkyWiFi transport connection between two clients consists of two “pipes” (or “airmiles accounts”). Each client has a “SEND” pipe that it can write data to, and a “RECV” pipe that it can read from; one client’s SEND pipe is the other client’s RECV pipe. Clients send data by writing it to their SEND pipe, and they receive data by constantly polling their RECV pipe to see if anything has changed.
```mermaid
flowchart LR
    Client1 --> Client2
    Client2 --> Client1
```
From the transport layer’s point of view, a pipe is just something that it can write data to and read data from. Beyond that the transport layer doesn’t care how its pipes work.
At any given moment a PSWF (PySkyWiFi) client can either send or receive data, but not both. A client in send mode will not see data sent by the other client, and a client in receive mode should never send data because the other client won’t see it. This is unlike TCP, where clients can send and receive data at any time.
When squeezing HTTP requests and responses through an airmiles account, the sky proxy sends the first message and the ground daemon receives it. Once the sky proxy has finished sending its HTTP request it switches to receive mode and the ground daemon switches to send. The ground daemon makes the HTTP request and sends back the response, at which point the two switch roles again so that the sky proxy can send another HTTP request.
PSWF uses small pipes (such as an airmiles name field) that can’t fit much data in them at once. This means that it takes some work and care to squeeze long messages (like HTTP requests) through them.
To send a long message, the sender first splits up their message into chunks that will fit into their SEND pipe. They then send each chunk down the pipe one at a time.
To begin a message, a sender starts by sending its first chunk of message data inside a DATA segment.
A DATA segment consists of:
- The letter D
- The sequence number of the chunk (a number that uniquely identifies the chunk within the message, padded to 6 digits)
- The actual chunk of data.
For example, a data segment in the middle of a message might read:
D000451adline": "Mudslide in Wigan causes m
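Chunking a message and building or parsing a DATA segment is plain string formatting. In this sketch the function names are hypothetical, and chunk_size is assumed to be the number of data characters per segment, not counting the 7-character header.

```python
from typing import List, Tuple

SEQ_DIGITS = 6  # sequence numbers are zero-padded to 6 digits


def chunk_message(message: str, chunk_size: int) -> List[str]:
    """Split a long message into chunks small enough for the SEND pipe."""
    return [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]


def make_data_segment(seq: int, chunk: str) -> str:
    """Build a DATA segment: the letter D, a 6-digit sequence number, then the data."""
    return f"D{seq:0{SEQ_DIGITS}d}{chunk}"


def parse_data_segment(segment: str) -> Tuple[int, str]:
    """Split a DATA segment back into (sequence number, chunk of data)."""
    if not segment.startswith("D"):
        raise ValueError(f"not a DATA segment: {segment!r}")
    return int(segment[1:1 + SEQ_DIGITS]), segment[1 + SEQ_DIGITS:]
```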
Once the sender has sent a DATA segment, it pauses. It wants to send its next DATA segment, but it can’t overwrite the airmiles account name field until it knows that the receiver has received and processed the previous one.
The receiver tells the sender that it’s safe to send a new DATA segment by acknowledging every segment that it reads. The receiver does this by writing an ACK segment to its own SEND pipe.
An ACK segment consists of:
- The letter A
- The sequence number of the segment that is being acknowledged (padded to 6 digits)
For example:
A000451
The sender is constantly polling its own RECV pipe to check for changes, and so it reads this new ACK segment promptly. Once the sender reads the ACK, it knows that the receiver has received the segment corresponding to the ACK’s sequence number. For example, if a sender receives an ACK segment with sequence number 000451, the sender knows that it’s safe to send the next DATA segment with sequence number 000452. The sender therefore pulls the next chunk from its message and constructs a new DATA segment using this chunk and sequence number. The sender writes the new segment to its SEND pipe, and then pauses and waits for another ACK.
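The sender's side of this exchange boils down to one decision each time it polls its RECV pipe. A sketch, assuming sequence numbers restart at 000000 for each message (as they do in the sequence diagrams) so that sequence number n is simply chunk n; the function names are hypothetical.

```python
from typing import List, Optional

SEQ_DIGITS = 6


def make_data_segment(seq: int, chunk: str) -> str:
    return f"D{seq:0{SEQ_DIGITS}d}{chunk}"


def make_ack_segment(seq: int) -> str:
    return f"A{seq:0{SEQ_DIGITS}d}"


def parse_ack_segment(segment: str) -> Optional[int]:
    """Return the acknowledged sequence number, or None if this isn't an ACK."""
    digits = segment[1:]
    if segment.startswith("A") and len(digits) == SEQ_DIGITS and digits.isdigit():
        return int(digits)
    return None


def next_segment_to_send(chunks: List[str], last_sent_seq: int, recv_value: str) -> Optional[str]:
    """Decide what to write next after polling the RECV pipe.

    Returns None while the latest DATA segment is still unacknowledged,
    the next DATA segment once it has been ACKed, or "E" when every chunk is done.
    """
    if parse_ack_segment(recv_value) != last_sent_seq:
        return None
    next_seq = last_sent_seq + 1
    if next_seq < len(chunks):
        return make_data_segment(next_seq, chunks[next_seq])
    return "E"
```

Comparing the ACK's sequence number exactly, rather than just noticing that the RECV pipe changed, means a stale ACK for an earlier segment can never be mistaken for permission to send the next one.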
This loop continues until the sender has sent all the data in its message. To tell the recipient that it’s finished, the sender sends an END segment.
An END segment is just the letter E.
When a receiver sees an END segment it knows that the sender’s message is over. The sender and the receiver swap roles. The old sender starts polling its RECV pipe for DATA segments, and the old receiver starts chunking up its response message and sending it through its SEND pipe, exactly as before.
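Putting the DATA, ACK, and END segments together, one request/response exchange might look like the sketch below. Assumptions: MemoryPipe is a hypothetical in-memory stand-in for an airmiles name field (with send() and recv() like the network layer operations the post describes), the ground daemon's "HTTP request" is faked, and each message restarts its sequence numbers at 000000, as in the sequence diagrams. Pipe 1 carries the sky proxy's writes and pipe 2 carries the ground daemon's writes, so each client only ever writes to its own SEND pipe.

```python
import threading
import time


class MemoryPipe:
    """Hypothetical in-memory pipe standing in for one airmiles name field."""

    def __init__(self):
        self._value = ""
        self._lock = threading.Lock()

    def send(self, msg):
        with self._lock:
            self._value = msg

    def recv(self):
        with self._lock:
            return self._value


def send_message(send_pipe, recv_pipe, message, chunk_size, sleep_for):
    """Stop-and-wait: write one DATA segment, wait for its ACK, repeat, then END."""
    # Always send at least one DATA segment, even for an empty message
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)] or [""]
    for seq, chunk in enumerate(chunks):
        send_pipe.send(f"D{seq:06d}{chunk}")
        while recv_pipe.recv() != f"A{seq:06d}":
            time.sleep(sleep_for)
    send_pipe.send("E")


def recv_message(send_pipe, recv_pipe, sleep_for):
    """Poll for DATA segments in order, ACK each one, and stop at END."""
    parts = []
    expected_seq = 0
    while True:
        segment = recv_pipe.recv()
        # Ignore a leftover "E" from an earlier exchange until we've seen some data
        if segment == "E" and expected_seq > 0:
            return "".join(parts)
        if segment.startswith("D") and segment[1:7] == f"{expected_seq:06d}":
            parts.append(segment[7:])
            send_pipe.send(f"A{expected_seq:06d}")
            expected_seq += 1
        else:
            time.sleep(sleep_for)


def ground_daemon(pipe1, pipe2, chunk_size, sleep_for):
    request = recv_message(pipe2, pipe1, sleep_for)  # read from pipe 1, ACK on pipe 2
    # Hypothetical stand-in for actually making the HTTP request
    response = f"RESPONSE TO: {request}"
    # Roles swap: the daemon now sends on pipe 2 and reads ACKs from pipe 1
    send_message(pipe2, pipe1, response, chunk_size, sleep_for)


def round_trip(request, chunk_size=20, sleep_for=0.001):
    """Sky proxy side: send a request through pipe 1, then read the response from pipe 2."""
    pipe1, pipe2 = MemoryPipe(), MemoryPipe()
    daemon = threading.Thread(target=ground_daemon, args=(pipe1, pipe2, chunk_size, sleep_for))
    daemon.start()
    send_message(pipe1, pipe2, request, chunk_size, sleep_for)
    response = recv_message(pipe1, pipe2, sleep_for)
    daemon.join()
    return response
```

Two details in the sketch guard against stale pipe contents: every message sends at least one DATA segment, and the receiver ignores an E until it has read some data, so a leftover END segment from an earlier exchange can't be mistaken for a new, empty message.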
None of this transport logic cares about the details of the network layer through which the segments are sent. The transport layer just needs the network layer to provide two pipes that it can read and write to. The network layer can pipe this data around via local files, a Discord profile, or an airmiles account. This genericness is what allows PySkyWiFi to work with any airline’s airmiles account, so long as the airline allows you to log in to it from the plane without paying.
Here’s how PSWF uses transport protocol segments to exchange long messages:
```mermaid
sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as robertheaton.com
    Me->>SkyProxy: curl localhost:1234<br>-H "X-PySkyWiFi: robertheaton.com"
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000000<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000000<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon->>AirmilesAccount2: Write ACK segment<br>sequence number=000000
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ACK segment<br>sequence number=000000
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: robertheaton.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>
    GroundDaemon->>AirmilesAccount2: Write DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy->>AirmilesAccount1: Write ACK segment<br>sequence number=000000
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ACK segment<br>sequence number=000000
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>
```
The transport layer decides what data the clients should send each other, but it doesn’t say anything about how they should send it. That’s where the network protocol comes in.
The network layer’s job is to send data between clients. It doesn’t care about where the data came from or what it means; it just receives some data from the transport layer and sends it to the other client (typically via an airmiles account).
This means that the network layer is quite simple. It also means that adding a new network layer for a new airmiles platform is straightforward. You use the new platform to implement a few operations and a few properties (see below), and then the transport layer can automatically use your new airmiles platform with no extra work.
A network layer consists of two operations:
- send(msg: str) - write msg to storage. For an airmiles-based implementation, this writes the value of msg to the name field in the user’s airmiles account.
- recv() -> str - read the message from storage. For an airmiles-based implementation, this reads the value of the name field from the user’s airmiles account.
A network layer implementation must also define two properties:
- sleep_for - the number of seconds that the transport layer should sleep for in between polling for new segments from a RECV pipe. sleep_for can be very low for test implementations like files, but it should be at least several seconds for an implementation like an airmiles account, in order to avoid hammering the remote server with too many requests.
- segment_data_size - the number of characters that the transport layer should send in a single segment. This should be equal to the maximum size of the airmiles account field being used to transfer segments (often around 20 characters).
A network layer implementation can also optionally provide two more operations:
- connect_send() - a hook called by the sender when a SEND pipe is initialised. In an airmiles-based implementation this allows the client to log in to the platform using a username and password. This gives the client a cookie that it can use to authenticate future send and recv calls.
- connect_recv() - a hook called by the receiver when a RECV pipe is initialised.
If you fill in all these methods, you’ll be able to use PySkyWiFi on a new airline. But again, don’t.
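The interface above could be written down as a typing.Protocol, alongside the kind of local-file test implementation the post mentions. This is a sketch: the class names NetworkLayer and FileNetwork are hypothetical, and the 20-character segment_data_size and 0.01-second sleep_for are illustrative values.

```python
import os
from pathlib import Path
from typing import Protocol


class NetworkLayer(Protocol):
    """The operations and properties a network layer must provide."""

    sleep_for: float
    segment_data_size: int

    def send(self, msg: str) -> None: ...

    def recv(self) -> str: ...


class FileNetwork:
    """Test network layer that uses a local file instead of an airmiles account."""

    sleep_for = 0.01        # local files can be polled aggressively
    segment_data_size = 20  # mimic a ~20-character name field

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def connect_send(self) -> None:
        # Optional hook: an airmiles version would log in here; a file just needs to exist
        self.path.touch(exist_ok=True)

    def connect_recv(self) -> None:
        self.path.touch(exist_ok=True)

    def send(self, msg: str) -> None:
        # Atomic swap, so a polling reader never sees a half-written segment
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(msg)
        os.replace(tmp, self.path)

    def recv(self) -> str:
        return self.path.read_text()
```

Because FileNetwork structurally matches NetworkLayer, transport code typed against the protocol would accept it, or an airmiles-backed class with the same methods, without changes.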
When writing a network layer that uses a new airmiles provider, there are a couple of tricks that can make your implementation faster and more reliable.
Airmiles HTML forms usually don’t let users include non-alphabetic characters in their name. Stephen will probably be allowed, but GET /data?id=5 will probably be rejected.
To work around this, the network layer should encode segments using base26 before writing them to an airmiles account. base26 is a way of representing a string using only the letters A to Z. In order to convert a byte string to base26, you convert the bytes to a single large number, then you represent that number using a counting system with base 26 (hence the name) where the digits are the letters A to Z.
```python
def b26_encode(input_string: str) -> str:
    # Convert the input to bytes, then to a single base-256 integer
    base256_int = 0
    for byte in input_string.encode("utf-8"):
        base256_int = base256_int * 256 + byte
    # Convert the base-256 integer to a base26 string, using A-Z as the digits
    if base256_int == 0:
        return 'A'  # Special case for empty input or input that equals zero
    base26_str = ""
    while base256_int > 0:
        base26_str = chr(base256_int % 26 + 65) + base26_str
        base256_int //= 26
    return base26_str

b26_encode("Hello world")
# => 'CZEZINADXFFTZEIDPKM'
```
The transport layer never needs to know about this encoding. The network layer receives some bytes, encodes them using base26, and writes this encoded string of letters A to Z to the airmiles account. When the network layer reads the base26 value back out of the airmiles account, it decodes the encoded string back into a number and then back into bytes, and then returns those bytes to the transport layer.
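The decoding direction simply runs the same steps backwards. A sketch with a hypothetical b26_decode, repeating a compact b26_encode so the block stands alone; one caveat of this big-number approach is that leading zero bytes (NUL characters) don't survive the round trip, which is harmless for segments that start with D, A, or E.

```python
def b26_encode(input_string: str) -> str:
    # bytes -> one big integer -> base-26 digits A-Z
    number = int.from_bytes(input_string.encode("utf-8"), "big")
    if number == 0:
        return "A"
    letters = ""
    while number > 0:
        letters = chr(number % 26 + ord("A")) + letters
        number //= 26
    return letters


def b26_decode(encoded: str) -> str:
    # base-26 digits A-Z -> one big integer -> bytes -> string
    number = 0
    for letter in encoded:
        number = number * 26 + (ord(letter) - ord("A"))
    num_bytes = (number.bit_length() + 7) // 8
    return number.to_bytes(num_bytes, "big").decode("utf-8")
```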
Encoding a string using base 26 makes it significantly longer, just like how it takes many more digits to represent a number using binary than decimal. This reduces the bandwidth of our protocol. We could increase our bandwidth by using base52 (using both upper- and lower-case letters) instead of base26, which would shorten the encoded segments somewhat. This is left as an enhancement for version 2.
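The cost of the encoding is easy to quantify. This sketch (helper names hypothetical) computes the average number of characters needed per byte and the most bytes that can fit in a 20-character field, using an exact integer comparison for the second so floating-point rounding can't skew the answer. It works out to roughly 1.70 characters per byte for base26 versus 1.40 for base52, so a 20-character field carries at most 11 bytes of base26 data versus 14 of base52.

```python
import math


def chars_per_byte(base: int) -> float:
    """Average number of encoded characters needed per input byte."""
    return math.log(256) / math.log(base)


def max_bytes_per_field(field_chars: int, base: int) -> int:
    """Largest k such that every k-byte value fits in field_chars digits of this base."""
    capacity = base ** field_chars  # distinct values a field of this length can hold
    k = 0
    while 256 ** (k + 1) <= capacity:
        k += 1
    return k


for base in (26, 52):
    print(f"base{base}: {chars_per_byte(base):.2f} chars/byte, "
          f"{max_bytes_per_field(20, base)} bytes per 20-char field")
```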
Another way to increase our PSWF bandwidth is to increase the segment size that a network layer can handle. If we double the size of our segments, we double the bandwidth of our protocol.
Fields in airmiles accounts usually have length limits. For example, you might not be allowed to set a name longer than 20 characters. However, we can still maximise our bandwidth by using several fields at once.
Suppose we have control over 5 fields that can each store 20 characters. Instead of using one field to transmit segments of 20 characters, we can split a 100 character segment into 5 chunks of 20 and update them all at once in a single request. The receiver can then read all 5 fields, again in a single request, and stitch them back together to reconstruct the full segment.
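Splitting a segment across several fields and stitching it back together is simple string slicing. A sketch assuming the 5 fields of 20 characters from the example; the function names are hypothetical, and a real implementation would also need to map each chunk onto a specific form field and submit them all in one request.

```python
from typing import List


def split_across_fields(segment: str, field_size: int, num_fields: int) -> List[str]:
    """Cut a segment into exactly num_fields chunks of at most field_size characters."""
    if len(segment) > field_size * num_fields:
        raise ValueError("segment is too long for the available fields")
    # Trailing chunks may be empty if the segment is short
    return [segment[i:i + field_size] for i in range(0, field_size * num_fields, field_size)]


def join_fields(values: List[str]) -> str:
    """Reconstruct the full segment from the field values, in order."""
    return "".join(values)
```

If a form refuses empty fields (many insist on a non-empty name, for instance), a real version would need some padding scheme for the trailing chunks.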
CONNECT
It would be better if PySkyWiFi used HTTP CONNECT requests to set up the tunnel from the sky proxy to the target site, instead of manually tossing around HTTP requests. CONNECT requests are how HTTP proxies usually tunnel HTTPS traffic, and using them would allow PySkyWiFi to act as the system-level proxy and so handle requests from a web browser. It would also mean that the client would negotiate its TLS connection with the target website directly, so its traffic would be encrypted as it passed through the airmiles account.
On the other hand, using CONNECT would also be a lot more work and I’ve already taken this joke way too far.
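For reference, the proxy side of CONNECT starts by reading a request line such as CONNECT example.com:443 HTTP/1.1 and replying with a 2xx status, after which it just relays raw bytes (including the client's TLS handshake) in both directions. The sketch below only parses that first request line; the function name is hypothetical, and relaying the bytes through the airmiles pipes is left out.

```python
from typing import Tuple

# Any 2xx status tells the client the tunnel is open; the reason phrase is arbitrary
CONNECT_OK = b"HTTP/1.1 200 Connection Established\r\n\r\n"


def parse_connect_request(raw: bytes) -> Tuple[str, int]:
    """Return (host, port) from a CONNECT request's first line."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, authority, _version = request_line.split(" ")
    if method != "CONNECT":
        raise ValueError(f"expected CONNECT, got {method}")
    host, _, port = authority.rpartition(":")
    return host, int(port)
```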
When I was done with all of this I used PySkyWiFi to load the homepage of my blog using curl, tunneling the data via a GitHub Gist. Several minutes later I got a response back. I scrolled around the HTML and reflected that this had been both the most and least productive flight of my life.
2024-03-25 08:00:00
For the last eighteen months I’ve been writing a book about being a dad. Two weeks ago I finished the first draft!
The book is inspired by my blog posts about parenting, but most of it is brand new and I think it might be very good. It’s about childbirth, covid, careers, old friends, new friends, kid friends, chess, pianos, screens, AI, marriage, and much more.
Now that I’ve finished a draft I’m looking for an agent and a publisher. I’ve never done this before so I’d appreciate help and advice! Please get in touch if:
If you want to find out when the book is ready, subscribe to my newsletter. If you have friends who you think would enjoy the book, tell them about it and make them subscribe too.
Thanks!
2023-10-18 08:00:00
I know that time spent with my kids is supposed to be its own reward, and it is. But I also want to believe that what I do in this time matters, as much as possible. Elegantly handling a tantrum feels more worthwhile if I’m helping my son learn to express his feelings, not just making it through another day. I find more contentment at the end of a long afternoon if I think that I did a good job and that this good job will echo through the ages, or at least after bedtime.
I want my kids to be happy and fulfilled, skilled and accomplished, and I want to be able to help. Shouldn’t I try to pass my good habits onto them while shielding them from my dark thoughts? This shouldn’t be too hard; I’m their dad, they see me every day. My eldest child, Oscar, is only 4, but I already think he might be a little remarkable, and thick black lines seem to lead back to what Gaby (my wife) and I do with him. The spark and smarts are his, but I feel like I must surely be making a difference.
However, I recently read “Selfish Reasons To Have More Kids” by Bryan Caplan, an economist and blogger, and it’s turned me a little upside-down. Caplan observes that many parents wreck themselves trying to boost and polish their children. He argues that this isn’t just a bad tradeoff, but an almost total waste of time. He presents reams of remarkable research suggesting that, in Western middle-class families, parents’ choices have almost no influence on their children’s long-term health, intelligence, happiness, success, or character. Parents achieve nothing by sending their kids to extra maths lessons, hiding the TV remote, or even teaching them the value of hard work. Caplan shows that upbringing counts for almost nil (at least within the Western middle-class), and that genetics and randomness are everything. It appears that nothing within parental control matters.
Caplan presents his arguments as a gift, one that frees parents from eighteen years of guilt and wasted effort. In his telling there’s little that parents can do to influence their children in the long run, so there’s no point and no duty for them to try. Kids have genes and free will; now let go and enjoy your time together.
Caplan knows that some parents will rebel against his arguments. I certainly did. I heard him telling me that I don’t matter, at least not in the ways that I’d hoped. I want parenting to be a deep, complex vocation, and I want to spend the coming decades playing a domestic game of skill and consequence. The idea of having children who I have no influence over is scary, like living with werewolves. Randomness and outside forces are everywhere and the kids are mutating while I sleep.
But even though I want to be relevant, I don’t want to waste my time. Begrudgingly, I kept reading.
Caplan’s claim that parents have little long-term influence on their children seems absurd at first. Contra Caplan, I see my influence in my children every day. Oscar likes the same music as me. He used to be terrified of playgrounds but Gaby screwed a wooden ladder to his bedroom wall and now he’s mostly normal. I stubbed my toe and shouted “fuck!” and he whispered “fuck indeed daddy, you sound frustrated,” failing to calm me in the same way that I fail to calm him. This is surely common sense.
But common sense grows in unscientific environments. Nature and nurture are conflated, we don’t see the aggregates, and we don’t see the long term. Caplan agrees that parents have huge influence over their children in the short term, but he also argues that this influence fades, sometimes fast, sometimes slow, but it does fade, and it vanishes completely when they grow up and finish becoming whoever they are. Kids are resilient to setbacks, but they’re resilient to assistance too.
In order to rigorously test theories like this, researchers study large groups of children. However, most kids are useless to them. Suppose that two happy parents have and raise a child. The child grows up with their parents, and in time they become a happy adult too. It’s impossible to know whether the child’s happiness comes from happy genes that they inherited from their happy parents, or from the happy environment that their happy parents raised them in. Their parents’ genes and choices are irreversibly mixed together. Even with a huge database of children, parents, and measurements of happiness, the causes are impossible to itemise.
Fortunately, researchers can still extract good data from special children, like identical twins who were separated at birth. These kids give researchers two copies of the same genes, raised in different environments. Since separated identical twins share genes but not environments, any systematic differences between them must be due to their different upbringings. If identical twins raised separately bear no resemblance to each other but are similar to their adopted siblings, this would suggest that the twins were shaped by their divergent upbringings. If the twins remain similar, despite growing up entirely separately, this would suggest that they were made by their identical genes.
Researchers slice and measure these children, pulling apart the effects of nature and nurture. Twins separated at birth are the gold standard, but non-twin adoptees and non-adopted twins can work too. The researchers find or build databases of useful children (who may now be adults), and compare their grades (perhaps from school records), income (perhaps from tax records) or personalities (perhaps from administering personality tests directly). The evidence from this data is strong and consistent: a near-zero effect of upbringing on character, happiness, and almost everything else.
The studies are clever, but are they valid? They control naturally for almost everything, but they still aren’t perfect. For example, maybe parents who choose to adopt are meaningfully different to the average parent, meaning that conclusions based solely on them don’t generalise to the rest of the population. Maybe parents who choose to adopt and then also agree to be part of a long-term study are even more different. Maybe women who have twins are different. Maybe twins themselves are different too.
But even if these sampling biases are material, I doubt that they’re large enough to tear down the studies’ broad conclusions. I’d guess that adoptees and twins separated at birth are a good enough sample to represent humanity, and that even if they aren’t fully representative, they probably aren’t masking a giant effect that skips twins and applies only to the rest of us. If researchers were able to fully control for sampling biases then this might shift their estimate of the effect of parental influence from “incredibly low” to merely “very, very low”.
Caplan admits that the studies are primarily focussed on the Western middle class, because that’s where the data is. This hurts the studies’ generalisability but binds me - an orthodox member of their class - even tighter. All said, I think I have to assume that the studies pointing towards the primacy of genes are valid for people like me.
But do the studies definitely apply to me, or you, specifically? They find no effect of parenting style on children’s adult outcomes, within the range of normal parenting styles in middle-class Western families. That word “normal” might provide an opening for a determined parent to squeeze through in order to regain their lost gravity. The studies suggest that there’s no difference between the free-range and regimented ends of the normal spectrum, but they can’t say anything definite about what happens beyond the edges of normality.
Caplan recommends that parents dissolve their fears and ambitions in the acid balm of the evidence. But there is another response that’s consistent with the data, although it might not necessarily be a good idea: redouble your efforts and head for the ambiguity beyond the well-studied centre, where the evidence might not stretch. More enrichment, more practice, more effort, fewer half-measures.
This makes some common sense; think about outlandish famous families. The Williams sisters must be naturally gifted tennis players, but they surely wouldn’t be the same dominant champions without their obsessive dad. The Polgar sisters would have been unremarkable chess players without theirs. These types of childhoods are so rare that they can’t possibly be adequately represented in any of the twin datasets, so the research doesn’t have anything direct to say about them. Twin studies don’t disprove the Jackson Five.
Caplan claims that kids are elastic, and that whether helped or harmed they tend to snap back to their natural state. However, I learned in physics classes that not even elastic is perfectly elastic. It pings back to its original shape after mild deformation, but it can still be altered permanently if stretched beyond a point called its elastic limit. Whilst the sum of small interventions on a child might be zero, it might still be possible to permanently deform them (in a good way) through the application of massive force. This metaphor is so perfect that it must surely be true.
In fact, even Caplan is stretching his own kids like this. In a 2015 blog post (the book was written in 2011), he describes the rigorous homeschool that he runs for his two eldest (twins, coincidentally). The main reason he homeschools them, he says, is because they are particularly academic kids, and they all think that they will enjoy an uncompromising homeschool more than a conventional one. However, he also suspects that his homeschool might be so off-the-scale remarkable that it vaults over the evidence and produces better adult outcomes, despite his claims that this is usually impossible. He writes:
I suspect – though I’m far from sure – that the Caplan Family School is such an exceptional experience that ordinary twin and adoption evidence isn’t relevant. For example, my sons are plausibly the only 12-year-olds in the nation taking a college class in labor economics.
Should you or I try to do this too? It’s almost always delusional to put yourself and your children in a category called “exceptional”, and this might not even be a category that you want to be in. I do wonder, though, where does “normal” end and “exceptional” begin? Where’s the elastic limit, and how weird is it really? Is anything less than the Williams sisters a waste of time? Or does the curve bend much sooner than that? Even if you don’t want to do anything too odd by modern standards, a lot of the data in these studies comes from dead twins brought up decades ago. Today’s parenting zeitgeist might not necessarily be better than the old days, but it’s certainly different. How well does data from a different era in parenting generalise to today? Is it possible that even normal parenting today is different enough from several decades ago to have a material impact?
Is reading a respected parenting manual and teaching your toddler to add and multiply too normal and futile, or just crazy enough that it might work? I don’t want to be Richard Williams and I couldn’t even be Bryan Caplan, but I could be a bit weirder than average if that was worthwhile and harmless. I might be inventing straws to clutch at, but as far as I know there’s no cast-iron science out here so we’re allowed to make things up again and I can assert a world in which I have agency.
I’ve drilled a tenuous airhole in Caplan’s claims, but his evidence is still strong, spiky, and hard to digest without a rupture in my plans. Normally when confronted with new evidence you can wisely say “it’s probably a combination of everything” and then maybe do a bit more or less of something, or not. However, Caplan argues specifically that parenting is not a combination of everything. Everything is nature, at least in the long-term. His arguments are backed by simple and compelling studies that are hard to wishy-wash away and that block the easy path back to the status quo.
But it’s drastic to change how you raise your kids based on a short book and some studies that you aren’t going to read. The book’s claims are extreme, at least compared to what I used to think, and it’s hard to build enough confidence to change your mind about things that matter to you. I rarely need to develop solid beliefs about messy, unsettled topics that I’m not an expert in. I’ve skimmed a few paper abstracts and some reviews of the book, but that doesn’t feel like enough. Caplan seems smart and honest but this isn’t settled science and how do I know he’s not missing or ignoring grave methodological gaffes?
I can’t unread the book, and as someone who likes to consider themselves a somewhat scientific, data-driven parent, I can’t ignore it. So what should I do now?
I think I value my children for who they are already, but it’s good to be reminded to start there. I don’t care whether I have any long-term influence on my friends, I just like spending time with them and being there if they need me. So why do I care about being able to shape my kids? The desire to help your children is surely natural and normal, to a degree, but that doesn’t mean it’s always helpful.
Caplan says that I can stop worrying about whether I’m wrecking Oscar’s future habits and character. I try not to fret like this, but often it’s unavoidable. Does he play by himself enough? Does he watch too much TV? Are we letting him be too picky with his food? Should we use more discipline when he won’t share? Less discipline? I extrapolate today’s small behavioural decisions ten years forward into a bleak future. I fear that parenting is a system of positive feedback loops, where deviations become liberties that congeal into nightmares. But Caplan says that everything is fluid and reverts to the mean and I shouldn’t sweat the deviations. Bribe kids to behave, give them unlimited social media time, none of it matters, they’re much less of a blank slate than you think. Nothing will come back to bite you, and if you do get bitten then there was nothing you could have done to stop it.
Still, I’m not ready to stop trying to help my kids flourish. I’m not confident enough that Caplan’s evidence applies to my family and my era, and in any case at the moment I don’t have to make any tradeoffs. Oscar and I do a lot that I’d previously assumed would benefit him in the long term: maths, reading, piano. For now he enjoys nearly all of it and so do I, so nothing is being sacrificed. I’m sure that this will change as he gets older, but at the moment it’s more fun for me to talk to Oscar about multiplication and prime numbers than pretend to order another pasta with cheese from his play-dough restaurant. On some days he doesn’t want to do any sums and tells me to get lost. But even if I was certain that the long-term impact on his future earnings would be zero, I’d still take him to the science museum and try to remember how aeroplanes work.
This sounds relaxed and balanced, but it’s easy to be sanguine when there aren’t any dilemmas. If he stops being interested in things that I think are valuable then I’m sure I’ll feel anxious, and I’ll struggle when he starts making decisions that I think are mistakes. I’ll reevaluate when I’m forced to, but for now I hope that it’s possible to both try to help your kids excel and to live with them in the moment.
I wonder if these studies should change how I see the rest of the world too. I’m friends with my old physics teacher from high school. I went to his house for lunch and told him about Caplan’s book. He was horrified. “If that’s true, is there any point in me trying to be a good teacher?” he said. This had occurred to me too. If parents truly have no lasting influence on their children, how can schools, or local theatres, or any kind of small public policy intervention hope to have any? Maybe it’s even harder than I thought to make any long-term difference to anything.
And how should I think about traits that have value but don’t show up in survey data? For example, I can take Oscar to piano lessons and encourage him to practice. Most adults who know how to play the piano probably had lessons when they were younger, and their parents probably pushed them at least a little. Does being able to play the piano matter, morally and cosmically, even if it has no impact on income, happiness, or anything else that can easily be measured? The harder you think and the more precise the questions, the more you need a detailed moral philosophy.
It’s helpful to have thousands of elderly twins reminding me that my kids will probably be fine, whatever I do. Everything reverts to the mean, the twins murmur kindly. Don’t be too smug when things are going the way you hoped, and don’t despair when they aren’t.
I’m not ready to fully accept my obsolescence yet. We’ll watch more TV but we’ll keep doing maths together. One day we’ll start to disagree, and then we’ll reassess. Caplan does throw me one bone: “parents [have] moderate influence over how much their children like them.” Even if nothing I do adds up to anything, the days will hopefully make a happy childhood.
Read more of my essays about parenthood here. Plus, I’m writing a book about having kids! Subscribe to my newsletter for updates.
2023-10-13 08:00:00
Introductions to deep learning are too complicated and spend too much time trying to thrill you with details and real-world applications.
This makes them a frustrating place to start. You already know that deep learning is amazing and that it actually works on real problems. You know that most of the hard work in industry is in the data cleaning. You don’t want to set up a new environment, or play with parameters, or get dirty in the data.
The actual first thing you want to do is to train a model, as soon as possible, and it doesn’t matter how simple it is. Once you’ve trained your own model you’d be more than happy to learn about overfitting, data cleaning, and splitting strategies as well. But first you just want to create something yourself and see it work.
Hello Deep Learning is the missing introduction to deep learning. It’s a series of challenges, each of which gives you a task and a perfect, synthetic dataset and asks you to train and play with a trivial model. The challenges cover image generation, text classification, and tabular data, and each one:
Hello Deep Learning allows you to rapidly experiment with simple models and take your first steps in a calm, kindhearted environment. It gets you ready to leap into the detail and chaos of the real-world.
You can get the challenges, data generation scripts, and setup instructions on GitHub. Let me know how you get on; if they’re useful then I’ll make more!
Train a classifier that distinguishes between red squares and yellow circles. Your program should be able to:
The repo includes a script that generates 200 images of circles and 200 of rectangles and saves them in the data/shapes/ directory.
vision_learner based on the resnet18 pretrained model.
Train a classifier that distinguishes between text inputs of positive and negative words, for example "happy chirpy awesome" and "awful terrible heinous". Your program should be able to:
The repo includes a script that generates 1000 text files containing positive words and 1000 containing negative words and saves them in the data/sentiment_text/ directory.
language_model_learner based on the AWD_LSTM pretrained model, and a fastai text_classifier_learner.
Train decision trees that reverse-engineer the rules from src/generators/random_tabular.py that were used to randomly generate a tabular dataset. Your program should be able to:
dtreeviz library
The repo contains a script that generates 1 JSON file containing 10,000 data points and saves it in the data/random_tabular/data.json file. Each data point contains:
The features a, b, c, d, e, and f. Each of these is a random integer between 0 and 100.
The label y. This label is derived deterministically from the features using simple rules contained in src/generators/random_tabular.py.
DecisionTreeRegressor and RandomForestRegressor.
My solutions are in src/examples/ in the repo, although they’re not the only way to solve the challenges, and they’re almost certainly not the best way to solve them either.
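To make the tabular data format more concrete, here is a minimal Python sketch of a generator in the same spirit. It is not the repo’s script. The real rules live in src/generators/random_tabular.py. The rule in label_for below is invented purely for illustration, as are the function names. The sketch also assumes that 100 is included in the range.

```python
import json
import random


def label_for(point):
    # Hypothetical rule, invented for illustration only; the real rules in
    # src/generators/random_tabular.py are not reproduced here.
    if point["a"] > 50:
        return point["b"] + point["c"]
    return point["d"] * 2


def generate(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Six integer features, each drawn uniformly from 0 to 100 inclusive
        # (the inclusive upper bound is an assumption).
        point = {name: rng.randint(0, 100) for name in "abcdef"}
        # The label is a deterministic function of the features.
        point["y"] = label_for(point)
        data.append(point)
    return data


if __name__ == "__main__":
    data = generate(10_000)
    # Show a couple of data points rather than writing a file.
    print(json.dumps(data[:2], indent=2))
```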
Get the challenges, data generation scripts, and setup instructions on GitHub. Let me know how you get on; if they’re useful then I’ll make more!
2023-08-30 08:00:00
In the last 10 years I’ve given more than 400 coding interviews. That’s the equivalent of 2 working months just watching strangers having a crack at the same few programming challenges. Some of my would-be colleagues solve the problems without incident, but others run into trouble for similar, easily-correctable reasons. I wish I could give better feedback, but because of legal and time constraints that’s not how the system works.
So instead of personalised advice, I’ve written this cheat sheet containing 22 tips about how to pass a programming challenge interview with me. The tips can’t replace skill and practice, but they will help you calm your nerves, avoid silly mistakes, and showcase the best of your ability. Most of the tips are easy to implement, and put together they’ll increase the number of interviews that you pass.
You might use Google to look up the questions that my company tends to ask before the interview. This does go against the spirit of the thing, but I’m not sure how much of a duty you have to uphold the integrity of my company’s interview process. Plenty of people do it, and I imagine that it’s usually very helpful. I do have to ding you if I notice that you’ve done it, though.
If you’re doing the interview on the same computer you used to look up the question, make sure to close your tabs and delete your browser history before you start. I’ve interviewed several people who left a tab containing the leaked question open, although interestingly they all did quite poorly.
If you’re using your own laptop, set up a basic hello-world program and make sure you can run it. Write your solution in this file. This makes sure that you don’t waste time at the start of the interview scrabbling around with a broken environment.
I’ve interviewed several people who used a laptop belonging to their current company, but then realised that it was so customised that they didn’t know how to use it to write and run a basic program from scratch. They hacked around the problem by finding a simple part of their company’s codebase and editing it to answer the question. This cost them time and stress, looked bad, and surely violated their employment contract.
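As a minimal sketch, assuming Python, the hello-world file might look like this. The file name hello.py and the greet function are placeholders. The single Pass/Fail check is an optional extra that confirms you can run a test as well.

```python
# hello.py: a placeholder program for checking your environment before the interview.


def greet(name):
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(greet("world"))

    # One trivial check, so you know that printing Pass/Fail works too.
    actual = greet("Rob")
    expected = "Hello, Rob!"
    if actual == expected:
        print("Pass")
    else:
        print(f"Fail - expected: {expected}, got: {actual}")
```

Running python3 hello.py should print Hello, world! followed by Pass.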
It’s usually not a technical interviewer’s job to assess your experience and goals. I might start an interview by asking “My name is Rob, I work on this and that, what about you?” but this is just an icebreaker. Answer succinctly like “I’m Sarah, I work on the Infrastructure team at Badger corp. It’s our job to make sure that our servers are reliable, secure, and easy to manage.” I’ve had people go on for several minutes, which achieves nothing and wastes your time and energy.
Ask how long you have to answer the question so that you can manage your time accordingly. I should tell you this unprompted, but sometimes I forget and interviewees forget to ask.
When you get the question I think it’s useful to have a repeatable script to check you understand what you’re being asked to do and to get any extra information that you need. I’d suggest:
A few more details:
I’ll either describe the question or send you a written version. Once you think you understand it, restate it briefly:
“So the goal is to return all the triplets of numbers from the input list that add up to 0. Does that sound right?”
This allows the interviewer to correct you if you’ve got the wrong idea, and probably helps start to establish a little rapport.
I admit that it’s possible I only approve of rephrasing the question like this because it shows that you’ve read the same books as I have, which say that it’s a good habit.
Once you understand the problem, try to ask a clarifying question or two. This gives you an air of thoughtfulness and might even help you answer the question as well. Two off-the-shelf options are:
The goal of any coding interview is to write good code at a decent pace, but the details can vary.
Some interviews have multiple parts, and some deliberately have more parts than anyone could reasonably finish in the time allowed. Ask how far you’re expected to get so that you can plan ahead.
Some interviewers are looking for efficient code; some are looking for clean code. Most are probably looking for a bit of both. Ask them what they care about so that you can focus on the right things.
Simple unit tests will help you answer most questions (see below), but some interviewers might not want them, or some questions might not be easily testable (eg. those that use external dependencies like HTTP requests). You might as well clarify.
Start by drafting your program using prose or pseudocode. Don’t start by writing actual code; even if you know that step 1 will be to read an input file, only read it once you know roughly what you’re going to do afterwards.
This will probably help you get your own head in order, and will also allow a benevolent interviewer to correct any big misconceptions early, or at least know where they’ve come from. The more you say and do, the more partial credit an interviewer can give you if your code doesn’t fully work.
Once you’ve started writing code, run it as frequently as you can. Some people start interviews by coding for 30 minutes, then execute their program for the first time with 15 minutes to go, then spend those 15 minutes drowning in bugs and faulty assumptions only now revealed. To avoid this, run your program often. Print the latest state and make sure it looks about right.
If you code for 40 minutes straight, then run your code once and it works first time, that’s fine, you still pass. But running your code often gives you insurance when things go wrong. It makes it easier to give you partial credit for partial correctness, and probably helps you iterate and solve the problem faster too.
Your code won’t actually work first time. When it fails, try to state specific hypotheses about why. Try saying things like:
“My hypothesis is that I’m not filtering the list correctly. To check this I’m going to print the contents of the list after my filter, and in theory it should have 5 elements. Oh actually it does have 5 elements, so the filter can’t be the problem here. In that case the problem must come after this line. The part of the code I’m least confident in is this bit, so I’m going to add some more print statements. Now…”
Hypotheses make your debugging methodical and easier to follow. They’re useful in real life too, but it’s harder to have good habits when no one’s watching.
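In code, the check from that narration might look something like the sketch below. The list, the filter condition, and the expected count of 5 are all hypothetical stand-ins.

```python
nums = [4, -2, 7, 0, 9, -5, 3, 8]

# Hypothesis: the filter is wrong.
# Check: print the list after the filter. In theory it should have 5 elements.
positives = [n for n in nums if n > 0]
print(f"after filter: {positives} ({len(positives)} elements)")
# => after filter: [4, 7, 9, 3, 8] (5 elements)
```

Here the count matches, so the filter isn’t the problem. The next print statement should go further down the program.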
People often get derailed by the same trivial mistakes. Three of the most common ones:
/Users/rob/interview/data/input.csv instead of ../data/input.csv). This is inelegant but forgivable and harder to mess up.
# ---- MUTATING ----
# CORRECT
# sort() sorts the list by mutating it in-place
inp1 = [3,7,1,8]
inp1.sort()
print(inp1)
# => [1,3,7,8]
# WRONG
# sort() doesn't return anything, so using its return value is pointless
inp2 = [3,7,1,8]
out2 = inp2.sort()
print(out2)
# => None
# ---- NON-MUTATING ----
# CORRECT
# sorted() sorts the list by returning a new list, so we have to use its
# return value
inp3 = [4,8,2,9]
out3 = sorted(inp3)
print(out3)
# => [2,4,8,9]
# WRONG
# sorted() doesn't mutate the input list, so using the same input variable
# will not do what you want
inp4 = [4,8,2,9]
sorted(inp4)
print(inp4)
# => [4,8,2,9]
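The same trap applies to other methods that mutate in-place. For example, list.append() and dict.update() both return None. The non-mutating counterpart of list.reverse() is reversed(), which returns an iterator rather than a list:

```python
# ---- OTHER IN-PLACE METHODS ----
# WRONG
# append() mutates the list in-place and returns None
inp5 = [1,2]
out5 = inp5.append(3)
print(out5)
# => None

# CORRECT
inp6 = [1,2]
inp6.append(3)
print(inp6)
# => [1, 2, 3]

# CORRECT
# reversed() doesn't mutate; it returns an iterator, so wrap it in list()
inp7 = [1,2,3]
out7 = list(reversed(inp7))
print(inp7, out7)
# => [1, 2, 3] [3, 2, 1]

# WRONG
# dict.update() mutates the dict and returns None
inp8 = {"a": 1}
out8 = inp8.update({"b": 2})
print(out8, inp8)
# => None {'a': 1, 'b': 2}
```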
This is a last resort, but if you’re completely out of ideas then you can ask for help. I’ll give you a nudge anyway once this looks like the only way forward, but if you already know you’re stuck then you might as well say so now and save a minute or two.
Asking for help doesn’t mean that you’ve failed unrecoverably. You might have done a good job of debugging. I might forget that it happened. You can use the time you saved to impress me elsewhere.
Some people blame their extremely popular tools when they make mistakes, presumably trying and failing to save face. They say “oh JavaScript, why do you have to do things like that?” or “that’s just one of the many things I dislike about the Ruby standard libraries.” It’s possible that they’re right, but they sound like whiny buffoons.
Describe what you’re thinking and doing as much as possible. This helps me understand your work and to give you credit for your thought process, even when you make mistakes. If you solve the question perfectly without saying a word then you still pass, but running commentary gives you insurance in case things go wrong, and often helps you work better too.
Functions, classes, and other forms of abstraction impose structure onto a program. If you’ve understood the true nature of the problem you’re solving then abstraction can make your code terser, more readable, and more maintainable. But if you’ve misunderstood the problem then abstraction can impose the wrong structure, forcing the rest of your code to contort awkwardly around it.
If you’re not absolutely certain that a block of code should be pulled into a method or class, leave it as it is with a TODO
comment saying something like TODO: probably extract this into a method
. This shows that you’re thinking about structure and abstraction, without committing you to anything.
“Don’t abstract too soon” is good advice for the real world, but it’s particularly important in an interview. The costs of choosing an incorrect abstraction are higher than normal, since you don’t have time to iterate. Even worse, you’re more likely than normal to choose the wrong approach, since you’ve only been thinking about the problem for a few minutes.
Even worse than that, good programmers disagree about the right abstractions. Your useful logic boundary might be my unnecessary complexity. I might query your decisions during the interview, giving you a chance to explain them, but I might not. Even if I do, I might disagree with your justifications, and I might be completely wrong to do so. I started giving interviews to senior candidates after only 3 years of professional experience. I’m sure I roasted good engineers for bad reasons, and I’m sure I still do so today. If you avoid abstractions then you avoid disagreements where you can only ever lose, even if you’re right. By contrast, it’s hard to disagree with “I’d probably turn this into a function later once I’d finished the rest of the program.”
Unfortunately abstraction feels fancy, and many interviewees can’t resist looking clever. Too often they instead end up with a ball of overloaded mud which slows their progress, achieves nothing, and looks ugly. You should be quick to note where an abstraction might be appropriate, but slow to implement it.
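Here is a small sketch of the idea. The problem (totalling orders per customer) and all the names in it are invented for illustration. The TODO marks an extraction you might make later, without committing you to it now.

```python
orders = [
    {"customer": "ann", "price": 3, "quantity": 2},
    {"customer": "bob", "price": 5, "quantity": 1},
    {"customer": "ann", "price": 4, "quantity": 1},
]

totals = {}
for order in orders:
    # TODO: probably extract this into a method
    line_total = order["price"] * order["quantity"]
    totals[order["customer"]] = totals.get(order["customer"], 0) + line_total

print(totals)
# => {'ann': 10, 'bob': 5}
```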
TODOs
Start by getting your solution working for the base case, since this is probably the main thing you’re being judged on. Most secondary problems you encounter along the way would be better solved later, including validation, error-handling, edge-cases, and cosmetic tidyups. Don’t bother with them for now, and instead write comments describing the work still to be done, for example TODO: don't hardcode this value.
Once you’ve solved the base case you can sweep back through your TODOs. You can delete the ones that have become irrelevant, and polish off the ones that still apply. By this point you’ll have spent 15 more minutes understanding the problem, and might make better decisions. If you don’t have time to address them all then you’ve still shown that you know what should be done, netting you at least partial credit.
Some examples:
TODO: handle empty input
TODO: validate that all numbers in list are positive
TODO: better variable name
TODO: factor out this logic with previous block
TODO: handle exceptions
TODO: tidyup
TODO: don't hardcode this value
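As a hypothetical example, here is a base-case solution to the zero-sum triplets question from earlier, with secondary work parked in TODOs. It is a deliberately naive brute-force sketch. Because it checks every combination of three numbers, it runs in cubic time, so it is not necessarily the efficient answer an interviewer might push for. The function name is made up.

```python
from itertools import combinations


def zero_sum_triplets(nums):
    # TODO: validate that all items in the list are numbers
    # TODO: remove duplicate triplets if the interviewer wants them unique
    triplets = []
    # Brute force: check every combination of three items from the input.
    for triplet in combinations(nums, 3):
        if sum(triplet) == 0:
            triplets.append(triplet)
    return triplets


actual1 = zero_sum_triplets([-1, 0, 1, 2])
expected1 = [(-1, 0, 1)]
if actual1 == expected1:
    print("Pass")
else:
    print(f"Fail - expected: {expected1}, got: {actual1}")
```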
Write basic tests to help you check your code. Your program should print something like Pass,Pass,Pass if the tests pass and Pass,Pass,Fail if one of them doesn’t.
# Good
$ python3 interview_good.py
Pass
Pass
Fail - expected: [3,9,2], got: [3,9,3]
Don’t print the return value of your program and verify it manually (eg. [1,4,5] followed by “oh yep that looks about right”). Eyeballing is easy to get wrong, especially when you have multiple test cases.
# Bad
$ python interview_bad.py
[1,4,5]
[2]
[3,9,3]
Testing frameworks are good for producing maintainable, modular tests, but they require setup and configuration that can be easy to get wrong and time-consuming to get right.
In an interview all you need is a simple way to make sure your little program is behaving correctly. This can be easily accomplished with simple code like:
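# Hypothetical function under test, defined here only so the snippet runs
def my_function(nums):
    return [n + 1 for n in nums]

actual1 = my_function([1,3,5])
expected1 = [2,4,6]
if actual1 == expected1:
    print("Pass")
else:
    print(f"Fail - expected: {expected1}, got: {actual1}")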
You could package this up into a simple assert_equal method, but I wouldn’t bother. A little repetition is fine. You can even say “in real life I’d use pytest” if you feel awkward about it.
If you truly are comfortable enough with a particular testing framework to use it in an interview then go ahead. Write an example test for your hello-world program from tip 2 and make sure you can run it ahead of time.
pass
If you do use your own basic testing framework, print something showing when a test passes. If you instead continue silently, then if your program is correct it will run and exit without outputting anything. However, this could also happen if your program exits early for some erroneous reason. Does the following definitely mean your code works?
$ python3 interview_bad.py
$
Printing pass when a test passes means that you know your program ran properly. This output definitely means your code works:
$ python3 interview_good.py
Pass
Pass
Pass
$
If all your tests pass first time, it’s more likely that your testing setup is broken than that you’re a genius. Add a bug into your code and make sure that some tests fail. If they don’t, something is wrong with your tests, as well as - probably - your code.
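As an alternative to temporarily editing your real function, you could point the same test cases at a copy with a deliberately planted bug and check that at least one of them fails. This is a minimal sketch; add_one, add_one_buggy and run_tests are all hypothetical names.

```python
def add_one(numbers):
    # Hypothetical function under test: adds 1 to each number.
    return [n + 1 for n in numbers]


def add_one_buggy(numbers):
    # Deliberately planted bug: forgets to add 1 to the last number.
    return [n + 1 for n in numbers[:-1]] + numbers[-1:]


def run_tests(fn):
    # Each case is (input, expected output). Returns how many cases failed.
    cases = [([1, 3, 5], [2, 4, 6]), ([], []), ([0], [1])]
    failures = 0
    for given, expected in cases:
        actual = fn(given)
        if actual == expected:
            print("Pass")
        else:
            print(f"Fail - expected: {expected}, got: {actual}")
            failures += 1
    return failures


print("Real function:")
real_failures = run_tests(add_one)
print("Function with a planted bug:")
buggy_failures = run_tests(add_one_buggy)
if buggy_failures == 0:
    print("Warning: the tests didn't notice the planted bug")
```

Here the planted bug trips two of the three cases. The empty-list case passes either way, which is a reminder that a single test can be blind to a bug.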
TDD (test-driven development) is the practice of writing tests first, then writing code that passes these tests afterwards. Some swear by it; I find it laborious and unhelpful. In my experience, when someone attempts full TDD in an interview it usually takes a lot of extra time for little benefit. If you truly prefer to work this way then go for it, but don’t do it to impress me.
If you haven’t heard of TDD before then don’t worry and don’t learn about it, at least for this interview.
Some people write a single test case and modify it when they want to check a different input. This means that they have no way to re-check all of the other inputs they’ve tried in a single run. This invites bugs and regressions, and makes it hard for either of us to tell if their program works for all cases or just the last one they checked.
Don’t modify your test cases; write a new test for each input so that you can run them all at once.
This is how to pass an interview with me. Write clean code, save your energy for what matters, and put on a bit of a performance to make sure I notice your good parts. Other interviewers will have their own views. Different companies look for different things, and even different people within the same company look for different things, despite leadership’s efforts to standardise on a consistent rubric.
But even though other people might have different priorities and interviews are half-charade, I still think that the tips in this post are universally good habits that will help you pass more of them. Good luck, and let me know how you get on!
2023-04-04 08:00:00
This is part 17 of a series about my experiences being a parent. Read the rest here.
Gaby, my wife, a few weeks before the birth:
Before Oscar was born, I was entirely excited that we were going to have a child. I liked my life, but it was a good time to change everything. My job was fine, but I wasn’t worried about putting it on hold or even losing it. We were about to move to London, so my social circle, hobbies, and routines were going to get warped anyway. I’d have to rebuild, with or without a baby.
But my new life with Oscar is perfect as it is, and so this time I feel like I have more to lose. My love for Oscar feels so supreme that I struggle to believe I could experience the same thing with another person, even though I’m sure I will. I can’t imagine how I’ll feel about our new child, even though I already have one and I know exactly how much I love him.
The baby is due soon. I’d planned a lovely day out with Oscar and Rob, one of the last with just the three of us. Then the Queen died. The theatre closed as a mark of respect, the fire station canceled its open day, the community fair shut down. I cried. They didn’t need to do that, she didn’t need that much reverence, I wanted one more day out with my little boy. I told Oscar that the queen had canceled everything and we went to the playground, like always. The next day he said “let’s pretend to be the queen and cancel things!” Rob tried to go for a walk but Oscar stood in front of the door and canceled it.
There was something thrilling about giving birth to Oscar, despite the pain. I don’t get many chances for that kind of acute, exquisite experience. I had time for drugs, and I felt safe and in control. I saw labour as a test of endurance that I could win, which I did. I didn’t mind being a sleepless new mum. I had four months off work and I enjoyed being at home with my new baby. I gained a new identity instead of losing one.
Oscar’s birth was exhilarating, but now I think I got lucky. I’d read some birth stories, but back then I didn’t know many women who’d had children themselves. Now every mum I meet has a story, and so do all of their friends, and many are horrendous. Septic shock for mum and baby. Days in the NICU. Thirty-six hours of labour, then a C-section anyway. Anthologies of disasters from a random sample of women, just like me, in the same place and time. I know more about what happens during a birth, so I know more about what could go wrong. I wonder if my shallow confidence helped make Oscar’s birth simple. Maybe, in some tiny way, but optimism doesn’t protect against sepsis.
The part of Oscar’s birth that did rock me was the recovery. I knew that birth would be painful, but I’d never had to recover from any physical trauma worse than a broken leg, and I couldn’t imagine weeks or months of not being able to get out of bed. After Oscar was born I struggled for a long time. When he was twelve weeks old I was able to go for one last, slow bike ride round San Francisco, just before we left forever, and even that was uncomfortable. And it could have been much worse. It was summer, Rob had time off work, our families came to visit, my wounds healed.
If this recovery is similar then I’ll be trapped inside for months, in winter, unable to play properly with Oscar, reliant on Rob for everything. We’ve been thinking of getting a car to help me get around and take Oscar out on my own, but I haven’t got round to taking the UK driving test, and it’s too late now. But bodies are strong, I’ll make a comeback eventually, probably.
Last night I said that I had a headache and Oscar brought me an empty packet of ibuprofen. I appreciated the gesture even though I shouldn’t have left it where he could reach it. He sang a song he’d made up called “you can’t imagine how nice it is to be a bear” while I snuggled him to sleep. “I love you so much,” I said. “Me too,” he said, “I super duper superstar love you always”. I’m cautious and a little melancholy but I really am happy to be doing this again.
Back to me:
I met up with a child-curious friend when Gaby was six months pregnant. He asked what life was like with a child and whether I was nervous about a second. “You do have a lot less time for yourself when you have kids,” I said. “Although we’re at the pub now so I suppose it can’t always be that bad.”
I hadn’t thought in much detail about the implications of another baby. They’re hard to imagine, and by this point they didn’t matter, we were all out of decisions. We’d find and reallocate time and energy, just like everyone else. We’d done well for parental leave; Gaby would be off for a year and I’d finagled seven months, part paid, part unpaid.
I took an extra month off before the due date, liquidating a vacation account accumulated during a cautious year. I handed off my projects and set a frank email auto-response: “Thanks for your email. I’m not going to read it, so if it’s important then please send it again next year.” During this month Oscar and I started piano lessons, and I failed to teach him to ride a bike. I worked on hobby projects, including writing my own Game Boy emulator to let me play Pokemon on my computer, and an open source intelligence tool to work out when an audio recording was taken. Each day was slow and quiet and possibly the last of its kind.
I moved Oscar from Gaby’s bed, where he’d slept for the last three years, into mine. The baby would sleep with Gaby, and if Oscar did too then the baby might keep him up or he might roll over and crush it. I filled our box room with two beds, one for me and one for Oscar, since I can’t sleep near other people. This left just enough room for a chest of drawers and the door to swing open.
Oscar still sometimes wakes up during the night. He snuffles from his bed to mine, puts his hands on my face, and sacks out again. I give him a few minutes, then go to the toilet and switch to the other bed. One night I forgot how many times we’d done this disco and sat on his head. If it’s before 5am then I can usually get back to sleep, but sometimes I’m up for the day at 2am. I should probably try to move him into his own room, but I like taking care of him at midnight, when he’s at his softest. I’m happy to have more snuggles before I blink and we’re on handshakes and back slaps. Moving him would be yet another project, and I no longer think that sharing a bed with your three-year-old is a failure if it more or less works for you.
I took Oscar to a new playground. He climbed into a wooden house, big enough for four or five kids, and set up a shop. I pretended to buy things through the window. Three children came over, about two years older than Oscar, and started shouting at him. He’d been in the house for too long, he couldn’t play with them, he had to leave. Oscar ran out and started shouting back that no, he was allowed to be in the house, and they had to leave.
I was used to kids not sharing, but this was full bullying, which I hadn’t seen for a long time. I’d forgotten the desolation of being told “you can’t play with us”. Oscar and the invaders kept yelling at each other. I wondered if I should protect him somehow, but I wasn’t sure what type of counterstrike was appropriate, or whether a playground is like a nature documentary, where you have to detach yourself and let the jungle decide. I held Oscar’s hand and asked if he wanted to go somewhere else. He insisted that he didn’t, but eventually he lost the war of words and crumpled to the colourful tarmac. He got up and buried his face in my chest and said that the children were making him feel sad.
I carried him off and we ate an apple. The children moved on and we returned to our shop. Later one of them came back and said that his friends had left, could he play with us? Oscar said that he could, magnanimous in victory.
When we got home he wanted to talk about what had happened, over and over. Next time I think I’ll just lead him away immediately. He’ll have to stand up for himself one day, but as long as he’s three years old, against a gang of older kids who he’ll never see again, I think it’s fine for his dad to ride shotgun.
We were at 41 weeks. But it had been a routine Wednesday, it couldn’t be today. It was 3:30pm and we were getting ready to go for a walk through the common for a pizza. We’d pick up Oscar from nursery and catch the bus home. Gaby found it hard to reach her feet, so I was tying up her shoelaces. She put a hand on my head. “I’m not having contractions, but every minute or so I do feel like I’m gently splitting in half for a second,” she said. She dithered. “It’s probably nothing, I’m hungry, let’s go.” She went to the toilet. “Although my underwear looks quite pink,” she said, “and shit - I haven’t felt the baby kick for ages.” She called the hospital and they said to come in now. We don’t have a car so I called an Uber.
It took ten minutes to come; Gaby kept asking how far away it was and I felt like a failure every time I answered. I don’t panic in stressful situations, but I do get serious and verbose, which Gaby finds just as distressing. “The taxi is approximately 7 minutes away and appears to be moving at a constant speed, I’ll update you when I have new information.” Still no kicks. These didn’t feel like contractions as Gaby remembered them. Contractions are usually in your belly, back, or bum. They start small and far apart and get bigger and closer as labour progresses. Gaby’s cervix felt like it was being gently prised apart for two seconds every two minutes, which the internet doesn’t discuss. Still no kicks. I called my parents and said that we weren’t exactly sure what was happening but could they pick Oscar up from nursery. They wished us good luck.
I was worried that the Uber driver might refuse to take us, in case Gaby started leaking or even gave birth in his car. I thought about setting our destination to a few streets away from the hospital to disguise where we were going, but under pressure I forgot. Fortunately the driver was kind, and in any case we were probably inside his car before he realised. He drove quickly and carefully and told us about his own kids. I tried to secrete gravitas and levity.
Halfway to the hospital Gaby’s cervix quietened down, and she started grumbling that this was probably all nothing. I still thought that she was in labour. We agreed that either way we should get the baby looked at. I reflected on how lucky we’d been that we hadn’t left the house thirty minutes earlier like we’d planned, otherwise we’d have been in the middle of a forest. An ambulance might have been able to squeeze through, depending on which route we’d taken, but I’d have still felt unbelievably stupid. I couldn’t believe we’d been so nonchalant; due dates are only estimates, but they do mean that you’re about to have a baby. I knew that labour can be sudden. My friend had given birth in a 4-wheel drive in a hospital carpark, but yesterday I’d gone to look round a piano showroom a forty minute train ride away, at Gaby’s suggestion. While I was out Gaby went shopping and had to pause every few minutes to grunt and tell strangers that she was fine, these weren’t contractions, just a pinched nerve.
We arrived at the hospital. Gaby griped to the woman at the front desk that nothing was happening but we were here now so might as well get checked out. We walked up a flight of stairs to the maternity ward, and in that minute the baby moved fast. By the time we lurched into the waiting room Gaby was in acute pain, definitely about to give birth. The receptionist was calmer than I expected. I suppose her waiting room is always full of women in acute pain and about to give birth, and she can’t spend every day in a rictus of concern. She was helpful enough and checked us in. I rubbed Gaby’s back; she asked me to stop.
A student midwife came to collect us. “We’re very busy!” she said cheerfully, “Seems like everyone’s having a baby today!” Congratulations but poor scheduling to us all. We went into the triage room. No one seemed too concerned about Gaby’s red discharge, although I should have asked directly. After some poking and waiting, two midwives led us into a labour ward where the rest of the day would take place. They asked if Gaby would like to try a birthing pool and she said sure, why not. They turned on the tap and Gaby leant against the bed moaning. Then her waters broke. I was startled, I think I’d been out of the room for this bit last time. The floor was soaked; my shoes were wet. A midwife gave me a paper pot, I wasn’t sure why and assumed that it would become important later. She saw my quizzical look, pointed at me, and mimed vomiting.
The day deteriorated. Gaby had a few huffs of gas and air but it made her feel sick and she wailed for stronger meds. The midwives said that the baby was nearly out and that anything they gave her now wouldn’t kick in until it was all over. She was past the event horizon. I asked what they meant by “nearly out”. “Probably just a few more minutes,” they said. It was about 5:40; we’d arrived at the hospital at 5. Another contraction.
Oscar’s birth had taken 9 hours. There’d been a storyline and a chance to settle in. Gaby had had time for pain relief, so the crescendo was lower and the build-up was slower. Gaby pushed for several hours, and we all cheered her on. It was a sublime trial, a marathon with pain but also glory. But this birth was ferocious and seemed out of control. One of the midwives scrabbled around underneath Gaby with a wand and a heart monitor, trying to check on the baby. She didn’t say anything, so I hoped that everything was OK.
Gaby broke character and stopped wailing, still on all-fours. “I’m sorry, but this isn’t going to work,” she said, “I’m not going to be able to do this, you’re going to have to figure out something else.” The midwives assured her that she was so strong and that she could do it. She was going to have to. The next contraction hit and Gaby burst into silent tears, which hadn’t happened last time. I started crying too. I didn’t know what to do, nothing seemed appropriate. This was so much worse than before, I just wanted it to be over, this couldn’t be worth it. I tried to act excited and positive, which Gaby said helped. “We’re going to see our baby soon!” I said. The bed was made up of interlocking pieces of foam, and I noticed that one of the slices holding Gaby’s knees was starting to slip. I held it in place and reflected that this was surely bad design. The midwives said that they could see the baby’s head. I couldn’t see anything but said that I could too. They were either telling the truth or lying for motivation.
And then his head dawned, all at once, a sudden, bloody sun. I saw the umbilical cord wrapped around his neck and I choked, but the midwives had seen it too and didn’t seem concerned. They asked Gaby to roll onto her back to push out the rest of him. She told them that there was no way she could do that; she had a head sticking out of her vagina.
The midwives asked if she wanted to catch the baby herself and she said no way, absolutely not. He flopped out anyway and they caught him for us. He was a boy and his name was probably Samuel. Gaby collapsed and they gave him to her. He rooted around for a few minutes, latched, fed, and passed out. One midwife turned off the birthing pool; it had filled up to ankle height. It was a few minutes after 6.
I took and sent some photos. The room had fairy lights taped to the wall, which looked hokey in person but fantastic in the pictures. Several people asked if we’d gone private. I ordered too much pizza. I put down the delivery address as “Hospital, Maternity Ward”. I took our son while the midwives dealt with Gaby’s placenta. I’d forgotten how to hold a newborn. They have no muscles and it’s important not to drop them. I slouched in a chair by the window and perched him on my belly. I was glad he was here but I didn’t feel love yet. I wasn’t worried though. I hadn’t felt much when Oscar was born and now I loved him more than anything. The plughole gurgled as unused birth water trickled away.
Gaby, later:
I felt in control, despite the speed and pain. Sam didn’t happen to me; I birthed him myself. I remembered what my doctor in San Francisco had screamed while I was labouring with Oscar. “Poop him out, Gaby, poop him out!” This wasn’t a metaphor, those are the muscles to use. When I pushed like that again I felt Sam edge through me.
But it hurt too much, I needed a break. When the next contraction came I did nothing with it, I let it wash over me. The pain was almost the same but Sam stayed where he was; wasted agony. I plunged into the next contraction and felt him move again. And now he’s here.
One midwife said goodbye while the other continued to stitch Gaby back together. Workdays on a maternity ward must have a deafening rhythm. Unreasonable pain, then pizza. Repeat until 7pm, with a lunch break. After an hour of reconstruction, Gaby asked the midwife if they could pause for a few minutes, because the stitches and prodding were painful and nonstop. The midwife realised that Gaby hadn’t had any pain relief of any sort and certainly no local anaesthetic. She shot some in and finished the repairs, then wished us good luck and left to get the next baby. I turned the lights off. The sun had gone but the fairy lights flared. We ate pizza, trading our baby to keep the cheese off his head. We called Oscar and my parents. Oscar asked why we hadn’t come home yet. We’d got the baby, what was left?
I started putting a nappy on Sam, but he’d already pooed in his blanket and I’d forgotten to bring any wipes. I found a jumbo pack on a counter and rubbed him down. I noticed that my fingers were tingling, and I realised that the wipes were covered in antiseptic. I told a midwife. She said that Sam would be fine but I should stop touching things that weren’t mine. I gingerly finished putting his nappy on and wrapped him up again. Then I realised that I’d used the same blanket that he’d pooed in, and now he had syrupy black infant crap on his head, even though he wasn’t yet two hours old and I’d thought I knew what I was doing this time.
The duty midwife told us that we could go home that evening if we wanted. At first that sounded silly. Sam was so fragile, and I’d already covered him in chemicals and faeces. But he’d still be fragile tomorrow, so we said yes. The midwife filled out the paperwork and Gaby asked me to sneak into the triage room and steal some biscuits that she’d seen there a few hours ago.
My dad picked us up around 10. This time round Gaby could hobble herself to the car. I carried Sam in his new, unfamiliar seat and we strapped him in as best we could. We got home, hugged my parents and went to bed. Sam was comatose the whole night but Gaby lay awake shaking. She roused him every few hours to feed. The next day we introduced Oscar to his brother. Oscar said hello but was otherwise uninterested and went about his day as normal. Gaby’s parents had given him a new cuddly lion to mark the day, and I suggested that we call it Lionel Richie. Oscar liked my idea, but got confused about which animal I was talking about. Now he has a baby brother called Sam, and a penguin called Lionel Richie.
2023-02-07 08:00:00
I’m a professional computer programmer, but I don’t often do all that much computer programming. Instead I spend much of my time futzing around with config files, other people’s frameworks, and other people. This makes it easy to forget how much fun it is to write a long program that truly interests you and watch it do something a little bit astonishing.
I write a lot of words in my spare time, but rarely any code. However, a year or so ago my wife and toddler went to America to visit her family while I stayed at home for a staycation. I’d been enjoying work and was feeling professionally inspired, so I wanted to spend some of my holiday building something.
But choosing a personal project is hard. This was going to be the freest time I’d had since my son had been born two years earlier. If I was going to spend it programming instead of being outside or playing Skyrim then I wanted to be sure that I was programming something very, very compelling. I wanted a project that protruded the right distance outside of my comfort zone. I wanted to go a bit closer to the machine than I normally get. I wanted concrete outputs that would impress both me and my friends. I wanted to know what I’d learn, how it was relevant to my other interests, and where it could lead in the future.
I decided to write a Game Boy emulator: a program that would run old Game Boy games on my computer. This seemed hard but achievable, and I thought it would dazzle my friends for at least a few seconds. The internals of the Game Boy seemed well-documented enough to be accessible, but poorly-documented enough to feel like I’d be the one doing the work. I’d never made anything like it before, and I loved the idea of running Pokemon Blue and catching a Pidgey using a program I’d written myself. I called the project Gamebert because it was a Game Boy and my name is Robert.
I waved goodbye to my family and went home to get started.
I chose to write Gamebert in Go. I guessed that an emulator would be a resource-intensive program and would benefit from a fast language, especially if, as seemed likely for a first attempt, it wasn’t written very well. I anticipated making lots of mistakes, and wanted a typechecker that would catch some of them quickly. Since I’d be simulating registers (small slots in a CPU for storing temporary data) and bytes of RAM that held a specific number of bits, I wanted to be able to specify the size of my integers and have the typechecker hold me to my choice. Representing an 8-bit register using an unsigned 8-bit integer (uint8) would handle overflows and underflows for me (e.g. making sure that 255+1 looped back round to 0), and prevent me from accidentally assigning the register an oversized number that a real 8-bit register couldn’t actually hold.
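For comparison, Python’s integers have no fixed width, so an emulator written in Python would have to do by hand what Go’s uint8 does for free: mask every result back into the range 0 to 255. This is a minimal sketch with hypothetical helper names.

```python
def add_u8(a, b):
    # Keep only the low 8 bits, so 255 + 1 wraps round to 0.
    return (a + b) & 0xFF


def sub_u8(a, b):
    # The same mask handles underflow: 0 - 1 wraps round to 255.
    return (a - b) & 0xFF


print(add_u8(255, 1))  # 0
print(sub_u8(0, 1))    # 255
```

Unlike Go’s typechecker, nothing here stops you storing 300 in a "register". The mask only applies where you remember to call it, which is exactly the class of mistake a sized integer type catches for you.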
I watched some talks on Game Boy architecture. The talks were clear and entertaining, but I didn’t absorb enough information to actually start anything. I continued wandering the internet in search of the right type of handrail.
Lots of Reddit threads mentioned the CHIP-8, a machine similar in structure to a Game Boy but much simpler. The CHIP-8 isn’t a piece of actual physical hardware; it’s a specification for a theoretical virtual machine. Writing a CHIP-8 emulator is like writing a Game Boy emulator if Nintendo had never actually manufactured any consoles. Reddit suggested that novices cut their teeth on a CHIP-8 emulator before grinding them on a Game Boy, so I did. Writing my CHIP-8 didn’t take too long (thanks in large part to this guide), and it got me used to emulator concepts like flags (slots in a CPU that store boolean values about the last operation, such as whether the result was zero) and opcodes (numerical codes representing the different operations that a CPU can perform). I was ready to apply what I’d learned from my CHIP-8 to my Game Boy.
A Game Boy has a few major components, including a CPU, an LCD, a few memory banks, a game cartridge, and a motherboard. The CPU performs the machine’s calculations, using tiny instructions like “load the byte from RAM location 1234 into register D” and “add 1 to register B”. It executes about a million of these instructions every second, which miraculously add up to a game. The LCD works out which pixels to display on the screen, which in a Game Boy is surprisingly complicated.
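At its core, a CPU emulator is a loop that fetches an opcode, decodes it and executes it. The sketch below illustrates that loop with a made-up three-instruction set. The opcode values are hypothetical and are not the real Game Boy encodings. For simplicity, the program is a Python list in which an address fits in one element, whereas a real Game Boy fetches its program from the same address space as everything else and spreads a 16-bit address across two bytes.

```python
# Hypothetical toy opcodes for illustration; these are NOT real Game Boy encodings.
LOAD_D = 0x01  # followed by an address: copy memory[address] into register D
INC_B = 0x02   # add 1 to register B (wrapping at 8 bits)
HALT = 0x03    # stop running and return the registers


def run(program, memory):
    registers = {"B": 0, "D": 0}
    pc = 0  # program counter: position of the next value to fetch
    while True:
        opcode = program[pc]  # fetch
        pc += 1
        if opcode == LOAD_D:  # decode, then execute
            address = program[pc]
            pc += 1
            registers["D"] = memory[address]
        elif opcode == INC_B:
            registers["B"] = (registers["B"] + 1) & 0xFF
        elif opcode == HALT:
            return registers
        else:
            # Fail loudly instead of skipping an instruction we don't understand.
            raise NotImplementedError(f"unimplemented opcode {opcode:#04x} at position {pc - 1}")


memory = bytearray(0x10000)  # 64 KiB address space, all zeros
memory[1234] = 42
print(run([LOAD_D, 1234, INC_B, INC_B, HALT], memory))  # {'B': 2, 'D': 42}
```

Raising on unknown opcodes is a deliberate choice. An emulator that quietly skips instructions it doesn’t recognise will wander into a nonsensical state without telling you why.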
A game cartridge contains a game’s code, written by its programmers using the Game Boy’s assembly language. Some types of cartridge offer other functionality, such as allowing the Game Boy to write data back to the cartridge to save the player’s progress. Memory banks store data about the game’s intermediate state as it executes, such as “how much life does this Charmander have left?” and “which room is the player in?” All of these components plug into a motherboard, which they use to communicate with each other. For example, the CPU can write data to memory banks, and the LCD can read this data in order to work out what to display to the user.
After a few days of reading and partially understanding the documentation, I got Gamebert to load a game and scroll the Nintendo logo down the opening screen for the first time. This was a magical moment. I’d never built anything that could produce emergent output so far removed from the code I had written. Nowhere in my code did it say anything like “print an N, then an I, then…” I’d produced this animation by passing a bizarre series of bytes (a ROM file ripped from a Pokemon Blue game cartridge) through a program that I’d written entirely myself.
This motivating milestone came relatively early in the project, after only a few days (although your own definition of early will depend on how much patience and free time you have). The day after I produced my first Nintendo logo I woke up at 5:30am and worked straight through until 8pm, which I’ve only ever done once before.
But after the logo dropped, the screen went blank. No Pokemon appeared, and I didn’t understand my system well enough to be able to work out why. I’ve never written assembly code, and so I had no intuition for which behaviours looked right or wrong. I didn’t know which errors were there because I hadn’t finished, and which were because I’d made a mistake. The only way I could think of to test my emulator was to write the whole thing and hope it produced a Squirtle. I wasn’t sure if I had bugs in my CPU or my LCD, although with hindsight the answer was of course “both”.
At the end of my week alone I was still stuck and nowhere near finished. My emulator continued to loop forever doing nothing and I had no idea why. I’d found test ROMs (like Blargg’s ROMs) that helped emulator developers like me validate our in-progress creations, but my program couldn’t execute them accurately enough to even produce an error message. My family and job came back. I still had an hour or two of discretionary brain time most days, but I couldn’t make much progress without stretches of at least half a day. Any understanding I had of how a Game Boy works fell out of my head.
Fortunately, a year later I had another child. Before he was born I took a month of vacation that I hadn’t used during COVID. I spent this time playing with the toddler I already had, working on a new project, and picking up Gamebert again. I used my fresh eyes to fix some now-obvious bugs, although none of them made my emulator stop looping and start working. My new son was born, and after a few weeks we were settled enough that I could work on Gamebert while he slept, strapped to my chest. I was on parental leave and he slept a lot, so I had a lot of time. I looked for a new strategy.
The clearest way to describe something as fiddly as a Game Boy is often through code. I read other people’s emulators on GitHub, which sometimes felt like cheating, but then I remembered that this is my free time and there are no rules. I chose two reference projects written in different languages (Python and Rust) to the one I was using (Go). This meant that I had to translate between languages and architectures, which I think helped my comprehension, although sometimes I did still have to copy without understanding.
There’s also a lot of community-written documentation, which is both amazing and kind of crummy. It’s extensive and detailed, but also near-impossible for a newcomer to parse. For example, here’s part of a description of an LCD screen from the PanDocs, the most comprehensive public Game Boy manual:
After checking for sprites at X coordinate 0 the fetcher is advanced two steps. The first advancement lengthens mode 3 by 1 dot and the second advancement lengthens mode 3 by 3 dots. After each fetcher advancement there is a chance for a sprite fetch abortion to occur.
I’m not criticising the people who wrote these docs. They are still invaluable and I doubt that it’s possible to make them accessible to novices without spending a year writing a book that would be lucky to sell a hundred copies. I found them especially handy when I was able to match a paragraph of prose to a line of someone else’s code and use each to understand the other. But as a warning for other newcomers who are finding the docs hard going - it’s not just you. At the very least, it’s me as well.
My big methodological breakthrough came when I joined the Emulator Developer Discord. The helpful denizens of the #gameboy channel pointed me to a GitHub repo of logfiles, in which a kind person with a working emulator had run Blargg’s ROMs and logged the state of their emulator after each instruction.
I used these golden logs to validate my own emulator’s behaviour, even though it still wasn’t correct enough to get a sensible result from the ROM’s actual tests. I logged the output from Gamebert in the same format as the logfiles in the repo, and wrote a tool that compared my logs line-by-line against the repo’s until I found a discrepancy. This allowed me to identify the exact CPU cycle when my emulator started misbehaving. I called my tool Game Boy Doctor.
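The core idea of comparing two logs line by line and stopping at the first difference fits in a few lines. This is a minimal sketch of that idea, not Game Boy Doctor’s actual source. The helper names are hypothetical, and the log lines are made-up, simplified examples rather than the real log format.

```python
from itertools import zip_longest


def read_log(path):
    # One line per executed instruction, with trailing newlines removed.
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def first_mismatch(my_lines, golden_lines):
    # Walk both logs in lockstep. zip_longest pads the shorter log with None,
    # so a log that stops early also counts as a mismatch.
    for line_number, (mine, golden) in enumerate(zip_longest(my_lines, golden_lines), start=1):
        if mine != golden:
            return line_number, mine, golden
    return None  # the logs are identical


mine = ["A:01 B:00 PC:0100", "A:01 B:01 PC:0101", "A:02 B:01 PC:0102"]
golden = ["A:01 B:00 PC:0100", "A:01 B:01 PC:0101", "A:01 B:01 PC:0102"]
print(first_mismatch(mine, golden))
# (3, 'A:02 B:01 PC:0102', 'A:01 B:01 PC:0102')
```

Because each line records the state after one instruction, the first differing line points at the instruction that went wrong, or at the one just before it.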
I stopped aimlessly scrolling through my code and started using Game Boy Doctor to reliably pinpoint bugs. The Doctor found several errors in my CPU: faulty bit twiddles and buggy half-carries. It couldn’t say how to fix the problems it uncovered, but the solutions were often obvious once I knew where to look.
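A half-carry (the H flag) records whether an addition carried out of the low four bits of a byte. This is a minimal sketch of the flags for an 8-bit ADD on the Game Boy’s CPU, assuming both inputs are already in the range 0 to 255. It is not Gamebert’s code. Other instructions follow their own rules: INC, for example, leaves the carry flag alone.

```python
def add8(a, b):
    # 8-bit ADD: return the wrapped result and the four flags it sets.
    result = (a + b) & 0xFF
    flags = {
        "Z": result == 0,                     # zero: the result is 0
        "N": False,                           # subtract: cleared by an add
        "H": (a & 0x0F) + (b & 0x0F) > 0x0F,  # half-carry: carry out of bit 3
        "C": a + b > 0xFF,                    # carry: carry out of bit 7
    }
    return result, flags


print(add8(0x0F, 0x01))
# (16, {'Z': False, 'N': False, 'H': True, 'C': False})
```

Only the low nibbles are compared for H, which is why 0x0F + 0x01 sets it even though the full sum fits comfortably in a byte.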
The biggest, most boneheaded mistake it revealed was that I had accidentally left a debug flag turned on. This flag told Gamebert to silently continue if it was asked to execute an opcode that I hadn’t implemented yet, instead of crashing. Since I had implemented fewer than half of its 501 opcodes, Gamebert was quickly getting into a nonsensical state. I lost days of my life to this stupid flag, and I don’t know why I even added it in the first place - you can’t expect a program to do anything useful if you skip half its instructions. Nonetheless, with Game Boy Doctor’s help I finished implementing every instruction in my CPU, and successfully ran all of Blargg’s CPU test ROMs. I published Game Boy Doctor on GitHub, and now it’s helping other people debug and fix their own emulators.
Now I had momentum and understanding, and I swept through the remaining components of the project. I’d written enough of an LCD display to draw the Nintendo logo, and now I added the code to display the parts that I’d skipped over. The LCD was the most complex part of the project, and I think I just got lucky that it mostly worked first time. I tidied up my cartridge, joypad, and motherboard, but I stopped at implementing sound. You can tell yourself and your friends that your Game Boy works even if it doesn’t have sound, and by all accounts sound is quite difficult.
I ran Tetris, and it sort of worked. The Tetris pieces were invisible until they landed, but I correctly guessed that this was due to a bug in how my LCD combined pixels from its different layers. I fixed the problem and ran Pokemon Blue. It worked. I caught a Pidgey, walked around for a few minutes, and stopped. Gamebert seemed stable, but I hadn’t added any functionality to save the game and it would probably crash before too long, so I didn’t want to spend much time training up a squad. I’d reached my goal.
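The post doesn't describe the exact fix. As a hedged sketch of the rule involved, this is the per-pixel decision for combining the background/window layer with sprites on the original (DMG) Game Boy, as it is commonly documented: sprite colour index 0 is transparent, and a sprite whose OAM priority bit (bit 7) is set is drawn behind background colours 1-3. The function name and signature are hypothetical, and it works on 2-bit colour indices before any palette is applied.

```python
from typing import Optional, Tuple


def mix_pixel(bg_index: int, sprite_index: Optional[int], bg_over_sprite: bool) -> Tuple[str, int]:
    """Return (winning layer, colour index) for one screen pixel.

    bg_index: colour index (0-3) of the background/window pixel.
    sprite_index: colour index (0-3) of the sprite pixel here, or None if no sprite.
    bg_over_sprite: the sprite's OAM attribute bit 7.
    """
    if sprite_index is None or sprite_index == 0:
        return "bg", bg_index  # no sprite, or sprite colour 0, which is transparent
    if bg_over_sprite and bg_index != 0:
        return "bg", bg_index  # background colours 1-3 win when the priority bit is set
    return "sprite", sprite_index
```

Returning the winning layer as well as the index lets the caller pick the right palette (BGP for the background, OBP0 or OBP1 for sprites) afterwards.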
Gamebert was the best personal project I’ve ever worked on, and it’s one of the few I’ve actually finished. Despite this, it still sometimes felt absurd and pointless. Was I really spending 100 or so hours of my spare time on a shoddy replica of other people’s work instead of, say, playing with my family? This is of course an ungenerous way to talk to yourself, and is easy to attack from multiple angles. Who knows what zigzag professional paths my new knowledge might open up, and more to the point, who cares? Everyone needs time to themselves, and not everything has to have meaning or benefit beyond having some fun.
If this project sounded interesting, then I wholly recommend making your own Game Boy emulator. Start with this talk, these docs, and this sub-Reddit, and let me know how it goes.
2022-12-14 08:00:00
A week ago my brother sent a message to our family group:
“My team at work launched something! It’s called ChatGPT. Give it a go: https://chat.openai.com”
I talked to ChatGPT for ten minutes and then had a crisis of meaning for a few days. I eventually texted my brother back to say well done, because family will still be important, whatever happens next.
At first I thought this was the end of the world. ChatGPT is nowhere near an Artificial General Intelligence (AGI): an AI capable of performing most tasks that a human can. But until last week I thought that even ChatGPT’s level of abstract reasoning was impossible. It can already - to an extent - code, rhyme, correct, advise, and tell stories. How fast is it going to improve? When’s it going to stop? I know that GPT is just a pile of floating point numbers predicting the next token in an output sequence, but perhaps that’s all you need in order to be human enough. I suddenly thought that AGI was inevitable, and I’d never given this possibility much credit before. I found that it made me very unhappy. This is a post about feelings, not analysis.
I texted everyone I could to warn them what was coming. I sounded like an uncritical AI futurist. But all the real futurists seemed excited, or at least frantic. I just felt glum. Our parents had got both the property boom and the last shreds of meaning.
I cycled to a friend’s house to watch England play in the World Cup. I thought of Richard Feynman and how he felt after the Manhattan Project. He said that he saw people building a bridge in New York and thought that they were being absurd, that they didn’t understand. There were atomic bombs now, it was senseless to make anything. It would all be destroyed soon.
As I pedalled I watched people through their office windows, composing unclear emails and editing buggy spreadsheets. I hadn’t played any part in GPT, but I felt like Feynman. Why were these people wasting their time? Didn’t they read the news? Human striving was over, this was all going to be annihilated. Just go home and wait.
This was an overreaction. ChatGPT is impressive, but it’s not an AGI or even proof that AGI is possible. It makes more accessible some skills that I’ve worked hard to cultivate, such as writing clear sentences and decent programs. To a first approximation, this is somewhat good for the world and probably somewhat bad for me. But I can still write and code better than GPT.
On the other hand, whilst I can peacefully coexist with GPT as it currently stands, it won’t be standing there for long. Perhaps I should calm down; disruption is everywhere and necessary. Artists should be afraid of Stable Diffusion; master weavers were anxious about mechanised looms. But now machines are encroaching on things I care about and everyone needs to pay attention.
Even if AGI isn’t imminent, I suspect that big ideas will become more important and their implementation will become more automated. But I don’t have big ideas. If I did then I’d be the kind of person who sees AI as an opportunity, not a threat. Recently there’s been a lot of VC money sloshing around Silicon Valley and not enough programmers. This has made it possible for the programmers who are there to do well in cash and cachet just by being competent implementers. We can even get another rung or two higher by lightly exaggerating the impact of our work during performance reviews. This has allowed me to have a type of success without much brilliance.
It’s been a good deal, but it’s made me professionally complacent. Today’s system works for me, why would it ever change? I assumed there could never be another tech crash. I found AI think-pieces boring and didn’t have time or expertise for the maths. I ignored the boosters, the blowhards, and apparently the experts. Now I’d guess that AI is going to change everything, somehow or another. Luckily I’m allowed to be complacent. I don’t have to be right about the future; I’m not responsible for a company or a product that needed to see this coming. I’ve probably been fortunate enough in the old world that I’m more worried about what AI means for my chances of self-actualisation than for my stability.
I think that AI will make programmers - and almost all other workers - much more productive, but what this means for the industry will depend on the size of the productivity gains. A 50% increase in output per programmer would be incredible (and perhaps even a wild overestimate), but I think it could be absorbed by something that looks like the current tech market. A 500% increase couldn’t.
Some grunt work will get eaten by AI, probably at first as an augmentation to human programmers. What counts as grunt work will depend on how much GPT’s code runs in production. I don’t know where the people currently doing the digested tasks will end up. Perhaps this will be the dawn of a new golden age of software, with more companies, more jobs, and more pie. Perhaps they’ll have to find something else to do. If there is any kind of divide and cull then I assume - perhaps again complacently - that I’ll be on the right side of it, at least for now. But even if I am, will I enjoy my new job? I already regret not knowing much about operating systems. I’m not sure I can handle another layer of abstraction. Present-day GPT isn’t that much more than a personalised Stack Overflow, but what about in five years?
Shit, shit, shit, what if I’ve been wrong about cryptocurrency too?
Perhaps my distress isn’t even about the practical implications of AI. In the last week I’ve discovered that I care more about status than I thought. Status doesn’t have to mean razzle dazzle; I drive a pedal bike and my hoodies have holes in the elbows. But until now I’ve always felt like part of the main event. I already lost a sort of status when I left San Francisco. In London my industry isn’t the centre of attention. I miss the billboards advertising support desk software and telephony APIs. In SF when I told people that I work at Stripe they nodded approvingly. In London they ask what Stripe is.
I’ve still always been in the growth sector, playing a small part in automating other people. I’m not on the cutting edge of technology; I plumb together libraries I didn’t write on top of AWS just like almost everyone else. But I’ve been on the cutting edge of industry, bolting together pipes that move billions of dollars a day. Now I feel like I might be part of a legacy system, being hauled into the future by AI. Perhaps I only care because it’s my own brother with his hands on my lapels. And what a privilege to be fussing about status on the morning of what I’m claiming might be the apocalypse.
Pause; breathe. This might not even be the apocalypse. ChatGPT confidently hallucinates nonsense, and it can’t absorb enough context to do anything all that useful. The commentators focussing on these shortcomings are either missing or nailing the point; I’m not sure which.
On the one hand, the models are going to get better. Lots of AI labs are working on them, and many are at least aiming for full AGI. Even if today’s obstacles are intractable, how many other intractable obstacles have been overcome in order to get this far? I know that most knowledge work is design, coordination, and maintenance, not turning well-specified paragraphs into short scripts or emails. But couldn’t the next version of GPT eat your company’s wiki and chat logs and take over operations from there? Obviously it couldn’t, because the docs are incomplete and out of date and not incentivised by human performance reviews, but you get the idea.
On the other hand, maybe the impossible problems of the future really will be impossible. From the outside it’s easy to underestimate the size of a field’s remaining challenges and the degree of reliability required in order to be transformational. Five years ago self-driving cars were just around the bend; now most companies seem to be giving up. There’s presumably a hard theoretical limit on the power of Large Language Models like GPT, and I’d guess that this boundary is well short of AGI. Perhaps the next leap is still several lifetimes away, like I used to assume.
On balance, taking both sides into account, I have no idea.
So what should I do? I could try to get into AI myself. I’m sure I could help build some training tools. AI infrastructure will go the way of all other programming jobs, wherever that turns out to be, but at least I’d feel like I was on the inside again. I just checked and there are plenty of AI companies in London, if necessary.
For now I’ll wait and see. Until last week I had vague plans to one day write books about teaching programming, maybe a novel, maybe work on some music, spend more time with the kids. I thought I had a plausible path to a highly circumscribed form of greatness. But I suspect that ChatGPT is in many ways already a better teacher than me; certainly it’s more patient and available. I don’t know how long until AI can write novels and synthwave, but it could be soon. That just leaves the kids. I might have to get comfortable with the idea that I have inherent value as a human beyond what I produce.
That’s melodramatic, I’m sorry. But here’s something concrete - I do think I’m going to have to rely less on my blog for self-worth. I mostly write accessible explanations of complex technical topics, like Tor and Off-The-Record Messaging. These essays don’t require novel ideas; just time and interest and some facility with words. ChatGPT can’t yet write extended prose or explain fine details as well as me, but it will one day, plus it will answer follow-up questions. Even if it turns out that I have an inimitable stylistic flair that people appreciate and GPT can’t reproduce (a fanciful hope), I’m not interested in editing for hours and hours just for that. I’m not going to stop writing yet, but I expect to need an alternative sideline before too long.
I know I have to listen to the techno-optimists as well as the techno-pessimists. Economic progress requires productivity gains, as melancholy as they can be for the people on the wrong end of them. I haven’t even considered the good things that will come of AI or the presumably invigorating work that will be required to deploy it. I find that much harder to visualise. I’m sure it will be begrudgingly magnificent.
A possible rule of thumb until things become clearer: before getting too deep into a new field, consider whether you’d be OK if it became an old-fashioned hobby that you only do for yourself.
2022-12-03 08:00:00
Are you building a Gameboy emulator? Are you stuck? Are you failing Blargg’s test ROMs and can’t work out why?
Gameboy Doctor can help! (GitHub link)
Gameboy Doctor is a tool that compares your emulator to an example emulator that passes Blargg’s test ROMs. It finds the exact tick where your emulator’s state diverges from the example, helping you isolate and fix your bugs. You don’t need to have implemented an LCD in order to use it, and you don’t even have to be able to successfully get any kind of pass/fail message back from Blargg! All you need is a minimally functional CPU and motherboard.
Gameboy Doctor needs only Python 3, with no third-party libraries.
The tool is available on GitHub - clone it using git.
Choose a cpu_instrs individual test ROM (these are currently the only ones supported by Gameboy Doctor - see below).
You’ll need to make 2 changes to the internal workings of your emulator. They’ll probably take about 20 minutes to do, but they’ll save you hours and days of aimless debugging. The changes are:
First, initialise your CPU’s registers to the following values:
Register | Value |
---|---|
A | 0x01 |
F | 0xB0 (or CH-Z if managing flags individually) |
B | 0x00 |
C | 0x13 |
D | 0x00 |
E | 0xD8 |
H | 0x01 |
L | 0x4D |
SP | 0xFFFE |
PC | 0x0100 |
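The table can be encoded directly as defaults. This sketch (class and function names are hypothetical) also renders F in the letter form the table mentions. The C, H, N, Z ordering is inferred from the table's CH-Z entry for 0xB0 and the C--- shown for 0x10 in the example output further down; the letter N for the subtract flag is an assumption. On the Game Boy, Z, N, H and C are bits 7, 6, 5 and 4 of F.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    """CPU registers, defaulting to the starting values in the table above."""

    a: int = 0x01
    f: int = 0xB0  # Z, H and C set; N clear
    b: int = 0x00
    c: int = 0x13
    d: int = 0x00
    e: int = 0xD8
    h: int = 0x01
    l: int = 0x4D
    sp: int = 0xFFFE
    pc: int = 0x0100


# Flag masks within F, in display order: C = bit 4, H = bit 5, N = bit 6, Z = bit 7.
FLAG_ORDER = (("C", 0x10), ("H", 0x20), ("N", 0x40), ("Z", 0x80))


def flag_string(f: int) -> str:
    """Render F as letters in C, H, N, Z order, with '-' for each clear flag."""
    return "".join(letter if f & mask else "-" for letter, mask in FLAG_ORDER)
```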
You should also make your emulator return 0x90 when the LY register is read (memory location 0xFF44). This is what I did when generating my example logs, because returning a constant prevents spurious log divergences.
Next, update your emulator to write the state of the CPU after each operation to a logfile. Use a new line for each tick, and use the following format for each state (replace the example numbers with your CPU’s values):
A:00 F:11 B:22 C:33 D:44 E:55 H:66 L:77 SP:8888 PC:9999 PCMEM:AA,BB,CC,DD
All of the values between A and PC are the hex-encoded values of the corresponding registers. The final value (PCMEM) is the 4 bytes stored in the memory locations starting at PC (i.e. the values at pc, pc+1, pc+2, pc+3).
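Here is a minimal sketch of the logging step, combined with the LY stub described above. It assumes a flat 64 KiB memory array and a hypothetical dictionary of register values; a real emulator would route reads through its own memory map. Uppercase hex is used to match the example lines.

```python
from typing import Dict, Sequence

LY_ADDRESS = 0xFF44  # LCD Y-coordinate register


def read_byte(memory: Sequence[int], address: int) -> int:
    """Read a byte, returning a constant 0x90 for LY so logs stay comparable."""
    address &= 0xFFFF
    if address == LY_ADDRESS:
        return 0x90
    return memory[address]


def format_log_line(regs: Dict[str, int], memory: Sequence[int]) -> str:
    """Format one CPU state as a single log line in the format shown above."""
    pc = regs["pc"]
    # PCMEM: the 4 bytes at pc, pc+1, pc+2 and pc+3 (wrapping past 0xFFFF).
    pcmem = ",".join(f"{read_byte(memory, pc + i):02X}" for i in range(4))
    return (
        f"A:{regs['a']:02X} F:{regs['f']:02X} B:{regs['b']:02X} "
        f"C:{regs['c']:02X} D:{regs['d']:02X} E:{regs['e']:02X} "
        f"H:{regs['h']:02X} L:{regs['l']:02X} "
        f"SP:{regs['sp']:04X} PC:{pc:04X} PCMEM:{pcmem}"
    )
```

Write the returned string plus a newline to your logfile after every operation.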
Run your emulator and get a log file. You can kill the program at any point - Gameboy Doctor will tell you if your log file is correct but ends before the test ROM has finished its assertions. If you pass the test then your emulator will display the word “Passed” on the LCD, and write the bytes for the word “Passed” to the serial output. However, you don’t need to pass or even finish the tests in order to use Gameboy Doctor.
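If you don't have an LCD yet, the serial output is the easiest place to see that "Passed" message. A common approach, sketched here as an assumption rather than anything Gameboy Doctor requires, is to watch writes to the serial registers SB (0xFF01) and SC (0xFF02) and record the byte in SB whenever 0x81 is written to SC. The class name is hypothetical.

```python
from typing import List

SB = 0xFF01  # serial transfer data register
SC = 0xFF02  # serial transfer control register


class SerialCapture:
    """Collects the characters a test ROM sends over the serial port."""

    def __init__(self) -> None:
        self.sb = 0
        self.chars: List[str] = []

    def write(self, address: int, value: int) -> None:
        """Call this from your memory write path for every write."""
        if address == SB:
            self.sb = value
        elif address == SC and value == 0x81:
            # 0x81 requests a transfer using the internal clock; record the byte
            # in SB. A simplification: real hardware shifts the bits out over
            # time and then raises a serial interrupt.
            self.chars.append(chr(self.sb))

    def text(self) -> str:
        return "".join(self.chars)
```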
Once you have your logfile, feed it into Gameboy Doctor like so:
./gameboy-doctor /path/to/your/logfile $ROM_TYPE $ROM_NUMBER
For example, to check the 3rd cpu_instrs ROM:
./gameboy-doctor /path/to/your/logfile cpu_instrs 3
On Windows you may need to invoke the Python interpreter directly:
python3 gameboy-doctor /path/to/your/logfile cpu_instrs 3
Gameboy Doctor will tell you how you’re doing and give suggestions on bugfixes. For example:
$ ./gameboy-doctor ../my-emulator/logs/3.log cpu_instrs 3
============== ERROR ==============
Mismatch in CPU state at line 9997:
MINE: A:3E F:C--- B:01 C:07 D:C9 E:BA H:49 L:BB SP:FFFE PC:0208 PCMEM:1C,20,FB,14
YOURS: A:3D F:C--- B:01 C:07 D:C9 E:BA H:49 L:BB SP:FFFE PC:0208 PCMEM:1C,20,FB,14
The CPU state before this (at line 9996) was:
A:3E F:10 B:01 C:07 D:C9 E:BA H:49 L:BB SP:FFFE PC:0207 PCMEM:12,1C,20,FB
The last operation executed (in between lines 9996 and 9997) was:
0x12 LD (DE) A
Perhaps the problem is with this opcode, or with your interrupt handling?
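The report follows from the log format. Each line records PC and the bytes starting at PC, so the opcode executed between two lines is normally the first PCMEM byte of the earlier line (0x12 at PC 0207 above). A hedged sketch of that reasoning, with hypothetical function names, is below. It can point at the wrong instruction if an interrupt fired between the two lines, which is presumably why the Doctor also suggests checking interrupt handling.

```python
from typing import Dict, List, Tuple


def parse_line(line: str) -> Dict[str, str]:
    """Split a log line like 'A:00 F:11 ... PCMEM:AA,BB,CC,DD' into fields."""
    fields: Dict[str, str] = {}
    for token in line.split():
        name, _, value = token.partition(":")
        fields[name] = value
    return fields


def diagnose(previous: str, expected: str, actual: str) -> Tuple[str, List[str]]:
    """Return (opcode executed after `previous`, names of fields that differ)."""
    prev = parse_line(previous)
    exp, act = parse_line(expected), parse_line(actual)
    differing = [name for name in exp if exp[name] != act.get(name)]
    pcmem = prev["PCMEM"].split(",")
    # The next instruction starts at PC, so it is the first PCMEM byte; for
    # CB-prefixed instructions the second byte selects the operation.
    opcode = f"CB {pcmem[1]}" if pcmem[0] == "CB" else pcmem[0]
    return opcode, differing
```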
Eventually you’ll hopefully see:
$ ./gameboy-doctor ../my-emulator/logs/3.log cpu_instrs 3
============== SUCCESS ==============
Your log file matched mine for all 1066160 lines - you passed the test ROM!
Gameboy Doctor currently only supports Blargg’s cpu_instrs test ROMs because these are the most useful for initial debugging. It should be relatively easy to support other test ROMs, although small timing differences that don’t affect the successful running of the emulator may cause divergences in CPU states between otherwise well-functioning emulators.
Let me know if you find Gameboy Doctor useful and I’ll work on expanding the ROMs and emulators it supports.
This tool was inspired by GitHub user wheremyfoodat.