Robert Heaton

Frontier red team @AnthropicAI . Writing a book about being a dad. Now I live in London.
Rss preview of Blog of Robert Heaton

PyMyFlySpy: track your flight using its headrest data

2024-12-04 08:00:00

“Where are we daddy?” asked my five-year-old.

“We’ll land in about an hour,” I said.

“No I mean where are we? Are we flying over Italy yet?”

I wasn’t sure. Our flight was short and cheap and the seats didn’t have TV screens in the headrests. I looked around. I noticed a sticker encouraging me to connect to the in-flight wi-fi. That would do it, I thought. A site like FlightRadar would answer my little man’s question, down to the nearest few meters.

But unfortunately for him I’m the creator of PySkyWiFi (“completely free, unbelievably stupid wi-fi on long-haul flights”). Not paying for airplane internet is kind of my signature move. We’d need a different, offline strategy.

image

I had a think. When you connect to an airplane wi-fi network, you’re usually met with a payment page where you can purchase access to the internet. The page also usually gives you the same flight information that you’d find in the back of your headrest, like speed, direction, and estimated flight length. Perhaps it would have a map as well, I thought.

I pulled out my laptop, connected to the network, and loaded up the payment page. It did indeed show our wind speed, direction, and estimated time of arrival. But no map.

(It didn’t occur to me to screenshot the page so here’s an artist’s impression)

image

“Maybe the server that’s sending us this data is actually also sending us our location, but the web page isn’t displaying it,” I thought. I opened up the Chrome developer tools. I saw that my browser was making regular requests to a /info endpoint.

image

I clicked on one of the requests. This /info endpoint was indeed sending us a huge pile of data, including fields for ground_speed, wind_speed, and estimated_arrival_time. At the bottom of the response I noticed fields for latitude and longitude. My heart leapt. But then I looked closer. They were both null. Aerofoiled.

This looked like the end of the line. I was about to give up and tell my son that we were somewhere just north of Italy, probably…Europe somewhere. But then I was hit by two fantastic ideas.

Fantastic idea number 1: the /info endpoint didn’t tell us our location, but it did tell us our precise, regularly-updated speed and direction. On our flight home I could track and save our speed and direction every second or so for the whole flight. I could use this information to estimate how far we had traveled in each second, and in which direction. I could dynamically calculate our position by starting at our airport’s co-ordinates, then adding on each second’s step.
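Here's a minimal sketch of that dead-reckoning idea. It assumes the endpoint reports speed in metres per second and heading in degrees clockwise from north; the real units and field names will depend on the airline, and `step_position` and `track` are illustrative names rather than PyMyFlySpy's actual code. Each step is treated as a short straight line on a locally flat Earth, which should be a reasonable approximation for one-second steps.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def step_position(lat_deg, lon_deg, speed_mps, heading_deg, dt_s):
    """Advance a position by one time step, given speed and heading.

    Assumes heading is measured in degrees clockwise from north.
    """
    distance = speed_mps * dt_s
    heading = math.radians(heading_deg)
    # Northward and eastward movement, converted from metres to radians.
    dlat = distance * math.cos(heading) / EARTH_RADIUS_M
    dlon = distance * math.sin(heading) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat_deg))
    )
    return lat_deg + math.degrees(dlat), lon_deg + math.degrees(dlon)


def track(start_lat, start_lon, readings):
    """Build a flight path from (speed_mps, heading_deg, dt_s) readings."""
    lat, lon = start_lat, start_lon
    path = [(lat, lon)]
    for speed, heading, dt in readings:
        lat, lon = step_position(lat, lon, speed, heading, dt)
        path.append((lat, lon))
    return path
```

Small errors in each step add up over a whole flight, so the estimate will drift. For answering "are we over Italy yet?" that is probably good enough.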

image

Fantastic idea number 2: even if I had been able to find our latitude and longitude in the /info response, it wouldn’t have meant much to either me or my son. However, I could build a web app that ran on my laptop and showed us our dynamically calculated position on a map, in real time. The app could have automatically updating graphs of our ETA, wind direction, speed, altitude, and so on. Ooh and an interface for running arbitrary queries against the data. And event callbacks to allow me to programmatically trigger code based on flight info (“when our ETA is 2 hours exactly, block my access to netflix.com and open the latest draft of my unfinished novel”). My son would know where he was. I’d be a Good Dad.

I decided to call the app PyMyFlySpy in order to give it some brand association with PySkyWiFi, my other airplane-related project. I couldn’t wait to get started. Unfortunately right now I was wedged in between a five-year-old and a two-year-old and we were all terrible at JavaScript. I waited, impatiently.

PyMyFlySpy

Eventually we landed. I built PyMyFlySpy during our holiday, over late evenings and one or two derelict afternoons while the rest of my family did normal-person fun things. I couldn’t figure out whether it was bad manners to use your laptop in artisanal Italian coffee shops, or which of them had wi-fi, so to my eternal shame I googled “starbucks near me” and planted myself in a corner with a skinny mochachino and typed away.

I finished PyMyFlySpy the day before we left. The code is available on GitHub and it’s easy to set up and run. It even has a “dummy” mode that allows you to demo it without being inside a plane, using a made-up flight.

Here’s what PyMyFlySpy can do:

Maps and graphs

PyMyFlySpy shows a map of your flightpath so far. It also shows your current flight metrics and how these metrics have changed over the course of your flight. It does this for all data available from the in-flight wi-fi, even data that isn’t usually displayed on the website or headrest screen. You can see exactly where you are and feel a bit like a pilot.

image

image

Query interface

PyMyFlySpy saves all the data that it records to a database. Its UI has a page that allows you to write queries against the data to answer questions like “what’s our maximum speed so far, and when did we hit it?” or “how fast was the wind during that turbulence we just went through?”
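As a rough illustration of the "maximum speed so far" question, here's a sketch using Python's built-in `sqlite3` module. The `readings` table, its column names, and the three rows are all made up for this example; the real PyMyFlySpy schema may look quite different.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (recorded_at TEXT, ground_speed REAL, wind_speed REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2024-08-01T15:00:00", 780.0, 35.0),
        ("2024-08-01T15:00:05", 812.5, 41.0),
        ("2024-08-01T15:00:10", 801.0, 38.5),
    ],
)


def max_speed(conn):
    """Return (recorded_at, ground_speed) for the fastest reading so far."""
    return conn.execute(
        "SELECT recorded_at, ground_speed FROM readings "
        "ORDER BY ground_speed DESC LIMIT 1"
    ).fetchone()
```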

image

I’m not claiming that this is hugely useful, but I do think it’s cool.

Multi-airline support

Different airlines have different wi-fi systems. A recorder for a JetBlue flight won’t work on Air France. Fortunately, PyMyFlySpy allows you to easily add and use recorders for different airlines. You just have to load up their wi-fi landing page, open your browser’s developer tools, and figure out how to parse their page’s data like I did above. Then you add your new code to the PyMyFlySpy config, and tell the recorder to use it. Everything else continues to work just the same.

System design

The system is very simple. It has 4 parts:

  1. Firefox Extension - reads flight info from the airline’s website and sends it to the PyMyFlySpy web server
  2. Local web server - saves data that the extension sends to it, and makes it available to the frontend
  3. SQLite Database - stores data
  4. Web frontend - displays data using maps and graphs

image

The one strange design choice I made was to use a Firefox Extension to read the flight data, instead of writing a scraper that makes its own data requests directly. Scraping the information like this would have been easier and more flexible, as well as completely harmless. Hundreds of people were already connected to the wi-fi, and the airline’s own landing page hits the /info endpoint once every couple of seconds. Adding one more request from a scraper would have been entirely safe.

image

However, I’m sure that airlines would rather people didn’t poke around at their onboard servers, even if those people are very careful and well-intentioned and handsome. To make sure I didn’t irritate them, I came up with an even more judicious approach.

Instead of scraping the data endpoint, I wrote a Firefox Extension. The extension sits there while the airline’s wi-fi landing page requests the latest data from the /info endpoint, just like normal, every few seconds. The extension peeks at the data that’s returned; sends the data to the PyMyFlySpy web server; and finally the web server writes it to the PyMyFlySpy database, to serve to the web frontend. Using a Firefox Extension like this means that PyMyFlySpy never interacts with the plane’s info server directly. This means that PyMyFlySpy can provably never harm the server.

I had to write the extension for Firefox instead of Chrome, because Chrome is in the process of reducing extensions’ ability to interact with requests made by a website (like requests made to the /info endpoint). In particular, Chrome is going to prevent extensions from easily reading the responses to HTTP requests made by a website, which would prevent PyMyFlySpy from reading the data returned by the /info endpoint. As far as I can tell these restrictions are half for security reasons, and half to make it harder to develop adblockers. Either way, PyMyFlySpy requires Firefox.

Future work - event subscriptions

PyMyFlySpy gives us programmatic access to data about our flight. It would be fun to use this to trigger events, like:

  • “For the first half of the flight, only let me open the big report that I need to finish by 5pm today.”
  • “When our location is within 10 miles of the Grand Canyon, send the kids a Slack message to look out the window. Also send me a Slack message to bug them to look out the window.”
  • “If our altitude drops by more than 300ft in 1 second then play a reassuring but really quite urgent sound on all of my devices.”
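Since this feature doesn't exist yet, the following is only a guess at what a subscription mechanism could look like. `FlightEvents`, `subscribe`, `publish`, and the `eta_minutes` field are all hypothetical names. Each subscription pairs a condition with a callback, and the callback fires once, the first time a reading satisfies the condition.

```python
class FlightEvents:
    """A tiny, hypothetical one-shot event registry for flight readings."""

    def __init__(self):
        # Each entry is [predicate, callback, already_fired].
        self._subscriptions = []

    def subscribe(self, predicate, callback):
        """Call `callback(reading)` the first time `predicate(reading)` is true."""
        self._subscriptions.append([predicate, callback, False])

    def publish(self, reading):
        """Feed a new reading (a dict of flight data) to every subscription."""
        for subscription in self._subscriptions:
            predicate, callback, fired = subscription
            if not fired and predicate(reading):
                subscription[2] = True  # mark as fired so it only runs once
                callback(reading)


# Example: react when the ETA first drops to two hours or less.
events = FlightEvents()
events.subscribe(
    lambda reading: reading["eta_minutes"] <= 120,
    lambda reading: print("Two hours to go: time to open the novel."),
)
```

Conditions that compare consecutive readings, like the altitude-drop example, would need the registry to remember the previous reading as well.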

Next holiday, perhaps.

The flight home

Our flight home was in the late afternoon. We shuffled on board and took off. I pulled out my laptop, connected to the wi-fi, and booted up PyMyFlySpy. I turned to my son to show him where we were. I’d shown him the prototype every day for the last week and I thought he seemed to be somewhere between “tolerant” and “mildly interested.” But he’d already fallen asleep. I took some screenshots to show him later.

I spent the next few hours monitoring and debugging the recorder to make sure that it stayed up. My two-year-old screamed the whole flight and kept trying to throw himself on the floor. I made supportive faces at my wife across the aisle and pretended to offer to take him, but she shook her head. She knew that this was important.

I watched the graphs. Temperature within normal range. Wind speed stable. Suddenly our altitude dropped by fifty feet. I wondered if I should tell the pilots. I decided that they probably had it under control. I kept watching, just in case.

PyMyFlySpy is on GitHub.

Generating infinite, age-appropriate Cat Crimes puzzles

2024-09-02 08:00:00

A few weeks ago my 5-year-old and I tried playing Cat Crimes, a puzzle game in which you work out which of your cats ate your shoes. We had a wonderful time - for about 20 minutes.

In each round of Cat Crimes you get a puzzle card with a list of clues on it. You have to use the clues to figure out where in your front room each of your 6 cats were sitting. This tells you which one of them was responsible for your ruined stilettos. The game comes with 40 puzzle cards, ranging from the very easy to the mind-crushingly difficult.

However, the problem is that “very easy” to “mind-crushingly difficult” is a lot of ground to cover in 40 puzzles, and by the fifth puzzle the clues had become too abstract and difficult for my little man. In the first few puzzles each new clue allowed us to immediately place a new cat and then forget about it. For example, a clue might have told us that Mr. Mittens was sitting opposite Pip Squeak. We already knew where Pip Squeak was sitting, so we could work out exactly where Mr. Mittens was sitting too. This is the perfect level of complexity for a small child and his aging father at 6am.

However, as the puzzles got harder the clues stopped neatly resolving like this. They still narrowed down the possible pussy permutations, but they didn’t necessarily allow us to definitively place a new cat straightaway. For example, a clue might have told us that Mr. Mittens was sitting next to Pip Squeak. We knew that Mr. Mittens must have been on either Pip Squeak’s left or right, but we couldn’t say for sure which until we’d processed more clues. We might later learn that Duchess was sitting to Pip Squeak’s left, which in turn would tell us that Mr. Mittens must be sitting to her right.

To follow this extended chain of logic you need to hold multiple simultaneous superpositions in your head. This is fun and challenging and good puzzle design, but my kid hasn’t done superpositions at school yet so he didn’t get it. I tried drawing some pictures for him, but they made no sense even to me. We got angry with each other and eventually gave up on the game altogether.

But we’d really had a great time with those first few puzzles, so that evening I wrote us a computer program that generated an infinite number of new beginner level Cat Crimes challenges. I ran it 20 times and printed out the challenges and their solutions. The next day we continued happily solving age-appropriate cat mysteries together.

Downloads

You can download:

The program works by generating random challenges until it finds one that has a single unique solution and meets certain constraints. The constraints ensure that the challenges are easy but not too easy. For example, a maximum of 3 cats can be asleep (meaning that they are out of the round), and a maximum of 2 clues are allowed to tell you a cat’s exact position.
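Here's a heavily simplified sketch of that generate-and-test loop. It is not the real generator: it assumes six seats in a ring, ignores sleeping cats, and only knows three clue types ("at", "next_to", "opposite"). The cat names beyond the three mentioned above are placeholders. It brute-forces all 720 seatings to count the solutions.

```python
import itertools
import random

# The last three names are placeholders, not the game's real cats.
CATS = ["Mr. Mittens", "Pip Squeak", "Duchess", "Cat 4", "Cat 5", "Cat 6"]
SEATS = len(CATS)  # seats are numbered 0..5 around a ring


def satisfies(seating, clue):
    """Check one clue against a seating (a dict of cat -> seat number)."""
    kind, a, b = clue
    if kind == "at":  # ("at", cat, seat)
        return seating[a] == b
    gap = (seating[a] - seating[b]) % SEATS
    if kind == "next_to":  # ("next_to", cat, cat)
        return gap in (1, SEATS - 1)
    if kind == "opposite":  # ("opposite", cat, cat)
        return gap == SEATS // 2
    raise ValueError(f"unknown clue type: {kind}")


def solutions(clues):
    """Return every seating that satisfies all of the clues."""
    found = []
    for perm in itertools.permutations(range(SEATS)):
        seating = dict(zip(CATS, perm))
        if all(satisfies(seating, clue) for clue in clues):
            found.append(seating)
    return found


def random_clue(rng, truth):
    """Propose a random clue; return None if it is false for `truth`."""
    kind = rng.choice(["at", "next_to", "opposite"])
    if kind == "at":
        cat = rng.choice(CATS)
        return ("at", cat, truth[cat])
    a, b = rng.sample(CATS, 2)
    clue = (kind, a, b)
    return clue if satisfies(truth, clue) else None


def generate_puzzle(rng, num_clues=6, max_exact=2, max_attempts=10_000):
    """Keep generating random puzzles until one has exactly one solution."""
    for _ in range(max_attempts):
        truth = dict(zip(CATS, rng.sample(range(SEATS), SEATS)))
        clues = []
        while len(clues) < num_clues:
            clue = random_clue(rng, truth)
            if clue is not None and clue not in clues:
                clues.append(clue)
        exact = sum(1 for clue in clues if clue[0] == "at")
        if exact <= max_exact and len(solutions(clues)) == 1:
            return clues, truth
    raise RuntimeError("no valid puzzle found")
```

Calling `generate_puzzle(random.Random())` returns a list of clues and the seating they pin down. Most random attempts are rejected, but with only 720 seatings to check, each rejection is cheap.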

In order to play the puzzles you’ll need to buy the Cat Crimes game.

Good luck, and let me know how you get on!

ChatGPT mode

At first I tried asking ChatGPT to generate puzzles for me. My puzzles are guaranteed to be solvable and probably about the right difficulty, but since they’re randomly generated their solutions don’t generally have much of a careful narrative behind them.

I thought that ChatGPT might be able to do better. “Absolutely!” it said when I asked it, but it kept giving me back puzzles that had either several different solutions or no solutions at all. No dice!

To fix this I added a ChatGPT mode to my tool. In this mode the tool gives you a prompt to paste into ChatGPT. The prompt asks ChatGPT to give you a Cat Crimes puzzle formatted in a specific way. You paste ChatGPT’s output back into the tool, and the tool checks whether the puzzle is valid. If it is then the tool converts the puzzle into a printable card; if it’s not then it prints an error message for you to give to ChatGPT to help it fix the problem. You continue this debugging loop until you have a valid (and hopefully more fun) puzzle.

Disclaimer

I’m not associated with Cat Crimes in any way; this is a completely unofficial fan project. Cat Crimes is owned and published by Thinkfun Inc. Go and buy it from them!

PySkyWiFi: completely free, unbelievably stupid wi-fi on long-haul flights

2024-07-09 08:00:00

The plane reached 10,000ft. I took out my laptop, planning to peruse the internet and maybe do a little work if I got really desperate.

I connected to the in-flight wi-fi and opened my browser. The network login page demanded credit card details. I fumbled for my card, which I eventually discovered had hidden itself inside my passport. As I searched I noticed that the login page was encouraging me to sign in to my airmiles account, free of charge, even though I hadn’t paid for anything yet. A hole in the firewall, I thought. It’s a long way from London to San Francisco so I decided to peer through it.

I logged in to my JetStreamers Diamond Altitude account and started clicking. I went to my profile page, where I saw an edit button. It looked like a normal button: drop shadow, rounded corners, nothing special. I was supposed to use it to update my name, address, and so on.

But suddenly I realised that this was no ordinary button. This clickable rascal would allow me to access the entire internet through my airmiles account. This would be slow. It would be unbelievably stupid. But it would work.

Several co-workers were asking me to review their PRs because my feedback was “two weeks late” and “blocking a critical deployment.” But my ideas are important too so I put on my headphones and smashed on some focus tunes. I’d forgotten to charge my headphones so Limp Bizkit started playing out of my laptop speakers. Fortunately no one else on the plane seemed to mind so we all rocked out together.

Before I could access the entire internet through my airmiles account I’d need to write a few prototypes. At first I thought that I’d write them using Go, but then I realised that if I used Python then I could call the final tool PySkyWiFi. Obviously I did that instead.

Prototype 1: Instant Messaging

Here’s the basic idea: suppose that I logged into my airmiles account and updated my name. If you were also logged in to my account then you could read my new name, from the ground. You could update it again, and I could read your new value. If we kept doing this then the name field of my airmiles account could serve as a tunnel through the airplane’s wi-fi firewall to the real world.

This tunnel could support a simple instant messaging protocol. I could update my name to “Hello how are you.” You could read my message and then send me a reply by updating my name again to “Im fine how are you.” I could read that, and we could have a stilted conversation. This might not sound like much, but it would be the first step on the road to full internet access.

I paid for the internet on my old laptop. I hadn’t finished migrating my data off this computer, so it still had to come everywhere with me. I messaged my wife to ask her to help me with my experiments. no, what are you talking about, i'm busy she replied, lovingly.

So instead I took out my new laptop, which still had no internet access. I created a test airmiles account and logged into it on both computers. I found that I could indeed chat with myself by updating the name field in the UI.

sequenceDiagram
    participant Computer1
    participant AirmilesAccount as Airmiles Account<br>Name Field
    participant Computer2
    
    Computer1->>AirmilesAccount: TYPE: Hello how are you
    AirmilesAccount->>Computer2: READ: Hello how are you
    Computer2->>AirmilesAccount: TYPE: Im fine how are you
    AirmilesAccount->>Computer1: READ: Im fine how are you

This was a lousy user experience though. So I wrote a command line tool to automate it. My tool asked the user for a message, and then behind the scenes it logged into my airmiles account via the website, using my credentials. The tool updated the name field of my test account with the user’s message. It then polled the name field every few seconds to see if my account’s name had changed again, which would indicate that the other person had sent a message back. Once the tool detected a new value it printed that value and asked the user for their next reply, and so on.

sequenceDiagram
    actor Me
    participant AirmilesAccount as Airmiles Account<br>Name Field
    actor You
    
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>You: (no new data)
    Me->>AirmilesAccount: WRITE: Hello how are you
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>You: READ: Hello how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>Me: (no new data)
    You->>AirmilesAccount: WRITE: Im fine how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>Me: READ: Im fine how are you
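The polling half of that tool might look roughly like this. The sketch assumes a `read_field` function that returns the current contents of the name field; how it does that (logging in to a website, reading a local file) is deliberately left out. `wait_for_new_value` and its parameters are illustrative names, not the tool's real interface.

```python
import time


def wait_for_new_value(read_field, last_seen, poll_interval=2.0, timeout=60.0,
                       sleep=time.sleep):
    """Poll `read_field()` until it returns something other than `last_seen`.

    Returns the new value, or None if nothing changed before the timeout.
    `sleep` is injectable so that the loop can be tested without waiting.
    """
    waited = 0.0
    while waited <= timeout:
        value = read_field()
        if value != last_seen:
            return value
        sleep(poll_interval)
        waited += poll_interval
    return None
```

One weakness of detecting messages by change alone is that a reply identical to the previous value of the field would go unnoticed.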

Using this tool I could chat with someone on the ground, via my terminal. I wouldn’t have to pay for wi-fi, and neither of us would have to know or care that the messages were being sent via my SkyVenture Premium Gold Rewards account.

I still needed to find someone who would chat with me. But this was a good start!

NB: at this point I didn’t want to send any more automated data through my airmiles account in case that got me in trouble somehow. Nothing I was doing could possibly cause any damage, but some companies get jumpy about this kind of thing.

I therefore proved to myself that PySkyWiFi would work on my airmiles accounts too by updating my name ten or so times in quick succession. They all succeeded, which suggested to me that my airmiles account probably wasn’t rate-limiting the speed or number of requests I could send to it.

I then wrote the rest of my code by sending my data through friendly services like GitHub Gists and local files on my computer, using the same principles as if I were sending it through an airmiles account. If PySkyWiFi worked through GitHub then it would work through my Star Power UltimateBlastOff account too. This had the secondary advantage of being much faster and easier for iteration.

I’m going to keep talking about sending data through an airmiles account, because that’s the point I’m trying to make.

Prototype 2: Live headlines, stock prices, and football scores

The tunnel I’d constructed through my airmiles account would be useful for more than IMing. For my next prototype I wrote a program that would run on a computer back at my house or in the cloud, and would automatically send information from the real world up to me on the plane, through my airmiles account. I could deploy it before I left for my next flight and have it send me the latest stock prices or football scores while I was in the sky.

To do this I wrote a daemon that would run on a computer that was on the ground and connected to the internet. The daemon constantly polled the name field in my airmiles account, looking for structured messages that I sent to it from the plane (such as STOCKPRICE: AAPL or SCORE: MANUNITED). When the daemon saw a new request it parsed it, retrieved the requested information using the relevant API, and sent it back to me via my airmiles account. It worked perfectly.
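The parsing and dispatch step could be sketched like this. It assumes requests always take the form `COMMAND: argument`; the handler functions are stand-ins for whatever stock or sports API the real daemon called, which isn't named here.

```python
def parse_request(text):
    """Split a message like 'SCORE: MANUNITED' into ('SCORE', 'MANUNITED').

    Returns None if the text isn't a well-formed request.
    """
    command, separator, argument = text.partition(":")
    command, argument = command.strip().upper(), argument.strip()
    if not separator or not command or not argument:
        return None
    return command, argument


def handle(text, handlers):
    """Look up the handler for a request and return its reply as a string."""
    parsed = parse_request(text)
    if parsed is None:
        return None  # not a request, so ignore it
    command, argument = parsed
    handler = handlers.get(command)
    if handler is None:
        return f"ERROR: unknown command {command}"
    return handler(argument)


# Placeholder handlers; a real daemon would call an external API here.
HANDLERS = {
    "SCORE": lambda team: f"(latest score for {team} goes here)",
    "STOCKPRICE": lambda ticker: f"(latest price for {ticker} goes here)",
}
```

The daemon would then write the string returned by `handle` back into the name field for the plane to read.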

Now I could use my first prototype to send IMs through my airmiles account, and I could use my second prototype to follow the markets and the sports.

It was time to squeeze the entire internet through my airmiles account.

The real thing: PySkyWiFi

During the rest of the flight I wrote PySkyWiFi. PySkyWiFi is a highly simplified version of the TCP/IP protocol that squeezes whole HTTP requests through an airmiles account, out of the plane, and down to a computer connected to the internet on the ground. A daemon running on this ground computer makes the HTTP requests for me, and then finally squeezes the completed HTTP responses back through my airmiles account, up to me on my plane.

This meant that on my next flight I could technically have full access to the internet, via my airmiles account. Depending on network conditions on the plane I might be able to hit speeds of several bytes per second.

DISCLAIMER: you obviously shouldn’t actually do any of this

Here’s how it works (and here’s the source code).


How PySkyWiFi works

PySkyWiFi has two components:

  • The sky proxy - a proxy that runs on your laptop, on a plane
  • The ground daemon - a daemon that runs on a computer connected to the internet, at your home on the ground or in the cloud

Here’s a simplified diagram:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account
    participant GroundDaemon as Ground Daemon
    participant Website as example.com

    Me->>SkyProxy: HTTP request
    SkyProxy->>AirmilesAccount1: HTTP request
    AirmilesAccount1->>GroundDaemon: HTTP request
    GroundDaemon->>Website: HTTP request
    Website->>GroundDaemon: HTTP response
    GroundDaemon->>AirmilesAccount1: HTTP response
    AirmilesAccount1->>SkyProxy: HTTP response
    SkyProxy->>Me: HTTP response

Setup starts before you leave your house. First you start up the ground daemon. Then you get a taxi to the airport, get on the plane, and connect to the plane’s wi-fi network. You boot up the sky proxy on your laptop. Your PySkyWiFi relay is now ready to go.

You use a tool like curl to make an HTTP request to the sky proxy that you’ve started on your laptop. You address your request to the proxy (eg. localhost:1234/) and you put the actual URL that you want to query inside a custom HTTP header called X-PySkyWiFi. For example:

curl localhost:1234 -H "X-PySkyWiFi: example.com"

The X-PySkyWiFi header will be stripped by the ground daemon and used to route your request to your target website. Everything else about the request (including the body and other headers) will be forwarded exactly as-is.

Once you make your request it will hang for several minutes. If by some miracle nothing breaks then you’ll eventually get back an HTTP response, exactly as if you’d sent the request over the normal internet like a normal person. The only difference is that it didn’t cost you anything. You will now almost certainly pay for wi-fi, because your curiosity has been satisfied and your time on this earth is very short.

Step-by-step

Here’s what happens behind the scenes:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as example.com

    Me->>SkyProxy: curl localhost:1234 \n -H "X-PySkyWiFi: example.com"
    SkyProxy->>AirmilesAccount1: Write request chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 1
    GroundDaemon->>AirmilesAccount2: Ack request chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ack for request chunk 1
    SkyProxy->>AirmilesAccount1: Write request chunk 2
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 2
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: example.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>
    GroundDaemon->>AirmilesAccount2: Write response chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 1
    SkyProxy->>AirmilesAccount1: Ack response chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ack for response chunk 1
    GroundDaemon->>AirmilesAccount2: Write response chunk 2
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 2
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>

In order:

  1. The sky proxy receives the HTTP request from your curl call. It splits the request into chunks, because the entire request is too large to fit into your airmiles account in one go
  2. The sky proxy writes each chunk one-by-one to the name field in your airmiles account.
  3. The ground daemon polls your airmiles account. When it detects that the name field has changed to a new chunk, it reads that chunk and sends an acknowledgement to the sender so that the sender knows it’s safe to send the next chunk. The receiver sticks the chunks back together and rebuilds the original HTTP request
  4. Once the ground daemon has received and rebuilt the full HTTP request, it sends the request out over the internet.
  5. The ground daemon receives an HTTP response.
  6. The ground daemon sends the HTTP response up to the sky proxy using the same process as before, in reverse. This time the ground daemon splits the HTTP response up into chunks and writes each chunk one-by-one to the name field in your airmiles account (it actually writes these response chunks using a second airmiles account to make the protocol simpler)
  7. The sky proxy polls the second airmiles account. It reads each chunk and sticks them back together to rebuild the HTTP response
  8. The sky proxy returns the HTTP response to the original call to curl. As far as curl is concerned this is a perfectly normal HTTP response, just a little slow. curl has no idea about the silliness that just transpired

The sky proxy and the ground daemon are relatively simple: they send HTTP requests and parse HTTP responses. The magic is in how they squeeze these requests and responses through an airmiles account. Let’s look closer.

Squeezing HTTP requests through an airmiles account

PySkyWiFi’s communication logic is split into two layers: a transport layer, and a network layer. The transport layer’s job is to decide what data clients should send to each other. It dictates how senders should split up long messages into manageable chunks, as well as how senders and receivers should signal information like “I am ready to receive another chunk.” The PySkyWiFi transport layer is somewhat similar to the TCP protocol that powers much of the internet, if you squint very hard and don’t know much about TCP.

By contrast, the network layer’s job is to actually send data between clients, once the transport protocol has decided what that data should be. It’s vaguely similar to the IP protocol, if you squint even harder and know even less what you’re talking about.

This division of responsibility between layers is useful because the transport layer doesn’t have to care about how the network layer sends its data, and the network layer doesn’t care what the data it sends means or where it came from. The transport layer just hands the network layer some data, and the network layer sends it however it likes.

This separation makes it easy to add support for new airmiles platforms, because all we have to do is implement a new network layer that reads and writes to the new type of airmiles account. This separation also allows us to write test versions of the network protocol that write and read from local files instead of airmiles accounts. In each case the network layer changes, but the transport layer stays exactly the same. Here’s how they work.
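To make the split concrete, here's a sketch of what a file-backed test network layer could look like. The `Pipe` interface and the `FilePipe` class are assumptions for illustration, not PySkyWiFi's real classes. The idea is that anything with a `write` and a `read` can stand in for an airmiles account.

```python
from pathlib import Path
from typing import Protocol


class Pipe(Protocol):
    """The only interface the transport layer needs from the network layer."""

    def write(self, data: str) -> None: ...

    def read(self) -> str: ...


class FilePipe:
    """A pipe backed by a local file, standing in for an airmiles name field."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.touch()  # create the file if needed, keep existing contents

    def write(self, data: str) -> None:
        # Like updating a name field, writing replaces the old value.
        self.path.write_text(data, encoding="utf-8")

    def read(self) -> str:
        return self.path.read_text(encoding="utf-8")
```

An airmiles-backed pipe would expose the same two methods, with `write` submitting the profile edit form and `read` loading the profile page.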

The transport layer

A PySkyWiFi transport connection between two clients consists of two “pipes” (or “airmiles accounts”). Each client has a “SEND” pipe that it can write data to, and a “RECV” pipe that it can read from. Clients write to their SEND pipe by writing data to it, and they read from their RECV pipe by constantly polling it and seeing if anything has changed.

flowchart LR
    Client1 --> Client2
    Client2 --> Client1

From the transport layer’s point of view, a pipe is just something that it can write data to and read data from. Beyond that the transport layer doesn’t care how its pipes work.

At any given moment a PSWF (PySkyWiFi) client can only either send or receive data, but not both. A client in send mode will not see data sent by the other client, and a client in receive mode should never send data because the other client won’t see it. This is unlike TCP, where clients can send or receive data at any time.

When squeezing HTTP requests and responses through an airmiles account, the sky proxy sends the first message and the ground daemon receives it. Once the sky proxy has finished sending its HTTP request it switches to receive mode and the ground daemon switches to send. The ground daemon makes the HTTP request and sends back the response, at which point the two switch roles again so that the sky proxy can send another HTTP request.

How are long messages sent through such a small pipe?

PSWF uses small pipes (such as an airmiles name field) that can’t fit much data in them at once. This means that it takes some work and care to squeeze long messages (like HTTP requests) through them.

To send a long message, the sender first splits up their message into chunks that will fit into their SEND pipe. They then send each chunk down the pipe one at a time.

To begin a message, a sender starts by sending its first chunk of message data inside a DATA segment:

A DATA segment consists of:

  • The letter D
  • The sequence number of the chunk (a number that uniquely identifies the chunk, padded to 6 digits)
  • The actual chunk of data.

For example, a data segment in the middle of a message might read: D000451adline": "Mudslide in Wigan causes m
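A minimal sketch of the chunking and DATA-segment formatting described above. The function names are illustrative, and the chunk size is a parameter because the real limit depends on how many characters the name field accepts.

```python
SEQ_DIGITS = 6  # sequence numbers are zero-padded to 6 digits


def split_into_chunks(message, chunk_size):
    """Split a message into pieces of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]


def data_segment(seq, chunk):
    """Build a DATA segment: 'D', the padded sequence number, then the chunk."""
    return f"D{seq:0{SEQ_DIGITS}d}{chunk}"
```

For instance, `data_segment(451, 'adline": "Mudslide in Wigan causes m')` produces the example segment shown above.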

Once the sender has sent a DATA segment, it pauses. It wants to send its next DATA segment, but it can’t overwrite the airmiles account name field until it knows that the receiver has received and processed the previous one.

The receiver tells the sender that it’s safe to send a new DATA segment by acknowledging every segment that it reads. The receiver does this by writing an ACK segment to its own SEND pipe:

An ACK segment consists of:

  • The letter A
  • The sequence number of the segment that is being acknowledged (padded to 6 digits)

For example: A000451

The sender is constantly polling its own RECV pipe to check for changes, and so it reads this new ACK segment promptly. Once the sender reads the ACK, it knows that the receiver has received the segment corresponding to the ACK’s sequence number. For example, if a sender receives an ACK segment with sequence number 000451, the sender knows that it’s safe to send the next DATA segment with sequence number 000452. The sender therefore pulls the next chunk from its message and constructs a new DATA segment using this chunk and sequence number. The sender writes the new segment to its SEND pipe, and then pauses and waits for another ACK.

This loop continues until the sender has sent all the data in its message. To tell the recipient that it’s finished, the sender sends an END segment.

An END segment is just the letter E.

When a receiver sees an END segment it knows that the sender’s message is over. The sender and the receiver swap roles. The old sender starts polling its RECV pipe for DATA segments, and the old receiver starts chunking up its response message and sending it through its pipe, exactly as before.
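The whole send, ACK, send, END loop can be simulated with two in-memory slots standing in for the two pipes. This is a toy sketch under assumptions: Pipe, Sender, Receiver and simulate are hypothetical names, the real clients sleep between polls rather than taking strict turns, and having the receiver ignore a segment it has already processed is an assumed detail rather than something the protocol description spells out.

```python
class Pipe:
    """A one-slot pipe: each write overwrites the last, like a name field."""
    def __init__(self):
        self.value = ""


class Sender:
    def __init__(self, message, chunk_size, send_pipe, recv_pipe):
        self.chunks = [message[i:i + chunk_size]
                       for i in range(0, len(message), chunk_size)]
        self.seq = 0
        self.done = False
        self.send_pipe, self.recv_pipe = send_pipe, recv_pipe
        self._write_current()

    def _write_current(self):
        if self.seq < len(self.chunks):
            self.send_pipe.value = f"D{self.seq:06d}{self.chunks[self.seq]}"
        else:
            self.send_pipe.value = "E"  # END segment: nothing left to send
            self.done = True

    def step(self):
        # Poll for an ACK; only move on once the current segment is acknowledged
        if not self.done and self.recv_pipe.value == f"A{self.seq:06d}":
            self.seq += 1
            self._write_current()


class Receiver:
    def __init__(self, send_pipe, recv_pipe):
        self.expected = 0
        self.parts = []
        self.finished = False
        self.send_pipe, self.recv_pipe = send_pipe, recv_pipe

    def step(self):
        segment = self.recv_pipe.value
        if segment == "E":
            self.finished = True
        elif segment.startswith("D") and int(segment[1:7]) == self.expected:
            self.parts.append(segment[7:])
            self.send_pipe.value = f"A{self.expected:06d}"
            self.expected += 1
        # Anything else is a segment we have already handled, so ignore it


def simulate(message, chunk_size=20):
    sky_to_ground, ground_to_sky = Pipe(), Pipe()
    sender = Sender(message, chunk_size, sky_to_ground, ground_to_sky)
    receiver = Receiver(ground_to_sky, sky_to_ground)
    while not receiver.finished:
        receiver.step()
        receiver.step()  # polling the same segment twice is harmless
        sender.step()
    return "".join(receiver.parts)


result = simulate("GET / HTTP/1.1\nHost: robertheaton.com", chunk_size=5)
```

The sequence numbers are what make repeated polling safe: the receiver only accepts the segment it is expecting, and the sender only advances when the ACK matches the segment it last wrote.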

None of this transport logic cares about the details of the network layer through which the segments are sent. The transport layer just needs the network layer to provide two pipes that it can read from and write to. The network layer can pipe this data around via local files, a Discord profile, or an airmiles account. This genericness is what allows PySkyWiFi to work with any airline’s airmiles account, so long as the airline allows you to log in to it from the plane without paying.

Here’s how PSWF uses transport protocol segments to exchange long messages:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as robertheaton.com

    Me->>SkyProxy: curl localhost:1234 \n -H "X-PySkyWiFi: robertheaton.com"
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000000:<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000000:<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon->>AirmilesAccount2: Write ACK segment<br>sequence number=000000
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ACK segment<br>sequence number=000000
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: robertheaton.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>
    GroundDaemon->>AirmilesAccount2: Write DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy->>AirmilesAccount1: Write ACK segment<br>sequence number=000000
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ACK segment<br>sequence number=000000
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>

The transport layer decides what data the clients should send each other, but it doesn’t say anything about how they should send it. That’s where the network protocol comes in.

The network layer

The network layer’s job is to send data between clients. It doesn’t care about where the data came from or what it means; it just receives some data from the transport layer and sends it to the other client (typically via an airmiles account).

This means that the network layer is quite simple. It also means that adding a new network layer for a new airmiles platform is straightforward. You use the new platform to implement a few operations and a few properties (see below), and then the transport layer can automatically use your new airmiles platform with no extra work.

A network layer consists of two operations:

  • send(msg: str) - write msg to storage. For an airmiles-based implementation, this writes the value of msg to the name field in the user’s airmiles account
  • recv() -> str - read the message from storage. For an airmiles-based implementation, this reads the value of the name field from the user’s airmiles account.

A network layer implementation must also define two properties:

  • sleep_for - the number of seconds that the transport layer should sleep for in between polling for new segments from a RECV pipe. sleep_for can be very low for test implementations like files, but it should be at least several seconds for an implementation like an airmiles account. This is in order to avoid hammering the remote server with too many requests.
  • segment_data_size - the number of characters that the transport layer should send in a single segment. Should be equal to the maximum size of the airmiles account field being used to transfer segments (often around 20 characters).

A network layer implementation can also optionally provide two more operations:

  • connect_send() - a hook called by the sender when a SEND pipe is initialised. In an airmiles-based implementation this allows the client to log in to the platform using a username and password. This gives the client a cookie that it can use to authenticate future send and recv calls.
  • connect_recv() - a hook called by the receiver when a RECV pipe is initialised
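To make the interface concrete, here is a minimal sketch of a file-backed network layer of the kind you might use for local testing. The class name, the constructor argument, and the decision to have one instance wrap a single pipe (so a client would hold one instance for SEND and another for RECV) are all assumptions for illustration, not the PySkyWiFi source.

```python
import os
import tempfile


class FileNetworkLayer:
    """Hypothetical test network layer that stores one segment in a local file."""

    sleep_for = 0.01         # files are cheap to poll, so sleep very briefly
    segment_data_size = 20   # pretend the file is a 20-character name field

    def __init__(self, path: str):
        self.path = path

    def connect_send(self) -> None:
        # Nothing to log in to; just make sure the file exists
        open(self.path, "a", encoding="utf-8").close()

    def connect_recv(self) -> None:
        # No setup needed to read a local file
        pass

    def send(self, msg: str) -> None:
        # Overwrite the previous value, like saving a new name in an account
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(msg)

    def recv(self) -> str:
        try:
            with open(self.path, encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return ""  # nothing has been written yet


pipe = FileNetworkLayer(os.path.join(tempfile.mkdtemp(), "pipe.txt"))
pipe.connect_send()
pipe.send("D000000GET / HTTP/1.1")
received = pipe.recv()
```

An airmiles-based layer would keep the same shape, but send and recv would make authenticated HTTP requests to the account settings page instead of touching a file.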

If you fill in all these methods, you’ll be able to use PySkyWiFi on a new airline. But again, don’t.

Tips and tricks

When writing a network layer that uses a new airmiles provider, there are a couple of tricks that can make your implementation faster and more reliable.

1. Encode messages to make sure the airmiles account accepts them

Airmiles HTML forms usually don’t let users include non-alphabetic characters in their name. Stephen will probably be allowed, but GET /data?id=5 will probably be rejected.

To work around this, the network layer should encode segments using base26 before writing them to an airmiles account. base26 is a way of representing a string using only the letters A to Z. In order to convert a byte string to base26, you convert the bytes to a single large number, then you represent that number using a counting system with base 26 (hence the name) where the digits are the letters A to Z.

def b26_encode(input_string: str) -> str:
    # Convert input string to a base-256 integer
    base256_int = 0
    for char in input_string:
        base256_int = base256_int * 256 + ord(char)
    
    # Convert base-256 integer to base26 string
    if base256_int == 0:
        return 'A'  # Special case for empty input or input that equals zero
    
    base26_str = ""
    while base256_int > 0:
        base26_str = chr(base256_int % 26 + 65) + base26_str
        base256_int //= 26
    
    return base26_str

b26_encode("Hello world")
# => 'CZEZINADXFFTZEIDPKM'

The transport layer never needs to know about this encoding. The network layer receives some bytes, encodes them using base26, and writes this encoded string of A to Z to the airmiles account. When the network layer reads the base26 value back out of the airmiles account, it decodes the encoded string back into a number and then back into bytes, and then returns those bytes to the transport layer.
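The decoding step described above can be sketched as the mirror image of the encoder. The name b26_decode is hypothetical, and the sketch assumes every character in the original string had a code point below 256. One caveat of this scheme: any leading NUL characters in the original string are lost, because they contribute nothing to the large number.

```python
def b26_decode(encoded: str) -> str:
    # Convert the letters A to Z back into a single large integer
    number = 0
    for letter in encoded:
        number = number * 26 + (ord(letter) - 65)

    # Peel base-256 digits off the integer, least significant first
    chars = []
    while number > 0:
        number, byte = divmod(number, 256)
        chars.append(chr(byte))

    return "".join(reversed(chars))


decoded = b26_decode("BBKZ")
# => 'Hi'
```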

Encoding a string using base26 makes it significantly longer, just like how it takes many more digits to represent a number using binary than decimal. This reduces the bandwidth of our protocol. We could increase our bandwidth by using base52 (using both upper- and lower-case letters) instead of base26, which would shorten the encoded strings somewhat. This is left as an enhancement for version 2.
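To put rough numbers on this, each input byte carries log(256) worth of information and each output letter carries log(26) or log(52). The sketch below uses hypothetical helper names and ignores the segment header, so treat the figures as back-of-the-envelope estimates.

```python
import math


def chars_per_byte(alphabet_size: int) -> float:
    """Average number of encoded characters needed per input byte."""
    return math.log(256) / math.log(alphabet_size)


def max_bytes_per_field(field_size: int, alphabet_size: int) -> int:
    """Largest number of bytes that always fits in a field of this size."""
    return math.floor(field_size / chars_per_byte(alphabet_size))


base26_cost = round(chars_per_byte(26), 2)   # about 1.7 letters per byte
base52_cost = round(chars_per_byte(52), 2)   # about 1.4 letters per byte
base26_capacity = max_bytes_per_field(20, 26)
base52_capacity = max_bytes_per_field(20, 52)
```

So a 20-character field holds 11 bytes with base26 and 14 bytes with base52, which is roughly a quarter more bandwidth.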

2. Increase bandwidth by using more account fields

Another way to increase our PSWF bandwidth is to increase the segment size that a network layer can handle. If we double the size of our segments, we double the bandwidth of our protocol.

Fields in airmiles accounts usually have length limits. For example, you might not be allowed to set a name longer than 20 characters. However, we can maximise our bandwidth by:

  1. Using the full length of the field
  2. Spreading out a segment across multiple fields

Suppose we have control over 5 fields that can each store 20 characters. Instead of using one field to transmit segments of 20 characters, we can split a 100 character segment into 5 chunks of 20 and update them all at once in a single request. The receiver can then read all 5 fields, again in a single request, and stitch them back together to reconstruct the full segment.
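A minimal sketch of the splitting and stitching, assuming hypothetical helper names and fields that are read back in a fixed order. Note that short segments leave trailing fields empty, and a real airmiles form might reject an empty field, so a real implementation would probably need some padding scheme.

```python
def split_across_fields(segment: str, field_size: int, num_fields: int) -> list[str]:
    """Spread one segment across several fixed-size fields, in order."""
    if len(segment) > field_size * num_fields:
        raise ValueError("segment is too long for the available fields")
    return [segment[i * field_size:(i + 1) * field_size]
            for i in range(num_fields)]


def join_fields(fields: list[str]) -> str:
    """Stitch the fields back together into the original segment."""
    return "".join(fields)


segment = "X" * 45
fields = split_across_fields(segment, field_size=20, num_fields=5)
field_lengths = [len(field) for field in fields]
# field_lengths is [20, 20, 5, 0, 0]
```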

Further enhancements

HTTP CONNECT

It would be better if PySkyWiFi used HTTP CONNECT requests to set up the tunnel from the sky proxy to the target site, instead of manually tossing around HTTP requests. CONNECT requests are how most HTTP proxies work, and using them would allow PySkyWiFi to act as the system-level proxy and so handle requests from a web browser. It would also mean that PySkyWiFi would negotiate TLS connections with the target website directly, so its traffic would be encrypted as it passed through the airmiles account.

On the other hand, using CONNECT would also be a lot more work and I’ve already taken this joke way too far.

In conclusion

When I was done with all of this I used PySkyWiFi to load the homepage of my blog using curl, tunneling the data via a GitHub Gist. Several minutes later I got a response back. I scrolled around the HTML and reflected that this had been both the most and least productive flight of my life.

(PySkyWiFi source code here)

I've written a book about being a dad; now I want to get it published

2024-03-25 08:00:00

For the last eighteen months I’ve been writing a book about being a dad. Two weeks ago I finished the first draft!

The book is inspired by my blog posts about parenting, but most of it is brand new and I think it might be very good. It’s about childbirth, covid, careers, old friends, new friends, kid friends, chess, pianos, screens, AI, marriage, and much more.

Now that I’ve finished a draft I’m looking for an agent and a publisher. I’ve never done this before so I’d appreciate help and advice! Please get in touch if:

  • You’ve published a book or know someone who has
  • You are a literary agent or you know someone who is
  • You’d be up for giving feedback on a first draft
  • You have some words of encouragement
  • You have any thoughts or ideas of any sort about anything

If you want to find out when the book is ready, subscribe to my newsletter. If you have friends who you think would enjoy the book, tell them about it and make them subscribe too.

Thanks!

Thousands of elderly twins assure me that my kids will be alright

2023-10-18 08:00:00

I know that time spent with my kids is supposed to be its own reward, and it is. But I also want to believe that what I do in this time matters, as much as possible. Elegantly handling a tantrum feels more worthwhile if I’m helping my son learn to express his feelings, not just making it through another day. I find more contentment at the end of a long afternoon if I think that I did a good job and that this good job will echo through the ages, or at least after bedtime.

I want my kids to be happy and fulfilled, skilled and accomplished, and I want to be able to help. Shouldn’t I try to pass my good habits onto them while shielding them from my dark thoughts? This shouldn’t be too hard; I’m their dad, they see me every day. My eldest child, Oscar, is only 4, but I already think he might be a little remarkable, and thick black lines seem to lead back to what Gaby (my wife) and I do with him. The spark and smarts are his, but I feel like I must surely be making a difference.

However, I recently read “Selfish Reasons To Have More Kids” by Bryan Caplan, an economist and blogger, and it’s turned me a little upside-down. Caplan observes that many parents wreck themselves trying to boost and polish their children. He argues that this isn’t just a bad tradeoff, but an almost total waste of time. He presents reams of remarkable research suggesting that, in Western middle-class families, parents’ choices have almost no influence on their children’s long-term health, intelligence, happiness, success, or character. Parents achieve nothing by sending their kids to extra maths lessons, hiding the TV remote, or even teaching them the value of hard work. Caplan shows that upbringing counts for almost nil (at least within the Western middle-class), and that genetics and randomness are everything. It appears that nothing within parental control matters.

Caplan presents his arguments as a gift, one that frees parents from eighteen years of guilt and wasted effort. In his telling there’s little that parents can do to influence their children in the long-run, so there’s no point and no duty for them to try. Kids have genes and free will; now let go and enjoy your time together.

Caplan knows that some parents will rebel against his arguments. I certainly did. I heard him telling me that I don’t matter, at least not in the ways that I’d hoped. I want parenting to be a deep, complex vocation, and I want to spend the coming decades playing a domestic game of skill and consequence. The idea of having children who I have no influence over is scary, like living with werewolves. Randomness and outside forces are everywhere and the kids are mutating while I sleep.

But even though I want to be relevant, I don’t want to waste my time. Begrudgingly, I kept reading.


What’s the evidence?

Caplan’s claim that parents have little long-term influence on their children seems absurd at first. Contra Caplan, I see my influence in my children every day. Oscar likes the same music as me. He used to be terrified of playgrounds but Gaby screwed a wooden ladder to his bedroom wall and now he’s mostly normal. I stubbed my toe and shouted “fuck!” and he whispered “fuck indeed daddy, you sound frustrated,” failing to calm me in the same way that I fail to calm him. This is surely common sense.

But common sense grows in unscientific environments. Nature and nurture are conflated, we don’t see the aggregates, and we don’t see the long-term. Caplan agrees that parents have huge influence over their children in the short-term, but he also argues that this influence fades, sometimes fast, sometimes slow, but it does fade, and it vanishes completely when they grow up and finish becoming whoever they are. Kids are resilient to setbacks, but they’re resilient to assistance too.

In order to rigorously test theories like this, researchers study large groups of children. However, most kids are useless to them. Suppose that two happy parents have and raise a child. The child grows up with their parents, and in time they become a happy adult too. It’s impossible to know whether the child’s happiness comes from happy genes that they inherited from their happy parents, or from the happy environment that their happy parents raised them in. Their parents’ genes and choices are irreversibly mixed together. Even with a huge database of children, parents, and measurements of happiness, causalities are impossible to itemise.

Fortunately, researchers can still extract good data from special children, like identical twins who were separated at birth. These kids give researchers two copies of the same genes, raised in different environments. Since separated identical twins share genes but not environments, any systematic differences between them must be due to their different upbringings. If identical twins raised separately bear no resemblance to each other but are similar to their adopted siblings, this would suggest that the twins were shaped by their divergent upbringings. If the twins remain similar, despite growing up entirely separately, this would suggest that they were made by their identical genes.

Researchers slice and measure these children, pulling apart the effects of nature and nurture. Twins separated at birth are the gold standard, but non-twin adoptees and non-adopted twins can work too. The researchers find or build databases of useful children (who may now be adults), and compare their grades (perhaps from school records), income (perhaps from tax records) or personalities (perhaps from administering personality tests directly). The evidence from this data is strong and consistent: a near-zero effect of upbringing on character, happiness, and almost everything else.

Should I pay attention to the evidence?

The studies are clever, but are they valid? They control naturally for almost everything, but they still aren’t perfect. For example, maybe parents who choose to adopt are meaningfully different to the average parent, meaning that conclusions based solely on them don’t generalise to the rest of the population. Maybe parents who choose to adopt and then also agree to be part of a long-term study are even more different. Maybe women who have twins are different. Maybe twins themselves are different too.

But even if these sampling biases are material, I doubt that they’re large enough to tear down the studies’ broad conclusions. I’d guess that adoptees and twins separated at birth are a good enough sample to represent humanity, and that even if they aren’t fully representative, they probably aren’t masking a giant effect that skips twins and applies only to the rest of us. If researchers were able to fully control for sampling biases then this might shift their estimate of the effect of parental influence from “incredibly low” to merely “very, very low”.

Caplan admits that the studies are primarily focussed on the Western middle class, because that’s where the data is. This hurts the studies’ generalisability but binds me - an orthodox member of their class - even tighter. All said, I think I have to assume that the studies pointing towards the primacy of genes are valid for people like me.

But do the studies definitely apply to me, or you, specifically? They find no effect of parenting style on children’s adult outcomes, within the range of normal parenting styles in middle-class Western families. That word “normal” might provide an opening for a determined parent to squeeze through in order to regain their lost gravity. The studies suggest that there’s no difference between the free-range and regimented ends of the normal spectrum, but they can’t say anything definite about what happens beyond the edges of normality.

Caplan recommends that parents dissolve their fears and ambitions in the acid balm of the evidence. But there is another response that’s consistent with the data, although it might not necessarily be a good idea: redouble your efforts and head for the ambiguity beyond the well-studied centre, where the evidence might not stretch. More enrichment, more practice, more effort, fewer half-measures.

This makes some common sense; think about outlandish famous families. The Williams sisters must be naturally gifted tennis players, but they surely wouldn’t be the same dominant champions without their obsessive dad. The Polgar sisters would have been unremarkable chess players without theirs. These types of childhoods are so rare that they can’t possibly be adequately represented in any of the twin datasets, so the research doesn’t have anything direct to say about them. Twin studies don’t disprove the Jackson Five.

Caplan claims that kids are elastic, and that whether helped or harmed they tend to snap back to their natural state. However, I learned in physics classes that not even elastic is perfectly elastic. It pings back to its original shape after mild deformation, but it can still be altered permanently if stretched beyond a point called its elastic limit. Whilst the sum of small interventions on a child might be zero, it might still be possible to permanently deform them (in a good way) through the application of massive force. This metaphor is so perfect that it must surely be true.

In fact, even Caplan is stretching his own kids like this. In a 2015 blog post (the book was written in 2011), he describes the rigorous homeschool that he runs for his two eldest (twins, coincidentally). The main reason he homeschools them, he says, is because they are particularly academic kids, and they all think that they will enjoy an uncompromising homeschool more than a conventional one. However, he also suspects that his homeschool might be so off-the-scale remarkable that it vaults over the evidence and produces better adult outcomes, despite his claims that this is usually impossible. He writes:

I suspect – though I’m far from sure – that the Caplan Family School is such an exceptional experience that ordinary twin and adoption evidence isn’t relevant.  For example, my sons are plausibly the only 12-year-olds in the nation taking a college class in labor economics.

Should you or I try to do this too? It’s almost always delusional to put yourself and your children in a category called “exceptional”, and this might not even be a category that you want to be in. I do wonder, though, where does “normal” end and “exceptional” begin? Where’s the elastic limit, and how weird is it really? Is anything less than the Williams sisters a waste of time? Or does the curve bend much sooner than that? Even if you don’t want to do anything too odd by modern standards, a lot of the data in these studies comes from dead twins brought up decades ago. Today’s parenting zeitgeist might not necessarily be better than the old days, but it’s certainly different. How well does data from a different era in parenting generalise to today? Is it possible that even normal parenting today is different enough from several decades ago to have a material impact?

Is reading a respected parenting manual and teaching your toddler to add and multiply too normal and futile, or just crazy enough that it might work? I don’t want to be Richard Williams and I couldn’t even be Bryan Caplan, but I could be a bit weirder than average if that was worthwhile and harmless. I might be inventing straws to clutch at, but as far as I know there’s no cast iron science out here so we’re allowed to make things up again and I can assert a world in which I have agency.


What should I do now?

I’ve drilled a tenuous airhole in Caplan’s claims, but his evidence is still strong, spiky, and hard to digest without a rupture in my plans. Normally when confronted with new evidence you can wisely say “it’s probably a combination of everything” and then maybe do a bit more or less of something, or not. However, Caplan argues specifically that parenting is not a combination of everything. Everything is nature, at least in the long-term. His arguments are backed by simple and compelling studies that are hard to wishy-wash away and that block the easy path back to the status quo.

But it’s drastic to change how you raise your kids based on a short book and some studies that you aren’t going to read. The book’s claims are extreme, at least compared to what I used to think, and it’s hard to build enough confidence to change your mind about things that matter to you. I rarely need to develop solid beliefs about messy, unsettled topics that I’m not an expert in. I’ve skimmed a few paper abstracts and some reviews of the book, but that doesn’t feel like enough. Caplan seems smart and honest but this isn’t settled science and how do I know he’s not missing or ignoring grave methodological gaffes?

I can’t unread the book, and as someone who likes to consider themselves a somewhat scientific, data-driven parent, I can’t ignore it. So what should I do now?

I think I value my children for who they are already, but it’s good to be reminded to start there. I don’t care whether I have any long-term influence on my friends, I just like spending time with them and being there if they need me. So why do I care about being able to shape my kids? The desire to help your children is surely natural and normal, to a degree, but that doesn’t mean it’s always helpful.

Caplan says that I can stop worrying about whether I’m wrecking Oscar’s future habits and character. I try not to fret like this, but often it’s unavoidable. Does he play by himself enough? Does he watch too much TV? Are we letting him be too picky with his food? Should we use more discipline when he won’t share? Less discipline? I extrapolate today’s small behavioural decisions ten years forward into a bleak future. I fear that parenting is a system of positive feedback loops, where deviations become liberties that congeal into nightmares. But Caplan says that everything is fluid and reverts to the mean and I shouldn’t sweat the deviations. Bribe kids to behave, give them unlimited social media time, none of it matters, they’re much less of a blank slate than you think. Nothing will come back to bite you, and if you do get bitten then there was nothing you could have done to stop it.

Still, I’m not ready to stop trying to help my kids flourish. I’m not confident enough that Caplan’s evidence applies to my family and my era, and in any case at the moment I don’t have to make any tradeoffs. Oscar and I do a lot that I’d previously assumed would benefit him in the long-term: maths, reading, piano. For now he enjoys nearly all of it and so do I, so nothing is being sacrificed. I’m sure that this will change as he gets older, but at the moment it’s more fun for me to talk to Oscar about multiplication and prime numbers than pretend to order another pasta with cheese from his play-dough restaurant. On some days he doesn’t want to do any sums and tells me to get lost. But even if I was certain that the long-term impact on his future earnings would be zero, I’d still take him to the science museum and try to remember how aeroplanes work.

This sounds relaxed and balanced, but it’s easy to be sanguine when there aren’t any dilemmas. If he stops being interested in things that I think are valuable then I’m sure I’ll feel anxious, and I’ll struggle when he starts making decisions that I think are mistakes. I’ll reevaluate when I’m forced to, but for now I hope that it’s possible to both try to help your kids excel and to live with them in the moment.

I wonder if these studies should change how I see the rest of the world too. I’m friends with my old physics teacher from high school. I went to his house for lunch and told him about Caplan’s book. He was horrified. “If that’s true, is there any point in me trying to be a good teacher?” he said. This had occurred to me too. If parents truly have no lasting influence on their children, how can schools, or local theatres, or any kind of small public policy intervention hope to have any? Maybe it’s even harder than I thought to make any long-term difference to anything.

And how should I think about traits that have value but don’t show up in survey data? For example, I can take Oscar to piano lessons and encourage him to practice. Most adults who know how to play the piano probably had lessons when they were younger, and their parents probably pushed them at least a little. Does being able to play the piano matter, morally and cosmically, even if it has no impact on income, happiness, or anything else that can easily be measured? The harder you think and the more precise the questions, the more you need a detailed moral philosophy.


It’s helpful to have thousands of elderly twins reminding me that my kids will probably be fine, whatever I do. Everything reverts to the mean, the twins murmur kindly. Don’t be too smug when things are going the way you hoped, and don’t despair when they aren’t.

I’m not ready to fully accept my obsolescence yet. We’ll watch more TV but we’ll keep doing maths together. One day we’ll start to disagree, and then we’ll reassess. Caplan does throw me one bone: “parents [have] moderate influence over how much their children like them.” Even if nothing I do adds up to anything, the days will hopefully make a happy childhood.

Read more of my essays about parenthood here. Plus, I’m writing a book about having kids! Subscribe to my newsletter for updates.

Hello Deep Learning

2023-10-13 08:00:00

Introductions to deep learning are too complicated and spend too much time trying to thrill you with details and real-world applications.

This makes them a frustrating place to start. You already know that deep learning is amazing and that it actually works on real problems. You know that most of the hard work in industry is in the data cleaning. You don’t want to set up a new environment, or play with parameters, or get dirty in the data.

The actual first thing you want to do is to train a model, as soon as possible, and it doesn’t matter how simple it is. Once you’ve trained your own model you’d be more than happy to learn about overfitting, data cleaning, and splitting strategies as well. But first you just want to create something yourself and see it work.

Hello Deep Learning is the missing introduction to deep learning. It’s a series of challenges, each of which gives you a task and a perfect, synthetic dataset and asks you to train and play with a trivial model. The challenges cover image generation, text classification, and tabular data, and each one:

  • Runs on your laptop
  • Trains in a few seconds
  • Uses perfect, noiseless, synthetic data that takes seconds to generate
  • Has absolutely no sidebars

Hello Deep Learning allows you to rapidly experiment with simple models and take your first steps in a calm, kindhearted environment. It gets you ready to leap into the detail and chaos of the real world.

You can get the challenges, data generation scripts, and setup instructions on GitHub. Let me know how you get on; if they’re useful then I’ll make more!

The challenges

1. Image classification

Challenge

Train a classifier that distinguishes between red squares and yellow circles. Your program should be able to:

  1. Train a model that can distinguish between squares and circles
  2. Use it to run a few individual predictions on specific images
  3. Display the confusion matrix

Data generation

The repo includes a script that generates 200 images of circles and 200 of squares and saves them in the data/shapes/ directory.

Tips

  • I did this using a fastai vision_learner based on the resnet18 pretrained model.
  • I mostly copied and stitched together code snippets from chapter 2 of the fastai book

2. Text classification

Challenge

Train a classifier that distinguishes between text inputs of positive and negative words, for example "happy chirpy awesome" and "awful terrible heinous". Your program should be able to:

  1. Train a model that can distinguish between this type of positive and negative input
  2. Use it to run a few individual predictions on specific inputs

Data generation

The repo includes a script that generates 1000 text files containing positive words and 1000 containing negative words and saves them in the data/sentiment_text/ directory.

Tips

  • I did this using a fastai language_model_learner based on the AWD_LSTM pretrained model, and a fastai text_classifier_learner.
  • I copied and stitched together code snippets from chapter 10 of the fastai book

3. Decision trees

Challenge

Train decision trees that reverse-engineer the rules from src/generators/random_tabular.py that were used to randomly generate a tabular dataset. Your program should be able to:

  1. Train a decision tree that reverse-engineers the rules
  2. Train a random forest that reverse-engineers the rules
  3. Use these models to run a few individual predictions on specific inputs
  4. Calculate the RMS error on a validation set
  5. Visualise the decision tree, using (for example) the dtreeviz library

Data generation

The repo contains a script that generates 1 JSON file containing 10,000 data points and saves it as data/random_tabular/data.json. Each data point contains:

  • 6 features: a, b, c, d, e, and f. Each of these is a random integer between 0 and 100.
  • 1 label: y. This label is derived deterministically from the features using simple rules contained in src/generators/random_tabular.py.
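To give a feel for the shape of this data, here is a standalone sketch of a generator. The rule used to compute y below is invented purely for illustration and is NOT the rule in src/generators/random_tabular.py. The function names and the choice to include both 0 and 100 in the range are also assumptions.

```python
import random


def make_point(rng: random.Random) -> dict:
    # 6 features, each a random integer from 0 to 100 (inclusive here)
    point = {name: rng.randint(0, 100) for name in "abcdef"}
    # Hypothetical deterministic rule, made up for this example
    if point["a"] > 50:
        point["y"] = point["b"] * 2
    else:
        point["y"] = point["c"] + point["d"]
    return point


def generate(n: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [make_point(rng) for _ in range(n)]


data = generate(1000)
```

Because the label is a noiseless function of the features, a decision tree has a real chance of recovering rules like these exactly, which is what makes the challenge satisfying.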

Tips

  • I did this using an sklearn DecisionTreeRegressor and RandomForestRegressor.
  • I copied and stitched together code snippets from chapter 9 of the fastai book

My solutions

My solutions are in src/examples/ in the repo, although they’re not the only way to solve the challenges, and they’re almost certainly not the best way to solve them either.

Get the challenges, data generation scripts, and setup instructions on GitHub. Let me know how you get on; if they’re useful then I’ll make more!