MoreRSS

site iconRobert HeatonModify

Frontier red team @AnthropicAI . Writing a book about being a dad. Now I live in London.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Robert Heaton

It's not cheating if you write the video game solver yourself

2025-03-11 08:00:00

My wife and two little boys sometimes go on trips to see friends or family for whom my presence isn’t strictly required. While my wife books the flights I make a show of weighing up my options and asking if it’s really OK if I don’t come. Eventually I’m persuaded that it truly would be the best thing for all of us for me to have five to seven days to myself with no nappies and all Nintendo.

My wife hits the “Pay Now” button and when the confirmation email comes through I call my secretary and tell him to clear my calendar. When no one answers I remember that I have neither a secretary nor all that much going on, so instead I find a prestige TV series, a selection of local takeaway menus, and a short but immersive video game. This time I messaged my buddies and asked them what I should play. Steve gushed about a game called Cocoon; Morris said that he’d played it a year ago and it was “alright.” Sold.

One month later there were hugs, kisses, wave goodbye to taxi, shut the door, where’s the HDMI cable, how did it get under the sink, do I have a Nintendo Account, what’s my password, whatever I’ll make a new one, right let’s do this, estimated download time 45 minutes, start on tax returns, am I relaxed yet?

Eventually I was able to boot up Cocoon. I learned that I was going to play as a little bee guy who has to solve artsy puzzles in a lonely, abstract world for no adequately explained reason. Bee guy makes his way through the world using four glowing orbs. He can pick orbs up, carry them around, and put them back down on switches in order to open doors and unfold bridges - standard orb stuff. However, bee guy soon learns that the orbs also contain other worlds, which he can jump in and out of to help with his puzzles. If he jumps inside one orb-world whilst carrying another one then he can put orb-worlds inside each other. He can even - towards the end - put a world inside itself. Conundrums ensue.

image

By the end of Day 1 of my vacation I was having about three-quarters as much fun as I’d hoped. Cocoon’s atmosphere was absorbing and its puzzles made me smile; I only wish there had been a little bit of a story and not just a fuzzy metaphor for entropy and decay. Still, I kept going, trekking through crumbling ruins and overbearing symbolism. By the end of Day 2 I was 51% through the game. I started the next puzzle and got completely stuck. I was surely just tired, I thought, so I watched an episode of The Sopranos and went to bed. The next morning I got up at 6am, made some coffee, and went into my office to crack on. But I was still stuck. I spent an hour filling a bin bag with toys that I didn’t like, since no one was around to stop me. I tried again. Still stuck.

Then I realised. I didn’t know how to solve the puzzle, but I did know how to write a computer program to solve it for me. That would probably be even more fun, and I could argue that it didn’t actually count as cheating. I didn’t want the solution to reveal itself to me before I’d had a chance to systematically hunt it down, so I dived across the room to turn off the console.

I wanted to have a shower but I was worried that if I did then inspiration might strike and I might figure out the answer myself. So I ran upstairs to my office, hit my Pomodoro timer, scrolled Twitter to warm up my brain, took a break, made a JIRA board, Slacked my wife a status update, no reply, she must be out of signal. Finally I fired up my preferred assistive professional tool. Time to have a real vacation.

image


How I wrote a Cocoon solver

In order to write a Cocoon solver, I needed to:

  1. Model the game’s logic using a Finite State Machine
  2. Use this model to work out the sequence of actions that would get me from a puzzle’s start to its end

1. Model the game’s logic

Cocoon’s mechanics are simple but elegant. In the first level you find an orange orb sitting on a stand. You pick it up, obviously, and start to carry it around. You come to a gate with an orb-stand next to it. You put the orb down on the stand and the gate opens. You pick it up again and continue through the gate.

Eventually you come to a small reflecting pool with a special-looking orb-stand in the middle. There’s no obvious path forward, so you put your orange orb down on the stand. You stand next to it and press A, because that’s the only button in the game that does anything. You watch yourself dive into the orb, and land in another world. You’re now inside the orange orb. You start to walk, You find a new, green orb, which you use to keep solving puzzles and opening doors and moving forwards through the orange orb-world. Throughout the course of the game you find a total of four orbs, and the final puzzles require you to lug them all around and dive in and out of them in just the right order to get across the next bridge.

An example: in one puzzle you need to bring your red orb to the top of a vine so that you can use it to reveal a magic walkway to get to the next area. However, you can only climb up vines while holding your green orb, and you can only hold one orb at a time. This means that you can’t carry the red orb up the vine. The solution is therefore to pick up the red orb, dive inside the green orb, drop the red orb, jump out, use the green orb to climb the vine, dive back inside the green orb, retrieve the red orb, and jump back out.

In order for my solver to analyse Cocoon, it would need to be able to programatically manipulate a copy of it. My solver would need to know how the world was laid out, what actions were allowed, and whether it had finished solving its puzzle yet. The easiest way to do this wasn’t to have the solver interact with the real game, but to instead write a new, stripped-back copy of it that contained all its logic but none of the graphics.

This was made easier by the fact that each of the puzzles that I got stuck on could be represented as a finite state machine. A finite state machine is a system that can be in exactly one of a finite number of states at any given time. The system is able to transition between some pairs of states using a set of known rules.

Most games with a large 3-D world can’t be modelled as a FSM because they have too many possible states and too many possible transitions. The player can be in a near-infinite number of slightly different locations; so can their enemies. The player might have a huge number of items they could be carrying and past actions they could have taken, each of which could affect the world in some important way. Some parts of the game may depend on timing and agility, which are hard to represent in a FSM. In most game levels there’s too much going on for a simple model like an FSM.

However, Cocoon’s is a simple world. There are no enemies. The only items you can carry are 4 orbs. And whilst its 3-D landscape is lovely to look at, it masks a simple topology that can be modelled as a small graph - a collection of nodes (important locations like orbs and switches) and edges (pathways that connect nodes that are immediately accessible from each other). Beyond these characteristics, the exact layout of the world rarely matters.

This means that a “state” in Cocoon is easy to define and manage. A state is defined by the combination of:

  • The node you’re nearest
  • Which orb you’re holding
  • Where the other orbs are
  • (possibly a few other things, depending on the exact puzzle)

Transitions between states are similarly constrained. Players can transition between game states via only a small set of actions, such as:

  • Walking to an adjacent node
  • Picking up or putting down an orb
  • Jumping in or out of an orb

image

This means that it’s relatively simple to write a program that:

  1. Defines the layout of a puzzle world
  2. Defines the state that the game is currently in
  3. Is able to transition between states
  4. Knows how to identify a goal state

Once I’d written this program, I had a representation of the game that I could programmatically manipulate. I could tell my game to do things like “take such-and-such an action” or “tell me all the possible next actions that can be taken from the current state.” This meant that I could use my simplified game to analyse and solve the real one.

2. Work out how to get from the start to the goal state

To solve a puzzle I needed to find a sequence of actions that would take me from the puzzle’s start state to its goal state (for example, opening a door). This was a job for an algorithm called Breadth-First Search (BFS).

BFS starts at an initial state in an FSM and simultaneously takes a step to every new state that can be reached from it. For example, suppose that a puzzle starts with bee guy holding the green orb, next to a door and to a corridor to another section.

image

My solver takes steps both through the door and down the corridor, and begins keeping track of each path. From each of these new states, it takes another step to each further state that can be reached from them.

image

It keeps stepping and tracking the expanding number of paths that it’s exploring, until one of its paths reaches a goal state (for example, a state in which a particular orb is on a particular stand, which opens a bridge to the next area). At this point the solver stops, and returns the path that led to the goal state. Because the solver takes a single step down every possible path at once, the first path to the goal state that it finds is guaranteed to be the shortest one possible.

I can then input the sequence of actions that my solver returned into the real game (for example: pick up red orb, walk to orb holder, put down red orb, etc), and move on to the next level, without having to do any thinking whatsoever. Here’s my code.

image

How hard are these puzzles really?

My solver stops as soon as it has reached the goal state. However, I was also interested in fully mapping out every possible sequence of actions one could take inside a puzzle. I wanted to see how big and hard the puzzles actually are.

To do this I wrote a script that explores every possible state and transition. It keeps stepping through states transitions, even after one of its paths has reached a goal state. It only stops when none of the paths it’s exploring have any available actions that lead to a new state that it hasn’t seen before.

This script generates a new graph, in which each node is a state and each edge is a transition. The graph shows every single sequence of actions that you can take inside the puzzle. For example, here’s the puzzle that I found hardest, represented as a graph of states and transitions:

image

Drawing a fully-expanded FSM graph like this shows how small and simple Cocoon’s puzzles really are once you write them down. That path on the right, between the green and red nodes, requires you to put one orb inside another orb, then inside another orb, and then shuffle them around just-so. This is a counter-intuitive strategy and hard to find if you’re playing the game like a normal person.

However, writing everything down makes it obvious. Writing everything down makes all actions visible and strips away all of the misleading heuristics that make humans focus on some strategies and completely miss others. It reduces a puzzle to “find a path from the green dot to one of the red dots,” as you saw above, which can be solved by a moderately gifted two year-old. To be very facetious, Cocoon is a fluffy layer of graphics and game mechanics on top of an unbelievably simple maze.

This isn’t meant as a knock on the game. Cocoon is a tidy mind-bender that wasn’t designed to withstand exhaustive search. If you write a program to solve your Cocoon puzzles for you then you’re only cheating yourself. Or at least, you would be if writing such a program wasn’t so much fun.


Life lessons

Real life isn’t a finite state machine though. All my solver really proves is that small, low-dimensional problems can easily be solved using exhaustive search. By contrast, the world has an infinite number of states. It’s hard to be sure which properties are important, and the allowed transitions between states are terribly defined. No problem of any importance can be reduced to a tiny graph, which means that even the most determined two year-old won’t be able to help with any of the big challenges in your life.

However, you don’t need to fully explore all possible solutions in order to benefit from systematic search. As I’d feared, the process of writing my solver forced me to consider the game with so much rigour that I’d solved all 3 challenges that I used it for before I’d finished programming them in. When I came to encode every location and transition on the map in my program I was forced to pay attention to each of them individually for at least a few seconds, even the ones that the game had tricked me into ignoring. This helped me see that some of the nodes and actions that I’d thought were irrelevant were actually the key to the whole thing.


That evening I talked to my family on the phone. I showed them all the diagrams I’d created and all the fun I’d had. They showed me pictures of them inside St. Peter’s Basilica. My son asked if I’d beaten my game. I explained that I’d transcended it by creating a mathematical representation of its entire possibility space. He asked if that meant “no.”


Here’s the code to my solver, but you should really just play the game properly.

Come and work with me on Anthropic's Frontier Red Team

2025-02-03 08:00:00

I work at Anthropic on the Frontier Red Team. Our mission is to find out whether AI models possess critical, advanced capabilities, and to help the world to prepare. We’re hiring AI researchers and engineers in the US and UK, and if that describes you then we’d love to talk.

We explore questions like: can models design bioweapons, or accelerate vaccine research? Can they orchestrate massive cyberattacks, or defend our critical infrastructure? Can they self-improve, or build a business, or fly drones? Even if they can’t do these things right now, when might they be able to?

We’re a technical research team embedded inside both Anthropic’s policy and research organizations. We work with frontier models; top experts in every domain that we cover; and key national security and policy actors. This Wall Street Journal article explains a bit more.

Below are the people we need to help us go faster. If any of them sound like you then please get in touch:

People to build and run evaluations

We need people to design and run the evaluations for Anthropic’s Responsible Scaling Policy (RSP). These people:

  • Work with internal domain leads and external domain experts to design and build cutting edge evaluations across cybersecurity, biosecurity, AI R and D, and more
  • Build sandbox environments to safely run these evaluations
  • Build automated analysis frameworks to make sense of all the results
  • Build agentic frameworks to squeeze the best possible performance from models

I think this is a great job for anyone, and the best job in the world for generalist engineers who are also interested in AI. You get to spend half your time building bulletproof systems for novel technical problems, and the other half thinking earnestly about how many tokens it might take for an AI model to build a GPU cluster to copy itself onto. If you work in computer infrastructure anywhere else then I’m convinced that you would have more fun working here instead.

Read more and apply:

People to conduct deep research into critical domains

We believe that the most important capabilities that models are likely to develop are in autonomy, biosecurity, and cybersecurity. We have small sub-teams that lead research into each of these domains. People on these teams:

  • Carry out threat modeling. What would it even mean for a model to be dangerous in this domain? How could it cause catastrophic harm? What capabilities would it need?
  • Design evaluations that test for these capabilities
  • Run large-scale experiments on these evaluations; understand what capabilities current models have and don’t have; predict when these capabilities might emerge
  • Build safe demonstrations of dangerous capabilities to show to experts and stakeholders

The people on these teams get to work on projects from reinforcement learning to understanding the biggest threats and opportunities facing society. Never a dull moment.

Read more and apply:

Logistics

This is the best and weirdest job I’ve ever had. The team is full of brilliant people who care a lot about their jobs and each other. Most of the team is on the West Coast of the US, with some on the East Coast and a couple in London. I work from London and my schedule is great, even with two kids. I go into the London office once or twice a week and to SF a couple of times a year.

Apply now

If you’d like to work with us then apply using the links above and below. There’s no specific deadline and we’ll keep looking until we find the right people, but we have a lot of work to do and need people to help us do it as soon as possible.

Those links again:

PyMyFlySpy: track your flight using its headrest data

2024-12-04 08:00:00

“Where are we daddy?” asked my five-year-old.

“We’ll land in about an hour,” I said.

“No I mean where are we? Are we flying over Italy yet?”

I wasn’t sure. Our flight was short and cheap and the seats didn’t have TV screens in the headrests. I looked around. I noticed a sticker encouraging me to connect to the in-flight wi-fi. That would do it, I thought. A site like FlightRadar would answer my little man’s question, down to the nearest few meters.

But unfortunately for him I’m the creator of PySkyWiFi (“completely free, unbelievably stupid wi-fi on long-haul flights”). Not paying for airplane internet is kind of my signature move. We’d need a different, offline strategy.

image

I had a think. When you connect to an airplane wi-fi network, you’re usually met with a payment page where you can purchase access to the internet. The page also usually gives you the same flight information that you’d find in the back of your headrest, like speed, direction, and estimated flight length. Perhaps it would have a map as well, I thought.

I pulled out my laptop, connected to the network, and loaded up the payment page. It did indeed show our wind speed, direction, and estimated time of arrival. But no map.

(It didn’t occur to me to screenshot the page so here’s an artist’s impression)

image

“Maybe the server that’s sending us this data is actually also sending us our location, but the web page isn’t displaying it,” I thought. I opened up the Chrome developer tools. I saw that my browser was making regular requests to a /info endpoint.

image

I clicked on one of the requests. This /info endpoint was indeed sending us a huge pile of data, including fields for ground_speed, wind_speed, and estimated_arrival_time. At the bottom of the response I noticed fields for latitude and longitude. My heart leapt. But then I looked closer. They were both null. Aerofoiled.

This looked like the end of the line. I was about to give up and tell my son that we were somewhere just north of Italy, probably…Europe somewhere. But then I was hit by two fantastic ideas.

Fantastic idea number 1: the /info endpoint didn’t tell us our location, but it did tell us our precise, regularly-updated speed and direction. On our flight home I could track and save our speed and direction every second or so for the whole flight. I could use this information to estimate how far we had traveled in each second, and in which direction. I could dynamically calculate our position by starting at our airport’s co-ordinates, then adding on each second’s step.

image

Fantastic idea number 2: even if I had been able to find our latitude and longitude in the /info response, it wouldn’t have meant much to either me or my son. However, I could build a web app that ran on my laptop and showed us our dynamically calculated position on a map, in real time. The app could have automatically updating graphs of our ETA, wind direction, speed, altitude, and so on. Ooh and an interface for running arbitrary queries against the data. And event callbacks to allow me to programmatically trigger code based on flight info (“when our ETA is 2 hours exactly, block my access to netflix.com and open the latest draft of my unfinished novel”). My son would know where he was. I’d be a Good Dad.

I decided to call the app PyMyFlySpy in order to give it some brand association with PySkyWiFi, my airplane-related project. I couldn’t wait to get started. Unfortunately right now I was wedged in between a five-year-old and a two-year-old and we were all terrible at JavaScript. I waited, impatiently.

PyMyFlySpy

Eventually we landed. I built PyMyFlySpy during our holiday, over late evenings and one or two derelict afternoons while the rest of my family did normal-person fun things. I couldn’t figure out whether it was bad manners to use your laptop in artisanal Italian coffee shops, or which of them had wi-fi, so to my eternal shame I googled “starbucks near me” and planted myself in a corner with a skinny mochachino and typed away.

I finished PyMyFlySpy the day before we left. The code is available on GitHub and it’s easy to setup and run. It even has a “dummy” mode that allows you to demo it without being inside a plane, using a made-up flight.

Here’s what PyMyFlySpy can do:

Maps and graphs

PyMySkySpy shows a map of your flightpath so far. It also shows your current flight metrics and how these metrics have changed over the course of your flight. It does this for all data available from the in-flight wi-fi, even data that isn’t usually displayed on the website or headrest screen. You can see exactly where you are and feel a bit like a pilot.

image

image

Query interface

PyMySkySpy saves all the data that it records to a database. Its UI has a page that allows you to write queries against the data to answer questions like “what’s our maximum speed so far, and when did we hit it?” or “how fast was the wind during that turbulence we just went through?”

image

I’m not claiming that this is hugely useful, but I do think it’s cool.

Multi-airline support

Different airlines have different wi-fi systems. A recorder for a JetBlue flight won’t work on AirFrance. Fortunately, PyMySkySpy allows you to easily add and use recorders for different airlines. You just have to load up their wi-fi landing page, open your browser’s developer tools, and figure out how to parse their page’s data like I did above. Then you add your new code to the PyMySkySpy config, and tell the recorder to use it. Everything else continues to work just the same.

System design

The system is very simple. It has 4 parts:

  1. Firefox Extension - reads flight info from the airline’s website and sends it to the PyMySkySpy web server
  2. Local web server - saves data that the extension sends to it, and makes it available to the frontend
  3. Sqlite Database - stores data
  4. Web frontend - displays data using maps and graphs

image

The one strange design choice I made was to use a Firefox Extension to read the flight data, instead of writing a scraper that makes its own data requests directly. Scraping the information like this would have been easier and more flexible, as well as completely harmless. Hundreds of people were already connected to the wifi, and the airline’s own landing page hits the /info endpoint once every couple of seconds. Adding one more request from a scraper would have been entirely safe.

image

However, I’m sure that airlines would rather people didn’t poke around at their onboard servers, even if those people are very careful and well-intentioned and handsome. To make sure I didn’t irritate them, I came up with an even more judicious approach.

Instead of scraping the data endpoint, I wrote a Firefox Extension. The extension sits there while the airline’s wi-fi landing page requests the latest data from the /info endpoint, just like normal, every few seconds. The extension peeks at the data that’s returned; sends the data to the PyMyFlySpy web server; and finally the web server writes it to the PyMyFlySpy database, to serve to the web frontend. Using a Firefox Extension like this means that PyMyFlySpy never interacts with the plane’s info server directly. This means that PyMyFlySpy can provably never harm the server.

I had to write the extension for Firefox instead of Chrome, because Chrome is in the process of reducing extensions’ ability to interact with requests made by a website (like requests made to the /info endpoint). In particular, Chrome is going to prevent extensions from easily reading the responses to HTTP requests made by a website, which would prevent the PyMyFlySpy from reading the data returned by the /info endpoint. As far as I can tell these restrictions are half for security reasons, and half to make it harder to develop adblockers. Either way, PyMyFlySpy requires Firefox.

Future work - event subscriptions

PyMySkySpy gives us programmatic access to data about our flight. It would be fun to use this to trigger events, like:

  • “For the first half of the flight, only let me open the big report that I need to finish by 5pm today.”
  • “When our location is within 10 miles of the Grand Canyon, send the kids a Slack message to look out the window. Also send me a Slack message to bug them to look out the window.”
  • “If our altitude drops by more than 300ft in 1 second then play a reassuring but really quite urgent sound on all of my devices.”

Next holiday, perhaps.

The flight home

Our flight home was in the late afternoon. We shuffled on board and took off. I pulled out my laptop, connected to the wi-fi, and booted up PyMySkySpy. I turned to my son to show him where we were. I’d shown him the prototype every day for the last week and I though he seemed to be somewhere between “tolerant” and “mildly interested.” But he’d already fallen asleep. I took some screenshots to show him later.

I spent the next few hours monitoring and debugging the recorder to make sure that it stayed up. My two-year-old screamed the whole flight and kept trying to throw himself on the floor. I made supportive faces at my wife across the aisle and pretended to offer to take him, but she shook her head. She knew that this was important.

I watched the graphs. Temperature within normal range. Wind speed stable. Suddenly our altitude dropped by a fifty feet. I wondered if I should tell the pilots. I decided that they probably had it under control. I kept watching, just in case.

PyMyFlySpy is on GitHub.

Generating infinite, age-appropriate Cat Crimes puzzles

2024-09-02 08:00:00

A few weeks ago my 5-year-old and I tried playing Cat Crimes, a puzzle game in which you work out which of your cats ate your shoes. We had a wonderful time - for about 20 minutes.

In each round of Cat Crimes you get a puzzle card with a list of clues on it. You have to use the clues to figure out where in your front room each of your 6 cats were sitting. This tells you which one of them was responsible for your ruined stilettos. The game comes with 40 puzzle cards, ranging from the very easy to the mind-crushingly difficult.

However, the problem is that “very easy” to “mind-crushingly difficult” is a lot of ground to cover in 40 puzzles, and by the fifth puzzle the clues had become too abstract and difficult for my little man. In the first few puzzles each new clue allowed us to immediately place a new cat and then forget about it. For example, a clue might have told us that Mr. Mittens was sitting opposite Pip Squeak. We already knew where Pip Squeak was sitting, so we could work out exactly where Mr. Mittens was sitting too. This is the perfect level of complexity for a small child and his aging father at 6am.

However, as the puzzles get harder the clues stopped neatly resolving like this. They still narrowed down the possible pussy permutations, but they didn’t necessarily allow us to definitively place a new cat straightway. For example, a clue might have told us that Mr Mittens was sitting next to Pip Squeak. We know that Mr Mittens must have been on either Pip Squeak’s left or right, but we couldn’t say for sure which until we’d processed more clues. We might later learn that Duchess was sitting to Pip Squeak’s left, which in turn would tell us that Mr. Mittens must be sitting to her right.

To follow this extended chain of logic you need to hold multiple simultaneous superpositions in your head. This is fun and challenging and good puzzle design, but my kid hasn’t done superpositions at school yet so he didn’t get it. I tried drawing some pictures for him, but they made no sense even to me. We got angry with each other and eventually gave up on the game altogether.

But we’d really had a great time with those first few puzzles, so that evening I wrote us a computer program that generated an infinite number of new beginner level Cat Crimes challenges. I ran it 20 times and printed out the challenges and their solutions. The next day we continued happily solving age-appropriate cat mysteries together.

Downloads

You can download:

The program works by generating random challenges until it finds one that has a single unique solution and meets certain constraints. The constraints ensure that the challenges are easy but not too easy. For example, a maximum of 3 cats can be asleep (meaning that they are out of the round), and a maximum of 2 clues are allowed to tell you a cat’s exact position.

In order to play the puzzles you’ll need to buy the Cat Crimes game.

Good luck, and let me know how you get on!

ChatGPT mode

At first I tried asking ChatGPT to generate puzzles for me. My puzzles are guaranteed to be solvable and probably about the right difficulty, but since they’re randomly generated their solutions don’t generally have much of a careful narrative behind them.

I thought that ChatGPT might be able to do better. “Absolutely!” it said when I asked it, but it kept giving me back puzzles that had either several different solutions or no solutions at all. No dice!

To fix this I added a ChatGPT mode to my tool. In this mode the tool gives you a prompt to paste into ChatGPT. The prompt asks ChatGPT it to give you a Cat Crimes puzzle formatted in a specific way. You paste ChatGPT’s output back into the tool, and the tool checks whether the puzzle is valid. If it is then the tool converts the puzzle into printable card; if it’s not then it prints an error message for you to give to ChatGPT to help it fix the problem. You continue this debugging loop until you have a valid (and hopefully more fun) puzzle.

Disclaimer

I’m not associated with Cat Crimes in any way; this is a completely unofficial fan project. Cat Crimes is owned and published by Thinkfun Inc. Go and buy it from them!

PySkyWiFi: completely free, unbelievably stupid wi-fi on long-haul flights

2024-07-09 08:00:00

The plane reached 10,000ft. I took out my laptop, planning to peruse the internet and maybe do a little work if I got really desperate.

I connected to the in-flight wi-fi and opened my browser. The network login page demanded credit card details. I fumbled for my card, which I eventually discovered had hidden itself inside my passport. As I searched I noticed that the login page was encouraging me to sign in to my airmiles account, free of charge, even though I hadn’t paid for anything yet. A hole in the firewall, I thought. It’s a long way from London to San Francisco so I decided to peer through it.

I logged in to my JetStreamers Diamond Altitude account and started clicking. I went to my profile page, where I saw an edit button. It looked like a normal button: drop shadow, rounded corners, nothing special. I was supposed to use it to update my name, address, and so on.

But suddenly I realised that this was no ordinary button. This clickable rascal would allow me to access the entire internet through my airmiles account. This would be slow. It would be unbelievably stupid. But it would work.

Several co-workers were asking me to review their PRs because my feedback was “two weeks late” and “blocking a critical deployment.” But my ideas are important too so I put on my headphones and smashed on some focus tunes. I’d forgotten to charge my headphones so Limp Bizkit started playing out of my laptop speakers. Fortunately no one else on the plane seemed to mind so we all rocked out together.

Before I could access the entire internet through my airmiles account I’d need to write a few prototypes. At first I thought that I’d write them using Go, but then I realised that if I used Python then I could call the final tool PySkyWiFi. Obviously I did that instead.

Prototype 1: Instant Messaging

Here’s the basic idea: suppose that I logged into my airmiles account and updated my name. If you were also logged in to my account then you could read my new name, from the ground. You could update it again, and I could read your new value. If we kept doing this then the name field of my airmiles account could serve as a tunnel through the airplane’s wi-fi firewall to the real world.

This tunnel could support a simple instant messaging protocol. I could update my name to “Hello how are you.” You could read my message and then send me a reply by updating my name again to “Im fine how are you.” I could read that, and we could have a stilted conversation. This might not sound like much, but it would be the first step on the road to full internet access.

I paid for the internet on my old laptop. I hadn’t finished migrating my data off this computer, so it still had to come everywhere with me. I messaged my wife to ask her to help me with my experiments. no, what are you talking about, i'm busy she replied, lovingly.

So instead I took out my new laptop, which still had no internet access. I created a test airmiles account and logged into it on both computers. I found that I could indeed chat with myself by updating the name field in the UI.

sequenceDiagram
    participant Computer1
    participant AirmilesAccount as Airmiles Account<br>Name Field
    participant Computer2
    
    Computer1->>AirmilesAccount: TYPE: Hello how are you
    AirmilesAccount->>Computer2: READ: Hello how are you
    Computer2->>AirmilesAccount: TYPE: Im fine how are you
    AirmilesAccount->>Computer1: READ: Im fine how are you

This was a lousy user experience though. So I wrote a command line tool to automate it. My tool asked the user for a message, and then behind the scenes it logged into my airmiles account via the website, using my credentials. The tool updated the name field of my test account with the user’s message. It then polled the name field every few seconds to see if my account’s name had changed again, which would indicate that the other person had sent a message back. Once the tool detected a new value it printed that value and asked the user for their next reply, and so on.

sequenceDiagram
    actor Me
    participant AirmilesAccount as Airmiles Account<br>Name Field
    actor You
    
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>You: (no new data)
    Me->>AirmilesAccount: WRITE: Hello how are you
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>You: READ: Hello how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>Me: (no new data)
    You->>AirmilesAccount: WRITE: Im fine how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>Me: READ: Im fine how are you

Using this tool I could chat with someone on the ground, via my terminal. I wouldn’t have to pay for wifi, and neither of us would have to know or care that the messages were being sent via my SkyVenture Premium Gold Rewards account.

I still needed to find someone who would chat with me. But this was a good start!

NB: at this point I didn’t want to send any more automated data through my airmiles account in case that got me in trouble somehow. Nothing I was doing could possibly cause any damage, but some companies get jumpy about this kind of thing.

I therefore proved to myself that PySkyWiFi would work on my airmiles accounts too by updating my name ten or so times in quick succession. They all succeeded, which suggested to me that my airmiles account probably wasn’t rate-limiting the speed or number of requests I could send to it.

I then wrote the rest of my code by sending my data through friendly services like GitHub Gists and local files on my computer, using the same principles as if I were sending it through an airmiles account. If PySkyWiFi worked through GitHub then it would work through my Star Power UltimateBlastOff account too. This had the secondary advantage of being much faster and easier for iteration too.

I’m going to keep talking about sending data through an airmiles account, because that’s the point I’m trying to make.

Prototype 2: Live headlines, stock prices, and football scores

The tunnel I’d constructed through my airmiles account would be useful for more than IMing. For my next prototype I wrote a program that would run on a computer back at my house or in the cloud, and would automatically send information from the real world up to me on the plane, through my airmiles account. I could deploy it before I left for my next flight and have it send me the latest stock prices or football scores while I was in the sky.

To do this I wrote a daemon that would run on a computer that was on the ground and connected to the internet. The daemon constantly polled the name field in my airmiles account, looking for structured messages that I sent to it from the plane (such as STOCKPRICE: APPL or SCORE: MANUNITED). When the daemon saw a new request it parsed it, retrieved the requested information using the relevant API, and sent it back to me via my airmiles account. It worked perfectly.

Now I could use my first prototype to send IMs through my airmiles account, and I could use my second prototype tio follow the markets and the sports.

It was time to squeeze the entire internet through my airmiles account.

The real thing: PySkyWiFi

During the rest of the flight I wrote PySkyWiFi. PySkyWiFi is a highly simplified version of the TCP/IP protocol that squeezes whole HTTP requests through an airmiles account, out of the plane, and down to a computer connected to the internet on the ground. A daemon running on this ground computer makes the HTTP requests for me, and then finally squeezes the completed HTTP responses back through my airmiles account, up to me on my plane.

This meant that on my next flight I could technically have full access to the internet, via my airmiles account. Depending on network conditions on the plane I might be able to hit speeds of several bytes per second.

DISCLAIMER: you obviously shouldn’t actually do any of this

Here’s how it works (and here’s the source code).


How PySkyWiFi works

PySkyWiFi has two components:

  • The sky proxy - a proxy that runs on your laptop, on a plane
  • The ground daemon - a daemon that runs on a computer connected to the internet, at your home on the ground or in the cloud

Here’s a simplified diagram:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account
    participant GroundDaemon as Ground Daemon
    participant Website as example.com

    Me->>SkyProxy: HTTP request
    SkyProxy->>AirmilesAccount1: HTTP request
    AirmilesAccount1->>GroundDaemon: HTTP request
    GroundDaemon->>Website: HTTP request
    Website->>GroundDaemon: HTTP response
    GroundDaemon->>AirmilesAccount1: HTTP response
    AirmilesAccount1->>SkyProxy: HTTP response
    SkyProxy->>Me: HTTP response

Setup starts before you leave your house. First you start up the ground daemon. Then you get a taxi to the airport, get on the plane, and connect to the plane’s wi-fi network. You boot up the sky proxy on your laptop. Your PySkyWiFi relay is now ready to go.

You use a tool like curl to make an HTTP request to the sky proxy that you’ve started on your laptop. You address your request to the proxy (eg. localhost:1234/) and you put the actual URL that you want to query inside a custom HTTP header called X-PySkyWiFi. For example:

curl localhost:1234 -H "X-PySkyWiFi: example.com"`

The X-PySkyWiFi header will be stripped by the ground daemon and used to route your request to your target website. Everything else about the request (including the body and other headers) will be forwarded exactly as-is.

Once you make your request it will hang for several minutes. If by some miracle nothing breaks then you’ll eventually get back an HTTP response, exactly as if you’d sent the request over the normal internet like a normal person. The only difference is that it didn’t cost you anything. You will now almost certainly pay for wi-fi, because your curiosity has been satisfied and your time on this earth is very short.

Step-by-step

Here’s what happens behind the scenes:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as example.com

    Me->>SkyProxy: curl localhost:1234 \n -H "X-PySkYWiFi: example.com"
    SkyProxy->>AirmilesAccount1: Write request chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 1
    GroundDaemon->>AirmilesAccount2: Ack request chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ack for request chunk 1
    SkyProxy->>AirmilesAccount1: Write request chunk 2
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 2
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: example.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>
    GroundDaemon->>AirmilesAccount2: Write response chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 1
    SkyProxy->>AirmilesAccount1: Ack request chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ack for request chunk 1
    GroundDaemon->>AirmilesAccount2: Write response chunk 2
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 2
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>

In order:

  1. The sky proxy receives the HTTP request from your curl call. It splits the request into chunks, because the entire request is too large to fit into you airmiles account in one go
  2. The sky proxy writes each chunk one-by-one to the name field in your airmiles account.
  3. The ground daemon polls your airmiles account. When it detects that the name field has changed to a new chunk, it reads that chunk and sends an acknowledgement to the sender so that the sender knows it’s safe to send the next chunk. The receiver sticks the chunks back together and rebuilds the original HTTP request
  4. Once the ground daemon has received and rebuilt the full HTTP request, it sends the request out over the internet.
  5. The ground daemon receives an HTTP response.
  6. The ground daemon sends the HTTP response up to the sky proxy using the same process as before, in reverse. This time the ground daemon splits the HTTP response up into chunks and writes each chunk one-by-one to the name field in your airmiles account (it actually writes these response chunks using a second airmiles account to make the protocol simpler)
  7. The sky proxy polls the second airmiles account. It reads each chunk and sticks them back together to rebuild the HTTP response
  8. The sky proxy returns the HTTP response to the original call to curl. As far as curl is concerned this is a perfectly normal HTTP response, just a little slow. curl has no idea about the silliness that just transpired

The sky proxy and the ground daemon are relatively simple: they send HTTP requests and parse HTTP responses. The magic is in how they squeeze these requests and responses through an airmiles account. Let’s look closer.

Squeezing HTTP requests through an airmiles account

PySkyWiFi’s communication logic is split into two layers: a transport layer, and a network layer. The transport layer’s job is to decide what data clients should send to each other. It dictates how senders should split up long messages into manageable chunks, as well as how senders and receivers should signal information like “I am ready to receive another chunk.” The PySkyWiFi transport layer is somewhat similar to the TCP protocol that powers much of the internet, if you squint very hard and don’t know much about TCP.

By contrast, the network layer’s job is to actually send data between clients, once the transport protocol has decided what that data should be. It’s vaguely similar to the IP protocol, if you squint even harder and know even less what you’re talking about.

This division of responsibility between layers is useful because the transport layer doesn’t have to care about how the network layer sends its data, and the network layer doesn’t care what the data it sends means or where it came from. The transport layer just hands the network layer some data, and the network layer sends it however it likes.

This separation makes it easy to add support for new airmiles platforms, because all we have to do is implement a new network layer that reads and writes to the new type of airmiles account. This separation also allows us to write test versions of the network protocol that write and read from local files instead of airmiles accounts. In each case the network layer changes, but the transport layer stays exactly the same. Here’s how they work.

The transport layer

A PySkyWiFi transport connection between two clients consists of two “pipes” (or “airmiles accounts”). Each client has a “SEND” pipe that it can write data to, and a “RECV” pipe that it can read from. Clients write to their SEND pipe by writing data to it, and they read from their RECV pipe by constantly polling it and seeing if anything has changed.

flowchart LR
    Client1 --> Client2
    Client2 --> Client1

From the transport layer’s point of view, a pipe is just something that it can write and read data from. Beyond that the transport layer doesn’t care how its pipes work.

At any given moment a PSWF (PySkYWiFi) client can only either send or receive data, but not both. A client in send mode will not see data sent by the other client, and a client in receive mode should never send data because the other client won’t see it. This is unlike TCP, where clients can send or receive data at ay time.

When squeezing HTTP requests and responses through an airmiles account, the sky proxy sends the first message and the ground daemon receives it. Once the sky proxy has finished sending its HTTP request it switches to receive mode and the ground daemon switches to send. The ground daemon makes the HTTP request and sends back the response, at which point the two switch roles again so that the sky proxy can send another HTTP request.

How are long messages sent through such a small pipe?

PSWF uses small pipes (such as an airmiles name field) that can’t fit much data in them at once. This means that it takes some work and care to squeeze long messages (like HTTP requests) through them.

To send a long message, the sender first splits up their message into chunks that will fit into their SEND pipe. They then send each chunk down the pipe one at a time.

To begin a message, a sender starts by sending its first chunk of message data inside a DATA segment:

A DATA segment consists of:

  • The letter D
  • The sequence number of the chunk (a number that uniquely identifies the chunk, padded to 6 digits)
  • The actual chunk of data.

For example, a data segment in the middle of a message might read: D000451adline": "Mudslide in Wigan causes m

Once the sender has sent a DATA segment, it pauses. It wants to send its next DATA segment, but it can’t overwrite the airmiles account name field until it knows that the receiver has received and processed the previous one.

The receiver tells the sender that it’s safe for to send a new DATA segment by acknowledging every segment that it reads. The receiver does this by writing an ACK segment to its own SEND pipe:

An ACK segment consists of:

  • The letter A
  • The sequence number of the segment that is being acknowledged (padded to 6 digits)

For example: A000451

The sender is constantly polling its own RCV pipe to check for changes, and so it reads this new ACK segment promptly. Once the sender reads the ACK, it knows that the receiver has received the segment corresponding to the ACK’s sequence number. For example, if a sender receives an ACK segment with sequence number 000451, the sender knows that it’s safe to send the next DATA segment with sequence number 000452. The sender therefore pulls the next chunk from its message and constructs a new DATA segment using this chunk and sequence number. The sender writes the new segment to its SEND pipe, and then pauses waits for another ACK.

This loop continues until the sender has sent all the data in its message. To tell the recipient that it’s finished, the sender sends an END segment.

An END segment is just the letter E.

When a receiver sees an END segment it knows that the sender’s message is over. The sender and the receiver swap roles. The old sender starts polling its RECV pipe for DATA segments, and the old receiver starts chunking up its response message and sending it through its pipe, exactly as before.

None of this transport logic cares about the details of the network layer through which the segments are sent. The transport layer just needs the network layer to provide two pipes that it can read and write to. The network layer can pipe this data around via local files, a Discord profile, or an airmiles account. This genericness is what allows PySkyWiFi to work with any airline’s airmiles account, so long as the airline allows you to login to it from the plane without paying.

Here’s how PSWF uses transport protocol segments to exchange long messages:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as robertheaton.com

    Me->>SkyProxy: curl localhost:1234 \n -H "X-PySkYWiFi: robertheaton.com"
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000000:<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000000:<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon->>AirmilesAccount2: Write ACK segment<br>sequence number=000000
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ACK segment<br>sequence number=000000
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: robertheaton.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>
    GroundDaemon->>AirmilesAccount2: Write DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy->>AirmilesAccount1: Write ACK segment<br>sequence number=000000
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ACK segment<br>sequence number=000000
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>

The transport layer decides what data the clients should send each other, but it doesn’t say anything about how they should send it. That’s where the network protocol comes in.

The network layer

The network layer’s job is to send data between clients. It doesn’t care about where the data came from or what it means; it just receives some data from the transport layer and sends it to the other client (typically via an airmiles account).

This means that the network layer is quite simple. It also means that adding a new network layer for a new airmiles platform is straightforward. You use the new platform to implement a few operations and a few properties (see below), and then the transport layer can automatically to use your new airmiles platform with no extra work.

A network layer consists of two operations:

  • send(msg: str) - write msg to storage. For an airmiles-based implementation, this writes the value of msg to the name field in the user’s airmiles account
  • recv() -> str - read the message from storage. For an airmiles-based implementation, this reads the value of the name field from the user’s airmiles account.

A network layer implementation must also define two properties:

  • sleep_for - the number of seconds that the transport layer should sleep for in between polling for new segments from a RECV pipe. sleep_for can be very low for test implementations like files, but it should be at least several seconds for an implementation like an airmiles account. This is in order to avoid hammering remote server with too many requests.
  • segment_data_size - the number of characters that the transport layer should send in a single segment. Should be equal to the maximum size of the airmiles account field being used to transfer segments (often around 20 characters).

A network layer implementation can also optionally provide two more operations:

  • connect_send() - a hook called by the sender when a SEND pipe is initialised. In an airmiles-based implementation this allows the client to login to the platform using a username and password. This gives the client a cookie that it can use to authenticate future send and recv calls.
  • connect_recv() - a hook called by the receiver when a RECV pipe is initialised

If you fill in all these methods, you’ll be able to use PySkyWiFi on a new airline. But again, don’t.

Tips and tricks

When writing a network layer that uses a new airmiles provider, there are a couple of tricks that can make your implementation faster and more reliable.

1. Encode messages to make sure the airmiles account accepts them

Airmiles HTML forms usually don’t let users include non-alphabetic characters in their name. Stephen will probably be allowed, but GET /data?id=5 will probably be rejected.

To work around this, the network layer should encode segments using base26 before writing them to an airmiles account. base26 is a way of representing a string using only the letters A to Z . In order to convert a byte string to base26, you convert the bytes to a single large number, then you represent that number using a counting system with base 26 (hence the name) where the digits are the letters A to Z.

def b26_encode(input_string: str) -> int:
    # Convert input string to a base-256 integer
    base256_int = 0
    for char in input_string:
        base256_int = base256_int * 256 + ord(char)
    
    # Convert base-256 integer to base26 string
    if base256_int == 0:
        return 'A'  # Special case for empty input or input that equals zero
    
    base26_str = ""
    while base256_int > 0:
        base26_str = chr(base256_int % 26 + 65) + base26_str
        base256_int //= 26
    
    return base26_str

b26_encode("Hello world")
# => 'CZEZINADXFFTZEIDPKM'

The transport layer never needs to know about this encoding. The network layer receives some bytes, encodes them using base26, and writes this encoded string of A to Z to the airmiles account. When the network layer reads the base26 value back out of the airmiles account, it decodes the encoded string back into a number and then back into bytes, and then returns those bytes to the transport layer.

Encoding a string using base 26 makes it significantly longer, just like how it takes many more digits to represent a number using binary than decimal. This reduces the bandwidth of our protocol. We could increase our bandwidth by using base52 (using both upper- and lower-case letters) instead of base26, which would shorten it somewhat. This is left as an enhancement for version 2.

2. Increase bandwidth by using more account fields

Another way to increase our PSWF bandwidth is to increase the segment size that a network layer can handle. If we double the size of our segments, we double the bandwidth of our protocol.

Fields in airmiles accounts usually have length limits. For example, you might not be allowed to set a name longer than 20 characters. However, we can maximise our bandwidth by:

  1. Using the full length of the field
  2. Spreading out a segment across multiple fields

Suppose we have control over 5 fields that can each store 20 characters. Instead of using one field to transmit segments of 20 characters, we can split a 100 character segment into 5 chunks of 20 and update them all at once in a single request. The receiver can then read all 5 fields, again in a single request, and stitch them back together to reconstruct the full segment.

Further enhancements

HTTP CONNECT

It would be better if PySkyWiFi used HTTP CONNECT requests to set up the tunnel from the sky proxy to the target site, instead of manually tossing around HTTP requests. CONNECT requests are how most HTTP proxies work, and using them would allow PySkyWiFi to act as the system-level proxy and so handle requests from a web browser. It would also mean that PySkyWiFi would negotiate TLS connections with the target website directly, so its traffic would be encrypted as it passed through the airmiles account.

On the other hand, using CONNECT would also be a lot more work and I’ve already taken this joke way too far.

In conclusion

When I was done with all of this I used PySkyWiFi to load the homepage of my blog using curl, tunneling the data via a GitHub Gist. Several minutes later I got a response back. I scrolled around the HTML and reflected that this had been both the most and least productive flight of my life.

(PySkyWiFi source code here)

I've written a book about being a dad; now I want to get it published

2024-03-25 08:00:00

For the last eighteen months I’ve been writing a book about being a dad. Two weeks ago I finished the first draft!

The book is inspired by my blog posts about parenting, but most of it is brand new and I think it might be very good. It’s about childbirth, covid, careers, old friends, new friends, kid friends, chess, pianos, screens, AI, marriage, and much more.

Now that I’ve finished a draft I’m looking for an agent and a publisher. I’ve never done this before so I’d appreciate help and advice! Please get in touch if:

  • You’ve published a book or know someone who has
  • You are a literary agent or you know someone who is
  • You’d be up for giving feedback on a first draft
  • You have some words of encouragement
  • You have any thoughts or ideas of any sort about anything

If you want to find out when the book is ready, subscribe to my newsletter. If you have friends who you think would enjoy the book, tell them about it and make them subscribe too.

Thanks!