MoreRSS

site iconRobert HeatonModify

Frontier red team @AnthropicAI . Writing a book about being a dad. Now I live in London.
Please copy the RSS to your reader, or quickly subscribe to:

Inoreader Feedly Follow Feedbin Local Reader

Rss preview of Blog of Robert Heaton

MinorMiner: we turn your kid's maths homework into Bitcoin

2025-05-14 08:00:00

Hello! Hello! Welcome, welcome. My name is Hobert Reaton, and I’m here in this shabby motel conference room to present you with yet another once-in-a-lifetime investment opportunity.

Look at this picture. Tell me what you see:

image

Do you see learning? Self-improvement? The future leaders of our country?

I’ll tell you what I see: wasted computing power.

Between the ages of 5 and 18, the average child in full-time education completes about 5 maths worksheets a week. Each worksheet has 20 questions. This means that over the course of their school career, every single one of our kids performs about 80,000 calculations.

At the moment we completely waste their work. A student figures out that 5+5=10 and 7x7=49. This motivates them; they’re energised by their success. But then what do we do with the fruits of their labour? Nothing! We throw the fruit away, to rot in the void. “We knew that already,” we tell our children. “Your ideas don’t matter.” Unlike the rest of society, I believe that kids deserve to feel appreciated. I believe that their achievements are valuable.

That’s why I founded MinorMiner.

What is MinorMiner?

MinorMiner is a platform that allows school-age children to monetise their maths homework by using it to mine Bitcoin. Yes, you heard me. We send children their homework, they crunch through it, and then together we transform their sweat into digital gold. This isn’t some rinky-dink incentive program where we bribe children to care about multiplication. Homework is the essential raw material that feeds our machine. We need these kids. No kids; no Bitcoin.

In order to understand the innovation that makes MinorMiner possible, we first need to understand how Bitcoin is mined today. Right now, people mine Bitcoin by using computers to solve complex mathematical puzzles. The puzzles look like this:

  1. Take the list of the Bitcoin transactions that have occurred since the last Bitcoin was mined. Check that they’re all correctly authorized and that none of them spend money that the creator doesn’t have.
  2. Choose a string of extra letters and numbers to add on to the end of this list (a nonce). This is your attempt to solve the puzzle.
  3. Pass the list and your extra characters through an extremely complex function called the SHA-256 hash function (technically you pass it through the function twice). The hash function chops and slices and spins and dices the input around, seemingly (but not actually) at random. At the end it spits out a number
  4. The puzzle that you’re trying to solve is: what combination of letters and numbers from step 2 cause the output from step 3 to be less than some small target number?
Step 1Collect valid Bitcointransactions sincelast blockStep 2Add nonceStep 3Pass transactions andnonce through hashfunctionStep 4Check if hash outputis less than targetNO - try again withnew noncePublish solution and claim rewardYES - new bitcoinblock mined!

There’s no elegant way to solve these puzzles. The only thing for Bitcoin miners to do is to guess inputs to step 2, over and over and over again, until they find one that happens to satisfy the criteria in step 4. When a miner guesses a right answer we say that they’ve “mined” a new “block”. They attach their solution to the blockchain to show that they’ve verified the transactions in the block, and they’re rewarded with new bitcoin. Their work, along with a couple of extra steps that I’ve hand-waved over, ensures that the blockchain stays safe and secure.

However, it also requires an incredible amount of electricity - around 150TWh per year, or 1% of the world’s total energy consumption. What if there was a better, more efficient way to achieve the same thing?

This is where MinorMiner and school-aged children come in. “But Bitcoin mining sounds hard!” I hear you wail. “My child has only a rudimentary grasp of basic algorithms!” True, true - but the magic is that the children on our platform don’t need to know how to mine Bitcoin, and they won’t even know that they’re doing it. Our team has converted the SHA-256 hashing algorithm used by the bitcoin blockchain into a sequence of elementary arithmetic questions that even the dullest dullard can answer. Solving a blockchain puzzle used to require understanding and executing a SHA-256 hash. Now all it takes is skipping through a few trillion simple brainteasers.

  • 5+3=?
  • 10*5=?
  • Is 102 bigger than 67? (y/n)
  • And so on

The kids do these sums - we take care of the rest.

How does the MinorMiner platform work?

The heart of MinorMiner is a centralised system that manages our mining. The system decomposes a SHA-256 hash computation into simple arithmetic questions, and works out which of these questions need answering next. Simple enough - but how do we get the questions to the kids? Three words: online maths quizzes.

You see, MinorMiner also has a maths learning platform that we sell to schools all over the world. Our platform isn’t particularly good, but we give department heads a generous revenue share and so this tends not to matter. Once a teacher (or “distribution associate” as we like to call them) is set up on MinorMiner, they assign a quiz to their class as homework. In the evening their children (or “computation partners”) log into the MinorMiner portal and answer their quiz for the day, which consists of whatever questions our hashing system needs doing next. By way of compensation their distribution associate doesn’t give them a detention. We collect their answers and use them to continue calculating a hash. When one partner finishes their quiz, the next partner continues calculating from where they left off.

image

We have to be careful - a hash is a delicate thing. One tiny mistake in one tiny step and - poof! - the whole calculation is completely, irreversibly screwed. That’s why send each calculation to two separate computation partners. If their answers disagree then we escalate to a slightly older partner to adjudicate. We maintain a rating for each partner based on their accuracy. If their rating drops below 4.3 stars then they are invited to undergo additional training to help them get back to the standard expected for MinorMiner partners. If such improvement is not forthcoming then they are invited to seek maths education elsewhere.

Any questions so far? No? Then it’s time for me to show you our real technological breakthrough.

CUDAAAAGH

We write our mining code using a Python library called Centralized Underage Distributed Arithmetic - Automated Assignment And Group Hashing (CUDAAAAGH). CUDAAAAGH allows us to distribute any complex computation across an infinitely-scalable pool of computation partners. We’ve open-sourced it on GitHub and PyPi.

To use it, we run pip install CUDAAAAGH and then use its CUDAAAAGHInt class everywhere we would normally use Python’s standard integer type. Aside from that, we write all of our code as normal. When we execute our program, CUDAAAAGH automatically offloads any arithmetic computations to our network of computation partners, instead of burdening our own CPUs.

For example:

from CUDAAAAGH import CUDAAAAGHInt

x = CUDAAAAGHInt(5)
y = x + CUDAAAAGHInt(10)

# Behind the scenes, this sends the calculation "5+10" to a computation
# partner. Execution pauses until we receive an answer.

print(f"The answer is: {y}")
# => 15

image

This works for all integer operations. For complex operations like XOR, CUDAAAAGH breaks them up into simpler additions and multiplications that will be more familiar to our computation partners. It then combines their answers behind the scenes to calculate the requested XOR:

from CUDAAAAGH import CUDAAAAGHInt

x = CUDAAAAGHInt(0b010100100)
y = x ^ CUDAAAAGHInt(0b111100101)

print(y)
# => 321

And before you ask - yes you can absolutely use CUDAAAAGH to train AI models. Stick around until the end for more details.

I know what you’re thinking: this is genius, it’s revolutionary, but Bitcoin mining is a game of speed. Is CUDAAAAGH fast enough? And to you I say: hell yes it’s almost fast enough, if you go by our 5-year projections and use our heterodox assumptions about the direction of the world economy.

Hold onto your cheque books everybody.

It’s time to mine

Today, calculating a hash takes the MinorMiner platform about 7 billion operations. An off-the-shelf ten-year-old can perform one addition every 10 seconds. Including a generous sleeping allowance, this means that they can compute 1 hash every 2,000 years. On the other hand, specialized mining rigs can compute 1 hash every 0.00000000001 seconds, and new blocks are mined every 10 minutes. These numbers don’t work for MinorMiner - yet.

Fortunately MinorMiner is in a fundamentally strong position in the value chain. We’re fuelled by exhaust work that was previously completely wasted, which means that we don’t need to be uber-efficient in order to be competitive. However, we do still need to complete each hash within the time it takes for a new block to be mined. Otherwise, even if we find a successful hash that would have mined us a block and won us some Bitcoin, someone else will have mined that block already.

This is why MinorMiner’s number one focus is on turbo-charing our hashrate. Let me tell about our top three strategies: parallelisation, curriculum optimisation, and teacher incentive alignment.

1. Parallelisation

First, parallelisation. Right now, a single partner works on a single hash. This means that even though adding more partners allows us to calculate more hashes at the same time, it doesn’t decrease the end-to-end time it takes to calculate a single hash.

However, we can distribute work between computation partners far more cunningly than we do today. We can split up the calculations required to compute a hash into independent chunks, and we can give the chunks to different computation partners to work on in parallel. Once all of the chunks are done, we can combine them and take a big leap forward in a single hash, instead of many small leaps in lots of different hashes. Spreading out work like this allows us to decrease the time it takes us to calculate a single hash. This will significantly increase our competitiveness.

“But Hobert, SHA-256 can’t be parallelised!” I hear you squawking. “It uses a sequential block structure, where the output of each block is the input to the next block! This means that you can’t calculate later blocks until you’ve first calculated earlier ones. This makes parallelisation impossible!”

You’re right, oddly-well-informed heckler! Most implementations of SHA-256 can’t be usefully parallelised. However, remember that CUDAAAAGH breaks bitwise operations like AND, OR, and XOR into a large number of additions and multiplications. Many of the sub-calculations inside a single XOR operation are in fact independent and don’t depend on each other. This makes them easy to parallelise and get the massive speedups I’ve been promising.

For example, here’s our current, naive implementation of XOR:

class CUDAAAAGHInt:
    def __init__(self, val: int):
        self.val = val
    
    def __xor__(self, other: 'CUDAAAAGHInt') -> 'CUDAAAAGHInt':
        result = CUDAAAAGHInt(0)
        # Calculate the value of each bit, and use bit-shifting to combine
        # them using standard integer arithemtic.
        for i in range(max(self.bit_length(), other.bit_length())):
            bit_self = self._ith_bit(CUDAAAAGHInt(i))
            bit_other = other._ith_bit(CUDAAAAGHInt(i))
            
            xor_bit = bit_self + bit_other - CUDAAAAGHInt(2) * bit_self * bit_other
            result += xor_bit << CUDAAAAGHInt(i)
        
        return result
    
    def _ith_bit(self, i: 'CUDAAAAGHInt') -> 'CUDAAAAGHInt':
        return CUDAAAAGHInt((self.val >> i.val) & 1)
    
    def bit_length(self) -> int:
        return self.val.bit_length()

This implementation is serial and slow - just look at that for-loop! But now look even closer. Notice how each pass through the loop is entirely independent of all others. This means that we can calculate the value of each bit separately, in parallel, then combine all the results once we’re done. We can even compute bit_x and bit_y in parallel inside each loop.

Combining these tricks gives us a parallel implementation that looks something like this:

import asyncio

class CUDAAAAGHInt:

    def __init__(self, val: int):
        self.val = val

    async def __xor__(self, other: 'CUDAAAAGHInt') -> 'CUDAAAAGHInt':
        # Calculate a single bit XOR at position i
        async def compute_bit_xor(i: int) -> 'CUDAAAAGHInt':
            bit_self_task = asyncio.create_task(self._ith_bit(CUDAAAAGHInt(i)))
            bit_other_task = asyncio.create_task(other._ith_bit(CUDAAAAGHInt(i)))
            
            bit_self = await bit_self_task
            bit_other = await bit_other_task
            
            xor_bit = bit_self + bit_other - CUDAAAAGHInt(2) * bit_self * bit_other
            return xor_bit << CUDAAAAGHInt(i)

        # Determine the number of bits to process
        max_bits = max(self.bit_length(), other.bit_length())
        
        # Calculate all bits concurrently
        tasks = [compute_bit_xor(i) for i in range(max_bits)]
        all_bits = await asyncio.gather(*tasks)
        
        # Sum the results
        result = CUDAAAAGHInt(0)
        for bit in all_bits:
            result = result + bit
            
        return result
        
    async def _ith_bit(self, i: 'CUDAAAAGHInt') -> 'CUDAAAAGHInt':
        return CUDAAAAGHInt((self.val >> i.val) & 1)
        
    def bit_length(self) -> int:
        return self.val.bit_length()

This will reduce the time it takes for us to calculate the XOR of two numbers M and N by a factor of about log2(max(M, N)). We’re mostly dealing with 32-bit integers, so this is a 5x speedup.

I’m sure you’ve all noticed that we could use a map-reduce approach to turbocharge that last sum as well. I’m also sure that a couple of you have noticed that we could write a monad that allows us to elegantly parallelise everything, everywhere. I’m even more sure that one of you has already emailed me an implementation of this monad, written in a Lisp dialect that you designed yourself. Fair warning to that person - I am unlikely to read it.

2. Curriculum optimisation

A 5x speedup in our xor implementation is very, very, deeply impressive, but it’s still a hack. Decomposing XOR calculations into additions and multiplications is fundamentally inefficient, and it imposes a hard ceiling on our performance. In order to achieve true speed and elegance, we need to perform the XOR directly, without splitting it up.

The reason that we don’t do this already is because most children can’t calculate even a basic 8-bit XOR. I know! I was as shocked as you when I found out. But these kids aren’t stupid; their ignorance isn’t their fault. They’re being let down by a broken system that fails to teach them the skills they need to compete in today’s highly-specialised economy.

This is why we’ve successfully lobbied to have XOR calculations added to the first grade syllabus, starting in the coming autumn term. We’ve produced a textbook containing all the important bitwise operations that every 7 year-old should know, including XOR, AND, OR, and bitshifts. This will allow us to ask our computation partners truly useful questions like What is 2136782 ^ 2136821?, instead of spoonfeeding them piles of simpler calculations. We predict that this will lead to a further hashrate speedup of approximately 100x, and we expect to start seeing results in Q1 next year, after the end-of-unit quizzes start to bite.

“Teach a man to hash and you’ll etc etc.”

But why stop with XORs? Think about how legacy mining has evolved over the last decade or so. The first Bitcoins were mined on normal computers, with normal CPUs. Nowadays all bitcoins are mined using specialised computers called ASICs, hardwired to calculates hashes and nothing out. In order to compete, we have to train human ASICs.

Kids need to learn how to calculate a SHA-256 themselves, end-to-end. This is why we’re such vocal supporters of SB-1337 - “No Child Left Unmined.” SB-1337 will replace the outdated year 7 maths syllabus with an in-depth, end-to-end course on calculating SHA-256 hashes by hand. It will allow us to stop sending students trivial additions and multiplications, and send them real maths instead:

Question 1:

What is the SHA-256 hash of 01003ba3edfd7a...? (3 marks)

Question 2:

What is the SHA-256 hash of 01003ba3edfd7b...? (3 marks)

We’ll be able to turn their homework directly into bitcoins, with no intermediate calculations required. At this point all we’ll need to do is scale. Which brings me to our final speedup technique: Teacher Incentive Alignment

3. Teacher Incentive Alignment

At first some of our new Distribution Associates (or teachers, as they’re known as in the legacy system) were…hesitant to embrace our new, hash-centric curriculum. Fortunately this changed when they learned about our Teacher Incentive Alignment program (TIA).

TIA allows us to compensate Distribution Associates for their hard work, using a sliding-scale fee for every billion hashes produced by their Computation Partners (or, “students”). With TIA, whenever we profit, they profit. The Computation Partners profit too of course, through the invaluable knowledge and practice that they get by participating in MinorMiner.

We’ve found that Distribution Associates who are in the TIA program set 1,000,000% more homework questions than those who are not. For example, one particularly keen associate set their partners the following quiz:

Question 1 of 1,471,126,723

What is the SHA-256 hash of 01003ba3edfd7a...? (0.0000000001 marks)

Question 2 of 1,471,126,723

What is the SHA-256 hash of 01003ba3edfd7b...? (0.0000000001 marks)

(and so on for 1,471,126,721 more questions)

Some Associates were initially concerned that their Computation Partners might balk at such ambitious workloads, despite all the knowledge and hands-on-experience and so on that it would give them. They worried that some Partners might cheat and use specialized mining software to do their hashing homework for them. Fortunately we were able to use spreadsheets and a lot of winking to help most of them realise that this might not actually be a problem.

MinorMiner would like to make it clear that we do not in any way condone cheating. Use of our Homework Submission APIs and their associated SDKs is strictly prohibited.

So in short - yes, we are going to be fast enough. Through parallelisation, curriculum optimisation, and teacher incentive alignment, we believe that we are extremely well-positioned to speed up and scale massively in the coming years. Then what?

From Bitcoin to AI

After we’ve perfected the bitcoin use-case, we’ll pivot straight to AI. We’re already extending CUDAAAAGH with pytorch bindings that will allow users to run their training and inference code using our unique computing platform. Most children don’t know much matrix algebra, but matrix algebra is just addition and multiplication wrapped up in funny symbols. All we need to do is implement matrix multiplication using CUDAAAAGH and we’ll be very, very golden. And if our next legislative priority (SB-80085) passes, matrix algebra will soon be on the year eight curriculum too. It’s curriculum optimisation all over again.

def matmul(m1: CUDAAAAGHMatrix, m2: CUDAAAAGHMatrix) -> CUDAAAAGHMatrix:
  # (Implementation left as an exercise for the reader, you get
  # the joke by now)

Applying CUDAAAAGH to AI raises some delightful philosophical questions. Were you already deeply confused about whether sufficiently advanced AI models should count as conscious beings? Are we going to be morally obliged to care about their welfare? How much more confusing do these questions become if the AI models are an emergent property of billions of children doing their maths homework?

And what will we do after AI? Cloud computing, ladies and gentlemen, cloud computing. Children are commodity hardware. Our big, audacious goal is to implement an entire computer using them. Everywhere that a computer normally has an electron, we’ll replace it with a school-aged child doing their maths homework. CPUs become specialised children performing incredibly-specialised operations. Hard-drives become arrays of children remembering 1s and 0s. Motherboards become lines of children deciding what messages to send to the others. Think about the implications! Free computers for everyone!

That’s all I have to say today, thank you for listening. Believe in children! Invest in the MinorMiner pre-seed! Form an orderly line! Make your cheques payable to “Hobert Reaton.” No madam, there’s no “LLC” at the end. Just “Hobert Reaton.” I also accept cash and a wide range of memecoins. Here’s my wallet address. Thank you.

CUDAAAAGH is available on GitHub. It can also be installed from PyPi using pip install CUDAAAAGH although I can’t imagine why on earth you would do this.

It's not cheating if you write the video game solver yourself

2025-03-11 08:00:00

My wife and two little boys sometimes go on trips to see friends or family for whom my presence isn’t strictly required. While my wife books the flights I make a show of weighing up my options and asking if it’s really OK if I don’t come. Eventually I’m persuaded that it truly would be the best thing for all of us for me to have five to seven days to myself with no nappies and all Nintendo.

My wife hits the “Pay Now” button and when the confirmation email comes through I call my secretary and tell him to clear my calendar. When no one answers I remember that I have neither a secretary nor all that much going on, so instead I find a prestige TV series, a selection of local takeaway menus, and a short but immersive video game. This time I messaged my buddies and asked them what I should play. Steve gushed about a game called Cocoon; Morris said that he’d played it a year ago and it was “alright.” Sold.

One month later there were hugs, kisses, wave goodbye to taxi, shut the door, where’s the HDMI cable, how did it get under the sink, do I have a Nintendo Account, what’s my password, whatever I’ll make a new one, right let’s do this, estimated download time 45 minutes, start on tax returns, am I relaxed yet?

Eventually I was able to boot up Cocoon. I learned that I was going to play as a little bee guy who has to solve artsy puzzles in a lonely, abstract world for no adequately explained reason. Bee guy makes his way through the world using four glowing orbs. He can pick orbs up, carry them around, and put them back down on switches in order to open doors and unfold bridges - standard orb stuff. However, bee guy soon learns that the orbs also contain other worlds, which he can jump in and out of to help with his puzzles. If he jumps inside one orb-world whilst carrying another one then he can put orb-worlds inside each other. He can even - towards the end - put a world inside itself. Conundrums ensue.

image

By the end of Day 1 of my vacation I was having about three-quarters as much fun as I’d hoped. Cocoon’s atmosphere was absorbing and its puzzles made me smile; I only wish there had been a little bit of a story and not just a fuzzy metaphor for entropy and decay. Still, I kept going, trekking through crumbling ruins and overbearing symbolism. By the end of Day 2 I was 51% through the game. I started the next puzzle and got completely stuck. I was surely just tired, I thought, so I watched an episode of The Sopranos and went to bed. The next morning I got up at 6am, made some coffee, and went into my office to crack on. But I was still stuck. I spent an hour filling a bin bag with toys that I didn’t like, since no one was around to stop me. I tried again. Still stuck.

Then I realised. I didn’t know how to solve the puzzle, but I did know how to write a computer program to solve it for me. That would probably be even more fun, and I could argue that it didn’t actually count as cheating. I didn’t want the solution to reveal itself to me before I’d had a chance to systematically hunt it down, so I dived across the room to turn off the console.

I wanted to have a shower but I was worried that if I did then inspiration might strike and I might figure out the answer myself. So I ran upstairs to my office, hit my Pomodoro timer, scrolled Twitter to warm up my brain, took a break, made a JIRA board, Slacked my wife a status update, no reply, she must be out of signal. Finally I fired up my preferred assistive professional tool. Time to have a real vacation.

image


How I wrote a Cocoon solver

In order to write a Cocoon solver, I needed to:

  1. Model the game’s logic using a Finite State Machine
  2. Use this model to work out the sequence of actions that would get me from a puzzle’s start to its end

1. Model the game’s logic

Cocoon’s mechanics are simple but elegant. In the first level you find an orange orb sitting on a stand. You pick it up, obviously, and start to carry it around. You come to a gate with an orb-stand next to it. You put the orb down on the stand and the gate opens. You pick it up again and continue through the gate.

Eventually you come to a small reflecting pool with a special-looking orb-stand in the middle. There’s no obvious path forward, so you put your orange orb down on the stand. You stand next to it and press A, because that’s the only button in the game that does anything. You watch yourself dive into the orb, and land in another world. You’re now inside the orange orb. You start to walk, You find a new, green orb, which you use to keep solving puzzles and opening doors and moving forwards through the orange orb-world. Throughout the course of the game you find a total of four orbs, and the final puzzles require you to lug them all around and dive in and out of them in just the right order to get across the next bridge.

An example: in one puzzle you need to bring your red orb to the top of a vine so that you can use it to reveal a magic walkway to get to the next area. However, you can only climb up vines while holding your green orb, and you can only hold one orb at a time. This means that you can’t carry the red orb up the vine. The solution is therefore to pick up the red orb, dive inside the green orb, drop the red orb, jump out, use the green orb to climb the vine, dive back inside the green orb, retrieve the red orb, and jump back out.

In order for my solver to analyse Cocoon, it would need to be able to programatically manipulate a copy of it. My solver would need to know how the world was laid out, what actions were allowed, and whether it had finished solving its puzzle yet. The easiest way to do this wasn’t to have the solver interact with the real game, but to instead write a new, stripped-back copy of it that contained all its logic but none of the graphics.

This was made easier by the fact that each of the puzzles that I got stuck on could be represented as a finite state machine. A finite state machine is a system that can be in exactly one of a finite number of states at any given time. The system is able to transition between some pairs of states using a set of known rules.

Most games with a large 3-D world can’t be modelled as a FSM because they have too many possible states and too many possible transitions. The player can be in a near-infinite number of slightly different locations; so can their enemies. The player might have a huge number of items they could be carrying and past actions they could have taken, each of which could affect the world in some important way. Some parts of the game may depend on timing and agility, which are hard to represent in a FSM. In most game levels there’s too much going on for a simple model like an FSM.

However, Cocoon’s is a simple world. There are no enemies. The only items you can carry are 4 orbs. And whilst its 3-D landscape is lovely to look at, it masks a simple topology that can be modelled as a small graph - a collection of nodes (important locations like orbs and switches) and edges (pathways that connect nodes that are immediately accessible from each other). Beyond these characteristics, the exact layout of the world rarely matters.

This means that a “state” in Cocoon is easy to define and manage. A state is defined by the combination of:

  • The node you’re nearest
  • Which orb you’re holding
  • Where the other orbs are
  • (possibly a few other things, depending on the exact puzzle)

Transitions between states are similarly constrained. Players can transition between game states via only a small set of actions, such as:

  • Walking to an adjacent node
  • Picking up or putting down an orb
  • Jumping in or out of an orb

image

This means that it’s relatively simple to write a program that:

  1. Defines the layout of a puzzle world
  2. Defines the state that the game is currently in
  3. Is able to transition between states
  4. Knows how to identify a goal state

Once I’d written this program, I had a representation of the game that I could programmatically manipulate. I could tell my game to do things like “take such-and-such an action” or “tell me all the possible next actions that can be taken from the current state.” This meant that I could use my simplified game to analyse and solve the real one.

2. Work out how to get from the start to the goal state

To solve a puzzle I needed to find a sequence of actions that would take me from the puzzle’s start state to its goal state (for example, opening a door). This was a job for an algorithm called Breadth-First Search (BFS).

BFS starts at an initial state in an FSM and simultaneously takes a step to every new state that can be reached from it. For example, suppose that a puzzle starts with bee guy holding the green orb, next to a door and to a corridor to another section.

image

My solver takes steps both through the door and down the corridor, and begins keeping track of each path. From each of these new states, it takes another step to each further state that can be reached from them.

image

It keeps stepping and tracking the expanding number of paths that it’s exploring, until one of its paths reaches a goal state (for example, a state in which a particular orb is on a particular stand, which opens a bridge to the next area). At this point the solver stops, and returns the path that led to the goal state. Because the solver takes a single step down every possible path at once, the first path to the goal state that it finds is guaranteed to be the shortest one possible.

I can then input the sequence of actions that my solver returned into the real game (for example: pick up red orb, walk to orb holder, put down red orb, etc), and move on to the next level, without having to do any thinking whatsoever. Here’s my code.

image

How hard are these puzzles really?

My solver stops as soon as it has reached the goal state. However, I was also interested in fully mapping out every possible sequence of actions one could take inside a puzzle. I wanted to see how big and hard the puzzles actually are.

To do this I wrote a script that explores every possible state and transition. It keeps stepping through states transitions, even after one of its paths has reached a goal state. It only stops when none of the paths it’s exploring have any available actions that lead to a new state that it hasn’t seen before.

This script generates a new graph, in which each node is a state and each edge is a transition. The graph shows every single sequence of actions that you can take inside the puzzle. For example, here’s the puzzle that I found hardest, represented as a graph of states and transitions:

image

Drawing a fully-expanded FSM graph like this shows how small and simple Cocoon’s puzzles really are once you write them down. That path on the right, between the green and red nodes, requires you to put one orb inside another orb, then inside another orb, and then shuffle them around just-so. This is a counter-intuitive strategy and hard to find if you’re playing the game like a normal person.

However, writing everything down makes it obvious. Writing everything down makes all actions visible and strips away all of the misleading heuristics that make humans focus on some strategies and completely miss others. It reduces a puzzle to “find a path from the green dot to one of the red dots,” as you saw above, which can be solved by a moderately gifted two year-old. To be very facetious, Cocoon is a fluffy layer of graphics and game mechanics on top of an unbelievably simple maze.

This isn’t meant as a knock on the game. Cocoon is a tidy mind-bender that wasn’t designed to withstand exhaustive search. If you write a program to solve your Cocoon puzzles for you then you’re only cheating yourself. Or at least, you would be if writing such a program wasn’t so much fun.


Life lessons

Real life isn’t a finite state machine though. All my solver really proves is that small, low-dimensional problems can easily be solved using exhaustive search. By contrast, the world has an infinite number of states. It’s hard to be sure which properties are important, and the allowed transitions between states are terribly defined. No problem of any importance can be reduced to a tiny graph, which means that even the most determined two year-old won’t be able to help with any of the big challenges in your life.

However, you don’t need to fully explore all possible solutions in order to benefit from systematic search. As I’d feared, the process of writing my solver forced me to consider the game with so much rigour that I’d solved all 3 challenges that I used it for before I’d finished programming them in. When I came to encode every location and transition on the map in my program I was forced to pay attention to each of them individually for at least a few seconds, even the ones that the game had tricked me into ignoring. This helped me see that some of the nodes and actions that I’d thought were irrelevant were actually the key to the whole thing.


That evening I talked to my family on the phone. I showed them all the diagrams I’d created and all the fun I’d had. They showed me pictures of them inside St. Peter’s Basilica. My son asked if I’d beaten my game. I explained that I’d transcended it by creating a mathematical representation of its entire possibility space. He asked if that meant “no.”


Here’s the code to my solver, but you should really just play the game properly.

Come and work with me on Anthropic's Frontier Red Team

2025-02-03 08:00:00

I work at Anthropic on the Frontier Red Team. Our mission is to find out whether AI models possess critical, advanced capabilities, and to help the world to prepare. We’re hiring AI researchers and engineers in the US and UK, and if that describes you then we’d love to talk.

We explore questions like: can models design bioweapons, or accelerate vaccine research? Can they orchestrate massive cyberattacks, or defend our critical infrastructure? Can they self-improve, or build a business, or fly drones? Even if they can’t do these things right now, when might they be able to?

We’re a technical research team embedded inside both Anthropic’s policy and research organizations. We work with frontier models; top experts in every domain that we cover; and key national security and policy actors. This Wall Street Journal article explains a bit more.

Below are the people we need to help us go faster. If any of them sound like you then please get in touch:

People to build and run evaluations

We need people to design and run the evaluations for Anthropic’s Responsible Scaling Policy (RSP). These people:

  • Work with internal domain leads and external domain experts to design and build cutting edge evaluations across cybersecurity, biosecurity, AI R and D, and more
  • Build sandbox environments to safely run these evaluations
  • Build automated analysis frameworks to make sense of all the results
  • Build agentic frameworks to squeeze the best possible performance from models

I think this is a great job for anyone, and the best job in the world for generalist engineers who are also interested in AI. You get to spend half your time building bulletproof systems for novel technical problems, and the other half thinking earnestly about how many tokens it might take for an AI model to build a GPU cluster to copy itself onto. If you work in computer infrastructure anywhere else then I’m convinced that you would have more fun working here instead.

Read more and apply:

People to conduct deep research into critical domains

We believe that the most important capabilities that models are likely to develop are in autonomy, biosecurity, and cybersecurity. We have small sub-teams that lead research into each of these domains. People on these teams:

  • Carry out threat modeling. What would it even mean for a model to be dangerous in this domain? How could it cause catastrophic harm? What capabilities would it need?
  • Design evaluations that test for these capabilities
  • Run large-scale experiments on these evaluations; understand what capabilities current models have and don’t have; predict when these capabilities might emerge
  • Build safe demonstrations of dangerous capabilities to show to experts and stakeholders

The people on these teams get to work on projects from reinforcement learning to understanding the biggest threats and opportunities facing society. Never a dull moment.

Read more and apply:

Logistics

This is the best and weirdest job I’ve ever had. The team is full of brilliant people who care a lot about their jobs and each other. Most of the team is on the West Coast of the US, with some on the East Coast and a couple in London. I work from London and my schedule is great, even with two kids. I go into the London office once or twice a week and to SF a couple of times a year.

Apply now

If you’d like to work with us then apply using the links above and below. There’s no specific deadline and we’ll keep looking until we find the right people, but we have a lot of work to do and need people to help us do it as soon as possible.

Those links again:

PyMyFlySpy: track your flight using its headrest data

2024-12-04 08:00:00

“Where are we daddy?” asked my five-year-old.

“We’ll land in about an hour,” I said.

“No I mean where are we? Are we flying over Italy yet?”

I wasn’t sure. Our flight was short and cheap and the seats didn’t have TV screens in the headrests. I looked around. I noticed a sticker encouraging me to connect to the in-flight wi-fi. That would do it, I thought. A site like FlightRadar would answer my little man’s question, down to the nearest few meters.

But unfortunately for him I’m the creator of PySkyWiFi (“completely free, unbelievably stupid wi-fi on long-haul flights”). Not paying for airplane internet is kind of my signature move. We’d need a different, offline strategy.

image

I had a think. When you connect to an airplane wi-fi network, you’re usually met with a payment page where you can purchase access to the internet. The page also usually gives you the same flight information that you’d find in the back of your headrest, like speed, direction, and estimated flight length. Perhaps it would have a map as well, I thought.

I pulled out my laptop, connected to the network, and loaded up the payment page. It did indeed show our wind speed, direction, and estimated time of arrival. But no map.

(It didn’t occur to me to screenshot the page so here’s an artist’s impression)

image

“Maybe the server that’s sending us this data is actually also sending us our location, but the web page isn’t displaying it,” I thought. I opened up the Chrome developer tools. I saw that my browser was making regular requests to a /info endpoint.

image

I clicked on one of the requests. This /info endpoint was indeed sending us a huge pile of data, including fields for ground_speed, wind_speed, and estimated_arrival_time. At the bottom of the response I noticed fields for latitude and longitude. My heart leapt. But then I looked closer. They were both null. Aerofoiled.

This looked like the end of the line. I was about to give up and tell my son that we were somewhere just north of Italy, probably…Europe somewhere. But then I was hit by two fantastic ideas.

Fantastic idea number 1: the /info endpoint didn’t tell us our location, but it did tell us our precise, regularly-updated speed and direction. On our flight home I could track and save our speed and direction every second or so for the whole flight. I could use this information to estimate how far we had traveled in each second, and in which direction. I could dynamically calculate our position by starting at our airport’s co-ordinates, then adding on each second’s step.

image

Fantastic idea number 2: even if I had been able to find our latitude and longitude in the /info response, it wouldn’t have meant much to either me or my son. However, I could build a web app that ran on my laptop and showed us our dynamically calculated position on a map, in real time. The app could have automatically updating graphs of our ETA, wind direction, speed, altitude, and so on. Ooh and an interface for running arbitrary queries against the data. And event callbacks to allow me to programmatically trigger code based on flight info (“when our ETA is 2 hours exactly, block my access to netflix.com and open the latest draft of my unfinished novel”). My son would know where he was. I’d be a Good Dad.

I decided to call the app PyMyFlySpy in order to give it some brand association with PySkyWiFi, my airplane-related project. I couldn’t wait to get started. Unfortunately right now I was wedged in between a five-year-old and a two-year-old and we were all terrible at JavaScript. I waited, impatiently.

PyMyFlySpy

Eventually we landed. I built PyMyFlySpy during our holiday, over late evenings and one or two derelict afternoons while the rest of my family did normal-person fun things. I couldn’t figure out whether it was bad manners to use your laptop in artisanal Italian coffee shops, or which of them had wi-fi, so to my eternal shame I googled “starbucks near me” and planted myself in a corner with a skinny mochachino and typed away.

I finished PyMyFlySpy the day before we left. The code is available on GitHub and it’s easy to setup and run. It even has a “dummy” mode that allows you to demo it without being inside a plane, using a made-up flight.

Here’s what PyMyFlySpy can do:

Maps and graphs

PyMySkySpy shows a map of your flightpath so far. It also shows your current flight metrics and how these metrics have changed over the course of your flight. It does this for all data available from the in-flight wi-fi, even data that isn’t usually displayed on the website or headrest screen. You can see exactly where you are and feel a bit like a pilot.

image

image

Query interface

PyMySkySpy saves all the data that it records to a database. Its UI has a page that allows you to write queries against the data to answer questions like “what’s our maximum speed so far, and when did we hit it?” or “how fast was the wind during that turbulence we just went through?”

image

I’m not claiming that this is hugely useful, but I do think it’s cool.

Multi-airline support

Different airlines have different wi-fi systems. A recorder for a JetBlue flight won’t work on AirFrance. Fortunately, PyMySkySpy allows you to easily add and use recorders for different airlines. You just have to load up their wi-fi landing page, open your browser’s developer tools, and figure out how to parse their page’s data like I did above. Then you add your new code to the PyMySkySpy config, and tell the recorder to use it. Everything else continues to work just the same.

System design

The system is very simple. It has 4 parts:

  1. Firefox Extension - reads flight info from the airline’s website and sends it to the PyMySkySpy web server
  2. Local web server - saves data that the extension sends to it, and makes it available to the frontend
  3. Sqlite Database - stores data
  4. Web frontend - displays data using maps and graphs

image

The one strange design choice I made was to use a Firefox Extension to read the flight data, instead of writing a scraper that makes its own data requests directly. Scraping the information like this would have been easier and more flexible, as well as completely harmless. Hundreds of people were already connected to the wifi, and the airline’s own landing page hits the /info endpoint once every couple of seconds. Adding one more request from a scraper would have been entirely safe.

image

However, I’m sure that airlines would rather people didn’t poke around at their onboard servers, even if those people are very careful and well-intentioned and handsome. To make sure I didn’t irritate them, I came up with an even more judicious approach.

Instead of scraping the data endpoint, I wrote a Firefox Extension. The extension sits there while the airline’s wi-fi landing page requests the latest data from the /info endpoint, just like normal, every few seconds. The extension peeks at the data that’s returned; sends the data to the PyMyFlySpy web server; and finally the web server writes it to the PyMyFlySpy database, to serve to the web frontend. Using a Firefox Extension like this means that PyMyFlySpy never interacts with the plane’s info server directly. This means that PyMyFlySpy can provably never harm the server.

I had to write the extension for Firefox instead of Chrome, because Chrome is in the process of reducing extensions’ ability to interact with requests made by a website (like requests made to the /info endpoint). In particular, Chrome is going to prevent extensions from easily reading the responses to HTTP requests made by a website, which would prevent the PyMyFlySpy from reading the data returned by the /info endpoint. As far as I can tell these restrictions are half for security reasons, and half to make it harder to develop adblockers. Either way, PyMyFlySpy requires Firefox.

Future work - event subscriptions

PyMySkySpy gives us programmatic access to data about our flight. It would be fun to use this to trigger events, like:

  • “For the first half of the flight, only let me open the big report that I need to finish by 5pm today.”
  • “When our location is within 10 miles of the Grand Canyon, send the kids a Slack message to look out the window. Also send me a Slack message to bug them to look out the window.”
  • “If our altitude drops by more than 300ft in 1 second then play a reassuring but really quite urgent sound on all of my devices.”

Next holiday, perhaps.

The flight home

Our flight home was in the late afternoon. We shuffled on board and took off. I pulled out my laptop, connected to the wi-fi, and booted up PyMySkySpy. I turned to my son to show him where we were. I’d shown him the prototype every day for the last week and I though he seemed to be somewhere between “tolerant” and “mildly interested.” But he’d already fallen asleep. I took some screenshots to show him later.

I spent the next few hours monitoring and debugging the recorder to make sure that it stayed up. My two-year-old screamed the whole flight and kept trying to throw himself on the floor. I made supportive faces at my wife across the aisle and pretended to offer to take him, but she shook her head. She knew that this was important.

I watched the graphs. Temperature within normal range. Wind speed stable. Suddenly our altitude dropped by a fifty feet. I wondered if I should tell the pilots. I decided that they probably had it under control. I kept watching, just in case.

PyMyFlySpy is on GitHub.

Generating infinite, age-appropriate Cat Crimes puzzles

2024-09-02 08:00:00

A few weeks ago my 5-year-old and I tried playing Cat Crimes, a puzzle game in which you work out which of your cats ate your shoes. We had a wonderful time - for about 20 minutes.

In each round of Cat Crimes you get a puzzle card with a list of clues on it. You have to use the clues to figure out where in your front room each of your 6 cats were sitting. This tells you which one of them was responsible for your ruined stilettos. The game comes with 40 puzzle cards, ranging from the very easy to the mind-crushingly difficult.

However, the problem is that “very easy” to “mind-crushingly difficult” is a lot of ground to cover in 40 puzzles, and by the fifth puzzle the clues had become too abstract and difficult for my little man. In the first few puzzles each new clue allowed us to immediately place a new cat and then forget about it. For example, a clue might have told us that Mr. Mittens was sitting opposite Pip Squeak. We already knew where Pip Squeak was sitting, so we could work out exactly where Mr. Mittens was sitting too. This is the perfect level of complexity for a small child and his aging father at 6am.

However, as the puzzles get harder the clues stopped neatly resolving like this. They still narrowed down the possible pussy permutations, but they didn’t necessarily allow us to definitively place a new cat straightway. For example, a clue might have told us that Mr Mittens was sitting next to Pip Squeak. We know that Mr Mittens must have been on either Pip Squeak’s left or right, but we couldn’t say for sure which until we’d processed more clues. We might later learn that Duchess was sitting to Pip Squeak’s left, which in turn would tell us that Mr. Mittens must be sitting to her right.

To follow this extended chain of logic you need to hold multiple simultaneous superpositions in your head. This is fun and challenging and good puzzle design, but my kid hasn’t done superpositions at school yet so he didn’t get it. I tried drawing some pictures for him, but they made no sense even to me. We got angry with each other and eventually gave up on the game altogether.

But we’d really had a great time with those first few puzzles, so that evening I wrote us a computer program that generated an infinite number of new beginner level Cat Crimes challenges. I ran it 20 times and printed out the challenges and their solutions. The next day we continued happily solving age-appropriate cat mysteries together.

Downloads

You can download:

The program works by generating random challenges until it finds one that has a single unique solution and meets certain constraints. The constraints ensure that the challenges are easy but not too easy. For example, a maximum of 3 cats can be asleep (meaning that they are out of the round), and a maximum of 2 clues are allowed to tell you a cat’s exact position.

In order to play the puzzles you’ll need to buy the Cat Crimes game.

Good luck, and let me know how you get on!

ChatGPT mode

At first I tried asking ChatGPT to generate puzzles for me. My puzzles are guaranteed to be solvable and probably about the right difficulty, but since they’re randomly generated their solutions don’t generally have much of a careful narrative behind them.

I thought that ChatGPT might be able to do better. “Absolutely!” it said when I asked it, but it kept giving me back puzzles that had either several different solutions or no solutions at all. No dice!

To fix this I added a ChatGPT mode to my tool. In this mode the tool gives you a prompt to paste into ChatGPT. The prompt asks ChatGPT it to give you a Cat Crimes puzzle formatted in a specific way. You paste ChatGPT’s output back into the tool, and the tool checks whether the puzzle is valid. If it is then the tool converts the puzzle into printable card; if it’s not then it prints an error message for you to give to ChatGPT to help it fix the problem. You continue this debugging loop until you have a valid (and hopefully more fun) puzzle.

Disclaimer

I’m not associated with Cat Crimes in any way; this is a completely unofficial fan project. Cat Crimes is owned and published by Thinkfun Inc. Go and buy it from them!

PySkyWiFi: completely free, unbelievably stupid wi-fi on long-haul flights

2024-07-09 08:00:00

The plane reached 10,000ft. I took out my laptop, planning to peruse the internet and maybe do a little work if I got really desperate.

I connected to the in-flight wi-fi and opened my browser. The network login page demanded credit card details. I fumbled for my card, which I eventually discovered had hidden itself inside my passport. As I searched I noticed that the login page was encouraging me to sign in to my airmiles account, free of charge, even though I hadn’t paid for anything yet. A hole in the firewall, I thought. It’s a long way from London to San Francisco so I decided to peer through it.

I logged in to my JetStreamers Diamond Altitude account and started clicking. I went to my profile page, where I saw an edit button. It looked like a normal button: drop shadow, rounded corners, nothing special. I was supposed to use it to update my name, address, and so on.

But suddenly I realised that this was no ordinary button. This clickable rascal would allow me to access the entire internet through my airmiles account. This would be slow. It would be unbelievably stupid. But it would work.

Several co-workers were asking me to review their PRs because my feedback was “two weeks late” and “blocking a critical deployment.” But my ideas are important too so I put on my headphones and smashed on some focus tunes. I’d forgotten to charge my headphones so Limp Bizkit started playing out of my laptop speakers. Fortunately no one else on the plane seemed to mind so we all rocked out together.

Before I could access the entire internet through my airmiles account I’d need to write a few prototypes. At first I thought that I’d write them using Go, but then I realised that if I used Python then I could call the final tool PySkyWiFi. Obviously I did that instead.

Prototype 1: Instant Messaging

Here’s the basic idea: suppose that I logged into my airmiles account and updated my name. If you were also logged in to my account then you could read my new name, from the ground. You could update it again, and I could read your new value. If we kept doing this then the name field of my airmiles account could serve as a tunnel through the airplane’s wi-fi firewall to the real world.

This tunnel could support a simple instant messaging protocol. I could update my name to “Hello how are you.” You could read my message and then send me a reply by updating my name again to “Im fine how are you.” I could read that, and we could have a stilted conversation. This might not sound like much, but it would be the first step on the road to full internet access.

I paid for the internet on my old laptop. I hadn’t finished migrating my data off this computer, so it still had to come everywhere with me. I messaged my wife to ask her to help me with my experiments. no, what are you talking about, i'm busy she replied, lovingly.

So instead I took out my new laptop, which still had no internet access. I created a test airmiles account and logged into it on both computers. I found that I could indeed chat with myself by updating the name field in the UI.

sequenceDiagram
    participant Computer1
    participant AirmilesAccount as Airmiles Account<br>Name Field
    participant Computer2
    
    Computer1->>AirmilesAccount: TYPE: Hello how are you
    AirmilesAccount->>Computer2: READ: Hello how are you
    Computer2->>AirmilesAccount: TYPE: Im fine how are you
    AirmilesAccount->>Computer1: READ: Im fine how are you

This was a lousy user experience though. So I wrote a command line tool to automate it. My tool asked the user for a message, and then behind the scenes it logged into my airmiles account via the website, using my credentials. The tool updated the name field of my test account with the user’s message. It then polled the name field every few seconds to see if my account’s name had changed again, which would indicate that the other person had sent a message back. Once the tool detected a new value it printed that value and asked the user for their next reply, and so on.

sequenceDiagram
    actor Me
    participant AirmilesAccount as Airmiles Account<br>Name Field
    actor You
    
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>You: (no new data)
    Me->>AirmilesAccount: WRITE: Hello how are you
    You->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>You: READ: Hello how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount-->>Me: (no new data)
    You->>AirmilesAccount: WRITE: Im fine how are you
    Me->>AirmilesAccount: (poll for new data)
    AirmilesAccount->>Me: READ: Im fine how are you

Using this tool I could chat with someone on the ground, via my terminal. I wouldn’t have to pay for wifi, and neither of us would have to know or care that the messages were being sent via my SkyVenture Premium Gold Rewards account.

I still needed to find someone who would chat with me. But this was a good start!

NB: at this point I didn’t want to send any more automated data through my airmiles account in case that got me in trouble somehow. Nothing I was doing could possibly cause any damage, but some companies get jumpy about this kind of thing.

I therefore proved to myself that PySkyWiFi would work on my airmiles accounts too by updating my name ten or so times in quick succession. They all succeeded, which suggested to me that my airmiles account probably wasn’t rate-limiting the speed or number of requests I could send to it.

I then wrote the rest of my code by sending my data through friendly services like GitHub Gists and local files on my computer, using the same principles as if I were sending it through an airmiles account. If PySkyWiFi worked through GitHub then it would work through my Star Power UltimateBlastOff account too. This had the secondary advantage of being much faster and easier for iteration too.

I’m going to keep talking about sending data through an airmiles account, because that’s the point I’m trying to make.

Prototype 2: Live headlines, stock prices, and football scores

The tunnel I’d constructed through my airmiles account would be useful for more than IMing. For my next prototype I wrote a program that would run on a computer back at my house or in the cloud, and would automatically send information from the real world up to me on the plane, through my airmiles account. I could deploy it before I left for my next flight and have it send me the latest stock prices or football scores while I was in the sky.

To do this I wrote a daemon that would run on a computer that was on the ground and connected to the internet. The daemon constantly polled the name field in my airmiles account, looking for structured messages that I sent to it from the plane (such as STOCKPRICE: APPL or SCORE: MANUNITED). When the daemon saw a new request it parsed it, retrieved the requested information using the relevant API, and sent it back to me via my airmiles account. It worked perfectly.

Now I could use my first prototype to send IMs through my airmiles account, and I could use my second prototype tio follow the markets and the sports.

It was time to squeeze the entire internet through my airmiles account.

The real thing: PySkyWiFi

During the rest of the flight I wrote PySkyWiFi. PySkyWiFi is a highly simplified version of the TCP/IP protocol that squeezes whole HTTP requests through an airmiles account, out of the plane, and down to a computer connected to the internet on the ground. A daemon running on this ground computer makes the HTTP requests for me, and then finally squeezes the completed HTTP responses back through my airmiles account, up to me on my plane.

This meant that on my next flight I could technically have full access to the internet, via my airmiles account. Depending on network conditions on the plane I might be able to hit speeds of several bytes per second.

DISCLAIMER: you obviously shouldn’t actually do any of this

Here’s how it works (and here’s the source code).


How PySkyWiFi works

PySkyWiFi has two components:

  • The sky proxy - a proxy that runs on your laptop, on a plane
  • The ground daemon - a daemon that runs on a computer connected to the internet, at your home on the ground or in the cloud

Here’s a simplified diagram:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account
    participant GroundDaemon as Ground Daemon
    participant Website as example.com

    Me->>SkyProxy: HTTP request
    SkyProxy->>AirmilesAccount1: HTTP request
    AirmilesAccount1->>GroundDaemon: HTTP request
    GroundDaemon->>Website: HTTP request
    Website->>GroundDaemon: HTTP response
    GroundDaemon->>AirmilesAccount1: HTTP response
    AirmilesAccount1->>SkyProxy: HTTP response
    SkyProxy->>Me: HTTP response

Setup starts before you leave your house. First you start up the ground daemon. Then you get a taxi to the airport, get on the plane, and connect to the plane’s wi-fi network. You boot up the sky proxy on your laptop. Your PySkyWiFi relay is now ready to go.

You use a tool like curl to make an HTTP request to the sky proxy that you’ve started on your laptop. You address your request to the proxy (eg. localhost:1234/) and you put the actual URL that you want to query inside a custom HTTP header called X-PySkyWiFi. For example:

curl localhost:1234 -H "X-PySkyWiFi: example.com"`

The X-PySkyWiFi header will be stripped by the ground daemon and used to route your request to your target website. Everything else about the request (including the body and other headers) will be forwarded exactly as-is.

Once you make your request it will hang for several minutes. If by some miracle nothing breaks then you’ll eventually get back an HTTP response, exactly as if you’d sent the request over the normal internet like a normal person. The only difference is that it didn’t cost you anything. You will now almost certainly pay for wi-fi, because your curiosity has been satisfied and your time on this earth is very short.

Step-by-step

Here’s what happens behind the scenes:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as example.com

    Me->>SkyProxy: curl localhost:1234 \n -H "X-PySkYWiFi: example.com"
    SkyProxy->>AirmilesAccount1: Write request chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 1
    GroundDaemon->>AirmilesAccount2: Ack request chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ack for request chunk 1
    SkyProxy->>AirmilesAccount1: Write request chunk 2
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read request chunk 2
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: example.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>
    GroundDaemon->>AirmilesAccount2: Write response chunk 1
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 1
    SkyProxy->>AirmilesAccount1: Ack request chunk 1
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ack for request chunk 1
    GroundDaemon->>AirmilesAccount2: Write response chunk 2
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read response chunk 2
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html<br><etc>

In order:

  1. The sky proxy receives the HTTP request from your curl call. It splits the request into chunks, because the entire request is too large to fit into you airmiles account in one go
  2. The sky proxy writes each chunk one-by-one to the name field in your airmiles account.
  3. The ground daemon polls your airmiles account. When it detects that the name field has changed to a new chunk, it reads that chunk and sends an acknowledgement to the sender so that the sender knows it’s safe to send the next chunk. The receiver sticks the chunks back together and rebuilds the original HTTP request
  4. Once the ground daemon has received and rebuilt the full HTTP request, it sends the request out over the internet.
  5. The ground daemon receives an HTTP response.
  6. The ground daemon sends the HTTP response up to the sky proxy using the same process as before, in reverse. This time the ground daemon splits the HTTP response up into chunks and writes each chunk one-by-one to the name field in your airmiles account (it actually writes these response chunks using a second airmiles account to make the protocol simpler)
  7. The sky proxy polls the second airmiles account. It reads each chunk and sticks them back together to rebuild the HTTP response
  8. The sky proxy returns the HTTP response to the original call to curl. As far as curl is concerned this is a perfectly normal HTTP response, just a little slow. curl has no idea about the silliness that just transpired

The sky proxy and the ground daemon are relatively simple: they send HTTP requests and parse HTTP responses. The magic is in how they squeeze these requests and responses through an airmiles account. Let’s look closer.

Squeezing HTTP requests through an airmiles account

PySkyWiFi’s communication logic is split into two layers: a transport layer, and a network layer. The transport layer’s job is to decide what data clients should send to each other. It dictates how senders should split up long messages into manageable chunks, as well as how senders and receivers should signal information like “I am ready to receive another chunk.” The PySkyWiFi transport layer is somewhat similar to the TCP protocol that powers much of the internet, if you squint very hard and don’t know much about TCP.

By contrast, the network layer’s job is to actually send data between clients, once the transport protocol has decided what that data should be. It’s vaguely similar to the IP protocol, if you squint even harder and know even less what you’re talking about.

This division of responsibility between layers is useful because the transport layer doesn’t have to care about how the network layer sends its data, and the network layer doesn’t care what the data it sends means or where it came from. The transport layer just hands the network layer some data, and the network layer sends it however it likes.

This separation makes it easy to add support for new airmiles platforms, because all we have to do is implement a new network layer that reads and writes to the new type of airmiles account. This separation also allows us to write test versions of the network protocol that write and read from local files instead of airmiles accounts. In each case the network layer changes, but the transport layer stays exactly the same. Here’s how they work.

The transport layer

A PySkyWiFi transport connection between two clients consists of two “pipes” (or “airmiles accounts”). Each client has a “SEND” pipe that it can write data to, and a “RECV” pipe that it can read from. Clients write to their SEND pipe by writing data to it, and they read from their RECV pipe by constantly polling it and seeing if anything has changed.

flowchart LR
    Client1 --> Client2
    Client2 --> Client1

From the transport layer’s point of view, a pipe is just something that it can write and read data from. Beyond that the transport layer doesn’t care how its pipes work.

At any given moment a PSWF (PySkYWiFi) client can only either send or receive data, but not both. A client in send mode will not see data sent by the other client, and a client in receive mode should never send data because the other client won’t see it. This is unlike TCP, where clients can send or receive data at ay time.

When squeezing HTTP requests and responses through an airmiles account, the sky proxy sends the first message and the ground daemon receives it. Once the sky proxy has finished sending its HTTP request it switches to receive mode and the ground daemon switches to send. The ground daemon makes the HTTP request and sends back the response, at which point the two switch roles again so that the sky proxy can send another HTTP request.

How are long messages sent through such a small pipe?

PSWF uses small pipes (such as an airmiles name field) that can’t fit much data in them at once. This means that it takes some work and care to squeeze long messages (like HTTP requests) through them.

To send a long message, the sender first splits up their message into chunks that will fit into their SEND pipe. They then send each chunk down the pipe one at a time.

To begin a message, a sender starts by sending its first chunk of message data inside a DATA segment:

A DATA segment consists of:

  • The letter D
  • The sequence number of the chunk (a number that uniquely identifies the chunk, padded to 6 digits)
  • The actual chunk of data.

For example, a data segment in the middle of a message might read: D000451adline": "Mudslide in Wigan causes m

Once the sender has sent a DATA segment, it pauses. It wants to send its next DATA segment, but it can’t overwrite the airmiles account name field until it knows that the receiver has received and processed the previous one.

The receiver tells the sender that it’s safe for to send a new DATA segment by acknowledging every segment that it reads. The receiver does this by writing an ACK segment to its own SEND pipe:

An ACK segment consists of:

  • The letter A
  • The sequence number of the segment that is being acknowledged (padded to 6 digits)

For example: A000451

The sender is constantly polling its own RCV pipe to check for changes, and so it reads this new ACK segment promptly. Once the sender reads the ACK, it knows that the receiver has received the segment corresponding to the ACK’s sequence number. For example, if a sender receives an ACK segment with sequence number 000451, the sender knows that it’s safe to send the next DATA segment with sequence number 000452. The sender therefore pulls the next chunk from its message and constructs a new DATA segment using this chunk and sequence number. The sender writes the new segment to its SEND pipe, and then pauses waits for another ACK.

This loop continues until the sender has sent all the data in its message. To tell the recipient that it’s finished, the sender sends an END segment.

An END segment is just the letter E.

When a receiver sees an END segment it knows that the sender’s message is over. The sender and the receiver swap roles. The old sender starts polling its RECV pipe for DATA segments, and the old receiver starts chunking up its response message and sending it through its pipe, exactly as before.

None of this transport logic cares about the details of the network layer through which the segments are sent. The transport layer just needs the network layer to provide two pipes that it can read and write to. The network layer can pipe this data around via local files, a Discord profile, or an airmiles account. This genericness is what allows PySkyWiFi to work with any airline’s airmiles account, so long as the airline allows you to login to it from the plane without paying.

Here’s how PSWF uses transport protocol segments to exchange long messages:

sequenceDiagram
    actor Me
    participant SkyProxy as Sky Proxy
    participant AirmilesAccount1 as Airmiles Account 1<br>Name Field
    participant AirmilesAccount2 as Airmiles Account 2<br>Name Field
    participant GroundDaemon as Ground Daemon
    participant Website as robertheaton.com

    Me->>SkyProxy: curl localhost:1234 \n -H "X-PySkYWiFi: robertheaton.com"
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000000:<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000000:<br>contents=`GET / HTTP/1.1 X-PySkyW`
    GroundDaemon->>AirmilesAccount2: Write ACK segment<br>sequence number=000000
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read ACK segment<br>sequence number=000000
    SkyProxy->>AirmilesAccount1: Write DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read DATA segment<br>sequence number=000001<br>contents=`iFi: www.robertheaton.co`
    Note over SkyProxy,GroundDaemon: Repeat until the whole HTTP request has been transferred
    GroundDaemon->>Website: GET / HTTP/1.1<br>Host: robertheaton.com<br><etc>
    Website->>GroundDaemon: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>
    GroundDaemon->>AirmilesAccount2: Write DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy-->>AirmilesAccount2: (poll for new data)
    AirmilesAccount2->>SkyProxy: Read DATA segment<br>sequence number=000000<br>contents=HTTP/1.1 200 OK\nCont
    SkyProxy->>AirmilesAccount1: Write ACK segment<br>sequence number=000000
    GroundDaemon-->>AirmilesAccount1: (poll for new data)
    AirmilesAccount1->>GroundDaemon: Read ACK segment<br>sequence number=000000
    Note over GroundDaemon,SkyProxy: Repeat until the whole HTTP response has been transferred
    SkyProxy->>Me: HTTP/1.1 200 OK<br>Content-Type: text/html, charset=UTF-8<br><etc>

The transport layer decides what data the clients should send each other, but it doesn’t say anything about how they should send it. That’s where the network protocol comes in.

The network layer

The network layer’s job is to send data between clients. It doesn’t care about where the data came from or what it means; it just receives some data from the transport layer and sends it to the other client (typically via an airmiles account).

This means that the network layer is quite simple. It also means that adding a new network layer for a new airmiles platform is straightforward. You use the new platform to implement a few operations and a few properties (see below), and then the transport layer can automatically to use your new airmiles platform with no extra work.

A network layer consists of two operations:

  • send(msg: str) - write msg to storage. For an airmiles-based implementation, this writes the value of msg to the name field in the user’s airmiles account
  • recv() -> str - read the message from storage. For an airmiles-based implementation, this reads the value of the name field from the user’s airmiles account.

A network layer implementation must also define two properties:

  • sleep_for - the number of seconds that the transport layer should sleep for in between polling for new segments from a RECV pipe. sleep_for can be very low for test implementations like files, but it should be at least several seconds for an implementation like an airmiles account. This is in order to avoid hammering remote server with too many requests.
  • segment_data_size - the number of characters that the transport layer should send in a single segment. Should be equal to the maximum size of the airmiles account field being used to transfer segments (often around 20 characters).

A network layer implementation can also optionally provide two more operations:

  • connect_send() - a hook called by the sender when a SEND pipe is initialised. In an airmiles-based implementation this allows the client to login to the platform using a username and password. This gives the client a cookie that it can use to authenticate future send and recv calls.
  • connect_recv() - a hook called by the receiver when a RECV pipe is initialised

If you fill in all these methods, you’ll be able to use PySkyWiFi on a new airline. But again, don’t.

Tips and tricks

When writing a network layer that uses a new airmiles provider, there are a couple of tricks that can make your implementation faster and more reliable.

1. Encode messages to make sure the airmiles account accepts them

Airmiles HTML forms usually don’t let users include non-alphabetic characters in their name. Stephen will probably be allowed, but GET /data?id=5 will probably be rejected.

To work around this, the network layer should encode segments using base26 before writing them to an airmiles account. base26 is a way of representing a string using only the letters A to Z . In order to convert a byte string to base26, you convert the bytes to a single large number, then you represent that number using a counting system with base 26 (hence the name) where the digits are the letters A to Z.

def b26_encode(input_string: str) -> int:
    # Convert input string to a base-256 integer
    base256_int = 0
    for char in input_string:
        base256_int = base256_int * 256 + ord(char)
    
    # Convert base-256 integer to base26 string
    if base256_int == 0:
        return 'A'  # Special case for empty input or input that equals zero
    
    base26_str = ""
    while base256_int > 0:
        base26_str = chr(base256_int % 26 + 65) + base26_str
        base256_int //= 26
    
    return base26_str

b26_encode("Hello world")
# => 'CZEZINADXFFTZEIDPKM'

The transport layer never needs to know about this encoding. The network layer receives some bytes, encodes them using base26, and writes this encoded string of A to Z to the airmiles account. When the network layer reads the base26 value back out of the airmiles account, it decodes the encoded string back into a number and then back into bytes, and then returns those bytes to the transport layer.

Encoding a string using base 26 makes it significantly longer, just like how it takes many more digits to represent a number using binary than decimal. This reduces the bandwidth of our protocol. We could increase our bandwidth by using base52 (using both upper- and lower-case letters) instead of base26, which would shorten it somewhat. This is left as an enhancement for version 2.

2. Increase bandwidth by using more account fields

Another way to increase our PSWF bandwidth is to increase the segment size that a network layer can handle. If we double the size of our segments, we double the bandwidth of our protocol.

Fields in airmiles accounts usually have length limits. For example, you might not be allowed to set a name longer than 20 characters. However, we can maximise our bandwidth by:

  1. Using the full length of the field
  2. Spreading out a segment across multiple fields

Suppose we have control over 5 fields that can each store 20 characters. Instead of using one field to transmit segments of 20 characters, we can split a 100 character segment into 5 chunks of 20 and update them all at once in a single request. The receiver can then read all 5 fields, again in a single request, and stitch them back together to reconstruct the full segment.

Further enhancements

HTTP CONNECT

It would be better if PySkyWiFi used HTTP CONNECT requests to set up the tunnel from the sky proxy to the target site, instead of manually tossing around HTTP requests. CONNECT requests are how most HTTP proxies work, and using them would allow PySkyWiFi to act as the system-level proxy and so handle requests from a web browser. It would also mean that PySkyWiFi would negotiate TLS connections with the target website directly, so its traffic would be encrypted as it passed through the airmiles account.

On the other hand, using CONNECT would also be a lot more work and I’ve already taken this joke way too far.

In conclusion

When I was done with all of this I used PySkyWiFi to load the homepage of my blog using curl, tunneling the data via a GitHub Gist. Several minutes later I got a response back. I scrolled around the HTML and reflected that this had been both the most and least productive flight of my life.

(PySkyWiFi source code here)