
The surreal joy of having an overprovisioned homelab

2025-03-25 08:00:00

I like making things with computers. There’s just one problem with computer programs: they have to run somewhere. Sure, you can just spin up a new VPS per project, but that gets expensive, and most of my projects are very lightweight. I run most of them at home with the power of floor desktops.

Tonight I’ll tell you what gets me excited about my homelab and maybe inspire you to make your own. I'll get into what I like about it and clue you into some of the fun you get to have if one of your projects meant to protect your homelab goes hockey-stick.

Want to watch this in your video player of choice? Take this:
https://cdn.xeiaso.net/file/christine-static/talks/2025/surreal-joy-homelab/index.m3u8
Cadey is enby
Cadey

For this one, a lot of the humor works better in the video.

The title slide with the name of the speaker, their sigil, and contact info for the speaker.

Hi everyone! I’m Xe, and I’m the CEO of Techaro, the anti-AI AI company. Today I’m gonna talk with you about the surreal joy of having an over-provisioned homelab and what you can do with one of your own. Buckle up, it’s gonna be a ride.

So to start, what’s a homelab? You may have heard of the word before, but what is it really?

A homelab is a playground for devops.

It’s a playground for devops. It’s where you can mess around to try and see what you can do with computers. It’s where you can research new ways of doing things, play with software, and more. Importantly though, it’s where you can self-host the things that are the most precious to you. Online platforms are vanishing left and right these days. It’s a lot harder for platforms that run on hardware you can look at to go away without notice.

An about the speaker slide explaining Xe's background.

Before we continue though, let’s cover who I am. I’m Xe. I live over in Orleans with my husband and our 6 homelab servers. I’m the CEO of the totally real company Techaro. I’m an avid blogger that’s written Architect knows how many articles. I stream programming crimes on Fridays.

The agenda slide covering all the topics that are about to be listed below

Today we’re gonna cover:

  • What a homelab is
  • What you can run on one
  • A brief history of my homelab
  • Tradeoffs I made to get it to its current form
  • What I like about it

Finally I’ll give you a stealth mountain into the fun you can have when you self host things.

A disclaimer that this talk is going to be funny

Before we get started though, my friend Leg Al told me that I should say this.

This talk may contain humor. Upon hearing something that sounds like it may be funny, please laugh. Some of the humor goes over people’s heads and laughing makes everyone have a good time.

Oh, also, any opinions are my own and not the opinions of Techaro.

Unless it would be funny for those opinions to be the opinions of Techaro, then it would be totally on-brand.

A pink haired anthropomorphic orca character whacking the hell out of a server rack with the text 'Servers at home' next to it in rather large text

But yes, tl;dr: when you have servers at home, it’s a homelab. They come in all shapes and sizes, from single mini PCs from Kijiji to actual rack-mount infrastructure in a basement. The common theme though is experimentation and exploration. We do these things not because they are easy, but because they look like they might be easy. Let’s be real, they usually are easy, but you can’t know for sure until you’ve done it, right?

What I run

In order to give you ideas on what you can do with one, here’s what I run in my homelab. I use a lot of this all the time. It’s just become a generic place to put things with relative certainty that they’ll just stay up. I also use it to flex my SRE muscle because working in marketing has started to atrophy that and I do not want to lose that skillset.

The Plex logo with a green haired gremlin wearing a pirate hat sticking out behind it

One of the services I run is Plex which lets me—Wait, what, how did you get there?...One second.

The Plex logo

Like I was saying, one of the services I run is Plex which lets me watch TV shows and movies without having to go through flowcharts of doom to figure out where to watch them.

Numa is smug
Numa

Remember: it’s a service problem.

The Pocket-ID homepage

One of the best things I set up was pocket-id, an OIDC provider. Before your eyes glaze over, here’s what you should think.

'One ring to rule them all' with 'ring' hastily replaced with 'account'

A lot of the time with homelabs and self-hosted services you end up making a new account, admin permission flags, group memberships, and profile pictures for every service. This sucks and does not scale. Something like Pocket ID lets you have one account to rule them all. It’s such a time-saver.

Cadey is coffee
Cadey

I wish I’d set one up a long time ago.
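To make that concrete, here’s a rough sketch of one common way to wire a self-hosted app up to an OIDC provider like Pocket ID: put oauth2-proxy in front of it and point the proxy at your identity provider. The hostnames, client IDs, and secret names below are placeholders, and this isn’t necessarily how my cluster does it.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: somewebapp
spec:
  replicas: 1
  selector:
    matchLabels: {app: somewebapp}
  template:
    metadata:
      labels: {app: somewebapp}
    spec:
      containers:
        # oauth2-proxy sits in front of the app and handles the OIDC dance
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:latest
          args: ["--http-address=0.0.0.0:4180"]
          env:
            - name: OAUTH2_PROXY_PROVIDER
              value: oidc
            - name: OAUTH2_PROXY_OIDC_ISSUER_URL
              value: https://id.example.com # your Pocket ID instance (placeholder)
            - name: OAUTH2_PROXY_CLIENT_ID
              value: somewebapp
            - name: OAUTH2_PROXY_CLIENT_SECRET
              valueFrom:
                secretKeyRef: {name: somewebapp-oidc, key: client-secret}
            - name: OAUTH2_PROXY_COOKIE_SECRET
              valueFrom:
                secretKeyRef: {name: somewebapp-oidc, key: cookie-secret}
            - name: OAUTH2_PROXY_UPSTREAMS
              value: http://127.0.0.1:8080 # the actual app, listening on localhost in the same pod
            - name: OAUTH2_PROXY_EMAIL_DOMAINS
              value: "*"
        # the app itself, only reachable through the proxy
        - name: web
          image: ghcr.io/example/somewebapp:latest # placeholder image
          ports: [{containerPort: 8080}]

Every service behind a setup like this gets the same login screen and the same set of users, which is the whole point.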

A screenshot of my homelab's Gitea server

I also run a git server! It’s where Techaro’s super secret projects like the Anubis integration jungle live.

A screenshot of the github actions self hosted runner docs

I run my own GitHub Actions runners because let’s face it, who would win: free cloud instances that are probably oversubscribed or my mostly idle homelab 5950x’s?
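For the curious, pointing a workflow at those runners is just a label change. A minimal sketch: the labels depend on how you registered your runners, and the build step is a placeholder.

name: build
on: [push]
jobs:
  build:
    # self-hosted plus whatever labels your runners registered with
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
      - run: make build

Everything else about the workflow stays the same; the jobs just land on the mostly idle 5950Xs instead of a shared cloud runner.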

A screenshot of the Longhorn UI showing 42.6 terabytes of storage available to use

One of the big things I run is Longhorn, which spreads out the storage across my house. This is just for the Kubernetes cluster, the NAS has an additional 64-ish terabytes of space where I store my tax documents, stream VODs, and…Linux ISOs.
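From the workload side, none of that distribution is visible: a pod just asks for a PersistentVolumeClaim with the longhorn storage class and Longhorn figures out where the replicas live. A minimal sketch, with a made-up claim name and size:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitea-data # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 50Gi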

Logos for tools like ingress-nginx, external-dns, cert manager, let's encrypt, and Docker

Like any good cluster I also have a smattering of support services like cert-manager, ingress-nginx, a private docker registry, external-dns, a pull-through cache of the docker hub for when they find out that their business model is unsustainable because nobody wants to pay for the docker hub, etc. Just your standard Kubernetes setup sans the standard “sludge pipe” architecture.
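As an example of what that support tier looks like in practice, here’s roughly the shape of a cert-manager ClusterIssuer that hands out Let’s Encrypt certificates through ingress-nginx. The email and secret name are placeholders, not my actual config:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx

Once something like this exists, every Ingress that asks for TLS gets a certificate minted and renewed without anyone having to think about it.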

A screenshot showing proof of Eric Chlebek coining the term sludge pipe architecture

By the way, I have to thank my friend Eric Chlebek for coming up with the term “sludge pipe” architecture to describe modern CI/CD flows. I mean look at this:

A screenshot of ArgoCD showing off the standard sludge pipe architecture

You just pipe the sludge into git repos and it flows into prod! Hope it doesn’t take anything out!

A smattering of the webapps I host in my homelab

I’ve also got a smattering of apps that I’ve written for myself over the years, including but not limited to the hlang website, Techaro’s website, the Stealth Mountain feed on Bluesky, a personal API that’s technically part of my blog’s infrastructure, the most adorable chatbot you’ve ever seen, a bot to post things on a subreddit to Discord for a friend, and Architect knows how many other small experiments.

The history of my homelab

Like I said though, you don’t always need to start out with a complicated multi-node system with distributed storage. Most of the time you’ll start out with a single computer that can turn on. I did.

A picture of my 2012 trash can mac pro on my desk

I started out with this: a trash can Mac Pro that was running Ubuntu. I pushed a bunch of strange experiments to it over the years and it’s where I learned how to use Docker in anger. It’s been a while and I lost the config management for it, but I’m pretty sure it ran bog-standard Docker Compose with a really old version of Caddy. I’m pretty sure this was the machine I used as my test network when I was maintaining an IRC network. Either way, 12 cores and 16 GB of RAM went a long way in giving me stuff to play with. This lasted me until I moved to Montreal in mid-2019. It’s now my Prometheus server.

Then in 2020 I got the last tax refund I’m probably ever going to get. It was about 2.2 thousand snow pesos and I wanted to use it to build a multi-node homelab cluster. I wanted to experiment with multi-node replicated services without Kubernetes.

A triangular diagram balancing wattage, cost, and muscle

When I designed the nodes, I wanted to pick something that had a balance of cost, muscle, and wattage. I also wanted to get CPUs that had built-in PCI to HDMI converters in them so I can attach a “crash cart” to debug them. This was also before the AI bubble, so I didn’t have langle mangles in mind. I also made sure to provision the nodes with just enough power supply overhead that I could add more hard drives, GPUs, or whatever else I wanted to play with as new shiny things came out.

A picture of three of my homelab nodes, kos-mos, ontos, and pneuma

Here’s a few of them on screen; from left to right that is kos-mos, ontos, and pneuma. Most of the nodes have 32 GB of RAM and a Core i5-10600 with 12 threads. Pneuma has a Ryzen 5950X (retired from my husband’s gaming rig when he upgraded to a 7950X3D) and 64 GB of RAM. Pneuma used to be my main shellbox until I did the big Kubernetes changeover.

Not shown are Logos and Shachi. Shachi is my old gaming tower and has another 5950x in it. In total this gives me something like 100 cores and 160 GB of RAM. This is way overkill for my needs, but allows me to basically do whatever I want. Don’t diss the power of floor desktops!

Eventually, Stable Diffusion version 1 came out and then I wanted to play with it. The only problem was that it needed a GPU. Luckily we had an RTX 2060 laying around and I was able to get it up and running on Ontos. Early Stable Diffusion was so much fun. Like look at this.

An AI generated illustration of a figure that vaguely looks like Richard Stallman having a great time with an acid trip in the forest

The prompt for this was “Richard Stallman acid trip in a forest, Lisa frank 420 modern computing, vaporwave, best quality”. This was hallucinated, pun intended, on Ontos’ 2060. I used that 2060 for a while but then bigger models came out. Thankfully I got a job at a cloud provider so I could just leech off of their slack time. But I wanted to get langle mangles running at home so Logos got an RTX 3060 to run Ollama.

A badly photoshopped screenshot of The End of Evangelion with a certain Linux distribution's logo over Rei's face while Shinji and Asuka look on in horror at the last sunset humanity will ever see

At a certain point though, a few things happened that made me realize that I was going off course for what I wanted. My homelab nodes weren’t actually redundant like I wanted. The setup I used had me allocate tasks to specific nodes, and if one of them fell over I had to do configuration pushes to move services around. This was not according to keikaku.

Numa is smug
Numa

By the way, translator’s note: keikaku means plan.

Then the distribution I was using made…creative decisions in community management and I realized that my reach as a large-scale content creator (I hate that term) and blogger meant that by continuing to advocate for that distro in its current state, I was de-facto harming people. So then I decided to look for something else.

The Kubernetes logo

Let’s be real, the kind of things I wanted out of my homelab were literally Kubernetes shaped. I wanted a bunch of nodes that I could just push jobs to and let the machine figure out where it lives. I couldn’t have that with my previous setup no matter how much I wanted because the tools just weren’t there to do it in real life.

A screenshot of my 'Do I need Kubernetes?' post

This was kind of a shock, as previously I had been on record saying that you don’t in fact need Kubernetes. At the time I gave this take though, there were other options. Docker Swarm was still actively in development. Nomad was a thing that didn’t have any known glaring flaws other than being, well, Nomad, and Kubernetes was really looking like an over-engineered pile of jank.

It really didn’t help that one of my past jobs was to create a bog-standard sludge pipe architecture on AWS and Google Cloud way before cert-manager was stable. Ingress-nginx was still in beta. Everything was in flux.

Instructions on how to use hand dryers, but with the text 'Push button, receive bacon' under each step

Kubernetes itself was fine, but it was not enough to push button and receive bacon and get your web apps running somewhere. I get that’s not the point of Kubernetes per se, it scales from web apps to fighter jets, but at the end of the day you gotta ship something, right?

It really just burnt me out and I nearly left the industry at large as a result of the endless churn of bullshit. The admission that Kubernetes was what I needed really didn’t come easy. It was one of the last things I wanted to use; but with everything else either dying out from lack of interest or having known gaping flaws show up, it’s what I was left with.

Then at some point I thought, “eh, fuck it, what do I have to lose” and set it up. It worked out pretty great actually.

A screenshot of a Discord conversation where someone asks me what I think about Kubernetes after using it for a while, I reply 'I don't hate it'

After a few months someone in the patron discord asked me what I thought about Kubernetes in my homelab after using it for a while and my reply was “It’s nice to not have to think about it”. To be totally honest, as someone with sludge pipe operator experience, “it’s nice to not have to think about it” is actually high praise. It just kinda worked out and I didn’t have to spend too much time or energy on it modulo occasional upgrades.

What I like about it

And with that in mind, here’s what I really like about my homelab setup as it is right now.

I can just push button and receive bacon. If I want to run more stuff, I push it to the cluster. If I want to run less stuff, I delete it from the cluster. Backups happen automatically every night. The backup restore procedure works. Pushing apps is trivial. Secrets are integrated with 1Password. Honestly, pushing stuff to my homelab cluster is so significantly easier than it’s ever been at any company I’ve ever worked at. Even when I was a sludge pipe operator.
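If your storage lives in Longhorn like mine does, one way to get those nightly backups is Longhorn’s RecurringJob resource. The schedule and retention below are examples rather than my actual settings, and the exact fields depend on your Longhorn version, so check its docs:

apiVersion: longhorn.io/v1beta1
kind: RecurringJob
metadata:
  name: nightly-backup
  namespace: longhorn-system
spec:
  task: backup        # back volume data up to the configured backup target
  cron: "0 3 * * *"   # every night at 03:00
  groups: ["default"] # applies to volumes in the default group
  retain: 7           # keep a week of backups
  concurrency: 2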

One of the best parts is that I haven’t really had to fight it. Stuff just kinda works and it’s glorious. My apps are available internally and externally and I don’t really have to think too much about the details.

Of course, I didn’t just stop there. I took things one step further and realized that across my /x/ repo I had a bunch of services that fall into a few basic patterns:

  • The first generic shape of service is the headless bot that just does a thing like monitor an RSS feed and poke a web hook somewhere. This only really needs a Deployment to manage the versions of the container images and maybe some secrets for API keys or the like.
  • Second, I need to run programs that listen internally and serve API calls. Maybe they have some persistent storage. Either way, they definitely need a DNS name within the cluster so other services can use that API to do things like post messages on IRC.
  • Third, some of the things I run are web apps. Web apps are pretty much the same as the second shape, but they need a DNS name outside the cluster and a way to get HTTP ingress routed to the pod. I use nginx for that, but the configuration can be a bit fiddly and manual (see the sketch right after this list for what that shape looks like written out by hand). It’d be nice to hyper-automate it so that I don’t have to think about the details, I just think about the App.
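To show what “fiddly and manual” means, here’s roughly what that third shape expands to when you write it out yourself: a Deployment, a Service, and an Ingress, with annotations so cert-manager and external-dns do their thing. The container port and issuer name are assumptions on my part:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpdebug
spec:
  replicas: 1
  selector:
    matchLabels: {app: httpdebug}
  template:
    metadata:
      labels: {app: httpdebug}
    spec:
      containers:
        - name: web
          image: ghcr.io/xe/x/httpdebug:latest
          ports: [{containerPort: 8080}] # assumed port
---
apiVersion: v1
kind: Service
metadata:
  name: httpdebug
spec:
  selector: {app: httpdebug}
  ports: [{port: 80, targetPort: 8080}]
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpdebug
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod # assumed issuer name
spec:
  ingressClassName: nginx
  rules:
    - host: httpdebug.xelaso.net
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: httpdebug
                port: {number: 80}
  tls:
    - hosts: [httpdebug.xelaso.net]
      secretName: httpdebug-tls

Multiply that by every web app you host and you can see why I wanted a single App resource instead.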

I was really inspired by Heroku’s setup back when I worked there. With Heroku you just pushed your code and let the platform figure it out. Given that I had a few known “shapes” of apps, what if I just made my own resources in Kubernetes to do that?

apiVersion: x.within.website/v1
kind: App
metadata:
  name: httpdebug

spec:
  image: ghcr.io/xe/x/httpdebug:latest
  autoUpdate: true

  ingress:
    enabled: true
    host: httpdebug.xelaso.net

So I did that, thanks to Yoke. I just define an App, and it creates everything downstream for me. 1Password Secrets can be put in the filesystem or the environment. Persistent storage is a matter of saying where to mount it and how much I want. HTTP ingresses are a simple boolean flag with the DNS name. External DNS records, TLS certificates, and the whole nine yards is naught but an implementation detail. A single flag lets me create a Tor hidden service out of the App so that people can view it wherever they want in the world without government interference. I can add Kubernetes roles by just describing the permissions I want. It’s honestly kind of amazing.

A screenshot of the Techaro Bluesky account ominously posting about HyperCloud

This is something I want to make more generic so that you can use it too, I’ll get to it eventually. It’s in the cards.

Learning to play defense

In the process of messing with my homelab, I’ve had to learn to play defense.

Numa is smug
Numa

This isn’t something that the Jedi will teach you, learning how to do this is much more of a Sith legend.

Something to keep in mind though: I have problems you don’t. My blog gets a lot of traffic in weird patterns. If it didn’t, I’d run it at home, but it does so I have to host it in the cloud. However, remember that git server? Yeah, that runs at home.

A brown haired anime catgirl running away from a swarm of bots, generated with Flux [schnell]

When you host things on the modern internet, bots will run in once the cert is minted and start pummeling the hell out of it. I like to think that the stuff I make can withstand this, but some things just aren’t up to snuff. It’s not their fault mind you, modern scraper bots are unusually aggressive.

Honestly it feels like when modern scrapers are designed, they have these goals in mind:

Numa is smug
Numa
  • Speed up requests when the server is overloaded, because if it’s returning responses faster it must be able to handle more traffic, right?
  • Oh and if the server is responding with anything but 200, just retry that page later. It’ll be fine, right?
  • Not to mention, those Linux kernel commits from 15 years ago may have changed since you last looked, so why not just scrape everything all over again a few days later?
  • Caches? That requires more code. We gotta ship fast and iterate. We can’t spend time downloading git repositories or caching the etags. That’ll slow us down!
  • Oh, they’re blocking our datacenter IP addresses? No problem! We’ll just cycle through sketchy residential proxy services so that they just think it’s a bunch of people using normal chrome to fetch unusual amounts of webpages.

What could go wrong? Pass me the booch yo.

A smug green haired anime woman telling you to not use VPNs

By the way, public service announcement. Don’t use VPNs unless you have a really good reason. Especially don’t use free VPNs. Those sketchy residential proxy services are all powered by people using free VPNs. If you aren’t a customer, you are the product.

What makes this worse is that git servers are the most pathologically vulnerable to the onslaught of doom from modern internet scrapers because remember, they click on every link on every page.

A screenshot of a webpage with about 50 billion yellow tags highlighted, each is a clickable link

See those little yellow tags? Those are all links. Do the math. There’s a lot of them. Not to mention that git packfiles are stored in compressed files which can’t seek. Every time they open every link on every page, they go deeper and deeper into uncached git pack file resolution because let’s face it, who on this planet is going out of their way to look at every file in every commit of GTK from 2004 and older? Not many people, it turns out!

And that’s how Amazon’s scraper took out my Git server. I tried some things and they didn’t work, including but not limited to things I can’t say in a recording. I debated taking it offline completely and just having the stuff I wanted to expose publicly be mirrored on GitHub. That would have worked, but I didn’t want to give up. I wanted to get even.

Then I had an idea. Raise your hand if you know what I do enough to know how terrifying that statement is.

More of you than I thought.

Somehow I ended up on the wikipedia page for weighing of souls. Anubis, the god of the underworld, weighed your soul and if it was lighter than a feather you got to go into the afterlife. This felt like a good metaphor.

A screenshot of Anubis' readme, showing a brown haired jackal waifu looking happy and successful

And thus I had a folder name to pass to mkdir. Anubis weighs the soul of your connection using a SHA256 proof-of-work challenge in order to protect upstream resources from scraper bots. This was a super nuclear response, but remember, this was the state of my git server:

A server that was immolated by fire

I just wanted uptime, man.

Either way, the absolute hack I had worked, so I put it on GitHub. Honestly, when I’ve done this before it got ignored. So I just had my 4080 dream up some placeholder assets, posted a blog post about it, and went back to playing video games.

Then people started using it. I put it in its own repo and posted about it on Bluesky.

Screenshots of people raving about Anubis

I wasn’t the only one having this problem it seems! It’s kinda taking off! This is so wild and not the kind of problem I usually have.

The GitHub star count graph going hockey-stick

Like the graphs went hockey stick.

The GitHub star count graph going even more hockey-stick

Like really hockey-stick.

The GitHub star count graph continuing to be a hockey-stick

It just keeps going up and it’s not showing any signs of stopping any time soon.

Anubis' GitHub star count compared to my other big projects

For context, here it is compared to my two biggest other projects. It's the mythical second Y axis graph shape. So yeah, you can understand that it’s gonna take a while to circle back to the Techaro HyperCloud.

The cool part about this in my book though is that because I had a problem that was only exposed with the hardware my homelab uses (specifically because my git server was apparently running on rotational storage, oops), I got creative, made a solution, pushed it to GitHub, and now it’s in use to protect GNOME’s GitLab, SourceHut, small community projects, god knows how many git forges, and I’ve heard that basically every major open source project that self-hosts infrastructure is evaluating it to protect their websites too. I really must have touched a nerve or something.

Conclusion

In conclusion:

If you like it, you should self-host it. Online services are vanishing so frequently. Everything is centralizing around the big web and it makes me afraid for what the future of the small Internet could look like should this continue.

Anubis looking pensive next to 'Think small'

Think small. A single node with a 2012 grade CPU and 16 gigabytes of dedotated wam lasted me until 2019. When I get a computer, I use the whole computer. If it’s fine for me, it’s more than enough for you.

A smug green haired anime woman telling you to fuck around and find out, but not as a threat

Fuck around and find out. That’s not just a threat. That’s a mission statement.

Remember that if you get an idea, fuck around, find out, and write down what you’ve learned: you’ve literally just done science. Well, with computers, so it’d be computer science, but you get my point.

And if bots should come in and start a-pummeling away, remember: you’re not in the room with them. They’re in the room with you. Remember Slowloris? A little birdie told me that it works server to client too. Consider that.

The GReeTZ / special thanks slide

My time with you is about to come to an end, but before we go, I just want to thank everyone on this list. You know what you did. If you’re not on this list, you know what you didn’t do.

The conclusion slide with more contact info

And with that, I've been Xe! I'll be around if you have questions or want stickers. Stay warm!

If I don’t get to you, please email your questions to [email protected]. With all that out of the way, does anyone have any questions?

I'm testing Anubis in prod

2025-03-20 08:00:00

Hey all!

Anubis has really been taking off to the point that it has its own repo now. I'm going to be doing more work on it, but for right now what I really need is data. In order to get this data, I need you to let me know what I just broke by turning on Anubis in prod.

What I know broke:

  • Discord link resolving (still working on fixing this, but I wanted to get this post out first)
  • Twitter link resolving

If I missed something, contact me.

Opsec and you: how to navigate having things to hide

2025-03-13 08:00:00

It feels like privacy has become "impossible", hasn't it? What does it mean to actually be "private" these days? Who are you defending against? What do you want to do in order to mitigate it? And more importantly, how do you do this without giving up the conveniences of modern life?

In this talk, I'll be covering the finer points of operational security (opsec), knowing your threat model, building your own infrastructure to self-host things that are important to you with discarded hardware, and how to "blend in" when traveling or even at home. It's all about balance and figuring out what your needs are. My needs are certainly a lot different than yours are. This is a nuanced topic and I am not going to pretend there isn't any nuance.

Want to watch this in your video player of choice? Take this:
https://cdn.xeiaso.net/file/christine-static/talks/2025/opsec-and-you/index.m3u8
The title slide with the title 'Opsec and you: how to navigate having something to hide' and speaker information.

Hi, I'm Xe. You probably know me from my blog. Today, I'm gonna give a talk that I really wish I didn't have to give. In a sane or just world, I wouldn't need to have this talk exist; however, we know what world we got and I'm here, so today I'm gonna talk about operational security or opsec.

Opsec in rather large text.

Opsec is a somewhat multifaceted topic, but it really boils down to making sure you keep yourself safe online.

It’s really easy to go down the online privacy rabbit hole and way past Narnia. This is fundamentally a game of balancing your authentic expression with how much information you share. Again, it sucks that we have to have this conversation, but I’d really much rather y’all have the tools to protect yourselves.

The agenda slide for the talk.

Today, I’m gonna cover the basics of what opsec is, give you practical tips on how to protect yourself online, how to control what you can, be aware of the things you can’t, show you the tools you can use today to keep yourself safe, and give you tips on how you can set up your own online infrastructure so that you can have real privacy online.

About the speaker slide.

Before we get into all that though, I’m Xe. I’m the CEO of Techaro, which is a totally real company that actually exists. I’ve written god knows how many articles and I’ve worked at a smattering of companies. Some of them you know, most of them you don’t. I live in Ottawa with my husband and my 6 homelab servers.

'Opsec 101' in rather large text.

So, let’s talk about opsec. Today I’ll start out with what it means. Perfect security is impossible. Any actions you take are compromises. Sure in theory you can just become a hermit and live away from society, but that makes it difficult to do things like attend conference talks or post on social media. Like I said, it’s all about compromises and balance. Unless you're a citizen of Germany, in which case you can actually have real privacy online, asterisk.

Another thing to keep in mind is that it’s a lot easier to be one of the people out there in the audience watching this talk than it is to be me, the person giving it. There are completely different security implications at play. The trick is to figure out the right balance of information you share vs information you don’t share.

'You're gonna fuck it up' in rather large text.

Also, you’re gonna fuck it up. You will accidentally leak something. You are going to make an error, and it will be okay. You will fall for a phishing link. The trick with opsec is to balance things out so that when you do inevitably make that error, the consequences are minimized as much as possible.

Threat modeling

The heart of operational security is the threat model. A threat model is the list of things and people you care about and what you are protecting against. This is probably one of the most personal parts of this. Your threat model is going to differ vastly from mine. Here’s an example threat model for a guy I just made up:

An example threat model for Sleve McDichael

Let’s imagine a guy named Sleve McDichael. He’s a straight white dude that posts cooking videos to TikTok. He doesn’t really have any enemies and works as a car mechanic. He’s civilly involved and sometimes posts about US politics. He used to play baseball and probably peaked in high school.

Let’s say the worst thing that could happen to Sleve is that someone gets angry about one of his cooking videos. He doesn’t mention his employer in his cooking videos, maybe he’ll say “oh yeah I’m a car mechanic” at some point, but overall he doesn’t mention where he works. Just to be safe, he let his employer know about the cooking TikTok videos. Their reaction was “oh cool I’ll follow and make the good recipes”. Imagine how simple Sleve’s life is. This is the dream.

Sleve has random internet strangers in scope for his threat model. Random internet strangers aren’t the most predictable, but generally they have limits as to what they can do. Individuals can only really do small scale actions.

The other thing to keep in mind with Sleve’s threat model is that there’s things that are out of scope. Usually most threat models end where the government begins. Sure hope that’s not an ominous thing to say in Anno Dominium Two Thousand And Twenty Five fake laugh.

The list of things Sleve can control.

In terms of things that can impact his threat model, here’s the low hanging fruit that Sleve can control. He can control what he posts, such as by not mentioning that he works at Jiffy Lube. He can control what social media apps he uses, such as TikTok or Bluesky. He can control when he posts, because you can figure out where someone lives by when they post (you usually don’t post while you’re asleep!). He can also control what he shows in any photos or videos he posts.

The list of things Sleve cannot control.

Now let’s take a look at the things Sleve can’t control. Generally, Sleve can control the things he does, but he can’t control what other people do in response to them. He can’t control what other people do, and he has even less control over what the government does. Sure, he votes, but I vote too.

The list of things Sleve cannot easily control.

There’s also a bunch of things in the middle between things Sleve can and can’t control. In theory he can control his writing style so that people can’t identify him by his “writeprint”, but changing your writeprint (or even being cognizant of it) is difficult for most people. If he’s really worried, he can use an AI tool to rewrite what he posts so that it’ll hide his writeprint. Yes, this is something that works, and every AI model has its own writeprint. Even models that run on your local device are good enough to hide it -- fun fact, the Torment Nexus has a use.

In theory, Sleve also has control of how he speaks (voice training is a thing that does exist), but it’s difficult to control for most people. These are things that he needs to keep in mind as he writes posts or makes cooking videos.

Opsec behaviors

Despite everything, Sleve still manages to keep himself safe online. In order to keep yourself safe like Sleve does, there’s a few behaviors you can follow and they’re mostly low-hanging fruit:

  • Don’t fill out those viral online quizzes or install apps you don’t need to. Who knows what the publishers of those quizzes or viral apps are doing with your data. Remember Cambridge Analytica? That started with online quizzes. Once it’s off your device, God knows where it ends up.
  • Another strategy is to google your name or usernames to see what comes up. Think like an attacker. What can you dig up about yourself from your online footprint?
  • Be aware of phishing. This is statistically the thing that you are inevitably going to fuck up. Attackers only have to be lucky once, you have to be lucky every time. I’ve fallen for phishing before and because I set things up to lessen the consequences, nothing bad happened. I didn’t even lose control of that Discord account that got temporarily yoinked. I even got control back without contacting support.
  • Use HTTPS. Browsers used to be more vocal about using HTTPS, but the s in HTTPS means “secure”. When you connect over HTTPS, it’s encrypted on the wire. Attackers may be able to see what domain names you are visiting, but they won’t see much more than that: the contents of webpages and the paths you are visiting aren’t visible, even over public insecure Wi-Fi.
Numa is concern
Numa
What the 'not secure' mark looks like in Chrome, Firefox, and Safari.

Most browsers won’t explicitly tell you anymore when a website is using HTTPS; they want you to assume HTTPS is the default and will instead show a “Not Secure” warning when it isn’t. Look for “Not Secure” in the address bar. If it’s there? Browse away to somewhere else. They probably don’t need your traffic.

  • Use multi-factor authentication. It’s free. Passkeys are built into every major OS and are immune to phishing. Use six-digit two-factor authentication codes if you have to, but if you can avoid it, never use SMS authentication codes. Your bank may not let you disable SMS authentication though. Your password manager will have support for two-factor auth stuff; I'll get into password managers later.
  • Before you post something, take a moment to think about what you’re about to do. Is it really worth posting? Once you post something, even if you delete it, it’s really hard to un-post it. It’s much easier to just not post it in the first place. One of the best things I’ve set up in a while is a website for myself that looks like Twitter: I can type things and hit “post”, and it all just gets sent to /dev/null. It lets me think I’m posting things, but it just deletes them.
  • Use full disk encryption on your machines. If you use a Mac, it’s on by default. If you use Windows, look for BitLocker in your settings. If you use Linux, look for LUKS in your distribution’s documentation. Full disk encryption is especially important for laptops because laptops can and will inevitably be left behind at the coffee shop. If the disk is encrypted, the machine is worthless to attackers.

Nyms

'Nyms' in rather large text.

One of the things you can do to keep yourself anonymous online is to use pseudonyms, also known as nyms. These are names that don’t match the name on your passport. If you’re part of the furry community, you probably know your best friends by names like Soatok, Cendyne, or Framebuffer instead of whatever their passport names are. Pseudonyms are really easy to adopt and can be a great way to add personality to your online presence.

Xe's GitHub profile.

Fun fact: the name I use professionally is a pseudonym! I don’t use my passport name professionally so that I can brand myself better. Xe Iaso is three syllables instead of the longer name on my passport that people constantly misspell and mispronounce. I also thought it would be harder to typo, but I’ve still had to buy the domain xeLaso.net because someone at Apple decided that the serifs on a lowercase L were too ugly.

If you are going to adopt pseudonyms, make sure that you only use two or three separate nyms at once. If you use more than that, you’ll run the risk of confusing them with each other. If you’re plural, you may be able to get away with more; your mileage may vary, but less is more. You’ve probably run into something I’ve published under a pseudonym and never known, and odds are someone you know has published under one too.

If you’re going to use pseudonyms longer term, make sure to make their social media accounts in advance and “age” them. New accounts look more suspicious than older accounts do. Brand new accounts have things that stand out in the UI of most social platforms to make them look fishy, because most phishing comes from brand new accounts. Accounts that suddenly become active after being idle also draw extra scrutiny, but you can automate posting to prevent a lot of the worst effects. Don’t feel bad about aging your nyms for a few months or even a year.

Pro tip: use AI models to help anonymize your writing. I use obscure locally hosted models to do this so that people can't place why they think the text looks familiar. This is a great way to keep your writing style from being used to identify you.

Aoi is wut
Aoi

Really? Are you sure? That seems a bit unbelievable.

Cadey is aha
Cadey

Yep! The really neat part is that this extends to very small local models too. Here's an example of Apple Intelligence (one of the worst models out there) rewriting the abstract for this talk (you can see it at the top of the page).

Mimi is happy
Mimi

In today’s digital landscape, privacy has become increasingly challenging. This presentation will delve into the intricacies of operational security (opsec), elucidating the concept of true privacy in the modern world. It will explore the identification of potential threats, the establishment of self-hosted infrastructure utilizing discarded hardware, and strategies for blending in during travel or at home. The key takeaway is the importance of striking a balance between privacy and convenience. While the specific requirements may vary, this presentation aims to provide a comprehensive understanding of the nuances involved.

Generated by Apple Intelligence (macOS)
Cadey is enby
Cadey

The really cool part is that this effect works with every single language model on the market. Each of them has its own writeprint, so rewriting keeps your own writing style from identifying you in particular; but if you consistently stick to one model, people can and will track that model’s writeprint instead. Everything’s a tradeoff.

Metadata

'Metadata' in rather large text.

One of the other big things to think about with regards to opsec is metadata. Metadata is data about data. One of the best examples of metadata is the data attached to photos. Here’s an example with a photo I took on my iPhone:

A picture of a sign in Brooklyn that says 'No standing'.

This is a photo I took in New York City in order to communicate how strange the sign was to me (it says “no standing”, referring to stopped cars). I still think it’s kinda strange, but here’s the metadata that my iPhone attached:

The same picture with a window to the side showing the photo metadata.

Wow, that’s a lot of info! It says I used an iPhone 15 Pro Max with the telephoto lens (about a 120mm equivalent) at ISO 50, f/2.8, and a shutter speed of 1/125 seconds, and it has the exact GPS coordinates of where I hit the capture button. This is a shocking amount of metadata at first glance. It makes you wonder: how much information are you really sharing when you upload a picture to the internet?

The good news is that online platforms know about this and take steps to prevent you from doxxing yourself with picture metadata. Most of this data is stored as EXIF data. Modern platforms will scrub this data before sharing any photos users upload. I've seen some mobile OSes, like CalyxOS and GrapheneOS, strip that at the photo picker level. But your mileage may vary; you may be more or less paranoid.

A screenshot of the GPSDetect extension.

If you use Firefox, you can install the GPSDetect extension and you’ll get a notification every time someone leaves GPS metadata in their photos. The link to the extension will be in a resource list at the end. Here’s an example of what it looks like in action:

A screenshot of the GPSDetect extension in action. Three notifications showing GPS coordinates of photos.

You’ll get notifications like this every time someone didn’t strip the GPS metadata from their photos. When I encounter these in the wild, I usually send an email to the people that published those photos to help them out. They’re almost always thankful.

Another bit of metadata you may not think about: pictures of the sky can be used to figure out where the photo was taken. This requires more complicated attacks, but try to avoid posting pictures of the sky the same day you take them. If they’re posted within about five minutes of when you took them, a dedicated attacker can figure out where you are.

Some people vary, but most people have a 24 hour sleep cycle. About 8 hours of the day are going to be spent sleeping. Usually when people are asleep, they aren’t posting. Here’s an example based on my Reddit account:

A screenshot of my active times on Reddit based on public account actions like comments and story posts.

I live in Eastern Time. My most active hours on Reddit are right after I wake up and right after work, so morning and evening Eastern Time. If you were looking at my account history, you could probably figure out my time zone just from the metadata of when I post. This is something to keep in mind.

Tools

'Tools' in rather large text.

Now that we covered metadata, let’s branch into the more practical part of this talk: what tools you should use.

Browsers

The old Google Chrome and Mozilla Firefox logos.

As far as browsers go: use very common browsers. Pick either Firefox or Chrome. They’re boring and they both kind of suck, but they’re used by so many people that if someone hacks Chrome or Firefox, it’s almost certainly not to hack you in particular; there are way more high-value targets like governments and banks. Common browsers also mean you blend into the crowd: your metadata looks like everyone else’s and is harder to uniquely identify.

VPNs

'VPN' in rather large text.

One of the things that you’re gonna want to do is shove all your traffic into a VPN. It’s what the YouTubers suggest in all the NordVPN and ProtonVPN ads, after all, and advertising has never lied to you, has it? It sounds like a good idea, it’s not that expensive (like three Starbucks drinks in 2019), it “encrypts your IP address”, and it stops the hackers from getting your information!

'VPN' in rather large text with a 'no' symbol over it.

Don’t.

Don’t use VPN services unless you have a very good reason to. Privacy VPNs are the security snake oil of our day. You should only use a VPN service as your default route if you have a very good reason to, such as to make sure that your very legal Linux ISOs are able to be downloaded without getting love letters.

A screenshot of the HTTPS metadata for the website xeiaso.net.

Remember that bit about HTTPS? HTTPS is already encrypted. You don’t need to encrypt it again with a VPN. I mean, you can if you want, but you don't need to.

A screenshot of the Tor browser.

Use the Tor Browser for any browsing that you really want to be private. Tor is free, it’s used by a lot of people all over the world, and it’s available on your OS of choice.

Remember that ancient meme that went something like “you can’t get me, I’m behind seven proxies”. That’s how Tor works.

A diagram about how onion routing works.

Tor takes your traffic and uses onion routing to get it to the target through an indirect route: your computer sends traffic to a node that decrypts a layer, unwraps it, and sends it along until it reaches an exit node, which sends it to the target. The response does the whole song and dance in reverse, so you get there indirectly, usually through like seven European countries. This gives you even more privacy advantages than a VPN does, especially because every website inevitably ends up using a different circuit.

A screenshot of the Tor Project website.

You can download the Tor browser for free from torproject.org. Again, I’ll have a resource list linked at the end of the talk. The Tor browser is available on every major OS. The Tor Project is getting an aarch64 Linux port soon. The Tor browser is made by experts that care.

The only thing to keep in mind is that you shouldn’t use it all the time, and this is more from a practical angle rather than a theoretical one. Tor helps keep activists safe and lets people evade government censorship, but there’s also a shocking amount of abusive traffic that comes from Tor exit nodes. Lots of websites, like Reddit, block Tor in order to protect themselves, and that probably includes some of your favorite websites.

Messaging

A screenshot of the Signal website.

If you’re gonna message people, use Signal. Make sure to enable disappearing messages. Disappearing messages mean that everything you send with people gets automatically deleted after a configurable amount of time. I personally use a week for most people I know.

Signal is one of the few encrypted messaging apps that has Soatok approval.

Of note: when nation-state actors attack Signal, they don’t even go after the cryptography. They phish you, attacking convenience features like linked devices. That should say a lot about Signal’s security.

One of the annoying things about Signal is that it doesn’t sync message scrollback to new devices by default. I think they’re changing this, but honestly I consider it a feature: it’s proof that the messages ARE NOT BEING SAVED ON THE SERVER. It can be an annoyance, but it’s a balance of trade-offs.

Password managers

'Use a password manager' in rather large text.

Use a password manager. Your device or browser likely comes with one, and that one is free: if you use a Mac, there’s a password manager built into your iCloud account, and I think Microsoft has a similar thing, but I try to avoid using Windows. I personally use 1Password with my husband because we used it before a lot of the other options existed, and it works great for us. It’s effortless and even supports all the two-factor auth that we use.

Your password manager has a password generator embedded into it. Use it. You should not know your passwords beyond the root password you use to unlock the password manager. If you only use randomly generated passwords, you can’t reuse passwords (a generated password can’t be reused unless someone has broken randomness, in which case we all have bigger issues), and reused passwords are how people get popped.

Run updates

'Run updates' in rather large text.

I know that Windows is a giant pain in the ass about updates, but seriously, run them. Updates get released for a reason. Updates patch security issues. If you don’t install updates, you can’t be protected by them. Running updates regularly is one of the easiest ways to make sure that your computers are secure. Seriously, run updates.

Self-hosting

'Self-hosting' in rather large text.

Finally, you should probably know how to host things yourself. This gives you the most understanding of what platform owners can see about what you do, because you become a platform. Self-hosting can also give you absolute superpowers, like being able to have every TV show or movie you want streaming at a moment’s notice without having to follow a flowchart or use dedicated websites to find out where you can watch things. No, seriously, there’s a website that has detailed flowcharts for every show now, based on the show, what country you’re in, and so on. It’s a nightmare. There was a video by videogamedunkey about figuring out where to watch a TV show. He didn’t even need to write any comedy, he just described the process of trying to watch, I think it was, Severance.

If you want to get started with self-hosting, any computer will do really. You can get used desktops off of Craigslist, your local university’s surplus store, or Woot.com. When you’re starting out, you probably don’t have elaborate hardware needs; anything that can turn on and run Linux is fine.

As for what to run on it, all the normal options suck about equally at this point, just in different ways. Ubuntu and Rocky are the closest to what you’d use in production if you were to become a career systems administrator or site reliability expert. Some are more up to date than others, others prioritize unchanging stability (consider that a feature), but the important part is to Just Pick Something™️ that you’re comfortable learning about.

Once you have the OS, set up something like k3s or Docker Compose. Yes, I know Kubernetes seems like a lot, but that’s where the entire industry is going, because Kubernetes has sucked all of the oxygen away from everything else. Then you can install whatever self-hosted apps you want; there’s a minimal Docker Compose sketch after this list if you want a starting point. Here’s a whirlwind tour of the self-hosted apps that I use on a regular basis:

  • Plex is self-hosted Netflix that points to a folder full of media. I use it to watch anime and catch up on old movies.
  • If you want to only run open source software, there’s also Jellyfin. I personally don’t use it because Plex was dumb enough to sell me a lifetime Plex pass for like $20 a while ago, but when they inevitably kill that off, I’m probably going to set up Jellyfin.
  • Nextcloud is like Google Docs, Google Calendar, an email client, Google Drive, and Slack all in one. It can do anything from instant messaging to meetings to integrations with self-hosted AI models that run on hardware you can look at. I’ve been meaning to use it more, but I mostly use it for storing files.
  • Gitea is a self hosted GitHub. You can push private repos and even run CI on them without GitHub ever seeing your code. I use it for Techaro’s secret projects.
  • Pocket ID is an identity provider. It lets me have one account for all my internal services so that I don’t need to configure individual passwords, group memberships, and all that nightmare nonsense for every service. It sounds abstract, but it makes a lot of sense in practice, and it’s honestly one of the best things I’ve ever set up; the time savings add up so much.
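As promised, here’s a minimal Docker Compose sketch to show how little it takes to get started. This one runs Jellyfin; the paths and port mapping are examples, not a recommendation:

services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    ports:
      - "8096:8096"          # Jellyfin's default web UI port
    volumes:
      - ./config:/config     # server configuration and metadata
      - /mnt/media:/media:ro # your media library, mounted read-only
    restart: unless-stopped

Save that as docker-compose.yml, run docker compose up -d, and you have a media server. Everything else in this list follows the same basic pattern.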
'Your own apps' in rather large text.

One of the other big things I have in my homelab is my own apps. I’ve been working on something to make this easier, which I’ll announce at some point in the future. Here’s a screenshot of what I’m running:

A screenshot of the k9s dashboard for my homelab.
A screenshot of the k9s dashboard for my homelab.

Listed there I have a bunch of static sites for community resources, monitoring tools, pocket-id, the website for a satirical programming language based around the letter H, a Bluesky passive scraper, a Docker registry, the Techaro website, a pull-through cache of the Docker Hub (because they realized that their business model is inviable, so they're jacking down the rate limit), and even a self-hosted object storage system called Minio. Hosting stuff myself gives me basically unlimited superpowers to do whatever I want. Because the industry standardized on Kubernetes, whenever I want to add something else it’s a cinch: I can put stuff on my homelab and then move it to the cloud without doing much more than pushing a YAML file to the right place.

One of the other cool things you can do with Kubernetes is set up a Tor hidden service controller. Tor hidden services are a way to expose a website such that people can only view it over the Tor Browser, and in ideal scenarios, you can't tell where that website is hosted. This lets you expose services to your friends without leaking your home IP address to the world. Doing this is slow, but it’s a tradeoff that makes sense in many cases.

I use this for my blog so that you can access what I write regardless of any government or corporate censorship. I also plan to write something in the near future that will probably only be released over Tor, so keep an eye out for that! I’ll have more details about this in the resource sheet at the end.
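If you’re curious what that looks like, here’s the rough shape of the manifest such a controller consumes. The API group, version, and field names here are illustrative (they vary between controllers), so treat this as a sketch of the idea rather than something to copy-paste:

apiVersion: tor.example.com/v1   # hypothetical API group; depends on the controller you install
kind: OnionService
metadata:
  name: blog-onion
spec:
  version: 3                     # v3 onion addresses
  rules:
    - port:
        number: 80               # the port exposed on the .onion address
      backend:
        service:
          name: blog             # the in-cluster Service to expose
          port:
            number: 80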

Conclusion

It’s been so much fun, but my time with you is about to run out. Let’s wrap this up. In conclusion:

  • Know your threat model. Who are you protecting against? What could they do? How can you handle the inevitable opsec fail?
  • Think before you post. It’s easier to not post something than it is to un-post something. If you really need to let it out, write it out on paper and burn it. That can't be hacked.
  • Your phone attaches GPS coordinates to photos. Strip them or use a platform you know strips them before you share them. Verify it with extensions like GPSDetect.
  • Don’t use a VPN unless it’s a site-to-site VPN to get back into your homelab or other hosting you control.
  • Use Tor when you want to keep browsing private. I use Tor all the time for research. Tor is love, Tor is life.
  • Use Signal for private chat. Make sure to enable disappearing messages.
  • Know how to host things yourself. Anything on your own hardware is infinitely more private than anything involving a platform.
  • Again, run updates. Updates are free and they patch security issues.
The GReeTZ / special thanks slide with a list of names.
The GReeTZ / special thanks slide with a list of names.

Before we go though, I wanna give some special thanks to all these people. You know what you did to help. If you’re not on this list, you know what you didn’t do.

The end slide with a list of my social media accounts.
The end slide with a list of my social media accounts.

And with that, I've been Xe! I'll be around if you have questions or want stickers. Stay warm! This is the first of two conferences I'm presenting at this weekend.

If I don’t get to you, please email your questions to [email protected] and I will get back to you as soon as I can. With all that out of the way, does anyone have any questions?

Q&A

Question: Can you speak about the privacy and security trade-offs between self-hosting and what it offers for privacy versus those security risks?

Xe: It's a trade-off. If you're hosting something for somebody else to connect to, then you need to make sure that it stays up to date. If you're using Kubernetes, there are ways to install tools like Keel, which will automatically update things for you, so you don't have to think about it. I use stuff like that heavily so that basically everything is automated as much as possible. But in general, if you run updates, you're probably not going to be someone that someone's going to waste a zero-day on. And if you are that kind of person, my talk probably isn't for you because you probably need the advice of a dedicated opsec specialist. And I'm not that; I'm not even going to pretend that I am able to be that.

Question: When you say to not use VPNs, are you talking about WireGuard mesh networks such as the one that Tailscale provides?

Xe: Yeah, you can use something like a WireGuard mesh network. I use that for some of my stuff when I connect to my home lab services. A lot of them are not exposed to the public internet. I have my Kubernetes cluster set up with a unique domain name, so I can just address it by the service name. So, when I am starting to stream on Twitch, I have a PowerShell script on my desktop that I double-click, and it sends a POST request to an internal service that announces that I'm streaming. It is very hacky, but it works, asterisk.

Question: How about self-hosting your email services?

Xe: What's the diplomatic way to phrase this? I can't stop you from hurting yourself. Personally, I pay Google for my email because Google doesn't have support. And if it doesn't have support, you can't phish support. Which is kind of a horrible thing to say. But like, let's be real, one of the biggest threat vectors at this point is people phishing the support for like your phone provider, and then managing to convince them that you need a new SIM card and SIM swapping you and oh, they just stole all your apes.

Question: If you're self-hosting things, some ISPs will work to interfere with that, and can like jack down the speed or prevent incoming ICMP or something to make it difficult. How would you work around that?

Xe: I'm gonna be totally honest about the stuff that I self-host that's exposed to the public internet. I have a VPS set up in Toronto that runs the moral equivalent of HAProxy. That's the address that gets put into DNS, so all the traffic lands there first, gets sent out over WireGuard, and hits one of the nginx ingress pods in my homelab. From there it routes to wherever the hell the app lives across the house, the response goes all the way back out to the internet and to the person. I have found that this adds like 15 milliseconds of lag, which is literally about one frame at 60 Hz, and for people to notice it, it has to be closer to 150 milliseconds. So it's not really that bad. In terms of providers to use for that, I use Vultr for mine, but you may want to look into Civo. The reason why is they don't have egress fees, and if a cloud provider these days is willing to make that pricing decision, you should take advantage of it while you can.
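
For the curious, the VPS side of that setup can be as small as an nginx stream block that forwards TLS traffic down the WireGuard tunnel to the homelab's ingress. This is an illustrative sketch, not my exact config; 10.0.0.2 is a placeholder for the homelab end of the tunnel.

# /etc/nginx/nginx.conf (fragment) — illustrative sketch.
stream {
    server {
        listen 443;
        # The nginx ingress in the homelab, reached over the WireGuard tunnel.
        # Add "proxy_protocol on;" here if the ingress is set up to accept the
        # PROXY protocol, so real client IPs survive the hop.
        proxy_pass 10.0.0.2:443;
    }
}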

Affording your AI chatbot friends

2025-03-10 08:00:00

Servers are expensive. Servers with GPUs are even more expensive. AI agents rely on servers with GPUs. If you don’t have control over what is happening at different parts of the stack, then things can change out from under you and your AI agent can change drastically without warning. Read: your AI chatbot friend can get massively depressed out of nowhere!

In this talk, I’ll cover all of the parts involved in a production-grade AI agent workload and how and where you can and should get control of them. This will cover the overall stack you’ll end up using, model management and the risks of models changing, cost-time tradeoffs and how to make educated decisions about them, as well as stories of my misadventures when things went wrong. You will leave this talk with practical strategies for maintaining control over your AI agent’s behavior and for controlling costs.

Want to watch this in your video player of choice? Take this:
https://cdn.xeiaso.net/file/christine-static/talks/2025/ai-chatbot-friends/index.m3u8
The title slide with the talk and speaker name.
The title slide with the talk and speaker name.

Hi, I’m Xe. Today I’m gonna talk with you about the wonders of AI agents and how you can run them with whatever model you want without breaking the bank.

An anime depiction of an absolutely incensed anime businessman pointing at a whiteboard labeled 'We need AI'.
An anime depiction of an absolutely incensed anime businessman pointing at a whiteboard labeled 'We need AI'.

Imagine this is you. You wake up one day and go to a meeting. Your boss is there absolutely insisting that your product needs to have AI. What does that mean? Well let’s assume it’s something sensible, but now the mandate has come in from above and you’re the one that actually goes to implement it.

You’re probably asking yourself questions like this:

  • What are the moving parts with AI?
  • When should I buy vs build?
  • What infrastructure can I use?
  • How can I run this without spending a lot of money?
  • What is an AI agent?
A slide explaining AI agents with a robotic blue tiger working on a car with a bunch of tools.
A slide explaining AI agents with a robotic blue tiger working on a car with a bunch of tools.

Thankfully that last question is the easiest to answer. An “AI Agent” is just a model with access to tools like “escalate ticket”, “run SQL query”, or “draw an image”. The rest of the hype comes from fitting it into existing workloads like ETL nonsense with MuleSoft or something banal like that. This is really what all the hype is about: hooking AI models up to existing infrastructure so that they can do “useful things”.

The moving parts of AI

So, if an agent is just a model with access to tools, how do you make that happen? Let’s look over the moving parts of AI. To keep things simple, I’m going to break this into four parts.

  1. Models: The first part is the model. A model is a bunch of floating-point numbers that were trained on unimaginable sums of text. These models take input embeddings and use them to generate new tokens that just so happen to be words. There’s hundreds of models out there, but if you’re in doubt you should try Facebook’s Llama series of models, DeepSeek V3, or maybe one of OpenAI’s models. They’re usually good enough to start with.
  2. The inference engine: Next we have the inference engine. This is the thing that runs the model. There’s a few options on the market for inference engines, but usually you’ll use Ollama, llama.cpp, or vllm with an OpenAI API client. This is the part that needs the GPU to run. When you pay OpenAI for a model, they host the model and inference engine for you.
  3. Your code: Now we get to your code. Your code is going to be the thing that sits in the middle, wrapping your frontend around the AI model and doing whatever square peg round hole transformations you need. This is the part that’s the most diverse and opinionated, so I’m not going to cover it that much here.
  4. The user interface: Finally, there’s the user interface. This is the chat box that pops up when a user clicks on the sparkle emoji button, the tool that summarizes the meeting transcript for action items, or whatever your CEO wanted.
A slide showing all of those moving parts of AI agents.
A slide showing all of those moving parts of AI agents.

And that’s it really, the AI models get run by the inference engine. Your code calls the inference engine and then presents the results to the user interface. It’s basically the same as a database in the standard three tier webapp architecture.
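
To make that flow concrete, here’s a minimal sketch in Go of the “your code” tier talking to an OpenAI-compatible chat completions endpoint. The base URL, model name, and prompt are placeholders: point it at api.openai.com, a self-hosted Ollama (which speaks the same API under /v1), or vllm, and the request shape stays the same.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// message, chatRequest, and chatResponse mirror the OpenAI chat completions
// wire format, which most inference engines also speak.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message message `json:"message"`
	} `json:"choices"`
}

func main() {
	// Placeholder base URL: swap it to move between hosted and self-hosted
	// inference without touching the rest of your code.
	baseURL := "http://localhost:11434/v1" // Ollama's OpenAI-compatible endpoint

	body, err := json.Marshal(chatRequest{
		Model: "llama3.1:8b", // placeholder model name
		Messages: []message{
			{Role: "system", Content: "You are a helpful assistant."},
			{Role: "user", Content: "Summarize this meeting transcript: ..."},
		},
	})
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", baseURL+"/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// Hosted providers want an API key; local engines generally ignore it.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Choices[0].Message.Content)
}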

The stereotypical example app

A slide showing a diagram of the stereotypical example app.
A slide showing a diagram of the stereotypical example app.

When you get started, you’ll almost certainly see a setup like this. The example code will almost certainly call OpenAI’s API, pass input and context there, and maybe spin in a loop to handle tool calls depending on how it’s set up. It’ll have a chat UI like the ChatGPT UI, and you pay per million tokens of output.

Let’s be real for a second. This will absolutely work for a vast majority of usecases, especially where confidentiality doesn’t matter. OpenAI’s SRE team is one of the best in the market. It’s super easy to set up, all you need is a corporate card and an API key. It works on any laptop or server with an internet connection. It’s easy to think about, and easy to compartmentalize. You get access to some of the best models in the market and it works out pretty well.

Depending on what model you choose, the needs of your product, and how much people use the AI features, you can easily get away with paying a few thousand dollars per month on the extreme edge. The best results come from making sure that you balance output quality with cost, which is more of an art than a science.

'The problems of doing this' in rather large text.
'The problems of doing this' in rather large text.

However, roses have thorns, and there’s some pretty insidious ones that may make you think otherwise for this:

The biggest problem is actually one of the biggest advantages: you’re relying on OpenAI to keep the model up. This means you don’t need to purchase GPUs or worry about uptime (let’s face it, OpenAI being down means that more than just you is down and there’s a huge GDP impact), but you’re relying on them to keep your product functional. This is a huge position of power for OpenAI. They’re selling their products at a loss so people will adopt them, and some day the financial cows will come home along with massive price hikes.

OpenAI and other providers can and will deprecate that one model that your production workloads depend on. Sure you can just change over to another model, but sometimes that can have massive consequences on your app. One time I switched a model over in one of my chatbots and she went from a happy bubbly little thing to showing symptoms of a full blown depressive episode. That was just from changing one variable in the code. Imagine how much your app’s behavior could change if the entire model was changed out but you thought you were fine because you asked for GPT-4o in your code.

And then that deprecation warning inevitably gets ignored (because let’s face it, who actually reads deprecation warning emails these days), and suddenly it’s your emergency and you get to start learning about hyperparameters or having to tell your boss “sorry, it’s out of our hands”.

Do you really want to give the keys to the kingdom to someone else?

Self-host all the things

So then you’re inevitably tempted to self-host all the things! After all, if your code is calling your server, deprecations are on your schedule, right? This is a nice idea and sounds really good to the ear. Let’s take a look at what the stack looks like:

A diagram of what that example looks like with a self-hosted setup.
A diagram of what that example looks like with a self-hosted setup.

This looks basically the same, but it’s slightly different.

Instead of asking OpenAI’s API, you ask your own pool of GPU servers to run the model with the question and context. The API endpoint in the inference server translates your requests into tokens, the GPU mangles the tokens a bit according to the model weights and then returns a de-token-ified response to the user.

The other main difference is that there’s a model storage server in the equation because let’s face it, you’re going to have more models around than your GPU servers have space for them. In practice you’ll probably end up storing your models in object storage like Tigris and then cache them to the inference servers so they load fast.

Otherwise, your app shouldn’t really notice or care that there’s another model in the equation. The behavior will be different, sure, but you can pretty easily correct for that in your prompt and code.

'The upsides of doing this' in rather large text with the list of the upsides to the left.
'The upsides of doing this' in rather large text with the list of the upsides to the left.

This is a lot more complicated than relying on a third party, but this strategy has significant upsides that can make up for it in some circumstances.

The biggest advantage is the most obvious one. When you’re calling models on your servers, you own the stack. You get to choose what models are available. You get to choose when a model becomes deprecated. You get to mix and match models at will.

When you use a provider like OpenAI, you’re limited to the models that OpenAI has available. The OpenAI models are usually good enough, but what if you want to finetune a model for your exact usecase or experiment with models that are using things that OpenAI just doesn’t support? What if you need to see the reasoning output in your reasoning flows? OpenAI doesn’t let you do that, but self hosted models do.

Finally, the last big advantage of using a self hosting workflow is that you don’t send data to a random third party. OpenAI doesn’t see or filter what you do. This can make self-hosting more than worth it in some cases, such as using AI models to analyze trends in medical records. Nobody is going to even imagine allowing you to pipe all that data to OpenAI, no matter what acronyms OpenAI is compliant with.

An edit of 'the myth of consensual sex' meme but with AI and OpenAI over Jesus.
An edit of 'the myth of consensual sex' meme but with AI and OpenAI over Jesus.

What if your app touches LGBTQ issues, menstruation schedules, or something else that the current political zeitgeist will label wrongthink? OpenAI could just turn off answering questions like “I don’t feel like my body is right, is there something I can do to fix it?” and then you’d be totally out of luck. Your self-hosted models would be totally unaffected.

'The problems of doing this' in rather large text with a list of the problems to the right side.
'The problems of doing this' in rather large text with a list of the problems to the right side.

Of course, doing this isn’t free puppies, rainbows, and the like. There’s some significant downsides in self hosting your AI workflows and they can be subtle and insidious.

The nvidia drivers will become the bane of your existence. They’re normally stable, but they can and will fall over without notice. Always at 3am. Never during work hours, because why would they? You’ll have to deal with the nvidia drivers that you have no introspection into deciding that uptime is for cowards. You can make up for this with a worker pool and redundancy, but that makes your projects expensive. It’s a tradeoff.

Another downside is that you have to choose what models you use for your products. There’s so many options out there that it can be paralyzing. Like I said earlier, the Facebook Llama models are a good place to start, but you have to know enough about what you want out of the AI models to know which model is right for you. This is a skill you can learn, but it sucks having to learn it in anger.

There’s a few inference engine options like Ollama, llama.cpp, and vllm, but you’ll find out that they all suck in different, mutually incompatible ways. Eventually you’ll end up having opinions about which runtime is right for you, but again you have to spend the time to have those opinions.

An anime depiction of an absolutely incensed salaryman pointing at a whiteboard with unreadable text and kira-kira emoji.
An anime depiction of an absolutely incensed salaryman pointing at a whiteboard with unreadable text and kira-kira emoji.

This is time that you really don’t have when your boss is breathing down your neck wanting to show the investors a sparkle emoji button. However, there’s one huge downside that isn’t as easy to work around.

Nvidia GPUs are essential to your setup. Sure you can hurt yourself trying another provider and everyone has to learn somehow, but realistically you’re going to use nvidia GPUs because that is the path of least resistance.

The only problem is that the lead time for buying them is measured in months. And when you do get them, they’re stupidly expensive. We’re talking somewhere on the order of $40,000 per card. You can only buy them in packs of 8 with servers that cost $200,000 in total. Not to mention the power they need and the datacenter technician time to handle the inevitable hardware failure.

A line of startup workers in front of a whiteboard labeled 'Kidney donations for AI servers'.
A line of startup workers in front of a whiteboard labeled 'Kidney donations for AI servers'.

Sure you can get the cards you need off of ebay, taobao, alibaba, or that one shady guy on craigslist. But sooner or later your employees are going to run out of spare kidneys and you’ll run out of budget for the pizza parties to make up for all the kidney donations.

Even more fun: when you do get those GPUs, they only have a service life of 1-3 years. This means you have to do the whole rigamarole again! I’m kind of amazed that any companies are able to put up with this, but that’s the real cost of AI.

I don’t really know if it’s worth it. I did the self hosting flow for my chatbot in my homelab and it did work, but nvidia seems dead set on starving consumer GPUs for video memory, meaning that as I wanted to experiment with bigger and better models, I had to start branching out into the cloud.

Nomadic compute

Noah from Xenoblade Chronicles 3 standing between a fork in the path with each side labeled 'Hosted APIs' and 'Self-Hosting'.
Noah from Xenoblade Chronicles 3 standing between a fork in the path with each side labeled 'Hosted APIs' and 'Self-Hosting'.

There’s gotta be some middle path between these two extremes, right? Both of these sides have just so much suffering in different ways.

N from Xenoblade Chronicles 3 walking down a middle path labeled 'Nomadic Compute'.
N from Xenoblade Chronicles 3 walking down a middle path labeled 'Nomadic Compute'.

There is a way forward: Nomadic compute. Nomadic compute means cheating, but knowing exactly how and when you should cheat. It all revolves around taking advantage of your user patterns and the fundamental constants of the infrastructure we’re all working on top of. In a nomadic compute setup, your runtime hunts down deals between providers around your well-defined workloads, spinning up more of them when you need more and slaying off the excess when you have too many.

The biggest key to how nomadic compute works is by taking advantage of one fundamental constant between every provider: they all have nvidia GPUs. Any nvidia GPU is fungible for another one.

'Everyone has GPUs' in the middle of a smattering of cloud provider logos for platforms with GPU support.
'Everyone has GPUs' in the middle of a smattering of cloud provider logos for platforms with GPU support.

Not to mention, every cloud provider and their mom has GPUs these days. I’ve even seen single person VPS companies have GPUs available. GPUs are absolutely everywhere and because there’s so much competition, you can almost always get a really good deal.

Title: 'The only specs that matter'
Title: 'The only specs that matter'

To make this even more convenient for a nomadic compute setup, there are only three specifications you care about when running AI workloads:

  1. The model year of the GPU
  2. The amount of vram it has
  3. The amount of memory bandwidth it has

More model year number? More fast. More vram amount? You can use bigger models. More memory bandwidth? The model can respond faster.

These are fundamentally the only specs that really matter.

You don’t always need the newest possible cards either! Most of my AI workloads use the Nvidia A100, which is now three generations out of date but still way more than sufficient for my needs. They also get cheaper over time, so I pay even less for fundamentally the same experience!

A green-haired anime woman in cyberpunk Seattle starbucks desperately trying to hack with a laptop and getting angry. The coffee isn't helping.
A green-haired anime woman in cyberpunk Seattle starbucks desperately trying to hack with a laptop and getting angry. The coffee isn't helping.

But then if you try to switch between providers and handle all the corner cases of their APIs and runtimes, you end up like this: overcaffeinated late-night hacking, trying to cram yet another square peg into an uncooperative round hole. Luckily though, it’s the future and we have SkyPilot:

A screenshot of the SkyPilot website.
A screenshot of the SkyPilot website.

SkyPilot lets you specify what hardware you need, the providers it can pick between, and what you want the job to do. It’ll figure out what to spin up for you and just make it work.

Load it with API keys for every provider you can, set the requirements wide, and it’ll just figure it all out. You can even have it autoscale workers based on HTTP request pressure, meaning that you can just sit back and let your app go viral because the infrastructure layer will just figure it out for you.
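
For a sense of what that looks like, here’s roughly what a SkyPilot task definition can look like (assuming a reasonably recent SkyPilot; the model, commands, and resource numbers are placeholders for whatever your workload actually needs):

# task.yaml — an illustrative SkyPilot task; model and commands are placeholders.
resources:
  accelerators: A100:1   # "any single A100", wherever it's cheapest right now
  ports: 8000            # expose the inference server's port

setup: |
  pip install vllm

run: |
  vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

From there, sky launch task.yaml shops the job around your configured providers and spins up whichever one wins.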

Fundamentals

The fundamentals of nomadic compute, detailed list on the side.
The fundamentals of nomadic compute, detailed list on the side.

When you’re making nomadic compute work for you, keep these ideas in mind:

Build on top of boring tools. Sure you may want to use that fancy database that one provider offers, but don’t. Pick super boring and battle-tested tools like Postgres. Use Tigris or S3 instead of the filesystem. Make your application function anywhere that has an internet connection and an nvidia GPU. You can run WireGuard in userspace, take advantage of this!

In your three tier webapp diagrams, put your AI infrastructure in the same tier as databases. Your AI service is an ancillary support service. Make it act like one. A database is just an internal facing server with a weird API, right? Your AI services should be the same.

Finally, scale down your AI services when nobody is using them. Why should you have to pay for compute time that sits there doing nothing useful?

Cold starts can suck, but if you really cheat by putting the model weights into the docker image for the service, you can shunt a lot of the cold start cost to before you’re paying for the compute time. If your app allows you to, you can even go as far as making all of your AI workloads happen en masse in batch processing instead of spinning up and down workers on demand!
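
As an illustration of the weights-in-the-image trick, here’s a sketch of a Dockerfile that copies pre-downloaded weights into an inference image. It assumes the upstream vllm/vllm-openai image (whose entrypoint is the OpenAI-compatible server) and that you’ve already pulled the weights into ./models next to the Dockerfile; adjust for whatever engine you actually run.

# Dockerfile — a sketch of baking model weights into the image so the cold
# start cost becomes "pull one big layer" instead of "download weights at boot".
FROM vllm/vllm-openai:latest

# Weights were downloaded ahead of time (for example with huggingface-cli)
# into ./models/llama-3.1-8b next to this Dockerfile.
COPY models/llama-3.1-8b /models/llama-3.1-8b

# The base image's entrypoint is the OpenAI-compatible server; point it at
# the baked-in weights.
CMD ["--model", "/models/llama-3.1-8b", "--port", "8000"]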

Take advantage of the product design and how people use it to your advantage. Autoscaling is love. Autoscaling is life.

Mimi, the chatbot

A slide introducing the chatbot Mimi.
A slide introducing the chatbot Mimi.

As an example, let’s take a look at my chatbot Mimi. Mimi is one of the characters in the xe iaso dot net cinematic universe and when I made her into a chatbot I wanted to make something to amuse people and help them laugh. Like any good AI project, Mimi is actually pretty complicated under the hood:

A diagram of Mimi's architecture.
A diagram of Mimi's architecture.

Mimi gets chat messages from Discord and then if she’s interested in them passes them to the GPU over Glaceon. Glaceon connects to Fly where the copy of Ollama I use for inference lives. It sends API requests over and gets API responses back so that the bot can decide what to do next.

If the AI wants to draw an image, it sends a request to Falin and Falin sends it over to fal. Falin makes a copy of the image in Tigris and then sends that URL back to the bot, which posts it to Discord.

Mimi remembers things long term with pgvector. Rather, she should, but she’s kinda bad at remembering things right now. Maybe I’ll fix that eventually.

A self-host all the things meme with Mimi placed over the stick figure
A self-host all the things meme with Mimi placed over the stick figure

When I started out, I genuinely wanted to self host all the things so that I could have Mimi be one of the only AI chatbots I know that did it. I wanted to run everything on computers that I could look at, but reality soon got in the way.

A diagram of Mimi's old architecture.
A diagram of Mimi's old architecture.

Mimi used to run across three nodes in my homelab: logos, ontos, and pneuma. Logos ran Ollama, and that’s what generated Mimi’s responses. When Mimi wanted to draw an image, she sent a request to ComfyUI on Ontos. ComfyUI generated the image from Mimi’s prompt, uploaded the image to Tigris, and spat back a URL that Mimi used to upload the image to Discord. It worked really well, but then a new model came out:

Facebook released Llama 3 with a 70 billion parameter version. The benchmarks said it was good. My testing on my MacBook said it was really good. I wanted to use it with Mimi, but there was a small problem: it was too big to fit on the GPU in Logos. It was actually bigger than any individual machine in my homelab. This is when I had to compromise, and this compromise is actually what inspired me to make the idea of nomadic compute in the first place.

It's okay to use the cloud, just make sure you have an exit strategy
It's okay to use the cloud, just make sure you have an exit strategy

I put the part of Mimi that generates AI responses into the cloud. I reach out to it over a private network and it scales down when it’s not in use. Should I need to, I can move Mimi’s models and inference engine around again. It’d be inconvenient, sure, but it would be only a mild annoyance instead of a showstopper. GPUs are fungible, and as long as you have easy access to them in the cloud, it’s more than okay to ship parts of your AI apps out.

The image drawing tool had a similar set of compromises. Originally I was using Stable Diffusion 1.5. That model works really well (and sometimes can actually generate better images than newer models), but it’s kinda cumbersome to use and Mimi’s tool use was giving ComfyUI prompts that had biblically accurate results. To spare your sanity, I’m not going to show you the worst, but trust me when I say that it could end up really badly.

Then Flux came out and it could actually handle the prompts that Mimi was using. Again, Flux was too big for my GPUs. It’s only 12 billion parameters, but even with quantization it barely fit on my gaming tower’s GPU. I wanted to use that gaming GPU for…well gaming, and GPUs are super expensive where I live in Canada. Then I found out about fal:

Fal could just run Flux for me and I’d pay for the output per image. With the exact model I picked for Mimi, it’s three tenths of a penny per image. That’s less than I’d pay in power to run the model locally. It just made sense to pick fal for my needs. Worst case, I could figure out a way to run the model elsewhere thanks to nomadic compute.

The same diagram of Mimi's architecture.
The same diagram of Mimi's architecture.

And as a result, Mimi’s infrastructure looks something like this. Everything that can scale down to zero does. I’d get Mimi to scale down when she’s not in use, but she needs to stay connected to Discord and IRC. It’s pretty nice in practice, I end up paying about $5 a month on Mimi and for what’s going on under the hood I think that’s pretty darn neat.

Lessons I've learned running AI workloads

Before we finish this out, lemme cover some of the biggest lessons I’ve learned running AI workloads that will save you time and money as you implement the sparkle emoji for your boss.

Every input matters

Every input to the AI model matters. Even small changes to prompts or user inputs can drastically alter the outputs and behavior of your product.

A diagram showing input pointing to the XKCD comic about machine learning pointing to output.
A diagram showing input pointing to the XKCD comic about machine learning pointing to output.

Generally when you think about AI systems, it’s easy to think about it like this. You take in the prompt, send it to a pile of linear algebra or whatever that mangles the language the right way and then you get an output. Usually this is the case. Usually.

Not always though! Sometimes there’s hidden inputs that seem pretty banal, such as the date and time of the user making the requests. This can have any number of strange effects from Claude getting lazier around August when Europe goes on holiday to ChatGPT getting noticeably worse output in December when Americans ritualistically give up for the year.

If your app doesn’t need a given bit of input, don’t supply it. This will make your models way more deterministic.

Most platforms and inference engines also let you set the seed value that the model uses to randomly select tokens or for seeding the diffusion space. Pick either a set seed for everyone (such as 3407) or a set seed per user. This will make your model way more predictable in practice.

Set the temperature as low as you can

One of the other main parameters to AI models is the temperature. This controls how random the output is. Higher temperatures can cause more amusing results, but higher temperatures can also cause the model to become wildly unpredictable and go off the rails. Set the temperature as low as you can.

Task                                  Temperature
Amusement chatbot                     1
Summarization of meeting transcript   0
Analysis of financial documents       0.25 or 0.5

If you need things to be strictly factual for data entry, summarization, or other cases where the details matter, set the temperature as low as zero. Otherwise, you can set the temperature to whatever feels right with testing.

My chatbot Mimi uses a temperature of 1 because nobody is going to be negatively affected if she’s wrong. I’m pointing this out because some runtimes like Ollama set the temperature to 1 by default, which can have implications for using it in more complicated flows.
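
Concretely, with Ollama’s chat API both knobs live in the options object (OpenAI-compatible APIs take temperature and seed as top-level request fields instead); the model name here is just a placeholder:

# Pin the randomness down for a summarization-style task.
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Summarize this transcript: ..."}],
  "stream": false,
  "options": {
    "temperature": 0,
    "seed": 3407
  }
}'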

Use filter models

As we all know, user input is difficult to trust. What if your user asks your AI product how to make a pipe bomb? You don’t want your AI product telling people how to do that, that could get you in trouble. Thankfully, Facebook, Google, and other companies have created filter models.

A diagram showing a filter model in the middle of the AI model and the user input.
A diagram showing a filter model in the middle of the AI model and the user input.

A filter model is something that sits between user input and your AI model. If user input passes the filter, it goes to the model. If it fails, the user gets a reason why it failed.

The output of the model is also passed to the filter to make sure that the user didn’t manage to smuggle an input that makes the model generate an unsafe reply. This can help you make sure that user inputs are passing muster as well as making sure that your AI model doesn’t advocate for horrible things under your nose.

The two most popular filter models are Llama Guard from Facebook and ShieldGemma from Google. Both of them come in a few different sizes, but something of note is that filter models are almost always smaller than general purpose models.

These models won’t be able to write poetry, tell you how to make a pie, or the recipe for pancakes. They are finetuned specifically to make sure that input and output meets quality standards. Finetuned models for a specific task can always be smaller than general purpose models. You can even get access to hosted copies of these models via services like OpenRouter.
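
Wiring a filter model in doesn’t take much code either. Here’s a structural sketch in Go; askModel is a stand-in for whatever inference call you already have (like the chat completions request from earlier), stubbed out here so the control flow runs on its own.

package main

import (
	"fmt"
	"strings"
)

// askModel stands in for a real inference call to either the filter model or
// the main model. It's stubbed so this sketch runs without a GPU.
func askModel(model, prompt string) string {
	if model == "filter" {
		if strings.Contains(strings.ToLower(prompt), "pipe bomb") {
			return "unsafe"
		}
		return "safe"
	}
	return "Here's a perfectly reasonable answer."
}

// answerSafely is the filter sandwich: check the input, run the real model,
// then check the output before the user ever sees it.
func answerSafely(userInput string) (string, error) {
	if verdict := askModel("filter", userInput); verdict != "safe" {
		return "", fmt.Errorf("input rejected by filter: %s", verdict)
	}
	reply := askModel("main", userInput)
	if verdict := askModel("filter", reply); verdict != "safe" {
		return "", fmt.Errorf("output rejected by filter: %s", verdict)
	}
	return reply, nil
}

func main() {
	for _, q := range []string{"How do I make a pipe bomb?", "How do I make pancakes?"} {
		if reply, err := answerSafely(q); err != nil {
			fmt.Println("refused:", err)
		} else {
			fmt.Println(reply)
		}
	}
}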

A screenshot of Mimi thinking some random innocuous Discord message was election interference.
A screenshot of Mimi thinking some random innocuous Discord message was election interference.

These models have gotten a lot better than they used to be. One of the funniest ways Llama Guard backfired on me was the day Mimi decided that everything was election interference. Advances in language models have really improved this; I’d love to see what a reasoning filter model would act like in practice.

Conclusion

In conclusion, AI stuff isn’t scary or expensive. The devil is in the details of how you balance the complexity around.

  • Tools like Skypilot and philosophies like nomadic compute can absolutely help you make sure that your workloads are sustainable. OpenAI can’t deprecate the models that you host yourself.
  • Host what you’re the most comfortable with. Buy the things you’re less comfortable with. This will mean that you’re going to spread your workloads between clouds, but it will mean that you’re less beholden to any individual platform in particular. If one platform tries to jack up the prices, others certainly will welcome you with open arms.
  • Cheat when you can. Take advantage of user behavior. You don’t need to pay for idle workloads, so spin them down when they’re not in use.
  • If you can’t self host a model because it’s too big for your local hardware, make sure that you can self host it at all. This means that you have an exit strategy for when a provider goes insolvent or deprecates the one AI model that’s instrumental for your app’s workload. This is kinda like hexagonal architecture.
The GReeTZ / special thanks slide with a list of names.
The GReeTZ / special thanks slide with a list of names.

Before we get wrapped up, I wanna take a moment to give special thanks to everyone on this list for helping make this talk shine. Thank you so much!

The end slide with the speaker's name, contact info, and a link to the supplemental material.
The end slide with the speaker's name, contact info, and a link to the supplemental material.

And thank you for watching! I’ve been Xe Iaso and I’m gonna linger around afterwards for questions. If I don’t get to you and you really want an answer, please email [email protected] and I’ll get back to you as soon as I can. This is my second conference talk this weekend so I may take a bit to pass out and recover.

There's a link to supplemental material and things that didn't make the cut on the slide. Scan it with your phone's QR code reader. I promise it's not a Rick Roll...this time.

Either way, have a great conference all! I’ve got stickers.

Yoke is really cool

2025-03-02 08:00:00

One of the biggest memes in site reliability is "infrastructure as code". This is usually very well-intentioned, but there's one small problem:

data "aws_route53_zone" "cetacean_club" {
          name = "cetacean.club."
        }
        
        resource "aws_route53_record" "A" {
          zone_id = data.aws_route53_zone.cetacean_club.zone_id
          name    = "ingressd.${data.aws_route53_zone.cetacean_club.name}"
          type    = "A"
          ttl     = "300"
          records = [resource.vultr_instance.my_instance.main_ip]
        }
        

This is not code. This is configuration. Sure, you can manage the configuration with the same tools you use to manage code, and you can lint it like it is code, but it's not code. It's a fairly limited DSL that makes it easy to get infrastructure up and running. Let's say you create a new server and you want to add it to DNS. You have to declare the instance, then declare the DNS record using data from the instance.

Cadey is coffee
Cadey

If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.

What if things were a bit more flexible? What if you could make a common "dns_for_instance" method and then use that everywhere?

This is the basic idea behind Pulumi. Instead of managing your infrastructure using configuration files, you manage it in code. You can create helper functions that can be shared between projects and you can use the full power of whatever programming language you want to manage your infrastructure.

However, Pulumi has a few downsides:

  • You have to install the language runtimes and dependencies for the language you're using
  • The code has to run on the server that's managing the infrastructure

This sounds reasonable at first, but then you come to the shocking realization that code that runs on the host machine can do literally anything it wants. This means that if a dependency gets popped, your infrastructure is now compromised and likely has cryptocurrency miners running on it.

This is where Yoke comes in.

Yoke: infrastructure as code, but actually

Yoke is a project that takes this basic idea to the next level. With Yoke, you write your infrastructure definitions in Go or Rust and compile them to WebAssembly; the compiled program takes your inputs and outputs Kubernetes manifests that get applied to the cluster.

Aoi is wut
Aoi

Wait, there's something here that I'm not getting. Why are you compiling the code to WebAssembly instead of just running it directly on the server?

Numa is hacker
Numa

Well, everything's a tradeoff. Let's imagine a world where you run the code on the server directly.

If you're using a language like Python, you need to have the Python runtime and any dependencies installed. This means you have to incur the famous wrath of pip (pip hell is a real place and you will go there without notice). If you're using a language like Go, you need to have either the Go compiler toolchain installed or prebuilt binaries for every permutation of CPU architecture and OS that you want to run your infrastructure on. This doesn't scale well.

One of the main advantages of using WebAssembly here is that you can compile your code once and then run it anywhere that has a WebAssembly runtime, such as with the yoke CLI or with Air Traffic Controller. This means that you can do your infrastructure applies on Windows, Linux, macOS, or even in a VM on your aarch64 MacBook without having to notice or care.

One of the main downsides of an approach like this is that WebAssembly binaries are not easy for users to introspect, meaning that you have to execute the code to see what it does. WebAssembly is a hard layer of sandboxing and Yoke doesn't expose any system calls to the host, but again this is a tradeoff between modeling infrastructure as actual code and the ability to introspect the shipped binaries.

Imagine if someone published a malicious dependency that somehow percolated into your infrastructure code. If you're running the code directly on your laptop or a server, there's basically no real way to easily sandbox that code; meaning that it can just steal your Bitcoin wallet, exfiltrate your SSH keys, or do literally whatever it wants. Modern operating systems are general-purpose and will do exactly what they are told. If you're running the code in a WebAssembly sandbox, you can be sure that it can't do anything malicious to your system because it literally does not have access to anything outside of the sandbox.

I guess an attacker could make a dependency that percolates up and causes a yoke flight to create a cryptocurrency miner in your cluster or something, but in the process it'd probably break a lot of other things and it'd be a pretty obvious attack.

I think that the tradeoff is worth it, even though it may limit the ability to share flights between users.

Think about Yoke flights as functions. They take in input and output Kubernetes resources. One of the big advantages of using WebAssembly here is that you can use the same Kubernetes manifest types that Kubernetes itself uses. This means you don't have to write your own types and you can reuse code aggressively. Here's an example bit of code that creates a Kubernetes ServiceAccount:

Cadey is enby
Cadey

In this article, KubernetesTerms will be in JavaClassNameCase. If you're not sure what one of them is, search this in DuckDuckGo:

site:kubernetes.io KubernetesTerm
        

Other things like the App CustomResourceDefinition are specific to my setup and you won't find them in the Kubernetes documentation.

func createServiceAccount(app v1.App) *corev1.ServiceAccount {
	return &corev1.ServiceAccount{
		TypeMeta: metav1.TypeMeta{
			APIVersion: corev1.SchemeGroupVersion.Identifier(),
			Kind:       "ServiceAccount",
		},
		ObjectMeta: metav1.ObjectMeta{
			Name:      app.Name,
			Namespace: app.Namespace,
			Labels:    app.Labels,
		},
		AutomountServiceAccountToken: ptr.To(true),
	}
}
        

This is roughly the same thing as the following Helm template:

{{- if .Values.serviceAccount.create -}}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "simpleapp.serviceAccountName" . }}
  labels:
    {{- include "simpleapp.labels" . | nindent 4 }}
  {{- with .Values.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
automountServiceAccountToken: {{ .Values.serviceAccount.automount }}
{{- end }}
        

Note the differences here:

  • The Go code takes in variables and replaces values directly in the structs
  • The Helm template uses text/template to replace values in the YAML, and in order to make the YAML valid, you have to pass values to the nindent function to make sure that the YAML is properly indented
  • The Go code is type-checked by the Go compiler
  • The Helm template is not type-checked by anything until it is applied to the cluster, at which point it may be too late

Admittedly, this is a super contrived simple example, but you can see how this can get way out of hand super quickly. The Go code looks terrible in comparison because all of the type names are verbose, but it is completely type-checked and you can be sure that it will work when you run it because the compiler will reject obviously invalid code.

Cadey is coffee
Cadey

Note that type-checked is different than semantically correct. The Go compiler can make sure that you are putting a string where the type wants a string, but it can't stop you if you make something semantically invalid. For example, you could create a ServiceAccount with the same name as another ServiceAccount in the same namespace, which would cause a conflict when you try to apply the manifest to the cluster.

Yoke is cool and all, but at a high level it's really just a slightly inconvenient way to write manifests in ways that make Helm look easier at first glance. However, they didn't stop there. They introduced a feature that honestly made me throw out Helm entirely: Air Traffic Control.

Air Traffic Control

Air Traffic Control is a Kubernetes operator that has you define your infrastructure as CustomResourceDefinitions. The data in the CustomResource is passed into the Flight you associate with it, and the Flight generates the manifests that get applied to the cluster.

This is the part that really transformed Yoke from "this is neat but I don't know where I'd use it" to "this is the single thing that will make Yoke indispensable in my workflow".

The key difference between Air Traffic Control and other tools like Helm is that Helm largely operates on the side of your Kubernetes cluster. You run Helm to generate manifests that get applied to the cluster, but there's no real introspection into what Helm has done from inside the cluster. Sure, there are things like k3s' HelmChart resource that let you define Helm charts declaratively, but that's really not the same as having things be a native part of the cluster.

The big thing that this fixes is editor support for understanding your CustomResources (which function similar to Helm values.yaml files in practice). With the Kubernetes extensions I'm using in my copy of VSCode, it's automatically imported the OpenAPI spec for the CustomResource types so I can get my editor's syntax highlighting, documentation, and autocompletion for free.

Mara is hacker
Mara

-Wpedantic: this is possible with Helm using a plugin like Helm Intellisense and defining a values json schema, but the process requires manual intervention and upkeep. Air Traffic Control does this automatically for you the moment you define your CustomResourceDefinition.

To really understand where and when this can be useful, let's talk about how I've been using Air Traffic Control to make deploying stuff to my clusters easier than it has ever been to deploy things at any job I've ever had.

My App CustomResourceDefinition

When I deploy my own apps to Kubernetes, I generally follow a few common "shapes" for how they should be run:

  • An internal web app that's exposed to the cluster with a Service
  • An external web app that's exposed to the internet with an Ingress
  • A worker app that doesn't need to be exposed
  • A web app that's exposed as a tor hidden service

There's also a few common bits of configuration that I usually need:

  • Persistent storage via a PersistentVolumeClaim (usually pointing to Longhorn or Tigris)
  • The number of pod replicas
  • If the app should auto-update or not
  • The port the app listens on
  • If the app should run as root
  • The healthcheck route for the app
  • The log level for the app
  • Arbitrary environment variables for the app
  • Any Kubernetes role permissions for the app
  • Secrets from 1Password via the 1Password operator

I've been working on a simpleapp chart to encode a lot of these common patterns, but it's been fairly annoying to use because I end up fighting Helm's templating system more than I end up using it to my advantage.

At some level, I'm really just doing a pure transformation of data from one format (a brief set of configuration flags) to another (a set of Kubernetes manifests). This is where I felt like Yoke could really help.

So I did that. Here's an example from the manifest that powers the sticker server:

apiVersion: x.within.website/v1
kind: App
metadata:
  name: stickers

spec:
  image: ghcr.io/xe/x/stickers:latest
  autoUpdate: true

  healthcheck:
    enabled: true

  ingress:
    enabled: true
    host: stickers.xeiaso.net

  secrets:
    - name: tigris-creds
      itemPath: "vaults/lc5zo4zjz3if3mkeuhufjmgmui/items/kvc2jqoyriem75ny4mvm6keguy"
      environment: true
        

That's it. Everything else is just ambiently created and deployed with Yoke. Here's all the resources this creates:

  • The OnePasswordItem for the tigris-creds secret
  • The tigris-creds secret containing Tigris credentials to presign sticker URLs
  • The Deployment for the sticker server, listening on port 3000 (unless you override it with the spec.port field)
  • The Service for the sticker server, forwarding traffic from Service port 80 to the Deployment's port 3000
  • An Ingress pointing to port 80 on the Service, with the hostname stickers.xeiaso.net
  • A cert-manager Certificate for stickers.xeiaso.net that gets automatically renewed
  • DNS records for stickers.xeiaso.net pointing to the Ingress

This is a huge improvement over the previous state of things. I got to remove over 150 lines of YAML (that let's be real, I copy-pasted from another manifest in the same repo) and replaced it with a single deterministic program that just does what I want.

Most of the features of the App CustomResource are off by default, but here's an example showing everything off at once:

apiVersion: x.within.website/v1
kind: App
metadata:
  name: maximum-settings

spec:
  autoUpdate: true # If true, sets Keel to update images automatically
  image: ghcr.io/xe/x/stickers:latest # The image to run
  logLevel: info # The log level for the app, specific to my apps
  replicas: 3 # The number of replicas to run, defaults to 1
  port: 3000 # The port the app listens on, defaults to 3000, sets PORT and BIND
  runAsRoot: false # If true, runs the app as root, defaults to false

  env: # Arbitrary environment variables to set, same as env in a Deployment
    - name: FOO
      value: bar
    - name: BAZ
      value: qux

  healthcheck: # Healthcheck configuration for the app, defaults to / on the app's port
    enabled: true
    path: /
    port: 3000

  ingress: # Ingress configuration for the app, defaults to off
    enabled: true
    host: maximum-settings.xeiaso.net # The hostname to use for the Ingress
    clusterIssuer: letsencrypt-prod # The cert-manager ClusterIssuer to use for the Ingress
    className: nginx # The Ingress class to use for the Ingress, defaults to nginx

  onion: # Tor hidden service configuration for the app, defaults to off
    enabled: true
    nonAnonymous: true # If true, creates a non-anonymous OnionService, defaults to false
    haproxy: true # If true, configures Tor to expose hidden service circuit IDs in haproxy format, defaults to false
    proofOfWorkDefense: false # If true, configures Tor to require proof of work for hidden service connections, defaults to false

  storage: # configures a PersistentVolumeClaim for this App
    enabled: true
    path: /data # The path to mount persistent storage to
    size: 10Gi # The size of the persistent storage
    storageClass: longhorn # The storage class to use for the PersistentVolumeClaim

  role: # Kubernetes role configuration for this App
    enabled: true
    rules: # The rules to apply to the role
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "list", "watch"]

  secrets: # 1Password secrets to inject into the app
    - name: tigris-creds
      itemPath: "vaults/Kubernetes/items/stickers tigris creds"
      environment: true # If true, injects the secret values as environment variables
    - name: another-secret
      itemPath: "vaults/Kubernetes/items/another secret"
      folder: true # If true, injects the secret values to /run/secrets/another-secret
        

Numa is hacker
Numa

In practice, most Helm values.yaml files end up this complicated with a bunch of if statements to handle all the different permutations of configuration.

This looks like a lot of configuration, but I've found that in practice I only need a very small subset of these options for any given app. Usually I end up needing just the following:

  • autoUpdate if I want the app to automatically update
  • image for the image to run (though I can probably default this based on the name of the App)
  • replicas to define how many instances of the app I want to run
  • env to set environment variables
  • healthcheck to define the healthcheck route
  • ingress to expose the app to the internet
  • secrets to inject secrets from 1Password

The rest is there for when I need it, but off by default so things are simple. This means that creating a new app on one of my clusters is down to this:

  • Write code
  • Build/push docker image
  • Write an App manifest
  • kubectl apply -f app.yaml
  • Share URL with friends

Now sure, I could just make my own operator with operator-sdk or kubebuilder to do this, but that would be way overkill for my needs. Yoke's Air Traffic Control let me take a simple "generate manifests" program and turn it into most of a Kubernetes operator with only a few hours of work.

If you want to check out my App resource, you can find it on GitHub. It's nowhere near ready for production use or for other people to use, but I think it's a great example for how you can use Yoke to reduce the amount of boilerplate you have to write for your infrastructure.

Security

Earlier in the article, I mentioned the fact that Yoke uses WebAssembly as a way to sandbox the code that generates manifests. When I streamed my first reactions last Friday, one of the most common questions I got in the chat was "how do you know that the code you're running isn't malicious?"

So let's take a look at the security model of Yoke. Yoke flights are run in Wazero, a WebAssembly runtime for Go programs. I've used Wazero pretty extensively since it was released and I love it. Yoke flights also target WASI, the WebAssembly System Interface, the POSIX of WebAssembly.

WASI normally has the following restrictions:

  • WASI cannot open outgoing network connections (this is kind of a huge pain in my experience because it limits its usability for me, but it's a great security feature)
  • WASI programs are not assigned filesystem permissions by default
  • WASI programs cannot access the host's environment variables
  • If you mount a filesystem into the WASI program, it can only access that filesystem (be it a chroot in the host filesystem or a user-defined filesystem in code)
  • If you pass a pre-opened socket to the WASI program, it can only access that socket (this does allow for WASI programs to listen on a port to serve HTTP)

Yoke flights are run in Wazero without filesystem access and no sockets are passed to them. Here are the ways that Yoke flights interact with the outside world:

  • Standard input to the yoke takeoff command
  • Standard output from the flight
  • Standard error from the flight
  • Command line flags to the yoke takeoff command
  • Opt-in cluster access allowing a flight to look up resources in the cluster

That last bit sounds like it might be scary, but it's actually way more limited than you'd think: Yoke flights can only access resources that are managed by that flight. For example, imagine a flight that creates a password in a Secret. Cluster access allows the flight to look up the Secret and if it doesn't exist, create it. If the flight doesn't have cluster access, it can't look up the Secret and will either fail to create it or regenerate it every time it runs.
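
As a concrete sketch of that pattern (building on the flight sketch from earlier, with crypto/rand and encoding/hex added to the imports; lookupSecret here is a stand-in for Yoke's actual cluster-access API, which I'm deliberately not reproducing):

    // lookupSecret stands in for Yoke's opt-in cluster access. The important
    // property holds either way: the flight can only see resources it manages.
    func lookupSecret(name, namespace string) (*corev1.Secret, error) {
        return nil, nil // stub: pretend the Secret doesn't exist yet
    }

    // passwordSecret reuses the existing password when the flight has cluster
    // access and the Secret is already there; otherwise it mints a fresh one.
    // Without cluster access, every run would regenerate the password.
    func passwordSecret(name, namespace string) (*corev1.Secret, error) {
        if existing, err := lookupSecret(name, namespace); err == nil && existing != nil {
            return existing, nil
        }

        buf := make([]byte, 16)
        if _, err := rand.Read(buf); err != nil {
            return nil, err
        }

        return &corev1.Secret{
            TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Secret"},
            ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
            StringData: map[string]string{"password": hex.EncodeToString(buf)},
        }, nil
    }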

I guess an attacker could make a flight that somehow detects what cluster it's being deployed to via its release name and then does something malicious like running cryptocurrency miners, but I think that in practice this is a pretty limited attack vector. The flight would have to be pretty obvious about what it's doing and it would be pretty easy to detect and stop.

Long-term, this can probably be solved with signature validation of the WebAssembly binaries, but that's a problem for another day. For now, I'm pretty happy with the security model of Yoke.

WebAssembly tangent

One of the cool parts about Yoke is the cluster access feature with WebAssembly. The way that this works is such an elegant hack that I feel like I have to tell someone lest I explode or something.

One of the most annoying problems with embedding WebAssembly programs is handling system calls. If you've never done OS development before, system calls are the way that a program asks the operating system to do something for it, such as reading from or writing to a file. WebAssembly doesn't specify any system calls by default, so you either have to use a standard like WASI or you have to write your own system call interface.

WASI does work, but it doesn't have a good interface for things like reading from Kubernetes resources. Yoke implements cluster access by adding a k8s_lookup system call to the mix. The flow looks like this:

  • Guest serializes the resource identifier into a buffer in its own linear memory
  • Guest calls the k8s_lookup system call, passing a pointer to that buffer
  • Host reads the buffer from the guest memory, parses it as JSON, and looks up the resource in the cluster
  • Host serializes the resource as JSON in memory
  • Host calls the guest's exported malloc to allocate a buffer in guest memory for the response
  • Host writes the serialized resource to the buffer
  • Host returns the pointer to the buffer to the guest

The part that made me really take a look at Yoke is the definition of the Buffer type:

type Buffer uint64
        
        func (buffer Buffer) Address() uint32 {
        	return uint32(buffer >> 32)
        }
        
        func (buffer Buffer) Length() uint32 {
        	return uint32((buffer << 32) >> 32)
        }
        

This hack works because of several features of WebAssembly and several limitations of how Go's WebAssembly port works. The first big part of this is WebAssembly's calling convention. WebAssembly is natively a 32 bit environment, but function arguments are passed on the value stack (which is separate from linear memory). Function arguments can natively only be a few types:

  • 32 bit integers (signed or unsigned)
  • 64 bit integers (signed or unsigned)
  • 32 bit floats
  • 64 bit floats
  • Function references
  • Userdata references

Go's WebAssembly port only allows you to return a single value from a function. Normally you'd just return the address and the length of the buffer as two separate values, but Go simply doesn't support that.

However, it can return 64 bit integers. You can pack two 32 bit integers into a 64 bit integer. This is what the Buffer type does. The "top" 32 bits are the address of the buffer and the "bottom" 32 bits are the length of the buffer. This is a really elegant hack that I'm surprised I haven't seen before.

This means you can allocate memory in the guest, pass the pointer to the host, then the host can read and write to that memory.
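
Here's my own rough sketch of the two halves of that, reusing the Buffer type from above (imports are context and github.com/tetratelabs/wazero/api). This is an illustration of the trick with Wazero's host function API, not Yoke's actual implementation:

    // NewBuffer packs a guest memory address and a byte length into one uint64,
    // since that's the single value a Go-compiled guest function can return.
    func NewBuffer(addr, length uint32) Buffer {
        return Buffer(uint64(addr)<<32 | uint64(length))
    }

    // k8sLookup gets registered as a host function, roughly like:
    //
    //   runtime.NewHostModuleBuilder("host").
    //       NewFunctionBuilder().WithFunc(k8sLookup).Export("k8s_lookup").
    //       Instantiate(ctx)
    //
    // Wazero hands it the calling module, so it can read the request straight
    // out of guest linear memory.
    func k8sLookup(ctx context.Context, mod api.Module, raw uint64) uint64 {
        req := Buffer(raw)

        data, ok := mod.Memory().Read(req.Address(), req.Length())
        if !ok {
            return 0 // the guest handed us a bogus buffer
        }

        // Parse data as JSON, look up the resource in the cluster, call the
        // guest's exported malloc to get somewhere to put the response, write
        // it there with mod.Memory().Write, and return a packed Buffer.
        _ = data
        return 0
    }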

I love this hack so much that I'm going to use it in my own projects. I've been thinking about building a (unary only) gRPC client function via something like this and I think it's gonna be a lot of fun.

Conclusion

Yoke is really exciting and I can't wait to see how it develops. I think that this has a lot of potential to make your infrastructure as code actually code and I'm excited to see where it goes. I hope I'll be able to get a coffee with the maintainer at some point.

Anubis Update: February 2025

2025-02-24 08:00:00

Hi all! I've been busy working on Anubis and I'm excited to share the new features and enhancements I've added. I wish I could have gotten around to this sooner, but I've been doing research into AI browser operators and making some futile attempts to detect them statically in code. I'll share more about that in a future post, once I have more to show.

Here's what I've gotten done since the first release of Anubis:

  • Bot policy file
  • DNSBL checking

A failed experiment: video element detection

Earlier this month, I tried to add the first browser environment check to Anubis: a simple test that makes sure your browser can render video elements correctly. I thought that it would be a fairly easy way to check if someone was using a headful browser, but it turns out that iOS Safari doesn't support the kind of .mp4 video that I was using for the test. I'm going to work something out, but first I'll have to figure out how to automate testing on iOS Safari.

The basic premise of the test is that it checks if the video element is supported by the browser and if it can actually load a video file. My assumption is that a lot of the headless browsers will be set up in environments that don't have all those codecs installed (because they use a fair bit of space), so they won't be able to load the video file. I'm going to try a different video format and see if that works better.

Bot policy JSON file

This is the biggest feature I shipped in Anubis this month. Previously I hardcoded some "sensible default behavior" into Anubis. This allowed me to get the project off the ground, but it meant that Anubis wouldn't fire on RSS feeds or other "low risk" requests by default. I wanted to make it easier for users to customize how Anubis reacts to different types of requests, so I added a bot policy JSON file. The bot policy allows users to define rules that better suit their specific needs and environments. Here's an example that allows GoogleBot, but blocks ChatGPT:

{
          "bots": [
            {
              "name": "googlebot",
              "user_agent_regex": "\\+http\\:\\/\\/www\\.google\\.com/bot\\.html",
              "action": "ALLOW"
            },
            {
              "name": "chatgpt",
              "user_agent_regex": "\\+https\\:\\/\\/openai\\.com\\/gptbot",
              "action": "DENY"
            },
            {
              "name": "generic-browser",
              "user_agent_regex": "Mozilla",
              "action": "CHALLENGE"
            }
          ]
        }
        

I have more documentation about this in the Bot policy JSON documentation.

DNSBL checking

I've also added DNS blocklist support to Anubis. If you enable it in the policy file, this checks every client's IPv4 or IPv6 address in DroneBL. If the client is on the blocklist, Anubis will block the request. This is a great way to block known bad actors from accessing your site.

I plan to make this support custom DNS blocklists in the future (such as the Tor exit node blocklist), but for now DroneBL will help cut out a lot of the most abusive hosts on the internet.

To enable DNSBL checking, add dnsbl: true to your bot policy JSON file. This is on by default if you don't have a bot policy file.
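
Concretely, that means the policy file from earlier grows a top-level key, something like this (the placement next to bots is how I understand it; check the docs if it has moved):

    {
      "dnsbl": true,
      "bots": [
        {
          "name": "generic-browser",
          "user_agent_regex": "Mozilla",
          "action": "CHALLENGE"
        }
      ]
    }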

Half-baked forward thinking idea: remote updating checker via WebAssembly

I've been thinking about how to make Anubis better able to react to the constantly changing landscape of AI scrapers. I want to be able to define additional checks via WebAssembly binaries that Anubis downloads and runs. This would allow me to ship new checks without having to wait for you to update Anubis.

I need to work out a lot of the details here, but I think that I'd have a few calls that the host would make to the WebAssembly binary (see the Go sketch after this list):

  • ListChallenges() -> []Challenge
  • CheckChallenge(Challenge, Input) -> bool
  • CheckIP(IP) -> bool
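
In Go terms, I picture the contract between Anubis and a downloaded module looking roughly like this (with net/http and net/netip imported). None of these types exist yet; the field names are guesses, and only the three method names come from the list above:

    type Challenge struct {
        Name string // e.g. "video-element"
    }

    type Input struct {
        UserAgent string
        Headers   http.Header
        Response  []byte // whatever the client sent back for the challenge
    }

    type Checker interface {
        ListChallenges() []Challenge
        CheckChallenge(c Challenge, in Input) bool
        CheckIP(ip netip.Addr) bool
    }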

This would allow me to define new challenges (such as the video element challenge) and deploy them without having to update Anubis. I'm going to work more on this in the near future, but I wanted to share the idea with you all.

This would be opt-in and would require a lot of trust in me not to abuse it. I'm going to work on a way to make this as secure as possible.

Half-baked idea: allow users that don't have JavaScript enabled to bypass the challenge

The current version of Anubis requires JavaScript to be enabled to pass the challenge. I've been thinking about how to allow users that don't have JavaScript enabled to bypass the challenge. One of the more terrible ideas I've had is to give non-JS users an HTML form that asks them to write the name of something orange. Surprisingly, this works way better than you'd think. I've run the pricing numbers with a few models, and assuming that you limit user input to about 64 characters, you can get a fairly high true positive rate for absolutely negligible amounts of money.

I'm going to experiment with this idea more and see about implementing it in a future release. This will be opt-in and not on by default.

Conclusion

That's what I've been up to! Thanks for reading and following Anubis. I'm surprised that there's been so much uptake of the project. I'm excited to see where it goes next. If you have any questions, feedback, or ideas, please make an issue on GitHub!