
Nomadic Infrastructure Design for AI workloads

2025-01-27 08:00:00

How do you design a production-ready AI system to maximize effectiveness per dollar? How do you manage and reduce dependency lock-in? Moreover, how do you separate concerns between your compute, network, and storage? In this talk I'll be covering all of that and showing you how to design a production-worthy AI setup that lets you be nomadic between providers, hunting down deals as easily as possible.

Video

Want to watch this in your video player of choice? Take this:
https://cdn.xeiaso.net/file/christine-static/talks/2025/nomadic-compute/index.m3u8

Transcript

Cadey is coffee
<Cadey>

This is spoken word. It is not written like I write blogposts. It is reproduced here for your convenience.

The title slide of the talk. It shows the speaker name and the title.

Hi, I'm Xe. I work at Tigris Data, and I'm going to talk about the concept of nomadic infrastructure design for your AI workloads.

This is not a product demo.

But disclaimer, this is not a product demo.

(Audience cheers)

This is thought leadership, which is a kind of product, I guess.

The three parts of a workload: compute, network, and storage.

A workload has three basic parts. Compute, network, and storage. Compute is the part that does the number crunching or the linear algebra. The network is what connects all our computers together. It's why we have to update everything every fifth femtosecond. And storage is what remembers things for next time.

This is what you're billed on over time.

As I've been messing with new providers and trying to find cheap hacks to get my AI stuff working at absurdly low prices, I found a really weird thing.

Compute time is cheaper than storage time.

Compute time is cheaper than storage time.

I don't know why this is the case. With Vast.ai, RunPod, and all these bid-acquired GPU markets, spending time downloading things is cheaper than storing them for the next run.

Pricing details for a random 4090 in South Carolina.

Like, look at this. I selected a 4090 in South Carolina at random. It costs two pennies per hour to run with 50 GB of local storage. Keeping that data around is one penny per hour. That's half of the price of the instance. Sure, there's probably some...creative financial decisions that go into pricing things like this.

But if it takes 30 seconds to boot and costs like two cents an hour, it costs more to store things between runs than it does to just download them again. Really weird thing to think about.

How to cheat at infrastructure design.

So let's learn how to cheat at infrastructure design and find out why I am not allowed to be an SRE anymore. Asterisk.

A graph of Bluesky user activity.

So, the first thing that you can do is scale to zero, because people don't use workloads when they're asleep. This graph has a sinusoidal wave and it's from Bluesky when they blew up late last year. There's a peak in the middle of the American daytime and then it all goes down to very low as the Americans go to sleep.

If you've ever worked in SRE stuff, you see this all the time. This is what your request rate looks like. This is what your active user count looks like. This is what healthy products look like. So if you just make your service turn off when nobody's using it, you already save 12 hours of runtime per day.
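
Even something as blunt as a cron job gets you most of the way there. A minimal sketch, assuming a single GPU box and a hypothetical container named wallpaper-worker (adjust the hours to your own traffic curve):

# Hypothetical crontab: stop the GPU worker overnight and bring it back
# before the morning peak. That's 12 hours a day you aren't paying for.
0 22 * * * docker stop wallpaper-worker
0 10 * * * docker start wallpaper-worker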

A green-haired anime woman immolating money and laughing.

Like, remember, it may not be your money, but money is expensive now. The free tier is going to end. At some point, the hype will die out and the price of compute will reflect the price of acquiring the hardware.

Your AI workloads are dependencies. Without those workloads, your product is doomed. Those who control the infrastructure spice, control the infrastructure universe or whatever Frank Herbert said in Dune.

Tradeoffs

The tradeoffs.

So when you're cheating, it's all about making trade-offs. There are several factors that come to mind, but in my view, the biggest one is time, because that's what you're billed on.

A list of the steps involved in a cold start of an AI workload.

Specifically, cold start time or the time that it takes to go from the service not running to the service running. Here's an example of all of the steps involved in running a workload on some cloud provider somewhere.

Statistically, Docker is the universal package format of the internet. It's going to be in a Docker image that has to be pulled, and NVIDIA stuff is like gigabytes of random C++ libraries and a whole bunch of bytecode for GPUs that you don't have, but it has to ship around anyway because, who knows, you might run it on a 2060.

That gets pulled, extracted, it gets started. Your app boots up, realizes, "Oh, I don't have any models. I need to pull them down."

And then that time that it takes from pulling the models to loading the models is time where you're on the clock doing nothing useful. But once you get to the point where the models are loaded, you can inference them, do whatever it is and somehow make profit. But everything above that inference model step is effectively wasted time.

Depending on the platform you're using, this can cost you money doing nothing.

A perfectly normal drawing of Sonic the Hedgehog.

How can we make it fast? How can we give our infrastructure Sanic speed? Users don't care if you're trying to cheap out. They care about responsiveness. There are two ways to handle this, and both are different ways of cheating.

Batch operations.

One of the biggest ways to cheat is to make your workloads happen on a regular basis where you can do a whole bunch of stuff en masse. This is called batch operations. This is how the US financial system works. This is a horrifying thing. You bundle everything up into big batches and do them every 6, 12, 24 hours, whatever father time says you should do.

This is great. Let's say you have a wallpaper of the day app and you want to have every wallpaper generated by AI for some reason. Statistically, if it's the wallpaper of the day, you don't need to run it more than once a day. So you can just have a cron job start it up, generate the wallpaper, put it into storage somewhere, and mark it as ready for the world after it passes some basic filtering. Bob's your uncle, you're good.
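
As a rough sketch of that, assuming the generator is packaged as a container (the image name and flags here are hypothetical), a Kubernetes CronJob is one way to say "run this once a day and then get off the clock":

# Hypothetical CronJob: generate the wallpaper of the day, then exit.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: wallpaper-of-the-day
spec:
  schedule: "0 6 * * *"   # once a day, before anyone is awake
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: generate
              image: registry.example.com/wallpaper-generator:latest # hypothetical image
              args: ["--output", "s3://wallpapers/daily"]            # hypothetical flags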

This lets you run the most expensive part of your app for pennies on the dollar using any model that you want that you have the bytes for, so your upstream infrastructure provider can't say, "Oh, we're going to turn off the model you're using. Good luck!"

Speed up downloads.

But the other way to cheat is to speed up the cold start process. Let's look at that list again.

Another copy of the list of cold start operations.

Pulling models is the slowest part because that's usually done by your Python program, and Python is still single threaded in Anno Domini 2025. Your app has to sit there doing nothing waiting for the model to pull and get ready. This can take minutes if you're unlucky and tens of minutes if you're really unlucky.

What if you could cheat by doing it in a phase where you're not billed? You could just put it into the Docker image with the runtime, right? So I did this and to my horror, it worked kind of well.

There's just like many problems.

Docker hates this

Number one, Docker hates this. Docker absolutely despises this because the way that Docker works is that it's a bunch of tarballs in a trench coat, right? In order to pull a Docker image, you have to extract all the tarballs. It can only extract one of the tarballs at once because tarballs are weird.

And if you have Flux dev, that's like a 12 billion parameter model. So we're talking about like 26 gigabytes of floating point numbers, including the model, the autoencoder, and whatever else it has.

This isn't time you have to pay for, but it is time that users may notice. But we're cheating, so you could just do it for batch operations.

If you want to do this anyways, here's a trick I learned:

Model weights don't change often. So what you can do is you can make a separate Docker image that has all of the model weights and then link those model weights into your runtime image.

FROM anu-registry.fly.dev/models/waifuwave AS models

FROM anu-registry.fly.dev/runners/comfyui:latest

COPY --link --from=models /opt/comfyui/models/checkpoints /opt/comfyui/models/checkpoints
COPY --link --from=models /opt/comfyui/models/embeddings /opt/comfyui/models/embeddings
COPY --link --from=models /opt/comfyui/models/loras /opt/comfyui/models/loras
COPY --link --from=models /opt/comfyui/models/vae /opt/comfyui/models/vae
        
Aoi is facepalm
<Aoi>

This works. I'm horrified.

You get to reuse these models between images, because if you have a base Stable Diffusion checkpoint and each LoRA in a separate layer, you can just have those be there in the image by default. And if you need to download a separate LoRA, you can do that at runtime and only have to download like 150 megs instead of like 5 gigs. That's a lot faster.

And you can also reuse them between projects or workloads, which might be preferable depending on what you're doing.
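
For completeness, the weights image itself can be almost nothing but COPY statements. A minimal sketch, assuming the weights sit next to the Dockerfile (the paths mirror the runner image above):

# Hypothetical weights-only image: no runtime, just model files laid out
# where the runner image expects to find them.
FROM scratch
COPY checkpoints/ /opt/comfyui/models/checkpoints/
COPY embeddings/  /opt/comfyui/models/embeddings/
COPY loras/       /opt/comfyui/models/loras/
COPY vae/         /opt/comfyui/models/vae/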

The Docker Hub hates this

Other big problem when you're doing this: the Docker Hub will not allow this. It has a maximum layer size of like 10 gigabytes and a maximum image size of 10 gigabytes. And my test image that uses Stable Diffusion 1.5 from 2023 is an 11 gigabyte image.

GitHub's container registry barely tolerated it. I had to use my own registry. It's not that hard. Registries are basically asset flipping S3, and I work for a company that is basically S3. So this is easy to do and I can tell you how to do it after the talk. I have stickers.
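
If you want to try this yourself, the stock distribution registry is enough to get going; it can also be pointed at S3-compatible storage through its REGISTRY_STORAGE_S3_* environment variables, but that part is up to you:

# Minimal local registry listening on port 5000.
docker run -d -p 5000:5000 --name registry registry:2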

The upsides of doing this

But the biggest upside of doing this horrific, horrific crime is that your one deploy artifact has both your application code and your weights. This is something that doesn't sound like a big advantage until you've had your model get removed from Hugging Face or Civitai. And then you have a production incident that you can't easily resolve because nobody has the model cached.

Numa is disgust
<Numa>

Ask me how I know.

The two of them meme edited to be 'one of them'

And because there's just one of them, you don't have multiple artifacts to wrangle. You don't have to have extra logic to download weights. It's amazing how much code you don't have to write when you don't have to write it.

The Nomadic Compute cover image having a robot hunting down deals

But this is the key idea in a nomadic compute setup. Your workload ships with everything it needs so that it can start up quickly, head out to hunt whatever deals it can, get the job done and then head back to the cave to slumber or something. The metaphor fell apart. I'm sorry.

You also don't need to be beholden to any cloud provider, because if you can execute AMD64 code, you have an Nvidia GPU, and there's a modern-ish version of CUDA, it doesn't matter. Everything else is fungible. The only way that you'd really be locked in is if you're using local storage, and remember, we're trying to save money. So we're not.

So you can just use tools like SkyPilot. It just works.
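
As a rough sketch of what that looks like (the script and bucket here are hypothetical), a SkyPilot task file describes the GPU you need and the command to run, and SkyPilot shops it around whatever providers you have credentials for:

# task.yaml -- launch with: sky launch task.yaml
resources:
  accelerators: A100:1   # any provider with a matching GPU will do
setup: |
  pip install -r requirements.txt
run: |
  python generate_wallpaper.py --output s3://wallpapers/daily   # hypothetical script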

Live demo

Okay, so let's tempt God.

I am very good at web design, so this is an HTML 1.0 form. My demo is a button on a page and if you click the button, you get anime women:

A profile shot of a brown-haired anime woman looking up to the sky, made with Counterfeit v3.0

See, that was hallucinated by a GPU that spun up on demand, and it'll shut down when we're done. I'm glad that worked.

List of special thanks

Special thanks to all these people. You know what you did if you're on this list. You know what you didn't if you're not.

Final slide with Xe's contact info

And with that, I've been Xe. If you have any questions, please ask. I don't bite.

GHSA-56w8-8ppj-2p4f: Bot protection bypass in Anubis

2025-01-26 08:00:00

Hey all. I screwed up with part of how I made Anubis, and as a result I have both fixed it and am filing this CVE to explain what went wrong and how it was fixed. This is GHSA-56w8-8ppj-2p4f.

This requires a sophisticated attacker to target a server running Anubis. I suspect that the only instances of this in the wild were the ones done by the reporter as a proof of concept and in my testing.

Vulnerability details

These details have been copied from GHSA-56w8-8ppj-2p4f.

CVSS score: 2.3 (CVSS:4.0/AV:N/AC:H/AT:N/PR:L/UI:N/VC:L/VI:N/VA:N/SC:N/SI:N/SA:N)

Weakness: CWE-807: Reliance on Untrusted Inputs in a Security Decision

Vulnerable version: anything older than v1.11.0-37-gd98d70a

Patched version: v1.11.0-37-gd98d70a and newer

Context

Anubis is a tool that allows administrators to protect their websites against AI scrapers and bots through bot-checking heuristics and a proof-of-work challenge to discourage scraping from multiple IP addresses. For more information about Anubis, see Anubis' README.md.

Impact

A sophisticated attacker (or scraper runner) that is targeting a website that uses Anubis can easily bypass the bot protection mechanisms.

This requires a targeted attack.

Patches

Pull the most recent Docker image in order to be sure you have upgraded past commit e09d0226a628f04b1d80fd83bee777894a45cd02.

Workarounds

There are no known workarounds at this time. Users must upgrade to fix this issue.

Details

Anubis works by having a client request a challenge value with a given difficulty, then the client performs proof-of-work to create a sha-256 hash matching that difficulty. Before commit e09d0226a628f04b1d80fd83bee777894a45cd02, the client sent the difficulty it used back to the server and the server used that untrusted value to make an allow/deny decision.

This has been fixed by using the difficulty value set by the administrator in Anubis' configuration flags when making said allow/deny decisions.
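
In other words (this is a sketch of the idea, not the literal patch), the allow/deny check has to compare the response hash against the difficulty Anubis was started with, never a value echoed back by the client:

package main

import (
	"fmt"
	"strings"
)

// validDifficulty reports whether a response hash meets the server-side
// difficulty (the number of leading zero hex digits). The difficulty comes
// from Anubis' own configuration flags, not from the request.
func validDifficulty(responseHash string, difficulty int) bool {
	return strings.HasPrefix(responseHash, strings.Repeat("0", difficulty))
}

func main() {
	fmt.Println(validDifficulty("0000f00d", 4)) // true
	fmt.Println(validDifficulty("00f00d", 4))   // false
}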

GReeTZ

Thank you Coral Pink for reporting this issue.

The Techaro security issue reporting policy

At Techaro, we believe in total honesty in how we handle security issues. We try our best to not make vulnerable code, but inevitably we will mess up and do it by accident. When we do, we will be transparent, honest, and high-signal, and we will handle the situation like professional adults. We will value the time of security researchers.

At times, we will fail at this mission. The real thing we are measuring is not the number of times that it happens, but how we react when it does happen. This is why we are openly and honestly reporting this issue.

When things do fail, we will create regression tests to ensure that those failures do not repeat themselves. The testing for Anubis is currently private, but in the interest of transparency here is the test that we added to that repo to handle this regression:

func TestFakeChallengeDifficulty(t *testing.T) {
	cli, err := anubis.New(*testServerURL)
	if err != nil {
		t.Fatal(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	chall, err := cli.MakeChallenge(ctx)
	if err != nil {
		t.Fatal(err)
	}

	nonce := 42069

	response, err := sha256sum(fmt.Sprintf("%s%d", chall.Challenge, nonce))
	if err != nil {
		t.Fatal(err)
	}

	if err := cli.PassChallenge(ctx, anubis.PassChallengeRequest{
		Response:    response,
		Nonce:       nonce,
		Redir:       "https://xeiaso.net",
		ElapsedTime: 420,
		Difficulty:  0,
	}); err != nil {
		sce, ok := err.(*anubis.StatusCodeErr)
		if !ok {
			t.Fatal(err)
		}
		if sce.Got != http.StatusForbidden {
			t.Fatalf("wrong status code, should have forbidden auth bypass: want: %d, got: %d", sce.Want, sce.Got)
		}
	}
}
        

Thank you for following the development of Anubis.

Life pro tip: Oracle Linux is the best local VM for MacBooks

2025-01-23 08:00:00

Part of working on Anubis means that I need a local Linux environment on my MacBook. Ideally, I want Kubernetes so that I have a somewhat cromulent setup. Most of my experience using a local Kubernetes cluster on a MacBook is with Docker Desktop. I have a love/hate relationship with Docker Desktop. Historically it's been a battery hog and caused some really weird issues.

I tried to use Docker Desktop on my MacBook again, and not only was it a battery hog like I remembered, but whenever the Kubernetes cluster is running, the machine fails to go to sleep when I close the lid. I haven't been able to diagnose this despite help from Mac expert friends in an infosec shitposting Slack. I've resigned myself to just shutting down the Docker Desktop app when I don't immediately need Docker.

I have found a solution thanks to a very unlikely Linux distribution: Oracle Linux. Oracle Linux is downstream of Red Hat Enterprise Linux, and more importantly they ship a "no thinking required" template for UTM. Just download the aarch64 UTM image from their cloud images page, extract it somewhere, rename the .utm file to the name of your VM, double click, copy the password, log in, change your password on first login, and bam. You get a Linux environment.

It is glorious.

Additionally, k3s works seamlessly on it. Just run the curl-to-bash installer, copy /etc/rancher/k3s/k3s.yaml to your ~/.kube/config (or change the IP address in the file and install it on your MacBook via a bridged network), and you have a fully working Kubernetes cluster with Traefik preinstalled.
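
For reference, the whole dance is roughly this (the bridged IP is whatever your VM gets):

# On the Oracle Linux VM: the official k3s curl-to-bash installer.
curl -sfL https://get.k3s.io | sh -

# Grab the kubeconfig for your MacBook, then swap 127.0.0.1 in the copied
# file for the VM's bridged IP address.
sudo cat /etc/rancher/k3s/k3s.yaml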

They also have a HelmChart custom resource that lets you install Helm releases declaratively. Here's how my VM gets cert-manager:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  targetNamespace: cert-manager
  createNamespace: true
  set:
    installCRDs: "true"
    "prometheus.enabled": "false"
        

I love it.

The best part is that this setup is more complicated than the Docker Desktop VM, yet it sips battery life. Opening the Docker Desktop app can cause my MacBook's fans to spin up and stay on at a dull roar. Oracle Linux in UTM leaves the fans silent and doesn't show up in the top energy users list.

This is frankly nuts and I'm going to be taking advantage of this as much as I can for local development.

I need to figure out a good way to run a Docker registry in the k3s node or something so I can do builds and test runs on an airplane, but this is a solvable issue with enough time and effort.

I'm still just flabbergasted at how well put together Oracle Linux is. It's very minimal, but very well documented on Oracle's site. I don't know if I'd feel comfortable using it in prod yet, but I'm very happy with it.

Update MinIO to account for AWS SDK changes

2025-01-22 08:00:00

Recent AWS SDK changes have broken compatibility with many S3-compatible object stores that aren't AWS S3 itself. MinIO has released RELEASE.2025-01-20T14-49-07Z, which fixes compatibility with these new SDK versions.

If you do not update your MinIO instance, you will get error messages like this when using the AWS CLI, boto3, SDK for JavaScript, SDK for Java, or SDK for PHP:

upload failed: ./pvc.yaml to s3://radical/pvc.yaml An error occurred (MissingContentLength) when calling the PutObject operation: You must provide the Content-Length HTTP header.
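
If you run MinIO in a container, fixing this is a matter of pulling the patched release and recreating the container (adjust to however you deploy it):

# Pull the release that restores compatibility with the new SDKs.
docker pull minio/minio:RELEASE.2025-01-20T14-49-07Z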
        

Good luck!

Block AI scrapers with Anubis

2025-01-19 08:00:00

AI scrapers have been bullying the internet into oblivion and there's not much we can do about it. The well-behaved bots will relent when you ask them to, whether you add entries to your robots.txt (even though they should understand the intent behind a wildcard) or block their user agents.

A majority of the AI scrapers are not well-behaved, and they will ignore your robots.txt, ignore your User-Agent blocks, and ignore your X-Robots-Tag headers. They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link viewing the same pages over and over and over and over. Some of them will even click on the same link multiple times in the same second. It's madness and unsustainable.

I got tired of this and made a tool to stop them for good. I call it Anubis. Anubis weighs the soul of your connection using a sha256 proof-of-work challenge in order to protect upstream resources from scraper bots. It's a reverse proxy that requires browsers and bots to solve a proof-of-work challenge before they can access your site, just like Hashcash.

Numa is smug
<Numa>

You know it's good when the description references The Book of the Dead.

To test Anubis, click here.

If you want to protect your Gitea, Forgejo, or other self-hosted server with Anubis, check out the instructions on GitHub.

If you would like to purchase commercial support for Anubis including an unbranded or custom branded version (namely one without the happy anime girl), please contact me.

How Anubis works

Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. If a scraper is dedicated enough to solve the challenge, Anubis lets it through, because at that point it is functionally a browser.

The most hilarious part about how Anubis is implemented is that it triggers challenges for every request with a User-Agent containing "Mozilla". Nearly all AI scrapers (and browsers) use a User-Agent string that includes "Mozilla" in it. This means that Anubis is able to block nearly all AI scrapers without any configuration.

Aoi is wut
<Aoi>

Doesn't that mean that you're allowing any AI scraper that simply chooses to not put "Mozilla" in their User-Agent string?

Cadey is coffee
<Cadey>

Well, yes, but that's a very small number of AI scrapers. Most of them want to appear as a browser to get around the most basic of bot protections because a lot of servers have dubious logic around "Mozilla" being in the User-Agent string. It's a bit of a hack, but it works way better than should be expected.

At a super high level, Anubis follows the basic idea of hashcash. In order to prevent spamming the protected service with requests, the client needs to solve a mathematical operation that takes a certain amount of time to compute, but can be validated almost instantly. The answer is stored as a signed JWT token in an HTTP cookie, and the client sends this token with every request to the protected service. The server will usually validate the signature of the token and allow it through, but the server will also randomly select the token for secondary screening. If the token is selected for secondary screening, the server will validate the proof-of-work and allow the request through if everything checks out.

Challenges are stored on the client for one week, requiring the client to solve a new challenge once per week. This is to balance out the inconvenience of solving a challenge with protecting the server from aggressive scrapers.

If any step in the validation fails, the cookie is removed and the client is required to solve the proof-of-work challenge again. This is to prevent the client from reusing a token that has been invalidated.

Anubis also relies on modern web browser features:

  • ES6 modules to load the client-side code and the proof-of-work challenge code.
  • Web Workers to run the proof-of-work challenge in a separate thread to avoid blocking the UI thread.
  • Fetch API to communicate with the Anubis server.
  • Web Cryptography API to generate the proof-of-work challenge.

This ensures that browsers are decently modern in order to combat most known scrapers. It's not perfect, but it's a good start.

This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge.

This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now.

The gory details

Anubis decides to present a challenge using this logic:

  1. If the client has a User-Agent that does not contain "Mozilla", the client is allowed through.
  2. If the client does not have a cookie with a valid JWT token, the client is presented with a challenge.
  3. If the cookie is expired, the client is presented with a challenge.
  4. If the client is not selected for secondary screening, the client is allowed through.
  5. If the client is selected for secondary screening, the server re-validates the proof-of-work and allows the client through if everything checks out.
The above logic in flowchart form.

When you are asked to solve a challenge, an HTML page is served. It references JavaScript code that is loaded as an ES6 module. The server is asked for a challenge, and then the client goes ham hashing the challenge and a nonce with SHA-256 until the hash has a certain number of leading zeroes. This is the proof-of-work challenge. The client then sends the answer to the server, and the server validates the answer. If the answer is correct, the server signs a JWT token and sends it back to the client in an HTTP cookie. The client then sends this cookie with every request to the server.

The above logic in flowchart form.
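
The client-side loop boils down to something like this (a Go rendering of the idea for illustration; the real thing runs as JavaScript in a Web Worker):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// solve brute-forces a nonce until sha256(challenge + nonce) starts with
// difficulty leading zero hex digits, mirroring the browser-side worker.
func solve(challenge string, difficulty int) (int, string) {
	prefix := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		sum := sha256.Sum256([]byte(fmt.Sprintf("%s%d", challenge, nonce)))
		hash := hex.EncodeToString(sum[:])
		if strings.HasPrefix(hash, prefix) {
			return nonce, hash
		}
	}
}

func main() {
	nonce, hash := solve("example-challenge", 4)
	fmt.Printf("nonce=%d hash=%s\n", nonce, hash)
}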

Challenges are SHA-256 sums of user request metadata. The following inputs are used:

  • Accept-Encoding: The content encodings that the requestor supports, such as gzip.
  • Accept-Language: The language that the requestor would prefer the server respond in, such as English.
  • X-Real-Ip: The IP address of the requestor, as set by a reverse proxy server.
  • User-Agent: The user agent string of the requestor.
  • The current time in UTC rounded to the nearest week.
  • The fingerprint (checksum) of Anubis' private ED25519 key.

This forms a fingerprint of the requestor using metadata that any requestor is already sending. It also uses time as an input, which is known to both the server and requestor due to the nature of linear timelines. Depending on facts and circumstances, you may wish to disclose this to your users.
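
A sketch of how such a fingerprint can be derived (the field order, separator, and week rounding here are illustrative, not Anubis' exact scheme):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// challengeFor hashes request metadata, the current week, and a key
// fingerprint into a challenge string. Details are illustrative only.
func challengeFor(r *http.Request, keyFingerprint string) string {
	week := time.Now().UTC().Truncate(7 * 24 * time.Hour).Format(time.RFC3339)
	data := strings.Join([]string{
		r.Header.Get("Accept-Encoding"),
		r.Header.Get("Accept-Language"),
		r.Header.Get("X-Real-Ip"),
		r.UserAgent(),
		week,
		keyFingerprint,
	}, "|")
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	req, _ := http.NewRequest("GET", "https://example.com/", nil)
	req.Header.Set("User-Agent", "Mozilla/5.0")
	fmt.Println(challengeFor(req, "example-key-fingerprint"))
}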

Anubis uses an ed25519 keypair to sign the JWTs issued when challenges are passed. Anubis will generate a new ed25519 keypair every time it starts. At this time, there is no way to share this keypair between instances of Anubis, but that will be addressed in future releases.

Setting up Anubis

Anubis is meant to sit between your reverse proxy (such as Nginx or Caddy) and your target service. One instance of Anubis must be used per service you are protecting.

Anubis is shipped in the Docker image ghcr.io/xe/x/anubis:latest. Other methods to install Anubis may exist, but the Docker image is currently the only supported method.

Anubis has very minimal system requirements. I suspect that 128Mi of ram may be sufficient for a large number of concurrent clients. Anubis may be a poor fit for apps that use WebSockets and maintain open connections, but I don't have enough real-world experience to know one way or another.

Anubis uses these environment variables for configuration:

  • BIND (default :8923): The TCP port that Anubis listens on.
  • DIFFICULTY (default 5): The difficulty of the challenge, or the number of leading zeroes that must be in successful responses.
  • METRICS_BIND (default :9090): The TCP port that Anubis serves Prometheus metrics on.
  • SERVE_ROBOTS_TXT (default false): If set true, Anubis will serve a default robots.txt file that disallows all known AI scrapers by name and then additionally disallows every scraper. This is useful if facts and circumstances make it difficult to change the underlying service to serve such a robots.txt file.
  • TARGET (default http://localhost:3923): The URL of the service that Anubis should forward valid requests to.

Docker compose

Add Anubis to your compose file pointed at your service:

services:
  anubis-nginx:
    image: ghcr.io/xe/x/anubis:latest
    environment:
      BIND: ":8080"
      DIFFICULTY: "5"
      METRICS_BIND: ":9090"
      SERVE_ROBOTS_TXT: "true"
      TARGET: "http://nginx"
    ports:
      - 8080:8080
  nginx:
    image: nginx
    volumes:
      - "./www:/usr/share/nginx/html"
        

Kubernetes

This example makes the following assumptions:

  • Your target service is listening on TCP port 5000.
  • Anubis will be listening on port 8080.

Attach Anubis to your Deployment:

containers:
  # ...
  - name: anubis
    image: ghcr.io/xe/x/anubis:latest
    imagePullPolicy: Always
    env:
      - name: "BIND"
        value: ":8080"
      - name: "DIFFICULTY"
        value: "5"
      - name: "METRICS_BIND"
        value: ":9090"
      - name: "SERVE_ROBOTS_TXT"
        value: "true"
      - name: "TARGET"
        value: "http://localhost:5000"
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 250m
        memory: 128Mi
    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      seccompProfile:
        type: RuntimeDefault
        

Then add a Service entry for Anubis:

# ...
 spec:
   ports:
+  - protocol: TCP
+    port: 8080
+    targetPort: 8080
+    name: anubis
        

Then point your Ingress to the Anubis port:

   rules:
   - host: git.xeserv.us
     http:
       paths:
       - pathType: Prefix
         path: "/"
         backend:
           service:
             name: git
             port:
-              name: http
+              name: anubis
        

RPM packages and unbranded (or custom-branded) versions are available if you contact me and purchase commercial support. Otherwise your users have to see a happy anime girl every time they solve a challenge. This is a feature.

Conclusion

In a just world, this software would not need to exist. Scraper bots would follow the unspoken rules of the internet and not scrape sites that ask them not to. But we don't live in a just world, and we have to take steps to protect our servers from the bad actors that scrape them. This is why I made Anubis, and I hope it helps you protect yours.

Please let me know what you think and if you run into any problems.

Amazon's AI crawler is making my git server unstable

2025-01-17 08:00:00

EDIT(2025-01-18 23:50 UTC):

I wrote a little proxy that does a proof-of-work check before allowing requests to my Gitea server. It's called Anubis and I'll be writing a blog post about it soon.

For now, check it out at https://git.xeserv.us/. It's a little rough around the edges, but it works enough.


EDIT(2025-01-18 19:00 UTC):

I give up. I moved the Gitea server back behind my VPN. I'm working on a proof of work reverse proxy to protect my server from bots in the future. I'll have it back up soon.


EDIT(2025-01-17 17:50 UTC):

I added this snippet to the ingress config:

nginx.ingress.kubernetes.io/configuration-snippet: |
  if ($http_user_agent ~* "(Amazon)" ){
    return 418;
  }
        

The bots are still hammering away from a different IP every time. About 10% of the requests do not have the Amazonbot user agent. I'm at a loss for what to do next.

I hate the future.


Hi all. This is a different kind of post. This is not informative. This is a cry for help.

To whoever runs AmazonBot, please add git.xeserv.us to your list of blocked domains. If you know anyone at Amazon, please forward this to them and ask them to forward it to the AmazonBot team.

Should you want to crawl my git server for some reason, please reach out to me so we can arrange for payment for hardware upgrades commensurate to your egregious resource usage.

I don't want to have to close off my Gitea server to the public, but I will if I have to. It's futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more. I just want the requests to stop.

I have already configured the robots.txt file to block all bots:

User-agent: *
Disallow: /
        

What else do I need to do?