2024-11-13 08:00:00
Hey all!
I'm not going to be posting as much on Twitter/X anymore. I've moved a lot of my online posting to Bluesky. If you want to follow me there, follow @xeiaso.net. You can also follow me on Bluesky via the Fediverse with Bridgy Fed at @[email protected].
I've locked my Twitter account and will not be posting anything there but reminders that I have left. Thank you for following me there all these years, but enough has become enough and I have no real reason to stay there. Bluesky is just a better place for me.
Stay safe out there and have fun on the internets!
2024-11-12 08:00:00
Taco Bell is a miracle of food preparation. They manage to have a menu of dozens of items that all boil down to permutations of the same handful of basic ingredients: meat, cheese, beans, vegetables, bread, and sauces. Those basic fundamentals are combined in new and interesting ways to give you the Crunchwrap, the chalupa, the Doritos Locos Tacos, and more. Just add hot water and they’re ready to eat.
Even though the results are exciting, the ingredients for them are not. They’re all really simple things. The best designed production systems I’ve ever used take the same basic idea: build exciting things out of boring components that are well understood across all facets of the industry (eg: S3, Postgres, HTTP, JSON, YAML, etc.). This adds up to your pitch deck aiming at disrupting the industry-disrupting industry.
A bunch of companies want to sell you inference time for your AI workloads, or sell you the results of running that inference for you, but nobody really tells you how to make this yourself. That’s the special Mexican Pizza sauce that you can’t replicate at home no matter how much you want to be able to.
Today, we’ll cover how you, a random nerd that likes reading architectural articles, should design a production-ready AI system so that you can maximize effectiveness per dollar, reduce dependency lock-in, and separate concerns down to their cores. Buckle up, it’s gonna be a ride.
The industry uses like a billion different terms for “unit of compute that has access to a network connection and the ability to store things for some amount of time”, and they all conflict in mutually incompatible ways. When you read “workload”, you should think about some program that has access to some network and some amount of storage, running somewhere, probably in a container.
At the core, any workload (computer games, iPadOS apps, REST APIs, Kubernetes, $5 Hetzner VPSen, etc.) is a combination of three basic factors:
- Compute
- Network
- Storage
In reality, these things will overlap a little (compute has storage in the form of RAM, some network cards run their own Linux kernel, and storage is frequently accessed over the network), but that still very cleanly maps to the basic things that you’re billed for in the cloud:
- Compute time
- Gigabytes egressed over the network
- Gigabytes of data stored per month
And of course, there’s a huge money premium for any of this being involved in AI anything because people will pay. However, let’s take a look at that second basic thing you’re billed for a bit closer:
- Gigabytes egressed over the network
Note that it’s egress out of your compute, not ingress to your compute. Providers generally want you to make it easy to put your data into their platform and harder to get the data back out. This is usually combined with your storage layer, which can make it annoying and expensive to deal with data that is bigger than your local disk. Your local disk is frequently way too small to store everything, so you have to make compromises.
What if your storage layer didn’t charge you per gigabyte of data you fetched out of it? What classes of problems would that allow you to solve that were previously too expensive to execute on?
If you put your storage in a service that is low-latency, close to your servers, and has no egress fees, then it can actually be cheaper to pull things from object storage just-in-time to use them than it is to store them persistently.
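To make that concrete, here's a minimal sketch of the pull-at-boot pattern in Go. The bucket URL and file names are hypothetical (in practice you'd likely use a presigned URL or the S3 API); the only real requirement is that the storage is close to the compute and doesn't charge you for egress:

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

// fetchWeights streams a model weight file from object storage onto local
// scratch disk at boot, instead of keeping it on an expensive persistent volume.
func fetchWeights(url, dest string) error {
    resp, err := http.Get(url)
    if err != nil {
        return fmt.Errorf("can't fetch %s: %w", url, err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("unexpected status fetching %s: %s", url, resp.Status)
    }

    out, err := os.Create(dest)
    if err != nil {
        return err
    }
    defer out.Close()

    // Stream straight to disk so the whole model never has to fit in RAM.
    _, err = io.Copy(out, resp.Body)
    return err
}

func main() {
    // Hypothetical bucket URL and weight file name.
    if err := fetchWeights("https://models.example.com/flux-dev/weights.safetensors", "/tmp/weights.safetensors"); err != nil {
        panic(err)
    }
}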
In serverless (Lambda) scenarios, most of the time your application is turned off. This is good. This is what you want. You want it to turn on when it’s needed, and turn back off when it’s not. When you do a setup like this, you also usually assume that the time it takes to do a cold start of the service is fast enough that the user doesn’t mind.
Let’s say that your AI app requires 16 gigabytes of local disk space for your Docker image with the inference engine and the downloaded model weights. In some clouds (such as Vast.ai), this can cost you upwards of $4-10 per month to have the data sitting there doing nothing, even if the actual compute time is as low as $0.99 per hour. If you’re using Flux [dev] (12 billion parameters, 25 GB of weight bytes) and those weights take 5 minutes to download, this means that you are only spending $0.12 waiting for things to download. If you’re only doing inference in bulk scenarios where latency doesn’t matter as much, then it can be much, much cheaper to dynamically mint new instances, download the model weights from object storage, do all of the inference you need, and then slay those instances off when you’re done.
Most of the time, any production workload’s request rate is going to follow a sinusoidal curve where there’s peak usage for about 8 hours in the middle of the day and things will fall off overnight as everyone goes to bed. If you spin up AI inference servers on demand following this curve, this means that the first person of the day to use an AI feature could have it take a bit longer for the server to get its coffee, but it’ll be hot’n’ready for the next user when they use that feature.
You can even cheat further with optional features such that the first user doesn’t actually see them, but it triggers the AI inference backend to wake up for the next request.
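As a sketch of that trick (the route and backend hostname here are made up): the handler serves the page without the optional AI bits, but fires off a background request that wakes the inference backend up for whoever comes next.

import (
    "context"
    "fmt"
    "net/http"
    "time"
)

// prewarm pokes the (hypothetical) inference backend in the background without
// blocking the current request. Errors are ignored on purpose: if the backend
// is still booting, that's fine, it'll be warm soon.
func prewarm() {
    go func() {
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://inference.internal/warm", nil)
        if err != nil {
            return
        }
        if resp, err := http.DefaultClient.Do(req); err == nil {
            resp.Body.Close()
        }
    }()
}

func handleArticle(w http.ResponseWriter, r *http.Request) {
    prewarm() // the first reader skips the AI feature, but wakes the backend for the next one
    fmt.Fprintln(w, "article body, minus the optional AI feature")
}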
When you set up cloud compute, it’s really easy to fall prey to the siren song of the seemingly bottomless budget of the corporate card. At a certain point, we all need to build a sustainable business as the AI hype wears off and the free tier ends. However, thanks to the idea of Taco Bell infrastructure design, you can reduce the risk of lock-in and increase flexibility between providers so you can lower your burn rate.
In many platforms, data ingress is free. Data egress is where they get you. It’s such a problem for businesses that the EU has had to step in and tell providers that people need an easy way out. Every gigabyte of data you put into those platforms is another $0.05 that it’ll cost to move away should you need to.
This doesn’t sound like an issue, because the CTO’s negotiating dream is that they’ll be able to play the "we’re gonna move our stuff elsewhere" card, instantly win a discount, and get a fantastic deal that will enable future growth or whatever.
This is a nice dream.
In reality, the sales representative has a number in big red letters in front of them. This number is the amount of money it would cost for you to move your 3 petabytes of data off of their cloud. You both know you’re stuck with each other, and you’ll happily take an additional measly 5% discount on top of the 10% discount you negotiated last year. We all know that the actual cost of running the service is 15% of even that cost; but the capitalism machine has to eat somehow, right?
Let’s be real, dependencies aren’t fundamentally bad things to have. All of us have a hard dependency on the Internet, amd64 CPUs, water, and storage. Everything’s a tradeoff. The potentially harmful part comes in when your dependency locks you in so you can’t switch away easily.
This is normally pretty bad with traditional compute setups, but can be extra insidious with AI workloads. AI workloads make cloud companies staggering amounts of money, so they want to make sure that you keep your AI workloads on their servers as much as possible so they can extract as much revenue out of you as possible. Combine this with the big red number disadvantage in negotiations, and you can find yourself backed into a corner.
This is why picking your dependencies is such a huge thing to consider. There’s a lot to be said about choosing dependencies to minimize vendor lock-in, and that’s where the Taco Bell infrastructure philosophy comes in:
If you follow these rules, you can easily make your compute nomadic between services. Capitalize on things like Kubernetes (the universal API for cloud compute, as much as I hate that it won), and you make the underlying clouds an implementation detail that can be swapped out as you find better strategic partnerships that can offer you more than a measly 5% discount.
Just add water.
There's an extra evil way that AI models can become production-critical dependencies. Most of the time when you implement an application that uses an AI model, you end up encoding "workarounds" for the model into the prompts you use. This happens because AI models are fundamentally unpredictable and unreliable tools that only sometimes give you the output you want. Despite that, changing out models sounds like it should be easy: you just swap the model and then you can take advantage of better accuracy, new features like tool use, or JSON schema prompting, right?
In many cases, changing out a model will result in a service that superficially looks and functions the same. You give it a meeting transcript, it tells you what the action items are. The problem comes in with the subtle nuances of the je ne sais quoi of the experience. Even subtle differences like the current date being in the month of December can drastically change the quality of output. A recent paper from Apple concluded that adding superficial details that wouldn't throw off a human can severely impact the performance of large language models. Heck, they even struggle with or fall prey to fairly trivial questions that humans find easy, such as:
If changing the placement of a comma in a prompt can cause such huge impacts to the user experience, what would changing the model do? What about being forced to change the model because the provider is deprecating it so they can run newer models that don't do the job as well as the one you currently use? This is a really evil kind of dependency that you can only get when you rely on cloud-hosted models. By controlling the weights and inference setups for your machines, you have a better chance of being able to dictate the future of your product and control all parts of the stack as much as possible.
Like I said earlier, the three basic needs of any workload are compute, network, and storage. Production architectures usually have three basic planes to support them:
- The compute plane
- The network plane
- The storage plane
Storage is the sticky bit; it hasn’t really changed since the beginning. You either use a POSIX-compatible key-value store or an S3-compatible key-value store. Both are used in practically the same ways that the framers intended in the late 80’s and 2009 respectively. You chuck bytes into the system with a name, and you get the bytes back when you give the name.
Storage is the really important part of your workloads. Your phone would not be as useful if it didn’t remember your list of text messages when you rebooted it. Many applications also (reasonably) assume that storage always works, is fast enough that it’s not an issue, and is durable enough that they don’t have to manually make backups.
What about latency? Human reaction time is about 250 milliseconds on average. It takes about 250 milliseconds for a TCP session to be established between Berlin and us-east-1. If you move your compute between providers, is your storage plane also going to move data around to compensate?
If your storage plane doesn’t have egress costs and stores your data close to where it’s used, this eliminates a lot of local storage complexity, at the cost of additional compute time spent waiting to pull things and the network throughput for them to arrive. Somehow compute is cheaper than storage in anno dominium two-thousand twenty-four. No, I don’t get how that happened either.
Part of the secret for how people make these production platforms is that they cheat: they don’t pass around values as much as possible. They pass a reference to that value in the storage plane. When you upload an image to the ChatGPT API to see if it’s a picture of a horse, you do a file upload call and then an inference call with the ID of that upload. This makes it easier to sling bytes around and overall makes things a lot more efficient at the design level. This is a lot like pass-by-reference semantics in programming languages like Java or a pointer to a value in Go.
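Here's a sketch of that pattern from the client side, with made-up endpoints and field names (this isn't any particular provider's API): upload once, get a handle, then pass the handle around instead of the bytes.

import (
    "bytes"
    "context"
    "encoding/json"
    "net/http"
)

type uploadResponse struct {
    ID string `json:"id"`
}

// classifyImage uploads the image once, then refers to it by ID in the
// inference call instead of shipping the bytes around a second time.
func classifyImage(ctx context.Context, baseURL string, image []byte) (string, error) {
    // 1. Push the bytes into the storage plane and get a reference back.
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/v1/files", bytes.NewReader(image))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/octet-stream")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    var up uploadResponse
    if err := json.NewDecoder(resp.Body).Decode(&up); err != nil {
        return "", err
    }

    // 2. Ask the model about the image by reference, not by value.
    body, err := json.Marshal(map[string]any{
        "prompt":  "is this a picture of a horse?",
        "file_id": up.ID,
    })
    if err != nil {
        return "", err
    }

    req, err = http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/v1/infer", bytes.NewReader(body))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/json")

    resp2, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp2.Body.Close()

    var out struct {
        Answer string `json:"answer"` // made-up response shape
    }
    if err := json.NewDecoder(resp2.Body).Decode(&out); err != nil {
        return "", err
    }
    return out.Answer, nil
}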
The other big secret is that there’s a layer on top of all of the compute: an orchestrator with a queue.
This is the rest of the owl that nobody talks about. Just having compute, network, and storage is not good enough; there needs to be a layer on top that spreads the load between workers, intelligently minting and slaying them off as reality demands.
Yeah, yeah, I get it, you want to see this live and in action. I don’t have an example totally ready yet, but in lieu of drawing the owl right now, I can tell you what you’d need in order to make it a reality on the cheap.
Let’s imagine that this is all done in one app, let’s call it orodayagzou (c.f. Ôrödyagzou, Ithkuil for “synesthesia”). This app is both an HTTP API and an orchestrator. It manages a pool of worker nodes that do the actual AI inferencing.
So let’s say a user submits a request asking for a picture of a horse. That’ll come in to the right HTTP route and it has logic like this:
type ScaleToZeroProxy struct {
    cfg         Config
    ready       bool
    endpointURL string
    instanceID  int
    lock        sync.RWMutex
    lastUsed    time.Time
}
func (s *ScaleToZeroProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    s.lock.RLock()
    ready := s.ready
    s.lock.RUnlock()

    if !ready {
        // TODO: implement instance creation
    }

    // Read the endpoint under the read lock, then release it before proxying so
    // that the lastUsed update below can take the write lock without deadlocking.
    s.lock.RLock()
    endpoint := s.endpointURL
    s.lock.RUnlock()

    u, err := url.Parse(endpoint)
    if err != nil {
        panic(err)
    }

    // NewSingleHostReverseProxy keeps the incoming request's path and query, so
    // the target only needs the scheme and host of the worker instance.
    next := httputil.NewSingleHostReverseProxy(u)
    next.ServeHTTP(w, r)

    s.lock.Lock()
    s.lastUsed = time.Now()
    s.lock.Unlock()
}
This is a simple little HTTP proxy in Go. It has an endpoint URL and an instance ID in memory, some logic to check if the instance is “ready”, and if it’s not, logic to create it. Let’s mint an instance using the Vast.ai CLI. First, some configuration:
const (
    diskNeeded       = 36
    dockerImage      = "reg.xeiaso.net/runner/sdxl-tigris:latest"
    httpPort         = 5000
    modelBucketName  = "ciphanubakfu" // lojban: test-number-bag
    modelPath        = "glides/ponyxl"
    onStartCommand   = "python -m cog.server.http"
    publicBucketName = "xe-flux"
    searchCaveats    = `verified=False cuda_max_good>=12.1 gpu_ram>=12 num_gpus=1 inet_down>=450`

    // assume awsAccessKeyID, awsSecretAccessKey, awsRegion, and awsEndpointURLS3 exist
)
type Config struct {
    diskNeeded     int // gigabytes
    dockerImage    string
    environment    map[string]string
    httpPort       int
    onStartCommand string
}
Then we can search for potential machines with some terrible wrappers to the CLI:
func runJSON[T any](ctx context.Context, args ...any) (T, error) {
    return trivial.andThusAnExerciseForTheReader[T](ctx, args)
}
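For the curious, here's one way you might fill in that exercise for the reader, assuming the vastai CLI prints a single JSON document on stdout when given --raw and treating the first element of args as the binary name:

import (
    "context"
    "encoding/json"
    "fmt"
    "os/exec"
)

func runJSON[T any](ctx context.Context, args ...any) (T, error) {
    var zero T

    // Stringify everything so callers can pass ints (ports, contract IDs) directly.
    strArgs := make([]string, len(args))
    for i, a := range args {
        strArgs[i] = fmt.Sprint(a)
    }

    out, err := exec.CommandContext(ctx, strArgs[0], strArgs[1:]...).Output()
    if err != nil {
        return zero, fmt.Errorf("running %v: %w", strArgs, err)
    }

    var result T
    if err := json.Unmarshal(out, &result); err != nil {
        return zero, fmt.Errorf("parsing %s output: %w", strArgs[0], err)
    }
    return result, nil
}

With something like that in place, the search call below is just a subprocess invocation plus a JSON decode.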
func (s *ScaleToZeroProxy) mintInstance(ctx context.Context) error {
    s.lock.Lock()
    defer s.lock.Unlock()

    candidates, err := runJSON[[]vastai.SearchResponse](
        ctx,
        "vastai", "search", "offers",
        searchCaveats,
        "-o", "dph+", // sort by price (dollars per hour) increasing, cheapest option is first
        "--raw",      // output JSON
    )
    if err != nil {
        return fmt.Errorf("can't search for instances: %w", err)
    }
    if len(candidates) == 0 {
        return fmt.Errorf("no offers matched search caveats %q", searchCaveats)
    }

    // grab the cheapest option
    candidate := candidates[0]
    contractID := candidate.AskContractID

    slog.Info("found candidate instance",
        "contractID", contractID,
        "gpuName", candidate.GPUName,
        "cost", candidate.Search.TotalHour,
    )

    // ...
}
Then you can try to create it:
func (s *ScaleToZeroProxy) mintInstance(ctx context.Context) error {
    // ...

    instanceData, err := runJSON[vastai.NewInstance](
        ctx,
        "vastai", "create", "instance",
        contractID,
        "--image", s.cfg.dockerImage,
        // dump ports and envvars into format vast.ai wants
        "--env", s.cfg.FormatEnvString(),
        "--disk", s.cfg.diskNeeded,
        "--onstart-cmd", s.cfg.onStartCommand,
        "--raw",
    )
    if err != nil {
        return fmt.Errorf("can't create new instance: %w", err)
    }

    slog.Info("created new instance", "instanceID", instanceData.NewContract)
    s.instanceID = instanceData.NewContract

    // ...
Then collect the endpoint URL:
func (s *ScaleToZeroProxy) mintInstance(ctx context.Context) error {
    // ...

    instance, err := runJSON[vastai.Instance](
        ctx,
        "vastai", "show", "instance",
        instanceData.NewContract,
        "--raw",
    )
    if err != nil {
        return fmt.Errorf("can't show instance %d: %w", instanceData.NewContract, err)
    }

    s.endpointURL = fmt.Sprintf(
        "http://%s:%d",
        instance.PublicIPAddr,
        instance.Ports[fmt.Sprintf("%d/tcp", s.cfg.httpPort)][0].HostPort,
    )

    return nil
}
And then finally wire it up and have it test if the instance is ready somehow:
func (s *ScaleToZeroProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // ...

    if !ready {
        if err := s.mintInstance(r.Context()); err != nil {
            slog.Error("can't mint new instance", "err", err)
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        t := time.NewTicker(5 * time.Second)
        defer t.Stop()

        for range t.C {
            if ok := s.testReady(r.Context()); ok {
                break
            }
        }
    }

    // ...
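The "somehow" part is up to you. Here's a sketch of a testReady that just pokes the instance over HTTP; the /health-check path is an assumption about what the cog server in the Docker image exposes, so adjust it to whatever your image actually serves:

// testReady pokes the worker's HTTP port and flips the ready flag once it answers.
func (s *ScaleToZeroProxy) testReady(ctx context.Context) bool {
    s.lock.RLock()
    endpoint := s.endpointURL
    s.lock.RUnlock()
    if endpoint == "" {
        return false
    }

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint+"/health-check", nil)
    if err != nil {
        return false
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return false // instance is probably still booting or pulling the image
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return false
    }

    s.lock.Lock()
    s.ready = true
    s.lock.Unlock()
    return true
}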
Then the rest of the logic will run through, the request will be passed to the GPU instance and then a response will be fired. All that’s left is to slay the instances off when they’re unused for about 5 minutes:
func (s *ScaleToZeroProxy) maybeSlayLoop(ctx context.Context) {
    t := time.NewTicker(5 * time.Minute)
    defer t.Stop()

    for {
        select {
        case <-t.C:
            s.lock.RLock()
            lastUsed := s.lastUsed
            s.lock.RUnlock()

            if lastUsed.Add(5 * time.Minute).Before(time.Now()) {
                if err := s.slay(ctx); err != nil {
                    slog.Error("can't slay instance", "err", err)
                }
            }
        case <-ctx.Done():
            return
        }
    }
}
Et voila! Run maybeSlayLoop in the background and implement the slay() method to use the vastai destroy instance command, then you have yourself nomadic compute that makes and destroys itself on demand to the lowest bidder.
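A sketch of that slay() method, reusing the same CLI wrapper pattern (vastai.DestroyResponse is a hypothetical type for whatever the command prints):

// slay destroys the current worker instance and resets the proxy's state.
func (s *ScaleToZeroProxy) slay(ctx context.Context) error {
    s.lock.Lock()
    defer s.lock.Unlock()

    if s.instanceID == 0 {
        return nil // nothing to slay
    }

    if _, err := runJSON[vastai.DestroyResponse](
        ctx,
        "vastai", "destroy", "instance",
        s.instanceID,
        "--raw",
    ); err != nil {
        return fmt.Errorf("can't destroy instance %d: %w", s.instanceID, err)
    }

    s.instanceID = 0
    s.ready = false
    s.endpointURL = ""
    return nil
}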
Of course, any production-ready implementation would have limits like "don't have more than 20 workers" and would segment things into multiple work queues. This is all really hypothetical right now; I wish I had a thing to say you could kubectl apply and use right now, but I don't.
I’m going to be working on this on my Friday streams on Twitch until it’s done. I’m going to implement it from an empty folder and then work on making it a Kubernetes operator to run any task you want. It’s going to involve generative AI, API reverse engineering, eternal torment, and hopefully not getting banned from the providers I’m going to be using. It should be a blast!
Every workload involves compute, network, and storage on top of production’s compute plane, network plane, and storage plane. Design your production clusters to take advantage of very well-understood fundamentals like HTTP, queues, and object storage so that you can reduce your dependencies to the bare minimum. Make your app an orchestrator of vast amounts of cheap compute so you don’t need to pay for compute or storage that nobody is using while everyone is asleep.
This basic pattern is applicable to just about anything on any platform, not just AI and not just with Tigris. We hope that by publishing this architectural design, you’ll take it to heart when building your production workloads of the future so that we can all use the cloud responsibly. Certain parts of the economics of this pattern work best when you have free (or basically free) egress, though.
We’re excited about building the best possible storage layer based on the lessons learned building the storage layer Uber uses to service millions of rides per month. If you try us and disagree, that’s fine, we won’t nickel and dime you on the way out because we don’t charge egress costs.
When all of these concerns are made easier, all that’s left for you is to draw the rest of the owl and get out there disrupting industries.
2024-11-09 08:00:00
I think I made a mistake when I decided to put my cards into Kubernetes for my personal setup. It made sense at the time (I was trying to learn Kubernetes and I am cursed into learning by doing), however I don't think it is really the best choice available for my needs.
[...]
My Kubernetes setup is a money pit. I want to prioritize cost reduction as much as possible.
So after a few years of switching between a Hetzner dedi running NixOS and Docker images on Fly.io, I'm crawling back to Kubernetes for hosting my website. I'm not gonna lie, it will look like massive overkill from the outset, but consider this: Kubernetes is standard at this point. It's the boring, pragmatic choice.
Plus, every massive infrastructure crime and the inevitable ways they go horribly wrong only really serves to create more "how I thought I was doing something good but actually really fucked everything up" posts that y'all seem to like. Win/win. I get to play with fun things, you get to read about why I thought something would work, how it actually works, and how you make things meet in the middle.
I've had a really good experience with Kubernetes in my homelab, and I feel confident enough in my understanding of it to move my most important, most used, most valuable to me service over to a Kubernetes cluster. I changed it over a few days ago without telling anyone (and deploying anything, just in case). Nothing went wrong in the initial testing, so I feel comfortable enough to talk about it now.
Hi from the cluster Aeacus! My website is running on a managed k3s cluster via Civo. The cluster is named after one of the space elevators in an RPG where a guy found a monolith in Kenya, realized it was functionally an infinite battery, made a massive mistake, and then ended up making Welsh catgirls real (among other things).
If/when I end up making other Kubernetes clusters in the cloud, they'll probably be named Rhadamanthus and Minos (the names of the other space elevators in said world with Welsh catgirls).
Originally I was going to go with Vultr, but then I did some math on the egress of my website vs the amount of bandwidth I'd get for the cluster and started to raise some eyebrows. I don't do terrifying amounts of egress bandwidth, but sometimes I have months where I'm way more popular than other months and those "good" months would push me over the edge.
I also got a warning from a friend that Vultr vastly oversubscribes their CPU cores, so you get very, very high levels of CPU steal. Most of the time, my CPU cores are either idle or very close to idle; but when I do a build for my website in prod, the entire website blocks until it's done.
This is not good for availability.
When I spun up a test cluster on Vultr, I did notice that the k3s nodes they were using were based on Ubuntu 22.04 instead of 24.04. I get that 24.04 is kinda new and they haven't moved things over yet, but it was kind of a smell that something might be up.
I'm gonna admit, I hadn't heard of Civo cloud until someone in the Kubernetes homelab Discord told me about them, but there's one key thing in their pricing that made me really consider them:
At Civo, data transfer is completely free and unlimited - we do not charge for egress or ingress at all. Allowing you to move data freely between Civo and other platforms without any costs or limitations. No caveats, No fineprint. No surprise bills.
This is basically the entire thing that sold me. I've been really happy with Civo. I haven't had a need to rely on their customer support yet, but I'll report back should I need to.
Worst case, it's all just Kubernetes, I can set up a new cluster and move everything over without too much risk.
That being said, here's a short list of things that in a perfect world I wish I could either control, influence, or otherwise have power over. The big one is being able to have the cluster's DNS use aeacus.xeserv.us so that the DNS names can be globally unique, enabling me to cross-cluster interconnect it with my homelab and potentially other clusters as my cloud needs expand.
And here's a few things I learned about my setup in particular that aren't related to Civo cloud, but worth pointing out:
Either way, I moved over pronouns.within.lgbt to proof-of-concept the cluster beyond a hello world test deployment. That worked fine.
To be sure that things worked, I employed the industry standard "scream test" procedure where you do something that could break, test it to hell on your end, and see if anyone screams about it being down. Coincidentally, a friend was looking through it during the breaking part of the migration (despite my efforts to minimize the breakage) and noticed the downtime. They let me know immediately. I was so close to pulling it off without a hitch.
Like any good abomination, my website has a fair number of moving parts, most of them are things that you don't see. Here's what the infrastructure of my website looks like:
This looks like a lot, and frankly, it is a lot. Most of this functionality is optional and degrades cleanly too. By default, when I change anything on GitHub (or someone subscribes/unsubscribes on Patreon), I get a webhook that triggers the site to rebuild. The rebuild will trigger fetching data from Patreon, which may trigger fetching an updated token from patreon-saasproxy. Once the build is done, a request to announce new posts will be made to Mi. Mi will syndicate any new posts out to Bluesky, Mastodon, Discord, and IRC.
This, sadly, is an idealized diagram of the world I wish I could have. Here's what the real state of the world looks like:
I have patreon-saasproxy still hosted on fly.io. I'm not sure why the version on Aeacus doesn't work, but trying to use it makes it throw an error that I really don't expect to see:
{
  "time": "2024-11-09T09:12:17.76177-05:00",
  "level": "ERROR",
  "source": {
    "function": "main.main",
    "file": "/app/cmd/xesite/main.go",
    "line": 54
  },
  "msg": "can't create patreon client",
  "err": "The server could not verify that you are authorized to access the URL requested. You either supplied the wrong credentials (e.g. a bad password), or your browser doesn't understand how to supply the credentials required."
}
I'm gonna need to figure out what's going on later, but I can live with this for now. I connect back to Fly.io using their WireGuard setup with a little sprinkle of userspace WireGuard. It works well enough for my needs.
In the process of moving things over, I found out that there's a Tor hidden service operator for Kubernetes. This is really neat and lets me set up a mirror of this website on the darkweb. If you want or need to access my blog over Tor, you can use gi3bsuc5ci2dr4xbh5b3kja5c6p5zk226ymgszzx7ngmjpc25tmnhaqd.onion to do that. You'll be connected directly over Tor.
I configured this as a non-anonymous hidden service using a setup like this:
apiVersion: tor.k8s.torproject.org/v1alpha2
kind: OnionService
metadata:
  name: xesite
spec:
  version: 3
  extraConfig: |
    HiddenServiceNonAnonymousMode 1
    HiddenServiceSingleHopMode 1
  rules:
    - port:
        number: 80
      backend:
        service:
          name: xesite
          port:
            number: 80
This creates an OnionService set up to point directly to the backend that runs this website. Doing this bypasses the request logging that the nginx ingress controller does. I do not log requests made over Tor unless you somehow manage to get one of the things you're requesting to throw an error; even then, I'll only log details about the error so I can investigate them later.
If you're already connected with the Tor browser, you may have noticed the ".onion available" label in your address bar. This is because I added a middleware that adds the Onion-Location header to every response. The Tor browser listens for this header and will alert you to it.
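Here's roughly what that middleware looks like, assuming a plain Go net/http stack (the onion hostname is the one from above):

import "net/http"

// onionLocation tells Tor Browser users that a .onion mirror of the same page
// exists by setting the Onion-Location header on every response.
func onionLocation(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Onion-Location",
            "http://gi3bsuc5ci2dr4xbh5b3kja5c6p5zk226ymgszzx7ngmjpc25tmnhaqd.onion"+r.URL.RequestURI())
        next.ServeHTTP(w, r)
    })
}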
I'm not sure how the Tor hidden service will mesh with the ads with Ethical Ads, but I'd imagine that looking at my website over Tor would functionally disable them.
One of the most controversial things about my website's design is that everything was served out of a .zip file full of gzip streams. This was originally done so that I could implement a fastpath hack to serve gzip compressed streams to people directly. This would save a bunch of bandwidth, make things load faster, save christmas from the incoming elf army, etc.
Guess what I never implemented.
This zipfile strategy worked, for the most part. One of the biggest ways this didn't pan out is that I didn't support HTTP Range requests. Normally this isn't an issue, but Slack, LinkedIn, and other web services use them when doing a request to a page to unfurl links posted by users.
This has been a known issue for a while, but I decided to just fix it forever by making the website serve itself from the generated directory instead of using the zipfile in the line of serving things. I still use the zipfile for the preview site (I'm okay with that thing's functionality being weird), but yeah, it's gone.
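This is also less code than it sounds like: Go's standard library file server already speaks Range requests, so serving the generated directory is roughly this much work (the directory name and port here are illustrative):

package main

import (
    "log"
    "net/http"
)

func main() {
    // http.FileServer handles HTTP Range requests out of the box, which is the
    // thing the old zipfile serving path never got around to supporting.
    http.Handle("/", http.FileServer(http.Dir("./generated")))
    log.Fatal(http.ListenAndServe(":3000", nil))
}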
If I ever migrate my website to use CI to build the website instead of having prod build it on-demand, I'll likely use the zipfile as a way to ship around the website files.
Like any good Xe project, I had to commit some crimes somewhere, right? This time I implemented them at the storage layer. My website works by maintaining a git clone of its own repository and then running builds out of it. This is how I'm able to push updates to GitHub and then have it go live in less than a minute.
The main problem with this is that it can make cold start times long. Very long. Long enough that Kubernetes will think that the website isn't in a cromulent state and then slay it off before it can run the first build. I fixed this by making the readiness check run every 5 seconds for 5 minutes, but I realized there was a way I could do it better: I can cache the website checkout on the underlying node's filesystem.
So I use a hostPath volume to do this:
- name: data
  hostPath:
    path: /data/xesite
    type: DirectoryOrCreate
Isn't this a very bad idea?
Using the hostPath volume type presents many security risks. If you can avoid using a hostPath volume, you should. For example, define a local PersistentVolume, and use that instead.
Shouldn't you use a PersistentVolumeClaim instead?
Normally, yes. This is a bad idea. However, a PersistentVolumeClaim doesn't really work for this due to how the Civo native Container Storage Interface works. They only support the ReadWriteOnce access mode, which would mean that I can only have my website running on one Kubernetes node at once. I'd like my website to be more nomadic between nodes, so I need to make it a ReadWriteMany mount so that the same folder can be used on different nodes.
I'll figure out a better solution eventually, but for now I can get away with just stashing the data in /data/xesite on the raw node filesystems and it'll be fine. My website doesn't grow at a rate where this would be a practical issue, and should this turn out to actually be a problem I can always reprovision my nodes as needed.
I'm pretty sure that this is way more than good enough for now. This should be more than enough for the next few years of infrastructure needs. Worst case though, it's just Kubernetes. I can move it anywhere else that has Kubernetes without too much fuss.
I'd like to make the Deno cache mounted in Tigris or something using csi-s3, but that's not a priority right now. This would only help with cold start latency, and to be honest the cold start latency right now is fine. Not the most ideal, but fine.
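If I do get around to it, the claim itself is tiny; something roughly like this, assuming the csi-s3-backed tigris StorageClass from my standard cluster setup (the claim name and size are made up):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deno-cache
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: tigris
  resources:
    requests:
      storage: 10Gi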
Everything else is just a matter of implementation more than anything at this point.
Hope this look behind the scenes was interesting! I put this level of thought and care into things so that you don't have to care about how things work.
2024-11-03 08:00:00
I'm setting up some cloud Kubernetes clusters for a bit coming up on the blog. As a result, I need some documentation on what a "standard" cluster looks like. This is that documentation.
Every Kubernetes term is WrittenInGoPublicValueCase. If you aren't sure what one of those terms means, google "site:kubernetes.io KubernetesTerm".
I'm assuming that the cluster is named mechonis.
For the "core" of a cluster, I need these services set up:
- The 1Password operator (secret syncing)
- cert-manager (TLS certificates)
- csi-s3 (cheap bulk storage backed by Tigris)
- ingress-nginx (HTTP ingress)
- metrics-server (resource metrics)
- external-dns (DNS records for Ingress and DNSEndpoint resources)
These all complete different aspects of the three core features of any cloud deployment: compute, network, and storage. Most of my data will be hosted in the default StorageClass implementation provided by the platform (or in the case of baremetal clusters, something like Longhorn), so the csi-s3 StorageClass is more of a "I need lots of data but am cheap" than anything.
Most of this will be managed with helmfile, but 1Password can't be.
The most important thing at the core of my k8s setups is the 1Password operator. This syncs 1password secrets to my Kubernetes clusters, so I don't need to define them in Secrets manually or risk putting the secret values into my OSS repos. This is done separately as I'm not able to use helmfile for it.
After you have the op command set up, create a new server with access to the Kubernetes vault:
op connect server create mechonis --vaults Kubernetes
Then install the 1password connect Helm release with operator.create set to true:
helm repo add \
  1password https://1password.github.io/connect-helm-charts/

helm install \
  connect \
  1password/connect \
  --set-file connect.credentials=1password-credentials.json \
  --set operator.create=true \
  --set operator.token.value=$(op connect token create --server mechonis --vault Kubernetes)
Now you can deploy OnePasswordItem resources as normal:
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
  name: falin
spec:
  itemPath: vaults/Kubernetes/items/Falin
In the cluster folder, create a file called helmfile.yaml. Copy these contents:
repositories:
  - name: jetstack
    url: https://charts.jetstack.io
  - name: csi-s3
    url: cr.yandex/yc-marketplace/yandex-cloud/csi-s3
    oci: true
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
  - name: metrics-server
    url: https://kubernetes-sigs.github.io/metrics-server/

releases:
  - name: cert-manager
    kubeContext: mechonis
    chart: jetstack/cert-manager
    createNamespace: true
    namespace: cert-manager
    version: v1.16.1
    set:
      - name: installCRDs
        value: "true"
      - name: prometheus.enabled
        value: "false"

  - name: csi-s3
    kubeContext: mechonis
    chart: csi-s3/csi-s3
    namespace: kube-system
    set:
      - name: "storageClass.name"
        value: "tigris"
      - name: "secret.accessKey"
        value: ""
      - name: "secret.secretKey"
        value: ""
      - name: "secret.endpoint"
        value: "https://fly.storage.tigris.dev"
      - name: "secret.region"
        value: "auto"

  - name: ingress-nginx
    chart: ingress-nginx/ingress-nginx
    kubeContext: mechonis
    namespace: ingress-nginx
    createNamespace: true

  - name: metrics-server
    kubeContext: mechonis
    chart: metrics-server/metrics-server
    namespace: kube-system
Create a new admin access token in the Tigris console and copy its access key ID and secret access key into secret.accessKey and secret.secretKey respectively.
Run helmfile apply:
$ helmfile apply
This will take a second to think, and then everything should be set up. The LoadBalancer Service may take a minute or ten to get a public IP depending on which cloud you are setting things up on, but once it's done you can proceed to setting up DNS.
The next kinda annoying part is getting external-dns set up. It's something that looks like it should be packageable with something like Helm, but realistically it's such a generic tool that you're really better off making your own manifests and deploying it by hand. In my setup, I use these features of external-dns:
- The crd source (DNSEndpoint resources) and the ingress source (Ingress resources), each running in its own Deployment
- The aws provider, pointing at Route 53
- The dynamodb registry, so external-dns can keep track of which records it owns
You will need two DynamoDB tables:
- external-dns-crd-mechonis: for records created with DNSEndpoint resources
- external-dns-ingress-mechonis: for records created with Ingress resources
Create a terraform configuration for setting up these DynamoDB tables:
terraform {
  backend "s3" {
    bucket = "within-tf-state"
    key    = "k8s/mechonis/external-dns"
    region = "us-east-1"
  }
}

resource "aws_dynamodb_table" "external_dns_crd" {
  name           = "external-dns-crd-mechonis"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  table_class    = "STANDARD"
  hash_key       = "k"

  attribute {
    name = "k"
    type = "S"
  }
}

resource "aws_dynamodb_table" "external_dns_ingress" {
  name           = "external-dns-ingress-mechonis"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  table_class    = "STANDARD"
  hash_key       = "k"

  attribute {
    name = "k"
    type = "S"
  }
}
Create the tables with terraform apply:
terraform init
terraform apply --auto-approve # yolo!
While that cooks, head over to ~/Code/Xe/x/kube/rhadamanthus/core/external-dns and copy the contents to ~/Code/Xe/x/kube/mechonis/core/external-dns. Then open deployment-crd.yaml and replace the DynamoDB table in the crd container's args:
  args:
    - --source=crd
    - --crd-source-apiversion=externaldns.k8s.io/v1alpha1
    - --crd-source-kind=DNSEndpoint
    - --provider=aws
    - --registry=dynamodb
    - --dynamodb-region=ca-central-1
-   - --dynamodb-table=external-dns-crd-rhadamanthus
+   - --dynamodb-table=external-dns-crd-mechonis
And in deployment-ingress.yaml:
  args:
    - --source=ingress
-   - --default-targets=rhadamanthus.xeserv.us
+   - --default-targets=mechonis.xeserv.us
    - --provider=aws
    - --registry=dynamodb
    - --dynamodb-region=ca-central-1
-   - --dynamodb-table=external-dns-ingress-rhadamanthus
+   - --dynamodb-table=external-dns-ingress-mechonis
Apply these configs with kubectl apply:
kubectl apply -k .
Then write a DNSEndpoint pointing to the created LoadBalancer. You may have to look up the IP addresses in the admin console of the cloud platform in question.
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: load-balancer-dns
spec:
  endpoints:
    - dnsName: mechonis.xeserv.us
      recordTTL: 3600
      recordType: A
      targets:
        - whatever.ipv4.goes.here
    - dnsName: mechonis.xeserv.us
      recordTTL: 3600
      recordType: AAAA
      targets:
        - 2000:something:goes:here:lol
Apply it with kubectl apply:
kubectl apply -f load-balancer-dns.yaml
This will point mechonis.xeserv.us to the LoadBalancer, which will point to ingress-nginx based on Ingress configurations, which will route to your Services and Deployments, using Certs from cert-manager.
Copy the contents of ~/Code/Xe/x/kube/rhadamanthus/core/cert-manager to ~/Code/Xe/x/kube/mechonis/core/cert-manager. Apply them as-is, no changes are needed:
kubectl apply -k .
This will create letsencrypt-prod and letsencrypt-staging ClusterIssuers, which will allow the creation of Let's Encrypt certificates in their production and staging environments. 9 times out of 10, you won't need the staging environment, but when you are doing high-churn things involving debugging the certificate issuing setup, the staging environment is very useful because it has a much higher rate limit than the production environment does.
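For reference, a letsencrypt-prod ClusterIssuer for this kind of setup looks roughly like this; the contact email is a placeholder and the http01-via-nginx solver is an assumption about those manifests, so check the real files before copying:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected] # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx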
Nearly every term for "unit of thing to do" is taken by different aspects of Kubernetes and its ecosystem. The only one that isn't taken is "workload". A workload is a unit of work deployed somewhere, in practice this boils down to a Deployment, its Service, any PersistentVolumeClaims, Ingresses, or other resources that it needs in order to run.
Now you can put everything to the test by making a simple "hello, world" workload. This will include:
- A ConfigMap with the HTML to serve
- A Deployment running nginx with that ConfigMap mounted as its webroot
- A Service pointing at the Deployment
- An Ingress routing hello.mechonis.xeserv.us to the Service, with a TLS certificate from cert-manager
- A kustomization.yaml tying it all together
Make a folder called hello-world and put these files in it:
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world
data:
  index.html: |
    <html>
      <head>
        <title>Hello World!</title>
      </head>
      <body>Hello World!</body>
    </html>
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  selector:
    matchLabels:
      app: hello-world
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: html
          configMap:
            name: hello-world
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  ports:
    - port: 80
      protocol: TCP
  selector:
    app: hello-world
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - hello.mechonis.xeserv.us
      secretName: hello-mechonis-xeserv-us-tls
  rules:
    - host: hello.mechonis.xeserv.us
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-world
                port:
                  number: 80
# kustomization.yaml
resources:
  - configmap.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
Then apply it with kubectl apply:
kubectl apply -k .
It will take a minute for it to work, but here are the things that will be done in order so you can validate them:
1. external-dns notices the new Ingress and creates records for hello.mechonis.xeserv.us in Route 53 pointing at mechonis.xeserv.us
2. cert-manager notices the cert-manager.io/cluster-issuer: "letsencrypt-prod" annotation, which triggers cert-manager to create a Cert for the Ingress
3. There is no Secret named hello-mechonis-xeserv-us-tls in the default Namespace, so it creates an Order for a new certificate from the letsencrypt-prod ClusterIssuer (set up in the cert-manager apply step earlier)
4. Once the Order completes, the resulting certificate is stored in the Secret hello-mechonis-xeserv-us-tls in the default Namespace
5. ingress-nginx wires the Ingress up to the hello-world service so every request to hello.mechonis.xeserv.us points to the Pods managed by the hello-world Deployment

This results in the hello-world workload going from nothing to fully working in about 5 minutes tops. Usually this can be less depending on how lucky you get with the response time of the Route 53 API. If it doesn't work, run through resources in this order in k9s:
- external-dns-ingress Pod logs
- cert-manager Pod logs

By the way: k9s is fantastic. You should have it installed if you deal with Kubernetes. It should be baked into kubectl. It's a near perfect tool.
From here you can deploy anything else you want, as long as the workload configuration kinda looks like the hello-world configuration. Namely, you MUST have the following things set:
- Ingresses must have the cert-manager.io/cluster-issuer: "letsencrypt-prod" annotation; if they don't, then no TLS certificate will be minted
- Ingresses should also have nginx.ingress.kubernetes.io/ssl-redirect: "true" to ensure that all plain HTTP traffic is upgraded to HTTPS

If you work at a cloud provider that offers managed Kubernetes, I'm looking for a new place to put my website; sponsorship would be greatly appreciated!
Happy kubeing all!
2024-10-29 08:00:00
In the hours following the release of CVE-2024-9632 for the X.org project, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a buffer overflow that allows an attacker with access to raw X client calls to arbitrarily read and write memory, allowing for privilege escalation attacks. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen.

"This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Queen Annamarie Bayer, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner."

At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."