2025-11-02 08:00:00
Something I've seen around the internet is that many projects want a blanket policy banning AI tools for contributors. As much as I agree with the sentiment behind policies like this, I don't think it's entirely realistic, because it's trivial to lie about not using these tools when you actually do.
I think a better middle ground is something like Fedora's AI-Assisted Contributions Policy. It requires you to include a commit footer that discloses which AI tools you used in your process, such as this:
Assisted-by: GPT-OSS 120b via OpenAI Codex (locally hosted)
Amusingly, you can actually tell AI agents to write this commit footer and they'll happily do it. Consider this part of this repo's AGENTS.md file (AGENTS.md is a set of instructions that tells AI agents how to best contribute to the repository):
Attribution Requirements
AI agents must disclose what tool and model they are using in the "Assisted-by" commit footer:
Assisted-by: [Model Name] via [Tool Name]
Example:
Assisted-by: GLM 4.6 via Claude Code
Not only does this make it trivial for automation to detect when AI tools are being used (and add appropriate tagging so reviewers can be more particular), it also lets you know which AI tools cause more issues in the long run. This can help guide policy and help contributors who want to use AI tooling pick the best tools for the job.
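As a sketch of what that automation could look like, here's a small Python script (my own example, not part of Fedora's policy or this repo's tooling; the function name and revision range are assumptions) that scans a range of commits for Assisted-by trailers using git's built-in trailer formatting:

import subprocess

def assisted_by_trailers(rev_range: str = "origin/main..HEAD") -> dict[str, list[str]]:
    """Map each commit in the range to its Assisted-by trailer values."""
    commits = subprocess.run(
        ["git", "rev-list", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    result: dict[str, list[str]] = {}
    for commit in commits:
        # %(trailers:key=...,valueonly) asks git to print only the matching trailer values
        trailers = subprocess.run(
            ["git", "log", "-1", "--format=%(trailers:key=Assisted-by,valueonly)", commit],
            capture_output=True, text=True, check=True,
        ).stdout
        result[commit] = [line.strip() for line in trailers.splitlines() if line.strip()]
    return result

for commit, tools in assisted_by_trailers().items():
    print(f"{commit[:12]}: {', '.join(tools) or 'no AI disclosure'}")

A CI job could use output like this to tag pull requests so reviewers know what they're looking at before they start reading.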
Anyways, at a high level: if you ask people to disclose what AI tools they're using, and you make it so that the default configuration of most AI tooling adds that disclosure for you, people are much more likely to comply with the policy. I think this is a better middle ground than holding witch hunts to figure out who used what tool, or letting the project become a free-for-all of noisy, low-quality contributions.
I want to see a future where people are allowed to experiment with fancy new tools. However, given the risks involved with low‑effort contributions causing issues, I think it's better for everyone to simply require an easy machine‑readable footer.
Also, if you want to put Assisted-by: GNU Emacs, I won't stop you.
This post was edited with help from GPT-OSS 120b on a DGX Spark, a device which only consumes 150 watts at maximum load. While I was writing this, I had Final Fantasy 14 open in the background to listen to bards perform in Limsa Lominsa. This made my workstation's RTX 4080 pull 150 watts of power constantly.
2025-10-31 08:00:00
This blog post explains how to effectively file abuse reports against cloud providers to stop malicious traffic. Key points:
Two IP Types: Residential (ineffective to report) vs. Commercial (targeted reports)
Why Cloud Providers: Cloud customers violate provider terms, making abuse reports actionable
Effective Abuse Reports Should Include:
Process:
Note on "Free VPNs": Often sell your bandwidth as part of botnets, not true public infrastructure
The goal is to make scraping the cloud provider's problem, forcing them to address violations against their terms of service.
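If you want to script part of this, here's a rough sketch (mine, not from the original post) of finding a likely abuse contact for an IP address by shelling out to the standard whois tool. Registries label these fields differently, so treat it as best-effort:

import re
import subprocess

def abuse_contacts(ip: str) -> set[str]:
    """Collect email addresses that appear on abuse-related whois lines."""
    output = subprocess.run(
        ["whois", ip], capture_output=True, text=True, check=True
    ).stdout
    contacts = set()
    for line in output.splitlines():
        # Different registries use OrgAbuseEmail, abuse-mailbox, and so on
        if "abuse" in line.lower():
            contacts.update(re.findall(r"[\w.+-]+@[\w.-]+", line))
    return contacts

print(abuse_contacts("203.0.113.7"))  # documentation-range IP; swap in the real offender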
2025-10-30 08:00:00
This blog post explores how Tigris Object Storage's bucket forking feature enables isolated dataset experimentation similar to forking code repositories. It demonstrates creating parallel data timelines for filtering, captioning, and resizing game screenshots without duplicating storage, allowing safe experimentation on large datasets.
2025-10-28 08:00:00
In the hours following the release of CVE-2025-62229 for the X.Org X server, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a use-after-free bug in the XPresentNotify extension. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Prof. Sophia Strosin, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."
2025-10-14 08:00:00
I'm treating this as a sponsored post. I was not paid by NVIDIA to work on it, but I did receive a DGX Spark from them pre-release and have been diligently testing it and filing bugs.
I've had access to the NVIDIA DGX Spark for over a month now. Today I'm gonna cover my first impressions and let you know what I've been up to with it.
In a nutshell, this thing is a beast. It's one of the most powerful devices in my house, and in a pinch I'd be okay with using it as my primary workstation. It pairs a CPU with enough punch for software development with a GPU that sits in the sweet spot between consumer and datacenter tier. Not to mention 128Gi of RAM. When I've been using this thing, the main limit is my imagination…and my poor understanding of Python environment management.
I think it's best to understand the DGX Spark as a devkit for NVIDIA's Grace datacenter processors. It's incredibly powerful for what it is: a device that can fit on your desk and run AI models right there.

The DGX Spark is tiny. It's about as wide as the screen of a Steam Deck OLED, or about halfway between the size of a Mac mini M1 and a Mac mini M4.
This thing is also stupidly power efficient. I've been unable to make my office warm in a way I can attribute to the DGX Spark alone. On average, rendering Final Fantasy 14 in one of the major player hub areas makes my tower use more power than the DGX Spark does while doing AI finetuning. I'll talk more about this in the future.
One of the most interesting things about this device is that it's got an Arm chip, CUDA, and unified RAM. In practice, this combination means that most of the Python packages you use have to be compiled from source. Pip usually handles this well enough, but it does mean that many Python packages take longer to install than they would on an x86 system, where prebuilt wheels are the norm. I assume this will be ironed out as the ecosystem matures.
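For the curious, here's a quick way to see which wheel tags pip will accept on a given machine, which is a decent proxy for whether you'll be downloading wheels or compiling. This is my own snippet, not from NVIDIA's docs, and it assumes the packaging library is installed (pip install packaging):

import platform
from packaging.tags import sys_tags  # the same packaging library pip vendors internally

print(platform.machine())  # "aarch64" on the DGX Spark, "x86_64" on a typical PC
for tag in list(sys_tags())[:5]:
    print(tag)  # e.g. cp312-cp312-manylinux_2_39_aarch64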
The power efficiency can't be overstated though. I've tried to make my office warm using the DGX Spark and I have failed. I'm seeing it pull a maximum of 70 watts.
I get about 30-40 tokens per second with gpt-oss:120b:
$ ollama version
$ ollama run \
--nowordwrap \
--verbose \
xe/mimi:gpt-oss-120b \
"Summarize this post: $(cat 2025/rolling-ladder-behind-us.mdx)"
total duration: 16.464571893s
load duration: 123.742176ms
prompt eval count: 7237 token(s)
prompt eval duration: 33.491521ms
prompt eval rate: 216084.54 tokens/s
eval count: 567 token(s)
eval duration: 16.063168189s
eval rate: 35.30 tokens/s
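If you want to collect numbers like this programmatically instead of eyeballing --verbose output, here's a small sketch (mine, not from the post) that hits the local Ollama HTTP API, which reports eval_count and eval_duration for each generation:

import json
import urllib.request

def eval_rate(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Return generation throughput in tokens per second for one request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["eval_count"] / (data["eval_duration"] / 1e9)  # eval_duration is in nanoseconds

print(f"{eval_rate('xe/mimi:gpt-oss-120b', 'Why is the sky blue?'):.2f} tokens/s")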
With flash attention on, gpt-oss:120b at a 128k context window uses about 70Gi of RAM:
$ ollama ps
NAME                    ID              SIZE     PROCESSOR    CONTEXT    UNTIL
xe/mimi:gpt-oss-120b    81089177a28c    70 GB    100% GPU     131072     29 minutes from now
xe@zohar:~$ nvidia-smi
Mon Oct 13 22:48:25 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GB10 On | 0000000F:01:00.0 Off | N/A |
| N/A 43C P0 11W / N/A | Not Supported | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 4752 C /usr/local/bin/ollama 66473MiB |
+-----------------------------------------------------------------------------------------+
I assume that the unaccounted-for 4Gi or so of RAM is CPU-side overhead from the Ollama model runner process.
So far I've been using the Spark in place of cloud GPUs for every AI thing I've needed to do at work. In general, I haven't really noticed any difference between a GPU in the cloud and the Spark on my home network. The only real rough edge is that I need to use this one blessed NVIDIA-authored Docker image to run IPython notebooks. It's easy enough though. Usually my Docker command looks like:
docker run \
--gpus all \
--net=host \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-it \
--rm \
-v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
-v "$HOME/.huggingface:/root/.huggingface" \
-v "$HOME/Code:/workspace/code" \
-v "$SSH_AUTH_SOCK:$SSH_AUTH_SOCK" \
-e HF_TOKEN=hf_hunter2hunter2hunter2 \
-e "SSH_AUTH_SOCK=$SSH_AUTH_SOCK" \
-e HF_HOME=/root/.cache/huggingface \
-e HF_HUB_CACHE=/root/.cache/huggingface/hub \
-e HF_DATASETS_CACHE=/root/.cache/huggingface/datasets \
nvcr.io/nvidia/pytorch:25.09-py3
And then it Just Works™.
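Once I'm inside the container, a quick sanity check like this (my own habit, not something NVIDIA prescribes) confirms that PyTorch can actually see the GPU:

import torch

print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available())        # True when --gpus all worked
print(torch.cuda.get_device_name(0))    # should name the GB10
x = torch.rand(4096, 4096, device="cuda")
print((x @ x).sum().item())             # exercises the GPU once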
The main thing I've been doing with it is inference of GPT-OSS 120b via Ollama. I've been doing latency and power usage testing by setting up a Discord bot and telling people that the goal is to jailbreak the bot into telling you how to make a chocolate cake. Nobody has been able to make my room warm.
This whole experience has been a bit of a career bucket list item for me. I've never had access to prerelease hardware like this before and being able to see what reviewers have to deal with before things are available to the masses is enlightening. I've ended up filing GPU driver bugs using my tower as a "known good" reference.
I've been slowly sinking my teeth into learning how AI training actually works, using this device to do it. I've mostly been focusing on finetuning GPT-2 and using that to learn the important parts of dataset cleaning, tokenization, and more. Let me know if you want to hear more about that and if you want me to release my practice models.
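To give a flavor of the kind of thing I mean, a minimal GPT-2 finetuning run looks something like this. This is a generic sketch with Hugging Face transformers and datasets, not my actual training code, and corpus.txt is a placeholder for whatever cleaned dataset you've prepared:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# corpus.txt stands in for your cleaned training text
dataset = load_dataset("text", data_files={"train": "corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-finetune",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        bf16=True,  # the GB10 supports bfloat16
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()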
At the very least though, here are the things I have in the pipeline that this device enables:
I also plan to make a comprehensive review video. Details to be announced soon.
I hope this was interesting. Thanks for early access to the device, NVIDIA!