Blog of Xe Iaso

Xe Iaso: Senior Technophilosopher in Ottawa, CAN; a speaker, writer, chaos magician, and committed technologist.

Using Assisted-by commit footers instead of banning AI tools

2025-11-02 08:00:00

Something I've seen around the internet is that many projects want a blanket policy forbidding contributors from using AI tools. As much as I agree with the sentiment behind policies like this, I don't think they're entirely realistic, because it's trivial to lie about not using AI tools when you actually do.

I think a better middle ground is something like Fedora's AI-Assisted Contributions Policy. This demands that you include a commit footer that discloses what AI tools you've used in your process, such as this:

Assisted-by: GPT-OSS 120b via OpenAI Codex (locally hosted)
        
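If you're writing the footer by hand, recent versions of git (2.32 or newer) can add it as a trailer for you. A minimal sketch, with the model and tool names as placeholders:

# adds the Assisted-by trailer to the commit message footer
git commit \
  --trailer "Assisted-by: GPT-OSS 120b via OpenAI Codex (locally hosted)" \
  -m "describe your change here"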

Amusingly, you can actually tell AI agents to write this commit footer and they'll happily do it. Consider this part of this repo's AGENTS.md file (AGENTS.md is a set of instructions that tells AI agents how best to contribute to the project):

Attribution Requirements

AI agents must disclose what tool and model they are using in the "Assisted-by" commit footer:

Assisted-by: [Model Name] via [Tool Name]
        

Example:

Assisted-by: GLM 4.6 via Claude Code
        

Not only does this make it trivial for automation to detect when AI tools are being used (and add appropriate tagging so reviewers can give those changes extra scrutiny), it also lets you learn which AI tools cause more issues over the long run. This can help guide policy and help contributors who want to use AI tooling pick the best tools for the job.
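As a rough sketch of what that automation could look like (my own example, not Fedora's tooling), a CI job can pull the trailers straight out of the commits under review, assuming the default branch is origin/main:

# lists commits on the branch that carry an Assisted-by trailer so a bot can tag the PR
git log origin/main..HEAD \
  --format='%h %(trailers:key=Assisted-by,valueonly)' \
  | awk 'NF > 1 { print "AI-assisted commit:", $0 }'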

Anyways, at a high level: if you ask people to disclose which AI tools they're using, and the default configuration of most AI tooling adds that disclosure for you, people are much more likely to comply with the policy. I think this is a better middle ground than holding witch hunts to figure out who used what tool, or letting the project become a free-for-all of noisy, low-quality contributions.

I want to see a future where people are allowed to experiment with fancy new tools. However, given the risks involved with low‑effort contributions causing issues, I think it's better for everyone to simply require an easy machine‑readable footer.

Also, if you want to put Assisted-by: GNU Emacs, I won't stop you.

This post was edited with help from GPT-OSS 120b on a DGX Spark, a device which only consumes 150 watts at maximum load. While I was writing this, I had Final Fantasy 14 open in the background to listen to bards perform in Limsa Lominsa. This made my workstation's RTX 4080 pull 150 watts of power constantly.

Taking steps to end traffic from abusive cloud providers

2025-10-31 08:00:00

This blog post explains how to effectively file abuse reports against cloud providers to stop malicious traffic. Key points:

  1. Two IP types: residential (reports are largely ineffective) vs. commercial/cloud (reports can be targeted and acted on)

  2. Why Cloud Providers: Cloud customers violate provider terms, making abuse reports actionable

  3. Effective Abuse Reports Should Include:

    • Time of abusive requests
    • IP/User-Agent identifiers
    • robots.txt status
    • System impact description
    • Service context
  4. Process:

    • Use whois to find abuse contacts (look for "abuse-c" or "abuse-mailbox"); see the example command after this summary
    • Send detailed reports to all listed emails
    • Expect a response within 2 business days
  5. Note on "Free VPNs": Often sell your bandwidth as part of botnets, not true public infrastructure

The goal is to make scraping the cloud provider's problem, forcing them to address violations against their terms of service.
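As a concrete example of the whois step above (a sketch using a documentation-range IP; the exact field names depend on which regional registry owns the block):

# pull out the abuse contact fields; 203.0.113.5 is a placeholder address
whois 203.0.113.5 | grep -iE 'abuse-c|abuse-mailbox|OrgAbuseEmail'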

Fearless dataset experimentation with bucket forking

2025-10-30 08:00:00

This blog post explores how Tigris Object Storage's bucket forking feature enables isolated dataset experimentation similar to forking code repositories. It demonstrates creating parallel data timelines for filtering, captioning, and resizing game screenshots without duplicating storage, allowing safe experimentation on large datasets.

"No way to prevent this" say users of only language where this regularly happens

2025-10-28 08:00:00

In the hours following the release of CVE-2025-62229 for the project X.Org X server, site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a use-after-free bug in the XPresentNotify extension. This is due to the affected components being written in C, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Prof. Sophia Strosin, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

First look at the DGX Spark

2025-10-14 08:00:00

Disclaimer

I'm treating this as a sponsored post. I was not paid by NVIDIA to work on this, but I did receive a DGX Spark from them pre-release and have been diligently testing it and filing bugs.

I've had access to the NVIDIA DGX Spark for over a month now. Today I'm gonna cover my first impressions and let you know what I've been up to with it.

In a nutshell, this thing is a beast. It's one of the most powerful devices in my house, and in a pinch I'd be okay with using it as my primary workstation. It pairs a CPU with enough punch for software development with a GPU that sits in the sweet spot between consumer and datacenter tier. Not to mention 128 GiB of RAM. When I've been using this thing, the main limit is my imagination…and my poor understanding of Python environment management.

I think it's best to understand the DGX Spark as a devkit for NVIDIA's Grace datacenter processors. It's incredibly powerful for what it is: a device that can fit on your desk and run AI models right there.

A DGX Spark on top of a desk with typical computer things around it such as a laptop, a coffee mug, a keyboard, and a mouse.

The DGX Spark is tiny. It's about as wide as the screen of a Steam Deck OLED, or about halfway between the size of a Mac mini M1 and a Mac mini M4.

This thing is also stupidly power efficient. I've been unable to make my office warm in a way I can attribute to the DGX Spark alone. On average, rendering Final Fantasy 14 in one of the major player hub areas makes my tower use more power than the DGX Spark does while doing AI finetuning. I'll talk more about this in the future.

One of the most interesting things about this device is that it's got an Arm chip, CUDA, and unified RAM. This combination means that in practice you need to compile all of the Python packages you're using from source. Pip usually handles this well enough, but it does mean that many Python packages take longer to install than they would on an x86 system, since they're built from source rather than pulled as prebuilt wheels. I assume this will be ironed out as the ecosystem matures.
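As a side note, a quick way to see whether you're about to sit through a source build is to forbid them outright. A minimal sketch, with a hypothetical package name:

# refuse source builds: this fails fast if no prebuilt aarch64 wheel exists,
# instead of quietly kicking off a long compile ("example-package" is a placeholder)
pip install --only-binary=:all: example-package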

The power efficiency can't be overstated though. I've tried to make my office warm using the DGX Spark and I have failed. I'm seeing it pull a maximum of 70 watts.

I get about 30-40 tokens per second with gpt-oss:120b:

$ ollama version
        
        $ ollama run \
          --nowordwrap \
          --verbose \
          xe/mimi:gpt-oss-120b \
          "Summarize this post: $(cat 2025/rolling-ladder-behind-us.mdx)"
        
Mimi is thinking
Mimi

The user asks: "Summarize this post: --- ...". So we need to provide a summary. Follow developer instructions: be polite, bubbly, lowercase when possible unless serious. No JSON. Summarize the content. The post is long, covering themes: loss of craft, historical analogy with weavers and Luddites, decline of senior titles, AI replacing expertise, vibe coding, security concerns, AI's impact on art, etc. Provide concise summary. Use informal tone, lowercase. End without period unless serious. Probably okay.

Mimi is happy
Mimi

this post is a sprawling meditation on how we keep “rolling the ladder up behind us” – building a future that erodes the very expertise that made it possible. it starts with a historic look at cloth‑making: skilled weavers were displaced by power looms, turning a durable, treasured craft into cheap, disposable fast‑fashion. the author draws a parallel to today’s software world, warning that we’re replacing senior developers and deep craft with hype‑driven AI tools and “vibe coding” platforms that promise instant results but degrade long‑term quality.

key points:

  • craft is disappearing – both in weaving and coding, the knowledge of masters is vanishing, leaving only fragmented R&D notes or AI‑generated shortcuts.
  • senior titles are at risk – companies favor hiring senior talent without nurturing the next generation, so the pool of true “seniors” will run out.
  • AI as a double‑edged sword – generative tools can reduce drudgery but are being marketed as the next industrial revolution while actually shifting value to owners and creating insecure, low‑quality products.
  • vibe coding & AI assistants – slick UX masks the fact that many tools are subscription traps, security hazards, and can erode programmers’ skills.
  • artistic impact – similar to how AI floods art spaces with cheap, low‑effort outputs, software development risks becoming a flood of “good enough” code.
  • security concerns – model‑context‑protocol servers can expose secrets and run unchecked code, highlighting the need for sandboxed, capability‑based designs.
  • broader societal worry – the author (also the CEO of a small AI‑security startup) sees a winner‑take‑all capitalism fueled by AI, with the cost falling on workers, artists, and even the environment.

the piece ends with a bleak outlook: without deliberate effort to preserve and pass on true craft, future generations will inherit ladders that only go up for the privileged few, leaving the rest scrambling. it’s a call to value high‑quality, human‑made work over quick, AI‑generated shortcuts.

total duration:       16.464571893s
        load duration:        123.742176ms
        prompt eval count:    7237 token(s)
        prompt eval duration: 33.491521ms
        prompt eval rate:     216084.54 tokens/s
        eval count:           567 token(s)
        eval duration:        16.063168189s
        eval rate:            35.30 tokens/s
        

With flash attention enabled, gpt-oss:120b at a 128k context window uses about 70 GiB of RAM:

$ ollama ps
        NAME                    ID              SIZE     PROCESSOR    CONTEXT    UNTIL
        xe/mimi:gpt-oss-120b    81089177a28c    70 GB    100% GPU     131072     29 minutes from now
        
xe@zohar:~$ nvidia-smi
        Mon Oct 13 22:48:25 2025
        +-----------------------------------------------------------------------------------------+
        | NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
        +-----------------------------------------+------------------------+----------------------+
        | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
        | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
        |                                         |                        |               MIG M. |
        |=========================================+========================+======================|
        |   0  NVIDIA GB10                    On  |   0000000F:01:00.0 Off |                  N/A |
        | N/A   43C    P0             11W /  N/A  | Not Supported          |      0%      Default |
        |                                         |                        |                  N/A |
        +-----------------------------------------+------------------------+----------------------+
        
        +-----------------------------------------------------------------------------------------+
        | Processes:                                                                              |
        |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
        |        ID   ID                                                               Usage      |
        |=========================================================================================|
        |    0   N/A  N/A            4752      C   /usr/local/bin/ollama                 66473MiB |
        +-----------------------------------------------------------------------------------------+
        

I assume the unaccounted-for 4 GiB or so of RAM is CPU-side overhead from the Ollama model runner process.
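For reference, the flash attention and 128k context settings above are configured on the Ollama server side. A minimal sketch, assuming a reasonably recent Ollama release (the exact variable names have shifted between versions):

# enable flash attention and raise the default context window to 128k tokens,
# then restart the Ollama server so the new settings take effect
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_CONTEXT_LENGTH=131072
ollama serve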

What I'm doing with the DGX Spark

So far I've been using the Spark in place of cloud GPUs for every AI thing I've needed to do at work. In general, I haven't really noticed any difference between a GPU in the cloud and the Spark on my home network. The only real rough edge is that I need to use this one blessed NVIDIA-authored Docker image to run IPython notebooks. It's easy enough though. Usually my Docker command looks like:

docker run \
           --gpus all \
           --net=host \
           --ipc=host \
           --ulimit memlock=-1 \
           --ulimit stack=67108864 \
           -it \
           --rm \
           -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
           -v "$HOME/.huggingface:/root/.huggingface" \
           -v "$HOME/Code:/workspace/code" \
           -v "$SSH_AUTH_SOCK:$SSH_AUTH_SOCK" \
           -e HF_TOKEN=hf_hunter2hunter2hunter2 \
           -e "SSH_AUTH_SOCK=$SSH_AUTH_SOCK" \
           -e HF_HOME=/root/.cache/huggingface \
           -e HF_HUB_CACHE=/root/.cache/huggingface/hub \
           -e HF_DATASETS_CACHE=/root/.cache/huggingface/datasets \
           nvcr.io/nvidia/pytorch:25.09-py3
        

And then it Just Works™.
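Inside the container, a quick sanity check (just my habit, not part of NVIDIA's docs) confirms PyTorch can see the GPU before kicking off anything expensive:

# prints True and the GPU name if CUDA is wired up correctly inside the container
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"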

The main thing I've been doing with it is running inference on GPT-OSS 120b via Ollama. I've been doing latency and power usage testing by setting up a Discord bot and telling people that the goal is to jailbreak the bot into telling you how to make a chocolate cake. Nobody has been able to make my room warm.

What's up next?

This whole experience has been a bit of a career bucket list item for me. I've never had access to prerelease hardware like this before and being able to see what reviewers have to deal with before things are available to the masses is enlightening. I've ended up filing GPU driver bugs using my tower as a "known good" reference.

I've been slowly sinking my teeth into learning how AI training actually works, using this device to do it. I've mostly been focusing on finetuning GPT-2 and using that to learn the important parts of dataset cleaning, tokenization, and more. Let me know if you want to hear more about that, and whether you want me to release my practice models.

At the very least though, here are the things I have in the pipeline that this device enables:

  • Finetuning at home: how to make your own AI models do what you want
  • Some rough outlines and/or overviews for how I want to use classical machine learning models to enhance Anubis and do outlier detection
  • If I can somehow get Final Fantasy 14 running on it, some benchmarking in comparison to my gaming tower (if you know how to get amd64 games running well on aarch64, DM me!)

I also plan to make a comprehensive review video. Details to be announced soon.

I hope this was interesting. Thanks for early access to the device, NVIDIA!