
USB, Abstracted

2026-04-10 10:00:42

Modern technology builds on abstractions. Most application programmers today don’t know what a non-maskable interrupt is, nor should they have to. Even fewer understand register coloring or reservation stations for instruction scheduling, and fewer still can explain the physics behind the transistors in the CPU. Sometimes tech starts out where you need to know everything (programming a bare-metal microprocessor, for example) and then evolves to abstraction. That’s where [WerWolv] wants to get you with USB, in the recent post USB for Software Developers.

Many USB tutorials assume you want to know about the intricacies of protocol negotiation, information about the hardware layer, and that you are willing to write a Linux kernel module to provide a driver. But thanks to abstraction, none of this has been absolutely necessary for many use cases for a long time.

While the post focuses on Linux, libusb also works on Windows. We presume the same principles apply, more or less.

Interestingly, the target device for the tutorial is an Android phone in bootloader mode. We thought that was strange at first, until we read the rationale. You can easily get your hands on an Android phone if you don’t already have one. The device is simple. Plus, it is unlikely you already have drivers installed on your system that would interfere with your tutorial driver. Makes sense.

After that, it is pretty straightforward to use libusb to find the phone, determine what you can do with it, and communicate with it. Sure, the phone’s “fastboot” protocol is simple, but that’s just like using a TCP socket. You may implement a fancy protocol on top of it, but that doesn’t mean sockets are hard to use.
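And the fastboot protocol really is simple: commands are short ASCII strings, and every reply starts with a four-character status code (`OKAY`, `FAIL`, `DATA`, or `INFO`) followed by an optional payload. As a rough sketch of what a host-side reply parser might look like (our own hypothetical `parse_fastboot_reply`, not [WerWolv]’s actual code):

```python
# Hypothetical sketch of a host-side fastboot reply parser. Per the
# fastboot spec, a reply is a 4-character status plus an optional payload.
STATUSES = {"OKAY", "FAIL", "DATA", "INFO"}

def parse_fastboot_reply(raw: bytes):
    """Split a raw fastboot reply into (status, payload)."""
    text = raw.decode("ascii", errors="replace")
    status, payload = text[:4], text[4:]
    if status not in STATUSES:
        raise ValueError(f"unknown fastboot status: {status!r}")
    if status == "DATA":
        # DATA replies carry the upcoming transfer length as 8 hex digits.
        return status, int(payload[:8], 16)
    return status, payload

print(parse_fastboot_reply(b"OKAY0.4"))       # ('OKAY', '0.4')
print(parse_fastboot_reply(b"DATA0000f000"))  # ('DATA', 61440)
```

Feed it the bytes that come back from a libusb bulk read and you have the whole protocol layer; everything else is just deciding which command string to send next.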

We’ve looked at simplified USB drivers before. Of course, for some applications, you can bend a USB serial port to handle something a bit more complex.

[Kerry Wong] Finds SMD Test Clips

2026-04-10 07:00:35

One of the many problems you run into when you work with SMD parts is trying to probe the little tiny pins. While we usually watch [Kerry Wong’s] videos for the oscilloscopes, it makes sense that he’d also be looking for probes. The video below shows some cheap probes from China that can clamp onto tiny QFP pins.

The probes look a little like tiny needles, but the needle part isn’t conductive. When you push them, very tiny, rigid clamps come out. On the other end is a pin that will take a female header, or, of course, you can connect another test lead to that pin.

As an example, he shows a decidedly dirty Arduino Due and probes the CPU with the tiny probes. Off camera, he put two probes on adjacent pins on the QFP, and it worked just fine. Definitely something we will add to our toolbox.

The probes appear to work with pitches as small as 0.5mm, which covers many common situations. We’ve looked at oddball probes before. Or try making your own solutions.

Upgrading a MacBook Neo Using a 1 TB iPhone NAND Flash

2026-04-10 04:00:33

The nekkid Flash footprint with unused pads perimeter. (Credit: dosdude1, YouTube)

For some reason the newly introduced MacBook Neo appears to be the subject of a lot of modding, and a recent mod by [dosdude1] leans into the fact that this laptop has been assembled using what are effectively iPhone 16 parts inside a laptop case. This consequently means that there’s an overlap with certain iPhone 16 components, such as the NAND Flash. Incidentally, storage on the Neo is limited to 512 GB when you purchase it from Apple, which is odd since the same SoC in the iPhone 16 Pro happily handles 1 TB.

Even if it was just a price-point decision on Apple’s part, there’s seemingly nothing standing between a Neo owner armed with a hot air gun and sheer determination, and a 1 TB upgrade. As long as you’re comfortable soldering a fine-pitch BGA NAND Flash package, natch.

Of course, there was always the possibility that Apple used a different NAND Flash package footprint, but the 256 GB chip that comes installed matches the replacement 1 TB K8A5 chip, as hoped. That just left disassembly and preparing the PCB for the storage swap. Removing the BGA underfill and desoldering the old chip without taking out surrounding SMD parts is definitely the hardest part, but it’s handled in the video with the equivalent of an IC spatula and the temporary removal of some capacitors.

Interestingly, the uncovered IC footprint shows a whole perimeter of unused pads that might target other NAND Flash packages. Regardless, the new chip installed fine, giving the Neo 1 TB of storage and a slightly faster read/write performance.

Need a Reactalyser?

2026-04-10 02:30:54

We’ve noticed a recent surge in people recreating old projects from vintage electronics magazines, and we approve. After all, parts and PCBs are easier to get than ever, so other than replacing obsolete parts, it is usually much easier to build these projects now compared to when they first appeared. The latest one we’ve noticed was [Anthony Francis-Jones’] build of the “Reactalyser” from a 1968 edition of Practical Electronics. Check it out in the video below.

You may ask yourself what a reactalyser could be. We did too. Our guess was extremely far off, since we thought it might have to do with reactance.

We liked the retro-look radio that [Anthony] used as a case. He changed the circuit to use an OC71 PNP transistor and replaced a mechanical part of the device with more electronics. So this isn’t a totally faithful reproduction, but it does keep the spirit of the device.

This might seem like an odd circuit for something that would be totally trivial to make with a microcontroller. However, these kinds of circuits were very common prior to simple-to-use computers.

If you like these old retro builds, check out some of the ones we’ve featured from [Bettina Neumryr]. We need a name for this activity. We’ll suggest retromagging. Give us your entry in the comments.

Printed Sleeve Gives Keys Some Grip

2026-04-09 23:30:53

[Enginerd]’s chonky key handle is a beautiful use of 3D printing that helps people help themselves. The large wings, indented faces, and beefed-up grip make a typical house key much easier for someone with arthritis or difficulty gripping those brass slivers. Bright filaments in different colors can also help someone with vision limitations. The thing that will not improve is the space in your pocket or purse.

The design requires only a tiny bit of plastic and prints without supports; what sets it apart from similar models is that you do not need any double-sided tape or bolts, only a keyring, though someone may have to assemble it for the user. The author is clever enough to use an uncut blank in the project photo so that no one will be decoding and copying their house key. We would wager they have read Hackaday if they are so prepared.

Some of the people who purchased early consumer 3D printers already need these kinds of builds, and there is no shortage of intelligent people creating remarkable open-source designs.

TurboQuant: Reducing LLM Memory Usage With Vector Quantization

2026-04-09 22:00:43

Large language models (LLMs) aren’t actually giant computer brains. Instead, they are effectively massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. Billions of parameters, times N bits per parameter, equals N billion bits of storage required for a full model. Since increasing the number of parameters makes the models appear smarter, most of the effort to reduce the storage they require has gone into reducing the size of the parameters themselves.

Vector quantization (VQ) is a new method that can compress the vectors calculated during inference to take up less space without significant loss of data. Google’s recently published pre-print paper on TurboQuant covers an LLM-oriented VQ algorithm that’s claimed to provide up to a 6x compression level with no negative impact on inference times.
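To make “vector quantization” concrete: instead of rounding each number on its own, VQ replaces a whole vector with the index of its nearest entry in a shared codebook, so a multi-dimensional value is stored as one small integer. A minimal, purely illustrative sketch — the codebook here is hand-picked, not trained, and has nothing to do with TurboQuant’s actual internals:

```python
import math

# Purely illustrative VQ: a hand-picked 2D codebook, not a trained one.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]

def quantize(vec):
    """Return the index of the nearest codebook entry (the compressed form)."""
    return min(range(len(CODEBOOK)),
               key=lambda i: math.dist(vec, CODEBOOK[i]))

def dequantize(index):
    """Recover the (lossy) approximation of the original vector."""
    return CODEBOOK[index]

idx = quantize((0.9, 0.2))        # nearest entry is (1.0, 0.0)
print(idx, dequantize(idx))       # 1 (1.0, 0.0)
```

The compression win is that each stored vector shrinks to a two-bit index; the loss is the distance between the original vector and the codebook entry it snapped to.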

The tokens aren’t directly encoded in the vector space, but their associated key values are, which, along with the one-token-per-inference-step process, creates the need for a key-value (KV) cache whose size scales with the size of the model. Compressing the KV cache using VQ thus reduces its size and correspondingly speeds up look-ups, due to the smaller footprint in memory. One catch is that with VQ, as with any quantization, some accuracy will be lost. The trick is thus to apply VQ in such a way that the loss isn’t noticeable.

Other aspects that had to be taken into account by the TurboQuant algorithm were fast computation, to keep up with real-time requirements, along with compatibility with so-called ‘AI accelerator’ hardware.

Key-Value Cache

A basic way to look at the KV cache in LLMs is that it caches the results of previous inference cycles. An in-depth explanation can be found, for example, in this article by Sebastian Raschka. In the case of generating a three-word phrase starting with the word ‘Time’, we can see the following repeated computations:

Repeated computations in an LLM without KV cache. (Credit: Sebastian Raschka)

Considering that inference is rather expensive computation-wise, you really want to cache these calculated values. This provides a massive boost in performance and much lower CPU load, but because there’s no such thing as a free lunch, the catch is rapidly increasing memory usage.
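The bookkeeping can be sketched in a few lines: generating token t needs the keys and values of all tokens up to t, so without a cache each step recomputes them from scratch, while with a cache each step only appends one new entry. A toy illustration, where a hypothetical `project_kv` stands in for the real per-token key/value projections of a transformer layer:

```python
# Toy KV cache illustration: project_kv() is a hypothetical stand-in
# for the real (expensive) key/value projections of a transformer layer.
def project_kv(token):
    return (token * 2, token * 3)  # (key, value), kept trivially simple

def keys_values_no_cache(tokens):
    # Without a cache: every generation step recomputes K/V for ALL tokens.
    work = 0
    for t in range(1, len(tokens) + 1):
        kv = [project_kv(tok) for tok in tokens[:t]]
        work += t
    return kv, work

def keys_values_with_cache(tokens):
    # With a cache: each step only projects the one new token.
    cache, work = [], 0
    for tok in tokens:
        cache.append(project_kv(tok))
        work += 1
    return cache, work

tokens = [1, 2, 3, 4]
kv_a, work_a = keys_values_no_cache(tokens)
kv_b, work_b = keys_values_with_cache(tokens)
assert kv_a == kv_b     # identical results...
print(work_a, work_b)   # ...but 10 projections without the cache vs 4 with it
```

The cached version does linear work instead of quadratic, which is exactly the trade described above: compute saved, memory spent, since every cached entry has to live somewhere.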

Correspondingly, we now have a big in-memory cache to manage, along with memory management routines to make sure that the KV cache doesn’t exceed its allocated memory pool:

KV cache schematic with memory pool management. (Credit: NVIDIA)

As covered in a December 2025 NVIDIA Developer article, KV cache optimization has been a topic for a while, with the article in question covering NVFP4. This is a VQ approach that reduces the precision of the KV cache from 16-bit floating point to 4-bit (FP4). Meanwhile production systems already employ 8-bit quantization, also using a floating point format (FP8).

An additional cost here is that FP4 has to be dequantized back to FP8, which would seem to be an implementation detail of the current version. Compared to FP8 quantization, FP4 reduces latency by up to 3 times and halves the required memory, while accuracy takes a hit of ‘less than’ 1% relative to FP8 due to quantization error.
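The basic trade can be mimicked with plain integer quantization: map each block of values to 4-bit codes plus one shared scale, and accept the rounding error. This sketch uses simple symmetric int4 rather than NVIDIA’s actual E2M1 float format, purely to show the mechanics:

```python
def quantize_block(values):
    """Symmetric 4-bit quantization: codes in [-7, 7] plus one shared
    per-block scale. (Illustrative int4, NOT NVIDIA's E2M1 FP4 format.)"""
    scale = max(abs(v) for v in values) / 7 or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    return [c * scale for c in codes]

block = [0.10, -0.35, 0.70, 0.02]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
# n values at 4 bits plus one shared scale, instead of n values at 16
# bits: roughly 4x smaller, at the cost of quantization error:
print(max(abs(a - b) for a, b in zip(block, restored)))
```

Every format in this space is some refinement of that same idea; NVFP4 and TurboQuant differ in how cleverly they spend their few bits, not in whether error exists.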

Accuracy here is important as it factors into the next auto-complete step when the LLM’s probability vector space is once again rummaged through for the next statistically most likely follow-up token. KV cache VQ compression is thus always a trade-off between memory use and accuracy. In short, the same issues apply as with all implementations of quantization-based compression, including the tragic absence of any free lunch.

Turbo Quantization

So what magic did Google’s intrepid engineers pull off to improve on NVIDIA’s NVFP4 approach? The key is in how the quantization is performed, as it isn’t simply a matter of truncating or throwing away data and rounding to the nearest available value. Instead, a series of steps is applied that seeks to minimize the quantization error, which in the case of TurboQuant is (confusingly) an algorithm called PolarQuant followed by the QJL (quantized Johnson-Lindenstrauss) algorithm.

Annoyingly for the non-mathematically gifted/educated among us, Google didn’t simply provide a straightforward visualization like that for NVFP4, which is understandable even for us software developers and other casuals. For NVIDIA’s format we can see that it takes the form of a single sign bit, two exponent bits, and one mantissa bit (E2M1), as well as a shared FP8 scale per block of 16 values.

One step where TurboQuant appears to differ is the PolarQuant algorithm, which applies a polar-coordinate transformation to the vectors, after which a typical normalization step can apparently be skipped.

Overview of recursive polar transformation procedure. (Credit: Insu Han et al., 2026)

This polar transformation is preceded by the application of a random projection matrix as a type of preconditioning that shapes the later normal distribution, with proof and the full algorithm provided in the PolarQuant arXiv paper for those who desire more detail.
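The idea of a polar transformation is easiest to see in two dimensions: a vector (x, y) becomes a length and an angle, and the angle lives in a bounded range that is friendly to quantize. A toy round-trip, showing only the coordinate change, not PolarQuant’s recursive, preconditioned version:

```python
import math

def to_polar(x, y):
    """(x, y) -> (r, theta); theta is bounded to (-pi, pi],
    which makes it an easy target for uniform quantization."""
    return math.hypot(x, y), math.atan2(y, x)

def from_polar(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(3.0, 4.0)
x, y = from_polar(r, theta)
print(r, theta)  # 5.0 and roughly 0.927 radians
assert abs(x - 3.0) < 1e-9 and abs(y - 4.0) < 1e-9
```

PolarQuant applies this kind of change of coordinates recursively across many dimensions, but the appeal is the same: angles come pre-bounded, so a fixed bit budget goes further than it would on unbounded Cartesian components.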

Of note is that PolarQuant employs the Johnson-Lindenstrauss lemma, which Google researchers used as the basis for a JL-based transform called QJL. From reading the blog post it’s not immediately clear whether QJL is directly integrated into PolarQuant or is an additional step, due to the muddled messaging on Google’s end. From the benchmarking results it does appear that QJL is an additional step.
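The Johnson-Lindenstrauss lemma says that pushing vectors through a suitably scaled random matrix into a much lower dimension approximately preserves pairwise distances. A bare-bones illustration with a Gaussian projection — nothing QJL-specific here, since QJL adds quantization on top of the projected values:

```python
import math, random

random.seed(42)

def make_projection(d, k):
    """Random Gaussian d->k projection matrix, scaled by 1/sqrt(k) so that
    Euclidean distances are preserved in expectation (the JL lemma)."""
    return [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]

def project(matrix, vec):
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]

d, k = 256, 64
P = make_projection(d, k)
u = [random.gauss(0, 1) for _ in range(d)]
v = [random.gauss(0, 1) for _ in range(d)]

# Distance between the projected vectors vs the originals: the ratio
# should land close to 1 despite a 4x dimensionality reduction.
ratio = math.dist(project(P, u), project(P, v)) / math.dist(u, v)
print(ratio)
```

The payoff for a KV cache is the same as for any JL application: fewer dimensions means fewer values to store and compare, with a distortion that shrinks as the target dimension grows.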

What we know is that the final format TurboQuant ends up with is a three-bit value, which would logically be 1 bit smaller than NVFP4, or an approximately 25% smaller KV cache for the same amount of data.

Judging On Merits

Comparison and benchmark data in the Google blog post and associated papers do not provide direct comparisons with NVFP4, and the few numbers that are thrown out are rather inconsistent or unspecified. Take the claim of ‘at least 6x smaller memory size’, for example: the blog text does not clearly specify what this is relative to, while it then tosses out an 8x performance increase for 4-bit TurboQuant compared to FP32.

Although with some more digging and poking of the available data it might be possible to glean some actual performance information from the provided files, it’s rather vexing how vague Google’s messaging is kept. Not to mention the lack of direct benchmarking against what would be the biggest competitors in the space.

It is definitely true that VQ is a thing for LLM KV cache compression, as we have seen, and NVIDIA ‘accelerator cards’ provide hardware acceleration for this feature, so this is the reality that TurboQuant would have to compete with. Based on the few clear facts that we do have, it doesn’t appear to be quite the revolution the hype machine has made it out to be; it’s likely just a bump over NVFP4 that NVIDIA will trump again with its next quantized format.

It will of course be most interesting to see how this will play out once TurboQuant makes its way out of the laboratory into the wider world and we start seeing independent benchmarking performed.