2026-04-10 04:00:33


For some reason the newly introduced MacBook Neo appears to be the subject of a lot of modding, though a recent mod by [dosdude1] leans into the fact that this laptop has been assembled using what are effectively iPhone 16 parts inside a laptop case. This consequently means that there’s an overlap with certain iPhone 16 components, such as the NAND Flash. Incidentally, storage on the Neo is limited to 512 GB when you purchase it from Apple, which is weird since the same SoC in the iPhone 16 Pro happily uses 1 TB.
Even if it was just a price point thing that Apple went for, there’s seemingly nothing standing between a Neo owner armed with a hot air gun and sheer determination and a 1 TB upgrade. As long as you’re comfortable soldering a fine-pitch BGA NAND Flash package, natch.
Of course, there was always the possibility that Apple used a different NAND Flash package footprint, but the 256 GB chip that comes installed matches the footprint of the replacement 1 TB K8A5 chip, as hoped. This left just the disassembly and preparing the PCB for a storage replacement. Removing the BGA underfill and desoldering the old chip without taking out surrounding SMD parts is definitely the hardest part, but the video handles it with the equivalent of an IC spatula and the temporary removal of some capacitors.
Interestingly, the uncovered IC footprint shows a whole perimeter of unused pads that might target other NAND Flash packages. Regardless, the new chip installed fine, giving the Neo 1 TB of storage and slightly faster read/write performance.
2026-04-10 02:30:54

We’ve noticed a recent surge in people recreating old projects from vintage electronics magazines, and we approve. After all, parts and PCBs are easier to get than ever, so other than replacing obsolete parts, it is usually much easier to build these projects now compared to when they first appeared. The latest one we’ve noticed was [Anthony Francis-Jones’] build of the “Reactalyser” from a 1968 edition of Practical Electronics. Check it out in the video below.
You may ask yourself what a reactalyser could be. We did too. Our guess was extremely far off, since we thought it might have to do with reactance.
We liked the retro-look radio that [Anthony] used as a case. He changed the circuit to use an OC71 PNP transistor and replaced a mechanical part of the device with more electronics. So this isn’t a totally faithful reproduction, but it does keep the spirit of the device.
This might seem like an odd circuit for something that would be totally trivial to make with a microcontroller. However, these kinds of circuits were very common prior to simple-to-use computers.
If you like these old retro builds, check out some of the ones we’ve featured from [Bettina Neumryr]. We need a name for this activity. We’ll suggest retromagging. Give us your entry in the comments.
2026-04-09 23:30:53

[Enginerd]’s chonky key handle is a beautiful use of 3D printing that helps people help themselves. The large wings, indented faces, and beefed-up grip make a typical house key much easier for someone with arthritis or difficulty gripping those brass slivers. Bright filaments in different colors can also help someone with vision limitations. The thing that will not improve is the space in your pocket or purse.
The design only requires a tiny bit of plastic and prints without supports. What sets it apart from similar models is that you do not need any double-sided tape or bolts, only a keyring, though someone may have to assemble it for the user. The author was clever enough to use an uncut blank in the project photo so that no one can decode and copy their house key. We would wager they have read Hackaday if they are so prepared.
Some of the people who purchased early consumer 3D printers already need these kinds of builds, and there is no shortage of intelligent people creating remarkable open-source designs.

2026-04-09 22:00:43

Large language models (LLMs) aren’t actually giant computer brains. Instead, they are effectively massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. Multiply billions of parameters by N bits per parameter, and a full model needs N times that many billions of bits of storage. Since increasing the number of parameters makes the models appear smarter, most efforts to reduce their storage requirements have focused on shrinking the parameters themselves.
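To make that storage arithmetic concrete, here is a minimal Python sketch. The 7-billion-parameter count is a hypothetical example rather than any specific model, and only the weights are counted, ignoring activations and the KV cache.

```python
def weight_storage_bytes(num_params: int, bits_per_param: int) -> int:
    """Raw weight storage: parameters times bits per parameter, converted to bytes."""
    return num_params * bits_per_param // 8


# Hypothetical 7-billion-parameter model at a few common weight precisions.
PARAMS = 7_000_000_000
for bits in (32, 16, 8, 4):
    gib = weight_storage_bytes(PARAMS, bits) / 2**30
    print(f"{bits:>2}-bit weights: {gib:6.2f} GiB")
```

Halving the bits per parameter halves the storage, which is why shrinking the parameters is such an attractive lever.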
Vector quantization (VQ) is a method that can compress the vectors calculated during inference to take up less space without significant loss of data. Google’s recently published pre-print paper on TurboQuant covers an LLM-oriented VQ algorithm that’s claimed to provide up to 6x compression with no negative impact on inference times.
The tokens aren’t directly encoded in the vector space, but their associated keys and values are. Since inference produces a single token per step, this creates the need for a key-value (KV) cache, whose size scales with the size of the model as well as with the length of the context. Compressing the KV cache using VQ thus reduces its size and correspondingly speeds up look-ups, thanks to the smaller memory footprint. One catch is that, due to the nature of quantization, some accuracy will be lost. The trick is thus to apply VQ in such a way that it does not affect accuracy in a noticeable manner.
Other aspects that the TurboQuant algorithm had to take into account were fast computation to keep up with real-time requirements and compatibility with so-called ‘AI accelerator’ hardware.
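How large a plain KV cache gets can be estimated with a short Python sketch. The model dimensions below (32 layers, 8 KV heads of dimension 128, a 32k-token context) are hypothetical, and the calculation ignores any metadata such as quantization scale factors.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   batch: int = 1, bits_per_value: int = 16) -> int:
    """Size of a plain KV cache: one key and one value vector per token, head and layer."""
    num_values = 2 * layers * kv_heads * head_dim * seq_len * batch
    return num_values * bits_per_value // 8


# Hypothetical model configuration, not taken from any specific LLM.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)
for bits in (16, 8, 4, 3):
    size = kv_cache_bytes(**cfg, bits_per_value=bits)
    print(f"{bits:>2}-bit KV cache: {size / 2**30:.2f} GiB")
```

For this configuration the loop prints 4.00, 2.00, 1.00 and 0.75 GiB, and doubling the context length doubles every one of those figures.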
A basic way to look at the KV cache in LLMs is that it caches the results of previous inference cycles. An in-depth explanation can be found, for example, in this article by Sebastian Raschka. When generating a three-word phrase starting with the word ‘Time’, for instance, the keys and values of the earlier words would be computed all over again at every step without such a cache.

Considering that inference is rather expensive computation-wise, you really want to cache these calculated values. This provides a massive boost in performance and a much lower compute load, but because there’s no such thing as a free lunch, the catch is rapidly increasing memory usage.
Correspondingly, we now have a big in-memory cache to manage, along with memory management routines to make sure that the KV cache doesn’t exceed its allocated memory pool.
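The saving from caching can be shown with a toy, single-head attention loop in NumPy. Everything here is hypothetical (random projection weights, an 8-dimensional embedding, three random token vectors standing in for a phrase starting with ‘Time’); it only illustrates that caching keys and values gives identical outputs while each step projects just the newest token.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding/head size
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
tokens = rng.standard_normal((3, d))  # stand-ins for three token embeddings


def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V


def decode_without_cache(tokens):
    """Recompute keys and values for every previous token at each step."""
    outputs = []
    for t in range(1, len(tokens) + 1):
        X = tokens[:t]
        K, V = X @ W_k, X @ W_v  # t projections each step
        outputs.append(attend(X[-1] @ W_q, K, V))
    return outputs


def decode_with_cache(tokens):
    """Project only the newest token and append its key and value to the cache."""
    K_cache, V_cache, outputs = [], [], []
    for x in tokens:
        K_cache.append(x @ W_k)  # one new projection per step
        V_cache.append(x @ W_v)
        outputs.append(attend(x @ W_q, np.array(K_cache), np.array(V_cache)))
    return outputs
```

The cache trades compute for memory: it holds one key and one value per generated token, so it grows linearly with the output length.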
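One simple policy for keeping a KV cache inside a fixed memory pool is a sliding window that evicts the oldest tokens. The class below is a hypothetical Python sketch of that idea only; real serving stacks use various other strategies, and evicting old tokens also discards context the model can no longer attend to.

```python
from collections import deque


class SlidingWindowKVCache:
    """Hypothetical KV cache with a fixed memory budget that evicts the oldest tokens."""

    def __init__(self, budget_bytes: int, bytes_per_token: int):
        self.max_tokens = budget_bytes // bytes_per_token
        if self.max_tokens < 1:
            raise ValueError("budget too small to hold even one token")
        # A deque with maxlen silently drops the oldest entry when a new one is appended.
        self._entries = deque(maxlen=self.max_tokens)
        self.evicted = 0

    def append(self, key, value) -> None:
        if len(self._entries) == self.max_tokens:
            self.evicted += 1  # the oldest (key, value) pair is about to be dropped
        self._entries.append((key, value))

    def keys(self) -> list:
        return [k for k, _ in self._entries]

    def __len__(self) -> int:
        return len(self._entries)


# 1 MiB budget; 128 KiB per token, e.g. 32 layers x 8 KV heads x 128 dims x (K+V) x 2 bytes.
cache = SlidingWindowKVCache(budget_bytes=2**20, bytes_per_token=128 * 2**10)
for token_id in range(10):
    cache.append(f"k{token_id}", f"v{token_id}")
print(len(cache), cache.evicted, cache.keys()[0])  # 8 2 k2
```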

As covered in a December 2025 NVIDIA Developer article, KV cache optimization has been a topic for a while, with the article in question covering NVFP4. This quantization approach reduces the precision of the KV cache from 16-bit floating point to 4-bit (FP4). Meanwhile, production systems already employ 8-bit quantization, also using a floating-point format (FP8).
An additional cost here is that FP4 has to be dequantized back to FP8, which would seem to be an implementation detail of the current version. Compared to FP8 quantization, FP4 reduces latency by up to 3 times and halves the memory required, while accuracy is negatively impacted by ‘less than’ 1% compared to FP8 due to quantization error.
Accuracy here is important as it factors into the next auto-complete step when the LLM’s probability vector space is once again rummaged through for the next statistically most likely follow-up token. KV cache VQ compression is thus always a trade-off between memory use and accuracy. In short, the same issues apply as with all implementations of quantization-based compression, including the tragic absence of any free lunch.
So what magic did Google’s intrepid engineers pull off to improve on NVIDIA’s NVFP4 approach? The key is in how the quantization is performed, as it isn’t simply a matter of truncating or throwing away data by rounding to the nearest available value. Instead, a series of steps is applied that seeks to minimize the quantization error, which in the case of TurboQuant is (confusingly) an algorithm called PolarQuant followed by the QJL (quantized Johnson-Lindenstrauss) algorithm.
Annoyingly for the non-mathematically gifted/educated among us, Google didn’t simply provide a straightforward visualization like the one for NVFP4 that’s understandable even to us software developers and other casuals. For NVIDIA’s format we can see that it takes the form of a single sign bit, two exponent bits and one mantissa bit (E2M1), as well as a shared FP8 scale per block of 16 values.
One step where TurboQuant appears to differ is the PolarQuant algorithm, which applies a polar coordinate transformation to the vectors, after which the usual normalization can apparently be skipped.
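A rough NumPy sketch of E2M1 values with a shared per-block scale follows. It is a simplification, not NVIDIA’s implementation: the shared scale is kept in full precision rather than being rounded to FP8, any higher-level scaling is left out, ties round toward the smaller magnitude, and the scale simply maps each block’s largest magnitude onto 6, the largest E2M1 value.

```python
import numpy as np

# All magnitudes a 4-bit E2M1 float can hold (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK_SIZE = 16


def fake_quantize_block(block: np.ndarray) -> np.ndarray:
    """Quantize one block to E2M1 with a shared scale, then dequantize back."""
    amax = np.abs(block).max()
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    scaled = np.abs(block) / scale
    # Nearest representable magnitude; argmin picks the smaller one on a tie.
    nearest = np.abs(scaled[:, None] - E2M1_MAGNITUDES[None, :]).argmin(axis=1)
    return np.sign(block) * E2M1_MAGNITUDES[nearest] * scale


def fake_quantize(x: np.ndarray) -> np.ndarray:
    """Apply block-wise quantization to a 1-D array whose length is a multiple of 16."""
    if x.size % BLOCK_SIZE:
        raise ValueError("length must be a multiple of the block size")
    blocks = x.reshape(-1, BLOCK_SIZE)
    return np.concatenate([fake_quantize_block(b) for b in blocks])


rng = np.random.default_rng(0)
x = rng.standard_normal(64)
x_q = fake_quantize(x)
print("max abs error:", np.abs(x - x_q).max())
```

Each block of 16 values collapses onto at most eight signed magnitudes, which is where the quantization error comes from.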
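The PolarQuant paper describes the full construction; the Python sketch below only applies a single polar transform to pairs of coordinates, to show why a bounded angle can be quantized on a fixed grid without any per-block normalization. The 4-bit angle budget and keeping the radius in full precision are hypothetical choices for illustration.

```python
import numpy as np


def to_polar_pairs(v: np.ndarray):
    """Split a vector into consecutive (x, y) pairs and convert each to (radius, angle)."""
    pairs = v.reshape(-1, 2)
    radius = np.hypot(pairs[:, 0], pairs[:, 1])
    angle = np.arctan2(pairs[:, 1], pairs[:, 0])  # always within [-pi, pi]
    return radius, angle


def quantize_angle(angle: np.ndarray, bits: int):
    """Uniform quantizer on the fixed range [-pi, pi): no per-vector min/max needed."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    codes = np.round((angle + np.pi) / step).astype(int) % levels  # pi wraps to -pi
    return codes, codes * step - np.pi


def from_polar_pairs(radius: np.ndarray, angle: np.ndarray) -> np.ndarray:
    return np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1).reshape(-1)


rng = np.random.default_rng(0)
v = rng.standard_normal(16)
radius, angle = to_polar_pairs(v)
codes, angle_q = quantize_angle(angle, bits=4)
v_hat = from_polar_pairs(radius, angle_q)
print("4-bit angle codes:", codes)
print("reconstruction error:", np.linalg.norm(v - v_hat))
```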

This polar transformation is preceded by the application of a random projection matrix, a form of preconditioning that shapes the distribution of the values that are later quantized, with proofs and the full algorithm provided in the PolarQuant arXiv paper for those who desire more detail.
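The exact preconditioning matrix is defined in the PolarQuant paper; as an assumption, the Python sketch below uses a random orthogonal matrix built from a QR decomposition. It shows two properties of such a transform: lengths and inner products are preserved (so query-key dot products are unchanged if both are rotated), and a single outlier coordinate gets spread across all coordinates, which makes the values friendlier to a fixed quantization grid.

```python
import numpy as np


def random_rotation(d: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the matrix is uniformly (Haar) distributed over orthogonal matrices.
    return q * np.sign(np.diag(r))


def peak_to_rms(v: np.ndarray) -> float:
    """How 'spiky' a vector is: largest magnitude relative to its RMS value."""
    return float(np.abs(v).max() / np.sqrt(np.mean(v**2)))


d = 64
R = random_rotation(d)
x = np.zeros(d)
x[0] = 10.0  # a single outlier coordinate
y = R @ x
print(f"peak/RMS before: {peak_to_rms(x):.2f}, after: {peak_to_rms(y):.2f}")
```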
Of note is that PolarQuant employs the Johnson-Lindenstrauss lemma, which Google researchers used as the basis for a JL-based transform called QJL. From reading the blog post it’s not immediately clear whether QJL is directly integrated into PolarQuant or an additional step, due to the muddled messaging on Google’s end. From the benchmarking results, it does appear that QJL is an additional step.
What we know is that the final format TurboQuant ends up with is a three-bit value, which would logically be 1 bit smaller than NVFP4’s four bits, or an approximately 25% smaller KV cache for the same amount of data.
Comparison and benchmark data in the Google blog post and associated papers do not provide direct comparisons with NVFP4, and the few numbers that are thrown out are rather inconsistent or unspecified. Take the claim of ‘at least 6x smaller memory size’, for example. The blog text does not clearly specify what this is relative to, while it then tosses out a figure of an 8x performance increase for 4-bit TurboQuant compared to FP32.
Although with some more digging and poking at the available data it might be possible to glean some actual performance information from the provided files, it’s rather vexing how vague Google’s messaging is. Not to mention the lack of direct benchmarking against what would be its biggest competitors in the space.
It is definitely true that VQ is a thing for LLM KV cache compression, as we have seen, and NVIDIA ‘accelerator cards’ provide hardware acceleration for this feature, so this is the reality that TurboQuant would have to compete with. Based on the few clear facts that we do have, it doesn’t appear to be quite the revolution that the hype machine has made it out to be, and is more likely just a bump over NVFP4 that NVIDIA may well trump again with its next quantized format.
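Going by our reading of the QJL paper, the core idea is a 1-bit Johnson-Lindenstrauss sketch: multiply a key by a Gaussian random matrix, keep only the signs plus the key’s norm, and estimate inner products with an unquantized query from those signs. The Python sketch below shows that estimator in isolation; how TurboQuant wires it in is not shown, and the large sketch size `m` is chosen only to make the demo estimate visibly close, not to save memory.

```python
import numpy as np


def qjl_encode(k: np.ndarray, S: np.ndarray):
    """Store only the signs of the random projection S @ k, plus the norm of k."""
    return np.sign(S @ k), np.linalg.norm(k)


def qjl_inner_product(q: np.ndarray, signs: np.ndarray, k_norm: float, S: np.ndarray) -> float:
    """Estimate <q, k> from the 1-bit sketch; unbiased when S has i.i.d. N(0, 1) entries."""
    m = S.shape[0]
    return float(np.sqrt(np.pi / 2) / m * k_norm * np.dot(S @ q, signs))


rng = np.random.default_rng(1)
d, m = 64, 4096  # m >> d only to make the demo estimate tight
S = rng.standard_normal((m, d))
q, k = rng.standard_normal(d), rng.standard_normal(d)
signs, k_norm = qjl_encode(k, S)
print("exact:", float(q @ k), "estimate:", qjl_inner_product(q, signs, k_norm, S))
```

The scaling factor comes from the fact that, for a Gaussian row s, the expected value of <s, q>·sign(<s, k>) is sqrt(2/π)·<q, k>/‖k‖.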
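For reference, a few plain bit-width ratios can be computed from the numbers mentioned so far. This ignores any side information TurboQuant may store per vector (which the blog post doesn’t spell out), while the NVFP4 figure includes the FP8 scale shared by each block of 16 values.

```python
TURBOQUANT_BITS = 3      # per-value figure cited above; any per-vector metadata ignored
NVFP4_BITS = 4 + 8 / 16  # 4-bit E2M1 values plus one 8-bit scale per 16 values

comparisons = {
    "TurboQuant vs NVFP4 (values only)": TURBOQUANT_BITS / 4,
    "TurboQuant vs NVFP4 (incl. block scale)": TURBOQUANT_BITS / NVFP4_BITS,
}
for label, fraction in comparisons.items():
    print(f"{label}: {fraction:.3f} of the size ({1 - fraction:.0%} smaller)")

for baseline in (16, 32):
    for bits in (4, TURBOQUANT_BITS):
        print(f"FP{baseline} -> {bits}-bit: {baseline / bits:.2f}x smaller")
```

A plain 3-bit encoding works out to 5.33x smaller than FP16 and 10.67x smaller than FP32, neither of which is exactly 6x, which shows how much the unstated baseline matters.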
It will of course be most interesting to see how this will play out once TurboQuant makes its way out of the laboratory into the wider world and we start seeing independent benchmarking performed.
2026-04-09 19:00:44

Like many long-established broadcasters, the BBC put out a selection of their archive material for us all to enjoy online. Their most recent release may be of interest to Hackaday readers and has more than a bit of personal interest to your scribe, as it visits the Spadeadam rocket test range on the occasion of its closure in 1973. This marked the final chapter in the story of Blue Streak, the British intermediate-range ballistic missile project that later became part of the first European space launcher.
It’s possible citizens of every country see their government as uniquely talented at throwing away taxpayers’ money, but the sad story here isn’t Blue Streak itself, which was obsolete as a missile by the time it was finished. Instead, it lies in the closure of the test range as part of the ill-advised destruction of a nascent and successful space industry, just as it had made the UK the third nation to have successfully placed a satellite in orbit.
We normally write in the first person plural in our daily posts here at Hackaday, but for now there’s a rare switch into the first person singular. My dad spent a large part of the 1950s working as a technician for de Havilland Propellers, later part of Hawker Siddeley, and then British Aerospace. He was part of the team working on Blue Streak at Spadeadam and the other test site at RAF Westcott in Buckinghamshire, and we were brought up on hair-raising tales of near-disasters in the race to get British nukes flying. He’s not one of the guys in the video below, as by that time he was running his metalwork business in Oxfordshire, but I certainly recognise the feeling of lost potential they express. Chances are I’ll never visit what remains of the Spadeadam test stands in person, as the site is now the UK’s electronic warfare test range, so the BBC film represents a rare chance for a closer look.
In a related story, the trackers for the same program in Australia were saved from the scrapheap.
2026-04-09 16:00:41

For some types of embedded systems — especially those that are safety-critical — it’s considered bad form to dynamically allocate memory during operation. While you can usually arrange for your own code to behave, it’s the libraries that get you. In particular, it is hard to find a TCP/IP stack that doesn’t allocate and free memory all over the place. Unless you’ve found wolfIP.
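The usual way around dynamic allocation in such systems is a pool of fixed-size buffers reserved at start-up and handed out from a free list. The Python sketch below only mimics that pattern (Python itself allocates behind the scenes, unlike a static array in C), it is not how wolfIP is implemented, and the class and method names are hypothetical.

```python
class StaticBufferPool:
    """Fixed number of equal-size packet buffers, carved out of one up-front allocation."""

    def __init__(self, count: int, size: int):
        self._storage = bytearray(count * size)  # the only buffer allocation, done at start-up
        self._size = size
        self._free = list(range(count))  # stack of free slot indices

    def acquire(self):
        """Return a free slot index, or None when the pool is exhausted (it never grows)."""
        return self._free.pop() if self._free else None

    def buffer(self, slot: int) -> memoryview:
        """Zero-copy view of one slot's bytes."""
        start = slot * self._size
        return memoryview(self._storage)[start:start + self._size]

    def release(self, slot: int) -> None:
        if slot in self._free:
            raise ValueError(f"slot {slot} released twice")
        self._free.append(slot)


pool = StaticBufferPool(count=2, size=64)
first = pool.acquire()
second = pool.acquire()
print(pool.acquire())  # None: out of buffers, so e.g. an incoming packet would be dropped
pool.release(first)
```

Running out of buffers becomes an explicit, bounded failure (drop the packet) rather than an allocation that might fail or fragment the heap at run time.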
The library supports a BSD-like non-blocking socket API. It can act as an endpoint, but can also support multiple interfaces and forwarding if you were building something like a router. It doesn’t appear to be bare-bones either. In addition to the normal things you’d expect for IPv4, there’s also ICMP, IPsec, ARP, DHCP, DNS, and HTTP with or without SSL/TLS. There is also a FIPS-compliant implementation of WireGuard for VPN, although it is not directly compatible with standard WireGuard, only with other instances of itself (known as wolfGuard). There is a Linux kernel module for wolfGuard, though.
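wolfIP’s own function names aren’t covered here, so as a stand-in the sketch below uses Python’s standard `socket` module, which wraps the BSD socket API, to show what non-blocking semantics mean: a read with no data waiting returns immediately with an error (EWOULDBLOCK/EAGAIN, surfaced by Python as `BlockingIOError`) instead of stalling the caller, and readiness is checked with `select()`.

```python
import select
import socket


def try_recv(sock: socket.socket, n: int = 1024):
    """Return received bytes, or None if nothing is waiting (EWOULDBLOCK/EAGAIN)."""
    try:
        return sock.recv(n)
    except BlockingIOError:
        return None


a, b = socket.socketpair()
b.setblocking(False)
first = try_recv(b)               # nothing has been sent yet, so this returns at once
a.sendall(b"ping")
select.select([b], [], [], 1.0)   # wait up to 1 s for b to become readable
second = try_recv(b)
a.close()
b.close()
print(first, second)
```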
The code should be fairly easy to port, and it includes a binding for FreeRTOS already. If you’ve used wolfIP, let us know in the comments.
If you want to really get down to the low level, try this project. Or, if you want a refresher on basics, we can help with that, too.