
Hackaday Podcast Episode 353: Amazing Peripherals, Picos Real and Fake, and All Things Steam Deck

2026-01-17 01:00:33

Join Hackaday Editors Elliot Williams and Tom Nardi as they swap their favorite hacks and stories from the week. In this episode, they’ll start off by marveling over the evolution of the “smart knob” and other open hardware input devices, then discuss a futuristic propulsion technology you can demo in your own kitchen sink, and a cheap handheld game system that gets a new lease on life thanks to the latest version of the ESP32 microcontroller.

From there they’ll cover spinning CRTs, creating custom GUIs on Android, and yet another thing you can build out of that old Ender 3 collecting dust in the basement. The episode wraps up with a discussion about putting Valve’s Steam Deck to work and a look at the history-making medical evacuation of the International Space Station.

Check out the links below if you want to follow along, and as always, tell us what you think about this episode in the comments!

As always, this episode is available in DRM-free MP3.

Where to Follow Hackaday Podcast

Places to follow Hackaday podcasts:

Episode 353 Show Notes:

What’s that Sound?

Interesting Hacks of the Week:

Quick Hacks:

Can’t-Miss Articles:

Trying Out the Allwinner-Based Walnut Pi SBC

2026-01-17 00:00:00

When it comes to the term ‘Raspberry Pi clone’, the most such boards really clone is the form factor, as nobody is making copies of Broadcom’s VideoCore-based SoCs. At least not if they want to stay safe from Broadcom’s vicious legal team. That said, the Walnut Pi 1B single-board computer (SBC) that [Silly Workshop] recently took a gander at takes a fairly typical approach to a Raspberry Pi 4 form-factor-compatible board.

Part of Walnut Pi’s line-up, the Allwinner H616/H618-equipped 1B feels like it takes hints from both the RPi 4B and the Asus Tinkerboard, especially with its nicely colored GPIO pins. There’s also a beefier Walnut Pi 2B with an Allwinner T527 SoC that’s not being reviewed here. Translating the Chinese-language documentation for the board suggests that either the H616 or H618 may be installed, with both featuring a quad-core Cortex-A53, putting it in the ballpark of the Raspberry Pi 3.

There are also multiple RAM configurations, ranging from 1 GB of DDR3 to 4 GB of LPDDR4, with the 1 GB version being fun to try and run benchmarks like GeekBench on. Ultimately the impression was that it’s just another Allwinner SoC-based board, with a half-hearted ‘custom’ Linux image, no hardware acceleration due to missing (proprietary) Allwinner IP block drivers, etc.

While it’s cheaper than a Raspberry Pi SBC, if you need anything more than basic Allwinner H61* support and Ethernet/WiFi there clearly are better options, some of which may even involve repurposing an e-waste Android TV box.

Optimizing Software With Zero-Copy and Other Techniques

2026-01-16 23:00:55

An important aspect of software engineering is the ability to distinguish between premature, unnecessary, and necessary optimizations. A strong case can be made that the initial design benefits massively from optimizations that prevent well-known issues later on, while unnecessary optimizations are those that simply do not make any significant difference either way. Meanwhile ‘premature’ optimizations are harder to define, with Knuth’s often quoted-out-of-context statement about these being ‘the root of all evil’ causing significant confusion.

We can find Donald Knuth’s full quote deep in the 1974 article Structured Programming with go to Statements, which tackled what was at the time a contentious topic. On page 268, along with the cited quote, we see that it’s a reference to making presumed optimizations without understanding their effect, and without a clear picture of which parts of the program really take up most processing time. Definitely sound advice.

And unlike back in the 1970s, today we have many easy ways to analyze application performance and to quantify bottlenecks. This makes it rather inexcusable to spend more time vilifying the goto statement than optimizing one’s code with simple techniques like zero-copy and binary message formats.

Got To Go Fast

The cache hierarchy of the 2008 Intel Nehalem x86 microarchitecture. (Source: Intel)

There’s a big difference between having a conceptual picture of how one’s code interacts with the hardware and having an in-depth understanding. While the basic concept of more lines of code (LoC) translating into more RAM, CPU, and disk resources used is technically true much of the time, the real challenge lies in understanding how individual CPU cores are scheduled by the OS, how core cache synchronization works, and the impact that the L2 and L3 cache have.

Another major challenge is simply moving data around between system RAM, caches, and registers, which seems obvious at face value, but where certain decisions can have big implications. For example, passing a pointer to a memory address instead of copying an entire string, or performing aligned memory accesses rather than unaligned ones, can make a real difference in execution time. This latter topic is especially relevant on x86, as the ISA allows unaligned memory access at a major performance penalty, while ARM will hard fault the application at the merest misaligned twitch.

I came across a range of these issues while implementing my remote procedure call library NymphRPC. Initially I used a simple and easy to parse binary message format, but saddled it with a naïve parser implementation that involved massive copying of strings, as this was the zero-planning-needed, smooth-brained, ‘safe’ choice. In hindsight this was a design failure: a necessary optimization had been omitted, and adding it back in would require major refactoring later.

In this article I’d like to highlight both the benefits of simple binary formats as well as how simple it is to implement a zero-copy parser that omits copying of message data during parsing, while also avoiding memory alignment issues when message data is requested and copied to a return value.

KISS

Perhaps the biggest advantage of binary message formats is that they’re very simple, very small, and extremely low in calories. In the case of NymphRPC, the message format features a standard header, a message-specific body, and a terminator. For a simple NymphRPC method call, for example, we would see something like:

uint32      Signature: DRGN (0x4452474e)
uint32      Total message bytes following this field.
uint8       Protocol version (0x00).
uint32      Method ID: identifier of the remote function.
uint32      Flags (see _Flags_ section).
uint64      Message ID. Simple incrementing global counter.
<..>            Serialised values.
uint8           Message end. None type (0x01).

The very first value is a 32-bit unsigned integer that, when interpreted as characters, identifies this as a valid NymphRPC message. (‘DRGN’, because dragonfly nymph.) This is followed by another uint32 that contains the number of bytes that follow in the message. We’re now eight bytes in and we have already done basic validation and know what size buffer to allocate.

Serializing the values is done similarly, with an 8-bit type code followed by the byte(s) that contain the value. This is both easy to parse without complex validation like XML or JSON, and about as light-weight as one can make a format without adding something like compression.
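To make that concrete, here’s a minimal serialization sketch following the field listing above. This is not the actual NymphRPC code: the helper names are invented for illustration, and byte order is simply whatever the host uses.

#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value to the buffer; memcpy sidesteps alignment and
// aliasing concerns. Host byte order is used here for brevity.
static void appendU32(std::vector<uint8_t>& buf, uint32_t v) {
    uint8_t tmp[4];
    std::memcpy(tmp, &v, 4);
    buf.insert(buf.end(), tmp, tmp + 4);
}

std::vector<uint8_t> buildHeader(uint32_t methodId, uint32_t flags, uint64_t messageId) {
    std::vector<uint8_t> buf;
    appendU32(buf, 0x4452474e);          // 'DRGN' signature
    appendU32(buf, 0);                   // total length, patched once the body is known
    buf.push_back(0x00);                 // protocol version
    appendU32(buf, methodId);            // method ID
    appendU32(buf, flags);               // flags
    uint8_t id[8];
    std::memcpy(id, &messageId, 8);
    buf.insert(buf.end(), id, id + 8);   // message ID
    // Serialised values and the 0x01 terminator would follow here.
    return buf;
}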

Only If Needed

When we receive the message bytes on the network socket, we read it into a buffer. Because the second 32-bit value which we read earlier contained the message size, we can make sure to allocate a buffer that’s large enough to fit the rest of the message’s bytes. The big change with zero-copy parsing commences after this, where the naïve approach is to copy the entire byte buffer into e.g. a std::string for subsequent substring parsing.

Instead of such a blunt method, the byte buffer is parsed in-place with the use of a moving index pointer into the buffer. The two key methods involved with the parsing can be found in nymph_message.cpp and nymph_types.cpp, with the former providing the NymphMessage constructor and the basic message parser. After parsing the header, the NymphType class provides a parseValue() function that takes a value type code, a reference to the byte buffer and the current index. This function is called until the terminating NYMPH_TYPE_NONE is found, or some error occurs.
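Condensed down to its essence, such an in-place parser looks something like the sketch below. The type codes and the function shape are illustrative rather than NymphRPC’s real API, and bounds checks are omitted for brevity; the point is that only an index moves through the buffer.

#include <cstdint>
#include <cstring>

// Illustrative type codes; the real NymphRPC values differ.
enum : uint8_t { TYPE_NONE = 0x01, TYPE_UINT32 = 0x08, TYPE_STRING = 0x10 };

bool parseValues(const uint8_t* binmsg, uint64_t size, uint64_t& index) {
    while (index < size) {
        uint8_t typecode = binmsg[index++];
        if (typecode == TYPE_NONE) { return true; }   // message terminator found
        if (typecode == TYPE_UINT32) {
            uint32_t value;
            std::memcpy(&value, binmsg + index, 4);   // small value: just copy it
            index += 4;
            // ... store 'value' in the parsed message ...
        } else if (typecode == TYPE_STRING) {
            uint32_t length;
            std::memcpy(&length, binmsg + index, 4);
            index += 4;
            const char* data = reinterpret_cast<const char*>(binmsg + index);
            index += length;                          // string stays in the buffer
            (void) data;                              // stored as pointer + length, never copied
        } else {
            return false;                             // unknown type code
        }
    }
    return false;                                     // ran out of bytes before the terminator
}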

Looking at parseValue() in more detail, we can see two things of note: the first is that we are absolutely copying certain data despite the ‘zero-copy’ claim, and the second is the liberal use of memcpy() instead of basic assignment statements. The first item is easy to explain: the difference between copying either the memory address or the value of a simple integer or floating point type is so minimal that worrying about it trips us head-first into the same ‘premature optimization’ thing that Mr. Knuth complained about back in 1974.

Ergo we just copy the value and don’t break our pretty little heads about whether doing the same thing in a more convoluted way would net us a few percent performance improvement or loss. This is different with non-trivial types, such as strings. These are simply a char* pointer into the byte buffer, leaving the string’s bytes in peace and quiet until the application demands either that same character pointer via the API or calls the convenience function that assembles a readily-packaged std::string.
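A minimal sketch of that idea is shown below; the names are invented for illustration and don’t match NymphRPC’s actual API. The only place a copy happens is the convenience accessor.

#include <cstdint>
#include <string>

// A zero-copy string value: just a pointer into the message's byte buffer
// plus a length. Valid only as long as the buffer itself stays alive.
struct ZeroCopyString {
    const char* data = nullptr;
    uint32_t length = 0;

    // Cheap accessor: hands out the raw pointer, no allocation, no copy.
    const char* c_ptr() const { return data; }

    // Convenience accessor: the one place where a copy is actually made.
    std::string str() const { return std::string(data, length); }
};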

Memcpy Is Love

Although demonizing ‘doing things the C way’ appears to be a popular pastime, if you want to write code that works with the hardware instead of against it, you really want to be able to write some highly performant C code and fully understand it. When I had written the first zero-copy implementation of NymphRPC and had also written up what I thought was a solid article on how well optimized the project now was, I had no idea that I had a “fun” surprise waiting for me.

As I happily tried running the new code on a Raspberry Pi SBC after doing the benchmarking for the article on an x86 system, the first thing it did was give me a hard fault message in the shell along with a strongly disapproving glare from the ARM CPU. As it turns out, doing a direct assignment like this is bound to get you into trouble:

methodId = *((uint32_t*) (binmsg + index));

This line casts the address at the current index into the byte buffer to a uint32_t pointer before dereferencing it and assigning the value to the variable. When you’re using e.g. std::string the alignment issues sort themselves out somewhere within the depths of the STL, but with direct memory access like this you’re at the mercy of the underlying platform. Which is a shame, because platforms like ARM do not know the word ‘mercy’.

Fortunately this is easy to fix:

memcpy(&methodId, (binmsg + index), 4);

Instead of juggling pointers ourselves, we simply tell memcpy what the target address is, where it should copy from, and how many bytes are to be copied. Compared with all the other complex scenarios this function has to cope with, handling arbitrarily aligned reads and writes safely is probably among its least demanding requirements.
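If the same pattern crops up in more than one spot, it can be wrapped in a tiny helper; this is a generic sketch rather than anything NymphRPC-specific.

#include <cstdint>
#include <cstring>

// Read a trivially-copyable value from an arbitrarily aligned position in a
// byte buffer. The memcpy typically compiles down to a plain load on x86 and
// to alignment-safe code on stricter targets like ARM.
template <typename T>
T readUnaligned(const uint8_t* buffer, uint64_t index) {
    T value;
    std::memcpy(&value, buffer + index, sizeof(T));
    return value;
}

// Usage (assuming binmsg is a uint8_t* buffer), equivalent to the fixed line above:
//   methodId = readUnaligned<uint32_t>(binmsg, index);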

Hindsight

Looking back on the NymphRPC project so far, it’s clear that some necessary optimizations that ought to have been there from the very beginning were missing. As far as unnecessary and premature optimizations go, I do feel that I have successfully dodged these, but since these days we’re still having annual flamewars about the merits of using goto, I very much doubt that we will reach consensus here.

What is clear from the benchmarking that I have done on NymphRPC before and after this major refactoring is that zero-copy makes a massive difference. Operations involving larger data (string) chunks in particular became several times faster, with many milliseconds shaved off, and Valgrind’s Callgrind tool no longer lists __memcpy_avx_unaligned_erms as the biggest headache caused by std::string abuse.

Perhaps the most important lesson from optimizing a library like NymphRPC is that aside from it being both frustrating and fun, it’s also a humbling experience that makes it clear that even as a purported senior developer there’s always more to learn. Even if putting yourself out there with a new experience like porting a lock-free ring buffer to a language like Ada and getting corrected by others stings a little.

After all, we are here to write performant software that’s easy to maintain and have fun while doing it, with sharing optimization tips and other tricks just being part of the experience.

WCH CH32M030: Another Microcontroller Worth Watching

2026-01-16 20:00:53

One of the joys of writing for Hackaday comes in following the world of new semiconductor devices, spotting interesting ones while they are still just entries on manufacturer websites, and then waiting for commonly-available dev boards. With Chinese parts there’s always a period in which Chinese manufacturers and nobody else has them, and then they quietly appear on AliExpress.

All of which brings us to the WCH CH32M030, a chip that’s been on the radar for a while and has finally broken cover. It’s the CH32 RISC-V microcontroller you may be familiar with, but with a set of four half-bridge drivers on board for running motors. A handy, cheap, and very smart motor controller, if you will.

There’s been at least one Chinese CH32M030 dev board (Chinese language) online for a while now, but the one listed on AliExpress appears to be a different design. At the time of writing the most popular one is still showing fewer than 20 sales, so we’re getting in at the ground floor here.

We think this chip is of interest because it has the potential to be used in low price robotic projects, replacing as it does a couple of parts or modules in one go. If you use it, we’d like to hear from you!

A Palm-Sized PSOne

2026-01-16 17:00:00

Sony’s original Playstation wasn’t huge, and they did shrink it for re-release later as the PSOne, but even that wasn’t small enough for [Secret Hobbyist]. You may have seen the teaser video a while back where his palm-size Playstation went viral, but now he’s begun a series of videos on how he redesigned the vintage console.

Luckily for [Secret Hobbyist], the late-revision PSOne he started with is only a two-layer PCB, which made reverse engineering the traces a lot easier. Between probing everything under the microscope and cleaning the board off to follow all the traces in copper, [Hobbyist] was able to reproduce the circuit in KiCAD. (Reverse engineering starts at about 1:18 in the vid.)

With a schematic in hand, drafting a smaller PCB than Sony built is made easier by the availability of multi-layer PCBs. In this case [Hobbyist] was able to get away with a four-layer board. He was also able to ditch one of the ICs from the donor mainboard, which he called a “sub-CPU”, as its functionality was recreated on the “PSIO” board that’s replacing the original optical drive. The PSIO is a commercial product that has been around for years now, allowing Playstations to run from SD cards, but it’s not meant for the PSOne so just getting it working here is something of a hack. He’s also added a new DAC for VGA output, but otherwise the silicon is all original Sony.

This is the first of a series about this build, so if you’re into retro consoles you might want to keep an eye on [Secret Hobbyist] on YouTube to learn all the details as they are released.

 

What Happens When a Bug Shows Up at Twice the Speed of Sound?

2026-01-16 14:00:58

While some may see amateur rocketry as little more than attaching fins to a motor and letting it fly, it is, in fact, rocket science. This fact became very clear to [BPS.space] when a parachute deployed on a rocket traveling at approximately Mach 1.8. 

The rocket design is rather simple — essentially just 3D printed fins glued onto a motor with a nose-cone for avionics. A single servo and trim tab provide a modicum of roll control, and a parachute is mounted in the nose along with a homing beacon for faster recovery. Seemingly, the only thing different about this flight is properly validated telemetry and GPS antennae.

After a final ground check of the telemetry and GPS signal quality, everything is ready for what seems like a routine launch. However, somewhere around Mach 1.8, the parachute prematurely deploys, ripping apart the Kevlar rope holding together the three rocket sections. Fortunately, the booster and avionics sections could be recovered from the desert.

But this begs the question: what could possibly have caused a parachute deployment at nearly twice the speed of sound? [BPS.space] had made a quick, untested change to the flight control software in an attempt to get more accurate speed data. By feeding barometric altitude changes into the flight controller during the descent stage, it should be able to estimate its position more accurately. However, direct static pressure readings at supersonic speeds are not an accurate way of measuring altitude. So, during the boost phase, the speed estimation function should only rely on accelerometer data.

The line in question.

However, a simple mistake in boolean logic meant that barometric data was still being fed into the velocity estimate during the boost phase. This produced an erroneous velocity value below zero, triggering the parachute deployment. Nevertheless, the test did succeed in proving that antenna choice was responsible for the poor telemetry and GPS readings on earlier launches.
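The exact line isn’t reproduced here, but the failure mode is the classic inverted condition. As a purely hypothetical sketch of that shape (not [BPS.space]’s actual flight code):

// Hypothetical illustration only, not the real flight software.
struct StateEstimate {
    float velocity = 0.0f;   // fused velocity estimate, m/s
};

void updateVelocity(StateEstimate& est, float accelDelta, float baroDelta, bool boosting) {
    // Intended: ignore the barometer while the motor burns, since static
    // pressure readings are meaningless at supersonic speeds.
    // if (!boosting) { est.velocity += baroDelta; }

    // Buggy variant: one inverted condition applies the barometric correction
    // exactly when it shouldn't, dragging the estimate below zero and
    // convincing the deployment logic that it's time to pop the chute.
    if (boosting) { est.velocity += baroDelta; }

    est.velocity += accelDelta;  // accelerometer integration always applies
}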

If you want to see a far more successful [BPS.space] rocket launch, make sure to check out this self landing rocket next!