2025-12-28 11:02:44

I used to always want to rewrite my code. Maybe even use another programming language. « If only I could rewrite my code, it would be so much better now. »
If you maintain software projects, you see it all the time. Someone new comes along and they want to start rewriting everything. They always have subjective arguments: it is going to be more maintainable, safer, or just more elegant.
If your code is battle-tested, then the correct instinct is to be conservative and keep your current code. Sometimes you need to rewrite your code: you made a mistake or must change your architecture. But most times, the old code is fine and investing time in updating your current code is better than starting anew.
The great intellectual Robin Hanson argues that software ages. One of his arguments is that software engineers say that it does. That is what engineers feel, but whether it is true is another matter.
« Before Borland’s new spreadsheet for Windows shipped, Philippe Kahn, the colorful founder of Borland, was quoted a lot in the press bragging about how Quattro Pro would be much better than Microsoft Excel, because it was written from scratch. All new source code! As if source code rusted. The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material? » (Joel Spolsky)
2025-12-28 07:39:57

Most programmers are familiar with IP addresses. They take the form of four numbers between 0 and 255 separated by dots: 192.168.0.1. In some sense, it is a convoluted way to represent a 32-bit integer. The modern version of an IP address is IPv6, which is usually written within square brackets. It is less common in my experience.
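Going back to IPv4: each of the four numbers takes one byte of the 32-bit value, so 192.168.0.1 corresponds to 0xC0A80001. A tiny sketch (the constant name is mine):
#include <cstdint>

// 192.168.0.1 packed into a 32-bit integer: 0xC0A80001 (3232235521)
constexpr uint32_t example_ip = (192u << 24) | (168u << 16) | (0u << 8) | 1u;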
Using fancy techniques, you can parse IP addresses with as few as 50 instructions. It is a bit complicated and not necessarily portable.
What if you want high speed without too much work or a specialized library? You can try to roll your own. But since I am a civilized programmer, I just asked my favorite AI to write it for me.
#include <cstdint>
#include <expected>

// Minimal error type so the snippets below are self-contained.
enum parse_error { invalid_format };

// Parse an IPv4 address starting at 'p'.
// p: start pointer, pend: end of the string
std::expected<uint32_t, parse_error> parse_manual(const char *p, const char *pend) {
  uint32_t ip = 0;
  int octets = 0;
  while (p < pend && octets < 4) {
    uint32_t val = 0;
    const char *start = p;
    while (p < pend && *p >= '0' && *p <= '9') {
      val = val * 10 + (*p - '0');
      if (val > 255) {
        return std::unexpected(invalid_format);
      }
      p++;
    }
    // Reject empty octets and leading zeros (e.g., "01").
    if (p == start || (p - start > 1 && *start == '0')) {
      return std::unexpected(invalid_format);
    }
    ip = (ip << 8) | val;
    octets++;
    if (octets < 4) {
      if (p == pend || *p != '.') {
        return std::unexpected(invalid_format);
      }
      p++; // Skip dot
    }
  }
  if (octets == 4 && p == pend) {
    return ip;
  } else {
    return std::unexpected(invalid_format);
  }
}
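Here is a minimal usage sketch (the input string and variable names are mine, not from the benchmark):
const char input[] = "192.168.0.1";
auto result = parse_manual(input, input + sizeof(input) - 1);
if (result) {
  // *result == 0xC0A80001
}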
It was immediately clear to me that this function was not as fast as it could be. I then asked the AI to improve the result by using the fact that each number is made of between one and three digits. I got the following reasonable function.
std::expected<uint32_t, parse_error> parse_manual_unrolled(const char *p, const char *pend) {
  uint32_t ip = 0;
  int octets = 0;
  while (p < pend && octets < 4) {
    uint32_t val = 0;
    if (p < pend && *p >= '0' && *p <= '9') {
      val = (*p++ - '0');
      if (p < pend && *p >= '0' && *p <= '9') {
        if (val == 0) { // Reject leading zeros (e.g., "01")
          return std::unexpected(invalid_format);
        }
        val = val * 10 + (*p++ - '0');
        if (p < pend && *p >= '0' && *p <= '9') {
          val = val * 10 + (*p++ - '0');
          if (val > 255) {
            return std::unexpected(invalid_format);
          }
        }
      }
    } else {
      return std::unexpected(invalid_format);
    }
    ip = (ip << 8) | val;
    octets++;
    if (octets < 4) {
      if (p == pend || *p != '.') {
        return std::unexpected(invalid_format);
      }
      p++; // Skip the dot
    }
  }
  if (octets == 4 && p == pend) {
    return ip;
  } else {
    return std::unexpected(invalid_format);
  }
}
Nice work AI!
In C++, we have standard functions to parse numbers (std::from_chars) which can significantly simplify the code.
#include <charconv> // for std::from_chars

std::expected<uint32_t, parse_error> parse_ip(const char *p, const char *pend) {
  const char *current = p;
  uint32_t ip = 0;
  for (int i = 0; i < 4; ++i) {
    uint8_t value;
    // Note: unlike the manual versions, std::from_chars accepts leading zeros (e.g., "01").
    auto r = std::from_chars(current, pend, value);
    if (r.ec != std::errc()) {
      return std::unexpected(invalid_format);
    }
    current = r.ptr;
    ip = (ip << 8) | value;
    if (i < 3) {
      if (current == pend || *current++ != '.') {
        return std::unexpected(invalid_format);
      }
    }
  }
  // Require that the whole input was consumed, as in the manual versions.
  if (current != pend) {
    return std::unexpected(invalid_format);
  }
  return ip;
}
You can also use the fast_float library as a substitute for std::from_chars. The latest version of fast_float has faster 8-bit integer parsing thanks to Shikhar Soni (with a fix by Pavel Novikov).
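Here is a sketch of what the substitution might look like. I am assuming fast_float's from_chars overload for integer types, which mirrors the standard interface; the function name is mine.
#include "fast_float/fast_float.h"

std::expected<uint32_t, parse_error> parse_ip_fastfloat(const char *p, const char *pend) {
  const char *current = p;
  uint32_t ip = 0;
  for (int i = 0; i < 4; ++i) {
    uint8_t value;
    // fast_float::from_chars used as a drop-in replacement for std::from_chars.
    auto r = fast_float::from_chars(current, pend, value);
    if (r.ec != std::errc()) {
      return std::unexpected(invalid_format);
    }
    current = r.ptr;
    ip = (ip << 8) | value;
    if (i < 3) {
      if (current == pend || *current++ != '.') {
        return std::unexpected(invalid_format);
      }
    }
  }
  if (current != pend) {
    return std::unexpected(invalid_format);
  }
  return ip;
}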
I wrote a benchmark for this problem. Let us first consider the results using an Apple M4 processor (4.5 GHz) with LLVM 17.
| function | instructions/ip | ns/ip |
|---|---|---|
| manual | 185 | 6.2 |
| manual (unrolled) | 114 | 3.3 |
| from_chars | 381 | 14 |
| fast_float | 181 | 7.2 |
Let us now try an Intel Ice Lake processor (3.2 GHz) using GCC 12.
| function | instructions/ip | ns/ip |
|---|---|---|
| manual | 219 | 30 |
| manual (unrolled) | 154 | 24 |
| from_chars | 220 | 29 |
| fast_float | 211 | 18 |
And finally, let us try with a Chinese Loongson 3A6000 processor (2.5 GHz) using LLVM 21.
| function | instructions/ip | ns/ip |
|---|---|---|
| manual | 187 | 29 |
| manual (unrolled) | 109 | 21 |
| from_chars | 191 | 39 |
| fast_float | 193 | 27 |
The optimization work on the fast_float library paid off. The difference is especially striking on the x64 processor.
What is also interesting in my little experiment is that I was able to get the AI to produce faster code with relatively little effort on my part. I did have to ‘guide’ the AI. Does that mean that I can retire? Not yet. But I am happy that I can more quickly get good reference baselines, which allows me to better focus my work where it matters.
Reference: The fast_float C++ library is a fast number-parsing library that is part of GCC and major web browsers.
2025-12-21 07:26:09

Strings in programming are often represented as arrays of 8-bit words. The string is ASCII if and only if all 8-bit words have their most significant bit unset. In other words, the byte values must be no larger than 127 (or 0x7F in hexadecimal).
A decent C++ function to check that the string is ASCII is as follows.
bool is_ascii_pessimistic(const char *data, size_t length) {
  for (size_t i = 0; i < length; i++) {
    if (static_cast<unsigned char>(data[i]) > 0x7F) {
      return false;
    }
  }
  return true;
}
We go over each character and compare it with 0x7F, continuing as long as the value is no larger than 0x7F. If we scan the entire string and all tests pass, we know that the string is ASCII.
Notice how I called this function pessimistic. What do I mean? I mean that it expects, in some sense, that it will find some non-ASCII character. If so, the best option is to immediately return and not scan the whole string.
What if you expect the string to almost always be ASCII? An alternative then is to effectively do a bitwise OR reduction of the string: you OR all characters together and you check just once that the result is bounded by 0x7F. If any character has its most significant bit set, then the bitwise OR of all characters will also have its most significant bit set. So you might write your function as follows.
bool is_ascii_optimistic(const char *data, size_t length) {
  unsigned char result = 0;
  for (size_t i = 0; i < length; i++) {
    result |= static_cast<unsigned char>(data[i]);
  }
  return result <= 0x7F;
}
If you have strings that are all pure ASCII, which function will be fastest? Maybe surprisingly, the optimistic function can be several times faster. I wrote a benchmark and ran it with GCC 15 on an Intel Ice Lake processor. I get the following results.
| function | speed |
|---|---|
| pessimistic | 1.8 GB/s |
| optimistic | 13 GB/s |
Why is the optimistic faster? Mostly because the compiler is better able to optimize it. Among other possibilities, it can use autovectorization to automatically use data-level parallelization (e.g., SIMD instructions).
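To get a feel for what processing several bytes per operation means, here is a hand-rolled sketch (not the compiler's actual output, and the function name is mine) that ORs eight bytes at a time:
#include <cstdint>
#include <cstring>

// Sketch: OR the input eight bytes at a time, then handle the tail bytes.
bool is_ascii_wordwise(const char *data, size_t length) {
  uint64_t acc = 0;
  size_t i = 0;
  for (; i + 8 <= length; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));
    acc |= word;
  }
  unsigned char tail = 0;
  for (; i < length; i++) {
    tail |= static_cast<unsigned char>(data[i]);
  }
  // ASCII if and only if no byte has its most significant bit set.
  return ((acc & 0x8080808080808080ULL) == 0) && (tail <= 0x7F);
}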
Which function is best depends on your use case.
What if you would prefer a pessimistic function, that is, one that returns early when non-ASCII characters are encountered, but you still want high speed? Then you can use a dedicated library like simdutf where we have hand-coded the logic. In simdutf, the pessimistic function is called validate_ascii_with_errors. Your results will vary, but I found that it runs at about the same speed as the optimistic function.
| function | speed |
|---|---|
| pessimistic | 1.8 GB/s |
| pessimistic (simdutf) | 14 GB/s |
| optimistic | 13 GB/s |
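For reference, here is a minimal sketch of how one might call it, assuming the result type exposes an error field compared against simdutf::error_code::SUCCESS:
#include "simdutf.h"

// Returns true if the input is pure ASCII; stops early at the first offending byte.
bool is_ascii_simdutf(const char *data, size_t length) {
  simdutf::result res = simdutf::validate_ascii_with_errors(data, length);
  return res.error == simdutf::error_code::SUCCESS;
}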
So it is possible to combine the benefits of pessimism and optimism although it requires a bit of care.
2025-12-21 05:24:33

Much of the data on the Internet is shared using a simple format called JSON. JSON is made of two composite types (arrays and key-value maps) and a small number of primitive types (64-bit floating-point numbers, strings, null, Booleans). That JSON became ubiquitous despite its simplicity is telling.
{ "name": "Nova Starlight", "age": 28, "powers": ["telekinesis", "flight","energy blasts"] }
Interestingly, JSON closely matches the data structures provided by default in the popular language Go. Go gives you arrays/slices and maps… in addition to the standard primitive types. It is a bit more than C, which does not provide maps by default. But it is significantly simpler than Java, C++, C#, and many other programming languages where the standard library covers much of the data structures found in textbooks.
There is at least one obvious data structure that is missing in JSON, and in Go: the set. Because objects are supposed to have no duplicate keys, you can implement a set of strings by assigning keys to an arbitrary value like true.
{"element1": true, "element2": true}
But I believe that it is a somewhat unusual pattern. Most times, when we mean to represent a set of objects, an array suffices. We just need to handle the duplicates somehow.
There have been many attempts at adding more concepts to JSON, more complexity, but none of them have achieved much traction. I believe that it reflects the fact that JSON is good enough as a data format.
I refer to any format that allows you to represent JSON data, such as YAML, as a JSON-complete data format. If it is at least equivalent to JSON, it is rich enough for most problems.
Similarly, I suggest that new programming languages should aim to be JSON-complete: they should provide a map with key-value pairs, arrays, and basic primitive types. In this light, the C and Pascal programming languages are not JSON-complete.
2025-12-15 09:42:10

Programmers often want to randomly shuffle arrays. Evidently, we want to do so as efficiently as possible. Maybe surprisingly, I found that the performance of random shuffling was not limited by memory bandwidth or latency, but rather by computation. Specifically, it is the computation of the random indexes itself that is slow.
Earlier in 2025, I reported how you could more than double the speed of a random shuffle in Go using a new algorithm (Brackett-Rozinsky and Lemire, 2025). However, I was using custom code that could not serve as a drop-in replacement for the standard Go Shuffle function. I decided to write a proper library called batchedrand. You can use it just like the math/rand/v2 standard library.
// Assumes: import "math/rand/v2" and the batchedrand package (github.com/lemire/batchedrand).
rng := batchedrand.Rand{rand.New(rand.NewPCG(1, 2))}
data := []int{1, 2, 3, 4, 5}
rng.Shuffle(len(data), func(i, j int) {
	data[i], data[j] = data[j], data[i]
})
How fast is it? The standard library provides two generators, PCG and ChaCha8. ChaCha8 should be slower than PCG, because it has better cryptographic guarantees. However, both have somewhat comparable speeds because ChaCha8 is heavily optimized with assembly code in the Go runtime while the PCG implementation is conservative and not focused on speed.
On my Apple M4 processor with Go 1.25, I get the following results. I report the time per input element, not the total time.
| Benchmark | Size | Batched (ns/item) | Standard (ns/item) | speedup |
|---|---|---|---|---|
| ChaChaShuffle | 30 | 1.8 | 4.6 | 2.6 |
| ChaChaShuffle | 100 | 1.8 | 4.7 | 2.5 |
| ChaChaShuffle | 500000 | 2.6 | 5.1 | 1.9 |
| PCGShuffle | 30 | 1.5 | 3.9 | 2.6 |
| PCGShuffle | 100 | 1.5 | 4.2 | 2.8 |
| PCGShuffle | 500000 | 1.9 | 3.8 | 2.0 |
Thus, from tiny to large arrays, the batched approach is two to three times faster. Not bad for a drop-in replacement!
Get the Go library at https://github.com/lemire/batchedrand
2025-12-06 03:24:50

The one constant that I have observed in my professional life is that people underestimate the need to move fast.
Of course, doing good work takes time. I once spent six months writing a URL parser. But the fact that it took so long is not a feature, it is not a positive, it is a negative.
If everything is slow-moving around you, it is likely not going to be good. To fully make use of your brain, you need to move as close as possible to the speed of your thought.
If I give you two PhD students, one who completed their thesis in two years and one who took eight years… you can be almost certain that the two-year thesis will be much better.
Moving fast does not mean that you complete your projects quickly. Projects have many parts, and getting everything right may take a long time.
Nevertheless, you should move as fast as you can.
For multiple reasons:
1. A common mistake is to spend a lot of time—too much time—on a component of your project that does not matter. I once spent a lot of time building a podcast-like version of a course… only to find out later that students had no interest in the podcast format.
2. You learn by making mistakes. The faster you make mistakes, the faster you learn.
3. Your work degrades, becomes less relevant with time. And if you work slowly, you will be more likely to stick with your slightly obsolete work. You know that professor who spent seven years preparing lecture notes twenty years ago? He is not going to throw them away and start again, as that would be a new seven-year project. So he will keep teaching using aging lecture notes until he retires and someone finally updates the course.
What if you are doing open-heart surgery? Don’t you want someone who spends days preparing and who works slowly? No. You almost surely want the surgeon who does many, many open-heart surgeries. They are very likely to be the best one.
Now stop being so slow. Move!