
Computer science professor at the University of Quebec (TELUQ), open-source hacker, and long-time blogger.

Blog of Daniel Lemire

The culture war that we won

2025-12-31 23:26:25

Culture wars are real. They occur when a dominant culture faces a serious challenge. But unless you pay close attention, you might miss them entirely. As a kid, I was a “nerd.” I read a lot and spent hours on my computer. I devoured science and technology magazines. I taught myself programming. “Great!” you might think. Not at all. This was not valued where and when I grew up. Computers were seen as toys. A kid who spent a lot of time on a computer was viewed as obsessed with a rather dull plaything. We had a computer club, but it was essentially a gathering of “social rejects.” No one looked up to us. Working with computers carried no prestige. Dungeons & Dragons was outright “dangerous”—you had to hide any interest in such games. The 1983 movie WarGames stands out precisely because the computer-obsessed kid gets the girl and saves the world. Bill Gates was becoming famous around that time, but this marked only the beginning of a decade-long culture war in which hacker culture gradually rose to dominance.

Today, most people can speak the language of hackers. It did not have to turn out this way, and it did not unfold identically everywhere. The status of hacker culture is high in the United States, but it remains lower in many other places even now. Even so, in many organizations today, even in the United States, the “computer people” are kept in the basement. We do not let them out too often. They are not “people persons.” So the culture war was won by the hackers; the victory is undeniable. But as with all wars, the result is more nuanced than one might think. Many would like nothing more than to send the computer people back to the bottom of the prestige ladder.

Salaries are a good indicator of prestige. In the USA, Australia, and Switzerland, “computer people” have high salaries and relatively high status. In the UK as a whole? Not so much. I bet you do better as a “financial analyst” over there.

What is worth watching is the effect that “AI” will have on these status battles. In some sense, building software that can do financial, political, and legal analysis is the latest weapon in the arsenal of the computer people. Many despair about what AI might do to software developers: I recommend looking at it in the context of the hacker culture war.

By how much does your memory allocator overallocate?

2025-12-31 03:15:55

How much virtual memory does the following C++ expression allocate on the heap?

new char[4096]

The answer is at least 4 kibibytes, but in practice it is surely more.

Firstly, each heap memory allocation requires some memory to keep track of what has been allocated. You are likely using 8 bytes or so of overhead that your program cannot access.

Secondly, the memory allocator may allocate a bit more than the 4096 bytes you requested. On a Linux machine, I found that it would allocate 4104 bytes, so 8 extra bytes that are usable by your program. You can check this value by calling malloc_usable_size under Linux.

Thus, overall, you may end up with an extra 16 bytes allocated when you requested 4096 bytes. It is an overhead of about 0.4%. You are basically wasting a byte for every 256 bytes that you allocate.

But that is not the worst possible case. On macOS, let us consider the following line of code.

new char[3585]

The system reports an allocation of 4096 bytes: a 14% overhead. What is happening is that macOS rounds up the memory allocation to the nearest 512-byte boundary for moderately small allocations. If you try allocating even larger memory blocks, it rounds up even more.

Freedom from incompetence

2025-12-29 22:41:00

Many people say that they crave more freedom.
But what do we mean by “freedom”?
Being free from constraints? Is that what we mean? Would you feel “freer” if you could walk outside in your underwear?
It is almost surely not what you mean by “freedom.”
I submit to you that it is almost always the case that if you are frustrated at work by your lack of freedom, the actual problem is competence.
Imagine two scenarios.

  • Scenario A: You work for a highly directive boss. You are constantly accountable for what you do. But everyone around you is highly competent. You need to wear a jacket and a tie, but you work in the best team in the world.
  • Scenario B: You work in a context where you hardly know who your boss is. You come to work in your underwear. However, everyone is incompetent. You work with the least competent team in the world.

Assuming that the salary is the same, which job do you prefer?
I cannot answer for you, but most people I know prefer Scenario A.

Don’t be so eager to rewrite your code

2025-12-28 11:02:44

I used to always want to rewrite my code. Maybe even use another programming language. “If only I could rewrite my code, it would be so much better now.”

If you maintain software projects, you see it all the time. Someone new comes along and they want to start rewriting everything. They always have subjective arguments: it is going to be more maintainable or safer or just more elegant.

If your code is battle tested… then the correct instinct is to be conservative and keep your current code. Sometimes you need to rewrite your code: you made a mistake or must change your architecture. But most times, the old code is fine and investing time in updating your current code is better than starting anew.

The great intellectual Robin Hanson argues that software ages. One of his arguments is that software engineers say that it does. That’s what engineers feel, but whether it is true is another matter.

“Before Borland’s new spreadsheet for Windows shipped, Philippe Kahn, the colorful founder of Borland, was quoted a lot in the press bragging about how Quattro Pro would be much better than Microsoft Excel, because it was written from scratch. All new source code! As if source code rusted. The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material?” (Joel Spolsky)

Parsing IP addresses quickly (portably, without SIMD magic)

2025-12-28 07:39:57

Most programmers are familiar with IP addresses. They take the form of four numbers between 0 and 255 separated by dots: 192.168.0.1. In some sense, it is a convoluted way to represent a 32-bit integer. The modern version of an IP address is IPv6, which is usually surrounded by square brackets. It is less common in my experience.

Using fancy techniques, you can parse IP addresses with as few as 50 instructions. It is a bit complicated and not necessarily portable.

What if you want high speed without too much work or a specialized library? You can try to roll your own. But since I am a civilized programmer, I just asked my favorite AI to write it for me.

// Parse an IPv4 address starting at 'p'.
// p: start pointer, pend: end of the string
// Supporting declarations assumed by these snippets (std::expected requires C++23):
#include <cstdint>
#include <expected>

enum class parse_error { invalid_format };
using enum parse_error; // lets us write invalid_format directly

std::expected<uint32_t, parse_error> parse_manual(const char *p, const char *pend) {
    uint32_t ip = 0;
    int octets = 0;
    while (p < pend && octets < 4) {
        uint32_t val = 0;
        const char *start = p;
        while (p < pend && *p >= '0' && *p <= '9') {
            val = val * 10 + (*p - '0');
            if (val > 255) {
                return std::unexpected(invalid_format);
            }
            p++;
        }
        if (p == start || (p - start > 1 && *start == '0')) {
            return std::unexpected(invalid_format);
        }
        ip = (ip << 8) | val;
        octets++;

        if (octets < 4) {
            if (p == pend || *p != '.') {
                return std::unexpected(invalid_format);
            }
            p++; // Skip dot
        }
    }
    if (octets == 4 && p == pend) {
        return ip;
    } else {
        return std::unexpected(invalid_format);
    }
}

It was immediately clear to me that this function was not as fast as it could be. I then asked the AI to improve the result by using the fact that each number is made of between one and three digits. I got the following reasonable function.

std::expected<uint32_t, parse_error> parse_manual_unrolled(const char *p, const char *pend) {
    uint32_t ip = 0;
    int octets = 0;
    while (p < pend && octets < 4) {
        uint32_t val = 0;
        if (p < pend && *p >= '0' && *p <= '9') {
            val = (*p++ - '0');
            if (p < pend && *p >= '0' && *p <= '9') {
                if (val == 0) {
                    return std::unexpected(invalid_format);
                }
                val = val * 10 + (*p++ - '0');
                if (p < pend && *p >= '0' && *p <= '9') {
                    val = val * 10 + (*p++ - '0');
                    if (val > 255) {
                        return std::unexpected(invalid_format);
                    }
                }
            }
        } else {
            return std::unexpected(invalid_format);
        }
        ip = (ip << 8) | val;
        octets++;
        if (octets < 4) {
            if (p == pend || *p != '.') {
                return std::unexpected(invalid_format);
            }
            p++; // Skip the dot
        }
    }
    if (octets == 4 && p == pend) {
        return ip;
    } else {
        return std::unexpected(invalid_format);
    }
}

Nice work AI!

In C++, we have standard functions to parse numbers (std::from_chars), which can significantly simplify the code.

std::expected<uint32_t, parse_error> parse_ip(const char *p, const char *pend) {
  const char *current = p;
  uint32_t ip = 0;
  for (int i = 0; i < 4; ++i) {
    uint8_t value;
    auto r = std::from_chars(current, pend, value);
    if (r.ec != std::errc()) {
      return std::unexpected(invalid_format);
    }
    current = r.ptr;
    ip = (ip << 8) | value;
    if (i < 3) {
      if (current == pend || *current++ != '.') {
        return std::unexpected(invalid_format);
      }
    }
  }
  // Reject trailing characters such as "1.2.3.4x".
  if (current != pend) {
    return std::unexpected(invalid_format);
  }
  return ip;
}

You can also use the fast_float library as a substitute for std::from_chars. The latest version of fast_float has faster 8-bit integer parsing thanks to Shikhar Soni (with a fix by Pavel Novikov).

I wrote a benchmark for this problem. Let us first consider the results using an Apple M4 processor (4.5 GHz) with LLVM 17.

function           instructions/ip   ns/ip
manual             185               6.2
manual (unrolled)  114               3.3
from_chars         381               14
fast_float         181               7.2

Let us now try with GCC 12 on an Intel Ice Lake processor (3.2 GHz).

function           instructions/ip   ns/ip
manual             219               30
manual (unrolled)  154               24
from_chars         220               29
fast_float         211               18

And finally, let us try with a Chinese Loongson 3A6000 processor (2.5 GHz) using LLVM 21.

function           instructions/ip   ns/ip
manual             187               29
manual (unrolled)  109               21
from_chars         191               39
fast_float         193               27

The optimization work on the fast_float library paid off. The difference is especially striking on the x64 processor.

What is also interesting in my little experiment is that I was able to get the AI to produce faster code with relatively little effort on my part. I did have to ‘guide’ the AI. Does that mean that I can retire? Not yet. But I am happy that I can more quickly get good reference baselines, which allows me to better focus my work where it matters.

Reference: The fast_float C++ library is a fast number-parsing library that is part of GCC and major web browsers.

Performance trick: optimistic vs pessimistic checks

2025-12-21 07:26:09

Strings in programming are often represented as arrays of 8-bit words. The string is ASCII if and only if all 8-bit words have their most significant bit unset. In other words, the byte values must be no larger than 127 (or 0x7F in hexadecimal).

A decent C++ function to check that the string is ASCII is as follows.

bool is_ascii_pessimistic(const char *data, size_t length) {
  for (size_t i = 0; i < length; i++) {
    if (static_cast<unsigned char>(data[i]) > 0x7F) {
      return false;
    }
  }
  return true;
}

We go over each character, compare it with 0x7F, and continue if the value is no larger than 0x7F. If you have scanned the entire string and all tests have passed, you know that your string is ASCII.

Notice how I called this function pessimistic. What do I mean? I mean that it expects, in some sense, that it will find some non-ASCII character. If so, the best option is to immediately return and not scan the whole string.

What if you expect the string to almost always be ASCII? An alternative then is to effectively do a bitwise OR reduction of the string: you OR all characters together and you check just once that the result is bounded by 0x7F. If any character has its most significant bit set, then the bitwise OR of all characters will also have its most significant bit set. So you might write your function as follows.

bool is_ascii_optimistic(const char *data, size_t length) {
  unsigned char result = 0;
  for (size_t i = 0; i < length; i++) {
    result |= static_cast<unsigned char>(data[i]);
  }
  return result <= 0x7F;
}

If you have strings that are all pure ASCII, which function will be fastest? Maybe surprisingly, the optimistic one might be several times faster. I wrote a benchmark and ran it with GCC 15 on an Intel Ice Lake processor. I get the following results.

function      speed
pessimistic   1.8 GB/s
optimistic    13 GB/s

Why is the optimistic faster? Mostly because the compiler is better able to optimize it. Among other possibilities, it can use autovectorization to automatically use data-level parallelization (e.g., SIMD instructions).

Which function is best depends on your use case.

What if you would prefer a pessimistic function, that is, one that returns early when a non-ASCII character is encountered, but you still want high speed? Then you can use a dedicated library like simdutf, where we have hand-coded the logic. In simdutf, the pessimistic function is called validate_ascii_with_errors. Your results will vary, but I found that it has the same speed as the optimistic function.

function               speed
pessimistic            1.8 GB/s
pessimistic (simdutf)  14 GB/s
optimistic             13 GB/s

So it is possible to combine the benefits of pessimism and optimism although it requires a bit of care.