Blog of Daniel Lemire

Computer science professor at the University of Quebec (TELUQ), open-source hacker, and long-time blogger.

The fastest way to match characters on ARM processors?

2026-04-20 04:41:04

Consider the following problem. Given a string, you must match all of the ASCII white-space characters (\t, \n, \r, and the space) and some characters important in JSON (:, ,, [, ], {, }). JSON is a text-based data format used for web services. A toy JSON document looks as follows.

{
  "name": "Alice",
  "age": 30,
  "email": "[email protected]",
  "tags": ["developer", "python", "open-source"],
  "active": true
}

We want to solve this problem using SIMD (single-instruction-multiple-data) instructions. With these instructions, you can compare a block of 16 bytes with another block of 16 bytes in one instruction.

This is a subproblem in the fast simdjson JSON library: we solve it when indexing a JSON document. We call this task vectorized classification. We also use the same technique when parsing DNS records, and so forth. In the actual simdjson library, we must also handle strings and quotes, and things get more complicated.

I need to define what I mean by ‘matching’ the characters. In my case, it is enough to get, for each block of 64 bytes, two 64-bit masks: one for spaces and one for important characters. To illustrate, let me consider a 16-byte variant:

{"name": "Ali" }
1000000100000001 // important characters
0000000010000010 // spaces

Thus, I want to get back the numbers 0b1000000100000001 and 0b0000000010000010 in binary format (they are 33025 and 130 in decimal).

I refer you to Langdale and Lemire (2019) for how to do it using the conventional SIMD instructions available on ARM processors (NEON). Their key idea is a table-driven, branch-free classifier: for each byte, split into low and high nibbles, use SIMD table lookups to map each nibble to a bitmask, and combine the two masks (with bitwise AND) to decide whether the byte belongs to a target set (whitespace or structural JSON characters). This avoids doing many separate equality comparisons per character.

There is now a better way on recent ARM processors.

The 128-bit version of NEON arrived in 2011 with the ARMv8-A architecture (AArch64). Apple played an important role: its A7 chip in the iPhone 5S was the first to ship the architecture. You can count on all 64-bit ARM processors to support NEON, which is convenient. (There are 32-bit ARM processors, but they are mostly used for embedded systems, not mainstream computing.)

ARM NEON is good but getting old. It is no match for the AVX-512 instruction set available on x64 (AMD and Intel) processors. Not only do the AVX-512 instructions support wider registers (64 bytes as opposed to ARM NEON’s 16 bytes), but they also have more powerful instructions.

But ARM has something else to offer: the Scalable Vector Extension (SVE) and its successor, SVE2. Though SVE was first announced in 2016, it took until 2022 before we had broad access. The Neoverse V1 architecture used by the Amazon Graviton 3 was the first I could try. Soon after, we got SVE2 with the Neoverse V2 and N2 architectures. Today it is readily available: the Graviton 4 on AWS, the Microsoft Cobalt 100 on Azure, the Google Axion on Google Cloud (and newer Google Cloud ARM CPUs), the NVIDIA Grace CPU, as well as several chips from Qualcomm, MediaTek, and Samsung. Notice who I am not including? Apple. For unclear reasons, Apple has not yet adopted SVE2.

I have mixed feelings about SVE/SVE2. Like the RISC-V vector extension, it breaks with the fixed-length register approach of ARM NEON and x64 SIMD (16 bytes, 32 bytes, 64 bytes). This means that you are expected to write code without knowing how wide the registers are.

This is convenient for chip makers because it gives them the option of adjusting the register size to better suit their market. Yet the flexibility seems to have gone unused: while the Graviton 3 processor from Amazon had 256-bit registers, all commodity chips since have had 128-bit registers.

On the plus side, SVE/SVE2 has masks (predicates), a bit like AVX-512, so you can load and process only a subset of a register. This solves a long-standing problem with earlier SIMD instruction sets: handling inputs whose length is not a multiple of the register size. Being able to operate on only part of a register also allows clever optimizations. Sadly, unlike AVX-512, SVE/SVE2 does not let you move masks to and from general-purpose registers efficiently. That is a direct consequence of the variable-length register design: even though your registers might always be 128-bit and contain 16 bytes, the instruction set is not allowed to assume that a mask fits in a 16-bit word.

I was pessimistic about SVE/SVE2 until I learned that it is designed to interoperate with ARM NEON. Thus you can mix SVE/SVE2 instructions into your ARM NEON code. This works especially well if you know that the SVE/SVE2 registers match the ARM NEON registers in size (16 bytes).

For the work I do, two SVE2 instructions are important: match and nmatch. In their 8-bit versions, given two vectors a and b, each containing up to 16 bytes, match sets a predicate bit to true for each position i where a[i] equals any of the bytes in b. In other words, b acts as a small lookup set, and match tests set membership for every byte of a simultaneously. The nmatch instruction is the logical complement: it sets a predicate bit to true wherever a[i] does not match any byte in b. A single instruction thus replaces a series of equality comparisons and OR-reductions that would otherwise be needed.

In the code below, op_chars holds the six structural JSON characters and ws_chars holds the four whitespace characters; calling svmatch_u8 once on a 16-byte chunk d produces a predicate with a true bit exactly where the input byte is a structural character. The code uses SVE2 intrinsics: compiler-provided C/C++ functions that map almost one-to-one to CPU SIMD instructions, so you get near-assembly control without writing assembly.

#include <arm_sve.h> // SVE/SVE2 intrinsics; compile with, e.g., -march=armv8-a+sve2

// : , [ ] { }
uint8_t op_chars_data[16] = {
    0x3a, 0x2c, 0x5b, 0x5d, 0x7b, 0x7d, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0
};
// \t \n \r ' '
uint8_t ws_chars_data[16] = {
    0x09, 0x0a, 0x0d, 0x20, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0
};

// load the characters in SIMD registers
svuint8_t op_chars = svld1_u8(svptrue_b8(), op_chars_data);
svuint8_t ws_chars = svld1_u8(svptrue_b8(), ws_chars_data);

// load 16 bytes of input data
// const char * input = ...;
svbool_t pg = svptrue_pat_b8(SV_VL16);
svuint8_t d = svld1_u8(pg, (const uint8_t *)input);

// matching
svbool_t op = svmatch_u8(pg, d, op_chars);
svbool_t ws = svmatch_u8(pg, d, ws_chars);

In this code snippet, svuint8_t is an SVE vector type containing unsigned 8-bit lanes (bytes). svbool_t is an SVE predicate (mask) type. svptrue_b8() builds a predicate where all 8-bit lanes are active, and svld1_u8(pg, ptr) loads bytes from memory into an SVE vector, using predicate pg to decide which lanes are actually read.

If you paid attention thus far, you might have noticed that my code is slightly wrong since I am including 0 in the character sets. But it is fine as long as I assume that the zero byte is not present in the input. In practice, I could just repeat one of the characters, or use a bogus character that I do not expect to see in my inputs (such as the byte value 0xFF, which cannot appear in a valid UTF-8 string).

In standard SVE/SVE2, op and ws are predicates, not integer masks. A practical trick is to materialize each predicate as bytes (0xFF for true, 0x00 for false), for example with svdup_n_u8_z.

svuint8_t opm = svdup_n_u8_z(op, 0xFF);
svuint8_t wsm = svdup_n_u8_z(ws, 0xFF);

When SVE vectors are 128 bits, this byte vector maps naturally to a NEON uint8x16_t via svget_neonq_u8, and from there we can build scalar bitmasks efficiently with NEON operations (masking plus pairwise additions). Repeating this over four 16-byte chunks gives the two 64-bit masks needed for a 64-byte block.

How does it compare with pure NEON code? I compiled the routine processing blocks of 64 bytes using different compilers.

method            GCC 16   LLVM clang 20
simdjson (NEON)   69       66
SVE/SVE2 (new!)   42       52

Interestingly, GCC 16 even emits SVE instructions when compiling the pure NEON code, which suggests that recompiling old NEON code while targeting SVE/SVE2 could be beneficial.

I was hoping to benchmark both compilers, but I wanted to run quickly on an AWS Graviton 4 and did not want to build GCC 16 from source. So I kept LLVM clang 20, which was readily available in the images that AWS provides (I picked Red Hat 10).

The AWS Graviton 4 processor is a Neoverse V2 processor. Google has its own Neoverse V2 processors in its cloud. In my tests, it ran at 2.8 GHz.

My benchmark generates a random string of 1 MiB and computes the bitmaps indicating the positions of the characters. It is available on GitHub. My results are as follows.

method            GB/s   instructions/byte   instructions/cycle
simdjson (NEON)   11.4   0.94                3.5
SVE/SVE2 (new!)   14.4   0.67                3.8

So the SVE/SVE2 approach is about 25% faster than the NEON equivalent and uses 30% fewer instructions, and that is without any fancy optimization. Importantly, the code is relatively simple thanks to the match instruction.

It might be that the SVE2 function match is the fastest way to match characters on ARM processors.

Credit: This post was motivated by a sketch by user liuyang-664 on GitHub.

References

Langdale, G., & Lemire, D. (2019). Parsing gigabytes of JSON per second. The VLDB Journal, 28(6), 941-960.

Koekkoek, J., & Lemire, D. (2025). Parsing millions of DNS records per second. Software: Practice and Experience, 55(4), 778-788.

Lemire, D. (2025). Scanning HTML at Tens of Gigabytes Per Second on ARM Processors. Software: Practice and Experience, 55(7), 1256-1265.

A brief history of C/C++ programming languages

2026-04-09 22:58:53

Initially, we had languages like Fortran (1957), Pascal (1970), and C (1972). Fortran was designed for number crunching and scientific computing. Pascal was restrictive with respect to low-level access: it was deliberately “safe”, being meant for teaching structured programming. So C won out as a language that allowed low-level, unsafe programming (pointer arithmetic, direct memory access) while remaining general-purpose enough for systems work like Unix. To be fair, Pascal had descendants that are still around, but C clearly dominated.
Object-oriented programming became viewed as the future in the 1980s and 1990s. It turned into some kind of sect.
But C was not object-oriented.
So we got C++, which began as “C with Classes”. C++ had templates, enabling generic programming and compile-time metaprogramming. This part of the language makes C++ quite powerful, but somewhat difficult to master (with crazy error messages).
Both C and C++ became wildly successful, but writing portable applications remained difficult — you often had to target Windows or a specific Unix variant. This was a problem for a company like Sun Microsystems that sold Unix boxes and wanted to compete against the juggernaut that Microsoft was becoming.
So Java came along in 1995. It was positioned as a safe, portable alternative to C++: it eliminated raw pointer arithmetic, added mandatory garbage collection, array bounds checking everywhere, and ran on a virtual machine (JVM) with just-in-time compilation for performance.
The “write once, run anywhere” promise addressed C/C++ portability pain points directly. To this day, Java remains a strong solution for writing portable enterprise and server-side code.
We also got JavaScript in 1995. Despite the name, it has almost nothing in common with Java semantically. It is best viewed as separate from the C/C++ branch. Python is similarly quite different.
Microsoft would eventually come up with C# in 2000. It belongs to the same C-family syntax tradition as C++ and Java, but with support for ahead-of-time compilation in modern .NET. It also allows guarded pointer access within explicitly marked unsafe scopes. At this point, C# can be seen as “C++ with garbage collection” in spirit. It even competes against C++ in the game industry thanks to Unity.
Google came up with Go. It is much like a simpler, modern C: garbage-collected, with built-in bounds checking on slices/arrays, and pointers allowed but without arbitrary arithmetic in safe code (the unsafe package exists for low-level needs).
Later, Apple came up with Swift. It has C++-like performance and syntax goals but adds modern safety features (bounds checking by default, traps on integer overflow) and uses Automatic Reference Counting (ARC) for memory management. Swift replaced Objective-C, but I still view it as a C++ successor.
At about the same time, we got Rust. Like Swift, it drops the tracing garbage collection of Java, C#, and Go. It relies instead on compile-time ownership and borrowing rules, with the tradeoff that you can still leak memory with reference cycles. We also got Zig, which makes memory usage fully explicit.
I think that it is fairer to describe Rust and Zig as descendants of C rather than C++. Both are much more powerful than C, of course… and the evolution of programming languages is complex. Still. They are C-like programming languages.
To this day, in much of the industry, the dominant programming languages for performance-critical, systems, enterprise, and infrastructure work remain C, C++, Java, and C#. By the Lindy effect (the longer something has survived, the longer it is likely to continue surviving), these languages, especially C, now over 50 years old, are still going to be around for a long time.

Can your AI rewrite your code in assembly?

2026-04-06 05:16:14

Suppose you have several strings and you want to count the number of instances of the character ! in your strings. In C++, you might solve the problem as follows if you are an old-school programmer.

size_t c = 0;
for (const auto &str : strings) {
    c += std::count(str.begin(), str.end(), '!');
}

You can also get fancier with ranges.

for (const auto &str : strings) {
    c += std::ranges::count(str, '!');
}

And so forth.

But what if you want to go faster? Maybe you’d want to rewrite this function in assembly. I decided to do so, and to have fun using both Grok and Claude as my AIs, setting up a friendly competition.

I started with my function and then asked the AIs to optimize it in assembly. Importantly, they knew which machine I was on, so they wrote ARM assembly.

By repeated prompting, I got the following functions.

  • count_classic: Uses C++ standard library std::count for reference.
  • count_assembly: A basic ARM64 assembly loop (byte-by-byte comparison). Written by Grok.
  • count_assembly_claude: Claude’s SIMD-optimized version using NEON instructions (16-byte chunks).
  • count_assembly_grok: Grok’s optimized version (32-byte chunks).
  • count_assembly_claude_2: Claude’s further optimized version (64-byte chunks with multiple accumulators).
  • count_assembly_grok_2: Grok’s latest version (64-byte chunks with improved accumulator handling).
  • count_assembly_claude_3: Claude’s most advanced version with additional optimizations.

You get the idea.

So, how is the performance? I use random strings of up to 1 kilobyte. In all cases, I test that the functions provide the correct count. I did not closely examine the code, however, so mistakes could still be hiding in it.

I record the average number of instructions per string.

name               instructions/string
classic C++        1200
claude assembly    250
grok assembly      204
claude assembly 2  183
grok assembly 2    176
claude assembly 3  154

By repeated optimization, I reduced the number of instructions by a factor of eight. The running time decreases similarly.

Can we get the AIs to rewrite the best option in C? Yes, although you need SIMD intrinsics. So there is no benefit to leaving the code in assembly in this instance.

An open question is whether the AIs could find optimizations that are not possible if we use a higher-level language like C or C++. It is an intriguing question that I will seek to answer later. For the time being, the AIs can beat my C++ compiler!

My source code is available.

A Fast Immutable Map in Go

2026-03-30 02:18:01

Consider the following problem. You have a large set of strings, maybe millions. You need to map these strings to 8-byte integers (uint64). These integers are given to you.

If you are working in Go, the standard solution is to create a map. The construction is trivial, something like the following loop.

m := make(map[string]uint64, N)
for i, k := range keys {
    m[k] = values[i]
}

One downside is that the map may use over 50 bytes per entry.

In important scenarios, the following conditions hold: the map is large (a million entries or more), you do not need to modify it after construction (it is immutable), and all queried keys are in the set. Under these conditions, you can reduce the memory usage to little more than the size of the values, about 8 bytes per entry. One fast technique relies on binary fuse filters.

I implemented it as a Go library called constmap that provides an immutable map from strings to uint64 values using binary fuse filters. This data structure is ideal when you have a fixed set of keys at construction time and need fast, memory-efficient lookups afterward. You can even construct the map once and save it to disk, so you do not pay the construction cost each time you need it.

The usage is just as simple.

package main

import (
    "fmt"
    "log"

    "github.com/lemire/constmap"
)

func main() {
    keys := []string{"apple", "banana", "cherry"}
    values := []uint64{100, 200, 300}

    cm, err := constmap.New(keys, values)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(cm.Map("banana")) // 200
}

The construction time is higher (as expected for any compact data structure), but lookups are optimized for speed. I ran benchmarks on my Apple M4 Max processor to compare constmap lookups against Go’s built-in map[string]uint64. The test uses 1 million keys.

Data Structure   Lookup Time   Memory Usage
ConstMap         7.4 ns/op     9 bytes/key
Go Map           20 ns/op      56 bytes/key

ConstMap is nearly 3 times faster than Go’s standard map for lookups! And we reduced the memory usage by a factor of 6.

The ConstMap may not always be faster, but it should always use significantly less memory. If it can reside in CPU cache while the map cannot, then it will be significantly faster.

Source code: the implementation is available on GitHub at github.com/lemire/constmap.

JSON and C++26 compile-time reflection: a talk

2026-03-26 08:29:38

The next C++ standard (C++26) is getting exciting new features. One of these features is compile-time reflection. It is ideally suited to serialize and deserialize data at high speed. To test it out, we extended our fast JSON library (simdjson) and we gave a talk at CppCon 2025. The video is out on YouTube.

Our slides are also available.

How many branches can your CPU predict?

2026-03-19 05:52:53

Modern processors have the ability to execute many instructions per cycle, on a single core. To be able to execute many instructions per cycle in practice, processors predict branches. I have made the point over the years that modern CPUs have an incredible ability to predict branches.

It makes benchmarking difficult because if you test on small datasets, you can get surprising results that might not work on real data.

My go-to benchmark is a function like so:

while (howmany != 0) {
    val = generate_random_value()
    if(val is odd) write to buffer
    decrement howmany
}

The processor tries to predict the branch (the if clause). Because we use random values, the processor should mispredict half of the time.

However, if we repeat the benchmark multiple times, always using the same random values, the processor learns the branches. How many branches can a processor learn? I tested three recent processors.

  • The AMD Zen 5 processor can predict perfectly 30,000 branches.
  • The Apple M4 processor can predict perfectly 10,000 branches.
  • Intel Emerald Rapids can predict perfectly 5,000 branches.

Once more I am disappointed by Intel. AMD is doing wonderfully well on this benchmark.

My source code is available.