Computer science professor at the University of Quebec (TELUQ), open-source hacker, and long-time blogger.
The RSS's url is : https://lemire.me/blog/feed/
2024-10-18 07:03:05
We often store large datasets using comma-separated-value (CSV) files. The format is simple enough, each line of a text file is made of several values separated by commas, like so: "Willett, Walter C.",Harvard T.H. Chan School of Public Health,usa Given an excel spreadsheet, you can easily get a CSV file, and vice versa. CSV files … Continue reading How fast can you parse a CSV file in C#?
2024-10-15 06:15:42
When optimizing small functions, I often rely on a table lookup: I replace the actual computation with table of precomputed values. It is often surprisingly efficient. Let us consider an example. Suppose that you are given an array of characters and you want to replace all instances of the character ‘\’ with the two-character string … Continue reading Table lookups are efficient
2024-10-09 21:21:37
Both the Physics and Chemistry Nobel prizes were awarded to computer scientists in 2024. Computer scientists are emerging as leading figures in the natural sciences. In turn, these sciences are increasingly dominated by theorists and modelers. There has a noticeable shift in some scientific fields where theoretical models and simulations often not only precede experimental … Continue reading From software to reality?
2024-10-08 21:29:44
So… in 2024, the Physics Nobel prize was awarded to a Computer Scientist. Is Physics out of ideas? The Nobel Committee just gave a Physics award to a COMPUTER SCIENTIST! What does this say about the state of modern Physics? Though the first half of the XXth century was filled with breakthrough in Physics, it … Continue reading Geoffrey Hinton, the Godfather of Deep Learning, wins Nobel Prize in Physics!
2024-10-07 05:52:13
Consider the following problem. You want to iterate through the characters of a strings and find only those matching some criteria. For example, you might want scan an HTML string looking for the characters ‘<‘, ‘&’, ‘\0’, ‘\n’. We might do it in C++ using the find_first_of function. It is a generic function that is … Continue reading Iterating through matched characters in modern C++
2024-09-28 12:06:18
Recently, I received an email from an engineer at a prominent company who shared how he managed to save his employer tens of thousands of dollars annually by developing a custom command-line utility in C++. Without delving into specifics (which remain confidential), the company’s servers were tasked with querying a JSON file for a specific … Continue reading It is never too late to write your own C/C++ command-line utilities
2024-09-12 04:33:28
Imagine a world where becoming a doctor isn’t about years of rigorous study, but about showcasing your life’s work. That’s how doctorates used to roll. You’d write a book, make a groundbreaking discovery, and voila, a doctorate was yours. Fast forward to today, and we’ve flipped the script. Now, a PhD is less about what … Continue reading The PhD Paradox: A Journey into Academia’s Upside-Down World
2024-09-10 04:26:36
C++ programmers tend to represent strings using the std::string class. Though the implementation might vary, each instance of an std::string might use 32 bytes. Though it is not a large amount of memory, it can add up. In the Node.js runtime, as part of the build tools, there is a function which precomputes the string … Continue reading Replace strings by views when you can
2024-09-03 06:52:28
We sometimes have to work a large quantity of floating-point numbers. This volume can be detrimental to performance. Thus we often want to compress these numbers. Large-language models routinely do so. A sensible approach is to convert them to brain floating point numbers. These are 16-bit numbers that are often capable of representing accurately a … Continue reading Compressing floating-point numbers quickly by converting them to brain floats
2024-08-26 23:34:55
Most programming languages support floating-point numbers. You typically have the ability to turn a string into a floating-point number. E.g., “3.1416” could be parsed as a number close to pi. However strings typically cannot be represented exactly or at all. For example, “1e-1000” is too small and “1e1000” is too large for even 64-bit floating-point … Continue reading Parsing tiny and very large floating-point values: a programming-language comparison
2024-08-17 10:11:30
We often generate random integers. Quite often these numbers must be within an interval: e.g., an integer between 0 and 100. One application is a random shuffle. A standard algorithm for a fair random shuffle is the Knuth algorithm: void shuffle(mytype *storage, uint64_t size) { for (uint64_t i = size; i > 1; i--) { … Continue reading Faster random integer generation with batching
2024-08-14 00:12:46
JSON (JavaScript Object Notation) is a popular format for storing and transmitting data. It uses human-readable text to represent structured data in the form of attribute–value pairs and arrays. E.g., {"age":5, "name":"Daniel", toys:["wooden dog", "little car"]}. Ingesting and producing JSON documents can be a performance bottleneck. Thankfully, a few JSON parsers such as simdjson have … Continue reading Reflection-based JSON in C++ at Gigabytes per Second
2024-08-04 07:26:22
AMD Zen 4 and Zen 5, as well as server-side recent Intel processors, support an advanced set of instructions called AVX-512. They are powerful SIMD (Single Instruction, Multiple Data) instructions. Importantly, they allow ‘masked’ operations. That is, you can compute a mask and only do an operation on bytes indicated by the mask. Thus you … Continue reading Converting ASCII strings to lower case at crazy speeds with AVX-512
2024-07-28 23:39:06
People who should know better often underestimate how fast our storage capacity has grown. We have been able to get 1 TB of storage on iPhones for the last three generations. 2010 iPhone 4 32 GB 2012 iPhone 5 64 GB 2014 iPhone 6 128 GB 2016 iPhone 7 256 GB 2018 iPhone XS 512 … Continue reading Evolution of iPhone storage capacity
2024-07-28 05:35:42
Storage costs are plummeting like a skydiver in freefall—between 10 and 100 times cheaper with each passing decade. Meanwhile, the programmer population is growing at a leisurely pace, like a tortoise in a marathon, increasing by about 50% per decade. And the Linux kernel? It is maybe doubling in size every ten years. The net … Continue reading Storage costs are plummeting
2024-07-28 01:31:23
Docker is a standard to deploy software on the cloud. Developers start with an existing image and add their own code before deploying their systems. How big are typical uncompressed images? python:alpine (latest, aarch64) 58 MiB chainguard/bun (latest, aarch64) 90 MiB node:alpine (latest, aarch64) 141 MiB golang:alpine (latest, aarch64) 219 MiB Method: docker inspect -f … Continue reading How big are your docker images?
2024-07-27 08:10:07
We sometimes use binary executable which can span megabytes. I wondered: how much text is contained in these binary files? To find out, I wrote a Python script which adds up the size of all sequences of at least 16 ASCII characters in the file. My heuristic is simple but is not quite perfect: some … Continue reading How much of your binary executable is just ASCII text?
2024-07-26 23:25:13
For better performance in software, we avoid unnecessary copies. To do so, we introduce references (or pointers). An example of this ideas in C++ is the std::string_view class. As the name suggests, a std::string_view instance is merely a ‘view’: it points at some string, but it does not own or otherwise manage the underlying memory. … Continue reading Safer code in C++ with lifetime bounds
2024-07-22 23:13:24
Recent versions of C++ (C++20) have a new feature: concepts. A concept in C++ is a named set of requirements that a type must satisfy. E.g., ‘act like a string’ or ‘act like a number’. In C++, we have two closely related terms: traits and concepts. For example, std::is_floating_point is a type trait that checks … Continue reading Does C++ allow template specialization by concepts?
2024-07-21 05:30:43
Earlier this year, both major Web engines (WebKit/Safari and Chromium/Chrome/Edge/Brave) accelerated HTML parsing using SIMD instructions. These ‘SIMD’ instructions are special instructions that are present in all our processors that can process multiple bytes at once (e.g., 16 bytes). The problem that WebKit and Chromium solve is to jump to the next target character as … Continue reading Scan HTML even faster with SIMD instructions (C++ and C#)
2024-07-11 04:43:45
The world of commodity processor is roughly divided in two: x64 chips for servers and PCs, and ARM processors for mobile devices. However, ARM chips increasingly common on servers and laptop. My own favorite laptop is an Apple macBook with an M2 chip. Amazon has been producing its own ARM processors (Graviton) and it recently … Continue reading Benchmarking ARM processors: Graviton 4, Graviton 3 and Apple M2
2024-07-06 03:44:50
Recently, the two major Web engines (WebKit and Chromium) adopted fast SIMD routines to scan HTML content. The key insight is to use vectorized classification (Langdale and Lemire, 2019): you load blocks of characters and identify the characters you seek using a few instructions. In particular, we use ‘SIMD instructions’, special instructions that are available … Continue reading Scan HTML faster with SIMD instructions: .NET/C# Edition
2024-06-28 04:41:51
In C, we allocate memory on the heap using the malloc function. Other programming languages like C++ or zig (e.g., std.heap.c_allocator) may call on malloc underneath so it is important to understand how malloc works. Furthermore, the same concepts apply broadly to other memory allocators. In theory, you could allocate just one byte like so: … Continue reading How much memory does a call to ‘malloc’ allocate?
2024-06-23 03:14:18
Copying data in software is cheap, but it is not at all free. As you start optimizing your code, you might find that copies become a performance bottleneck. Let me be clear that copies really are cheap. It is often more performant to copy that data than to track the same memory across different threads. … Continue reading Performance tip: avoid unnecessary copies
2024-06-21 01:25:02
We have been working on a fast library to validate and transcode Unicode and other formats such as base64 in C++: simdutf. We wondered: could we achieve the same good results in C#? Microsoft’s .NET framework has made great strides in leveraging advanced instructions. For instance, if your processor supports AVX-512, you can instantiate 512-bit … Continue reading Validating gigabytes of Unicode strings per second… in C#?
2024-06-14 05:11:26
If you must multiply matrices, you should use dedicated libraries. However, we sometimes need to roll our own code. In C++, you can quickly write your own Matrix template: template <typename T> struct Matrix { Matrix(size_t rows, size_t cols) : data(new T[rows * cols]), rows(rows), cols(cols) {} T &operator()(size_t i, size_t j) { return data.get()[i … Continue reading Rolling your own fast matrix multiplication: loop order and vectorization
2024-06-08 12:55:42
Modern processors have instructions to process several bytes at once. Effectively all processors have the capability of processing 16 bytes at once. These instructions are called SIMD, for single instruction, multiple data. It was once an open question whether these instructions could be useful to accelerate common tasks such as parsing HTML or JSON. However, … Continue reading Scan HTML faster with SIMD instructions: Chrome edition
2024-05-31 11:48:21
In software, we often represent strings by surrounding them with quotes ("). What happens if the string itself contains quotes? We then need to escape the string. For example, the quote character (") or the backslash character (\) should be replaced by \" or \\. Most programmers are familiar with this process. Most strings do … Continue reading Quickly checking whether a string needs escaping
2024-05-31 03:33:28
In the quest for software optimization, a trusty companion is the sampling profiler, a tool available in most programming languages. These profilers work unobtrusively, taking snapshots of the program’s state and recording the currently executing function or instruction. While profilers sound like a silver bullet for identifying performance bottlenecks, their usefulness has limitations. They excel … Continue reading Never reason from the results of a sampling profiler
2024-05-26 09:09:54
Artificial intelligence is far more efficient at producing content than human beings, as far as carbon emissions go. Human brains got larger by over 5% between 1930 and 1970. Replacing plastics by ‘environment friendly’ alternatives typically results in greater greenhouse gas emissions. Prostate-specific antigen screening has only a small effect on men’s risk of dying … Continue reading Science and Technology links (May 25 2024)
2024-05-14 22:17:42
Back when I started programming professionally, every expert and every software engineering professor would swear by object-oriented programming. Resistance was futile. History had spoken: the future was object-oriented. It is hard to understate how strong the mania was. In education, we started calling textbooks and videos ‘learning objects‘. Educators would soon ‘combine learning objects and reuse them‘. A competitor … Continue reading Learning from the object-oriented mania
2024-05-13 23:51:51
In C++, there are different ways to pass a value to a function. Typically, at any given time, an object in C++ ‘belongs’ to a single function. The various ways to call a function differ in who owns the object, the caller or the callee (the function being called). The simplest one is that we … Continue reading Forwarding references in C++
2024-05-12 06:47:16
Peer review as we know it today was introduced very late, over a century after the scientific revolution. It happened after Einstein’s time… arguably the most productive era in science. Current scientists often equate a success with the publication in a selective peer-reviewed venue. But that was never the scientific paradigm. In fact, it is … Continue reading Peer review is not the gold standard in science
2024-05-09 11:55:45
Python is probably the most popular programming language in the world right now. Python is easy to extend using C code. You may want to return from Python a small data structure. When crossing from C to Python, there is an overhead. Thus, if performance is a concern, you do not want to return lots … Continue reading How fast can you construct a small list of strings in C for Python?
2024-05-03 03:23:11
Under Windows, when using Visual Studio to build C++ code, there are two possible compiler strategies. The Visual Studio compiler (often referred to as MSVC) is the default compiler provided by Microsoft for Windows development. In Debug mode, the regular Visual Studio compiler produces faster compilation times and more performant code compared to ClangCL. ClangCL … Continue reading Should Node.js be built with ClangCL under Windows?
2024-04-29 08:32:38
Egor Bogatov is an engineer working on C# compiler technology at Microsoft. He had an intriguing remark about a performance regression on Apple hardware following what appears to be an optimization. The .NET 9.0 runtime introduced the optimization where two loads (ldr) could be combined into a single load (ldp). It is a typical peephole … Continue reading Careful with Pair-of-Registers instructions on Apple Silicon
2024-04-27 08:05:24
Software can beat human beings at most games… from Chess to Go, and even poker. Large language models like GPT-4 offered through services such as ChatGPT allow us to solve a new breed of problems. GPT-4 can beat 90% of human beings at the bar exam. Artificial intelligence can match math Olympians. The primary skills … Continue reading Large language models (e.g., ChatGPT) as research assistants
2024-04-22 01:35:28
Go back to the roots: experience. An expert is someone who has repeatedly solved the concrete problem you are encountering. If your toilet leaks, an experienced plumber is an expert. An expert has a track record and has had to face the consequences of their work. Failing is part of what makes an expert: any … Continue reading How do you recognize an expert?
2024-04-20 05:25:21
Suppose that you receive a long string and you need to break it down into lines. Consider the simplified problems where you need to break the string into segments of (say) 72 characters. It is a relevant problem if your string is a base64 string or a Fortran formatted statement. The problem could be a … Continue reading How quickly can you break a long string into lines?
2024-04-14 06:31:25
Our computer hardware exchange data using a standard called PCI Express. Your disk, your network and your GPU are limited by what PCI Express can do. Currently, it means that you are limited to a few gigabytes per second of bandwidth. PCI Express is about to receive a big performance boost in 2025 when PCI … Continue reading Science and Technology links (April 13 2024)