2025-10-31 06:55:28
As smart as a PhD
There is sometimes confusion about what a PhD is. The main signal that you should derive from the fact that someone has a PhD is that they are well suited to the university campus environment. Maybe people who complete a PhD are especially thorough and finish their work to perfection? In computer science, academic projects have a reputation for being of uneven quality. Stonebraker, a famous computer scientist, attributes part of his success to his dedication to finishing up the work:
The smartest things we ever did, was to then put in the effort to make it really work. (Stonebraker, 2014)
Did you know that a lot of people have a PhD? Over 3% of the population in a country like Switzerland has a PhD. Hence, in a city of 1 million people, you may have 30,000 people with a PhD. In Germany, that would be 15,000 people. Do PhDs lead to great jobs and higher incomes? Not really. If you do get your…
2025-10-27 04:15:30
Flame Graphs in Go
The hardest problem in software performance is often to understand your code and why it might be slow. One approach to this problem is called profiling. Profiling tries to count the time spent in the various functions of your program. It can be difficult to understand the result of profiling. Furthermore, it is more complicated than it seems. For example, how do you count the time spent in a function A that calls another function B? Does the time spent in function B count toward the time spent in function A? Suppose function A executes code for 2 seconds, then calls function B which takes 3 seconds to run, and finally A continues with 1 additional second. The total time for A would be 6 seconds if everything were included, but that does not accurately reflect where the time is truly spent. We might prefer the exclusive or flat time: the time spent only in the body of the function itself,…
https://lemire.me/blog/2025/10/26/flame-graphs-in-go/
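The A and B example in the excerpt can be worked through in a few lines. The sketch below is not taken from the post: the `Frame` type and the `inclusive_seconds` function are hypothetical names chosen for illustration. It assumes each function records only its exclusive (flat) time, and derives the inclusive time by adding the time of its callees.

```cpp
#include <string>
#include <vector>

// Hypothetical call-tree node (illustrative only, not from the post).
struct Frame {
    std::string name;
    double self_seconds;          // exclusive (flat) time: the function body only
    std::vector<Frame> children;  // functions called from this one
};

// Inclusive time: the function's own time plus the inclusive time
// of everything it calls.
double inclusive_seconds(const Frame& frame) {
    double total = frame.self_seconds;
    for (const Frame& child : frame.children) {
        total += inclusive_seconds(child);
    }
    return total;
}
```

With B as a leaf holding 3 seconds and A holding 2 + 1 = 3 seconds of its own time plus B as a child, `inclusive_seconds` gives 6 for A and 3 for B, while the flat times are 3 and 3.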
2025-10-26 23:01:26
Thinking Clearly
If you have ever met me in person, you know that when you share an idea with me, I simplify it to its core and reflect it back to you, focusing on its essential parts. I dissect each statement for precision. “What do you mean by this word?” I have two decades of experience working with academics who overcomplicate everything. Humans are easily confused. A project proposal with ten moving parts and five objectives is overwhelming. Most people cannot think it through critically, which can lead to disaster. By instinct, I simplify problems as my first step, reducing them to their “minimum viable product,” as they say in Silicon Valley. Some people avoid simplicity to sound smarter. They won’t admit it, maybe not even to themselves, but that’s what they are thinking: “Oh no! I’m not doing this simple thing; my work is much more sophisticated.” That’s a terrible idea. Even simple projects become…
https://lemire.me/blog/2025/10/26/thinking-clearly/
2025-10-20 06:17:24
Speeding up C++ functions with a thread_local cache
In large code bases, we are often stuck with unpleasant designs that are harming our performance. We might be looking for a non-intrusive method to improve the performance. For example, you may not want to change the function signatures. Let us consider a concrete example. Maybe someone designed the programming interface so that you have to access the values from a map using an index. They may have code like so:
```cpp
auto at_index(map_like auto& index_map, size_t idx) {
  size_t count = 0;
  for (const auto &[key, value] : index_map) {
    if (count == idx) return value;
    count++;
  }
  throw std::out_of_range("Index out of range");
}
```
This code steps through `idx` keys of the map before reaching the requested one. Typically, it implies some kind of linked list traversal. If you are stuck with this interface, going through the values might imply repeated calls to the `at_index` function:
```cpp
for (size_t i = 0; i < input_size; ++i) {
  at_index(index_map, i);
}
```
If you took any kind of computer science course, you will immediately see the problem: my code has quadratic complexity. If…
https://lemire.me/blog/2025/10/19/speeding-up-c-functions-with-a-thread_local-cache/
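The excerpt stops before the fix, so the following is only a guess at how a `thread_local` cache could help here, not the post's actual code. The name `at_index_cached` is hypothetical, and the sketch assumes a `std::map`-like container that is neither modified nor destroyed between calls. It remembers the last position reached in the current thread, so that a run of increasing indexes resumes from there instead of restarting at the beginning.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical cached variant of at_index (a sketch, not the post's code).
// Assumption: the map is not modified or destroyed between calls,
// otherwise the cached iterator would be invalid.
template <typename Map>
auto at_index_cached(Map& index_map, std::size_t idx) {
    using iterator = decltype(index_map.begin());
    // One cache per thread (and per Map type): last map, index and position.
    thread_local Map* cached_map = nullptr;
    thread_local std::size_t cached_idx = 0;
    thread_local iterator cached_it{};

    // Restart when the map differs or when the caller moves backward.
    if (cached_map != &index_map || idx < cached_idx) {
        cached_map = &index_map;
        cached_idx = 0;
        cached_it = index_map.begin();
    }
    // Resume from the cached position instead of from the start.
    while (cached_idx < idx && cached_it != index_map.end()) {
        ++cached_it;
        ++cached_idx;
    }
    if (cached_it == index_map.end()) {
        cached_map = nullptr;  // drop the cache before reporting the error
        throw std::out_of_range("Index out of range");
    }
    return cached_it->second;
}
```

Under these assumptions, the function signature stays the same shape as `at_index`, and a loop over `i = 0, 1, 2, …` advances the iterator once per call, so the whole loop is linear rather than quadratic. The cost is hidden state: the cache has to be invalidated whenever the map changes, which this sketch does not attempt to detect.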
2025-10-18 02:51:34
Research results are cultural artifacts, not public goods
Many view scientific research as a public good. I consider this naive and indefensible. Scientific progress hinges on people and culture, not on research results as public goods. What do they mean by a public good? A public good is non-excludable: Once a scientific discovery is made anyone can access it at low cost. If true, why bother innovating? Just wait and adopt others’ discoveries. History tells us that it does not work. The British Empire’s technological edge took decades for others to match, despite eventual diffusion—at great cost. Consider the most significant advance of the last 20 years: large language models (LLMs). Papers, code, and breakthroughs are freely available online. Yet firms like OpenAI, Anthropic, and xAI pay millions in salaries to top talent. Why? Can’t others just study the papers and…
https://lemire.me/blog/2025/10/17/research-results-are-cultural-artifacts-not-public-goods/