2025-12-06 03:24:50

The one constant that I have observed in my professional life is that people underestimate the need to move fast.
Of course, doing good work takes time. I once spent six months writing a URL parser. But the fact that it took so long is not a feature, it is not a positive, it is a negative.
If everything is slow-moving around you, it is likely not going to be good. To fully make use of your brain, you need to move as close as possible to the speed of your thought.
If I give you two PhD students, one who completed their thesis in two years and one who took eight years… you can be almost certain that the two-year thesis will be much better.
Moving fast does not mean that you complete your projects quickly. Projects have many parts, and getting everything right may take a long time.
Nevertheless, you should move as fast as you can.
For multiple reasons:
1. A common mistake is to spend a lot of time—too much time—on a component of your project that does not matter. I once spent a lot of time building a podcast-like version of a course… only to find out later that students had no interest in the podcast format.
2. You learn by making mistakes. The faster you make mistakes, the faster you learn.
3. Your work degrades, becomes less relevant with time. And if you work slowly, you will be more likely to stick with your slightly obsolete work. You know that professor who spent seven years preparing lecture notes twenty years ago? He is not going to throw them away and start again, as that would be a new seven-year project. So he will keep teaching using aging lecture notes until he retires and someone finally updates the course.
What if you are doing open-heart surgery? Don’t you want someone who spends days preparing and who works slowly? No. You almost surely want the surgeon who does many, many open-heart surgeries. They are very likely to be the best one.
Now stop being so slow. Move!
2025-12-04 23:40:59

“We see something that works, and then we understand it.” (Thomas Dullien)
It is a deeper insight than it seems.
Young people spend years in school learning the reverse: understanding happens before progress. That is the linear theory of innovation.
So Isaac Newton comes up with his three laws of mechanics, and we get a clockmaking boom. Of course, that’s not what happened: we get the pendulum clock in 1656, then Hooke (1660) and Newton (1665–1666) get to think about forces, speed, motion, and latent energy.
The linear model of innovation makes as much sense as the waterfall model in software engineering. In the waterfall model, you are taught that you first need to design every detail of your software application (e.g., using a language like UML) before you implement it. To this day, half of the information technology staff at my school is made up of “analysts” whose main job is supposedly to create such designs from requirements and to supervise their execution.
Both the linear theory and the waterfall model are forms of thinkism, a term I learned from Kevin Kelly. Thinkism sets aside practice and experience. It is the belief that given a problem, you should just think long and hard about it, and if you spend enough time thinking, you will solve it.
Thinkism works well in school. The teacher gives you all the concepts, then gives you a problem that, by a wonderful coincidence, can be solved just by thinking with the tools the same teacher just gave you.
As a teacher, I can tell you that students get really angry if you put a question on an exam that requires a concept not explicitly covered in class. Of course, if you work as an engineer and you’re stuck on a problem and you tell your boss it cannot be solved with the ideas you learned in college… you’re going to look like a fool.
If you’re still in school, here’s a fact: you will learn as much or more every year of your professional life than you learned during an entire university degree—assuming you have a real engineering job.
Thinkism also works well in other limited domains beyond school. It works well in bureaucratic settings where all the rules are known and you’re expected to apply them without question. There are many jobs where you first learn and then apply. And if you ever encounter new conditions where your training doesn’t directly apply, you’re supposed to report back to your superiors, who will then tell you what to do.
But if you work in research and development, you always begin with incomplete understanding. And most of the time, even if you could read everything ever written about your problem, you still wouldn’t understand enough to solve it. The way you make discoveries is often to either try something that seems sensible, or to observe something that happens to work—maybe your colleague has a practical technique that just works—and then you start thinking about it, formalizing it, putting it into words… and it becomes a discovery.
And the reason it often works this way is that “nobody knows anything.” The world is so complex that even the smartest individual knows only a fraction of what there is to know, and much of what they think they know is slightly wrong—and they don’t know which part is wrong.
So why should you care about how progress happens? You should care because…
1. It gives you a recipe for breakthroughs: spend more time observing and trying new things… and less time thinking abstractly.
2. Stop expecting an AI to cure all diseases or solve all problems just because it can read all the scholarship and “think” for a very long time. No matter how much an AI “knows,” it is always too little.
Further reading: Godin, Benoît (2017). Models of Innovation: The History of an Idea. MIT Press.
2025-12-04 04:41:26

It is absolutely clear to me that large language models represent the most significant scientific breakthrough of the past fifty years. The nature of that breakthrough has far-reaching implications for what is happening in science today. And I believe that the entire scientific establishment is refusing to acknowledge it.
We often excuse our slow progress with tired clichés like “all the low-hanging fruit has been picked.” It is an awfully convenient excuse if you run a scientific institution that pretends to lead the world in research—but in reality is mired in bureaucracy, stagnation and tradition.
A quick look at the world around us tells a different story: progress is possible, and even moderately easy, as everyday experience shows. I have been programming in Python for twenty years and even wrote a book about it. Managing dependencies has always been a painful, frustrating process—seemingly unsolvable. The best anyone could manage was to set up a virtual environment. Yes, it was clumsy and awkward, as you know if you have programmed in Python, but that was the state of the art after decades of effort by millions of Python developers. Then, in 2024, a single tool called uv appeared and suddenly made the Python ecosystem feel sane, bringing it in line with the elegance of Go or JavaScript runtimes. In retrospect, the solution seems almost obvious.
NASA has twice the budget of SpaceX. Yet SpaceX has launched more missions to orbit in the past decade than NASA managed in the previous fifty years. The difference is not money; it is culture, agility, and a willingness to embrace new ideas.
Large language models have answered many profound scientific questions, and one of the deepest concerns the very nature of language itself. For generations, the prevailing view was that human language depends on a vast set of logical rules that the brain applies unconsciously. That rule-based paradigm dominated much of twentieth-century linguistics and even shaped the early web. We spent an entire decade chasing the dream of the Semantic Web, convinced that if we all shared formal, machine-readable metadata, rule engines would deliver web-scale intelligence. Thanks to large language models, we now know that language does not need to be rule-based at all. Verbal intelligence does not require explicit rules.
It is a tremendous scientific insight that overturns decades of established thinking.
A common objection is that I am conflating engineering with science: large language models, the objection goes, are just engineering. I invite you to examine the history of science more closely. Scientific progress has always depended on the tools we build.
You need a seaworthy boat before you can sail to distant islands, observe wildlife, and formulate the theory of natural selection. Measuring the Earth’s radius with the precision achieved by the ancient Greeks required both sophisticated engineering and non-trivial mathematics. Einstein’s insights into relativity emerged in an era when people routinely experienced relative motion on trains; the phenomenon was staring everyone in the face.
The tidy, linear model of scientific progress—professors thinking deep thoughts in ivory towers, then handing blueprints to engineers—is indefensible. Fast ships and fast trains are not just consequences of scientific discovery; they are also wellsprings of it. Real progress is messy, iterative, and deeply intertwined with the tools we build. Large language models are the latest, most dramatic example of that truth.
So what does it tell us about science? I believe it is telling us that we need to rethink our entire approach to scientific research. We need to embrace agility, experimentation, and a willingness to challenge established paradigms. The bureaucratization of science was a death sentence for progress.
2025-11-29 13:00:03

Base64 is a binary-to-text encoding scheme that converts arbitrary binary data (like images, files, or any sequence of bytes) into a safe, printable ASCII string using a 64-character alphabet (A–Z, a–z, 0–9, +, /). In the browser, JavaScript uses it to embed binary data directly in code or HTML, or to transmit binary data as text.
Browsers recently added convenient and safe functions to process Base64: Uint8Array.prototype.toBase64() and Uint8Array.fromBase64(). Though they accept several optional parameters, it comes down to an encoding function and a decoding function.
const b64 = bytes.toBase64(); // string
const recovered = Uint8Array.fromBase64(b64); // Uint8Array
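As an illustration, here is a small sketch of the optional parameters as I understand them from the TC39 proposal (alphabet, omitPadding, lastChunkHandling); exact support may vary from browser to browser.
// Sketch of the optional parameters (per the TC39 arraybuffer-base64 proposal);
// support for each option may vary across browsers.
const bytes = new Uint8Array([251, 255]);
bytes.toBase64();                                             // "+/8="
bytes.toBase64({ alphabet: "base64url", omitPadding: true }); // "-_8"
Uint8Array.fromBase64("-_8", { alphabet: "base64url" });      // Uint8Array [251, 255]
// fromBase64 also accepts lastChunkHandling: "loose" (default), "strict",
// or "stop-before-partial", which controls how a trailing partial chunk is treated.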
When encoding, the encoder takes 3 bytes (24 bits) from the input at a time. These 24 bits are divided into four 6-bit segments, and each 6-bit value (ranging from 0 to 63) is mapped to a specific character from the Base64 alphabet: the first 26 characters are the uppercase letters A-Z, the next 26 are the lowercase letters a-z, then the digits 0-9, followed by + and / at indices 62 and 63. The equals sign = is used as padding when the input length is not a multiple of 3 bytes.
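To make the mapping concrete, here is a small JavaScript sketch that encodes the three bytes of the string "Man" by hand, exactly as described above, and compares the result with the built-in method.
// Encode the 3 bytes of "Man" by hand: one 24-bit group, four 6-bit indices.
const alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const bytes = new Uint8Array([77, 97, 110]); // "M", "a", "n"
const group = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2]; // 24-bit group
const manual = alphabet[(group >> 18) & 63] + alphabet[(group >> 12) & 63]
             + alphabet[(group >> 6) & 63] + alphabet[group & 63];
console.log(manual);           // "TWFu"
console.log(bytes.toBase64()); // "TWFu" (in browsers with the new API)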
How fast can they be?
Suppose that you could consume 3 input bytes and produce 4 output bytes per CPU cycle. At 4.5 GHz, you would encode to Base64 at 13.5 GB/s (counting input bytes). We expect lower performance in the other direction: when encoding, any input is valid, since any binary data will do, but when decoding, we must handle errors and skip spaces.
I wrote an in-browser benchmark. You can try it out in your favorite browser.
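For illustration, here is a minimal sketch of how one might time the encoder on a 64 KiB block in the browser; a real benchmark would add warm-up runs and repeat the measurement.
// Minimal sketch: time Uint8Array.prototype.toBase64 on a 64 KiB random block.
const block = new Uint8Array(64 * 1024);
crypto.getRandomValues(block);
const iterations = 10000;
let encoded = "";
const start = performance.now();
for (let i = 0; i < iterations; i++) {
  encoded = block.toBase64();
}
const seconds = (performance.now() - start) / 1000;
const gbps = block.length * iterations / seconds / 1e9;
console.log(`encoded ${encoded.length} chars, ${gbps.toFixed(2)} GB/s`);
// Decoding can be timed the same way with Uint8Array.fromBase64(encoded).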
I decided to try it out on my Apple M4 processor, to see how fast the various browsers are. I use blocks of 64 KiB. The speed is measured with respect to the binary data.
| browser | encoding speed | decoding speed |
|---|---|---|
| Safari | 17 GB/s | 9.4 GB/s |
| SigmaOS | 17 GB/s | 9.4 GB/s |
| Chrome | 19 GB/s | 4.6 GB/s |
| Edge | 19 GB/s | 4.6 GB/s |
| Brave | 19 GB/s | 4.6 GB/s |
| Servo | 0.34 GB/s | 0.40 GB/s |
| Firefox | 0.34 GB/s | 0.40 GB/s |
Safari seems to have slightly slower encoding speed than the Chromium browsers (Chrome, Edge, Brave), but it is about twice as fast at decoding. Servo and Firefox have similarly poor performance, with the unexpected result that their decoding is faster than their encoding. I could have tried other browsers, but most are derivatives of Chromium or WebKit.
For context, the disk of a good laptop can sustain over 3 GB/s of read or write speed. Some high-end laptops have disks that are faster than 5 GB/s. In theory, your Wi-Fi connection might get close to 5 GB/s with Wi-Fi 7. Some Internet providers might offer similar network speeds, although your Internet connection is likely several times slower.
The speeds on most browsers are faster than you might naively guess. They are faster than networks or disks.
Note. The slower decoding speed on Chromium-based browsers appears to come from the V8 JavaScript engine, which first decodes the string into a temporary buffer before copying from the temporary buffer to the final destination. (See BUILTIN(Uint8ArrayFromBase64) in v8/src/builtins/builtins-typed-array.cc.)
Note. Denis Palmeiro from Mozilla let me know that upcoming changes in Firefox will improve the performance of the Base64 functions. In my tests with Firefox Nightly, performance is up by about 20%.
2025-11-29 10:39:56

Whenever I say I dislike debugging and organize my programming habits around avoiding it, there is always pushback: “You must not use a good debugger.”
To summarize my view: I want my software to be antifragile (credit to Nassim Taleb for the concept). The longer I work on a codebase, the easier it should become to fix bugs.
Each addition to a piece of code can be viewed as a stress. If nothing is done, the code gets slightly worse: harder to maintain, more prone to bugs. Thankfully, you can avoid such an outcome.
That’s not natural. For most developers lacking deep expertise, as the codebase grows, bugs become harder to fix: you chase symptoms through layers of code, hunt heisenbugs that vanish in the debugger, or fix one bug only to create another. The more code you have, the worse it gets. Such code is fragile. Adding a new feature risks breaking old, seemingly unrelated parts.
In my view, the inability to produce antifragile code explains the extreme power-law distribution in programming: most of the code we rely on daily was written by a tiny fraction of all programmers who have mastered antifragility.
How do you reverse this? How do you ensure that the longer you work on the code, the shallower the bugs become?
There are well-known techniques, and adding lots of tests and checks definitely helps. You can write antifragile code without tests or debug-time checks… but you’ll need something functionally equivalent.
Far-reaching prescriptions (“you must use language X, tool Y, method Z”) are usually cargo-cult nonsense. Copying Linus Torvalds’ tools or swearing style won’t guarantee success. But antifragility is not a prescription; it is a desired outcome.
Defensive programming itself is uncontroversial—yet it wasn’t common in the 1980s and still isn’t the default for many today.
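To make the idea concrete, here is a small, hypothetical JavaScript sketch of the kind of defensive checks I have in mind; the function and its checks are purely illustrative.
// Hypothetical example of defensive checks: fail loudly and early on bad input
// instead of letting a silent NaN propagate through the program.
function assertThat(condition, message) {
  if (!condition) { throw new Error(`assertion failed: ${message}`); }
}

function average(values) {
  assertThat(Array.isArray(values), "values must be an array");
  assertThat(values.length > 0, "values must not be empty");
  let sum = 0;
  for (const v of values) {
    assertThat(Number.isFinite(v), `expected a finite number, got ${v}`);
    sum += v;
  }
  return sum / values.length;
}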
Of course, a full defensive approach isn’t always applicable or worth the cost.
For example, if I’m vibe-coding a quick web app with more JavaScript than I care to read, I’ll just run it in the browser’s debugger. It works fine. I’m not using that code to control a pacemaker, and I’m not expecting to be woken up at midnight on Christmas to fix it.
If your program is 500 lines and you’ll run it 20 times a year, antifragility isn’t worth pursuing.
Large language models can generate defensive code, but if you’ve never written defensively yourself and you learn to program primarily with AI assistance, your software will probably remain fragile.
You can add code quickly, but the more you add, the bigger your problems become.
That’s the crux of the matter. Writing code was never the hard part—I could write code at 12, and countless 12-year-olds today can write simple games and apps. In the same way, a 12-year-old can build a doghouse with a hammer, nails, and wood. Getting 80% of the way has always been easy.
Scaling complexity without everything collapsing—that’s the hard part.
2025-11-24 07:09:59

I maintain a few widely used libraries that have optimized code paths based on the specific processor being used. We started supporting Loongson processors in recent years, but I did not have access to a Loongson processor until now. To my knowledge, they are not widely distributed in North America. This made it difficult for me to do any performance tuning. Thankfully, kind people from the Loongson Hobbyists’ Community helped me acquire a small computer with a Loongson processor.
My understanding is that Loongson processors serve to reduce China’s dependence on architectures like x64 and ARM. They use their own proprietary architecture called LoongArch. These processors have two generations of SIMD (single instruction, multiple data) vector extensions designed for parallel processing: LSX and LASX. LSX (Loongson SIMD Extension) provides 128-bit wide vector registers and instructions roughly comparable to ARM NEON or early x64 SSE extensions. LASX (Loongson Advanced SIMD Extension), which first appeared in the Loongson 3A5000 (2021), is the 256-bit successor, somewhat comparable to the AVX/AVX2 extensions present in most x64 (Intel and AMD) processors.
The LoongArch architecture is not yet universally supported. You can run most of Linux (Debian), but Visual Studio Code cannot ssh into a LoongArch system although there is community support in VSCodium. However, recent versions of the GCC and LLVM compilers support LoongArch.
My Loongson-3A6000 processor supports both LASX and LSX. However, I do not know how to do runtime dispatching under LoongArch: checking whether LASX is supported while the program is running and switching it on dynamically. I can force the compiler to use LASX (by compiling with -march=native), but my early experiments show that LASX routines are no faster than LSX routines… possibly a sign of poor optimization on our part.
I decided to run some tests to see how this Chinese processor compares with a relatively recent Intel processor (Ice Lake). The comparison is not meant to be fair. The Ice Lake processor is somewhat older but it is an expensive server-class processor. Further, the code that I am using is likely to have been tuned for x64 processors much more than for Loongson processors. I am also not trying to be exhaustive: I just want a broad idea.
Let us first consider number parsing. My test is reproducible.
git clone https://github.com/lemire/simple_fastfloat_benchmark.git
cd simple_fastfloat_benchmark
cmake -B build
cmake --build build
./build/benchmarks/benchmark # use sudo for perf counters
This will parse random numbers. I focus on the fast_float results. I use GCC 15 in both instances.
| processor | instructions/float | ins/cycle | GHz |
|---|---|---|---|
| Loongson-3A6000 | 377 | 4.92 | 2.50 |
| Xeon Gold 6338 | 295 | 5.07 | 3.19 |
So the Loongson-3A6000 retires about as many instructions per cycle as the Intel processor. However, it requires more instructions and its clock frequency is lower. So the Intel processor wins this round.
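To put the table in more familiar units: at 2.50 GHz and 4.92 instructions per cycle, the Loongson-3A6000 retires about 12.3 billion instructions per second, so at 377 instructions per float it parses roughly 33 million floats per second; the same arithmetic gives the Xeon Gold 6338 roughly 55 million floats per second.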
What if we replace the fast_float function with abseil’s number parsing (from Google)? I find that both processors are broadly comparable, except for the clock frequency.
| processor | instructions/float | ins/cycle | GHz |
|---|---|---|---|
| Loongson-3A6000 | 562 | 4.42 | 2.50 |
| Xeon Gold 6338 | 571 | 5.08 | 3.19 |
Intel still wins due to the higher frequency, but by a narrower margin.
I wanted to test the Loongson processor on SIMD intensive tasks. So I used the simdutf library to do some string transcoding.
git clone https://github.com/simdutf/simdutf.git
cd simdutf
cmake -B build -D SIMDUTF_BENCHMARKS=ON
cmake --build build --target benchmark
./build/benchmarks/benchmark -P utf8_to_utf16le -F README.md
# use sudo for perf counters
My results are as follows, depending on which instructions are used. The Intel processor has three options (128-bit with SSSE3, 256-bit with AVX2 and 512-bit with AVX-512) while the Loongson processor has two options (128-bit with LSX and 256-bit with LASX).
| processor | ins/byte | ins/cycle | GHz |
|---|---|---|---|
| Loongson-3A6000 (LSX) | 0.562 | 2.633 | 2.50 |
| Loongson-3A6000 (LASX) | 0.390 | 1.549 | 2.50 |
| Xeon Gold 6338 (SSSE3) | 0.617 | 5.07 | 3.236 |
| Xeon Gold 6338 (AVX2) | 0.364 | 2.625 | 3.19 |
| Xeon Gold 6338 (AVX-512) | 0.271 | 1.657 | 3.127 |
Roughly speaking, the Loongson transcodes a simple ASCII file (the README.md file) at 10 GB/s, whereas the Intel processor does it at slightly more than 20 GB/s.
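If you are wondering where the 10 GB/s figure comes from, it follows from the table: the speed in bytes per second is the clock frequency times the ratio of instructions per cycle to instructions per byte. For the Loongson with LASX, that is 2.50 GHz × (1.549 ÷ 0.390) ≈ 9.9 GB/s, or roughly 10 GB/s.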
Overall, I find these results quite good for the Loongson processor.
The folks at Chips and Cheese have a more extensive review. They put the Chinese processor somewhere between the first AMD Zen processors and the AMD Zen 2 processors on a per core basis. The AMD Zen 2 processors power current gaming consoles such as the PlayStation 5. Chips and Cheese concluded “Engineers at Loongson have a lot to be proud of”: I agree.