2026-01-02 00:02:07
How are you, hacker?
🪐 What’s happening in tech today, January 1, 2026?
The HackerNoon Newsletter brings the HackerNoon homepage straight to your inbox. On this day, Ireland was united with Great Britain to form the United Kingdom in 1801, the Euro became the official currency of 12 European countries in 2002, the first transcontinental phone call was made in 1915, and we present you with these top quality stories. From We Asked 14 Tech Bloggers Why They Write. Here's What They Said to The 10 Most Interesting C# Bugs We Found in Open Source in 2025, let's dive right in.

By @scynthiadunlop [ 14 Min read ] 14 expert tech bloggers share why they started writing and why they continue. Read More.

By @akiradoko [ 15 Min read ] If you'd like to check whether your project has similar issues, now's the time to use a static analyzer. Read More.

By @tigranbs [ 8 Min read ] Y Combinator reports that 25% of its W25 batch has codebases that are 95% AI-generated. Read More.

By @dmytrospilka [ 4 Min read ] Could it be time for pension savers to embrace the boom by making the switch to a SIPP? Read More.
🧑‍💻 What happened in your world this week?
It's been said that writing can help consolidate technical knowledge, establish credibility, and contribute to emerging community standards. Feeling stuck? We got you covered ⬇️⬇️⬇️
ANSWER THESE GREATEST INTERVIEW QUESTIONS OF ALL TIME
We hope you enjoy this week's worth of free reading material. Feel free to forward this email to a nerdy friend who'll love you for it. See you on Planet Internet! With love, The HackerNoon Team ✌️

2026-01-01 15:50:54
Last month, I spent 3 hours trying to write a decent cold email template.
Three. Whole. Hours.
The AI kept spitting out generic garbage that sounded like every other “Hey [FIRST_NAME], hope this email finds you well”…
Then I changed one thing in my prompt.
One thing.
Suddenly, the AI was writing emails that actually sounded human, referenced specific connection points, and had personality.
My reply rate jumped tremendously!
That moment?
That’s when prompt engineering stopped feeling like a skill and started feeling like almost cheating.
Here’s the thing about prompt engineering: the idea is obvious. It’s about getting really, really good at asking for exactly what you want.
Most of us suck at it, because it’s not as easy as it sounds.
It clicked when I started to build this site using Cursor.
My first attempts were disasters:
"Create my homepage and style with stunning and aesthetic visuals"
Generic, ugly, messy code nobody would ever be able to customize. 🤮
"You're a senior web designer and developer with deep knowledge of UI/UX. You are building my personal blog with me, a good fellow unfamiliar with our codebase (Astro Framework). Based on Astro conventions and best practices, create practical assets and components, like UI elements and sections, as Astro files. The final result should be a template that experienced developers could use and customize easily..."
Actually useful and clean Astro files, at least better and more organized than before. (The CSS files are still meh, though.) 😅

The difference? I stopped asking the AI to write generic code and started asking it to be an experienced developer helping a colleague build his humble project.
I used to write prompts like I was asking a favor: “Could you please maybe help me write a blog post about SEO?”.
Now I’m direct: “Write a 1,200-word blog post for marketing developers who want to understand technical SEO. Include code examples and explain why site speed actually matters for conversion rates, not just rankings.”
The AI doesn’t have feelings. It has algorithms. Feed those algorithms exactly what they need.
"Write a LinkedIn post about growth marketing."
"I'm a Marketing Engineer at a YC startup. Write a LinkedIn post sharing one specific growth hack I discovered while scaling our user base from 1K to 10K. Make it tactical, not theoretical. My audience is other growth marketers and technical founders."
The second prompt works because the AI knows:
Instead of saying “write in a conversational tone,” I give examples:
"Write like this: Here's the thing nobody talks about with A/B testing: most marketers get so excited about statistical significance that they forget to check if the difference actually matters. I've seen teams celebrate a 2% lift on a metric that generates $50/month. Congrats, you just spent three weeks optimizing for an extra dollar a month"
The AI learns from the example and matches that specific style.
Counterintuitive but true: the more constraints you give, the more creative the output.
"Help me with marketing automation."
"I need a 7-email drip sequence for SaaS trial users who haven't logged in after day 3. Each email should be under 100 words, focus on one specific feature, include a clear and valuable CTA, and sound like it's coming from a helpful teammate, not a sales robot."
Constraints force creativity within boundaries.
My best prompts are never first drafts. I treat prompt engineering like optimizing ad copy (test, measure, refine, repeat).
First attempt usually gets me 60% of what I want. Then I say:
Each iteration gets closer to perfect.
Here’s why prompt engineering feels like cheating: I’m getting expert-level outputs on topics I’m still learning about.
I needed to ship a free Astro template. Instead of spending hours reading documentation, I just:
Here’s what I’ve learned being caught between marketing and engineering teams: both sides are already using AI, but they’re using it differently.
Marketers use AI for content: social posts, email copy, blog outlines.
Engineers use AI for code: debugging, documentation, optimization.
As a Marketing Engineer, I am trying to use AI to translate between worlds:
The prompt engineering skills transfer directly. Whether I'm asking AI to debug a Python script or write an email sequence, it's the same core skill: being incredibly specific about what I want.
Prompt engineering isn’t actually about AI. It’s about getting incredibly good at articulating exactly what you want.
That’s why I believe that to get better we need to keep learning, reading, and discovering, always. And writing our thoughts down somewhere.
This is exactly how I built this blog, by applying prompt engineering to create content that ranks well and helps readers.
And the specificity skill will transfer everywhere:
So yes, prompting well feels like cheating.
This is just the latest one.
What’s your best prompting win?
Want to see prompt engineering in action? Check out how I used these techniques to build this blog with perfect SEO scores and create content that ranks.
2026-01-01 15:23:20
This document was adapted from Dagger Directives for a monorepo that uses Bazel Build, and is provided for ease of use in other organizations.
This document is extensive, and while each directive is simple, the broader architecture they promote may be unclear; therefore, an end-to-end example is provided to aid comprehension, and the underlying architectural rationale is provided to link the individual directives to broader engineering principles. The sections are presented for ease of reference, with directives first; however, readers are encouraged to begin with whichever section they find most helpful.
The following definitions apply throughout this document:
Directives for how components are defined, scoped, and related to one another.
Libraries and generic utilities should provide components that expose their functionality and declare their component dependencies instead of only providing raw classes/interfaces.
Positive Example: A Networking library provides a NetworkingComponent that exposes an OkHttpClient binding and depends on a CoroutinesComponent.
Negative Example: A Networking library that provides various interfaces and classes, but no component, and requires downstream consumers to define modules and components to wire them together.
This approach transforms Dagger components from details of the downstream application into details of upstream libraries. Instead of forcing consumers to understand a library's internal structure (and figure out how to instantiate objects), library authors provide complete, ready-to-use components that can be composed together and used to instantiate objects. This approach is analogous to plugging in a finished appliance instead of assembling a kit of parts: consumers just declare a dependency on the component (e.g. a fridge), supply the upstream components (e.g. electricity), and get the fully configured objects they need without ever seeing the wiring (e.g. cold drinks). This approach scales well, at the cost of more boilerplate.
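For illustration, a minimal sketch of the positive example above, following the directives later in this document (OkHttpClient, NetworkingModule, NetworkingScope, CoroutinesComponent, and the coroutinesComponent() factory are assumed to exist and are named here only for illustration):
/** Naked component exposed by the Networking library. */
interface NetworkingComponent {
    fun okHttpClient(): OkHttpClient
}
/** Production implementation; depends on the upstream CoroutinesComponent. */
@NetworkingScope
@Component(dependencies = [CoroutinesComponent::class], modules = [NetworkingModule::class])
interface ProdNetworkingComponent : NetworkingComponent {
    @Component.Builder
    interface Builder {
        fun consuming(coroutines: CoroutinesComponent): Builder
        fun build(): ProdNetworkingComponent
    }
}
/** Factory function with a production default for the component dependency. */
fun networkingComponent(
    coroutines: CoroutinesComponent = coroutinesComponent()
): NetworkingComponent =
    DaggerProdNetworkingComponent.builder().consuming(coroutines).build()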
Components should export a minimal set of bindings, accept only the dependencies they require to operate (i.e. with @BindsInstance), and depend only on the components they require to operate.
Positive Example: A Feature component that depends only on Network and Database components, exposes only its public API (e.g. FeatureUi), and keeps its internal bindings hidden.
Negative Example: A Feature component that depends on a monolithic App component (which itself goes against the practice), exposes various bindings that could exist in isolation (e.g. FeatureUi, Clock, NetworkPorts and RpcBridge, IntegerUtil), and exposes its internal bindings.
This allows consumers to compose functionality with granular precision, reduces unnecessary configuration (i.e. passing instances/dependencies that are not used at runtime), and optimizes build times. This approach is consistent with the core tenets of the Interface Segregation Principle in that it ensures that downstream components can depend on the components they need, without being forced to depend on unnecessary components.
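A hedged sketch of the positive example, where FeatureUi, NetworkComponent, DatabaseComponent, FeatureModule, and FeatureScope are hypothetical names used only to illustrate the minimal surface:
interface FeatureComponent {
    // Only the public API is exposed; internal bindings stay hidden.
    fun featureUi(): FeatureUi
}
@FeatureScope
@Component(
    dependencies = [NetworkComponent::class, DatabaseComponent::class],
    modules = [FeatureModule::class]
)
interface ProdFeatureComponent : FeatureComponent
// Internal types (repositories, mappers, caches, ...) are bound in FeatureModule
// but never exposed as provision functions on the component.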
Components should be defined as plain interfaces ("naked interfaces") without Dagger annotations, and then extended by annotated interfaces for production, testing, and other purposes. Downstream components should target the naked interfaces in their component dependencies instead of the annotated interfaces.
Example:
// Definition
interface FooComponent {
fun foo(): Foo
}
// Production Implementation
@Component(modules = [FooModule::class]) interface ProdFooComponent : FooComponent
// Testing Implementation
@Component(modules = [FakeFooModule::class]) interface TestFooComponent : FooComponent {
fun fakeFoo(): FakeFoo
}
@Component(dependencies = [FooComponent::class])
interface BarComponent {
@Component.Builder
interface Builder {
fun consuming(fooComponent: FooComponent): Builder
fun build(): BarComponent
}
}
This ensures Dagger code follows general engineering principles (separation of interface and implementation). While Dagger components are interfaces, the presence of a `@Component` annotation implicitly creates an associated implementation (the generated Dagger code); therefore, depending on an annotated component forces a dependency on its implementation (at the build system level), and implicitly forces test code to depend on production code. By separating them, consumers can depend on a pure interface without needing to include the Dagger implementation in their class path, thereby preventing leaky abstractions, optimising build times, and directly separating production and test code into discrete branches.
Components must be bound to a custom Dagger scope.
Example:
@FooScope
@Component
interface ProdFooComponent : FooComponent {
fun foo(): Foo
}
Unscoped bindings can lead to subtle bugs where expensive objects are recreated or shared state is lost. Explicit lifecycle management ensures objects are retained only as long as needed, thereby preventing these issues.
Components must only include modules defined within their own package or its subpackages; however, they must never include modules from a subpackage if another component is defined in an intervening package.
Example:
Given the following package structure:
src
├── a
│   ├── AComponent
│   ├── AModule
│   ├── sub1
│   │   └── Sub1Module
│   └── sub2
│       ├── Sub2Component
│       └── sub3
│           └── Sub3Module
└── b
    └── BModule
AComponent may include AModule (same package) and Sub1Module (subpackage with no intervening component), but not Sub3Module (intervening Sub2Component in a.sub2) or BModule (not a subpackage of a).
This enforces strict architectural layering and prevents dependency cycles (spaghetti code), thereby ensuring proper component boundaries and maintainability.
Component dependencies should be used instead of subcomponents.
Example: Foo depends on Bar via @Component(dependencies = [Bar::class]) rather than using @Subcomponent.
While subcomponents are a standard feature of Dagger, prohibiting them favors a flat composition-based component graph, thereby reducing cognitive load, allowing components to be tested in isolation, and creating a more scalable architecture.
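To make the contrast concrete, a minimal sketch of the flat, composition-based wiring this directive favors (FooComponent, BarComponent, FooModule, FooScope, and the factory functions are hypothetical):
// Flat composition: Foo consumes Bar as a component dependency, not as a parent.
@FooScope
@Component(dependencies = [BarComponent::class], modules = [FooModule::class])
interface ProdFooComponent : FooComponent {
    @Component.Builder
    interface Builder {
        fun consuming(bar: BarComponent): Builder
        fun build(): ProdFooComponent
    }
}
// Each graph is assembled independently, so each can also be tested in isolation:
val bar: BarComponent = barComponent()
val foo: FooComponent = fooComponent(bar = bar)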
Components may depend on components from any package.
Example: Foo in a.b can depend on Bar in x.y.z.
Allowing components to depend on each other regardless of location promotes reuse, thereby fostering high cohesion within packages.
Components must include the suffix Component in their name.
Positive example: ConcurrencyComponent
Negative example: Concurrency
This clearly distinguishes the component interface from the functionality it provides and prevents naming collisions.
The name of the custom scope associated with a component must inherit the name of the component (minus "Component") with "Scope" appended.
Example: FooComponent is associated with FooScope.
Consistent naming allows contributors to immediately associate a scope with its component, thereby preventing conflicts and reducing split-attention effects.
Component builders must be called `Builder`.
Example:
@Component
interface FooComponent {
@Component.Builder
interface Builder {
@BindsInstance fun binding(bar: Bar): Builder
fun build(): FooComponent
}
}
Standardizing builder names allows engineers to predict the API surface of any component, thereby reducing the mental overhead when switching between components.
Component builder functions that bind instances must be called `binding`; however, when bindings use qualifiers, the qualifier must be appended.
Example:
@Component
interface ConcurrencyComponent {
@Component.Builder
interface Builder {
// Unqualified
@BindsInstance fun binding(bar: Bar): Builder
// Qualified
@BindsInstance fun bindingIo(@Io scope: CoroutineScope): Builder
@BindsInstance fun bindingMain(@Main scope: CoroutineScope): Builder
fun build(): ConcurrencyComponent
}
}
Explicit naming immediately clarifies the mechanism of injection (instance binding vs component dependency), thereby preventing collisions when binding multiple instances of the same type.
Component builder functions that set component dependencies must be called `consuming`.
Example:
@Component(dependencies = [Bar::class])
interface FooComponent {
@Component.Builder interface Builder {
fun consuming(bar: Bar): Builder
fun build(): FooComponent
}
}
Distinct naming clearly separates structural dependencies (consuming) from runtime data (binding), thereby making the component's initialization logic self-documenting.
Component provision functions must be named after the type they provide (in camelCase). However, when bindings use qualifiers, the qualifier must be appended to the function name.
Example:
@Component
interface FooComponent {
// Unqualified
fun bar(): Bar
// Qualified
@Io fun genericIo(): Generic
@Main fun genericMain(): Generic
}
This ensures consistency and predictability in the component's public API.
Requirements for the factory functions that instantiate components for ease of use.
Components must have an associated factory function that instantiates the component.
Example:
@Component(dependencies = [Quux::class])
interface FooComponent { /* ... */ }
fun fooComponent(quux: Quux = DaggerQuux.create(), qux: Qux): FooComponent =
    DaggerFooComponent.builder()
        .consuming(quux)
        .binding(qux)
        .build()
This integrates cleanly with Kotlin, thereby significantly reducing the amount of manual typing required to instantiate components.
Exception: Components that are file private may exclude the factory function (e.g. components defined in tests for consumption in the test only).
Factory functions must supply default arguments for parameters that represent component dependencies.
Example: fun fooComponent(quux: Quux = DaggerQuux.create(), ...)
Providing defaults for dependencies allows consumers to focus on the parameters that actually vary, thereby improving developer experience and reducing boilerplate.
The default arguments for component dependency parameters in factory functions should be production components, even when the component being assembled is a test component.
Example: fun testFooComponent(quux: Quux = DaggerQuux.create(), ...)
This ensures tests exercise real production components and behaviours as much as possible, thereby reducing the risk of configuration drift between test and production environments.
Factory functions should be defined as top-level functions in the same file as the component.
Example: fooComponent() function in same file as FooComponent interface.
Co-locating the factory with the component improves discoverability.
Factory function names should match the component, but in lower camel case.
Example: FooComponent component has fun fooComponent(...) factory function.
This ensures factory functions can be matched to components easily.
Factory functions should supply default arguments for parameters that do not represent component dependencies (where possible).
Example: fun fooComponent(config: Config = Config.DEFAULT, ...)
Sensible defaults allow consumers to only specify non-standard configuration when necessary, thereby reducing cognitive load.
Directives regarding Dagger modules and their placement in build targets.
Modules must be defined in separate build targets from the objects they provide/bind.
Example: BarModule in separate build target from Baz implementation.
Separating implementation from interface/binding prevents changing an implementation from invalidating the cache of every consumer of the interface, thereby improving build performance. Additionally, it ensures consumers can depend on individual elements independently (crucial for Hilt) and allows granular binding overrides in tests.
Modules must depend on interfaces rather than implementations.
Example: BarModule depends on Baz interface, not BazImpl.
This enforces consistency with the dependency inversion principle, thereby decoupling the module and its bindings from concrete implementations.
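A minimal sketch of the example above, under the assumption that BarModule provides a Bar that needs a Baz (Bar, Baz, and the constructor shown are hypothetical):
@Module
interface BarModule {
    companion object {
        // The provider depends on the Baz interface; the concrete BazImpl
        // is bound elsewhere and lives in a separate build target.
        @Provides
        fun bar(baz: Baz): Bar = Bar(baz)
    }
}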
Patterns for defining components used in testing to ensure testability.
Test components must extend production components.
Example: interface TestFooComponent : FooComponent
Tests should operate on the same interface as production code (Liskov Substitution), thereby ensuring that the test environment accurately reflects production behavior.
Test components should export additional bindings.
Example: TestFooComponent component extends FooComponent and additionally exposes fun testHelper(): TestHelper.
Exposing test-specific bindings allows tests to inspect internal state or inject test doubles without compromising the public production API, thereby facilitating white-box testing where appropriate.
The directives in this document work together to promote an architectural pattern for Dagger that follows foundational engineering best practices and principles, which in turn supports sustainable development and improves the contributor experience. The core principles are:
Overall, this architecture encourages and supports granular, maintainable components that can be evolved independently and composed together into complex structures. Components serve as the public API for utilities, the integration system that ties elements together within utilities, and the composition system that combines utilities together. For upstream utility maintainers, this reduces boilerplate and the risk of errors; for downstream utility consumers, it creates an unambiguous and self-documenting API that can be integrated without knowledge of implementation details; and for everyone, it distributes complexity across the codebase and promotes high cohesion (i.e. components defined nearest to the objects they expose). Altogether, this fosters sustainable development by reducing cognitive and computational load.
The disadvantages of this approach and a strategy for mitigation are discussed in the [future work](#future-work) appendix.
The following example demonstrates a complete Dagger setup and usage that adheres to all the directives in this document. It features upstream (User) and downstream (Profile) components, separate modules for production and testing (including fake implementations), and strict separation of interface and implementation via naked component interfaces.
Common elements:
/** Custom Scope */
@Scope @Retention(AnnotationRetention.RUNTIME)
annotation class UserScope
/** Domain Interface */
interface User
/** Naked Component */
interface UserComponent {
fun user(): User
}
Production elements:
/** Real Implementation */
@UserScope class RealUser @Inject constructor() : User
/** Production Module */
@Module
interface UserModule {
@Binds
fun bind(impl: RealUser): User
companion object {
@Provides
fun provideTimeout() = 5000L
}
}
/** Production Component */
@UserScope
@Component(modules = [UserModule::class])
interface ProdUserComponent : UserComponent {
@Component.Builder
interface Builder {
fun build(): ProdUserComponent
}
}
/** Production Factory Function */
fun userComponent(): UserComponent = DaggerProdUserComponent.builder().build()
Test elements:
/** Fake Implementation */
@UserScope class FakeUser @Inject constructor() : User
/** Fake Module */
@Module
interface FakeUserModule {
@Binds
fun bind(impl: FakeUser): User
}
/** Test Component */
@UserScope
@Component(modules = [FakeUserModule::class])
interface TestUserComponent : UserComponent {
fun fakeUser(): FakeUser
@Component.Builder
interface Builder {
fun build(): TestUserComponent
}
}
/** Test Factory Function */
fun testUserComponent(): TestUserComponent = DaggerTestUserComponent.builder().build()
Common elements:
/** Custom Scope */
@Scope @Retention(AnnotationRetention.RUNTIME)
annotation class ProfileScope
/** Domain Interface */
interface Profile
/** Naked Component */
interface ProfileComponent {
fun profile(): Profile
}
Production elements:
/** Real Implementation */
@ProfileScope class RealProfile @Inject constructor(
val user: User,
private val id: ProfileId
) : Profile {
data class ProfileId(val id: String)
}
/** Production Module */
@Module
interface ProfileModule {
@Binds
fun bind(impl: RealProfile): Profile
}
/** Production Component */
@ProfileScope
@Component(dependencies = [UserComponent::class], modules = [ProfileModule::class])
interface ProdProfileComponent : ProfileComponent {
@Component.Builder
interface Builder {
fun consuming(user: UserComponent): Builder
@BindsInstance fun binding(id: ProfileId): Builder
fun build(): ProdProfileComponent
}
}
/** Production Factory Function */
fun profileComponent(
user: UserComponent = userComponent(),
id: ProfileId = ProfileId("prod-id")
): ProfileComponent = DaggerProdProfileComponent.builder().consuming(user).binding(id).build()
Test elements:
/** Test Component */
@ProfileScope
@Component(dependencies = [UserComponent::class], modules = [ProfileModule::class])
interface TestProfileComponent : ProfileComponent {
@Component.Builder
interface Builder {
fun consuming(user: UserComponent): Builder
@BindsInstance fun binding(id: ProfileId): Builder
fun build(): TestProfileComponent
}
}
/** Test Factory Function */
fun testProfileComponent(
user: UserComponent = userComponent(),
id: ProfileId = ProfileId("test-id")
): TestProfileComponent = DaggerTestProfileComponent.builder().consuming(user).binding(id).build()
Example of production component used in production application:
class Application {
fun main() {
// Automatically uses production implementations (RealUser, RealProfile)
val profile = profileComponent().profile()
// ...
}
}
Example of production profile component used with test user component in a test:
@Test
fun testProfileWithFakeUser() {
// 1. Setup: Create the upstream test component (provides FakeUser)
val fakeUserComponent = testUserComponent()
val fakeUser = fakeUserComponent.fakeUser()
// 2. Act: Inject it into the downstream test component
val prodProfileComponent = profileComponent(user = fakeUserComponent)
val profile = prodProfileComponent.profile()
// 3. Assert: Verify integration
assertThat(profile.user).isEqualTo(fakeUser)
}
The main disadvantage of the pattern this document encodes is the need for a final downstream assembly of components, which can become boilerplate heavy in deep graphs. For example:
fun main() {
// Level 1: Base component
val core = coreComponent()
// Level 2: Depends on Core
val auth = authComponent(core = core)
val data = dataComponent(core = core)
// Level 3: Depends on Auth, Data, AND Core
val feature = featureComponent(auth = auth, data = data, core = core)
// Level 4: Depends on Feature, Auth, AND Core
val app = appComponent(feature = feature, auth = auth, core = core)
}
A tool to reduce this boilerplate has been designed, and implementation is tracked by this issue.
2026-01-01 15:20:01
After spending months studying transformer architectures and building LLM applications, I realized something: most explanations are either overwhelming or leave out important details. This article is my attempt to bridge that gap — explaining transformers the way I wish someone had explained them to me.
For an intro to what a Large Language Model (LLM) is, refer to this article I published previously.
By the end of this lesson, you will be able to look at any LLM architecture diagram and understand what is happening.
This is not just academic knowledge — understanding the Transformer architecture will help you make better decisions about model selection, optimize your prompts, and debug issues when your LLM applications behave unexpectedly.
How to Read This Lesson: You don't need to absorb everything in one read. Skim first, revisit later—this lesson is designed to compound over time. The concepts build on each other, so come back as you need deeper understanding.
Don't worry if some of these terms sound unfamiliar—we'll explain each concept step by step, starting with the basics. By the end of this lesson, these technical terms will make perfect sense, even if you're new to machine learning architecture.
Let's start with a simple analogy. Imagine you're reading a book and trying to understand a sentence:
"The animal didn't cross the street because it was too tired."
To understand this, your brain does several things:
A Transformer does something remarkably similar, but using math. Let me give you a simple explanation of how it works:
What goes in: Text broken into pieces (called tokens)
What's a token? Think of tokens as the basic building blocks that language models understand:
What happens inside: The model processes this text through several stages (we'll explore each in detail):
What comes out: Depends on what you need:
Think of a Transformer like an assembly line where each station refines the product. Raw materials (words) enter, each station adds something (position info, relationships, meaning), and the final product emerges more polished at each step.
Here's how text flows through a Transformer:
The diagram shows how a simple sentence like "The cat sat on the mat" gets processed through the transformer architecture - from tokenization to final output. The key steps include embedding the tokens into vectors, adding positional information, applying self-attention to understand relationships between words, and repeating the attention and processing steps multiple times to refine understanding.
Modern LLMs repeat the attention and processing steps many times:
Now let's walk through each step in detail, starting from the very beginning.
Before the model can process text, it needs to solve two problems: breaking text into pieces (tokenization) and converting those pieces into numbers (embeddings).
The Problem: How do you break text into manageable chunks? You might think "just split by spaces into words," but that's too simple.
Why not just use words?
Consider these challenges:
The solution: Subword Tokenization
Modern models break text into subwords - pieces smaller than words but larger than individual characters. Think of it like Lego blocks: instead of needing a unique piece for every possible structure, you reuse common blocks.
Simple example:
Text: "I am playing happily"
Split by spaces (naive approach):
["I", "am", "playing", "happily"]
Problem: Need separate entries for "play", "playing", "played", "player", "plays"...
Subword tokenization (smart approach):
["I", "am", "play", "##ing", "happy", "##ly"]
Better: Reuse "play" and "##ing" for "playing", "running", "jumping"
Reuse "happy" and "##ly" for "happily", "sadly", "quickly"
Why this matters - concrete examples:
Real example of tokenization impact:
Input: "The animal didn't cross the street because it was tired"
Tokens (what the model actually sees):
["The", "animal", "didn", "'", "t", "cross", "the", "street", "because", "it", "was", "tired"]
Notice:
- "didn't" → ["didn", "'", "t"] (split to handle contractions)
- Each token gets converted to numbers (embeddings) next
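To make the subword idea concrete, here is a small, self-contained sketch of greedy longest-match subword tokenization over a made-up vocabulary. Real tokenizers such as BPE or WordPiece learn their vocabularies from data and differ in the details; this is only an illustration of the mechanics.
# Toy greedy longest-match subword tokenizer ("##" marks continuation pieces).
# The vocabulary here is invented purely for illustration.
VOCAB = {"I", "am", "play", "jump", "##ing", "##ed"}
def tokenize_word(word, vocab=VOCAB):
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Try the longest possible piece first, then shrink until something matches.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces are marked with "##"
            if piece in vocab:
                tokens.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # nothing matched: unknown word
        start = end
    return tokens
def tokenize(text):
    return [token for word in text.split() for token in tokenize_word(word)]
print(tokenize("I am playing jumping"))
# ['I', 'am', 'play', '##ing', 'jump', '##ing'] -- the same pieces get reused across words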
The Problem: Computers don't understand tokens. They only work with numbers. So how do we convert "cat" into something a computer can process?
Before we dive in, let's understand what "dimensions" mean with a familiar example:
Describing a person in 3 dimensions:
These 3 numbers (dimensions) give us a mathematical way to represent a person. Now, what if we want to represent a word mathematically?
Describing a word needs way more dimensions:
To capture everything about the word "cat", we need hundreds of numbers:
Modern models use 768 to 4096 dimensions because words are complex! But here's the key: you don't need to understand what each dimension represents. The model figures this out during training.
Let's walk through a concrete example:
# This is a simplified embedding table (real ones have thousands of words)
# Each word maps to a list of numbers (a "vector")
embedding_table = {
"cat": [0.2, -0.5, 0.8, ..., 0.1], # 768 numbers total
"dog": [0.3, -0.4, 0.7, ..., 0.2], # Notice: similar to "cat"!
"bank": [0.9, 0.1, -0.3, ..., 0.5], # Very different from "cat"
}
# When we input a sentence:
sentence = "The cat sat"
# Step 1: Break into tokens
tokens = ["The", "cat", "sat"]
# Step 2: Look up each token's vector
embedded = [
embedding_table["The"], # Gets: [0.1, 0.3, ..., 0.2] (768 numbers)
embedding_table["cat"], # Gets: [0.2, -0.5, ..., 0.1] (768 numbers)
embedding_table["sat"], # Gets: [0.4, 0.2, ..., 0.3] (768 numbers)
]
# Result: We now have 3 vectors, each with 768 dimensions
# The model can now do math with these!
Where does this table come from? The embedding table isn't written by hand. The numbers start out random and are gradually adjusted during training until words that appear in similar contexts end up with similar vectors.
These embeddings capture word relationships mathematically:
When we say GPT-3 has 175 billion parameters, where are they? A significant chunk lives in the embedding table.
What happens in the embedding layer:
Example: If "cat" = token #847, the model looks up row #847 in its embedding table and retrieves a vector like [0.2, -0.5, 0.7, …] with hundreds or thousands of numbers. Each of these numbers is a parameter that was optimized during training.
This is why embeddings contain so much "knowledge" - they encode the meaning and relationships between words that the model learned from massive amounts of text.
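As a hedged illustration of "similar words get similar vectors", here is a tiny cosine-similarity check on made-up 3-dimensional vectors (real embeddings have hundreds of dimensions and are learned during training, not written by hand):
import math
# Made-up, low-dimensional vectors purely for illustration.
embedding = {
    "cat":  [0.2, -0.5, 0.8],
    "dog":  [0.3, -0.4, 0.7],
    "bank": [0.9,  0.1, -0.3],
}
def cosine_similarity(a, b):
    # 1.0 means "pointing the same way", 0 means unrelated, negative means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
print(cosine_similarity(embedding["cat"], embedding["dog"]))   # high: similar meaning
print(cosine_similarity(embedding["cat"], embedding["bank"]))  # low: unrelated meaning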
The Problem: After converting words to numbers, we have another issue. Look at these two sentences:
They have the same words, just in different order. But right now, the model sees them as identical because it just has three vectors with no order information!
Real-world example:
Transformers process all words at the same time (unlike reading left-to-right), so we need to explicitly tell the model: "This is word #1, this is word #2, this is word #3."
Think of it like adding page numbers to a book. Each word gets a "position tag" added to its embedding.
Simple Example:
# We have our word embeddings from Step 1:
word_embeddings = [
[0.1, 0.3, 0.2, ...], # "The" (768 numbers)
[0.2, -0.5, 0.1, ...], # "cat" (768 numbers)
[0.4, 0.2, 0.3, ...], # "sat" (768 numbers)
]
# Now add position information:
position_tags = [
[0.0, 0.5, 0.8, ...], # Position 1 tag (768 numbers)
[0.2, 0.7, 0.4, ...], # Position 2 tag (768 numbers)
[0.4, 0.9, 0.1, ...], # Position 3 tag (768 numbers)
]
# Combine them (add the numbers together):
final_embeddings = [
[0.1+0.0, 0.3+0.5, 0.2+0.8, ...], # "The" at position 1
[0.2+0.2, -0.5+0.7, 0.1+0.4, ...], # "cat" at position 2
[0.4+0.4, 0.2+0.9, 0.3+0.1, ...], # "sat" at position 3
]
# Now each word carries both:
# - What the word means (from embeddings)
# - Where the word is located (from position tags)
The original Transformer paper used a mathematical pattern based on sine and cosine waves. You don't need to understand the math — just know that every position gets its own unique, smoothly varying tag, and the pattern makes it possible for the model to judge how far apart two words are.
Newer models like Llama and Mistral use an improved approach called RoPE (Rotary Position Embeddings).
Simple analogy: Think of a clock face with moving hands:
Word at position 1: Clock hand at 12 o'clock (0°)
Word at position 2: Clock hand at 1 o'clock (30°)
Word at position 3: Clock hand at 2 o'clock (60°)
Word at position 4: Clock hand at 3 o'clock (90°)
...
How this connects to RoPE: Just like the clock hands rotate to show different times, RoPE literally rotates each word's embedding vector based on its position. Word 1 gets rotated 0°, word 2 gets rotated 30°, word 3 gets rotated 60°, and so on. This rotation encodes position information directly into the word vectors themselves.
Why this works:
Why this matters in practice:
Key takeaway: Position encoding ensures the model knows "The cat sat" is different from "sat cat The". Without this, word order would be lost!
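Circling back to the sine/cosine pattern mentioned above, here is a minimal sketch of the original Transformer recipe, shortened to a handful of positions and dimensions so the numbers are easy to inspect (illustrative only):
import math
def sinusoidal_position_encoding(num_positions, dim):
    """Returns a num_positions x dim table of position tags (original Transformer recipe)."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            # Even indices use sine, odd indices use cosine, at geometrically spaced frequencies.
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
# Each row is the "position tag" added to the word embedding at that position.
tags = sinusoidal_position_encoding(num_positions=4, dim=8)
print([round(x, 2) for x in tags[1]])  # the tag for position 2 (index 1)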
This is the magic that makes Transformers work! Let's understand it with a story.
Imagine you're at a dinner party with 10 people. Someone mentions "Paris" and you want to understand what they mean:
Attention does exactly this for words in a sentence!
Let's process this sentence:
"The animal didn't cross the street because it was too tired."
When the model processes the word "it", it needs to figure out: What does "it" refer to?
Step 1: The word "it" asks questions
Step 2: All other words offer information
Step 3: "it" calculates relevance scores
Step 4: "it" gathers information The model now knows: "it" = mostly "animal" + a bit of "tired" + tiny bit of others
The model creates three versions of each word:
The matching process:
# Simplified example (real numbers would be 768-dimensional)
# Word "it" creates its Query:
query_it = [0.8, 0.3, 0.9] # Looking for: subject, noun, living thing
# Word "animal" has this Key:
key_animal = [0.9, 0.4, 0.8] # Offers: subject, noun, living thing
# How well do they match? Multiply and sum:
relevance = (0.8×0.9) + (0.3×0.4) + (0.9×0.8)
= 0.72 + 0.12 + 0.72
= 1.56 # High match!
# Compare with "street":
key_street = [0.1, 0.4, 0.2] # Offers: not-subject, noun, non-living thing
relevance = (0.8×0.1) + (0.3×0.4) + (0.9×0.2)
= 0.08 + 0.12 + 0.18
= 0.38 # Lower match
# Convert to percentages (this is what "softmax" does):
# "animal" gets 45%, "street" gets 8%, etc.
You might see this formula in papers:
Attention(Q, K, V) = softmax(Q × K^T / √d_k) × V
What it means in plain English:
Where it comes from: Researchers from Google Brain discovered in 2017 that this mathematical formula effectively models how words should pay attention to each other. It's inspired by information retrieval (like how search engines find relevant documents).
You don't need to memorize this! Just remember: attention = figuring out which words are related and gathering information from them.
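If you prefer to see the formula as code, here is a minimal NumPy sketch of scaled dot-product attention (the shapes and random numbers are illustrative; real models use learned Q/K/V projections and hundreds of dimensions):
import numpy as np
def softmax(x):
    # Subtract the max for numerical stability, then normalize so each row sums to 1.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q x K^T / sqrt(d_k)) x V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how well each query matches each key
    weights = softmax(scores)            # relevance scores as percentages
    return weights @ V, weights          # weighted mix of values, plus the weights
# Tiny example: 3 tokens, 4-dimensional Q/K/V.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row sums to 1.0: how much each token attends to the others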
Let's see attention in action with actual numbers:
Sentence: "The animal didn't cross the street because it was tired"
When processing "it", the attention mechanism calculates:
Word Relevance Score What This Means
─────────────────────────────────────────────────────────
"The" → 2% Article, not important
"animal" → 45% Main subject! Likely referent
"didn't" → 3% Verb helper, not the focus
"cross" → 5% Action, minor relevance
"the" → 2% Article again
"street" → 8% Object/location, somewhat relevant
"because" → 2% Connector word
"it" → 10% Self-reference (checking own meaning)
"was" → 8% Linking verb, somewhat relevant
"tired" → 15% State description, quite relevant
─────
Total = 100% (Scores sum to 100%)
Result: The model now knows "it" primarily refers to "animal" (45%), with some connection to being "tired" (15%). This understanding gets encoded into the updated representation of "it".
How does this actually update "it"? The model takes a weighted average of all words' Value vectors using these percentages:
# Each word has a Value vector (what information it contains)
value_animal = [0.9, 0.2, 0.8] # Contains: mammal, four-legged, animate
value_tired = [0.1, 0.3, 0.9] # Contains: state, adjective, fatigue
value_street = [0.2, 0.8, 0.1] # Contains: place, concrete, inanimate
# ... (other words)
# Updated representation of "it" = weighted combination
new_it = (45% × value_animal) + (15% × value_tired) + (8% × value_street) + ...
= (0.45 × [0.9, 0.2, 0.8]) + (0.15 × [0.1, 0.3, 0.9]) + ...
= [0.52, 0.19, 0.61] # Now "it" carries meaning from "animal" + "tired"
The word "it" now has a richer representation that includes information from "animal" (heavily weighted) and "tired" (moderately weighted), helping the model understand the sentence better.
Simple analogy: When you read a sentence, you notice multiple things simultaneously:
Multi-head attention lets the model do the same thing! Instead of one attention mechanism, models use 8 to 128 different attention "heads" running in parallel.
Example with the sentence "The fluffy dog chased the cat":
Important: These specializations aren't programmed! During training, different heads naturally learn to focus on different relationships. Researchers discovered this by analyzing trained models—it emerges automatically.
How they combine:
# Each head produces its own understanding:
head_1_output = attention_head_1(text) # Finds subject-verb
head_2_output = attention_head_2(text) # Finds adjective-noun
head_8_output = attention_head_8(text) # Finds other patterns
# Combine all heads into a rich understanding:
final_output = combine([head_1_output, head_2_output, ..., head_8_output])
# Now each word has information from all types of relationships!
Why this matters: Having multiple attention heads is like having multiple experts analyze the same text from different angles. The final result is much richer than any single perspective.
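And a hedged sketch of the "split into heads, attend in parallel, then combine" idea (head count and sizes are illustrative; real models also apply learned projection matrices for Q, K, V and the output, omitted here to keep the shape logic visible):
import numpy as np
def multi_head_attention(X, num_heads):
    """Split the model dimension into heads, run attention per head, then concatenate."""
    seq_len, dim = X.shape
    head_dim = dim // num_heads
    outputs = []
    for h in range(num_heads):
        # Each head sees its own slice of every token's vector.
        head = X[:, h * head_dim:(h + 1) * head_dim]
        scores = head @ head.T / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ head)
    # Concatenate the heads back into one vector per token.
    return np.concatenate(outputs, axis=-1)
X = np.random.default_rng(1).normal(size=(3, 8))   # 3 tokens, 8-dim vectors
print(multi_head_attention(X, num_heads=2).shape)  # (3, 8): same shape, richer content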
After attention gathers information, each word needs to process what it learned. This is where the Feed-Forward Network (FFN) comes in.
Simple analogy:
What happens:
After "it" gathered information that it refers to "animal" and relates to "tired", the FFN processes this:
# Simplified version
def process_word(word_vector):
    # Step 1: Expand to more dimensions (gives more room to think)
    bigger = expand(word_vector)     # 768 numbers → 3072 numbers
    # Step 2: Apply complex transformations (the "thinking")
    processed = activate(bigger)     # Non-linear processing
    # Step 3: Compress back to original size
    result = compress(processed)     # 3072 numbers → 768 numbers
    return result
What's it doing? Let's trace through a concrete example using our sentence:
Example: Processing "it" in "The animal didn't cross the street because it was tired"
After attention, "it" has gathered information showing it refers to "animal" (45%) and relates to "tired" (15%). Now the FFN enriches this understanding:
Step 1 - What comes in:
Vector for "it" after attention: [0.52, 0.19, 0.61, ...]
This already knows: "it" refers to "animal" and connects to "tired"
Step 2 - FFN adds learned knowledge:
Think of the FFN as having millions of pattern detectors (neurons) that learned from billions of text examples. When "it" enters with its current meaning, specific patterns activate:
Input pattern: word "it" + animal reference + tired state
FFN recognizes patterns:
- Pattern A activates: "Pronoun referring to living creature" → Strengthens living thing understanding
- Pattern B activates: "Subject experiencing fatigue" → Adds physical/emotional state concept
- Pattern C activates: "Reason for inaction" → Links tiredness to not crossing
- Pattern D stays quiet: "Object being acted upon" → Not relevant here
What the FFN is really doing: It's checking "it" against thousands of patterns it learned during training, like:
Step 3 - What comes out:
Enriched vector: [0.61, 0.23, 0.71, ...]
Now contains: pronoun role + animal reference + tired state + causal link (tired → didn't cross)
The result: The model now has a richer understanding: "it" isn't just referring to "animal"—it understands the animal is tired, and this tiredness is causally linked to why it didn't cross the street.
Here's another example showing how the FFN resolves ambiguity in word meanings:
Example - "bank":
Think of FFN as the model's "knowledge base" where millions of facts and patterns are stored in billions of network weights (the connections between neurons). Unlike attention (which gathers context from other words), FFN applies learned knowledge to that context.
It's the difference between:
Key insight:
Modern improvement: Newer models use something called "SwiGLU" instead of older activation functions. It provides better performance, but the core idea remains: process the gathered information to extract deeper meaning.
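As a rough sketch of that idea, here is a SwiGLU-style feed-forward block in NumPy (the weight matrices are random stand-ins; in a real model they are learned parameters):
import numpy as np
def silu(x):
    # SiLU ("swish") activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))
def swiglu_ffn(x, w_gate, w_up, w_down):
    """Expand, gate, then compress: down( silu(gate(x)) * up(x) )."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
# Illustrative sizes: 768-dim word vector, 3072-dim hidden expansion.
rng = np.random.default_rng(0)
dim, hidden = 768, 3072
x = rng.normal(size=(dim,))
w_gate = rng.normal(size=(dim, hidden)) * 0.02
w_up = rng.normal(size=(dim, hidden)) * 0.02
w_down = rng.normal(size=(hidden, dim)) * 0.02
print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (768,) -- same size in, same size out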
These might sound technical, but they solve simple problems. Let me explain with everyday analogies.
The Problem: Imagine you're editing a document. You make 96 rounds of edits. By round 96, you've completely forgotten what the original said! Sometimes the original information was important.
The Solution: Keep a copy of the original and mix it back in after each edit.
In the Transformer:
# Start with a word's representation
original = [0.2, 0.5, 0.8, ...] # "cat" representation
# After attention + processing, we get changes
changes = [0.1, -0.2, 0.3, ...] # What we learned
# Residual connection: Keep the original + add changes
final = original + changes
= [0.2+0.1, 0.5-0.2, 0.8+0.3, ...]
= [0.3, 0.3, 1.1, ...] # Original info preserved!
Better analogy: Think of editing a photo:
Why this matters: Deep networks (96-120 layers) need this. Otherwise, information from early layers disappears by the time you reach the end.
The Problem: Imagine you're calculating daily expenses:
The huge number breaks everything.
The Solution: After each step, check if numbers are getting too big or too small, and adjust them to a reasonable range.
What normalization does:
Before normalization:
Word vectors might be:
"the": [0.1, 0.2, 0.3, ...]
"cat": [5.2, 8.9, 12.3, ...] ← Too big!
"sat": [0.001, 0.002, 0.001, ...] ← Too small!
After normalization:
"the": [0.1, 0.2, 0.3, ...]
"cat": [0.4, 0.6, 0.8, ...] ← Scaled down to reasonable range
"sat": [0.2, 0.4, 0.1, ...] ← Scaled up to reasonable range
How it works (simplified):
# For each word's vector:
# 1. Calculate average and spread of numbers
average = 5.0
spread = 3.0
# 2. Adjust so average=0, spread=1
normalized = (original - average) / spread
# Now all numbers are in a similar range!
Why this matters:
Key takeaway: These two tricks (residual connections + normalization) are like safety features in a car—they keep everything running smoothly even when the model gets very deep (many layers).
Transformers come in three varieties, like three different tools in a toolbox. Each is designed for specific jobs.
Think of it like: A reading comprehension expert who thoroughly understands text but can't write new text.
How it works: Sees the entire text at once, looks at relationships in all directions (words can look both forward and backward).
Training example:
Show it: "The [MASK] sat on the mat"
It learns: "The cat sat on the mat"
By filling in blanks, it learns deep understanding!
Real-world uses:
Popular models: BERT, RoBERTa (used by many search engines)
Key limitation: Can understand and classify text, but cannot generate new text. It's like a reading expert who can't write.
Think of it like: A creative writer who generates text one word at a time, always building on what came before.
How it works: Processes text from left to right. Each word can only "see" previous words, not future ones (because future words don't exist yet during generation!).
Training example:
Show it: "The cat sat on the"
It learns: Next word should be "mat" (or "floor", "chair", etc.)
By predicting next words billions of times, it learns to write!
Why only look backward? Because when generating text, future words don't exist yet—you can only use what you've written so far. It's like writing a story one word at a time: after "The cat sat on the", you can only look back at those 5 words to decide what comes next.
When predicting "sat":
Can see: "The", "cat" ← Use these to predict
Cannot see: "on", "the", "mat" ← Don't exist yet during generation
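Here is a minimal NumPy sketch of how that "only look backward" rule is enforced: a causal mask pushes the attention scores for future positions to a very large negative number before softmax, so they end up with roughly 0% weight (shapes and numbers are illustrative):
import numpy as np
seq_len = 5  # e.g. ["The", "cat", "sat", "on", "the"]
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))  # raw attention scores
# Causal mask: position i may only attend to positions 0..i (the past and itself).
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True above the diagonal
scores[mask] = -1e9  # effectively zero weight after softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.round(2))  # upper triangle is ~0: no word attends to future words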
Real-world uses:
Code completion: type def calculate_ → it suggests the rest
Popular models: GPT-4, Claude, Llama, Mistral (basically all modern chatbots)
Why this is dominant: These models can both understand AND generate, making them incredibly versatile. This is what you use when you chat with AI.
Think of it like: A two-person team: one person reads and understands (encoder), another person writes the output (decoder).
How it works:
Training example:
Input (to encoder): "translate English to French: Hello world"
Output (from decoder): "Bonjour le monde"
Encoder understands English, Decoder writes French!
Real-world uses:
Popular models: T5, BART (less common nowadays)
Why less popular now: Decoder-only models (like GPT) turned out to be more versatile—they can do translation AND chatting AND coding, all in one architecture. Encoder-decoder models are more specialized.
Need to understand/classify text? → Encoder (BERT)
Need to generate text? → Decoder (GPT)
Need translation/summarization only? → Encoder-Decoder (T5)
Not sure? → Use Decoder-only (GPT-style)
Bottom line: If you're building something today, you'll most likely use a decoder-only model (like GPT, Claude, Llama) because they're the most flexible and powerful.
Now that you understand the components, let us see how they scale:
As models grow from small to large, here's what changes:
| Component | Small (125M params) | Medium (7B params) | Large (70B params) |
|----|----|----|----|
| Layers (depth) | 12 | 32 | 80 |
| Hidden size (vector width) | 768 | 4,096 | 8,192 |
| Attention heads | 12 | 32 | 64 |
Key insights:
1. Layers (depth) - This is how many times you repeat Steps 3 & 4
Example: Processing "it" in our sentence:
2. Hidden size (vector width) - How many numbers represent each word
3. Attention heads - How many different perspectives each layer examines
Where do the parameters live?
Surprising fact: The Feed-Forward Network (FFN) actually takes up most of the model's parameters, not the attention mechanism!
Why? In each layer:
In large models, FFN parameters outnumber attention parameters by 3-4x. That's where the "knowledge" is stored!
Simple explanation: Every word needs to look at every other word. If you have N words, that's N × N comparisons.
Concrete example:
3 words: "The cat sat"
- "The" looks at: The, cat, sat (3 comparisons)
- "cat" looks at: The, cat, sat (3 comparisons)
- "sat" looks at: The, cat, sat (3 comparisons)
Total: 3 × 3 = 9 comparisons
6 words: "The cat sat on the mat"
- Each of 6 words looks at all 6 words
Total: 6 × 6 = 36 comparisons (4x more for 2x words!)
12 words:
Total: 12 × 12 = 144 comparisons (16x more for 4x words!)
The scaling problem:
| Sentence Length | Attention Calculations | Growth Factor |
|----|----|----|
| 512 tokens | 262,144 | 1x |
| 2,048 tokens | 4,194,304 | 16x more |
| 8,192 tokens | 67,108,864 | 256x more |
Why this matters: Doubling the length doesn't double the work—it quadruples it! This is why:
Solutions being developed:
These tricks help models handle longer texts without the full quadratic cost!
Important: This diagram represents the universal Transformer architecture. All Transformer models (BERT, GPT, T5) follow this basic structure, with variations in how they use certain components.
Let's walk through the complete flow step by step:
Let's trace "The cat sat" through this architecture:
Step 1: Input Tokens
Your text: "The cat sat"
Tokens: ["The", "cat", "sat"]
Step 2: Embeddings + Position
"The" → [0.1, 0.3, ...] + position_1_tag → [0.1, 0.8, ...]
"cat" → [0.2, -0.5, ...] + position_2_tag → [0.4, -0.2, ...]
"sat" → [0.4, 0.2, ...] + position_3_tag → [0.8, 0.5, ...]
Now each word is a 768-number vector with position info!
Step 3: Through N Transformer Layers (repeated 12-120 times)
Each layer does this:
Step 4a: Multi-Head Attention
- Each word looks at all other words
- "cat" realizes it's the subject
- "sat" realizes it's the action "cat" does
- Words gather information from related words
Step 4b: Add & Normalize
- Add original vector back (residual connection)
- Normalize numbers to reasonable range
- Keeps information stable
Step 4c: Feed-Forward Network
- Process the gathered information
- Apply learned knowledge
- Each word's vector gets richer
Step 4d: Add & Normalize (again)
- Add vector from before FFN (another residual)
- Normalize again
- Ready for next layer!
After going through all N layers, each word's representation is incredibly rich with understanding.
Step 5: Linear + Softmax
Take the final word's vector: [0.8, 0.3, 0.9, ...]
Convert to predictions for EVERY word in vocabulary (50,000 words):
"the" → 5%
"a" → 3%
"on" → 15% ← High probability!
"mat" → 12%
"floor" → 8%
...
(All probabilities sum to 100%)
Step 6: Output
Pick the most likely word: "on"
Complete sentence so far: "The cat sat on"
Then repeat the whole process to predict the next word!
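A toy sketch of that loop, with a made-up next_token_probabilities function standing in for the whole Transformer stack described in steps 1-5 (the probabilities are hard-coded purely for illustration):
def next_token_probabilities(tokens):
    # Stand-in for the full model: in reality this runs steps 1-5 above.
    toy = {
        ("The", "cat", "sat"): {"on": 0.15, "the": 0.05, "mat": 0.12},
        ("The", "cat", "sat", "on"): {"the": 0.40, "a": 0.20},
        ("The", "cat", "sat", "on", "the"): {"mat": 0.35, "floor": 0.20},
    }
    return toy.get(tuple(tokens), {"<end>": 1.0})
def generate(tokens, max_new_tokens=3):
    for _ in range(max_new_tokens):
        probs = next_token_probabilities(tokens)
        next_token = max(probs, key=probs.get)  # greedy: pick the most likely word
        if next_token == "<end>":
            break
        tokens.append(next_token)               # feed the prediction back in, then repeat
    return tokens
print(generate(["The", "cat", "sat"]))  # ['The', 'cat', 'sat', 'on', 'the', 'mat']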
Now that you've seen the complete flow, here's how each model type uses it differently:
1. Encoder-Only (BERT):
2. Decoder-Only (GPT, Claude, Llama):
3. Encoder-Decoder (T5):
Uses: TWO stacks - one encoder (steps 1-4), one decoder (full steps 1-6)
Encoder: Bidirectional attention to understand input
Decoder: Causal attention to generate output, also attends to encoder
Training: Input→output mapping ("translate: Hello" → "Bonjour")
Purpose: Translation, summarization, transformation tasks
The key difference: Same architecture blocks, different attention patterns and how they're connected!
It's a loop: For generation, this process repeats. After predicting "on", the model adds it to the input and predicts again.
The "N" matters:
This is universal: Whether you're reading a research paper about a new model or trying to understand GPT-4, this diagram applies. The core architecture is the same!
Understanding the architecture helps you make better decisions:
The context window is not just a number—it is a hard architectural limit. A model trained on 4K context cannot magically understand 100K tokens without modifications (RoPE interpolation, fine-tuning, etc.).
Tokens at the beginning and end of context often get more attention (primacy and recency effects). If you have critical information, consider its placement in your prompt.
Early layers capture syntax and basic patterns. Later layers capture semantics and complex reasoning. This is why techniques like layer freezing during fine-tuning work—early layers transfer well across tasks.
Every extra token in your prompt increases compute quadratically. Be concise when you can.
2026-01-01 15:16:34
We interviewed a dozen(ish) expert tech bloggers over the past year to share perspectives and tips beyond Writing for Developers. The idea: ask everyone the same set of questions and hopefully see an interesting range of responses emerge. They did.
You can read all the interviews here. We’ll continue the interview series (and maybe publish some book spinoff posts too). But first, we want to pause and compare how the first cohort of interviewees responded to specific questions.
Here’s how everyone answered the question “Why did you start blogging – and why do you continue?”
I started blogging as a way to get attention for a product that I was working on. And while that product never really worked out, I started getting interest from people that wanted me to come work for them, either as a freelancer or as a full-time employee. And since then, I have realized what a cheat code it is to have a public body of work that people can just passively stumble upon. It’s like having a bunch of people out there advocating for you at all times, even while you’re sleeping.
\
I don’t know exactly, but in general, I want to express my interest in things I like, in my passions. It was not some kind of calculation where I said: oh, well, blogging would benefit my career. I just needed to do it.
\
I started writing at a big life inflection point -- the brief period after I left Facebook but before I started Honeycomb. I had started giving talks, and found it surprisingly rewarding, but I’m not an extrovert and I’ve always considered myself more of a writer-thinker than a talker-thinker, so I thought I might as well give it a try.
There are very few things in life that I am prouder of than the body of writing I have developed over the past 10 years. I have had a yearly goal of publishing about one longform piece of writing per month. I don’t think I’ve ever actually hit that goal, but some years I have come close! When I look back over things I have written, I feel like I can see myself growing up, my mental health improving, I’m getting better at taking the long view, being more empathetic, being less reactive… I’ve never graduated from anything in my life, so to me, my writing kind of externalizes the progress I’ve made as a human being. It’s meaningful to me.
\
I started my blog, which is now at ericlippert.com, more than 20 years ago. I worked for the Developer Division at Microsoft at the time. As developers working on tools for other developers just like us, we felt a lot of empathy for our customers and were always looking for feedback on what their wants and needs were. But the perception of Microsoft by those customers was that the company was impersonal, secretive, uncommunicative, and uncaring. When blogs started really taking off in the early 2000s, all employees were encouraged to reach out to customers and put a more human, open, and empathetic face on the company, and I really went for that opportunity.
I was on the scripting languages team at the time, and our public-facing documentation was sparse. Our documentation was well-written and accurate, but there was only so much work that our documentation “team” – one writer – could do. I decided to focus my blog on the stuff that was missing from the documentation: odd corners of the language, why we’d made certain design decisions over others, that sort of thing. My tongue-in-cheek name for it was “Fabulous Adventures In Coding.” At its peak, I think it was the second most popular blog on MSDN that was run by an individual rather than a team.
For most of the last decade I’ve been at Facebook, which discourages employees blogging about their work, so my rate of writing dropped off precipitously then. And since leaving Facebook a couple years ago, I haven’t blogged much at all. I do miss it, and I might pick it up again this winter. I really enjoy connecting with an audience.
Editor’s note: He’s now writing a book.
\
I genuinely cannot remember why I started, because I’ve been blogging for about 15 years! That’s just what the internet was like back then? It wasn’t weird for people to have their own website — it was part of maintaining your online identity. We’re starting to see that come back in that post-Twitter era, folks value having their own domain name more, and pick up blogging again.
I can say, though, that in 2019 I started a Patreon to motivate me to take writing more seriously — I’m reluctant to call it “blogging” at this point because some of my longer articles are almost mini-books! Some can take a solid hour to go through. At the time, I was sick of so many articles glossing over particulars: I made it my trademark to go deep into little details, and not to be afraid to ask questions.
\
I started blogging years ago at ScyllaDB. I was initially forced to do it, but I ended up really enjoying it. [This is strikingly similar to Sarna’s “Stockholm Syndrome” story in Chapter 1 of the book].
I’ve always liked teaching people and I saw that technical blogging was a way to do that…at scale. As I was learning new things, often working with previously unexplored technologies and challenges, blogging gave me this opportunity to teach a large audience of people about what I discovered.
I kept doing it because it actually works. It really does reach a lot of people. And it’s very rewarding when you find that your blog is getting people to think differently, maybe even do something differently.
\
I started blogging for a couple of reasons really. First, it just helps me to take note of things I learned and which I might want to refer to again in the future, like how to prevent ever-growing replication slots in Postgres. I figured, instead of writing things like that down just for myself, I could make these notes available on a blog so others could benefit from them, too. Then, I like to explore and write about technologies such as Java, OpenJDK Flight Recorder, or Apache Kafka. Some posts also distill the experience from many years, for instance discussing how to do code reviews or writing better conference talk abstracts. Oftentimes, folks will add their own thoughts in a comment, from which I can then learn something new again–so it’s a win for everyone.
Another reason for blogging is to spread the word about things I’ve been working on, such as kcctl 🐻, a command-line client for Kafka Connect. Such posts will, for instance, introduce a new release or discuss a particular feature or use case, and they help increase awareness of a project and build a community around it. Or they might announce efforts such as the One Billion Row coding challenge I did last year. Finally, some posts are about making the case for specific ideas, say, continuous performance regression testing, or how Kafka Connect could be re-envisioned in a more Kubernetes-native way.
Overall — and this is why I keep doing it — blogging is a great way for me to express my thoughts, ideas, and learnings, and share them with others. It allows me to get feedback and input on new projects, and it’s an opportunity for helping as well as learning from others.
\
I love blogging – it’s how I got to where I am. Steve McConnell‘s book Code Complete is what inspired me to start blogging. His voice was just so human. Instead of the traditional chest-thumping about “My algorithm is better than your algorithm,” it was about “Hey, we’re all fallible humans writing software for other fallible humans.” I thought, “Oh my God, this is humanistic computing.” I loved it! I knew I had to write like that too. That’s what launched me on my journey.
Now more than ever, I think it’s important to realize that we’ve given everyone a Gutenberg printing press that reaches every other human on the planet. At first blush, that sounds amazing. Wow, everybody can talk to everybody! But then the terror sets in: Oh, my God. Everybody can talk to everybody – this is a nightmare.
I think blogs are important because it’s a structured form of writing. Sadly, chat tends to dominate now. I want people to articulate their thoughts, to really think about what they’re saying – structure it, have a story with a beginning, a middle, and an end. It doesn’t have to be super long. However, chat breaks everything up into a million pieces. You have these interleaved conversations where people are just typing whatever pops into their brain, sometimes with 10 people doing that at the same time. How do you create a narrative out of this? How do you create a coherent story out of chat?
I think blogging is a better mental exercise. Tell the story of what happened to you, and maybe we can learn from it. Maybe you can learn from your own story, perhaps from the whole rubber ducking aspect of it. As you’re explaining it to yourself, you’re also creating a public artifact that can benefit others who might have the same problem or a related story. And it’s your story – what’s unique about you. I want to hear about you as a person – your unique experience and what you’ve done and what you’ve seen. That’s what makes humanity great. And I think blogs are an excellent medium for that.
There’s certainly a place for video, there’s a place for chat. These tools all have their uses, but use the appropriate tool for the appropriate job. I think blogs are a very, very versatile tool in terms of median length, telling a story, and sharing it with the public.
If you look at the history of humanity, the things that have really changed the world have been in writing – books, novels, opinion pieces, even blogs. The invention of language was important, but the invention of writing was so much bigger. With writing, you didn’t have to depend on one person being alive long enough to tell the story over and over. You could write it down and then it could live on forever.
I encourage everyone to write, even if you write only for yourself. I think it’s better if you write in public because you can get feedback that way. You can learn so much from the feedback – learn that others feel the same way, learn about aspects you didn’t think about, etc. But it’s scary. I get it – people are afraid of putting themselves out there. Write for just you if you want, but write… just write.
\
I first started blogging in earnest back in the early 2000s because I was at university and wanted to share that experience with the rest of my family. Somewhere along the line, I started a second blog to record (mainly for myself) the technologies and tools I was learning.
These early tech blog posts were pretty basic. I’d learn a new sed trick, and write a couple hundred words about it. I’d try a new code editor, take a screenshot, and write my basic impressions. Embarrassingly, sometimes my blog posts were terribly inaccurate. I once wrote one on optimizing tree walking algorithms that was totally wrong. But I just updated it later with a note that said I’d learned more and now realized there were better ways of doing things.
In those early days, I never used any analytics or anything. I had no idea if anyone ever read what I wrote. Then one day, a friend of mine got really into SEO and asked if I would set up Google Analytics and share with him so he could learn a bit. I was utterly shocked by what we learned: My blog had a ton of traffic, and some of the most basic posts (like the one about sed) were perennially popular.
I’ve blogged on and off since then. These days, I mainly post on the Fermyon blog. And those posts are more theoretical than my early how-to focused posts.
\
My shameless goal when I started blogging in 2015 was to become a regular on the front page of Hacker News (HN) because I got the sense that it would be good for my career. And I enjoyed the type of posts that made a great fit for HN: posts that were a little crazy yet taught you something useful.
After becoming a manager in 2017, I realized the importance and value of writing for the sake of communicating and my focus on writing shifted from “writing about zany explorations” to “writing as a means for teaching myself a topic, or solidifying my understanding of the topic.”
I started to notice, observing both myself and coworkers, that we developers let so many educational opportunities pass without recording the results. What a waste!
Not only is writing about what you learn good for your own understanding and your team’s understanding and for the internet’s understanding, it’s good marketing for you and the company you work for. Good marketing in the sense that when people see someone write a useful blog post, they think “that person is cool, and the company they work for must also be cool; I want to work with them or work for that company or buy from that company.”
So there’s this confluence of reasons that make blogging so obviously worth the time.
\
Honestly, I don’t think I made a conscious decision to start blogging. I just remember being involved in open source projects and my story wasn’t really out there yet. And it felt odd that nobody knew my situation.
I didn’t want to just flat out randomly tell people that I’m incarcerated. So I decided that I’d write my story down for anyone to find if they came across my profile. I really did not expect many people to actually read it, so it was pretty shocking to see it on the front page of HN a couple of days after it was published.
I try my best to keep writing, although I don’t write as often as I should. Writing an in-depth technical blog post about a feature built or a problem solved allows me to fully absorb and understand it even better than just implementing it, so this is another reason I feel like I will continue to write. It serves as both personal motivation to more deeply understand something I am working on, as well as a way to share that knowledge with others.
\
I started blogging about tech in my final year of university. The earliest posts on http://samwho.dev, the ones from 2011, were written at this time. I’d heard that having an online presence would help me get a job, so I started writing about the things I was learning.
I wrote sporadically for years, most of my posts only getting a trickle of traffic, but I did have a few modestly successful ones. https://samwho.dev/blog/the-birth-and-death-of-a-running-program/ did well and even ended up as part of the Georgia Tech CS2110 resources list. One of the lecturers, who has since retired, emailed me in 2013 asking if he could use it. I was concerned because the post had swearing in it, but he said “swearing is attention getting and helps the reader stay alert.”
The blog posts I’ve become known for, the ones with lots of visual and interactive bits, started in the first half of 2023. I’d long admired the work of https://ciechanow.ski/ and wanted to see if I could apply his style to programming. It’s working well so far!
As for why I continue, I’ve been gainfully employed for a long time now, so my initial motivations for writing are long gone. I think my blog does help when I have conversations with employers, but that’s not the goal anymore.
I have this dream of being a teacher. I’ve dabbled in many forms of teaching: teaching assistant in university for some of my lecturers, mentoring in commercial and personal capacities, moderating learning communities, volunteering at bootcamps and kids’ groups. What if I could just… teach for a living?
I’m trying to make use of the attention these blog posts are getting to see if I can make steps towards doing just that.
\
Over the years, I had accumulated a number of useful scripts and techniques for troubleshooting the common OS & database problems I had encountered. At first, I created the blog (on June 18, 2007) as a lookup table for my future self. I uploaded all my open source tools to my blog and wrote articles about the more interesting troubleshooting scenarios. When I visited a customer to solve a problem, we could just copy & paste relevant scripts and commands from my blog. That way, I didn’t have to show up with a USB stick and try to get it connected to the internal network first.
Why do I continue? There’s so much cool stuff and interesting problems to write about. When writing, you have to do additional research to make sure your understanding is good enough. That’s the fun part. Systems are getting more complex, so you need to find new ways to “stay systematic” when troubleshooting problems and not go with trial & error and guesswork. These kinds of topics are my favorite, how to measure the right things, from the right angle, at the right time – so that even new unforeseen problems would inevitably reveal their causes to you via numbers and metrics, without any magic needed.
What makes me really happy is when people contact me and say that they were able to solve completely different problems that I hadn’t even seen, on their own, with the aid of the tools and systematic troubleshooting advice I have in my blog.
\
I published my first blog post on thorstenball.com in 2012. It’s this one about implementing autocompletion using Redis. I don’t know exactly why I started the blog, but, looking back, I think the main motivation was to share something I was proud of. It was a cool bit of code, it took me a bit to figure out, I learned a lot in the process, and I wanted to share the excitement.
At that time, I was also a junior software developer, having recently finished my first internship, trying to switch from studying philosophy into being an engineer and, I think, there was also a bit of “my blog can be a CV” aspect to it.
Back then, a friend had told me: you don’t need a degree to get a job as a software engineer, all you need to do is to show that you can do the job, because, trust me, he said, there’s a lot of people who have degrees but can’t do the job.
I figured that having a blog with which I can share what I learned, what I did, well, that’s a way to show that I can do the job. Now, I don’t think a lot of recruiters have read my blog, but I still believe there’s something to it: you’re sharing with the world what you do, what you learned, how you think — that’s a good thing in and of itself, and even if someone only takes a brief look at your blog before they interview you, I think that can help.
But, I also have to admit that I’ve been writing on the internet in one form or another, since the early 2000s, when I was a teenager. I had personal websites and blogs since I was 14 years old. I shared tutorials on web forums. There’s just something in me that makes me want to share stuff on the internet.
Nowadays I mostly write my newsletter, Register Spill, which I see as a different form of blogging, and for that newsletter, I have a few reasons:
I enjoy the writing. Well, okay, I enjoy having written. But, in general: I’m proud of writing something that’s good.
I’ve enjoyed tweeting a lot, but in the past few years the social media landscape has become so fractured that I decided to create a place of my own, a place where people can subscribe and follow me, where I can potentially take their emails and send them newsletters even if the platform decides to shut down.
Writing is thinking. I like sitting down and ordering my thinking in order to write something. The feelings of “I want to write something” and “I want to really think through this topic and share it” are similar for me.
\
2026-01-01 15:13:12
There’s a strange feeling that washes over you when you witness something you’ve created take on a life of its own. It’s not just pride; it’s a deep, almost philosophical resonance. That’s the feeling I’ve carried since my “Auto-Painter Robot Brain” completed its first masterpiece. What began as a simple coding exercise evolved into a profound exploration of art, logic, and the very nature of creation itself.
This isn’t just an “art generator.” It’s a small-scale model of a universe, born from a single moment, unfolding its entire complex existence over a million perfect, logical strokes.
My goal was to design a Python-based “robot brain” that could autonomously generate abstract digital artwork. It needed a canvas, a set of tools, and a way to make “creative” decisions.
The randomness came from os.urandom. This meant that the very first decision — the "seed" for all subsequent random choices — was a unique snapshot of my computer's internal activity at that precise moment. Every time the script runs, a new "universe" is born, guaranteed to be different.

Once initialized, the robot began its work. The process was set to run for 1 million strokes. For over two hours, this autonomous artist diligently layered shapes, colors, and gradients onto the digital canvas.
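The post doesn't include the script itself, but the seeding step it describes amounts to only a couple of lines of Python. A minimal sketch, assuming names of my own (seed, rng) rather than anything from the project:

```python
import os
import random

# Sketch of the seeding step described above (an assumption, not the author's code):
# turn 8 bytes of OS-level entropy into an integer seed, so every run of the
# script starts its "universe" from a different, unrepeatable state.
seed = int.from_bytes(os.urandom(8), "big")
rng = random.Random(seed)  # every later "creative" decision is drawn from this generator
```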
Each decision, each placement, each color choice was a direct, logical consequence of that initial “Big Bang” seed. There was no human intervention, no second-guessing, just the relentless, perfect execution of its programmed laws.
The final artwork, a dense tapestry of overlapping forms and colors, is a visual record of this entire journey.
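To give a feel for how a loop like this could produce both the image and a per-stroke log, here is a minimal Pillow-based sketch. Everything in it is an illustrative guess: the shape set, size ranges, file names (artwork.png, strokes.txt), and log format are my assumptions, and the gradient strokes mentioned above are omitted for brevity.

```python
from PIL import Image, ImageDraw
import os
import random

W, H = 1920, 1080
STROKES = 1_000_000  # reduce while experimenting; a million strokes takes a while

# Seed the generator from OS entropy, as described above.
rng = random.Random(int.from_bytes(os.urandom(8), "big"))
img = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(img)

with open("strokes.txt", "w") as log:
    for i in range(1, STROKES + 1):
        shape = rng.choice(["ellipse", "rectangle", "polygon"])
        x, y = rng.randrange(W), rng.randrange(H)
        size = rng.randint(5, 120)
        color = tuple(rng.randint(0, 255) for _ in range(3))
        if shape == "polygon":
            sides = rng.randint(3, 8)
            points = [(x + rng.randint(-size, size), y + rng.randint(-size, size))
                      for _ in range(sides)]
            draw.polygon(points, fill=color)
            log.write(f"{i} polygon pos=({x},{y}) size={size} color={color} sides={sides}\n")
        else:
            box = [x, y, x + size, y + size]
            getattr(draw, shape)(box, fill=color)  # draw.ellipse(...) or draw.rectangle(...)
            log.write(f"{i} {shape} pos=({x},{y}) size={size} color={color}\n")

img.save("artwork.png")  # the image artifact; strokes.txt is the per-stroke log
```

Because every drawing decision flows from the single seeded generator, the seed alone is enough to replay the exact same image, which is what makes the "one seed, one unrepeatable universe" framing hold together.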
\
Upon completion, the robot’s work wasn’t just a single image. It delivered two profound artifacts:
- The artwork (.png): the final abstract image itself.
- The log (.txt): a meticulously detailed log file. It records every single one of the million strokes, detailing its number, shape type, exact position, size, whether it was a gradient, its specific colors, and, if applicable, its number of sides.

This is where the true significance of the project resonated with me.
This project redefined my understanding of art. It’s not just about the final image, but the elegance of the system that created it. It’s a testament to the beauty of logic, the power of algorithms, and the profound parallels between a coded process and the very universe we inhabit — a single starting point, unfolding into a complex, perfect, and unrepeatable reality.