2026-02-04 02:12:33
RouteReality: Building a Community-Powered Bus Tracker for Belfast
When I first launched routereality.co.uk, the goal was simple: give Belfast and Northern Ireland bus users better, more accurate arrival predictions than the official sources provide. Unlike apps that rely solely on scheduled timetables or delayed GPS feeds, RouteReality is fully community-powered.
Users check predicted times, wait at the stop, and tap to report when the bus actually arrives. Every report feeds back into the system, refining predictions for everyone in real time.
Today, the site runs 24/7, covering 100+ routes and 17,000+ stops, with live updates constantly streaming in from real users across the country. But building and deploying a system like this, one that people depend on every day, was far from straightforward. Here are the main problems I encountered along the way, and how they shaped the project.
1. Keeping Real-Time Data Accurate When Users Are Always Reporting
The core of RouteReality is user-submitted arrival reports. In theory, more reports = better predictions. In practice, the live nature of the system creates immediate challenges.
Timing mismatches and outliers — People report arrivals at slightly different times due to boarding the bus, network delays, or simply tapping a second too early or late. Early on, a few bad reports could skew predictions by minutes. I had to implement outlier detection (ignoring reports more than ~3 minutes off the median) and time-window clustering to group reports for the same bus instance; a short sketch of this filter appears just below this list.
Duplicate or spam reports — With users constantly using the site, especially during peak commute hours, the same bus stop could receive multiple reports in seconds. Without careful deduplication logic (based on user session, location hints, and timestamp proximity), predictions would jump erratically.
Sparse data in the early days — When the user base was small, many stops had zero or one report per day. Predictions defaulted to timetable estimates, but users expected better. Bootstrapping accuracy required careful fallback logic and incentives to encourage early reporting.
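Here is a minimal sketch of the median-based outlier filter mentioned above. It is illustrative only; the real system also clusters reports into time windows and deduplicates them, and the function name and millisecond representation are my assumptions.

function filterOutliers(reportTimesMs: number[], toleranceMs = 3 * 60 * 1000): number[] {
  if (reportTimesMs.length === 0) return []
  // Median of the reported arrival times for one bus instance
  const sorted = [...reportTimesMs].sort((a, b) => a - b)
  const median = sorted[Math.floor(sorted.length / 2)]
  // Ignore reports more than ~3 minutes away from the median
  return reportTimesMs.filter(t => Math.abs(t - median) <= toleranceMs)
}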
2. Scalability and Performance Under Constant User Load
Unlike a static site, RouteReality has users actively querying predictions and submitting reports at all hours. The system needs to handle concurrent reads/writes without lag, especially during busy periods.
Database pressure — Storing millions of timestamped reports requires a time-series-friendly setup. Early prototypes using a standard relational DB choked on write-heavy loads. Moving to a more scalable store (with proper indexing and partitioning by route/stop/day) prevented slowdowns.
Server costs and monitoring — Running 24/7 means no off-hours for heavy maintenance. Unexpected spikes in traffic (e.g., bad weather driving more bus usage) could spike costs or cause brief slowdowns. Setting up real-time monitoring dashboards became essential to catch issues before users noticed.
Zero-downtime deploys — Early deploys caused brief interruptions as the server restarted. Implementing blue-green deployments or rolling updates eliminated that pain, but required more infrastructure setup.
Bug fixes under pressure — A subtle bug in report aggregation once caused predictions to drift by 5+ minutes for a popular route during evening rush. Users noticed immediately and reported it (ironically helping debug). Hotfixes had to be rolled out without breaking ongoing sessions.
Testing in production-like conditions — Local tests missed real-world issues like network latency on mobile data, varied device clocks, or users in poor signal areas. Gradual rollouts and feature flags became my best friends.
Lessons Learned and What's Next
Launching RouteReality taught me that real-time, user-dependent systems are as much about people as technology. Community engagement is the biggest variable. The more people report, the better it gets, but getting that feedback loop started requires patience and careful tuning.
Despite the challenges, the system is live, improving daily, and already helping commuters in Belfast and beyond. Future plans include better outlier handling, optional location-based reporting (with privacy controls), and deeper analytics to spot patterns (e.g., chronically late routes).
If you're a regular user, thank you for every report. It directly makes predictions more accurate for everyone. And if you haven't tried it yet, head to the journey page and start reporting. The more we all contribute, the better RouteReality becomes.
Always cross-check with official Translink sources, as RouteReality remains an independent community project.
Posted February 2026
2026-02-04 02:07:57
Last time, I shared how MdBin migrated to Streamdown for better markdown rendering—Mermaid diagrams, KaTeX math, built-in controls, the works.
But there was one feature request that kept coming up: "Can I share sensitive content without you seeing it?"
Today, I'm excited to announce: end-to-end encrypted pastes are live.
Here's the uncomfortable truth about every pastebin service: they can read your content.
When you paste something into Pastebin, GitHub Gists, or even MdBin (until today), the server receives your plaintext, stores it, and serves it back. The service operator—and anyone who gains access to their database—can read everything you've shared.
For most use cases, this is fine. But some content is simply too sensitive to trust to a third party in plaintext.
The traditional answer is "just use Signal" or "encrypt it yourself first." But that adds friction, and friction kills adoption.
With MdBin's new encrypted paste feature, the server becomes dumb blob storage: your content is encrypted in your browser before it is uploaded, and decrypted in your browser after it is downloaded.
The server never sees your plaintext. We don't store your password. We couldn't decrypt your content even if we wanted to.
I didn't roll my own crypto (please never do this). Instead, I used the Web Crypto API with industry-standard algorithms.
Passwords are weak. Turning a password into a strong encryption key requires a key derivation function:
const PBKDF2_ITERATIONS = 310000 // OWASP 2023 recommendation
async function deriveKey(
password: string,
salt: Uint8Array
): Promise<CryptoKey> {
const encoder = new TextEncoder()
const passwordBuffer = encoder.encode(password)
const keyMaterial = await crypto.subtle.importKey(
'raw',
passwordBuffer,
'PBKDF2',
false,
['deriveBits', 'deriveKey']
)
return crypto.subtle.deriveKey(
{
name: 'PBKDF2',
salt: salt,
iterations: PBKDF2_ITERATIONS,
hash: 'SHA-256',
},
keyMaterial,
{ name: 'AES-GCM', length: 256 },
false,
['encrypt', 'decrypt']
)
}
Why 310,000 iterations? That's the OWASP 2023 recommendation for PBKDF2-HMAC-SHA256. It makes brute-force attacks computationally expensive while still being fast enough on modern devices.
For the actual encryption, I chose AES-256-GCM—authenticated encryption that provides both confidentiality and integrity:
export async function encrypt(
plaintext: string,
password: string
): Promise<string> {
const encoder = new TextEncoder()
const plaintextBuffer = encoder.encode(plaintext)
// Generate random salt and IV for each encryption
const salt = crypto.getRandomValues(new Uint8Array(16))
const iv = crypto.getRandomValues(new Uint8Array(12))
const key = await deriveKey(password, salt)
const ciphertext = await crypto.subtle.encrypt(
{ name: 'AES-GCM', iv },
key,
plaintextBuffer
)
// Combine: salt || iv || ciphertext
const combined = new Uint8Array(
16 + 12 + ciphertext.byteLength
)
combined.set(salt, 0)
combined.set(iv, 16)
combined.set(new Uint8Array(ciphertext), 28)
return btoa(String.fromCharCode(...combined))
}
Key points: a fresh random salt and IV are generated for every encryption, everything is packed as salt || iv || ciphertext, and the result is base64-encoded for storage.
Decryption reverses the process:
export async function decrypt(
encrypted: string,
password: string
): Promise<string> {
const combined = new Uint8Array(
atob(encrypted).split('').map(c => c.charCodeAt(0))
)
// Extract salt, iv, and ciphertext
const salt = combined.slice(0, 16)
const iv = combined.slice(16, 28)
const ciphertext = combined.slice(28)
const key = await deriveKey(password, salt)
const plaintextBuffer = await crypto.subtle.decrypt(
{ name: 'AES-GCM', iv },
key,
ciphertext
)
return new TextDecoder().decode(plaintextBuffer)
}
If you provide the wrong password, crypto.subtle.decrypt throws—GCM's authentication tag verification fails. No partial decryption, no garbage output, just a clean error.
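In UI code, that failure just needs catching and turning into a friendly message. A minimal sketch, assuming the decrypt() function above (the wrapper name is mine):

async function tryDecrypt(encrypted: string, password: string): Promise<string | null> {
  try {
    return await decrypt(encrypted, password)
  } catch {
    // AES-GCM authentication failed: wrong password or tampered ciphertext
    return null
  }
}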
Crypto is useless if people don't use it. Here's how I made encryption approachable.
The paste form now has a mode switcher:
<div className="flex items-center gap-2 p-1 bg-gray-100 rounded-lg w-fit">
<button
onClick={() => setIsEncrypted(false)}
className={!isEncrypted ? 'bg-white shadow-sm' : ''}
>
<LockOpen className="w-4 h-4" />
Normal
</button>
<button
onClick={() => setIsEncrypted(true)}
className={isEncrypted ? 'bg-white shadow-sm' : ''}
>
<Lock className="w-4 h-4" />
Encrypted
</button>
</div>
Simple, obvious, no hidden settings pages.
Weak passwords defeat encryption. I added real-time password validation:
export function validatePassword(password: string): ValidationResult {
const checks = {
minLength: password.length >= 8,
hasLowercase: /[a-z]/.test(password),
hasUppercase: /[A-Z]/.test(password),
hasNumber: /[0-9]/.test(password),
hasSpecial: /[!@#$%^&*...]/.test(password),
}
let score = Object.values(checks).filter(Boolean).length
if (password.length >= 12) score++
if (password.length >= 16) score++
// Map to 0-4 strength scale
return { isValid: checks.minLength, checks, strength: Math.min(score, 4) }
}
The UI shows a color-coded bar and checkmarks for each requirement. Users see exactly what makes a strong password.
Here's a clever feature: you can share the password in the URL.
https://mdbin.sivaramp.com/e/abc123#MySecretPassword
The fragment after # never gets sent to the server—it's browser-only. So you can share a complete self-decrypting link, and we still never see the password.
useEffect(() => {
if (typeof window !== 'undefined') {
const hash = window.location.hash.slice(1)
if (hash) {
const password = decodeURIComponent(hash)
// Clear hash immediately to prevent browser history leak
window.history.replaceState(null, '', window.location.pathname)
handleDecrypt(password, false)
}
}
}, [])
The hash is immediately cleared from the URL bar after reading. It won't appear in browser history, bookmarks, or shared screenshots.
With encrypted pastes, MdBin cannot read your content, recover a forgotten password, or hand your plaintext to anyone.
This is a feature, not a bug.
The "Remember password" feature stores passwords in localStorage. I added clear warnings:
{savePassword && (
<p className="text-xs text-amber-600">
Password will be stored in your browser.
Only use on trusted devices.
</p>
)}
And there's a "Forget & Lock" button to clear stored passwords and re-lock the paste.
Encrypted pastes have a 75KB limit (vs 100KB for normal). Base64 encoding and the salt/IV overhead add ~35% to the stored size.
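For the curious, here's a rough back-of-the-envelope estimate of that overhead, assuming the 16-byte salt, 12-byte IV, and 16-byte GCM tag from the scheme above (the function itself is just an illustration):

function estimateStoredBytes(plaintextBytes: number): number {
  // stored = base64(salt(16) + iv(12) + ciphertext), ciphertext = plaintext + 16-byte GCM tag
  const rawBytes = 16 + 12 + plaintextBytes + 16
  return Math.ceil(rawBytes / 3) * 4 // base64 expands by roughly 4/3
}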
| Feature | Before | After |
|---|---|---|
| Server can read content | ✅ Yes | ❌ No (encrypted) |
| Password recovery | N/A | ❌ Impossible |
| Share sensitive content | ❌ Risky | ✅ Safe |
| Self-decrypting links | ❌ No | ✅ URL hash |
| Encryption algorithm | N/A | AES-256-GCM |
| Key derivation | N/A | PBKDF2 (310k iterations) |
Head to mdbin.sivaramp.com, toggle to Encrypted mode, and paste something sensitive.
Here's a test you can try:
1. Create an encrypted paste with the password test123.
2. Note that the URL uses /e/[id] instead of /p/[id].
3. Share the self-decrypting link: https://mdbin.sivaramp.com/e/[id]#test123
One more thing: Streamdown moved to a plugin architecture in a recent update. The new setup looks like this:
import { createCodePlugin } from '@streamdown/code'
import { mermaid } from '@streamdown/mermaid'
import { math } from '@streamdown/math'
const code = createCodePlugin({
themes: ['github-light', 'github-dark'],
})
<Streamdown plugins={{ code, mermaid, math }}>
{content}
</Streamdown>
Same great features, more modular architecture. I updated the home page to highlight all three new capabilities: Mermaid diagrams, Math/LaTeX, and end-to-end encryption.
While I was at it, I added two quality-of-life improvements that deserved their own deep dive.
Previously, MdBin only respected prefers-color-scheme—you got whatever your OS dictated. Now there's a proper theme toggle.
The Setup
First, install next-themes:
bun add next-themes
Create a ThemeProvider wrapper:
// src/components/theme-provider.tsx
'use client'
import { ThemeProvider as NextThemesProvider } from 'next-themes'
export function ThemeProvider({ children }: { children: React.ReactNode }) {
return (
<NextThemesProvider
attribute="class"
defaultTheme="system"
enableSystem
disableTransitionOnChange
>
{children}
</NextThemesProvider>
)
}
Key config:
attribute="class" — Adds .dark class to <html> instead of using data attributes
enableSystem — Respects OS preference when set to "system"
disableTransitionOnChange — Prevents flash-of-wrong-theme during hydration
Tailwind CSS v4 Dark Mode
Here's the trick: Tailwind v4 uses a different syntax for custom variants. In globals.css:
@import 'tailwindcss';
@custom-variant dark (&:where(.dark, .dark *));
This enables class-based dark mode alongside Tailwind's existing dark: utilities. All those dark:bg-gray-900 classes now work with next-themes.
The Toggle Component
'use client'
import { useTheme } from 'next-themes'
import { useEffect, useState } from 'react'
import { Sun, Moon, Monitor } from 'lucide-react'
export function ThemeToggle() {
const [mounted, setMounted] = useState(false)
const { theme, setTheme } = useTheme()
// Avoid hydration mismatch
useEffect(() => setMounted(true), [])
if (!mounted) return <div className="w-9 h-9" /> // Placeholder
const cycleTheme = () => {
if (theme === 'light') setTheme('dark')
else if (theme === 'dark') setTheme('system')
else setTheme('light')
}
return (
<button
onClick={cycleTheme}
className="p-2 rounded-lg hover:bg-gray-100 dark:hover:bg-gray-800"
aria-label={`Current theme: ${theme}`}
>
{theme === 'light' && <Sun className="w-5 h-5" />}
{theme === 'dark' && <Moon className="w-5 h-5" />}
{theme === 'system' && <Monitor className="w-5 h-5" />}
</button>
)
}
The mounted check prevents hydration mismatches—next-themes doesn't know the theme until client-side JavaScript runs.
The header was getting crowded on mobile: Copy Link, Raw, New Paste, plus the new theme toggle. Instead of cramming tiny buttons, I added a hamburger menu below the md: breakpoint.
Desktop vs Mobile
<div className="flex items-center gap-2">
{/* Desktop: full button row */}
<div className="hidden md:flex items-center gap-2">
<button onClick={handleCopy}>Copy Link</button>
<Link href={`/p/${pasteId}/raw`}>Raw</Link>
<Link href="/">New Paste</Link>
</div>
{/* Always visible */}
<ThemeToggle />
{/* Mobile: hamburger */}
<div className="md:hidden relative">
<button onClick={() => setIsMenuOpen(!isMenuOpen)}>
{isMenuOpen ? <X /> : <Menu />}
</button>
{isMenuOpen && <DropdownMenu />}
</div>
</div>
Click-Outside & Escape Key Handling
Two patterns I always include for dropdowns:
const menuRef = useRef<HTMLDivElement>(null)
const buttonRef = useRef<HTMLButtonElement>(null)
// Close on click outside
useEffect(() => {
function handleClickOutside(event: MouseEvent) {
if (
menuRef.current &&
buttonRef.current &&
!menuRef.current.contains(event.target as Node) &&
!buttonRef.current.contains(event.target as Node)
) {
setIsMenuOpen(false)
}
}
if (isMenuOpen) {
document.addEventListener('mousedown', handleClickOutside)
return () => document.removeEventListener('mousedown', handleClickOutside)
}
}, [isMenuOpen])
// Close on Escape
useEffect(() => {
function handleEscape(event: KeyboardEvent) {
if (event.key === 'Escape') setIsMenuOpen(false)
}
if (isMenuOpen) {
document.addEventListener('keydown', handleEscape)
return () => document.removeEventListener('keydown', handleEscape)
}
}, [isMenuOpen])
Only attach listeners when the menu is open. Clean them up on close. No memory leaks.
The Dropdown
{isMenuOpen && (
<div
ref={menuRef}
className="absolute right-0 top-full mt-2 w-48 bg-white dark:bg-gray-900
border border-gray-200 dark:border-gray-700 rounded-lg shadow-lg py-2"
>
<button onClick={() => { handleCopy(); setIsMenuOpen(false) }}>
Copy Link
</button>
<Link href={`/p/${pasteId}/raw`} onClick={() => setIsMenuOpen(false)}>
Raw
</Link>
<Link href="/" onClick={() => setIsMenuOpen(false)}>
New Paste
</Link>
</div>
)}
Each action closes the menu. The absolute right-0 top-full positions it below the hamburger button, aligned to the right edge.
Small details, but they matter for usability.
With rendering and encryption sorted, the roadmap is clear.
The foundation is solid. The features are useful. Now it's about polish and power-user capabilities.
TL;DR: Added end-to-end encryption to MdBin using AES-256-GCM with PBKDF2 key derivation (310k iterations). Server never sees your plaintext or password. Share sensitive content via self-decrypting URL hash links. Also: Streamdown plugin architecture upgrade, dark/light/system theme toggle with next-themes + Tailwind v4 class-based dark mode, and responsive hamburger menu with proper click-outside and escape key handling.
Check out the encrypted paste feature at mdbin.sivaramp.com
2026-02-04 02:07:24
For a long time, I was confident that I understood how concurrency worked in Java.
Create a thread.
Start it.
Join it.
After that, a thread pool handled the growing workload.
Simple… right?
Then I started reading about Virtual Threads in Java (Project Loom), and suddenly I realized — we’ve been living with limitations we just accepted as “normal”.
This post is my attempt to explain what changed with virtual threads and why it matters,
not like documentation, but like how I actually understood it.
In Java, the classic thread model is called a Platform Thread. These are traditional threads backed directly by the Operating System.
Every Java developer starts with this:
Thread thread = new Thread(() -> {
doWork();
});
thread.start(); // Starts a new platform thread
It feels powerful at first. You’re doing real parallel work. Actual multitasking.
But then the cracks appear.
Why...?
Because platform threads are directly mapped to OS threads.
That means every thread carries real OS overhead: a large pre-allocated stack, slow creation, and a hard practical limit on how many can exist at once.
So instead of writing simple code, we started doing tricks: thread pools, async frameworks, callback chains.
Not because we wanted to — but because threads didn’t scale.
The Question That Couldn’t Be Ignored
If a thread is waiting for I/O, why is it still blocking an OS thread?
During I/O operations, a thread isn't executing anything; it is simply parked, waiting for a response.
So why is it treated like a scarce system resource?
If the thread isn’t actively running, why should it continue holding an OS thread?
That question is exactly where Virtual Threads come in.
Virtual threads are super-light, JVM-managed threads that let us handle millions of concurrent tasks without worrying about OS thread overhead.
Creating one is straightforward:
Thread thread = Thread.startVirtualThread(() -> {
doWork();
});
That’s it.
Virtual threads are cheap to create, available in huge numbers, and scheduled by the JVM rather than the OS.
And that changes everything.
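To make the scale claim concrete, here's a minimal sketch (mine, not from the post) using the per-task virtual thread executor added in Java 21; the sleeping task is a stand-in for blocking I/O:

import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One virtual thread per submitted task; huge numbers of these are fine
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 100_000).forEach(i ->
                executor.submit(() -> {
                    Thread.sleep(1_000); // blocking here does not tie up an OS thread
                    return i;
                })
            );
        } // close() waits for all submitted tasks to finish
    }
}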
What Actually Happens?
Virtual threads run on top of platform threads, called carrier threads.
When a virtual thread blocks on I/O (or a lock, or sleep), the JVM unmounts it from its carrier thread, and that carrier is free to run another virtual thread.
So instead of blocking an OS thread while waiting, the JVM simply moves on.
In simple terms: the virtual thread waits, but no OS thread is held hostage while it does.
This solves the core scalability problem of platform threads.
Virtual threads don’t remove blocking — they remove the cost of blocking.
That’s the breakthrough.
This difference is easier to see in the diagram below - notice what happens during blocking.
Virtual threads are a great default for most modern applications — especially when the workload is I/O-heavy.
They make sense when you have many concurrent tasks that spend most of their time waiting on I/O: HTTP calls, database queries, calls to external services.
Platform threads still matter, though.
They make sense when the work is CPU-bound or long-running, and when you want a small, fixed number of OS threads doing heavy computation.
In practice, it comes down to this: if the task mostly waits, use virtual threads; if it mostly computes, stick with platform threads.
That rule alone covers most real-world decisions.
Virtual threads didn’t make Java faster.
They just stopped punishing us for blocking.
We can write simple code.
We can wait on I/O.
And the JVM handles the mess.
No thread pool anxiety.
No async gymnastics.
No PhD in callbacks.
Same Java.
Same threads (mostly).
Just way fewer headaches.
That’s it.
If this helped you even a little, hit ❤️, drop a comment, or share your thoughts below.
2026-02-04 01:54:13
Hiya!
I'm back! I feel that I owe you an explanation of what's going on here.
I started this blog because I want to share my insights on agentic coding from the perspective of a developer, CTO, CEO and founder. I plan to cover the entire autonomous AI coding journey.
Throughout this series, we'll get our mindset right about the many roles you'll take on: product designer, project manager, tech lead and quality assurance engineer. Later, I'll take you through a brainstorming session. Once we have a feature specification in place, we will learn how to manage a group of coding agents. We'll learn how to enforce the rules and, most importantly, why they're important. By the end, you will be confidently shipping AI-generated code to production. We will be doing some 'vibe coding' in production.
Not necessarily in that order.
Buckle up. Here's post #2.
In the previous post we covered how to make Claude stick to conventions (tl;dr - skills + hooks fix it). Now it follows the rules but...
Marcin, all tasks are complete.
I open a browser and see:
NoMethodError: undefined method 'hallucinated_method' for an instance of User (NoMethodError)
Yeah, good job, Claude! High five, let's ship it to production... NOT.
This brings us to a fundamental question: how do you know the software is working?
Back in 2017, I was working on a payment processor for a company called Paladin Software. We were processing huge YouTube earnings spreadsheets (yes, gigabyte-sized CSV files). My job was to ensure that we did it on time and that it simply worked.
One beautiful Thursday afternoon, I headed to my favourite spot in Krakow at the time, Dolnych Młynów. Friday was a day off.
As you might have guessed, one of the clients uploaded their spreadsheet on Friday. When I got back to the office on Monday, the earnings still hadn't been processed. Questions were being asked.
I'm looking at Sidekiq's failed jobs queue:
NoMethodError: undefined method 'hallucinated_method' for an instance of User (NoMethodError)
Was I an LLM before it was a thing? I was certainly shipping code like one.
LLMs have a condition called anterograde amnesia. This is the inability to form new memories after the onset of the condition. They remember their past, but new experiences don't stick. Unlike me, they can't learn from a production incident on a Friday. Every session starts from zero.
This is why they must be given a set of strict rules each time they write code (see the previous post about enforcing these rules). However, rules alone are not enough. We also need checks and reviews.
LLMs are non-deterministic. This means that, just like me, the AI agent will sometimes produce excellent code and sometimes it won't. Sometimes it will spend a lot of time testing the feature. At other times, one test will look like more than enough.
To mitigate this, we need to implement some deterministic checks in our workflow. We need something that clearly indicates when something is wrong. Here's my opinion on what should be included in local CI:
- RuboCop and Prettier for a single, enforced code style
- Brakeman for security scanning
- RSpec for the test suite, with SimpleCov to report coverage
- Undercover to flag untested new or changed code, based on the SimpleCov coverage reports
We enforce a single code style. Our code is secure. We have tests for new and changed code. All tests pass (by the way, how many times has an AI agent told you that a test failure is unrelated to their changes?).
No more runtime errors. Better reliability. Some more of my frustrations are gone.
You might ask: aren't there too many tests and too much boilerplate? No, unit tests are fast. With coding agents, they are more maintainable than ever before. This is a pretty good deal for improved reliability.
Wrap all of this in your local CI. If you're running Rails 8.1 or later, it's already in the framework. For Rails 8.0 and earlier you can take my ported implementation of it. Alternatively, you can create your own.
This is the sample output:
Continuous Integration
Running checks...
Rubocop
bundle exec rubocop -A
✅ Rubocop passed in 1.98s
Prettier
yarn prettier --config .prettierrc.json app/packs app/components --write
✅ Prettier passed in 1.57s
Brakeman
bundle exec brakeman --quiet --no-pager --except=EOLRails
✅ Brakeman passed in 8.76s
RSpec
bundle exec parallel_rspec --serialize-stdout --combine-stderr
✅ RSpec passed in 1m32.45s
Undercover
bundle exec undercover --lcov coverage/lcov/app.lcov --compare origin/master
✅ Undercover passed in 0.94s
✅ Continuous Integration passed in 1m45.70s
Back in 2014, I got my second IT job as a junior Rails developer at Netguru. The onboarding process included the Netguru way of writing code. Specific libraries and patterns.
I was writing code The Rails Way. I didn't have much experience with production-grade Rails apps. During one of the code reviews, I received feedback that my models were a bit too fat. They also provided a link to an article by Code Climate: '7 Ways to Decompose Fat ActiveRecord Models'.
I kinda heard about these rules. I was just so focused on getting the business logic right that I didn't apply them...
Oh wait, isn't it the exact same thing Claude told me?
"The CLAUDE.md says 'ALWAYS STOP and ask for clarification rather than making assumptions' and I violated that repeatedly. I got caught up in the momentum of the Rails 8 upgrade and stopped being careful."
It's not a new problem for the software industry. The remedy? Code review, obviously. Each pull request must be checked by another developer. This allows less experienced developers to learn good practices and enables more experienced developers to mentor others and pass on their knowledge. The rules are also enforced. Everybody wins.
Remember: never let the developer review their own code. The same applies to AI agents.
Why three stages?
Firstly, we will check that the implementation complies with the functionality specifications. This involves verifying that the agent has built what was requested (neither more nor less).
Secondly: A review of Rails and project-specific conventions. To do this, we have to load all the conventions (see the previous post) and check them. Are the interfaces clean? View components instead of partials? Are jobs idempotent and thin? Do the tests verify behaviour?
Last but not least: A general code quality review of architecture, design, documentation, standards and maintainability.
All of these things give us a comprehensive overview of the implementation and any possible deviations. Each review is carried out by a different agent with a fresh perspective and no attachment to the feature.
Here's what a full report looks like in practice:
1. Spec compliance - line-by-line verification:
| Requirement | Implementation | Status |
|---|---|---|
| Column: delay_peer_reviews | :delay_peer_reviews | ✅ Match |
2. Rails conventions - checklist:
| Convention | Status |
|------------|--------|
| Reversible migration | PASS |
| Handles existing data | PASS |
3. Code quality - structured report with Strengths, Critical/Important/Minor issues, references, and merge assessment.
Final summary table:
| Check | Status |
|-------|--------|
| ✅ Spec compliance | Passed |
| ✅ Rails conventions | Passed |
| ✅ Code quality | Approved with minor suggestions |
| ✅ Local CI | Passed |
Ready for merge.
When issues are found, it consolidates them:
## Legitimate Findings to Address
1. No error handling in Discord::Client#post
2. No error handling in OAuth callback
## Findings I'm Skipping (Your Explicit Decisions)
- No encrypts on token fields (you requested this)
Which of these do you want me to address?
My /codereview command and review agent prompts are on GitHub.
Let the AI agent write the code.
Tell the agent to run bin/ci and fix everything until it's green. Every failure is their responsibility.
They will make the local CI green.
Run the command /codereview.
The agent fixes any issues.
Run /codereview again until the code is ready.
Personally, I don't read the code until this point. As a good manager, I don't micromanage.
Be a good manager, too. Provide a set of rules and the tools needed to enforce them. Don't micromanage. If you're not happy with the results, adjust the rules. Repeat until you are happy with the results.
Trust, but verify.
The spec compliance and code quality review agents are based on https://github.com/obra/superpowers.
This is the second post in a longer series.
So far, we have covered making the agent stick to conventions with skills and hooks (post #1), and verifying its work with local CI and a three-stage code review (this post).
I'd love to hear your thoughts. Reach out to me on LinkedIn or at [email protected].
2026-02-04 01:52:01
TL;DR: Discover the exact backend optimization strategies that reduced API response times from 800ms to 120ms, scaled from 120 req/s to 8,500 req/s, and cut costs by 60% - all while handling 100K+ concurrent users. Real metrics and production-ready patterns included! 🚀
Frontend performance means nothing if your backend can't keep up. At 100K+ users, every millisecond of API latency matters. Here's how I transformed my Node.js/Express backend from struggling with hundreds of requests per second to smoothly handling thousands.
When you scale from 1K to 100K+ users, backend challenges multiply: queries that were fine at low traffic become bottlenecks, connection pools saturate, uncached requests hammer the database, and infrastructure costs climb.
The key insight: You can't just "add more servers" - you need systematic optimization.
Before Backend Optimization:
Performance:
├── Avg Response Time: 800ms
├── P95 Response Time: 2,400ms
├── P99 Response Time: 4,500ms
├── Throughput: 120 req/s
└── Error Rate: 2.3%
Infrastructure:
├── Servers: 2 instances
├── Database Connections: Direct
├── Caching: None
└── Load Balancing: Basic
Cost:
└── Monthly: $450/month
After Backend Optimization:
Performance:
├── Avg Response Time: 120ms (85% faster) 🚀
├── P95 Response Time: 310ms (87% faster) ⚡
├── P99 Response Time: 580ms (87% faster) 🔥
├── Throughput: 8,500 req/s (70x increase) 💪
└── Error Rate: 0.08% (96% reduction) ✅
Infrastructure:
├── Servers: Auto-scaling (2-20 instances)
├── Database Connections: Pool + replicas
├── Caching: Redis (87% hit rate)
└── Load Balancing: Advanced with health checks
Cost:
└── Monthly: $680/month (1.5x cost, 70x capacity!)
Cost per request dropped from $0.0031 to $0.000044 - that's 98.6% more efficient!
Before: Inefficient data fetching
// ❌ BAD: Multiple sequential database queries
@Get('/api/teams/:teamId/dashboard')
async getTeamDashboard(@Param('teamId') teamId: number): Promise<any> {
// Query 1: Get team info (200ms)
const team = await this.db.query(
'SELECT * FROM teams WHERE id = $1',
[teamId]
);
// Query 2: Get team members (300ms)
const members = await this.db.query(
'SELECT * FROM users WHERE team_id = $1',
[teamId]
);
// Query 3: Get metrics for each member (400ms each!)
const memberMetrics = [];
for (const member of members) {
const metrics = await this.db.query(
'SELECT * FROM metrics WHERE user_id = $1',
[member.id]
);
memberMetrics.push(metrics);
}
// Query 4: Get team stats (250ms)
const stats = await this.db.query(
'SELECT * FROM team_stats WHERE team_id = $1',
[teamId]
);
return { team, members, memberMetrics, stats };
}
// Total time: 200 + 300 + (400 × members) + 250 = 2,000ms+ for 3 members!
After: Optimized with parallel queries and caching
// ✅ GOOD: Parallel queries with caching
@Get('/api/teams/:teamId/dashboard')
async getTeamDashboard(@Param('teamId') teamId: number): Promise<any> {
const cacheKey = `dashboard:team:${teamId}`;
// Check cache first
const cached = await this.redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Execute all queries in parallel using Promise.all
const [team, members, metrics, stats] = await Promise.all([
// Query 1: Team info
this.db.query('SELECT * FROM teams WHERE id = $1', [teamId]),
// Query 2: Team members
this.db.query('SELECT * FROM users WHERE team_id = $1', [teamId]),
// Query 3: All metrics in one query using JOIN
this.db.query(`
SELECT m.*, u.name as user_name
FROM metrics m
JOIN users u ON m.user_id = u.id
WHERE u.team_id = $1
`, [teamId]),
// Query 4: Team stats
this.db.query('SELECT * FROM team_stats WHERE team_id = $1', [teamId])
]);
const result = {
team: team.rows[0],
members: members.rows,
metrics: metrics.rows,
stats: stats.rows[0]
};
// Cache for 5 minutes
await this.redis.setex(cacheKey, 300, JSON.stringify(result));
return result;
}
// Total time: max(200, 300, 150, 250) + cache overhead = ~320ms
// With cache hit: ~5ms!
Results: the dashboard endpoint dropped from 2,000ms+ to roughly 320ms on a cache miss, and to about 5ms on a cache hit.
// Batch multiple API requests into single database query
@Injectable()
export class BatchRequestService {
private batchQueue: Map<string, BatchRequest> = new Map();
private batchTimer: NodeJS.Timeout | null = null;
private readonly BATCH_WINDOW = 50; // ms
async get(url: string, params: any): Promise<any> {
return new Promise((resolve, reject) => {
const key = `${url}:${JSON.stringify(params)}`;
if (!this.batchQueue.has(key)) {
this.batchQueue.set(key, {
url,
params,
resolvers: []
});
}
this.batchQueue.get(key)!.resolvers.push({ resolve, reject });
this.scheduleBatch();
});
}
private scheduleBatch(): void {
if (this.batchTimer) return;
this.batchTimer = setTimeout(() => {
this.executeBatch();
}, this.BATCH_WINDOW);
}
private async executeBatch(): Promise<void> {
const batch = Array.from(this.batchQueue.values());
this.batchQueue.clear();
this.batchTimer = null;
// Group requests by type for efficient querying
const grouped = this.groupRequests(batch);
for (const [type, requests] of Object.entries(grouped)) {
try {
const results = await this.executeBatchQuery(type, requests);
// Distribute results to waiting promises
requests.forEach((req, index) => {
req.resolvers.forEach(r => r.resolve(results[index]));
});
} catch (error) {
requests.forEach(req => {
req.resolvers.forEach(r => r.reject(error));
});
}
}
}
private async executeBatchQuery(type: string, requests: any[]): Promise<any[]> {
// Execute optimized batch query based on type
const ids = requests.map(r => r.params.id);
const query = `SELECT * FROM ${type} WHERE id = ANY($1)`;
const result = await this.db.query(query, [ids]);
return result.rows;
}
}
@Injectable()
export class CachedDataService {
readonly CACHE_TTL = { // public so callers can reference these TTLs (see usage below)
SHORT: 60, // 1 minute - highly dynamic data
MEDIUM: 300, // 5 minutes - semi-static data
LONG: 3600, // 1 hour - rarely changing data
VERY_LONG: 86400 // 24 hours - static reference data
};
constructor(
private redis: RedisClient,
private db: DatabaseService
) {}
async getWithCache<T>(
key: string,
fetchFn: () => Promise<T>,
ttl: number = this.CACHE_TTL.MEDIUM
): Promise<T> {
// Try cache first
const cached = await this.redis.get(key);
if (cached) {
return JSON.parse(cached);
}
// Cache miss - fetch from source
const data = await fetchFn();
// Store in cache
await this.redis.setex(key, ttl, JSON.stringify(data));
return data;
}
// Cache with automatic invalidation
async setWithInvalidation(
key: string,
data: any,
relatedKeys: string[] = []
): Promise<void> {
// Invalidate related caches
if (relatedKeys.length > 0) {
await this.redis.del(...relatedKeys);
}
// Update the data
await this.updateData(key, data);
}
// Pattern-based cache invalidation
async invalidatePattern(pattern: string): Promise<void> {
const keys = await this.redis.keys(pattern);
if (keys.length > 0) {
await this.redis.del(...keys);
}
}
}
// Usage example
@Injectable()
export class TeamMetricsService {
constructor(private cache: CachedDataService) {}
async getTeamMetrics(teamId: number): Promise<TeamMetrics> {
return this.cache.getWithCache(
`metrics:team:${teamId}`,
async () => {
// Expensive database query
return await this.fetchTeamMetricsFromDb(teamId);
},
this.cache.CACHE_TTL.MEDIUM
);
}
async updateTeamMetrics(teamId: number, data: any): Promise<void> {
// Invalidate related caches
await this.cache.setWithInvalidation(
`metrics:team:${teamId}`,
data,
[
`dashboard:team:${teamId}`,
`metrics:team:${teamId}`,
`stats:team:${teamId}`
]
);
}
}
// Proactively populate cache for frequently accessed data
@Injectable()
export class CacheWarmingService {
constructor(
private redis: RedisClient,
private db: DatabaseService
) {
this.startWarmingSchedule();
}
private startWarmingSchedule(): void {
// Warm cache every 4 minutes (before 5-minute expiry)
setInterval(() => {
this.warmFrequentlyAccessedData();
}, 4 * 60 * 1000);
}
private async warmFrequentlyAccessedData(): Promise<void> {
try {
// Get list of active teams
const activeTeams = await this.db.query(`
SELECT DISTINCT team_id
FROM user_sessions
WHERE last_activity > NOW() - INTERVAL '1 hour'
`);
// Warm cache for each active team
const warmingPromises = activeTeams.rows.map(async (team) => {
const metrics = await this.fetchTeamMetrics(team.team_id);
await this.redis.setex(
`metrics:team:${team.team_id}`,
300,
JSON.stringify(metrics)
);
});
await Promise.all(warmingPromises);
console.log(`Cache warmed for ${activeTeams.rows.length} teams`);
} catch (error) {
console.error('Cache warming failed:', error);
}
}
}
Results: an 87% Redis hit rate, roughly 5ms average cache responses, and about 85% less load on the database (full numbers in the metrics section below).
// Optimized connection pool configuration
import { Pool } from 'pg';
const poolConfig = {
host: process.env.DB_HOST,
port: 5432,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
// Connection pool settings
min: 10, // Minimum connections
max: 100, // Maximum connections
idleTimeoutMillis: 30000, // Close idle connections after 30s
connectionTimeoutMillis: 2000,
// Performance tuning
statement_timeout: 10000, // Kill queries after 10s
query_timeout: 10000,
keepAlive: true,
keepAliveInitialDelayMillis: 10000
};
class DatabaseService {
private pool: Pool;
private readPool: Pool;
constructor() {
// Write pool (primary database)
this.pool = new Pool(poolConfig);
// Read pool (read replicas)
this.readPool = new Pool({
...poolConfig,
host: process.env.DB_READ_REPLICA_HOST
});
this.setupPoolMonitoring();
}
private setupPoolMonitoring(): void {
// Monitor pool health
setInterval(() => {
console.log('Pool stats:', {
total: this.pool.totalCount,
idle: this.pool.idleCount,
waiting: this.pool.waitingCount
});
// Alert if pool is saturated
if (this.pool.waitingCount > 10) {
console.error('Connection pool saturated!');
// Send alert to monitoring service
}
}, 60000);
}
async executeWrite(query: string, params: any[]): Promise<any> {
const client = await this.pool.connect();
try {
return await client.query(query, params);
} finally {
client.release();
}
}
async executeRead(query: string, params: any[]): Promise<any> {
const client = await this.readPool.connect();
try {
return await client.query(query, params);
} finally {
client.release();
}
}
async transaction<T>(callback: (client: any) => Promise<T>): Promise<T> {
const client = await this.pool.connect();
try {
await client.query('BEGIN');
const result = await callback(client);
await client.query('COMMIT');
return result;
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
}
}
@Injectable()
export class DataAccessService {
constructor(private db: DatabaseService) {}
// Read operations use read replicas
async getTeamMetrics(teamId: number): Promise<any> {
return this.db.executeRead(
'SELECT * FROM team_metrics WHERE team_id = $1',
[teamId]
);
}
// Write operations use primary database
async updateTeamMetrics(teamId: number, data: any): Promise<void> {
await this.db.executeWrite(
'UPDATE team_metrics SET data = $1, updated_at = NOW() WHERE team_id = $2',
[data, teamId]
);
}
// Transactions always use primary
async createTeamWithMembers(team: any, members: any[]): Promise<void> {
await this.db.transaction(async (client) => {
// Insert team
const teamResult = await client.query(
'INSERT INTO teams (name, created_at) VALUES ($1, NOW()) RETURNING id',
[team.name]
);
const teamId = teamResult.rows[0].id;
// Insert members
for (const member of members) {
await client.query(
'INSERT INTO users (team_id, name, email) VALUES ($1, $2, $3)',
[teamId, member.name, member.email]
);
}
});
}
}
// Efficient pagination for large datasets
@Get('/api/metrics')
async getMetrics(
@Query('limit') limit: number = 50,
@Query('cursor') cursor?: string
): Promise<PaginatedResponse> {
// Validate and sanitize
const safeLimit = Math.min(Math.max(limit, 1), 100);
let query: string;
let params: any[];
if (cursor) {
// Decode cursor (base64 encoded ID)
const cursorId = Buffer.from(cursor, 'base64').toString('utf-8');
query = `
SELECT id, name, value, created_at
FROM metrics
WHERE id > $1
ORDER BY id ASC
LIMIT $2
`;
params = [cursorId, safeLimit];
} else {
query = `
SELECT id, name, value, created_at
FROM metrics
ORDER BY id ASC
LIMIT $1
`;
params = [safeLimit];
}
const result = await this.db.executeRead(query, params);
const items = result.rows;
// Generate next cursor
const nextCursor = items.length === safeLimit
? Buffer.from(items[items.length - 1].id.toString()).toString('base64')
: null;
return {
items,
nextCursor,
hasMore: items.length === safeLimit
};
}
interface PaginatedResponse {
items: any[];
nextCursor: string | null;
hasMore: boolean;
}
// Enable compression for API responses
import compression from 'compression';
import express from 'express';
const app = express();
// Compression middleware
app.use(compression({
filter: (req, res) => {
if (req.headers['x-no-compression']) {
return false;
}
return compression.filter(req, res);
},
level: 6, // Compression level (1-9, 6 is good balance)
threshold: 1024 // Only compress responses > 1KB
}));
// Result: Typical API response reduced from 45KB to 8KB (82% smaller)
@Injectable()
export class ErrorHandlerService {
handleError(error: any, context: string): never {
// Log error with context
console.error(`Error in ${context}:`, {
message: error.message,
stack: error.stack,
timestamp: new Date().toISOString()
});
// Send to monitoring service (Sentry)
if (process.env.NODE_ENV === 'production') {
this.sentryService.captureException(error, { context });
}
// Return appropriate error response
if (error instanceof ValidationError) {
throw new BadRequestException(error.message);
}
if (error instanceof NotFoundError) {
throw new NotFoundException(error.message);
}
if (error instanceof UnauthorizedError) {
throw new UnauthorizedException(error.message);
}
// Generic error response
throw new InternalServerErrorException(
'An unexpected error occurred. Please try again later.'
);
}
}
@Injectable()
export class CircuitBreakerService {
private failures = new Map<string, number>();
private lastFailureTime = new Map<string, number>();
private state = new Map<string, CircuitState>();
private readonly FAILURE_THRESHOLD = 5;
private readonly RESET_TIMEOUT = 60000; // 1 minute
private readonly HALF_OPEN_MAX_CALLS = 3;
async execute<T>(
key: string,
fn: () => Promise<T>,
fallback?: () => Promise<T>
): Promise<T> {
const currentState = this.state.get(key) || 'closed';
if (currentState === 'open') {
const lastFailure = this.lastFailureTime.get(key) || 0;
if (Date.now() - lastFailure > this.RESET_TIMEOUT) {
this.state.set(key, 'half-open');
} else {
if (fallback) {
return fallback();
}
throw new ServiceUnavailableException(
'Service temporarily unavailable'
);
}
}
try {
const result = await fn();
this.onSuccess(key);
return result;
} catch (error) {
this.onFailure(key);
if (fallback && this.state.get(key) === 'open') {
return fallback();
}
throw error;
}
}
private onSuccess(key: string): void {
this.failures.set(key, 0);
this.state.set(key, 'closed');
}
private onFailure(key: string): void {
const currentFailures = this.failures.get(key) || 0;
const newFailures = currentFailures + 1;
this.failures.set(key, newFailures);
this.lastFailureTime.set(key, Date.now());
if (newFailures >= this.FAILURE_THRESHOLD) {
this.state.set(key, 'open');
console.error(`Circuit breaker opened for: ${key}`);
}
}
}
type CircuitState = 'closed' | 'open' | 'half-open';
// Usage
@Injectable()
export class ExternalApiService {
constructor(private circuitBreaker: CircuitBreakerService) {}
async fetchFromExternalApi(url: string): Promise<any> {
return this.circuitBreaker.execute(
`external-api:${url}`,
async () => {
const response = await fetch(url);
return response.json();
},
async () => {
// Fallback: return cached data or default response
return this.getCachedData(url);
}
);
}
}
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
// Global rate limiter
const globalLimiter = rateLimit({
store: new RedisStore({
client: redis,
prefix: 'rl:global:'
}),
windowMs: 15 * 60 * 1000, // 15 minutes
max: 1000, // 1000 requests per window per IP
message: 'Too many requests, please try again later',
standardHeaders: true,
legacyHeaders: false
});
// Stricter limits for expensive endpoints
const expensiveLimiter = rateLimit({
store: new RedisStore({
client: redis,
prefix: 'rl:expensive:'
}),
windowMs: 60 * 1000, // 1 minute
max: 10, // 10 requests per minute
message: 'Rate limit exceeded for this endpoint'
});
// Apply middleware
app.use('/api/', globalLimiter);
app.use('/api/reports/generate', expensiveLimiter);
// Custom rate limiter by user ID
const createUserRateLimiter = (maxRequests: number) => {
return rateLimit({
store: new RedisStore({
client: redis,
prefix: 'rl:user:'
}),
windowMs: 60 * 1000,
max: maxRequests,
keyGenerator: (req) => {
// Rate limit by user ID instead of IP
return req.user?.id || req.ip;
}
});
};
app.use('/api/user/*', createUserRateLimiter(100));
# Artillery load test - sustained load
artillery run loadtest.yml
# Configuration
config:
target: 'https://api.orgsignals.com'
phases:
- duration: 300
arrivalRate: 100
rampTo: 1000
name: "Ramp to peak"
- duration: 600
arrivalRate: 1000
name: "Sustained peak load"
# Results after optimization:
Summary:
✅ Scenarios: 960,000 (100%)
✅ Requests: 4,800,000
✅ Success Rate: 99.92%
✅ Response Times:
- Min: 35ms
- Median: 118ms
- P95: 298ms
- P99: 562ms
- Max: 1,841ms
✅ Throughput: 8,000 req/s sustained
✅ Error Rate: 0.08%
Database Performance:
✅ Connection Pool:
- Total: 100
- Idle: 45
- Active: 55
- Waiting: 0
✅ Query Performance:
- Avg: 12ms
- P95: 45ms
- P99: 120ms
API Performance:
✅ Total Requests: 45.2M
✅ Avg Response Time: 118ms
✅ P95 Response Time: 298ms
✅ P99 Response Time: 562ms
✅ Error Rate: 0.08%
✅ Peak Throughput: 8,500 req/s
Cache Performance:
✅ Redis Hit Rate: 87%
✅ Avg Cache Response: 5ms
✅ Total Cache Hits: 39.3M
✅ Total Cache Misses: 5.9M
✅ Database Load Reduction: 85%
Infrastructure Health:
✅ Uptime: 99.98%
✅ Avg CPU: 45%
✅ Avg Memory: 52%
✅ Connection Pool: Healthy
✅ Auto-scaling Events: 47
❌ Microservices too early: Added complexity without benefits at this scale
❌ Over-caching: Caused stale data issues, had to fine-tune TTLs
❌ GraphQL: Added overhead without clear advantages for our use case
❌ Too many middleware: Each middleware added latency
These backend optimization strategies transformed our API from struggling at 120 req/s to smoothly handling 8,500 req/s - a 70x improvement. But backend performance is just one component of delivering world-class developer productivity insights.
Ready to see sub-200ms API responses in action?
OrgSignals leverages every backend optimization strategy covered in this article:
Stop flying blind with your engineering metrics. OrgSignals provides:
✅ Lightning-fast analytics - Get insights in milliseconds, not seconds
✅ Real-time DORA metrics - Track deployment frequency, lead time, MTTR, and change failure rate
✅ Seamless integrations - GitHub, GitLab, Jira, Slack - all your tools unified
✅ AI-powered insights - Automatically identify bottlenecks and improvement opportunities
✅ Developer-friendly dashboards - Beautiful visualizations that tell the story
✅ Team & individual metrics - From C-suite to individual contributors
📚 Read the complete series:
Questions about scaling your backend? Drop them in the comments - I respond to every question!
Found this helpful? Follow for more backend optimization and system design content.
2026-02-04 01:46:44
Hi everyone, just wanted to share this GitHub repo in case it's useful!
I work with OpenMetadata. If you too are interested in or working with modern data platforms, I think it's worth having on your radar!
Some of the features: data discovery and search, lineage, data quality, and governance, all in one open platform.
👉GitHub: https://github.com/open-metadata/OpenMetadata
Fully open source. If you find it useful, consider giving the repo a ⭐ to bookmark and support the project.
Feedback and contributions are always welcome!