
Ridge vs. Lasso Regression: A Clear Guide to Regularization Techniques

2026-01-27 02:56:16

In the world of machine learning, linear regression is often one of the first algorithms we learn. But standard linear regression has a critical weakness: it can easily overfit to training data, especially when dealing with many features. This is where Ridge and Lasso regression come in—two powerful techniques that prevent overfitting and can lead to more interpretable models. Let's break down how they work, their differences, and when to use each.

THE CORE PROBLEM: OVERFITTING
Imagine you're trying to predict house prices based on features like size, age, number of bedrooms, proximity to a school, and even the color of the front door. A standard linear regression might assign some weight (coefficient) to every single feature, even the irrelevant ones (like door color). It will fit the training data perfectly but will fail miserably on new, unseen houses. This is overfitting.

Ridge and Lasso solve this by adding a "penalty" to the regression model's objective. This penalty discourages the model from relying too heavily on any single feature, effectively simplifying it.

RIDGE REGRESSION: THE GENTLE MODERATOR
What it does: Ridge regression (also called L2 regularization) adds a penalty equal to the square of the magnitude of the coefficients.

Simple Explanation: Think of Ridge as a strict but fair moderator in a group discussion. It allows everyone (every feature) to speak, but it prevents any single person from dominating the conversation. No feature's coefficient is allowed to become extremely large, but none are ever set exactly to zero.

The Math (Simplified):
The Ridge model tries to minimize:
(Sum of Squared Errors) + λ * (Sum of Squared Coefficients)

Where λ (lambda) is the tuning parameter. A higher λ means a stronger penalty, pushing all coefficients closer to zero (but never exactly zero).

Example:
Predicting a student's final exam score (y) using:

  • x1: Hours studied (truly important)
  • x2: Number of pencils they own (irrelevant noise)

A standard regression might output:
Score = 5.0*(Hours) + 0.3*(Pencils)

Ridge regression, with its penalty, might output:
Score = 4.8*(Hours) + 0.05*(Pencils)

See what happened? The coefficient for the important feature (Hours) shrank slightly, and the coefficient for the nonsense feature (Pencils) shrank dramatically. The irrelevant feature is suppressed but not removed.
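
To see this shrinkage numerically, here is a minimal Python sketch, assuming scikit-learn and NumPy are available (the article itself names no library); the synthetic Hours/Pencils data only mirrors the example, so the exact coefficients will differ from the illustrative numbers above.

# Minimal Ridge shrinkage sketch, assuming scikit-learn and NumPy (hypothetical setup).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 10, n)               # truly important feature
pencils = rng.integers(1, 20, n)            # irrelevant noise
score = 5.0 * hours + rng.normal(0, 2, n)   # pencils play no role in the target

X = np.column_stack([hours, pencils])

ols = LinearRegression().fit(X, score)
ridge = Ridge(alpha=10.0).fit(X, score)     # alpha plays the role of lambda

print("OLS coefficients:  ", ols.coef_)     # both features get a weight
print("Ridge coefficients:", ridge.coef_)   # shrunk towards zero, but never exactly zero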

LASSO REGRESSION: THE RUTHLESS SELECTOR
What it does: Lasso regression (also called L1 regularization) adds a penalty equal to the absolute value of the magnitude of the coefficients.

Simple Explanation: Lasso is a ruthless talent scout. It evaluates all features and doesn't just quiet down the weak ones—it completely eliminates those it deems unnecessary. It performs feature selection.

The Math (Simplified):
The Lasso model tries to minimize:
(Sum of Squared Errors) + λ * (Sum of Absolute Coefficients)

Example:
Using the same student score prediction:

A standard regression might output:
Score = 5.0*(Hours) + 0.3*(Pencils)

Lasso regression, with its penalty, might output:
Score = 4.9*(Hours) + 0.0*(Pencils)

The coefficient for Pencils has been forced to absolute zero. Lasso has identified it as useless and removed it from the model entirely, leaving a simpler, more interpretable model.
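
Here is the matching minimal sketch for Lasso, under the same assumptions (scikit-learn, the synthetic Hours/Pencils data from the Ridge sketch); with a large enough alpha, the irrelevant coefficient is driven to exactly zero.

# Minimal Lasso feature-selection sketch, same assumed synthetic setup as the Ridge example.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 10, n)
pencils = rng.integers(1, 20, n)
score = 5.0 * hours + rng.normal(0, 2, n)

X = np.column_stack([hours, pencils])

lasso = Lasso(alpha=2.0).fit(X, score)      # alpha plays the role of lambda
print("Lasso coefficients:", lasso.coef_)
# With a sufficiently large alpha, the Pencils coefficient lands at exactly 0.0,
# while the Hours coefficient is only modestly shrunk.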

HEAD-TO-HEAD COMPARISON
| Feature | Ridge Regression | Lasso Regression |
|---|---|---|
| Penalty Term | Sum of squared coefficients | Sum of absolute coefficients |
| Effect on Coefficients | Shrinks them smoothly towards zero | Can force coefficients to exactly zero |
| Feature Selection | No. Keeps all features. | Yes. Creates sparse models. |
| Use Case | When you believe all features are relevant but need to reduce overfitting. | When you have many features and suspect only a subset are important. |
| Good for | Handling multicollinearity (highly correlated features). | Building simpler, more interpretable models. |
| Geometry | Penalty region is a circle; the solution tends to be where the error contour touches the circle. | Penalty region is a diamond; the solution often occurs at a corner, zeroing out coefficients. |

VISUAL ANALOGY: THE FITTING GAME
Imagine you're fitting a curve to points on a graph, with two dials (coefficients) to adjust.

  • Standard Regression: You only care about getting the line as close to the points as possible. You might turn both dials to extreme positions to fit perfectly.
  • Ridge: You have a second goal: you don't want the dials to point to very high numbers. You find a balance between fit and keeping the dial settings moderate.
  • Lasso: You have a second goal: you want as few dials as possible to be far from the "off" position. You're willing to turn a dial all the way to "OFF" (zero) if it doesn't help enough.

WHICH ONE SHOULD YOU USE?

  • Choose Ridge if you have many features that all have some meaningful relationship to the output. It’s often the safer, more stable choice.
  • Choose Lasso if you're in an exploratory phase, have a huge number of features (e.g., hundreds of genes predicting a disease), and want to identify the most critical ones. The built-in feature selection is a huge advantage for interpretability.
  • Pro-Tip: There's also Elastic Net, which combines both the Ridge and Lasso penalties. It's a great practical compromise that often delivers the best performance; a minimal sketch follows below.
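
As a quick illustration of that pro-tip, here is a minimal Elastic Net sketch under the same assumptions as the earlier examples (scikit-learn, synthetic Hours/Pencils data); l1_ratio controls the blend between the L1 (Lasso) and L2 (Ridge) penalties.

# Minimal Elastic Net sketch, same assumed synthetic setup as above.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, 200)
pencils = rng.integers(1, 20, 200)
score = 5.0 * hours + rng.normal(0, 2, 200)
X = np.column_stack([hours, pencils])

# l1_ratio=0.5 mixes the Lasso (L1) and Ridge (L2) penalties evenly.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, score)
print("Elastic Net coefficients:", enet.coef_)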

IN CONCLUSION
Both Ridge and Lasso are essential tools that move linear regression from a simple baseline to a robust, modern technique.

  • Ridge regression is your go-to for general purpose prevention of overfitting. It's reliable and handles correlated data well.
  • Lasso regression is your tool for creating simple, interpretable models by automatically selecting only the most important features.

By understanding their distinct "philosophies"—moderation vs. selection—you can strategically choose the right tool to build models that are not only accurate but also generalize well to the real world.

Debugging Dynamic Cookie Validation in Express.js

2026-01-27 02:52:58

When building multi-tenant applications, validating cookies per domain can be tricky. I recently worked on a project where each domain had its own cookie configuration, and I wanted to ensure the correct cookie was being read for each request.

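// Assumed context: this is a method on an auth controller class; the Express types,
// the catchAsync wrapper, the parse() hostname helper, and the DOMAIN_COOKIE map
// are imported/defined elsewhere in the project.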
public validateToken: RequestHandler = catchAsync(
  async (req: Request, res: Response, next: NextFunction): Promise<void> => {
    // Extract the hostname dynamically from the request
    const host = parse(req.hostname).hostname;

    // Look up the access cookie name for this domain; guard against hosts missing from the map
    const ACCESS = DOMAIN_COOKIE[host as keyof typeof DOMAIN_COOKIE]?.ACCESS;

    console.log('Debug Mode – Hostname:', host);

    if (!ACCESS) {
      return this.unauthorized(req, res, next);
    }

    const accessCookie = req.signedCookies[ACCESS];

    // If the access token is missing, return an unauthorized response
    if (!accessCookie) {
      return this.unauthorized(req, res, next);
    }

    // Continue to next middleware or route
    next();
  }
);

Key Takeaways

Dynamic Cookie Access
Using parse(req.hostname).hostname lets you dynamically determine which cookie to check for the current request. This is especially useful for multi-domain setups.

Early Debugging
Adding a console.log statement for the hostname helps confirm which domain the request is coming from and whether the correct cookie name is being used.

Fail Fast
Always check for missing cookies and return an unauthorized response early to prevent unauthorized access.

Why This Matters

Without this setup, your multi-domain app could mistakenly use the wrong cookie, leading to authentication errors. Dynamic validation ensures every request is verified against its intended domain configuration.

The Agentic Developer: Orchestrating My 2026 Portfolio with Google Antigravity & Gemini 3

2026-01-27 02:52:30

This is a submission for the New Year, New You Portfolio Challenge Presented by Google AI

About Me

Hi! I'm a developer based in Mumbai with a deep passion for AI. Lately, I've been exploring the intersection of web development and Artificial Intelligence. I love trying new AI tools to accelerate my workflow and complete tasks at a faster pace.

I am the author of "The Secrets To Master Your Mind", which I published at the age of 14. Currently, I am enrolled as a student at K.J. Somaiya Institute of Technology.

Portfolio

How I Built It

🛠️ The Tech Stack
I wanted my portfolio to be more than just a static page—I wanted it to be an immersive, "liquid" experience. To achieve this, I used a modern, performance-focused stack, orchestrated entirely by Google Antigravity:

Frontend Core: React 19 (via Vite 7) for ultra-fast HMR and build times.

Styling: Tailwind CSS 4 & PostCSS for rapid UI development.

Infrastructure: Docker (Multi-stage build) & Nginx.

Hosting: Google Cloud Run (Serverless container deployment).

🎨 Immersive Design & Animations
To create a "premium" feel, I layered several animation libraries:

Framer Motion: Used for complex component animations and scroll-triggered layout reveals.

GSAP: Powered high-performance tweens and timelines.

Locomotive Scroll: Enabled smooth, inertia-based scrolling to give the site weight and momentum.

WebGL & Shaders: I implemented a custom fluid simulation (SplashCursor.jsx) and particle effects (@tsparticles/react) to create a background that reacts to user interaction.

☁️ Powered by Google Cloud & AI
This project relies heavily on the Google ecosystem for both development and deployment:

Google Cloud Run: I containerized the application using Docker. The Dockerfile uses a multi-stage build (Node 18 for building → Nginx Alpine for serving) to keep the image lightweight. Deploying to Cloud Run was seamless, allowing me to scale to zero when not in use (keeping costs low) while maintaining high availability.

Gemini & AI Assistance: As a Google Student Ambassador, I leverage Google's tools daily. For this project, I used Gemini 3 Pro (via Google Antigravity) to build the entire website from scratch. I generated the UI elements, animations, and styling simply by providing prompts to the Antigravity agents.

What I'm Most Proud Of

I am most proud of finally breaking my cycle of procrastination. I had always put off building my portfolio, but this challenge proved to be the perfect platform to get started. With the help of Google AI tools, I finally completed it.

The animations generated by Antigravity were outstanding and went far beyond my imagination.

SQL Restaurant Kitchen: Indexes and Partitions

2026-01-27 02:49:32

The Night Everything Broke

It was 2 AM on a Friday when my phone exploded with alerts. Our e-commerce platform was dying. Page load times had gone from 200ms to 40+ seconds. Customers were abandoning carts. My hands shook as I SSH'd into the database server.

The culprit? A seemingly innocent query:

SELECT * FROM orders 
WHERE customer_id = 12847 
  AND created_at > '2023-01-01'
ORDER BY created_at DESC 
LIMIT 20;

This query was taking 38 seconds to execute. Our orders table had grown to 50 million rows. No indexes. No partitions. Just pure, unoptimized chaos.

As I sat there watching EXPLAIN ANALYZE output scroll by, my roommate walked in from his late shift as a line cook. "Dude, why do you look like you're about to cry?"

I explained the problem. He looked at my screen and said something that changed how I think about databases forever:

"Your database is like a kitchen with no organization. You're asking the chef to find one specific tomato in a warehouse full of every ingredient ever used."

That night, sitting in our tiny apartment, a chef taught me more about database optimization than any textbook ever did.

The Restaurant Kitchen Model: How Professional Kitchens Actually Work

Before we dive into SQL, let me paint you a picture of how a professional kitchen operates during dinner rush.

The Unorganized Kitchen (No Indexes)

Imagine a restaurant where everything is in giant boxes:

  • Box A: All vegetables (500 items mixed together)
  • Box B: All proteins (300 items mixed together)
  • Box C: All spices (200 items mixed together)

When the chef needs "fresh basil," someone has to:

  1. Open Box C
  2. Look through ALL 200 spices
  3. Check each container one by one
  4. Finally find the basil

This is a full table scan. It works, but it's painfully slow.

The Organized Kitchen (With Indexes)

Now imagine the same kitchen with a proper system:

  • A labeled spice rack organized alphabetically
  • A card catalog showing: "Basil → Rack 3, Shelf 2, Position 5"
  • Color-coded containers
  • A floor plan with marked zones

The chef needs basil? They:

  1. Check the catalog (the index)
  2. Go directly to Rack 3, Shelf 2, Position 5
  3. Grab the basil
  4. Continue cooking

This is an indexed query. Instead of checking every container, the lookup jumps straight to the right spot, and the time saved is dramatic.

The Multi-Station Kitchen (Partitioned Tables)

But here's where it gets interesting. Large restaurants don't have one kitchen—they have stations:

┌─────────────────────────────────────────────┐
│           RESTAURANT KITCHEN                │
├──────────────┬──────────────┬───────────────┤
│   Salad      │     Grill    │    Pastry     │
│   Station    │    Station   │    Station    │
│              │              │               │
│ • Lettuce    │ • Steaks     │ • Flour       │
│ • Tomatoes   │ • Chicken    │ • Sugar       │
│ • Dressings  │ • Fish       │ • Chocolate   │
└──────────────┴──────────────┴───────────────┘

When an order comes in for "Caesar Salad," the system:

  1. Routes it to the Salad Station only
  2. That station has its own organized tools and ingredients
  3. Other stations aren't even disturbed

This is table partitioning. You're dividing your data into logical segments so queries only search where data actually lives.

The Technical Translation: From Kitchen to Database

Act 1: Adding Your First Index (The Spice Rack)

Let's go back to our disaster query:

-- THE SLOW QUERY (Full Kitchen Search)
SELECT * FROM orders 
WHERE customer_id = 12847 
  AND created_at > '2023-01-01'
ORDER BY created_at DESC 
LIMIT 20;

-- Execution time: 38 seconds 😱
-- Rows scanned: 50,000,000

The database is literally checking all 50 million orders, one by one. Let's add an index:

-- CREATE THE SPICE RACK (Index)
CREATE INDEX idx_orders_customer_created 
ON orders(customer_id, created_at DESC);

What just happened?

The database created a separate data structure that looks like this:

Index Structure (Simplified):
┌────────────────┬──────────────┬─────────────────────┐
│  customer_id   │  created_at  │  row_pointer        │
├────────────────┼──────────────┼─────────────────────┤
│  12847         │  2024-01-15  │  → Row at block 423 │
│  12847         │  2024-01-10  │  → Row at block 421 │
│  12847         │  2024-01-05  │  → Row at block 418 │
│  ...           │  ...         │  ...                │
└────────────────┴──────────────┴─────────────────────┘

Now when you run the query:

-- SAME QUERY, NOW WITH INDEX
SELECT * FROM orders 
WHERE customer_id = 12847 
  AND created_at > '2023-01-01'
ORDER BY created_at DESC 
LIMIT 20;

-- Execution time: 12ms 🚀
-- Rows scanned: 20 (exactly what we needed!)

From 38 seconds to 12 milliseconds. That's a 3,166x improvement.

The Index Decision Tree (When to Add the Spice Rack)

Do you frequently query by this column?
    │
    ├─ YES → Is it in a WHERE clause?
    │         │
    │         ├─ YES → CREATE INDEX
    │         │
    │         └─ NO → Is it in a JOIN?
    │                   │
    │                   ├─ YES → CREATE INDEX
    │                   │
    │                   └─ NO → Is it in ORDER BY?
    │                             │
    │                             └─ YES → Consider INDEX
    │
    └─ NO → Don't waste space on an index

Act 2: The Multi-Column Index Gotcha

Here's where most developers trip up. Watch this:

-- Creating individual indexes (WRONG)
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_created ON orders(created_at);

-- Query performance: STILL SLOW
-- Why? The planner typically uses only ONE of these indexes efficiently for this query

You need a composite index (multi-column):

-- The RIGHT way (Order matters!)
CREATE INDEX idx_orders_customer_created 
ON orders(customer_id, created_at DESC);

The order matters because of the "Leftmost Prefix Rule":

This index helps queries with:

  • WHERE customer_id = 123
  • WHERE customer_id = 123 AND created_at > '2024-01-01'

But it does NOT help a query on created_at alone, e.g. WHERE created_at > '2024-01-01', because the leading column (customer_id) must be specified first.

Think of it like a phone book: You can find "Smith, John" but you can't efficiently find "everyone named John" because it's sorted by last name first.
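
To make the phone-book intuition concrete, here is a small, database-free Python sketch (purely illustrative, no real index involved): a composite index behaves like a list of tuples sorted by (customer_id, created_at), so a fast binary search only works when the leading column is pinned down.

# Conceptual sketch of the leftmost prefix rule (no real database, just sorted tuples).
import bisect

# "Index" on (customer_id, created_at): a list kept sorted by both columns.
index = sorted([
    (101, "2024-01-05"), (101, "2024-03-12"),
    (123, "2023-11-30"), (123, "2024-01-15"), (123, "2024-02-20"),
    (207, "2024-01-02"),
])

# Query: customer_id = 123 AND created_at > '2024-01-01'
# Binary search works because the leading column (customer_id) is fixed.
lo = bisect.bisect_right(index, (123, "2024-01-01"))
hi = bisect.bisect_left(index, (124, ""))
print(index[lo:hi])                 # only the matching slice is touched

# Query: created_at > '2024-01-01' alone cannot use the sort order:
# matching rows are scattered across every customer_id, so we must scan them all.
print([row for row in index if row[1] > "2024-01-01"])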

Act 3: When One Kitchen Isn't Enough (Table Partitioning)

Six months after the index fix, we hit another wall. Our orders table was now 200 million rows. Even with indexes, some queries were slowing down. The table maintenance (VACUUM, backups) was taking hours.

My chef roommate: "You need to split the kitchen. You're trying to run a single kitchen for 50 different cuisines."

The Partitioning Strategy

We decided to partition by time (the most common pattern):

-- Step 1: Create the partitioned table
CREATE TABLE orders (
    id BIGSERIAL,
    customer_id BIGINT NOT NULL,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP NOT NULL,
    status VARCHAR(50)
) PARTITION BY RANGE (created_at);

-- Step 2: Create partitions (the "stations")
CREATE TABLE orders_2023_q1 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE orders_2023_q2 PARTITION OF orders
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

CREATE TABLE orders_2023_q3 PARTITION OF orders
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');

CREATE TABLE orders_2023_q4 PARTITION OF orders
    FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');

CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

-- Step 3: Add indexes to EACH partition
CREATE INDEX idx_orders_2023_q1_customer 
    ON orders_2023_q1(customer_id, created_at);

CREATE INDEX idx_orders_2023_q2_customer 
    ON orders_2023_q2(customer_id, created_at);
-- ... (repeat for each partition)

What changed?

-- Query for recent orders
SELECT * FROM orders 
WHERE created_at > '2024-01-15'
  AND customer_id = 12847;

-- Before partitioning:
-- Scans: 200 million rows → finds 20
-- Time: 5 seconds

-- After partitioning:
-- Scans: ONLY orders_2024_q1 (15 million rows) → finds 20
-- Time: 80ms

The database is smart enough to know: "This query only needs Q1 2024 data. I'll skip the other 185 million rows entirely."

Partition Pruning in Action

Here's the magic visualized:

Query: WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01'

┌──────────────────────────────────────────────┐
│         ORDERS TABLE (Partitioned)           │
├───────────────┬───────────────┬──────────────┤
│  2023_Q1      │   2023_Q2     │   2023_Q3    │
│  ❌ SKIPPED   │   ❌ SKIPPED  │   ❌ SKIPPED │
└───────────────┴───────────────┴──────────────┘
┌───────────────┬───────────────┐
│  2023_Q4      │   2024_Q1     │
│  ❌ SKIPPED   │   ✅ SCANNED  │  ← Only this one!
└───────────────┴───────────────┘

The Recipe: A Step-by-Step Guide to Implementing This

Scenario: You have a slow, growing table

-- 1. IDENTIFY slow queries
EXPLAIN ANALYZE 
SELECT * FROM orders 
WHERE customer_id = 12847 
  AND created_at > '2023-01-01'
ORDER BY created_at DESC;

-- Look for: "Seq Scan on orders" (BAD!)
-- Want to see: "Index Scan using idx_..." (GOOD!)

2. Add Strategic Indexes

-- For WHERE clauses
CREATE INDEX idx_orders_customer_id 
    ON orders(customer_id);

-- For multi-condition queries (BETTER)
CREATE INDEX idx_orders_customer_created 
    ON orders(customer_id, created_at DESC);

-- For foreign keys (often forgotten!)
CREATE INDEX idx_orders_user_id 
    ON orders(user_id);

3. Test Performance

-- Before
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders WHERE customer_id = 12847;

-- Output:
-- Seq Scan on orders  (cost=0.00..892341.00 rows=1240 width=120) 
--   (actual time=0.034..5234.123 rows=1240 loops=1)
-- Planning Time: 0.123 ms
-- Execution Time: 5234.567 ms

-- After adding index:
-- Index Scan using idx_orders_customer_id on orders  
--   (cost=0.56..1234.00 rows=1240 width=120) 
--   (actual time=0.034..12.456 rows=1240 loops=1)
-- Planning Time: 0.098 ms
-- Execution Time: 12.678 ms

4. Consider Partitioning When...

  • ✅ Table has 50M+ rows
  • ✅ Queries often filter by date/time
  • ✅ You regularly archive/delete old data
  • ✅ Maintenance operations are slow
  • ❌ Most queries need data across all partitions
  • ❌ Partitioning column changes frequently

5. Implement Partitioning (PostgreSQL)

-- Create new partitioned table
CREATE TABLE orders_new (
    id BIGSERIAL,
    customer_id BIGINT NOT NULL,
    created_at TIMESTAMP NOT NULL,
    -- ... other columns
) PARTITION BY RANGE (created_at);

-- Create partitions for existing data
CREATE TABLE orders_2023 PARTITION OF orders_new
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders_new
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Migrate data (during low-traffic window)
INSERT INTO orders_new SELECT * FROM orders;

-- Atomic swap (requires careful planning)
BEGIN;
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_new RENAME TO orders;
COMMIT;

-- Add indexes to each partition
CREATE INDEX idx_orders_2023_customer 
    ON orders_2023(customer_id, created_at);
CREATE INDEX idx_orders_2024_customer 
    ON orders_2024(customer_id, created_at);

The Gotchas (What They Don't Tell You in Tutorials)

1. Indexes Aren't Free

Every index slows down INSERT, UPDATE, and DELETE operations:

-- Without indexes
INSERT INTO orders VALUES (...);  -- 2ms

-- With 5 indexes
INSERT INTO orders VALUES (...);  -- 12ms

-- The database must update all 5 indexes!

Rule of thumb: Only index columns you query frequently.

2. Over-Indexing Is Real

I once worked with a table that had 23 indexes. Write performance was abysmal. We removed 17 of them. Nothing broke. Writes became 4x faster.

3. Partial Indexes Are Your Secret Weapon

-- Instead of indexing ALL orders
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Only index PENDING orders (if that's what you query)
CREATE INDEX idx_orders_pending 
    ON orders(customer_id) 
    WHERE status = 'pending';

-- This index is 10x smaller and MUCH faster

4. Partitioning Doesn't Automatically Create Indexes

This mistake cost me 3 hours of debugging:

-- When you partition a table, indexes DON'T carry over!
-- You must create them on EACH partition:

CREATE INDEX idx_orders_2023_customer 
    ON orders_2023(customer_id);
CREATE INDEX idx_orders_2024_customer 
    ON orders_2024(customer_id);
-- ... etc

5. Partitioning Complicates Queries

-- This query CAN'T use partition pruning (must check all partitions)
SELECT COUNT(*) FROM orders WHERE customer_id = 12847;

-- This query DOES use pruning (filters by partition key)
SELECT COUNT(*) FROM orders 
WHERE customer_id = 12847 
  AND created_at > '2024-01-01';

Real-World Results: The Numbers

After implementing indexes and partitioning on our production database:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Avg Query Time | 5.2s | 48ms | 108x faster |
| P95 Query Time | 38s | 340ms | 111x faster |
| Peak QPS | 120 | 2,400 | 20x increase |
| Table Size | 180GB | 180GB (same) | Data unchanged |
| Index Size | 0GB | 24GB | Worth it |
| Backup Time | 4 hours | 45 min | Parallel dumps |
| Customer Complaints | Daily | Zero | Priceless |

Cost to implement: 2 days of development + 4 hours of maintenance window

Annual cost savings: $120K in server costs alone (fewer resources needed)

The Automatic Partition Manager (Bonus Script)

Manually creating partitions is tedious. Here's a script that auto-creates future partitions:

-- PostgreSQL function to auto-create monthly partitions
CREATE OR REPLACE FUNCTION create_future_partitions()
RETURNS void AS $$
DECLARE
    partition_date DATE;
    partition_name TEXT;
    start_date TEXT;
    end_date TEXT;
BEGIN
    -- Create partitions for next 12 months
    FOR i IN 0..11 LOOP
        partition_date := DATE_TRUNC('month', CURRENT_DATE) + (i || ' months')::INTERVAL;
        partition_name := 'orders_' || TO_CHAR(partition_date, 'YYYY_MM');
        start_date := TO_CHAR(partition_date, 'YYYY-MM-DD');
        end_date := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');

        -- Check if partition exists
        IF NOT EXISTS (
            SELECT 1 FROM pg_tables 
            WHERE tablename = partition_name
        ) THEN
            -- Create partition
            EXECUTE format(
                'CREATE TABLE %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)',
                partition_name, start_date, end_date
            );

            -- Add indexes
            EXECUTE format(
                'CREATE INDEX idx_%I_customer ON %I(customer_id, created_at)',
                partition_name, partition_name
            );

            RAISE NOTICE 'Created partition: %', partition_name;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule this to run monthly (using pg_cron or external scheduler)
SELECT create_future_partitions();

The Lessons I Learned (That Night and Beyond)

1. Indexes Are Not Magic

They're a trade-off. More indexes = faster reads but slower writes. Find the balance.

2. Measure, Don't Guess

Always use EXPLAIN ANALYZE. Your intuition about what's slow is probably wrong.

3. Start Simple

Don't partition until you need to. Indexes often solve 95% of performance problems.

4. Partitioning Is Not Sharding

Partitions are in the same database. Sharding is splitting across multiple databases/servers. Know the difference.

5. The Kitchen Metaphor Really Works

When explaining database concepts to non-technical stakeholders, I still use the kitchen model. It clicks instantly.

Your Turn: The 5-Minute Performance Audit

Try this on your own database RIGHT NOW:

-- Find your slowest queries
SELECT 
    query,
    calls,
    mean_exec_time,
    total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find tables with no indexes
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    AND tablename NOT IN (
        SELECT tablename FROM pg_indexes
    )
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Find unused indexes (wasting space)
SELECT 
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
    AND indexrelname NOT LIKE '%_pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;

Final Thoughts

That 2 AM disaster was the best thing that happened to my career. It forced me to understand databases at a fundamental level. And weirdly, it deepened my appreciation for professional kitchens.

Every time I create an index now, I think: I'm organizing the spice rack.

Every time I partition a table, I think: I'm creating specialized stations.

Every time I optimize a query, I think: I'm improving the restaurant flow.

My roommate quit cooking last year and became a database administrator. He says I ruined restaurants for him—now he can't stop seeing them as distributed systems.

Go Forth and Optimize

Challenge: This week, audit one slow query in your production database. Create an index. Measure the improvement. Share your results in the comments—I want to hear your before/after numbers!

Questions I'll answer in comments:

  • MySQL vs PostgreSQL partitioning differences
  • When to use HASH vs RANGE vs LIST partitioning
  • Index bloat and maintenance strategies
  • Partitioning strategies for multi-tenant apps

Resources

  • Explain.depesz.com: visualize EXPLAIN output

Have a database disaster story? Share it below! The best ones often teach the most valuable lessons. 🚀

Tags: #sql #database #performance #postgresql


Stop Sending "Monster" PRs: Use stacked pull requests

2026-01-27 02:49:06

Ever worked on a "quick" feature only to realize it involves changes across your data model, services, APIs, and frontend?

You create a single branch, dump 30 commits into it, and open a PR so big your team needs a weekend retreat to review it.

Or even worse, did you ever have to review one of those monster PRs?

That's how you end up with developer fatigue.

Reviewing a large PR takes time if you want quality, yet you're blocking your teammates until it's done.

The trade-off between speed and review depth is real.

You may think, "It's just one PR, no big deal." But what if that becomes the norm for everyone on the team?

So what's the alternative?

Stacked Pull Requests: break the work into tiny, reviewable chunks that land in sequence.

Each chunk is stacked on top of its dependency.

In this video I show the workflow, the manual pain points, and how tools like the Graphite CLI automate the stacking so you can ship faster without the maintenance headaches.

Links:

Power Cookie - irisCTF '22

2026-01-27 02:46:40

Challenge description

Looking at the network tab in the dev tools, we see the isAdmin cookie parameter. It seems that all we have to do is intercept the traffic and change the cookie param to 1.

Challenge page

Taking it over to Burp Suite, we change the value of the isAdmin cookie param, and there it is, the FLAG.

Flag

FLAG: picoCTF{gr4d3_A_c00k13_65fd1e1a}