2026-01-27 02:56:16
Ridge vs. Lasso Regression: A Clear Guide to Regularization Techniques
In the world of machine learning, linear regression is often one of the first algorithms we learn. But standard linear regression has a critical weakness: it can easily overfit the training data, especially when there are many features. This is where Ridge and Lasso regression come in: two powerful techniques that prevent overfitting and, in Lasso's case, produce more interpretable models. Let's break down how they work, how they differ, and when to use each.
THE CORE PROBLEM: OVERFITTING
Imagine you're trying to predict house prices based on features like size, age, number of bedrooms, proximity to a school, and even the color of the front door. A standard linear regression might assign some weight (coefficient) to every single feature, even the irrelevant ones (like door color). It will fit the training data perfectly but will fail miserably on new, unseen houses. This is overfitting.
Ridge and Lasso solve this by adding a "penalty" to the regression model's objective. This penalty discourages the model from relying too heavily on any single feature, effectively simplifying it.
RIDGE REGRESSION: THE GENTLE MODERATOR
What it does: Ridge regression (also called L2 regularization) adds a penalty equal to the square of the magnitude of the coefficients.
Simple Explanation: Think of Ridge as a strict but fair moderator in a group discussion. It allows everyone (every feature) to speak, but it prevents any single person from dominating the conversation. No feature's coefficient is allowed to become extremely large, but very few are ever set to zero.
The Math (Simplified):
The Ridge model tries to minimize:
(Sum of Squared Errors) + λ * (Sum of Squared Coefficients)
Where λ (lambda) is the tuning parameter. A higher λ means a stronger penalty, pushing all coefficients closer to zero (but never exactly zero).
Example:
Predicting a student's final exam score (y) using hours studied (Hours) and the number of pencils owned (Pencils):
A standard regression might output:
Score = 5.0*(Hours) + 0.3*(Pencils)
Ridge regression, with its penalty, might output:
Score = 4.8*(Hours) + 0.05*(Pencils)
See what happened? The coefficient for the important feature (Hours) shrank slightly, and the coefficient for the nonsense feature (Pencils) shrank dramatically. The irrelevant feature is suppressed but not removed.
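To make this concrete, here's a minimal sketch with scikit-learn, assuming it and NumPy are installed. The toy data, the feature names (hours, pencils), and alpha=10.0 (scikit-learn's name for λ) are all made up for illustration:
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: 'hours' drives the score, 'pencils' is pure noise
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
pencils = rng.integers(1, 5, size=200).astype(float)
X = np.column_stack([hours, pencils])
y = 5.0 * hours + rng.normal(0, 2.0, size=200)

# Plain linear regression: both features get some weight
print(LinearRegression().fit(X, y).coef_)

# Ridge (L2): coefficients shrink toward zero, but rarely land exactly on zero
print(Ridge(alpha=10.0).fit(X, y).coef_)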
LASSO REGRESSION: THE RUTHLESS SELECTOR
What it does: Lasso regression (also called L1 regularization) adds a penalty equal to the absolute value of the magnitude of the coefficients.
Simple Explanation: Lasso is a ruthless talent scout. It evaluates all features and doesn't just quiet down the weak ones—it completely eliminates those it deems unnecessary. It performs feature selection.
The Math (Simplified):
The Lasso model tries to minimize:
(Sum of Squared Errors) + λ * (Sum of Absolute Coefficients)
Example:
Using the same student score prediction:
A standard regression might output:
Score = 5.0*(Hours) + 0.3*(Pencils)
Lasso regression, with its penalty, might output:
Score = 4.9*(Hours) + 0.0*(Pencils)
The coefficient for Pencils has been forced to absolute zero. Lasso has identified it as useless and removed it from the model entirely, leaving a simpler, more interpretable model.
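Here's the same toy setup with Lasso, again just a sketch assuming scikit-learn; alpha=0.5 is an arbitrary choice you would normally tune:
import numpy as np
from sklearn.linear_model import Lasso

# Same synthetic data as in the Ridge sketch above
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
pencils = rng.integers(1, 5, size=200).astype(float)
X = np.column_stack([hours, pencils])
y = 5.0 * hours + rng.normal(0, 2.0, size=200)

# Lasso (L1): with a large enough alpha, the useless 'pencils' coefficient is driven to exactly 0.0
print(Lasso(alpha=0.5).fit(X, y).coef_)
That zeroed-out coefficient is precisely the feature selection behavior described above.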
HEAD-TO-HEAD COMPARISON
| Feature | Ridge Regression | Lasso Regression |
|---|---|---|
| Penalty Term | Sum of squared coefficients | Sum of absolute coefficients |
| Effect on Coefficients | Shrinks them smoothly towards zero | Can force coefficients to exactly zero |
| Feature Selection | No. Keeps all features. | Yes. Creates sparse models. |
| Use Case | When you believe all features are relevant, but need to reduce overfitting. | When you have many features and suspect only a subset are important. |
| Good for | Handling multicollinearity (highly correlated features). | Building simpler, more interpretable models. |
| Geometry | Penalty region is a circle. The solution tends to be where the error contour touches the circle. | Penalty region is a diamond. The solution often occurs at a corner, zeroing out coefficients. |
VISUAL ANALOGY: THE FITTING GAME
Imagine you're fitting a curve to points on a graph, with two dials (coefficients) to adjust. Ridge gently turns both dials down so that neither can swing the curve wildly; Lasso turns the unhelpful dial all the way to zero and takes it out of the game.
WHICH ONE SHOULD YOU USE?
Reach for Ridge when you believe most features carry some signal (or are highly correlated) and you mainly need to tame overfitting. Reach for Lasso when you have many features, suspect only a subset matter, and want the model to tell you which ones. In either case, λ is a tuning parameter; the sketch below shows one common way to pick it.
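Whichever you pick, λ has to be tuned. Here's a minimal sketch of doing that with cross-validation in scikit-learn (RidgeCV and LassoCV try a grid of alphas, scikit-learn's name for λ, and keep the one that validates best); the grids and toy data are arbitrary:
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# Same synthetic data as in the earlier sketches
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
pencils = rng.integers(1, 5, size=200).astype(float)
X = np.column_stack([hours, pencils])
y = 5.0 * hours + rng.normal(0, 2.0, size=200)

# Cross-validation picks the strength of the penalty for us
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
lasso = LassoCV(alphas=np.logspace(-3, 1, 20), cv=5).fit(X, y)
print("Best Ridge alpha:", ridge.alpha_)
print("Best Lasso alpha:", lasso.alpha_)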
IN CONCLUSION
Both Ridge and Lasso are essential tools that move linear regression from a simple baseline to a robust, modern technique.
By understanding their distinct "philosophies"—moderation vs. selection—you can strategically choose the right tool to build models that are not only accurate but also generalize well to the real world.
2026-01-27 02:52:58
When building multi-tenant applications, validating cookies per domain can be tricky. I recently worked on a project where each domain had its own cookie configuration, and I wanted to ensure the correct cookie was being read for each request.
public validateToken: RequestHandler = catchAsync(
  async (req: Request, res: Response, next: NextFunction): Promise<void> => {
    // Extract the hostname dynamically from the request
    // (`parse` is assumed to come from a domain-parsing helper such as tldts)
    const host = parse(req.hostname).hostname;

    // Look up the cookie configuration for this domain
    const domainCookies = DOMAIN_COOKIE[host as keyof typeof DOMAIN_COOKIE];
    console.log('Debug Mode – Hostname:', host);

    // Unknown domain or no access cookie configured: reject early
    // (optional chaining avoids a crash when the host isn't in DOMAIN_COOKIE)
    if (!domainCookies?.ACCESS) {
      return this.unauthorized(req, res, next);
    }

    // If the access token cookie is missing, throw an unauthorized error
    const accessCookie = req.signedCookies[domainCookies.ACCESS];
    if (!accessCookie) {
      return this.unauthorized(req, res, next);
    }

    // Continue to the next middleware or route handler
    next();
  }
);
Key Takeaways
Dynamic Cookie Access
Using parse(req.hostname).hostname lets you dynamically determine which cookie to check for the current request. This is especially useful for multi-domain setups.
Early Debugging
Adding a console.log statement for the hostname helps confirm which domain the request is coming from and whether the correct cookie name is being used.
Fail Fast
Always check for missing cookies and return an unauthorized response early to prevent unauthorized access.
Why This Matters
Without this setup, your multi-domain app could mistakenly use the wrong cookie, leading to authentication errors. Dynamic validation ensures every request is verified against its intended domain configuration.
2026-01-27 02:52:30
This is a submission for the New Year, New You Portfolio Challenge Presented by Google AI
Hi! I'm a developer based in Mumbai with a deep passion for AI. Lately, I've been exploring the intersection of web development and Artificial Intelligence. I love trying new AI tools to accelerate my workflow and complete tasks at a faster pace.
I am the author of "The Secrets To Master Your Mind", which I published at the age of 14. Currently, I am enrolled as a student at K.J. Somaiya Institute of Technology.
🛠️ The Tech Stack
I wanted my portfolio to be more than just a static page—I wanted it to be an immersive, "liquid" experience. To achieve this, I used a modern, performance-focused stack, orchestrated entirely by Google Antigravity:
Frontend Core: React 19 (via Vite 7) for ultra-fast HMR and build times.
Styling: Tailwind CSS 4 & PostCSS for rapid UI development.
Infrastructure: Docker (Multi-stage build) & Nginx.
Hosting: Google Cloud Run (Serverless container deployment).
🎨 Immersive Design & Animations
To create a "premium" feel, I layered several animation libraries:
Framer Motion: Used for complex component animations and scroll-triggered layout reveals.
GSAP: Powered high-performance tweens and timelines.
Locomotive Scroll: Enabled smooth, inertia-based scrolling to give the site weight and momentum.
WebGL & Shaders: I implemented a custom fluid simulation (SplashCursor.jsx) and particle effects (@tsparticles/react) to create a background that reacts to user interaction.
☁️ Powered by Google Cloud & AI
This project relies heavily on the Google ecosystem for both development and deployment:
Google Cloud Run: I containerized the application using Docker. The Dockerfile uses a multi-stage build (Node 18 for building → Nginx Alpine for serving) to keep the image lightweight. Deploying to Cloud Run was seamless, allowing me to scale to zero when not in use (keeping costs low) while maintaining high availability.
Gemini & AI Assistance: As a Google Student Ambassador, I leverage Google's tools daily. For this project, I used Gemini 3 Pro (via Google Antigravity) to build the entire website from scratch. I generated the UI elements, animations, and styling simply by providing prompts to the Antigravity agents.
I am most proud of finally breaking my cycle of procrastination. I had always put off building my portfolio, but this challenge proved to be the perfect platform to get started. With the help of Google AI tools, I finally completed it.
The animations generated by Antigravity were outstanding and went far beyond my imagination.
2026-01-27 02:49:32
It was 2 AM on a Friday when my phone exploded with alerts. Our e-commerce platform was dying. Page load times had gone from 200ms to 40+ seconds. Customers were abandoning carts. My hands shook as I SSH'd into the database server.
The culprit? A seemingly innocent query:
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
This query was taking 38 seconds to execute. Our orders table had grown to 50 million rows. No indexes. No partitions. Just pure, unoptimized chaos.
As I sat there watching EXPLAIN ANALYZE output scroll by, my roommate walked in from his late shift as a line cook. "Dude, why do you look like you're about to cry?"
I explained the problem. He looked at my screen and said something that changed how I think about databases forever:
"Your database is like a kitchen with no organization. You're asking the chef to find one specific tomato in a warehouse full of every ingredient ever used."
That night, sitting in our tiny apartment, a chef taught me more about database optimization than any textbook ever did.
Before we dive into SQL, let me paint you a picture of how a professional kitchen operates during dinner rush.
Imagine a restaurant where every ingredient sits in giant, unlabeled boxes.
When the chef needs "fresh basil," someone has to walk the aisles and open box after box until they happen to find it.
This is a full table scan. It works, but it's painfully slow.
Now imagine the same kitchen with a proper system: every ingredient has a labeled shelf and a posted map of where it lives.
The chef needs basil? They check the map, walk straight to the herb shelf, and grab it.
This is an indexed query. The time saved is dramatic.
But here's where it gets interesting. Large restaurants don't have one kitchen—they have stations:
┌─────────────────────────────────────────────┐
│ RESTAURANT KITCHEN │
├──────────────┬──────────────┬───────────────┤
│ Salad │ Grill │ Pastry │
│ Station │ Station │ Station │
│ │ │ │
│ • Lettuce │ • Steaks │ • Flour │
│ • Tomatoes │ • Chicken │ • Sugar │
│ • Dressings │ • Fish │ • Chocolate │
└──────────────┴──────────────┴───────────────┘
When an order comes in for "Caesar Salad," the system sends it straight to the salad station; the grill and pastry stations never even see it.
This is table partitioning. You're dividing your data into logical segments so queries only search where the data actually lives.
Let's go back to our disaster query:
-- THE SLOW QUERY (Full Kitchen Search)
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Execution time: 38 seconds 😱
-- Rows scanned: 50,000,000
The database is literally checking all 50 million orders, one by one. Let's add an index:
-- CREATE THE SPICE RACK (Index)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
What just happened?
The database created a separate data structure that looks like this:
Index Structure (Simplified):
┌────────────────┬──────────────┬─────────────────────┐
│ customer_id │ created_at │ row_pointer │
├────────────────┼──────────────┼─────────────────────┤
│ 12847 │ 2024-01-15 │ → Row at block 423 │
│ 12847 │ 2024-01-10 │ → Row at block 421 │
│ 12847 │ 2024-01-05 │ → Row at block 418 │
│ ... │ ... │ ... │
└────────────────┴──────────────┴─────────────────────┘
Now when you run the query:
-- SAME QUERY, NOW WITH INDEX
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Execution time: 12ms 🚀
-- Rows scanned: 20 (exactly what we needed!)
From 38 seconds to 12 milliseconds. That's a 3,166x improvement.
A quick decision tree for whether a column deserves an index:
Do you frequently query by this column?
│
├─ YES → Is it in a WHERE clause?
│ │
│ ├─ YES → CREATE INDEX
│ │
│ └─ NO → Is it in a JOIN?
│ │
│ ├─ YES → CREATE INDEX
│ │
│ └─ NO → Is it in ORDER BY?
│ │
│ └─ YES → Consider INDEX
│
└─ NO → Don't waste space on an index
Here's where most developers trip up. Watch this:
-- Creating individual indexes (WRONG)
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_created ON orders(created_at);
-- Query performance: STILL SLOW
-- Why? The planner will typically use only ONE of these indexes for this
-- query (it can sometimes combine them via bitmap scans, but that is far
-- less efficient than a single composite index)
You need a composite index (multi-column):
-- The RIGHT way (Order matters!)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
The order matters because of the "Leftmost Prefix Rule":
This index helps queries with:
WHERE customer_id = 123
WHERE customer_id = 123 AND created_at > '2024-01-01'
But it does NOT help WHERE created_at > '2024-01-01' on its own (customer_id must be specified first!).
Think of it like a phone book: you can find "Smith, John," but you can't efficiently find "everyone named John," because it's sorted by last name first.
Six months after the index fix, we hit another wall. Our orders table was now 200 million rows. Even with indexes, some queries were slowing down. The table maintenance (VACUUM, backups) was taking hours.
My chef roommate: "You need to split the kitchen. You're trying to run a single kitchen for 50 different cuisines."
We decided to partition by time (the most common pattern):
-- Step 1: Create the partitioned table
CREATE TABLE orders (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
total_amount DECIMAL(10,2),
created_at TIMESTAMP NOT NULL,
status VARCHAR(50)
) PARTITION BY RANGE (created_at);
-- Step 2: Create partitions (the "stations")
CREATE TABLE orders_2023_q1 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE orders_2023_q2 PARTITION OF orders
FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE orders_2023_q3 PARTITION OF orders
FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE orders_2023_q4 PARTITION OF orders
FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
-- Step 3: Add indexes to EACH partition
CREATE INDEX idx_orders_2023_q1_customer
ON orders_2023_q1(customer_id, created_at);
CREATE INDEX idx_orders_2023_q2_customer
ON orders_2023_q2(customer_id, created_at);
-- ... (repeat for each partition)
What changed?
-- Query for recent orders
SELECT * FROM orders
WHERE created_at > '2024-01-15'
AND customer_id = 12847;
-- Before partitioning:
-- Scans: 200 million rows → finds 20
-- Time: 5 seconds
-- After partitioning:
-- Scans: ONLY orders_2024_q1 (15 million rows) → finds 20
-- Time: 80ms
The database is smart enough to know: "This query only needs Q1 2024 data. I'll skip the other 185 million rows entirely."
Here's the magic visualized:
Query: WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01'
┌──────────────────────────────────────────────┐
│ ORDERS TABLE (Partitioned) │
├───────────────┬───────────────┬──────────────┤
│ 2023_Q1 │ 2023_Q2 │ 2023_Q3 │
│ ❌ SKIPPED │ ❌ SKIPPED │ ❌ SKIPPED │
└───────────────┴───────────────┴──────────────┘
├───────────────┬───────────────┐
│ 2023_Q4 │ 2024_Q1 │
│ ❌ SKIPPED │ ✅ SCANNED │ ← Only this one!
└───────────────┴───────────────┘
-- 1. IDENTIFY slow queries
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 12847
AND created_at > '2023-01-01'
ORDER BY created_at DESC;
-- Look for: "Seq Scan on orders" (BAD!)
-- Want to see: "Index Scan using idx_..." (GOOD!)
-- For WHERE clauses
CREATE INDEX idx_orders_customer_id
ON orders(customer_id);
-- For multi-condition queries (BETTER)
CREATE INDEX idx_orders_customer_created
ON orders(customer_id, created_at DESC);
-- For foreign keys (often forgotten!)
CREATE INDEX idx_orders_user_id
ON orders(user_id);
-- Before
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 12847;
-- Output:
-- Seq Scan on orders (cost=0.00..892341.00 rows=1240 width=120)
-- (actual time=0.034..5234.123 rows=1240 loops=1)
-- Planning Time: 0.123 ms
-- Execution Time: 5234.567 ms
-- After adding index:
-- Index Scan using idx_orders_customer_id on orders
-- (cost=0.56..1234.00 rows=1240 width=120)
-- (actual time=0.034..12.456 rows=1240 loops=1)
-- Planning Time: 0.098 ms
-- Execution Time: 12.678 ms
-- Create new partitioned table
CREATE TABLE orders_new (
id BIGSERIAL,
customer_id BIGINT NOT NULL,
created_at TIMESTAMP NOT NULL,
-- ... other columns
) PARTITION BY RANGE (created_at);
-- Create partitions for existing data
CREATE TABLE orders_2023 PARTITION OF orders_new
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders_new
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Migrate data (during low-traffic window)
INSERT INTO orders_new SELECT * FROM orders;
-- Atomic swap (requires careful planning)
BEGIN;
ALTER TABLE orders RENAME TO orders_old;
ALTER TABLE orders_new RENAME TO orders;
COMMIT;
-- Add indexes to each partition
CREATE INDEX idx_orders_2023_customer
ON orders_2023(customer_id, created_at);
CREATE INDEX idx_orders_2024_customer
ON orders_2024(customer_id, created_at);
Every index slows down INSERT, UPDATE, and DELETE operations:
-- Without indexes
INSERT INTO orders VALUES (...); -- 2ms
-- With 5 indexes
INSERT INTO orders VALUES (...); -- 12ms
-- The database must update all 5 indexes!
Rule of thumb: Only index columns you query frequently.
I once worked with a table that had 23 indexes. Write performance was abysmal. We removed 17 of them. Nothing broke. Writes became 4x faster.
-- Instead of indexing ALL orders
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Only index PENDING orders (if that's what you query)
CREATE INDEX idx_orders_pending
ON orders(customer_id)
WHERE status = 'pending';
-- This index is 10x smaller and MUCH faster
This mistake cost me 3 hours of debugging:
-- When you migrate to a partitioned table, the old table's indexes DON'T carry over!
-- On PostgreSQL 10 you must create them on EACH partition (on 11+ you can
-- create the index once on the partitioned parent and it cascades):
CREATE INDEX idx_orders_2023_customer
ON orders_2023(customer_id);
CREATE INDEX idx_orders_2024_customer
ON orders_2024(customer_id);
-- ... etc
-- This query CAN'T use partition pruning (must check all partitions)
SELECT COUNT(*) FROM orders WHERE customer_id = 12847;
-- This query DOES use pruning (filters by partition key)
SELECT COUNT(*) FROM orders
WHERE customer_id = 12847
AND created_at > '2024-01-01';
After implementing indexes and partitioning on our production database:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Avg Query Time | 5.2s | 48ms | 108x faster |
| P95 Query Time | 38s | 340ms | 111x faster |
| Peak QPS | 120 | 2,400 | 20x increase |
| Table Size | 180GB | 180GB (same) | Data unchanged |
| Index Size | 0GB | 24GB | Worth it |
| Backup Time | 4 hours | 45 min | Parallel dumps |
| Customer Complaints | Daily | Zero | Priceless |
Cost to implement: 2 days of development + 4 hours of maintenance window
Annual cost savings: $120K in server costs alone (fewer resources needed)
Manually creating partitions is tedious. Here's a script that auto-creates future partitions:
-- PostgreSQL function to auto-create monthly partitions
CREATE OR REPLACE FUNCTION create_future_partitions()
RETURNS void AS $$
DECLARE
partition_date DATE;
partition_name TEXT;
start_date TEXT;
end_date TEXT;
BEGIN
-- Create partitions for next 12 months
FOR i IN 0..11 LOOP
partition_date := DATE_TRUNC('month', CURRENT_DATE) + (i || ' months')::INTERVAL;
partition_name := 'orders_' || TO_CHAR(partition_date, 'YYYY_MM');
start_date := TO_CHAR(partition_date, 'YYYY-MM-DD');
end_date := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');
-- Check if partition exists
IF NOT EXISTS (
SELECT 1 FROM pg_tables
WHERE tablename = partition_name
) THEN
-- Create partition
EXECUTE format(
'CREATE TABLE %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
-- Add indexes
EXECUTE format(
'CREATE INDEX idx_%I_customer ON %I(customer_id, created_at)',
partition_name, partition_name
);
RAISE NOTICE 'Created partition: %', partition_name;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Schedule this to run monthly (using pg_cron or external scheduler)
SELECT create_future_partitions();
Indexes are a trade-off. More indexes = faster reads but slower writes. Find the balance.
Always use EXPLAIN ANALYZE. Your intuition about what's slow is probably wrong.
Don't partition until you need to. Indexes often solve 95% of performance problems.
Partitions are in the same database. Sharding is splitting across multiple databases/servers. Know the difference.
When explaining database concepts to non-technical stakeholders, I still use the kitchen model. It clicks instantly.
Try this on your own database RIGHT NOW:
-- Find your slowest queries
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Find tables with no indexes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
AND tablename NOT IN (
SELECT tablename FROM pg_indexes
)
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Find unused indexes (wasting space)
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
That 2 AM disaster was the best thing that happened to my career. It forced me to understand databases at a fundamental level. And weirdly, it deepened my appreciation for professional kitchens.
Every time I create an index now, I think: I'm organizing the spice rack.
Every time I partition a table, I think: I'm creating specialized stations.
Every time I optimize a query, I think: I'm improving the restaurant flow.
My roommate quit cooking last year and became a database administrator. He says I ruined restaurants for him—now he can't stop seeing them as distributed systems.
Challenge: This week, audit one slow query in your production database. Create an index. Measure the improvement. Share your results in the comments—I want to hear your before/after numbers!
Have questions about indexing or partitioning? Drop them in the comments and I'll answer them there.
Have a database disaster story? Share it below! The best ones often teach the most valuable lessons. 🚀
Tags: #sql #database #performance #postgresql
2026-01-27 02:49:06
Ever worked on a "quick" feature only to realize it involves changes across your data model, services, APIs, and frontend?
You create a single branch, dump 30 commits into it, and open a PR so big your team needs a weekend retreat to review it.
Or even worse, did you ever have to review one of those monster PRs?
That's how you end up with developer fatigue.
Reviewing a large PR takes time if you want quality, yet you're blocking your teammates until it's done.
The trade-off between speed and review depth is real.
You may think, "It's just one PR, no big deal." But what if that becomes the norm for everyone on the team?
The solution: stacked pull requests. Break the work into tiny, reviewable chunks that land in sequence.
In this video I show the workflow, the manual pain points, and how tools like the Graphite CLI automate the stacking so you can ship faster without the maintenance headaches.
2026-01-27 02:46:40
Looking at the Network tab in the dev tools, we see an isAdmin cookie. It seems all we have to do is intercept the traffic and change that cookie's value to 1.
Taking the request over to Burp Suite, we change the value of the isAdmin cookie, and there it is: the FLAG.
FLAG: picoCTF{gr4d3_A_c00k13_65fd1e1a}