2026-02-14 05:43:22
You've optimized your PostgreSQL queries to death, but your analytics dashboard still crawls under load. Meanwhile, your document-heavy user profiles are forcing you into EAV anti-patterns, and your cache invalidation strategy has grown so Byzantine that onboarding engineers need a separate wiki page just to understand when data is fresh. You add another index, denormalize another table, and watch your write performance take another hit. At some point during your third consecutive sprint dedicated to "database performance improvements," you start to wonder: what if the database isn't the problem?
It isn't. The problem is that you're using one database for everything.
Relational databases are extraordinary tools, but they're not universal tools. PostgreSQL excels at transactional consistency and complex joins, but it buckles under analytical workloads that scan millions of rows. It handles structured data beautifully but turns schema evolution into a migration nightmare. MongoDB gives you flexible schemas but makes cross-document transactions painful. Redis delivers sub-millisecond reads but offers no query language. Each database technology represents a carefully tuned set of trade-offs—and your application needs more than one set.
This is polyglot persistence: using multiple, specialized data stores in a single application, each optimized for specific access patterns. It's not about chasing trends or over-engineering. It's recognizing that the moment you start contorting your database to handle workloads it wasn't designed for, you're paying a tax—in performance, complexity, and developer velocity—that quickly exceeds the cost of running a second database.
The question isn't whether you'll eventually need multiple databases. If you're building anything beyond a CRUD app, you will. The question is recognizing when that moment arrives, and how to make the transition without creating an operational nightmare.
Your PostgreSQL database has served you well for three years. Then, one quarter, everything changes.
The product team ships a recommendation engine, the analytics dashboard adds real-time metrics, and suddenly your carefully tuned queries are timing out. Welcome to the breaking point.
The first warning sign appears in your query analyzer. What started as straightforward joins across three tables now spans eight, with subqueries performing aggregations that scan millions of rows. You're calculating trending scores, filtering by geographic proximity, and joining against user preference hierarchies—all in a single query that your relational database was never designed to handle efficiently.
Your senior engineer proposes adding more indexes. You now have 23 indexes on your main products table. Query planning time has become a bottleneck itself. The PostgreSQL query planner spends 40ms just deciding how to execute queries that should complete in 30ms. You've hit the point where the optimization tools are more expensive than the problem they're solving.
Next comes the schema gymnastics. Your user preferences started as a simple preferences JSONB column. Now it's evolved into a separate user_preferences table with 47 columns, half of them indexed, because queries against JSONB fields are too slow at scale. But this creates a new problem: every preference change requires a row update, creating write amplification and bloating your WAL files.
Meanwhile, your session data is generating 500,000 writes per hour, each one triggering autovacuum cycles that compete with your OLTP workload. You've increased autovacuum_max_workers to 6, tuned autovacuum_vacuum_cost_delay to near-zero, and you're still falling behind. The database that excels at ACID transactions is now spending 30% of its resources managing ephemeral data that expires in 24 hours.
The breaking point crystallizes during a routine product launch. Your primary database is handling 12,000 queries per second—well within its theoretical limits. But 3,000 of those are full-text searches against product descriptions, 2,000 are key-value lookups for session data, and 1,500 are graph traversals for social connections. Each workload pattern fights for different resources: full-text search demands sequential scans, session lookups need memory-resident hash tables, and graph traversals require recursive CTEs that don't scale.
Your monitoring shows 85% CPU utilization, but throughput has plateaued. Adding more CPU cores won't help when the architecture itself creates contention. The cost analysis becomes stark: you're spending $45,000 annually on a database instance sized for your peak workload across all patterns, when specialized databases could handle each pattern for $8,000 each.
This is where single-database architecture doesn't just bend—it breaks. The next section maps these failure patterns to the database types designed to solve them.
Every database technology excels at specific workloads and fails spectacularly at others.
The art of polyglot persistence lies in recognizing these natural boundaries and routing data to the system best suited to handle it.
PostgreSQL, MySQL, and their relational siblings remain the backbone of most applications for good reason. They provide ACID guarantees that make them irreplaceable for domains where data integrity is non-negotiable: financial transactions, order processing, user account management, and any workflow requiring multi-table consistency.
Use relational databases when your queries span relationships ("show me all orders with their line items and customer details"), when you need strict referential integrity, or when regulatory compliance demands audit trails and point-in-time recovery. The structured schema enforces data quality at write time, preventing the data quality erosion that plagues looser storage models.
The breaking point arrives when read patterns don't align with your normalized schema, forcing expensive joins across increasingly large tables, or when write throughput demands horizontal scaling that relational systems resist.
MongoDB, Couchbase, and DynamoDB shine when your data model mirrors how your application actually uses it. Product catalogs with varying attributes across categories, user profiles with customizable fields, and content management systems all benefit from storing complete entities as self-contained documents.
Document databases eliminate the object-relational impedance mismatch—your application works with JSON-like structures that map directly to your code's data structures. Schema flexibility lets you iterate rapidly without migration downtime, adding new fields to documents as features evolve.
Choose document stores when your access patterns are document-centric (fetch entire entities by ID), when different records legitimately have different structures, or when denormalization improves read performance more than it complicates writes.
Redis and Memcached reduce data access to its purest form: store a blob, retrieve it by key, typically in microseconds. This radical simplicity enables the sub-millisecond latency that interactive applications demand.
Key-value stores dominate caching layers, session stores, rate limiting counters, real-time leaderboards, and pub/sub messaging. They trade query flexibility for predictable, blazing-fast operations at massive scale. Redis extends the model with data structures (lists, sets, sorted sets) that support surprisingly sophisticated use cases without sacrificing speed.
Deploy key-value stores as a performance multiplier in front of slower systems, not as a source of truth for data you can't afford to lose. Their in-memory nature means planned restarts require warming strategies to avoid cache stampedes.
InfluxDB, TimescaleDB, and Prometheus specialize in data that arrives continuously and is queried by time range: application metrics, IoT sensor readings, financial tick data, and observability signals.
Time-series databases compress sequential data aggressively, index by timestamp automatically, and provide built-in downsampling and retention policies. They handle write-heavy workloads where 99% of queries target recent data, making them orders of magnitude more efficient than forcing time-series workloads into general-purpose databases.
Match your database choice to your data's read-write ratio, consistency requirements, query patterns, and growth trajectory. Transactional consistency pushes you toward relational. High write throughput with simple lookups favors key-value or document stores. Temporal analysis demands time-series optimization.
The next challenge becomes maintaining consistency when a single user action must update multiple database systems simultaneously—a coordination problem that requires careful architectural boundaries.
The cache-aside pattern with invalidate-on-write provides a practical starting point for polyglot persistence. We'll build a dual-write coordinator that maintains consistency between PostgreSQL and Redis while handling the inevitable failures that occur in distributed systems.
The core challenge is maintaining consistency during writes. Every update must succeed in PostgreSQL before propagating to Redis, establishing PostgreSQL as the source of truth:
import json
from contextlib import contextmanager
from typing import Optional, Dict, Any

import psycopg2
import psycopg2.pool
import redis

class CacheCoordinator:
    def __init__(self, db_config: Dict[str, str], redis_config: Dict[str, str]):
        self.db_pool = psycopg2.pool.ThreadedConnectionPool(5, 20, **db_config)
        self.redis_client = redis.Redis(**redis_config, decode_responses=True)
        self.cache_ttl = 3600  # 1 hour default TTL

    @contextmanager
    def get_db_connection(self):
        conn = self.db_pool.getconn()
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            self.db_pool.putconn(conn)

    def update_user(self, user_id: int, updates: Dict[str, Any]) -> bool:
        cache_key = f"user:{user_id}"

        with self.get_db_connection() as conn:
            cursor = conn.cursor()

            # Write to PostgreSQL first (commit happens when the context manager exits)
            set_clause = ", ".join([f"{k} = %s" for k in updates.keys()])
            query = f"UPDATE users SET {set_clause}, updated_at = NOW() WHERE id = %s RETURNING *"
            cursor.execute(query, list(updates.values()) + [user_id])
            updated_user = cursor.fetchone()

            if not updated_user:
                return False

            # Row updated, now invalidate cache
            try:
                self.redis_client.delete(cache_key)
            except redis.RedisError:
                # Cache invalidation failed, but DB write succeeded
                # Log for monitoring but don't fail the operation
                print(f"Warning: Cache invalidation failed for {cache_key}")

            return True

    def get_user(self, user_id: int) -> Optional[Dict[str, Any]]:
        cache_key = f"user:{user_id}"

        # Try cache first
        try:
            cached = self.redis_client.get(cache_key)
            if cached:
                return json.loads(cached)
        except redis.RedisError:
            # Cache read failed, fall through to database
            pass

        # Cache miss or unavailable, read from PostgreSQL
        with self.get_db_connection() as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
            user = cursor.fetchone()

            if user:
                user_dict = dict(zip([desc[0] for desc in cursor.description], user))

                # Populate cache (best-effort, don't fail if this breaks)
                try:
                    self.redis_client.setex(
                        cache_key,
                        self.cache_ttl,
                        json.dumps(user_dict, default=str)
                    )
                except redis.RedisError:
                    pass

                return user_dict

        return None
This implementation establishes a critical principle: cache failures never fail operations. Redis unavailability degrades performance but not correctness. The database remains the authoritative source, and cache operations are strictly opportunistic. When Redis is down, reads incur higher latency but continue serving correct data from PostgreSQL.
The invalidate-on-write strategy is deliberately simple. Rather than attempting to update the cache with new values (which introduces serialization complexity and potential inconsistencies), we delete the cache entry and let the next read operation repopulate it. This trades a single cache miss for guaranteed consistency.
Invalidation timing significantly impacts consistency guarantees. The naive approach—delete the cache entry after the database commit—works for most scenarios but creates observable inconsistency windows. During high concurrency, a read that began before the write can repopulate the cache with stale data just after the invalidation.
For workloads where this matters, consider invalidating before the write commits. This ensures any concurrent reader either gets the old cached value (which matches the uncommitted database state) or triggers a cache miss that reads the newly committed value. The tradeoff is a longer inconsistency window if the database write fails after cache invalidation.
For distributed cache topologies with multiple Redis instances, implement cache key routing to ensure all replicas invalidate consistently. Hash the cache key to determine the primary Redis node, then propagate invalidations to replicas asynchronously. Accept that replica lag creates brief windows of stale reads—optimize for common-case latency over perfect consistency.
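A minimal sketch of that routing, assuming a static list of primary Redis clients and a replica map keyed by primary index (both invented for this example); production setups would usually queue the replica fan-out rather than loop over it inline:

import hashlib
import redis

class ShardedInvalidator:
    def __init__(self, primaries, replicas_by_primary):
        self.primaries = primaries                # list of redis.Redis clients
        self.replicas = replicas_by_primary       # dict: primary index -> [redis.Redis, ...]

    def _node_index(self, cache_key: str) -> int:
        # Stable hash so the same key always routes to the same primary
        digest = hashlib.sha1(cache_key.encode()).hexdigest()
        return int(digest, 16) % len(self.primaries)

    def invalidate(self, cache_key: str) -> None:
        idx = self._node_index(cache_key)
        self.primaries[idx].delete(cache_key)     # synchronous delete on the primary
        for replica in self.replicas.get(idx, []):
            try:
                replica.delete(cache_key)         # best-effort fan-out; brief stale reads possible
            except redis.RedisError:
                pass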
The delete-on-write strategy introduces a narrow window where stale data can leak back into cache. Consider this sequence: a reader misses the cache and fetches a row from PostgreSQL; a writer then updates that row and deletes the cache entry; finally, the original reader populates the cache with the value it read before the update. That stale value now sits in cache until its TTL expires or the next write invalidates it.
For high-consistency requirements, implement versioning using PostgreSQL's updated_at timestamp:
def get_user_versioned(self, user_id: int) -> Optional[Dict[str, Any]]:
    cache_key = f"user:{user_id}"
    version_key = f"user:{user_id}:version"

    try:
        cached = self.redis_client.get(cache_key)
        cached_version = self.redis_client.get(version_key)
    except redis.RedisError:
        cached = None
        cached_version = None

    with self.get_db_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT *, EXTRACT(EPOCH FROM updated_at) as version FROM users WHERE id = %s",
            (user_id,)
        )
        user = cursor.fetchone()

        if user:
            user_dict = dict(zip([desc[0] for desc in cursor.description], user))
            db_version = str(user_dict['version'])

            # Only use cache if version matches
            if cached and cached_version == db_version:
                return json.loads(cached)

            # Update cache with new version
            try:
                pipe = self.redis_client.pipeline()
                pipe.setex(cache_key, self.cache_ttl, json.dumps(user_dict, default=str))
                pipe.setex(version_key, self.cache_ttl, db_version)
                pipe.execute()
            except redis.RedisError:
                pass

            return user_dict

    return None
Version checking closes the race condition at the cost of additional Redis operations. Each cache population stores both the data and its version. Reads verify the cached version matches the current database version before trusting cached data. Mismatches trigger cache refresh from PostgreSQL.
The version key approach also enables partial cache invalidation strategies. Rather than deleting cache entries on every write, update only the version key. Subsequent reads detect version mismatches and refresh automatically. This reduces cache churn for write-heavy workloads while maintaining consistency guarantees.
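For example, a write path following that strategy might bump only the version key, as in the sketch below; touch_user_version is a hypothetical helper, and new_version would come from the row's updated_at returned by the UPDATE:

def touch_user_version(self, user_id: int, new_version: str) -> None:
    version_key = f"user:{user_id}:version"
    try:
        # Leave the cached document in place; only advance the version marker
        self.redis_client.setex(version_key, self.cache_ttl, new_version)
    except redis.RedisError:
        # Safe to swallow: the stale version key no longer matches the database
        # version, so the next versioned read falls back to PostgreSQL anyway
        pass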
💡 Pro Tip: For write-heavy workloads, consider lazy invalidation instead of immediate cache repopulation. Let the next read operation populate the cache, reducing write amplification.
Extended Redis outages shouldn't cascade into timeout storms. Implement a circuit breaker that trips after consecutive failures:
from datetime import datetime, timedelta
from enum import Enum

import redis

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class RedisCircuitBreaker:
    def __init__(self, failure_threshold: int = 5, timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.timeout):
                self.state = CircuitState.HALF_OPEN
            else:
                return None  # Circuit open, skip Redis

        try:
            result = func(*args, **kwargs)
            if self.state == CircuitState.HALF_OPEN:
                self.reset()
            return result
        except redis.RedisError:
            self.record_failure()
            return None

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def reset(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
The circuit breaker prevents cascading failures by detecting Redis unavailability patterns. After five consecutive failures, it opens the circuit and stops attempting Redis operations for 60 seconds. This fail-fast behavior prevents request queuing and timeout accumulation that would otherwise degrade application latency.
When the timeout expires, the breaker enters half-open state, allowing a single probe request through. If it succeeds, normal operation resumes. If it fails, the circuit reopens for another timeout cycle. This automatic recovery mechanism eliminates manual intervention for transient Redis outages while protecting against sustained failures.
Integrate the circuit breaker into your cache coordinator by wrapping all Redis operations. Monitor circuit state transitions via metrics—frequent open states indicate Redis capacity or reliability issues requiring infrastructure attention. Consider per-operation circuit breakers for fine-grained failure isolation: cache reads might fail independently of cache invalidations.
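A sketch of that wiring, subclassing the coordinator from earlier so every cache call funnels through the breaker; the cache_get/cache_delete helper names are invented for this example:

class GuardedCacheCoordinator(CacheCoordinator):
    def __init__(self, db_config, redis_config):
        super().__init__(db_config, redis_config)
        self.breaker = RedisCircuitBreaker(failure_threshold=5, timeout=60)

    def cache_get(self, key: str):
        # Returns None on a miss, a Redis failure, or while the circuit is open
        return self.breaker.call(self.redis_client.get, key)

    def cache_delete(self, key: str):
        return self.breaker.call(self.redis_client.delete, key)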
With dual-write coordination established, the next challenge emerges: defining where strong consistency ends and eventual consistency begins. Not all data deserves the same consistency guarantees.
The hardest architectural decision in polyglot persistence isn't choosing which databases to use—it's defining where consistency guarantees begin and end. In a single-database system, ACID transactions provide clear boundaries. With multiple databases, you must explicitly architect these boundaries or accept silent data corruption.
The foundational principle: maintain strong consistency within each database's transactional boundary, accept eventual consistency across databases. This means your PostgreSQL transactions remain ACID-compliant, your MongoDB writes are atomic at the document level, but coordination between them follows different rules.
class OrderService:
    def create_order(self, user_id: str, items: list):
        # Strong consistency: PostgreSQL transaction
        with db.transaction():
            order = db.orders.create(user_id=user_id, status='pending')
            db.order_items.bulk_create([
                {'order_id': order.id, 'product_id': item['id']}
                for item in items
            ])
            db.inventory.decrement_stock(items)

        # Eventual consistency: publish to event store
        events.publish('order.created', {
            'order_id': order.id,
            'user_id': user_id,
            'timestamp': time.time()
        })

        # MongoDB analytics update happens asynchronously
        # Redis cache invalidation happens via event consumer
        return order
The transaction ensures order creation, line items, and inventory updates happen atomically. The event publication happens after commit—if downstream systems fail to process it, the order still exists. This asymmetry is intentional.
The critical insight: choose transactional boundaries based on invariants that must never be violated. Inventory must never go negative. An order must always have line items. User balance can't be debited without a corresponding transaction record. These are the operations that belong inside a single database transaction.
Analytics dashboards showing slightly stale numbers? Cache entries taking seconds to invalidate? Search indexes lagging behind writes? These can be eventually consistent. The business doesn't break if the "total orders this month" counter is 30 seconds behind reality.
When business operations span multiple databases, implement sagas: sequences of local transactions coordinated through events or orchestration. Each step is independently committed with compensating actions defined for rollback.
from decimal import Decimal

class PaymentSaga:
    def execute(self, order_id: str, amount: Decimal):
        saga_state = {'order_id': order_id, 'steps': []}

        try:
            # Step 1: Reserve funds (PostgreSQL)
            payment_id = self.payments.reserve(order_id, amount)
            saga_state['steps'].append(('reserve', payment_id))

            # Step 2: Update analytics (MongoDB)
            analytics_id = self.analytics.record_payment(order_id, amount)
            saga_state['steps'].append(('analytics', analytics_id))

            # Step 3: Capture payment (PostgreSQL)
            self.payments.capture(payment_id)
            saga_state['steps'].append(('capture', payment_id))

            # Step 4: Clear cache (Redis)
            self.cache.delete(f'order:{order_id}')
        except Exception as e:
            self.compensate(saga_state)
            raise SagaFailed(f"Payment saga failed: {e}")

    def compensate(self, saga_state: dict):
        # Roll back in reverse order
        for step_type, step_id in reversed(saga_state['steps']):
            if step_type == 'reserve':
                self.payments.release_reservation(step_id)
            elif step_type == 'analytics':
                self.analytics.mark_voided(step_id)
            elif step_type == 'capture':
                self.payments.refund(step_id)
Each step commits immediately. Failures trigger compensating transactions that semantically undo previous steps. This differs from atomic rollback—there's a window where partial state is visible.
The saga state tracking is crucial. Without it, you can't reliably compensate failures. In production systems, persist this state to a dedicated saga log table so compensation can complete even if the process crashes mid-saga. The log becomes your source of truth for what succeeded and what needs unwinding.
Sagas introduce new failure modes. What if compensation fails? What if the same saga executes twice due to a retry? These questions force you to design idempotent operations and decide whether to retry failed compensations indefinitely or alert operations to intervene manually.
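As a concrete illustration of idempotency, here is a minimal sketch that consults the persisted saga log mentioned above before executing a step; the saga_log helper methods (has_completed, result_of, record) are hypothetical names, not part of the PaymentSaga shown earlier:

def run_step_once(self, saga_id: str, step_name: str, action):
    # Skip steps that already committed on a previous attempt of this saga
    if self.saga_log.has_completed(saga_id, step_name):
        return self.saga_log.result_of(saga_id, step_name)
    result = action()
    # Persist the outcome so a crash-and-retry cannot re-execute the step
    self.saga_log.record(saga_id, step_name, result)
    return result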
Not all operations have clean compensations. You can refund a payment, but you can't unread an email notification. You can restock inventory, but you can't unpublish a message to a message queue that downstream consumers have already processed.
Design compensations by distinguishing reversible operations from irreversible ones. Reversible: database writes, API calls to systems supporting undo operations, internal state changes. Irreversible: notifications sent to users, webhooks delivered to external systems, physical world actions triggered.
For irreversible operations, build forward-only compensations. Instead of unsending a notification, send a cancellation notification. Instead of deleting a published event, publish a correction event. The history remains immutable, but the current state reflects the compensation.
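A sketch of what a forward-only compensation can look like, reusing the events.publish call from the OrderService example; the event name and payload here are illustrative assumptions:

import time

def compensate_order_notification(events, order_id: str):
    # The original 'order.created' event stays in the log untouched;
    # downstream consumers apply this correction on top of it.
    events.publish('order.corrected', {
        'order_id': order_id,
        'reason': 'payment_saga_rolled_back',
        'timestamp': time.time()
    })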
Two-phase commit (2PC) provides atomic commits across databases but introduces coordinator failure modes and latency penalties. Use it only when correctness absolutely requires atomicity and you can tolerate the performance cost.
The 99% case: don't use 2PC. Financial transfers between accounts in the same PostgreSQL database? Use transactions. Cross-database operations? Use sagas. The 1% case: regulatory requirements mandate atomic updates across systems with no visible intermediate state.
The practical problems with 2PC: locks held across network round-trips, coordinator single point of failure, timeout complexity, and poor performance at scale. A coordinator crash during the commit phase leaves participant databases in limbo—prepared to commit but unable to proceed without coordinator recovery.
💡 Pro Tip: If you're debating whether you need 2PC, you probably don't. Systems requiring that level of coordination often indicate boundaries drawn in the wrong places. Consider whether those databases should be separate at all.
With consistency boundaries established and saga patterns implemented, the next challenge emerges: how do you actually migrate from your existing monolithic database to this polyglot architecture without downtime?
Migrating from a single database to polyglot persistence without downtime requires a disciplined approach. The strangler fig pattern provides a proven framework: gradually route traffic to new data stores while keeping the old system operational until the migration completes.
Start by writing to both the legacy database and the new specialized store without reading from the new system. This validates your write path and builds up data in the target database.
import logging
from dataclasses import dataclass
from typing import Optional

@dataclass
class Product:
    id: str
    name: str
    price: float
    inventory: int

class ProductService:
    def __init__(self, postgres_repo, redis_cache):
        self.postgres = postgres_repo
        self.redis = redis_cache
        self.logger = logging.getLogger(__name__)

    def update_product(self, product: Product) -> bool:
        primary_success = False
        shadow_success = False

        # Primary write to PostgreSQL
        try:
            primary_success = self.postgres.update(product)
        except Exception as e:
            self.logger.error(f"Primary write failed: {e}")
            return False

        # Shadow write to Redis (best effort)
        try:
            shadow_success = self.redis.set(
                f"product:{product.id}",
                {"name": product.name, "price": product.price},
                ttl=3600
            )
        except Exception as e:
            self.logger.error(f"Shadow write failed: {e}")
            # Don't fail the operation - log for monitoring

        # Emit metrics for validation
        self.logger.info(
            "dual_write",
            extra={
                "primary": primary_success,
                "shadow": shadow_success,
                "product_id": product.id
            }
        )
        return primary_success
Monitor shadow write failures aggressively. A divergence rate above 0.1% signals configuration issues or capacity problems that will derail your migration. Instrument both latency and error rates—if shadow writes add more than 10ms p95 latency to your primary write path, you need either asynchronous writes via a queue or connection pool tuning.
During this phase, validate data integrity by running periodic reconciliation jobs. Sample 1% of records daily and compare checksums between the legacy and new databases. Catch systematic issues like schema mismatches or data transformation bugs before they propagate to millions of records.
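As an illustration, a daily reconciliation pass over a sample might look like the sketch below; the repository helpers (sample_ids, get), the compared fields, and the 1% sample rate are assumptions for this example:

import hashlib
import json

def reconcile_sample(postgres_repo, redis_cache, metrics, sample_rate=0.01):
    for product_id in postgres_repo.sample_ids(rate=sample_rate):
        primary = postgres_repo.get(product_id)
        shadow = redis_cache.get(f"product:{product_id}")
        if shadow is None:
            metrics.increment("reconcile_missing")
            continue
        # Compare a stable checksum of the fields both stores are expected to share
        primary_digest = hashlib.sha256(
            json.dumps({"name": primary.name, "price": primary.price}, sort_keys=True).encode()
        ).hexdigest()
        shadow_digest = hashlib.sha256(
            json.dumps({"name": shadow.get("name"), "price": shadow.get("price")}, sort_keys=True).encode()
        ).hexdigest()
        if primary_digest != shadow_digest:
            metrics.increment("reconcile_mismatch")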
Once shadow writes stabilize, read from both systems and compare results. Serve responses from the legacy database while validating the new system produces identical data.
import logging
from typing import Optional

class ProductReader:
    def __init__(self, postgres_repo, redis_cache, metrics):
        self.postgres = postgres_repo
        self.redis = redis_cache
        self.metrics = metrics
        self.logger = logging.getLogger(__name__)

    def get_product(self, product_id: str) -> Optional[Product]:
        # Read from primary source
        primary_data = self.postgres.get(product_id)

        # Read from new source for validation
        try:
            shadow_data = self.redis.get(f"product:{product_id}")
            self._validate_consistency(product_id, primary_data, shadow_data)
        except Exception:
            self.metrics.increment("shadow_read_error")
            # Continue serving from primary

        return primary_data

    def _validate_consistency(self, product_id: str, primary, shadow):
        if shadow is None:
            self.metrics.increment("shadow_read_miss")
            return

        # Compare critical fields
        mismatches = []
        if primary.name != shadow.get("name"):
            mismatches.append("name")
        if abs(primary.price - shadow.get("price", 0)) > 0.01:
            mismatches.append("price")

        if mismatches:
            self.metrics.increment(
                "consistency_mismatch",
                tags={"fields": ",".join(mismatches)}
            )
            self.logger.warning(
                f"Data mismatch for product {product_id}: {mismatches}"
            )
Run dual-read validation on a percentage of traffic—start at 1% and scale to 100% over two weeks. Set alerting thresholds: paging alerts at 1% mismatch rate, warnings at 0.1%. Use sampling to control load: even at 10% sampling, you'll validate millions of reads per day in a high-traffic system.
Beware of non-deterministic fields that will always mismatch: timestamps, random identifiers, or fields with floating-point precision differences. Exclude these from validation or use tolerance ranges. Focus validation on business-critical fields that impact customer experience or financial accuracy.
Switch read traffic to the new database only after validation shows consistency above 99.9% for at least 72 hours. Implement feature flags to control the rollout at runtime.
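One way to gate the read path behind a runtime flag, sketched below; the flags client and the read_from_redis_pct flag name are placeholders for whatever feature-flag system you already run:

import random

def get_product_flagged(product_id, postgres_repo, redis_cache, flags):
    # Percentage-based rollout: 0 serves everything from PostgreSQL, 100 prefers Redis
    rollout_pct = flags.get_int("read_from_redis_pct", default=0)
    if random.random() * 100 < rollout_pct:
        cached = redis_cache.get(f"product:{product_id}")
        if cached is not None:
            return cached
    # Outside the rollout percentage, or on a cache miss, fall back to PostgreSQL
    return postgres_repo.get(product_id)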
Create a rollback runbook before cutover. Document the exact commands to revert traffic routing, including database connection string changes and cache invalidation steps. Test the rollback procedure in staging under load—a theoretical rollback plan fails when you need it. Your runbook should include: connection pool drain procedures, DNS TTL considerations if you're switching endpoints, and the order of operations for multi-region deployments.
Establish clear rollback criteria before you begin cutover. Define automatic triggers: if error rates exceed 0.5%, if p99 latency degrades by more than 20%, or if any customer-impacting data loss occurs, initiate rollback immediately. Assign a single decision-maker who has authority to trigger rollback without consensus-building during an incident.
Keep dual-write active for 30 days post-migration. This grace period lets you switch back to the legacy database if performance degrades or bugs surface under production patterns that staging missed. After 30 days of stable operation, disable writes to the legacy system but retain the data for 90 days as a disaster recovery option.
The strangler pattern succeeds because it eliminates the big-bang cutover. Each phase delivers measurable progress and provides an exit ramp. With polyglot persistence introducing operational complexity, this cautious approach prevents migration failures from becoming production outages.
The architectural elegance of polyglot persistence dissolves quickly when your production system goes down at 3 AM. While the technical benefits are compelling, the operational overhead compounds in ways that don't appear in architecture diagrams.
Each database system brings its own backup tooling, retention policies, and recovery procedures. PostgreSQL uses pg_dump or WAL archiving. MongoDB requires mongodump or filesystem snapshots. Redis needs RDB snapshots or AOF logs. Elasticsearch has its own snapshot API.
The real challenge isn't running these tools—it's orchestrating consistent backups across systems and testing recovery procedures. A point-in-time recovery becomes a coordination nightmare when you need to restore multiple databases to the same logical moment. Your backup scripts grow from a single cron job to a complex orchestration system tracking dependencies and verification checksums across systems.
Budget 2-3x more engineering time for backup infrastructure compared to a single-database setup.
Tracing a single user request across PostgreSQL, Redis, and Elasticsearch requires correlating logs, metrics, and traces from three different monitoring stacks. Each database has different metrics that matter: PostgreSQL connection pools, Redis memory fragmentation, Elasticsearch JVM heap pressure.
Your alerting logic becomes more nuanced. Is that Redis spike causing the PostgreSQL slowdown, or is it the other way around? When data diverges between systems, debugging requires mental models of multiple consistency patterns simultaneously. The engineer on-call needs expertise across your entire database portfolio—or you need multiple specialists on-call.
Polyglot persistence creates knowledge silos. Your team fragments into "the PostgreSQL person," "the Redis person," and "the Elasticsearch person." Code reviews slow down because fewer engineers understand each database's operational characteristics. Onboarding new engineers takes longer as they must learn multiple systems before being productive.
Polyglot persistence makes sense when you have genuine data pattern mismatches that cause measurable pain. If you're considering it for theoretical scalability or resume-driven development, don't. The operational tax is real: expect 40-60% more infrastructure engineering time compared to a single well-tuned database.
If your team is smaller than 8-10 engineers, or if you lack dedicated infrastructure expertise, the operational burden likely outweighs the benefits. Optimize your existing database first—modern PostgreSQL with proper indexing and caching handles far more than most teams realize.
2026-02-14 05:42:48
For my experiments I'm renting a remote Linux server. As soon as it was
online, it became clear that the first real problem wasn't installing
software, but reducing how much of the server was exposed to the
internet by default. Services like SSH are continuously scanned and
probed, and a freshly provisioned host is immediately visible.
This lab documents how I secured that host step by step: first by
establishing a strict firewall baseline, then by introducing a private
administrative VPN, and finally by removing public SSH exposure
entirely.
The goal of this lab is to secure a rented Linux host that acts as the
entry point to a private network.
In this setup, the public host fronts several internal virtual machines.
External HTTP(S) traffic is terminated on the host and routed to
internal services via a reverse proxy, while internal systems are not
directly exposed to the internet.
Although built as a homelab, this mirrors real-world infrastructure
patterns where a single edge node provides controlled access to private
services.
This lab focuses on:
The host serves two roles:
Traffic model:
The final state minimizes public exposure and separates management
traffic from application traffic.
The objective is to regain control.
Using firewalld with a deny-by-default policy, inbound traffic is
restricted to explicitly allowed services. SSH remains publicly
accessible temporarily to avoid lockout during baseline setup.
Phase 1 ensures:
ICMP remains enabled for diagnostics.
With a stable firewall in place, administrative access is redesigned.
OpenVPN is deployed to create a private management plane. SSH is no
longer treated as a public service but as a capability granted to
authenticated VPN members.
Split-tunnel mode is intentionally used. Administrative traffic routes
through the VPN, while general internet traffic remains local.
Once VPN access is validated:
This eliminates global SSH exposure.
Root login is disabled. Administrative access occurs through a dedicated
non-root user with sudo.
Instead of manually assembling the .ovpn file, the repository includes
a helper script:
make-ovpn.sh
The script:
Example:
./make-ovpn.sh client1
This prevents formatting mistakes and ensures server/client consistency.
The full, step-by-step documentation is available here:
👉 https://github.com/iuri-covaliov/devops-labs/tree/main/ProtectRemoteHostWithFirewallAndVPN
This repository is intended to be read alongside the article: the article explains why each layer exists, while the repository shows how it is implemented.
2026-02-14 05:42:35
Earlier this week, @greggyb and @stephr_wong kicked off Google Cloud Live, a no-fluff weekly livestream series designed around hands-on building, vibe coding, and live debugging.
For their next episode, Jack Wotherspoon and Greg are taking things further with Gemini CLI. Last week they started with the basics, but now they're going to show us how to automate workflows and make Gemini CLI an active part of your development lifecycle.
They'll be diving into skills, hooks, and Plan Mode. Watch them:
Like last time, the Google AI team wants to answer your questions! Here's what happens when they do:
Shoutout to @ofri-peretz for getting their questions answered! Here's the replay if you want to watch the response.
Drop your questions for their next episode in the comments and they'll answer as many as they can on the stream.
2026-02-14 05:40:36
In the previous article, we processed the input through the neural network and correctly predicted the O shape.
Now let's do the same for the X shape.
If we repeat the same process of creating the feature map and then applying max pooling, we will obtain the following result.
Now let us see what happens when we shift the image of the letter X one pixel to the right.
Let's calculate the feature map.
Now let's pass it through the ReLU activation function.
Now we can apply max pooling.
Now let's take the input nodes and process them through the neural network.
We can observe that the output value for the letter X is now much closer to 1 than the output value for the letter O, which is -0.2.
In any convolutional neural network, no matter how complex it is, the model will consist of the following components: convolutional layers that produce feature maps, ReLU activations, max pooling layers, and a fully connected network that produces the final prediction.
These components together allow convolutional neural networks to handle variations in image position while still making correct predictions.
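To make those components concrete, here is a tiny NumPy sketch of a convolution producing a feature map, a ReLU, and 2x2 max pooling; the 5x5 image and the kernel values are illustrative, not the exact numbers used in this series.

import numpy as np

def convolve2d(image, kernel):
    # Valid (no-padding) convolution producing a smaller feature map
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0, x)

def max_pool(x, size=2):
    # Keep only the strongest activation in each size x size window
    oh, ow = x.shape[0] // size, x.shape[1] // size
    return np.array([[x[i*size:(i+1)*size, j*size:(j+1)*size].max()
                      for j in range(ow)] for i in range(oh)])

image = np.array([[ 1, -1, -1, -1,  1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1,  1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [ 1, -1, -1, -1,  1]])        # a tiny "X"
kernel = np.array([[ 1, -1, -1],
                   [-1,  1, -1],
                   [-1, -1,  1]]) / 9.0         # diagonal-line detector

feature_map = relu(convolve2d(image, kernel))
pooled = max_pool(feature_map, size=2)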
So that is the end of this series; we will explore recurrent neural networks next.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
ipm install repo-name
… and you’re done! 🚀
2026-02-14 05:38:28
This week was... challenging, but in a good way.
I didn't really implement any keywords (unless you count the makeshift type implementation from last week), but I did build the majority of the foundation for this project, and even though some parts took way longer than I'd expected, I feel like most of my issues from last week have been fixed.
If you want a more detailed explanation of each of the upcoming issues, you can check out last week's blog.
This mostly bothered me when I was working on the types for each of the system components. Even though it took a lot of time and effort to finish, it's done now, and it has actually made working on other parts of the system easier. To be clear, when I say this part is "done", I don't mean I'll never go near it again; I probably will, but it shouldn't be anything drastic, just minor tweaks.
The difficulty here was writing the types in a way that is flexible enough for what they're needed for, while also being strict enough to actually support the project structure.
At this point, I've finished most of the project's foundation, giving me something to work with. I'm no longer writing code in an empty file, but rather, I have something solid that I can attach new code to.
Also, most of the foundational/architectural parts of the code are finished, so I don't need to keep asking "how might this make me miserable in the future?" as much while writing code, which was pretty exhausting. Most of that exhaustion comes from uncertainty: you reach a point where you're unsure whether you're future-proofing your code or just being paranoid.
Most of what I did last week was architecture related, so I didn't really implement any keywords, but I did lay the foundation that will help me implement keywords more efficiently, with fewer potential bugs and less difficulty.
Initially, I planned on just using an npm package for JSON Pointers, but most of the packages I found had JSON Pointer classes with far more features than I needed. I just wanted a class to track JSON Pointers: no need for the JSON construction, JSON navigation, and other features I found in the json-pointer, jsonpointer, and @hyperjump/json-pointer packages. So I decided to implement it myself.
This class took way more time than expected: over a full day of work just to finish the base class (and I spent even more time adding some extra methods to it, but more on that in a bit).
One of the reasons this took more time than expected was the fact that it turned out I didn't fully understand JSON Pointers.
I understood how to escape and unescape, how a JSON Pointer worked, and how it navigated a schema/instance, but I was a bit confused about the purpose of escaping and unescaping.
Initially, I escaped any input to the JSON Pointer object and unescaped any output from it. Then I realized it was the other way around, and later that I shouldn't do it for every single input/output method, but only for the ones that deal with JSON Pointer strings.
The reason is that escaping and unescaping exist to deal with string representations of JSON Pointers. String representations use ~0 and ~1 to represent ~ and / respectively, so you unescape when creating a JSON Pointer object from a pointer string (so the object can structure the segments correctly), and you escape when you output the string representation of that object (so you produce a correctly formatted pointer string). You only ever escape or unescape when dealing with JSON Pointer strings; if you directly push or pop a segment, you shouldn't do either.
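To make the rule concrete, here is a tiny sketch of escaping and unescaping (written in Python purely for illustration; this isn't the project's own code):

def escape(segment: str) -> str:
    # "~" must be escaped before "/" so a literal "~1" isn't double-handled
    return segment.replace("~", "~0").replace("/", "~1")

def unescape(token: str) -> str:
    # Reverse order on the way back in
    return token.replace("~1", "/").replace("~0", "~")

segments = ["a/b", "m~n"]                        # raw keys, stored unescaped internally
pointer_string = "".join("/" + escape(s) for s in segments)
assert pointer_string == "/a~1b/m~0n"
assert [unescape(t) for t in pointer_string.split("/")[1:]] == segments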
Also, after I fully understood how JSON Pointers worked and finished the classes, I realized I needed to add some extra methods (fork and reconstruct), since they would make the rest of my code easier to write and reduce the likelihood of bugs that could take days to track down (e.g. a hidden bug caused by forgetting to pop a segment at the end of some recursive run in a random keyword implementation).
Even though this took way more time than I expected, it deepened my understanding of how JSON Pointers work and gave me a clear mental model of what I need to think about when working on foundational parts of a system, and that's the main purpose of this project: learning.
This is an interesting one.
After reading the specs and trying to implement the logic and structure for the Output result, I couldn't make progress because some parts of the specs were unclear. I asked the JSON Schema maintainers on Slack and was pointed to a blog post by Greg Dennis, one of the JSON Schema maintainers and the main person behind the Output Formatting section of the specs.
In that blog post, he said that there were problems with how Output results were structured, and that he had made an updated version of it that fixes those problems.
The new Output Format is amazing: super simple, clear, and pragmatic. I'd advise anyone interested in JSON Schema to take a look at it.
Also, after some back and forth with one of the maintainers, I decided to ditch the detailed output format in favor of the basic one. The detailed output format probably wouldn't be that hard to do, but the basic one is just trivial, and it's better to save as much time and energy as I can, since I still have a lot of things to do.
I explained how the Visitor Architecture worked in last week's blog, so I won't re-explain it here, but I will talk about what I did and what I plan on doing.
The Visitor Architecture has two primary components that interact with one another: the ValidationContext and the individual keyword implementations.
The plan is to keep the individual keyword implementations "dumb" and give the ValidationContext pretty much all of the power. The reason is separation of concerns: I'm less likely to mess up when working on the keyword implementations. For example, the code for saving error objects lives in a method on ValidationContext, and keyword implementations only ever save errors through that method, so there's no room to get that bookkeeping wrong in each individual keyword.
The ValidationContext is the last part of this project's foundation that still needs a good amount of work.
These are some things that are on my mind relating to upcoming tasks in this project.
I'm definitely planning on using the official JSON Schema Test Suite for this project; there's no use having a bunch of code if it doesn't actually do what it's supposed to, and I can't be sure it does unless I test it.
But I'm planning on doing this when I actually have something tangible to test, so I'll probably wait until I finish implementing a couple of good keywords before I start testing my code.
When I started writing the code for this project, I was seriously overwhelmed, mostly because I just felt completely lost, but right now, that feeling is completely gone.
If I had to guess, I'd say things are only going to get more exciting from now on.
The code can be found on GitHub
2026-02-14 05:35:48
I’ve been studying OSINT techniques and asynchronous programming in Python, so I decided to build a small experimental CLI tool called VoidScan.
The goal was not to compete with established tools, but to understand async I/O, HTTP requests at scale, and CLI architecture.
VoidScan scans a given username across multiple platforms and checks whether the account exists.
It includes:
aiohttp
Typer and Rich
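For anyone curious what the core check looks like, here is a minimal sketch of the idea (not VoidScan's actual code): fire the username checks concurrently with aiohttp and report which profiles respond. The site list and the "200 means the profile exists" heuristic are simplifying assumptions.

import asyncio
import aiohttp

# Illustrative site list; real tools carry many entries with per-site detection rules
SITES = {
    "GitHub": "https://github.com/{}",
    "GitLab": "https://gitlab.com/{}",
}

async def check_site(session, site, url_template, username):
    url = url_template.format(username)
    try:
        async with session.get(url, allow_redirects=False) as resp:
            return site, resp.status == 200   # naive heuristic: 200 means the profile exists
    except aiohttp.ClientError:
        return site, None                     # network error: result unknown

async def scan(username):
    async with aiohttp.ClientSession() as session:
        tasks = [check_site(session, s, tmpl, username) for s, tmpl in SITES.items()]
        for site, found in await asyncio.gather(*tasks):
            status = "found" if found else ("error" if found is None else "not found")
            print(f"{site}: {status}")

asyncio.run(scan("octocat"))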
I wanted to practice:
There are larger OSINT tools like Sherlock and Maigret that are more complete.
VoidScan is intentionally:
https://github.com/secretman12-lang/voidscan
Feedback is welcome!