2025-11-23 06:02:29
CinemaSins takes a fresh look at The Wiz (now that Wicked is back in theaters) with a rapid-fire “Everything Wrong With The Wiz In 15 Minutes Or Less” video. They highlight the sins of the film, introduce their writers, and point you toward their YouTube channels, social media, and website for more content.
They also invite fans to fill out a quick poll, join their Discord and Reddit communities, and support the team on Patreon for exclusive perks. Follow them on Twitter, Instagram, TikTok, and more for daily movie critiques and behind-the-scenes fun.
Watch on YouTube
2025-11-23 06:02:22
CinemaSins serves up their signature snark in a bite-sized roast of the new KPop Demon Hunters movie, rattling off every plot hole, trope and over-the-top moment in just 16 minutes. Dive deeper at cinemasins.com or catch more sin-filled content on YouTube via TVSins, CommercialSins and the CinemaSins Podcast Network.
Hungry for more? Hit their Linktree for polls, Patreon support and all the socials—Twitter, Instagram, TikTok, Discord and Reddit. Big shout-out to sin scribes Jeremy, Chris, Aaron, Jonathan, Deneé, Ian and Daniel for keeping the cinematic guilt trip hilarious.
Watch on YouTube
2025-11-23 05:55:52
Reading time: 9 minutes
PageSpeed optimization isn't new. Google released Lighthouse in 2016, and performance best practices have been documented for years. You might expect that by 2025, most professional websites would score well.
The opposite is true.
Modern websites are often slower than sites built five years ago. Despite faster networks and more powerful devices, the average website in 2025 is heavier, more complex, and performs worse than its predecessors. Why?
The complexity creep:
I regularly encounter professionally built websites from 2024-2025 scoring 40-70/100. These aren't old legacy sites—they're new builds using modern tools, costing thousands of dollars, from established agencies.
You might ask: "Isn't 85/100 good enough? Does the last 15 points really matter?"
The answer depends on what you're optimizing for.
The business case:
Google's research shows that 53% of mobile users abandon sites taking longer than 3 seconds to load. Each additional second of load time correlates with approximately 7% reduction in conversions. These aren't small numbers—they directly affect revenue.
For a business with 10,000 monthly visitors and a 3% conversion rate:
That's 90 lost conversions monthly. At $100 average transaction value, that's $108,000/year in lost revenue.
The SEO case:
Core Web Vitals became Google ranking factors in 2021. Two identical sites with identical content will rank differently based on performance. The faster site gets more organic traffic. More traffic means more conversions. The performance advantage compounds over time.
The user experience case:
Beyond metrics, there's a qualitative difference between a site that scores 85/100 and one that scores 100/100. The 100/100 site feels instant. Content appears immediately. Nothing jumps around. Users trust it more, engage more, and return more often.
The competitive advantage:
If your competitors score 60-70/100 and you score 100/100, you've created a measurable advantage in user experience, search rankings, and conversion rates. In competitive markets, these margins matter.
So yes, the last 15 points matter—not for the score itself, but for the business outcomes those points represent.
Most developers know the basics—optimize images, minify CSS, reduce JavaScript. But knowing the basics and achieving 100/100 are different things. The gap between 85/100 and 100/100 isn't about doing more of the same. It requires understanding which techniques have the most impact and implementing them correctly.
I've built multiple sites scoring 100/100/100/100 across all four metrics (Performance, Accessibility, Best Practices, SEO). In this guide, I'll explain the specific techniques that matter most, why they work, and what to watch out for.
You'll learn:
Before we start, a reality check: getting to 100/100 takes time the first time through—typically 4-6 hours for a complete site. Once you understand the patterns, subsequent optimizations go faster. But there's no shortcut around learning what works.
The problem: When browsers load a page, external CSS files block rendering. The browser downloads your HTML, encounters <link rel="stylesheet">, pauses rendering, downloads the CSS, parses it, then finally renders the page. This delay affects your Largest Contentful Paint (LCP) score significantly.
The solution: Inline your critical CSS directly in the <head> tag. The browser can render immediately without waiting for external files.
A real example: A professional services site I optimized scored 94/100 before CSS inlining. After moving critical styles inline, it scored 100/100. The only change was moving approximately 3KB of above-the-fold CSS into the HTML head.
Here's what that structure looks like:
<head>
<style>
/* Critical CSS - inline everything needed for initial render */
body { margin: 0; font-family: system-ui, sans-serif; }
header { background: #1a1a1a; color: white; padding: 1rem; }
.hero { min-height: 400px; background: linear-gradient(...); }
</style>
<!-- Load non-critical CSS asynchronously -->
<link rel="stylesheet" href="/full-styles.css" media="print" onload="this.media='all'">
</head>
What to watch for: Keep inline CSS under 8KB. Beyond this size, you're delaying HTML download time, which can actually hurt your First Contentful Paint score instead of helping it. Extract only the styles needed for above-the-fold content.
Framework considerations: If you're using Nuxt, Next.js, or similar frameworks, look for build-time CSS extraction features. Nuxt's experimental.inlineSSRStyles handles this automatically during static generation.
The problem: Images typically account for 60-80% of page weight. Unoptimized images directly affect load times, especially on mobile networks.
The solution: Use AVIF format where supported, with WebP fallback, and serve appropriately sized images for different viewports.
A real example: I built a healthcare website (medconnect.codecrank.ai) with professional medical imagery and team photos. Initial image exports totaled approximately 2.5MB per page. After optimization:
The implementation:
<picture>
<source srcset="hero-400.avif 400w, hero-800.avif 800w, hero-1200.avif 1200w"
type="image/avif">
<source srcset="hero-400.webp 400w, hero-800.webp 800w, hero-1200.webp 1200w"
type="image/webp">
<img src="hero-800.jpg"
alt="Hero image"
width="1200"
height="800"
loading="lazy">
</picture>
How browsers handle this: Modern browsers automatically select the best format and size they support. Chrome uses AVIF, Safari uses WebP (AVIF support pending), and older browsers fall back to JPG. Mobile devices get 400px versions, desktop gets 1200px. You write it once, browsers handle the rest.
Tools worth knowing: Squoosh (squoosh.app) for manual conversion with quality preview, or Sharp (Node.js library) for batch processing. Both give you control over quality settings per image.
The problem: External JavaScript files block the browser's main thread during download and execution. Even optimized scripts add 200-500ms of blocking time, which directly affects your Time to Interactive and Total Blocking Time scores.
The solution: Defer JavaScript execution until after initial page render. Load scripts after the page displays content to users.
The Google Analytics consideration: Standard GA4 implementation is the most common performance issue I encounter. The default tracking code blocks rendering and adds approximately 500ms to LCP.
Standard implementation (blocks rendering):
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
Performance-optimized implementation:
<script>
window.addEventListener('load', function() {
// Load GA4 after page fully renders
var script = document.createElement('script');
script.src = 'https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX';
document.head.appendChild(script);
});
</script>
The trade-off: This approach won't track users who leave within the first 2 seconds of page load. In practice, this represents less than 1% of traffic for most sites and is worth the performance improvement.
What to check: Review your <head> section. Any <script> tag without defer or async attributes is blocking. Move it to the page bottom or defer its execution.
The problem: Content shifting position while the page loads creates a poor user experience and hurts your CLS (Cumulative Layout Shift) score. Common causes include images loading without reserved space, web fonts swapping in, or ads inserting dynamically.
The solution: Reserve space for all content before it loads.
A real example: A professional services site I built maintains a CLS score of 0 (zero layout shift). Here's the approach:
1. Set explicit image dimensions:
<img src="hero.jpg" width="1200" height="800" alt="Hero">
<!-- Browser reserves 1200x800 space before image loads -->
2. Use CSS aspect-ratio for responsive images:
img {
width: 100%;
height: auto;
aspect-ratio: 3/2; /* Maintains space even as viewport changes */
}
3. Configure fonts to display fallbacks immediately:
@font-face {
font-family: 'CustomFont';
src: url('/fonts/custom.woff2');
font-display: swap; /* Show system font immediately, swap when custom font loads */
}
The result: Content positions remain stable throughout the load process. The page feels responsive and professional.
Debugging layout shift: Run PageSpeed Insights and review the filmstrip view. If you see elements jumping position, add explicit dimensions or aspect-ratios to the shifting elements.
Even when following established practices, you might encounter these issues:
This happens when inline CSS exceeds 8-10KB. The browser must download the entire HTML file before rendering anything, which delays First Contentful Paint.
Solution: Extract only above-the-fold styles. Identify which CSS is needed for initial viewport rendering, inline only that portion, and load the rest asynchronously:
<link rel="stylesheet" href="/full-styles.css" media="print" onload="this.media='all'">
Default AVIF quality settings are often too aggressive. A quality setting of 50 works for photographs but degrades images containing text or graphics.
Solution: Increase quality to 75-85 for images with text or fine details. Use image conversion tools that show quality previews before batch processing.
Common culprits beyond images: web fonts loading (text reflows when custom font loads), ads inserting dynamically, or content above images pushing them down during load.
Solutions:
font-display: swap and preload critical font fileswidth and height attributes or use CSS aspect-ratio
Mobile devices have slower processors and network connections. What renders quickly on a development machine often struggles on mid-range Android phones over 4G networks. If you're only testing on desktop, you're missing how most users experience your site.
Solution: Always test mobile performance with Chrome DevTools throttling enabled (4x CPU slowdown, Fast 3G network). This simulates realistic mobile conditions and reveals actual user experience. Aim for 90+ on mobile, and you will likely get 100 on desktop.
The challenge: Your site might perform well on your development machine but struggle in real-world conditions. Mobile users on 4G with mid-range phones experience performance very differently than you do on a MacBook Pro with fiber internet.
The testing process:
A perspective on perfection: Don't necessarily chase 100/100 if you're already at 95+. The final 5 points often represent diminishing returns. Consider whether that time is better spent on content, user experience, or other priorities.
Real device testing: Test on actual mobile devices when possible, not just Chrome DevTools simulation. Real hardware reveals issues that simulators miss.
Examples you can test right now:
I built these sites to demonstrate these techniques in production:
Professional Services Example - zenaith.codecrank.ai
MedConnect Pro (Healthcare Site) - medconnect.codecrank.ai
Mixology (Cocktail Recipes) - mixology.codecrank.ai
You can test any of these sites yourself. Newer sites even include a 'PageSpeed Me' button linking directly to Lighthouse testing. I have nothing to hide—the scores are verifiable.
Most developers ship sites that "work" and move on. Getting to 100/100 takes additional time and attention to detail that many choose not to invest.
I put PageSpeed buttons on every site I build because the work should speak for itself. If I claim 100/100, you can verify it immediately. If I don't achieve it, I'll explain exactly why (client-requested features, necessary third-party integrations, etc.).
Fair warning: PageSpeed scores can fluctuate by a few points depending on network conditions, server load, and time of day. A site scoring 100/100 might test at 98/100 an hour later. What matters is consistent high performance (90-100 range), not chasing a perfect score every single test.
This transparency is uncommon in web development. Many agencies don't want you testing their work. I built these techniques into my process specifically so performance isn't an afterthought—it's built in from the start.
The results are measurable, verifiable, and reproducible. Some clients care deeply about performance. Some don't. I serve those who do.
Optimizing to 100/100 isn't quick the first time through. For a typical site, expect:
Total: 4-6 hours for first-time implementation
However: Once you've completed this process for one site, you understand the patterns. Your second site takes approximately 2 hours. Your tenth site takes 30 minutes because you've built the tooling and established the workflow.
Most developers never invest this time because "good enough" ships. But if you care about user experience, SEO performance, and conversion rates, it's worth learning these techniques.
You now understand how to achieve 100/100 PageSpeed scores. You know the techniques, the trade-offs, and the testing approach.
In my next article, I'll examine why performance optimization often gets overlooked in professional web development. I'll share a real case study—a $5,000 professional website scoring 40/100—and explain what affects the cost and quality of web development.
Want to verify these techniques work? Visit any of the sites mentioned above and click "⚡ PageSpeed Me" to test them live. Then consider: what would perfect performance scores mean for your business?
Need help with performance optimization? Visit codecrank.ai to learn about our approach to web development. We build performance optimization into every project from day one.
All performance metrics verified with Google Lighthouse. Sites tested on mobile with 4G throttling and mid-tier device simulation.
2025-11-23 05:49:28
Using golang and cobra-cli, I built a simple command-line interface for managing support tickets. Tickets are stored locally in a CSV file.
As developers, we often find ourselves juggling multiple tasks, bugs, and feature requests. While tools like Jira, Trello, or GitHub Issues are powerful, sometimes you just need something simple, fast, and local to track your daily work without leaving the terminal.
That's why I built Ticket CLI—a simple command-line tool written in Go to track daily tickets and store them in a CSV file. No servers, no databases, just a binary and a text file.
For this project, I choose:
encoding/csv): To keep dependencies low, I used Go's built-in CSV support for data persistence.The project follows a standard Go CLI structure:
ticket-cli/
├── cmd/ # Cobra commands (add, list, delete)
├── internal/ # Business logic
│ └── storage/ # CSV handling
└── main.go # Entry point
Using Cobra, I defined commands like add, list, and delete. Here's a snippet of how the add command handles flags to create a new ticket:
// cmd/add.go
var addCmd = &cobra.Command{
Use: "add",
Short: "Add a new ticket",
Run: func(cmd *cobra.Command, args []string) {
// default values logic...
t := storage.Ticket{
ID: fmt.Sprintf("%d", time.Now().UnixNano()),
Title: flagTitle,
Customer: flagCustomer,
Priority: flagPriority,
Status: flagStatus,
Description: flagDescription,
}
if err := storage.AppendTicket(t); err != nil {
fmt.Println("Error saving ticket:", err)
return
}
fmt.Println("Ticket saved with ID:", t.ID)
},
}
Instead of setting up SQLite or a JSON store, I opted for CSV. It's human-readable and easy to debug. The internal/storage package handles reading and writing to tickets.csv.
// internal/storage/storage.go
func AppendTicket(t Ticket) error {
// ... (file opening logic)
w := csv.NewWriter(f)
defer w.Flush()
rec := []string{t.ID, t.Date, t.Title, t.Customer, t.Priority, t.Status, t.Description}
return w.Write(rec)
}
You can clone the repo and build it yourself:
git clone https://github.com/yourusername/ticket-cli
cd ticket-cli
go mod tidy
go build -o ticket-cli .
./ticket-cli add --title "Fix login bug" --priority high --customer "Acme Corp"
./ticket-cli list
./ticket-cli list --date 2025-11-15
A CSV file is created automatically at the Project directory, and it keeps getting updated, once a new ticket is added using the .ticket-cli add --flags
ID, Date, Title, Customer, Priority, Status, Description
This is just an MVP. Some ideas for the future include:
bubbletea for an interactive dashboard.Building CLI tools in Go is a rewarding experience. Cobra makes the interface professional, and Go's standard library handles the rest. If you're looking for a weekend project, try building your own developer tools!
Check out the code on GitHub.
2025-11-23 05:48:00
Over the past two years, large language models have moved from research labs to real-world products at an incredible pace. What began as a single API call quickly evolves into a distributed system touching compute, networking, storage, monitoring, and user experience. Teams soon realize that LLM engineering is not prompt engineering — it’s infrastructure engineering with new constraints.
In this article, we’ll walk through the key architectural decisions, bottlenecks, and best practices for building robust LLM applications that scale.
Traditional software systems are built around predictable logic and deterministic flows. LLM applications are different in four ways:
Even a small prompt can require billions of GPU operations. Latency varies dramatically based on:
The same input can return slightly different answers due to sampling. This complicates:
LLMs are one of the most expensive workloads in modern computing. GPU VRAM, compute, and network speed all constrain throughput.
Architecture decisions directly affect cost.
New models appear monthly, often with:
A production LLM application has five major components:
Pros:
Access to top models
Cons:
Expensive at scale
Limited control over latency
Vendor lock-in
Private data may require additional compliance steps
Use API hosting when your product is early or workloads are moderate.
Pros:
Deploy on-prem for sensitive data
Cons:
Complex to manage
Requires GPU expertise
Requires load-balancing around VRAM limits
Use self-hosting when:
you exceed ~$20k–$40k/mo in inference costs
latency control matters
models must run in-house
you need fine-tuned / quantized variants
Real systems require:
short-term vs long-term memory separation
RAG extends the model with external knowledge. You need:
a vector database (Weaviate, Pinecone, Qdrant, Milvus, pgvector)
embeddings model
chunking strategy
ranking strategy
Best practice:
Use hybrid search (vector + keyword) to avoid hallucinations.
Agents need memory layers:
Ephemeral memory: what’s relevant to the current task
Long-term memory: user preferences, history
Persistent state: external DB, not the LLM itself
As soon as you do more than “ask one prompt,” you need an orchestration layer:
LangChain
LlamaIndex
Eliza / Autogen
TypeChat / E2B
Custom state machines
Why?
Because real workflows require:
tool use (API calls, DB queries)
conditional routing (if…else)
retries and fallbacks
parallelization
truncation logic
evaluation before showing results to users
Best practice:
Use a deterministic state machine under the hood.
Use LLMs only for steps that truly require reasoning.
LLM evals are not unit tests. They need:
a curated dataset of prompts
automated scoring (BLEU, ROUGE, METEOR, cosine similarity)
LLM-as-a-judge scoring
human evaluation
Correctness: factual accuracy
Safety: red teaming, jailbreak tests
Reliability: consistency across temperature=0
Latency: P50, P95, P99
Cost: tokens per workflow
Best practice:
Run nightly evals and compare the current model baseline with:
new models
new prompts
new RAG settings
new finetunes
This prevents regressions when you upgrade.
Observability must be built early.
prompts
responses
token usage
latency
truncation events
RAG retrieval IDs
model version
chain step IDs
Alert on:
latency spikes
cost spikes
retrieval failures
model version mismatches
hallucination detection thresholds
Tools like LangSmith, Weights & Biases, or Arize AI can streamline this.
LLM compute cost is often your biggest expense. Ways to reduce it:
Today’s 1B–8B models (Llama, Mistral, Gemma) are extremely capable.
Often, a well-prompted small model beats a poorly-prompted big one.
semantic caching
response caching
template caching
This reduces repeated calls.
Quantized 4-bit QLoRA models can cut VRAM use by 70%.
Batching increases GPU efficiency dramatically.
Streaming reduces perceived latency and helps UX.
Long prompts = long latency = expensive runs.
LLM systems must handle:
Never trust user input. Normalize, sanitize, or isolate it.
Don’t send sensitive data to external APIs unless fully compliant.
Protect:
model APIs
logs
datasets
embeddings
vector DBs
Post-processing helps avoid toxic or harmful outputs.
Over the next 18 months, we’ll see:
long-context models (1M+ tokens)
agent frameworks merging into runtime schedulers
LLM-native CI/CD pipelines
cheaper inference via MoE and hardware-optimized models
GPU disaggregation (compute, memory, interconnect as separate layers)
The direction is clear:
LLM engineering will look more like distributed systems engineering than NLP.
Building a production-grade LLM system is much more than writing prompts. It requires thoughtful engineering across compute, memory, retrieval, latency, orchestration, and evaluation.
If your team is moving from early experimentation to real deployment, expect to invest in:
reliable inference
RAG infrastructure
model orchestration
observability
cost optimization
security
The companies that succeed with LLMs are not the ones that use the biggest model — but the ones that engineer the smartest system around the model.
2025-11-23 05:46:52
When you work with microservices in AWS (especially in ECS, EKS, or internal applications inside a VPC), sooner or later you need to expose a REST endpoint through Amazon API Gateway, but without making your backend public.
For many years, the only “official” way to integrate API Gateway (REST) with a private Application Load Balancer (ALB) was by placing a Network Load Balancer (NLB) in the middle.
This created three common issues in real-world projects:
For students or small teams, this architecture was confusing and far from intuitive:
“Why do I need an NLB if my backend is already behind an ALB?”
And yes… we were all asking ourselves the same thing.
Until recently, the flow looked like this:
API Gateway → VPC Link → NLB → ALB (Privado)
The NLB acted as a “bridge” because API Gateway could only connect to an NLB using VPC Link. ALB wasn’t supported directly.
This worked, but:
AWS finally heard our prayers 🙏.
Now API Gateway (REST) supports direct private integration with ALB using VPC Link v2.
The new flow looks like this:
API Gateway → VPC Link v2 → ALB (Privado)
In summary:
This allows you to naturally expose your internal microservices behind a private ALB without adding any extra resources.
| Aspect | Old (NLB Required) | New (Direct to ALB) |
|---|---|---|
| Infrastructure | More complex (extra NLB) | Much simpler |
| Cost | Hourly NLB + extra NLCUs | Only ALB + API Gateway |
| Latency | Higher (extra hop through NLB) | Lower |
| Maintenance | Two load balancers | One load balancer |
| Security | Good, but more SG rules | Equally secure, fewer failure points |
| Clarity | Hard to explain | Much more intuitive |
| Scalability | Depends on NLB | Highly scalable VPC Link v2 |
| Flexibility | Limited to NLB | Supports multiple ALBs/NLBs |
If you’re learning cloud architecture — especially microservices — this change is a huge benefit:
It also unlocks modern patterns:
AWS simplified a pattern that had been unnecessarily complex for years. The direct integration between API Gateway and private ALB:
If you were building internal APIs and previously needed an NLB just to bridge API Gateway to ALB… you can forget about that now.
The architecture is cleaner, more modern, and aligned with real cloud-native best practices.
Here are the official sources and recommended materials for deeper study: