2025-12-28 00:30:17
Successful AI transformation starts with deeply understanding your organization’s most critical use cases. This practical guide from You.com walks through a proven framework to identify, prioritize, and document high-value AI opportunities.
In this AI Use Case Discovery Guide, you’ll learn how to:
Map internal workflows and customer journeys to pinpoint where AI can drive measurable ROI
Ask the right questions to scope, evaluate, and prioritize candidate AI use cases
Align cross-functional teams and stakeholders for a unified, scalable approach
This week’s system design refresher:
Common Network Protocols Every Engineer Should Know
🚀 Learn AI in the New Year! Become an AI Engineer | Learn by Doing | Cohort 3
8 Popular Network Protocols
9 best practices for developing microservices
SPONSOR US
Ever wonder what actually happens when you click "Send" on an email or join a video call? Every click, message, and API call on the internet relies on network protocols. They define how data moves, who can talk, and how securely it all happens.
At the foundation are transport protocols: TCP ensures reliable delivery, UDP prioritizes speed, and QUIC brings both worlds together over UDP.
On top of that, HTTP powers the web, TLS secures it, and DNS translates names into addresses.
Need remote access? That’s SSH. File transfers? SFTP or SMB.
Real-time chat and media? WebSocket, WebRTC, and MQTT keep data flowing live.
For identity and access, OAuth and OpenID handle authorization and authentication.
In the backend, DHCP, NTP, ICMPv6, and LDAP quietly keep everything synchronized, addressed, and discoverable.
From simple emails (SMTP, IMAP) to encrypted VPNs (WireGuard, IPsec), these protocols form the invisible language that keeps the internet connected and secure.
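The TCP/UDP trade-off is visible even in a few lines of Python's standard socket module. A minimal sketch; example.com and the local port 9999 are placeholders:

```python
import socket

# TCP: connection-oriented; the OS performs the 3-way handshake on connect().
# Reliable, ordered delivery, at the cost of connection-setup latency.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.settimeout(3)
tcp.connect(("example.com", 80))           # SYN -> SYN+ACK -> ACK
tcp.sendall(b"HEAD / HTTP/1.1\r\nHost: example.com\r\n\r\n")
print(tcp.recv(1024).decode(errors="replace"))
tcp.close()

# UDP: connectionless; sendto() fires a datagram with no handshake,
# no delivery guarantee, and no ordering. Great for speed, not certainty.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"ping", ("127.0.0.1", 9999))   # nothing confirms arrival
udp.close()
```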
Over to you: If one protocol suddenly stopped working worldwide, which one would break the internet first?
After the amazing success of Cohorts 1 and 2 (with close to 1,000 engineers joining and building real AI skills), I’m excited to announce the launch of Cohort 3 of Become an AI Engineer!
This isn’t just another course on AI tools and frameworks. Our mission is to equip engineers with the solid foundation and complete end-to-end skill set required to excel as AI engineers in today’s fast-moving world.
Here’s what sets this cohort apart:
Learn by doing: Build real-world AI applications hands-on, far beyond just watching videos.
Structured, systematic curriculum: Progress step by step from core fundamentals to advanced concepts in a carefully crafted learning path.
Live feedback and mentorship: Receive direct guidance and reviews from experienced instructors and peers.
Strong community support: Learning solo is tough — learning together with a motivated community makes it enjoyable and effective!
If you missed the previous cohorts and want to learn AI in the New Year, this is your perfect opportunity to join Cohort 3 and level up your AI engineering career.
Network protocols are the key to transferring data between two systems in a network.
FTP (File Transfer Protocol)
Uses separate control and data channels to upload and download files between a client and server.
TCP (Transmission Control Protocol)
Establishes a reliable connection using a 3-way handshake (SYN, SYN+ACK, ACK) for accurate data delivery.
UDP (User Datagram Protocol)
Sends lightweight, connectionless datagrams with minimal latency. Ideal for fast, loss-tolerant transmissions.
HTTP (HyperText Transfer Protocol)
Uses TCP to request and receive web resources (HTML, images) through HTTP requests and responses.
HTTP/3 (QUIC)
Built on top of UDP, it enables faster and more reliable connections by multiplexing data streams and reducing latency.
HTTPS (Secure HTTP)
Secures HTTP with encryption using public and session keys over a TCP connection, thereby protecting web data.
SMTP (Simple Mail Transfer Protocol)
Transfers emails from a sender to a recipient through an SMTP server. It is commonly used for email delivery.
WebSocket
Upgrades an HTTP connection to a full-duplex channel for real-time, bidirectional communication like live chats.
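As a small illustration of that upgrade in practice, here is a hedged sketch using the third-party websockets library; the wss://echo.example.com endpoint is a placeholder:

```python
# pip install websockets
import asyncio
import websockets  # third-party client library; one of several options

async def main():
    # The library opens with an HTTP request carrying an "Upgrade: websocket"
    # header; after the server's 101 response, the same TCP connection becomes
    # a full-duplex channel that either side can write to at any time.
    async with websockets.connect("wss://echo.example.com") as ws:
        await ws.send("hello")        # client -> server
        reply = await ws.recv()       # server -> client, same connection
        print(reply)

asyncio.run(main())
```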
When we develop microservices, we should follow these best practices:
Use separate data storage for each microservice
Keep code at a similar level of maturity
Separate build for each microservice
Assign each microservice a single responsibility
Deploy into containers
Design stateless services
Adopt domain-driven design
Design micro frontends
Orchestrate microservices
Get your product in front of more than 1,000,000 tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.
Space Fills Up Fast - Reserve Today
Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing [email protected].
2025-12-27 00:30:42
After the amazing success of Cohorts 1 and 2 (with close to 1,000 engineers joining and building real AI skills), we are excited to announce the launch of Cohort 3 of Become an AI Engineer!
This is not just another course about AI frameworks and tools. Our goal is to help engineers build the foundation and end-to-end skill set needed to thrive as AI engineers.
Here’s what makes this cohort special:
• Learn by doing: Build real world AI applications, not just by watching videos.
• Structured, systematic learning path: Follow a carefully designed curriculum that takes you step by step, from fundamentals to advanced topics.
• Live feedback and mentorship: Get direct feedback from instructors and peers.
• Community driven: Learning alone is hard. Learning with a community is easy!
We are focused on skill building, not just theory or passive learning. Our goal is for every participant to walk away with a strong foundation for building AI systems.
If you want to start learning AI from scratch, this is the perfect time to begin.
2025-12-24 00:30:27
Real-time isn’t just about speed. It’s about instant, fresh, and reliable responses at scale.
This definitive Redis guide breaks down how to architect a real-time data layer that keeps user experiences snappy, AI agents responsive, and data up to date across your stack.
Inside, you’ll learn:
How to get your apps from “fast” to truly real-time
The role of Redis in low-latency caching, vector search, AI agent memory, and streaming workloads
Real-world patterns from companies using Redis to cut latency, reduce drop-offs, and keep users in flow
Note: This article is written in collaboration with the Shopify engineering team. Special thanks to the Shopify engineering team for sharing details with us about their Black Friday Cyber Monday preparation work and also for reviewing the final article before publication. All credit for the technical details shared in this article goes to the Shopify Engineering Team.
Black Friday Cyber Monday (BFCM) 2024 was massive for Shopify. The platform processed 57.3 petabytes of data, handled 10.5 trillion database queries, and peaked at 284 million requests per minute on its edge network. On app servers alone, they handled 80 million requests per minute while pushing 12 terabytes of data every minute on Black Friday.
Here’s the interesting part: this level of traffic is now the baseline for Shopify. And BFCM 2025 was even bigger, serving 90 petabytes of data, handling 1.75 trillion database writes with peak performance at 489 million requests per minute. This is why Shopify rebuilt its entire BFCM readiness program from scratch.
The preparation involved thousands of engineers working for nine months, running five major scale tests.
In this article, we will look at how Shopify prepared for success during the Super Bowl of commerce.
Shopify’s BFCM preparation started in March with a multi-region strategy on Google Cloud.
The engineering team organized the work into three parallel tracks that run simultaneously and influence each other:
Capacity Planning involves modeling traffic patterns using historical data and merchant growth projections. The team submits these estimates to their cloud providers early so the providers can ensure they have enough physical infrastructure available. This planning defines how much computing power Shopify needs and where it needs to be located geographically.
The Infrastructure Roadmap is where the team reviews their technology stack, evaluates what architectural changes are needed, and identifies system upgrades required to hit their target capacity. This track helps sequence all the work ahead. Importantly, Shopify never uses BFCM as a release deadline. Every architectural change and migration happens months before the critical window.
Risk Assessments use “What Could Go Wrong” exercises to document failure scenarios. The team sets escalation priorities and generates inputs for what they call Game Days. This intelligence helps them test and harden systems well in advance.
These three tracks constantly feed into each other. For example, risk findings might reveal capacity gaps the team didn’t account for. Infrastructure changes might introduce new risks that need assessment. In other words, it’s a continuous feedback loop.
To assess risks properly, the Shopify engineering team runs Game Days. These are chaos engineering exercises that intentionally simulate production failures at the BFCM scale.
The team started hosting Game Days in early spring. This involves deliberately injecting faults into the systems to test how they respond under failure conditions. Think of it like a fire drill, but for software.
During these Game Days, the engineering team focuses extra attention on what they call “critical journeys”. These are the most business-critical paths through their platform: checkout, payment processing, order creation, and fulfillment. If these break during BFCM, merchants lose sales immediately.
Critical Journey Game Days run cross-system disaster simulations. Here are some common aspects that are tested by the team:
The team tests search and pages endpoints while randomizing navigation to mimic real user behavior. They inject network faults and latency to see what happens when services can’t communicate quickly.
They bust caches to create realistic load patterns instead of the artificially fast responses you get when everything is cached.
Frontend teams run bug bashes during these exercises. They identify regressions, test critical user flows, and validate that the user experience holds up under peak load conditions.
These exercises build muscle memory for incident response by exposing gaps in operational playbooks and monitoring tools.
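Shopify has not published its Game Day tooling, but the core mechanic of fault injection is easy to sketch. Below is a hypothetical Python decorator that randomly delays or fails a dependency call; real Game Days inject faults at the network and infrastructure layers rather than inside application code:

```python
import functools
import random
import time

def inject_faults(failure_rate=0.1, extra_latency_s=2.0):
    """Chaos-style wrapper: randomly delay or fail the wrapped call.

    Purely illustrative of the Game Day idea, not Shopify's tooling.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if random.random() < failure_rate:
                raise ConnectionError("injected fault: dependency unavailable")
            time.sleep(random.uniform(0, extra_latency_s))  # injected latency
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@inject_faults(failure_rate=0.2)
def fetch_inventory(sku):
    return {"sku": sku, "available": 42}  # stand-in for a downstream call
```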
Most importantly, Shopify closes those gaps well ahead of BFCM instead of discovering them when merchants need the platform most. All findings from Game Days feed into what Shopify calls the Resiliency Matrix. This is centralized documentation that tracks vulnerabilities, incident response procedures, and fixes across the entire platform.
The Resiliency Matrix includes five key components.
First is service status, showing the current operational state of all critical services.
Second is failure scenarios that document how things can break and what the impact would be.
Third is recovery procedures, listing expected recovery time objectives and detailed runbooks for fixing issues.
Fourth is operational playbooks with step-by-step incident response guides.
Fifth is on-call coverage showing team schedules and PagerDuty escalation paths.
The Matrix becomes the roadmap for system hardening before BFCM. Teams update it continuously throughout the year, documenting resilience improvements as they go.
Game Days test components in isolation, but Shopify also needs to know if the entire platform can handle BFCM volumes. That’s where load testing comes in.
The engineering team built a tool called Genghis that runs scripted workflows mimicking real user behavior. It simulates browsing, adding items to the cart, and going through checkout flows. The tool gradually ramps up traffic until something breaks, which helps the team find their actual capacity limits.
Tests run on production infrastructure simultaneously from three Google Cloud regions: us-central, us-east, and europe-west4. This simulates global traffic patterns accurately. Genghis also injects flash sale bursts on top of baseline load to test peak capacity scenarios.
Shopify pairs Genghis with Toxiproxy, an open-source framework they built for simulating network conditions. Toxiproxy injects network failures and partitions that prevent services from reaching each other. For reference, a network partition is when two parts of your system lose the ability to communicate, even though both are still running.
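Toxiproxy is controlled through a small HTTP API (listening on port 8474 by default). Here is a hedged sketch of placing a latency "toxic" in front of a database using Python's requests; the proxy name, ports, and latency values are illustrative:

```python
import requests

TOXIPROXY = "http://localhost:8474"  # Toxiproxy's default admin API

# Route database traffic through the proxy instead of connecting directly.
requests.post(f"{TOXIPROXY}/proxies", json={
    "name": "mysql",
    "listen": "127.0.0.1:21212",      # the app connects here...
    "upstream": "127.0.0.1:3306",     # ...Toxiproxy forwards here
}).raise_for_status()

# Inject ~1s of latency (with jitter) on responses flowing back to the app.
requests.post(f"{TOXIPROXY}/proxies/mysql/toxics", json={
    "name": "slow_db",
    "type": "latency",
    "stream": "downstream",
    "attributes": {"latency": 1000, "jitter": 250},
}).raise_for_status()
```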
During tests, teams monitor dashboards in real time and are ready to abort if systems begin to degrade. Multiple teams coordinate to find and fix bottlenecks as they emerge.
When load testing reveals limits, teams have three options:
Horizontal scaling means adding more instances of the application.
Vertical scaling means giving each instance more resources, such as CPU and memory.
Optimizations mean making architecture-level changes that improve performance, ranging from better database queries to performance tuning across consuming layers up to the frontend.
These decisions set the final BFCM capacity and drive optimization work across Shopify’s entire stack. The key insight is that the team cannot wait until BFCM to discover the capacity limits. It takes months of preparation to scale infrastructure and optimize code.
BFCM tests every system at Shopify, but 2025 presented a unique challenge. Part of their infrastructure had never experienced holiday traffic, which creates a problem: how do you prepare for peak load when you have no historical data to model from?
In 2024, Shopify’s engineering team rebuilt its entire analytics platform. They created new ETL pipelines. ETL stands for Extract, Transform, Load, which is the process of pulling data from various sources, processing it, and storing it somewhere useful. They also switched the persistence layer and replaced their legacy system with completely new APIs.
This created an asymmetry. The ETL pipelines ran through BFCM 2024, so the team had one full season of production data showing how those pipelines perform under holiday load. But their API layer launched after peak season ended. They were preparing for BFCM on APIs that had never seen holiday traffic.
This matters a lot because during BFCM, merchants obsessively check their analytics. They want real-time sales numbers, conversion rates, traffic patterns, and data about popular products. Every single one of these queries hits the API layer. If those APIs can’t handle the load, merchants lose visibility during their most critical sales period.
Shopify ran Game Days specifically for the analytics infrastructure. These were controlled experiments designed to reveal failure modes and bottlenecks. The team simulated increased traffic loads, introduced database latency, and tested cache failures to systematically map how the system behaves under stress.
The results showed four critical issues that needed fixes:
First, the ETL pipelines needed Kafka partition increases to maintain data freshness during traffic spikes. Apache Kafka is a distributed streaming platform that handles real-time data flows. More partitions mean more parallel processing, which keeps data fresh for the APIs to serve (see the first sketch after this list).
Second, the API layer memory usage required optimization. The team found this through profiling, which means measuring exactly how the code uses memory. Each API request was using too much memory. Under high load, this would cause out-of-memory errors, slower response times, or complete crashes.
Third, connection timeouts needed tuning to prevent pool exhaustion. A connection pool is a set of reusable database connections. Creating new connections is expensive, so applications reuse them. The problem was that timeouts were too long, meaning connections would get stuck waiting. Under high load, you run out of available connections, and new requests start failing. Shopify tuned the timeouts to release connections faster (see the second sketch after this list).
Fourth, the team split API requests through a different load balancer approach. Originally, API requests would all enqueue to one region, which added latency and load. By scaling up the secondary region’s cluster and updating the load balancing policy, they better distributed the work and prevented API servers from being overwhelmed.
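To make the first and third fixes concrete, here are two hedged sketches. First, increasing a topic's partition count with the kafka-python admin client; the broker address and topic name are hypothetical, and this is not Shopify's actual tooling:

```python
from kafka.admin import KafkaAdminClient, NewPartitions

# Connect to the cluster's admin API (placeholder broker address).
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Raise the partition count so more consumers can process in parallel.
# Note: Kafka partition counts can only grow, never shrink.
admin.create_partitions({"analytics-events": NewPartitions(total_count=48)})
admin.close()
```

Second, pool sizing and timeout tuning as it might look with SQLAlchemy; the DSN and numbers are illustrative, not Shopify's values:

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://app:secret@db-host/shop",  # placeholder DSN
    pool_size=20,       # steady-state connections kept open for reuse
    max_overflow=10,    # temporary extra connections allowed under bursts
    pool_timeout=2,     # fail fast instead of waiting forever for a free slot
    pool_recycle=300,   # retire idle connections before the server drops them
)
```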
Beyond the performance fixes, the team validated alerting and documented response procedures. Their teams were trained and prepared to handle failures during the actual event.
Game Days and load testing prepare individual components, but scale testing is different. It validates the entire platform working together at BFCM volumes, revealing issues that only surface when everything runs at capacity simultaneously.
From April through October, Shopify ran five major scale tests at their forecasted traffic levels, specifically their peak p90 traffic assumptions. In statistics, p90 means the 90th percentile, or the traffic level that 90% of requests will be below.
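For intuition, percentile thresholds are simple to compute from a sample of observed rates. A sketch with synthetic numbers (not Shopify's data):

```python
import numpy as np

# Synthetic per-minute request rates, just to illustrate the statistic.
requests_per_minute = np.random.lognormal(mean=18, sigma=0.3, size=10_000)

p90 = np.percentile(requests_per_minute, 90)  # 90% of samples fall below this
p99 = np.percentile(requests_per_minute, 99)  # the rarer, sharper peak
print(f"p90={p90:,.0f} rpm  p99={p99:,.0f} rpm")
```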
Here are the details of those scale tests:
The first two tests validated baseline performance against 2024’s actual numbers.
Tests three through five ramped up to 2025 projections, targeting 150% of last year’s load.
By the fourth test, Shopify hit 146 million requests per minute and over 80,000 checkouts per minute. In the final test of the year, they tested their p99 scenario, which reached 200 million requests per minute.
These tests are extraordinarily large, so Shopify runs them at night and coordinates with YouTube because the tests impact shared cloud infrastructure. The team tested resilience, not just raw load capacity. They executed regional failovers, evacuating traffic from core US and EU regions to validate that their disaster recovery procedures actually work.
Shopify ran four types of tests:
Architecture scale-up tests validated that their infrastructure handles planned capacity.
Load tests during normal operations established baseline performance at peak load.
Load tests with failover validated disaster recovery and cross-region failover capabilities.
Game Day simulations tested cross-system resilience through chaos engineering.
The team simulated real user behavior, such as storefront browsing and checkout, admin API traffic from apps and integrations, analytics and reporting loads, and backend webhook processing. They also tested critical scenarios like sustained peak load, regional failover, and cascading failures where multiple systems fail simultaneously.
Each test cycle identified issues that would never appear under steady-state load, and the team fixed each issue as it emerged. Some of the key issues were as follows:
Scale Tests 1 and 2 revealed that under heavy load, core operations threw errors, and checkout queues backed up.
Scale Test 3 validated key migrations and confirmed that regional routing behaved as expected after infrastructure changes.
Scale Test 4 hit limits that triggered an unplanned failover, identifying priority issues in test traffic routing and discovering delays when bringing regions back online during rebalancing.
Scale Test 5 performed a full dress rehearsal and was the only test run during North American business hours to simulate real BFCM conditions. All the other tests ran at night.
Mid-program, Shopify made an important shift. They added authenticated checkout flows to their test scenarios. Modeling real logged-in buyers exposed rate-limiting code paths that anonymous browsing never touches. Even though authenticated flows were a small percentage of traffic, they revealed bottlenecks that would have caused problems during the real event.
BFCM preparation gets Shopify ready, but operational excellence keeps them steady when traffic actually spikes.
The operational plan coordinates engineering teams, incident response, and live system tuning. Here are the key components of this plan:
The plan for BFCM weekend includes real-time monitoring with dashboard visibility across all regions and automated alerts.
For incident response, Incident Manager OnCall teams provide 24/7 coverage with clear escalation paths.
Merchant communications ensure stores get status updates and notifications about any issues.
Live optimization allows system tuning based on real-time traffic patterns as they develop.
After BFCM ends, the post-mortem process correlates monitoring data with actual merchant outcomes to understand what worked and what needs improvement.
The philosophy is simple: preparation gets you ready, but operational excellence keeps you steady.
Shopify’s 2025 BFCM readiness program shows what systematic preparation looks like at scale. Thousands of engineers worked for nine months, running five major scale tests that pushed their infrastructure to 150% of expected load. They executed regional failovers, ran chaos engineering exercises, documented system vulnerabilities, and hardened systems with updated runbooks before merchants needed them.
What makes this different from typical pre-launch preparation is the systematic approach. Most companies load test once, maybe twice, fix critical bugs, and hope for the best. Shopify spent nine months continuously testing, finding breaking points, fixing issues, and validating that the fixes actually work.
Also, the tools Shopify built aren’t temporary BFCM scaffolding. The Resiliency Matrix, Critical Journey Game Days, and real-time adaptive forecasting became permanent infrastructure improvements. They make Shopify more resilient every day, not just during peak season.
To provide a visualization of BFCM, Shopify also launched an interesting pinball game to showcase the Shopify Live Globe. The game itself runs at 120 fps in a browser with a full 3D environment, a physics engine, and VR support. Behind the scenes, the game is a three.js app built with react-three-fiber. Every merchant sale shows up a few seconds later on this globe. Everyone can check out the game and the visualization on the Shopify Live Globe homepage.
2025-12-23 00:30:45
Static training data can’t keep up with fast-changing information, leaving your models to guess. We recommend this technical guide from You.com, which gives developers the code and framework to connect GenAI apps to the live web for accurate, real-time insights.
What you’ll get:
A step-by-step Python tutorial to integrate real-time search with a single GET request
The exact code logic to build a “Real-Time Market Intelligence Agent” that automates daily briefings
Best practices for optimizing latency, ensuring zero data retention, and establishing traceability
Turn “outdated” into “real-time.”
For a long time, AI systems were specialists confined to a single sense. For example:
Computer vision models could identify objects in photographs, but couldn’t describe what they saw.
Natural language processing systems could write eloquent prose but remained blind to images.
Audio processing models could transcribe speech, but had no visual context.
This fragmentation represented a fundamental departure from how humans experience the world. Human cognition is inherently multimodal. We don’t just read text or just see images. We simultaneously observe facial expressions while listening to the tone of voice. We connect the visual shape of a dog with the sound of a bark and the written word “dog.”
To create AI that truly operates in the real world, these separated sensory channels needed to converge.
Multimodal Large Language Models represent this convergence. For example, GPT-4o can respond to voice input in just 232 milliseconds, matching human conversation speed. Google’s Gemini can process an entire hour of video in a single prompt.
These capabilities emerge from a single unified neural network that can see, hear, and read simultaneously.
But how does a single AI system understand such fundamentally different types of data? In this article, we try to answer this question.
What if you could spend most of your IT resources on innovation, not maintenance?
The latest report from the IBM Institute for Business Value explores how businesses are using intelligent automation to get more out of their technology, drive growth, and cut the cost of complexity.
The core breakthrough behind multimodal LLMs is quite simple. Every type of input, whether text, images, or audio, gets converted into the same type of mathematical representation called embedding vectors. Just as human brains convert light photons, sound waves, and written symbols into uniform neural signals, multimodal LLMs convert diverse data types into vectors that occupy the same mathematical space.
Let us consider a concrete example. A photograph of a dog, the spoken word “dog,” and the written text “dog” all get transformed into points in a high-dimensional mathematical space. These points cluster together, close to each other, because they represent the same concept.
This unified representation enables what researchers call cross-modal reasoning. The model can understand that a barking sound, a photo of a golden retriever, and the sentence “the dog is happy” all relate to the same underlying concept. The model doesn’t need separate systems for each modality. Instead, it processes everything through a single architecture that treats visual patches and audio segments just like text tokens.
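A toy sketch of "same concept, nearby points": cosine similarity between vectors in a shared space. The 4-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings from different encoders, projected into one space.
photo_of_dog = np.array([0.92, 0.10, 0.31, 0.05])
text_dog     = np.array([0.88, 0.14, 0.35, 0.02])
text_pasta   = np.array([0.05, 0.91, 0.02, 0.40])

print(cosine(photo_of_dog, text_dog))    # high: same concept, different modality
print(cosine(photo_of_dog, text_pasta))  # low: unrelated concepts
```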
The diagram below shows a high-level view of how a multimodal LLM works:
Modern multimodal LLMs consist of three essential components working together to process diverse inputs.
The first component handles the translation of raw sensory data into initial mathematical representations.
Vision Transformers process images by treating them like sentences, dividing photographs into small patches and processing each patch as if it were a word.
Audio encoders convert sound waves into spectrograms, which are visual-like representations showing how frequencies change over time.
These encoders are typically pre-trained on massive datasets to become highly skilled at their specific tasks.
The second component acts as a bridge. Even though both encoders produce vectors, these vectors exist in different mathematical spaces. In other words, the vision encoder’s representation of “cat” lives in a different geometric region than the language model’s representation of the word “cat.”
Projection layers align these different representations into the shared space where the language model operates. Often, these projectors are surprisingly simple, sometimes just a linear transformation or a small two-layer neural network. Despite their simplicity, they’re crucial for enabling the model to understand visual and auditory concepts.
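As a rough sketch of such a projector in PyTorch, with invented dimensions (a 1024-dimensional vision encoder mapped into a 4096-dimensional language model space):

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Two-layer MLP that maps vision-encoder outputs into the LLM's
    embedding space. Dimensions are illustrative, not from a real model."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_embeddings):      # (batch, patches, vision_dim)
        return self.net(patch_embeddings)     # (batch, patches, llm_dim)

tokens = Projector()(torch.randn(1, 196, 1024))
print(tokens.shape)  # torch.Size([1, 196, 4096]) -- ready for the LLM
```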
The third component is the core LLM, such as GPT or LLaMA.
This is the “brain” that does the actual reasoning and generates responses. It receives all inputs as sequences of tokens, whether those tokens originated from text, image patches, or audio segments.
The language model treats them identically, processing everything through the same transformer architecture that powers text-only models. This unified processing is what allows the model to reason across modalities as naturally as it handles pure text.
See the diagram below that shows the transformer architecture:
The breakthrough that enabled modern multimodal vision came from a 2020 paper with a memorable title: “An Image is Worth 16x16 Words.” This paper introduced the idea of processing images exactly like sentences by treating small patches as tokens.
The process works through several steps:
First, the image gets divided into a grid of fixed-size patches, typically 16x16 pixels each.
A standard 224x224 pixel image becomes exactly 196 distinct patches (a 14x14 grid, since 224 ÷ 16 = 14), each representing a small square region.
Each patch is flattened from a 2D grid into a 1D vector of numbers representing pixel intensities.
Positional embeddings are added so the model knows where each patch came from in the original image.
These patch embeddings flow through transformer layers, where attention mechanisms allow patches to learn from each other.
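The patching step itself is little more than a reshape. A numpy sketch for a 224x224 RGB image and 16x16 patches:

```python
import numpy as np

img = np.random.rand(224, 224, 3)          # stand-in for a real image

P = 16                                      # patch size
patches = img.reshape(14, P, 14, P, 3)      # 224/16 = 14 patches per side
patches = patches.transpose(0, 2, 1, 3, 4)  # group pixels by patch
patches = patches.reshape(-1, P * P * 3)    # flatten each patch to 1-D

print(patches.shape)  # (196, 768): 196 "visual words", 768 numbers each
```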
The attention mechanism is where understanding emerges. A patch showing a dog’s ear learns it connects to nearby patches showing the dog’s face and body. Patches depicting a beach scene learn to associate with each other to represent the broader context of sand and water. By the final layer, these visual tokens carry rich contextual information. The model doesn’t just see “brown pixels” but understands “golden retriever sitting on beach.”
The second critical innovation was CLIP, developed by OpenAI. CLIP revolutionized how vision encoders are trained by changing the fundamental objective. Instead of training on labeled image categories, CLIP was trained on 400 million pairs of images and their text captions from the internet.
CLIP uses a contrastive learning approach. Given a batch of image-text pairs, it computes embeddings for all images and all text descriptions. The goal is to maximize the similarity between embeddings of correct image-text pairs while minimizing similarity between incorrect pairings. An image of a dog should produce a vector close to the caption “a dog in the park” but far from “a plate of pasta.”
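The objective can be sketched compactly. Assuming batches of precomputed image and text embeddings, a simplified CLIP-style loss is a symmetric cross-entropy over the similarity matrix (the real implementation learns the temperature and trains at far larger scale):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # logits[i][j] = similarity of image i with caption j.
    logits = image_emb @ text_emb.t() / temperature

    # Matching pairs sit on the diagonal: image i belongs with caption i.
    targets = torch.arange(len(logits))
    loss_i = F.cross_entropy(logits, targets)      # right caption per image
    loss_t = F.cross_entropy(logits.t(), targets)  # right image per caption
    return (loss_i + loss_t) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```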
Audio presents unique challenges for language models.
Unlike text, which naturally divides into discrete words, or images, which can be divided into spatial patches, sound is continuous and temporal. For example, a 30-second audio clip sampled at 16,000 Hz contains 480,000 individual data points. Feeding this massive stream of numbers directly into a transformer is computationally impossible and inefficient. The solution requires converting audio into a more tractable representation.
The key innovation is transforming audio into spectrograms, which are essentially images of sound. The process involves several mathematical transformations:
The long audio signal gets sliced into tiny overlapping windows, typically 25 milliseconds each.
A Fast Fourier Transform extracts which frequencies are present in each window.
These frequencies are mapped onto the mel scale, which matches human hearing sensitivity by giving more resolution to lower frequencies.
The result is a 2D heat map where time runs along one axis, frequency along the other, and color intensity represents volume.
This mel-spectrogram looks like an image to the AI model. For a 30-second clip, this might produce an 80x3,000 grid, which is essentially a visual representation of acoustic patterns that can be processed similarly to photographs.
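With a library like librosa, the whole pipeline is a couple of calls. The settings below mirror the configuration described here (16 kHz audio, 25 ms windows, 80 mel bins); clip.wav is a placeholder file:

```python
import librosa

y, sr = librosa.load("clip.wav", sr=16000)   # placeholder audio file

mel = librosa.feature.melspectrogram(
    y=y, sr=sr,
    n_fft=400,        # 25 ms analysis window at 16 kHz
    hop_length=160,   # 10 ms step between windows
    n_mels=80,        # 80 frequency bands on the mel scale
)
mel_db = librosa.power_to_db(mel)            # log scale, closer to hearing

print(mel_db.shape)  # (80, ~3000) for a 30-second clip: an "image" of sound
```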
Once audio is converted to a spectrogram, models can apply the same techniques used for vision. The Audio Spectrogram Transformer divides the spectrogram into patches, just as an image is divided. For example, models like Whisper, trained on 680,000 hours of multilingual audio, excel at this transformation.
Training a multimodal LLM typically happens in two distinct stages.
The first stage focuses purely on alignment, teaching the model that visual and textual representations of the same concept should be similar. During this stage, both the pre-trained vision encoder and the pre-trained language model remain frozen. Only the projection layer’s weights get updated through training.
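In PyTorch terms, stage one looks roughly like this; the tiny Linear modules stand in for the real pre-trained towers:

```python
import torch
import torch.nn as nn

# Stand-ins for the real pre-trained models (placeholders, not real weights).
vision_encoder = nn.Linear(768, 1024)
language_model = nn.Linear(4096, 4096)
projector = nn.Linear(1024, 4096)

# Stage 1: freeze both towers; gradients flow only through the projector.
for param in vision_encoder.parameters():
    param.requires_grad = False
for param in language_model.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-3)
```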
Alignment alone isn’t sufficient for practical use. A model might describe what’s in an image but fail at complex tasks like “Why does the person look sad?” or “Compare the two charts”.
Visual instruction tuning addresses this by training the model to follow sophisticated multimodal instructions.
During this stage, the projection layer continues training and the language model is also updated, often using parameter-efficient methods. The training data shifts to instruction-response datasets formatted as conversations.
An important innovation here was using GPT-4 to generate synthetic training data. Researchers fed GPT-4 textual descriptions of images and prompted it to create realistic conversations about those images. Training on this synthetic but high-quality data effectively distills GPT-4’s reasoning capabilities into the multimodal model, teaching it to engage in nuanced visual dialogue rather than just describing what it sees.
Multimodal LLMs achieve their remarkable capabilities through a unifying principle. By converting all inputs into sequences of embedding vectors that occupy a shared mathematical space, a single transformer architecture can reason across modalities as fluidly as it processes language alone.
The architectural innovations powering this capability represent genuine advances: Vision Transformers treating images as visual sentences, contrastive learning aligning modalities without explicit labels, and cross-attention enabling selective information retrieval across different data types.
The future points toward any-to-any models that can both understand and generate all modalities. In other words, a model that outputs text, generates images, and synthesizes speech in a single response.
2025-12-21 00:31:00
If slow QA processes bottleneck you or your software engineering team and you’re releasing slower because of it — you need to check out QA Wolf.
QA Wolf’s AI-native service supports web and mobile apps, delivering 80% automated test coverage in weeks and helping teams ship 5x faster by reducing QA cycles to minutes.
QA Wolf takes testing off your plate. They can get you:
Unlimited parallel test runs for mobile and web apps
24-hour maintenance and on-demand test creation
Human-verified bug reports sent directly to your team
Zero flakes guaranteed
The benefit? No more manual E2E testing. No more slow QA cycles. No more bugs reaching production.
With QA Wolf, Drata’s team of 80+ engineers achieved 4x more test cases and 86% faster QA cycles.
This week’s system design refresher:
Evolution of HTTP
System Performance Metrics Every Engineer Should Know
Why Is Nginx So Popular?
Network Debugging Commands Every Engineer Should Know
Hub, Switch, & Router Explained
SPONSOR US
The Hypertext Transfer Protocol (HTTP) has evolved over the years to meet the needs of modern applications, from simple text delivery to high-performance, real-time experiences.
Here is how HTTP has progressed:
HTTP/0.9: Built to fetch simple HTML documents with a single GET request.
HTTP/1.0: Added headers and status codes to support richer interactions, but every request still required a new connection.
HTTP/1.1: Introduced persistent connections and more methods, making the web faster and more efficient for everyday browsing.
HTTP/2: Solved performance bottlenecks with multiplexing, enabling multiple requests to share one connection.
HTTP/3 (QUIC): Shifted to UDP with QUIC to reduce latency and improve reliability, especially for mobile and real-time apps.
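To observe the difference from code, a client library such as httpx can negotiate HTTP/2 so that multiple requests share one connection as separate streams (requires the http2 extra; the URLs are placeholders):

```python
# pip install 'httpx[http2]'
import httpx

with httpx.Client(http2=True) as client:
    # Both requests can share one TCP+TLS connection as multiplexed streams,
    # instead of opening a new connection per request as in HTTP/1.0.
    r1 = client.get("https://example.org/")
    r2 = client.get("https://example.org/about")
    print(r1.http_version, r2.http_version)  # "HTTP/2" if the server supports it
```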
Over to you: Are you already taking advantage of HTTP/3 in your projects?
Code from Claude is about to hit prod, but it doesn’t have to be painful.
Engineering teams at Coinbase, Toast, Gametime, MSCI, and Zscaler use Resolve AI to resolve incidents, optimize costs, and build with production context using AI that works across code, infra, and telemetry.
The results mean 70% faster MTTR, 30% fewer engineers pulled in per incident, and thousands of saved engineering hours. Imagine what you could ship with that time in 2026.
Learn more about AI for prod, workflow-autonomous multi-agent systems, and how you can cut orchestration tax, improve investigations, and shift engineering time from grunt work to great work.
Your API is slow. But how slow, exactly? You need numbers. Real metrics that tell you what's actually broken and where to fix it.
Here are the four core metrics every engineer should know when analyzing system performance:
Queries Per Second (QPS): How many incoming requests your system handles per second. Your server gets 1,000 requests in one second? That's 1,000 QPS. Sounds straightforward until you realize most systems can't sustain their peak QPS for long without things starting to break.
Transactions Per Second (TPS): How many completed transactions your system processes per second. A transaction includes the full round trip, i.e., the request goes out, hits the database, and comes back with a response.
TPS tells you about actual work completed, not just requests received. This is what your business cares about.
Concurrency: How many simultaneous active requests your system is handling at any given moment. You could have 100 requests per second, but if each takes 5 seconds to complete, you're actually handling 500 concurrent requests at once.
High concurrency means you need more resources, better connection pooling, and smarter thread management.
Response Time (RT): The elapsed time from when a request starts until the response is received. Measured at both the client level and server level.
A simple relationship ties them all together: QPS = Concurrency ÷ Average Response Time
More concurrency or lower response time = higher throughput.
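A quick sanity check of the formula with the numbers from the concurrency example above:

```python
# Little's law applied to the earlier example: 100 requests/second arriving,
# each taking 5 seconds, means 500 requests are in flight at once.
concurrency = 500          # simultaneous in-flight requests
avg_response_time_s = 5    # seconds per request
qps = concurrency / avg_response_time_s
print(qps)  # 100.0 -- QPS = Concurrency / Average Response Time
```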
Over to you: When you analyze performance, which metric do you look at first, QPS, TPS, or Response Time?
Apache dominated web servers for 20 years, then Nginx showed up and changed everything. Now Nginx powers some of the largest sites on the internet, including Netflix, Airbnb, Dropbox, and WordPress.com. Not because it's newer or trendier, but because it solves problems that Apache couldn't handle efficiently.
Here’s what makes Nginx so popular:
High-Performance Web Server
Reverse Proxy & Load Balancer
Caching Layer
SSL Termination (Offloading)
Over to you: What’s your primary use for Nginx today, web server, reverse proxy, or load balancer?
When someone says “It’s a network issue,” these commands help you find what’s wrong fast.
ping: Checks if the destination responds and reports the round-trip time for basic reachability.
traceroute / tracert: Shows each hop on the path so you can see where packets slow down or stop.
mtr / pathping: Continuously measures latency and loss per hop to catch intermittent issues.
ip addr, ip link / ipconfig /all: Prints local IPs, MACs, and interface status so you can verify the machine’s network identity.
ip route: Reveals the routing table to confirm which gateway and next hop the system will use.
ip neigh: Displays IP-to-MAC entries to detect duplicates or stale ARP records on the LAN.
ss -tulpn: Lists listening sockets and PIDs so you can confirm a service is actually bound to the expected port.
dig: Resolves DNS records to verify the exact IPs clients will connect to.
curl -I: Fetches only HTTP(S) headers to check status codes, redirects, and cache settings.
tcpdump / tshark: Captures packets so you can inspect real traffic and validate what’s sent and received.
iperf3: Measures end-to-end throughput between two hosts to separate bandwidth limits from app issues.
ssh: Opens a secure shell on the remote machine to run checks and apply fixes directly.
sftp: Transfers files securely so you can pull logs or push artifacts during an incident.
nmap: Scans open ports and probes versions to confirm which services are exposed and responding.
Over to you: What's your go-to command when debugging network issues?
Every home and office network relies on these three devices (hub, switch, and router), yet their roles are often mixed up.
A hub operates at Layer 1 (Physical Layer). It's the simplest of the three: it doesn't understand addresses or data types. When a packet arrives, it simply broadcasts it to every connected device, creating one big collision domain. That means all devices compete for the same bandwidth, making hubs inefficient in modern networks.
A switch works at Layer 2 (Data Link Layer). It learns MAC addresses and forwards frames only to the correct destination device. Each port on a switch acts as its own collision domain, improving efficiency and speeding up communication within a LAN.
A router operates at Layer 3 (Network Layer). It routes packets based on IP addresses and connects different networks together, for example, your home network to the Internet. Each router interface forms a separate broadcast domain, keeping local and external traffic isolated.
Understanding how these three layers work together is the foundation of every modern network, from your home Wi-Fi to the global Internet backbone.
Over to you: How do you usually figure out whether a network issue is caused by the router or the switch?
2025-12-19 00:31:12
In a monolithic application, a function call is a local, in-memory process. Aside from a catastrophic hardware failure or a process crash, the execution of a function is essentially guaranteed. If the process is alive, the call succeeds.
However, in distributed systems, this guarantee does not hold. Components communicate over physical networks that are inherently unreliable. This reality is captured in the “Fallacies of Distributed Computing,” specifically the first fallacy: “The network is reliable”. In truth, it is not. A request sent from Service A to Service B may fail not because Service B is broken, but simply because the communication medium momentarily faltered.
This creates a need for defensive programming patterns, and one of the primary mechanisms we use is the Retry pattern. By automatically retrying a failed operation, a system can trade latency for availability, turning what would have been a failed user request into a successful one.
However, retries are both essential and dangerous in distributed systems. On the one hand, they transform unreliable networks into reliable ones. But on the other hand, indiscriminate retries can lead to latency amplification, resource exhaustion, and cascading failures that can take down entire platforms.
In this article, we will explore the retry pattern in depth and understand when and how to use it safely and effectively.
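As a preview, here is a minimal sketch of the pattern with two of the standard safety valves: a bounded attempt budget and exponential backoff with full jitter. The exception type and limits are illustrative choices, not a prescription:

```python
import random
import time

def retry(fn, max_attempts=4, base_delay_s=0.1, max_delay_s=2.0):
    """Retry fn() on transient errors with exponential backoff + full jitter.

    Illustrative sketch: treats ConnectionError as the only transient failure,
    and caps both attempts and per-attempt delay to bound added latency.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure to the caller
            # Full jitter: sleep a random amount up to the exponential cap,
            # so synchronized clients don't retry in lockstep.
            cap = min(max_delay_s, base_delay_s * (2 ** (attempt - 1)))
            time.sleep(random.uniform(0, cap))
```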