2025-12-18 00:30:16
Code reviews are critical but time-consuming. CodeRabbit acts as your AI co-pilot, providing instant code review comments and assessing the potential impact of every pull request.
Beyond just flagging issues, CodeRabbit provides one-click fix suggestions and lets you define custom code quality rules using AST Grep patterns, catching subtle issues that traditional static analysis tools might miss.
CodeRabbit has so far reviewed more than 10 million PRs, is installed on 2 million repositories, and is used by 100 thousand open-source projects. CodeRabbit is free for all open-source repos.
Disclaimer: The details in this post have been derived from the details shared online by the Meta Engineering Team. All credit for the technical details goes to the Meta Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
When Meta announced in Q2 2025 that its new Generative Ads Model (GEM) had driven a 5% increase in ad conversions on Instagram and a 3% increase on Facebook Feed, the numbers might have seemed modest.
However, at Meta’s scale, these percentages translate to billions of dollars in additional revenue and represent a fundamental shift in how AI-powered advertising works.
GEM is the largest foundation model ever built for recommendation systems. It has been trained at the scale typically reserved for large language models like GPT-4 or Claude. Yet here’s the paradox: GEM is so powerful and computationally intensive that Meta can’t actually use it directly to serve ads to users.
Instead, the company developed a teacher-student architecture that lets smaller, faster models benefit from GEM’s intelligence without inheriting its computational cost.
In this article, we look at how the Meta engineering team built GEM and the challenges they overcame.
Bugs sneak out when less than 80% of user flows are tested before shipping. However, getting that kind of coverage (and staying there) is hard and pricey for any team.
QA Wolf’s AI-native solution provides high-volume, high-speed test coverage for web and mobile apps, reducing your organization’s QA cycle to minutes.
They can get you:
80% automated E2E test coverage in weeks—not years
Unlimited parallel test runs
24-hour maintenance and on-demand test creation
Zero flakes, guaranteed
The benefit? No more manual E2E testing. No more slow QA cycles. No more bugs reaching production.
With QA Wolf, Drata’s team of engineers achieved 4x more test cases and 86% faster QA cycles.
⭐ Rated 4.8/5 on G2
Every day, billions of users scroll through Facebook, Instagram, and other Meta platforms, generating trillions of potential ad impression opportunities. Each impression represents a decision point: which ad, from millions of possibilities, should be shown to this specific user at this particular moment? Getting this wrong means wasting advertiser budgets on irrelevant ads and annoying users with content they don’t care about. Getting it right creates value for everyone involved.
Traditional ad recommendation systems struggled with this in several ways. Some systems treated each platform separately, which meant that insights about user behavior on Instagram couldn’t inform predictions on Facebook. This siloed approach missed valuable cross-platform patterns. Other systems tried to treat all platforms identically, ignoring the fact that people interact with Instagram Stories very differently from how they browse Facebook Feed. Neither approach was optimal.
The data complexity also compounds these challenges in the following ways:
Meaningful signals like clicks and conversions are extremely sparse compared to total impression volume.
User features are dynamic and constantly changing.
The system must process multimodal inputs, including text, images, video, and complex behavioral sequences.
Traditional models had severe memory limitations, typically only considering a user’s last 10 to 20 actions.
GEM’s goal was to create a unified intelligence that understands users holistically across Meta’s entire ecosystem, learning from long behavioral histories and complex cross-platform patterns while maintaining the nuance needed to optimize for each specific surface and objective.
GEM’s architecture processes user and ad information through three complementary systems, each handling a different aspect of the prediction problem.
The first system handles what Meta calls non-sequence features, which are essentially static attributes and their combinations. These include user demographics like age and location, user interests, ad characteristics like format and creative content, and advertiser objectives.
The challenge here isn’t just knowing these individual features but understanding how they interact. For example, a 25-year-old tech worker has very different purchasing patterns than a 25-year-old teacher, even if they share some interests. The system needs to learn which combinations of features actually matter.
GEM uses an enhanced version of the Wukong architecture with stackable factorization machines that can scale both vertically for deeper interactions and horizontally for broader feature coverage. This architecture works through multiple stacked layers, where each successive layer learns increasingly complex patterns from the simpler patterns discovered by previous layers. For instance, an early layer might discover the basic pattern that young professionals respond well to tech product ads. A layer deeper in the stack builds on this by learning that young professionals in urban areas who show interest in fitness respond especially well to smart wearable ads. An even deeper layer might refine this further, discovering that this combination works best specifically when those ads emphasize data tracking features rather than fashion elements.
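To make the idea of stacked interaction layers concrete, here is a minimal PyTorch sketch of FM-style layers stacked so that deeper layers build on the interactions discovered by earlier ones. It illustrates the concept only; it is not Meta's Wukong implementation, and the shapes and dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    """One FM-style layer: pairwise feature interactions plus a linear path."""
    def __init__(self, num_features, dim):
        super().__init__()
        self.proj = nn.Linear(num_features * dim, num_features * dim)

    def forward(self, x):                                      # x: (batch, num_features, dim)
        pairwise = torch.matmul(x, x.transpose(1, 2))          # (batch, F, F) feature-to-feature similarities
        interactions = torch.matmul(pairwise, x)               # mix features via those similarities
        linear = self.proj(x.flatten(1)).view_as(x)            # simple linear path
        return interactions + linear                           # residual-style combination

class StackedInteractions(nn.Module):
    """Deeper stacks learn higher-order combinations of simpler patterns."""
    def __init__(self, num_features=8, dim=16, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(InteractionLayer(num_features, dim) for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x.mean(dim=(1, 2))                              # scalar score per example

scores = StackedInteractions()(torch.randn(4, 8, 16))
print(scores.shape)   # torch.Size([4])
```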
The second system handles sequence features, which capture the timeline of user behavior. A user’s actions don’t exist in isolation. They tell a story with order and meaning. Someone who clicked on home workout content, then searched for gyms nearby, then viewed several gym websites, then researched membership costs is clearly on a specific journey. Traditional architectures struggled to process long sequences efficiently because the computational cost grows rapidly with sequence length.
GEM overcomes this with a pyramid-parallel structure. Think of it as processing your behavior history in chunks at the bottom level, then combining those chunks into broader patterns at middle levels, and finally synthesizing everything into a complete journey understanding at the top level. Multiple chunks can be processed simultaneously rather than sequentially, which dramatically improves efficiency.
The breakthrough here is scale. GEM can now analyze thousands of your past actions rather than just the most recent handful. This extended view reveals patterns that shorter windows simply cannot capture, like the progression from casual interest to serious purchase intent that might develop over months.
See the diagram below:
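As a rough illustration of the pyramid idea, the toy encoder below processes a long sequence in fixed-size chunks, summarizes each chunk, and then runs a second stage over the chunk summaries. The chunk size, pooling, and layer choices are assumptions made for this sketch, not details of Meta's production model.

```python
import torch
import torch.nn as nn

class PyramidEncoder(nn.Module):
    """Toy sketch: encode the sequence in chunks, then summarize the chunk summaries."""
    def __init__(self, dim, chunk=128, heads=4):
        super().__init__()
        self.chunk = chunk
        self.local_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)   # bottom level
        self.global_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)  # top level

    def forward(self, seq):                                    # seq: (B, T, D); assumes T is a multiple of chunk
        B, T, D = seq.shape
        chunks = seq.view(B * (T // self.chunk), self.chunk, D)
        local = self.local_layer(chunks)                       # chunks can be processed in parallel
        summaries = local.mean(dim=1).view(B, T // self.chunk, D)   # one summary vector per chunk
        return self.global_layer(summaries)                    # synthesize the whole journey

out = PyramidEncoder(dim=32)(torch.randn(2, 1024, 32))
print(out.shape)   # torch.Size([2, 8, 32])
```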
The third system, called InterFormer, handles cross-feature learning by connecting your static profile with your behavioral timeline. This is where GEM’s intelligence really becomes evident. Previous approaches would compress your entire behavior history into a compact summary vector (like reducing an entire novel to a single rating). This compression inevitably loses critical details about your journey.
InterFormer takes a different approach using an interleaving structure. It alternates between layers that focus purely on understanding your behavior sequence and layers that connect those behaviors to your profile attributes.
The first sequence layer might identify that you’ve shown increasing interest in fitness over time.
The first cross-feature layer then considers how your age, income, and location context shape what that fitness interest means.
The second sequence layer re-examines your behavior with these new insights and might notice that your fitness research intensified after a gym opened near your workplace.
The second cross-feature layer then makes even deeper connections about purchase intent and timing.
This alternating process continues through multiple layers, with each cycle refining understanding without losing access to the complete behavioral record.
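The toy sketch below shows the interleaving idea: a sequence-only layer alternates with a cross-attention layer that injects profile context, and the full-length sequence is carried through every cycle instead of being compressed into a single vector. Again, this is a conceptual sketch rather than the actual InterFormer code.

```python
import torch
import torch.nn as nn

class InterleavedBlock(nn.Module):
    """One cycle: refine the behavior sequence, then mix in profile context."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.seq_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, seq, profile):               # seq: (B, T, D), profile: (B, P, D)
        seq = self.seq_layer(seq)                                   # sequence-only refinement
        ctx, _ = self.cross_attn(query=seq, key=profile, value=profile)
        return seq + ctx                                            # keep the full sequence, add profile context

dim = 32
blocks = nn.ModuleList(InterleavedBlock(dim) for _ in range(2))
seq, profile = torch.randn(2, 50, dim), torch.randn(2, 6, dim)
for block in blocks:
    seq = block(seq, profile)                      # each cycle refines understanding without losing the sequence
print(seq.shape)                                   # torch.Size([2, 50, 32])
```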
Despite GEM’s obvious strengths, Meta faced a fundamental engineering challenge in using GEM.
GEM is enormous and trained using thousands of GPUs over extended periods. Running GEM directly for every ad prediction would be impossibly slow and expensive. When a user scrolls through Instagram, the system needs to make ad decisions in tens of milliseconds. GEM simply cannot operate at that speed while serving billions of users simultaneously.
Meta’s solution was a teacher-student architecture where GEM acts as the master teacher that trains hundreds of smaller, faster Vertical Models (VMs) that actually serve ads in production. These VMs are specialized for specific contexts like Instagram Stories click prediction or Facebook Feed conversion prediction. Each VM is lightweight enough to make predictions in milliseconds, but they’re much smarter than they would be if trained independently because they learn from GEM.
The knowledge transfer happens through two strategies. Direct transfer works when a VM operates in the same domain where GEM was trained, with similar data and objectives. GEM can teach these models directly. Hierarchical transfer applies when VMs work in specialized areas quite different from GEM’s training domain. In these cases, GEM first teaches medium-sized domain-specific foundation models for areas like Instagram or Facebook Marketplace. These domain models then teach the even smaller VMs. The knowledge flows down through levels, getting adapted and specialized at each stage.
Meta employs three sophisticated techniques to maximize transfer efficiency:
Knowledge distillation with Student Adapter: Student models learn to replicate GEM’s reasoning process, not just final predictions. The Student Adapter refines GEM’s predictions using recent ground-truth data, adjusting for timing delays and domain-specific differences.
Representation learning: Creates a shared conceptual framework between teacher and students. GEM learns to encode information in ways that transfer well across different model sizes, adding no computational overhead during ad serving.
Parameter sharing: This lets VMs selectively incorporate specific components directly from GEM. Small VMs stay fast while borrowing GEM’s sophisticated components for complex user understanding tasks.
Together, these three techniques achieve twice the effectiveness of standard knowledge distillation alone. The continuous improvement cycle works like this:
Users interact with fast VMs in real time
Their engagement data flows back into Meta’s data pipelines
GEM periodically retrains on this fresh data
Updated knowledge transfers to VMs through the post-training techniques, and
Improved VMs get deployed to production.
This cycle repeats continuously, with GEM getting smarter and VMs getting regular intelligence updates.
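For intuition, here is a minimal sketch of the distillation step at the heart of this cycle: a small student model is trained against a blend of ground-truth labels and a frozen teacher's predictions. The models, blending weight, and loss here are stand-ins; Meta's Student Adapter and representation-learning components are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))  # stands in for GEM
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))    # stands in for a lightweight VM
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(features, labels, alpha=0.5):
    with torch.no_grad():
        teacher_logits = teacher(features)                     # soft targets from the frozen teacher
    student_logits = student(features)
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels)
    soft_loss = F.binary_cross_entropy_with_logits(student_logits, torch.sigmoid(teacher_logits))
    loss = alpha * hard_loss + (1 - alpha) * soft_loss         # blend ground truth with teacher knowledge
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

features, labels = torch.randn(128, 64), torch.randint(0, 2, (128, 1)).float()
print(distill_step(features, labels))
```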
Building GEM required Meta to rebuild its training infrastructure from the ground up.
The challenge was training a model at LLM scale, but for the fundamentally different task of recommendation rather than language generation. The company achieved a 23x increase in effective training throughput while using 16x more GPUs and simultaneously improving hardware efficiency by 1.43x.
This required innovations across multiple areas. Multi-dimensional parallelism orchestrates how thousands of GPUs work together, splitting the model’s dense components using techniques like Hybrid Sharded Distributed Parallel while handling sparse components like embedding tables through a combination of data and model parallelism. The goal was to ensure every GPU stayed busy with minimal idle time waiting for communication from other GPUs.
System-level optimizations pushed GPU utilization even higher:
Custom GPU kernels designed for variable-length user sequences, fusing operations to reduce memory bandwidth bottlenecks.
PyTorch 2.0 graph-level compilation automates optimizations like activation checkpointing and operator fusion.
Memory compression, including FP8 quantization to reduce the footprint without impacting accuracy.
NCCLX communication collectives that handle inter-GPU communication without consuming the main compute resources.
The efficiency gains extended beyond raw training speed.
Meta reduced job startup time by 5x through optimizations in trainer initialization, data reader setup, and checkpointing. They cut PyTorch 2.0 compilation time by 7x using intelligent caching strategies. These might seem like minor details, but when you’re training models that cost millions of dollars in compute resources, every percentage point of efficiency improvement matters enormously.
The result is a training system that can iterate rapidly on GEM, incorporating new data and architectural improvements at a pace that would have been impossible with previous infrastructure. This enables Meta to keep GEM at the frontier of recommendation AI while controlling costs enough to make the massive investment worthwhile.
Meta’s roadmap for GEM extends well beyond its current capabilities.
The next major evolution involves true multimodal learning, where GEM processes text, images, audio, and video together rather than treating them as separate input streams. This will enable an even richer understanding of both user preferences and ad creative effectiveness across all content types. The company is also exploring inference-time scaling, which would allow the system to dynamically allocate more computational resources to difficult predictions while handling straightforward cases more efficiently.
Perhaps most ambitiously, Meta envisions a unified engagement model that ranks both organic content and ads using the same underlying intelligence. This would fundamentally change how advertising integrates into social feeds, potentially creating more seamless experiences where ads feel like natural content recommendations rather than interruptions. On the advertiser side, GEM’s intelligence will enable more sophisticated agentic automation, where AI systems can manage and optimize campaigns with minimal human intervention while achieving better results.
References:
Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation
Wukong: Towards a Scaling Law for Large-Scale Recommendation
Get your product in front of more than 1,000,000 tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.
Space Fills Up Fast - Reserve Today
Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing [email protected].
2025-12-17 00:30:58
Enterprise customers expect SSO, Directory Sync, RBAC, and Audit Logs, but building and maintaining that infrastructure slows teams down and pulls focus from core product work.
WorkOS provides these features through simple APIs and a hosted Admin Portal that integrates with every identity provider. You get production-ready enterprise capabilities without owning the complexity yourself.
Trusted by OpenAI, Cursor, Vercel, 1000+ more. Your first million MAUs are free.
Disclaimer: The details in this post have been derived from the details shared online by the LinkedIn Engineering Team. All credit for the technical details goes to the LinkedIn Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
Recruiting is a profession that demands both strategic thinking and meticulous attention to detail. Recruiters must make high-value decisions about which candidates are the best fit for a role, but they also spend countless hours on repetitive pattern recognition tasks. Sorting through hundreds of resumes, evaluating qualifications against job requirements, and drafting personalized outreach messages are all essential activities. However, they also consume enormous amounts of time that could otherwise be spent on relationship-building and strategic hiring decisions.
LinkedIn’s Hiring Assistant represents a new approach to solving this challenge.
Rather than replacing recruiters, this AI agent is designed to handle the repetitive, time-consuming aspects of the recruiting workflow, freeing professionals to focus on what they do best: connecting with people and making critical hiring choices.
The most labor-intensive parts of recruiting fall into three main categories.
First, sourcing candidates requires searching through LinkedIn’s network of over 1.2 billion profiles to identify qualified individuals.
Second, evaluating candidates involves carefully reading resumes and profiles to assess whether each person meets the specific requirements of a role.
Third, engaging candidates means drafting and sending personalized communications to potential hires, answering their questions, and maintaining ongoing dialogue throughout the hiring process.
To address these challenges, LinkedIn built the Hiring Assistant with three core capabilities.
The system delivers value at scale by efficiently searching across billions of profiles and handling enterprise-level workloads reliably.
It enables interactive communication by understanding recruiter intent through natural conversation, asking clarifying questions when needed, and adapting its behavior based on real-time feedback.
Lastly, it also features continuous learning by improving over time based on observing what recruiters do, learning individual preferences, and remembering past interactions and decisions.
In this article, we will look at the architecture and technical building blocks of LinkedIn’s Hiring Assistant.
This holiday season, the equation is simple: everyone gets a better deal with Verizon. Best devices. Best plans. Add that to an award-winning network, and you have the best deals. Period.
Unbeatable Deal: Switch to Verizon and get four lines on Unlimited Welcome for $25 per line/month (on Auto Pay plus taxes & fees) and get four of the newest, premium devices like the iPhone 17 Pro, Samsung Galaxy S25+, or Google Pixel 10 Pro XL all on Verizon.
Enjoy flexibility and save money this holiday season because every dollar you spend matters.
Explore Holiday Deals. See here for full terms.
At its core, the Hiring Assistant is built on what LinkedIn calls a “plan-and-execute” architecture as shown in the diagram below:
To understand why this matters, it helps to know what they avoided. A simpler approach, known as ReAct, would have the AI try to handle everything at once in a single continuous loop. While straightforward, this method runs into problems when tasks get complex. Large language models, the AI systems that power tools like this, can become unreliable when asked to juggle too many things simultaneously.
See the diagram below for the ReAct pattern:
Instead, LinkedIn split the work into two distinct phases:
The Planner acts as the strategic thinker. When a recruiter makes a request, the Planner examines it from a high level, breaks the work into smaller, manageable steps, and creates a structured plan for what needs to happen. Think of it as a project manager outlining the approach before any actual work begins.
The Executor then takes over. It works through the plan step by step, using available tools to complete each task. For each step, the Executor runs its own loop of reasoning and action, figuring out what needs to happen and then making it happen.
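A stripped-down sketch of the plan-and-execute split might look like the following, assuming a hypothetical `llm()` helper and two toy tools. LinkedIn's production system is far richer, but the separation of a strategic planning call from per-step execution loops is the core idea.

```python
def llm(prompt: str) -> str:
    """Hypothetical helper that calls an LLM; replace with a real client."""
    return "..."

def planner(request: str) -> list[str]:
    """Strategic phase: break the recruiter's request into ordered steps."""
    plan = llm(f"Break this recruiting request into numbered steps:\n{request}")
    return [step.strip() for step in plan.splitlines() if step.strip()]

def executor(step: str, tools: dict) -> str:
    """Tactical phase: reason about one step and invoke a tool to complete it."""
    tool_name = llm(f"Which tool ({', '.join(tools)}) completes: {step}? Answer with one name.")
    tool = tools.get(tool_name.strip(), tools["search_profiles"])   # fall back to a default tool
    return tool(step)

tools = {
    "search_profiles": lambda step: f"candidates for: {step}",
    "draft_message":   lambda step: f"outreach draft for: {step}",
}

for step in planner("Find senior backend engineers in Berlin and draft outreach"):
    print(executor(step, tools))
```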
This divide-and-conquer strategy brings several advantages:
First, it makes the system more reliable. Breaking complex recruiting workflows into discrete steps means the AI is less likely to get confused or make mistakes.
Second, it allows for better cost management. LinkedIn can use more powerful AI models for complex reasoning tasks while deploying simpler, cheaper models for straightforward steps.
Third, tasks are far more likely to be completed successfully when they are well-defined and manageable in scope.
Beyond the plan-and-execute design, the Hiring Assistant uses a message-driven architecture.
Each recruiter gets their own individual instance of the assistant, complete with its own identity and mailbox. Everything works through asynchronous messages, much like email. When a recruiter asks the assistant to find candidates, they do not have to sit and wait for results. The assistant receives the message, processes it in the background, and sends updates when ready.
This asynchronous approach is what enables the assistant to work at scale. While a recruiter focuses on other tasks, their assistant can be searching through millions of profiles, evaluating candidates, and preparing recommendations, all without requiring constant attention or supervision.
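Conceptually, the mailbox model resembles a background worker draining its own queue, as in the asyncio sketch below. The queue names and the simulated delay are illustrative only.

```python
import asyncio

async def assistant(mailbox: asyncio.Queue, updates: asyncio.Queue):
    """Each recruiter's assistant instance drains its own mailbox in the background."""
    while True:
        message = await mailbox.get()
        if message is None:                        # shutdown signal
            break
        await asyncio.sleep(0.1)                   # stands in for a long-running search
        await updates.put(f"done: {message}")

async def main():
    mailbox, updates = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(assistant(mailbox, updates))

    await mailbox.put("find ML engineers in Toronto")   # the recruiter fires the request and moves on
    print("recruiter keeps working on other things...")
    print(await updates.get())                           # the update arrives when the assistant is ready

    await mailbox.put(None)
    await worker

asyncio.run(main())
```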
The Hiring Assistant operates in two complementary modes, each designed for different stages of the recruiting process:
Interactive Mode: When recruiters first start a new project, they work with the assistant in interactive mode. This feels like having a conversation with a colleague. Recruiters can clarify what kind of person they are looking for, refine job requirements, and get immediate feedback on their requests. The assistant shows its reasoning as it works, making the process transparent. This builds trust because recruiters can see exactly what the system is doing and correct course quickly if something seems off.
Asynchronous Mode: Once the recruiter and assistant are aligned on what success looks like, the system shifts into asynchronous mode. This is where the real power of automation comes into play. The assistant works autonomously in the background, running large-scale searches across millions of profiles, continuously updating candidate pipelines, and evaluating new applicants as they appear.
LinkedIn describes this as a “source while you sleep” capability.
The assistant can review thousands of candidates overnight, a task that would take a human recruiter weeks to complete manually.
Yet even in this autonomous mode, humans remain in control of important decisions. The assistant surfaces candidates and provides recommendations, but recruiters make the final calls about who to contact and ultimately hire. This balance between automation and human judgment is central to how the system is designed.
The Hiring Assistant is built on top of LinkedIn’s broader agent platform, a foundation of reusable components that can power any AI agent product across the company. This approach means the LinkedIn engineering team does not have to reinvent the wheel each time it builds a new intelligent system.
At the user-facing level, a client-side SDK embeds the assistant directly into recruiter workflows. This SDK creates dynamic interfaces that adapt based on what the AI needs at any given moment. It supports multiple input methods, including chat, voice, and typing assistance, while logging all interactions for future analysis and improvement.
Connecting this interface to backend services is a GraphQL API, which delivers data in structured packages called view models. These contain everything needed to display information on screen. LinkedIn calls it the agent-driven UI, where the AI itself can determine what recruiters see, dynamically adjusting the interface as tasks progress.
Rather than the traditional request-response pattern where you ask a question and wait for an answer, the system uses a push-based, event-driven architecture. It works as follows:
The user interface subscribes to updates from the agent, and when something changes, the agent publishes that update. This means the interface refreshes automatically without users needing to manually reload anything.
Long-running AI tasks are delivered through streaming responses. Instead of waiting for a complete answer, recruiters see the AI’s reasoning unfold in real time, with results appearing as soon as they become available.
If a recruiter is logged in on multiple devices, cross-session synchronization keeps everything in sync. An action taken on a phone immediately reflects on a desktop browser.
At the center of the Hiring Assistant sits what LinkedIn calls the supervisor agent. If the overall system is a team, the supervisor is the team leader who makes sure everyone works together effectively.
See the diagram below:
The supervisor handles several critical responsibilities:
It oversees workflow management for the entire hiring process, ensuring tasks move forward in the right sequence.
When a recruiter sends a message or request, the supervisor receives it and routes it to the appropriate sub-agent for handling.
It also makes judgment calls about task prioritization, deciding what requires human input versus what can be safely automated.
Beyond just delegating work, the supervisor coordinates between different sub-agents to ensure they work together smoothly. It actively observes the environment, watching for changes like new candidate activity or application submissions, and triggers appropriate actions in response.
The supervisor also manages the human-in-the-loop aspect of the system. It knows which decisions are significant enough to require human approval and surfaces those moments to recruiters.
All communication, whether from users or between sub-agents, flows through the supervisor. It serves as the central hub that keeps the entire operation organized and aligned with recruiter goals.
The Hiring Assistant divides recruiting work among several specialized sub-agents, each focused on a specific part of the workflow. This modular design allows each component to excel at its particular task while working together as a cohesive system. Let’s look at the various sub-agents in detail:
The intake agent serves as the starting point for every hiring project.
It gathers job requirements from recruiters, confirming essential details like job title, location, and seniority level. When information is missing, the agent leverages LinkedIn’s Economic Graph (a digital map of the global economy) to intelligently fill in gaps. The agent then generates specific qualifications based on successful past hires and industry knowledge, creating a clear framework for evaluating candidates.
Finding the right candidates is perhaps the most knowledge-intensive part of recruiting, and the sourcing agent approaches this challenge with multiple strategies.
It creates search queries using traditional Boolean logic (AND, OR, NOT operators), generates AI-powered queries based on hiring requirements, and draws on historical recruiter search patterns as starting points. Importantly, customer data never crosses company boundaries, maintaining strict data isolation.
What sets this agent apart is its integration with LinkedIn’s Economic Graph.
This gives it access to insights about top locations, job titles, and skills for specific talent pools. It can identify which candidates are actively looking or were recently hired, understand talent flow patterns between companies and industries, spot fast-growing companies and skill sets, flag companies experiencing layoffs, and highlight opportunities at top schools or companies with open positions. These insights help the agent find hidden gems that might otherwise be overlooked, going well beyond simple keyword matching.
The sourcing agent also implements a closed feedback loop. It combines sourcing with evaluation results, using AI reasoning to refine queries based on which candidates prove to be good matches. This allows the system to balance precision (finding exactly the right candidates) with liquidity (finding enough candidates), continuously improving the quality and volume of results over time.
Reading resumes and assessing qualifications is one of the most time-consuming tasks for recruiters.
The evaluation agent tackles this by reading candidate profiles and resumes, comparing them against job qualifications, and providing structured recommendations backed by evidence. It shows why a candidate may or may not match requirements, rather than simply offering a yes or no answer.
LinkedIn engineered this agent to address several complex challenges.
Before any evaluation begins, recruiters must review and approve the qualifications being used.
Safety checks ensure these qualifications follow responsible AI policies. The agent searches through profiles and resumes for specific evidence demonstrating how candidates meet each qualification, surfacing this evidence to recruiters for review.
To ensure accuracy, LinkedIn built quality benchmarks for testing the evaluation agent across different scenarios.
They developed custom AI models specifically optimized for qualification evaluation, as general-purpose models could not achieve the necessary combination of accuracy and speed. Using techniques like speculative decoding and custom serving infrastructure, these fine-tuned models can evaluate candidates in seconds rather than minutes, fast enough to support real-time, conversational refinement of requirements.
Once promising candidates are identified, the outreach agent handles communication.
It writes personalized messages, sends initial outreach and follow-ups, and replies to candidate questions using job-specific FAQs defined during intake. The agent can even schedule phone screenings directly through messaging, streamlining coordination.
Supporting the interview process, the screening agent prepares tailored interview questions based on hiring requirements and candidate profiles.
It can transcribe and summarize screening conversations while capturing notes and insights. Importantly, recruiters maintain full control, able to take over conversations at any time or guide the process as needed.
The learning agent enables the system to improve over time.
It analyzes recruiter actions such as which candidates they message or add to pipelines, learning from both explicit feedback and implicit behavioral signals. The agent updates job qualifications based on these patterns, but any suggested changes must be reviewed and approved by recruiters before being applied. This ensures the assistant adapts while keeping humans in control.
Finally, the cognitive memory agent gives the assistant persistent memory across interactions.
It remembers past conversations, preferences, and decisions, helping personalize recommendations over time. All memory data remains scoped to the individual recruiter’s environment with strong privacy protections.
This data is never used to train AI models, ensuring customer information stays secure and confidential.
Building an AI agent that operates at scale requires a comprehensive approach to quality that ensures the system behaves safely, responsibly, and effectively.
The LinkedIn engineering team built its quality framework on two complementary pillars:
Product policy serves as the rails that keep the system on track. These policies set clear boundaries for safety, compliance, and legal standards while defining expected agent behavior. They establish minimum quality thresholds that must be met.
To enforce these standards, LinkedIn employs AI-powered judges that evaluate different aspects of quality. Some judges check for coherence, asking whether outputs make logical sense. Others verify factual accuracy, ensuring the system does not generate false or misleading information.
Human alignment acts as the compass, ensuring the assistant moves toward genuinely valuable outcomes.
This pillar is grounded in human-validated data, including annotated datasets where people label examples, and real recruiter activity. When a recruiter messages a candidate or adds them to a pipeline, the system treats this as a strong positive signal.
Over time, the assistant learns to recommend candidates matching these recruiter-validated patterns. Human alignment also serves to validate whether product policies are actually working in practice.
LinkedIn’s Hiring Assistant demonstrates a comprehensive approach to building enterprise-grade AI agents.
By adopting a plan-and-execute architecture, the system breaks complex recruiting workflows into manageable steps, improving reliability and reducing errors. The message-driven design allows each recruiter to have their own assistant instance that works asynchronously in the background, enabling true scale.
The division of labor among specialized sub-agents ensures that each component can focus on what it does best, from sourcing and evaluation to outreach and screening. Integration with LinkedIn’s Economic Graph provides market intelligence that goes beyond simple keyword matching, helping uncover candidates who might otherwise be overlooked.
Perhaps most importantly, the system balances automation with human judgment. The quality framework keeps the assistant safe and aligned with real hiring outcomes, while the learning agent ensures continuous improvement based on individual recruiter preferences.
References:
Building the agentic future of recruiting: how we engineered LinkedIn’s Hiring Assistant
Under the hood: The tech behind the first agent from LinkedIn
2025-12-16 00:31:21
Most companies get stuck tinkering with prompts and wonder why their agents fail to deliver dependable results. This guide from You.com breaks down the evolution of agent management, revealing the five stages for building a successful AI agent and why most organizations haven’t gotten there yet.
In this guide, you’ll learn:
Why prompts alone aren’t enough and how context and metadata unlock reliable agent automation
Four essential ways to calculate ROI, plus when and how to use each metric
Real-world challenges at each stage of agent management and how to avoid them
When we first interact with large language models, the experience is straightforward. We type a prompt, the model generates a response, and the interaction ends.
This single-turn approach works well for simple questions or basic content generation, but it quickly reveals its limitations when we tackle more complex tasks. Imagine asking an AI to analyze market trends, create a comprehensive report, and provide actionable recommendations. A single response, no matter how well-crafted, often falls short because it lacks the opportunity to gather additional information, reflect on its reasoning, or refine its output based on feedback.
This is where agentic workflows come into play.
Rather than treating AI interactions as one-and-done transactions, agentic workflows introduce iterative processes, tool integration, and structured problem-solving approaches. These workflows transform language models from sophisticated text generators into capable agents that can break down complex problems, adapt their strategies, and produce higher-quality results. The difference is similar to comparing a quick sketch to a carefully refined painting. Both have their place, but when quality and reliability matter, the iterative approach wins.
In this article, we will look at the most popular agentic workflow patterns and how they work.
An agentic workflow doesn’t just respond to a single instruction. Instead, it operates with a degree of autonomy, making decisions about how to approach a task, what steps to take, and how to adapt based on what it discovers along the way. This represents a fundamental shift in how we think about using AI systems.
Consider the difference between asking a basic chatbot and an agentic system to help write a research report. The basic chatbot receives our request and generates a report based on its training data, delivering whatever it produces in one response. An agentic system, however, might first search the web for current information on the topic, then organize the findings into themes, draft sections of the report, review each section for accuracy and coherence, revise weak areas, and finally compile everything into a polished document. Each of these steps might involve multiple sub-steps, decisions about which tools to use, and adaptations based on what the agent discovers.
What makes workflows truly agentic are the iteration and feedback loops built into the process. Instead of generating output in a single pass, agentic workflows involve cycles where the agent takes an action, observes the result, and uses that observation to inform the next action. This mirrors how humans actually solve complex problems. We rarely figure everything out up front and execute a perfect plan. Instead, we try something, see what happens, learn from the result, and adjust our approach. Agentic workflows bring this same adaptive, iterative quality to AI systems.
Let us now look at five essential agentic workflow patterns:
At its core, reflection is about having an agent review and critique its own work, then revise based on that critique. This simple idea improves output quality because it introduces an iterative refinement process that catches errors, identifies weaknesses, and enhances strengths.
Here’s how the reflection cycle works in practice.
The agent first generates an initial output based on the task or prompt it receives.
Then, instead of immediately presenting this output as final, the agent switches into critique mode. It examines what it just produced, looking for problems, inconsistencies, areas that lack clarity, or opportunities for improvement. This critique becomes the basis for revision.
The agent generates an improved version that addresses the issues it identified. Depending on the implementation, this cycle might repeat multiple times, with each iteration refining the output further.
See the diagram below:
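A minimal generate-critique-revise loop could look like the sketch below, assuming a hypothetical `llm()` helper. Production systems add better stopping criteria and specialized critique prompts.

```python
def llm(prompt: str) -> str:
    """Hypothetical helper that calls an LLM; replace with a real client."""
    return "draft text"

def reflect(task: str, max_rounds: int = 2) -> str:
    draft = llm(f"Complete this task:\n{task}")                 # initial output
    for _ in range(max_rounds):
        critique = llm(f"Critique this answer for accuracy and clarity:\n{draft}")
        if "no issues" in critique.lower():                     # simple stopping condition
            break
        draft = llm(f"Revise the answer to address this critique.\n"
                    f"Answer:\n{draft}\nCritique:\n{critique}") # refinement step
    return draft

print(reflect("Explain how DNS caching reduces lookup latency."))
```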
The power of reflection becomes even more apparent when the critique itself is specialized for the task at hand. Some examples are as follows:
An agent might reflect specifically on accuracy, checking whether the facts and claims it made are correct and well-supported.
Alternatively, reflection might focus on clarity, asking whether someone unfamiliar with the topic would understand the explanation.
For creative writing, reflection might evaluate tone, ensuring the voice matches the intended style and audience.
For code generation, reflection could focus on identifying bugs, security vulnerabilities, or opportunities to optimize performance.
The reflection pattern works best for tasks where quality matters more than speed and where there are subjective aspects that benefit from review. The pattern, however, is less necessary for simple, factual queries where the answer is straightforward or for tasks where speed is paramount and good enough is truly sufficient.
The tool use pattern represents a fundamental expansion of what AI agents can accomplish.
A language model by itself, no matter how sophisticated, is limited to reasoning about information it learned during training and generating text based on that knowledge. It cannot access current information, perform precise calculations with large numbers, retrieve data from specific databases, or interact with external systems. Tools change everything.
In the tool use pattern, agents are equipped with a set of capabilities they can invoke when needed. These might include web search engines for finding current information, APIs for accessing services like weather data or stock prices, code interpreters for running programs and performing calculations, database query tools for retrieving specific records, file system access for reading and writing documents, and countless other specialized functions. The critical distinction from traditional software is that the agent itself decides when and how to use these tools based on the task at hand.
See the diagram below:
When an agent receives a task, it analyzes what capabilities are needed to accomplish that task. For example:
If the task requires information the agent doesn’t have, it recognizes the need for a search or data retrieval tool.
If the task involves mathematical operations, it accesses a calculator or code interpreter.
If the task requires interacting with a specific service, it uses the appropriate API tool.
What makes tool use powerful is the dynamic nature of tool selection and the ability to chain multiple tool calls together.
The agent doesn’t follow a predetermined script. If the first search doesn’t return adequate information, the agent might reformulate its query and search again. If an API call fails or returns an error, the agent might try an alternative approach or a different tool entirely. This adaptability makes tool-enabled agents far more capable than rigid automated workflows.
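Here is a small sketch of a tool-use loop in which the model replies with either a JSON tool call or a final answer. The `llm()` stub, the tool registry, and the JSON protocol are assumptions made for illustration; agent frameworks and provider APIs implement this handshake in their own ways.

```python
import json, datetime

def llm(prompt: str) -> str:
    """Hypothetical helper; here it always returns a canned tool call."""
    return json.dumps({"tool": "get_time", "args": {}})

TOOLS = {
    "get_time": lambda: datetime.datetime.now().isoformat(),
    "word_count": lambda text: str(len(text.split())),
}

def run_with_tools(task: str, max_steps: int = 3) -> str:
    observation = ""
    for _ in range(max_steps):
        decision = llm(f"Task: {task}\nLast observation: {observation}\n"
                       "Reply with JSON {\"tool\": ..., \"args\": ...} or {\"answer\": ...}")
        parsed = json.loads(decision)
        if "answer" in parsed:                                  # the model decides it is done
            return parsed["answer"]
        observation = TOOLS[parsed["tool"]](**parsed["args"])   # the model chose the tool and arguments
    return observation                                          # fall back to the last observation

print(run_with_tools("What time is it right now?"))
```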
The Reason and Act pattern, commonly known as ReAct, represents a sophisticated approach to problem-solving that combines explicit reasoning with iterative action. Rather than thinking through an entire plan before acting, or blindly taking actions without reflection, ReAct agents alternate between reasoning about what to do next and actually doing it. This interleaving of thought and action creates a natural, adaptive problem-solving process.
The ReAct cycle follows a clear pattern.
First, the agent reasons about the current situation and what it needs to accomplish. This reasoning step is made explicit, often literally written out as the agent’s internal thought process. The agent might think about what information it has, what it still needs, what approaches might work, and what the best next step is.
Then, based on this reasoning, the agent takes an action. This might be using a tool to gather information, performing a calculation, or making a decision.
After the action, the agent observes the results and enters a new reasoning phase, thinking about what it learned and what to do next. This cycle continues until the agent determines it has accomplished the goal or reached a point where it cannot proceed further.
See the diagram below:
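A bare-bones ReAct loop, again with a hypothetical `llm()` stub and a stubbed search tool, might look like this. The key point is the transcript that accumulates thoughts, actions, and observations across turns.

```python
def llm(prompt: str) -> str:
    """Hypothetical helper; a real agent would call an actual model here."""
    return "Thought: I have enough information.\nAction: finish[42]"

def search(query: str) -> str:
    return f"stub search results for: {query}"     # stands in for a real search tool

def react(question: str, max_turns: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_turns):
        step = llm(transcript + "\nThink, then act with search[...] or finish[...].")
        transcript += "\n" + step                              # the reasoning trail stays visible
        action = step.split("Action:")[-1].strip()
        if action.startswith("finish["):
            return action[len("finish["):-1]                   # final answer
        if action.startswith("search["):
            observation = search(action[len("search["):-1])
            transcript += f"\nObservation: {observation}"      # feed results back for the next thought
    return "no answer within the step budget"

print(react("What is the answer to everything?"))
```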
The explicit reasoning steps serve multiple important purposes.
First, they help the agent stay on track and maintain focus on the goal. By articulating what it’s trying to accomplish and why each action makes sense, the agent is less likely to go down irrelevant paths or get stuck in unproductive loops.
Second, reasoning steps enable adaptation. When an action doesn’t yield expected results, the reasoning phase allows the agent to diagnose why and adjust its approach rather than blindly continuing.
Third, the reasoning trail provides transparency. Users and developers can see not just what the agent did, but why it made those choices, which is valuable for trust, debugging, and understanding the agent’s decision-making process.
Comparing ReAct to pure planning or pure execution highlights its strengths.
Pure planning means figuring out all the steps before taking any action. This works well when we have complete information, and the environment is predictable, but it struggles when we need to discover information along the way or when circumstances change.
Pure execution means taking actions without much forethought, which is fast but often inefficient and prone to mistakes.
ReAct finds a middle ground, providing enough structure through reasoning while maintaining flexibility through iterative action.
The planning pattern takes a different approach from ReAct by emphasizing upfront strategic thinking before execution begins.
When using the planning pattern, the agent starts by analyzing the overall goal and understanding what success looks like. It then breaks down this goal into smaller, more manageable subtasks. This decomposition continues until the agent has identified concrete, actionable steps.
Crucially, the agent identifies dependencies between tasks, determining which steps must be completed before others can begin and which steps can potentially happen in parallel. The agent also considers what resources, tools, or information each step will require. Only after creating this structured plan does the agent begin execution.
See the diagram below:
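One concrete way to represent such a plan is as a dependency graph, where steps with no unmet dependencies can run in parallel. The sketch below uses Python's standard-library graphlib for the ordering; the plan itself is a made-up example.

```python
from graphlib import TopologicalSorter

# A plan as produced by a hypothetical planner: step -> the steps it depends on.
plan = {
    "gather requirements":  set(),
    "research competitors": set(),
    "draft report":         {"gather requirements", "research competitors"},
    "review and polish":    {"draft report"},
}

sorter = TopologicalSorter(plan)
sorter.prepare()
while sorter.is_active():
    batch = list(sorter.get_ready())          # steps whose dependencies are satisfied
    print("can run in parallel:", batch)      # independent steps could be dispatched concurrently
    for step in batch:
        sorter.done(step)                     # mark completed so dependents become ready
```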
One of the planning pattern’s key strengths is adaptive planning: if execution reveals that the original plan no longer fits, the agent can revise the remaining steps rather than pushing ahead blindly.
The planning pattern works best for tasks with natural phases or stages where some activities logically precede others. It’s valuable for tasks with constraints like deadlines, budgets, or resource limitations where coordination matters. It shines in situations where mistakes or backtracking would be costly, making it worth investing time in thoughtful planning. Complex projects involving multiple work streams benefit greatly from planning.
However, the planning pattern has limitations.
For simple, linear tasks where each step naturally suggests the next one, the overhead of creating a formal plan provides little benefit.
For highly uncertain tasks where we’re likely to discover critical information during execution that fundamentally changes the approach, extensive upfront planning might be wasted effort.
The multi-agent pattern represents perhaps the most sophisticated approach to building AI systems.
Instead of relying on a single agent to handle everything, this pattern uses multiple specialized agents that collaborate to accomplish tasks. Each agent has specific expertise, capabilities, or perspectives, and they work together much like human teams do.
The core insight behind multi-agent systems is that specialization often leads to better performance than generalization.
A single agent trying to be excellent at everything faces challenges. It must balance competing requirements in its design and training. It needs broad knowledge but also deep expertise. It must be creative but also critical. By dividing responsibilities among multiple agents, each can be optimized for its specific role.
In a multi-agent system, we typically see several types of roles.
There are specialist agents focused on particular domains or tasks, such as a research agent that excels at finding and synthesizing information, a coding agent optimized for writing and debugging code, or a data analysis agent skilled at statistical analysis and visualization.
There are often critics or review agents whose job is to evaluate outputs from other agents, identifying flaws, suggesting improvements, or verifying quality.
There’s usually a coordinator or orchestrator agent that manages the overall workflow, deciding which specialist should handle each subtask and ensuring all the pieces come together coherently.
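In code, the coordinator can be as simple as a routing function that picks a specialist and then passes the draft to a critic, as in the sketch below (with a hypothetical `llm()` helper standing in for real model calls).

```python
def llm(role: str, prompt: str) -> str:
    """Hypothetical helper; each role could map to a different model or system prompt."""
    return f"[{role}] output for: {prompt[:40]}..."

SPECIALISTS = {
    "research": lambda task: llm("research specialist", task),
    "coding":   lambda task: llm("coding specialist", task),
    "review":   lambda task: llm("critic", task),
}

def orchestrate(task: str) -> str:
    route = "coding" if "code" in task.lower() else "research"   # the coordinator picks a specialist
    draft = SPECIALISTS[route](task)
    feedback = SPECIALISTS["review"](draft)                      # a critic agent evaluates the output
    return SPECIALISTS[route](f"Revise using this feedback: {feedback}\nOriginal: {draft}")

print(orchestrate("Write code to deduplicate a CSV file"))
```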
The multi-agent pattern introduces complexity trade-offs as follows:
Coordination overhead increases with more agents.
Communication between agents requires clear protocols.
Debugging becomes more challenging because problems might arise from interactions between agents rather than individual agent errors.
The benefits must justify these costs. For simple tasks, a single capable agent is almost always better. For complex tasks requiring diverse expertise, careful coordination, or multiple perspectives, the multi-agent approach often produces superior results despite its added complexity.
The various agentic workflow patterns represent a fundamental evolution in how we build and deploy AI systems.
Moving beyond simple prompting to sophisticated, iterative processes has transformed what AI agents can reliably accomplish. Here’s a quick summary of the patterns we have covered:
The reflection pattern ensures quality through self-improvement.
Tool use extends capabilities far beyond pure language generation.
ReAct combines thoughtful reasoning with adaptive action.
Planning brings strategic thinking to complex tasks.
Multi-agent collaboration leverages specialization and diverse perspectives.
Together, these patterns provide a robust toolkit for building AI systems capable of handling real-world complexity.
What makes these patterns particularly powerful is that they’re not mutually exclusive. The most sophisticated agent systems often combine multiple patterns to achieve their goals.
2025-12-14 00:30:43
To better understand the vulnerabilities and threats facing modern DevOps organizations, Datadog analyzed security posture data from a sample of thousands of organizations that use AWS, Azure, or Google Cloud.
In this report, you’ll gain valuable cloud security insights based on this research, including:
How long-lived credentials create opportunities for attackers to breach cloud environments
Adoption of proactive cloud security mechanisms such as S3 Public Access Block or IMDSv2 in AWS
Most common risks when using managed Kubernetes distributions
This week’s system design refresher:
Transformers Step-by-Step Explained (YouTube video)
Database Types You Should Know in 2025
Apache Kafka vs. RabbitMQ
The HTTP Mindmap
How DNS Works
SPONSOR US
There’s no such thing as a one-size-fits-all database anymore. Modern applications rely on multiple database types, from real-time analytics to vector search for AI. Knowing which type to use can make or break your system’s performance.
Relational: Traditional row-and-column databases, great for structured data and transactions.
Columnar: Optimized for analytics, storing data by columns for fast aggregations.
Key-Value: Stores data as simple key–value pairs, enabling fast lookups.
In-memory: Stores data in RAM for ultra-low latency lookups, ideal for caching or session management.
Wide-Column: Handles massive amounts of semi-structured data across distributed nodes.
Time-series: Specialized for metrics, logs, and sensor data with time as a primary dimension.
Immutable Ledger: Ensures tamper-proof, cryptographically verifiable transaction logs.
Graph: Models complex relationships, perfect for social networks and fraud detection.
Document: Flexible JSON-like storage, great for modern apps with evolving schemas.
Geospatial: Manages location-aware data such as maps, routes, and spatial queries.
Text-search: Full-text indexing and search with ranking, filters, and analytics.
Blob: Stores unstructured objects like images, videos, and files.
Vector: Powers AI/ML apps by enabling similarity search across embeddings.
Over to you: Which database type do you think will grow fastest in the next 5 years?
Kafka and RabbitMQ both handle messages, but they solve fundamentally different problems. Understanding the difference matters when designing distributed systems.
Kafka is a distributed log. Producers append messages to partitions. Those messages stick around based on retention policy, not because someone consumed them. Consumers pull messages at their own pace using offsets. You can rewind, replay, reprocess everything. It is designed for high throughput event streaming where multiple consumers need the same data independently.
RabbitMQ is a message broker. Producers publish messages to exchanges. Those exchanges route to queues based on binding keys and patterns (direct, topic, fanout). Messages get pushed to consumers and then deleted once acknowledged. It is built for task distribution and traditional messaging workflows.
The common mistake is using Kafka like a queue or RabbitMQ like an event log. They’re different tools built for different use cases.
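The toy in-memory classes below capture the semantic difference rather than real client code: the log retains records and each consumer tracks its own offset, while the queue deletes a message once it has been delivered and acknowledged.

```python
from collections import deque

class Log:
    """Kafka-style: records are retained; each consumer tracks its own offset."""
    def __init__(self):
        self.records, self.offsets = [], {}
    def append(self, msg):
        self.records.append(msg)
    def poll(self, consumer):
        offset = self.offsets.get(consumer, 0)
        batch = self.records[offset:]            # nothing is deleted on read
        self.offsets[consumer] = len(self.records)
        return batch
    def rewind(self, consumer, offset=0):
        self.offsets[consumer] = offset          # replay is just resetting an offset

class Queue:
    """RabbitMQ-style: a message is removed once a consumer acknowledges it."""
    def __init__(self):
        self.pending = deque()
    def publish(self, msg):
        self.pending.append(msg)
    def consume(self):
        return self.pending.popleft() if self.pending else None   # gone after delivery and ack

log = Log()
for m in ("a", "b", "c"):
    log.append(m)
print(log.poll("analytics"), log.poll("billing"))     # both consumers see all records independently
log.rewind("analytics")
print(log.poll("analytics"))                          # reprocessing is cheap

q = Queue()
q.publish("task-1")
print(q.consume(), q.consume())                       # task-1, then None: consumed messages are gone
```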
Over to you: If you had to explain when NOT to use Kafka, what would you say?
HTTP has evolved from HTTP/1.1 to HTTP/2, and now HTTP/3, which uses the QUIC protocol over UDP for improved performance. Today, it’s the backbone of almost everything on the internet, from browsers and APIs to streaming, cloud, and AI systems.
At the foundation, we have underlying protocols. TCP/IP for IPv4 and IPv6 traffic. Unix domain sockets for local communication. HTTP/3 running over UDP instead of TCP. These handle the actual data transport before HTTP even comes into play.
Security wraps around everything. HTTPS isn’t optional anymore. WebSockets power real-time connections. Web servers manage workloads. CDNs distribute content globally. DNS resolves everything to IPs. Proxies (forward, reverse, and API gateways) route, filter, and secure traffic in between.
Web services exchange data in different formats: REST with JSON, SOAP for enterprise systems, RPC for direct calls, and GraphQL for flexible queries. Crawlers and bots index the web, guided by robots.txt files that set the boundaries.
The network world connects everything. LANs, WANs, and protocols like FTP for file transfers, IMAP/POP3 for email, and BitTorrent for peer-to-peer communication. For observability, packet capture tools like Wireshark, tcpdump, and OpenTelemetry let developers peek under the hood to understand performance, latency, and behavior across the stack.
Over to you: HTTP has been evolving for 30+ years, what do you think the next big shift will be?
You type a domain name and hit enter, but what actually happens before that webpage loads is more complex than most people realize. DNS is the phonebook of the internet, and every request you make triggers a chain of lookups across multiple servers.
Step 1: Someone types bytebytego.com into their browser and presses enter.
Step 2: Before doing anything, the browser looks for a cached IP address. Operating system cache gets checked too.
Step 3: Cache miss triggers a DNS query. The browser sends a query to the configured DNS resolver, usually provided by your ISP or a service like Google DNS or Cloudflare.
Step 4: Resolver checks its own cache.
Step 5-6: If the resolver doesn’t have the answer cached, it asks the root servers, “Where can I find .com?” For bytebytego.com, the root server responds with the address of the .com TLD name server.
Step 7-8: Resolver queries the .com TLD server. TLD server returns the authoritative server address.
Step 9-10: This server has the actual A/AAAA record mapping the domain to an IP address. The resolver finally gets the answer: 172.67.21.11 for bytebytego.com.
Step 11-12: The IP gets cached at the resolver level for future lookups and is returned to the browser.
Step 13-14: The browser stores this for its own future use, and uses the IP to make the actual HTTP request.
Step 15: The web server returns the requested content.
All this happens in milliseconds, before your first page even starts loading.
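From application code, the entire chain is hidden behind a single resolver call. The standard-library snippet below triggers the lookup described above (it performs a live DNS query when run):

```python
import socket

# One call to the OS resolver hides steps 2-12: caches are consulted first, and only on a
# miss does the recursive resolver walk root -> TLD -> authoritative name servers.
infos = socket.getaddrinfo("bytebytego.com", 443, proto=socket.IPPROTO_TCP)
for family, _type, _proto, _canon, sockaddr in infos:
    print(socket.AddressFamily(family).name, sockaddr[0])   # e.g. AF_INET 172.67.21.11
```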
Over to you: Which DNS tools or commands do you rely on most, dig, nslookup, or something else?
An HTTP server cannot automatically initiate a connection to a browser. As a result, the web browser is the initiator. What should we do next to get real-time updates from the HTTP server?
Both the web browser and the HTTP server could be responsible for this task.
Web browsers do the heavy lifting: short polling or long polling. With short polling, the browser will retry until it gets the latest data. With long polling, the HTTP server doesn’t return results until new data has arrived.
HTTP server and web browser cooperate: WebSocket or SSE (server-sent events). In both cases, the HTTP server can send the latest data directly to the browser after the connection is established. The difference is that SSE is uni-directional, so the browser cannot send new requests over the same channel, while WebSocket is full-duplex, so the browser can keep sending new requests.
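For a feel of the client-side difference between the polling options, here is a toy sketch with a hypothetical `fetch_updates()` call standing in for the HTTP request: with short polling the client retries on a timer, while with long polling the server holds the request open until data arrives.

```python
import time

def fetch_updates(since: float, wait: float = 0.0):
    """Hypothetical HTTP call. With wait > 0 the server holds the request open (long polling)."""
    time.sleep(min(wait, 0.2))                      # simulate server-side waiting
    return ["new message"] if wait else []          # short polls often return nothing new

def short_polling():
    for _ in range(3):                              # the browser keeps retrying on a timer
        updates = fetch_updates(since=time.time())
        print("short poll:", updates or "nothing yet")
        time.sleep(0.1)

def long_polling():
    updates = fetch_updates(since=time.time(), wait=30)   # the server responds only when data arrives
    print("long poll:", updates)

short_polling()
long_polling()
```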
Over to you: of the four solutions (long polling, short polling, SSE, WebSocket), which ones are commonly used, and for what use cases?
2025-12-13 00:30:57
What if you could spend most of your IT resources on innovation, not maintenance?
The latest report from the IBM Institute for Business Value explores how businesses are using intelligent automation to get more out of their technology, drive growth, and cut the cost of complexity.
Disclaimer: The details in this post have been derived from the details shared online by OpenAI, Gemini, xAI, Perplexity, Microsoft, Qwen, and Anthropic Engineering Teams. All credit for the technical details goes to OpenAI, Gemini, xAI, Perplexity, Microsoft, Qwen, and Anthropic Engineering Teams. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
Deep Research has become a standard capability across modern LLM platforms.
ChatGPT, Gemini, and Claude all support tasks that run for long periods of time and gather information from large portions of the public web.
A typical deep research request may involve dozens of searches, several rounds of filtering, and the careful assembly of a final, well-structured report. For example, a query like “list 100 companies working on AI agents in 2025” does not rely on a single search result. It activates a coordinated system that explores a wide landscape of information over 15 to 30 minutes before presenting a final answer.
This article explains how these systems work behind the scenes.
We will walk through the architecture that enables Deep Research, how different LLMs implement it, how agents coordinate with one another, and how the final report is synthesized and validated before being delivered to the user.
Deep Research systems are built from AI agents that cooperate with each other. In this context, an AI agent is a service driven by an LLM that can accept goals, design workflows to achieve those goals, and interact with its environment through tools such as web search or code execution.
See the diagram below to understand the concept of an AI Agent:
At a high level, the architecture begins with the user request. The user’s query is sent into a multi-agent research system. Inside this system, there is usually an orchestrator or lead agent that takes responsibility for the overall research strategy.
The orchestrator receives the query, interprets what the user wants, and then creates a plan for how to answer the question. That plan is broken into smaller pieces and delegated to multiple sub-agents. The most common sub-agents are “web search” agents. Each of these is instructed to search the web for a specific part of the overall topic or a particular sub-task, such as one region, one time period, or one dimension of the question.
Once the web agents finish their work, they return two things:
The content they have extracted. This typically takes the form of text snippets, summaries, or key facts.
Citations that record exactly where that content came from, such as URLs and page titles.
These results then move into what we can call the “synthesizer” flow. This stage often contains two agents: a synthesizer agent and a citations agent. In some systems, the orchestrator itself also acts as the synthesizer, so a separate agent is not required.
The synthesizer agent takes all the content returned by the web agents and converts it into the final research report. It organizes the information into sections, resolves overlaps, and builds a coherent narrative. The citations agent then reads through the synthesized report and makes sure that each statement is supported by the correct sources. It inserts citations in the right locations in the text, so that the final report is thoroughly backed by the underlying material.
After this synthesis and citation process is complete, the synthesizer (or orchestrator) returns the final, fully cited research report to the user.
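To make the flow concrete, here is a toy sketch of the orchestrator, sub-agent, and synthesizer pattern. The run_web_search_agent function and the sub-task strings are hypothetical stand-ins for real LLM-backed agents and tool calls, not any vendor's implementation.

```python
# Toy sketch of the orchestrator -> sub-agents -> synthesizer flow.
# `run_web_search_agent` is a hypothetical stand-in for an LLM-backed agent
# that searches the web and returns findings plus citations.
from concurrent.futures import ThreadPoolExecutor

def run_web_search_agent(sub_task: str) -> dict:
    # In a real system this would call an LLM equipped with search/browse tools.
    return {
        "task": sub_task,
        "findings": f"summary for: {sub_task}",
        "citations": [f"https://example.com/{sub_task.replace(' ', '-')}"],
    }

def orchestrate(query: str) -> str:
    # 1. Plan: the lead agent breaks the query into sub-tasks.
    sub_tasks = [f"{query} - funding", f"{query} - products", f"{query} - market"]

    # 2. Delegate: run the sub-agents in parallel.
    with ThreadPoolExecutor(max_workers=3) as pool:
        packets = list(pool.map(run_web_search_agent, sub_tasks))

    # 3. Synthesize: merge findings and keep each claim tied to its sources.
    return "\n".join(
        f"{p['findings']} [{', '.join(p['citations'])}]" for p in packets
    )

print(orchestrate("AI agent companies in 2025"))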
Anthropic has published a high-level diagram of its “Advanced Research” mode, which illustrates such a multi-agent research system in action. It shows the lead agent, the various sub-agents, and the data flowing between them through planning, research, and synthesis.

Although the broad idea behind Deep Research is shared across platforms, each major provider implements its own variations.
OpenAI’s deep research agent is built around a reasoning model that uses reinforcement learning.
The model is trained to plan multi-step research tasks, decide when to search, when to read, and how to combine information into a final answer. The use of reinforcement learning helps the agent improve over time by rewarding good sequences of tool calls and research decisions.
Google DeepMind’s Gemini Deep Research system is built on top of the Gemini model, which is multimodal. That means the same system can reason over text, images, and other types of inputs.
For deep research, this allows Gemini to integrate information from documents, web pages, and other media into a combined response. Gemini’s agent uses its planning ability to decide what to look for, how to structure the research, and how to bring everything together into one report.
Anthropic’s advanced research system uses a clearly defined multi-agent architecture. There is a lead agent that orchestrates several sub-agents running in parallel. Each sub-agent is asked to explore a specific part of the problem space.
For complex topics, this design allows Claude to divide the subject into multiple angles and explore them at the same time, then bring the results back to the orchestrator for synthesis.
Perplexity’s deep research agent uses an iterative information retrieval loop.
Instead of a single pass of search and summary, it repeatedly adjusts its retrieval based on new insights discovered along the way.
Perplexity also uses a hybrid architecture that can autonomously select the best underlying models for different parts of the task. For example, one model might be better at summarization while another is better at search interpretation, and the system can route work accordingly.
Grok DeepSearch uses a segment-level processing pipeline.
Content is processed in segments, and each segment passes through a credibility assessment stage. Additionally, Grok uses a sparse attention mechanism that allows it to perform concurrent reasoning across multiple pieces of text.
The system can also dynamically allocate resources, switching between retrieval and analysis modes as needed, all inside a secure sandbox environment.
Microsoft has introduced two related reasoning agents:
Researcher is focused on complex, multi-step research tasks that combine web information with a user’s work data. It uses sophisticated orchestration and search capabilities to handle multi-stage questions.
Analyst is an advanced data analytics agent that can interpret and transform raw data into useful insights. It uses a chain-of-thought reasoning approach to break down analytical problems, apply appropriate operations, and present the results.
Both Researcher and Analyst are designed to work securely over enterprise data and the public web.
Alibaba’s Qwen Deep Research is an advanced agent that supports dynamic research blueprinting.
It can generate an initial research plan, then refine that plan interactively. Qwen’s architecture supports concurrent task orchestration, which means that retrieval, validation, and synthesis of information can happen in parallel. This allows the system to retrieve data, verify it, and integrate it into the final output efficiently.
The entire deep research workflow starts with a single user query.
Users can phrase requests in many different ways. Some users write very vague prompts such as “tell me everything about AI agents,” while others provide highly detailed, focused instructions. The system must be able to handle this variability and translate the query into a precise, machine-executable research plan.
This initial stage is critical. It converts the user’s often broad or ambiguous request into a clear strategy with specific steps. The quality of the final report is directly tied to the quality of this plan. If the plan is incomplete or misinterprets the user’s intent, the resulting research will miss key information or go in the wrong direction.
See the diagram below:
Different systems handle this planning phase in different ways.
Some architectures, such as OpenAI’s Deep Research, use an interactive clarification approach. Here, the agent does not immediately start a long research process. Instead, it may ask the user follow-up questions. These questions are designed to refine the research scope, clarify the objectives, and confirm exactly what information the user cares about.
For example, if the user asks for a comparison of technologies, the agent might ask whether the user wants only recent developments, whether specific regions should be included, or whether certain constraints apply. This conversational back-and-forth continues until the agent has a crisp understanding of the user’s needs, at which point it commits to the full research process.
Other systems, such as Google’s Gemini, take a different path. Rather than asking the user follow-up questions by default, Gemini can autonomously generate a comprehensive multi-step plan based on its interpretation of the initial query. This plan outlines the sub-tasks and research angles the system intends to explore.
Gemini then presents this proposed plan to the user for review and approval. The user can read the plan, make edits, add constraints, or remove unwanted sub-tasks. Once the user is satisfied and approves the plan, the system begins the research process.
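Whichever route is taken, the output of this stage is typically a structured, machine-executable plan. Below is a minimal sketch of what such a plan object might look like; the field names and example values are illustrative assumptions, not any vendor's actual schema.

```python
# One possible shape for a machine-executable research plan; field names
# and values are illustrative assumptions, not a real vendor schema.
from dataclasses import dataclass, field

@dataclass
class SubTask:
    goal: str                                              # what this sub-agent should find out
    constraints: list[str] = field(default_factory=list)   # regions, date ranges, source limits
    tools: list[str] = field(default_factory=list)         # "web_search", "browser", ...

@dataclass
class ResearchPlan:
    query: str                      # the user's original request
    clarifications: dict[str, str]  # answers gathered before research starts
    sub_tasks: list[SubTask]

plan = ResearchPlan(
    query="list 100 companies working on AI agents in 2025",
    clarifications={"regions": "global", "recency": "2024-2025 only"},
    sub_tasks=[
        SubTask(goal="Enumerate venture-funded AI-agent startups",
                constraints=["founded or funded 2023-2025"],
                tools=["web_search", "browser"]),
        SubTask(goal="Enumerate AI-agent products from large tech companies",
                tools=["web_search"]),
    ],
)
```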
Once the plan is ready, the system moves from strategy to execution. Instead of a single agent performing all steps, the lead agent delegates work to multiple sub-agents that “work for” it.
The diagram below from Anthropic shows how the lead agent assigns work to specialized agents that run in parallel and then gather results back into a central synthesis process.

The lead agent delegates each sub-task using a structured API call. Technically, this means the orchestrator calls another service (the sub-agent) with a payload that contains everything the sub-agent needs (a sketch of such a payload follows the list below):
A precise prompt that explains its specific research goal, such as “Investigate the financial performance of NVIDIA in Q4 2024.”
Any constraints, such as time ranges, data sources, or limits on how many pages to read.
Access permissions and tool configuration, so the sub-agent knows which tools it can use.
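Here is one way such a delegation payload might look; the keys and values are illustrative assumptions rather than any specific vendor's API.

```python
# Illustrative delegation payload the orchestrator might send to a sub-agent;
# keys and values are assumptions, not a real vendor API.
delegation_payload = {
    "prompt": "Investigate the financial performance of NVIDIA in Q4 2024.",
    "constraints": {
        "time_range": "2024-10-01..2025-01-31",
        "max_pages": 20,
        "preferred_sources": ["earnings reports", "major financial news"],
    },
    "tools": {
        "web_search": {"enabled": True},
        "browser": {"enabled": True},
        "code_interpreter": {"enabled": False},
    },
    "return_format": {"findings": "markdown", "citations": "url+title"},
}
```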
Sub-agents are often specialized rather than fully general. While some systems may have general-purpose “research agents,” it is more common to see a pool of agents tuned for particular functions. Examples include:
A web search agent specialized in forming effective search queries, interacting with search engines, and interpreting result snippets.
A data analysis agent that has access to a code interpreter and can perform statistical analyses, process CSV files, or generate simple visualizations.
By using specialized agents, the system can apply the best tool and approach to each part of the plan, which improves both the accuracy and efficiency of the overall research.
A key benefit of this architecture is parallel execution. Since sub-agents are separate services, many of them can run at the same time. One sub-agent might be researching market trends, another might be gathering historical financial data, and a third might be investigating competitor strategies, all in parallel.
However, not all tasks run simultaneously. Some tasks must wait for others to complete. The orchestrator keeps track of dependencies and triggers sub-agents when their inputs are ready.
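A minimal sketch of that dependency tracking, with hypothetical task names and a placeholder runner in place of real sub-agents:

```python
# Dependency-aware scheduling sketch: a sub-task runs only once everything it
# depends on has finished. Task names and the runner are hypothetical.
def run_sub_agent(name: str, inputs: dict) -> str:
    return f"result of {name} (given {sorted(inputs)})"

tasks = {
    "market_trends":   {"depends_on": []},
    "historical_data": {"depends_on": []},
    "competitors":     {"depends_on": []},
    "forecast":        {"depends_on": ["market_trends", "historical_data"]},
}

results: dict[str, str] = {}
while len(results) < len(tasks):
    for name, spec in tasks.items():
        ready = all(dep in results for dep in spec["depends_on"])
        if name not in results and ready:
            # Independent tasks could be dispatched in parallel here; the sketch
            # runs them sequentially to stay short.
            results[name] = run_sub_agent(name, {d: results[d] for d in spec["depends_on"]})

print(results["forecast"])
```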
To interact with the outside world, sub-agents use tools. The agents themselves do not have direct access to the web or files. Instead, they issue tool calls that the system executes on their behalf.
Common tools include the following (a sketch of how such calls are dispatched appears after the list):
Search tool: The agent calls something like web_search(query="analyst ratings for Microsoft 365 Copilot"). The system sends this query to an external search engine API (such as Google or Bing) and returns a list of URLs and snippets.
Browser tool: After receiving search results, the agent can call browse(url="...") to fetch the full content of a webpage. The browser tool returns the page text, which the agent then processes.
Code interpreter tool: For numerical or data-heavy tasks, the agent can write Python code and execute it in a secure, sandboxed environment. The code interpreter might read CSV data, compute averages, or run basic analyses. The agent then reads the output and incorporates the findings into its report.
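A sketch of how such tool calls might be dispatched: the agent emits a structured call and the surrounding system executes it on the agent's behalf. The executor functions here are placeholders, not real search, browser, or sandbox APIs.

```python
# Tool-call dispatch sketch: the agent emits {"tool": ..., "args": {...}} and
# the system routes it to an executor. The executors are placeholders.
def web_search(query: str) -> list[dict]:
    return [{"url": "https://example.com/result", "snippet": f"results for {query}"}]

def browse(url: str) -> str:
    return f"full page text fetched from {url}"

def run_python(code: str) -> str:
    # A real system would run this inside a locked-down sandbox.
    return "stdout of the sandboxed execution"

TOOLS = {"web_search": web_search, "browse": browse, "code_interpreter": run_python}

def execute_tool_call(call: dict) -> object:
    handler = TOOLS[call["tool"]]
    return handler(**call["args"])

print(execute_tool_call({"tool": "web_search",
                         "args": {"query": "analyst ratings for Microsoft 365 Copilot"}}))
```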
As a sub-agent receives data from tools, it must constantly evaluate whether the information is relevant to its goal. This involves:
Checking whether the source is authoritative or credible.
Cross-referencing facts across multiple pages when possible.
Noticing when initial search results are weak and adjusting the query.
For example, if a search returns mostly irrelevant marketing pages, the agent might refine the query with more specific terms or filters. It might add keywords like “PDF,” “quarterly report,” or a specific year to narrow the results.
When the agent finds useful content, it extracts the relevant snippets and stores them along with their original URLs. This pairing of content and citation is essential because it ensures that every piece of information used later in the synthesis stage is traceable back to its source.
Each sub-agent maintains its own short-term memory or “context” of what it has seen so far. This memory allows it to build a coherent understanding of its sub-task and avoid repeating work. When the sub-agent finishes its assignment, it returns a well-structured packet of information that includes both the findings and their citations.
The output of the entire retrieval phase is not yet a single document. Instead, it is a collection of these self-contained information packets from all sub-agents, each focused on a different part of the research problem.
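One plausible shape for such an information packet, with illustrative field names and placeholder content and sources:

```python
# Illustrative "information packet" a sub-agent returns when it finishes;
# field names, claims, and URLs are placeholders, not real data.
packet = {
    "sub_task": "NVIDIA financial performance, Q4 2024",
    "findings": [
        {
            "claim": "Example claim: revenue grew year over year in Q4 2024.",
            "sources": [
                {"url": "https://example.com/nvda-q4-earnings",
                 "title": "Example: NVIDIA Q4 2024 earnings coverage"},
            ],
        },
    ],
    "queries_used": ["NVIDIA Q4 2024 earnings", "NVIDIA data center revenue 2024"],
    "notes": "Cross-checked against two independent sources.",
}
```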
See the diagram below:
Once all sub-agents return their results, the system enters the synthesis phase. At this point, the system has a large set of fragmented insights, each tied to a specific part of the research plan. The objective is to transform these pieces into a unified report.
See the diagram below:
The orchestrator or synthesizer agent begins by collecting all information packets. It performs a high-level analysis to identify themes, overlaps, and logical connections. For example, insights about market adoption may complement insights about customer sentiment, and both may feed into a broader section of the report.
The synthesizer then constructs a narrative outline for the final document. It decides the structure that best fits the material, whether chronological, thematic, or based on a problem and solution. Redundant information from multiple sub-agents is merged into a single, clean statement.
With the outline ready, the agent begins writing the report. It incorporates extracted facts, creates transitions between sections, and maintains a consistent tone. As it writes, each claim is connected to its source. Some systems assign this step to a dedicated citation agent that reviews the draft and inserts citations in the correct locations.
This stage is important because it prevents hallucinations and ensures that every assertion in the final report can be traced back to a verified source.
The outcome is a polished research document supported by citations and, when needed, a formal bibliography.
Deep Research systems rely on multi-agent architectures that coordinate planning, parallel exploration, and structured synthesis.
Specialized sub-agents retrieve information, evaluate it, and return detailed findings. The orchestrator or synthesizer then turns this distributed knowledge into a coherent and well-cited report. As LLMs improve in planning, reasoning, and tool use, these systems will continue to become more capable, more reliable, and more comprehensive.
References:
2025-12-12 00:31:24
When we start building software, we often think of performance as simply how fast our application runs.
We might equate performance to making a function run faster or optimizing a short piece of code. However, as we move into professional software development and system architecture, we must adopt a more strategic and precise definition of what performance truly is.
We must realize that system performance is not just an abstract idea of “speed”. Instead, it is a formal, measurable quality defined by industry standards.
This standardized quality attribute is called Performance Efficiency (it appears, for example, in the ISO/IEC 25010 product quality model). It is formally defined as the degree to which a software system or component meets its responsiveness and throughput requirements within the limits of its available resources.
In simple terms, performance is a strategic ratio: it measures the useful work we get done compared to the resources (like time, CPU, and memory) we use up while operating under a specific workload. A high-performing system maximizes the work output while minimizing resource waste.
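As a rough back-of-the-envelope illustration of that ratio, the toy sketch below measures useful work (requests handled) against wall-clock time and CPU time. The handle_request function is a hypothetical stand-in for real workload code.

```python
# Toy sketch of performance as a ratio of useful work to resources consumed.
# `handle_request` is a hypothetical stand-in for real workload code.
import time

def handle_request(i: int) -> int:
    return sum(range(1000))          # pretend this is useful work

requests_handled = 10_000
wall_start = time.perf_counter()
cpu_start = time.process_time()

for i in range(requests_handled):
    handle_request(i)

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"throughput: {requests_handled / wall_elapsed:.0f} requests/second")
print(f"CPU cost:   {cpu_elapsed / requests_handled * 1e6:.1f} microseconds of CPU per request")
```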
In this article, we will look at system performance in detail, understand how it can be measured, and investigate key strategies that can be used to improve the performance of a system on different levels.