2026-02-25 00:30:15
Monster SCALE Summit is a new virtual conference all about extreme-scale engineering and data-intensive applications.
Join us on March 11 and 12 to learn from engineers at Discord, Disney, LinkedIn, Uber, Pinterest, Rivian, ClickHouse, Redis, MongoDB, ScyllaDB and more. A few topics on the agenda:
What Engineering Leaders Get Wrong About Scale
How Discord Automates Database Operations at Scale
Lessons from Redesigning Uber’s Risk-as-a-Service Architecture
Scaling Relational Databases at Nextdoor
How LinkedIn Powers Recommendations to Billions of Users
Powering Real-Time Vehicle Intelligence at Rivian with Apache Flink and Kafka
The Data Architecture behind Pinterest’s Ads Reporting Services
Bonus: We have 500 free swag packs for attendees. And everyone gets 30-day access to the complete O’Reilly library & learning platform.
Uber’s infrastructure runs on thousands of microservices, each making authorization decisions millions of times per day. This includes every API call, database query, and message published to Kafka. To make matters more interesting, Uber needs these decisions to happen in microseconds to have the best possible user experience.
Traditional access control could not handle this complexity. For instance, a rule might say “service A can call service B” or “employees in the admin group can access this database.” Rules like these work for small systems, but they fall short when you need finer-grained control. For example, what if you need to restrict access based on the user’s location, the time of day, or relationships between different pieces of data?
Uber needed a better approach. They built an attribute-based access control system called Charter to evaluate complex conditions against attributes pulled from various sources at runtime.
In this article, we will look at how the Uber engineering team built Charter and the challenges they faced.
Disclaimer: This post is based on publicly shared details from the Uber Engineering Team. Please comment if you notice any inaccuracies.
Before diving into ABAC, you need to understand how Uber thinks about authorization. Every access request can be broken down into a simple question:
Can an Actor perform an Action on a Resource in a given Context?
Let’s understand each component of this statement:
Actor represents the entity making the request. At Uber, this could be an employee, a customer, or another microservice. Uber uses the SPIFFE format to identify actors. An employee might be identified as spiffe://personnel.upki.ca/eid/123456, where 123456 is their employee ID. A microservice running in production would be identified as spiffe://prod.upki.ca/workload/service-foo/production.
Action describes what the actor wants to do. Common actions include create, read, update, and delete, often abbreviated as CRUD. Services can also define custom actions like invoke for API calls, subscribe for message queues, or publish for event streams.
A resource is the object being accessed. Uber represents resources using UON, which stands for Uber Object Name. This is a URI-style format that looks like uon://service-name/environment/resource-type/identifier. For example, a specific table in a database might be uon://orders.mysql.storage/production/table/orders.
The host portion of the UON is called the policy domain. This acts as a namespace for grouping related policies and configurations.
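These four components map naturally onto a small data structure. As a hedged sketch (the type and function names here are illustrative, not Uber's actual definitions):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AccessRequest:
    actor: str       # SPIFFE ID of the caller, e.g. an employee or a workload
    action: str      # what the actor wants to do: create, read, invoke, ...
    resource: str    # UON of the object being accessed
    context: dict = field(default_factory=dict)  # extra attributes: time, region, ...

def policy_domain(uon: str) -> str:
    """Extract the host portion of a UON, which acts as the policy domain."""
    # uon://orders.mysql.storage/production/table/orders -> orders.mysql.storage
    return uon.removeprefix("uon://").split("/", 1)[0]
```

Every authorization decision described below is an answer to one such request.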
As mentioned, Uber built a centralized service called Charter to manage all authorization policies. Think of Charter as a policy repository where administrators define who can access what. This approach offers several advantages over having each service implement its own authorization logic.
See the diagram below:
Policies stored in Charter are distributed to individual services. Each service includes a local library called authfx that evaluates these policies.
The architecture works as follows:
Policy authors create and update policies in Charter
Charter stores these policies in a database
A unified configuration distribution system pushes policy updates to all relevant services
Services use the authfx library to evaluate policies for incoming requests
Authorization decisions are made locally within each service
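The push-and-evaluate-locally flow above can be sketched in a few lines. This is a simplified illustration of the idea, not the actual authfx API:

```python
import fnmatch

class AuthLibrary:
    """Local policy cache in the spirit of authfx: policies are pushed in,
    and decisions are made in-process with no network call per request."""

    def __init__(self):
        self.policies = []

    def on_policy_update(self, policies):
        # Step 3: the distribution system pushes policy updates to the service.
        self.policies = policies

    def is_allowed(self, actor: str, action: str, resource: str) -> bool:
        # Steps 4-5: evaluate locally against the in-memory copy; default deny.
        return any(
            p["effect"] == "allow"
            and action in p["actions"]
            and actor in p["actors"]
            and fnmatch.fnmatch(resource, p["resource"])  # supports * wildcards
            for p in self.policies
        )
```

Keeping evaluation local is what makes microsecond-level decisions feasible on every API call.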
SerpApi turns live search engines into APIs, returning clean JSON for results, reviews, prices, locations, and more. Use it to ground your app or LLMs with real-world data from Google, Maps, Amazon, and beyond, without maintaining scrapers.
The simplest form of policy at Uber connects actors to resources through actions.
A basic policy might look like this in YAML format:
file_type: policy
effect: allow
actions:
- invoke
resource: "uon://service-foo/production/rpc/foo/method1"
associations:
- target_type: WORKLOAD
  target_id: "spiffe://prod.upki.ca/workload/service-bar/production"

This policy translates to: “Allow service-bar to invoke method1 of service-foo.” Another example shows how employees can be granted access:
file_type: policy
effect: allow
actions:
- read
- write
resource: "uon://querybuilder/production/report/*"
associations:
- target_type: GROUP
  target_id: "querybuilder-development"

This policy means: “Allow employees in the querybuilder-development group to read and write query reports.”
These basic policies work well for straightforward authorization scenarios. However, real-world requirements are often more complex.
Uber encountered several limitations with the basic policy model.
For example, consider a payment support service. Customer support representatives need to access payment information to help customers. However, for privacy and compliance reasons, support reps should only access payment data for customers in their assigned region. The basic policy syntax can only specify that a representative can access a payment profile by its UUID. It cannot express the requirement that the rep’s region must also match the customer’s region.
Another example involves employee data. An employee information service needs to allow employees to view and edit their own profiles. It should also allow their managers to access their profiles. The basic policy model cannot express this “self or manager” relationship because it would require checking whether the actor’s employee ID matches either the resource’s employee ID or the resource’s manager ID.
A third scenario involves data analytics. Some reports should only be accessible to users who belong to multiple specific groups simultaneously. The existing model supported checking if a user belonged to any group in a list, but not whether they belonged to all groups in a list.
In a nutshell, Uber needed a way to incorporate additional context and attributes into authorization decisions.
ABAC extends the basic policy model by adding conditions. A condition is a Boolean expression that evaluates to true or false based on attributes. If a permission includes a condition, that permission only grants access when the condition evaluates to true.
Attributes are characteristics of actors, resources, actions, or the environment. For example:
An actor might have attributes like location, department, or role.
A resource might have attributes such as owner, sensitivity level, or creation date.
The environment might provide attributes like current time, day of the week, or request IP address.
Attribute Stores are the sources that provide attribute values at authorization time. In formal authorization terminology, these are called Policy Information Points or PIPs. When evaluating a condition, the authorization engine queries the appropriate attribute store to fetch the necessary values.
The enhanced policy model adds an optional condition field to each permission. Here’s an example:
actions: [create, delete, read, update]
resource: "uon://payments.svc/production/payment/*"
associations:
- target_type: EMPLOYEE
condition:
  expression: "resource.paymentType == 'credit card' && actor.location == resource.paymentLocation"
effect: ALLOW
This policy allows employees to perform CRUD operations on payment records, but only when two conditions are met: the payment type is a credit card, and the employee’s location matches the payment’s location.
When ABAC is enabled, the authorization architecture includes additional components.
The authfx library now includes an authorization engine that coordinates policy evaluation. When a request arrives, the engine first checks if the basic requirements are met: does the actor match, does the action match, does the resource match? If those checks pass and a condition exists, the engine moves to condition evaluation.
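The two-phase evaluation just described can be sketched as follows. This is a simplified model of the idea, not the actual authfx engine; the field names and the callable-based condition are illustrative:

```python
import fnmatch

def authorize(permission, actor, action, resource, fetch_attr):
    """Phase 1: structural checks. Phase 2: condition evaluation, only if needed."""
    # Phase 1: does the actor, the action, and the resource match?
    if actor not in permission["actors"]:
        return False
    if action not in permission["actions"]:
        return False
    if not fnmatch.fnmatch(resource, permission["resource"]):
        return False
    # Phase 2: if a condition exists, fetch the needed attributes and evaluate it.
    cond = permission.get("condition")
    if cond is None:
        return True
    attrs = {name: fetch_attr(name) for name in cond["needs"]}  # on-demand fetch
    return cond["check"](attrs)
```

Note that attributes are only fetched when a condition actually requires them, which keeps the common no-condition path cheap.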
The authorization engine interacts with an expression engine that evaluates the condition expression. The expression engine identifies which attributes are needed and requests them from the appropriate attribute stores. See the diagram below:
Uber defined four types of attribute store interfaces:
ActorAttributeStore fetches attributes about the actor making the request. This might include their employee ID, group memberships, location, or department.
ResourceAttributeStore fetches attributes about the resource being accessed. This could include the resource’s owner, creation date, sensitivity classification, or any custom business attributes.
ActionAttributeStore fetches attributes related to the action being performed, though this is used less frequently than actor and resource attributes.
EnvironmentAttributeStore fetches contextual attributes like the current timestamp, day of week, or request metadata.
Each attribute store must implement a SupportedAttributes() function that declares which attributes it can provide. This enables the authorization engine to pre-compile condition expressions and validate that all required attributes are available. At runtime, when an attribute value is needed, the engine calls the appropriate method on the corresponding store.
See the code snippet below:
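A sketch of what such an attribute store interface could look like, rendered in illustrative Python (Uber's actual authfx interfaces are not public here, and the names may differ):

```python
from abc import ABC, abstractmethod

class ResourceAttributeStore(ABC):
    """A Policy Information Point: declares and serves resource attributes."""

    @abstractmethod
    def supported_attributes(self) -> list[str]:
        """Attributes this store can provide, used to pre-compile and
        validate condition expressions before runtime."""

    @abstractmethod
    def get_attribute(self, resource: str, name: str):
        """Fetch one attribute value for a resource at evaluation time."""

class PaymentAttributeStore(ResourceAttributeStore):
    """Hypothetical store backing the payment-policy example above."""

    def __init__(self, payments: dict):
        self.payments = payments  # e.g. loaded from the payments service

    def supported_attributes(self) -> list[str]:
        return ["paymentType", "paymentLocation"]

    def get_attribute(self, resource: str, name: str):
        return self.payments[resource][name]
```

Declaring supported attributes up front is what lets the engine fail fast when a policy references an attribute no store can provide.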

The design allows a single service to use multiple attribute stores, and a single attribute store can be shared across multiple services for reusability.
To represent conditions based on attributes, Uber needed an expression language. Rather than inventing a new language from scratch, the engineering team evaluated existing open-source options.
They selected the Common Expression Language (CEL), developed by Google. CEL offered several advantages:
First, it has a simple, familiar syntax similar to other programming languages.
Second, it supports multiple data types, including strings, numbers, booleans, and lists.
Third, it includes built-in functions for string manipulation, arithmetic operations, and boolean logic.
CEL also provides macros that are particularly useful for working with collections. For instance, you can write actor.groups.exists(g, g == 'admin') to check if the actor belongs to a group called “admin.”
The performance characteristics of CEL were excellent. Expression evaluation typically takes only a few microseconds. Both Go and Java implementations of CEL are available, meeting Uber’s backend service requirements. Additionally, both implementations support lazy attribute fetching, meaning they only request the attribute values actually needed to evaluate the expression, improving efficiency.
A sample CEL expression looks like this:
resource.paymentType == 'credit card' && actor.location == resource.paymentLocation

This expression is evaluated against attribute values fetched at runtime to produce a true or false result.
To illustrate the practical benefits of ABAC, consider how Uber manages authorization for Apache Kafka topics.
Uber uses thousands of Kafka topics for event streaming across its platform. Each topic needs access controls to specify which services can publish messages and which can subscribe to receive messages. The Kafka infrastructure team is responsible for managing these policies.
With basic policies, the Kafka team would need to create individual policies for every topic. Given the sheer volume of topics, this would be impractical and time-consuming.
Uber has a service called uOwn that tracks ownership and roles for technological assets. Each Kafka topic can have roles assigned directly or inherited through the organizational hierarchy. One such role is “Develop,” which indicates responsibility for developing and maintaining that topic.
Using ABAC, the Uber engineering team created a single generic policy that applies to all Kafka topics:
effect: allow
actions: [admin]
resource: "uon://topics.kafka/production/*"
associations:
- target_type: EMPLOYEE
condition:
  expression: 'actor.adgroup.exists(x, x in resource.uOwnDevelopGroups)'

Source: Uber Engineering Blog
The wildcard in the resource pattern means this policy applies to every Kafka topic. The condition checks whether the actor belongs to any Active Directory group that has the Develop role for the requested topic.
An attribute store plugin retrieves the list of groups with the Develop role for each topic from uOwn. This information becomes the resource.uOwnDevelopGroups attribute. When an employee attempts to perform an admin action on a topic, the authorization engine evaluates whether that employee belongs to one of the authorized groups.
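That condition reduces to a set-membership check. In plain Python, with the uOwn lookup stubbed out as a dictionary for illustration:

```python
# Stub for the uOwn attribute store: topic UON -> AD groups holding the Develop role.
UOWN_DEVELOP_GROUPS = {
    "uon://topics.kafka/production/rider-events": {"team-rider-dev", "kafka-infra"},
}

def can_admin_topic(actor_adgroups: list[str], topic: str) -> bool:
    """actor.adgroup.exists(x, x in resource.uOwnDevelopGroups), in Python terms."""
    develop_groups = UOWN_DEVELOP_GROUPS.get(topic, set())
    return any(g in develop_groups for g in actor_adgroups)
```

Because the group list is resolved from uOwn at evaluation time, ownership changes take effect without touching the policy itself.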
This solution saved the Kafka team enormous effort. Instead of managing thousands of individual policies, they maintain one generic policy. As ownership changes in uOwn, authorization automatically adjusts without any policy updates.
The implementation of ABAC delivered multiple benefits across Uber’s infrastructure.
Authorization policies became more precise and fine-grained. Decisions could now consider any relevant attribute rather than just basic identity and group membership. This enabled security policies that more accurately reflected business requirements.
The system became more dynamic. When attribute values change in source systems like uOwn or employee directories, authorization decisions automatically adapt. No code deployment or policy update is required. This agility is critical in a fast-moving organization.
Scalability improved dramatically. A single well-designed ABAC policy can govern authorization for thousands or even millions of resources.
Centralization through Charter made policy management easier. Rather than scattering authorization logic across hundreds of services, security teams can audit and manage policies in one place.
Performance remained excellent. Despite the added complexity of condition evaluation and attribute fetching, authorization decisions are still completed in microseconds due to local evaluation and on-demand attribute fetching.
Also, most importantly, ABAC separated policy from code. System owners can change authorization policies without building and deploying new code. This separation of concerns allows security policies to evolve independently from application logic.
Since implementing ABAC, 70 Uber services have adopted attribute-based policies to meet their specific authorization requirements. The framework provides a unified approach across diverse use cases, from protecting microservice endpoints to securing database access to managing infrastructure resources.
2026-02-24 00:30:39
To scale with LLMs, you need to know how to monitor them effectively. In this eBook, get practical strategies to monitor, debug, and secure LLM-powered applications. From tracing multi-step workflows and detecting prompt injection attacks to evaluating response quality and tracking token usage, you’ll learn best practices for integrating observability into every layer of your LLM stack.
When we talk about large language models “learning,” we can end up creating a misleading impression. The word “learning” suggests something similar to human learning, complete with understanding, reasoning, and insight.
However, that’s not what happens inside these systems. LLMs don’t learn the way you learned to code or solve problems. Instead, they follow repetitive mathematical procedures billions of times, adjusting countless internal parameters until they become very good at mimicking patterns in text.
This distinction matters more than you might think because it changes the way LLMs generate their answers.
Understanding how LLMs actually work helps you know when to trust their outputs and when to be skeptical. It reveals why they can write convincing essays about topics they don’t fully understand, and why they sometimes fail in surprising ways.
In this article, we’ll explore three core concepts that have a key impact on the working of LLMs: loss functions (how we measure failure), gradient descent (how we make improvements), and next-token prediction (what LLMs actually do).
Before an LLM can learn anything, we need a way to measure how badly it’s performing. This measurement is called a loss function.
Think of it as a scoring system that provides a single number representing how wrong the model is. The higher the number, the worse the performance. The goal of training is to make this number as small as possible.
However, you can’t just pick any measurement and expect it to work. A good loss function must satisfy three critical requirements:
First, it must be specific. It needs to measure something concrete and not vague. If someone told you to “build an intelligent computer,” you’d struggle because intelligence itself is hard to define. Would a system that passes an IQ test count? Probably not, since computers have passed IQ tests for over a decade without being useful for much else. For LLMs, we pick something very specific, such as predicting the next word in a sequence correctly. This is concrete and measurable.
Second, the loss function must be computable. The computer needs to calculate it quickly and repeatedly. We can’t measure abstract qualities like “creativity” or “hard work” because these aren’t things you can easily quantify with the data available during training. However, you can measure whether a predicted word matches the actual next word in your training data. That’s a simple comparison that computers handle effortlessly.
Third, the loss function must be smooth. This is the trickiest requirement to grasp. Smoothness means the function’s output should change gradually as inputs change, without sudden jumps or breaks. Imagine walking down a gentle slope versus walking down a staircase. The slope is smooth because your altitude changes continuously. Stairs are not smooth because you suddenly drop from one step to the next.
Why does smoothness matter?
The training algorithm needs to figure out which direction to adjust the model’s parameters. If the loss function jumps around wildly, the algorithm can’t determine whether it’s moving in the right direction. Interestingly, accuracy (counting correct predictions) isn’t smooth because you can’t have partial predictions. You either got 47 or 48 predictions right, not 47.3. This is why LLMs actually optimize for something called cross-entropy loss instead, which is smooth and works better mathematically, even though accuracy is what we ultimately care about.
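The difference is easy to see numerically: accuracy only moves when a prediction flips, while cross-entropy changes continuously with the probability the model assigns to the true token. A minimal sketch:

```python
import math

def cross_entropy(p_correct: float) -> float:
    """Loss for one example, given the probability assigned to the true token."""
    return -math.log(p_correct)

def accuracy(p_correct: float) -> int:
    """1 if the true token would be the model's guess (here: p > 0.5), else 0."""
    return 1 if p_correct > 0.5 else 0

# Nudging the model's confidence from 0.30 to 0.31: accuracy doesn't move
# (still 0), but cross-entropy drops slightly, giving gradient descent a
# direction to follow.
```

This is why a small parameter adjustment that makes the model slightly less wrong is still rewarded, even when no individual prediction flips from wrong to right.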
The crucial point to understand here is that LLMs are scored on matching patterns in their training data, not on being truthful or correct. If false information appears frequently in training data, the model gets rewarded for reproducing it. This fundamental design choice explains why LLMs can confidently state things that are completely wrong.
Many developer tools promise context-aware AI, but having data access doesn’t automatically mean agents know when to use it.
Real context requires understanding. Unblocked synthesizes knowledge from your codebase, PRs, discussions, docs, project trackers, and runtime signals. It connects past decisions to current work, resolves conflicts between outdated docs and actual practice, respects data permissions, and surfaces what matters for the task at hand.
With Unblocked:
Coding agents like Cursor, Claude, and Copilot generate output that aligns with your actual architecture and conventions
Code review focuses on real bugs rather than stylistic nits
You find instant answers without interrupting teammates
Once the loss function is decided, we need a process to actually improve the model. This is where gradient descent comes in.
Gradient descent is the algorithm that figures out how to adjust the billions of parameters inside a neural network to reduce the loss.
See the diagram below:
Imagine you have a ball sitting somewhere on a hilly landscape. The ball’s position represents the model’s current parameter values. The height of the ground beneath the ball represents the loss function’s output. Valleys represent low loss (good performance), and peaks represent high loss (bad performance). The goal is to get the ball to the lowest valley possible.
The process follows these steps:
Start with the ball at a random position on the landscape
Look at the slope directly around the ball to determine which direction is downhill
Roll the ball a tiny distance in that downhill direction
Repeat this process billions of times until the ball settles in a valley
Each adjustment is incredibly small. We’re not throwing the ball or making dramatic changes, but nudging it slightly based on the local slope. The “gradient” in gradient descent refers to this slope measurement, which tells you both the direction and steepness of the decline.
This approach uses a greedy algorithm, meaning it only considers the immediate next step without looking ahead. Picture walking downhill in thick fog where you can only see your feet. You can tell which direction slopes downward right where you’re standing, but you can’t see whether a deeper valley lies just beyond a small uphill section. The ball might settle in a minor dip when a much better solution exists nearby.
Why use such a limited approach?
This is because the alternative is computationally impossible. An LLM might have hundreds of billions of parameters. Evaluating all possible future states to find the absolute best solution would take longer than the lifespan of the universe. Gradient descent is practical because each step is simple and cheap to compute, even though we need billions of them.
Modern LLMs use a variation called Stochastic Gradient Descent, or SGD. The word “stochastic” means random. Instead of calculating loss across all your training data at once (which would require impossible amounts of memory), SGD uses small random batches of data. This makes training feasible with massive datasets. If we have a billion training examples, we can take a billion small steps using different random samples, which actually works better than trying to process everything at once.
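The ball-on-a-landscape picture corresponds to just a few lines of code. Here is a toy one-parameter example minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3); real training applies the same loop to billions of parameters at once:

```python
def gradient_descent(start: float, lr: float = 0.1, steps: int = 100) -> float:
    x = start                  # the ball's current position (a single parameter)
    for _ in range(steps):
        grad = 2 * (x - 3)     # slope of the loss at the current point
        x -= lr * grad         # nudge slightly downhill
    return x
```

The learning rate lr controls how large each nudge is: too large and the ball overshoots the valley, too small and training takes forever.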
Now we get to what LLMs actually train on. Despite their ability to write essays, explain concepts, and hold conversations, LLMs are trained on one simple task: predict the next word in a sequence.
Take the sentence “The cat sat on the mat.” During training, the model doesn’t see the whole sentence at once. Instead, it trains on overlapping segments:
Input: “The” → Predict: “cat” → If correct, gain a point
Input: “The cat” → Predict: “sat” → If correct, gain a point
Input: “The cat sat” → Predict: “on” → If correct, gain a point
Input: “The cat sat on” → Predict: “the” → If correct, gain a point
Input: “The cat sat on the” → Predict: “mat” → If correct, gain a point
This process repeats billions of times across trillions of words from the internet, books, articles, and other text sources. Every time the model predicts correctly, gradient descent adjusts its parameters to make similar predictions more likely in the future. Every time it predicts incorrectly, the parameters adjust to make that mistake less likely.
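The overlapping segments above can be generated mechanically. A minimal sketch (real training operates on tokens rather than whole words, but the idea is the same):

```python
def next_token_examples(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Turn one sentence into (context, next-token) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_examples(["The", "cat", "sat", "on", "the", "mat"])
# One sentence of six tokens yields five training examples.
```

Every document in the training corpus is decomposed this way, which is how a few trillion words become an enormous number of prediction exercises.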
But why does this simple task produce such convincing outputs? The answer lies in how context narrows down possibilities.
Consider predicting the next word in this sequence: “I love to eat.” Without more context, it could be almost any food. But add more information: “I love to eat something for breakfast.” Now you’re narrowed down to breakfast foods like eggs, cereal, pancakes, or toast. Add even more: “I love to eat something for breakfast with chopsticks.” Now you’re thinking about foods eaten with chopsticks at breakfast, perhaps rice or noodles. Include geography: “I love to eat something for breakfast with chopsticks in Tokyo.” The possibilities narrow further to Japanese breakfast items.
LLMs excel at this pattern recognition because they process billions of these associations during training. They learn which words tend to follow others in different contexts. The more context we provide, the better their predictions become. This is why longer prompts often produce better results.
The transformer architecture that powers modern LLMs has a critical advantage over older approaches. It can process all these training examples in parallel rather than one at a time. This parallelization is why we can now train models on datasets that would take you multiple lifetimes to read. It’s the breakthrough that made current LLMs possible.
Next-token prediction through pattern matching produces impressive results. LLMs can write in different styles, translate languages, explain complex topics, and generate code. They spot subtle patterns across billions of examples that humans would never notice. For most common tasks, this approach works quite well.
However, pattern matching is not reasoning, and this creates predictable failure modes.
Consider what happens when you ask an LLM a question with a false premise. The model doesn’t stop to verify whether the premise is true. Instead, it might pattern-match to find the appropriate answer based on its training data. The answer can sound authoritative and detailed, but it might explain something that isn’t true. In other words, the model is trained to continue patterns in text, but not to fact-check or apply logical reasoning.
This problem extends to situations where training data is scarce. Suppose you ask an LLM to write code in Python. It will likely produce excellent results because massive amounts of Python code exist in its training data. However, ask it to write the same code in an obscure programming language, and it starts making confident mistakes. It might use operators that don’t exist in that language or call functions with the wrong number of arguments. The model extrapolates common programming patterns from popular languages, assuming they apply everywhere. With insufficient training examples to learn otherwise, these extrapolations lead to errors.
Perhaps most tellingly, LLMs fail at variations of problems they’ve seen before. There’s a famous logic puzzle about transporting a cabbage, a goat, and a wolf across a river with specific constraints about which items can’t be left alone together. LLMs solve this puzzle easily because it appears many times in their training data. However, if you slightly modify the constraints, the model often continues using the original solution. It doesn’t reason through the new logical requirements. Instead, it pattern-matches to the familiar puzzle and reproduces the memorized answer.
This happens because of how transformers work internally. When the model sees text that looks very similar to something in its training data, it does a fuzzy match and retrieves the known answer. This is efficient for common problems but fails when those small differences actually matter.
The core issue is that LLMs are optimized to reproduce patterns from their training data, not to be truthful, logical, or correct. When training data contains errors (and internet data contains many), models learn to reproduce those errors. When training data contains biases, models learn those too. When a task requires actual reasoning rather than pattern matching, the illusion can break down.
Understanding the mechanics of LLM training helps you use these tools more effectively.
LLMs are sophisticated pattern-matching systems that predict tokens through billions of small parameter adjustments. They’re not reasoning engines, and they don’t truly understand the text they generate.
This knowledge suggests several practical guidelines:
Use LLMs for tasks that are well-represented in their training data. They excel at common programming problems, generating content in standard formats, and answering frequently asked questions. They’re powerful productivity tools that can save enormous amounts of time on routine work.
However, be skeptical when dealing with novel problems, unusual edge cases, or domains where accuracy is critical.
Always verify outputs for important use cases. Don’t assume that confident-sounding responses are correct. The training process optimizes for sounding like training data, not for being right.
Most importantly, remember that LLMs are tools with specific capabilities and specific limitations. They’re remarkable at what they do, which is identifying and reproducing patterns in text. However, pattern matching, no matter how sophisticated, is not the same as reasoning, understanding, or intelligence. Knowing this difference helps you leverage their strengths while avoiding their weaknesses.
2026-02-22 00:30:27
npx workos launches an AI agent, powered by Claude, that reads your project, detects your framework, and writes a complete auth integration directly into your existing codebase. It’s not a template generator. It reads your code, understands your stack, and writes an integration that fits.
Then it typechecks and builds, feeding any errors back to itself to fix. Just run npx workos, from WorkOS.
This week’s system design refresher:
What Is Redis Really About? Why Is It So Popular? (YouTube video)
RabbitMQ vs Kafka vs Pulsar
What Are Agent Skills Really About? (YouTube video)
REST vs GraphQL
LAST CALL FOR ENROLLMENT: Become an AI Engineer - Cohort 4
RabbitMQ, Kafka, and Pulsar all move messages, but they solve very different problems under the hood.
This diagram looks simple, but it hides three very different mental models for building distributed systems.
RabbitMQ is a classic message broker. Producers publish to exchanges, exchanges route messages to queues, and consumers compete to process them.
Messages are pushed, acknowledged, and then gone. It’s great for task distribution, request handling, and workflows where “do this once” really matters.
Kafka flips the model. It’s not a queue, it’s a distributed log. Producers append events to partitions. Data stays there based on retention, not consumption. Consumers pull data using offsets and can replay everything.
This is why Kafka works so well for event streaming, analytics, and pipelines where multiple teams need the same data at different times.
Pulsar tries to combine both worlds. Brokers handle serving traffic, while BookKeeper stores data in a durable ledger. Consumers track position with cursors instead of offsets.
This separation lets Pulsar scale storage and compute independently and support both streaming and queue-like patterns.
Choosing between them isn’t about “which is faster” or “which is popular.” It’s about how you want data to flow, how long it should live, and how many times it needs to be read.
Join us on February 24, 2026 (AMER) / February 25, 2026 (EMEA & APJ) for a free live webinar where we’ll unveil how Intelligent Observability can help you build smarter automations. During the event, you’ll see our new agentic platform in action—an essential tool if you’re working with AI Agents. We’ll also share key updates on our innovations in APM, Infrastructure, and the latest advancements in OpenTelemetry support. This is your opportunity to explore cutting-edge solutions designed to empower your work and streamline your operations.
With REST, the server decides the response shape. You call “/v1/articles/123” and you get whatever that endpoint returns. If you need related data, you make another request. If the payload is larger than needed, you live with over-fetching.
HTTP gives you great primitives though: clear resource boundaries, URL-based versioning, and native caching via ETag, Cache-Control, and CDNs.
With GraphQL, the client decides the response shape. You send a single query describing exactly what fields you want. Behind the scenes, a GraphQL gateway fans out to multiple services, runs resolvers, and aggregates the response.
The complexity shifts from the client to the server. Caching still exists, but it usually lives at the application layer (persisted queries, response caching), not automatically at the HTTP layer.
Neither approach is “better” by default. REST optimizes for simplicity, cacheability, and clear ownership of resources. GraphQL optimizes for flexibility, client-driven data needs, and aggregation across services.
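The "who decides the shape" difference can be sketched without a real server. The data and resolver below are hypothetical; the point is that the REST handler always returns the whole resource (over-fetching the body, and needing a second call for the author), while the GraphQL-style resolver returns only the requested fields in one round trip.

```python
# Toy data standing in for two backend services.
ARTICLE = {"id": "123", "title": "Hello", "body": "...long text...",
           "author_id": "a1"}
AUTHOR = {"id": "a1", "name": "Ada"}

def rest_get_article(article_id):
    # Server decides the shape: the whole resource, every time.
    return dict(ARTICLE)

def rest_get_author(author_id):
    # Related data costs a second request.
    return dict(AUTHOR)

def graphql_query(fields):
    # Client decides the shape: only requested fields are resolved,
    # and a nested selection fans out to the related service.
    result = {}
    for field in fields:
        if field == "author":
            result["author"] = {"name": AUTHOR["name"]}
        else:
            result[field] = ARTICLE[field]
    return result

full = rest_get_article("123")
assert "body" in full                         # over-fetched payload
shaped = graphql_query(["title", "author"])   # one shaped round trip
assert shaped == {"title": "Hello", "author": {"name": "Ada"}}
```

Note where the complexity moved: the REST handlers are trivial, while `graphql_query` has to know how to resolve every field — exactly the client-to-server shift described above.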
Over to you: What signals tell you REST is enough, and when GraphQL becomes worth it?
Enrollment for our upcoming Become an AI Engineer - Cohort 4 is closing soon, and classes officially begin on February 21.
Get 40% off your registration cost with code: BBGNL
This is not just another course about AI frameworks and tools. Our goal is to help engineers build the foundation and end-to-end skill set needed to thrive as AI engineers.
Here’s what makes this cohort special:
Learn by doing: Build real world AI applications, not just by watching videos.
Structured, systematic learning path: Follow a carefully designed curriculum that takes you step by step, from fundamentals to advanced topics.
Live feedback and mentorship: Get direct feedback from instructors and peers.
Community driven: Learning alone is hard. Learning with a community is easy!
We are focused on skill building, not just theory or passive learning. Our goal is for every participant to walk away with a strong foundation for building AI systems.
If you want to start learning AI from scratch, this is the perfect time to begin.
2026-02-21 00:31:57
2026-02-20 00:30:44
Eventual consistency is a key architectural choice in modern distributed systems. When we choose eventual consistency, we trade immediate synchronization across all database copies for better performance, scalability, and availability.
Using eventual consistency has been a key factor in our ability to build systems that serve millions of users globally. Whether we are building social media platforms, e-commerce sites, or real-time gaming applications, eventual consistency gives us the tools to handle data under load and during failures.
In this article, we will look at what eventual consistency is, why it exists, how to control it, and how to handle the challenges it creates.
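A minimal sketch makes the trade-off concrete. This is an illustrative last-writer-wins model, not a production replication protocol (the `Replica` class and `sync` function are assumptions for the example): two replicas accept writes independently, briefly disagree, and converge once updates are exchanged.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Last-writer-wins: keep the update with the newest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(a, b):
    # Anti-entropy pass: each replica applies the other's newer updates.
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)

r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], ts=1)
r2.write("cart", ["book", "pen"], ts=2)

# Before syncing, the replicas disagree: that's the "eventual" part.
assert r1.read("cart") != r2.read("cart")

sync(r1, r2)
assert r1.read("cart") == r2.read("cart") == ["book", "pen"]
```

The window between the two assertions is exactly where the challenges live: any client reading during it may see stale data, which is the cost paid for availability.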
2026-02-19 00:31:13
If slow QA processes bottleneck you or your software engineering team and you’re releasing slower because of it — you need to check out QA Wolf.
QA Wolf’s AI-native service supports web and mobile apps, delivering 80% automated test coverage in weeks and helping teams ship 5x faster by reducing QA cycles to minutes.
QA Wolf takes testing off your plate. They can get you:
Unlimited parallel test runs for mobile and web apps
24-hour maintenance and on-demand test creation
Human-verified bug reports sent directly to your team
Zero flakes guarantee
The benefit? No more manual E2E testing. No more slow QA cycles. No more bugs reaching production.
With QA Wolf, Drata’s team of 80+ engineers achieved 4x more test cases and 86% faster QA cycles.
When Stripe first launched, they became known for integrating payment processing into any business with just seven lines of code.
This was a really big achievement. Taking something as complex as credit card processing and reducing it to a simple code snippet felt revolutionary. In essence, a developer could open a terminal, run a basic curl command, and immediately see a successful credit card payment.
However, building and maintaining a payment API that works across dozens of countries, each with different payment methods, banking systems, and regulatory requirements, is one of the hardest problems in payments engineering. Most companies either lock themselves into supporting just one or two payment methods or force developers to write different integration code for each market.
Stripe had to evolve the API multiple times over the next 10 years to handle credit cards, bank transfers, Bitcoin wallets, and cash payments through a unified integration.
But getting there wasn’t easy. In this article, we look at how Stripe’s payment APIs evolved over the years, the technical challenges they faced, and the engineering decisions that shaped modern payment processing.
Disclaimer: This post is based on publicly shared details from the Stripe Engineering Team. Please comment if you notice any inaccuracies.
When Stripe launched in 2011, credit cards dominated the US payment landscape. The initial API design reflected this reality.
Stripe introduced two fundamental concepts that would become the foundation of their platform.
The Token was the first concept. When a customer entered their card details in a web browser, those details were sent directly to Stripe’s servers using a JavaScript library called Stripe.js.
This was crucial for security. By never allowing card data to touch the merchant’s servers, Stripe helped businesses avoid complex PCI compliance requirements. PCI compliance refers to security standards that businesses must follow when handling credit card information. These requirements are expensive and technically demanding to implement correctly.
In exchange for the card details, Stripe returned a Token. Think of a Token as a safe reference to the card information. The actual card number lived in Stripe’s secure systems. The Token was just a pointer to that data.
The Charge was the second concept. After receiving a Token from the client, the merchant’s server could create a Charge using that Token and a secret API key.
A Charge represented the actual payment request. When you created a Charge, the payment either succeeded or failed immediately. This immediate response is called synchronous processing, meaning the result comes back right away.
See the diagram below that shows this approach:
The payment flow followed a pattern common in traditional web applications:
JavaScript client creates a Token using a publishable API key
The browser sends the Token to the merchant’s server
The server creates a Charge using the Token and a secret API key
Payment succeeds or fails immediately
The server can fulfill the order based on the result
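The steps above can be sketched as a simulation. None of this calls the real Stripe API: the token vault, key prefixes, and amounts below are fake stand-ins that just mirror the shape of the flow, including the key property that raw card data only ever reaches the "Stripe" side.

```python
import secrets

# Simulated token vault standing in for Stripe's side of the flow.
_VAULT = {}

def create_token(card_number):
    # Step 1: the browser sends raw card data to "Stripe" and gets back
    # an opaque reference; the merchant's server never sees the number.
    token = "tok_" + secrets.token_hex(8)
    _VAULT[token] = card_number
    return token

def create_charge(token, amount, secret_key):
    # Step 3: the merchant's server charges using only the token.
    if not secret_key.startswith("sk_"):
        return {"status": "failed", "error": "secret key required"}
    if token not in _VAULT:
        return {"status": "failed", "error": "unknown token"}
    # Steps 4-5: cards settle synchronously, so the result is immediate.
    return {"status": "succeeded", "amount": amount}

token = create_token("4242 4242 4242 4242")          # in the browser
charge = create_charge(token, 2000, "sk_test_fake")  # on the server
assert charge["status"] == "succeeded"
```

The synchronous return value in `create_charge` is the property the rest of the article revolves around: it holds for cards, and breaks for almost everything else.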
As Stripe expanded, they needed to support payment methods beyond credit cards. In 2015, they added ACH debit and Bitcoin. These payment methods introduced fundamental differences that challenged the existing API design.
Payment methods differ along two important dimensions.
First, when is the payment finalized? Finalized means you have confidence that the funds are guaranteed and you can ship goods to the customer. Credit card payments are finalized immediately. However, Bitcoin payments can take about an hour, whereas ACH debit payments may take days to finalize.
Second, who initiates the payment? With credit cards and ACH debit, the merchant initiates the payment by charging the customer. With Bitcoin, the customer creates a transaction and sends it to the merchant. This requires the customer to take action before any money moves.
For ACH debit, Stripe extended the Token resource to represent both card details and bank account details. However, they needed to add a pending state to the Charge. An ACH debit Charge would start as pending and only transition to successful days later. Merchants had to implement webhooks to know when the payment actually succeeded.
See the diagram below:
For reference, a webhook is a mechanism where Stripe calls your server when something happens. Instead of your server repeatedly asking Stripe if the payment succeeded yet, Stripe sends a notification to a URL on your server when the status changes. Your server needs to set up an endpoint that listens for these notifications and processes them accordingly.
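A minimal handler for such a notification might look like the sketch below. The event shape and order store are illustrative assumptions (a real Stripe event carries more fields and should have its signature verified); the event name mirrors Stripe's `charge.succeeded`.

```python
# Toy order store: in production this would be a database.
ORDERS = {"order_42": {"charge_id": "ch_123", "status": "pending"}}

def handle_webhook(event):
    """Process one webhook notification and return an HTTP status code."""
    if event["type"] == "charge.succeeded":
        charge_id = event["data"]["id"]
        for order in ORDERS.values():
            if order["charge_id"] == charge_id:
                order["status"] = "paid"   # now safe to fulfill
    # Always acknowledge with a 2xx so the sender stops retrying.
    return 200

status = handle_webhook(
    {"type": "charge.succeeded", "data": {"id": "ch_123"}})
assert status == 200
assert ORDERS["order_42"]["status"] == "paid"
```

Note that the handler returns 200 even for events it does not act on: acknowledging receipt and processing the event are separate concerns.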
For Bitcoin, the existing abstractions did not work at all. Stripe introduced a new BitcoinReceiver resource. A receiver was a temporary storage for funds. It had a simple state machine with one boolean property called filled. A state machine is a system that can be in different states and transitions between them based on events. The BitcoinReceiver could be filled (true) or not filled (false).
The Bitcoin payment flow worked like this:
Client creates a BitcoinReceiver.
The customer sends Bitcoin to the receiver’s address.
Receiver transitions to filled.
The server creates a Charge using the BitcoinReceiver.
The charge starts in the pending state.
Charge transitions to “succeeded” after confirmations.
See the diagram below:
This introduced complexity. Merchants now had to manage two state machines to complete a single payment: BitcoinReceiver on the client side and Charge on the server side. Additionally, they needed to handle asynchronous payment finalization through webhooks.
Over the next two years, Stripe added many more payment methods. Most were similar to Bitcoin, requiring customer action to initiate payment. The Stripe engineering team realized that creating a new receiver-like resource for each payment method would become unmanageable. Therefore, they decided to design a unified payments API.
To do so, Stripe combined Tokens and BitcoinReceivers into a single client-driven state machine called a Source. When created, a Source could be immediately chargeable, like credit cards, or pending, like payment methods requiring customer action. The server-side integration remained simple: create a Charge using the Source.
See the diagram below:
The Sources API supported cards, ACH debit, SEPA direct debit, iDEAL, Alipay, Giropay, Bancontact, WeChat Pay, Bitcoin, and many others. All of these payment methods use the same two API abstractions: a Source and a Charge.
While this approach seemed elegant at first, the team discovered serious problems once they understood how the flow integrated into real applications. Consider a common scenario with iDEAL, the predominant payment method in the Netherlands:
The customer completes payment on their bank’s website.
If the browser loses connectivity before communicating back to the merchant’s server, the server never creates a Charge.
After a few hours, Stripe automatically refunds the money to the customer. The merchant loses the sale even though the customer successfully paid. This is a conversion nightmare.
To reduce this risk, Stripe recommended that merchants either poll the API from their server until the Source became chargeable or listen for the source.chargeable webhook event to create the Charge. However, if a merchant’s application went down temporarily, these webhooks would not be delivered, and the server would not create the Charge.
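The polling fallback looks roughly like the sketch below. The `fetch_source` function here is a fake standing in for a GET request to Stripe's API; in a real integration you would also sleep with backoff between attempts rather than hammering the endpoint.

```python
# Scripted responses simulating a Source that becomes chargeable
# on the third poll.
_RESPONSES = iter(["pending", "pending", "chargeable"])

def fetch_source(source_id):
    # Stand-in for GET /v1/sources/{id}.
    return {"id": source_id, "status": next(_RESPONSES)}

def poll_until_chargeable(source_id, max_attempts=5):
    for _ in range(max_attempts):
        source = fetch_source(source_id)
        if source["status"] == "chargeable":
            return source          # safe to create the Charge now
        # In production: time.sleep() with exponential backoff here.
    return None                    # gave up; rely on the webhook instead

source = poll_until_chargeable("src_abc")
assert source is not None and source["status"] == "chargeable"
```

Notice that this helper and the webhook handler are redundant paths to the same action, which is precisely the double-bookkeeping burden the Sources API imposed on merchants.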
The integration grew more complex because different Sources behaved differently:
Some Sources, like cards and bank accounts, were synchronously chargeable and could be charged immediately on the server. Others were asynchronous and could only be charged hours or days later. Merchants often built parallel integrations using both synchronous HTTP requests and event-driven webhook handlers.
For payment methods like OXXO, where customers print a physical voucher and pay cash at a store, the payment happens entirely outside the digital flow. Listening for the webhook became necessary for these payment methods.
Merchants also had to track both the Charge ID and Source ID for each order. If two Sources became chargeable for the same order, perhaps because a customer decided to switch payment methods mid-payment, the merchant needed logic to prevent double-charging.
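That deduplication logic is simple to state but easy to get wrong. A sketch of the bookkeeping (the storage and function names are illustrative; in production this check must be atomic, e.g. via a unique database constraint, since two webhooks can arrive concurrently):

```python
# order_id -> source_id that was actually charged.
CHARGED_ORDERS = {}

def maybe_charge(order_id, source_id):
    """Create at most one Charge per order, whatever arrives first."""
    if order_id in CHARGED_ORDERS:
        return "skipped"           # already charged via another Source
    CHARGED_ORDERS[order_id] = source_id
    return "charged"

# The customer switched payment methods mid-checkout, so two
# source.chargeable webhooks arrive for the same order:
assert maybe_charge("order_7", "src_ideal") == "charged"
assert maybe_charge("order_7", "src_card") == "skipped"
```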
See the diagram below:
Stripe realized they had designed their system around the simplest payment method: credit cards. Looking at all payment methods, cards were actually the outlier. Cards were the only payment method that finalized immediately and required no customer action to initiate payment. Everything else was more complex.
Developers had to understand the success, failure, and pending states of two state machines whose states varied across different payment methods. This demanded far more conceptual understanding than the original seven lines of code promised.
In late 2017, Stripe assembled a small team: four engineers and one product manager. They locked themselves in a conference room for three months with a singular goal of designing a truly unified payments API that would work for all payment methods globally.
The team followed strict rules:
They closed their laptops during working sessions to stay fully present.
They started each session with questions they wanted to answer and wrote down new questions for later sessions rather than getting sidetracked.
They used colors and shapes on whiteboards instead of naming concepts prematurely, avoiding premature anchoring on specific definitions.
Most importantly, they focused on enabling real user integrations. They wrote hypothetical integration guides for every payment method to validate their concepts.
They even wrote guides for imaginary payment methods to ensure the abstractions were flexible enough.

The team created two new concepts that finally achieved true unification.
PaymentMethod represents the “how of a payment.” It contains static information about the payment instrument the customer wants to use. This includes the payment scheme and credentials needed to move money, such as card information, bank account details, or customer email. For some methods (like Alipay), only the payment method name is required because the payment method itself handles collecting further information. Importantly, a PaymentMethod has no state machine and contains no transaction-specific data. It is simply a description of how to process a payment.
PaymentIntent represents the “what of a payment.” It captures transaction-specific data such as the amount to charge and the currency. The PaymentIntent is the stateful object that tracks the customer’s attempt to pay. If one payment attempt fails, the customer can try again with a different PaymentMethod. The same PaymentIntent can be used with multiple PaymentMethods until payment succeeds.
See the diagram below:
The key insight was creating one predictable state machine for all payment methods:
requires_payment_method: Need to specify how the customer will pay
requires_confirmation: Have the payment method ready to initiate payment
requires_action: Customer must do something like authenticate or redirect
processing: Stripe is processing the payment
succeeded: Funds are guaranteed, and the merchant can fulfill the order
Notably, there is no failed state. If a payment attempt fails, the PaymentIntent returns to requires_payment_method so the customer can try again with a different method.
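The state machine can be sketched directly. The state names below match Stripe's, but the transition table is a simplification for illustration (for example, it omits cancellation); the key behavior shown is the retry loop back to `requires_payment_method` instead of a terminal failure.

```python
# Simplified transition table: state -> states reachable from it.
ALLOWED = {
    "requires_payment_method": {"requires_confirmation"},
    "requires_confirmation": {"requires_action", "processing"},
    "requires_action": {"processing"},
    "processing": {"succeeded", "requires_payment_method"},
    "succeeded": set(),            # terminal: funds are guaranteed
}

class PaymentIntent:
    def __init__(self, amount, currency):
        self.amount, self.currency = amount, currency
        self.state = "requires_payment_method"

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go {self.state} -> {new_state}")
        self.state = new_state

pi = PaymentIntent(2000, "eur")
pi.transition("requires_confirmation")
pi.transition("processing")
# A failed attempt has no terminal "failed" state: the intent returns
# to requires_payment_method so the customer can retry.
pi.transition("requires_payment_method")
pi.transition("requires_confirmation")
pi.transition("requires_action")   # e.g. a 3D Secure challenge
pi.transition("processing")
pi.transition("succeeded")
assert pi.state == "succeeded"
```

Because every payment method walks this same table (just skipping states it does not need), a merchant writes one state handler instead of one per method.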
The new integration works consistently across all payment methods:
The server creates a PaymentIntent with an amount and a currency
Server sends the PaymentIntent’s client_secret to the browser
The browser collects the customer’s preferred payment method
The browser confirms the PaymentIntent using the secret and payment method
PaymentIntent may enter requires_action state with instructions
The browser handles the action, such as 3D Secure authentication
Server listens for payment_intent.succeeded webhook
The server fulfills the order when payment succeeds
This approach had major improvements over Sources and Charges. Only one webhook handler was needed, and it was not in the critical path for collecting money. The entire flow used one predictable state machine. The integration was resilient to client disconnects because the PaymentIntent persisted on the server. Most importantly, the same integration worked for all payment methods with just parameter changes.
Designing the PaymentIntents API was the hard but enjoyable part. Launching it took almost two years because of a perception challenge: the new API did not feel like seven lines of code anymore.
In normalizing the API across all payment methods, card payments became more complicated to integrate. The new flow flipped the order of client and server requests. It also introduced webhook events that were optional before. For developers building traditional web applications who only cared about accepting card payments in the US and Canada, PaymentIntents was objectively harder than Charges.
The power-to-effort curve looked different. Each incremental payment method was cheap to add to a PaymentIntents integration, but getting started with just card payments required more upfront effort. That matters for startups, which want to get running quickly. With Charges, getting cards working was intuitive and low-effort.

Stripe’s solution was to add convenient packaging of the API that catered to developers who wanted the simplest possible flow. They called the default integration the global payments integration and created a simpler version called card payments without bank authentication.
This simpler integration used a special parameter called error_on_requires_action. This parameter tells the PaymentIntent to return an error if any customer action is required to complete the payment. A merchant using this parameter cannot handle actions required by the PaymentIntent state machine, effectively making it behave like the old Charges API.
The parameter name makes it very clear what merchants are choosing. When they eventually need to handle actions or add new payment methods, it is obvious what to do: remove this parameter and start handling the requires_action state. Developers using this packaging do not have to change the core resources even when upgrading to the full global integration.
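The effect of the parameter can be sketched as follows. The `confirm` function and its payment-method names are illustrative assumptions, not Stripe's implementation; what it shows is the behavioral switch: with the parameter set, any payment that would need a customer action errors out instead of parking in `requires_action`.

```python
def confirm(payment_method, error_on_requires_action=False):
    """Toy confirmation step for a PaymentIntent-like flow."""
    # Assumption for the sketch: these methods need a customer action.
    needs_action = payment_method in ("card_3ds", "ideal")
    if needs_action and error_on_requires_action:
        return {"error": "this payment method requires customer action"}
    return {"status": "requires_action" if needs_action else "succeeded"}

# The simple packaging behaves like the old Charges API:
assert confirm("card", error_on_requires_action=True) == {"status": "succeeded"}
assert "error" in confirm("card_3ds", error_on_requires_action=True)

# Dropping the parameter means handling requires_action yourself:
assert confirm("card_3ds")["status"] == "requires_action"
```

Upgrading to the full integration is then exactly what the text describes: stop passing the flag and add handling for the `requires_action` result, with no change to the underlying resources.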
Stripe emphasized that a great API requires more than just the API itself. Some approaches they used are as follows:
They developed the Stripe CLI, a command-line tool that made testing webhooks locally much simpler.
They created Stripe Samples, allowing developers who prefer learning by example to start with working code.
They redesigned the Stripe Dashboard to help developers debug and understand the PaymentIntent state machine visually.
The team also handled the unglamorous but essential work of updating every piece of documentation, support article, and canned response that referenced old APIs. They reached out to community content creators, asking them to update their materials. They recorded numerous tutorials for both users and internal support teams.
The journey from Charges to PaymentIntents revealed important principles about API design.
First, successful products tend to accumulate product debt over time, similar to technical debt. For API products, this debt is particularly hard to address because you cannot force developers to restructure their integrations fundamentally. It is much easier to add parameters to existing requests than to introduce new abstractions.
Second, designing from first principles is essential. Stripe realized that Charges and Tokens were foundational, not because they were the right abstraction for global payments, but simply because they were the first APIs built. They had to set aside the existing APIs and think about the problem fresh.
Third, keeping things simple does not mean reducing the number of resources or parameters. Two overloaded abstractions are not simpler than four clearly-defined abstractions. Simplicity means making APIs consistent and predictable while creating the right packages.
Fourth, migration requires compromise. Stripe created Charge objects behind the scenes for each PaymentIntent to maintain compatibility with existing integrations. This allowed merchants to migrate their payment flow without breaking their analytics and reporting systems.
Finally, API design is fundamentally collaborative work. The breakthrough came when engineers and product managers worked together intensively, closing laptops and focusing completely on understanding the problem space.
In a nutshell, Stripe’s evolution from seven lines of code to a sophisticated global payments API demonstrates that simplicity and power are not opposing goals. The challenge is creating abstractions that handle complexity internally while presenting a predictable, consistent interface to developers.
References: