Blog of The Practical Developer

I’m starting with Unity and want to create 2D games — What advice would you give me to stay motivated and learn properly?

2025-11-20 10:08:43

Hi everyone,
Today I officially started my learning journey with Unity. I spent the whole morning exploring the basics: a bit of 3D, a bit of 2D, and I even tried Godot out of curiosity.
My long-term goal is to create simple 2D games (maybe even starting by replicating a small stickman-style game just to learn). For now, I’m treating this as a hobby, but I don’t rule out taking it more seriously in the future.

However, there’s something that learning paths don’t usually teach you:

  • how to stay motivated

  • how to deal with frustration when things get hard

  • how to avoid losing enthusiasm or giving up

I know there may be a point where I lose motivation or feel like I'm not progressing fast enough, so I’d love to hear real experiences from people who have already gone through this.

My questions for you

  • How did you stay motivated when you were starting out?

  • What do you wish you had known at the beginning?

  • What would you recommend to someone who literally downloaded Unity today?

  • How do you avoid frustration or giving up too quickly?

I really appreciate any advice or experience you’re willing to share.

Thanks for reading and for your help!

Mutex and Lock Guard in C++

2025-11-20 09:53:59

Mutex (Mutual Exclusion)

Mutex is a synchronization object that controls access to shared resources in a multithreaded environment.
It is used to prevent Race Conditions that can occur when multiple threads access the same resource simultaneously.

Code Example

#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
int counter = 0;

void increment() {
    for (int i = 0; i < 100000; ++i) {
        mtx.lock();
        counter++; // Critical Section
        mtx.unlock();
    }
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);

    t1.join();
    t2.join();

    std::cout << "Counter: " << counter << std::endl;
    return 0;
}

Key Methods

  • lock()
    • Locks the mutex in a blocking manner.
    • If another thread has already acquired the lock, it waits until it is released.
  • unlock()
    • Releases the acquired lock.
    • Calling this from a thread that hasn't acquired the lock results in undefined behavior.
  • try_lock()
    • Attempts to lock in a non-blocking manner, returning true on success and false immediately on failure.
    • Allows performing other tasks without waiting to acquire the lock.

As shown in the example code above, the code region between lock() and unlock() (the Critical Section) can be executed by only one thread at a time.
However, if an exception is thrown or an early return happens before unlock() is called, the mutex is never released, and any thread that later tries to acquire it will block forever, effectively a deadlock.

Mutex for Special Situations

  • recursive_mutex
    • A mutex that allows the same thread to acquire the lock multiple times.
    • Must call unlock() as many times as the lock was acquired to fully release it.
  • timed_mutex
    • A mutex that allows specifying a timeout.
    • Provides try_lock_for() and try_lock_until() methods.
#include <chrono>
#include <mutex>

std::timed_mutex tmtx;

void function() {
    if (tmtx.try_lock_for(std::chrono::seconds(1))) {
        // Successfully acquired lock within 1 second
        // Critical Section
        tmtx.unlock();
    } else {
        // Timeout occurred
    }
}
  • shared_mutex
    • Implements a Reader-Writer Lock.
    • Supported from C++17.
    • Multiple threads can perform read operations simultaneously, but write operations are performed exclusively.
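A sketch of how shared_mutex can be used as a Reader-Writer Lock: readers take a std::shared_lock, the writer takes a std::unique_lock. The function names are illustrative.

```cpp
#include <mutex>          // std::unique_lock
#include <shared_mutex>   // std::shared_mutex, std::shared_lock (C++17)

std::shared_mutex rw_mtx;
int shared_value = 0;

int read_value() {
    // Shared (read) lock: any number of readers may hold this simultaneously
    std::shared_lock<std::shared_mutex> lock(rw_mtx);
    return shared_value;
}

void write_value(int v) {
    // Exclusive (write) lock: excludes all readers and other writers
    std::unique_lock<std::shared_mutex> lock(rw_mtx);
    shared_value = v;
}
```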

Lock Guard

Manually managing lock() and unlock() is dangerous. The solution to this problem is the RAII (Resource Acquisition Is Initialization) pattern. It acquires resources in the constructor and releases them in the destructor, utilizing C++'s stack unwinding mechanism to ensure resources are safely released even when exceptions occur.

lock_guard is one of the classes provided by the C++ standard library that helps reduce mistakes in mutex management.
When a lock_guard is used, the mutex is locked automatically on construction, and when the scope is exited the lock_guard's destructor runs and releases the mutex automatically.

lock_guard

  • The most basic RAII-based mutex wrapper.
  • Automatically calls lock() on construction and unlock() on destruction.
  • The simplest with minimal overhead.
  • Acquires the lock immediately upon creation.
  • Cannot control when the lock is released.
  • Cannot be copied or moved.
#include <mutex>
#include <thread>

std::mutex mtx;
int counter = 0;

void increment() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(mtx);  // Lock on creation
        counter++;
        // Automatically unlocks when leaving scope
    }
}

bool some_condition = false;   // placeholder for illustration
void process_data();           // assumed to be defined elsewhere

void safe_function() {
    std::lock_guard<std::mutex> lock(mtx);
    if (some_condition)
        return;          // Lock released here as well
    process_data();      // Unlock guaranteed even if exception occurs
}

unique_lock

  • Provides various features including deferred locking, condition variable integration, and ownership transfer.
  • Can manually call lock()/unlock().
  • Essential for use with condition variables like condition_variable.
  • Movable but not copyable.
#include <mutex>

std::mutex mtx;

void prepare_data();         // assumed helpers, defined elsewhere
void modify_shared_data();
void cleanup();

void flexible_function() {
    std::unique_lock<std::mutex> lock(mtx, std::defer_lock);  // Deferred locking
    prepare_data();   // Work that doesn't need the lock
    lock.lock();      // Manually lock at the needed point
    modify_shared_data();
    lock.unlock();    // Can manually release
    cleanup();        // Other tasks without the lock

    // Automatically unlocks if still locked when scope ends
}
// Use with condition variables (most common case)
#include <condition_variable>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void wait_for_signal() {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return ready; });  // unique_lock required
    process_data();
}
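The example above shows only the waiting side. For completeness, a sketch of the signaling side, assuming the same mtx, cv, and ready (re-declared here so the snippet stands alone):

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void send_signal() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true;     // Update the predicate while holding the lock
    }                     // Release before notifying, so the woken thread isn't blocked on mtx
    cv.notify_one();      // Wake one waiting thread
}
```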

scoped_lock

  • Supported from C++17.
  • Used to prevent deadlocks when locking multiple mutexes simultaneously.
  • Uses the std::lock() algorithm internally.
  • Identical to lock_guard for a single mutex.
std::mutex mtx1, mtx2;

// Code that can cause deadlock
void thread1() {
    std::lock_guard<std::mutex> lock1(mtx1);
    std::lock_guard<std::mutex> lock2(mtx2);  // Order issue
    // ...
}

void thread2() {
    std::lock_guard<std::mutex> lock2(mtx2);
    std::lock_guard<std::mutex> lock1(mtx1);  // Opposite order!
    // ...
}

// Prevent deadlock with scoped_lock
void safe_thread1() {
    std::scoped_lock lock(mtx1, mtx2);  // Uses deadlock avoidance algorithm
    // ...
}

void safe_thread2() {
    std::scoped_lock lock(mtx2, mtx1);  // Safe regardless of order
    // ...
}

Lock Guard is a method for safely managing mutexes using the RAII pattern.
Since locks are automatically released even in exception or early return situations, it is much safer than manually managing lock()/unlock().

In most cases lock_guard is sufficient; reach for unique_lock or scoped_lock only when you need their extra features (deferred locking, condition-variable integration, or locking multiple mutexes at once).
Used properly, these tools prevent many of the bugs that plague multithreaded programming.

My Open-Source Contribution - Working on TypeScript-ESLint

2025-11-20 09:46:49

I wanted to share my contributions to open-source, and especially the work I recently did for the TypeScript-ESLint project. For this week, I challenged myself to contribute to a much bigger and more complex project than I had worked with before. I wanted to push past basic fixes and actually make a change that improves developer experience for a large community. After exploring the repository and reading through open issues, I found Issue #11758, which described a bug in the no-base-to-string rule. The rule was mistakenly warning when calling .toString() on a class that extends Error and uses generics, even though that should be allowed.

Before writing any changes, I needed to understand how the rule checked TypeScript types. I spent a lot of time reading through the code and learning how the TypeScript type checker works with inheritance. Eventually, I discovered the cause of the bug: the rule only checked the type’s own name to decide whether it should be ignored, but it never looked at its parent type. Because of that, a type like a generic subclass of Error was treated as unsafe, even though it should be allowed. To fix the issue, I updated the rule so it now looks not only at the current type but also at all of its base types, even when generics are involved. This means the rule can now correctly recognize inherited behavior.

In the end, the fix turned out to be much smaller than I first expected. After updating the rule logic, I then needed to prove that it worked. I added a new valid test case that represents the situation from the issue, where a generic class extends another class which ultimately extends Error. This test confirms that the rule no longer reports .toString() in that scenario. Running the full test suite showed that everything passed successfully, which gave me confidence that my fix solved the problem without causing any new issues elsewhere in the project.

This contribution was a big learning step for me. Compared to earlier work, I gained much deeper experience reading unfamiliar code and learning how to efficiently navigate through large codebases. It also showed me that I am capable of contributing to major developer tools used by many people every day. I hope to continue doing more contributions like this going forward.

API Gateway vs Service Mesh: Beyond the North–South/East–West Myth

2025-11-20 09:41:21

A note up front: this article turned out long because I kept running into questions of my own, and with less detail the claims would have looked speculative. Feel free to skip ahead to the links at the end of the page; they are very good.

My Experimental Code Link

As always, if you just read this and don't code along, it's pretty much as good as not reading it at all.

Github Link: https://github.com/rajkundalia/api-gateway-service-mesh-sample

This took a long time. I tried implementing a full service mesh, but parts of it went beyond my scope, so features like Intentions in Consul will not work in the sample.

Introduction: The Misconception That's Costing Teams

If you've worked with microservices, you've probably heard this oversimplification: "API Gateways handle north–south traffic, while Service Meshes handle east–west traffic."

This directional framing has become microservices folklore - repeated in architecture discussions and echoed in conference talks for years.

Here's the issue: it's fundamentally wrong.

This misconception leads to poor architectural decisions, unnecessary complexity, and recurring confusion about which technology solves which problem. Teams often reach for an API Gateway when a Service Mesh is what they truly need - or vice versa - because they focus on traffic direction rather than the underlying purpose.

The truth is more nuanced:

  • API Gateways can manage east–west traffic via internal gateways that govern inter-service communication, apply policies, and handle versioning.
  • Service Meshes can handle north–south traffic through mesh-aware ingress gateways (such as Istio's Ingress Gateway or Linkerd's ingress controller) that bring external traffic into the mesh.

So if traffic direction isn't the real difference, what is?


Purpose and responsibility.

An API Gateway treats services as products - with user governance, access control, monetization, lifecycle management, and business context.

A Service Mesh, by contrast, provides infrastructure-level reliability for service-to-service communication - zero business logic, zero product thinking, purely connectivity.

In this article, we'll cut through the confusion and give you a clear mental model for when to use each technology - or when using both together creates the strongest architecture.

You'll learn:

  • What problems each technology actually solves (and why traffic direction doesn't matter)
  • The architectural differences that lead to different use cases
  • How capabilities like mTLS, retries, and zero-trust security define service meshes
  • A practical decision framework for choosing the right tool
  • How API Gateways and Service Meshes complement each other in real-world systems

Let's start by understanding the fundamental problems each technology was designed to solve.

Understanding the Real Problem Each Solves

API Gateway: APIs as a Product

An API Gateway's primary purpose is to expose services as managed, consumable APIs - treating your services like products that internal or external consumers can discover, use, and rely on.

But an API Gateway is far more than a reverse proxy. It embeds business logic and enables API composition: aggregating data from multiple services into a single response, transforming payloads, standardizing errors, and presenting a unified interface that shields clients from backend complexity. This is effectively the Backend-for-Frontend (BFF) pattern.

And once you move past request/response mechanics, the real power emerges. API Gateways participate in the entire API lifecycle - the part most developers overlook:

  • Creation & design: specs, versioning, schema validation
  • Testing & documentation: interactive docs, automated tests, sandboxes
  • Publishing & onboarding: developer portals, marketplaces, self-service access
  • Monetization: usage metering, billing hooks, tiered plans
  • Analytics: usage patterns, behavior insights, performance dashboards

This is where the gateway gains business context. It knows concepts like customers, products, API keys, and rate-limit tiers. When a mobile client sends a request, the gateway understands: "This is Acme Corp, a premium tier subscriber, allowed 10,000 requests per hour on the /payments API."

Modern platforms such as Kong, AWS API Gateway, Azure API Management, Apigee, and Ambassador all embody this philosophy - combining policy enforcement with full lifecycle and product-style API management.

Service Mesh: Service Connectivity Infrastructure

A Service Mesh has a fundamentally different purpose: providing decoupled infrastructure for service-to-service communication without requiring changes to application code.

Service Meshes offload network functions from services into a dedicated infrastructure layer. They handle concerns like service discovery, load balancing, circuit breaking, retries, and timeouts - all the complexity that developers would otherwise implement (and often implement inconsistently) across services.

Critically, Service Meshes have no business logic. They're purely connectivity and observability infrastructure. A service mesh doesn't know or care whether it's routing a payment transaction or a product catalog query. Every service is treated equally as a network endpoint with routing rules and policies.

This enables polyglot architectures. Your Python services, Go services, and Java services all get the same networking capabilities without embedding client libraries or writing language-specific code. The infrastructure handles it transparently.

The key insight: A Service Mesh is business-agnostic. It operates at the infrastructure layer, understanding concepts like "service instances," "endpoints," "failure rates," and "latency percentiles" - but never "customers," "API products," or "billing tiers."

Popular implementations include Istio, Linkerd, Consul Connect, and AWS App Mesh.

Quick Comparison

| Aspect | API Gateway | Service Mesh |
| --- | --- | --- |
| Primary Purpose | Expose services as managed API products | Decouple service communication infrastructure |
| Context | Business-aware (users, products, billing) | Business-agnostic (endpoints, metrics) |
| Logic | Can contain transformation, aggregation logic | No business logic, pure infrastructure |
| Lifecycle Scope | Full API lifecycle (design → retirement) | Runtime connectivity only |
| Consumer Focus | External developers, partners, clients | Services communicating with each other |

Architecture Deep Dive

Deployment Models

The architectural differences between API Gateways and Service Meshes are stark, and understanding these differences clarifies why each excels at different problems.

API Gateway: Centralized Architecture

An API Gateway deploys as a standalone reverse proxy or clustered front-door, creating a single entry point (or small cluster) for API traffic. It lives in its own architectural layer, distinct from your services.

Here's a simplified view:

External Clients (Mobile, Web, Partners)
              ↓
    ┌─────────────────┐
    │  API Gateway    │ ← Centralized, clustered for HA
    │   (Kong/AWS)    │
    └─────────────────┘
         ↓    ↓    ↓
    ┌────┐ ┌────┐ ┌────┐
    │Svc │ │Svc │ │Svc │
    │ A  │ │ B  │ │ C  │
    └────┘ └────┘ └────┘

Traffic flows through the gateway as a dedicated hop. The gateway terminates external connections, applies policies, performs routing decisions, and forwards requests to backend services. Deployment is relatively straightforward - you provision the gateway infrastructure separately from your services.

Service Mesh: Decentralized Architecture

A Service Mesh deploys in a fundamentally different way: a sidecar proxy alongside every service replica. This is a decentralized, peer-to-peer model.

Service A          Service B          Service C
┌─────────┐        ┌─────────┐        ┌─────────┐
│  App    │        │  App    │        │  App    │
│Container│        │Container│        │Container│
└────┬────┘        └────┬────┘        └────┬────┘
     │                  │                  │
┌────┴────┐        ┌────┴────┐        ┌────┴────┐
│ Envoy   │◄──────►│ Envoy   │◄──────►│ Envoy   │
│ Sidecar │        │ Sidecar │        │ Sidecar │
└─────────┘        └─────────┘        └─────────┘
       ▲                 ▲                 ▲
       └─────────────────┴─────────────────┘
              Control Plane (Istio/Linkerd)
              (Configuration, not traffic)

Each service instance gets its own proxy (typically Envoy). When Service A calls Service B, the request flows: App A → Sidecar A → Sidecar B → App B. The service code itself doesn't know about the mesh - it makes standard HTTP or gRPC calls to localhost, and the sidecar handles everything else.

This deployment model is more invasive. It requires modifying your CI/CD pipelines to inject sidecars, updating Kubernetes manifests (or VM configurations), and managing the lifecycle of proxies alongside applications.

Key Insight: In an API Gateway, traffic converges at a central point. In a Service Mesh, traffic flows peer-to-peer between distributed proxies, with the control plane managing configuration but never touching actual requests.

Control Plane vs Data Plane Architecture

This separation of concerns is crucial for understanding Service Meshes, though it applies (less critically) to some API Gateway implementations.

Service Mesh: Deep Dive into Control and Data Planes

The control plane (examples: Istio's Pilot, Linkerd's Controller, Consul's servers) is the brain of the mesh:

  • Configuration management: Distributes routing rules, traffic policies, and service configurations to all sidecars
  • Service discovery: Maintains a live registry of all service instances and their endpoints
  • Certificate authority: Generates and rotates mTLS certificates for service identity
  • Telemetry aggregation: Collects metrics and traces from data plane proxies
  • Policy enforcement setup: Configures access control rules and rate limits

Critically: the control plane is NOT on the request path. It handles configuration and management but never sees actual user requests. This is fundamental to mesh scalability.

The data plane (examples: Envoy sidecars in Istio, Linkerd2-proxy in Linkerd) does the heavy lifting:

  • Handles actual request traffic: Every request flows through data plane proxies
  • Enforces policies: Implements circuit breakers, retries, timeouts configured by control plane
  • L4/L7 routing and load balancing: Makes real-time routing decisions
  • Security enforcement: Performs mTLS handshakes, validates certificates
  • Telemetry generation: Reports metrics, logs, and traces for observability

Let's make this concrete with service discovery as an example. When Service C scales from 3 to 5 replicas, here's what happens:

  1. Kubernetes (or your orchestrator) starts two new pods with Service C containers and Envoy sidecars
  2. The Envoy sidecars register with the control plane upon startup
  3. The control plane updates its service registry with the two new endpoints
  4. The control plane pushes updated routing configurations to all Envoy sidecars in the mesh
  5. Within seconds, Service A and Service B know about the new Service C instances and start load balancing across all 5 replicas

No DNS propagation delays. No manual configuration updates. No service discovery libraries in application code. The control plane orchestrates everything, while sidecars handle the actual routing.

API Gateway: Simpler Control Plane Model

Some API Gateway implementations (like Kong with its declarative configuration) have control plane concepts, but the separation is less critical. Many gateways bundle control and data plane functions in the same process. Configuration changes might require gateway reloads, and the gateway itself is on the request path - serving as both traffic handler and configuration enforcer.

Organizational and Deployment Challenges

Service Meshes face unique adoption barriers that API Gateways largely avoid:

1. Universal Sidecar Deployment Requirement

To get value from a service mesh, you need sidecars deployed alongside all services you want to manage. This creates organizational friction: it's not something a single team can adopt independently. You need buy-in from every service owner.

2. Shared Control Plane Access

All services must share access to the mesh control plane. This crosses security boundaries - teams that previously had isolated deployments now share infrastructure. Organizations with strict security postures find this challenging.

3. Cannot Control External Services

You can only mesh services you directly control. Third-party APIs, legacy systems outside your infrastructure, and managed services like external databases cannot participate in the mesh. This limits where resilience patterns apply.

4. Certificate Authority Coordination

Services in the same mesh must share a Certificate Authority (CA) for mTLS. This requires cross-team coordination on security policies and trust models. Different teams or products often want separate CAs for isolation - which means separate meshes.

Why This Matters: Service mesh adoption is often limited to team or product boundaries. An API Gateway, deployed as central infrastructure, can span the entire organization much more easily. It doesn't require every team to change their deployment processes.

Now that we understand the architectural differences and deployment realities, let's examine specific capabilities side-by-side.

Capabilities Comparison

Both technologies offer overlapping capabilities, but with different implementations and tradeoffs. Understanding these differences guides architectural decisions.

Service Discovery

  • API Gateway: Uses external service registries (Consul, Eureka, DNS, Kubernetes Services). The gateway queries the registry to find service endpoints, then routes traffic accordingly.
  • Service Mesh: Built-in service discovery via the control plane. The control plane automatically tracks all sidecar-enabled services, maintaining a live registry without external dependencies. When a service scales or moves, the mesh knows immediately.

Authentication and Authorization ⭐

This is perhaps the most important architectural differentiator between the two patterns.

  • API Gateway: Focuses on user and client identity. Validates API keys, OAuth2 tokens, JWT claims. Answers questions like: "Is this mobile app authorized to call the /payments endpoint?" or "Has this partner exceeded their rate limit?" Security is about edge protection - who gets into your system and what they can access.

  • Service Mesh: Focuses on service identity via mTLS certificates. Every service gets a cryptographic identity. Answers questions like: "Is this really the Payment service calling Fraud Detection?" or "Should Order Service be allowed to communicate with User Profile Service?" Security is about Zero-Trust architecture - no service implicitly trusts another.

Load Balancing

  • API Gateway: Server-side load balancing at the gateway layer. The gateway distributes requests across service instances based on configured algorithms (round-robin, least connections, weighted).
  • Service Mesh: Client-side load balancing distributed via sidecars. Each sidecar makes load balancing decisions locally, using health status and latency information from the control plane. This enables more sophisticated strategies like locality-aware routing (prefer same-zone instances).
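To make "client-side" concrete, here is a toy round-robin picker of the kind each sidecar runs locally. The endpoint list is hypothetical; a real sidecar receives it from the control plane and also factors in health and latency.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Toy client-side load balancer: each client (sidecar) picks locally,
// with no central hop involved in the decision.
class RoundRobinBalancer {
public:
    explicit RoundRobinBalancer(std::vector<std::string> endpoints)
        : endpoints_(std::move(endpoints)) {}

    // Return the next endpoint in rotation; thread-safe via the atomic counter.
    const std::string& pick() {
        return endpoints_[next_++ % endpoints_.size()];
    }

private:
    std::vector<std::string> endpoints_;
    std::atomic<std::size_t> next_{0};
};
```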

Rate Limiting

  • API Gateway: Edge-focused, per-client or per-API-key. Limits like "1000 requests per hour for this developer" or "premium tier customers get 10x capacity." Centralized enforcement at the gateway.
  • Service Mesh: Can implement distributed rate limiting to prevent service overload. For example, preventing the Notification Service from overwhelming Email Service with requests, regardless of which client triggered the flow. Enforcement happens at sidecars across the mesh.
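Edge rate limiting is commonly implemented as a token bucket. Here is a toy single-process sketch; real gateways such as Kong keep one bucket per API key or consumer and often coordinate counters across gateway nodes, which this deliberately omits.

```cpp
#include <algorithm>
#include <chrono>

// Toy token bucket: tokens refill at a steady rate up to a burst capacity;
// each admitted request spends one token.
class TokenBucket {
public:
    TokenBucket(double tokens_per_sec, double burst)
        : rate_(tokens_per_sec), burst_(burst), tokens_(burst),
          last_(Clock::now()) {}

    // Returns true if the request is admitted, false if rate-limited.
    bool allow() {
        auto now = Clock::now();
        double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        tokens_ = std::min(burst_, tokens_ + elapsed * rate_);  // refill
        if (tokens_ >= 1.0) {
            tokens_ -= 1.0;   // spend one token on this request
            return true;
        }
        return false;
    }

private:
    using Clock = std::chrono::steady_clock;
    double rate_, burst_, tokens_;
    Clock::time_point last_;
};
```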

Circuit Breakers and Retries

  • API Gateway: Configured at the gateway level to protect against downstream service failures. If Payment Service is down, the gateway can circuit break to avoid cascading failures.
  • Service Mesh: Configured at the control plane, enforced at every sidecar. Each service gets automatic circuit breakers and retries without code changes. When Inventory Service calls Warehouse Service and detects failures, the sidecar automatically circuit breaks - no retry logic in Inventory Service code.
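To make the sidecar behavior concrete, here is a toy circuit breaker capturing only the core idea: after N consecutive failures the circuit opens and calls fail fast for a cooldown period, then a trial request is allowed through. Real proxies such as Envoy use rolling statistics, half-open probes, and outlier detection rather than this simple counter.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>

class CircuitBreaker {
public:
    CircuitBreaker(int failure_threshold, std::chrono::milliseconds open_for)
        : threshold_(failure_threshold), open_for_(open_for) {}

    std::string call(const std::function<std::string()>& request) {
        // Open circuit within the cooldown window: fail fast, don't touch the backend.
        if (open_ && Clock::now() - opened_at_ < open_for_)
            throw std::runtime_error("circuit open: failing fast");
        open_ = false;   // cooldown elapsed: allow a trial request (half-open)
        try {
            std::string result = request();
            failures_ = 0;   // success closes the circuit
            return result;
        } catch (...) {
            if (++failures_ >= threshold_) {
                open_ = true;            // trip the breaker
                opened_at_ = Clock::now();
            }
            throw;
        }
    }

private:
    using Clock = std::chrono::steady_clock;
    int threshold_;
    std::chrono::milliseconds open_for_;
    int failures_ = 0;
    bool open_ = false;
    Clock::time_point opened_at_{};
};
```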

Health Checks

  • API Gateway: Gateway actively probes downstream services for health, removing unhealthy instances from its routing pool.
  • Service Mesh: Sidecars monitor local service health and report to the control plane. Passive health checks based on actual request success rates. Faster reaction to failures because the sidecar sits adjacent to the service.

Observability

  • API Gateway: Edge metrics and API-level analytics. Tracks which APIs are called, by whom, how often, and with what latency. Great for understanding API usage patterns and client behavior.
  • Service Mesh: Deep service-to-service metrics and distributed tracing. Tracks every internal call with detailed latency breakdowns, success rates, and request volumes. Enables debugging complex distributed transactions by tracing requests as they flow through multiple services.

Example: When a user checkout fails, the API Gateway shows the client request hit the /checkout endpoint with a 500 error. The service mesh traces reveal that Order Service → Inventory Service succeeded, but Inventory Service → Warehouse Service timed out after 3 retries - pinpointing the exact failure point.

Protocol Support

  • API Gateway: Primarily HTTP/HTTPS, with increasing support for gRPC, WebSockets, and GraphQL. Focused on application-layer protocols.
  • Service Mesh: Supports both L4 (TCP) and L7 (HTTP, gRPC) protocols. Can handle raw TLS connections, TCP traffic, and any IP-based protocol. Broader protocol range because it operates at the network infrastructure layer.

Chaos Engineering and Defect Simulation

  • API Gateway: Limited capabilities - some gateways allow injecting delays or errors, but it's not a primary feature.
  • Service Mesh: Built-in chaos engineering support. Can inject faults (return 500 errors), add delays (simulate network latency), or abort connections to specific services. Enables testing resilience in production-like conditions. For example, "Make 10% of calls from Order Service to Inventory Service return 503 errors to verify circuit breakers work."


Summary Table

| Capability | API Gateway | Service Mesh |
| --- | --- | --- |
| Service Discovery | External registry (Consul, DNS) | Built-in via control plane |
| Authentication/Authorization | User/client identity (OAuth, API keys) | Service identity (mTLS certificates) |
| Load Balancing | Server-side, centralized | Client-side, distributed |
| Rate Limiting | Per-client/API key at edge | Per-service, distributed |
| Circuit Breakers | At gateway | Distributed, no code changes |
| Health Checks | Gateway probes services | Sidecars monitor local health |
| Observability | Edge metrics, API analytics | Service-to-service tracing |
| Protocols | HTTP/HTTPS, gRPC, WebSockets | L4 + L7 (TCP, HTTP, gRPC, TLS) |
| Chaos Engineering | Limited | Built-in fault injection |

Among these capabilities, mutual TLS deserves special attention because it fundamentally changes how services authenticate and trust each other.

Mutual TLS (mTLS) in Service Mesh

How mTLS Works and Why It Matters

The Mechanism:

When a service mesh is deployed, the control plane includes a Certificate Authority (CA). This CA generates unique, short-lived certificates for every service replica. When Service A's sidecar calls Service B's sidecar, both sides present certificates during the TLS handshake, cryptographically proving their identities.

Here's the flow:

  1. Order Service sidecar initiates connection to Payment Service
  2. Payment sidecar presents certificate: "I am payment.production.svc.cluster"
  3. Order sidecar verifies certificate against the mesh CA
  4. Order sidecar presents its own certificate: "I am order.production.svc.cluster"
  5. Payment sidecar verifies Order's certificate
  6. Encrypted, authenticated connection established

Crucially, sidecars automatically handle certificate rotation. Certificates might rotate every few hours, and services never see this complexity - it's entirely transparent.

The Value:

This eliminates the need for service-level authentication code. Previously, Payment Service might check an API key or JWT token to verify the caller. With mTLS, the infrastructure proves identity cryptographically. Your service code doesn't need to know about authentication - it receives requests that have already been authenticated at the network layer.

Additionally:

  • Encryption by default: All east-west traffic is encrypted, protecting against network sniffing
  • Audit trail: The mesh knows exactly which services communicated with which other services
  • Compliance: Meets requirements for data-in-transit encryption (SOC2, PCI-DSS, HIPAA)

Certificate Authority Boundaries

Services in the same mesh must share a Certificate Authority. This has organizational implications.

Consider a large company with two product teams: Banking and Trading. For security isolation, they want separate Certificate Authorities - Banking services shouldn't trust certificates from Trading services. This means they need two separate service meshes (Mesh A and Mesh B).

But what if Banking needs to expose APIs to Trading? This is where API Gateways complement service meshes. An API Gateway can sit at the boundary between meshes, terminating mTLS from one mesh and re-establishing it in another mesh (or using traditional API authentication). The gateway bridges different trust domains.

mTLS and Zero-Trust Networking

mTLS enables Zero-Trust architecture for internal service communication.

Traditional security followed the "castle and moat" model: strong perimeter defenses, but once inside the network, services implicitly trusted each other. An attacker who breached the perimeter had free access to internal systems.

Zero-Trust rejects this model: never trust, always verify. Every request, even between internal services, requires authentication. No service is trusted by default, regardless of network location.

Service meshes with mTLS implement Zero-Trust for east-west traffic. Even if an attacker deploys a rogue container inside your cluster, it cannot communicate with legitimate services because it lacks valid certificates signed by the mesh CA. Every service must cryptographically prove its identity on every request.
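
To make the mechanics concrete, here is a minimal Python sketch of what "mutual" means at the TLS layer: the server side demands a client certificate, so a workload without a cert signed by the trusted CA cannot even complete the handshake. The certificate paths are hypothetical; in a real mesh the sidecar and control plane manage them automatically.

```python
import ssl

# Server-side context: in a mesh, the inbound sidecar terminates TLS.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Requiring a client certificate is what makes the handshake *mutual*:
# a caller without a certificate signed by the trusted CA is rejected
# before any application code runs.
ctx.verify_mode = ssl.CERT_REQUIRED

# Hypothetical paths - in a real mesh these are short-lived certificates
# minted and rotated automatically by the control plane:
# ctx.load_cert_chain("payment-svc.crt", "payment-svc.key")
# ctx.load_verify_locations("mesh-ca.crt")
```

The key line is `verify_mode = CERT_REQUIRED`: ordinary TLS only authenticates the server, while this setting forces the client to prove its identity too.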

With these capabilities and security models in mind, let's turn to practical decision-making: when should you use each technology?

When to Use Each

There's no one-size-fits-all answer. Choosing between API Gateways and Service Meshes depends on your primary challenge, team maturity, and architectural scale. Let's build a decision framework.

Decision Framework: Use API Gateway When…

Primary Challenge: External Access & Client Management

If you need to expose services to external consumers - developers, partners, customers, mobile apps - choose an API Gateway. It excels at edge security, client authentication (API keys, OAuth2), and managing the full API product lifecycle.

Concrete scenario: You're building a SaaS platform where third-party developers integrate with your product catalog API. You need developer onboarding, API key provisioning, documentation portals, usage analytics, and tiered rate limiting. An API Gateway provides all of this out-of-the-box.

Primary Challenge: Service Abstraction & Evolution

If different products or teams need to communicate with governance, versioning, and backward compatibility, choose an API Gateway. It provides abstraction as underlying services evolve.

Concrete scenario: Your mobile team needs stable APIs while your backend undergoes frequent changes. The API Gateway maintains version 1 and version 2 of the /orders endpoint, routing v1 clients to legacy services and v2 clients to the new architecture. Backend teams can refactor without breaking mobile apps.
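
The routing rule behind that scenario is simple to sketch. This is an illustrative toy, not any particular gateway's API; the upstream service names are invented:

```python
# Minimal version-based router: maps the API version extracted from the
# request path to a backend, so v1 clients keep working while v2 rolls out.
BACKENDS = {
    "v1": "http://orders-legacy:8080",   # hypothetical upstreams
    "v2": "http://orders-v2:8080",
}

def route(path: str) -> str:
    # e.g. "/v1/orders/42" -> version "v1"
    version = path.strip("/").split("/", 1)[0]
    try:
        return BACKENDS[version]
    except KeyError:
        raise ValueError(f"unknown API version: {version!r}")

assert route("/v1/orders/42") == "http://orders-legacy:8080"
assert route("/v2/orders/42") == "http://orders-v2:8080"
```

Real gateways add header-based versioning, default versions, and deprecation policies on top, but the core is exactly this mapping.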

Primary Challenge: Centralized Control & Simplicity

If you're starting your microservices journey and need immediate value with lower operational complexity, choose an API Gateway. Simpler deployment, easier to understand, lower barrier to entry.

Concrete scenario: You're migrating from a monolith to 5–10 microservices. You need request routing, basic rate limiting, and API documentation. A service mesh would be overkill - too much infrastructure overhead for your scale. An API Gateway solves your immediate needs without the operational burden.

Primary Challenge: Edge Security & Rate Limiting

If your main concern is protecting services from external threats and managing API quotas per customer, choose an API Gateway.

Concrete scenario: Your public APIs face potential DDoS attacks, credential stuffing, and abusive clients. The API Gateway implements rate limiting, IP blocking, JWT validation, and anomaly detection at the edge, before traffic reaches your services.
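
Most gateways implement these quotas with some variant of the token-bucket algorithm. A minimal sketch, with per-client state and distributed coordination deliberately omitted:

```python
import time

class TokenBucket:
    """Classic token bucket: a fixed refill rate plus a burst allowance."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Allow a burst of 5, then refill one token per second.
bucket = TokenBucket(rate=1, capacity=5)
results = [bucket.allow() for _ in range(6)]
# The first 5 requests fit the burst; the 6th is throttled.
```

A production gateway keeps one bucket per API key or customer tier and shares the counters across gateway instances, but the admission logic is the same.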

Decision Framework: Use Service Mesh When…

Primary Challenge: Internal Service Reliability

If you have large-scale internal architecture (dozens to hundreds of services) with complex communication patterns, and services need automatic retries, circuit breakers, and timeouts without code changes, choose a Service Mesh.

Concrete scenario: You have 80 microservices across 12 teams. Services frequently fail partially - timeouts, transient errors, network blips. Rather than each team implementing retry logic differently (or not at all), the service mesh provides consistent resilience patterns across all services. When Recommendation Service calls User Profile Service and gets a timeout, the sidecar automatically retries with exponential backoff - no code change needed.
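
What the sidecar does on that timeout can be sketched in a few lines: retries with exponential backoff and jitter. The simulated upstream below is invented for illustration:

```python
import random

def call_with_retries(op, max_attempts=3, base_delay=0.1, sleep=None):
    """Retry a flaky call with exponential backoff plus jitter, roughly
    what a mesh sidecar does transparently for the application."""
    sleep = sleep or (lambda s: None)  # injectable for testability
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base... plus jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))

# Simulated upstream that times out twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"

assert call_with_retries(flaky) == "ok"
assert calls["n"] == 3
```

The point of the mesh is that none of this lives in application code: the policy (attempts, backoff, which status codes are retryable) is configuration applied uniformly to every service.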

Primary Challenge: Polyglot Environments & Code Elimination

If you want to eliminate networking code from services and need uniform connectivity across services written in different languages, choose a Service Mesh.

Concrete scenario: Your platform includes Python ML services, Go APIs, Java batch processors, and Node.js real-time services. Rather than maintaining four different HTTP client libraries with circuit breakers, retries, and observability, the service mesh provides identical capabilities to all services regardless of language. Developers focus on business logic, not networking infrastructure.

Primary Challenge: Security Compliance & Zero-Trust

If security compliance requires mTLS encryption for all internal communication, or you need Zero-Trust architecture with cryptographic service identity, choose a Service Mesh.

Concrete scenario: Rather than configuring TLS in every service's application code, the service mesh provides automatic mTLS between all services. Auditors see consistent encryption policies enforced at the infrastructure layer, dramatically simplifying compliance evidence.

Primary Challenge: Deep Observability & Traffic Control

If you require deep east-west observability and distributed tracing across all services, or need advanced traffic management (canary deployments, traffic splitting, A/B testing) for internal services, choose a Service Mesh.

Concrete scenario: You're rolling out a major refactor of Order Service. You want to send 5% of traffic to the new version, monitor error rates and latency, gradually increase to 50%, then 100%. The service mesh enables this with configuration changes - no deployment changes, no feature flags in code. If error rates spike, you roll back instantly by updating traffic weights.
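
Under the hood, traffic splitting is weighted random selection per request. A toy sketch (the injected randomness makes it deterministic to check):

```python
import random

def pick_backend(weights, rnd=random.random):
    """Weighted traffic split, e.g. {"v1": 95, "v2": 5}.
    A mesh applies the same idea per request via sidecar configuration."""
    total = sum(weights.values())
    r = rnd() * total
    for backend, w in weights.items():
        r -= w
        if r < 0:
            return backend
    return backend  # guard against floating-point edge cases

# Deterministic checks with injected "random" values:
assert pick_backend({"v1": 95, "v2": 5}, rnd=lambda: 0.5) == "v1"
assert pick_backend({"v1": 95, "v2": 5}, rnd=lambda: 0.99) == "v2"
```

Rolling from 5% to 50% to 100% is then just a change to the weights - no redeploys, which is why rollback is instant when error rates spike.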

When NOT to Use Service Mesh

Avoiding Unnecessary Complexity:

Service meshes are powerful but operationally complex. Don't use them if:

  • Small architectures (< 10–15 services): Operational overhead outweighs benefits. You'll spend more time managing the mesh than you save from its features.
  • Team lacks infrastructure expertise: Service meshes have a steep learning curve. If your team struggles with Kubernetes basics, adding a service mesh will slow you down.
  • Cannot deploy sidecars: If you depend on external services, legacy systems you don't control, or third-party SaaS APIs, a service mesh can't manage those connections.
  • Organizational resistance: Service meshes require cross-team adoption. If teams resist sidecar injection or control plane dependencies, forced adoption fails.
  • Ultra-sensitive performance requirements: Sidecars add latency (typically 1–5ms per hop). For ultra-low-latency scenarios where even milliseconds matter, this overhead is unacceptable.
  • Limited operational resources: Service meshes require dedicated platform engineering resources. If you lack staff to manage mesh infrastructure, troubleshoot sidecar issues, and handle certificate rotation problems, don't adopt a mesh.

Decision Matrix: Use Both When…

The Comprehensive Approach:

Many mature architectures use both technologies together, leveraging each for its strengths.

Use both when:

  • You need edge control for external clients (API Gateway) AND in-mesh reliability for internal services (Service Mesh)
  • You want API-as-a-product capabilities (documentation, monetization, developer portals) AND Zero-Trust security internally (mTLS between services)
  • You have a mature platform engineering team capable of managing layered infrastructure

Example decision: "We expose our Payment API to mobile apps and partners via API Gateway - handling JWT validation, per-customer rate limiting, and maintaining a developer portal. Internal communication between Payment Service, Fraud Detection Service, and Notification Service uses a service mesh - providing mTLS encryption, circuit breakers, and distributed tracing. The API Gateway itself runs as a service within the mesh, getting the same resilience and observability benefits."

Real-World Architecture Example

Let's walk through a financial institution scenario that illustrates how both technologies complement each other.

Scenario: Multi-Product Financial Platform

A financial institution has two major products:

  • Banking Platform (account management, transfers, statements)
  • Trading Platform (stock trading, portfolio management, market data)

Each product has its own engineering team, separate deployments, and independent release cycles. Here's how they use both technologies:

Service Mesh Deployment (Two Separate Meshes)

  • Banking Mesh: Covers 25 microservices (Account Service, Transaction Service, Statement Generator, etc.) with its own Certificate Authority for security isolation
  • Trading Mesh: Covers 18 microservices (Order Execution, Portfolio Service, Market Data, etc.) with a separate Certificate Authority

Each mesh provides:

  • mTLS encryption for all internal communication within that product
  • Circuit breakers and retries for resilience
  • Distributed tracing to debug complex transactions
  • Zero-Trust security - no service trusts another by default

API Gateway Deployment (Multiple Gateways)

  • Internal API Gateway: Banking Platform exposes select APIs to Trading Platform (e.g., "Get Account Balance" for margin trading). This gateway sits at the boundary between Banking Mesh and Trading Mesh, bridging different trust domains.
  • Edge API Gateway: Both products expose APIs to mobile applications. This gateway handles:
    • JWT validation for user authentication
    • Rate limiting per user tier (retail vs institutional)
    • API versioning (mobile app v1.2 uses older endpoint, v2.0 uses new schema)
    • Developer portal for partner integrations
    • Analytics on API usage patterns

Multi-Datacenter Deployment

The architecture spans two datacenters (DC1 and DC2) for high availability:

  • Each datacenter has full mesh deployment (Banking Mesh and Trading Mesh)
  • API Gateways in each datacenter for local request handling
  • Cross-datacenter mesh communication uses mTLS across the WAN
  • API Gateway load balancers route users to nearest datacenter

Key Architectural Insights:

This architecture demonstrates several principles:

  • Isolation through separate meshes: Banking and Trading use different CAs, preventing accidental trust relationships
  • API Gateways bridge trust domains: Internal gateway mediates between meshes when cross-product communication is needed
  • Layered security: Edge gateway handles user authentication, mesh handles service authentication
  • Different lifecycle management: API versions can change without mesh reconfiguration; mesh policies can change without API versioning

When a mobile user checks their trading portfolio's buying power, here's the flow:

  1. Mobile app → Edge API Gateway (JWT validation, rate limiting)
  2. Edge API Gateway → Trading Platform's Portfolio Service (via Trading Mesh, with mTLS)
  3. Portfolio Service → Internal API Gateway (requesting account balance from Banking)
  4. Internal API Gateway → Banking Platform's Account Service (via Banking Mesh, with mTLS)
  5. Response flows back through each layer

Each technology layer adds value: the edge gateway protects against external threats and manages API products, while the meshes ensure reliable, secure service-to-service communication.
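
As a toy sketch, the five-step flow can be modeled as nested function calls, each standing in for one layer; the payloads and the margin rule are invented:

```python
# Each function stands in for one layer of the flow:
# edge gateway -> Trading mesh -> internal gateway -> Banking mesh.

def account_service(req):
    # Banking Platform (reached via Banking Mesh, mTLS in the real system)
    return {"balance": 1000}

def internal_gateway(req):
    # Bridges the Trading and Banking trust domains
    return account_service(req)

def portfolio_service(req):
    # Trading Platform (reached via Trading Mesh)
    balance = internal_gateway(req)["balance"]
    return {"buying_power": balance * 2}  # invented margin rule

def edge_gateway(req):
    # JWT validation and rate limiting would happen here in the real system
    return portfolio_service(req)

assert edge_gateway({"user": "alice"})["buying_power"] == 2000
```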

Pros and Cons Summary

Understanding the tradeoffs helps set realistic expectations and plan for operational challenges.

API Gateway

Pros:

  • Standardizes API delivery: Consistent authentication, rate limiting, and versioning across all APIs
  • Simplifies client integration: Single entry point with unified documentation reduces client complexity
  • High flexibility: Can transform requests, aggregate responses, implement complex routing logic
  • Easier adoption: Centralized deployment model requires less organizational coordination
  • Centralized analytics: Single place to monitor API usage, client behavior, and performance trends
  • Legacy integration: Can front legacy systems, providing modern API interfaces to old infrastructure

Cons:

  • Single point of failure risk: Though clustering mitigates this, the gateway remains a critical chokepoint
  • Centralization complexity at scale: As more APIs are added, gateway configuration grows complex
  • Latency introduction: Extra hop adds latency (typically 5–20ms depending on gateway processing)
  • Limited internal visibility: Only sees edge traffic, not service-to-service communication patterns
  • Scaling challenges: While horizontal scaling is possible, it's more complex than distributed architectures

Service Mesh

Pros:

  • Built-in observability: Comprehensive metrics, distributed tracing, and logging without code instrumentation
  • Enhanced security: Automatic mTLS, Zero-Trust architecture, cryptographic service identity
  • Resilience without code: Circuit breakers, retries, timeouts configured centrally, enforced everywhere
  • Fine-grained traffic control: Canary deployments, traffic splitting, A/B testing at infrastructure level
  • Chaos engineering capabilities: Inject faults and delays to test system resilience
  • Abstracts networking from code: Developers focus on business logic, not HTTP clients and retry libraries
  • Language agnostic: Same capabilities for Go, Python, Java, Node.js services

Cons:

  • Steep learning curve: Complex architecture requires dedicated platform engineering expertise
  • Operational complexity: Managing control plane, certificate rotation, sidecar upgrades adds operational burden
  • Latency overhead: Each sidecar hop adds latency; multiple hops compound this
  • Resource overhead: Memory and CPU per sidecar
  • Requires infrastructure maturity: Best suited for Kubernetes environments with GitOps practices
  • Organizational challenges: Requires cross-team adoption and coordination - can't be implemented in isolation
  • Deployment complexity: Sidecar injection, control plane dependencies increase deployment complexity

Conclusion

Let's return to where we started: the pervasive north-south/east-west myth that frames API Gateways and Service Meshes as mutually exclusive technologies defined by traffic direction.

This framing is fundamentally flawed. Both technologies can handle both traffic types. API Gateways can manage internal service-to-service communication through private gateways. Service Meshes can expose external traffic through ingress gateways. The real distinction has nothing to do with where traffic flows.

What actually matters is purpose:

  • API Gateways treat services as products with business context - managing full API lifecycles, understanding users and customers, handling monetization and developer onboarding. They operate at the application edge with business awareness.
  • Service Meshes provide business-agnostic infrastructure for service connectivity - offloading networking concerns from application code, enabling Zero-Trust security through mTLS, and providing deep observability without instrumentation. They operate at the infrastructure layer with no business logic.

Looking forward, both patterns continue to evolve. Service Meshes are simplifying operationally (Linkerd's focus on simplicity, Istio's ambient mesh reducing sidecar overhead). API Gateways are adding mesh-like features (Kong Mesh, Ambassador's service mesh integration). The boundaries blur, but the fundamental purposes remain distinct.

Choose your tools based on the problems they solve, not the traffic patterns they handle. Your architecture - and your team's sanity - will thank you.

Note

Obviously this content has been generated by an LLM, but my approach to writing was the following:

  1. I read topics from various pages out there.
  2. I come across questions/subtopics that I would want to cover.
  3. I add these questions/subtopics and then generate content with the LLM.
  4. I read the LLM-generated content and keep only what I find necessary.

Daily AI & Automation Tech News - November 20, 2025

2025-11-20 09:41:07

Daily AI & Automation Tech News - November 20, 2025

AI news and automation tools continued to accelerate today, with open-source AI products surging on GitHub and fresh debates over how AI should operate in consumer and enterprise contexts. The biggest trend: agentic systems are moving from concept to practical stacks — memory engines, RL frameworks, and code-first agent kits are trending together, signaling a maturing ecosystem for building production-grade AI products.

At the same time, industry news underscores how fast the policy and trust landscape is shifting. The EU is softening parts of its AI and privacy rules, while Microsoft’s Windows AI agent rollout has sparked a new privacy conversation. For product teams and operators, the signal is clear: ship with guardrails, instrument for auditability, and keep privacy-by-design front and center.

Top Products

Curated highlights across AI products and automation tools that stood out for real-world impact and momentum.

TrendRadar (AI Analysis & Monitoring)

  • Category: AI products • Monitoring & Insights
  • Key features: Multi-platform trend aggregation; AI-powered analysis; alerting to Slack/Telegram/Email; fast web deployment; minimal setup
  • Why it matters: Cuts through information overload and turns tech trends into actionable intelligence for founders, analysts, and growth teams.
  • Impact on AI/automation/blockchain: Demonstrates how agentic analysis tools operationalize “AI as research analyst,” making continuous market sense-making feasible for lean teams.
  • Link: https://github.com/sansan0/TrendRadar

Microsoft Call Center AI (Agentic Telephony)

  • Category: Automation tools • Voice Agents
  • Key features: Programmable phone calls via API; inbound/outbound agent; configurable workflows
  • Why it matters: Brings voice-first agents into standard business operations — scheduling, escalations, and customer care.
  • Impact on AI/automation/blockchain: Practical path to automate Tier-1 support; illustrates where hybrid human-in-the-loop remains essential for quality and compliance.
  • Link: https://github.com/microsoft/call-center-ai

GibsonAI Memori (Memory Engine for Agents)

  • Category: AI infrastructure • Memory for LLMs
  • Key features: Open-source memory engine; multi-agent support; designed for retrieval and long-lived context
  • Why it matters: Long-term memory is the backbone of capable agents; shared state unlocks persistent workflows and higher success rates.
  • Impact on AI/automation/blockchain: Raises the ceiling for autonomous systems and complex orchestration.
  • Link: https://github.com/GibsonAI/Memori

Google ADK-Go (Code-First Agent Toolkit)

  • Category: AI products • Developer Tools
  • Key features: Go toolkit for building, evaluating, and deploying sophisticated agents; code-first ergonomics
  • Why it matters: A maturing agent stack needs durable, language-agnostic tooling; Go shops can now adopt agentic patterns natively.
  • Impact on AI/automation/blockchain: Accelerates enterprise-grade agent adoption with predictable performance and deployment hygiene.
  • Link: https://github.com/google/adk-go

Volcano Engine VERL (RL for LLMs)

  • Category: AI infrastructure • Reinforcement Learning
  • Key features: RL frameworks tailored for LLMs; reproducible training; tooling for policy optimization
  • Why it matters: RL is returning to the foreground as teams tune agents for safety, reliability, and business metrics beyond raw accuracy.
  • Impact on AI/automation/blockchain: Improves controllability, offering a path to measurable ROI from agent behavior.
  • Link: https://github.com/volcengine/verl

GitHub Trending

Open-source momentum that reflects where builders are investing time right now. Note the clustering around agent memory, RL, and code-first agent frameworks — strong “production agent” signal.

  • TrendRadar — Stars today: 1,714; Total: 20,588 — AI news aggregation + MCP-based analysis toolkit. Impact: workflow-ready market intelligence for teams.
  • iptv-org/iptv — Stars today: 511; Total: 102,094 — Massive IPTV collection. Impact: not AI-specific, but showcases large, community-led data curation.
  • GibsonAI/Memori — Stars today: 336; Total: 5,363 — Memory engine for LLMs and multi-agent systems. Impact: enables continuity and compounding context.
  • cursor-free-vip — Stars today: 245; Total: 42,783 — Utility around Cursor. Impact: user demand for AI dev tooling and productivity hacks remains high.
  • microsoft/call-center-ai — Stars today: 194; Total: 3,804 — API-accessible voice agent. Impact: concrete automation path for service operations.
  • google/adk-go — Stars today: 127; Total: 4,100 — Go toolkit for agent development. Impact: brings agent adoption to Go-heavy backends.
  • volcengine/verl — Stars today: 90; Total: 16,124 — RL for LLMs. Impact: tuning and guardrails for agent reliability.

Industry News

What’s shaping adoption, policy, and perception across AI and automation.

Microsoft AI leadership responds to Windows AI backlash

  • Category: Industry leadership • Trust & UX
  • Key points: Public debate around background AI agents, data access, and consent. Users want control, transparency, and opt-outs by default.
  • Why it matters: Mainstream adoption hinges on trust. Without clear UX and privacy defaults, even valuable AI features meet resistance.
  • Impact on AI/automation/blockchain: Expect stronger privacy disclosures, granular permissions, and audit logs in consumer OS and productivity suites.

EU eases AI and privacy rules amid criticism

  • Category: Policy • Regulation
  • Key points: Reports indicate the EU is relaxing parts of GDPR and AI rules; critics argue it benefits large platforms disproportionately.
  • Why it matters: Regulatory recalibration could speed up AI product rollout but raises questions about consumer protections and competition.
  • Impact on AI/automation/blockchain: Compliance strategies must remain adaptable; privacy engineering becomes a competitive advantage.

Agentic tools entering daily workflows

  • Category: Future of work • Productivity
  • Key points: Community reports show more developers acting as “managers” of AI agents — orchestrating tasks instead of doing every step manually.
  • Why it matters: Roles shift from pure IC to orchestration and QA. Teams that systematize this change will see outsized output gains.
  • Impact on AI/automation/blockchain: Standard operating procedures will incorporate prompts, evaluation, and agent-runbooks as first-class assets.

Enterprise adoption: voice agents and support automation

  • Category: Customer operations • Automation
  • Key points: Voice agents (e.g., call-center AI) are reaching practical maturity for Tier-1 flows; human escalation remains essential.
  • Why it matters: Cost-to-serve can drop meaningfully without sacrificing CSAT when agents are measured and supervised.
  • Impact on AI/automation/blockchain: Blended AI + human models will define service orgs for the next few years.

Key Insights

  • The agent stack is crystallizing: memory + RL + code-first kits are trending together. That’s a sign of readiness for production, not just demos.
  • Trust is the product: privacy defaults, auditability, and transparent permissions will be “table stakes” for AI products shipping at OS or suite scale.
  • Developer ergonomics matter: Go, Python, and JS ecosystems will each demand native agent frameworks — expect cross-language convergence on patterns.
  • Compliance is becoming a moat: teams that build privacy-by-design and policy-aware telemetry can ship faster across regions.
  • Open-source remains the R&D frontier: the fastest ideas are visible in public repos long before they appear in enterprise suites.

What’s Worth Watching

  • Memory engines like Memori: Persistent state and retrieval are decisive for multi-step agent success — watch for integrations with vector DBs and graph stores.
  • RL frameworks (VERL and peers): Expect more structured reward modeling tied to business KPIs (quality, throughput, safety) rather than generic benchmarks.
  • Code-first agent toolkits (ADK-Go and others): Enterprises will adopt agent patterns where deployment, testing, and observability align with existing SRE practices.
  • Regulatory whiplash: EU vs. US policy trajectories will diverge in specifics but converge on enforceable transparency and opt-in norms.
  • OS-integrated agents: Rollouts will be paced by privacy and control UX; progress will be stepwise, not overnight.

Key Takeaways

  • Prioritize trust by design: ship clear consent, data minimization, and event-level audit logs.
  • Prepare your stack for agents: add memory, evaluation harnesses, and safe action tools; start with narrow, high-ROI use cases.
  • Instrument outcomes, not just accuracy: tie agent rewards to business metrics — quality, latency, containment, and cost.
  • Keep a policy playbook: design for configurable privacy defaults by region to ship faster with less rework.


Subscribe to daily AI news and updates

About the author

W3J Dev is a self-taught AI full-stack developer with expertise in blockchain, DeFi, and AI automation.
Connect: GitHub · LinkedIn

CancellationToken: The Complete Technical Guide for .NET Developers

2025-11-20 09:41:01

C# CancellationToken: The Complete Technical Guide for .NET Developers

A CancellationToken in .NET allows cooperative cancellation of long-running or asynchronous operations.

Instead of force-stopping a thread, .NET lets you signal that work should stop, and the running task checks the token and exits cleanly. This prevents wasted CPU time, improves responsiveness, and avoids unsafe thread aborts.

You create a CancellationTokenSource, pass its Token into your async method, and inside that method you check the token or pass it to cancellable APIs (like Task.Delay or HttpClient).

What is a CancellationToken in C#? Understanding Cooperative Cancellation in .NET

The C# CancellationToken is a struct that propagates notification that operations should be canceled. It's the cornerstone of cooperative cancellation in .NET, enabling graceful termination of async operations, parallel tasks, and long-running processes without forcefully aborting threads.

Table of Contents

  • CancellationToken Architecture & Internals
  • C# CancellationToken Implementation Patterns
  • Advanced CancellationToken Techniques
  • Performance Considerations
  • Production-Ready Examples
  • Integration with async/await

CancellationToken Architecture & Internals

Core Components of C# Cancellation System

The C# CancellationToken system consists of three primary components:

// 1. CancellationTokenSource - The controller
public sealed class CancellationTokenSource : IDisposable
{
    public CancellationToken Token { get; }
    public bool IsCancellationRequested { get; }
    public void Cancel();
    public void Cancel(bool throwOnFirstException);
    public void CancelAfter(TimeSpan delay);
    public void CancelAfter(int millisecondsDelay);
}

// 2. CancellationToken - The signal carrier (struct)
public readonly struct CancellationToken
{
    public static CancellationToken None { get; }
    public bool IsCancellationRequested { get; }
    public bool CanBeCanceled { get; }
    public WaitHandle WaitHandle { get; }

    public CancellationTokenRegistration Register(Action callback);
    public CancellationTokenRegistration Register(Action<object?> callback, object? state);
    public void ThrowIfCancellationRequested();
}

// 3. CancellationTokenRegistration - Callback management
public readonly struct CancellationTokenRegistration : IDisposable, IAsyncDisposable, IEquatable<CancellationTokenRegistration>
{
    public CancellationToken Token { get; }
    public void Dispose();
    public ValueTask DisposeAsync();
    public bool Unregister();
}

Memory and Threading Model

The CancellationToken in C# uses a lock-free implementation for performance:

public class CustomCancellationAnalyzer
{
    // Demonstrates internal callback registration mechanics
    public static void AnalyzeTokenInternals(CancellationToken token)
    {
        // Token is a value type (struct) - cheap to copy.
        // Note: Marshal.SizeOf would throw here because CancellationToken
        // holds an object reference; Unsafe.SizeOf reports the managed size.
        var tokenSize = System.Runtime.CompilerServices.Unsafe.SizeOf<CancellationToken>(); // 8 bytes on 64-bit

        // Registration uses volatile fields internally
        var registration = token.Register(() => 
        {
            // Callbacks execute on the thread that calls Cancel()
            var threadId = Thread.CurrentThread.ManagedThreadId;
            Console.WriteLine($"Cancelled on thread: {threadId}");
        });

        // Registrations are stored in a lock-free linked list
        // Multiple registrations scale O(n) for execution
    }
}

C# CancellationToken Implementation Patterns

Pattern 1: Timeout-Based Cancellation

public class TimeoutService
{
    private readonly ILogger<TimeoutService> _logger;

    public TimeoutService(ILogger<TimeoutService> logger) => _logger = logger;

    public async Task<T> ExecuteWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken externalToken = default)
    {
        // Link external cancellation with timeout
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(externalToken);
        cts.CancelAfter(timeout);

        try
        {
            return await operation(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested && !externalToken.IsCancellationRequested)
        {
            throw new TimeoutException($"Operation timed out after {timeout.TotalSeconds} seconds");
        }
    }
}

Pattern 2: Hierarchical Cancellation with C# CancellationToken

public class HierarchicalTaskManager
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _taskSources = new();

    public async Task ExecuteHierarchicalTasksAsync(CancellationToken parentToken)
    {
        // Parent cancellation token
        using var parentCts = CancellationTokenSource.CreateLinkedTokenSource(parentToken);

        // Create child tasks with linked tokens
        var childTasks = Enumerable.Range(0, 10).Select(async i =>
        {
            var childId = Guid.NewGuid();
            using var childCts = CancellationTokenSource.CreateLinkedTokenSource(parentCts.Token);
            _taskSources[childId] = childCts;

            try
            {
                await ProcessChildTaskAsync(i, childCts.Token);
            }
            finally
            {
                _taskSources.TryRemove(childId, out _);
            }
        });

        await Task.WhenAll(childTasks);
    }

    private async Task ProcessChildTaskAsync(int id, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Check cancellation at computation boundaries
            token.ThrowIfCancellationRequested();

            // Simulate work with cancellation support
            await Task.Delay(100, token);

            // CPU-bound work with periodic checks
            for (int i = 0; i < 1000000; i++)
            {
                if (i % 10000 == 0)
                    token.ThrowIfCancellationRequested();
                // Process...
            }
        }
    }
}

Pattern 3: Polling vs Callback Registration

public class CancellationStrategies
{
    // Strategy 1: Polling (suitable for tight loops)
    public async Task PollingStrategyAsync(CancellationToken token)
    {
        var buffer = new byte[4096];
        var processedBytes = 0L;

        while (!token.IsCancellationRequested)
        {
            // Process buffer
            await ProcessBufferAsync(buffer);
            processedBytes += buffer.Length;

            // Check every N iterations to reduce overhead
            if (processedBytes % (1024 * 1024) == 0)
            {
                token.ThrowIfCancellationRequested();
            }
        }
    }

    // Strategy 2: Callback registration (suitable for wait operations)
    public async Task<T> CallbackStrategyAsync<T>(
        TaskCompletionSource<T> tcs,
        CancellationToken token)
    {
        // Register callback to cancel the TCS
        using var registration = token.Register(() =>
        {
            tcs.TrySetCanceled(token);
        });

        return await tcs.Task;
    }

    // Strategy 3: WaitHandle for interop scenarios
    public void InteropStrategy(CancellationToken token)
    {
        var waitHandles = new[] 
        { 
            token.WaitHandle,
            GetLegacyWaitHandle() 
        };

        var signaledIndex = WaitHandle.WaitAny(waitHandles, TimeSpan.FromSeconds(30));

        if (signaledIndex == 0)
            throw new OperationCanceledException(token);
    }

    private WaitHandle GetLegacyWaitHandle() => new ManualResetEvent(false);
    private Task ProcessBufferAsync(byte[] buffer) => Task.CompletedTask;
}

Advanced CancellationToken Techniques

Custom Cancellation Sources

public class CustomCancellationSource : IDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private readonly Timer _inactivityTimer;
    private DateTime _lastActivity = DateTime.UtcNow;
    private readonly TimeSpan _inactivityTimeout;

    public CustomCancellationSource(TimeSpan inactivityTimeout)
    {
        _inactivityTimeout = inactivityTimeout;
        _inactivityTimer = new Timer(CheckInactivity, null, 
            TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
    }

    public CancellationToken Token => _cts.Token;

    public void RecordActivity()
    {
        _lastActivity = DateTime.UtcNow;
    }

    private void CheckInactivity(object? state)
    {
        if (DateTime.UtcNow - _lastActivity > _inactivityTimeout)
        {
            _cts.Cancel();
            _inactivityTimer?.Dispose();
        }
    }

    public void Dispose()
    {
        _inactivityTimer?.Dispose();
        _cts?.Dispose();
    }
}

C# CancellationToken with Channels and Dataflow

public class DataflowCancellationExample
{
    public async Task ProcessDataflowPipelineAsync(CancellationToken token)
    {
        // Create channel with cancellation
        var channel = Channel.CreateUnbounded<DataItem>(
            new UnboundedChannelOptions
            {
                SingleReader = false,
                SingleWriter = false
            });

        // Producer with cancellation
        var producer = Task.Run(async () =>
        {
            try
            {
                await foreach (var item in GetDataStreamAsync(token))
                {
                    await channel.Writer.WriteAsync(item, token);
                }
            }
            finally
            {
                channel.Writer.Complete();
            }
        }, token);

        // Multiple consumers with cancellation
        var consumers = Enumerable.Range(0, 4).Select(id =>
            Task.Run(async () =>
            {
                await foreach (var item in channel.Reader.ReadAllAsync(token))
                {
                    await ProcessItemAsync(item, token);
                }
            }, token)
        ).ToArray();

        // Graceful shutdown on cancellation; keep the registration so it is
        // disposed when the pipeline completes
        using var shutdownRegistration = token.Register(() =>
        {
            channel.Writer.TryComplete();
        });

        await Task.WhenAll(consumers.Append(producer));
    }

    private async IAsyncEnumerable<DataItem> GetDataStreamAsync(
        [EnumeratorCancellation] CancellationToken token = default)
    {
        while (!token.IsCancellationRequested)
        {
            yield return await FetchNextItemAsync(token);
        }
    }

    private Task<DataItem> FetchNextItemAsync(CancellationToken token) 
        => Task.FromResult(new DataItem());

    private Task ProcessItemAsync(DataItem item, CancellationToken token) 
        => Task.CompletedTask;

    private record DataItem;
}

Performance Considerations

Benchmarking C# CancellationToken Overhead

[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net80)]
public class CancellationTokenBenchmarks
{
    private CancellationTokenSource _cts = new();
    private CancellationToken _token;

    [GlobalSetup]
    public void Setup()
    {
        _token = _cts.Token;
    }

    [Benchmark(Baseline = true)]
    public async Task WithoutCancellation()
    {
        for (int i = 0; i < 1000; i++)
        {
            await Task.Yield();
        }
    }

    [Benchmark]
    public async Task WithCancellationPolling()
    {
        for (int i = 0; i < 1000; i++)
        {
            _token.ThrowIfCancellationRequested();
            await Task.Yield();
        }
    }

    [Benchmark]
    public async Task WithCancellationPropagation()
    {
        for (int i = 0; i < 1000; i++)
        {
            await Task.Delay(0, _token);
        }
    }
}

// Typical results:
// | Method                      | Mean     | Error   | StdDev  | Ratio | Gen0   | Allocated |
// |---------------------------- |---------:|--------:|--------:|------:|-------:|----------:|
// | WithoutCancellation         | 15.23 ms | 0.12 ms | 0.11 ms |  1.00 | 1000.0 |   3.81 MB |
// | WithCancellationPolling     | 15.45 ms | 0.09 ms | 0.08 ms |  1.01 | 1000.0 |   3.81 MB |
// | WithCancellationPropagation | 16.89 ms | 0.14 ms | 0.12 ms |  1.11 | 1015.0 |   3.87 MB |

Optimization Techniques for C# CancellationToken

public class OptimizedCancellationHandling
{
    // 1. Batch cancellation checks in tight loops
    public void ProcessLargeDataset(byte[] data, CancellationToken token)
    {
        const int CheckInterval = 1024; // Check every 1KB

        for (int i = 0; i < data.Length; i++)
        {
            // Process byte
            data[i] = (byte)(data[i] ^ 0xFF);

            // Check cancellation at intervals
            if ((i & (CheckInterval - 1)) == 0)
            {
                token.ThrowIfCancellationRequested();
            }
        }
    }

    // 2. Use ValueTask for high-frequency async operations
    public async ValueTask<int> ReadWithCancellationAsync(
        Stream stream, 
        Memory<byte> buffer, 
        CancellationToken token)
    {
        // ValueTask reduces allocations for synchronous completions
        var readTask = stream.ReadAsync(buffer, token);

        if (readTask.IsCompletedSuccessfully)
            return readTask.Result;

        return await readTask.ConfigureAwait(false);
    }

    // 3. Avoid creating unnecessary linked sources
    private readonly ObjectPool<CancellationTokenSource> _ctsPool = 
        new DefaultObjectPool<CancellationTokenSource>(
            new DefaultPooledObjectPolicy<CancellationTokenSource>());

    public async Task ExecutePooledAsync(CancellationToken token)
    {
        var cts = _ctsPool.Get();
        try
        {
            // Reuse pooled CTS
            using var linked = CancellationTokenSource
                .CreateLinkedTokenSource(token, cts.Token);

            await DoWorkAsync(linked.Token);
        }
        finally
        {
            // A canceled CTS cannot be reused; only return it to the pool
            // when it can be reset (.NET 6+), otherwise dispose it
            if (cts.TryReset())
                _ctsPool.Return(cts);
            else
                cts.Dispose();
        }
    }

    private Task DoWorkAsync(CancellationToken token) => Task.CompletedTask;
}

Production-Ready Examples

Web API with Request Cancellation

[ApiController]
[Route("api/[controller]")]
public class DataProcessingController : ControllerBase
{
    private readonly IDataService _dataService;
    private readonly ILogger<DataProcessingController> _logger;

    [HttpPost("process")]
    [RequestSizeLimit(100_000_000)] // 100MB limit
    [RequestTimeout(300_000)] // 5 minutes
    public async Task<IActionResult> ProcessLargeDataset(
        [FromBody] ProcessingRequest request,
        CancellationToken cancellationToken) // Automatically bound to request abort
    {
        try
        {
            // Link request cancellation with custom timeout
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(
                cancellationToken,
                HttpContext.RequestAborted);

            cts.CancelAfter(TimeSpan.FromMinutes(5));

            var result = await _dataService.ProcessAsync(
                request.Data,
                cts.Token);

            return Ok(new ProcessingResponse
            {
                ProcessedItems = result.ItemCount,
                Duration = result.Duration
            });
        }
        catch (OperationCanceledException) when (HttpContext.RequestAborted.IsCancellationRequested)
        {
            _logger.LogWarning("Client disconnected during processing");
            return StatusCode(499); // Client Closed Request
        }
        catch (OperationCanceledException)
        {
            _logger.LogWarning("Processing timeout exceeded");
            return StatusCode(408); // Request Timeout
        }
    }
}

Background Service with Graceful Shutdown

public class QueueProcessorService : BackgroundService
{
    private readonly IServiceProvider _serviceProvider;
    private readonly ILogger<QueueProcessorService> _logger;
    private readonly Channel<WorkItem> _queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Create workers with cancellation support
        var workers = Enumerable.Range(0, Environment.ProcessorCount)
            .Select(id => ProcessQueueAsync(id, stoppingToken))
            .ToArray();

        // Register graceful shutdown
        stoppingToken.Register(() =>
        {
            _logger.LogInformation("Shutdown signal received, completing remaining work...");
            _queue.Writer.TryComplete();
        });

        await Task.WhenAll(workers);
        _logger.LogInformation("All workers completed");
    }

    private async Task ProcessQueueAsync(int workerId, CancellationToken stoppingToken)
    {
        await foreach (var item in _queue.Reader.ReadAllAsync(stoppingToken))
        {
            using var scope = _serviceProvider.CreateScope();
            var processor = scope.ServiceProvider.GetRequiredService<IWorkItemProcessor>();

            try
            {
                // Process with timeout per item
                using var itemCts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
                itemCts.CancelAfter(TimeSpan.FromMinutes(1));

                await processor.ProcessAsync(item, itemCts.Token);
            }
            catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
            {
                // Graceful shutdown - requeue item
                await RequeueItemAsync(item);
                throw;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Worker {WorkerId} failed processing item {ItemId}", 
                    workerId, item.Id);
            }
        }
    }

    private Task RequeueItemAsync(WorkItem item) => Task.CompletedTask;
}

Integration with async/await

Comprehensive async/await with C# CancellationToken

public class AsyncCancellationIntegration
{
    // Proper exception handling with cancellation
    public async Task<T> ExecuteWithRetryAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxRetries = 3,
        CancellationToken cancellationToken = default)
    {
        var exceptions = new List<Exception>();

        for (int attempt = 0; attempt <= maxRetries; attempt++)
        {
            try
            {
                // Check cancellation before each attempt
                cancellationToken.ThrowIfCancellationRequested();

                return await operation(cancellationToken).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                // Don't retry on cancellation
                throw;
            }
            catch (Exception ex) when (attempt < maxRetries)
            {
                exceptions.Add(ex);

                // Exponential backoff with cancellation
                var delay = TimeSpan.FromMilliseconds(Math.Pow(2, attempt) * 100);
                await Task.Delay(delay, cancellationToken);
            }
        }

        throw new AggregateException(
            $"Operation failed after {maxRetries + 1} attempts",
            exceptions);
    }

    // Parallel async operations with cancellation
    public async Task<IReadOnlyList<T>> ProcessParallelAsync<T>(
        IEnumerable<Func<CancellationToken, Task<T>>> operations,
        int maxConcurrency = 10,
        CancellationToken cancellationToken = default)
    {
        using var semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var results = new ConcurrentBag<T>();

        var tasks = operations.Select(async operation =>
        {
            await semaphore.WaitAsync(cancellationToken);
            try
            {
                var result = await operation(cancellationToken);
                results.Add(result);
            }
            finally
            {
                semaphore.Release();
            }
        });

        await Task.WhenAll(tasks);
        return results.ToList();
    }
}

Stream Processing with C# CancellationToken

public class StreamProcessor
{
    public async IAsyncEnumerable<ProcessedChunk> ProcessStreamAsync(
        Stream input,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var buffer = new byte[4096];
        var position = 0L;

        while (true)
        {
            // Read with cancellation
            var bytesRead = await input.ReadAsync(
                buffer.AsMemory(0, buffer.Length), 
                cancellationToken);

            if (bytesRead == 0)
                break;

            // Process chunk
            var processed = await ProcessChunkAsync(
                buffer.AsMemory(0, bytesRead), 
                position, 
                cancellationToken);

            position += bytesRead;

            yield return processed;

            // Allow cancellation between chunks
            cancellationToken.ThrowIfCancellationRequested();
        }
    }

    private async Task<ProcessedChunk> ProcessChunkAsync(
        ReadOnlyMemory<byte> data, 
        long position, 
        CancellationToken cancellationToken)
    {
        // Simulate async processing with cancellation support
        await Task.Delay(10, cancellationToken);

        return new ProcessedChunk
        {
            Position = position,
            Size = data.Length,
            Checksum = CalculateChecksum(data.Span)
        };
    }

    private uint CalculateChecksum(ReadOnlySpan<byte> data)
    {
        uint checksum = 0;
        foreach (var b in data)
            checksum = (checksum << 1) ^ b;
        return checksum;
    }

    public record ProcessedChunk
    {
        public long Position { get; init; }
        public int Size { get; init; }
        public uint Checksum { get; init; }
    }
}

Best Practices Summary

Do's for C# CancellationToken:

  1. Always propagate tokens through your entire async call chain
  2. Check cancellation at boundaries between logical operations
  3. Use linked tokens for combining multiple cancellation sources
  4. Dispose CancellationTokenSource when done (implements IDisposable)
  5. Handle OperationCanceledException separately from other exceptions
  6. Use ConfigureAwait(false) in library code
  7. Pass CancellationToken.None explicitly when cancellation isn't supported
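Several of the do's above can be combined into a single helper. The sketch below is illustrative (the name RunWithLinkedTimeoutAsync and the operation delegate are assumptions, not part of any library): it propagates the token through the call chain, links it with a timeout source, disposes the CancellationTokenSource, uses ConfigureAwait(false), and handles OperationCanceledException separately depending on who requested cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDos
{
    // Illustrative helper combining the do's: propagate the token, link it
    // with a timeout source, dispose the CTS, and distinguish caller
    // cancellation from timeout when handling OperationCanceledException.
    public static async Task<string> RunWithLinkedTimeoutAsync(
        Func<CancellationToken, Task<string>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        // Linked token: cancels when either the caller cancels or the timeout fires
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);

        try
        {
            // Propagate the linked token down the call chain;
            // ConfigureAwait(false) because this is library-style code
            return await operation(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (callerToken.IsCancellationRequested)
        {
            throw; // the caller asked to stop: let cancellation flow upward
        }
        catch (OperationCanceledException)
        {
            // Only the timeout fired, not the caller's intent
            throw new TimeoutException($"Operation exceeded {timeout}.");
        }
    }
}
```

Distinguishing caller cancellation from timeout with the exception filter is the same pattern the Web API controller above uses to return 499 versus 408.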

Don'ts for C# CancellationToken:

  1. Don't ignore cancellation requests - check regularly in long-running operations
  2. Don't catch OperationCanceledException without rethrowing (unless intentional)
  3. Don't create tokens for trivial operations - overhead may exceed benefit
  4. Don't use Thread.Abort() - use CancellationToken instead
  5. Don't forget to test cancellation paths - they're often undertested
  6. Don't pass tokens to operations that complete instantly
  7. Don't create multiple CancellationTokenSource instances when one suffices
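Don't #2 is worth seeing side by side. A minimal sketch (method names are illustrative): the first method swallows OperationCanceledException and returns a sentinel value, so callers can no longer tell a canceled call from a completed one; the second lets the exception propagate, so awaiting callers observe the task as canceled.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDonts
{
    // Anti-pattern (don't #2): catching OperationCanceledException and
    // returning a default value hides cancellation from every caller
    public static async Task<int> SwallowedCancellationAsync(CancellationToken token)
    {
        try
        {
            await Task.Delay(1000, token);
            return 42;
        }
        catch (OperationCanceledException)
        {
            return -1; // BAD: "canceled" is now indistinguishable from "computed -1"
        }
    }

    // Correct: let OperationCanceledException propagate (or rethrow after
    // cleanup), so the task completes in the Canceled state
    public static async Task<int> PropagatedCancellationAsync(CancellationToken token)
    {
        await Task.Delay(1000, token); // throws TaskCanceledException on cancellation
        return 42;
    }
}
```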

Conclusion

The C# CancellationToken is essential for building responsive, scalable .NET applications. By mastering cooperative cancellation patterns, you ensure graceful shutdown, prevent resource leaks, and maintain application responsiveness. Whether building web APIs, background services, or desktop applications, proper CancellationToken usage is critical for production-quality C# code.

Additional Resources

Want to handle documents with proper cancellation support? Check out IronPDF's blog for async PDF generation with full CancellationToken integration in C#.

Author Bio:

Jacob Mellor is the Chief Technology Officer and founding engineer of Iron Software, leading the development of the Iron Suite of .NET libraries with millions of NuGet installations worldwide.

With 41 years of programming experience (he learned 8-bit assembly and BASIC as a young child), he architects enterprise document processing solutions used in infrastructure people rely on every day, including systems at NASA and Tesla, with comprehensive support at the highest levels for the Australian, US, and UK governments.

Currently spearheading Iron Software 2.0's migration to C#/Rust/WebAssembly/TSP for universal language support, Jacob is passionate about AI-assisted development and building developer-friendly tools. Learn more about his work at Iron Software and follow his open-source contributions on GitHub.