
SeaTunnel CDC Explained: A Layman’s Guide

2026-01-18 01:00:11

Based on recent production experience using SeaTunnel CDC (Change Data Capture) to synchronize databases such as Oracle, MySQL, and SQL Server, combined with feedback from a wide range of users, I have written this article to help you understand how SeaTunnel implements CDC. The content mainly covers the three stages of CDC: Snapshot, Backfill, and Incremental.

The Three Stages of CDC

The overall CDC data reading process can be broken down into three major stages:

  1. Snapshot (Full Load)
  2. Backfill
  3. Incremental

1. Snapshot Stage

The meaning of the Snapshot stage is very intuitive: take a snapshot of the current database table data and perform a full table scan via JDBC.

Taking MySQL as an example, the current binlog position is recorded during the snapshot:

SHOW MASTER STATUS;

| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
|----|----|----|----|----|
| binlog.000011 | 1001373553 | | | |

SeaTunnel records the File and Position as the low watermark.

Note: This is not executed just once, because SeaTunnel has implemented its own split-cutting logic to accelerate snapshots; each split records its own watermarks, as described below.


MySQL Snapshot Splitting Mechanism (Split)

Assuming the global parallelism is 10:

  • SeaTunnel will first analyze all tables and their primary key/unique key ranges and select an appropriate splitting column.

  • It splits based on the maximum and minimum values of this column, with a default of snapshot.split.size = 8096.

  • Large tables may be cut into hundreds of Splits, which are allocated to 10 parallel channels by the enumerator according to the order of subtask requests (tending toward a balanced distribution overall).

Table-level sequential processing (schematic):

// Processing sequence:
// 1. Table1 -> Generate [Table1-Split0, Table1-Split1, Table1-Split2]
// 2. Table2 -> Generate [Table2-Split0, Table2-Split1]
// 3. Table3 -> Generate [Table3-Split0, Table3-Split1, Table3-Split2, Table3-Split3]

Split-level parallel allocation:

// Allocation to different subtasks:
// Subtask 0: [Table1-Split0, Table2-Split1, Table3-Split2]
// Subtask 1: [Table1-Split1, Table3-Split0, Table3-Split3]
// Subtask 2: [Table1-Split2, Table2-Split0, Table3-Split1]

Each Split is actually a query with a range condition, for example:

SELECT * FROM user_orders WHERE order_id >= 1 AND order_id < 10001;
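
For intuition, here is a minimal Python sketch of how evenly sized split ranges could be derived from the split column's minimum and maximum values and snapshot.split.size. The function and table names are illustrative only, not SeaTunnel's actual implementation:

# Illustrative only: derive range-query splits from a numeric split column.
def generate_splits(table, column, min_key, max_key, split_size=8096):
    splits = []
    lower, index = min_key, 0
    while lower <= max_key:
        upper = lower + split_size
        splits.append({
            "id": f"{table}-Split{index}",
            "query": f"SELECT * FROM {table} WHERE {column} >= {lower} AND {column} < {upper}",
        })
        lower, index = upper, index + 1
    return splits

# A table keyed 1..20000 with the default split size produces 3 splits.
print(len(generate_splits("user_orders", "order_id", 1, 20000)))  # 3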

Crucial: Each Split separately records its own low watermark/high watermark.

Practical Advice: Do not make the split_size too small; having too many Splits is not necessarily faster, and the scheduling and memory overhead will be very large.

2. Backfill Stage

Why is Backfill needed? Imagine you are performing a full snapshot of a table that is being frequently written to. When you read the 100th row, the data in the 1st row may have already been modified. If you only read the snapshot, the data you hold when you finish reading is actually "inconsistent" (part is old, part is new).

The role of Backfill is to compensate for the "data changes that occurred during the snapshot" so that the data is eventually consistent.

The behavior of this stage mainly depends on the configuration of the exactly_once parameter.

2.1 Simple Mode (exactly_once = false)

This is the default mode; the logic is relatively simple and direct, and it does not require memory caching:

  • Direct Snapshot Emission: Reads snapshot data and sends it directly downstream without entering a cache.
  • Direct Log Emission: Reads Binlog at the same time and sends it directly downstream.
  • Eventual Consistency: Although there will be duplicates along the way (the old snapshot value may be sent first, then the newer change), as long as the downstream supports idempotent writes (like MySQL's REPLACE INTO), the final result is consistent.
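
To see why idempotent writes make the simple mode eventually consistent, here is a minimal Python sketch of an upsert-style sink (mimicking REPLACE INTO semantics); the row shape is illustrative. Applying the old snapshot row and then the newer binlog row, even with replays, converges to the newer value:

# Illustrative only: an upsert-style sink keyed by primary key.
sink = {}

def upsert(row):
    sink[row["id"]] = row["name"]

upsert({"id": 1, "name": "old value from snapshot"})
upsert({"id": 1, "name": "new value from binlog"})  # the newer change arrives later
upsert({"id": 1, "name": "new value from binlog"})  # a replayed duplicate is harmless
assert sink[1] == "new value from binlog"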

2.2 Exactly-Once Mode (exactly_once = true)

This is the most impressive part of SeaTunnel CDC, and it is the secret to guaranteeing that data is "never lost, never repeated." It introduces a memory buffer (Buffer) for deduplication.

Simple Explanation: Imagine the teacher asks you to count how many people are in the class right now (Snapshot stage). However, the students in the class are very mischievous; while you are counting, people are running in and out (data changes). If you just count with your head down, the result will definitely be inaccurate when you finish.

SeaTunnel does it like this:

  1. Take a Photo First (Snapshot): Count the number of people in the class first and record it in a small notebook (memory buffer); don't tell the principal (downstream) yet.
  2. Watch the Surveillance (Backfill): Retrieve the surveillance video (Binlog log) for the period you were counting.
  3. Correct the Records (Merge):
  • If the surveillance shows someone just came in, but you didn't count them -> add them.
  • If the surveillance shows someone just ran out, but you counted them in -> cross them out.
  • If the surveillance shows someone changed their clothes -> change the record to the new clothes.
  4. Submit Homework (Send): After correction, the small notebook in your hand is a perfectly accurate list; now hand it to the principal.
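
The classroom analogy maps onto a per-split merge. Here is a minimal Python sketch of the idea, assuming a simplified event shape (an op plus a row); it is a conceptual illustration, not SeaTunnel's actual buffer code:

# Illustrative only: buffer the snapshot rows, replay the backfill events
# between the split's low and high watermark, then emit the corrected result.
def merge_split(snapshot_rows, backfill_events):
    buffer = {row["id"]: row for row in snapshot_rows}   # the small notebook
    for event in backfill_events:                        # the surveillance video
        if event["op"] in ("INSERT", "UPDATE"):
            buffer[event["row"]["id"]] = event["row"]    # add or correct
        elif event["op"] == "DELETE":
            buffer.pop(event["row"]["id"], None)         # cross out
    return list(buffer.values())                         # hand it to the principal

rows = merge_split(
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    [{"op": "UPDATE", "row": {"id": 1, "name": "a2"}},
     {"op": "DELETE", "row": {"id": 2}}],
)
assert rows == [{"id": 1, "name": "a2"}]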

Summary for Beginners: exactly_once = true means "hold it in and don't send it until it's clearly verified."

  • Benefit: The data received downstream is absolutely clean, without duplicates or disorder.
  • Cost: Because it must be "held in," it needs to consume some memory to store the data. If the table is particularly large, memory might be insufficient.

2.3 Two Key Questions and Answers

Q1: Why is case READ: throw Exception written in the code? Why aren't there READ events during the Backfill stage?

  • The READ event is defined by SeaTunnel itself, specifically to represent "stock data read from the snapshot."
  • The Backfill stage reads the database's Binlog. Binlog only records "additions, deletions, and modifications" (INSERT/UPDATE/DELETE) and never records "someone queried a piece of data."
  • Therefore, if you read a READ event during the Backfill stage, it means the code logic is confused.

Q2: If it's placed in memory, can the memory hold it? Will it OOM?

  • It's not putting the whole table into memory: SeaTunnel processes by splits.
  • Splits are small: A default split has only 8096 rows of data.
  • Throw away after use: After processing a split, send it, clear the memory, and process the next one.
  • Memory occupancy ≈ Parallelism × Split size × Single-row size. For example, 10 parallel channels × 8,096 rows per split × roughly 1 KB per row is only about 80 MB of buffered data at peak.

2.4 Key Detail: Watermark Alignment Between Multiple Splits

This is a very hidden but extremely important issue. If not handled well, it will lead to data being either lost or repeated.

Plain Language Explanation: The Fast/Slow Runner Problem: Imagine two students (Split A and Split B) are copying homework (Backfill data).

  • Student A (fast): Copied to page 100 and finished at 10:00.
  • Student B (slow): Copied to page 200 and just finished at 10:05.

Now, the teacher (Incremental task) needs to continue teaching a new lesson (reading Binlog) from where they finished copying. Where should the teacher start?


  • If starting from page 200: Student B is connected, but the content Student A missed between pages 100 and 200 (what happened between 10:00 and 10:05) is completely lost.
  • If starting from page 100: Student A is connected, but Student B will complain: "Teacher, I already copied the content from page 100 to 200!" This leads to repetition.

SeaTunnel's Solution: start from the earliest point, and cover your ears for what you've already heard. SeaTunnel adopts a "Minimum Watermark Starting Point + Dynamic Filtering" strategy:

  1. Determine the Start (care for the slow one): The teacher decides to start from page 100 (the minimum watermark among all splits).
  2. Dynamic Filtering (don't listen to what's been heard): While the teacher is lecturing (reading Binlog), they hold a list: { A: 100, B: 200 }.
  • When the teacher reaches page 150:
  • Look at the list; is it for A? 150 > 100, A hasn't heard it, record it (send).
  • Look at the list; is it for B? 150 < 200, B already copied it, skip it directly (discard).
  3. Full Speed Mode (everyone has finished hearing): When the teacher reaches page 201 and finds everyone has already heard it, they no longer need the list.
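
Here is a minimal Python sketch of the "minimum watermark starting point + dynamic filtering" idea, using the page numbers from the analogy as binlog offsets; the data structures are illustrative, not SeaTunnel's internals:

# Illustrative only: each split records the offset where its backfill ended.
watermarks = {("A", range(0, 101)): 100,    # split A (keys 0-100) finished at offset 100
              ("B", range(101, 201)): 200}  # split B (keys 101-200) finished at offset 200

start_offset = min(watermarks.values())      # start reading the binlog from offset 100

def should_emit(offset, key):
    for (_, key_range), high_watermark in watermarks.items():
        if key in key_range:
            return offset >= high_watermark  # emit only if that split has not seen it
    return True                              # key outside all snapshot splits: always emit

assert should_emit(150, key=50) is True      # split A: 150 >= 100 -> send
assert should_emit(150, key=150) is False    # split B: 150 < 200 -> discard duplicate
assert should_emit(201, key=150) is True     # past every watermark -> full speed mode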


Summary in one sentence: with exactly_once, the incremental stage strictly filters according to the combination of "starting offset + split range + high watermark."

Without exactly_once, the incremental stage becomes a simple "sequential consumption from a certain starting offset."

3. Incremental Stage

After the Backfill (for exactly_once = true) or Snapshot stage ends, it enters the pure incremental stage:

  • MySQL: Based on binlog.
  • Oracle: Based on redo/logminer.
  • SQL Server: Based on transaction log/LSN.
  • PostgreSQL: Based on WAL.

SeaTunnel's behavior in the incremental stage is very close to native Debezium:

  • Consumes logs in offset order.
  • Constructs events like INSERT/UPDATE/DELETE for each change.
  • When exactly_once = true, the offset and split status are included in the checkpoint to achieve "exactly-once" semantics after failure recovery.

4. Summary

The core design philosophy of SeaTunnel CDC is to find the perfect balance between "Fast" (parallel snapshots) and "Stable" (data consistency).

Let's review the key points of the entire process:

  • Slicing (Split) is the foundation of parallel acceleration: Cutting large tables into small pieces to let multiple threads work at the same time.
  • Snapshot is responsible for moving stock: Utilizing slices to read historical data in parallel.
  • Backfill is responsible for sewing the gaps: This is the most critical step. It compensates for changes during the snapshot and eliminates duplicates using memory merging algorithms to achieve Exactly-Once.
  • Incremental is responsible for real-time synchronization: Seamlessly connecting to the Backfill stage and continuously consuming database logs.

Understanding this trilogy of "Snapshot -> Backfill -> Incremental," and the coordinating role that watermarks play within it, is the key to truly mastering SeaTunnel CDC.


The HackerNoon Newsletter: 680 Hours, 4 Rebuilds, and Getting Fired: How I Built Software While Working Warehouse Shifts (1/17/2026)

2026-01-18 00:03:10

How are you, hacker?


🪐 What’s happening in tech today, January 17, 2026?


The HackerNoon Newsletter brings the HackerNoon homepage straight to your inbox. On this day, the Persian Gulf War began in 1991, Popeye the Sailor made his first appearance in 1929, and Google Videos launched in 2006. We also present you with these top-quality stories.

680 Hours, 4 Rebuilds, and Getting Fired: How I Built Software While Working Warehouse Shifts


By @huckler [ 4 Min read ] Programming almost entirely alone, building an innovative program. My story. Read More.


🧑‍💻 What happened in your world this week?

It's been said that writing can help consolidate technical knowledge, establish credibility, and contribute to emerging community standards. Feeling stuck? We got you covered ⬇️⬇️⬇️


ANSWER THESE GREATEST INTERVIEW QUESTIONS OF ALL TIME


We hope you enjoy this free reading material. Feel free to forward this email to a nerdy friend who'll love you for it. See you on Planet Internet! With love, The HackerNoon Team ✌️


Third-Party Risks in 2026: Outlook and Security Strategies

2026-01-17 23:00:04

Many companies rely on external services to keep their operations running smoothly. However, while third-party vendors help power systems and support day-to-day operations, each new integration adds a potential access point that attackers can target. In 2026, third-party risk influences the speed at which incidents spread, the effectiveness of compliance, and the rate at which teams can recover. To prepare for what lies ahead, it is helpful to understand the current risks and know the steps IT teams can take to secure vendor access.

The State of Third-Party Cybersecurity in 2026

Third-party risk is everywhere in 2026. It is apparent on the web, where third-party code runs on customer-facing pages and can access sensitive areas such as login and account recovery.

A recent study reviewed 4,700 major websites and found that 64% of third-party apps were accessing sensitive data without a clear need — up from 51% in 2024. The same report highlighted an execution gap where many security leaders rank web attacks as a top priority, while far fewer have deployed solutions aimed at reducing that exposure.

Third-party risk is not limited to website tags and scripts — it also encompasses other potential vulnerabilities. Many outside providers connect to core business functions like payments, user accounts, support systems, and analytics. Survey data shows that over 60% of organizations have dealt with a cybersecurity incident linked to a vendor. In real incidents, a vendor might be how an attacker gains entry, how they remain undetected, or how they spread access across additional systems.

Attackers have also improved at exploiting business trust. Techniques that work against internal users also work against vendor relationships, including credential theft, session hijacking, OAuth abuse, token replay, malicious updates, and injected browser-side scripts. The difference lies in speed and blast radius.

A good example is what happened to Ledger. In 2023, attackers exploited vulnerabilities in decentralized finance applications connected to Ledger-related services and stole nearly $500,000 from users. The incident exposed a hard lesson on dependency sprawl. Hardware wallet safety can be undermined by adjacent services that handle customer data and workflows, including integrations, payment and fulfillment layers, and support tools.

Why Traditional TPRM Is Falling Short

Many third-party risk management (TPRM) programs still run on old procurement checklists. They assume vendor onboarding is centralized, the vendor list remains stable, and periodic reviews are enough. These break down in 2026.

Teams can now purchase tools independently, connect apps through marketplaces and application programming interfaces (APIs), and onboard new vendors for fast experiments. All of this can happen before security even notices the change.

Classic TPRM was built for slower and more predictable procurement cycles and often struggles when vendor decisions happen across the business with agile onboarding patterns. In addition, many workflows have not yet evolved at the same pace as cloud adoption and modern software delivery methods. The result is a predictable set of gaps.

Point-in-time assessments miss fast changes in ownership, infrastructure, subcontractors, and release cadence. Vendor inventories also fall behind real usage, especially when teams add scripts and integrations through self-service workflows. Contracts often lag behind technical reality, as well, resulting in weak requirements for breach notification, log retention, forensic cooperation, and subprocessor transparency.

Despite knowing these realities, some organizations skip the fundamentals. Fifteen percent of businesses skip third-party risk checks, even while positioning strong TPRM programs to address supply chain concerns. That omission is critical because vendor onboarding is often the only structured moment to restrict access and prevent unsafe integrations.

A Disconnect Between Awareness and Action

Security leaders understand that vendors can expose companies to risk — the problem is follow-through. Many organizations lack a tested plan for vendor-driven incidents and cannot see all the vendor connections that matter, especially when integrations and subcontractors are involved.

Regulators have also become stricter. The Securities and Exchange Commission’s cybersecurity disclosure rules push public companies to share material incident details quickly. The agency noted that a Form 8-K Item 1.05 filing is generally due within four business days after the entity decides an incident is material.

A 2026 Panorays survey found that while 77% of chief information security officers (CISOs) viewed third-party risk as a major threat, only 21% said their enterprises have tested crisis response plans. It also reported that although 60% saw a rise in third-party security incidents, only 15% had full visibility into such situations.

Response speed depends on how quickly the vendor shares impact details. If agreements do not require fast notification and evidence preservation, internal teams are left to make decisions even with missing information. If scenarios have never been practiced, coordination between teams slows down dramatically.

Key Strategies for a Resilient TPRM Program in 2026

Resilience starts with viewing third parties as extensions of the security perimeter. That shift favors enforceable technical controls and contracts that align with real incident workflows, not just theoretical models.

Embrace Automation and AI

Automation can keep vendor inventories current, classify vendors by data access and business criticality, and monitor for meaningful posture changes. High-value signals include exposed credentials, new internet-facing assets, security advisories, and unexpected permission growth in SaaS integrations. Of course, privileged connections and high-impact vendors should still be left to human reviewers.

Foster a Culture of Security

Make vendor security everyone’s job. Ensure the right contacts are identified up front for each vendor: a security contact, a legal contact, and an operations contact. For internal teams that add scripts or connect new apps on their own, provide quick training on what access they are granting, where the data will go, and who needs to sign off.

Adopt a Zero-Trust Approach

Default to least privilege. Require strong authentication and limit vendor access to a specific time frame with full logging and regular reviews. For SaaS integrations, control OAuth approvals, limit token scopes, and audit permissions on a schedule.

Prioritize Continuous Monitoring

Track vendor posture changes and production web changes continuously — don’t just rely on annual reviews. Monitor what third-party code can read and transmit, especially on login, checkout, and account recovery pages.

Develop a Robust Incident Response Plan

Third-party incident response should include shared severity levels, notification timelines, and evidence preservation steps. Plans should cover how to disable integrations quickly, rotate secrets, revoke tokens, and ship compensating controls. Testing vendor-driven scenarios can reveal coordination gaps and areas for improvement.

Building a Proactive and Future-Proof TPRM Framework

Future-proofing TPRM means anticipating and controlling real-world exposure. Inventories should trace back to data flows, identity privileges, code execution paths, and operational dependencies. This deep visibility reveals hidden risk concentrations, specifically identifying vendors who may still hold high-level administrative access or operate inside your most critical processes despite having low contract values.

Compliance checklists no longer measure readiness. True progress is defined by reducing standing privileges, enabling rapid vendor offboarding, and eliminating unknown scripts in production. By defining these technical responsibilities before a crisis happens, organizations avoid rushed coordination and can make immediate containment decisions the moment an incident strikes.

Ultimately, treating TPRM as an ongoing risk discipline creates significant operational resilience. Speed and precision protect customer trust and minimize disruptions in an interconnected environment.

Fortify Your Business in the Interconnected Age

Third-party risk in 2026 demands continuous visibility and strictly enforced access controls. Unmonitored connections can turn minor vendor breaches into major operational failures. To close this gap, companies must aggressively limit privileges and validate response plans through real-world simulations. This guarantees that the threat can be isolated instantly when a partner is compromised, preventing an external incident from becoming an internal disaster.

The AI Engine is the New Artist: Rethinking Royalties in an Age of Infinite Content

2026-01-17 22:00:04

People can use generative AI to create art, text, and music from datasets of previous art, which is significantly impacting the current creative economy. The debate of what makes an artist and the lack of clear compensation are growing concerns, prompting the evolving issue of royalty battles over AI-generated work.

The Evolution of Royalty Battles

AI is changing royalty battles to a debate about what makes art original. The traditional notions of authorship and ownership are being abandoned, as AI utilizes data from existing art to create new pieces. The original artists are not compensated for the new creation, while the users who prompted the machine are arguing about copyrighting the AI’s product.

Royalty battles are not a new concept. Recently, Rick Nelson’s family sued his former record label for not compensating them for the royalties from his songs. The lawsuit reached a settlement, but it reveals that artists and their families have been arguing over copyright for many years, predating the AI argument. However, these new machines are using data from previous artwork without permission, significantly complicating the battle.

Current Legal and Ethical Debates

AI developers and artists are consistently arguing over the legal and ethical issues surrounding algorithms in creative fields. On the legal side, artists are filing lawsuits against AI companies for using their work without permission to train their models. Some popular art generators are DALL-E 2 and Artbreeder, which create images from large datasets of original human work. The work is copyrighted, so artists are demanding compensation. Many also want brands to stop using their artwork altogether, as they consider it a form of theft.

Currently, the U.S. Copyright Office is developing policies to address this legal debate. In 2023, the office ruled that work generated entirely by AI is not eligible for copyright. However, work with significant human modifications after the initial AI-generated piece is eligible. The Office based its ruling on the premise that completely generated work lacks a human author, regardless of the prompt’s detail.

Beyond the legal battles for appropriate royalties is the ethical debate surrounding AI-generated art. These pieces were not created by a human, but draw from many examples of human-made work, causing some to debate the true meaning of art. Many believe AI cannot make true art because it does not understand the emotional aspect. Others believe that if they use a bot to create something similar to the idea in their head, it should be considered original.

New Royalty Models for Fairness

There are several ways to modify royalty models so that they provide fair compensation for artists and AI users:

  • Usage transparency: Users should clearly demonstrate when and how they used AI to create a book, painting, song, or other piece. People might enjoy the work more if the artist is transparent about their usage. It also alerts those who want to avoid AI-generated art.
  • Micropayments for artists: Large AI enterprises could give micropayments to artists every time machines use their art to generate something new. This method reduces artist frustration and fairly compensates them for the hard work behind the original piece. However, some may still want their work removed from training sets, limiting the scope of AI-generated content.
  • New copyright law: Given the U.S. Copyright Office’s ruling, new AI-generated works must undergo many changes to qualify for copyright. Work with limited human interference will not be considered original.

The Need for Ongoing Dialogue

While royalty battles are not new, AI is significantly complicating them. Currently, the technology is evolving faster than officials can create adequate policies. Artists, policymakers, and AI companies must collaborate to create a sustainable framework for art in the new world.


The TechBeat: Why Data Quality Is Becoming a Core Developer Experience Metric (1/17/2026)

2026-01-17 15:10:58

How are you, hacker? 🪐 Want to know what's trending right now? The Techbeat by HackerNoon has got you covered with fresh content from our trending stories of the day! Set email preference here.

Governing and Scaling AI Agents: Operational Excellence and the Road Ahead

By @denisp [ 23 Min read ] Success isn't building the agent; it's managing it. From "AgentOps" to ROI dashboards, here is the operational playbook for scaling Enterprise AI. Read More.

The Seven Pillars of a Production-Grade Agent Architecture

By @denisp [ 12 Min read ] An AI agent without memory is just a script. An agent without guardrails is a liability. The 7 critical pillars of building production-grade Agentic AI. Read More.

Patterns That Work and Pitfalls to Avoid in AI Agent Deployment

By @denisp [ 17 Min read ] Avoid the "AI Slop" trap. From runaway costs to memory poisoning, here are the 7 most common failure modes of Agentic AI (and how to fix them). Read More.

Best HR Software For Midsize Companies in 2026

By @stevebeyatte [ 12 Min read ] Modern midsize companies need platforms that balance sophistication with agility, offering powerful features without overwhelming complexity. Read More.

Playbook for Production ML: Latency Testing, Regression Validation, and Automated Deployment

By @stevebeyatte [ 4 Min read ] Even the most automated systems still need an underlying philosophy. Read More.

Should We Be Worried About Losing Jobs? Or Just Adapt Our Civilization to New Reality?

By @chris127 [ 10 Min read ] The question isn't whether jobs will disappear—it's whether our traditional work model is still valid. Read More.

AI Doesn’t Mean the End of Work for Us

By @bernard [ 4 Min read ] I believe that AI’s impact and future pathways are overstated because human nature is ignored in such statements. Read More.

In a World Obsessed With AI, The Miniswap Founders Are Betting on Taste

By @stevebeyatte [ 4 Min read ] Miniswap, a Warhammer marketplace founded by Cambridge students, is betting on taste, curation, and community over AI automation. Learn how they raised $3.5M. Read More.

Innovation And Accountability: What AstraBit’s Broker-Dealer Registration Signals for Web3 Finance

By @astrabit [ 5 Min read ] What AstraBit’s FINRA broker-dealer registration signals for Web3 finance, regulatory accountability, and how innovation and compliance can coexist. Read More.

9 RAG Architectures Every AI Developer Should Know: A Complete Guide with Examples

By @hck3remmyp3ncil [ 11 Min read ] RAG optimizes language model outputs by having them reference external knowledge bases before generating responses. Read More.

[ISO 27001 Compliance Tools in 2026: A Comparative Overview of 7 Leading Platforms](https://hackernoon.com/iso-27001-compliance-tools-in-2026-a-comparative-overview-of-7-leading-platforms)

By @stevebeyatte [ 7 Min read ] Breaking down the best ISO 27001 Compliance tools in the market for 2026. Read More.

A Developer's Guide to Building Next-Gen Smart Wallets With ERC-4337 — Part 2: Bundlers

By @hacker39947670 [ 15 Min read ] Bundlers are the bridge between account abstraction and the execution layer. Read More.

IPv6 and CTV: The Measurement Challenge From the Fastest-Growing Ad Channel

By @ipinfo [ 7 Min read ] IPv6 breaks digital ad measurement. Learn how IPinfo’s research-driven, active-measurement model restores accuracy across CTV and all channels. Read More.

Should You Trust Your VPN Location?

By @ipinfo [ 9 Min read ] IPinfo reveals how most VPNs misrepresent locations and why real IP geolocation requires active measurement, not claims. Read More.

I Built an Enterprise-Scale App With AI. Here’s What It Got Right—and Wrong

By @leonrevill [ 8 Min read ] Is AI making developers faster or just worse? A CTO builds a complex platform from scratch to test the "Stability Tax," and why "Vibe Coding" is dead. Read More.

We Replaced 3 Senior Devs with AI Agents: One Year Later

By @dineshelumalai [ 7 Min read ] A Software Architect's account of replacing senior devs with AI. $238K savings became $254K in real costs. Why human judgment still matters. Read More.

Brand Clarity vs Consensus

By @erelcohen [ 2 Min read ] In a polarized 2025 market, enterprise software companies can no longer win through broad consensus—only through brand clarity. Read More.

Why Data Quality Is Becoming a Core Developer Experience Metric

By @melissaindia [ 4 Min read ] Bad data secretly slows development. Learn why data quality APIs are becoming core DX infrastructure in API-first systems and how they accelerate teams. Read More.

DynamoDB: When to Move Out

By @scylladb [ 6 Min read ] ScyllaDB offers a high-performance NoSQL alternative to DynamoDB, solving throttling, latency, and size limits for scalable workloads. Read More.

How to Choose the Right Vector Database for a Production-Ready RAG Chatbot

By @nee2112 [ 10 Min read ] A hands-on comparison of vector databases for RAG chatbots, showing why filtering and hybrid search matter in real production systems. Read More.

Replacing Service Principal Secrets in Crossplane with Azure Workload Identity Federation

2026-01-17 12:00:08

When using Crossplane to provision Azure resources from Kubernetes, authentication becomes a critical challenge. Traditional approaches using service principal secrets are insecure and operationally complex. This blog post shares how we solved Azure authentication using Workload Identity Federation across three distinct deployment scenarios:

  1. Local Development: Kind cluster with Crossplane on developer laptops
  2. CI/CD Pipeline: GitHub Actions running Kind cluster with Crossplane for automated testing
  3. Production: EKS cluster with Crossplane managing Azure infrastructure

Each scenario presented unique challenges, and we’ll share the exact configurations, code snippets, and solutions that made credential-free Azure authentication work seamlessly across all environments.

The Challenge: Why Traditional Approaches Fall Short

Before diving into solutions, let’s understand the problem we were solving:

Traditional Approach: Service Principal Secrets

# ❌ The old way - storing secrets
apiVersion: v1
kind: Secret
metadata:
  name: azure-credentials
type: Opaque
data:
  clientId: base64-encoded-client-id
  clientSecret: base64-encoded-secret  # Long-lived credential!
  tenantId: base64-encoded-tenant-id

Problems:

  • Long-lived credentials stored in Kubernetes secrets
  • Manual rotation required
  • Security risk if secrets are compromised
  • Different authentication patterns across environments
  • Secret management overhead

Our Goal: Workload Identity Federation

We wanted to achieve:

  • Zero stored secrets across all environments
  • Automatic token rotation with short-lived credentials
  • Consistent authentication pattern from local dev to production
  • Individual developer isolation in local development
  • Clear audit trail for all Azure operations

Understanding Azure Workload Identity Federation

Before diving into each scenario, let’s understand the core concept:

Key Components:

  1. OIDC Provider: Kubernetes cluster’s identity provider (must be publicly accessible)
  2. Service Account Token: Short-lived JWT issued by Kubernetes
  3. Federated Credential: Trust relationship in Azure AD
  4. Token Exchange: JWT → Azure access token
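
Before moving on, it can help to look at the projected service-account token itself. The short Python sketch below (our own helper, not part of any tooling shown later) decodes the token payload and prints the three claims Azure matches against the federated credential; the token path matches the AZURE_FEDERATED_TOKEN_FILE mount configured later in this post:

# Illustrative only: inspect the claims that Azure validates during token exchange.
import base64, json

TOKEN_PATH = "/var/run/secrets/azure/tokens/azure-identity-token"

def jwt_claims(token):
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)           # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

with open(TOKEN_PATH) as f:
    claims = jwt_claims(f.read().strip())

print(claims["iss"])  # must equal the federated credential's "issuer"
print(claims["sub"])  # e.g. system:serviceaccount:crossplane-system:provider-azure-sa
print(claims["aud"])  # must include api://AzureADTokenExchange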

Scenario 1: Production EKS with Crossplane

Overview

In production, we run Crossplane on EKS clusters to provision and manage Azure resources. EKS provides a native OIDC provider that Azure can validate directly.

Architecture

Step 1: EKS Cluster Configuration

EKS clusters come with OIDC provider enabled by default. Get your OIDC provider URL:

# Get EKS OIDC provider URL
aws eks describe-cluster --name your-cluster-name \
  --query "cluster.identity.oidc.issuer" --output text

# Example output: https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE

Step 2: Azure AD Application Setup

Create an Azure AD application for production:

# Create Azure AD application
az ad app create --display-name "crossplane-production-azure"

# Get the client ID
AZURE_CLIENT_ID=$(az ad app list --display-name "crossplane-production-azure" \
  --query "[0].appId" -o tsv)

# Get tenant ID
AZURE_TENANT_ID=$(az account show --query tenantId -o tsv)

echo "Client ID: $AZURE_CLIENT_ID"
echo "Tenant ID: $AZURE_TENANT_ID"

Step 3: Create Federated Credential

Configure the trust relationship between EKS and Azure AD:

# Get EKS OIDC issuer (without https://)
EKS_OIDC_ISSUER=$(aws eks describe-cluster --name your-cluster-name \
  --query "cluster.identity.oidc.issuer" --output text | sed 's|https://||')

# Create federated credential
az ad app federated-credential create \
  --id $AZURE_CLIENT_ID \
  --parameters '{
    "name": "eks-crossplane-federated-credential",
    "issuer": "https://'"$EKS_OIDC_ISSUER"'",
    "subject": "system:serviceaccount:crossplane-system:provider-azure-sa",
    "audiences": ["api://AzureADTokenExchange"]
  }'

Step 4: Assign Azure Permissions

Grant necessary permissions to the Azure AD application:

# Assign Contributor role
az role assignment create \
  --role "Contributor" \
  --assignee $AZURE_CLIENT_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID"

# Assign User Access Administrator (if needed for role assignments)
az role assignment create \
  --role "User Access Administrator" \
  --assignee $AZURE_CLIENT_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID"

Step 5: Crossplane Deployment Configuration

Configure Crossplane to use workload identity:

# deployment-runtime-config.yaml
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: azure-provider-deployment-runtime-config
spec:
  serviceAccountTemplate:
    metadata:
      name: provider-azure-sa
      annotations:
        azure.workload.identity/client-id: "YOUR_AZURE_CLIENT_ID"
        azure.workload.identity/tenant-id: "YOUR_AZURE_TENANT_ID"
      labels:
        azure.workload.identity/use: "true"
  deploymentTemplate:
    spec:
      template:
        spec:
          containers:
          - name: package-runtime
            env:
            - name: AZURE_CLIENT_ID
              value: "YOUR_AZURE_CLIENT_ID"
            - name: AZURE_TENANT_ID
              value: "YOUR_AZURE_TENANT_ID"
            - name: AZURE_FEDERATED_TOKEN_FILE
              value: "/var/run/secrets/azure/tokens/azure-identity-token"
            volumeMounts:
            - name: azure-identity-token
              mountPath: /var/run/secrets/azure/tokens
              readOnly: true
          volumes:
          - name: azure-identity-token
            projected:
              sources:
              - serviceAccountToken:
                  path: azure-identity-token
                  audience: api://AzureADTokenExchange
                  expirationSeconds: 3600

Step 6: Azure Provider Configuration

Configure the Crossplane Azure provider:

# provider-config.yaml
apiVersion: azure.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: OIDCTokenFile
  subscriptionID: "YOUR_AZURE_SUBSCRIPTION_ID"
  tenantID: "YOUR_AZURE_TENANT_ID"
  clientID: "YOUR_AZURE_CLIENT_ID"

Step 7: Deploy Crossplane Provider

# Install Crossplane
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system --create-namespace

# Install Azure provider
kubectl apply -f - <<EOF
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-azure-network
spec:
  package: xpkg.upbound.io/upbound/provider-azure-network:v0.39.0
  runtimeConfigRef:
    name: azure-provider-deployment-runtime-config
EOF

# Apply provider config
kubectl apply -f provider-config.yaml

Verification


# Check provider status
kubectl get providers

# Check provider pods
kubectl get pods -n crossplane-system

# Verify token projection
kubectl exec -n crossplane-system deployment/provider-azure-network -- \
  ls -la /var/run/secrets/azure/tokens/

# Test Azure connectivity
kubectl logs -n crossplane-system deployment/provider-azure-network \
  -c package-runtime --tail=50

Scenario 2: Local Development with Kind and ngrok

Overview

Local development presented the biggest challenge: Kind clusters don’t have publicly accessible OIDC providers, but Azure needs to validate tokens against public endpoints. Our solution uses ngrok to expose the Kind cluster’s OIDC endpoints.

The Problem

The Solution: ngrok Tunnel

Step 1: Install Prerequisites

# Install ngrok
brew install ngrok

# Authenticate ngrok (get token from ngrok.com)
ngrok config add-authtoken YOUR_NGROK_TOKEN

# Install Kind
brew install kind

# Install kubectl
brew install kubectl

Step 2: Start ngrok Tunnel

# Start ngrok tunnel to expose Kubernetes API server
ngrok http https://localhost:6443 --log=stdout > /tmp/ngrok.log 2>&1 &

# Wait for ngrok to start
sleep 3

# Get ngrok public URL
NGROK_URL=$(curl -s http://localhost:4040/api/tunnels | \
  jq -r '.tunnels[0].public_url')

echo "ngrok URL: $NGROK_URL"
# Example: https://abc123.ngrok.io

Step 3: Create Kind Cluster with ngrok OIDC

This is the critical configuration that makes it work:

# Create Kind cluster with ngrok as OIDC issuer
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: crossplane-dev
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        service-account-issuer: ${NGROK_URL}
        service-account-jwks-uri: ${NGROK_URL}/openid/v1/jwks
        service-account-signing-key-file: /etc/kubernetes/pki/sa.key
        service-account-key-file: /etc/kubernetes/pki/sa.pub
        api-audiences: api://AzureADTokenExchange
        anonymous-auth: "true"
EOF

Key Configuration Points:

  • service-account-issuer: Set to ngrok URL (not localhost!)
  • service-account-jwks-uri: Points to ngrok URL for public key discovery
  • api-audiences: Must include api://AzureADTokenExchange
  • anonymous-auth: "true": Allows Azure to fetch OIDC discovery without authentication

Step 4: Configure RBAC for OIDC Discovery

Azure needs anonymous access to OIDC endpoints:

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: oidc-discovery
rules:
- nonResourceURLs:
  - "/.well-known/openid-configuration"
  - "/.well-known/jwks"
  - "/openid/v1/jwks"
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: oidc-discovery
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
EOF

Step 5: Create Individual Azure AD App

# Get developer name
DEVELOPER_NAME=$(whoami)

# Create Azure AD app
az ad app create --display-name "crossplane-local-dev-${DEVELOPER_NAME}"

# Get client ID
AZURE_CLIENT_ID=$(az ad app list \
  --display-name "crossplane-local-dev-${DEVELOPER_NAME}" \
  --query "[0].appId" -o tsv)

# Create federated credential with ngrok URL
az ad app federated-credential create \
  --id $AZURE_CLIENT_ID \
  --parameters '{
    "name": "kind-local-dev-federated-credential",
    "issuer": "'"$NGROK_URL"'",
    "subject": "system:serviceaccount:crossplane-system:provider-azure-sa",
    "audiences": ["api://AzureADTokenExchange"]
  }'

# Assign Azure permissions
az role assignment create \
  --role "Contributor" \
  --assignee $AZURE_CLIENT_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID"

Step 6: Deploy Crossplane with Workload Identity

# Install Crossplane
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system --create-namespace

# Create deployment runtime config
kubectl apply -f - <<EOF
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: azure-provider-deployment-runtime-config
spec:
  serviceAccountTemplate:
    metadata:
      name: provider-azure-sa
      annotations:
        azure.workload.identity/client-id: "${AZURE_CLIENT_ID}"
        azure.workload.identity/tenant-id: "${AZURE_TENANT_ID}"
      labels:
        azure.workload.identity/use: "true"
  deploymentTemplate:
    spec:
      template:
        spec:
          containers:
          - name: package-runtime
            env:
            - name: AZURE_CLIENT_ID
              value: "${AZURE_CLIENT_ID}"
            - name: AZURE_TENANT_ID
              value: "${AZURE_TENANT_ID}"
            - name: AZURE_FEDERATED_TOKEN_FILE
              value: "/var/run/secrets/azure/tokens/azure-identity-token"
            volumeMounts:
            - name: azure-identity-token
              mountPath: /var/run/secrets/azure/tokens
              readOnly: true
          volumes:
          - name: azure-identity-token
            projected:
              sources:
              - serviceAccountToken:
                  path: azure-identity-token
                  audience: api://AzureADTokenExchange
                  expirationSeconds: 3600
EOF

# Install Azure provider
kubectl apply -f - <<EOF
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-azure-network
spec:
  package: xpkg.upbound.io/upbound/provider-azure-network:v0.39.0
  runtimeConfigRef:
    name: azure-provider-deployment-runtime-config
EOF

# Create provider config
kubectl apply -f - <<EOF
apiVersion: azure.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: OIDCTokenFile
  subscriptionID: "${AZURE_SUBSCRIPTION_ID}"
  tenantID: "${AZURE_TENANT_ID}"
  clientID: "${AZURE_CLIENT_ID}"
EOF

Step 7: Verify Setup

# Verify OIDC discovery is accessible via ngrok
curl -k "${NGROK_URL}/.well-known/openid-configuration"

# Check provider status
kubectl get providers

# Verify token projection
kubectl exec -n crossplane-system deployment/provider-azure-network -- \
  cat /var/run/secrets/azure/tokens/azure-identity-token | \
  cut -d. -f2 | base64 -d | jq .

# Check provider logs
kubectl logs -n crossplane-system deployment/provider-azure-network \
  -c package-runtime --tail=50

Cleanup

# Delete Azure AD app
az ad app delete --id $AZURE_CLIENT_ID

# Delete Kind cluster
kind delete cluster --name crossplane-dev

# Stop ngrok
pkill ngrok

Scenario 3: GitHub Actions CI with Kind

Overview

For CI/CD, we use GitHub Actions’ native OIDC provider instead of ngrok. This provides a stable, public OIDC issuer that Azure can validate directly.

Architecture

Step 1: One-Time Azure AD App Setup

Create a shared Azure AD app for CI:

# Create Azure AD app for CI
az ad app create --display-name "crossplane-ci-github-actions"

# Get client ID
AZURE_CLIENT_ID=$(az ad app list \
  --display-name "crossplane-ci-github-actions" \
  --query "[0].appId" -o tsv)

# Create federated credential for pull requests
az ad app federated-credential create \
  --id $AZURE_CLIENT_ID \
  --parameters '{
    "name": "github-pr-federated-credential",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:your-org/your-repo:pull_request",
    "audiences": ["api://AzureADTokenExchange"]
  }'

# Assign Azure permissions
az role assignment create \
  --role "Contributor" \
  --assignee $AZURE_CLIENT_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID"

az role assignment create \
  --role "User Access Administrator" \
  --assignee $AZURE_CLIENT_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID"

Step 2: Store Configuration (Not Secrets!)

Create a configuration file with public identifiers:

# ci-azure-config.env
AZURE_CLIENT_ID=12345678-1234-1234-1234-123456789012
AZURE_TENANT_ID=87654321-4321-4321-4321-210987654321
AZURE_SUBSCRIPTION_ID=abcdef12-3456-7890-abcd-ef1234567890

Important: These are public identifiers, safe to commit to your repository!

Step 3: GitHub Actions Workflow

Create .github/workflows/e2e-tests.yaml:

name: E2E Integration Tests

on:
  pull_request:
    branches: [main]

permissions:
  id-token: write  # Required for GitHub OIDC
  contents: read

jobs:
  run-e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Load CI Azure Configuration
        run: |
          source ci-azure-config.env
          echo "AZURE_CLIENT_ID=$AZURE_CLIENT_ID" >> $GITHUB_ENV
          echo "AZURE_TENANT_ID=$AZURE_TENANT_ID" >> $GITHUB_ENV
          echo "AZURE_SUBSCRIPTION_ID=$AZURE_SUBSCRIPTION_ID" >> $GITHUB_ENV

      - name: Azure Login with OIDC
        uses: azure/login@v1
        with:
          client-id: ${{ env.AZURE_CLIENT_ID }}
          tenant-id: ${{ env.AZURE_TENANT_ID }}
          subscription-id: ${{ env.AZURE_SUBSCRIPTION_ID }}

      - name: Create Kind Cluster
        run: |
          # Install Kind
          curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
          chmod +x ./kind
          sudo mv ./kind /usr/local/bin/kind

          # Create standard Kind cluster (no special OIDC config needed)
          kind create cluster --name ci-cluster

      - name: Setup GitHub OIDC Tokens for Crossplane
        run: |
          # Get GitHub OIDC token
          GITHUB_TOKEN=$(curl -s \
            -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
            "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" | \
            jq -r ".value")

          # Create secrets with GitHub OIDC tokens
          kubectl create namespace crossplane-system
          kubectl create secret generic azure-identity-token \
            --from-literal=azure-identity-token="$GITHUB_TOKEN" \
            --namespace=crossplane-system

          # Start background token refresh (GitHub tokens expire in 5 minutes)
          nohup bash -c '
            while true; do
              sleep 240  # Refresh every 4 minutes
              GITHUB_TOKEN=$(curl -s \
                -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
                "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" | \
                jq -r ".value")

              if [ -n "$GITHUB_TOKEN" ] && [ "$GITHUB_TOKEN" != "null" ]; then
                kubectl create secret generic azure-identity-token \
                  --from-literal=azure-identity-token="$GITHUB_TOKEN" \
                  --namespace=crossplane-system \
                  --dry-run=client -o yaml | kubectl apply -f -
              fi
            done
          ' > /tmp/token_refresh.log 2>&1 &

      - name: Install Crossplane
        run: |
          helm repo add crossplane-stable https://charts.crossplane.io/stable
          helm install crossplane crossplane-stable/crossplane \
            --namespace crossplane-system --create-namespace --wait

      - name: Configure Crossplane with Workload Identity
        run: |
          # Create deployment runtime config
          kubectl apply -f - <<EOF
          apiVersion: pkg.crossplane.io/v1beta1
          kind: DeploymentRuntimeConfig
          metadata:
            name: azure-provider-deployment-runtime-config
          spec:
            serviceAccountTemplate:
              metadata:
                name: provider-azure-sa
                annotations:
                  azure.workload.identity/client-id: "${{ env.AZURE_CLIENT_ID }}"
                  azure.workload.identity/tenant-id: "${{ env.AZURE_TENANT_ID }}"
                labels:
                  azure.workload.identity/use: "true"
            deploymentTemplate:
              spec:
                template:
                  spec:
                    containers:
                    - name: package-runtime
                      env:
                      - name: AZURE_CLIENT_ID
                        value: "${{ env.AZURE_CLIENT_ID }}"
                      - name: AZURE_TENANT_ID
                        value: "${{ env.AZURE_TENANT_ID }}"
                      - name: AZURE_FEDERATED_TOKEN_FILE
                        value: "/var/run/secrets/azure/tokens/azure-identity-token"
                      volumeMounts:
                      - name: azure-identity-token
                        mountPath: /var/run/secrets/azure/tokens
                        readOnly: true
                    volumes:
                    - name: azure-identity-token
                      secret:
                        secretName: azure-identity-token
                        items:
                        - key: azure-identity-token
                          path: azure-identity-token
          EOF

          # Install Azure provider
          kubectl apply -f - <<EOF
          apiVersion: pkg.crossplane.io/v1
          kind: Provider
          metadata:
            name: provider-azure-network
          spec:
            package: xpkg.upbound.io/upbound/provider-azure-network:v0.39.0
            runtimeConfigRef:
              name: azure-provider-deployment-runtime-config
          EOF

          # Wait for provider to be ready
          kubectl wait --for=condition=healthy --timeout=300s \
            provider/provider-azure-network

          # Create provider config
          kubectl apply -f - <<EOF
          apiVersion: azure.upbound.io/v1beta1
          kind: ProviderConfig
          metadata:
            name: default
          spec:
            credentials:
              source: OIDCTokenFile
            subscriptionID: "${{ env.AZURE_SUBSCRIPTION_ID }}"
            tenantID: "${{ env.AZURE_TENANT_ID }}"
            clientID: "${{ env.AZURE_CLIENT_ID }}"
          EOF

      - name: Run E2E Tests
        run: |
          # Your E2E tests here
          kubectl apply -f test/e2e/test-resources.yaml

          # Wait for resources to be ready
          kubectl wait --for=condition=ready --timeout=600s \
            -f test/e2e/test-resources.yaml

      - name: Cleanup
        if: always()
        run: |
          # Delete test resources
          kubectl delete -f test/e2e/test-resources.yaml --wait=false

          # Delete Kind cluster
          kind delete cluster --name ci-cluster

Key Differences from Local Dev

| Aspect | Local Development | GitHub Actions CI |
|----|----|----|
| OIDC Issuer | ngrok tunnel | GitHub native OIDC |
| Token Source | Projected service account | GitHub OIDC token in secret |
| Token Lifetime | 1 hour (auto-refresh) | 5 minutes (manual refresh) |
| Cluster Config | Custom OIDC issuer | Standard Kind cluster |
| Azure AD App | Individual per developer | Shared for CI |
| Token Storage | Projected volume | Kubernetes secret |

Token Refresh Implementation

GitHub OIDC tokens expire in 5 minutes, so we implement automatic refresh:

# Background token refresh daemon
nohup bash -c '
  while true; do
    sleep 240  # Wait 4 minutes

    # Get fresh GitHub OIDC token
    GITHUB_TOKEN=$(curl -s \
      -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
      "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" | \
      jq -r ".value")

    if [ -n "$GITHUB_TOKEN" ] && [ "$GITHUB_TOKEN" != "null" ]; then
      # Update secret (Kubernetes auto-updates mounted files)
      kubectl create secret generic azure-identity-token \
        --from-literal=azure-identity-token="$GITHUB_TOKEN" \
        --namespace=crossplane-system \
        --dry-run=client -o yaml | kubectl apply -f -
    fi
  done
' > /tmp/token_refresh.log 2>&1 &

Comparison: Three Scenarios Side-by-Side

| Feature | EKS Production | Local Development | GitHub Actions CI |
|----|----|----|----|
| OIDC Provider | EKS native | ngrok tunnel | GitHub native |
| Cluster Type | EKS | Kind | Kind |
| Token Projection | Projected volume | Projected volume | Secret volume |
| Token Lifetime | 1 hour | 1 hour | 5 minutes |
| Token Refresh | Automatic | Automatic | Manual daemon |
| Azure AD App | Production app | Individual per dev | Shared CI app |
| Setup Complexity | Low | Medium | Medium |
| Security Isolation | High | High (per dev) | Medium (shared) |
| Public Accessibility | ✅ Native | ✅ Via ngrok | ✅ Native |

Troubleshooting Guide

Common Issues Across All Scenarios

Issue 1: Token File Not Found

Error:

reading OIDC Token from file "/var/run/secrets/azure/tokens/azure-identity-token": no such file or directory

Solution:

# Check if volume is mounted
kubectl exec -n crossplane-system deployment/provider-azure-network -- \
  ls -la /var/run/secrets/azure/tokens/

# Verify deployment configuration
kubectl get deploymentruntimeconfig azure-provider-deployment-runtime-config -o yaml

# Check provider pod spec
kubectl get pod -n crossplane-system -l pkg.crossplane.io/provider=provider-azure-network -o yaml

Issue 2: Azure Authentication Failure

Error:

AADSTS700211: No matching federated identity record found for presented assertion issuer

Solution:

# Verify federated credential configuration
az ad app federated-credential list --id $AZURE_CLIENT_ID

# Check token claims
kubectl exec -n crossplane-system deployment/provider-azure-network -- \
  cat /var/run/secrets/azure/tokens/azure-identity-token | \
  cut -d. -f2 | base64 -d | jq .

# Ensure issuer and subject match exactly

Local Development Specific Issues

Issue 3: ngrok URL Changed

Error: Authentication fails after restarting ngrok

Solution:

# Get new ngrok URL
NGROK_URL=$(curl -s http://localhost:4040/api/tunnels | \
  jq -r '.tunnels[0].public_url')

# Update federated credential
az ad app federated-credential update \
  --id $AZURE_CLIENT_ID \
  --federated-credential-id <credential-id> \
  --parameters '{
    "issuer": "'"$NGROK_URL"'"
  }'

# Recreate Kind cluster with new URL
kind delete cluster --name crossplane-dev
# Then recreate with new ngrok URL

Issue 4: OIDC Discovery Endpoint Unreachable

Error:

AADSTS50166: Request to External OIDC endpoint failed

Solution:

# Verify ngrok is running
curl -s http://localhost:4040/api/tunnels

# Test OIDC discovery endpoint
curl -k "${NGROK_URL}/.well-known/openid-configuration"

# Check RBAC permissions
kubectl get clusterrolebinding oidc-discovery -o yaml

GitHub Actions Specific Issues

Issue 5: Token Expiration in Long Tests

Error: Authentication fails after 5 minutes

Solution:

# Verify token refresh daemon is running
ps aux | grep "refresh_tokens"

# Check refresh logs
tail -f /tmp/token_refresh.log

# Manually refresh token
GITHUB_TOKEN=$(curl -s \
  -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" | \
  jq -r ".value")

kubectl create secret generic azure-identity-token \
  --from-literal=azure-identity-token="$GITHUB_TOKEN" \
  --namespace=crossplane-system \
  --dry-run=client -o yaml | kubectl apply -f -

Best Practices and Recommendations

Security Best Practices

  1. Individual Identities: Use separate Azure AD apps for each environment
  2. Least Privilege: Grant minimum required Azure permissions
  3. Resource Group Scoping: Limit permissions to specific resource groups
  4. Regular Audits: Review Azure AD audit logs for unusual activity
  5. Token Expiration: Use short token lifetimes (1 hour recommended)

Operational Best Practices

  1. Automation: Use scripts to automate Azure AD app creation and cleanup
  2. Documentation: Maintain clear documentation of federated credentials
  3. Monitoring: Set up alerts for authentication failures
  4. Testing: Test configuration changes in non-production first
  5. Cleanup: Always clean up Azure AD apps after development

Workflow Recommendations

For Local Development:

  • Create automation scripts to start/stop your development environment
  • Include Azure AD app creation and cleanup in your setup scripts
  • Document the setup process for new team members

For CI/CD:

  • Configure your CI pipeline to automatically handle token refresh
  • Set up proper cleanup steps to remove test resources
  • Use repository-scoped federated credentials for security

For Production:

  • Implement monitoring and alerting for authentication failures
  • Document the federated credential configuration
  • Plan for disaster recovery scenarios

Conclusion

We successfully implemented Azure Workload Identity Federation across three distinct scenarios:

  1. EKS Production: Leveraging native EKS OIDC for seamless Azure authentication
  2. Local Development: Using ngrok to expose Kind cluster OIDC endpoints with individual developer isolation
  3. GitHub Actions CI: Utilizing GitHub’s native OIDC provider for automated testing

Key Achievements

  • Zero Stored Secrets: No credentials stored anywhere across all environments
  • Consistent Pattern: Same workload identity approach from dev to production
  • Individual Isolation: Each developer has separate Azure identity
  • Automatic Rotation: All tokens are short-lived and auto-refreshed
  • Clear Audit Trail: Full visibility into all Azure operations

Implementation Summary

This approach has transformed Azure authentication from a security liability into a robust, automated system that works consistently across all environments. The complete configurations shown in this blog post can be adapted to your specific infrastructure and repository structure.

Key takeaways:

  • All three scenarios use the same workload identity federation principle
  • Configuration differences are minimal between environments
  • The same Azure provider setup works across all scenarios
  • Token management is automatic in all cases

Additional Resources
