
RSS preview of Blog of HackerNoon

The TechBeat: Vibe Coding: How AI Is Shaping a New Paradigm in Software Development (1/18/2026)

2026-01-18 15:11:00

How are you, hacker? 🪐 Want to know what's trending right now? The TechBeat by HackerNoon has got you covered with fresh content from our trending stories of the day! Set email preference here.

The Long Now of the Web: Inside the Internet Archive’s Fight Against Forgetting

By @zbruceli [ 18 Min read ] A deep dive into the Internet Archive's custom tech stack. Read More.

CodeRabbit vs Code Reviews in Kilo: Which One Is Best For You in 2026

By @kilocode [ 6 Min read ] CodeRabbit alternative for 2026: Kilo's Code Reviews combines AI code review with coding agents, deploy tools, and 500+ models in one unified platform. Read More.

Back to Basics: Database Design as Storytelling

By @dataops [ 3 Min read ] Why great database design is really storytelling—and why ignoring relational fundamentals leads to poor performance AI can’t fix. Read More.

How Automation Makes DataOps Work in Real Enterprise Environments

By @dataops [ 4 Min read ] DataOps provides the blueprint, but automation makes it scalable. Learn how enforced CI/CD, observability, and governance turn theory into reality. Read More.

HARmageddon is cancelled: how we taught Playwright to replay HAR with dynamic parameters

By @socialdiscoverygroup [ 19 Min read ] We taught Playwright to find the correct HAR entry even when query/body values change and prevented reusing entities with dynamic identifiers. Read More.

Jetpack Compose Memory Leaks: A Reference-Graph Deep Dive

By @mohansankaran [ 10 Min read ] Jetpack Compose memory leaks are usually reference leaks. Learn the top leak patterns, why they happen, and how to fix them. Read More.

Zero-Trust Data Access for AI Training: New Architecture Patterns for Cloud and On-Prem Workloads

By @rahul-gupta [ 8 Min read ] As AI adoption grows, legacy data access controls fall short. Here’s why zero-trust data security is becoming essential for modern AI systems. Read More.

Proof of Usefulness Hackathon: Win $150K+ from Bright Data, Neo4j, Algolia, Storyblok & HackerNoon 

By @proofofusefulness [ 8 Min read ] Proof of Usefulness is a global hackathon powered by HackerNoon that rewards one thing and one thing only: usefulness. Win from $150k! Read More.

The Authorization Gap No One Wants to Talk About: Why Your API Is Probably Leaking Right Now

By @drechimyn [ 7 Min read ] Broken Object Level Authorization (BOLA) is eating the API economy from the inside out. Read More.

Agent-specificity is the New Accuracy

By @erelcohen [ 4 Min read ] Accuracy is no longer the gold standard for AI agents—specificity is. Read More.

Complete Ollama Tutorial (2026) – LLMs via CLI, Cloud & Python

By @proflead [ 4 Min read ] Ollama is an open-source platform for running and managing large-language-model (LLM) packages entirely on your local machine. Read More.

A Year of AI in My Life as an Engineer

By @manoja [ 4 Min read ] A senior engineer explains how AI tools changed document writing, code review, and system understanding, without replacing judgment or accountability. Read More.

How to Make Email Marketing Work for You

By @jonstojanjournalist [ 3 Min read ] Ensure your emails are seen with deliverability testing. Optimize campaigns, boost engagement, and protect sender reputation effectively. Read More.

AI - Should we Be Afraid? 3 Years Later

By @djcampbell [ 6 Min read ] Is AI good or bad? We must decide. Read More.

Meet Ola.cv: HackerNoon Company of the Week

By @companyoftheweek [ 4 Min read ] Ola.cv is the official registry for the .CV domain, helping individuals to build next-gen professional links and profiles to enhance their digital presence. Read More.

Slop Isn’t the Problem. It’s the Symptom.

By @normbond [ 3 Min read ] When teams move fast without shared meaning, quality dissolves quietly. Why slop is a symptom of interpretation lag, not a technology failure. Read More.

How I stopped fighting AI and started shipping features 10x faster with Claude Code and Codex

By @tigranbs [ 9 Min read ] A deep dive into my production workflow for AI-assisted development, separating task planning from implementation for maximum focus and quality. Read More.

Top 10 Bitcoin Mining Companies Tested for 2026: Real ROI, Costs, and Rankings

By @sanya_kapoor [ 16 Min read ] A 60-day test of 10 Bitcoin mining companies reveals which hosting providers deliver the best uptime, electricity rates, and ROI in 2026. Read More.

The Secret Math Behind Every Creative Breakthrough

By @praisejamesx [ 6 Min read ] Stop relying on "vibes" and "hustle." History rewards those with better models, not better speeches. Read More.

Vibe Coding: How AI Is Shaping a New Paradigm in Software Development

By @khushboo [ 3 Min read ] What Is Vibe Coding? AI-First Software Development Explained. Read More.

🧑‍💻 What happened in your world this week? It's been said that writing can help consolidate technical knowledge, establish credibility, and contribute to emerging community standards. Feeling stuck? We've got you covered ⬇️⬇️⬇️

ANSWER THESE GREATEST INTERVIEW QUESTIONS OF ALL TIME

We hope you enjoy this week's worth of free reading material. Feel free to forward this email to a nerdy friend who'll love you for it. See you on Planet Internet! With love, The HackerNoon Team ✌️

How to Use EKS Pod Identity to Isolate Tenant Data in S3 With a Shared IAM Role

2026-01-18 13:00:08

The Challenge: IAM Role Proliferation in Multi-Tenant Architectures

When building multi-tenant Kubernetes applications that require AWS resource access, teams traditionally face a difficult choice: either create separate IAM roles for each tenant (leading to IAM role sprawl) or implement complex application-level access controls. With AWS’s default limit of 1,000 IAM roles per account, this becomes a critical scalability bottleneck for platforms serving hundreds or thousands of tenants.

Consider a typical multi-tenant SaaS platform running on Amazon EKS where each tenant needs isolated access to S3 storage. Using the traditional IRSA (IAM Roles for Service Accounts) approach, you would need:

  • One IAM role per tenant for S3 access
  • Separate service accounts for each tenant
  • Individual IRSA annotations on each service account
  • Complex role management as tenants are added or removed

For a platform with 500 tenants, this means managing 500+ IAM roles just for S3 access alone—consuming half of your account’s IAM role quota before considering any other AWS services or infrastructure needs.

The Solution: EKS Pod Identity with Shared IAM Roles

EKS Pod Identity, introduced in late 2023, fundamentally changes this equation. Instead of requiring one IAM role per tenant, you can use a single shared IAM role for all tenants while maintaining strict security isolation through namespace-based access controls.

How It Works

The key innovation is the automatic injection of principal tags by the Pod Identity agent. When a pod assumes an IAM role through Pod Identity, AWS automatically adds the pod’s namespace as a principal tag (kubernetes-namespace). This tag can then be used in IAM and S3 bucket policies to enforce tenant isolation at the AWS policy level.
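Concretely, the Pod Identity agent exposes the role's credentials to the pod through the standard container credentials endpoint rather than through service-account annotations. As a rough sketch (the address and token path shown are the documented defaults, but treat exact values as illustrative), the environment injected into an associated pod looks like this:

# Environment variables injected for Pod Identity (illustrative)
AWS_CONTAINER_CREDENTIALS_FULL_URI=http://169.254.170.23/v1/credentials
AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE=/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token

Any reasonably recent AWS SDK resolves these automatically, which is why no per-tenant application changes are needed.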

Here’s the architecture at a glance: a single shared IAM role, one Pod Identity association per tenant namespace, and namespace-scoped prefixes inside a shared S3 bucket.

The IAM Policy Magic

The shared IAM role uses the ${aws:PrincipalTag/kubernetes-namespace} variable to dynamically scope permissions based on the pod’s namespace:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucketByNamespacePrefix",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-tenant-bucket",
      "Condition": {
        "StringLike": {
          "s3:prefix": "${aws:PrincipalTag/kubernetes-namespace}/*"
        }
      }
    },
    {
      "Sid": "ReadWriteInNamespaceFolder",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-tenant-bucket/${aws:PrincipalTag/kubernetes-namespace}/*"
    }
  ]
}

When a pod in the tenant-app-1 namespace assumes this role, the ${aws:PrincipalTag/kubernetes-namespace} variable automatically resolves to tenant-app-1, restricting access to only the tenant-app-1/ prefix in the S3 bucket.
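For the shared role to be assumable this way, its trust policy must allow the EKS Pod Identity service principal. A minimal sketch (account and cluster scoping conditions omitted for brevity):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEksPodIdentity",
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}

The sts:TagSession action is what allows EKS to attach the kubernetes-namespace principal tag that every policy in this article relies on.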

The Scalability Comparison

Visual Comparison: IAM Role Growth

Traditional IRSA Approach

| Tenants | IAM Roles Required | % of Account Quota Used |
|---------|--------------------|--------------------------|
| 100 | 100+ | 10% |
| 500 | 500+ | 50% |
| 1,000 | 1,000+ | 100% (quota limit) |
| 2,000 | ❌ Not possible | ❌ Exceeds quota |

Challenges:

  • Linear growth in IAM roles with tenant count
  • Complex role lifecycle management
  • Service account annotation overhead
  • Quota exhaustion at scale
  • Difficult to audit and maintain

EKS Pod Identity Approach

| Tenants | IAM Roles Required | % of Account Quota Used |
|---------|--------------------|--------------------------|
| 100 | 1 | 0.1% |
| 500 | 1 | 0.1% |
| 1,000 | 1 | 0.1% |
| 10,000 | 1 | 0.1% |

Benefits:

  • Constant IAM role count regardless of tenant count
  • Simplified role management
  • No service account annotations needed for tenants
  • Scales to tens of thousands of tenants
  • Centralized policy management

Defense-in-Depth Security

While using a shared IAM role might initially seem less secure, the implementation actually provides defense-in-depth through multiple security layers:

Layer 1: IAM Role Policy

The IAM role policy uses principal tags to restrict resource access patterns:

  • Pods can only list objects with their namespace prefix
  • Object operations are scoped to namespace/* paths
  • Upload operations require matching namespace tags

Layer 2: S3 Bucket Policy

The S3 bucket policy mirrors the IAM restrictions at the bucket level (a hedged sketch follows the list below):

  • Provides protection even if IAM roles are misconfigured
  • Enforces path-based access controls
  • Validates namespace tags on all operations
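One simple way to express this mirror is a deny statement that rejects requests from any principal carrying no kubernetes-namespace tag. This is a minimal sketch, reusing the my-tenant-bucket name from earlier; as written it would also block untagged administrative principals, so a production policy would add an exemption (for example, by also matching on aws:PrincipalArn):

{
  "Sid": "DenyPrincipalsWithoutNamespaceTag",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::my-tenant-bucket",
    "arn:aws:s3:::my-tenant-bucket/*"
  ],
  "Condition": {
    "Null": {
      "aws:PrincipalTag/kubernetes-namespace": "true"
    }
  }
}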

Layer 3: Mandatory Object Tagging

All uploaded objects must include a kubernetes-namespace tag matching the principal tag:

{
  "Sid": "PutObjectWithNamespaceTag",
  "Effect": "Allow",
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::bucket/${aws:PrincipalTag/kubernetes-namespace}/*",
  "Condition": {
    "StringEquals": {
      "s3:RequestObjectTag/kubernetes-namespace": "${aws:PrincipalTag/kubernetes-namespace}"
    }
  }
}

Layer 4: Tag Modification Prevention

Explicit deny statements block post-upload tag modifications, so a tenant cannot spoof another namespace by re-tagging its objects:

{
  "Sid": "DenyPostUploadTagModification",
  "Effect": "Deny",
  "Action": "s3:PutObjectTagging",
  "Resource": "arn:aws:s3:::bucket/${aws:PrincipalTag/kubernetes-namespace}/*",
  "Condition": {
    "Null": {
      "s3:ExistingObjectTag/kubernetes-namespace": "false"
    }
  }
}

Real-World Implementation

Here’s what tenant isolation looks like in practice:

Allowed Operations (Pod in tenant-app-1 namespace)

# ✅ List objects in own namespace
aws s3 ls s3://my-bucket/tenant-app-1/

# ✅ Upload with proper namespace tag
aws s3 cp file.txt s3://my-bucket/tenant-app-1/file.txt \
  --tagging "kubernetes-namespace=tenant-app-1"

# ✅ Download from own namespace
aws s3 cp s3://my-bucket/tenant-app-1/file.txt ./downloaded.txt

# ✅ Delete from own namespace
aws s3 rm s3://my-bucket/tenant-app-1/file.txt

Blocked Operations (Automatic Denial)

# ❌ Cannot access other tenant's data
aws s3 ls s3://my-bucket/tenant-app-2/
# Error: Access Denied

# ❌ Cannot upload without proper tag
aws s3 cp file.txt s3://my-bucket/tenant-app-1/untagged.txt
# Error: Access Denied

# ❌ Cannot upload with wrong namespace tag
aws s3 cp file.txt s3://my-bucket/tenant-app-1/file.txt \
  --tagging "kubernetes-namespace=tenant-app-2"
# Error: Access Denied

# ❌ Cannot list bucket root
aws s3 ls s3://my-bucket/
# Error: Access Denied

Operational Benefits

Beyond the obvious scalability advantages, EKS Pod Identity provides significant operational improvements:

Simplified Tenant Onboarding

IRSA Approach:

  1. Create new IAM role for tenant
  2. Configure trust policy with OIDC provider
  3. Create service account with IRSA annotation
  4. Deploy tenant workload
  5. Verify IAM role assumption

Pod Identity Approach:

  1. Create namespace for tenant
  2. Create Pod Identity Association (one API call)
  3. Deploy tenant workload
  4. Automatic credential injection

Reduced Management Overhead

  • No service account annotations needed for tenant workloads
  • Centralized policy updates affect all tenants simultaneously
  • Simplified auditing with single IAM role to monitor
  • Easier compliance with consolidated access patterns

Cross-Account Support

The architecture supports cross-account S3 buckets seamlessly:

  • IAM roles in EKS cluster account
  • S3 bucket in separate storage account
  • Automatic policy synchronization
  • Multiple DataPlanes can share buckets

When to Use EKS Pod Identity vs IRSA

Use EKS Pod Identity When:

  • ✅ Building multi-tenant platforms with many tenants
  • ✅ Need to scale beyond hundreds of tenants
  • ✅ Want simplified tenant lifecycle management
  • ✅ Require namespace-based resource isolation
  • ✅ Approaching IAM role quota limits

Stick with IRSA When:

  • ⚠️ Need per-tenant IAM policy customization
  • ⚠️ Require different AWS service access per tenant
  • ⚠️ Have complex cross-account role assumption patterns
  • ⚠️ Running on EKS clusters that don’t meet Pod Identity requirements (Kubernetes 1.24+ with supported platform versions)

Getting Started

To implement this pattern in your EKS cluster:

  1. Enable Pod Identity on your EKS cluster (EKS 1.24+)
  2. Create the shared IAM role with principal tag-based policies
  3. Configure S3 bucket policy with matching restrictions
  4. Create Pod Identity Associations linking namespaces to the IAM role
  5. Deploy tenant workloads with standard service accounts (no annotations)

The Pod Identity agent automatically handles credential injection and namespace tag propagation—no application code changes required.
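For step 4, one association per tenant namespace is all that is required; a sketch using the AWS CLI, with placeholder names for the cluster, role, and account:

# Link the tenant namespace's service account to the shared role
aws eks create-pod-identity-association \
  --cluster-name my-eks-cluster \
  --namespace tenant-app-1 \
  --service-account default \
  --role-arn arn:aws:iam::111122223333:role/shared-tenant-s3-role

Repeating this call per namespace (typically from a tenant-onboarding pipeline) is the only per-tenant AWS change needed.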

Conclusion

EKS Pod Identity represents a paradigm shift in how we approach multi-tenant AWS resource access. By leveraging automatic principal tag injection and policy variables, teams can:

  • Scale to thousands of tenants with a single IAM role
  • Maintain strict security isolation through defense-in-depth policies
  • Simplify operations with centralized policy management
  • Avoid IAM quota limitations that constrain growth

For platforms serving hundreds or thousands of tenants, the choice is clear: EKS Pod Identity eliminates the IAM role proliferation problem while actually improving security through standardized, auditable access patterns.

The future of multi-tenant Kubernetes on AWS is not about creating more IAM roles—it’s about using smarter policies with fewer roles.



HSVM Decision Boundaries: Visualizing PGD vs. SDP and Moment Relaxation

2026-01-18 06:00:07

Table of Links

Abstract and 1. Introduction

2. Related Works

3. Convex Relaxation Techniques for Hyperbolic SVMs

  3.1 Preliminaries

  3.2 Original Formulation of the HSVM

  3.3 Semidefinite Formulation

  3.4 Moment-Sum-of-Squares Relaxation

4. Experiments

  4.1 Synthetic Dataset

  4.2 Real Dataset

5. Discussions, Acknowledgements, and References

A. Proofs

B. Solution Extraction in Relaxed Formulation

C. On Moment Sum-of-Squares Relaxation Hierarchy

D. Platt Scaling [31]

E. Detailed Experimental Results

F. Robust Hyperbolic Support Vector Machine

E Detailed Experimental Results

E.1 Visualizing Decision Boundaries

Here we visualize the decision boundaries for PGD, the SDP relaxation, and the sparse moment-sum-of-squares relaxation (Moment) on one fold of the training data to provide qualitative judgements.

We first visualize training on the first fold of the Gaussian 1 dataset from Figure 3 in Figure 5. We mark the train set with circles and the test set with triangles, and color the decision boundaries obtained by the three methods differently. In this case, note that SDP and Moment overlap and give identical decision boundaries up to machine precision, but they differ from the decision boundary of the PGD method. This slight visual difference causes the performance difference displayed in Table 1.

Figure 5: Decision boundary obtained by each method on one fold of the train/test split on the Gaussian 1 dataset in Figure 3. While SDP and Moment overlap, they differ from the PGD solution.

We next visualize the decision boundary for tree 2 from Figure 3 in Figure 6. Here the difference is dramatic: we show the entire dataset in the left panel and a zoomed-in view on the right. We observe that the decision boundary from the moment-sum-of-squares relaxation keeps roughly equal distance to points in the grey class and in the green class, while the SDP relaxation is suboptimal in that regard but still encloses the entire grey region. PGD, however, converges to a very poor local minimum with a very small radius enclosing no data, and thus classifies all data samples into the same class, since all data falls on one side of the decision boundary. As commented in Section 4, data imbalance is to blame, in which case the final converged solution is very sensitive to the choice of initialization and other hyperparameters such as the learning rate. This is in stark contrast with solving the problem via the interior point method, where, after implementing the formulation in MOSEK, we are essentially care-free. From this example, we see that empirically the sparse moment-sum-of-squares relaxation finds linear separators of the best quality, particularly in cases where PGD is expected to fail.

Figure 6: Decision boundary visualization of the train/test split from the first fold. The left panel shows all the data and the right panel zooms in on the decision boundary. PGD gets stuck in a bad local minimum (a tiny circle in the right panel) and thus classifies all data samples into one class. While both the SDP and moment relaxations give decision boundaries that demarcate one class from another, Moment has roughly equal margin to samples from the grey class and to samples from the green class, which is preferred in large-margin learning.

E.2 Synthetic Gaussian

To generate mixtures of Gaussians in hyperbolic space, we first generate them in Euclidean space, with the center coordinates drawn independently from a standard normal distribution. 𝐾 such centers are drawn to define 𝐾 different classes. We then sample an isotropic Gaussian at each center with scale 𝑠. Finally, we lift the generated Gaussian mixtures to hyperbolic space using exp₀. For simplicity, we only present results for the extreme values: 𝐾 ∈ {2, 5}, 𝑠 ∈ {0.4, 1}, and 𝐶 ∈ {0.1, 10}.
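For reference, the lift can be made explicit. This is a sketch assuming the Lorentz (hyperboloid) model with base point $o = (1, 0, \dots, 0)$, a common convention for hyperbolic SVMs; the paper's own definitions in Section 3.1 are authoritative:

$$\exp_{0}(v) = \cosh\left(\lVert v\rVert\right) o + \sinh\left(\lVert v\rVert\right)\frac{v}{\lVert v\rVert},$$

where $v$ is the Euclidean sample placed in the tangent space at $o$ and $\lVert \cdot \rVert$ is the Euclidean norm.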

For each method (PGD, SDP, Moment), we compute the train/test accuracy, weighted F1 score, and loss on each of the 5 folds of data for a specific (𝐾, 𝑠, 𝐶) configuration. We then average these metrics across the 5 folds for all methods and configurations. To illustrate the performance, we plot the improvements of the average metrics of the Moment and SDP methods over PGD as bar plots across 15 different seeds. Outliers beyond the interquartile range (Q1 to Q3) are excluded for clarity, and a horizontal line at zero is marked for reference. Additionally, to compare the Moment and SDP methods, we compute the average optimality gaps similarly, as defined in Equation (15), and present them as bar plots. Our analysis begins by examining the train/test accuracy and weighted F1 score of the PGD, SDP, and Moment methods across various synthetic Gaussian configurations, as shown in Figures 7 to 10.

Figure 7: Train/test accuracy and train/test F1 improvements compared to PGD across various 𝐶 ∈ {0.1, 10} for 𝐾 = 2 and 𝑠 = 0.4

Figure 8: Train/test accuracy and train/test F1 improvements compared to PGD across various 𝐶 ∈ {0.1, 10} for 𝐾 = 2 and 𝑠 = 1.0

Figure 9: Train/test accuracy and train/test F1 improvements compared to PGD across various 𝐶 ∈ {0.1, 10} for 𝐾 = 5 and 𝑠 = 0.4

Figure 10: Train/test accuracy and train/test F1 improvements compared to PGD across various 𝐶 ∈ {0.1, 10} for 𝐾 = 5 and 𝑠 = 1.0

Across the various configurations, we observe that both the Moment and SDP methods generally show improvements over PGD in terms of train and test accuracy as well as weighted F1 score. Notably, the Moment method often shows more consistent improvements than SDP. This consistency is evident across different values of (𝐾, 𝑠, 𝐶), suggesting that the Moment method is more robust and provides more generalizable decision boundaries. Moreover, we observe that (1) for a larger number of classes (i.e., larger 𝐾), the Moment method consistently and significantly outperforms both SDP and PGD, highlighting its capability to manage complex class structures efficiently; and (2) for simpler datasets (with smaller scale 𝑠), both the Moment and SDP methods generally outperform PGD, with the Moment method showing a particularly promising performance advantage over both PGD and SDP.

Figure 11: Train/test loss improvements compared to PGD and optimality gaps comparison across various 𝐶 ∈ {0.1, 10} for 𝐾 = 2 and 𝑠 = 0.4

Figure 12: Train/test loss improvements compared to PGD and optimality gaps comparison across various 𝐶 ∈ {0.1, 10} for 𝐾 = 2 and 𝑠 = 1

Figure 13: Train/test loss improvements compared to PGD and optimality gaps comparison across various 𝐶 ∈ {0.1, 10} for 𝐾 = 5 and 𝑠 = 0.4

Figure 14: Train/test loss improvements compared to PGD and optimality gaps comparison across various 𝐶 ∈ {0.1, 10} for 𝐾 = 5 and 𝑠 = 1

Next, we examine the train/test loss improvements compared to PGD and the optimality gap comparison across various configurations, shown in Figures 11 to 14. We observe that for 𝐾 = 5, the Moment method achieves significantly smaller losses than both PGD and SDP, which aligns with our previous observations on accuracy and weighted F1 scores. However, for 𝐾 = 2, the losses of the Moment and SDP methods are generally larger than PGD's. Nevertheless, it is important to note that these losses are not direct measurements of our optimization methods' quality; rather, they measure the quality of the extracted solutions. Therefore, a larger loss does not necessarily imply that our optimization methods are inferior to PGD, as the heuristic extraction methods might significantly affect the loss. Additionally, we observe that the optimality gaps of the Moment method are significantly smaller than those of the SDP method, suggesting that Moment provides better solutions. Interestingly, the optimality gaps of the Moment method also exhibit smaller variance than SDP's, as indicated by the smaller boxes in the box plots, further supporting the consistency and robustness of the Moment method.

Lastly, we compare the computational efficiency of these methods by computing the average runtime to finish 1 fold of training for each model on the synthetic dataset, shown in Table 4. We observe that the sparse moment relaxation typically requires at least an order of magnitude more runtime than the other methods, which to some extent limits its applicability to large-scale datasets.

Table 4: Average runtime to finish 1 fold of training for each model on the synthetic dataset.

E.3 Real Data

In this section we provide a detailed performance breakdown by the choice of regularization 𝐶 for both the one-vs-one and one-vs-rest schemes in Tables 5 to 10.

Table 5: Real dataset performance (𝐶 = 0.1), one-vs-rest

Table 6: Real dataset performance (𝐶 = 1.0), one-vs-rest

Table 7: Real dataset performance (𝐶 = 10.0), one-vs-rest

In the one-vs-rest scheme, we observe that the Moment method consistently outperforms both PGD and SDP across almost all datasets and values of 𝐶 in terms of accuracy and F1 scores. Notably, the optimality gaps 𝜂 for Moment are consistently lower than those for SDP, indicating that the Moment method obtains better solutions, which underscores its effectiveness on real datasets.

Table 8: Real dataset performance (𝐶 = 0.1), one-vs-one

Table 9: Real dataset performance (𝐶 = 1.0), one-vs-one

Table 10: Real dataset performance (𝐶 = 10.0), one-vs-one

In the one-vs-one scheme, however, we observe that SDP and Moment have comparable performance, both better than PGD. Nevertheless, the optimality gaps of SDP are still significantly larger than Moment's in almost all cases.

Similarly, we compare the average runtime to finish 1 fold of training for each model on these real datasets, shown in Table 11. We observe a similar trend: the sparse moment relaxation typically requires at least an order of magnitude more runtime than the other methods.

Table 11: Average runtime to finish 1 fold of training for each model on the real dataset.


:::info Authors:

(1) Sheng Yang, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);

(2) Peihan Liu, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);

(3) Cengiz Pehlevan, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, Center for Brain Science, Harvard University, Cambridge, MA, and Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA ([email protected]).

:::


:::info This paper is available on arxiv under CC by-SA 4.0 Deed (Attribution-Sharealike 4.0 International) license.

:::


Ethereum Targets $7,000—But PEPETO Could Deliver 10,000% More Upside

2026-01-18 03:18:32

Ethereum trades near $3,300 as institutional staking and ETF inflows support a possible move toward $7,000 by 2026. But as a $399B asset, ETH’s upside is incremental. Pepeto ($PEPETO), still in presale at $0.000000178, combines meme appeal with zero-fee swaps, cross-chain bridging, a verified exchange, and whale accumulation—creating potential for exponential gains before listings.

AI Coding Tip 003 - Force Read-Only Planning

2026-01-18 03:00:04

Think first, code later

TL;DR: Set your AI code assistant to read-only state before it touches your files.

Common Mistake ❌

You paste your failing call stack to your AI assistant without further instructions.

The copilot immediately begins modifying multiple source files.

It creates new issues because it doesn't understand your full architecture yet.

You spend the next hour undoing its messy changes.

Problems Addressed 😔

The AI modifies code that doesn't need changing.

The copilot starts typing before it reads the relevant functions.

The AI hallucinates, assuming a library exists without checking your package.json.

Large changes make code reviews and diffs a nightmare.

How to Do It 🛠️

Enter Plan Mode: Use "Plan Mode/Ask Mode" if your tool has it.

If your tool doesn't have such a mode, you can add a meta-prompt:

Read this and wait for instructions / Do not change any files yet.

Ask the AI to read specific files and explain the logic there.

After that, ask for a step-by-step implementation plan for you to approve.

When you like the plan, tell the AI: "Now apply step 1."

Benefits 🎯

Better Accuracy: The AI reasons better when focusing only on the "why."

Full Control: You catch logic errors before they enter your codebase.

Lower Costs: You use fewer tokens when you avoid "trial and error" coding loops.

Clearer Mental Model: You understand the fix as well as the AI does.

Context 🧠

AI models prefer "doing" over "thinking" to feel helpful. This is called impulsive coding.

When you force it into a read-only phase, you are simulating a Senior Developer's workflow.

You deal with the Artificial Intelligence first as a consultant and later as a developer.

Prompt Reference 📝

Bad prompt 🚫

Fix the probabilistic predictor
in the Kessler Syndrome Monitor component 
using this stack dump.

Good prompt 👉

Read @Dashboard.tsx and @api.ts. Do not write code yet.

Analyze the stack dump.

When you find the problem, explain it to me.

Then, write a Markdown plan to fix it, restricted to the REST API.

[Activate Code Mode]

Create a failing test representing the error.

Apply the fix and run the tests until all are green.

Considerations ⚠️

Some simple tasks do not need a plan.

You must actively read the plan the AI provides.

The AI might still hallucinate the plan, so verify it.

Type 📝

[X] Semi-Automatic

Limitations ⚠️

You can use this for refactoring and complex features.

You might find it too slow for simple CSS tweaks or typos.

Some AIs go the other way, asking for confirmation over and over before changing anything. Be patient with them.

Tags 🏷️

  • Complexity

Level 🔋

[X] Intermediate

Related Tips 🔗

Request small, atomic commits.

Conclusion 🏁

You save time when you think.

You must force the AI to be your architect before letting it be your builder.

This simple strategy prevents hours of debugging later. 🧠

More Information ℹ️

https://github.blog/ai-and-ml/github-copilot/copilot-ask-edit-and-agent-modes-what-they-do-and-when-to-use-them/?embedable=true

https://www.thepromptwarrior.com/p/windsurf-vs-cursor-which-ai-coding-app-is-better?embedable=true

https://aider.chat/docs/usage/modes.html?embedable=true

https://opencode.ai/docs/modes/?embedable=true

Also Known As 🎭

Read-Only Prompting

Consultant Mode

Tools 🧰

| Tool | Read-Only Mode | Write Mode | Mode Switching | Open Source | Link |
|------|----------------|------------|----------------|-------------|------|
| Windsurf | Chat Mode | Write Mode | Toggle | No | https://windsurf.com/ |
| Cursor | Normal/Ask | Agent/Composer | Context-dependent | No | https://www.cursor.com/ |
| Aider | Ask/Help Modes | Code/Architect | /chat-mode | Yes | https://aider.chat/ |
| GitHub Copilot | Ask Mode | Edit/Agent Modes | Mode selector | No | https://github.com/features/copilot |
| Cline | Plan Mode | Act Mode | Built-in | Yes (extension) | https://cline.bot/ |
| Continue.dev | Chat/Ask | Edit/Agent Modes | Config-based | Yes | https://continue.dev/ |
| OpenCode | Plan Mode | Build Mode | Tab key | Yes | https://opencode.ai/ |
| Claude Code | Review Plans | Auto-execute | Settings | No | https://code.claude.com/ |
| Replit Agent | Plan Mode | Build/Fast/Full | Mode selection | No | https://replit.com/agent3 |

Disclaimer 📢

The views expressed here are my own.

I am a human who writes as best as possible for other humans.

I used AI proofreading tools to improve some texts.

I welcome constructive criticism and dialogue.

I shape these insights through 30 years in the software industry, 25 years of teaching, and writing over 500 articles and a book.


This article is part of the AI Coding Tip series.

SeaTunnel CDC Explained: A Layman’s Guide

2026-01-18 01:00:11

Based on recent production experience using SeaTunnel CDC (Change Data Capture) to synchronize data from sources such as Oracle, MySQL, and SQL Server, combined with feedback from a wide range of users, I have written this article to help you understand how SeaTunnel implements CDC. The content covers the three stages of CDC: Snapshot, Backfill, and Incremental.

The Three Stages of CDC

The overall CDC data reading process can be broken down into three major stages:

  1. Snapshot (Full Load)
  2. Backfill
  3. Incremental

1. Snapshot Stage

The meaning of the Snapshot stage is very intuitive: take a snapshot of the current database table data and perform a full table scan via JDBC.

Taking MySQL as an example, the current binlog position is recorded during the snapshot:

SHOW MASTER STATUS;

| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
|------|----------|--------------|-------------------|-------------------|
| binlog.000011 | 1001373553 | | | |

SeaTunnel records the File and Position as the low watermark.

Note: This is not just executed once, because SeaTunnel has implemented its own split cutting logic to accelerate snapshots.


MySQL Snapshot Splitting Mechanism (Split)

Assuming the global parallelism is 10:

  • SeaTunnel will first analyze all tables and their primary key/unique key ranges and select an appropriate splitting column.

  • It splits based on the maximum and minimum values of this column, with a default of snapshot.split.size = 8096.

  • Large tables may be cut into hundreds of Splits, which are allocated to 10 parallel channels by the enumerator according to the order of subtask requests (tending toward a balanced distribution overall).

Table-level sequential processing (schematic):

// Processing sequence:
// 1. Table1 -> Generate [Table1-Split0, Table1-Split1, Table1-Split2]
// 2. Table2 -> Generate [Table2-Split0, Table2-Split1]
// 3. Table3 -> Generate [Table3-Split0, Table3-Split1, Table3-Split2, Table3-Split3]

Split-level parallel allocation:

// Allocation to different subtasks:
// Subtask 0: [Table1-Split0, Table2-Split1, Table3-Split2]
// Subtask 1: [Table1-Split1, Table3-Split0, Table3-Split3]
// Subtask 2: [Table1-Split2, Table2-Split0, Table3-Split1]

Each Split is actually a query with a range condition, for example:

SELECT * FROM user_orders WHERE order_id >= 1 AND order_id < 10001;

Crucial: each Split separately records its own low watermark and high watermark.

Practical advice: do not make snapshot.split.size too small; having more Splits is not necessarily faster, and the scheduling and memory overhead can become very large.
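For orientation, here is roughly where these knobs live in a job definition. This is a minimal sketch of a MySQL-CDC source block; the connection fields are placeholders, and option names should be checked against the connector documentation for your SeaTunnel version:

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://mysql-host:3306/app_db"
    username = "seatunnel"
    password = "******"
    table-names = ["app_db.user_orders"]
    # Split sizing and consistency knobs discussed in this article
    snapshot.split.size = 8096
    exactly_once = true
  }
}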

2. Backfill Stage

Why is Backfill needed? Imagine you are performing a full snapshot of a table that is being frequently written to. When you read the 100th row, the data in the 1st row may have already been modified. If you only read the snapshot, the data you hold when you finish reading is actually "inconsistent" (part is old, part is new).

The role of Backfill is to compensate for the "data changes that occurred during the snapshot" so that the data is eventually consistent.

The behavior of this stage mainly depends on the configuration of the exactly_once parameter.

2.1 Simple Mode (exactly_once = false)

This is the default mode; the logic is relatively simple and direct, and it does not require memory caching:

  • Direct Snapshot Emission: Reads snapshot data and sends it directly downstream without entering a cache.
  • Direct Log Emission: Reads Binlog at the same time and sends it directly downstream.
  • Eventual Consistency: Although there will be duplicates in the middle (old A sent first, then new B), as long as the downstream supports idempotent writes (like MySQL's REPLACE INTO), the final result is consistent.
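As an example of such an idempotent write, a MySQL sink can use REPLACE INTO (the table and columns here are illustrative); replaying the same change twice leaves the row in the same final state:

-- Running this twice produces the same row, not a duplicate
REPLACE INTO user_orders (order_id, status, amount)
VALUES (1001, 'PAID', 99.90);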

2.2 Exactly-Once Mode (exactly_once = true)

This is the most impressive part of SeaTunnel CDC, and it is the secret to guaranteeing that data is "never lost, never repeated." It introduces a memory buffer (Buffer) for deduplication.

Simple Explanation: Imagine the teacher asks you to count how many people are in the class right now (Snapshot stage). However, the students in the class are very mischievous; while you are counting, people are running in and out (data changes). If you just count with your head down, the result will definitely be inaccurate when you finish.

SeaTunnel does it like this:

  1. Take a Photo First (Snapshot): Count the number of people in the class first and record it in a small notebook (memory buffer); don't tell the principal (downstream) yet.
  2. Watch the Surveillance (Backfill): Retrieve the surveillance video (Binlog log) for the period you were counting.
  3. Correct the Records (Merge):
  • If the surveillance shows someone just came in, but you didn't count them -> add them.
  • If the surveillance shows someone just ran out, but you counted them in -> cross them out.
  • If the surveillance shows someone changed their clothes -> change the record to the new clothes.
  4. Submit Homework (Send): After correction, the small notebook in your hand is a perfectly accurate list; now hand it to the principal.

Summary for Beginners: exactly_once = true means "hold it in and don't send it until it's clearly verified."

  • Benefit: The data received downstream is absolutely clean, without duplicates or disorder.
  • Cost: Because it must be "held in," it needs to consume some memory to store the data. If the table is particularly large, memory might be insufficient.

2.3 Two Key Questions and Answers

Q1: Why is case READ: throw Exception written in the code? Why aren't there READ events during the Backfill stage?

  • The READ event is defined by SeaTunnel itself, specifically to represent "stock data read from the snapshot."
  • The Backfill stage reads the database's Binlog. Binlog only records "additions, deletions, and modifications" (INSERT/UPDATE/DELETE) and never records "someone queried a piece of data."
  • Therefore, if you read a READ event during the Backfill stage, it means the code logic is confused.

Q2: If it's placed in memory, can the memory hold it? Will it OOM?

  • It's not putting the whole table into memory: SeaTunnel processes by splits.
  • Splits are small: A default split has only 8096 rows of data.
  • Throw away after use: After processing a split, send it, clear the memory, and process the next one.
  • Memory occupancy ≈ parallelism × split size × single-row data size.
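Plugging illustrative numbers into that formula shows why a single split rarely threatens the heap:

// Rough memory estimate for the exactly-once buffer (illustrative numbers)
// parallelism = 4, snapshot.split.size = 8096 rows, average row size ≈ 1 KB
// 4 × 8096 rows × 1 KB ≈ 32 MB buffered at any one time

Very wide rows or a much larger split size change this estimate quickly, which is another reason not to raise snapshot.split.size carelessly.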

2.4 Key Detail: Watermark Alignment Between Multiple Splits

This is a very hidden but extremely important issue. If not handled well, it will lead to data being either lost or repeated.

Plain-Language Explanation (the fast/slow runner problem): Imagine two students (Split A and Split B) copying homework (Backfill data).

  • Student A (fast): Copied to page 100 and finished at 10:00.
  • Student B (slow): Copied to page 200 and just finished at 10:05.

Now, the teacher (the Incremental task) needs to continue teaching a new lesson (reading Binlog) from where they finished copying. Where should the teacher start?

  • If starting from page 200: Student B is connected, but the content Student A missed between pages 100 and 200 (what happened between 10:00 and 10:05) is completely lost.
  • If starting from page 100: Student A is connected, but Student B will complain: "Teacher, I already copied the content from page 100 to 200!" This leads to repetition.

SeaTunnel's Solution: start from the earliest point, and cover your ears for what you've already heard. SeaTunnel adopts a "Minimum Watermark Starting Point + Dynamic Filtering" strategy:

  1. Determine the Start (care for the slow one): The teacher decides to start from page 100 (the minimum watermark among all splits).
  2. Dynamic Filtering (don't listen to what's been heard): While the teacher is lecturing (reading Binlog), they hold a list: { A: 100, B: 200 }.
  • When the teacher reaches page 150:
  • Look at the list; is it for A? 150 > 100, A hasn't heard it, record it (send).
  • Look at the list; is it for B? 150 < 200, B already copied it, skip it directly (discard).
  3. Full Speed Mode (everyone has finished hearing): When the teacher reaches page 201 and finds everyone has already heard it, they no longer need the list.


Summary in one sentence: with exactly_once, the incremental stage strictly filters according to the combination of "starting offset + split range + high watermark."

Without exactly_once, the incremental stage becomes simple sequential consumption from a given starting offset.
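Written as pseudocode in the style of the split examples above (an illustration of the rule, not SeaTunnel's actual source):

// Incremental filtering with exactly_once enabled (illustrative)
// startOffset = min(highWatermark over all finished splits)
// for each binlog event e read from startOffset:
//     split = the snapshot split whose key range contains e.key (if any)
//     if split exists and e.offset <= split.highWatermark:
//         discard e   // already covered by that split's backfill
//     else:
//         emit e      // not yet seen downstream
// once e.offset > max(highWatermark over all splits), stop filtering and emit everything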

3. Incremental Stage

After the Backfill (for exactly_once = true) or Snapshot stage ends, it enters the pure incremental stage:

  • MySQL: Based on binlog.
  • Oracle: Based on redo/logminer.
  • SQL Server: Based on transaction log/LSN.
  • PostgreSQL: Based on WAL.

SeaTunnel's behavior in the incremental stage is very close to native Debezium:

  • Consumes logs in offset order.
  • Constructs events like INSERT/UPDATE/DELETE for each change.
  • When exactly_once = true, the offset and split status are included in the checkpoint to achieve "exactly-once" semantics after failure recovery.

4. Summary

The core design philosophy of SeaTunnel CDC is to find the perfect balance between "Fast" (parallel snapshots) and "Stable" (data consistency).

Let's review the key points of the entire process:

  • Slicing (Split) is the foundation of parallel acceleration: Cutting large tables into small pieces to let multiple threads work at the same time.
  • Snapshot is responsible for moving stock: Utilizing slices to read historical data in parallel.
  • Backfill is responsible for sewing the gaps: This is the most critical step. It compensates for changes during the snapshot and eliminates duplicates using memory merging algorithms to achieve Exactly-Once.
  • Incremental is responsible for real-time synchronization: Seamlessly connecting to the Backfill stage and continuously consuming database logs.

Understanding this trilogy of "Snapshot -> Backfill -> Incremental," and the coordinating role of watermarks within it, is the key to truly mastering SeaTunnel CDC.
