
Building a Rails Engine #7 — The Orchestrator: Coordinating the Workflow

2026-03-03 21:00:00

The Orchestrator: Coordinating the Import Workflow

How to coordinate parsing, validation, and persistence through a single class that manages state transitions, handles errors per-record, and integrates with ActiveJob.

Context

This is part 7 of the series where we build DataPorter, a mountable Rails engine for data import workflows. In part 6, we built the DataImport model to track import state and the Source layer to parse CSV files into mapped hashes.

We now have all the individual pieces: targets describe imports, sources parse files, the RecordValidator checks column constraints, and DataImport tracks state. But nothing ties them together. In this article, we build the Orchestrator -- the class that coordinates the full parse-then-import workflow.

The problem

Without an orchestration layer, the import workflow ends up as 80 lines of procedural code in a controller action:

def import
  import_record.update!(status: :parsing)
  source = DataPorter::Sources::Csv.new(import_record, content: file.read)
  rows = source.fetch
  records = []

  rows.each_with_index do |row, i|
    record = build_import_record(row, i)
    record = target.transform(record)
    target.validate(record)
    validate_columns(record, target.columns)
    records << record
  end

  import_record.update!(records: records, status: :previewing)
  # ... 40 more lines for the import phase, error handling, reporting
end

Parse the file. Loop the rows. Validate each one. Handle errors somehow. Update the status. Build a report. Send a notification. Pray nothing crashes between step 3 and step 7.

If this lives in a controller, it is untestable and impossible to run in the background. If it lives in the model, DataImport becomes a god object. We need a dedicated coordination layer that knows the order of operations but delegates the details to the objects that own them.

How it fits together

The Orchestrator coordinates two phases, each following the same pattern: transition the status, delegate work to the right objects, handle failures.

parse!                                import!
──────                                ───────
status → parsing                      status → importing
    │                                     │
    ▼                                     ▼
Source.fetch                          for each importable record
    │                                     │
    ▼                                     ├─ target.persist(record)
for each row                              │   ├─ success → created++
    ├─ extract_data                       │   └─ error → record.add_error
    ├─ target.transform                   │              target.on_error
    ├─ target.validate                    │
    └─ validator.validate                 ▼
    │                                 target.after_import(results)
    ▼                                     │
status → previewing                       ▼
report generated                      status → completed
                                      report updated

The user reviews parsed records between the two phases. This preview checkpoint is what makes the import workflow safe -- data never touches the database until the user confirms.

What we're building

Here is the Orchestrator in action -- two method calls that drive the entire workflow:

# In a controller or job
orchestrator = DataPorter::Orchestrator.new(data_import, content: csv_string)

# Step 1: Parse the file, validate records, generate a preview
orchestrator.parse!
data_import.status   # => "previewing"
data_import.records  # => [ImportRecord, ImportRecord, ...]

# Step 2: After user reviews the preview, persist the records
orchestrator.import!
data_import.status   # => "completed"
data_import.report.imported_count  # => 42

Two methods, two phases. parse! turns raw data into validated records and stops at previewing so the user can review. import! takes the importable records and persists them through the target's persist method. If anything goes catastrophically wrong, the import transitions to failed with an error report.

Implementation

Step 1 -- The Orchestrator skeleton and parse phase

The Orchestrator is a plain Ruby object. It receives a DataImport and optional content (for testing), then delegates to the pieces we already built:

# lib/data_porter/orchestrator.rb
class Orchestrator
  def initialize(data_import, content: nil)
    @data_import = data_import
    @target = data_import.target_class.new
    @source_options = { content: content }.compact
  end

  def parse!
    @data_import.parsing!
    records = build_records
    @data_import.update!(records: records, status: :previewing)
    build_report
  rescue StandardError => e
    handle_failure(e)
  end
end

The constructor instantiates the target (so we can call its hooks) and compacts the source options (so content: nil does not override ActiveStorage downloads). The parse! method follows a strict sequence: transition to parsing, build and validate records, save them with a previewing status, then generate a summary report. The rescue at the method level catches any failure -- a malformed CSV, a missing file, an unexpected source error -- and transitions the import to failed with the error message preserved in the report.

Notice that parsing! is called before the work starts. This is intentional: if the job crashes between the status transition and the update!, the import is left in parsing rather than pending, signaling to the user that something went wrong mid-process rather than silently sitting idle.

Step 2 -- Building and validating records

The build_records method is where the Source, Target, and RecordValidator converge:

# lib/data_porter/orchestrator.rb
def build_records
  source = Sources.resolve(@data_import.source_type)
                  .new(@data_import, **@source_options)
  raw_rows = source.fetch
  columns = @target.class._columns || []
  validator = RecordValidator.new(columns)

  raw_rows.each_with_index.map do |row, index|
    build_record(row, index, columns, validator)
  end
end

def build_record(row, index, columns, validator)
  record = StoreModels::ImportRecord.new(
    line_number: index + 1,
    data: extract_data(row, columns)
  )
  record = @target.transform(record)
  @target.validate(record)
  validator.validate(record)
  record.determine_status!
  record
end

def extract_data(row, columns)
  columns.each_with_object({}) do |col, hash|
    hash[col.name] = row[col.name]
  end
end

Each row goes through a four-step pipeline: extract_data picks only the values matching declared columns (ignoring any extra data in the row), transform lets the target normalize values (e.g., formatting phone numbers), validate runs target-specific business rules, and the RecordValidator checks structural constraints (required fields, type checks). Finally, determine_status! sets each record to complete, partial, or missing based on whether errors were added.
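The determine_status! logic is not shown in this excerpt; as a rough sketch of what it might do (purely illustrative -- the real engine's rules may differ), a record could derive its status from how many of its columns picked up errors:

```ruby
# Hypothetical sketch of ImportRecord#determine_status! -- not the engine's
# real implementation. Assumes the record holds a data hash and an errors
# array filled in by the transform/validate steps.
class ImportRecord
  attr_reader :data, :errors, :status

  def initialize(data:)
    @data = data
    @errors = []
  end

  def add_error(message)
    @errors << message
  end

  def determine_status!
    @status =
      if errors.empty?
        :complete   # every column parsed and validated cleanly
      elsif errors.size < data.size
        :partial    # some columns usable, some flagged
      else
        :missing    # nothing usable on this row
      end
  end
end
```

Whatever the exact thresholds, the important part is that the status is derived once, after all validators have run, so the preview can group rows by severity.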

The RecordValidator handles the generic constraints we defined in the column DSL:

# lib/data_porter/record_validator.rb
class RecordValidator
  def initialize(columns)
    @columns = columns
  end

  def validate(record)
    @columns.each do |col|
      value = record.data[col.name]
      validate_required(record, col, value)
      validate_type(record, col, value)
    end
  end
end

This separation matters: the target owns business-rule validations ("email must be unique in the system"), while the RecordValidator owns structural validations ("email must look like an email"). Neither knows about the other.
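The per-constraint helpers are not shown in the article; here is a hedged sketch of what validate_required and validate_type might look like. Column here is a stand-in for the column DSL objects, assumed to expose name, type, and required -- the bodies are guesses, not the engine's real code:

```ruby
# Hypothetical bodies for RecordValidator's private helpers.
Column = Struct.new(:name, :type, :required, keyword_init: true)

class RecordValidator
  # Illustrative structural checks keyed by column type
  TYPE_CHECKS = {
    string:  ->(v) { v.is_a?(String) },
    integer: ->(v) { v.is_a?(Integer) || v.to_s.match?(/\A-?\d+\z/) }
  }.freeze

  def initialize(columns)
    @columns = columns
  end

  def validate(record)
    @columns.each do |col|
      value = record.data[col.name]
      validate_required(record, col, value)
      validate_type(record, col, value)
    end
  end

  private

  def validate_required(record, col, value)
    return unless col.required
    record.add_error("#{col.name} is required") if value.nil? || value.to_s.strip.empty?
  end

  def validate_type(record, col, value)
    return if value.nil?  # required-ness is handled separately
    check = TYPE_CHECKS[col.type]
    record.add_error("#{col.name} must be of type #{col.type}") if check && !check.call(value)
  end
end
```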

Step 3 -- The import phase and per-record error handling

Once the user reviews the preview and confirms, import! persists the records:

# lib/data_porter/orchestrator.rb
def import!
  @data_import.importing!
  results = import_records
  update_import_report(results)
  @target.after_import(results, context: build_context)
rescue StandardError => e
  handle_failure(e)
end

def persist_record(record, context, results)
  @target.persist(record, context: context)
  results[:created] += 1
rescue StandardError => e
  record.add_error(e.message)
  @target.on_error(record, e, context: context)
  results[:errored] += 1
end

The critical design decision here is the error boundary. Each record is persisted individually, and if persist raises -- a uniqueness violation, a foreign key constraint, a custom validation from the host app -- the error is captured on that record and the import continues. The import does not wrap everything in a single transaction. This means a 10,000-row file with 3 bad records will successfully import 9,997 records rather than rolling back the entire batch.

The on_error hook lets the target react to failures (logging, notifying, skipping related records), while after_import runs once after all records are processed, receiving the results hash for summary work like sending a confirmation email.

The remaining private methods handle the plumbing:

# lib/data_porter/orchestrator.rb (private methods)
def import_records
  context = build_context
  results = { created: 0, errored: 0 }

  @data_import.importable_records.each do |record|
    persist_record(record, context, results)
  end

  @data_import.save!
  results
end

def build_context
  { user: @data_import.user, import: @data_import }
end

def handle_failure(error)
  @data_import.update!(
    status: :failed,
    report: @data_import.report.tap { |r| r.error = error.message }
  )
end

def update_import_report(results)
  @data_import.report.assign_attributes(
    imported_count: results[:created],
    errored_count: results[:errored]
  )
  @data_import.update!(status: :completed)
end

build_context provides the host app context that targets receive in their hooks -- the current user and the import record itself. handle_failure is the safety net: any unrecoverable error (source cannot be read, target cannot be resolved) transitions the import to failed with the error message preserved. update_import_report writes the final counts and marks the import as completed.

Step 4 -- ActiveJob integration

The Orchestrator is designed to be called from anywhere, but its primary consumer is a pair of ActiveJob classes:

# app/jobs/data_porter/parse_job.rb
class ParseJob < ActiveJob::Base
  queue_as { DataPorter.configuration.queue_name }

  def perform(import_id)
    data_import = DataImport.find(import_id)
    Orchestrator.new(data_import).parse!
  end
end

# app/jobs/data_porter/import_job.rb
class ImportJob < ActiveJob::Base
  queue_as { DataPorter.configuration.queue_name }

  def perform(import_id)
    data_import = DataImport.find(import_id)
    Orchestrator.new(data_import).import!
  end
end

Each job is a one-liner: find the import, delegate to the Orchestrator. The queue name comes from the engine's configuration, so the host app controls which queue processes imports. Because the Orchestrator already handles failures internally (transitioning to failed and recording the error), the jobs do not need their own error handling -- a crash at the ActiveJob level means something truly unexpected happened, and the adapter's retry mechanism takes over.

Decisions & tradeoffs

  • Coordination layer -- we chose a dedicated Orchestrator class over controller-level logic or model callbacks, because it keeps controllers thin, models focused on data, and the workflow independently testable.
  • Transaction boundaries -- we chose per-record persist (no wrapping transaction) over a single transaction around all records, because a failed record should not roll back thousands of successful ones; partial success is more useful than total failure.
  • Error recovery -- we chose to capture the error on the record and continue importing over halting on the first error, because users expect to see which rows failed and why, not just "import failed"; the report becomes actionable.
  • Two-phase workflow -- we chose separate parse! and import! methods over a single run! method, because the preview step between parse and import lets users catch problems before data hits the database.
  • Job design -- we chose thin jobs delegating to the Orchestrator over logic inside the job classes, because the Orchestrator is testable without ActiveJob; the jobs are just the async trigger.

Testing it

The Orchestrator specs exercise both phases end-to-end using an anonymous target class and injected CSV content:

# spec/data_porter/orchestrator_spec.rb
let(:csv_content) { "First Name,Last Name,Email\nAlice,Smith,[email protected]\n" }

describe "#parse!" do
  it "transitions to previewing" do
    orchestrator = described_class.new(data_import, content: csv_content)

    orchestrator.parse!

    expect(data_import.reload.status).to eq("previewing")
  end

  it "validates required fields" do
    csv = "First Name,Last Name,Email\n,Smith,[email protected]\n"
    orchestrator = described_class.new(data_import, content: csv)

    orchestrator.parse!

    record = data_import.reload.records.first
    expect(record.status).to eq("missing")
  end
end

describe "#import!" do
  let(:failing_target_class) do
    Class.new(DataPorter::Target) do
      label "Failing"
      model_name "Guest"
      columns do
        column :name, type: :string
      end

      def persist(_record, context:)
        raise "DB constraint violation"
      end
    end
  end

  it "handles per-record errors without failing the import" do
    data_import.update!(records: [
      DataPorter::StoreModels::ImportRecord.new(
        line_number: 1, data: { name: "Alice" }, status: :valid
      )
    ])
    allow(DataPorter::Registry).to receive(:find).and_return(failing_target_class)

    orchestrator = described_class.new(data_import)
    orchestrator.import!

    expect(data_import.reload.status).to eq("completed")
    expect(data_import.report.errored_count).to eq(1)
  end
end

The key pattern: even when every persist call raises, the import still reaches completed -- not failed. The failed status is reserved for catastrophic errors (the source cannot be read, the target cannot be resolved). Per-record errors are expected operational noise, tracked in the report.

Recap

  • The Orchestrator is a plain Ruby class that coordinates the parse-validate-persist workflow, keeping controllers thin and models focused.
  • The two-phase design (parse! then import!) creates a natural preview checkpoint where users can review data before it touches the database.
  • Per-record error handling means a single bad row never takes down the entire import; errors are captured on individual records and surfaced in the report.
  • ActiveJob integration is a thin wrapper: two one-liner jobs that delegate to the Orchestrator, using the engine's configured queue name.

Next up

The import now runs in the background, but the user has no way to know what is happening. They click "Import" and stare at a static page. In part 8, we build a real-time progress system using ActionCable and Stimulus -- a Broadcaster service that pushes status updates and record counts to the browser as the Orchestrator processes each row. No more refreshing to check if it is done.

This is part 7 of the series "Building DataPorter - A Data Import Engine for Rails". Previous: Parsing CSV Data with Sources | Next: Real-time Progress with ActionCable & Stimulus

GitHub: SerylLns/data_porter | RubyGems: data_porter

5 Web Dev Pitfalls That Are Silently Killing Your Projects (With Real Fixes)

2026-03-03 21:00:00

Most of us have shipped something that "worked on my machine" only to watch it fall apart in production. The frustrating part? Beginner projects tend to fail in the same areas: mobile UX, performance, accessibility, and security. These mistakes are predictable, which means they're fixable.

This post walks through five critical pitfalls I see constantly, with real code examples and actionable fixes you can apply today.

Pitfall #1: Breaking Your Site on Mobile

Over 60% of web traffic comes from mobile devices, yet most beginners test exclusively on a large monitor with DevTools occasionally set to "iPhone" mode. The result: horizontal scrolling, cramped spacing, and buttons too small to tap accurately.

The Fix

Go mobile-first with fluid layouts and proper touch targets:

/* ✅ Mobile-friendly approach */
.container {
  width: 100%;
  max-width: 1200px;
  padding: clamp(1rem, 5vw, 3rem); /* Scales between 16px and 48px */
  margin: 0 auto;
}

.button {
  padding: 12px 24px;
  min-height: 44px; /* Apple HIG; WCAG 2.5.5 target size (AAA) */
  min-width: 44px;
  font-size: 1rem;
}

Checklist

  • Test on real devices, not just DevTools
  • Use clamp() for responsive spacing
  • All touch targets should be minimum 44×44px
  • Avoid fixed widths; use max-width instead
  • Check your layout at 320px, 768px, and 1440px

Pitfall #2: Shipping Slow Sites (Core Web Vitals Failures)

The three metrics that matter:

  • LCP (Largest Contentful Paint): under 2.5s
  • INP (Interaction to Next Paint): under 200ms
  • CLS (Cumulative Layout Shift): under 0.1

Images without dimensions are a classic CLS killer, and blocking scripts tank LCP.

The Fix

Prevent layout shift with explicit dimensions. One caveat: a hero image is usually your LCP element, so it should load eagerly with high priority rather than lazily (reserve loading="lazy" for below-the-fold images):

<!-- ✅ Prevents CLS and optimizes loading -->
<img
  src="hero.jpg"
  srcset="hero-400.jpg 400w, hero-800.jpg 800w, hero-1200.jpg 1200w"
  sizes="(max-width: 768px) 100vw, 1200px"
  alt="Hero image"
  width="1200"
  height="630"
  style="aspect-ratio: 1200 / 630;"
  fetchpriority="high"
  decoding="async"
>

Load non-critical scripts only when needed:

<!-- ✅ Load chat widget on first user interaction -->
<script>
  let chatLoaded = false;
  const loadChatWidget = () => {
    if (chatLoaded) return; // guard: several events can race to fire first
    chatLoaded = true;
    const script = document.createElement('script');
    script.src = 'chat-widget.js';
    script.defer = true;
    document.body.appendChild(script);
  };
  // Cover touch and keyboard users too, not just mouse movement
  ['mousemove', 'touchstart', 'keydown'].forEach((event) =>
    document.addEventListener(event, loadChatWidget, { once: true })
  );
</script>

Stop importing entire libraries:

// ❌ Imports everything
import _ from 'lodash';
import moment from 'moment';

// ✅ Tree-shakeable
import { sum } from 'lodash-es';

// ✅ Use native APIs
const formatted = new Intl.DateTimeFormat('en-US').format(new Date());

Checklist

  • Run Lighthouse before every deployment
  • Always specify image dimensions
  • Defer or async all non-critical scripts
  • Code split large JavaScript bundles
  • Monitor Core Web Vitals in Google Search Console

Last year I worked on a client project where the homepage CLS was 0.32 due to missing image dimensions. Fixing just three images dropped it to 0.05 and improved mobile engagement immediately.

Pitfall #3: Locking Out Users with Disabilities

Accessibility lawsuits are rising, but beyond the legal risk, you're genuinely locking out real users if your site isn't keyboard- or screen-reader-friendly.

The Fix

Use semantic HTML with proper ARIA attributes:

<!-- ✅ Accessible form input -->
<div>
  <label for="email">Email Address</label>
  <input
    type="email"
    id="email"
    name="email"
    aria-required="true"
    aria-invalid="true"
    aria-describedby="email-error"
  >
  <span id="email-error" role="alert" style="color: #d32f2f;">
    Please enter a valid email address in the format: [email protected]
  </span>
</div>

Verify color contrast meets WCAG AA (4.5:1 ratio for body text):

/* ❌ Insufficient contrast (2.8:1) */
.text { color: #999999; background: #ffffff; }

/* ✅ WCAG AA compliant (7.0:1) */
.text { color: #595959; background: #ffffff; }

Make dropdowns keyboard-navigable:

import { useState } from 'react';

function DropdownMenu() {
  const [isOpen, setIsOpen] = useState(false);

  const handleKeyDown = (e) => {
    if (e.key === 'Escape') setIsOpen(false);
  };

  return (
    <div onKeyDown={handleKeyDown}>
      <button
        aria-expanded={isOpen}
        aria-haspopup="true"
        onClick={() => setIsOpen(!isOpen)}
      >
        Menu
      </button>
      {isOpen && (
        <ul role="menu">
          <li role="menuitem"><a href="/profile">Profile</a></li>
          <li role="menuitem"><a href="/settings">Settings</a></li>
        </ul>
      )}
    </div>
  );
}

Checklist

  • Use semantic HTML (<button>, <nav>, <main>, <article>)
  • Every form input needs an associated <label>
  • Test with keyboard-only navigation
  • Install eslint-plugin-jsx-a11y to catch issues early
  • Test with NVDA (Windows) or VoiceOver (Mac)

Pitfall #4: Building Insecure APIs

The most common API vulnerability is BOLA -- Broken Object Level Authorization. It happens when an endpoint doesn't verify that the authenticated user actually owns the resource they're requesting.

// ❌ Anyone can access ANY order by changing the ID in the URL
app.get('/api/orders/:orderId', authenticate, async (req, res) => {
  const order = await db.orders.findById(req.params.orderId);
  res.json(order); // No ownership check!
});

The Fix

Always verify resource ownership:

// ✅ Ownership check
app.get('/api/orders/:orderId', authenticate, async (req, res) => {
  const order = await db.orders.findOne({
    id: req.params.orderId,
    userId: req.user.id // Critical
  });

  if (!order) return res.status(404).json({ error: 'Order not found' });

  res.json(order);
});

Add rate limiting to prevent brute force:

import rateLimit from 'express-rate-limit';

const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  message: 'Too many login attempts, please try again later',
  standardHeaders: true,
  legacyHeaders: false,
});

app.post('/api/login', loginLimiter, async (req, res) => { ... });

Use short-lived tokens with secure storage:

// ✅ Short-lived access token + httpOnly refresh token
const accessToken = jwt.sign({ userId: user.id }, ACCESS_SECRET, { expiresIn: '15m' });
const refreshToken = jwt.sign({ userId: user.id }, REFRESH_SECRET, { expiresIn: '7d' });

res.cookie('refreshToken', refreshToken, {
  httpOnly: true,
  secure: true,
  sameSite: 'strict',
});

res.json({ accessToken });

Checklist

  • Verify resource ownership in every API endpoint
  • Rate limit all public endpoints
  • Use short-lived JWTs (15 minutes max)
  • Store refresh tokens in httpOnly cookies
  • Validate and sanitize all user inputs

Pitfall #5: Blindly Trusting AI-Generated Code

AI tools like Copilot and ChatGPT are genuinely useful but they generate code that looks correct while hiding security holes and edge-case bugs. Here's a real example:

// ❌ AI-generated file upload looks fine, has a critical vulnerability
app.post('/api/upload', (req, res) => {
  const file = req.files.upload;
  file.mv(`./uploads/${file.name}`); // Path traversal attack!
  res.json({ success: true });
});

An attacker uploads a file named ../../../etc/passwd and you're in trouble.

The Fix

import path from 'path';
import { v4 as uuidv4 } from 'uuid';

const ALLOWED_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif'];
const MAX_FILE_SIZE = 5 * 1024 * 1024; // 5MB

app.post('/api/upload', async (req, res) => {
  const file = req.files?.upload;

  if (!file) return res.status(400).json({ error: 'No file uploaded' });
  if (file.size > MAX_FILE_SIZE) return res.status(400).json({ error: 'File too large' });

  const ext = path.extname(file.name).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(ext)) return res.status(400).json({ error: 'Invalid file type' });

  // Generate a safe filename to prevent path traversal
  const safeFilename = `${uuidv4()}${ext}`;
  const uploadPath = path.join(__dirname, 'uploads', safeFilename);

  await file.mv(uploadPath);
  res.json({ filename: safeFilename });
});

Checklist

  • Review AI-generated code line-by-line
  • Test edge cases AI tends to miss
  • Never accept code you don't fully understand
  • Run security linters (eslint-plugin-security)
  • Treat AI as an assistant, not a replacement for thinking

Your Action Plan for This Week

  1. Run a Lighthouse audit on your main pages; fix anything below 90
  2. Install eslint-plugin-jsx-a11y and resolve violations
  3. Audit your API endpoints for missing authorization checks
  4. Review any AI-generated code from the past month
  5. Test your site on a real mobile device, not just DevTools

Wrapping Up

These pitfalls affect developers at every level. The difference is that experienced developers have systems to catch them before they reach production: automated Lighthouse CI, security scanning in PRs, accessibility linting in the editor, real device testing in QA.

You don't need years of experience to build secure, accessible, performant websites. You just need to know what to look for -- and now you do.

If you want more content like this, the original and more posts are on my blog: Dharanidharan's Solopreneur Blog.

What's the worst web dev pitfall you've run into? Drop it in the comments; I'd love to hear how you fixed it.

I got tired of the official EU VAT API crashing, so I built a Serverless wrapper with Webhooks 🚀

2026-03-03 20:59:24

Hello DEV community! 👋

If you've ever built a B2B SaaS or an e-commerce checkout in Europe, you know the struggle. By law, you have to validate your customers' VAT numbers to apply the reverse charge mechanism.

The official way to do this is via the European Commission's VIES API. But there are a few huge problems with it:

  1. It frequently crashes or rate-limits you during business hours.
  2. It's incredibly slow.
  3. The biggest issue: It only tells you the status today. If your biggest client goes bankrupt or closes next month, you won't know until the unpaid invoices pile up.

I wanted a modern, fast, and proactive solution. So, I built VatFlow.

🛠 What I built

VatFlow is a REST API hosted on RapidAPI that acts as a smart shield and monitor for B2B company data.

Here is what makes it different:

  • ⚡️ Smart Caching: I built a DynamoDB caching layer. If you request a VAT number that was recently checked, it returns in milliseconds. No more VIES downtime impacting your checkout flow.
  • 🔔 Real-time Webhooks: This is the feature I'm the most proud of. You can subscribe to a VAT number. Every night, my serverless cron job checks the company's status. If they close down or change their address, your server gets a POST request instantly.
  • 🇫🇷 Deep Enrichment (France): For French companies, the API automatically enriches the VIES data with financial data (revenue, net income) and executives' names using local Open Data.

🏗 The Tech Stack (100% Serverless)

I wanted this to be infinitely scalable and cost-effective, so I went all-in on AWS Serverless:

  • API Gateway & AWS Lambda (Node.js) for the endpoints.
  • DynamoDB for the lightning-fast caching and storing webhook subscriptions.
  • DynamoDB Streams & EventBridge to detect changes in the data and automatically trigger the webhook dispatcher.

💻 Developer Experience First

I know how annoying it is to integrate a new API. So alongside the launch, I published two official, zero-dependency wrappers with built-in auto-retry mechanisms (because network glitches happen).

For Node.js (npm):

npm install vatflow

const VatFlowClient = require('vatflow');
const client = new VatFlowClient('YOUR_RAPIDAPI_KEY');

// Validate a VAT number in one line
const result = await client.validate('FR14652014051');
console.log(result.data.name);

For PHP (Composer):

composer require quicreatdev/vatflow-php

🎁 Try it out!

I've published the API on RapidAPI with a Free Tier so you can test it without putting down a credit card.

👉 Check out VatFlow on RapidAPI here

I would absolutely love to hear your feedback on the architecture, the DX, or the RapidAPI integration. Have you ever struggled with the VIES API before? Let me know in the comments! 👇

Simple Code Injection into ELF Executable

2026-03-03 20:44:18

In this article, I will explain a code-injection technique for ELF executables.

First, you need to understand the ELF executable format: its contents, sections, symbol table, and so on. I've written a dedicated article on this previously, so if you aren't familiar with ELF, please read that first and then come back here.

Let's say we have the following source file:

/**
 * Simple Code Injection
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int i, num;

    num = 10;

    for (i = 0; i < num; i++)
    {
        printf("#%d\n", i);
    }

    return EXIT_SUCCESS;
}

Compile this source file and then run it:

$ gcc main.c -o main ; ./main

After that, you will see:

#0
#1
#2
#3
#4
#5
#6
#7
#8
#9

Here, I will change num = 10 to num = 5 so that the output becomes #0 to #4.

To accomplish this, I need to find the corresponding machine instruction(s). Keep in mind that num is a local variable, so at runtime it lives on the stack; its initial value of 10, however, is encoded as an immediate operand inside an instruction in the .text section of the ELF executable. To display the .text section of the program, objdump is the primary tool:

$ objdump -j .text -d main

The output will be as following:

(...)

0000000000001149 <main>:
    1149:   f3 0f 1e fa             endbr64
    114d:   55                      push   %rbp
    114e:   48 89 e5                mov    %rsp,%rbp
    1151:   48 83 ec 20             sub    $0x20,%rsp
    1155:   89 7d ec                mov    %edi,-0x14(%rbp)
    1158:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
    115c:   c7 45 fc 0a 00 00 00    movl   $0xa,-0x4(%rbp)
    1163:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
    116a:   eb 1d                   jmp    1189 <main+0x40>
    116c:   8b 45 f8                mov    -0x8(%rbp),%eax
    116f:   89 c6                   mov    %eax,%esi
    1171:   48 8d 05 8c 0e 00 00    lea    0xe8c(%rip),%rax        # 2004 <_IO_stdin_used+0x4>
    1178:   48 89 c7                mov    %rax,%rdi
    117b:   b8 00 00 00 00          mov    $0x0,%eax
    1180:   e8 cb fe ff ff          call   1050 <printf@plt>
    1185:   83 45 f8 01             addl   $0x1,-0x8(%rbp)
    1189:   8b 45 f8                mov    -0x8(%rbp),%eax
    118c:   3b 45 fc                cmp    -0x4(%rbp),%eax
    118f:   7c db                   jl     116c <main+0x23>
    1191:   b8 00 00 00 00          mov    $0x0,%eax
    1196:   c9                      leave
    1197:   c3                      ret

Which machine instruction corresponds to num = 10? You need to be able to read assembly here. The answer is the movl $0xa,-0x4(%rbp) instruction: the compiler reserved stack space for the int (4 bytes, at offset -0x4 from the base pointer) and stores the immediate value 0xA there. So I need to change that 0xA to 0x5.

Having found the required assembly instruction, I need to determine its address. If you look at the center column, the machine instruction is c7 45 fc 0a 00 00 00. Here:

  • c7: the opcode of movl (move an immediate into memory)
  • 45 fc: the ModR/M byte (45) and displacement (fc) encoding -0x4(%rbp)
  • 0a 00 00 00: the immediate value, 0xA (little-endian)

The c7 byte resides at address 0x115c, so the 0a byte is at 0x115f.

To change that byte, I'm gonna use hexedit program:

$ hexedit ./main

In the editor, move to address 0x115F and change the 0x0A value to 0x05. Save the changes and then run the program again. You will see:

#0
#1
#2
#3
#4

That's it -- you've injected new code into the ELF executable 🥳.
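If you'd rather script the patch than edit bytes interactively, the same one-byte change can be sketched in a few lines of Ruby. One caveat: objdump prints virtual addresses, and this sketch assumes the file offset equals that address, which happens to hold for the .text section of small binaries like this one (verify with readelf -S before patching anything real):

```ruby
# Patch the immediate of `movl $0xa,-0x4(%rbp)` from 0x0A to 0x05.
# 0x115f comes from the objdump output above; the old-byte check guards
# against patching the wrong offset.
def patch_byte(path, offset, old_byte, new_byte)
  File.open(path, "r+b") do |f|
    f.seek(offset)
    current = f.read(1).unpack1("C")
    raise "expected 0x%02x at 0x%x, found 0x%02x" % [old_byte, offset, current] unless current == old_byte
    f.seek(offset)
    f.write([new_byte].pack("C"))
  end
end

# patch_byte("./main", 0x115f, 0x0a, 0x05)
```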

This technique is simple and powerful, but it has limitations. The most important one is that you must not shift any byte addresses -- only replace bytes in place. If you want to make changes that alter the executable's layout, you need other techniques. That's the topic of the next article.

25+ Best ETL Tools for 2026: The No-Fluff Engineer's Guide

2026-03-03 20:43:43

Most teams don’t have a data shortage. They have a “data scattered everywhere” problem.

CRM here. Database there. Marketing numbers hiding behind APIs. And a few scripts in the middle, hoping nothing changes upstream.

You can glue it all together yourself. Many of us have. But pipelines tend to break at the worst possible moment — usually right before someone important looks at a dashboard.

In this post, we’ll walk through 25+ data integration tools I’ve tested or seen in production — what they’re good at, where they fall apart, and how to choose without regretting it six months later.

What We're Actually Talking About

Extract, Transform, Load. Three deceptively simple words that hide an enormous amount of plumbing. Your data lives in a dozen places that have zero interest in talking to each other — a CRM here, a SaaS billing platform there, a spreadsheet someone emailed last Tuesday. ETL is what brings all of that into one place you can actually reason about.

  • The Extract step grabs it from wherever it's hiding.
  • The Transform step turns that raw mess into something consistent and useful.
  • The Load step puts it somewhere your analysts and BI tools can reach.

Simple in theory. Absolutely wild in practice when you're doing it at scale.
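The three steps above can be sketched in a few lines. This is a toy illustration, not a production pipeline; the CSV source, the cleanup rules, and the sqlite destination are all hypothetical stand-ins:

```python
import csv, io, sqlite3

def extract(raw_csv):
    """Extract: pull rows out of a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: turn the raw mess into something consistent."""
    return [
        {"email": r["email"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r["email"].strip()          # drop rows without an email
    ]

def load(rows, conn):
    """Load: put the cleaned rows where analysts and BI tools can reach them."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (:email, :amount)", rows)

conn = sqlite3.connect(":memory:")
raw = "email,amount\n Alice@X.COM ,10.5\n,3.0\nbob@y.com,2\n"
load(transform(extract(raw)), conn)
print(conn.execute("SELECT email, amount FROM payments").fetchall())
# → [('alice@x.com', 10.5), ('bob@y.com', 2.0)]
```

Every tool in this list is, at heart, doing some industrialized version of these three functions, plus scheduling, retries, and monitoring.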

ETL vs. ELT: The Sequencing Debate

This one comes up with basically every data team I've ever sat down with. Here's the short version:

ETL cleans and reshapes data before it lands in your warehouse. Better for complex transformations, legacy systems, compliance-heavy environments, or when your destination can't handle heavy lifting.

ELT dumps raw data into storage first, then transforms it using the warehouse's own compute. Better for cloud-native stacks, large volumes, and when you want flexibility to re-derive things later.

Neither is universally right. Most mature teams run both depending on the pipeline. What matters is having tooling that doesn't force you to pick one forever.
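The sequencing difference can be made concrete with sqlite standing in for the warehouse (a toy sketch; a real warehouse would be Snowflake, BigQuery, and so on). In ELT you land the raw data untouched, then transform inside the warehouse with its own SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
raw = [("  ALICE  ", "10"), ("bob", "2")]

# ELT: load raw data first, untouched
conn.execute("CREATE TABLE raw_events (name TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?)", raw)

# ...then transform with the warehouse's own compute (plain SQL here)
conn.execute("""
    CREATE TABLE clean_events AS
    SELECT lower(trim(name)) AS name, CAST(amount AS REAL) AS amount
    FROM raw_events
""")
print(conn.execute("SELECT * FROM clean_events").fetchall())
# → [('alice', 10.0), ('bob', 2.0)]
```

Because raw_events is kept untouched, you can re-derive clean_events later with different logic, which is exactly the flexibility argument for ELT; in ETL the trimming and casting would happen in application code before the insert, and the raw form would be gone.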

The Landscape, Honestly Categorized

No-Code / Low-Code (For When You'd Rather Ship Than Configure)

Skyvia — genuinely underrated. Covers integration, replication, reverse ETL, backup, MCP, OData endpoints, and REST API creation from one platform. 200+ connectors, solid free tier, starts at $79/mo. The MCP server lets AI agents query connected sources directly, OData endpoints expose your data as standards-compliant feeds for Power BI or Excel with zero API work, and the SQL builder keeps things accessible without hiding the power. The UI is friendly enough that business users can handle it without engineering support. Won't win awards for the most exotic transformation engine, but for 80% of real-world pipelines, it more than holds up.

Fivetran — the reliable workhorse for teams that want pipelines to just run without babysitting them. 700+ connectors, CDC support, auto schema migrations. The catch: it gets pricey fast (base is $1K/mo), and transformation capabilities are deliberately limited. It's an ingestion tool, not a transformation tool — pair it with dbt.

Stitch — leaner than Fivetran, cheaper entry point ($100/mo), 140+ connectors. Good if your transformation logic lives downstream. Not the tool for complex multi-step reshaping.

Hevo Data — sits nicely between Stitch and Fivetran. Real-time streaming, CDC, post-load transformations, and managed infrastructure that scales itself. Gets expensive at volume ($239/mo starting point), but the operational overhead is genuinely low.

Integrate.io — strong choice for mid-to-large teams, especially if reverse ETL is in the picture. Solid drag-and-drop experience, 150+ connectors, near real-time replication. Can feel pricey for smaller setups.

Matillion — low-code when you want speed, actual code when you need it. Built for cloud warehouses, has real orchestration and security baked in (not bolted on), and handles enterprise-scale complexity. Price point (~$1K/mo+) reflects the scope. If you're running serious analytics on Snowflake or Redshift, worth a hard look.

Enterprise Platforms (When Scale Is Non-Negotiable)

SSIS (SQL Server Integration Services) — if your stack is Microsoft-everything, this is your workhorse. Visual designer, parallel execution, solid error handling. Licensing gets expensive at scale, and it shows its age on streaming and cloud-native workflows. Still extremely capable for what it was built for.

Informatica PowerCenter — battle-tested in environments where failure is not an option. Parallel processing, governance, metadata management, and hybrid deployment. The price tag and setup complexity make it enterprise-only in practice. If you're in a regulated industry moving data across legacy systems at serious volume, it earns its keep.

Talend — now part of Qlik, which brings AI-assisted pipeline guidance and tighter analytics integration. 1,000+ connectors, strong data quality toolkit, MDM built in. Overkill for simple pipelines; genuinely powerful for organizations that treat data quality as a first-class concern. Pricing (~$4,800/user/year) reflects that scope.

Oracle ODI — ELT-first architecture, Knowledge Modules for reusable logic, CDC, and a tight Oracle ecosystem fit. Heavy infrastructure requirements, steep learning curve, custom pricing. The right tool if you're building large-scale warehouses on Oracle infrastructure; a hard sell otherwise.

IBM InfoSphere DataStage — parallel processing at serious scale, deep metadata tracking, compliant by design. Not a platform you pick up casually — it demands experienced ETL engineers. Built for organizations where cost isn't the primary concern and correctness absolutely is.

SAP Data Services — ETL with data quality and governance baked in. Deep SAP integration (obviously), handles both structured and unstructured sources, centralized transformation logic. ~$10K/year baseline. Hard to justify unless your business revolves around SAP.

Qlik Replicate (formerly Attunity) — CDC-powered replication at enterprise scale, real-time sync, automated schema evolution. Great for migrations and keeping sources/targets aligned with minimal lag. Starts around $1K/mo, scales up from there. Limited for multi-source merge scenarios.

Cloud-Native (If You Already Live in a Cloud Provider's World)

AWS Glue — serverless ETL that fits naturally into the AWS ecosystem. Auto-discovers schemas, writes Spark jobs, scales up and tears down automatically. Billed per DPU-hour (~$0.44). Zero free trial. Lives entirely inside AWS — if you're multi-cloud, look elsewhere.

Azure Data Factory — Microsoft's answer for hybrid ETL. 90+ connectors, visual or code-based pipelines, plays well with Synapse, Databricks, and Power BI. Consumption-based pricing. Real-time streaming isn't native — you'll want Event Hubs or Stream Analytics for that.

Google Cloud Dataflow — Apache Beam on managed infrastructure. Handles streaming and batch with one programming model. Deeply integrated with BigQuery and Pub/Sub. Billed per vCPU/memory. Powerful but requires serious Beam knowledge; debugging complex failures is not a quick job.

Google Cloud Data Fusion — the visual, lower-code sibling to Dataflow. Drag-and-drop ETL, 50+ native connectors, good for analytics lake modernization. Priced by instance-hour (developer tier at $0.35/hr). Dataproc costs run alongside it — watch those when processing large sets.

Estuary — genuinely interesting: unifies CDC, streaming, and batch in one platform ("right-time" data movement). 200+ connectors, Kafka-compatible API, exactly-once semantics for supported destinations. $0.50/GB with a free 10GB tier. Flexible deployment including private/BYOC for compliance-sensitive environments. Newer than the incumbents but growing fast.

Open-Source / Developer-Focused (For Teams That Like Owning the Stack)

Airbyte — 600+ connectors, open-source core, CDC support, flexible deployment (cloud, Kubernetes, air-gapped). What it doesn't do: transformation. Pair it with dbt. Community connectors vary in polish — some require finishing touches. If you want open-source ELT without vendor lock-in, this is the most mature option right now.

dbt — not an ingestion tool, a transformation layer. SQL-first, runs inside your warehouse, turns models into tested, versioned, documented assets. Free core, $100/mo per user on dbt Cloud. Every serious modern data stack should have something like this downstream of ingestion. If you're not using it yet, why not?

Meltano — DataOps philosophy made real: Singer-based, dbt-native, CLI-first, version-controlled pipelines as code. Free to self-host. Perfect for teams that want full ownership and are comfortable with the operational overhead. Treat your pipelines like software — PRs, tests, CI/CD. Steep learning curve if you're used to UI-driven tools.

Singer — the underlying protocol that Meltano and others build on. Taps extract, Targets load, everything talks JSON schema. 350+ community connectors. Free and modular. Requires engineering investment to run well, but zero licensing overhead.
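A tap in this protocol is just a program that writes JSON messages to stdout, one per line, which any target can consume. A minimal sketch (the users stream and its schema are made-up examples, not from any real tap):

```python
import json, sys

def emit(msg):
    # Singer messages are newline-delimited JSON on stdout
    sys.stdout.write(json.dumps(msg) + "\n")

# Describe the stream's shape first...
emit({"type": "SCHEMA", "stream": "users",
      "schema": {"properties": {"id": {"type": "integer"},
                                "email": {"type": "string"}}},
      "key_properties": ["id"]})

# ...then emit the records themselves
for user in [{"id": 1, "email": "a@example.com"}]:
    emit({"type": "RECORD", "stream": "users", "record": user})

# STATE lets a target checkpoint progress for incremental runs
emit({"type": "STATE", "value": {"users": {"max_id": 1}}})
```

Because the interface is just JSON over a pipe, taps and targets compose with a shell pipe (tap | target), which is what keeps the ecosystem modular.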

Apache Airflow — orchestration, not ingestion. If you need complex dependency management, retry logic, SLA monitoring, and a scheduling layer that handles workflows across any set of tools, Airflow is the go-to. Free/open-source, but running it in production means either managing infrastructure yourself or paying for Astronomer, Cloud Composer, or MWAA.

Pentaho Data Integration (Kettle) — a visual ETL designer that's been around long enough to have earned serious credibility. 100+ connectors, batch and near-real-time, structured and unstructured data. Community edition is free. Plugs well into the Pentaho analytics suite. Feels a bit dated compared to cloud-native options but still gets the job done, particularly for on-prem scenarios.

Apache NiFi — data routing and flow management at scale. Born in the NSA (seriously), built for security, lineage, and moving data reliably across heterogeneous infrastructure. 300+ processors, clustering, full provenance. Free/open-source. Strong fit for IoT, healthcare, finance, or any environment where compliance demands you know exactly where every byte came from.

Picking the Right One: The Honest Framework

Stop comparing feature tables. Ask yourself these instead:

Where does your data come from, and where does it need to go? Connector breadth matters a lot here — and not just the number, but whether your specific sources are first-class citizens or afterthoughts.

Who's building and maintaining the pipelines? Analysts who live in spreadsheets need a different experience than engineers who think in DAGs. Hybrid teams need tools that flex for both without forcing everyone into one mode.

What does transformation actually look like for you? Simple column renaming? Use almost anything. Complex multi-source joins with custom business logic? You need something that won't buckle — and probably a dedicated transformation layer on top.

What happens when things break at 2am? How good is the alerting? Are logs readable? Is there a support team that answers, or are you spelunking through GitHub issues?

What's the real total cost? Open-source has infrastructure costs. Managed platforms have usage costs. Both have engineering time costs. Don't just look at the pricing page; think about operational overhead over 18 months.

Build vs. Buy

Build your own when your workflows are genuinely unique (satellite telemetry, edge-case regulatory logic), you've got engineering bandwidth to maintain it, or licensing costs make commercial tools untenable.

Buy (or use open-source managed tooling) when you'd rather spend that engineering time on the problems your company actually exists to solve — not rebuilding connector infrastructure that someone else has already gotten right.

Most teams should be buying. The exceptions know who they are.

Final Thought

The best pipeline is the one nobody talks about in stand-up. It just runs, the data lands where it should, and your analysts are working with fresh, trustworthy numbers instead of filing tickets about sync failures.

Whatever you pick, run a real pilot with your actual data before committing. Benchmarks are fiction; your data is real.

What's your current setup? Always curious what people are running in production. Drop it in the comments.

From Reader to Contributor: Why I’m Finally Posting on Dev.to

2026-03-03 20:42:29

Why I’m Finally Posting on Dev.to

I’ve been reading Dev.to for a while.

I’ve learned from other builders here. Picked up patterns. Solved problems because someone else shared theirs.

But I never posted anything myself — until now.

For the past few years, I’ve been working in the Microsoft Power Platform after spending about 15 years building traditional web projects. And honestly, the transition wasn’t smooth.

Delegation errors confused me.
I mixed up triggers and actions.
I built flows that ran way more times than they should have.

Low-code didn’t mean simple.

Over time, though, something clicked. The Power Platform isn’t about replacing “real development.” It’s about solving business problems faster — especially when requirements change constantly.

Most of what I’ve learned came from breaking things in real projects and figuring out why they broke.

That’s what I’ll share here:

• Practical Power Apps lessons
• Common Power Automate mistakes
• What I’d do differently now
• Real-world patterns that actually scale

No polished demos. No marketing spin. Just honest lessons from someone building and learning as he goes.

If you’re newer to Power Platform — or transitioning from traditional dev like I did — you’re not alone.

Let’s build smarter.