The Practical Developer

A constructive and inclusive social network for software developers.

Using GitHub Copilot CLI as a Beginner to Translate English into Shell Commands

2026-01-31 22:22:22

Why I built this

I am still learning my way around the command line. Many times I know what I want to do, but not how to express it as a shell command.

This challenge gave me a chance to explore GitHub Copilot CLI as a bridge between natural language and terminal commands.

What I built

I created a small helper workflow where I:

  • Describe a task in plain English
  • Ask GitHub Copilot CLI to suggest a command
  • Use Copilot again to explain how the command works

Example

Task

count number of folders

Copilot Suggestion


find . -maxdepth 1 -type d | wc -l

What I learned

  • Copilot CLI is best used interactively, with a human in the loop
  • It is very helpful for beginners who understand goals but not syntax
  • Understanding CLI limitations is as important as using AI tools

This experience helped me become more confident with terminal commands instead of blindly copy-pasting them.

This post is my submission for the GitHub Copilot CLI Challenge.

Why Slower Software Often Leads to Better Products

2026-01-31 22:21:21

The software industry moves fast.

Frameworks change.
Trends rotate.
Roadmaps expand faster than teams can think.

But speed is not always progress.

Over time, I’ve noticed something counterintuitive:
the most durable products are rarely the fastest to ship.

They are the ones that slow down early.

Slower software doesn’t mean less ambition.
It means more intention.

It means asking:
• Why does this exist?
• What problem truly matters here?
• What can we deliberately leave out?

When everything is optimized for velocity, clarity is usually the first casualty.
Features accumulate.
Interfaces grow louder.
Decisions get rushed.

Slowness creates space.
Space to think.
Space to remove instead of add.
Space to design systems that age well.

This mindset is what I’m exploring through AVESIRA:
not as a product launch,
but as a way of thinking about software, design, and presence.

There’s no rush to scale.
No pressure to impress.
Just a commitment to build things that remain understandable over time.

Sometimes the most productive thing a team can do
is pause.

Slow down.
And choose deliberately.

If this resonates, you’re already moving fast enough.

LeetCode vs Educative: Which Is Better for Interview Prep in 2026?

2026-01-31 22:21:05

Originally published on LeetCopilot Blog

LeetCode offers 4,000+ practice problems. Educative teaches patterns and system design. Here's how to choose between practice volume and structured learning.

LeetCode and Educative take completely different approaches to coding interview prep.

LeetCode is a massive problem bank with 4,000+ coding challenges. It's where you grind, build speed, and practice solving problems under pressure.

Educative is a structured learning platform with courses like "Grokking the Coding Interview." It teaches you why solutions work through patterns and concepts.

Which is better? The answer: use both in the right order. But if you can only choose one, here's how to decide.

TL;DR: LeetCode vs Educative

Feature       | LeetCode               | Educative
Purpose       | Practice problems      | Learn patterns/concepts
Content       | 4,000+ problems        | 600+ courses
Approach      | Hands-on grinding      | Structured learning
System Design | Limited                | Excellent (Grokking)
Format        | Code editor + problems | Interactive text + coding
Free Tier     | Extensive              | Limited
Pricing       | ~$35/mo or $159/yr     | ~$59/mo or $299/yr
Best For      | Practice + speed       | Learning + concepts

What Is LeetCode?

LeetCode is the industry-standard platform for coding interview practice, used by millions of engineers preparing for tech interviews.

Key Features

  • 4,000+ Problems: Easy, Medium, Hard across all topics
  • Company Tags: See which companies ask which questions (Premium)
  • Frequency Data: Know what's currently being asked
  • Contests: Weekly competitive programming
  • Community Solutions: Discussions for every problem
  • 14 Languages: Python, Java, C++, JavaScript, etc.

Pricing

  • Free: Most problems accessible
  • Premium: ~$35/month or $159/year

What Is Educative?

Educative is a structured learning platform with interactive, text-based courses covering programming, system design, and interview prep.

Key Features

  • Grokking the Coding Interview: Famous pattern-based DSA course
  • Grokking System Design: Industry-leading system design course
  • 600+ Courses: Beyond just interview prep
  • Interactive Text: Read, code in-browser, no videos
  • AI Features: AI-powered mock interviews, personalized roadmaps
  • Educative-99: Curated problem set similar to Blind 75

Pricing

  • Free: Limited samples
  • Premium: ~$59/month or $299/year

Head-to-Head Comparison

Purpose: Practice vs Learning

             | LeetCode                                   | Educative
Core Purpose | Practice problems                          | Learn concepts
Philosophy   | "Solve problems to learn"                  | "Learn patterns to solve problems"
Outcome      | Speed + pattern recognition via repetition | Deep understanding + transferable skills

LeetCode's approach: Throw yourself into problems. Learn by doing. Build muscle memory.

Educative's approach: Learn the underlying patterns first. Then apply them systematically.

"I did 200 LeetCode problems but still failed interviews. Educative's Grokking course finally made DP click." — Reddit user

Content Coverage

                  | LeetCode | Educative
DSA Problems      | 4,000+   | 100s (in courses)
System Design     | Limited  | Excellent (Grokking SD)
Behavioral        | None     | Yes
Language Learning | No       | Yes (Python, Java, etc.)

LeetCode wins for raw DSA practice volume.

Educative wins for comprehensive interview prep including system design, behavioral, and concepts.

Learning Style

             | LeetCode                            | Educative
Format       | Problems + community solutions      | Interactive text courses
Guidance     | Minimal (self-directed)             | High (structured paths)
Explanations | Community-driven (variable quality) | Professional (consistent)

LeetCode is better for self-directed learners who want to dive into problems immediately.

Educative is better for those who need structured guidance and thorough explanations.

System Design

                      | LeetCode         | Educative
System Design Content | Limited articles | Comprehensive courses
Quality               | Basic            | Industry-leading
Courses               | None             | Grokking System Design

Educative dominates for system design. If you're interviewing for senior roles, Educative's system design courses are essential.

Pricing Comparison

Plan     | LeetCode      | Educative
Free     | Most problems | Limited samples
Monthly  | ~$35/mo       | ~$59/mo
Annual   | ~$159/year    | ~$299/year
Lifetime | N/A           | Sometimes available

LeetCode is cheaper overall. But Educative covers more ground (system design, courses).

Pros and Cons

LeetCode Pros

  • Massive problem bank — 4,000+ problems
  • Company tags — See what FAANG asks
  • Frequency data — Focus on hot problems
  • Active community — Solutions for everything
  • Free tier — Most problems accessible
  • Contests — Build competitive skills

LeetCode Cons

  • No structured learning — You're on your own
  • Memorization trap — Easy to memorize, not understand
  • Weak system design — Not comprehensive
  • Overwhelming — 4,000+ problems = decision paralysis
  • Inconsistent explanations — Community quality varies

Educative Pros

  • Pattern-based learning — Understand, don't memorize
  • Structured paths — Clear learning roadmaps
  • System design leader — Grokking SD is industry-standard
  • Consistent quality — Professional content
  • Beyond DSA — System design, behavioral, languages
  • AI features — Mock interviews, personalized paths

Educative Cons

  • Lower problem volume — Less practice vs LeetCode
  • More expensive — $299/yr vs $159/yr
  • Limited free tier — Must pay for full access
  • Text-heavy — No video explanations
  • Course migrations — Some content moved to DesignGurus

When to Choose LeetCode

Choose LeetCode if you:

  • Already understand DSA patterns
  • Need high-volume practice
  • Want company-specific question data
  • Are self-directed and don't need guidance
  • Have time to grind (months of prep)

Recommended Path:

  1. Learn patterns elsewhere (NeetCode, books)
  2. Practice on LeetCode (200+ problems)
  3. Use LeetCopilot for hints when stuck
  4. Get Premium for company tags

When to Choose Educative

Choose Educative if you:

  • Are a beginner who needs structure
  • Want to learn patterns before practicing
  • Need system design preparation
  • Prefer guided learning over self-direction
  • Have limited time (weeks, not months)

Recommended Path:

  1. Complete Grokking the Coding Interview
  2. Complete Grokking System Design (senior roles)
  3. Practice on LeetCode with LeetCopilot

The Optimal Strategy: Use Both

The best approach combines both platforms:

Recommended Workflow

  1. Educative First (2-4 weeks)

    • Grokking the Coding Interview: Patterns
    • Build conceptual foundation
  2. LeetCode Second (Ongoing)

    • Apply patterns to real problems
    • Build speed and accuracy
    • Use LeetCopilot for contextual hints
  3. System Design (If Senior)

    • Grokking System Design on Educative
    • ByteByteGo for additional depth

Comparison Table

Feature       | LeetCode        | Educative
Purpose       | Practice        | Learn
Problems      | 4,000+          | 100s (in courses)
System Design | Limited         | Excellent
Structure     | None            | High
Pricing       | $159/yr         | $299/yr
Best For      | Practice volume | Conceptual learning

FAQ

Should I use LeetCode or Educative first?
Start with Educative to learn patterns, then practice on LeetCode.

Is Educative worth $299/year?
For learning patterns and system design, yes. For practice only, no—use LeetCode.

Can I pass FAANG interviews with just LeetCode?
Yes, but you risk memorizing rather than understanding. Combine with pattern learning.

Can I pass FAANG interviews with just Educative?
Educative teaches well but has less practice volume. You'll likely need LeetCode too.

Which is better for system design?
Educative, by far. Its Grokking System Design is the industry standard.

Verdict: Which Should You Choose?

Choose LeetCode if:

  • You need practice volume
  • You're self-directed
  • You already understand patterns

Choose Educative if:

  • You need to learn patterns
  • You need system design
  • You prefer structured learning

Or Use Both (Recommended):

  • Educative for learning (2-4 weeks)
  • LeetCode + LeetCopilot for practice (ongoing)

Good luck with your prep!

If you're looking for an AI assistant to help you master LeetCode patterns and prepare for coding interviews, check out LeetCopilot.

S3 Triggers: How to Launch Glue Python Shell via AWS Lambda

2026-01-31 22:20:35

Original Japanese article: S3トリガー×AWS Lambda×Glue Python Shellの起動パターン整理 (Organizing startup patterns for S3 triggers × AWS Lambda × Glue Python Shell)

Introduction

I'm Aki, an AWS Community Builder (@jitepengin).

In my previous articles, I introduced lightweight ETL using Glue Python Shell.
In this article, I’ll organize two patterns for triggering Glue Python Shell when a file is placed in S3, and explain the reasoning behind calling Glue via Lambda.

Why Use Lambda to Trigger Glue Python Shell

While it is possible to implement ETL processing entirely within Lambda triggered by S3 events, there are limitations in runtime and memory.
By using Lambda as a trigger and delegating lightweight preprocessing or integration with other services to Lambda while executing the main ETL in Glue Python Shell, you can achieve flexible service integration and long-running processing.

For more details on when to use Lambda vs Glue Python Shell, check out my previous article:
AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

Trigger Patterns

The two patterns covered in this article are:

  1. Lambda + start_job_run (Direct Job Execution)
  2. Lambda + start_workflow_run (Workflow Execution)

Other patterns exist, such as S3 → EventBridge → Step Functions → Glue Python Shell, but we’ll focus on these two simpler approaches.

Pattern 1: Lambda + start_job_run (Direct Job Execution)

In this pattern, Lambda receives the S3 file placement event and directly triggers a Glue Python Shell job using start_job_run.
This was the setup used in my previous article.

Characteristics:

  • High flexibility: Lambda can integrate with many other services.
  • Error handling and retries need consideration: Glue Python Shell can retry the job itself, but retries of the trigger call must be handled on the Lambda side (see the sketch after the sample code below).
  • Flow control can become complex as requirements grow (you may eventually need an EventBridge → Step Functions → Lambda setup).
  • In simple cases Lambda alone might suffice; the trade-off is between Lambda's flexibility and Glue Python Shell's long-running execution.

Lambda Sample Code

Set the target Job name and parameters for start_job_run. Here we pass the S3 file path.

import boto3

def lambda_handler(event, context):
    glue = boto3.client("glue")

    s3_bucket = event['Records'][0]['s3']['bucket']['name']
    s3_object_key = event['Records'][0]['s3']['object']['key']

    s3_input_path = f"s3://{s3_bucket}/{s3_object_key}"

    response = glue.start_job_run(
        JobName="YOUR_TARGET_JOB_NAME",
        Arguments={
            "--s3_input": s3_input_path
        }
    )

    print(f"Glue Job started: {response['JobRunId']}")
    return response
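
As noted in the characteristics above, retrying the trigger itself is the Lambda side's responsibility. A minimal sketch of hardening this, assuming nothing beyond the boto3/botocore bundled in the Lambda runtime, is to enable botocore's standard retry mode on the Glue client and let unhandled exceptions propagate so the asynchronous S3 invocation retries the event:

import boto3
from botocore.config import Config

# Standard retry mode transparently retries transient Glue API errors
# (throttling, 5xx) when start_job_run is called from the handler.
glue = boto3.client("glue", config=Config(retries={"max_attempts": 5, "mode": "standard"}))

# If start_job_run still raises, let the exception escape the handler:
# S3 invokes Lambda asynchronously, so the event is retried (twice by default)
# and can be routed to a failure destination or DLQ for persistent failures.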

Setup Steps

  1. Create a Glue Python Shell job.
import boto3
import sys
import os
from awsglue.utils import getResolvedOptions


def get_job_parameters():
    try:
        required_args = ['s3_input']
        args = getResolvedOptions(sys.argv, required_args)

        s3_file_path = args['s3_input']
        print(f"s3_input: {s3_file_path}")

        return s3_file_path

    except Exception as e:
        print(f"parameters error: {e}")
        raise


def _to_pyarrow_table(result):
    """
    Compatibility helper to extract a pyarrow.Table from a chDB query_result.
    """
    import chdb

    if hasattr(chdb, "to_arrowTable"):
        return chdb.to_arrowTable(result)

    if hasattr(result, "to_pyarrow"):
        return result.to_pyarrow()
    if hasattr(result, "to_arrow"):
        return result.to_arrow()

    raise RuntimeError(
        "Cannot convert chdb query_result to pyarrow.Table. "
        f"Available attributes: {sorted(dir(result))[:200]}"
    )


def normalize_arrow_for_iceberg(table):
    """
    Normalize Arrow schema for Iceberg compatibility.
    - timestamptz -> timestamp
    """
    import pyarrow as pa

    new_fields = []
    new_columns = []

    for field, column in zip(table.schema, table.columns):
        # timestamp with timezone -> timestamp
        if pa.types.is_timestamp(field.type) and field.type.tz is not None:
            new_type = pa.timestamp(field.type.unit)
            new_fields.append(pa.field(field.name, new_type, field.nullable))
            new_columns.append(column.cast(new_type))
        else:
            new_fields.append(field)
            new_columns.append(column)

    new_schema = pa.schema(new_fields)
    return pa.Table.from_arrays(new_columns, schema=new_schema)


def read_parquet_with_chdb(s3_input):
    """
    Read Parquet file from S3 using chDB.
    """
    import chdb

    if s3_input.startswith("s3://"):
        bucket, key = s3_input.replace("s3://", "").split("/", 1)
        s3_url = f"https://{bucket}.s3.ap-northeast-1.amazonaws.com/{key}"
    else:
        s3_url = s3_input

    print(f"Reading data from S3: {s3_url}")

    query = f"""
        SELECT *
        FROM s3('{s3_url}', 'Parquet')
        WHERE VendorID = 1
    """

    result = chdb.query(query, "Arrow")
    arrow_table = _to_pyarrow_table(result)

    print("Original schema:")
    print(arrow_table.schema)

    # Normalize schema for Iceberg compatibility
    arrow_table = normalize_arrow_for_iceberg(arrow_table)

    print("Normalized schema:")
    print(arrow_table.schema)
    print(f"Rows: {arrow_table.num_rows:,}")

    return arrow_table


def write_iceberg_table(arrow_table):
    """
    Write Arrow table to Iceberg table using PyIceberg.
    """
    try:
        print("Writing started...")

        from pyiceberg.catalog import load_catalog

        catalog_config = {
            "type": "glue",
            "warehouse": "s3://your-bucket/your-warehouse/",  # Adjust to your environment.
            "region": "ap-northeast-1",
        }

        catalog = load_catalog("glue_catalog", **catalog_config)
        table_identifier = "icebergdb.yellow_tripdata"

        table = catalog.load_table(table_identifier)

        print(f"Target data to write: {arrow_table.num_rows:,} rows")
        table.append(arrow_table)

        return True

    except Exception as e:
        print(f"Writing error: {e}")
        import traceback
        traceback.print_exc()
        return False


def main():
    try:
        import chdb
        import pyiceberg

        # Read input parameter
        s3_input = get_job_parameters()

        # Read data with chDB
        arrow_tbl = read_parquet_with_chdb(s3_input)
        print(f"Data read success: {arrow_tbl.num_rows:,} rows")

        # Write to Iceberg table
        if write_iceberg_table(arrow_tbl):
            print("\nWriting fully successful!")
        else:
            print("Writing failed")

    except Exception as e:
        print(f"Main error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()
  2. Create a Lambda function and paste the sample code above.

  3. Set an S3 trigger for PUT events.
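
If you prefer to script step 3 instead of clicking through the console, a minimal boto3 sketch looks like this (the bucket name, function name, ARNs and suffix filter are placeholders for your environment):

import boto3

lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")

# Allow S3 to invoke the trigger function.
lambda_client.add_permission(
    FunctionName="YOUR_TRIGGER_FUNCTION",
    StatementId="s3-invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::YOUR_INPUT_BUCKET",
)

# Register the PUT notification on the bucket.
# Note: this call replaces the bucket's existing notification configuration.
s3.put_bucket_notification_configuration(
    Bucket="YOUR_INPUT_BUCKET",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:ap-northeast-1:123456789012:function:YOUR_TRIGGER_FUNCTION",
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".parquet"}]}},
            }
        ]
    },
)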

Execution Result

Pattern 2: Lambda + start_workflow_run (Workflow Execution)

In this pattern, Lambda receives the S3 file event and triggers a Glue Workflow using start_workflow_run.
The Workflow then runs the Glue Python Shell jobs.

Characteristics:

  • Combines Lambda flexibility with Glue Workflow flow control.
  • Error handling requires a clear division of responsibility between Lambda and the Workflow.
  • Workflows allow preprocessing and postprocessing to be added later.

Lambda Sample Code

import boto3

def lambda_handler(event, context):
    glue = boto3.client("glue")

    s3_bucket = event['Records'][0]['s3']['bucket']['name']
    s3_object_key = event['Records'][0]['s3']['object']['key']

    s3_input_path = f"s3://{s3_bucket}/{s3_object_key}"

    response = glue.start_workflow_run(
        Name="YOUR_TARGET_WORKFLOW_NAME",
        # The key must match what the jobs read via get_workflow_run_properties
        # (they look up 's3_input', without the leading dashes).
        RunProperties={'s3_input': s3_input_path}
    )

    print(f"Glue Workflow started: {response['RunId']}")
    return response

Setup Steps

  1. Create Glue Python Shell jobs.

1st Job

import boto3
import sys
import os
from awsglue.utils import getResolvedOptions


def get_job_parameters():
    args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])

    glue = boto3.client('glue')
    resp = glue.get_workflow_run_properties(
        Name=args['WORKFLOW_NAME'],
        RunId=args['WORKFLOW_RUN_ID']
    )

    s3_input = resp['RunProperties'].get('s3_input')
    if not s3_input:
        raise ValueError("s3_input Not Found")

    print(f"s3_input: {s3_input}")
    return s3_input

def _to_pyarrow_table(result):
    """
    Compatibility helper to extract a pyarrow.Table from a chDB query_result.
    """
    import chdb

    if hasattr(chdb, "to_arrowTable"):
        return chdb.to_arrowTable(result)

    if hasattr(result, "to_pyarrow"):
        return result.to_pyarrow()
    if hasattr(result, "to_arrow"):
        return result.to_arrow()

    raise RuntimeError(
        "Cannot convert chdb query_result to pyarrow.Table. "
        f"Available attributes: {sorted(dir(result))[:200]}"
    )


def normalize_arrow_for_iceberg(table):
    """
    Normalize Arrow schema for Iceberg compatibility.
    - timestamptz -> timestamp
    """
    import pyarrow as pa

    new_fields = []
    new_columns = []

    for field, column in zip(table.schema, table.columns):
        # timestamp with timezone -> timestamp
        if pa.types.is_timestamp(field.type) and field.type.tz is not None:
            new_type = pa.timestamp(field.type.unit)
            new_fields.append(pa.field(field.name, new_type, field.nullable))
            new_columns.append(column.cast(new_type))
        else:
            new_fields.append(field)
            new_columns.append(column)

    new_schema = pa.schema(new_fields)
    return pa.Table.from_arrays(new_columns, schema=new_schema)


def read_parquet_with_chdb(s3_input):
    """
    Read Parquet file from S3 using chDB.
    """
    import chdb

    if s3_input.startswith("s3://"):
        bucket, key = s3_input.replace("s3://", "").split("/", 1)
        s3_url = f"https://{bucket}.s3.ap-northeast-1.amazonaws.com/{key}"
    else:
        s3_url = s3_input

    print(f"Reading data from S3: {s3_url}")

    query = f"""
        SELECT *
        FROM s3('{s3_url}', 'Parquet')
        WHERE VendorID = 1
    """

    result = chdb.query(query, "Arrow")
    arrow_table = _to_pyarrow_table(result)

    print("Original schema:")
    print(arrow_table.schema)

    # Normalize schema for Iceberg compatibility
    arrow_table = normalize_arrow_for_iceberg(arrow_table)

    print("Normalized schema:")
    print(arrow_table.schema)
    print(f"Rows: {arrow_table.num_rows:,}")

    return arrow_table


def write_iceberg_table(arrow_table):
    """
    Write Arrow table to Iceberg table using PyIceberg.
    """
    try:
        print("Writing started...")

        from pyiceberg.catalog import load_catalog

        catalog_config = {
            "type": "glue",
            "warehouse": "s3://your-bucket/your-warehouse/",  # Adjust to your environment.
            "region": "ap-northeast-1",
        }

        catalog = load_catalog("glue_catalog", **catalog_config)
        table_identifier = "icebergdb.yellow_tripdata"

        table = catalog.load_table(table_identifier)

        print(f"Target data to write: {arrow_table.num_rows:,} rows")
        table.append(arrow_table)

        return True

    except Exception as e:
        print(f"Writing error: {e}")
        import traceback
        traceback.print_exc()
        return False


def main():
    try:
        import chdb
        import pyiceberg

        # Read input parameter
        s3_input = get_job_parameters()

        # Read data with chDB
        arrow_tbl = read_parquet_with_chdb(s3_input)
        print(f"Data read success: {arrow_tbl.num_rows:,} rows")

        # Write to Iceberg table
        if write_iceberg_table(arrow_tbl):
            print("\nWriting fully successful!")
        else:
            print("Writing failed")

    except Exception as e:
        print(f"Main error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()

2nd Job

import boto3
import sys
import os
from awsglue.utils import getResolvedOptions


def get_job_parameters():
    args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])

    glue = boto3.client('glue')
    resp = glue.get_workflow_run_properties(
        Name=args['WORKFLOW_NAME'],
        RunId=args['WORKFLOW_RUN_ID']
    )

    s3_input = resp['RunProperties'].get('s3_input')
    if not s3_input:
        raise ValueError("s3_input Not Found")

    print(f"s3_input: {s3_input}")
    return s3_input

def _to_pyarrow_table(result):
    """
    Compatibility helper to extract a pyarrow.Table from a chDB query_result.
    """
    import chdb

    if hasattr(chdb, "to_arrowTable"):
        return chdb.to_arrowTable(result)

    if hasattr(result, "to_pyarrow"):
        return result.to_pyarrow()
    if hasattr(result, "to_arrow"):
        return result.to_arrow()

    raise RuntimeError(
        "Cannot convert chdb query_result to pyarrow.Table. "
        f"Available attributes: {sorted(dir(result))[:200]}"
    )


def normalize_arrow_for_iceberg(table):
    """
    Normalize Arrow schema for Iceberg compatibility.
    - timestamptz -> timestamp
    """
    import pyarrow as pa

    new_fields = []
    new_columns = []

    for field, column in zip(table.schema, table.columns):
        # timestamp with timezone -> timestamp
        if pa.types.is_timestamp(field.type) and field.type.tz is not None:
            new_type = pa.timestamp(field.type.unit)
            new_fields.append(pa.field(field.name, new_type, field.nullable))
            new_columns.append(column.cast(new_type))
        else:
            new_fields.append(field)
            new_columns.append(column)

    new_schema = pa.schema(new_fields)
    return pa.Table.from_arrays(new_columns, schema=new_schema)


def read_parquet_with_chdb(s3_input):
    """
    Read Parquet file from S3 using chDB.
    """
    import chdb

    if s3_input.startswith("s3://"):
        bucket, key = s3_input.replace("s3://", "").split("/", 1)
        s3_url = f"https://{bucket}.s3.ap-northeast-1.amazonaws.com/{key}"
    else:
        s3_url = s3_input

    print(f"Reading data from S3: {s3_url}")

    query = f"""
        SELECT *
        FROM s3('{s3_url}', 'Parquet')
        WHERE VendorID = 2
    """

    result = chdb.query(query, "Arrow")
    arrow_table = _to_pyarrow_table(result)

    print("Original schema:")
    print(arrow_table.schema)

    # Normalize schema for Iceberg compatibility
    arrow_table = normalize_arrow_for_iceberg(arrow_table)

    print("Normalized schema:")
    print(arrow_table.schema)
    print(f"Rows: {arrow_table.num_rows:,}")

    return arrow_table


def write_iceberg_table(arrow_table):
    """
    Write Arrow table to Iceberg table using PyIceberg.
    """
    try:
        print("Writing started...")

        from pyiceberg.catalog import load_catalog

        catalog_config = {
            "type": "glue",
            "warehouse": "s3://your-bucket/your-warehouse/",  # Adjust to your environment.
            "region": "ap-northeast-1",
        }

        catalog = load_catalog("glue_catalog", **catalog_config)
        table_identifier = "icebergdb.yellow_tripdata"

        table = catalog.load_table(table_identifier)

        print(f"Target data to write: {arrow_table.num_rows:,} rows")
        table.append(arrow_table)

        return True

    except Exception as e:
        print(f"Writing error: {e}")
        import traceback
        traceback.print_exc()
        return False


def main():
    try:
        import chdb
        import pyiceberg

        # Read input parameter
        s3_input = get_job_parameters()

        # Read data with chDB
        arrow_tbl = read_parquet_with_chdb(s3_input)
        print(f"Data read success: {arrow_tbl.num_rows:,} rows")

        # Write to Iceberg table
        if write_iceberg_table(arrow_tbl):
            print("\nWriting fully successful!")
        else:
            print("Writing failed")

    except Exception as e:
        print(f"Main error: {e}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    main()
  2. Create a Glue Workflow and include the jobs (a boto3 sketch of this step follows the list).

  3. Create a Lambda function with the sample code.

  4. Set an S3 trigger for PUT events.
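
If you prefer to script step 2 (creating the Workflow and wiring the jobs into it), a minimal boto3 sketch looks like this; workflow, trigger and job names are placeholders, and step 4 can be scripted the same way as shown in Pattern 1:

import boto3

glue = boto3.client("glue")

# Create the workflow that Lambda starts with start_workflow_run.
glue.create_workflow(Name="YOUR_TARGET_WORKFLOW_NAME")

# ON_DEMAND trigger: runs the 1st job whenever a workflow run is started.
glue.create_trigger(
    Name="start-first-job",
    WorkflowName="YOUR_TARGET_WORKFLOW_NAME",
    Type="ON_DEMAND",
    Actions=[{"JobName": "YOUR_FIRST_JOB_NAME"}],
)

# CONDITIONAL trigger: runs the 2nd job only after the 1st job succeeds.
glue.create_trigger(
    Name="start-second-job",
    WorkflowName="YOUR_TARGET_WORKFLOW_NAME",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "YOUR_FIRST_JOB_NAME",
                "State": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": "YOUR_SECOND_JOB_NAME"}],
)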

Execution Result

Workflow Execution Result 1
Workflow Execution Result 2

Comparing Workflow Execution vs Direct Job Execution

Workflow Execution Pros

  • Dependency management between multiple jobs (order, branching, parallel control)
  • Unified parameter management (Run Properties)
  • Centralized monitoring and management (stop/restart on error, unified execution history)

Workflow Execution Cons

  • Complex setup (triggers and dependencies)
  • Overhead (management cost even for single jobs, longer startup)
  • Harder debugging (difficult to test individual jobs)

Direct Job Execution Pros

  • Simple setup and immediate execution
  • Flexible parameters, independent scheduling, easier debugging
  • Cost-efficient: run only when needed, low management overhead

Direct Job Execution Cons

  • Difficult dependency management (manual order control, error handling)
  • Dispersed parameter management (per-job configuration)
  • Dispersed monitoring (hard to get overall picture)

When to Choose Workflow vs Direct Job

Use Workflow When:

  • Multiple jobs are chained (e.g., data ingestion → transformation → validation → output)
  • Shared parameters across jobs
  • Unified monitoring needed
  • Regular batch processing

Example:

S3 → Lambda → Glue Workflow
  ├── Job1: Data ingestion & preprocessing
  ├── Job2: Data transformation
  └── Job3: Validation & output

Use Direct Job When:

  • Single-job processing is sufficient
  • Real-time processing needed
  • Flexible individual control required
  • Prototype or testing phase

Example:

S3 → Lambda → Glue Job (single)

Conclusion

We introduced two patterns for triggering Glue Python Shell via S3 events.
When using Lambda to trigger Glue:

  • For single-job processing → Lambda + Job
  • For multiple jobs / flow control → Lambda + Workflow

Glue Python Shell may seem like a niche service compared to Glue Spark, EMR, or Lambda, but it can be cost-effective, long-running, and Spark-independent.
Combining it with chDB or DuckDB can boost efficiency, and PyIceberg makes Iceberg integration straightforward.
While this article focused on S3-triggered jobs, Glue Python Shell can also be used as a general-purpose long-running job environment.

I hope this helps you design your ETL workflows and data platforms more effectively.

Abu Taher Siddik | The Celestial Full Stack Architect

2026-01-31 22:09:08

My Celestial Architect Portfolio: Where Code Meets the Cosmos

Taher Celestial

This is a submission for the New Year, New You Portfolio Challenge Presented by Google AI

About Me

I am Abu Taher Siddik, a Full Stack Developer based in the quiet landscapes of Bargopi, Sunamganj. As an introvert, I’ve always found that words can be fleeting, but well-architected code is eternal.

My portfolio, designed with a "Celestial 7th Heaven" aesthetic, is intended to express my professional philosophy: that the most powerful digital solutions are often built in silence and focused isolation. I specialize in PHP and Python, building bridges between complex backend logic and ethereal frontend experiences.

Portfolio

TAHER

How I Built It

Building a portfolio that ranks #1 requires a blend of performance and storytelling. Here is the breakdown of my process:

  • The Tech Stack:
    • Frontend: Custom CSS Grid and Flexbox for a responsive, "glassmorphic" layout.
    • Animations: Vanilla JavaScript used to create a dynamic Starfield Canvas that reacts to user movement.
    • Logic: PHP and Python backbones for the featured projects.
  • Design Decisions: I chose the "Celestial 7th Heaven" theme—deep space purples, nebula blues, and starlight gold accents—to represent the vastness of the digital world and the "divine" precision required in coding.
  • Google AI Integration: I used Gemini as my primary architectural collaborator. It helped me:
    1. Refine the "Celestial" CSS color palette for maximum visual impact.
    2. Optimize the JavaScript Starfield algorithm for smooth performance across devices.
    3. Structure the narrative to balance my introverted personality with my technical accomplishments.

What I'm Most Proud Of

I am most proud of the Project Synergy section. Highlighting my work on the Work AI Chat Studio (published on Codester) alongside my standalone version of WP Automatic demonstrates my ability to handle both high-level AI integration and deep-system automation.

Technically, I am particularly fond of the Starfield Canvas engine—it creates an immersive environment that proves a portfolio can be a work of art without sacrificing loading speed or SEO potential.

Location: Chhatak, Sunamganj, Bangladesh 🇧🇩
Specialty: PHP | Python | Full Stack Architecture

Build in Public: Week 9. The Shape of Wykra

2026-01-31 22:05:31

Build in public is an interesting experiment overall. You get new readers, some of them even stick around, you start getting invited into different communities, you end up with a proud count of EIGHT stars on your repository and at some point you inevitably find yourself trying to fit into some LLM-related program just to get free credits and avoid burning through your own money too fast. I honestly think everyone should try something like this at least once, if only to understand how it actually feels from the inside.

At the same time there are obvious downsides. Writing updates every single week while having a full-time job requires a level of commitment that is harder to sustain than it sounds, because real life has a habit of getting in the way: a sick cat, a work emergency, getting sick yourself or just being too tired to produce something coherent. After a while it starts to feel uncomfortably close to a second job and I’ve had to admit that I’m probably not as committed to blogging as I initially thought I was. Honestly, keeping a build-in-public series going for more than a couple of months requires either a wealthy uncle or a very solid stock plan from a big company.

The work itself didn’t stop. Things kept moving, the system kept evolving and at some point it made sense to pause and do a proper recap of what we’ve actually been building. Yes, we skipped three weekly updates, but looking at the current state of the project, I’d say the result turned out pretty well.

What Wykra Does, In One Paragraph

Before getting into the details it’s worth briefly recalling how this started. Wykra began as a small side project built mostly for fun as part of a challenge, without any serious expectations or long-term plans, and somewhere along the way turned into whatever this is now. What it actually does: you tell it something like "vegan cooking creators in Portugal with 10k–50k followers on Instagram/Tiktok" and it goes hunting across Instagram and TikTok, scrapes whatever profiles it finds, throws them at a language model for analysis and gives you back a ranked list with scores and short explanations of why each profile ended up there. You can also just give it a specific username if you already have someone in mind and want to figure out whether they're actually worth reaching out to.

Since the original challenge post this has turned into a nine-post series on Dev.to and before moving on it's worth taking a quick look at how those posts actually performed.

As you can see, the first two posts did pretty well and after that the numbers slowly went down. At this point the audience mostly consists of people who clearly know what they signed up for and decided to stay anyway.

What Users Actually See

At this point it makes more sense to stop talking and just show what this actually looks like now.

The first thing you hit is the landing page at wykra.io, which tries to explain what this thing is in about five seconds. I'm genuinely more proud of the fact that we have a landing page at all than of the fact that we even have a domain for email. Also please take a moment to appreciate this very nice purple color, #422e63, because honestly it's pretty great.

We also have a logo that Google Nano Banana generated for us, it’s basically connected profiles drawn as a graph, which is exactly what this thing is about.

After that you can sign up via GitHub because we still need some way to know who's using this and prevent someone from scraping a million dollars' worth of data in one go. Once you're in, you end up in a chat interface that keeps the full history and very openly tells you that searches can take a while, up to 20 minutes in some cases. Sadly there's no universe where this kind of discovery runs in five or six seconds. That's just how it works when you're chaining together web search, scraping and LLM calls.

Eventually you get back a list of profiles the system thinks are relevant, along with a score for each one and a short explanation of why it made the cut.

You can also ask for an analysis of a specific profile if you want to sanity-check whether someone is actually any good.

When you do that you get a quick read on what the account is actually about: the basic stats, a short summary written in human words and a few signals around niche, engagement and overall quality. It's not trying to pass final judgment on anyone, it just saves you from opening a dozen tabs and scrolling for twenty minutes to figure out whether a profile looks legit.

You can also use the whole thing directly in Telegram if the web version isn't your style. Same interface, same flows, just inside Telegram instead of a browser.

And Now, the Nerd Stuff

For anyone who cares about how this is actually put together, here’s the short version of the stack.

The backend is built with NestJS and TypeScript with PostgreSQL as the main database and Redis handling caching and job queues. For scraping Instagram and TikTok we use Bright Data, which takes care of the messy part of fetching profile data without us having to fight platforms directly. All LLM calls go through LangChain and OpenRouter, which lets us switch between different models without rewriting half the code every time we change our mind. Right now Gemini is the main workhorse and GPT with a web plugin handles discovery, but the whole point is that this setup stays flexible. Metrics are collected with Prometheus, visualized in Grafana and anything that breaks loudly enough ends up in Sentry.

The frontend is React 18 with TypeScript, built with Vite and deliberately boring when it comes to state management. Just hooks, no extra libraries. It also plugs into Telegram's Web App SDK, which is why the same interface works both in the browser and inside Telegram without us maintaining two separate apps.

For People Who Like Diagrams

If you're the kind of person who prefers one picture over five paragraphs of explanation, this part is for you. Below is a rough diagram of how Wykra is wired up right now. It's not meant to be pretty or final, just a way to see where things live and how data moves through the system.

If you trace a single request from top to bottom, you're basically watching what happens when someone types a message in the app: the API accepts it, long-running work gets pushed into queues, processors do their thing, external services get called, results get stored and errors get yelled about.

All LLM calls go through OpenRouter: Gemini 2.5 Flash does most of the day-to-day work (profile analysis, context extraction and chat routing), while GPT-5.2 with the web plugin is used specifically for discovering Instagram profile URLs.

All LLM calls → OpenRouter API
    ├─ Gemini 2.5 Flash (primary workhorse)
    │   ├─ Profile analysis
    │   ├─ Context extraction
    │   └─ Chat routing
    │
    └─ GPT-5.2 with web plugin
        └─ Instagram URL discovery

The Search Flow

Searching for creators on Instagram is a bit of a dance, because Bright Data can scrape profiles but doesn't let you search Instagram directly. So we first ask GPT with web search to find relevant profile URLs and only then scrape and analyze those profiles.

For TikTok things are simpler because Bright Data actually supports searching there directly. So we skip the whole "ask GPT to find URLs" step and just tell Bright Data what to look for.

So, How's It Going?

Honestly? Search doesn't work perfectly yet. Some results are great, some are questionable and there are edge cases where the system does something a bit strange. That's expected when you're stitching together web discovery, scraping and LLM analysis into one pipeline. Right now we're working on making the results more relevant and making the whole thing cheaper to run, because discovering creators should not feel like lighting money on fire.

But that's work for next week.

For now, if you want to dig into the code, everything lives here: https://github.com/wykra-io/wykra-api

And if you've made it all the way to the end and have thoughts, questions, or strong opinions about how this is built, feel free to share them. That's kind of the point of doing this in public.