The Practical Developer

A constructive and inclusive social network for software developers.

Building a Simple Interest Calculator with HTML, CSS, and JavaScript

2026-02-20 11:38:16

Why I Built This

I wanted to build another small front-end project that focuses on logic and clarity rather than features or design polish.
An interest calculator felt like a good choice because it uses simple math but is still useful in real life.
The goal was to practice writing clean JavaScript and building something understandable without frameworks.

This is a learning-focused build log.

What the Interest Calculator Does

The calculator keeps things intentionally simple:

Takes principal amount, interest rate, and time as input

Calculates simple interest instantly

Displays the result clearly

Works well on mobile screens

No charts, no compound logic, no advanced options.

Tech Stack

I intentionally kept the stack minimal:

HTML – structure and input fields

CSS – clean, responsive layout

JavaScript – interest calculation and live updates

Using vanilla JavaScript helped me focus on fundamentals instead of abstractions.

Interest Calculation Logic
The Formula (Plain English)

Simple interest is calculated using:

Simple Interest = (P × R × T) / 100

Where:

P = Principal amount

R = Annual interest rate

T = Time (in years)

In this project:

Principal is entered as a number

Interest rate is annual

Time is entered in years
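As a quick sanity check on the formula (the numbers below are illustrative, and the sketch is in Python rather than the project's JavaScript):

```python
def simple_interest(principal, rate, time):
    """Simple Interest = (P * R * T) / 100."""
    return (principal * rate * time) / 100

# $1,000 at 5% annual interest for 2 years
print(simple_interest(1000, 5, 2))  # 100.0
```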

Core JavaScript Logic

Here’s the main calculation function:

function calculateInterest(principal, rate, time) {
  if (principal <= 0 || rate <= 0 || time <= 0) {
    return null;
  }

  const interest = (principal * rate * time) / 100;
  return interest.toFixed(2);
}

To keep the calculator responsive, the result updates whenever the user changes an input:

inputs.forEach(input => {
  input.addEventListener("input", updateInterest);
});

function updateInterest() {
  const principal = Number(principalInput.value);
  const rate = Number(rateInput.value);
  const time = Number(timeInput.value);

  const interest = calculateInterest(principal, rate, time);
  result.textContent = interest ? `Interest: $${interest}` : "";
}

The focus here was readability and correctness rather than compact code.

UI & UX Decisions

I kept the UI simple and functional:

Mobile-first layout

Clearly labeled input fields

Instant feedback instead of a calculate button

Minimal styling to avoid distraction

The calculator is designed to be usable without any instructions.

Edge Cases & Possible Improvements

Things handled or noticed:

Empty inputs show no output

Zero or negative values are ignored

Assumes simple interest only

Possible improvements for a future version:

Total amount calculation (principal + interest)

Option to switch between years and months

Better validation and accessibility support

I avoided adding these to keep the scope small.

Demo & Source Code

If you want to try it or look at the code:

Live demo: https://yuvronixstudio.github.io/interest-calculator/

Source code: https://github.com/YuvronixStudio/interest-calculator

Feedback

I’d appreciate feedback from other developers, especially on:

Is the interest calculation logic correct?

Any UI or UX improvements you’d recommend?

Better ways to structure JavaScript?

This was built as a learning exercise, so constructive feedback is welcome.

What Are Machine Types in Google Compute Engine?

2026-02-20 11:20:11

Choosing a computer used to be simple: you’d walk into a store, pick a laptop with a decent screen, and hope it didn't lag. In the cloud, things are a bit more sophisticated. When you're setting up a Virtual Machine (VM), the most critical decision you’ll make is choosing your "machine type."

But what are machine types exactly? In Google Cloud Platform (GCP), a machine type is a specific set of virtual hardware resources—specifically vCPU (processing power) and memory (RAM)—that determines how powerful your virtual server will be.

Think of it as the "engine" of your cloud car. Depending on whether you're driving to the grocery store or competing in a Formula 1 race, you’ll need a very different engine.

The Three Pillars: Machine Families, Series, and Types

To understand machine types, you need to understand the hierarchy Google uses to organize them. It’s like a Russian nesting doll of technology:

  1. Machine Family: The broad category based on the workload (e.g., General-purpose, Compute-optimized).
  2. Series: The specific generation or hardware "brand" (e.g., E2, N2, C3).
  3. Machine Type: The actual size of the instance (e.g., e2-standard-4).
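The machine type name itself encodes this hierarchy. A small parsing sketch (the `series-type-vCPUs` naming applies to standard predefined types; the function name is mine):

```python
def parse_machine_type(name):
    """Split a predefined machine type name like 'e2-standard-4'
    into (series, type, vCPU count)."""
    series, kind, vcpus = name.split("-")
    return series, kind, int(vcpus)

print(parse_machine_type("e2-standard-4"))  # ('e2', 'standard', 4)
```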

Simple Analogy: The T-Shirt Shop

Imagine you are buying a company t-shirt:

  • Family: The Fabric (Cotton for comfort, Spandex for sports).
  • Series: The Brand (Nike vs. Gildan).
  • Machine Type: The Size (Small, Medium, Large).

Exploring the 4 Main Machine Families

Google doesn't just give you a random list of parts; they group them into "Families" designed for specific jobs.

1. General-Purpose (E2, N2, N4)

These are the "Swiss Army Knives" of the cloud. They offer a balance of price and performance.

  • Best For: Web servers, small databases, and development environments.
  • Real-World Example: If you’re running a medium-traffic blog or a company's internal HR portal, this is your go-to.

2. Compute-Optimized (C2, C3, H3)

These machines prioritize raw CPU speed. They are built on high-end processors that can crunch numbers at lightning speed.

  • Best For: High-performance computing (HPC), gaming servers, and media transcoding.
  • Real-World Example: Think of a video editor rendering a 4K movie. They need every ounce of "brainpower" to finish the task quickly.

3. Memory-Optimized (M1, M2, M3)

Some apps don't care about the CPU as much as they care about having a massive amount of "short-term memory" (RAM).

  • Best For: Large in-memory databases (like SAP HANA) and real-time data analytics.
  • Simple Analogy: It’s like a chef with a massive 20-foot kitchen counter. They can keep hundreds of ingredients ready at once without having to go back to the pantry (the hard drive).

4. Accelerator-Optimized (A2, A3, G2)

These are the heavy hitters equipped with NVIDIA GPUs or Google’s own TPUs.

  • Best For: Artificial Intelligence (AI), Machine Learning (ML), and complex 3D simulations.
  • Real-World Example: Training a model like ChatGPT requires massive parallel processing that only these "super-engines" can provide.

Predefined vs. Custom: Can You Build Your Own?

Google offers Predefined Machine Types, which are "set menus" (e.g., a standard machine with 4 vCPUs and 16GB of RAM). However, if none of those fit your needs, you can use Custom Machine Types.

  • Custom Machine Types: You move a slider to pick, say, 6 vCPUs and 11.5GB of RAM (most series require the vCPU count to be 1 or an even number).
  • Why do this? To save money! Why pay for a "Large" when you only need a "Medium-and-a-half"?

Quick Comparison Table

| Machine Series | Primary Strength | Best Use Case | Cost Level |
| --- | --- | --- | --- |
| E2 | Cost-Efficiency | Small apps, Dev/Test | $ (Lowest) |
| N2 | Balanced Performance | Enterprise apps, Databases | $$ |
| C3 | Raw CPU Power | High-traffic web, Gaming | $$$ |
| M3 | Massive RAM | In-memory DBs (SAP HANA) | $$$$ |
| A3 | GPU Acceleration | AI / Deep Learning | $$$$$ (Highest) |

Actionable Takeaway: Choosing Your First Type

If you are just starting out and wondering which machine type to try first, follow the "Rule of E2":

  1. Start with an E2-medium. It’s cheap, reliable, and fits 80% of starter projects.
  2. Monitor your "Cloud Console" metrics. If you see the CPU usage hitting 90% constantly, it’s time to upgrade to a Compute-Optimized (C-series).
  3. Check for Recommendations. Google Cloud actually watches your VM and will send you a notification saying: "Hey, you're paying for too much RAM. Switch to a smaller type to save $20 a month."

From Silent None to Insight: Debugging PySpark UDFs on AWS Glue with Decorators

2026-02-20 11:14:48

Last month I was debugging a PySpark UDF that was silently returning None for about 2% of rows in a 10-million-row dataset. No error. No exception. Just... None.

I couldn't reproduce it locally because I didn't have the exact row that caused it. I couldn't add print() statements because -- as I painfully discovered -- print() inside a UDF doesn't show up anywhere useful. The output vanishes into executor logs that are buried three clicks deep in the Spark UI, if they exist at all.

That frustration led me to build a small set of PySpark debugging decorators. Some of them turned out to be genuinely useful. Others taught me more about Spark's architecture than I expected. And the whole thing sent me down a rabbit hole about how AWS Glue's Docker image actually works under the hood.

This post covers:

  • Three decorators I actually use in production debugging
  • Why print() inside a UDF doesn't do what you think
  • How AWS Glue's local Docker environment works (Livy, Sparkmagic, and the stdout black hole)
  • How to set up and test Glue jobs locally with Docker

Let's get into it.

The Setup: Testing Glue Jobs Locally with Docker

Before we get to the decorators, let me explain the environment. AWS Glue provides a Docker image that lets you develop and test ETL jobs on your own machine without spinning up cloud resources. This is a massive time-saver -- no waiting for Glue job cold starts, no paying for dev endpoint hours.

Here's how to get it running:

Pull the image

docker pull public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01

Start the container

Linux/Mac:

WORKSPACE=/path/to/your/project

docker run -itd \
    --name glue_jupyter \
    -p 8888:8888 \
    -p 4040:4040 \
    -v "$WORKSPACE:/home/glue_user/workspace/jupyter_workspace" \
    -v "$HOME/.aws:/home/glue_user/.aws:ro" \
    -e DISABLE_SSL=true \
    public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01 \
    /home/glue_user/jupyter/jupyter_start.sh \
    --NotebookApp.token='' \
    --NotebookApp.password=''

Windows (Git Bash / PowerShell):

docker run -itd \
    --name glue_jupyter \
    -p 8888:8888 \
    -p 4040:4040 \
    -v "C:/your/project://home/glue_user/workspace/jupyter_workspace" \
    -v "C:/Users/YourName/.aws://home/glue_user/.aws:ro" \
    -e DISABLE_SSL=true \
    public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01 \
    //home/glue_user/jupyter/jupyter_start.sh \
    --NotebookApp.token='' \
    --NotebookApp.password=''

Then open http://localhost:8888 in your browser. Navigate to the jupyter_workspace folder. Your local project files are right there, mounted into the container. Any changes you make are reflected on your host filesystem.

You can monitor Spark jobs in real-time at http://localhost:4040.

If you're using VS Code, connect to the Jupyter server at http://127.0.0.1:8888 and select the PySpark kernel.

Wait, How Does This Actually Work? The Livy Architecture

Here's something that tripped me up for hours and is worth understanding before we talk about decorators.

When you run a PySpark cell in the Glue Docker notebook, your code does not run inside the Jupyter kernel process. Here's what actually happens:

 [Your Browser]
       |
 [Jupyter Server]
       |
 [Sparkmagic Kernel]  -- sends your code over HTTP -->  [Livy Server :8998]
                                                              |
                                                       [Spark Driver JVM]
                                                              |
                                                       [Spark Executors]

Sparkmagic is the Jupyter kernel. It doesn't run your Python code directly. Instead, it serializes your cell's code as a string and sends it via HTTP POST to Livy, which is an open-source REST server for Spark.

Livy creates a Spark session, executes your code in a separate JVM-hosted Python process, captures whatever gets written to stdout, and sends that text back to Sparkmagic, which displays it in your notebook cell.

This architecture has a critical consequence: Livy only reliably captures stdout from the top-level cell execution. If your code calls a function, which calls a decorator, which calls print() three stack frames deep -- that output often gets lost in transit. Livy captures what Spark's driver process writes to stdout at the top level. Nested print() calls inside wrapper functions? Hit or miss. Usually miss.

This is why df.show() works (Spark's JVM writes directly to stdout at the top level) but print("hello") inside a decorator wrapper gets swallowed.

Understanding this saved me from a lot of "why doesn't this work" frustration. It's not a bug. It's the architecture.
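The round-trip Sparkmagic performs can be sketched as follows. The endpoints are from the open-source Livy REST API; this sketch only builds the request payloads rather than sending them, so the shapes are easy to see:

```python
import json

LIVY_URL = "http://localhost:8998"  # Livy's default port

def create_session_request():
    # POST /sessions -- Sparkmagic asks Livy for a PySpark session
    return f"{LIVY_URL}/sessions", json.dumps({"kind": "pyspark"})

def run_cell_request(session_id, cell_code):
    # POST /sessions/{id}/statements -- the cell's source is shipped as a
    # plain string; Livy executes it and captures only top-level stdout
    return (
        f"{LIVY_URL}/sessions/{session_id}/statements",
        json.dumps({"code": cell_code}),
    )

url, payload = run_cell_request(0, 'print("hello from the driver")')
print(url)      # http://localhost:8998/sessions/0/statements
```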

The Decorators

With that context, here are the three decorators I actually kept after throwing away the ones that weren't pulling their weight.

Decorator 1: @measure_time

The simplest one, and probably the one I use most.

import functools
import time

def measure_time(func):
    """Decorator to measure execution time of a function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - start_time
        # _log buffers output for Livy; see "The Livy stdout Problem" below
        _log(f"[measure_time] {func.__name__} completed in {duration:.2f} seconds")
        return result
    return wrapper

Why it's useful: When you're chaining five transformations and the job takes 12 minutes, you need to know which transformation is the bottleneck. The Spark UI gives you stage-level timings, but this gives you function-level timings at a glance.

@measure_time
def build_features(input_df):
    return input_df.filter(...).join(...).groupBy(...).agg(...)

result = build_features(df)
result.show()
show_report()  # [measure_time] build_features completed in 4.32 seconds

Decorator 2: @show_sample_output

Quick peek at what your transformation actually produced.

import functools

from pyspark.sql import DataFrame

def show_sample_output(n=5):
    """Decorator to show first n rows from resulting DataFrame."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if isinstance(result, DataFrame):
                _log(f"\n[show_sample_output] First {n} rows from {func.__name__}:")
                _log(result._jdf.showString(n, 20, False))
            return result
        return wrapper
    return decorator

A note on _jdf: yes, _jdf is a private API -- it could break in a future Spark version. For production logging I'd use result.limit(n).toPandas().to_string() instead. But for a debugging decorator that you strip before deploy, I'm fine with it. It avoids the Livy stdout capture problem that .show() has, and it's fast.

Why it's useful: When you're building a pipeline with multiple transformation stages, you often want to verify the output shape at each step without manually adding .show() everywhere. Slap this decorator on, see your data, remove it when you're done.

@show_sample_output(3)
def clean_names(input_df):
    return input_df.withColumn("Name", upper(col("Name")))

result = clean_names(df)
show_report()

Decorator 3: debug_udf (The One That Actually Solved My Problem)

This is the one that came out of real pain. Here's the problem: UDFs run on executors, not the driver. You cannot:

  • Add print() statements (output goes to executor logs, not your notebook)
  • Use pdb or any debugger (it's a serialized function running on a remote JVM worker)
  • Read accumulator values inside the task (throws RuntimeError)
  • Easily figure out which input caused a failure

The solution: use a Spark Accumulator to ship (input, output) samples from executors back to the driver.

import functools

from pyspark.accumulators import AccumulatorParam
from pyspark.sql import SparkSession


class ListAccumulatorParam(AccumulatorParam):
    def zero(self, initial_value):
        return []

    def addInPlace(self, acc1, acc2):
        return acc1 + acc2


def debug_udf(func, n=5):
    sc = SparkSession.builder.getOrCreate().sparkContext
    acc = sc.accumulator([], ListAccumulatorParam())

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Can't read acc.value here (throws RuntimeError inside tasks).
        # Just always add; limit to n when printing on the driver.
        try:
            input_repr = str(args[0]) if args else "No Args"
            if len(input_repr) > 500:
                input_repr = input_repr[:500] + "..."
            acc.add([{"input": input_repr, "output": str(result)}])
        except Exception:
            pass
        return result

    return wrapper, acc


def print_debug_samples(acc, func_name="UDF", n=None):
    samples = acc.value
    if n is not None:
        samples = samples[:n]
    print("=" * 50)
    print(f"[debug_udf] {func_name}: {len(samples)} sample(s)")
    print("=" * 50)
    for i, s in enumerate(samples, 1):
        print(f"  Sample {i}: Input={s['input']}  ->  Output={s['output']}")
    print("=" * 50)

Usage:

from pyspark.sql.functions import struct, udf
from pyspark.sql.types import StringType


def my_udf_logic(row):
    return f"Processed: {row.Name}"


debug_fn, acc = debug_udf(my_udf_logic, n=3)
my_udf = udf(debug_fn, StringType())

result = df.withColumn("Result", my_udf(struct("Name", "Age")))
result.show()
print_debug_samples(acc, "my_udf_logic", n=3)

Output:

+-------+---+------------------+
|   Name|Age|            Result|
+-------+---+------------------+
|  Alice| 34|  Processed: Alice|
|    Bob| 45|    Processed: Bob|
|Charlie| 29|Processed: Charlie|
|  David| 30|  Processed: David|
+-------+---+------------------+

==================================================
[debug_udf] my_udf_logic: 3 sample(s)
==================================================
  Sample 1: Input=Row(Name='Alice', Age=34)  ->  Output=Processed: Alice
  Sample 2: Input=Row(Name='Bob', Age=45)  ->  Output=Processed: Bob
  Sample 3: Input=Row(Name='Charlie', Age=29)  ->  Output=Processed: Charlie
==================================================

Now you can see exactly what your UDF received and what it returned. When that UDF is producing None for mysterious rows, you can bump n up to 1000, scan the samples, and find the problematic input.

One important gotcha: I originally tried to limit samples inside the wrapper using if counter.value < limit. Spark throws RuntimeError: Accumulator.value cannot be accessed inside tasks. You can only write to accumulators on executors, never read. The limiting has to happen on the driver side in print_debug_samples.

The Livy stdout Problem (and the Solution)

If you're using the Glue Docker image and wondering why print() inside decorators doesn't show up, here's the full explanation.

The problem: Sparkmagic sends your code to Livy, which runs it in a separate Spark process. Livy captures stdout, but only reliably from top-level execution. print() buried inside nested function calls (like decorators) gets lost.

What works:

  • df.show() -- Spark's JVM writes directly to stdout
  • df.printSchema() -- same thing
  • print() at the top level of a cell

What doesn't work:

  • print() inside a decorator wrapper function
  • logger.info() from anywhere (goes to Python logging, not stdout)
  • sys.stdout.write() inside nested calls

The solution I landed on: Buffer all decorator output into a list, then print everything in one shot at the end of the cell.

_report_lines = []

def _log(msg):
    _report_lines.append(str(msg))

def show_report():
    global _report_lines
    if _report_lines:
        print('\n'.join(_report_lines))
        _report_lines = []

Every decorator calls _log() instead of print(). At the end of your cell, you call show_report() and the whole buffer gets printed as a single top-level print() that Livy reliably captures.

@measure_time
@show_sample_output(3)
def my_transform(input_df):
    return input_df.filter(input_df.Age > 30)

result = my_transform(df)
show_report()  # everything appears in cell output

What I Threw Away (And Why)

I originally built more decorators. Here's what didn't survive, why, and what I do instead.

@cache_and_count -- Called .count() after caching to log the row count. The problem: .count() forces a full materialization of the DataFrame. On a large dataset, you're adding minutes of compute just to log a number. If I need the count, I call .count() explicitly once at a point where I know I need it, not on every function call via a decorator.

@log_checkpoint -- Printed schema, row count, and partition count. Same .count() problem, plus .rdd.getNumPartitions() triggers another action. Two extra Spark actions per decorated function. When I need schema info, I call .printSchema() inline. When I need partition counts for skew analysis, I check the Spark UI at localhost:4040 -- it has the information without triggering extra compute.

@log_partitions -- Used .rdd.glom().map(len).collect() to show rows per partition. This collects partition metadata to the driver. On a large, heavily partitioned dataset, this can OOM the driver. For partition skew analysis, the Spark UI's stage detail page shows you task-level input sizes -- same information, no extra actions, no OOM risk.

The pattern here: anything that triggers a Spark action (count, collect, show) inside a decorator is silently doubling your compute cost. Decorators should be lightweight. If you need that data, call it explicitly so the cost is visible in your code.

Putting It Together

Here's the complete, minimal version of spark_analyser.ipynb that I actually use. Three decorators, one UDF debugger, and the Livy-compatible output buffer.

I keep this as a separate notebook and load it at the top of my working notebook:

import nbformat
with open("/home/glue_user/workspace/jupyter_workspace/spark_analyser.ipynb") as f:
    nb = nbformat.read(f, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        exec(cell.source)

Side note: %run spark_analyser.ipynb exists but it runs in a separate scope in the Glue Docker environment. The variables don't carry over to your notebook. The exec() approach runs everything in the current namespace, which is what you actually want.

The full source is available on GitHub: visuvishwa99 / pyspark-debug-toolkit -- essential decorators and utilities for debugging PySpark. The repo's README repeats the Docker setup steps from the top of this post.

TL;DR

  • @measure_time -- Lightweight, always useful, zero overhead.
  • @show_sample_output(n) -- Quick peek at transformation output during development.
  • debug_udf -- Uses Spark Accumulators to capture UDF inputs/outputs on the driver. Solves the "why is my UDF returning None" problem.
  • Avoid decorators that trigger Spark actions (.count(), .collect()) -- they silently double your compute cost.
  • Glue Docker uses Livy -- print() inside nested functions gets swallowed. Buffer output and print at the top level.
  • Accumulators can only be written on executors, never read -- limit your samples on the driver side.

The UDF debugger alone has saved me hours.

Tappi MCP Is Live - Give Claude Desktop a Real Browser

2026-02-20 11:14:31

Earlier today I published Every AI Browser Tool Is Broken Except One - a controlled benchmark of tappi against Playwright, playwright-cli, and OpenClaw's browser tool. Tappi went 3/3 with correct data at 59K tokens. The next-closest tool burned 252K. Before that, I introduced tappi in Your Browser on Autopilot - a local AI agent that controls your real browser, 10x more token-efficient than screenshot-based tools.

Today, tappi ships something I've been wanting since I started building it: an MCP server.

What Changed

pip install tappi now gives you everything - CDP library, MCP server, and AI agent. No more extras. No more tappi[mcp] vs tappi[agent]. One install, all features.

pip install tappi

And for Claude Desktop users who don't want to touch a terminal: a .mcpb bundle you double-click to install.

Why MCP Matters

MCP (Model Context Protocol) is the standard that lets AI agents use external tools. When tappi exposes its 24 browser control tools as an MCP server, any MCP-compatible client can use them:

  • Claude Desktop - ask Claude to browse the web, fill forms, extract data
  • Cursor / Windsurf - your coding agent can check docs, test deployments, scrape reference data
  • OpenClaw - tappi is already a ClawHub skill, now with MCP as a first-class transport
  • Any MCP client - tappi speaks stdio and HTTP/SSE

The difference between "Claude that can search the web" and "Claude with a real browser" is enormous. A real browser means:

  • Logged-in sessions. Your Gmail, GitHub, Jira, Slack - already authenticated.
  • No bot detection. It's your actual Chrome, with your fingerprint and cookies.
  • Shadow DOM support. Reddit, GitHub, Gmail - tappi pierces shadow roots automatically.
  • Cross-origin iframes. Captchas, payment forms, OAuth popups - coordinate commands handle them.

The .mcpb Bundle - Zero-Config Install

The .mcpb format is Claude Desktop's extension system. It's a zip file with a manifest and source code. Claude Desktop manages the Python runtime via uv - you don't need Python installed, you don't edit JSON configs, you don't run pip install.

  1. Download tappi-0.5.1.mcpb
  2. Double-click it
  3. Claude Desktop installs the extension
  4. Start Chrome with tappi launch or --remote-debugging-port=9222
  5. Ask Claude to browse

That's it. Here's a real conversation where Claude uses tappi MCP to open a browser, search Google for Houston events, and extract the results - all from a natural language prompt.

Claude Desktop using tappi MCP to search Google

Four Ways to Use Tappi

Tappi isn't just a CLI tool anymore. Here's the full picture:

1. MCP Server (for Claude Desktop, Cursor, etc.)

tappi mcp              # stdio (Claude Desktop, Cursor)
tappi mcp --sse        # HTTP/SSE (remote clients)

Or the .mcpb bundle for zero-config Claude Desktop install.

2. OpenClaw Skill

clawhub install tappi

OpenClaw's agent orchestration gives tappi access to cron scheduling, multi-session coordination, and cross-tool pipelines. If you need a browser step inside a larger workflow - email monitoring, social media automation, data collection - this is the integration point.

3. Standalone AI Agent

bpy setup              # one-time: pick provider, API key, workspace
bpy agent "Go to GitHub trending and summarize the top 5 repos"
bpy serve              # web UI with live tool-call visibility

Tappi's built-in agent has 6 tools (browser, files, PDF, spreadsheet, shell, cron) and supports 7 LLM providers including Claude Max OAuth - use your existing subscription, no per-call charges.

4. Python Library / CLI

from tappi import Browser

b = Browser()
b.open("https://github.com")
elements = b.elements()   # indexed list, shadow DOM pierced
b.click(3)
text = b.text()

Or from the command line:

tappi open github.com
tappi elements
tappi click 3
tappi text

A Note on Rate Limits

The Claude Desktop and claude.ai web interface have conservative rate limits. If you're doing heavy browser automation - scraping 50 pages, monitoring feeds, running cron jobs - you'll hit throttling fast.

For serious agentic work, use tappi through:

  • An API key (Anthropic, OpenRouter, OpenAI) via bpy agent - you control the rate limits
  • OpenClaw - designed for long-running, multi-step agent workflows
  • tappi's agent package (bpy agent) with Claude Max OAuth - same subscription, better throughput than the web UI

The .mcpb bundle and Claude Desktop are perfect for ad-hoc tasks - "check this page," "fill out this form," "extract this data." For pipelines and automation, use one of the agent approaches above.

What's Next

Tappi started as a 100KB CDP library. Now it's a full browser automation platform for AI agents - MCP server, standalone agent, web UI, Python library, and Claude Desktop extension. All from pip install tappi.

If you tried tappi before: update to 0.5.1. The MCP server is the biggest addition since the agent mode.

If you haven't tried it: github.com/shaihazher/tappi

pip install tappi
tappi launch
tappi mcp

Or just download the .mcpb bundle and double-click.

Previously: Tappi: Your Browser on Autopilot · Every AI Browser Tool Is Broken Except One

EU AI Act Article 6 — Is Your AI System High-Risk? A Developer Checklist

2026-02-20 11:12:06

I spent a weekend reading through the EU AI Act — all 144 pages of it. My goal was simple: figure out if the AI features I ship at work could land me (or my employer) in regulatory trouble.

The answer was buried in Article 6, the section that defines what counts as a "high-risk AI system." And honestly, it's not as straightforward as I expected.

So I built myself a checklist. Here it is.

What Article 6 actually says

Article 6 defines two paths to being classified as high-risk:

Path 1 — Your AI system is a safety component of a product (or is itself a product) covered by EU harmonisation legislation listed in Annex I. Think: medical devices, machinery, toys, cars, aviation equipment.

Path 2 — Your AI system falls into one of the use cases listed in Annex III. This is the one most software developers need to worry about.

If you match either path, you're high-risk. Full stop. That means conformity assessments, technical documentation, human oversight requirements, and penalties up to 35 million EUR or 7% of global turnover.

The Annex III checklist

Annex III lists eight categories. I turned them into yes/no questions you can run through in five minutes.

1. Biometric identification

  • Does your system identify people using face, voice, gait, or fingerprint data?
  • Does it categorize people by race, political opinions, religion, or sexual orientation?

If yes to either → high-risk.

2. Critical infrastructure

  • Does your system manage or operate road traffic, water, gas, heating, or electricity supply?
  • Is it a safety component of digital infrastructure?

If yes → high-risk.

3. Education and vocational training

  • Does your system determine access to education or vocational training?
  • Does it evaluate learning outcomes or detect cheating during exams?
  • Does it assess the appropriate level of education a person should receive?

If yes → high-risk.

4. Employment and worker management

  • Does your system screen or filter job applications?
  • Does it evaluate candidates during interviews?
  • Does it make or influence decisions on promotion, termination, or task allocation?
  • Does it monitor worker performance or behavior?

If yes → high-risk.

5. Access to essential services

  • Does your system evaluate creditworthiness or credit scores?
  • Does it assess eligibility for public assistance, benefits, or social services?
  • Does it evaluate insurance risk (health, life, property)?
  • Is it used for emergency dispatch (police, fire, medical)?

If yes → high-risk.

6. Law enforcement

  • Does your system assess the risk of someone committing a crime?
  • Is it used as a polygraph or to detect emotional states?
  • Does it detect deepfakes in criminal evidence?
  • Does it profile individuals during criminal investigations?

If yes → high-risk.

7. Migration and border control

  • Does your system assess risk from people entering the EU?
  • Does it evaluate asylum or visa applications?
  • Does it detect, recognize, or identify people in migration contexts?

If yes → high-risk.

8. Justice and democratic processes

  • Does your system assist judges in interpreting facts or applying law?
  • Is it used to influence the outcome of elections or referendums?

If yes → high-risk.
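The eight categories above can be encoded as a tiny self-assessment questionnaire. This is a hypothetical sketch: the category keys and question wording are my own shorthand, not official AI Act text.

```python
# Hypothetical sketch: Annex III as a yes/no questionnaire.
# Category keys and question wording are my own shorthand.
ANNEX_III_QUESTIONS = {
    "biometric": [
        "Identifies people by face, voice, gait, or fingerprints?",
        "Categorizes people by race, politics, religion, or orientation?",
    ],
    "employment": [
        "Screens or filters job applications?",
        "Influences promotion, termination, or task allocation?",
    ],
    # ... the remaining six categories follow the same pattern
}

def high_risk_categories(answers: dict[str, list[bool]]) -> list[str]:
    """Return every category where at least one answer is 'yes'."""
    return [cat for cat, replies in answers.items() if any(replies)]

answers = {"biometric": [False, True], "employment": [False, False]}
print(high_risk_categories(answers))  # → ['biometric']
```

One "yes" in any category is enough to land you in high-risk territory, which is why the function checks `any()` rather than a score.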

A quick code-level self-audit

I wrote a small function I run against my own projects. It checks for obvious high-risk indicators in your codebase:

HIGH_RISK_INDICATORS = {
    "biometric": ["face_recognition", "facial", "biometric", "fingerprint", "voice_id"],
    "hiring": ["resume_screen", "candidate_score", "applicant_rank", "cv_parser"],
    "credit": ["credit_score", "creditworth", "loan_eligibility", "risk_score"],
    "education": ["exam_proctor", "plagiarism_detect", "student_assess", "grade_predict"],
    "law_enforcement": ["recidivism", "crime_predict", "suspect_profile", "emotion_detect"],
    "infrastructure": ["traffic_control", "grid_manage", "water_system", "power_dispatch"],
}

def check_high_risk(source_files: list[str]) -> dict:
    """Scan source files for high-risk AI indicators."""
    findings = {}
    for filepath in source_files:
        with open(filepath) as f:
            content = f.read().lower()
        for category, keywords in HIGH_RISK_INDICATORS.items():
            hits = [kw for kw in keywords if kw in content]
            if hits:
                findings.setdefault(category, []).append({
                    "file": filepath,
                    "matched": hits
                })
    return findings

This isn't a compliance tool — it's a five-second sanity check. If it returns anything, that's your cue to dig deeper.
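To run the scan across a whole project, a `pathlib` glob is enough. The sketch below inlines a trimmed copy of the scanner (with an abbreviated indicator list) so it runs standalone, and demos it against a throwaway directory.

```python
import tempfile
from pathlib import Path

# Trimmed copy of the scanner above, abbreviated so this runs standalone.
HIGH_RISK_INDICATORS = {
    "hiring": ["resume_screen", "candidate_score"],
    "credit": ["credit_score", "loan_eligibility"],
}

def check_high_risk(source_files):
    findings = {}
    for filepath in source_files:
        content = Path(filepath).read_text().lower()
        for category, keywords in HIGH_RISK_INDICATORS.items():
            hits = [kw for kw in keywords if kw in content]
            if hits:
                findings.setdefault(category, []).append(
                    {"file": str(filepath), "matched": hits}
                )
    return findings

# Demo against a throwaway project directory.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "ranker.py").write_text("def resume_screen(cv): ...\n")
    files = [str(p) for p in Path(root).rglob("*.py")]
    report = check_high_risk(files)
    print(list(report))  # → ['hiring']
```

In a real project you would point `rglob` at your repository root and feed the full indicator table.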

What if you're NOT high-risk?

Good news: most developer tools, content generators, coding assistants, and chatbots are not high-risk under Article 6.

But you still have obligations. Article 50 (the transparency provision in the final text of the Regulation) requires transparency for:

  • Systems that interact with humans (tell users they're talking to an AI)
  • Emotion recognition or biometric categorization systems
  • Deepfake generators (label the output)
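For the first bullet, the fix can be as small as prefixing the opening reply of a session. A minimal sketch, assuming a chat loop where you track whether the disclosure has been shown; the wording and function name are mine, not prescribed by the Act.

```python
# Illustrative only: disclosure wording and API are assumptions, not
# mandated text from the AI Act.
AI_DISCLOSURE = "Heads up: you're chatting with an AI system, not a human."

def with_disclosure(reply: str, already_disclosed: bool) -> tuple[str, bool]:
    """Prepend the AI notice to the first reply of a session."""
    if already_disclosed:
        return reply, True
    return f"{AI_DISCLOSURE}\n\n{reply}", True

first, disclosed = with_disclosure("Hello! How can I help?", False)
second, _ = with_disclosure("Sure, here's how.", disclosed)
print(first.startswith(AI_DISCLOSURE), second.startswith(AI_DISCLOSURE))
# → True False
```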

And if you're building on top of a general-purpose AI model (GPT, Claude, Llama, Mistral), the model provider carries most of the burden under Article 53. But you — the deployer — still need to use the model in compliance with its intended purpose.

The timeline that matters

  • February 2025: Prohibited practices (Article 5) became enforceable
  • August 2025: Obligations for general-purpose AI models kick in
  • August 2026: Full enforcement of high-risk system rules (Article 6 + Annex III)

That gives you roughly six months from today to figure out your classification and, if needed, start the conformity process.

What I learned building this checklist

I built this after scanning my own projects and realizing I had zero documentation about risk classification. No assessment, no record, nothing.

Even if your system isn't high-risk, having a written self-assessment is worth the hour it takes. When August 2026 hits, regulators won't ask "are you compliant?" — they'll ask "show me your documentation."

If you want to automate the classification check across your codebase, I built an open-source MCP server that scans Python projects for EU AI Act compliance gaps: arkforge.fr/mcp. It detects frameworks, flags risk indicators, and generates a structured report.

This article is based on Regulation (EU) 2024/1689 (the AI Act). I'm a developer, not a lawyer — always verify with legal counsel for your specific situation.

What is Compute Engine? The Ultimate Guide to Google’s Virtual Powerhouse

2026-02-20 11:11:21

Imagine you’re building a massive Lego castle, but you don't have enough room in your house to store it. Instead of buying a bigger house, you rent a specialized, high-tech room that expands or shrinks based on how many bricks you’re using.

In the world of technology, Compute Engine is that high-tech room. It is a cornerstone of the Google Cloud Platform (GCP) that lets businesses and developers run their software on Google’s world-class physical infrastructure. Whether you're launching a simple blog or a complex AI model, understanding what Compute Engine is will be your first step toward mastering the cloud.

Understanding Compute Engine: The Basics

At its core, Compute Engine is an Infrastructure-as-a-Service (IaaS) product. It provides Virtual Machines (VMs)—which are essentially "computers within a computer"—hosted on Google’s data centers.

Instead of buying physical servers, setting them up in a noisy room, and worrying about electricity bills, you simply "rent" the processing power, memory, and storage you need via the web.

Simple Analogy: Think of Compute Engine like renting a car. You don’t own the vehicle, but you have the keys. You decide where to drive, what fuel to use, and who gets to ride with you. You are responsible for the driving (the software), but Google handles the oil changes and engine maintenance (the physical hardware).

Key Features of Compute Engine

To truly grasp what Compute Engine is, you need to look at the features that make it a favorite among developers:

1. Predefined and Custom Machine Types

Google offers "Predefined" setups for standard tasks, but the real magic lies in Custom Machine Types. You can choose exactly how much CPU power and memory you need.

  • Real-World Example: If your app needs a lot of memory but very little processing power (like a large database), you can build a "thin but strong" VM without paying for extra CPUs you won't use.
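Custom machine types are addressed by a name of the form `custom-VCPUS-MEMORY_MB` (for example, `custom-4-5120` for 4 vCPUs and 5 GB of RAM). The 256 MB memory granularity and the "1 or an even number of vCPUs" rule are real Compute Engine constraints for the common machine families, but the validation in this sketch is simplified.

```python
def custom_machine_type(vcpus: int, memory_gb: float) -> str:
    """Build a Compute Engine custom machine type name (simplified checks)."""
    memory_mb = int(memory_gb * 1024)
    if memory_mb % 256 != 0:
        raise ValueError("memory must be a multiple of 256 MB")
    if vcpus != 1 and vcpus % 2 != 0:
        raise ValueError("vCPU count must be 1 or an even number")
    return f"custom-{vcpus}-{memory_mb}"

print(custom_machine_type(4, 5))  # → custom-4-5120
```

That string is what you pass when creating the VM, so a "thin but strong" database box might look like `custom-2-13312` (2 vCPUs, 13 GB).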

2. Global Network & Edge Locations

Your VMs live on the same lightning-fast fiber network that powers Google Search and YouTube. You can choose to host your server in specific "Regions" and "Zones" to be closer to your customers.

  • Simple Analogy: It’s like picking a warehouse location. If most of your customers are in London, you’d choose a London data center so their data doesn't have to travel across the Atlantic, reducing lag.
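The warehouse analogy reduces to a one-liner: pick the region with the lowest measured latency to your users. The region names below are real GCP regions, but the latency figures are made up for illustration.

```python
# Toy region picker. Region names are real GCP regions; latencies are
# invented numbers, not measurements.
measured_latency_ms = {
    "europe-west2": 12,   # London
    "us-central1": 95,    # Iowa
    "asia-east1": 180,    # Taiwan
}

best_region = min(measured_latency_ms, key=measured_latency_ms.get)
print(best_region)  # → europe-west2
```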

3. Live Migration

This is a "superpower" of Compute Engine. If Google needs to perform maintenance on the physical hardware your VM is sitting on, they move your running VM to a different machine without rebooting it.

  • Real-World Example: It’s like a mechanic fixing your car's engine while you’re still driving 60 mph down the highway—your music doesn't even skip a beat.

Compute Engine vs. The Alternatives

In the Google Cloud ecosystem, there are several ways to "run" code. Here is how Compute Engine stacks up against App Engine and Google Kubernetes Engine (GKE):

| Feature | Compute Engine (IaaS) | App Engine (PaaS) | Google Kubernetes Engine (CaaS) |
| --- | --- | --- | --- |
| Control | Full control (OS, boot, apps) | Restricted (code only) | High (container level) |
| Management | Manual (you patch the OS) | Fully managed by Google | Hybrid (Google manages the cluster) |
| Best for | Legacy apps, custom OS needs | Web apps, APIs | Microservices, scalable containers |
| Analogy | Building a house from scratch | Staying in a hotel | Living in a modular prefab home |

Why Use Compute Engine? (Use Cases)

Why do companies choose Compute Engine over other options?

  • Lifting and Shifting: If you have an old app running on a physical server in your office, moving it to a VM in Compute Engine is the easiest way to get it into the cloud.
  • High-Performance Computing (HPC): Tasks like weather forecasting or genomic research require massive "brainpower." You can link thousands of Compute Engine VMs together to solve these problems.
  • Cost Savings with Spot VMs: You can get up to a 90% discount by using "Spot VMs": spare capacity that Google can reclaim whenever full-price, on-demand customers need it.
  • Simple Analogy: It’s like a "standby" flight ticket. It’s incredibly cheap, but you might get bumped if the flight fills up with full-fare passengers.
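The savings math is back-of-envelope simple. This sketch uses the "up to 90% discount" figure from the text; the on-demand rate is a placeholder, not a real Google Cloud price.

```python
# Back-of-envelope Spot savings. The on-demand rate is a placeholder,
# not a real Google Cloud price.
on_demand_hourly = 0.10   # placeholder USD per hour
spot_discount = 0.90
hours_per_month = 730

on_demand_monthly = on_demand_hourly * hours_per_month
spot_monthly = on_demand_monthly * (1 - spot_discount)
print(f"{on_demand_monthly:.2f} vs {spot_monthly:.2f}")  # → 73.00 vs 7.30
```

The trade-off, as the standby-ticket analogy says, is that the workload must tolerate being preempted.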

Actionable Takeaway: How to Start

Now that you know what Compute Engine is, the best way to learn is by doing. Google Cloud offers a Free Tier that includes one small (e2-micro) VM instance per month for free.