2026-03-17 22:01:53
Hey everyone! I've been building iOS apps for a while and kept copying the same utilities across projects, so I finally packaged them up as SPM libraries.
One-line modifier that moves your view when the keyboard appears.
TextField("Email", text: $email)
    .keyboardAvoider()
Track ScrollView offset — great for collapsing headers.
OffsetTrackingScrollView { offset in
    print(offset.y)
} content: {
    // your content
}
Shimmer / skeleton loading effect for any view.
Text("Loading...")
    .shimmer()
Wrapping HStack for tags and chips. Uses the Layout protocol.
FlowLayout(spacing: 8) {
    ForEach(tags, id: \.self) { Text($0) }
}
Open App Store review page with one line.
AppStoreReview.open(appID: "123456789")
All MIT licensed, zero dependencies. Would love any feedback or suggestions!
2026-03-17 22:01:45
Claude Code has a settings file at .claude/settings.json. Most people never touch it. A few options change behavior significantly.
{
  "defaultMode": "acceptEdits"
}
The default mode is "default" which asks for confirmation on file edits. "acceptEdits" accepts file changes automatically but still asks about shell commands. This is the setting that stops the "approve this edit" prompts.
{
  "bash": {
    "allowedCommands": ["npm", "git", "ls", "cat", "grep", "node"]
  }
}
This controls which shell commands run without asking. List the specific commands your project uses. Use ["*"] to allow everything — reasonable for personal projects, worth thinking about before committing to a team repo.
{
  "ignorePatterns": [
    "node_modules/**",
    "dist/**",
    ".env*",
    "*.log"
  ]
}
Files Claude Code won't read or modify. This overlaps with .claudeignore but settings.json ignores apply at the tool level, before Claude even sees the file. Useful for keeping secrets and build artifacts out of context.
{
"model": "claude-opus-4-5"
}
Overrides the default model. Relevant if you have API access and want to control cost/speed tradeoffs for different tasks.
Three places it can live, in order of precedence:
- .claude/settings.json in the project root — project-specific, checked into git (or not)
- .claude/settings.local.json — project-specific, not checked in (use for personal overrides)
- ~/.claude/settings.json — global, applies to all projects

The project-level file overrides global. Local overrides project.
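For example, if the project checks in a conservative settings.json, a personal .claude/settings.local.json can loosen things for your machine only (a sketch using the keys described above):

```json
{
  "defaultMode": "acceptEdits",
  "bash": {
    "allowedCommands": ["*"]
  }
}
```

Because local overrides project, you get auto-accepted edits and an open command allowlist locally without changing what your teammates see.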
Minimal useful setup for most projects:
{
  "defaultMode": "acceptEdits",
  "bash": {
    "allowedCommands": ["npm", "npx", "git", "ls", "cat", "grep", "node", "tsc"]
  },
  "ignorePatterns": [
    "node_modules/**",
    "dist/**",
    ".env*"
  ]
}
Commit this to the repo so everyone on the project gets consistent Claude behavior.
More configuration patterns — CLAUDE.md rules, settings combinations, and the prompts that change specific behaviors — are in the Agent Prompt Playbook. $29.
2026-03-17 22:00:56
In the rapidly evolving landscape of AI-driven automation, the OpenClaw
project stands out as a robust framework for managing complex agentic tasks.
At the heart of its latest release lies the agentic-loop-upgrade , a
sophisticated suite of features designed to bring reliability, safety, and
persistence to autonomous agents. Whether you are building an AI engineer or a
complex task orchestrator, understanding this upgrade is essential for modern
development.
The agentic-loop-upgrade is not just a patch; it is a foundational enhancement
to how OpenClaw handles execution. It shifts the paradigm from simple
'request-response' cycles to an observable state machine capable of planning,
executing, and recovering from errors without constant human intervention. By
integrating features like persistent state management and confidence gates,
OpenClaw allows developers to build agents that are both powerful and safe.
One of the biggest hurdles in agentic AI is context loss. The new state
manager in OpenClaw ensures that your agent's plans persist across sessions.
By storing the progress in ~/.openclaw/agent-state/, the agent knows exactly
where it left off, allowing for multi-day project execution without re-
prompting from scratch.
The createStepTracker utility acts as an analytical layer that monitors tool
outputs. Instead of blindly trusting the LLM to know when a task is finished,
the tracker analyzes tool results to confirm completion, ensuring high
fidelity in task execution.
Safety is paramount when agents interact with sensitive systems. The upgrade
introduces Approval Gates. You can define risk levels (low, medium, high,
critical) and set timeout parameters. If an agent attempts to execute a
'critical' action like rm -rf, it pauses for human approval. If no response
is received within the specified timeframe, it auto-proceeds or blocks,
depending on your configuration.
The retryEngine does more than just try again. It diagnoses the
failure—whether it's a network glitch or a permission error—and applies
intelligent fixes like injecting sudo or increasing timeout durations,
significantly improving the success rate of autonomous scripts.
LLMs have context windows, and they eventually fill up. The contextSummarizer manages this by compressing older messages into a summary when a token threshold (e.g., 80k tokens) is reached, while preserving the most recent interactions. This keeps the agent's 'mind' focused and performant.
The checkpoint system allows developers to save the state of a long-running
task. If a process is interrupted, you can restore from a previous checkpoint,
injecting the previous plan status back into the agent’s context to resume
immediately.
With v2 features, OpenClaw can pull relevant facts and episodes from a SurrealDB knowledge graph. By injecting ## Semantic Memory and ## Episodic Memory blocks into the system prompt, the agent gains a 'long-term memory' that improves over time.
Finally, the UI layer is now context-aware. If your agent is running in a
Discord channel, it will output clean emoji checklists. If it is in a Webchat,
it renders styled HTML cards. This ensures that the agent's progress is always
readable, regardless of the interface.
The true power of the OpenClaw agentic loop upgrade is unlocked through the createOrchestrator function. By centralizing the management of planning,
retries, and checkpointing, developers can create a unified, reliable
execution environment. If you are looking to scale your AI agent's
capabilities while maintaining strict control over risks and resources,
implementing these upgrades is the logical next step in your development
roadmap. The provided security summary reinforces that all these features are
designed with trust in mind, ensuring no unnecessary telemetry or credential
leakage occurs.
To get started, update your OpenClaw installation and begin by initializing
the orchestrator with your specific session requirements. You will find that
the stability of your autonomous agents increases almost immediately.
The skill can be found at: mode-upgrades/SKILL.md
2026-03-17 22:00:19
How do you let users write arbitrary SQL against a shared multi-tenant analytical database without exposing other tenants' data or letting a rogue query take down the cluster?
That's the problem we needed to solve for Query & Dashboards. The answer is TRQL (Trigger Query Language), a SQL-style language that compiles to secure, tenant-isolated ClickHouse queries. Users write familiar SQL. TRQL handles the security, the abstraction, and the translation.
This post is a deep dive into how it all works. We'll cover the language design, the compilation pipeline, the schema system, and the features that make TRQL more than just a SQL passthrough.
A DSL (domain-specific language) is a language designed for a particular problem domain. CSS is a DSL for styling. SQL is a DSL for querying databases. TRQL is a DSL for querying Trigger.dev data.
We could have exposed raw ClickHouse SQL directly. But there are three reasons we didn't:
1. The language itself is a security boundary. By defining our own grammar, we control exactly what operations are possible. INSERT, UPDATE, DELETE, DROP, and any ClickHouse function we haven't explicitly allowed simply don't exist in the language. This isn't validation that rejects dangerous queries; the parser physically cannot produce them. We cover this in more detail in the ANTLR section below.
2. Tenant isolation must be compiler-enforced, not user-trusted. In a multi-tenant system, every query must be scoped to the requesting organization. If we relied on users including WHERE organization_id = '...' in their queries, a missing filter would leak data across tenants. TRQL injects these filters automatically during compilation. There's no way to opt out.
3. Internal database details should be hidden. Our ClickHouse tables have names like trigger_dev.task_runs_v2 and columns like cost_in_cents and base_cost_in_cents. Users shouldn't need to know any of that. TRQL lets them write SELECT total_cost FROM runs while the compiler handles the translation.
4. We need features that don't exist in ClickHouse. Virtual columns, automatic time bucketing, value transforms, and rendering metadata are all things we've built into TRQL's schema layer. A raw SQL passthrough couldn't provide any of this.
A big thanks to PostHog who pioneered this approach with HogQL, a SQL-like interface on top of ClickHouse. TRQL started as a TypeScript conversion of their Python implementation but evolved significantly during development to handle our specific use cases.
Before we get into the language itself, it helps to understand the target. We chose ClickHouse as the analytical backend because it excels at exactly this kind of workload:
- It's columnar: a query that reads status and total_cost doesn't touch output, error, or any other columns.

If you want to know more about how we run ClickHouse in production, we wrote a postmortem on a partitioning incident that goes into the internals.
TRQL is parsed using ANTLR, a parser generator that takes a formal grammar definition and produces a lexer and a parser. The lexer breaks the raw query text into tokens (keywords, identifiers, operators, string literals). The parser takes those tokens and arranges them into a structured tree based on the grammar rules. You write the grammar, ANTLR generates the code for both.
This is important for security. The grammar defines what the language can express. If DELETE, UPDATE, DROP, or SET aren't in the grammar, they can never appear in a parsed query. It's not that we validate and reject them. They literally don't exist in TRQL's syntax. This is security by construction, not by validation.
TRQL's grammar is a strict subset of SQL. If you've written SQL before, TRQL will feel completely familiar. SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, and common aggregation functions all work as expected. But the grammar is physically incapable of expressing writes or administrative commands.
Our ANTLR grammar targets TypeScript and produces a full abstract syntax tree (AST) for each query. The AST is a structured tree representation of the query that the compiler can inspect, validate, and transform. Every subsequent step in the pipeline operates on this AST rather than on raw text.
For example, the query SELECT task_identifier, SUM(total_cost) FROM runs WHERE status = 'Failed' produces this tree:
SelectStatement
├── SelectList
│ ├── SelectListItem
│ │ └── ColumnReference: task_identifier
│ └── SelectListItem
│ └── AggregateFunctionCall: SUM
│ └── ColumnReference: total_cost
├── FromClause
│ └── TableReference: runs
└── WhereClause
└── ComparisonExpression (=)
├── ColumnReference: status
└── StringLiteral: 'Failed'
Each node in the tree is something the compiler can reason about. It can check that runs is a valid table, that task_identifier and total_cost exist on that table, that SUM is an allowed function, and that 'Failed' is a valid value for the status column.
Once parsed, the AST goes through a series of transformations before it becomes executable ClickHouse SQL. Here's each step:
Parse: The TRQL query is parsed into an AST using ANTLR. Only constructs that exist in the grammar can make it this far. Anything else is a syntax error.
Schema validation: We walk the AST and check every identifier against the table schemas. Does the table exist? Do all the referenced columns exist on that table? Are the functions valid? Are the argument types correct? If you write WHERE status = 123 but status is a string column with allowed values, this step catches it.
Tenant isolation: We inject tenant-specific filters into the WHERE clause. At a minimum, every query gets an organization_id filter. Depending on the query scope, we also add project_id and environment_id filters. These are added to the AST itself, so they're baked into the query structure before any SQL is generated. Without this step, any user could read any other organization's data.
Time restrictions: We add time bounds to prevent unbounded scans. Without this, a simple SELECT * FROM runs would attempt to scan the entire table history. The maximum queryable time range varies by plan on Trigger.dev Cloud.
Parameterize values: All literal values in the query (strings, numbers, dates) are extracted from the AST and replaced with named parameters like {tsql_val_0: String}. The actual values are passed separately to ClickHouse rather than being interpolated into the SQL string. Combined with the grammar restrictions from the parsing step, this means the generated ClickHouse SQL is always structurally safe.
Generate ClickHouse SQL: The transformed AST is "printed" into ClickHouse-compatible SQL. This is where virtual columns are expanded to their real expressions, table names are translated, and TRQL-specific functions are compiled to their ClickHouse equivalents.
Execute: The generated SQL is executed against ClickHouse in read-only mode. On Trigger.dev Cloud, queries run against a dedicated read-only replica to avoid impacting write performance.
Return results: Results come back in JSON format, along with column metadata that tells the UI how to render each value.
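Two of these passes, tenant isolation and time restriction, can be sketched like this (type and function names are ours, not the actual TRQL internals):

```typescript
// Hypothetical sketch of the tenant-isolation and time-restriction passes.
// A real AST is a full tree; here we reduce it to just the WHERE conditions
// and the named parameters that will be sent to ClickHouse separately.
type Ast = { where: string[]; params: Record<string, string> };

interface TenantContext {
  organizationId: string;
  minCreatedAt: string; // lower time bound injected by the compiler
}

// Append a compiler-enforced organization filter as a named parameter.
function injectTenantFilter(ast: Ast, ctx: TenantContext): Ast {
  const key = `tsql_val_${Object.keys(ast.params).length}`;
  return {
    where: [...ast.where, `equals(runs.organization_id, {${key}: String})`],
    params: { ...ast.params, [key]: ctx.organizationId },
  };
}

// Append a time bound so the query can never scan unbounded history.
function injectTimeRestriction(ast: Ast, ctx: TenantContext): Ast {
  const key = `tsql_val_${Object.keys(ast.params).length}`;
  return {
    where: [
      ...ast.where,
      `greaterOrEquals(created_at, toDateTime64({${key}: String}, 3))`,
    ],
    params: { ...ast.params, [key]: ctx.minCreatedAt },
  };
}

const ctx: TenantContext = {
  organizationId: "org_123",
  minCreatedAt: "2026-01-01 00:00:00",
};
const compiled = injectTimeRestriction(
  injectTenantFilter({ where: [], params: {} }, ctx),
  ctx
);
console.log(compiled.where);
console.log(compiled.params);
```

The key property is that these filters are appended by the compiler after parsing, so there is nothing a user can write in the query text to remove them.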
Here's the full pipeline visualized:
Let's make this concrete. Here's a simple TRQL query that finds the cost of each task:
SELECT
  task_identifier,
  SUM(total_cost) AS cost
FROM runs
GROUP BY task_identifier
And here's the parameterized ClickHouse SQL that TRQL generates:
SELECT
task_identifier,
-- `total_cost` is actually the sum of two columns and needs converting to dollars
sum(((cost_in_cents + base_cost_in_cents) / 100.0)) AS cost
-- Table names are translated and FINAL is used to avoid stale data
FROM trigger_dev.task_runs_v2 AS runs FINAL
WHERE
and(
and(
-- Tenant isolation: organization
equals(runs.organization_id, {tsql_val_0: String}),
),
-- Time restriction
greaterOrEquals(created_at, toDateTime64({tsql_val_1: String}, 3))
)
GROUP BY task_identifier
-- We limit results to 10k rows (we return an extra so we can tell the user if there are more)
LIMIT 10001;
Every step from the pipeline is visible here:
- Virtual columns: total_cost is a virtual column. Users write SUM(total_cost) but TRQL expands it to sum(((cost_in_cents + base_cost_in_cents) / 100.0)). The user never needs to know that costs are stored as two separate cent values in ClickHouse.
- Table translation: runs becomes the actual trigger_dev.task_runs_v2 table. The FINAL keyword tells ClickHouse to read the latest merged data, which matters because ClickHouse uses a MergeTree engine that can have unmerged parts.
- Tenant isolation: equals(runs.organization_id, {tsql_val_0: String}). There's no way to query data from another organization because this filter is added by the compiler, not the user.
- Time restriction: greaterOrEquals(created_at, ...). Without this, the query would scan the entire history of the table.
- Parameterization: named parameters like {tsql_val_0: String} prevent SQL injection. The actual organization ID and timestamp are passed as separate parameters to ClickHouse, never interpolated into the query string.

The schema definition is where a lot of TRQL's power comes from. Each table is defined as a TypeScript object that describes not just the columns, but how they should be translated, validated, and rendered. Here's what's interesting about it.
TRQL currently exposes two tables:

- runs: Every task run, including status, timing, costs, machine type, tags, error data, and other metadata. This is the primary table for understanding what your tasks are doing.
- metrics: CPU utilization, memory usage, and any custom metrics you record via OpenTelemetry. Metrics are pre-aggregated into 10-second buckets for efficient querying.

Some of the most useful columns in TRQL don't exist in ClickHouse at all. They're defined as expressions that the compiler expands during query generation.
total_cost is a good example. In ClickHouse, costs are stored as two separate integer columns: cost_in_cents (compute cost) and base_cost_in_cents (invocation cost). The schema defines total_cost as:
total_cost: {
  name: "total_cost",
  expression: "(cost_in_cents + base_cost_in_cents) / 100.0",
  // ...
}
When a user writes SELECT total_cost FROM runs, TRQL expands it to (cost_in_cents + base_cost_in_cents) / 100.0. The user gets a clean dollar amount without knowing about the internal storage format.
Other virtual columns follow the same pattern:
| User-facing column | Expression |
|---|---|
| execution_duration | dateDiff('millisecond', executed_at, completed_at) |
| total_duration | dateDiff('millisecond', created_at, completed_at) |
| queued_duration | dateDiff('millisecond', queued_at, started_at) |
| is_finished | if(status IN ('COMPLETED_SUCCESSFULLY', ...), true, false) |
| is_root_run | if(depth = 0, true, false) |
Users write WHERE execution_duration > 5000 and the compiler handles the rest.
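The expansion itself can be sketched as a simple lookup during SQL generation (the mapping mirrors the table above; the helper name is ours, not the actual TRQL code):

```typescript
// Hypothetical sketch of virtual-column expansion during SQL generation.
const virtualColumns: Record<string, string> = {
  total_cost: "((cost_in_cents + base_cost_in_cents) / 100.0)",
  execution_duration: "dateDiff('millisecond', executed_at, completed_at)",
  is_root_run: "if(depth = 0, true, false)",
};

// Virtual columns expand to their expression; real columns pass through.
function expandColumn(name: string): string {
  return virtualColumns[name] ?? name;
}

console.log(expandColumn("total_cost"));
console.log(expandColumn("status")); // a real column is left untouched
```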
ClickHouse column names are database artifacts. TRQL renames them to domain concepts:
| TRQL name | ClickHouse name |
|---|---|
| run_id | friendly_id |
| triggered_at | created_at |
| machine | machine_preset |
| attempt_count | attempt |
| dequeued_at | started_at |
This means we can refactor our ClickHouse schema without breaking user queries. The TRQL names are the stable public API.
Some columns need their values transformed at the boundary. For example, run IDs are stored in ClickHouse without a prefix, but users expect to write WHERE run_id = 'run_cm1a2b3c4d5e6f7g8h9i'. The schema defines a whereTransform that strips the run_ prefix before the value hits ClickHouse:
root_run_id: {
  name: "root_run_id",
  expression: "if(root_run_id = '', NULL, 'run_' || root_run_id)",
  whereTransform: (value: string) => value.replace(/^run_/, ""),
  // ...
}
The expression adds the prefix when reading (so results display run_...), and whereTransform strips it when filtering. Users never need to think about how IDs are stored internally. The same pattern applies to batch_id (stripping batch_) and parent_run_id.
Each column carries metadata that tells the UI how to display its values. The customRenderType field controls this:
| Render type | Behavior |
|---|---|
| runId | Displayed as a clickable link to the run |
| duration | Formatted as human-readable time (e.g. "3.5s") |
| costInDollars | Formatted as currency |
| runStatus | Rendered with colored status badges |
| tags | Displayed as tag chips |
| environment | Resolved to the environment slug |
This metadata is returned alongside query results, so the dashboard knows that 3500 in the execution_duration column should display as "3.5s", not as the raw number. The query engine isn't just returning data; it's returning instructions for how to present it.
Columns like status, machine, and environment_type declare their valid values directly in the schema:
status: {
  name: "status",
  allowedValues: ["Completed", "Failed", "Crashed", "Queued", ...],
  // ...
}
These allowed values serve multiple purposes: the query editor uses them for autocomplete suggestions, the AI assistant uses them to generate valid queries, and the schema validator rejects queries that filter on values that don't exist.
TRQL includes functions that don't exist in ClickHouse. These are expanded during compilation into their ClickHouse equivalents.
The most important custom function. timeBucket() automatically selects an appropriate time interval based on the query's time range. You use it like this:
SELECT
  timeBucket(),
  COUNT(*) AS runs
FROM runs
GROUP BY timeBucket
ORDER BY timeBucket
The compiler looks at the time range of the query and chooses bucket sizes that balance detail with performance:
| Time range | Bucket size |
|---|---|
| Up to 3 hours | 10 seconds |
| Up to 12 hours | 1 minute |
| Up to 2 days | 5 minutes |
| Up to 7 days | 15 minutes |
| Up to 30 days | 1 hour |
| Up to 90 days | 6 hours |
| Up to 180 days | 1 day |
| Up to 1 year | 1 week |
This matters for three reasons. First, users don't need to think about granularity. A chart that covers the last hour gets 10-second resolution. The same query over 30 days automatically switches to hourly buckets. Second, it prevents queries from returning millions of rows. Without automatic bucketing, a time-series query over a year of data could try to return a row for every 10-second interval. Third, and possibly most importantly, when you add a chart to a dashboard and adjust the time range, the chart will automatically switch to the appropriate bucket size.
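The interval selection above can be sketched as a threshold table lookup (thresholds taken from the table; the function name is ours, not the actual compiler code):

```typescript
// Hypothetical sketch of timeBucket() interval selection.
// Ranges are in milliseconds, bucket sizes in seconds.
const HOUR = 3_600_000;
const DAY = 24 * HOUR;

const buckets: Array<[maxRangeMs: number, bucketSeconds: number]> = [
  [3 * HOUR, 10],
  [12 * HOUR, 60],
  [2 * DAY, 5 * 60],
  [7 * DAY, 15 * 60],
  [30 * DAY, 3600],
  [90 * DAY, 6 * 3600],
  [180 * DAY, 86_400],
  [365 * DAY, 7 * 86_400],
];

function bucketSizeSeconds(rangeMs: number): number {
  for (const [maxRange, size] of buckets) {
    if (rangeMs <= maxRange) return size;
  }
  return 7 * 86_400; // fall back to weekly buckets for anything larger
}

console.log(bucketSizeSeconds(2 * HOUR)); // 10 (within the 3-hour tier)
console.log(bucketSizeSeconds(14 * DAY)); // 3600 (within the 30-day tier)
```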
Schema columns carry rendering metadata automatically (a duration column knows it should display as "3.5s"). But what about computed expressions? If you write SUM(usage_duration), the result is just a raw number with no formatting hint.
prettyFormat() solves this. It takes two arguments: an expression and a format type. The expression is passed through to ClickHouse unchanged, but the format type is attached as column metadata in the response so the UI knows how to render the result.
SELECT
  timeBucket(),
  prettyFormat(avg(value), 'bytes') AS avg_memory
FROM metrics
WHERE metric_name = 'process.memory.usage'
GROUP BY timeBucket
ORDER BY timeBucket
LIMIT 1000
The available format types are:
| Format type | Renders as |
|---|---|
| duration | Milliseconds as human-readable time (e.g. "3.5s", "2m 15s") |
| durationSeconds | Seconds as human-readable time |
| costInDollars | Dollar formatting with appropriate precision |
| cost | Generic cost formatting |
| bytes | Byte counts with binary units (KiB, MiB, GiB) |
| decimalBytes | Byte counts with decimal units (KB, MB, GB) |
| quantity | Large numbers abbreviated (1.2M, 3.4K) |
| percent | Percentage formatting |
This is the same rendering system that powers the schema's customRenderType, but available for any expression you write. The dashboard widgets use it to display computed values with the right units.
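As a sketch of what the UI side might do with the 'bytes' format type (our illustration, not Trigger.dev's actual renderer):

```typescript
// Hypothetical sketch: render a raw byte count with binary units,
// matching the 'bytes' row in the format table above.
function formatBytes(n: number): string {
  const units = ["B", "KiB", "MiB", "GiB"];
  let i = 0;
  while (n >= 1024 && i < units.length - 1) {
    n /= 1024;
    i++;
  }
  // Number(...) drops a trailing ".0" so 3.0 renders as "3"
  return `${Number(n.toFixed(1))} ${units[i]}`;
}

console.log(formatBytes(3_145_728)); // "3 MiB"
console.log(formatBytes(1536)); // "1.5 KiB"
```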
TRQL doesn't try to reinvent standard analytical functions. ClickHouse aggregations like quantile(), countIf(), avg(), sum(), and round() are all available directly and passed through to ClickHouse unchanged. TRQL only adds custom functions when it needs behavior that ClickHouse can't provide on its own.
The query editor in the Trigger.dev dashboard is built on CodeMirror 6 and uses a dual-parser architecture.
Syntax highlighting and linting are handled by two completely different parsers:
For highlighting, we use CodeMirror's built-in Lezer grammar with the StandardSQL dialect. Lezer is an incremental parser, meaning it only re-parses the parts of the document that changed. This makes it fast enough to run on every keystroke without any perceptible lag. It tokenizes the text into syntax nodes (keywords, identifiers, strings, numbers, operators) and our custom theme maps these to colors.
For linting, we use the full ANTLR4-based TRQL parser. Every edit (debounced by 300ms) runs the complete TRQL pipeline: parseTSQLSelect() produces a full AST, then validateQuery(ast, schema) checks it against the table schemas. This catches unknown columns, invalid table names, and type mismatches and shows them as inline diagnostics.
Why two parsers? Lezer is fast but doesn't understand TRQL-specific semantics like virtual columns or allowed values. ANTLR understands everything but is too heavy to run on every keystroke for syntax coloring. Using both gives us the interactive responsiveness of Lezer with the correctness guarantees of ANTLR.
Autocompletion is entirely custom. We don't use CodeMirror's built-in SQL completion. Instead, the completion source analyzes the cursor position and the surrounding query text to determine context:
- After FROM or JOIN: show table names
- After SELECT, WHERE, GROUP BY, ORDER BY: show columns from the tables referenced in the query, plus functions
- After tableName.: show columns for that specific table
- After = or IN (: show allowedValues from that column's schema definition

This is where the schema really pays off. When you type WHERE status = ' the editor immediately suggests Completed, Failed, Crashed, and the other valid status values, because the schema declares them. The same allowedValues arrays that power validation also power autocomplete.
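That kind of context detection can be sketched with a few regexes over the text before the cursor (illustrative only, not the actual editor code):

```typescript
// Hypothetical sketch of cursor-context detection for autocompletion.
// Looks at the text before the cursor and decides what to suggest.
type Context = "table" | "column" | "value" | "none";

function completionContext(before: string): Context {
  // After FROM/JOIN: suggest table names
  if (/\b(from|join)\s+\w*$/i.test(before)) return "table";
  // After = or IN (: suggest allowedValues for the column
  if (/(=|in\s*\()\s*'?\w*$/i.test(before)) return "value";
  // Inside SELECT/WHERE/GROUP BY/ORDER BY: suggest columns and functions
  if (/\b(select|where|group by|order by)\b/i.test(before)) return "column";
  return "none";
}

console.log(completionContext("SELECT * FROM ")); // "table"
console.log(completionContext("SELECT status, ")); // "column"
console.log(completionContext("WHERE status = '")); // "value"
```

A real completion source would also need the parsed query to know which tables are in scope, but the ordering matters the same way here: the most specific context wins.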
If you reference a column that doesn't exist, the linter catches it immediately and shows an inline error:
Every keystroke flows through three independent paths: syntax highlighting via Lezer, linting via the ANTLR-based TRQL parser, and the custom autocompletion source.
The TableSchema type is the glue that connects all three. It defines table names for FROM suggestions, column definitions for column suggestions, allowed values for enum completion, and validation rules for the linter.
We enforce several limits to keep the system healthy for all users. Each one exists for a specific reason:
- Time range limits: SELECT * FROM runs would otherwise attempt a full table scan across the entire history, so every query is bounded in time. The allowed range varies by plan.
- Row limits: results are capped at 10,000 rows (we fetch one extra row so we can tell you when there are more).

TRQL is the foundation for everything we're building in observability. The same language powers the dashboard widgets, the SDK's query.execute() function, and the REST API. As we add more data to the system, we can expose it through new tables without changing the query language or the compilation pipeline.
If you haven't tried Query & Dashboards yet, every project already has a built-in dashboard waiting for you. Head to your Trigger.dev dashboard to try it out.
Read the companion post: Query & Dashboards: analytics for your Trigger.dev data
2026-03-17 22:00:00
In many backend teams, pull requests slowly grow into something nobody wants to review.
A developer starts working on a feature.
At first, it’s just a small change — maybe a new endpoint or a service update.
Soon another improvement gets added.
A small refactor follows.
Finally, a fix for something unrelated appears.
Two days later, the pull request contains 900 lines of changes across 14 files.
At that moment, something predictable happens.
The review slows down.
Not because the reviewers are lazy — but because large pull requests create cognitive overload.
And once reviews slow down, the entire development workflow begins to lose momentum.
Previous article in this category: https://codecraftdiary.com/2026/02/21/why-just-one-more-quick-fix/
Large pull requests create a subtle psychological effect.
When a reviewer opens a PR and sees:
+824 −217 changes
their brain immediately categorizes it as expensive work.
The result is predictable:
Eventually, when someone finally reviews it, they cannot realistically analyze everything in depth.
So one of two things happens:
Neither outcome improves delivery speed.
Now compare that to a PR that looks like this:
+42 −8 changes
This feels manageable.
A reviewer can realistically read the entire diff in a few minutes.
This leads to several important effects:
Small PRs get reviewed faster simply because they feel easier.
Many teams see review times drop from days to hours after adopting smaller pull requests.
When the code is smaller, reviewers can focus on design decisions instead of scanning files.
Instead of writing comments like:
“This file changed a lot, can you explain the logic?”
they can ask more meaningful questions:
“Should this logic live in the service layer instead?”
Large pull requests often hide bugs.
For example, reviewers may miss subtle mistakes when hundreds of lines change at once.
Small PRs isolate changes.
If a bug appears, it’s easier to trace it back to a specific change.
Consider a backend developer implementing a new feature:
Send notification emails when an order is shipped.
A typical large PR might include the database migration, the notification service, the background job, and the integration code, all combined into one big pull request.
Instead, this can be split into several smaller ones.
1. Add shipped_at column to orders table. A simple schema change.
2. Create OrderShipmentNotifier service. Pure backend logic.
3. Add SendShipmentEmail job. Background processing.
4. Trigger notification when order is shipped. The integration step.
Each PR becomes reviewable within minutes.
The entire feature moves through the pipeline much faster.
Small pull requests also improve CI pipelines.
Large PRs often cause several problems:
When a small PR fails CI, the cause is usually obvious.
Example:
PR title: Add OrderShipmentNotifier service
CI failure: NotificationServiceTest fails
The fix is straightforward.
Compare this to a massive PR with dozens of changes — the failure may come from anywhere.
Despite the benefits, many developers hesitate to split their work.
Common reasons include:

"The feature isn't finished yet."

This is the most common argument.
But many PRs do not need to represent a finished feature.
They only need to represent a safe incremental step.
For example:
You can merge a service class before it’s used anywhere.
That’s perfectly valid.
"More PRs means more overhead."

This concern sounds logical but rarely holds in practice.
Ten small PRs usually move through the system faster than one large one.
Why?
Because multiple reviewers can process them quickly instead of blocking on one huge change.
At first, splitting work into smaller PRs requires discipline.
But after a few weeks, it becomes natural.
Developers start thinking in incremental changes rather than large feature drops.
Some teams use a simple heuristic:
If a pull request takes more than 10 minutes to review, it is probably too large.
This isn’t a strict rule, but it works surprisingly well.
Another guideline: keep the diff small enough to review in a single sitting.
Beyond that, review quality drops quickly.
When teams consistently use small pull requests, several improvements appear over time.
PRs move steadily through the system instead of forming large queues.
Frequent merges reduce integration conflicts.
Instead of being a bottleneck, reviews become quick feedback loops.
And perhaps most importantly:
The team starts focusing on continuous delivery instead of large feature dumps.
Many teams try to improve their development workflow by introducing new tools, processes, or complex CI pipelines.
But one of the most effective improvements is surprisingly simple:
Keep pull requests small.
Small pull requests reduce friction at every stage of development: writing, reviewing, testing, and merging.
They allow teams to maintain momentum, and momentum is often the most valuable resource in software development.
2026-03-17 22:00:00
It started simple. One Playwright script to capture the homepage.
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://myapp.com');
await page.screenshot({ path: 'homepage.png' });
await browser.close();
Then the team needed the pricing page. So I added another script. Then the dashboard (which needs login first). Then the settings page (which needs a specific tab clicked). Then mobile versions.
Two months later I had 14 Playwright scripts. Some shared a login helper. Some had hardcoded waits. One had a try-catch that silently swallowed errors because the cookie banner sometimes loaded and sometimes didn't.
I was maintaining a bespoke test suite, except it wasn't testing anything. It was just taking pictures.
Here's what those 14 scripts look like as config:
{
  "hiddenElements": {
    "myapp.com": [".cookie-banner", ".chat-widget"]
  },
  "screenshots": [
    { "name": "homepage", "url": "https://myapp.com", "selector": ".hero" },
    { "name": "pricing", "url": "https://myapp.com/pricing", "selector": ".pricing-grid" },
    { "name": "dashboard", "url": "https://myapp.com/dashboard", "selector": ".dashboard" },
    {
      "name": "settings",
      "url": "https://myapp.com/settings",
      "selector": ".settings-panel",
      "actions": [
        { "type": "click", "selector": "[data-tab='notifications']" }
      ]
    }
  ]
}
npx heroshot
All 14 screenshots. One command. No scripts to maintain.
In my scripts, I had this pattern everywhere:
try {
  await page.click('.cookie-banner .dismiss', { timeout: 3000 });
} catch { /* maybe it didn't show */ }
With heroshot, you define hidden elements once per domain:
{
  "hiddenElements": {
    "myapp.com": [".cookie-banner"],
    "docs.myapp.com": [".cookie-banner", ".announcement-bar"]
  }
}
Every screenshot on that domain hides those elements automatically. No try-catch. No timeouts. No "maybe it showed, maybe it didn't."
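Under the hood, a tool like this could turn that config into a single injected stylesheet; here's a rough sketch (our guess at the approach, not heroshot's actual code):

```typescript
// Hypothetical sketch: build one CSS rule per domain from the config
// and inject it before taking the screenshot.
const hiddenElements: Record<string, string[]> = {
  "myapp.com": [".cookie-banner", ".chat-widget"],
};

function hideCss(host: string): string {
  const selectors = hiddenElements[host] ?? [];
  return selectors.length
    ? `${selectors.join(", ")} { display: none !important; }`
    : "";
}

console.log(hideCss("myapp.com"));
// In Playwright this could be injected with:
//   await page.addStyleTag({ content: hideCss(new URL(page.url()).host) });
```

CSS hiding sidesteps the race entirely: whether or not the banner ever renders, the rule is already in place.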
The settings page needed a tab clicked first. In Playwright, that's 5 lines of setup. In config:
{
  "actions": [
    { "type": "click", "selector": "[data-tab='notifications']" },
    { "type": "wait", "text": "Email preferences" }
  ]
}
There are 14 action types: click, type, hover, select_option, press_key, drag, wait, navigate, evaluate, fill_form, handle_dialog, file_upload, resize, and hide. Covers pretty much every pre-screenshot setup I've needed.
If you need complex conditional logic, dynamic data generation, or integration with a test framework, raw Playwright is the right tool.
But if you're just taking pictures of known pages at known states, config is simpler, more readable, and doesn't break when someone renames a helper function.