The Practical Developer

A constructive and inclusive social network for software developers.

RSS preview of Blog of The Practical Developer

Why I Chose a Fine-Tuned 7B Model Over GPT-4 for High-Volume IT Support Ticket Routing

2026-04-07 23:08:49

How the “Distillation Revolution” of 2026 is shifting the enterprise focus from parameter count to parameter efficiency.

The 2026 Paradigm Shift: From “God Models” to “Expert Models”

For years, the mantra in Artificial Intelligence was bigger is better. We watched as parameter counts ballooned from billions to trillions, with the industry crowning a new "God Model" (a massive, general-purpose LLM that could do everything from writing poetry to debugging legacy COBOL) every few months.

But as we moved into 2026, the honeymoon phase with massive models like GPT-4 ended. Enterprises faced a harsh reality: The Generalist Tax. When you use a 1.7-trillion parameter model to perform a narrow, repetitive task like classifying medical billing codes or routing IT tickets, you are paying for brainpower you don’t need. You are essentially hiring a NASA scientist to count change at a grocery store. It works, but it’s slow, expensive and a massive waste of resources.

In my role as a researcher, I faced this exact dilemma while architecting a support system for a large-scale institution. While I cannot share the proprietary internal data or the specific institutional weights due to strict privacy and security protocols, I have developed a parallel, identical demonstration model to share the findings of this journey. This article is a deep dive into why we transitioned our production pipeline for High-Volume IT Support Ticket Routing from a cloud-hosted frontier model to a locally fine-tuned Mistral-7B variant.

Small Language Models vs Generalist LLMs

Efficiency over Scale: Why fine-tuned expert models are outperforming generalist LLMs in specific enterprise tasks for 2026.

1. The Latency Wall: Why Milliseconds Matter at the Edge

In mission-critical IT environments, AI isn’t just a chatbot; it’s an automated dispatcher. It needs to keep up with the speed of a systems administrator’s operational workflow. If the AI is slower than the human it’s supposed to assist, it becomes technical debt.

Cloud LLM Latency vs Local Inference Speed

The Speed of Local: Local Mistral-7B inference is over 10x faster (200ms) than cloud-hosted alternatives by eliminating network round-trips.

The Problem with Cloud Inference

When using a massive cloud-hosted model, your request undergoes a long journey:

1. Network Latency: Data travels to the cloud provider's gateway.
2. Queueing Latency: Your request waits in a multi-tenant buffer.
3. Compute Latency: The massive model calculates the response across dozens of GPUs.

In our institutional testing, GPT-4o averaged a Time To First Token (TTFT) of 850ms. A simple support ticket classification took nearly 2.5 seconds. In a global IT service desk processing 50,000 tickets a day, these seconds aggregate into 34 lost hours per day in mean-time-to-resolution (MTTR).
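As a sanity check, the 34-hour figure follows directly from the numbers above (50,000 tickets at roughly 2.5 seconds of added wait each):

```python
# Back-of-the-envelope check on the daily latency cost of cloud inference.
tickets_per_day = 50_000
seconds_per_ticket = 2.5  # end-to-end classification time observed for GPT-4o

lost_hours = tickets_per_day * seconds_per_ticket / 3600
print(f"Lost time per day: {lost_hours:.1f} hours")  # ~34.7 hours
```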

As illustrated in the figure, the difference isn't just a few milliseconds; it is a fundamental shift in how the data travels. By moving the brain to the edge, we eliminate the spiral of network wait-states shown in the cloud-hosted path.

The 7B Alternative: Local Inference

By using a 7-billion parameter model (specifically the Mistral v0.3 architecture), we achieved Local Inference. Because a 7B model fits into the VRAM of a single consumer-grade GPU, we eliminated the network round-trip entirely. The total response time was under 200ms. Key Takeaway: If your application requires real-time automated dispatching, bigger isn't better; it's a bottleneck.

2. The Economics of Scale: Counting the Token Tax

The cost of deployment is one of the most important considerations, and we always focus on Total Cost of Ownership (TCO). The variable cost model of cloud APIs is a CFO's nightmare.

Scenario: Processing 100,000 IT Support Tickets per Day

  • GPT-4 (Standard Tier): $5.00 per 1M input tokens + $15.00 per 1M output tokens.
  • Estimated monthly cost: ~$12,000 USD.

The Fine-Tuned SLM (Small Language Model) Cost

By self-hosting our Mistral-7B on a single NVIDIA A100, the cost shifts from Usage to Infrastructure:

  • Annual server cost: ~$8,000
  • Electricity/maintenance: ~$2,000
  • Total monthly cost: ~$833 USD

By moving to a fine-tuned small model, we reduced our operational costs by over 90% while gaining full control over our data privacy.
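Under assumed per-ticket token counts (the article does not give them; 500 input and 100 output tokens per ticket are illustrative), the numbers above can be reproduced:

```python
# Illustrative TCO comparison. Per-ticket token counts are assumptions,
# not figures from the production system.
tickets_per_day = 100_000
input_tokens, output_tokens = 500, 100  # assumed averages per ticket

# GPT-4 standard-tier rates, in dollars per token
gpt4_in_rate = 5.00 / 1_000_000
gpt4_out_rate = 15.00 / 1_000_000

daily_api = tickets_per_day * (input_tokens * gpt4_in_rate
                               + output_tokens * gpt4_out_rate)
monthly_api = daily_api * 30            # ~$12,000 / month

# Self-hosted: fixed infrastructure cost instead of per-token usage
monthly_selfhost = (8_000 + 2_000) / 12  # ~$833 / month

savings = 1 - monthly_selfhost / monthly_api
print(f"API: ${monthly_api:,.0f}/mo, self-host: ${monthly_selfhost:,.0f}/mo, "
      f"savings: {savings:.0%}")
```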

3. Accuracy: Does a 7B Model Know Enterprise IT?

The most common counterargument is that "a 7B model isn't as smart as GPT-4." This is true for general intelligence, but general intelligence is a liability in a specific domain.

The Accuracy Paradox
A 7B model only needs to differentiate between an L2 Database Error and an L1 Password Reset Request.

GPT-4 (Base): 91.1% Accuracy.
Mistral-7B (Fine-Tuned): 94.5% Accuracy.

Why did the smaller model win? Focus. The fine-tuned 7B model has been over-fitted (in a positive, clinical sense) to our specific vocabulary, acronyms, and routing architecture. It no longer guesses; it recognizes patterns with surgical precision.[2]

Fine-Tuned Mistral vs GPT-4 Accuracy

Better than the Giants: Fine-tuning a 7B model on domain-specific data results in higher classification accuracy (94.5%) compared to base generalist models.

4. Implementation: The Practitioners' Golden Path

LoRA Fine-Tuning Pipeline for Expert Models

The Expert Pipeline: Leveraging Human-in-the-Loop labeling and LoRA (Low-Rank Adaptation) to distill domain knowledge into efficient 7B parameter models.

Step A: Data Preparation: Quality distillation begins with structured data. We moved away from long, conversational datasets and focused on a strict Instruction-Output schema. This forces the model to ignore “noise” and focus purely on the mapping between a technical problem and a business action.

For our demonstration model, we utilized a synthetic dataset that mimics the high-stakes environment of corporate IT routing. Each entry follows a strict Instruction-Output format.
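A minimal sketch of that schema, consistent with the "### Instruction / ### Response" pattern used in the inference snippet later in the article. The ticket text and routing label here are invented for illustration; the real institutional labels are private:

```python
def format_example(ticket: str, route: str) -> str:
    """Render one training example in the Instruction-Output schema.

    Both arguments are illustrative placeholders, not entries from the
    proprietary dataset.
    """
    return (
        "### Instruction:\n"
        f"Ticket: '{ticket}'\n\n"
        "### Response:\n"
        f"{route}"
    )

# Hypothetical routing label; the real label taxonomy is not public.
example = format_example(
    "VPN access denied for user in Mangalore office.",
    "ROUTE: L2-Network | PRIORITY: High",
)
print(example)
```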

Note: To comply with institutional security protocols and the EU AI Act’s data minimization principles, the proprietary internal dataset remains private. However, to ensure full reproducibility, I have curated and released a synthetic demonstration dataset that replicates the technical patterns of the production environment. You can take a look at the sample dataset in the HuggingFace link provided below:

rakshath1/it-support-mistral-7b-expert · Hugging Face


Step B: The Training Stack (Unsloth & LoRA): To achieve the 94.5% accuracy benchmark, we utilized Unsloth [3], an optimization library that allows for 2x faster training and 70% less memory usage. We applied Low-Rank Adaptation (LoRA) [1] to the Mistral-7B-v0.3 base model, targeting the attention modules where the expert knowledge resides.

By setting our Rank (r) to 16, we ensured the model was flexible enough to learn complex routing patterns without becoming so heavy that it sacrificed inference speed.

from unsloth import FastLanguageModel
import torch

# 1. Load the model in 4-bit for maximum memory efficiency
model, tokenizer = FastLanguageModel.from_pretrained(
 model_name = "unsloth/mistral-7b-v0.3",
 max_seq_length = 2048,
 load_in_4bit = True,
)

# 2. Add LoRA Adapters (The 'Expert' update)
model = FastLanguageModel.get_peft_model(
 model,
 r = 16, # The Rank: Determines the 'expressiveness' of the adapter
 target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
 lora_alpha = 16,
 lora_dropout = 0,
)

Step C: Verification and Local Deployment: Once trained, the model is exported to GGUF format. This is the final step in the Golden Path, as it allows the model to run on standard CPUs and local hardware without requiring a full Python environment.

You can verify the model’s performance yourself by pulling the live adapters from my repository. The following snippet demonstrates the inference speed we achieved (<200ms):

from unsloth import FastLanguageModel

# 1. Load the model and tokenizer in one go
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "rakshath1/it-support-mistral-7b-expert", # Your adapter
    max_seq_length = 2048,
    load_in_4bit = True,
)

# 2. Enable faster inference
FastLanguageModel.for_inference(model) 

# 3. Test ticket: Regional network failure in Mangalore
ticket_input = "### Instruction:\nTicket: 'VPN access denied for user in Mangalore office.'\n\n### Response:\n"

inputs = tokenizer([ticket_input], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64)
response = tokenizer.batch_decode(outputs)

print(response[0])

Note: While the internal institutional weights remain private, a demonstration model trained on an identical synthetic dataset is available for testing.

Model Repository:

rakshath1/it-support-mistral-7b-expert · Hugging Face


Format: GGUF (for local testing) & Safetensors (for Python integration).

5. The Verdict: Large Models vs. Expert Adapters

I am not saying GPT-4o is bad; it is simply overqualified for repetitive tasks.

When to stay Large: Use GPT-4 or other models when you don’t know what the user will ask. If you need a model to reason through a new legal contract it has never seen, you need the massive parameter count of a generalist.
When to go Small (Experts): Use a fine-tuned 7B model when the task is narrow and high-volume. If you are processing 50,000 repetitive IT tickets, you don't need the model to know how to write a poem; you need it to know your software inside and out.

6. Conclusion: Small is Sustainable

As we navigate the AI landscape of 2026, it is becoming clear that smaller models are a moral choice just as much as a financial one. The environmental impact of training and running trillion-parameter models is immense; by contrast, a 7B model consumes only a tiny fraction of the power required for a 1.7T model inference. In an era where Green AI is no longer optional, efficiency is the ultimate sophistication.

By choosing to fine-tune, you aren't settling for less intelligence; you are choosing optimized intelligence. You are choosing speed that matches human thought, economics that satisfy a CFO, and the sovereignty of owning your own weights. If your organization is still paying five-figure monthly API bills for repetitive classification tasks, you are paying a Generalist Tax that is no longer necessary.

The “Small is the New Big” revolution is about empowerment. It’s about the fact that a researcher can deploy world-class AI on a single GPU. For those interested in testing the latency and accuracy benchmarks for themselves, I have released the LoRA adapters and a GGUF quantized version of this IT Expert on Hugging Face. While the dataset is synthetic to protect institutional privacy, the architecture and the logic remain identical to the production environment. The era of the “God Model” for every task is ending. The age of the Distilled Expert has begun.

References

  1. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685. https://arxiv.org/abs/2106.09685
  2. Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., & Lample, G. (2023). Mistral 7B. arXiv preprint arXiv:2310.06825. https://arxiv.org/abs/2310.06825
  3. Unsloth AI. (2024). Performance Benchmarks and Memory Optimization for Fine-Tuning. Unsloth Documentation. https://unsloth.ai/blog/mistral-benchmark

Connect with me on Medium and LinkedIn

Medium: https://medium.com/@rakshathnaik62
LinkedIn: https://www.linkedin.com/in/rakshath-/

TreeSize: the go-to tool for managing your files and your PC

2026-04-07 23:08:48

TreeSize is a program that analyzes your disk space and displays all the subfolders of a selected drive or directory.
The free version (TreeSize Free) is essential for a developer: it quickly identifies large files, dependency folders (node_modules), and forgotten builds that saturate the disk. It visualizes the occupied space hierarchically and graphically so you can free up storage and improve performance.

Why it matters for a developer

(Image: TreeSize Free in use)

Quick cleanup

Immediately identifies the large folders on your PC that you no longer use and that keep growing, so you can quickly free up gigabytes. Examples: node_modules, build output or build caches, the npm cache, the bun cache.

Graphical visualization

The interface displays the heaviest items as a tree, making storage management intuitive.

Deleting old projects

Useful for spotting large projects that are no longer used, projects sleeping in your project graveyard and taking up space, and unused virtual environments cluttering the disk.

TreeSize versions

TreeSize comes in two editions:
- TreeSize Free: free; ideal for local analysis and for individual developers or home users.
- TreeSize Pro: paid; ideal for server analysis, automation, and advanced duplicate-file search.

What the Hell is a Token?

2026-04-07 23:07:14

Months after ChatGPT launched, I still could not have told you what a token was. I had been using it since the first public launch and was basically having novel-long conversations with it. I had no idea that every time I hit "enter," my text was being chopped into pieces before the model even looked at it.

It turns out, those pieces (tokens) determine your usage limits, how much the AI can remember, and why it sometimes seems to forget things you told it.

So. Tokens.

Here is what I wish I understood earlier.

They are not words

I assumed "one token = one word," but that is not actually the case. A token is a chunk of text; it may be a whole word, part of a word, or punctuation. The word "hamburger" gets split into two tokens: h and amburger. Not "ham" and "burger". The splits are not based on syllables, like you might expect.

Here are a few more to make the point: "infrastructure" becomes inf and rastructure. "Unbelievable" becomes three tokens: un, belie, and vable. These splits look strange, but they are consistent. The same word always produces the same tokens. This isn't arbitrary; there is a method behind the madness...

The reason Large Language Models (LLMs) need to do this is that they don't actually work with text at all. They work with numbers. Tokenization is the step where human-readable text gets converted into a sequence of numbers the model can process. Each token maps to a number, and the model does all of its "thinking" in that numerical space. A "tokenizer" is basically a translation layer between your words and the model's math.

The splits themselves are not random either. Tokenizers are trained to find the most common patterns in language. A whole common word like "the" gets its own single token. Less common words get broken into reusable pieces that appear across many different words. That un in "unbelievable" is something the model has seen in hundreds of words: undo, unfair, unlikely, unusual. By splitting it out, the model learns what "un" means as a concept, not just as part of one specific word. The splits are chosen to maximize what the model can learn from the patterns in language.

So, essentially a tokenizer's job is to convert each chunk into a number that the model can work with, and that is done the same way every time. That consistency is what makes the math work.
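A toy illustration of that text-to-numbers mapping. The miniature vocabulary below is made up; real tokenizers learn vocabularies of tens of thousands of entries from data:

```python
# A made-up miniature vocabulary. Real tokenizers (BPE and friends)
# learn ~50k-200k entries; the IDs here are purely illustrative.
vocab = {"un": 0, "belie": 1, "vable": 2, "the": 3}

def encode(chunks: list[str]) -> list[int]:
    """Map each text chunk to its fixed ID.

    The same chunk always maps to the same number, which is the
    consistency the model's math depends on.
    """
    return [vocab[chunk] for chunk in chunks]

ids = encode(["un", "belie", "vable"])
print(ids)  # [0, 1, 2]
```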

Why should you care?

Because tokens are what determine your usage limits.

Most people use AI through a free tier. Free tiers do not charge you, but they do limit how many messages you can send per day or per hour. When you hit that cap and get the "you have reached your limit" message, it is because you used too many tokens. The longer your conversations get, the faster you burn through your allowance.

Even on a paid plan, tokens are the unit of measurement. Services price by the token, and input tokens (what you send) and output tokens (what the AI generates) are counted separately. To give you a sense of scale: pasting a 2,000 word document uses roughly 2,700 tokens. A detailed response might be another 800. At typical rates, that entire exchange costs less than two cents. For casual use, the cost is negligible. But the usage limits are very real.
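The "less than two cents" claim checks out under assumed rates. The prices below are illustrative (check your provider's current pricing), not quoted from any specific service:

```python
# Assumed illustrative API rates, in dollars per token.
in_rate = 2.50 / 1_000_000    # per input token
out_rate = 10.00 / 1_000_000  # per output token

input_tokens = 2_700   # ~2,000-word pasted document
output_tokens = 800    # a detailed response

cost = input_tokens * in_rate + output_tokens * out_rate
print(f"Whole exchange: ${cost:.4f}")  # under two cents
```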

The "context window" connection

You have probably seen numbers like "128K context" or "200K tokens" thrown around. That is the model's memory limit for a single conversation. It is measured in tokens because that is what the model actually works with.

If you have ever had an AI "forget" something you told it earlier in the conversation, there is a decent chance you hit the token limit. Everything past that boundary just falls off and is gone.

(We will get into context windows properly in one of the next posts. For now, just know that tokens are the unit of measurement for everything.)

What this means for you

If you are just chatting with an AI casually, you probably do not need to worry about tokens too much. The free tiers are generous enough for most conversations.

But here is something worth understanding. Every message you send in a conversation includes the entire conversation history. The AI doesn't just receive your latest message; it receives everything back to the start of the conversation, plus your new message, every time you hit "enter". So a chat that starts at 500 tokens per exchange can quietly grow to 10,000 or 20,000 tokens per exchange by message 30, because the whole history is being sent every time. That is where usage caps and missing context usually come from.
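A quick simulation of that growth, assuming a flat 500 tokens added per exchange (real exchanges vary in size):

```python
# Each turn resends the entire history plus the new exchange,
# so the tokens sent per turn grow linearly with conversation length.
per_exchange = 500  # assumed tokens added per exchange
history = 0
sent_on_turn = []

for turn in range(1, 31):
    sent_on_turn.append(history + per_exchange)  # full history + new message
    history += per_exchange

print(sent_on_turn[0], sent_on_turn[-1])  # turn 1: 500, turn 30: 15000
```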

Pro tip: start new conversations frequently to avoid this and to keep the focus concentrated on the task at hand. Aside from staying under your usage limits, you will also get the benefit of more helpful responses to your current questions. Remember that when you change topics, the LLM is still considering the things you brought up with it before, even if they are unrelated. Understanding this is a prerequisite to understanding good prompt engineering.

Where tokens really start to matter is when you are building things. Automating workflows, processing documents, or running agents that make multiple calls. That is when tokens stop being an abstract concept and start being a line item in your budget.

Next time: do you actually need to care which AI you use? Honestly, it depends, but probably not the way you think...

If there is anything I left out or could have explained better, tell me in the comments.

I built a governance layer for AI agents after watching them fail silently in production

2026-04-07 23:03:33

Picture this: a healthcare AI agent is triaging patient intake. It's running on a solid model, well-prompted, tested in staging. In production, a patient describes symptoms that match two possible care pathways — one urgent, one routine. The agent picks routine. No error is thrown. No log entry flags it. No human is notified. The patient waits three days for a callback that should have been a same-day referral.

Nobody finds out until a follow-up call two weeks later.

I'm not describing a real incident. But I've talked to enough people shipping agents into healthcare, fintech, and legal workflows to know this scenario isn't hypothetical — it's a near-miss waiting in every ungoverned production agent.

The actual problem

When we started shipping AI agents into regulated environments, the agents themselves weren't the problem. The problem was what surrounded them. Or didn't.

No audit trail. When something went wrong, we had inference logs at best — token inputs and outputs, no semantic record of why a decision was made or what policy it touched.

No rollback. If an agent executed a bad action — sent a message, wrote a record, triggered a workflow — we had no native mechanism to undo it or even flag it for review.

No explainability. When a compliance officer asked "why did your agent do that?", the honest answer was "we don't know, here's the prompt."

No governance gate. Actions executed on intent match. There was no intercept layer that could say: this action requires human review before proceeding.

In consumer apps, that's a bad UX. In regulated industries, that's liability.

What we built

DingDawg is a governance layer that wraps any AI agent and intercepts every action before it executes. It's MCP-native, which means it slots directly into Claude Code, Codex, and Cursor without custom middleware. It also works with any Python agent via a two-line install.

pip install dingdawg-loop

from dingdawg import schedule_governed

schedule_governed(agent_id="@hipaa-intake", cron="0 9 * * *")

That's it. Every action the agent takes is now routed through a governance gate before execution.

What the governance receipt looks like

Every governed action produces a receipt:

{
  "action_id": "act_9f3a21bc",
  "agent_id": "@hipaa-intake",
  "timestamp": "2026-04-06T09:00:14Z",
  "action": "route_patient",
  "policy_result": "BLOCKED",
  "lnn_trace": {
    "features": [
      { "name": "symptom_urgency_score", "weight": 0.84, "direction": "ESCALATE" },
      { "name": "prior_visit_flag", "weight": 0.61, "direction": "ESCALATE" },
      { "name": "routing_decision", "weight": -0.91, "direction": "CONFLICT" }
    ],
    "explanation": "Agent routing conflicts with urgency signal at 0.84 confidence. Human review required before execution."
  },
  "ipfs_cid": "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
  "policy_version": "hipaa-v2.1"
}

The LNN causal trace is not a black-box score. It's a weighted feature explanation — you can see exactly which signals triggered the block and why. The ipfs_cid is a content-addressed, immutable proof stored on IPFS. Your regulator can verify it. You cannot alter it after the fact.
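The tamper-evidence property rests on content addressing: the identifier is derived from the receipt's bytes, so changing the receipt changes the identifier. A simplified sketch of the idea using a plain SHA-256 hex digest (real IPFS CIDs use multihash/CIDv1 encoding, and this is not DingDawg's actual implementation):

```python
import hashlib
import json

receipt = {"action_id": "act_9f3a21bc", "policy_result": "BLOCKED"}

def content_address(obj: dict) -> str:
    """Hash a canonical serialization, so identical receipts always
    produce identical digests. Simplified stand-in for an IPFS CID."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

digest = content_address(receipt)

# Any after-the-fact edit yields a different address, exposing the tampering.
tampered = dict(receipt, policy_result="ALLOWED")
tampered_digest = content_address(tampered)
print(digest != tampered_digest)  # True
```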

The open-core model

The SDK, governance primitives, LNN trace engine, and MCP integration are Apache 2.0. Free. Open on GitHub at github.com/dingdawg/governance-sdk.

The cloud tier adds multi-agent orchestration, managed IPFS pinning, enterprise policy management, and a creator marketplace where governance plugins can be published and monetized. We think the core infrastructure should be auditable. You shouldn't have to take our word for it on something this critical.

The regulatory window is closing

EU AI Act enforcement starts August 2026. It requires audit trails, explainability, and human oversight mechanisms for high-risk AI systems — healthcare, hiring, credit, law enforcement, critical infrastructure.

Colorado SB 205 hits June 30, 2026. Narrower but sharper — specifically targeting consequential automated decisions with a right-to-explanation requirement.

If you're shipping agents in any of these domains and you don't have governance infrastructure in place, you're building technical debt that will be expensive to retrofit under deadline pressure.

Try it

Free harness score — 2 minutes, shows exactly where your agent governance gaps are: dingdawg.com/harness

Free compliance scan:

pip install dingdawg-compliance

If you're shipping agents in regulated environments, I'd genuinely like to hear what you're running into. The governance problem is underspecified and we're building in public.

Trunk-Based Development with Short-Lived Branches

2026-04-07 23:03:24

Why Long-Lived Branches Kill Velocity

You've seen it. A feature branch that started two weeks ago. It's 47 commits behind main. Three people are waiting on it. The merge conflict is 400 lines. Nobody wants to review it because reviewing 2,000 lines of diff is nobody's idea of a good time.

Long-lived branches are where productivity goes to die. And when you add an AI agent to the mix, they get even worse. The agent writes code against the branch state. Main moves on. By the time you merge, half the agent's assumptions are wrong.

Trunk-based development fixes this. The rule is simple: branches live for hours, not days. Merge to main early and often. Keep main releasable at all times.

Trunk-based development doesn't necessarily mean merging changes straight to main. In my view, it's more about ensuring everything works together so you really take advantage of CI. Short-lived branches give us that, along with the safety net many developers prefer. Whether to push directly to main comes down to developer preference; personally, I prefer not to.

The Workflow

Here's what a typical feature looks like in this project:

  1. Branch — create a branch from main: feat/PROJ-431-dashboard-migration
  2. Build — write tests, implement the feature, run make lint && make test
  3. PR — open a PR. Small diff. Clear description. Conventional commit title.
  4. CI — GitHub Actions runs the full pipeline (lint, test, test-js)
  5. Merge — once CI is green, merge to main
  6. Deploy — CI triggers a Forge deployment webhook. Staging updates automatically.

The entire cycle (branch to merged) is usually same-day. Sometimes within an hour for smaller changes.

145 PRs in 3 Months

This project has 258 commits across ~3 months. 145 of those went through pull requests. That's roughly 1.6 PRs per day, every day.

Most PRs are small. A refactoring extraction. A test coverage expansion. A bug fix. A single feature. The biggest PRs were the frontend migration (Tailwind, jQuery removal), and even those were broken into sequential stages.

Small PRs have compounding benefits:

  • Easier to review — you can actually read the diff
  • Easier to revert — if something breaks, git revert one PR, not a 2,000-line changeset
  • Faster CI — smaller changes mean fewer test failures to debug
  • Less merge conflict risk — you're never far from main

Conventional Commits

Every commit follows the conventional commits format:

feat: add GET /api/dashboard endpoint (PROJ-430) (#130)
fix: resolve planner bugs (PROJ-432) (#131)
refactor: extract CreateOrderAction from OrdersController::store() (#80)
test: expand OrdersController test coverage (#59)
docs: document legacy Blade vs React SPA architecture (#119)
ci: add workflow_dispatch trigger for manual CI runs
chore: remove legacy frontend dependencies and dead code (#103)

This isn't just aesthetics. Conventional commits create a machine-readable history. You can:

  • Generate changelogs automatically
  • See at a glance whether a commit is a feature, fix, or refactoring
  • Train an agent to follow the same convention (it will, if every existing commit uses it)

The commit message is a contract. feat: means new functionality. fix: means something was broken and now it's not. refactor: means the behavior didn't change. When the agent writes a commit message, these prefixes help me triage without reading the diff.

The CI Pipeline

Every push to main triggers the full pipeline:

Build → Code Quality → Tests → Deploy
        (make lint)    (make test + make test-js)

The pipeline runs in Docker containers built from the same docker-compose.yml as local development. Same PHP version. Same Node version. Same MySQL. If it passes locally, it passes in CI.

The deploy step triggers a webhook with our cloud provider that pulls the latest code, runs migrations, rebuilds assets, and restarts workers:

cd staging.example.com
git pull origin main
composer install --no-dev --optimize-autoloader
php artisan migrate --force
npm ci && npm run build
php artisan queue:restart
php artisan config:cache
php artisan route:cache
php artisan view:cache

Staging updates within minutes of a merge to main. Production deploys are triggered manually (or by the same webhook on the production server) after staging verification.

Infrastructure: Queue Workers and Redis

The deployment isn't just the web app. We also manage background infrastructure:

Queue workers process async jobs: CRM sync, notification dispatch, and background calculations. The Forge server runs supervised workers:

php artisan queue:work redis --queue=default,crm --sleep=3 --tries=3

The queue:restart in the deploy script gracefully restarts workers so they pick up the new code.

Redis backs the queue and can optionally back the cache. Separate Redis databases (DB=0 for cache, DB=1 for queues) prevent queue operations from evicting cached data.

The Docker Compose stack mirrors this:

redis:
  image: redis:7-alpine
  profiles: [queue]

queue-worker:
  build: .
  command: php artisan queue:work redis --queue=default,crm
  profiles: [queue]
  depends_on: [redis, mysql]

The profiles key means queue infrastructure only starts when you explicitly ask for it (docker compose --profile queue up). Local development doesn't need Redis running unless you're testing queue jobs.

The E2E Database

E2E tests (Playwright) run against a separate database: myapp_e2e. This gets its own migration and seeding:

make migrate-e2e    # Run migrations on E2E database
make seed-e2e       # Seed test users with proper roles, permissions, relationships

The E2E seeder creates users with known credentials and realistic data. It's idempotent — running it twice doesn't create duplicates.

In CI, the E2E job spins up the full Docker stack (app, nginx, mysql) and runs Playwright against it. Same app, same database engine, same infrastructure as production. The only difference is the data is seeded, not real.

Continuous Delivery (Not Continuous Deployment)

An important distinction: we practice continuous delivery, not continuous deployment.

Every merge to main is deployable. The pipeline proves it: tests pass, linting passes, the build succeeds. But deploying to production is a conscious decision, not an automatic one.

This matters because:

  • Some features are gated behind environment checks or feature flags
  • Some changes need manual verification on staging first
  • Production deploys happen when we decide, not when the CI pipeline finishes

The codebase is always releasable. Whether we release is a business decision, not a technical one.

How This Enables Agent-Assisted Development

Trunk-based development + CI + conventional commits create something crucial for working with an AI agent: a fast, reliable feedback loop.

When Claude writes code:

  1. The tests tell me if it works (seconds to minutes)
  2. The linter tells me if it's clean (seconds)
  3. CI confirms both in an environment I trust (minutes)
  4. If it passes, I merge. If it doesn't, Claude fixes it.
  5. The conventional commit tells me what changed without reading the diff.

There's no "let me review this 2,000-line PR over the weekend." It's: did it pass? Merge. Did it fail? Fix. Ship it. Move on.

Dave Farley calls this "optimizing for feedback." The faster you know whether a change worked, the faster you can iterate. Trunk-based development with CI gives you feedback in minutes, not days.

The Takeaway

  1. Branches live for hours. If your branch is older than a day, something's wrong.
  2. Small PRs, merged often. 145 PRs in 3 months. Each one small enough to review in minutes.
  3. Conventional commits are a communication protocol. Both for humans reading the log and agents writing commits.
  4. CI is the source of truth. If it passes CI, it's good. If it doesn't, fix it before merging.
  5. Continuous delivery means always releasable. Deploy when you want, not when you have to.
  6. Infrastructure is code. Docker, queue workers, Redis, deploy scripts: all versioned, all reproducible.

The combination of tests, linting, CI, and trunk-based development creates a system where changes are small, verified, and frequent. That's exactly the system an AI agent thrives in.

I’m not an 'IT Guy,' but I’m building a SaaS to save my industry from Spreadsheet Hell.

2026-04-07 23:01:37

Kaptiq

Hi Dev.to Community,

Sumit here. Full disclosure: I'm not a developer; heck, I'm not even an "IT guy." I'm a Mechanical Engineer working as a Project Manager in the EPC industry.

I started building Kaptiq out of pure frustration. I was drowning in spreadsheets, endless emails, and disconnected tools. Traditional ERPs are either too expensive or too rigid for smaller firms, and it turns out almost everyone in EPC faces this same mess.

Kaptiq isn't a "solve-everything" silver bullet yet, but it's a step taken by someone working right in the heart of the problem.

I'm here to learn from the best in this community. What I've built is an MVP that still needs plenty of polishing, and your feedback is exactly what I need to take it to the next level.

Thanks.