RSS preview of the Blog of The Practical Developer

From Just a Scanner to a Smart Agent: How I Improved my SEO Prospecting Tool 🐍

2025-11-29 00:16:15

I recently built a prospecting agent with Python to find local businesses on Google’s lower-ranked pages and pitch them SEO services.

The initial version was... promising but flawed.

It tried to pitch Indeed.com because they didn't have a local phone number. It told Ford Dealerships their site was "down" because their firewall blocked my bot. It sent robotic emails starting with "Fail: H1 Missing" ... not exactly a charming opener.

I realized that to make this tool useful, I needed to move from a simple scraper to a true agent. Here is the breakdown of how I refactored the code to filter noise, crawl for contacts, and use GenAI to write personalized campaigns.

Step 1: Filtering the Noise (The "No-Go" List)

The first problem with scraping generic keywords is that half the results aren't businesses; they're directories, job boards, and government sites. My script was wasting resources auditing ZipRecruiter and Texas.gov.

The Fix:
I made the clean_and_deduplicate function more robust with a strict blocklist, expanding the existing list significantly and categorizing domains into "Job Boards," "Government," "Social Media," and "National Brands" (like Penske) that wouldn't hire a local agency anyway.

# We filter these out before we even attempt an audit
DIRECTORY_DOMAINS = [
    'indeed', 'glassdoor', 'ziprecruiter', # Job Boards
    '.gov', 'texas.gov', 'fmcsa',          # Gov sites
    'yelp', 'yellowpages', 'bbb.org',      # Directories
    'penske', 'uhaul', 'ford.com'          # National Brands
]

def is_directory(url):
    # A simple substring match against the blocklist is enough at this stage
    return any(domain in url.lower() for domain in DIRECTORY_DOMAINS)

Result: The list went from ~200 "leads" to ~70 actual local businesses.

Step 2: Smarter On-Page Auditing

My original script checked for H1 tags using exact string matching. If the keyword was diesel mechanic and the H1 was Best Diesel Mechanic in Texas, the script marked it as a FAIL.

The Fix: Fuzzy Logic
I switched to token-based set matching. If the H1 contains a significant percentage of the target keywords (over 50%), it passes.

# Breaking strings into sets of words for flexible matching
required_words = set(keyword.lower().split())
found_words = set(h1_text.lower().split())

# Calculate intersection
matches = required_words.intersection(found_words)
match_percentage = len(matches) / len(required_words)

# If >50% overlap, it's a Pass.
if match_percentage >= 0.5:
    audit_data['H1_Audit_Result'] = "Pass"
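
Wrapped up as a reusable check (a sketch of the same logic, with a hypothetical function name), the worked example from above passes cleanly:

def h1_matches_keyword(keyword: str, h1_text: str, threshold: float = 0.5) -> bool:
    # Token-based set matching instead of exact string comparison
    required_words = set(keyword.lower().split())
    found_words = set(h1_text.lower().split())
    if not required_words:
        return False
    return len(required_words & found_words) / len(required_words) >= threshold

h1_matches_keyword("diesel mechanic", "Best Diesel Mechanic in Texas")  # True: 2/2 keywords found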

Step 3: Distinguishing "Broken" from "Blocked"

Originally, if a site returned a 403 Forbidden, my script flagged it as "Actionable: Server Error." Pitching a client saying "Your site is down" when it's actually just secure is a great way to look incompetent.

The Fix: Handling Firewalls
I updated the requests logic to explicitly catch 403 and 406 errors and mark them as SKIP. Now, the agent only flags genuine connection errors (like 500 or SSLError) as actionable leads.

except RequestException as e:
    # If the server explicitly blocked us (Firewall/WAF), it's not a lead.
    # e.response is set for HTTP errors and None for connection failures,
    # so we never touch a variable that was never assigned.
    status = e.response.status_code if e.response is not None else None
    if status in (403, 406, 429, 503):
        audit_data['Error_Status'] = "Blocked"
        return audit_data  # Stop processing, we will filter this out later

    # Real connection errors (DNS failure, Timeout) are actual leads
    # We want to pitch "Site Health" services to these.
    audit_data['Error_Status'] = f"Error: {e.__class__.__name__}"
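
For completeness, the handler above assumes the request calls response.raise_for_status(), which turns 4xx/5xx responses into HTTPError, a RequestException subclass that carries e.response (connection failures leave it as None). A rough, self-contained sketch of the surrounding request; the headers and timeout are assumptions, not the author's exact code:

import requests
from requests.exceptions import RequestException

def audit_site(url):
    audit_data = {}
    try:
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        response.raise_for_status()  # 4xx/5xx become HTTPError (a RequestException)
        # ... run the on-page audits (H1, NAP, etc.) here ...
    except RequestException as e:
        # Same logic as the snippet above: blocked firewalls are skipped, real errors kept
        status = e.response.status_code if e.response is not None else None
        if status in (403, 406, 429, 503):
            audit_data['Error_Status'] = "Blocked"
        else:
            audit_data['Error_Status'] = f"Error: {e.__class__.__name__}"
    return audit_data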

Step 4: The "Gap Analysis"

This was the strategic game-changer. A site with a missing H1 tag isn't necessarily a good lead. But a business with 50 five-star reviews and a missing H1 tag? That is a gold mine.

I integrated a secondary API call to fetch the Google Business Profile (GBP) ratings for every prospect to identify "Hidden Gems": businesses with great real-world reputations but poor digital presence.

# We categorize the lead before generating the pitch
is_gbp_strong = gbp_rating >= 4.0 and gbp_reviews >= 10
is_gbp_missing = gbp_rating == 0

# Strategy A: Strong GBP + Weak Site = "Your site hurts your reputation"
# Strategy B: No GBP + Weak Site = "You are invisible"
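
A small helper along these lines (the name, the site_has_issues flag, and the thresholds mirroring the flags above are illustrative assumptions) can pick the pitch angle before any email is drafted:

def choose_pitch_strategy(gbp_rating, gbp_reviews, site_has_issues):
    # site_has_issues would come from the on-page audit (H1, NAP, errors)
    is_gbp_strong = gbp_rating >= 4.0 and gbp_reviews >= 10
    is_gbp_missing = gbp_rating == 0
    if is_gbp_strong and site_has_issues:
        return "A"  # "Your site hurts your reputation"
    if is_gbp_missing and site_has_issues:
        return "B"  # "You are invisible"
    return None     # not a priority lead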

Step 5: The "Missing Link" - The Crawler

At this point, I had great prospects, but I was missing the most important piece of data: The Email Address. Many local businesses don't put their email in the Header; they hide it on the "Contact Us" page.

The Fix: The Spider Logic
I upgraded the agent to act like a human user:

  1. Scan Home: Look for mailto: links or regex matches.
  2. Heuristic Scoring: If no email is found, scan all links and score them. /contact-us gets 100 points, /about gets 30 points.
  3. The Hop: The agent navigates to the highest-scoring URL and scrapes that page.

from urllib.parse import urljoin

def find_best_contact_url(soup, base_url):
    # Heuristic Scoring Logic: contact pages beat about pages beat footer links
    best_candidate, best_score = None, 0
    for link in soup.find_all('a', href=True):
        full_url = urljoin(base_url, link['href'])
        link_text = link.get_text(strip=True).lower()
        score = 0
        if 'contact' in full_url.lower(): score += 100
        if 'contact' in link_text: score += 50
        if 'about' in full_url.lower(): score += 30
        if link.find_parent('footer'): score += 10
        if score > best_score:
            best_candidate, best_score = full_url, score

    # Returns the URL with the highest score to crawl next (None if nothing was found)
    return best_candidate

This logic alone saved ~40% of leads that would have otherwise been discarded as "No Contact Info."

Step 6: From Template to GenAI (Gemini Integration)

Finally, I tackled the outreach itself. My previous email template was rigid and impersonal. I wanted a 3-email sequence that felt human.

The Fix: Google Gemini 2.5 Flash
I integrated the Gemini API, which is fast and cost-efficient for this kind of batch generation. Instead of using a fixed string, I feed the Audit Data + GBP Data into a prompt.

The AI generates a 3-stage campaign:

  1. Email 1: The Hook (Referencing their specific Reputation vs. Site Gap).
  2. Email 2: The Value (Educational content about the error found).
  3. Email 3: The Breakup (Professional closing).
import google.generativeai as genai  # pip install google-generativeai; assumes genai.configure(api_key=...) was called earlier

# Feeding the Gap Strategy into the LLM
prompt = f"""
    PROSPECT: {company}, Rating: {gbp_rating} stars.
    ISSUES: H1: {h1_status}, NAP: {nap_status}

    STRATEGY:
    1. If rating > 4.0, praise reputation but warn about site errors.
    2. Explain WHY {h1_status} kills rankings.
    3. Gentle breakup.

    OUTPUT FORMAT: JSON {{ "subject_1": "...", "body_1": "..." }}
"""

model = genai.GenerativeModel('gemini-2.5-flash')
response = model.generate_content(prompt)
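
Since the prompt asks for JSON, the reply has to be parsed before any email goes out. A minimal sketch of that step; the fence-stripping is a defensive assumption, since models sometimes wrap JSON in markdown:

import json

raw = response.text.strip().removeprefix("```json").removesuffix("```").strip()
campaign = json.loads(raw)
print(campaign["subject_1"])
print(campaign["body_1"][:100])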

The Result

The agent now runs autonomously. It scans SERPs, filters junk, crawls for emails across multiple pages, and uses LLMs to write custom campaigns.

The Metrics:

  • Raw Scrape: 200 URLs
  • After Cleaning Directories: 70 Businesses
  • Actionable Leads (With Emails): ~30 High-Quality Prospects

Key Takeaway: When developing an agent, or any tool in general, iteration is king. You have to know what you currently have and what's missing to reach the output you want. In my case, the difference between "just a script" and an "agent" is the ability to handle imperfection: hopping pages when data is missing, understanding context, and generating dynamic output. This project has become something I look forward to working on, and the most exciting part is that there's still room to grow.

🔗 Check out the Code:
You can find the full source code and contribute to the project on GitHub:
https://github.com/Rafa-romero-dev/seo-agent

A special thank you to the Dev.to team for featuring my previous article in the Top 7 Featured Dev Posts of the Week!

What do you think I should focus on next? What could use some refinement? Let me know in the comments!

Trust the Server, Not the LLM: A Deterministic Approach to LLM Accuracy

2025-11-29 00:16:03

🚫 Zero Mental Math: An Anti-Hallucination Architecture for LLM-Driven Analysis

A six-layer system for achieving 100% accurate numerical reporting from Large Language Models

🎯 The problem

I built an MCP server that extracts data from my MT5 terminals on a VPS. Basically, it's a load of financial data reports: trades, averages, technical indicators, and so on.

I built it all out and realized that my LLM would randomly hallucinate details. For example, it would report a 16th trade when there had only been 15 trades that day.

When it comes to financial reporting, I realize there is probably a lot of prior work on this topic, so I grabbed some ideas from recent RAG research and threw something together.

I wrote tests that actually check the accuracy of my embedding results across 10 repeated runs, and each MCP tool scores 100% accuracy on end-to-end integration tests.

I had the AI summarize it. If anyone is curious about the exact code, maybe I can open source a repeatable process, but I'm hoping this article gives you everything you need.

(Incoming AI-generated content)

📋 Abstract

Large Language Models (LLMs) are fundamentally pattern matchers, not calculators. When asked to analyze data, they generate "plausible-looking" numbers based on statistical patterns in training data—not deterministic computation. This is catastrophic for domains requiring precision, such as trading analysis, financial reporting, or medical diagnostics.

This document describes the Zero Mental Math Architecture, a multi-layered system that achieves accurate numerical reporting by shifting all computation to deterministic Python code and reducing the LLM to a "citation copy machine."

⚠️ The Core Problem

🤖 LLMs Hallucinate Numbers

Given raw trading data, an LLM will confidently state:

"Your win rate is approximately 70%"

...without performing any calculation. The model pattern-matched to a "reasonable-sounding" percentage. The actual win rate might be 65.52%, but the LLM has no mechanism to know this.

🧠 Why This Happens

LLMs predict the next token based on learned probability distributions. When they encounter a context suggesting a percentage is needed, they sample from the distribution of "percentages that appeared in similar contexts during training." This is fundamentally different from computation.

Research backing: Work on arithmetic capabilities in transformers (Nogueira et al., 2021) demonstrated that LLMs fail reliably at multi-digit arithmetic. The error rate increases with operand size and operation complexity. This isn't a bug to be fixed—it's an architectural limitation of attention-based sequence models.

🏗️ Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                    ZERO MENTAL MATH ARCHITECTURE                │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 1: Fat MCP Server (Pre-Calculation)                      │
│  └── Shift ALL computation to deterministic Python              │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 2: Accuracy Reports (Provenance Tracking)                │
│  └── Pre-formatted citations with cryptographic checksums       │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 3: Response Formatter (Constrained Generation)           │
│  └── Template-based output with zero degrees of freedom         │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 4: RAG Context (Semantic Grounding)                      │
│  └── Retrieval-augmented generation for entity resolution       │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 5: LLM Validation (Adversarial Verification)             │
│  └── Second LLM fact-checks against source data                 │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 6: Auto-Retry (Iterative Refinement)                     │
│  └── Automatic correction loop with convergence guarantees      │
└─────────────────────────────────────────────────────────────────┘

🔧 Layer 1: Fat MCP Server (Pre-Calculation)

📊 What It Does

The MCP (Model Context Protocol) server performs ALL numerical calculations before returning data to the LLM. The LLM never sees raw data that would require arithmetic.

# ❌ BAD: Raw data requires LLM to calculate
get_mt5_history_deals()  →  [deal1, deal2, deal3, ...]
# LLM must: count deals, group by position, sum P&L, calculate ratios

# ✅ GOOD: Pre-calculated metrics
get_mt5_position_history()  →  {
    "summary": {
        "total_positions": 29,      # Server counted
        "win_rate": 65.52,          # Server calculated: (19/29)*100
        "profit_factor": 2.34,      # Server calculated: sum(wins)/abs(sum(losses))
        "expectancy": 42.57         # Server calculated: total_pl/total_positions
    }
}
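
As a minimal sketch of the kind of pre-calculation meant here (assuming each closed position is a dict with a "profit" field; the author's server.py will differ in detail):

def summarize_positions(positions: list[dict]) -> dict:
    # All arithmetic happens in deterministic Python; the LLM only ever sees the results
    wins = [p["profit"] for p in positions if p["profit"] > 0]
    losses = [p["profit"] for p in positions if p["profit"] <= 0]
    total = len(positions)
    loss_total = abs(sum(losses))
    return {
        "total_positions": total,
        "total_wins": len(wins),
        "total_losses": len(losses),
        "win_rate": round(len(wins) / total * 100, 2) if total else 0.0,
        "profit_factor": round(sum(wins) / loss_total, 2) if loss_total else None,
        "expectancy": round((sum(wins) + sum(losses)) / total, 2) if total else 0.0,
    }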

✅ Why This Works

Principle: Tool-Augmented LLMs

The insight from Meta's "Toolformer" (Schick et al., 2023) and the broader ReAct paradigm (Yao et al., 2022) is that LLMs should delegate to external tools for tasks they perform poorly. Arithmetic is the canonical example.

Principle: Separation of Concerns

Asking an LLM to calculate percentages is like asking a poet to do accounting. Language models are trained on text prediction, not numerical computation. By moving calculation to Python—a language designed for computation—we use each system for its strengths.

Principle: Determinism Over Stochasticity

Python's 19/29*100 = 65.517... is deterministic. Running it 1000 times yields identical results. An LLM's "calculation" is stochastic—it samples from a probability distribution, introducing variance even at temperature 0 (due to floating-point non-determinism in GPU operations).

Research Foundation

  • Toolformer (Schick et al., 2023): LLMs can learn to call APIs for tasks like calculation
  • Program-Aided Language Models (Gao et al., 2022): Offloading computation to code interpreters
  • Chain-of-Thought Arithmetic Failures (Wei et al., 2022): Even with step-by-step reasoning, LLMs make arithmetic errors

📝 Layer 2: Accuracy Reports (Provenance Tracking)

🎯 What It Does

Every tool response includes an _accuracy_report field containing:

  1. Pre-formatted citations — Complete sentences ready for copy-paste
  2. CRC32 checksum — Cryptographic fingerprint of all metric values
  3. Confidence score — Data quality assessment
{
    "summary": { "win_rate": 65.52, "profit_factor": 2.34 },
    "_accuracy_report": {
        "checksum": "A7B3C2D1",
        "checksum_input": "29|19|10|65.52|1234.56|85.25|-42.15|2.34|42.57",
        "confidence": {
            "score": "high",
            "reason": "9/9 metrics populated, 29 positions analyzed"
        },
        "metrics": [
            {
                "path": "summary.win_rate",
                "value": 65.52,
                "citation": "Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]"
            }
        ],
        "instructions": {
            "checksum_required": true,
            "format": "End analysis with: [Verified: A7B3C2D1]"
        }
    }
}

✅ Why This Works

Principle: The LLM as Copy Machine

The critical insight is that LLMs are excellent at copying text verbatim. By providing the exact citation string, we reduce the LLM's job from "interpret this number and write about it" to "copy this string into your response." The former invites hallucination; the latter is mechanical.

Principle: Verifiable Provenance

Every number in the output has a traceable source. This enables:

  • Automated verification: Scripts can check that reported values match source data
  • Human auditing: Readers can follow citations to verify claims
  • Debugging: When errors occur, the citation trail identifies the failure point

Principle: Checksums as Commitment Devices

The CRC32 checksum serves multiple purposes:

  1. Tamper detection: If any metric changes, the checksum changes
  2. Verification anchor: The [Verified: A7B3C2D1] at the end of output confirms the LLM used the correct source data
  3. Debugging aid: The checksum_input field shows the exact values used, enabling manual verification
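
One way to generate the checksum described above, as a minimal sketch (the function name is hypothetical, and the author's accuracy_report.py defines the actual metric ordering):

import zlib

def make_accuracy_checksum(metric_values: list) -> tuple[str, str]:
    # Join the metric values in one fixed, documented order so the result is reproducible
    checksum_input = "|".join(str(v) for v in metric_values)
    # CRC32 rendered as eight uppercase hex characters, the format of "A7B3C2D1" above
    checksum = format(zlib.crc32(checksum_input.encode()) & 0xFFFFFFFF, "08X")
    return checksum, checksum_input

# e.g. make_accuracy_checksum([29, 19, 10, 65.52, 1234.56, 85.25, -42.15, 2.34, 42.57])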

Research Foundation

  • Attribution in RAG Systems (Liu et al., 2023): Citation improves factual accuracy
  • Self-Consistency Checking (Wang et al., 2022): Multiple verification signals improve reliability
  • Data Provenance in ML Pipelines: Standard practice in MLOps for reproducibility

📄 Layer 3: Response Formatter (Constrained Generation)

🎯 What It Does

Templates define the exact structure of outputs, with placeholder slots for citations:

TEMPLATE = """## Performance Analysis (Confidence: {confidence.score})

### Overview
{citation:summary.total_positions}
{citation:summary.win_rate}
{citation:summary.profit_factor}

[Verified: {checksum}]"""

The formatter replaces {citation:summary.win_rate} with the exact citation string from Layer 2:

Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]

✅ Why This Works

Principle: Reducing Degrees of Freedom

Hallucination occurs when LLMs have too much freedom. Consider:

| Approach | Degrees of Freedom | Hallucination Risk |
|---|---|---|
| "Analyze this data" | Unlimited | Very High |
| "Report the win rate" | High (format, precision, context) | High |
| "Copy this citation: Win rate: 65.52%" | Near Zero | Near Zero |

Templates eliminate structural decisions. The LLM doesn't choose what to report, in what order, with what formatting—the template specifies everything.

Principle: Slot-Filling vs. Generation

This follows the "skeleton-then-fill" paradigm from structured NLG (Natural Language Generation). The template is the skeleton; citations are the fill. The LLM's role is purely mechanical substitution.

Critical Implementation Rule:

class ResponseFormatter:
    """
    Critical Rule: NEVER calculates numbers. Only uses citations from
    _accuracy_report.metrics provided by the server.
    """

The formatter is explicitly prohibited from performing any computation. It can only copy existing citations.
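
A minimal sketch of what that slot-filling can look like, assuming the template and _accuracy_report shapes shown above (illustrative, not the author's response_formatter.py):

import re

def fill_template(template: str, accuracy_report: dict) -> str:
    # Map metric paths to their pre-formatted citation strings from Layer 2
    citations = {m["path"]: m["citation"] for m in accuracy_report["metrics"]}
    # Pure string substitution: citations are copied verbatim, never recomputed
    filled = re.sub(
        r"\{citation:([^}]+)\}",
        lambda m: citations.get(m.group(1), m.group(0)),
        template,
    )
    filled = filled.replace("{confidence.score}", accuracy_report["confidence"]["score"])
    return filled.replace("{checksum}", accuracy_report["checksum"])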

Research Foundation

  • Constrained Decoding (Hokamp & Liu, 2017): Forcing outputs to satisfy constraints
  • Template-Based NLG (Reiter & Dale, 1997): Classical approach to reliable text generation
  • Structured Output Forcing: JSON mode, function calling schemas

🗄️ Layer 4: RAG Context (Semantic Grounding)

🎯 What It Does

A ChromaDB knowledge base stores static facts:

  • Strategy mappings (magic numbers → strategy names)
  • Trading rules and constraints
  • Domain-specific terminology

Before generating responses, the system retrieves relevant context:

# Query: "What strategy uses magic 106?"
# Returns: ["Magic number 106 is Goldfish Scalper trading XAUUSD"]

This context is injected into both the formatter and the validator.
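
A minimal ChromaDB sketch of that lookup (the collection name and fact wording are illustrative; the author's knowledge_base.py wraps this differently):

import chromadb

client = chromadb.Client()  # in-memory by default, so it resets with the session
kb = client.create_collection(name="trading_rules")

# Static, known-good facts loaded fresh each session
kb.add(
    ids=["magic_106"],
    documents=["Magic number 106 is Goldfish Scalper trading XAUUSD"],
)

# Retrieval before response generation
hits = kb.query(query_texts=["What strategy uses magic 106?"], n_results=1)
context = hits["documents"][0][0]  # the Goldfish Scalper fact

The retrieved snippet then goes to both the formatter and the validator, as described below.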

✅ Why This Works

Principle: Not All Hallucinations Are Numerical

An LLM might correctly report "Win rate: 65.52%" but incorrectly attribute it to "Dark Dione strategy" when it's actually "Goldfish Scalper." This is a semantic hallucination—the number is right, but the entity relationship is wrong.

RAG grounds the LLM in factual knowledge about entities, preventing semantic errors.

Principle: Ephemeral Session Scope

kb = KnowledgeBase(ephemeral=True)  # Resets each MCP session
kb.load_static_rules()              # Loads known-good facts

The knowledge base is session-scoped to prevent stale data accumulation. Static rules (which don't change) are loaded fresh; dynamic trading statistics are always fetched live from MT5.

Principle: Context for Both Generator and Validator

The same RAG context is passed to:

  1. Formatter: To ground response generation
  2. Validator: To prevent false-positive hallucination flags

If the response says "Goldfish Scalper (Magic 106)" and the validator's context confirms this mapping, it won't incorrectly flag it as a hallucination.

Research Foundation

  • RAG (Lewis et al., 2020): The foundational retrieval-augmented generation paper
  • REALM (Guu et al., 2020): Retrieval-enhanced pre-training
  • In-Context Learning (Brown et al., 2020): GPT-3's ability to use context examples
  • Grounding in Dialogue Systems (Roller et al., 2020): Connecting responses to knowledge

✅ Layer 5: LLM Validation (Adversarial Verification)

🎯 What It Does

A second LLM (Novita AI) validates the drafted response against source data before delivery to the user:

validation_result = validate_with_llm(
    response_text=draft,      # What the LLM wants to say
    source_data=mcp_response, # Ground truth from server
    context=rag_context       # Knowledge base facts
)

The validator checks four rules:

  1. Zero Mental Math: All numbers match source exactly
  2. Anti-Aggregation: Raw values shown before averages
  3. Citation Requirement: Every number has [Source: ...]
  4. Checksum Verification: Response ends with correct [Verified: XXXX]

✅ Why This Works

Principle: Verification is Easier Than Generation

This is a fundamental asymmetry in computational complexity. Consider:

  • Generation: "Analyze this data and write a report" (open-ended, creative)
  • Verification: "Does '65.52%' match the source value '65.52'?" (closed, deterministic)

The validator has a much simpler task: pattern matching and comparison. This makes it far less prone to hallucination than the generator.
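
In fact, the most mechanical of these comparisons don't need an LLM at all and can run first as a deterministic gate. A minimal sketch, assuming the _accuracy_report shape from Layer 2 (the function name is hypothetical):

import re

def deterministic_prechecks(response_text: str, accuracy_report: dict) -> list[str]:
    issues = []
    # Checksum rule: the response must end with the exact server-issued checksum
    match = re.search(r"\[Verified: ([0-9A-F]{8})\]\s*$", response_text)
    if not match or match.group(1) != accuracy_report["checksum"]:
        issues.append("Checksum missing or does not match source")
    # Citation rule: every pre-formatted citation must appear verbatim
    for metric in accuracy_report["metrics"]:
        if metric["citation"] not in response_text:
            issues.append(f"Missing or altered citation for {metric['path']}")
    return issues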

Principle: Adversarial Checking

This draws from:

  • Constitutional AI (Anthropic, 2022): Using AI to critique and improve AI outputs
  • Debate (Irving et al., 2018): Having models argue to expose weaknesses
  • Red-teaming: Standard security practice of adversarial testing

The validator is explicitly instructed to be strict:

Be strict - any deviation from source is a hallucination

Principle: Structured Error Output

The validator returns structured JSON with specific issue categorization:

{
    "hallucinations_found": true,
    "issues": [{
        "claim": "Win Rate: approximately 70%",
        "problem": "Source shows 65.52%, not 'approximately 70%'",
        "severity": "critical",
        "correct_value": "Win rate: 65.52% [Source: ...]",
        "rule_violated": "Zero Mental Math"
    }]
}

This enables automated correction in Layer 6.

Research Foundation

  • Constitutional AI (Bai et al., 2022): AI systems that critique themselves
  • Self-Consistency (Wang et al., 2022): Sampling multiple times and checking agreement
  • Fact Verification (Thorne et al., 2018): FEVER dataset and verification systems
  • LLM-as-Judge (Zheng et al., 2023): Using LLMs to evaluate LLM outputs

🔄 Layer 6: Auto-Retry (Iterative Refinement)

🎯 What It Does

When validation fails, the system automatically:

  1. Parses the validation errors
  2. Applies corrections to the draft
  3. Re-validates
  4. Repeats up to N times (default: 3)
for attempt in range(1, max_retries + 1):
    validation = validate_with_llm(narrative, source_data, context)

    if not validation["hallucinations_found"]:
        # Success! Return validated response
        return {"analysis": narrative, "_validation_meta": {"validated": True}}

    # Failed - apply corrections and retry
    narrative = corrector.apply_corrections(narrative, validation["issues"])

✅ Why This Works

Principle: Iterative Refinement

Self-refinement is a well-established technique for improving LLM outputs. The key insight is that correction is easier than generation—given specific feedback ("this number is wrong, it should be X"), the fix is mechanical.
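
A minimal sketch of what "mechanical" can mean here, using the issue objects from Layer 5 (the author's auto_corrector.py may be more involved):

def apply_corrections(narrative: str, issues: list[dict]) -> str:
    # Deterministic string replacement: swap each hallucinated claim for the
    # correct, pre-formatted citation supplied by the validator
    for issue in issues:
        if issue.get("claim") and issue.get("correct_value"):
            narrative = narrative.replace(issue["claim"], issue["correct_value"])
    return narrative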

Principle: Bounded Retry with Graceful Degradation

The system doesn't retry forever:

  • Fixable issues (wrong numbers): Auto-correct and retry
  • Unfixable issues (structural problems): Fail immediately with diagnostics
  • Max retries exceeded: Return error with last attempt for debugging
if not can_fix:
    return {
        "success": False,
        "error": "Validation failed with unfixable issues",
        "validation_issues": issues,
        "unfixable_reasons": reasons
    }

Principle: Convergence Guarantees

Because corrections are deterministic (replace X with Y) and the validator is consistent, the system converges. If the corrector properly applies all fixes, the next validation will pass. The retry loop guards against transient failures, not fundamental incompatibility.

Research Foundation

  • Self-Refine (Madaan et al., 2023): Iterative refinement with self-feedback
  • Reflexion (Shinn et al., 2023): Verbal reinforcement learning through self-reflection
  • Error Correction in Communication (Shannon, 1948): Fundamental information theory

🚨 Special Rule: Anti-Aggregation

⚠️ The Problem

Aggregation hides critical information. Consider:

❌ WRONG (Aggregation Hallucination):
Current lot sizes:
- EURUSD: 0.06 lots average (range: 0.05-0.08)

This hides that lot size jumped 60% from 0.05 → 0.08 recently—a critical signal that risk management changed.

✅ The Solution

✅ CORRECT (Raw Data First):
Current lot sizes [Source: positions list]:
- EURUSD last 5: [0.05, 0.05, 0.05, 0.08, 0.07]
  → CURRENT: 0.07 lots
  → TREND: Scaled up 60% on Nov 24 (0.05 → 0.08)
  → Average: 0.06 lots (for reference only)
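
A small formatting helper along these lines (a sketch; the window size, wording, and trend calculation are assumptions) keeps the raw values in front of the reader and demotes the average to a footnote:

def lot_size_report(symbol: str, lots: list[float]) -> str:
    # Raw values first; the aggregate comes last and is labelled "for reference only"
    last5 = lots[-5:]
    trend = ""
    if len(last5) >= 2 and min(last5) > 0 and max(last5) != min(last5):
        swing = (max(last5) - min(last5)) / min(last5) * 100
        trend = f"\n  -> TREND: ranged {min(last5)} to {max(last5)} ({swing:.0f}% swing)"
    return (
        f"- {symbol} last {len(last5)}: {last5}\n"
        f"  -> CURRENT: {last5[-1]} lots{trend}\n"
        f"  -> Average: {sum(last5) / len(last5):.2f} lots (for reference only)"
    )

print(lot_size_report("EURUSD", [0.05, 0.05, 0.05, 0.08, 0.07]))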

✅ Why This Works

Principle: Simpson's Paradox Awareness

Aggregates can reverse the apparent direction of relationships. A "stable average" can hide dramatic changes in underlying data. By requiring raw values first, we prevent this information loss.

Principle: Auditability

Scientific reporting standards require showing raw data. If you only report "average 0.06," readers cannot detect:

  • Outliers that skew the average
  • Trends (increasing/decreasing)
  • Distribution shape (uniform vs. bimodal)

Principle: Transparency Over Convenience

It's easier to report a single number. But the Anti-Aggregation Rule prioritizes transparency over convenience. The small cognitive cost of reading 5 raw values prevents potentially catastrophic misunderstandings.

🎯 Accuracy Indicators in Final Output

✅ What Indicates Accuracy

| Indicator | Location | Example | Why It Matters |
|---|---|---|---|
| Citation tags | After every number | [Source: tool.path] | Traceable provenance |
| Checksum | End of response | [Verified: A7B3C2D1] | Data integrity proof |
| Confidence score | Header | (Confidence: high) | Data quality signal |
| Validation metadata | Response field | "validated": true | System verification passed |
| No approximations | Absence | Never: "~", "about" | Zero Mental Math compliance |
| Raw values before aggregates | Data sections | last 5: [...] | Anti-Aggregation compliance |

🚩 Red Flags (Hallucination Indicators)

| Red Flag | Example | Rule Violated |
|---|---|---|
| Missing citation | Win rate: 65.52% | Citation Requirement |
| Approximation words | approximately 70% | Zero Mental Math |
| Missing checksum | No [Verified: XXXX] | Checksum Requirement |
| Rounded numbers | 70% vs 65.52% | Zero Mental Math |
| Averages without raw data | Average: 0.06 alone | Anti-Aggregation |
| Confidence missing | No confidence score | Incomplete output |

📖 Complete Example: End-to-End Pipeline

Step 1: User Request

"Analyze my trading performance for November"

Step 2: MCP Server Response (Layers 1-2)

{
    "success": true,
    "summary": {
        "total_positions": 29,
        "total_wins": 19,
        "total_losses": 10,
        "win_rate": 65.52,
        "profit_factor": 2.34,
        "total_pl": 1234.56
    },
    "_accuracy_report": {
        "checksum": "A7B3C2D1",
        "confidence": {"score": "high", "reason": "9/9 metrics, 29 positions"},
        "metrics": [
            {
                "path": "summary.win_rate",
                "value": 65.52,
                "citation": "Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]"
            },
            {
                "path": "summary.profit_factor",
                "value": 2.34,
                "citation": "Profit factor: 2.34 [Source: get_mt5_position_history.summary.profit_factor]"
            }
        ]
    }
}

Step 3: Template Formatting (Layer 3)

## Performance Analysis (Confidence: high)

### Overview
- Total positions: 29 [Source: get_mt5_position_history.summary.total_positions]
- Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]
- Profit factor: 2.34 [Source: get_mt5_position_history.summary.profit_factor]

### Financial Performance
- Total P&L: $1234.56 [Source: get_mt5_position_history.summary.total_pl]

[Verified: A7B3C2D1]

Step 4: RAG Context Retrieval (Layer 4)

Retrieved: "Magic number 106 is Goldfish Scalper trading XAUUSD"

Step 5: LLM Validation (Layer 5)

{
    "hallucinations_found": false,
    "checksum_valid": true,
    "summary": "All claims verified against source data"
}

Step 6: Final Response

{
    "success": true,
    "analysis": "## Performance Analysis (Confidence: high)\n\n...\n\n[Verified: A7B3C2D1]",
    "_validation_meta": {
        "validated": true,
        "attempts": 1,
        "model": "novita-default",
        "rag_context_used": true
    }
}

📊 Summary: Why This Architecture Works

| Layer | Technique | What It Prevents | Key Insight |
|---|---|---|---|
| 1 | Server-side calculation | LLM arithmetic errors | Use the right tool for the job |
| 2 | Pre-formatted citations | LLM paraphrasing numbers | Reduce LLM to copy machine |
| 3 | Template-based output | Structural hallucination | Minimize degrees of freedom |
| 4 | RAG context grounding | Semantic hallucination | Ground entities in facts |
| 5 | Second LLM validation | Subtle errors slipping through | Verification < Generation |
| 6 | Auto-retry with correction | Transient failures | Iterative refinement converges |

💡 The Meta-Principle

Trust flows from deterministic systems to stochastic ones, never the reverse.

Python calculates → Server stores → Citations copy → Templates structure → Validator checks

At no point does an LLM "decide" a number. The LLM's role is purely mechanical: copying citations into template slots. This is the fundamental insight that makes 100% accuracy achievable.

📚 References

  1. Schick, T., et al. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools." arXiv:2302.04761
  2. Yao, S., et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629
  3. Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020
  4. Wang, X., et al. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." arXiv:2203.11171
  5. Bai, Y., et al. (2022). "Constitutional AI: Harmlessness from AI Feedback." arXiv:2212.08073
  6. Madaan, A., et al. (2023). "Self-Refine: Iterative Refinement with Self-Feedback." arXiv:2303.17651
  7. Gao, L., et al. (2022). "PAL: Program-Aided Language Models." arXiv:2211.10435
  8. Nogueira, R., et al. (2021). "Investigating the Limitations of Transformers with Simple Arithmetic Tasks." arXiv:2102.13019

📁 Appendix: Implementation Files

| File | Purpose | Layer |
|---|---|---|
| server.py | MCP server with pre-calculated metrics | 1 |
| accuracy_report.py | Citation and checksum generation | 2 |
| response_formatter.py | Template-based output formatting | 3 |
| knowledge_base.py | ChromaDB RAG integration | 4 |
| llm_validator.py | Second LLM fact-checking | 5 |
| validation_decorator.py | Auto-retry orchestration | 6 |
| auto_corrector.py | Automatic error correction | 6 |

⭐ Two Free AI Tools You Should Try: A Surprisingly Good Image Generator & Text-to-Speech Tool

2025-11-29 00:13:05

We’re living in a time where almost every “free” AI tool hides something behind a paywall — limited credits, forced registration, watermarks, or downgraded quality unless you upgrade.

So when I came across two AI tools that are actually free, require zero sign-up, and deliver genuinely high-quality results, I was honestly surprised. They feel like hidden gems, so I’m sharing them here for anyone who loves AI tools, productivity resources, or creative experiments.

🎨 1. Free AI Image Generator (High Quality + No Account Needed)
🔗 https://productivitygears.com/free-ai-image-generator-tool

If you’ve ever used online image generators, you know how frustrating it gets: credits, blurry previews, logins, watermarks — the usual story.
This tool is the opposite: clean, fast, and free.


What Makes It Stand Out

  • No login or email required
  • Completely free (not a “free trial”)
  • Generates high-quality images
  • Surprisingly fast
  • No watermark on downloads
  • Simple, clean interface

You can create illustrations, concept art, characters, logos, wallpapers — whatever you want — without jumping through hoops. It’s refreshing to use something that just works.

🔊 2. Free AI Text-to-Speech Tool (Natural Voices, Instant MP3)
🔗 https://productivitygears.com/free-ai-text-to-speech

The second tool I found is a Text-to-Speech converter, and honestly, it’s one of the better free ones I’ve tested recently. Many TTS websites place strict character limits, require login, or keep the good voices locked behind subscriptions.


This one doesn’t.

Why It’s Worth Bookmarking

  • Completely free
  • No sign-up
  • Natural, realistic voices
  • Instant MP3 download
  • No character limit during my tests
  • Great for videos, voiceovers, content creation, or accessibility

For something free and frictionless, the quality is surprisingly good.

⭐ Final Thoughts
Finding AI tools that are truly free and actually good is rare. Both of these tools deliver value without the typical restrictions — no login, no credits, no watermark surprises.

If you enjoy playing around with AI, creating content, or just exploring useful online tools, these two are absolutely worth checking out and adding to your bookmarks.

If you try them, I’m curious which one you end up using more.

Workload And Agentic Identity at Scale: Insights From CyberArk's Workload Identity Day Zero

2025-11-29 00:10:36

What do the terms identity, AI, workload, access, SPIFFE, and secrets all have in common? These were the most common words used at CyberArk's Workload Identity Day Zero in Atlanta ahead of KubeCon 2025. 

Across an evening full of talks and hallway chats, the conversation kept coming back to the fact that we have built our infrastructures, tools, and standards around humans, then quietly handed the keys to a fast-multiplying universe of non-human identities (NHIs). The evening didn't dwell on what we have gotten wrong, though; it focused on what we are getting right as we look towards a brighter future of workload identity.

State of Workload Authentication

Every speaker discussed what has happened so far and how we have reached the state so many companies find themselves in. These workload identities, in the form of services, agents, CI jobs, Lambdas, and so on, are mostly authenticated today with long-lived API keys that are more likely than not overprivileged. We have granted standing access too often. And while some teams have embraced PKI setups at scale, they are trapped in a complexity that only a handful of experts truly understand.

The result is explosive complexity as teams face multi-cloud and hybrid environments, multiple languages, and increasingly complex org charts. It is no wonder that every siloed team has come up with its own ad hoc solutions over the years. But that means it is unlikely that any governance model will be a good fit, preventing us from leveraging a single management system. There is also the fact that if an attacker gets hold of a single NHI credential, they gain a huge, often invisible foothold with a massive blast radius.

At the same time, scale and AI are turning this from an "annoying" problem into an "existential" threat. Workloads now spin up, talk to each other, cross trust domains, and die off in seconds, while organizations want billions of attestations per day without hiring an army just to rotate secrets. Now agentic AI shows up and starts acting on our behalf. It calls APIs, touches sensitive data, and hops across providers. We often can't tell whether a user triggered an action or an autonomous agent did.

We are beginning to recognize the need for clean attribution, access governance, and logging. Everyone on stage is essentially describing the same issue: we can't keep handing out magic tokens to non-human actors and hope spreadsheets, YAML, and "best effort" PKI will save us.

Weaving A Shared Workload Identity Story

The opening keynote from Andrew Moore, Staff Software Engineer at Uber, "From Bet to Backbone, Securing Uber with SPIRE," set the tone for the evening. For Uber, all of this work comes down to external customer trust. Uber's SPIRE journey really began when the team admitted the impossibility of governing "thousands of solutions at scale."

Andrew's team moved toward a single, SPIFFE-based workload identity fabric that can handle hundreds of thousands to billions of attestations per day. They treat SPIRE as the "bottom turtle": trusted boot, agent validation, centralized signing, and tight, well-designed SPIFFE IDs that don't accumulate junk.

Andrew Moore

Tying AI To Workloads

Brett Caley, Senior Software Security Engineer at Block, echoed a story arc similar to Uber's in his talk "WIMSE, OAUTH and SPIFFE: A Standards-Based Blueprint for Securing Workloads at Scale." The core question his team needed to answer was how to prove a workload is who it says it is. They went from plaintext keys in Git and bespoke OIDC hacks to "x509 everywhere" with SPIRE-driven attestations, rolling out systems that can issue credentials where the workload is, and at the speed developers demand. He explained that they deployed SPIRE "where it should exist."

Brett also tied all of this to agentic AI and our tendency to anthropomorphize, for example by giving AI agents names in Slack. He said this might invoke Black Mirror storylines, but we need to remember that these "human-like agents" are all still workloads that need narrowly scoped permissions, explicit authorization of actions, and confirmation of intent.

When something goes wrong, he argues, the right question isn't "What did the AI do to us?" but "How did our system fail in governing the AI's workload identity and permissions?"

Brett Caley

AI Agents Are Workloads That Need Identities

In their talk "AI agent communication across cloud providers with SPIFFE universal identities," Dan Choi, Senior Product Manager, AWS Cryptography, and Brendan Paul, Sr. Security Solutions Architect from AWS, highlighted that agentic AI systems are made up of workloads and need to communicate across clouds without ever touching long-lived secrets.

They said that if you can establish two-way trust between your authorization servers and your SPIFFE roots of trust, via SPIRE, you can treat SPIFFE Verifiable Identity Documents (SVIDs) as universal, short-lived identities for AI agents.

From there, they walked through some concrete use cases, including an AI agent acting on behalf of a user. Framed as an "AI-enabled coffee shop" demo, they showed how you start with an authenticated web app, then propagate both user identity and workload identity so the agent can check inventory, update systems, and call tools with clear attribution and least privilege.

They stressed that you don't need MCP or any particular orchestration framework for this; the pattern is always "get the agent an SVID, then exchange it for scoped cloud credentials," whether you are interfacing with an S3 bucket or anything else. They closed by telling us that AI agents naturally span trust domains, and SPIFFE gives them a common identity fabric. The future work will refine how we do token proof-of-possession and delegation at scale for all of these non-human workloads.

Brendan Paul and Dan Choi

Where We Go From Here

If these talks are any indication, the next few years are about moving workload identity from heroic projects to boring infrastructure. SPIFFE/SPIRE, WIMSE, OAuth token-exchange patterns, and transaction tokens will quietly become the plumbing. This will define how we deploy CI/CD, microservices, and AI agents securely at scale for the next generation of applications and platforms. 

Enterprises are realizing that "API keys in Git" and "service account sprawl" are no longer acceptable risks. The time is now to adopt internal identity fabrics that attest workloads, issue short-lived credentials, centralize policy, and log everything, enriching those logs with as much context as possible. The UX lesson from all of these talks is that identity is currently painful, so people route around it. If we do the work now to make workload identity automatic and invisible, teams will lean into it.

At the same time, agentic AI will force us to sharpen our thinking about representation and blame. We need to ask "What is this workload allowed to do, on whose behalf, and with what guardrails?" The future is building golden paths where every non-human identity, no matter what shape or function, comes pre-wired with a strong, attestable identity and tightly scoped access. The creative tension will be keeping that world flexible enough for experimentation, and safe enough that "exploding rockets" stay in test environments, not production.

No matter where your path towards better NHI governance starts, everyone at Workload Identity Day Zero agreed that having insight into your current inventory of workloads and machine identities is mandatory. We at GitGuardian firmly believe the future can mean fewer leaked secrets, better secrets management, and scalable NHI governance, but we can't get there without first understanding the scope of the issue in the org today. We would love to talk to you and your team about that.

CinemaSins: Everything Wrong With KPop Demon Hunters In 16 Minutes Or Less

2025-11-29 00:02:49

Everything Wrong With KPop Demon Hunters In 16 Minutes Or Less

CinemaSins takes on the wild ride that is KPop Demon Hunters, throwing their signature “sins” over the movie in under 16 minutes of sharp, tongue-in-cheek critiques. They invite fans to dive deeper into the CinemaSins universe—follow their YouTube channels, social feeds, and even fill out a “sinful” poll or support the team on Patreon.

The video description also lists the crew behind the sins (Jeremy, Chris, Aaron, Jonathan, Deneé, Ian, Daniel) with links to their social profiles, plus community hangouts on Discord and Reddit, and more goodies like Jeremy’s book and TikTok highlights.

Watch on YouTube

Pure-Go Race Detector - Race Detection Without CGO

2025-11-29 00:00:28

Hi!
I've been working on a pure-Go race detector that works without CGO. Just released v0.3.0 with major performance improvements.

The Problem

Go's built-in race detector requires CGO. This means:

  • No race detection in Docker scratch images
  • Cross-compilation becomes complicated
  • Cloud functions and embedded systems are out of luck

The Solution

A pure-Go implementation of race detection using the FastTrack algorithm. Works with CGO_ENABLED=0.

What's New in v0.3.0

99% overhead reduction - from ~10,000x down to ~100x slowdown.

Key improvements:

  • Adaptive Sampling - configurable 1-100% sample rate for production use
  • Sparse VectorClocks - O(active goroutines) instead of O(total)
  • Address Compression - 8x memory reduction
  • 4 Inline Slots - zero allocations for common cases

Installation

go install github.com/kolkov/racedetector/cmd/racedetector@v0.3.0

Usage

# Build with race detection
racedetector build -o myapp main.go

# Run with race detection
racedetector run main.go

# Production with 10% sampling
RACEDETECTOR_SAMPLE_RATE=10 racedetector run main.go

Links

  • GitHub: https://github.com/kolkov/racedetector

Looking for Feedback

I'd appreciate it if you could try it on your projects and report any bugs or false positives. Contributions are welcome.

Also curious: should something like this be proposed for Go toolchain integration, or is it better as a standalone tool?

Thanks!