2025-11-29 00:16:15
I recently built a prospecting agent with Python to find local businesses on Google’s lower-ranked pages and pitch them SEO services.
The initial version was... promising but flawed.
It tried to pitch Indeed.com because they didn't have a local phone number. It told Ford Dealerships their site was "down" because their firewall blocked my bot. It sent robotic emails starting with "Fail: H1 Missing" ... not exactly a charming opener.
I realized that to make this tool useful, I needed to move from a simple scraper to a true agent. Here is the breakdown of how I refactored the code to filter noise, crawl for contacts, and use GenAI to write personalized campaigns.
The first problem with scraping generic keywords is that half the results aren't businesses at all; they're directories, job boards, and government sites. My script was wasting resources auditing ZipRecruiter and Texas.gov.
The Fix:
I made the clean_and_deduplicate function more robust by significantly expanding its blocklist. We categorized domains into "Job Boards," "Government," "Social Media," and "National Brands" (like Penske) that wouldn't hire a local agency anyway.
# We filter these out before we even attempt an audit
DIRECTORY_DOMAINS = [
    'indeed', 'glassdoor', 'ziprecruiter',  # Job Boards
    '.gov', 'texas.gov', 'fmcsa',           # Gov sites
    'yelp', 'yellowpages', 'bbb.org',       # Directories
    'penske', 'uhaul', 'ford.com'           # National Brands
]

def is_directory(url):
    # Skip directories, job boards, gov sites, and national brands entirely.
    return any(domain in url.lower() for domain in DIRECTORY_DOMAINS)
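For context, here's a hedged sketch of how that filter might be applied inside the dedup step (the raw_urls list is hypothetical):

# Hypothetical usage inside clean_and_deduplicate: drop junk domains before auditing.
raw_urls = ["https://www.indeed.com/cmp/joes-diesel", "https://joesdieselrepair.com"]
leads = [url for url in raw_urls if not is_directory(url)]
print(leads)  # ['https://joesdieselrepair.com']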
Result: The list went from ~200 "leads" to ~70 actual local businesses.
My original script checked for H1 tags using exact string matching. If the keyword was diesel mechanic and the H1 was Best Diesel Mechanic in Texas, the script marked it as a FAIL.
The Fix: Fuzzy Logic
I switched to token-based set matching. If the H1 contains a significant percentage of the target keywords (over 50%), it passes.
# Breaking strings into sets of words for flexible matching
required_words = set(keyword.lower().split())
found_words = set(h1_text.lower().split())
# Calculate intersection
matches = required_words.intersection(found_words)
match_percentage = len(matches) / len(required_words)
# If >50% overlap, it's a Pass.
if match_percentage >= 0.5:
audit_data['H1_Audit_Result'] = "Pass"
Originally, if a site returned a 403 Forbidden, my script flagged it as "Actionable: Server Error." Pitching a client saying "Your site is down" when it's actually just secure is a great way to look incompetent.
The Fix: Handling Firewalls
I updated the requests logic to explicitly catch 403 and 406 errors and mark them as SKIP. Now, the agent only flags genuine connection errors (like 500 or SSLError) as actionable leads.
except RequestException as e:
    # If the server explicitly blocked us (Firewall/WAF), it's not a lead.
    # (Assumes response.raise_for_status() was called in the try block.)
    if e.response is not None and e.response.status_code in [403, 406, 429, 503]:
        audit_data['Error_Status'] = "Blocked"
        return audit_data  # Stop processing, we will filter this out later
    # Real connection errors (DNS failure, Timeout) are actual leads.
    # We want to pitch "Site Health" services to these.
    audit_data['Error_Status'] = f"Error: {e.__class__.__name__}"
This was the strategic game-changer. A site with a missing H1 tag isn't necessarily a good lead. But a business with 50 five-star reviews and a missing H1 tag? That is a gold mine.
I integrated a secondary API call to fetch the Google Business Profile (GBP) ratings for every prospect to identify "Hidden Gems": businesses with great real-world reputations but poor digital presence.
# We categorize the lead before generating the pitch
is_gbp_strong = gbp_rating >= 4.0 and gbp_reviews >= 10
is_gbp_missing = gbp_rating == 0
# Strategy A: Strong GBP + Weak Site = "Your site hurts your reputation"
# Strategy B: No GBP + Weak Site = "You are invisible"
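A minimal sketch of how those flags could pick the outreach angle (the helper name and the fallback branch are my own, not part of the original script):

def choose_pitch_angle(gbp_rating, gbp_reviews):
    # Hypothetical helper: map the reputation/site gap to a pitch strategy.
    is_gbp_strong = gbp_rating >= 4.0 and gbp_reviews >= 10
    is_gbp_missing = gbp_rating == 0
    if is_gbp_strong:
        return "Strategy A: your site hurts your reputation"
    if is_gbp_missing:
        return "Strategy B: you are invisible"
    return "Default: standard audit pitch"  # assumption: anything in between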
At this point, I had great prospects, but I was missing the most important piece of data: The Email Address. Many local businesses don't put their email in the Header; they hide it on the "Contact Us" page.
The Fix: The Spider Logic
I upgraded the agent to act like a human user:
1. It first scans the homepage for mailto: links or regex email matches.
2. If nothing turns up, it scores the internal links to decide which page to crawl next: /contact-us gets 100 points, /about gets 30 points.

from urllib.parse import urljoin

def find_best_contact_url(soup, base_url):
    # Heuristic Scoring Logic: rank every internal link by how likely it is
    # to be a contact page, then crawl the best candidate next.
    best_candidate, best_score = None, 0
    for link in soup.find_all('a', href=True):
        url_path = link['href'].lower()
        link_text = link.get_text(strip=True).lower()
        score = 0
        if 'contact' in url_path: score += 100
        if 'contact' in link_text: score += 50
        if link.find_parent('footer'): score += 10
        if score > best_score:
            best_candidate, best_score = urljoin(base_url, link['href']), score
    # Returns the URL with the highest score to crawl next
    return best_candidate
This logic alone saved ~40% of leads that would have otherwise been discarded as "No Contact Info."
Finally, I tackled the outreach itself. My previous email template was rigid and impersonal. I wanted a 3-email sequence that felt human.
The Fix: Google Gemini 2.5 Flash
I integrated the Gemini API, which is fast and cost-efficient. Instead of using a fixed string, I feed the Audit Data + GBP Data into a prompt.
The AI generates a 3-stage campaign:
# Feeding the Gap Strategy into the LLM
prompt = f"""
PROSPECT: {company}, Rating: {gbp_rating} stars.
ISSUES: H1: {h1_status}, NAP: {nap_status}
STRATEGY:
1. If rating > 4.0, praise reputation but warn about site errors.
2. Explain WHY {h1_status} kills rankings.
3. Gentle breakup.
OUTPUT FORMAT: JSON {{ "subject_1": "...", "body_1": "..." }}
"""
model = genai.GenerativeModel('gemini-2.5-flash')
response = model.generate_content(prompt)
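The prompt asks for JSON, so the response still has to be parsed; here's a rough sketch, assuming the model may wrap its output in Markdown fences:

import json

raw = response.text.strip()
# Strip possible ```json fences before parsing (Gemini sometimes adds them).
if raw.startswith("```"):
    raw = raw.strip("`").removeprefix("json").strip()
campaign = json.loads(raw)
first_email = {"subject": campaign["subject_1"], "body": campaign["body_1"]}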
The agent now runs autonomously. It scans SERPs, filters junk, crawls for emails across multiple pages, and uses LLMs to write custom campaigns.
The Metrics:
Key Takeaway: When developing an agent, or any tool in general, iteration is king. You have to know what you currently have and what's missing to reach the output you want. In my case, the difference between "just a script" and an "agent" is the ability to handle imperfection: hopping pages when data is missing, understanding context, and generating dynamic output. This project has become something I look forward to working on, and the most exciting part is that there's still room to grow.
🔗 Check out the Code:
You can find the full source code and contribute to the project on GitHub:
https://github.com/Rafa-romero-dev/seo-agent
A special thank you to the Dev.to team for featuring my previous article in the Top 7 Featured Dev Posts of the Week!
What do you think I should focus on next? What could use some refinement? Let me know in the comments!
2025-11-29 00:16:03
A six-layer system for achieving 100% accurate numerical reporting from Large Language Models
I built an MCP server that extracts data from my MT5 terminals on a VPS. Basically it's a load of financial data reports: trades, averages, technical indicators, and so on.
Once I had it built out, I realized that my LLM would randomly hallucinate things; for example, it would say there was a 16th trade when there had only been 15 trades for that day.
When it comes to financial reporting, I realize there is probably a lot of prior work on this topic, so I grabbed ideas from a lot of the latest research on RAG and threw something together.
I wrote tests that actually check the accuracy of my embedding results across 10 repeated runs, and each MCP tool now scores 100% on end-to-end integration tests.
I had the AI summarize it. If anyone is curious about the exact code, maybe I can open-source a repeatable process, but I'm hoping this article gives you everything you need.
( incoming AI gen content )
Large Language Models (LLMs) are fundamentally pattern matchers, not calculators. When asked to analyze data, they generate "plausible-looking" numbers based on statistical patterns in training data—not deterministic computation. This is catastrophic for domains requiring precision, such as trading analysis, financial reporting, or medical diagnostics.
This document describes the Zero Mental Math Architecture, a multi-layered system that achieves accurate numerical reporting by shifting all computation to deterministic Python code and reducing the LLM to a "citation copy machine."
Given raw trading data, an LLM will confidently state:
"Your win rate is approximately 70%"
...without performing any calculation. The model pattern-matched to a "reasonable-sounding" percentage. The actual win rate might be 65.52%, but the LLM has no mechanism to know this.
LLMs predict the next token based on learned probability distributions. When they encounter a context suggesting a percentage is needed, they sample from the distribution of "percentages that appeared in similar contexts during training." This is fundamentally different from computation.
Research backing: work on the arithmetic capabilities of transformers (Nogueira et al., 2021) demonstrated that LLMs fail reliably at multi-digit arithmetic. The error rate increases with operand size and operation complexity. This isn't a bug to be fixed—it's an architectural limitation of attention-based sequence models.
┌─────────────────────────────────────────────────────────────────┐
│ ZERO MENTAL MATH ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 1: Fat MCP Server (Pre-Calculation) │
│ └── Shift ALL computation to deterministic Python │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 2: Accuracy Reports (Provenance Tracking) │
│ └── Pre-formatted citations with cryptographic checksums │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 3: Response Formatter (Constrained Generation) │
│ └── Template-based output with zero degrees of freedom │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 4: RAG Context (Semantic Grounding) │
│ └── Retrieval-augmented generation for entity resolution │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 5: LLM Validation (Adversarial Verification) │
│ └── Second LLM fact-checks against source data │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 6: Auto-Retry (Iterative Refinement) │
│ └── Automatic correction loop with convergence guarantees │
└─────────────────────────────────────────────────────────────────┘
The MCP (Model Context Protocol) server performs ALL numerical calculations before returning data to the LLM. The LLM never sees raw data that would require arithmetic.
# ❌ BAD: Raw data requires LLM to calculate
get_mt5_history_deals() → [deal1, deal2, deal3, ...]
# LLM must: count deals, group by position, sum P&L, calculate ratios
# ✅ GOOD: Pre-calculated metrics
get_mt5_position_history() → {
"summary": {
"total_positions": 29, # Server counted
"win_rate": 65.52, # Server calculated: (19/29)*100
"profit_factor": 2.34, # Server calculated: sum(wins)/abs(sum(losses))
"expectancy": 42.57 # Server calculated: total_pl/total_positions
}
}
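As a rough sketch of what "server calculated" means in practice (the helper below is illustrative; field names follow the example above):

def summarize_positions(position_pls):
    # All arithmetic happens in deterministic Python, never in the LLM.
    wins = [p for p in position_pls if p > 0]
    losses = [p for p in position_pls if p < 0]
    total = len(position_pls)
    return {
        "total_positions": total,
        "win_rate": round(len(wins) / total * 100, 2) if total else 0.0,
        "profit_factor": round(sum(wins) / abs(sum(losses)), 2) if losses else None,
        "expectancy": round(sum(position_pls) / total, 2) if total else 0.0,
    }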
Principle: Tool-Augmented LLMs
The insight from Meta's "Toolformer" (Schick et al., 2023) and the broader ReAct paradigm (Yao et al., 2022) is that LLMs should delegate to external tools for tasks they perform poorly. Arithmetic is the canonical example.
Principle: Separation of Concerns
Asking an LLM to calculate percentages is like asking a poet to do accounting. Language models are trained on text prediction, not numerical computation. By moving calculation to Python—a language designed for computation—we use each system for its strengths.
Principle: Determinism Over Stochasticity
Python's 19/29*100 = 65.517... is deterministic. Running it 1000 times yields identical results. An LLM's "calculation" is stochastic—it samples from a probability distribution, introducing variance even at temperature 0 (due to floating-point non-determinism in GPU operations).
Every tool response includes an _accuracy_report field containing:
{
"summary": { "win_rate": 65.52, "profit_factor": 2.34 },
"_accuracy_report": {
"checksum": "A7B3C2D1",
"checksum_input": "29|19|10|65.52|1234.56|85.25|-42.15|2.34|42.57",
"confidence": {
"score": "high",
"reason": "9/9 metrics populated, 29 positions analyzed"
},
"metrics": [
{
"path": "summary.win_rate",
"value": 65.52,
"citation": "Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]"
}
],
"instructions": {
"checksum_required": true,
"format": "End analysis with: [Verified: A7B3C2D1]"
}
}
}
Principle: The LLM as Copy Machine
The critical insight is that LLMs are excellent at copying text verbatim. By providing the exact citation string, we reduce the LLM's job from "interpret this number and write about it" to "copy this string into your response." The former invites hallucination; the latter is mechanical.
Principle: Verifiable Provenance
Every number in the output has a traceable source, so every claim can be audited back to the exact tool field it came from.
Principle: Checksums as Commitment Devices
The CRC32 checksum serves multiple purposes:
- The [Verified: A7B3C2D1] tag at the end of the output confirms the LLM used the correct source data
- The checksum_input field shows the exact values used, enabling manual verification
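A minimal sketch of how such a checksum could be produced, assuming the CRC32 is computed over the pipe-joined checksum_input string shown above:

import zlib

def make_checksum(values):
    # Join the key metric values exactly as they appear in checksum_input.
    checksum_input = "|".join(str(v) for v in values)
    crc = zlib.crc32(checksum_input.encode("utf-8")) & 0xFFFFFFFF
    return f"{crc:08X}", checksum_input

checksum, raw = make_checksum([29, 19, 10, 65.52, 1234.56, 85.25, -42.15, 2.34, 42.57])
# Anyone can recompute the CRC32 over checksum_input and compare it with
# the [Verified: ...] tag to confirm the numbers weren't altered.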
Templates define the exact structure of outputs, with placeholder slots for citations:
TEMPLATE = """## Performance Analysis (Confidence: {confidence.score})
### Overview
{citation:summary.total_positions}
{citation:summary.win_rate}
{citation:summary.profit_factor}
[Verified: {checksum}]"""
The formatter replaces {citation:summary.win_rate} with the exact citation string from Layer 2:
Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]
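A minimal sketch of that slot-filling step, assuming the _accuracy_report structure from Layer 2:

import re

def render(template, report):
    # Map metric paths to their pre-formatted citation strings, then substitute.
    citations = {m["path"]: m["citation"] for m in report["metrics"]}
    out = re.sub(r"\{citation:([\w.]+)\}", lambda m: citations[m.group(1)], template)
    out = out.replace("{confidence.score}", report["confidence"]["score"])
    return out.replace("{checksum}", report["checksum"])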
Principle: Reducing Degrees of Freedom
Hallucination occurs when LLMs have too much freedom. Consider:
| Approach | Degrees of Freedom | Hallucination Risk |
|---|---|---|
| "Analyze this data" | Unlimited | Very High |
| "Report the win rate" | High (format, precision, context) | High |
| "Copy this citation: Win rate: 65.52%" | Near Zero | Near Zero |
Templates eliminate structural decisions. The LLM doesn't choose what to report, in what order, with what formatting—the template specifies everything.
Principle: Slot-Filling vs. Generation
This follows the "skeleton-then-fill" paradigm from structured NLG (Natural Language Generation). The template is the skeleton; citations are the fill. The LLM's role is purely mechanical substitution.
Critical Implementation Rule:
class ResponseFormatter:
"""
Critical Rule: NEVER calculates numbers. Only uses citations from
_accuracy_report.metrics provided by the server.
"""
The formatter is explicitly prohibited from performing any computation. It can only copy existing citations.
A ChromaDB knowledge base stores static facts, such as which trading strategy a given magic number maps to.
Before generating responses, the system retrieves relevant context:
# Query: "What strategy uses magic 106?"
# Returns: ["Magic number 106 is Goldfish Scalper trading XAUUSD"]
This context is injected into both the formatter and the validator.
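Roughly, that retrieval step might look like this with ChromaDB (the collection name and documents here are illustrative):

import chromadb

client = chromadb.Client()  # in-memory client, session-scoped
kb = client.create_collection("static_rules")
kb.add(
    ids=["magic_106"],
    documents=["Magic number 106 is Goldfish Scalper trading XAUUSD"],
)
hits = kb.query(query_texts=["What strategy uses magic 106?"], n_results=1)
rag_context = hits["documents"][0]  # passed to both formatter and validator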
Principle: Not All Hallucinations Are Numerical
An LLM might correctly report "Win rate: 65.52%" but incorrectly attribute it to "Dark Dione strategy" when it's actually "Goldfish Scalper." This is a semantic hallucination—the number is right, but the entity relationship is wrong.
RAG grounds the LLM in factual knowledge about entities, preventing semantic errors.
Principle: Ephemeral Session Scope
kb = KnowledgeBase(ephemeral=True) # Resets each MCP session
kb.load_static_rules() # Loads known-good facts
The knowledge base is session-scoped to prevent stale data accumulation. Static rules (which don't change) are loaded fresh; dynamic trading statistics are always fetched live from MT5.
Principle: Context for Both Generator and Validator
The same RAG context is passed to both the response formatter (Layer 3) and the validator (Layer 5).
If the response says "Goldfish Scalper (Magic 106)" and the validator's context confirms this mapping, it won't incorrectly flag it as a hallucination.
A second LLM (Novita AI) validates the drafted response against source data before delivery to the user:
validation_result = validate_with_llm(
response_text=draft, # What the LLM wants to say
source_data=mcp_response, # Ground truth from server
context=rag_context # Knowledge base facts
)
The validator checks four rules:
- Every numeric claim must carry a [Source: ...] citation (Citation Requirement)
- Numbers must match the source exactly, with no rounding or "approximately" (Zero Mental Math)
- The response must end with the [Verified: XXXX] checksum (Checksum Requirement)
- Aggregates must be accompanied by the raw values behind them (Anti-Aggregation)
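A sketch of how the validator call could be framed; the prompt wording and the exact JSON shape are assumptions on my part:

import json

def build_validator_prompt(response_text, source_data, context):
    # Hypothetical prompt: give the second LLM the ground truth, the RAG facts,
    # and the draft, then demand structured findings.
    return f"""You are a strict fact-checker. Be strict - any deviation from source is a hallucination.
SOURCE DATA (ground truth):
{json.dumps(source_data, indent=2)}
KNOWN FACTS (RAG context):
{context}
DRAFT RESPONSE:
{response_text}
Return JSON: {{"hallucinations_found": true/false, "checksum_valid": true/false, "issues": []}}"""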
Principle: Verification is Easier Than Generation
This is a fundamental asymmetry in computational complexity: producing an analysis involves many open-ended choices, while checking one only requires comparing each claim against the source.
The validator has a much simpler task: pattern matching and comparison. This makes it far less prone to hallucination than the generator.
Principle: Adversarial Checking
The validator plays the role of an adversarial checker and is explicitly instructed to be strict:
Be strict - any deviation from source is a hallucination
Principle: Structured Error Output
The validator returns structured JSON with specific issue categorization:
{
"hallucinations_found": true,
"issues": [{
"claim": "Win Rate: approximately 70%",
"problem": "Source shows 65.52%, not 'approximately 70%'",
"severity": "critical",
"correct_value": "Win rate: 65.52% [Source: ...]",
"rule_violated": "Zero Mental Math"
}]
}
This enables automated correction in Layer 6.
When validation fails, the system automatically applies the reported corrections and retries:
for attempt in range(1, max_retries + 1):
validation = validate_with_llm(narrative, source_data, context)
if not validation["hallucinations_found"]:
# Success! Return validated response
return {"analysis": narrative, "_validation_meta": {"validated": True}}
# Failed - apply corrections and retry
narrative = corrector.apply_corrections(narrative, validation["issues"])
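A minimal sketch of what apply_corrections can look like, assuming the issue fields shown in Layer 5's structured output:

def apply_corrections(narrative, issues):
    # Mechanical, deterministic fixes: swap each flagged claim for the
    # correct, citation-bearing value reported by the validator.
    for issue in issues:
        claim, correct = issue.get("claim"), issue.get("correct_value")
        if claim and correct:
            narrative = narrative.replace(claim, correct)
    return narrative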
Principle: Iterative Refinement
Self-refinement is a well-established technique for improving LLM outputs. The key insight is that correction is easier than generation—given specific feedback ("this number is wrong, it should be X"), the fix is mechanical.
Principle: Bounded Retry with Graceful Degradation
The system doesn't retry forever:
if not can_fix:
return {
"success": False,
"error": "Validation failed with unfixable issues",
"validation_issues": issues,
"unfixable_reasons": reasons
}
Principle: Convergence Guarantees
Because corrections are deterministic (replace X with Y) and the validator is consistent, the system converges. If the corrector properly applies all fixes, the next validation will pass. The retry loop guards against transient failures, not fundamental incompatibility.
Aggregation hides critical information. Consider:
❌ WRONG (Aggregation Hallucination):
Current lot sizes:
- EURUSD: 0.06 lots average (range: 0.05-0.08)
This hides that the lot size recently jumped from 0.05 to 0.08, a 60% increase and a critical signal that risk management changed.
✅ CORRECT (Raw Data First):
Current lot sizes [Source: positions list]:
- EURUSD last 5: [0.05, 0.05, 0.05, 0.08, 0.07]
→ CURRENT: 0.07 lots
→ TREND: Scaled up 60% on Nov 24 (0.05 → 0.08)
→ Average: 0.06 lots (for reference only)
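A rough sketch of building that raw-first block server-side; the trend heuristic here is illustrative, not the system's exact rule:

def lot_size_block(symbol, lots):
    last5 = lots[-5:]
    low, high = min(last5), max(last5)
    lines = [
        f"- {symbol} last 5: {last5}",
        f"  -> CURRENT: {last5[-1]} lots",
        f"  -> Average: {round(sum(last5) / len(last5), 2)} lots (for reference only)",
    ]
    if high > low:
        # Flag the largest scale-up in the window, e.g. 0.05 -> 0.08 is +60%.
        lines.insert(2, f"  -> TREND: scaled up {round((high / low - 1) * 100)}% ({low} -> {high})")
    return "\n".join(lines)

print(lot_size_block("EURUSD", [0.05, 0.05, 0.05, 0.08, 0.07]))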
Principle: Simpson's Paradox Awareness
Aggregates can reverse the apparent direction of relationships. A "stable average" can hide dramatic changes in underlying data. By requiring raw values first, we prevent this information loss.
Principle: Auditability
Scientific reporting standards require showing raw data. If you only report "average 0.06," readers cannot detect outliers, recent jumps (like the 0.05 → 0.08 scale-up above), or a reversal in trend.
Principle: Transparency Over Convenience
It's easier to report a single number. But the Anti-Aggregation Rule prioritizes transparency over convenience. The small cognitive cost of reading 5 raw values prevents potentially catastrophic misunderstandings.
| Indicator | Location | Example | Why It Matters |
|---|---|---|---|
| Citation tags | After every number | [Source: tool.path] | Traceable provenance |
| Checksum | End of response | [Verified: A7B3C2D1] | Data integrity proof |
| Confidence score | Header | (Confidence: high) | Data quality signal |
| Validation metadata | Response field | "validated": true | System verification passed |
| No approximations | Absence | Never: "~", "about" | Zero Mental Math compliance |
| Raw values before aggregates | Data sections | last 5: [...] | Anti-Aggregation compliance |
| Red Flag | Example | Rule Violated |
|---|---|---|
| Missing citation | Win rate: 65.52% | Citation Requirement |
| Approximation words | approximately 70% | Zero Mental Math |
| Missing checksum | No [Verified: XXXX] | Checksum Requirement |
| Rounded numbers | 70% vs 65.52% | Zero Mental Math |
| Averages without raw data | Average: 0.06 alone | Anti-Aggregation |
| Confidence missing | No confidence score | Incomplete output |
"Analyze my trading performance for November"
{
"success": true,
"summary": {
"total_positions": 29,
"total_wins": 19,
"total_losses": 10,
"win_rate": 65.52,
"profit_factor": 2.34,
"total_pl": 1234.56
},
"_accuracy_report": {
"checksum": "A7B3C2D1",
"confidence": {"score": "high", "reason": "9/9 metrics, 29 positions"},
"metrics": [
{
"path": "summary.win_rate",
"value": 65.52,
"citation": "Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]"
},
{
"path": "summary.profit_factor",
"value": 2.34,
"citation": "Profit factor: 2.34 [Source: get_mt5_position_history.summary.profit_factor]"
}
]
}
}
## Performance Analysis (Confidence: high)
### Overview
- Total positions: 29 [Source: get_mt5_position_history.summary.total_positions]
- Win rate: 65.52% [Source: get_mt5_position_history.summary.win_rate]
- Profit factor: 2.34 [Source: get_mt5_position_history.summary.profit_factor]
### Financial Performance
- Total P&L: $1234.56 [Source: get_mt5_position_history.summary.total_pl]
[Verified: A7B3C2D1]
Retrieved: "Magic number 106 is Goldfish Scalper trading XAUUSD"
{
"hallucinations_found": false,
"checksum_valid": true,
"summary": "All claims verified against source data"
}
{
"success": true,
"analysis": "## Performance Analysis (Confidence: high)\n\n...\n\n[Verified: A7B3C2D1]",
"_validation_meta": {
"validated": true,
"attempts": 1,
"model": "novita-default",
"rag_context_used": true
}
}
| Layer | Technique | What It Prevents | Key Insight |
|---|---|---|---|
| 1 | Server-side calculation | LLM arithmetic errors | Use the right tool for the job |
| 2 | Pre-formatted citations | LLM paraphrasing numbers | Reduce LLM to copy machine |
| 3 | Template-based output | Structural hallucination | Minimize degrees of freedom |
| 4 | RAG context grounding | Semantic hallucination | Ground entities in facts |
| 5 | Second LLM validation | Subtle errors slipping through | Verification < Generation |
| 6 | Auto-retry with correction | Transient failures | Iterative refinement converges |
Trust flows from deterministic systems to stochastic ones, never the reverse.
Python calculates → Server stores → Citations copy → Templates structure → Validator checks
At no point does an LLM "decide" a number. The LLM's role is purely mechanical: copying citations into template slots. This is the fundamental insight that makes 100% accuracy achievable.
| File | Purpose | Layer |
|---|---|---|
| server.py | MCP server with pre-calculated metrics | 1 |
| accuracy_report.py | Citation and checksum generation | 2 |
| response_formatter.py | Template-based output formatting | 3 |
| knowledge_base.py | ChromaDB RAG integration | 4 |
| llm_validator.py | Second LLM fact-checking | 5 |
| validation_decorator.py | Auto-retry orchestration | 6 |
| auto_corrector.py | Automatic error correction | 6 |
2025-11-29 00:13:05
We’re living in a time where almost every “free” AI tool hides something behind a paywall — limited credits, forced registration, watermarks, or downgraded quality unless you upgrade.
So when I came across two AI tools that are actually free, require zero sign-up, and deliver genuinely high-quality results, I was honestly surprised. They feel like hidden gems, so I’m sharing them here for anyone who loves AI tools, productivity resources, or creative experiments.
🎨 1. Free AI Image Generator (High Quality + No Account Needed)
🔗 https://productivitygears.com/free-ai-image-generator-tool
If you’ve ever used online image generators, you know how frustrating it gets: credits, blurry previews, logins, watermarks — the usual story.
This tool is the opposite: clean, fast, and free.
What Makes It Stand Out
No login or email required
Completely free (not a “free trial”)
Generates high-quality images
Surprisingly fast
No watermark on downloads
Simple, clean interface
You can create illustrations, concept art, characters, logos, wallpapers — whatever you want — without jumping through hoops. It’s refreshing to use something that just works.
🔊 2. Free AI Text-to-Speech Tool (Natural Voices, Instant MP3)
🔗 https://productivitygears.com/free-ai-text-to-speech
The second tool I found is a Text-to-Speech converter, and honestly, it’s one of the better free ones I’ve tested recently. Many TTS websites place strict character limits, require login, or keep the good voices locked behind subscriptions.
This one doesn’t.
Why It’s Worth Bookmarking
Completely free
No sign-up
Natural, realistic voices
Instant MP3 download
No character limit during my tests
Great for videos, voiceovers, content creation, or accessibility
For something free and frictionless, the quality is surprisingly good.
⭐ Final Thoughts
Finding AI tools that are truly free and actually good is rare. Both of these tools deliver value without the typical restrictions — no login, no credits, no watermark surprises.
If you enjoy playing around with AI, creating content, or just exploring useful online tools, these two are absolutely worth checking out and bookmarking.
If you try them, I’m curious which one you end up using more.
2025-11-29 00:10:36
What do the terms identity, AI, workload, access, SPIFFE, and secrets all have in common? These were the most common words used at CyberArk's Workload Identity Day Zero in Atlanta ahead of KubeCon 2025.
Across an evening full of talks and hallway conversations, the conversation kept coming back to the fact that we have built our infrastructures, tools, and standards around humans, then quietly handed the keys to a fast-multiplying universe of non-human identities (NHIs). However, the evening didn't dwell on what we have gotten wrong, but instead on what we are getting right as we look towards a brighter future of workload identity.
Every speaker discussed what has happened so far and how we reached the state in which so many companies find themselves. These workload identities, in the form of services, agents, CI jobs, Lambdas, and so on, are mostly authenticated today with long-lived API keys that are more likely than not overprivileged. We have granted standing access far too often. And while some teams have embraced PKI, setups at that scale are often trapped in a complexity that only a handful of experts truly understand.
The result is explosive complexity as teams face multi-cloud and hybrid environments, multiple languages, and increasingly complex org charts. It is no wonder that every siloed team has come up with its own ad hoc solutions over the years. But that means it is unlikely that any governance model will be a good fit, preventing us from leveraging a single management system. There is also the fact that if an attacker gets hold of a single NHI credential, they gain a huge, often invisible foothold with a massive blast radius.
At the same time, scale and AI are turning this from an "annoying" problem into an "existential" threat. Workloads now spin up, talk to each other, cross trust domains, and die off in seconds, while organizations want billions of attestations per day without hiring an army just to rotate secrets. Now agentic AI shows up and starts acting on our behalf. It calls APIs, touches sensitive data, and hops across providers. We often can't tell whether a user triggered an action or an autonomous agent did.
We are beginning to recognize the need for clean attribution, access governance, and logging. Everyone on stage is essentially describing the same issue: we can't keep handing out magic tokens to non-human actors and hope spreadsheets, YAML, and "best effort" PKI will save us.
The opening keynote from Andrew Moore, Staff Software Engineer at Uber, "From Bet to Backbone, Securing Uber with SPIRE," set the tone for the evening. For Uber, all of this work comes down to external customer trust. Their SPIRE journey at Uber really began when they admitted the impossibility of governing "thousands of solutions at scale."
Andrew's team moved toward a single, SPIFFE-based workload identity fabric that can handle hundreds of thousands to billions of attestations per day. They treat SPIRE as the "bottom turtle." This means trusted boot, agent validation, centralized signing, tight, well-designed SPIFFE IDs that don't accumulate junk.
Andrew Moore
Brett Caley, Senior Software Security Engineer at Block, echoed a story arc similar to Uber's in his talk "WIMSE, OAUTH and SPIFFE: A Standards-Based Blueprint for Securing Workloads at Scale." The core question they needed to answer was how to prove a workload is who it says it is. Their team went from plaintext keys in Git and bespoke OIDC hacks to "x509 everywhere" with SPIRE-driven attestations. They have rolled out systems that can issue credentials where the workload is, and at the speed developers demand. He explained that they deployed SPIRE "where it should exist."
Brett also tied all of this to agentic AI and our tendency to anthropomorphize, that is, giving AI agents names in Slack. He said this might invoke Black Mirror storylines, but we need to remember that these 'human-like agents' are all still workloads that need narrowly scoped permissions, explicit authorization of actions, and confirmation of intent.
When something goes wrong, he argues, the right question isn't "What did the AI do to us?" but "How did our system fail in governing the AI's workload identity and permissions?"
Brett Caley
In their talk "AI agent communication across cloud providers with SPIFFE universal identities," Dan Choi, Senior Product Manager, AWS Cryptography, and Brendan Paul, Sr. Security Solutions Architect from AWS, highlighted that Agentic AIs are comprised of workloads and need to communicate across clouds without ever touching long-lived secrets.
They said if you can establish two-way trust between your authorization servers and your SPIFFE roots of trust, via SPIRE, you can treat SPIFFE Verifiable Identity Documents (SVID) as universal, short-lived identities for AI agents.
From there, they walked through some concrete use cases, including an AI agent acting on behalf of a user. Framed as an "AI-enabled coffee shop" demo, they showed how you start with an authenticated web app, then propagate both user identity and workload identity so the agent can check inventory, update systems, and call tools with clear attribution and least privilege.
They stressed that you don't need MCP or any particular orchestration framework for this; the pattern is always "get the agent an SVID, then exchange it for scoped cloud credentials," whether you are interfacing with an S3 bucket or anything else. They closed by telling us that AI agents naturally span trust domains, and SPIFFE gives them a common identity fabric. The future work will refine how we do token proof-of-possession and delegation at scale for all of these non-human workloads.
Brendan Paul and Dan Choi
If these talks are any indication, the next few years are about moving workload identity from heroic projects to boring infrastructure. SPIFFE/SPIRE, WIMSE, OAuth token-exchange patterns, and transaction tokens will quietly become the plumbing. This will define how we deploy CI/CD, microservices, and AI agents securely at scale for the next generation of applications and platforms.
Enterprises are realizing that "API keys in Git" and "service account sprawl" are no longer acceptable risks. The time is now to adopt internal identity fabrics that attest workloads, issue short-lived credentials, centralize policy, and log everything, enriching those logs with as much context as possible. The UX lesson from all of these talks is that identity is currently painful, and people route around it. If we do the work now to make workload identity automatic and invisible, teams will lean into it.
At the same time, agentic AI will force us to sharpen our thinking about representation and blame. We need to ask "What is this workload allowed to do, on whose behalf, and with what guardrails?" The future is building golden paths where every non-human identity, no matter what shape or function, comes pre-wired with a strong, attestable identity and tightly scoped access. The creative tension will be keeping that world flexible enough for experimentation, and safe enough that "exploding rockets" stay in test environments, not production.
No matter where your path towards better NHI governance begins, everyone at Workload Identity Day Zero agreed that having insight into your current inventory of workloads and machine identities is mandatory. We at GitGuardian firmly believe that the future can mean fewer leaked secrets, better secrets management, and scalable NHI governance, but we can't get there without first understanding the scope of the issue in the org today. We would love to talk to you and your team about that.
2025-11-29 00:02:49
CinemaSins takes on the wild ride that is KPop Demon Hunters, throwing their signature “sins” over the movie in under 16 minutes of sharp, tongue-in-cheek critiques. They invite fans to dive deeper into the CinemaSins universe—follow their YouTube channels, social feeds, and even fill out a “sinful” poll or support the team on Patreon.
The video description also lists the crew behind the sins (Jeremy, Chris, Aaron, Jonathan, Deneé, Ian, Daniel) with links to their social profiles, plus community hangouts on Discord and Reddit, and more goodies like Jeremy’s book and TikTok highlights.
Watch on YouTube
2025-11-29 00:00:28
Hi!
I've been working on a pure-Go race detector that works without CGO. Just released v0.3.0 with major performance improvements.
Go's built-in race detector requires CGO. This means it can't be used when you build with CGO_ENABLED=0, which rules out fully static binaries and makes cross-compilation painful.
A pure-Go implementation of race detection using the FastTrack algorithm. Works with CGO_ENABLED=0.
99% overhead reduction - from ~10,000x down to ~100x slowdown.
Key improvements:
go install github.com/kolkov/racedetector/cmd/racedetector@v0.3.0
# Build with race detection
racedetector build -o myapp main.go
# Run with race detection
racedetector run main.go
# Production with 10% sampling
RACEDETECTOR_SAMPLE_RATE=10 racedetector run main.go
I'd appreciate if you could try it on your projects and report any bugs or false positives. Contributions are welcome.
Also curious: should something like this be proposed for Go toolchain integration, or is it better as a standalone tool?
Thanks!