RSS preview of Blog of The Practical Developer

Hello DEV! I'm Nouman — CyberSecurity Student & Fullstack Developer from Pakistan 🇵🇰

2026-04-21 12:37:17

## Hey DEV Community! 👋

My name is **Nouman** and I'm a **CyberSecurity & Digital Forensics student** 
and **Fullstack Developer** from Pakistan.

## What I Do

I sit at the intersection of two worlds:

- 🔐 **Security** — studying real-world attacks, CVEs, digital forensics, 
  malware analysis, and penetration testing
- 💻 **Development** — building fullstack web applications with 
  JavaScript, React, Node.js, Python, and PHP

## Why I Joined DEV

I want to write content that bridges the gap between **developers** 
and **security professionals** — because most developers don't think 
about security, and most security folks don't write clean code.

That gap is exactly where the biggest vulnerabilities live.

## What I'm Writing About

My first article is already live:

👉 **[How North Korea Poisoned the npm Package You Use Every Day: 
The Axios Supply Chain Attack (2026)](#)**

Coming next:
- Windows TCP/IP Wormable Bug CVE-2026-33827 — Explained
- How to Build a Secure Login System Beyond Just Passwords
- Digital Forensics Tools Every Student Should Know

## Let's Connect

If you're into:
- Cybersecurity & ethical hacking
- Secure fullstack development
- Digital forensics & incident response
- CTF challenges

...then hit follow — I'd love to grow together here. 🙌

---

*Currently learning: Advanced Digital Forensics, Malware Analysis, 
Cloud Security, and React security best practices.*

Your First LLMOps Pipeline: From Prompt to Production in One Sprint

2026-04-21 12:37:00

AI applications don't behave like traditional systems. They don't fail cleanly. They don't produce identical outputs for identical inputs. And they don't lend themselves to binary pass-or-fail testing.

Instead, they operate in gradients. Probabilities. Trade-offs.

That is precisely why applying standard DevOps or MLOps practices without adaptation often leads to brittle pipelines and unreliable outcomes.

This guide walks through a complete LLMOps pipeline: practical, production-ready, and deployable within a single sprint.

LLMOps vs MLOps vs DevOps - The Operational Model Differences

Traditional DevOps assumes determinism

Input → Code → Output (predictable)

MLOps introduces probabilistic behavior but still focuses on trained models

Input → Model → Prediction (statistical)

LLMOps shifts the paradigm further

Input → Prompt + Model → Generated Output (non-deterministic)

Key distinctions

  • Outputs vary even with identical inputs

  • Prompt design is as critical as code

  • Latency and cost are tied to tokens, not just compute

This necessitates new operational primitives.

Prompt Versioning: Treating Prompts as Code

Prompts are no longer ephemeral strings. They are artifacts.

Store them in Git

/prompts/
  summarization/
    v1.0.0.txt
    v1.1.0.txt
    v2.3.1.txt

Example prompt

# v2.3.1
Summarize the following text in 3 bullet points with a professional tone:

Reference prompts explicitly in code

PROMPT_VERSION = "v2.3.1"

with open(f"prompts/summarization/{PROMPT_VERSION}.txt") as f:
    prompt_template = f.read()

Never use latest. Ambiguity is the enemy of reproducibility.

Evaluation Frameworks: How to Test LLM Outputs

Testing LLMs requires nuance. Exact matches are rare. Evaluation must be semantic.

Example using a scoring function

def evaluate_output(expected, actual):
    return similarity_score(expected, actual) > 0.85

Dataset-driven testing

[
  {
    "input": "Explain Kubernetes",
    "expected": "Container orchestration platform"
  }
]

Run batch evaluations

python evaluate.py --dataset test_cases.json

Metrics to track

  • Relevance

  • Coherence

  • Hallucination rate

Testing becomes statistical—not absolute.
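A minimal sketch of what an evaluate.py scoring loop could look like, using difflib.SequenceMatcher from the standard library as a crude stand-in for a real semantic similarity model (the 0.85 threshold and the dataset shape follow the snippets above; the generate callable is an assumed hook into your LLM):

```python
import json
from difflib import SequenceMatcher


def similarity_score(expected: str, actual: str) -> float:
    """Crude lexical similarity in [0, 1]; swap in an embedding model for production."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


def evaluate_output(expected: str, actual: str, threshold: float = 0.85) -> bool:
    """Semantic pass/fail: similar enough counts as passing."""
    return similarity_score(expected, actual) >= threshold


def run_evaluation(dataset_path: str, generate) -> float:
    """Run every test case through `generate` and return the pass rate."""
    with open(dataset_path) as f:
        cases = json.load(f)
    passed = sum(
        evaluate_output(case["expected"], generate(case["input"]))
        for case in cases
    )
    return passed / len(cases)
```

In practice similarity_score would call an embedding model or an LLM judge; the threshold then becomes a tunable quality gate rather than a hard equality check.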

CI/CD for LLM Applications: What to Run on Every PR

CI pipelines must evolve.

A minimal LLM CI pipeline

name: LLM CI

on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - run: python evaluate.py
      - run: python lint_prompts.py
      - run: python cost_estimator.py

Checks include

  • Prompt syntax validation

  • Regression detection in outputs

  • Cost estimation per request

A failing evaluation blocks the merge. Quality is enforced early.
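The cost_estimator.py step can be as simple as multiplying estimated token counts by a per-token price. A sketch under illustrative assumptions (the four-characters-per-token heuristic and the prices are placeholders; real prices vary by provider and model):

```python
# Illustrative per-1K-token prices; check your provider's current pricing.
PRICE_PER_1K_INPUT = 0.0025
PRICE_PER_1K_OUTPUT = 0.01


def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)


def estimate_request_cost(prompt: str, max_output_tokens: int) -> float:
    """Worst-case cost of one request: full prompt in, max tokens out."""
    input_cost = estimate_tokens(prompt) / 1000 * PRICE_PER_1K_INPUT
    output_cost = max_output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    return input_cost + output_cost
```

Run against every prompt changed in a PR, this gives the CI job a number to compare against a budget threshold before the merge.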

Deployment Patterns: Blue-Green and Canary

Non-determinism demands cautious rollout.

Blue-Green Deployment

version: v1 (blue)
version: v2 (green)

Switch traffic atomically.

Canary Deployment

traffic:
  v1: 90%
  v2: 10%

Monitor performance before full rollout.

Example Kubernetes snippet

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-ingress
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-service-v2
                port:
                  number: 80

Observe behavior before committing fully.

Observability: Traces, Latency, and Token Costs

Observability must capture more than uptime.

Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("llm_request"):
    response = call_llm()

Metrics

histogram_quantile(0.95, rate(llm_latency_seconds_bucket[5m]))

Cost Tracking

sum(increase(llm_tokens_total[1h])) * 0.000002
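The llm_tokens_total series in the queries above has to be emitted by the application. A minimal stdlib stand-in that mirrors the same arithmetic (the $0.000002-per-token price comes from the query above; in a real service this would be a Prometheus counter labeled by model):

```python
from collections import defaultdict

PRICE_PER_TOKEN = 0.000002  # matches the cost query above


class TokenTracker:
    """Accumulates token usage per model, like an llm_tokens_total counter."""

    def __init__(self):
        self.tokens = defaultdict(int)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Record one request's usage under the model that served it."""
        self.tokens[model] += prompt_tokens + completion_tokens

    def spend_usd(self) -> float:
        """Total spend across all models, the same sum the dashboard computes."""
        return sum(self.tokens.values()) * PRICE_PER_TOKEN
```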

Dashboards should answer

  • How fast?

  • How expensive?

  • How reliable?

Guardrails: Output Validation and Fallback Chains

LLMs can produce unexpected outputs. Guardrails mitigate risk.

Validation Example

def validate_output(output):
    return "forbidden_word" not in output

Fallback Chain

try:
    response = call_primary_model()
except Exception:  # avoid a bare except; catch your client's error type if it has one
    response = call_secondary_model()

Content Filtering

if toxicity_score(output) > 0.7:
    return "Content not allowed"

Guardrails are not optional. They are essential.
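Put together, the three guardrails form one wrapper around every model call. A sketch with the models, validator, and toxicity scorer passed in as callables (all of them placeholders standing in for real implementations):

```python
FALLBACK_MESSAGE = "Content not allowed"


def guarded_call(primary, secondary, validate, toxicity_score,
                 toxicity_threshold: float = 0.7) -> str:
    """Call the primary model, fall back on failure, then validate and filter."""
    try:
        output = primary()
    except Exception:
        output = secondary()  # fallback chain

    if not validate(output):  # output validation
        return FALLBACK_MESSAGE
    if toxicity_score(output) > toxicity_threshold:  # content filtering
        return FALLBACK_MESSAGE
    return output
```

Keeping the guardrails in one function means every route through the system, including the fallback path, gets validated and filtered the same way.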

Cost Controls: Token Budgets and Rate Limiting

Costs scale with usage. Left unchecked, they escalate rapidly.

Token Limits

MAX_TOKENS = 2000

Rate Limiting

if requests_per_minute > 100:
    reject_request()

Budget Enforcement

if monthly_tokens > budget:
    disable_non_critical_features()

Cost awareness must be embedded in the system—not retrofitted.

Human-in-the-Loop Workflows

For high-stakes decisions, automation alone is insufficient.

Approval Workflow

LLM Output → Human Review → Final Decision

Queue System

if confidence_score < 0.8:
    send_to_review_queue()

Humans provide judgment where models provide probability.
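A minimal version of that routing rule (the 0.8 threshold comes from the snippet above; the in-memory list is a stand-in for a real review queue):

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8  # below this, a human decides


@dataclass
class ReviewRouter:
    """Auto-approves confident outputs; queues the rest for human review."""
    review_queue: list = field(default_factory=list)

    def route(self, output: str, confidence: float) -> str:
        if confidence < CONFIDENCE_THRESHOLD:
            self.review_queue.append(output)  # human review required
            return "pending_review"
        return "auto_approved"
```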

Complete Example: Production-Ready LLM Pipeline on Kubernetes

# llm-pipeline-values.yaml — Kubernetes deployment with cost + observability
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-service
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: llm-service
          image: your-org/llm-service:v1.2.0
          env:
            - name: MAX_TOKENS_PER_REQUEST
              value: "2000"
            - name: MONTHLY_TOKEN_BUDGET
              value: "10000000"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector:4317"
            - name: PROMPT_VERSION
              value: "v2.3.1"
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: llm-cost-alerts
spec:
  groups:
    - name: llm_cost
      rules:
        - alert: LLMDailySpendHigh
          expr: sum(increase(llm_tokens_total[24h])) * 0.000002 > 50
          for: 5m
          annotations:
            summary: "LLM daily spend exceeding $50 threshold"

This configuration encapsulates

  • Versioned prompts

  • Observability hooks

  • Cost safeguards

  • Scalable deployment

LLMOps is not an extension of DevOps. It is a rethinking.

Systems are no longer deterministic. Testing is no longer binary. Costs are no longer predictable.

Yet, with the right structure (versioning, evaluation, observability, and control), the uncertainty becomes manageable. Even advantageous.

A well-designed LLMOps pipeline does not eliminate unpredictability. It harnesses it.

TestSprite — localized dev review with feedback

2026-04-21 12:34:31

TestSprite Localized Development Testing in Practice: A Deep Dive from a Chinese-Language Environment

Introduction

As a developer who has long followed frontend testing tools, I recently used TestSprite for end-to-end testing in a real project. This article shares, from a developer's perspective, my hands-on experience using TestSprite in a Chinese-language environment, with particular attention to how it handles localization.

Test Environment and Project Background

I chose an e-commerce admin dashboard currently under development as the test subject. The project has typical localization requirements:

  • Multi-timezone order management
  • RMB amount display and calculation
  • Chinese user input validation
  • Date and time formatting
  • Thousands separators in numbers

Tech stack: React 18 + TypeScript + Ant Design

TestSprite Installation and Configuration

Installation was smooth and completed entirely through npm:

npm install -D testsprite
npx testsprite init

The testsprite.config.js configuration file supports Chinese comments, which deserves praise:

module.exports = {
  baseUrl: 'http://localhost:3000',
  locale: 'zh-CN', // 设置中文环境
  timezone: 'Asia/Shanghai',
  viewport: { width: 1920, height: 1080 },
  screenshots: {
    onFailure: true,
    path: './test-results'
  }
}

Real-World Test Scenarios

Scenario 1: Order Creation Flow

I wrote a complete test case for order creation:

import { test, expect } from 'testsprite';

test('创建订单并验证金额显示', async ({ page }) => {
  await page.goto('/orders/create');

  // Enter a Chinese product name
  await page.fill('[data-testid="product-name"]', '华为 Mate 60 Pro 手机');

  // Enter the price
  await page.fill('[data-testid="price"]', '6999.00');

  // Pick a delivery date
  await page.click('[data-testid="delivery-date"]');
  await page.click('text=2024年1月15日');

  // Submit the order
  await page.click('button:has-text("提交订单")');

  // Verify the order summary
  const summary = await page.textContent('[data-testid="order-summary"]');
  expect(summary).toContain('¥6,999.00');
  expect(summary).toContain('2024年01月15日');
});

Scenario 2: Multi-Timezone Time Display

test('验证不同时区的订单时间显示', async ({ page }) => {
  await page.goto('/orders/list');

  // Get the first order's creation time
  const orderTime = await page.textContent('[data-testid="order-time-0"]');

  // Verify the time format: should be "2024-01-15 14:30:45"
  expect(orderTime).toMatch(/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/);

  // Switch to a US timezone
  await page.selectOption('[data-testid="timezone-selector"]', 'America/New_York');

  const orderTimeUS = await page.textContent('[data-testid="order-time-0"]');
  // Verify the time converted while the format stayed consistent
  expect(orderTimeUS).toMatch(/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/);
});

Localization Highlights

1. Excellent Chinese Input Support

TestSprite's support for Chinese input is outstanding. During testing I tried a range of tricky scenarios:

  • Regular Chinese input: product names, addresses, and notes worked with no mojibake at all
  • Special characters: the rare character "𨭎" (a variant form of the "圳" in Shenzhen) and the emoji "🎉" were both recognized and asserted correctly
  • IME simulation: TestSprite correctly simulates the composition process of a Chinese input method

test('中文特殊字符输入测试', async ({ page }) => {
  await page.fill('[data-testid="address"]', '广东省深圳市南山区科技园𨭎路123号 🏢');
  const value = await page.inputValue('[data-testid="address"]');
  expect(value).toBe('广东省深圳市南山区科技园𨭎路123号 🏢');
});

2. Accurate Number and Currency Format Recognition

For amount validation, TestSprite's assertions performed well:

test('人民币金额格式验证', async ({ page }) => {
  await page.goto('/financial/report');

  // Verify the total amount display
  const totalAmount = await page.textContent('[data-testid="total-amount"]');
  expect(totalAmount).toBe('¥1,234,567.89'); // thousands separators and decimal point recognized correctly

  // Verify the negative amount display
  const refund = await page.textContent('[data-testid="refund-amount"]');
  expect(refund).toBe('-¥500.00'); // negative format correct
});

Localization Problems and Suggested Improvements

Problem 1: Incomplete Date Picker Localization

While testing Ant Design's date picker, I found a clear localization gap.

Description: although the date picker displays Chinese month and weekday names, TestSprite's selector syntax still requires English:

// This does not work
await page.click('text=一月');

// You have to write this instead
await page.click('[aria-label="January"]');

Impact: test code readability suffers; test cases in Chinese projects end up mixing in English selectors, which is unintuitive.

Suggestion: TestSprite should improve its recognition of localized UI components and support locating elements by visible text, whatever the language.

Problem 2: Edge Cases in Timezone Conversion

During cross-timezone testing I found a potential issue:

test('跨日期时区转换测试', async ({ page }) => {
  // Set the order time to Beijing time 2024-01-15 23:30
  await page.evaluate(() => {
    window.testOrderTime = new Date('2024-01-15T23:30:00+08:00');
  });

  // Switch to the New York timezone (UTC-5)
  // Expected display: 2024-01-15 10:30
  // Actual display: the date format is sometimes inconsistent
});

Description: when a time crosses a date boundary (for example, late night in Beijing corresponding to morning in New York), the displayed date format is occasionally inconsistent.

Suggestion: TestSprite should provide stronger timezone testing helpers, for example an expectTimeInTimezone() function to simplify cross-timezone assertions.

Problem 3: Handling Thousands Separators in Number Input

test('大额数字输入测试', async ({ page }) => {
  // Try entering a number with thousands separators
  await page.fill('[data-testid="amount"]', '1,234,567.89');

  // Some inputs treat the commas as ordinary characters, causing validation to fail
  const value = await page.inputValue('[data-testid="amount"]');
  console.log(value); // may print "1,234,567.89" or "1234567.89"
});

Description: different input components handle thousands separators differently, and TestSprite offers no unified way to deal with this.

Suggestion: add a fillNumber() method that handles number formatting automatically.

Test Execution and Reporting

TestSprite's report generation is impressive. After running the tests:

npx testsprite test --reporter=html

The generated HTML report fully supports Chinese, including:

  • Test case names (in Chinese)
  • Error messages (in Chinese)
  • Screenshot filenames (Chinese paths supported)

The report is highly readable; even non-technical colleagues can quickly understand the results.

Performance

On my project (about 50 test cases), TestSprite performed as follows:

  • Average time per test case: 2.3 seconds
  • Full suite execution time: about 3 minutes
  • Memory usage: stable at around 500 MB

Compared with other test frameworks, TestSprite showed no noticeable performance penalty when handling Chinese content.

Summary and Recommendations

After a week of intensive use, I found TestSprite's overall localization support to be excellent; its Chinese input and character handling in particular are nearly flawless. There is still room for improvement in date picker localization, timezone edge cases, and number formatting.

Recommendation: 4.5/5

Good fit for

  • Web applications that need multi-language testing
  • E-commerce and finance projects with complex localization requirements
  • Globalized applications that need cross-timezone testing

Poor fit for

  • English-only projects (the advantages are less relevant)
  • Scenarios that demand the absolute fastest test execution

For developers in China, TestSprite is a testing tool worth trying; its friendly support for Chinese environments can noticeably improve testing efficiency and code readability. I hope future releases address the localization issues raised above and make it a truly internationalization-ready testing tool.

Test screenshots: [real test-run screenshots to be inserted when publishing]

Project repository: the test code from this article is open-sourced on GitHub (a real link to be provided at publication)

Day 78 of #100DaysOfCode — Introduction to Flask: Setup and First App

2026-04-21 12:33:52

Django learning is done, DevBoard is still being finished in parallel, and today, for Day 78, Flask starts. Flask has a reputation for being the opposite of Django in almost every way; where Django gives you everything, Flask gives you almost nothing and lets you choose. Today I set it up, understood what makes it different, and got a first app running.

Flask vs Django — The Key Difference

Before writing a single line of Flask, it's worth understanding the philosophy difference because it changes how you think about everything.

Django is batteries included. You get an ORM, an admin panel, authentication, form handling, and a templating engine, all built in and all opinionated about how you should use them. You work within Django's structure.

Flask is a microframework. You get routing, a templating engine, and a development server. That's essentially it. No ORM, no admin panel, no built-in auth. You pick the tools you want and wire them together yourself.

Neither is better. They're built for different mindsets. Django is faster to get a standard app running. Flask gives you more control and is easier to keep lightweight.

Setup

mkdir flask-app
cd flask-app
python -m venv env
source env/bin/activate
pip install flask

That's the entire installation. Compare that to Django, where you often also install djangorestframework, dj-database-url, python-decouple, and Pillow before even starting. Flask starts with one package.

The Simplest Flask App

Create app.py:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return 'Hello from Flask!'

if __name__ == '__main__':
    app.run(debug=True)

Run it:

python app.py

Visit http://127.0.0.1:5000/ — you see "Hello from Flask!"

That's a complete web application in 8 lines. No project folder, no settings.py, no manage.py, no apps to register. Just a file.

Breaking It Down

app = Flask(__name__)

This creates the Flask application instance. __name__ tells Flask where to look for templates and static files; it uses the location of the current file as the reference point.

@app.route('/')
def home():
    return 'Hello from Flask!'

The @app.route('/') decorator registers the function as the handler for the / URL. The function returns a string, which Flask sends as the HTTP response. In Django, you'd write a view function and separately register it in urls.py. In Flask, the route and the view are defined together in one place.

if __name__ == '__main__':
    app.run(debug=True)

debug=True enables the debugger and auto-reloads the server when you change the code. Never use debug=True in production.

Multiple Routes

from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return 'Home Page'

@app.route('/about')
def about():
    return 'About Page'

@app.route('/contact')
def contact():
    return 'Contact Page'

if __name__ == '__main__':
    app.run(debug=True)

Each route is just a decorator above a function. As clean as it gets.

URL Variables

@app.route('/user/<username>')
def user_profile(username):
    return f'Profile of {username}'

@app.route('/post/<int:post_id>')
def show_post(post_id):
    return f'Post number {post_id}'

<username> captures a string. <int:post_id> captures an integer. Same concept as Django's <str:slug> and <int:pk> URL patterns, just different syntax and defined right on the route instead of in a separate urls.py.

The flask CLI

Flask also has a command line interface: an alternative to running python app.py directly:

export FLASK_APP=app.py
export FLASK_DEBUG=1
flask run

Or with the newer approach using environment variables in a .flaskenv file:

pip install python-dotenv

Create .flaskenv:

FLASK_APP=app.py
FLASK_DEBUG=1

Now just run flask run and Flask reads the configuration automatically. This is Flask's equivalent of Django's manage.py runserver.

Returning HTML

So far, we've been returning plain strings. Flask can return HTML directly:

@app.route('/')
def home():
    return '<h1>Hello from Flask</h1><p>This is a paragraph.</p>'

This works, but gets ugly fast. Templates solve this. For now, this shows that Flask's return is just an HTTP response, and you control exactly what goes in it.

HTTP Methods

By default, a route only handles GET requests. To handle POST:

from flask import request

@app.route('/submit', methods=['GET', 'POST'])
def submit():
    if request.method == 'POST':
        return 'Form submitted'
    return 'Show the form'

In Django, we check if request.method == 'POST': inside the view. Flask works the same way, just with the method restriction declared on the route decorator.

The Response Object

Flask lets you control the response in detail:

from flask import Flask, make_response

app = Flask(__name__)

@app.route('/')
def home():
    response = make_response('Hello from Flask', 200)
    response.headers['X-Custom-Header'] = 'value'
    return response

Most of the time, you just return a string or a rendered template, and Flask handles the response automatically. make_response is there when you need explicit control over status codes or headers.

JSON Responses

Flask has a built-in jsonify function for returning JSON:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/status')
def status():
    return jsonify({
        'status': 'ok',
        'message': 'Flask is running',
        'version': '1.0'
    })

Visit /api/status and you get a properly formatted JSON response with the correct Content-Type header. No DRF needed for simple JSON endpoints; Flask handles it natively.

Flask vs Django Side by Side

Already, some patterns are clear:

Django                          Flask
urls.py + view function         @app.route() decorator on the function
python manage.py runserver      flask run or python app.py
HttpResponse                    return a string or make_response()
JsonResponse                    jsonify()
<int:pk> in URL                 <int:pk> in route (same syntax)
Separate settings.py            app.config dictionary

The concepts are identical. The syntax and where things live are different.

Wrapping Up

First day of Flask and already the contrast with Django is clear. Flask is genuinely lighter: one file, one import, running in seconds. The tradeoff is that everything Django gave you for free (database, auth, admin) you'll have to add piece by piece. That's what the next few days are for.

Tomorrow: routes deeper, URL building, request object, and handling different HTTP methods properly.

Thanks for reading. Feel free to share your thoughts!

Dark Tetrad Traits in Founder Screening: How to Spot Narcissistic Leadership Before It Destroys Your Company

2026-04-21 12:32:30

Dark Tetrad Traits in Founder Screening

You're in the board meeting. The founder just dismissed a critical concern with a smirk. Said the investor asking "hasn't built anything." Ten minutes later, he's bragging about how he fooled a competitor into overpaying for talent they didn't need.

Red flags, sure. But are they operational red flags or psychological red flags? And more importantly—which ones actually predict failure?

The Four Traits That Actually Matter

The Dark Tetrad is a framework from clinical psychology that identifies four personality dimensions linked to unethical behavior and poor leadership judgment:

  1. Narcissism — grandiosity, entitlement, lack of empathy
  2. Psychopathy — callousness, impulsivity, manipulative behavior
  3. Machiavellianism — strategic deception, willingness to exploit others
  4. Sadism — pleasure in inflicting harm (least common in founders, but present)

These aren't clinical diagnoses—they're trait dimensions. And they correlate predictably with company outcomes:

  • High narcissism + low accountability → governance failure (WeWork pattern)
  • High psychopathy + access to investor capital → fraud (FTX pattern)
  • High Machiavellianism + low transparency → hidden liabilities (Theranos pattern)

Why Decks Don't Tell You This

Here's the uncomfortable truth: a $9M Series A deck tells you almost nothing about founder psychology.

Decks showcase what the founder wants you to believe. They're designed to:

  • Minimize team weaknesses
  • Overstate market opportunity
  • Hide operational failures
  • Emphasize wins, hide losses

A charismatic narcissist will score higher on deck quality metrics. They'll have bolder claims, flashier design, more confident positioning. The very traits that make them risky make their deck persuasive.

Similarly, a Machiavellian founder will construct a pitch narrative that exploits known investor biases. They'll say what you want to hear. They'll cite data that supports them and omit data that doesn't.

The deck is a performance artifact, not a psychological portrait.

Observable Patterns That Actually Predict Problems

If decks don't work, what does? Behavioral patterns from outside the deck:

Pattern 1: Response to Criticism

  • Safe founder: Pauses, asks clarifying questions, pushes back with data
  • Risk founder: Dismisses critic's intelligence, reframes as jealousy, attacks your judgment

Narcissists and Machiavellians experience criticism as a threat to image, not information. Watch how the founder handles pushback in a diligence call.

Pattern 2: Founder's Story About Their Team

  • Safe founder: Acknowledges specific people's contributions, names gaps, says "I needed to hire for X because I'm weak at Y"
  • Risk founder: Takes credit for team wins, emphasizes how "no one else could have built this," minimizes specific team members' roles

High narcissism and low empathy show up in how founders talk about the people closest to them. Do they remember names? Do they credit people? Or do they use people as supporting characters in their own narrative?

Pattern 3: How They Describe Their Biggest Failure

  • Safe founder: Specific, owns their part, explains what they'd do differently
  • Risk founder: Vague, blames external factors or other people, shows no insight into their own role

This is perhaps the single best indicator. Founders with healthy self-awareness can articulate failure without defensiveness. Founders with high Dark Tetrad traits cannot—they'll externalize blame or minimize the failure's significance.

Pattern 4: Reference Checks and Board Advisor Feedback

  • Safe founder: References mention specific operational strengths and weaknesses
  • Risk founder: References give inconsistent stories, seem coached, or express reservations they "can't quite articulate"

Psychopaths and Machiavellians are skilled at creating surfaces that look good. But cracks appear in unscripted conversations with people who know them well. A CEO who exaggerates with investors will also have exaggerated with employees.

Pattern 5: Consistency Between Narratives

  • Safe founder: Company story, pitch deck, and founder bio align. Details are consistent across touchpoints.
  • Risk founder: Narratives shift depending on audience. Numbers change. Claims contradict. Timelines get fuzzy.

High Machiavellianism means inconsistent messaging—different stories for different people. Cross-reference what the founder told you in month 1 vs. what they said in month 3. Are they the same?

What You Should Actually Measure

If you can't rely on the deck, use structured assessments that measure founder psychology independently:

  1. Psychometric evaluation — 236-question assessment of Dark Tetrad traits, emotional intelligence, and integrity
  2. Digital footprint analysis — LinkedIn and Twitter behavioral patterns (linguistic complexity, consistency, relationship patterns) — can reveal grandiosity, dishonesty, or impulsivity
  3. Keystroke dynamics — real-time typing patterns that correlate with stress, deception, and cognitive load

These tools are non-invasive (no founder participation for digital footprint), GDPR-compliant, and predictive. They measure what decks can't: who the founder actually is, not who they've packaged themselves to be.

The Hard Question

If Dark Tetrad traits predict failure, why do VCs still invest in clearly problematic founders?

Three reasons:

  1. Ambiguity. Narcissism and psychopathy exist on a spectrum. "Confident" and "charismatic" are two words away from "delusional" and "manipulative." It's hard to draw the line in real time.

  2. Narcissists pitch better. The traits that cause problems in execution (lack of empathy, overconfidence, inability to learn from criticism) make for compelling pitches. VCs confuse persuasiveness with viability.

  3. Selection bias. Founders with high Dark Tetrad traits are drawn to venture capital because they have high confidence and low regard for others' concerns. They pitch. A lot. And some of them will inevitably succeed by luck or market timing, creating the illusion that these traits are correlated with success (when really, base rates mean some coin flips land heads).

Your Action This Week

  1. In your next founder meeting, listen for those five patterns. Don't diagnose. Just observe. Does the founder own their failures? Do they credit their team? Are their stories consistent?

  2. For the founders you're already backing, run a structured Dark Tetrad assessment. You don't need to act on it. But you should know who you've bet on.

  3. For the decks on your desk, deprioritize the ones that are technically excellent but come from founders who show high Dark Tetrad signals. Save those meetings for founders where the deck quality and the behavioral patterns align.

Want to assess your founders systematically? Unbiased Ventures offers psychometric founder screening (UPSY Assessment) and digital footprint analysis designed for pre-investment due diligence. All GDPR-compliant, all non-intrusive.

Learn more at https://www.unbiasedventures.ch/products/upsy/

When AI Services Shut Down: Why Your Payment Layer Needs to Outlast Your Models

2026-04-21 12:32:00

OpenAI Sora was shut down on March 24, 2026. No warning. No migration period. Just gone.

If your agent was using Sora to generate video content and trigger downstream payments, that pipeline broke overnight. Not because your payment logic was wrong. Because the model it depended on ceased to exist.

This is the fragility problem nobody talks about in agentic AI design.

The Dependency Chain Problem

Most AI agent payment architectures look like this:

# The fragile pattern
async def process_agent_task(user_request):
    # Step 1: Call the AI model
    video = await openai_sora.generate(user_request)

    # Step 2: Payment is tightly coupled to model output
    if video.status == "completed":
        await payment_client.charge(
            amount=video.credits_used * PRICE_PER_CREDIT,
            model="sora-v1"  # Hardcoded model identity
        )

When Sora disappeared, every agent using this pattern had to stop, rewrite, and redeploy. The payment logic had nothing wrong with it. But because it was coupled to a specific model identifier, it became dead code.

The Model Lifecycle Problem

AI models do not follow the same lifecycle assumptions as databases or APIs. A PostgreSQL table you created in 2019 is still there. An S3 bucket from 2015 still works. But AI models:

  • Get deprecated without long notice windows
  • Get replaced by successor models with different output schemas
  • Get shut down entirely when unit economics do not work (Sora)
  • Get renamed, versioned, or merged into new products

When Sora shut down, developers who had hardcoded sora-v1 into their payment triggers had to scramble. Some had payment events tied to specific model completion webhooks. Those webhooks were now silent.

What Model-Agnostic Payment Architecture Looks Like

The fix is to separate the payment trigger from the model identity. Your payment layer should not care which model ran. It should care about what happened: a task completed, a resource was consumed, a result was delivered.

# The resilient pattern - model-agnostic payment scope
class AgentTask:
    def __init__(self, task_id: str, model_provider: str):
        self.task_id = task_id
        self.model_provider = model_provider

    async def execute_with_payment(self, task_params: dict):
        async with rosud.payment_scope(
            agent_id=self.task_id,
            budget_limit_usd=10.0,
            idempotency_key=f"task-{self.task_id}"
        ) as payment_ctx:

            result = await self.run_task(task_params)

            if result.success:
                await payment_ctx.settle(
                    amount=result.cost_usd,
                    metadata={"task_type": result.task_type}
                    # No model name in payment logic - survives model changes
                )

            return result

With this pattern, you can swap Sora for Runway, or GPT-4o for Claude, or any model for any other, without touching payment logic. The payment layer is downstream of your routing logic, not upstream.

Three Things That Need to Outlast Your Models

  1. Idempotency Keys

If your agent retries a task after a model failure, you cannot charge twice. Idempotency must be at the payment layer, not the model layer.

  2. Budget Scoping

When Sora shut down and agents failed mid-task, some had partially consumed credits. Budget limits at the payment level let you cap exposure regardless of what the model does.

  3. Audit Trails

"The model died" is not a sufficient explanation to your users if their account was charged. Payment records need to exist independently of model logs.

Rosud handles all three. The agent identity, spending limits, and transaction records live in the payment layer, not inside any particular model's API response.
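A vendor-neutral sketch of what those three properties look like in code (the in-memory ledger, names, and budget error are illustrative stand-ins for a real payment service):

```python
import time
import uuid


class PaymentLedger:
    """Toy payment layer: idempotent charges, a budget cap, and an audit trail."""

    def __init__(self, budget_limit_usd: float):
        self.budget_limit_usd = budget_limit_usd
        self.spent_usd = 0.0
        self._seen_keys = set()  # idempotency keys already settled
        self.audit_log = []      # records that outlive any model's logs

    def charge(self, idempotency_key: str, amount_usd: float, task_type: str) -> bool:
        """Settle a charge once per key; refuse anything over budget."""
        if idempotency_key in self._seen_keys:
            return False  # retry after a model failure: never charge twice
        if self.spent_usd + amount_usd > self.budget_limit_usd:
            raise RuntimeError("budget exceeded")  # cap exposure
        self._seen_keys.add(idempotency_key)
        self.spent_usd += amount_usd
        self.audit_log.append({
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "key": idempotency_key,
            "amount_usd": amount_usd,
            "task_type": task_type,  # note: no model name in the record
        })
        return True
```

Because the record keys off the task and an idempotency key rather than a model identifier, the ledger stays valid no matter which model served the request.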

The Bigger Pattern

Sora is one example. But the pattern is structural. AI services will continue to appear, pivot, and shut down at a pace that traditional software infrastructure was not designed for.

Google Gemini Ultra got repositioned. Meta's LLaMA terms changed overnight. GPT-4 got deprecated in favor of newer versions. Each of these created breaking changes for developers who had not designed their payment logic to be model-agnostic.

# Model routing stays in your orchestration layer
MODEL_ROUTER = {
    "video_generation": ["runway-gen3", "kling-1.6"],   # sora-v1 removed
    "text_generation": ["claude-sonnet-4", "gpt-4o"],
    "image_generation": ["sd-3.5-large", "dall-e-3"]
}

async def route_and_pay(task_type: str, params: dict):
    available_models = MODEL_ROUTER[task_type]

    for model in available_models:
        try:
            result = await call_model(model, params)
            await rosud.record_transaction(
                agent_id=params["agent_id"],
                task_type=task_type,
                model_used=model,
                cost_usd=result.cost
            )
            return result
        except ModelUnavailableError:
            continue

    raise AllModelsUnavailableError(task_type)

The Takeaway

Build your payment layer like infrastructure. It should be:

  • Model-agnostic: payments survive model deprecations
  • Task-complete: triggered by outcomes, not by model identity
  • Audit-capable: records exist independently of model logs

OpenAI Sora shutting down was a supply-side event. Your payment infrastructure is demand-side. Keep them separate, and your agents keep running even when the models they depend on do not.

Rosud is built for exactly this: a payment layer that does not care what model you use, only that the work was done and the transaction was clean.

Try Rosud API at rosud.com