
Building Production-Ready AI Backends with FastAPI

2025-12-30 13:42:43

Why AI Backends Fail in Production (and how to build them properly)

Building AI demos has never been easier. With notebooks, Streamlit or Gradio, you can create something impressive in minutes.

But once AI is supposed to live inside a real system that serves requests, integrates with data sources, handles errors, and evolves over time, most of these approaches start to fall apart. This article series focuses on exactly that gap:
How to turn AI PoCs into production-ready backend services.

Thinking of AI as a backend service instead of a demo

A production-ready AI backend needs to do more than generate text.

It must provide:

  • clearly defined input and output contracts
  • predictable behavior despite probabilistic models
  • clean integration with databases, APIs and business logic
  • controlled error handling
  • extensibility (RAG, tools, agents, async workflows)
  • testability and long-term maintainability

A demo optimizes for speed and visibility. A backend service optimizes for reliability, structure and integration. That distinction becomes critical as soon as AI is not the product itself, but a capability inside a larger system.

This is where backend frameworks, and especially FastAPI, start to matter.

Why FastAPI fits production-ready AI backends so well

FastAPI's core concepts map extremely well to the architectural challenges of AI systems. Instead of treating AI as a special case, FastAPI allows it to be handled like any other backend component with clear boundaries and responsibilities.

Structured contracts instead of free-form AI

Large language models are probabilistic. Backend systems are not.

Using Pydantic as a contract layer makes AI outputs machine-consumable:

  • validated prompt inputs
  • structured LLM responses
  • tool schemas for agents
  • explicit agent actions

This is essential if AI is supposed to interact reliably with existing systems. Without strict contracts, small deviations in model output quickly lead to runtime errors, brittle integrations and difficult debugging.
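
As a rough sketch of what such a contract layer could look like (the model and field names here are illustrative, not taken from any specific project):

from pydantic import BaseModel, Field

class ChatRequest(BaseModel):
    # validated prompt input: empty or oversized messages are rejected before any LLM call
    message: str = Field(..., min_length=1, max_length=4000)

class ChatResponse(BaseModel):
    # structured response the rest of the system can rely on
    answer: str
    sources: list[str] = []

FastAPI validates incoming requests against ChatRequest and serializes every response through ChatResponse, so a malformed model output fails loudly at the boundary instead of leaking into downstream systems.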

RAG and agents as first-class dependencies

Retrieval pipelines, vector stores, agent tools or memory components are often treated as something “special” in AI projects. FastAPI’s dependency injection model removes that distinction.

RAG components, agents and tools can be injected exactly like:

  • database sessions
  • configuration objects
  • external services

This leads to a clean separation of concerns:

  • API layer
  • AI logic
  • retrieval
  • tooling
  • infrastructure

The result is an architecture where AI components are replaceable, testable, and composable rather than hard-wired into endpoint logic.
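
A minimal sketch of that idea, assuming a hypothetical FakeLLM client and a get_llm provider (both names are illustrative):

from fastapi import Depends, FastAPI

app = FastAPI()

class FakeLLM:
    # stand-in client; in production this would wrap a real provider SDK
    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"

def get_llm() -> FakeLLM:
    # construction and configuration live here, not inside endpoint code
    return FakeLLM()

@app.get("/ping-llm")
def ping_llm(llm: FakeLLM = Depends(get_llm)) -> dict:
    return {"answer": llm.invoke("ping")}

Because the endpoint only sees the injected object, tests can swap the real client out via app.dependency_overrides[get_llm] without touching endpoint code, and the same pattern extends to retrievers, tool registries, or agent memory.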

Backend primitives that matter for AI in production

Production AI systems often require:

  • file uploads (documents for RAG pipelines)
  • background tasks for long-running AI jobs
  • async execution for performance
  • security layers around AI endpoints

FastAPI provides these primitives out of the box, which makes it easier to move from experimentation to real backend services without changing the entire stack.
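
For example, document ingestion for a RAG pipeline can be accepted immediately and processed after the response is sent; a minimal sketch (the index_document function and the /documents path are illustrative):

from fastapi import BackgroundTasks, FastAPI, UploadFile

app = FastAPI()

def index_document(filename: str, content: bytes) -> None:
    # placeholder for chunking, embedding and writing to a vector store
    print(f"indexing {filename}: {len(content)} bytes")

@app.post("/documents", status_code=202)
async def upload_document(file: UploadFile, background_tasks: BackgroundTasks) -> dict:
    content = await file.read()
    # respond right away; the long-running AI work runs in the background
    background_tasks.add_task(index_document, file.filename, content)
    return {"status": "accepted", "filename": file.filename}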

A minimal architectural idea in code

Not a full implementation, just a visual concept:

@app.post("/chat", response_model=ChatResponse)
async def chat(
    prompt: ChatRequest,
    llm=Depends(get_llm),
):
    result = llm.invoke(prompt.message)
    return ChatResponse(answer=result)

This is intentionally simple, but it already enforces:

  • a strict input and output schema
  • a decoupled AI component
  • a testable endpoint
  • an extensible foundation for RAG or agents

That is the difference between “AI as a demo” and “AI in a production backend”.

Conclusion: production-ready AI is an architectural problem

Building production-ready AI backends is not about better prompts or bigger models.

It is about:

  • contracts
  • separation of concerns
  • controlled integration
  • predictable behavior

Good architecture turns AI into a reliable and powerful backend capability. It allows you to design systems in which AI creates real and sustainable value instead of merely generating text.

Ansible Fundamentals Beyond the First Playbook

2025-12-30 13:38:00

This blog continues from my previous Ansible basics blog. If you haven't gone through it yet, I recommend reading it first; there I explained Ansible fundamentals such as inventory, playbooks, and how to write your first playbook.
Ansible Fundamentals

Variables - The basic inputs of Ansible
Variables are values that can change and are used to make playbooks flexible and reusable.

Instead of hardcoding information inside a playbook, we store it in variables. Variables are important because one playbook can then work for multiple servers, and changes are easy and safe.

Think of it like: When you build a house, you first decide:

  • How many rooms?
  • What color paint?
  • What type of flooring?

These details can change from one house to another.

Similarly, in Ansible:

  • Server name can change

  • Port number can change

  • Software version can change

These changing values are stored in variables.

How to Define Variables in Ansible
1) Inside a Playbook

---
- name: Example of Ansible Variables
  hosts: all
  vars:
    app_name: "MyApp"
    app_port: 8080
  tasks:
    - name: Print application details
      debug:
        msg: "Deploying {{ app_name }} on port {{ app_port }}"

Here, vars: defines variables directly in the playbook. You can reference them using {{ variable_name }}.

2) In Inventory File

[webservers]
server1 ansible_host=192.168.1.10 app_port=8080
server2 ansible_host=192.168.1.11 app_port=9090

3) Using vars_files
You can keep variables in a separate file for better organization, then load it in the playbook by listing it under vars_files:

# vars.yml
app_name: "MyApp"
app_port: 8080

4) Passing Extra Variables at Runtime

ansible-playbook site.yml --extra-vars "app_name=MyApp app_port=9090"

Ansible Facts
Ansible facts are pieces of information about the target system that Ansible automatically gathers when running a playbook. These include details like:

  • IP Addresses

  • Operating system type and version

  • Hostname

  • CPU, memory, and disk information

Facts are collected by the setup module and stored in a dictionary called ansible_facts.

How to Gather Facts
By default, Ansible gathers facts at the start of a playbook run. This is controlled by the gather_facts parameter in the playbook:

- name: Example Playbook
  hosts: all
  gather_facts: yes
  tasks:
    - name: Print OS family
      debug:
        msg: "OS Family is {{ ansible_facts['os_family'] }}"

If you set gather_facts: no, Ansible will skip collecting facts, which can speed up execution when facts are not needed.

Ansible roles
Roles in Ansible are a way to organize playbooks into reusable components. Instead of writing large, monolithic playbooks, roles allow you to break them down into smaller, modular pieces.

Benefits of using roles:

  • Reusability: Use the same code across multiple projects.

  • Maintainability: Easier to update and manage.

  • Scalability: Ideal for large environments.

  • Standardization: Encourages best practices and consistent structure.

Structure of a role

roles/
  └── myrole/
      ├── tasks/        # Main list of tasks to execute
      ├── handlers/     # Handlers triggered by tasks
      ├── templates/    # Jinja2 templates for configuration files
      ├── files/        # Static files to copy
      ├── vars/         # Variables with higher precedence
      ├── defaults/     # Default variables (lowest precedence)
      └── meta/         # Role metadata (dependencies, author info)

Ansible Galaxy
Ansible Galaxy is a community hub and repository for Ansible content like roles, collections, and plugins. It’s essentially a marketplace where you can download pre-built automation content or share your own. This saves time because you don’t have to write everything from scratch.

1) Install a Role from Galaxy

ansible-galaxy role install geerlingguy.nginx

2) Use Installed Roles in Playbooks

---
- hosts: webservers
  roles:
    - geerlingguy.nginx

3) Create your own role

ansible-galaxy init roles/webserver

This creates the folder we need:

roles/
  └── webserver/
      ├── defaults/
      │   └── main.yml
      ├── files/
      ├── handlers/
      │   └── main.yml
      ├── meta/
      │   └── main.yml
      ├── tasks/
      │   └── main.yml
      ├── templates/
      ├── vars/
      │   └── main.yml
      └── README.md

Converting a Simple Playbook into a Role
Original Playbook:

- hosts: webservers
  tasks:
    - name: Install Apache
      yum:
        name: httpd
        state: present

    - name: Start Apache
      service:
        name: httpd
        state: started

Converted into Role:

# roles/webserver/tasks/main.yml
- name: Install Apache
  yum:
    name: httpd
    state: present

- name: Start Apache
  service:
    name: httpd
    state: started

Playbook using role:

- hosts: webservers
  roles:
    - webserver

The FinOps Foundation LATAM Community

2025-12-30 13:35:55

Do you currently practice FinOps and feel alone? That was how I felt for several months when I faced the challenge of taking on a newly created role built around a very new methodology. However, the FinOps Foundation community has given me a way to connect with other professionals who implement the methodology in their companies and projects, and thanks to that, I decided to become part of this great team.

2025 has been a very FinOps-heavy year, and it is exciting to hear that more and more people are interested in learning about this new methodology.

In LATAM at least, around 70% of companies operating in the region are reported to be interested in expanding their cloud investment, which has also pushed them toward effective cost management. While we wait for the results of the "State of FinOps" report, I am sure we will find that the methodology's trends will keep changing constantly in order to adapt to the technology sector. After the shift in focus toward Cloud+, the pace has accelerated, and the FinOps Foundation community has emerged as a mechanism for driving adoption of the methodology.

Interested in joining? Here is a quick overview of how to do it in Latin America:

The first step is joining our community on Slack, where you will find regional channels, job opportunities, community announcements, and more.

  1. Sign up here

Click "Join community" and you will be taken to a form. You will need to enter basic information about yourself, your contact details, your professional background, and specifics about your FinOps certifications. These details matter because they make it easier to identify you within the community.

You will also be asked a bit about your interests and to accept the code of conduct. The latter is one of the most important parts, since the FinOps community is an open, flexible, and diverse space, and above all a safe space for everyone interested in FinOps.

Once you click "Submit", you will receive an email from our community director, Rodolfo Silva, confirming your access to the community along with additional details about your first steps.

This is where it all begins! Introduce yourself, join the group for your country or region, and connect with people. We are always grateful and available to help and collaborate. And that's not all! In Latin America we have two FinOps Ambassadors, end-user professionals who are passionate about the FinOps Foundation community, recognized for their experience and contributions, and willing to help others learn and succeed. In LATAM these are Guido Fiamenco from Argentina and Diego Alejandro Gómez Baena from Colombia; you can write to them whenever you want to resolve a question or are interested in proposing something and getting involved in the community. They are very cool!

There are also the Meetup Organizers, community members who run local or virtual groups, fostering FinOps knowledge sharing, networking, and discussion of best practices. You can reach out to us if you are interested in joining our channels, attending the talks, giving a talk, or even proposing new ideas. These groups are available in Peru, Colombia, Argentina, Ecuador, Chile, and Mexico (we want to reach more countries), and we are available to listen and connect. Come join us!

I hope this serves as a general introduction so you can get involved, so don't hesitate to do it.

Better costs mean better technologies. Long live the FinOps Foundation LATAM community!

Beyond Simple Rate Limiting: Behavioral Throttling for AI Agent Security

2025-12-30 13:31:15

Part 4 of the Zero-Trust AI Agent Security Series

As AI agents operate at machine speed with thousands of requests per second, traditional rate limiting approaches fall short. A compromised agent can stay within frequency limits while executing sophisticated attacks through behavioral manipulation, resource exhaustion, or coordinated activities. This is where behavioral throttling becomes critical for AI agent security.

The Problem with Traditional Rate Limiting

Standard rate limiting applies uniform thresholds: 100 requests per minute for everyone. But AI agents aren't uniform. A monitoring agent legitimately generates 500 telemetry messages per minute, while a decision-making agent should execute only 5 critical approvals per hour.

More importantly, sophisticated attacks operate within rate limits through:

  • Distributed coordination: 50 compromised agents each staying below individual limits while achieving 10,000 aggregate requests

  • Behavioral drift: Gradually modifying request patterns over weeks to normalize unauthorized access

  • Resource exhaustion: Submitting computationally expensive queries that consume 100x normal resources while staying within frequency limits

Sliding Windows: The Foundation

The first improvement moves from fixed windows to sliding windows. Fixed windows create exploitable edge cases where attackers send maximum requests at window boundaries, effectively doubling throughput in brief periods.

Fixed Window Vulnerability:

Window 1: [_________________ 100 requests at 59.8s]
Window 2: [100 requests at 60.2s _________________]
Result: 200 requests in 0.4 seconds = Attack Success

Sliding Window Protection:

The 60-second window ending at 60.2s contains all 200 requests
Result: Limit exceeded, second burst blocked

Sliding windows continuously track requests over rolling time periods, ensuring consistent enforcement regardless of timing.
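
A minimal in-memory sketch of a sliding-window counter (a production deployment would keep this state in shared storage, as discussed later):

import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # drop requests that have fallen out of the rolling window
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        return True

With this in place, a burst at 59.8s and another at 60.2s fall into the same rolling 60-second window, so the second burst is rejected instead of slipping through a window boundary.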

Behavioral Throttling: Beyond Frequency

While rate limiting constrains request frequency, behavioral throttling addresses sophisticated abuse through pattern analysis (a simple scoring sketch follows the three categories below):

Temporal Pattern Analysis

  • Agents shifting from distributed patterns to synchronized bursts

  • Coordinated timing between multiple agents indicating orchestrated activity

  • Deviation from established operational rhythms

Semantic Drift Detection

  • Messages structurally valid but semantically inconsistent with agent purpose

  • Gradual shifts in request types indicating scope expansion

  • Context switching patterns inconsistent with operational models

Resource Consumption Profiling

  • CPU or memory consumption patterns inconsistent with declared functions

  • Network bandwidth usage exceeding operational requirements

  • Processing duration anomalies indicating hidden computational workloads
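
One simple way to turn such baselines into a number is a z-score against an agent's historical profile; a rough sketch (the feature and threshold choices are illustrative, not a complete detector):

from statistics import mean, stdev

def anomaly_score(history: list[float], current: float) -> float:
    # how many standard deviations the latest measurement sits from the baseline
    if len(history) < 2:
        return 0.0
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma

# e.g. CPU seconds per request over recent history vs. the newest observation
print(anomaly_score([0.12, 0.11, 0.13, 0.12, 0.14], 1.9))  # large score -> investigate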

Progressive Throttling Implementation

Behavioral throttling applies graduated constraints based on anomaly severity rather than binary blocking:

Level 1 (Minor Anomalies): 25% rate reduction, enhanced logging

Level 2 (Moderate Anomalies): 50% rate reduction, supervisor notification

Level 3 (Significant Anomalies): 75% rate reduction, manual approval required

Level 4 (Severe Anomalies): Near-complete throttling, emergency response

Trust levels influence response severity. High-trust agents with established behavioral baselines receive more lenient treatment, while low-trust agents face immediate restrictions for minor anomalies.
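
A minimal sketch of how anomaly severity and trust could combine into a graduated rate multiplier (the thresholds mirror the levels above but are illustrative):

def throttle_factor(anomaly_score: float, trust: float) -> float:
    # trust is in [0, 1]; low trust amplifies the effective anomaly severity
    effective = anomaly_score * (1.5 - trust)
    if effective < 2:
        return 1.00   # normal operation
    if effective < 3:
        return 0.75   # Level 1: 25% rate reduction, enhanced logging
    if effective < 4:
        return 0.50   # Level 2: 50% rate reduction, supervisor notification
    if effective < 6:
        return 0.25   # Level 3: 75% rate reduction, manual approval required
    return 0.01       # Level 4: near-complete throttling, emergency response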

Distributed Architecture Considerations

AI agent rate limiting requires distributed enforcement that maintains consistency across multiple entry points. Implementations typically leverage the following building blocks (a Redis-based sketch follows this list):

  • Redis clusters with sharding for sub-millisecond rate limit lookups

  • Consistent hashing ensuring agent requests route to same counter nodes

  • Real-time analysis pipelines using Kafka and Apache Flink for behavioral scoring

  • Hot-reloadable policies allowing dynamic threshold adjustment
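
A sketch of a shared sliding-window counter on Redis sorted sets, assuming redis-py (key naming, limits, and the single-node client are illustrative; a real deployment would shard keys across a cluster and make the check-and-add step atomic, e.g. with a Lua script):

import time
import uuid
import redis

r = redis.Redis()  # stand-in for a sharded cluster reached via consistent hashing

def allow(agent_id: str, max_requests: int = 100, window_seconds: int = 60) -> bool:
    key = f"ratelimit:{agent_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # evict expired entries
    pipe.zcard(key)                                      # count what remains
    _, current = pipe.execute()
    if current >= max_requests:
        return False
    r.zadd(key, {uuid.uuid4().hex: now})
    r.expire(key, window_seconds)
    return True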

Real-World Impact: Financial Trading Case Study

A cryptocurrency trading platform implemented behavioral throttling for 200 AI agents processing millions of market data points. Results:

  • 15 security incidents prevented in the first year, including 8 resource exhaustion attacks

  • 40% reduction in false trading signals while maintaining sub-2ms latency

  • $50 million in potential losses prevented through behavioral anomaly detection

  • Trust-based adaptation during market volatility improved operational resilience

Key Takeaways for Practitioners

  1. Move beyond simple frequency limits to behavioral pattern analysis

  2. Implement sliding windows to eliminate timing attack vulnerabilities

  3. Apply graduated responses based on trust levels and anomaly severity

  4. Design for distribution with consistent hashing and failover capabilities

  5. Monitor behavioral baselines to detect gradual drift and scope expansion

Behavioral throttling transforms rate limiting from a blunt instrument into a nuanced security control that adapts to AI agent behavior while maintaining operational performance. As AI agents become more sophisticated, our security controls must evolve to match their capabilities.

This article is part of an ongoing series on zero-trust architecture for AI-to-AI multi-agent systems. The complete framework addresses identity verification, authorization, temporal controls, rate limiting, logging, consensus mechanisms, and more.

About the Author: John R. Black III is a security practitioner with over two decades of experience in telecommunications and information technology, specializing in zero-trust architectures for AI agent systems.

🔗 AWS 119: Making the Connection - Attaching an IAM Policy to a User

2025-12-30 13:28:54

AWS

🔑 Activating Permissions: Linking Policies to Identities

Hey Cloud Gatekeepers! 👋

Welcome to Day 19 of the #100DaysOfCloud Challenge: Attach IAM Policy to User! We are finishing the loop on our Identity and Access Management tasks with KodeKloud Engineer.

Over the last few days, we’ve built users and we’ve written custom policies. But right now, iamuser_jim has a "key" but no permissions to use it. Today, we are going to fix that by Attaching his specific policy to his account.

Our mission: Attach the existing policy iampolicy_jim to the user iamuser_jim.

1. Introduction: The "Attachment" Principle 💡

In AWS, an IAM Policy is just a static document sitting in a library until it is associated with a "Principal" (a User, Group, or Role).

  • In-line vs. Managed: While you can write policies directly inside a user (In-line), it is a best practice to use "Managed Policies" (like we are doing today) because they are reusable and easier to track.
  • Immediate Effect: The moment you click "Attach," the permissions are live. There is no need for the user to log out and log back in.
  • Why it Matters: This is how you delegate work. Jim might be our "S3 Admin" or "Database Auditor." By attaching the right policy, we give him the power to do his job without giving him the keys to the whole kingdom.

2. Step-by-Step Guide: Attaching the Policy to Jim

We will use the IAM Dashboard to finalize this security link.

Step 2.1: Locate the User

  1. Log in to the AWS Console.

  2. Search for IAM and open the dashboard.

  3. In the left sidebar, click on "Users".

  4. Find and click on the name iamuser_jim.

Step 2.2: Add Permissions

  1. Inside Jim's user summary page, look for the "Permissions" tab.

  2. Click the "Add permissions" button on the right and select "Add permissions" from the dropdown.

Step 2.3: Attach Existing Policy Directly

  1. Select the option "Attach policies directly".
  2. In the Permissions policies search box, type iampolicy_jim.
  3. Check the box next to the policy name when it appears.

Step 2.4: Review and Add

  1. Click "Next" at the bottom.
  2. Review the summary. You should see "Permissions boundary is not set" (which is normal for this task) and the policy name iampolicy_jim.
  3. Click "Add permissions".

Success! iamuser_jim now has the specific powers defined in his custom policy. 🎉
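
If you prefer to script this step, the same attachment can be done with boto3; a minimal sketch (the account ID in the policy ARN is a placeholder for your own):

import boto3

iam = boto3.client("iam")

# customer-managed policies live under your own account's ARN namespace
policy_arn = "arn:aws:iam::123456789012:policy/iampolicy_jim"  # placeholder account ID

iam.attach_user_policy(UserName="iamuser_jim", PolicyArn=policy_arn)

# verify: list the managed policies now attached to the user
attached = iam.list_attached_user_policies(UserName="iamuser_jim")
print([p["PolicyName"] for p in attached["AttachedPolicies"]])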

3. Key Takeaways 📝

  • Granular Control: You can attach multiple policies to a single user if they have multiple responsibilities.
  • Inheritance: Remember, if Jim were in a Group, he would also inherit all the policies attached to that group in addition to this direct policy.
  • Global Visibility: You can see exactly which policies are affecting a user by looking at the "Permissions" tab at any time.

4. Common Mistakes to Avoid 🚫

  1. Direct Attachment Overload: In a large company, it's better to attach policies to Groups rather than individual users. Attaching directly to users (like we did today) is okay for specific exceptions, but can become a mess at scale.
  2. Naming Confusion: Always ensure you are attaching iampolicy_jim to iamuser_jim. It’s easy to misclick when names are similar!
  3. Policy Conflicts: If one policy says "Allow S3" and another says "Deny S3," Deny always wins in AWS.

5. Conclusion + Call to Action! 🌟

Jim is now fully equipped to help the Nautilus team with their cloud migration! You've successfully managed the "Who" (User), the "What" (Policy), and the "How" (Attachment).

How are you finding the 100 Days of Cloud Challenge? 🛡️

  • 💬 Let’s connect on LinkedIn: Have you ever accidentally "locked yourself out" by attaching a Deny policy? (We've all been there!) 👉 Hritik Raj
  • Support my journey on GitHub: Check out my full security and infrastructure logs. 👉 GitHub – 100 Days of Cloud

Azure AI Engineer Explained: Skills, Tools, and Responsibilities

2025-12-30 13:26:50

As of December 23, 2025, the role of a Microsoft Azure AI Engineer goes far beyond just writing code. It’s about building, deploying, and managing real AI solutions on Azure that actually work in production.

Azure AI engineers are involved in the full lifecycle of an AI solution. From understanding business requirements and designing the approach to development, deployment, integration, ongoing maintenance, and performance optimization, they play a hands-on role at every stage. Monitoring and fine-tuning models over time is just as important as building them.

The role is highly collaborative. Azure AI engineers work closely with solution architects to turn ideas into reality, and they regularly coordinate with data scientists, data engineers, IoT specialists, infrastructure teams, and fellow developers. Together, they create secure, end-to-end AI solutions and embed AI capabilities into larger applications and systems.

From a technical perspective, experience with Python or C# is essential. You’re expected to be comfortable working with REST APIs and SDKs to develop solutions for image and video processing, natural language processing, knowledge mining, and generative AI on Azure.

A strong understanding of the Azure AI ecosystem is also key, including how different AI services fit together and which data storage options make sense for different use cases. Just as importantly, Azure AI engineers are expected to apply responsible AI principles, ensuring solutions are ethical, secure, and trustworthy.