🧠LLMs As Sensors

2025-12-06 18:47:16

Why OrKa 0.9.10 Wraps GenAI Inside Deterministic Systems

I will start bluntly.

I like generative AI. I use it every day. I build around it. But I do not trust it to own the outcome of a system.

For me, GenAI is a fantastic tool for two things:

  • Generating content
  • Analyzing context

That is already huge. But it is still just one tool in a bigger machine.

What worries me is how often I see people trying to bend the model into being the whole product.

"Just send a giant prompt, get an answer, ship it."

It works for demos. It does not scale to real systems that need reliability, reproducibility, or any kind of serious accountability.

This article is about that gap.

  • Why LLMs should be treated as probabilistic sensors, not entire applications
  • Why their outputs must be wrapped into real objects and fed into deterministic algorithms
  • And how this philosophy is shaping the current work I am doing with OrKa v0.9.10, including a routing fix that forces me to hold myself to the same standard I am describing here

I am not trying to hype anything. I am trying to describe how I think modern AI should be wired if we want it to behave like infrastructure instead of roulette.

The uncomfortable truth: LLMs are not your system

Let me restate the rough idea that kicked this off:

AI, especially GenAI, is a great tool for content generation and context analysis. But it is still just a tool.

We need to stop treating it as the whole solution and instead force it to generate outcomes that can feed a bigger system, so those outcomes can be used for deterministic execution of algorithms.

That is the core.

LLMs are:

  • Stochastic
  • Non deterministic
  • Sensitive to prompt phrasing, context ordering, temperature, and even invisible whitespace
  • Very good at pattern matching, fuzzy reasoning, and "filling in the missing piece"

They are not:

  • Reliable finite state machines
  • Formal decision trees
  • Deterministic planners
  • Systems you can audit in a classical sense

And that is fine, as long as you do not pretend otherwise.

Where LLMs shine is exactly where classic systems struggle:

  • Quick approximate reasoning
  • Extracting structure from messy input
  • Mapping unstructured signals into higher level descriptions
  • Acting almost like a "universal fuzzy detector" for patterns

So the question is not

"How do I make the LLM do everything?"

The question is

"How do I use the LLM where it shines, then hand off to deterministic code as soon as possible?"

Think of LLMs as sensors, not brains

The metaphor that keeps coming back in my head is this:

An LLM is a sensor that reads the world of language and returns a noisy, high level interpretation.

Just like:

  • A microphone turns air vibration into a waveform
  • A camera turns photons into pixels
  • An accelerometer turns motion into axes of numbers

An LLM turns sequences of tokens into:

  • Labels
  • Spans of text
  • Explanations
  • Rankings
  • Summaries
  • Structured JSON

The trick is to treat that output as measurement, not as law.

For example:

  • "This voice sounds like a 35 to 45 year old male, 70 percent confidence."
  • "This message is probably a support ticket about billing."
  • "This paragraph expresses frustration, particularly toward a teammate."

Those measurements are incredibly powerful. Before LLMs, many of these tasks required:

  • Custom signal processing
  • Domain specific feature extraction
  • Custom models for each upstream task
  • A lot of time and data

Now you can prototype them in hours.

But once you have that measurement, you should wrap it:

{
  "age_estimate": 38,
  "age_range": [35, 45],
  "confidence": 0.73,
  "source": "audio_segment_023.wav",
  "model": "my_local_model_1.5b"
}

That object is no longer just "LLM output". It is:

  • A typed entity in your system
  • Something you can log, replay, test, and validate
  • A first class citizen in your deterministic logic

Then the decisions are made by normal code:

if person.age_estimate >= 18:
    enable_feature("adult_profile", person.id)
else:
    enable_feature("underage_profile", person.id)

The "smart" part is upstream. The accountable part is downstream.

A concrete example: detecting aging from audio

Take an example like "detecting aging from audio". I like it a lot because it is exactly the kind of task that smells "AI-ish" but should be designed as a system, not as a prompt.

A naive approach looks like this:

  1. Send raw audio (or its transcription) to an LLM with a prompt like "Analyze this audio and tell me how old the speaker is and how it is changing over months."
  2. Get back some English explanation.
  3. Show it in a UI. Call it a feature.

That is fragile and impossible to test properly.

A more system-level design:

  1. Signal layer

    • Extract features from the audio over time.
    • Maybe you use some classic DSP, maybe you use a small embedding model.
    • Build a timeline of short samples.
  2. LLM as a sensor

    • For each window, the LLM gets a compressed description of the signal, or even just some textual metadata if you have it.
    • It outputs something compact and structured:
   {
     "timestamp": 1733332500,
     "age_estimate": 39,
     "confidence": 0.68,
     "voice_stability": "slightly_decreasing"
   }
  3. Deterministic aging detector

    • A standard algorithm (not an LLM) runs on top of these structured records.
    • It can be a simple function, or a time series model, but the key is:
      • The transitions are explicit
      • The thresholds are configurable
      • The logic is not hidden in a prompt
  4. System outcome

    • The system might decide:
      • "We do not detect significant aging over the last 12 months."
      • Or "We detect a consistent pattern of degradation, trigger an alert."

You can test this.

You can replay the same input data and verify you get the same decision. You can experiment with different threshold values. You can swap out the LLM with a smaller local model that returns a similar JSON structure.

The LLM is a pluggable sensor. The system is the deterministic pipeline that consumes its readings.
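To make the deterministic layer concrete, here is a minimal Python sketch of step 3. The record fields mirror the JSON above; the thresholds and function names are illustrative assumptions, not part of any library:

from dataclasses import dataclass

@dataclass
class AgeReading:
    timestamp: int
    age_estimate: int
    confidence: float

def detect_aging(readings, min_confidence=0.6, drift_threshold=3):
    """Deterministic detector: same readings in, same decision out."""
    # Discard low-confidence measurements from the LLM sensor.
    usable = [r for r in readings if r.confidence >= min_confidence]
    if len(usable) < 2:
        return "insufficient_data"
    usable = sorted(usable, key=lambda r: r.timestamp)
    # Explicit, configurable rule: compare first and last estimates.
    drift = usable[-1].age_estimate - usable[0].age_estimate
    return "aging_detected" if drift >= drift_threshold else "no_significant_aging"

Every threshold is visible, every transition explicit, and the whole thing runs in a unit test without a model in sight.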

Why wrapping model output into objects matters

This is the part that seems small but changes everything.

If you let your LLM return "whatever it wants, as long as the text looks good", your system will always be at the mercy of prompt drift.

If you force your LLM to return objects, and you treat those objects as a contract, you get:

  • A clear boundary between probabilistic and deterministic behavior
  • The ability to version that schema
  • Explicit error handling when the object is malformed or incomplete
  • Real regression tests

Typical pattern:

  1. Prompt the LLM to output strict JSON with an explicit schema.
  2. Validate that JSON in your code.
  3. Log the raw model output and the parsed object.
  4. Use only the parsed object downstream.

In pseudocode:

import json

# Call the model; treat its raw reply as a measurement, not a decision.
raw = call_llm(prompt, input_context)
parsed = json.loads(raw)  # raises ValueError on malformed output

validate_schema(parsed, AgeEstimateSchema)  # raises if invalid

# Deterministic code owns the final decision.
decision = age_classifier(parsed)
persist_decision(decision)

If validate_schema fails, that is not "mysterious AI behavior". It is a normal bug you can see in a log and fix by adjusting the prompt or model.
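The validation step can be a thin wrapper over a standard validator. A minimal sketch assuming the jsonschema package, with schema fields mirroring the example object above (the exact schema is an illustrative assumption):

import jsonschema

# Assumed contract for the age-estimate object shown earlier.
AgeEstimateSchema = {
    "type": "object",
    "required": ["age_estimate", "confidence"],
    "properties": {
        "age_estimate": {"type": "integer", "minimum": 0},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}

def validate_schema(obj, schema):
    # Raises jsonschema.ValidationError on any contract violation,
    # which shows up in logs like any normal bug.
    jsonschema.validate(instance=obj, schema=schema)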

And now we can talk about orchestration.

OrKa: building a deterministic spine around probabilistic agents

OrKa exists because I wanted a way to:

  • Compose multiple "sensors" and agents
  • Route between them based on their outputs
  • Keep the execution trace fully visible and replayable
  • Avoid hardcoding everything in application code over and over

In OrKa, I do not think of "a big model that knows everything".

I think in terms of:

  • Agents that do one thing
  • Service nodes that mutate state or call external systems
  • Routers that decide which agent comes next, based on structured outputs

Everything is described in YAML, so the cognition graph is explicit.

A very simplified OrKa-style flow where an LLM decides which branch to take might look like this:

orchestrator:
  id: audio_aging_flow
  strategy: sequential
  queue: redis

agents:
  - id: audio_to_features
    type: service
    kind: audio_feature_extractor
    next: llm_age_sensor

  - id: llm_age_sensor
    type: llm
    model: local_llm_1
    prompt: |
      You are an age estimation sensor.
      Given these features, output strict JSON:
      {"age_estimate": int, "confidence": float}
    next: age_route

  - id: age_route
    type: router
    routing_key: age_estimate
    routes:
      - condition: "value < 18"
        next: underage_handler
      - condition: "value >= 18"
        next: adult_handler

  - id: underage_handler
    type: service
    kind: profile_flagger

  - id: adult_handler
    type: service
    kind: profile_flagger

The LLM here is just one node (llm_age_sensor). Its output becomes a field (age_estimate) that the router uses in a deterministic way.

If you replay the same input, the router will make the same decision for the same parsed values.
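At its core, that routing step is boring code. A hedged sketch of the principle, not OrKa's actual implementation:

def route(value, routes):
    """Pick the first route whose condition matches; fully deterministic."""
    for r in routes:
        if r["predicate"](value):
            return r["next"]
    raise ValueError(f"no route matched value {value!r}")

routes = [
    {"predicate": lambda v: v < 18, "next": "underage_handler"},
    {"predicate": lambda v: v >= 18, "next": "adult_handler"},
]

assert route(39, routes) == "adult_handler"  # same input, same path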

That guarantee is not automatic. It depends on the correctness of routing behavior. Which brings me to the latest OrKa release.

Why this matters beyond OrKa

You do not have to care about OrKa to care about this pattern.

If you are building any system around generative models, ask yourself a few questions:

  1. Where does the probabilistic behavior end?

    Is there a clear boundary where the LLM output is turned into a typed object and validated? Or does the "magic" just flow deep into your code base?

  2. Who owns the final decision?

    Does the model decide what happens, or does deterministic code decide based on model measurements?

  3. Can you replay a run?

    If a user reports something weird, can you reconstruct the full chain: input → model output → routing → system decision?

  4. What happens if you swap models?

    If you change from a proprietary model to a local one, do you only change the sensor, or do you need to rewrite half the app?

  5. What is the unit of testability?

    Can you test downstream logic with synthetic objects, without involving the LLM at all?

My bias is clear:

I want LLMs to be pluggable, swappable, measurable, and constrained.

I want the core of the system to feel boring in a good way.

That is what OrKa is trying to encode at the framework level:

model calls as agents, routing as explicit configuration, memory and traces as first class concepts, all tied together in a way that can be inspected, not guessed.

A small mental shift that changes system design

If I had to compress this article into one mental shift, it would be this:

Stop asking "What can the LLM do?"

Start asking "What kind of object do I need so that my system can behave deterministically, and how can I use an LLM to produce that object?"

Examples:

  • Instead of "write me a reply email", think "I need an EmailReplyPlan with fields: tone, key_points, call_to_action, and I will let deterministic templates render the final email."

  • Instead of "decide what to do next for this customer", think "I need a NextAction object with action_type, priority, and reason, and my orchestration layer will decide which internal systems to call."

  • Instead of "summarize this call for the CRM", think "I need a CallSummary object with sentiment, topics, promises_made, follow_up_tasks, and my CRM logic will handle storage and workflows."

In all of these, the LLM is powerful, but the real system lives around it.
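As a sketch of the first example, with field names taken from the list above and a hypothetical renderer standing in for the deterministic template layer:

from dataclasses import dataclass

@dataclass
class EmailReplyPlan:
    tone: str              # e.g. "apologetic", "formal"
    key_points: list
    call_to_action: str

def render_email(plan):
    # Deterministic template: the LLM proposed the plan,
    # but the final text comes from boring, testable code.
    points = "\n".join(f"- {p}" for p in plan.key_points)
    return f"Tone: {plan.tone}\n{points}\n\n{plan.call_to_action}"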

You can inspect those objects. You can aggregate them. You can feed them into analytics and classic algorithms. You can design them once and evolve them over time.

And if you embrace orchestration tools, you can also define how these objects move, which nodes can create or transform them, and under what conditions routing happens.

Closing thoughts

So, to tie the threads:

  • GenAI is great at generating content and reading context. That is not a small thing. It is a massive shift in what we can build in reasonable time.
  • But models are not the system. They are components in the system. Treat them like sensors that emit measurements.
  • Wrap model outputs into strict, typed objects. Validate them. Version them. Use them as the raw material for deterministic logic, not as the final answer.
  • Orchestrate flows so that routing is explicit, traceable, and reproducible. If the routing itself is fuzzy, you just moved the black box one step further.
  • In OrKa v0.9.10, tightening routing behavior was not a cosmetic refactor. It was necessary to keep this philosophy consistent in the framework I am building. If I want OrKa to be a cognitive execution layer, it needs to behave like infrastructure, not like another probabilistic blob around the model.

If you are curious about OrKa, you can read more and follow the roadmap at orkacore.com. I am not claiming it is the answer. It is simply my current attempt to encode this belief in code:

LLMs should feed deterministic systems, not replace them.

If that idea resonates with you, then we are probably trying to solve similar problems, just with different tools.

OrKa v0.9.10: fixing routing is not a cosmetic change

I just cut a release of OrKa v0.9.10, focused on a fix in routing behavior.

I will not pretend this is some huge "launch" moment. It is a pretty boring fix if you look only at the diff. But for the philosophy in this article, it is critical.

What was wrong?

In some edge cases, the router:

  • Would evaluate conditions on slightly stale context, or
  • Could pick a next node that was not the one you would expect from the latest structured output, especially after more complex flows with forks and joins

This is exactly the type of thing that breaks the "LLM as sensor, system as deterministic spine" model.

When your router does not behave deterministically, you get:

  • Non reproducible traces
  • Confusing logs
  • Surprises during replay
  • The feeling that the orchestrator itself is "magical" instead of mechanical

That is the opposite of what OrKa is supposed to be.

So in v0.9.10 I focused on:

  • Making sure routing decisions always use the last committed output of the relevant agent
  • Making context selection explicit, not implicit
  • Tightening the mapping between routing_key and the object field it reads
  • Hardening the trace so that, for a given input plus memory state, the same routing path is taken every time

In more human words:

If your LLM says:

{ "route": "adult_handler" }

then OrKa should take that path, and you should be able to see exactly why in the trace.

No surprises. No "the orchestrator is a bit mysterious too".

Only the LLM is allowed to be fuzzy. The rest must behave like infrastructure.

The LLM Shield: How to Build Production-Grade NSFW Guardrails for AI Agents

2025-12-06 18:42:06

Content moderation is one of the most critical yet challenging aspects of building AI applications. As developers, we're tasked with creating systems that can understand context, detect harmful content, and make nuanced decisions—all while maintaining a positive user experience. Today, I want to share insights from building a production-grade NSFW detection system that goes beyond simple keyword blocking.

Why Simple Keyword Filtering Isn't Enough

When I first started working on content moderation, I thought a simple blocklist would suffice. Flag a few explicit words, block them, and call it a day. Reality quickly proved me wrong.

Users are creative. They use character substitutions ("s3x"), deliberate spacing ("p o r n"), and roleplay scenarios to bypass filters. Meanwhile, legitimate medical and educational content was getting incorrectly flagged. The system needed to be smarter—it needed context awareness.

The Multi-Layered Approach

The solution I developed uses a four-tier severity classification system, inspired by industry standards from organizations like OpenAI and Microsoft. Here's how it breaks down:

Level 0: Allowed Content

This includes medical, educational, and scientific content. Think anatomy textbooks, reproductive health articles, or clinical research papers. The system looks for contextual indicators like "doctor," "diagnosis," "textbook," or "peer-reviewed" to identify this category.

Level 1: Restricted Content

Mature themes that aren't explicitly sexual but may require age verification. This includes content about kissing, attraction, or sexual health education. It's the gray area that needs careful handling.

Level 2: Contextual Content

This is where things get interesting. Terms like "aroused," "seductive," or "naked" can be perfectly appropriate in some contexts (art history, literature analysis) but inappropriate in others. The system analyzes surrounding text to make informed decisions.

Level 3: Critical Content

Explicit sexual content, pornographic material, and sexual violence. This gets blocked immediately, no questions asked. The patterns here are carefully designed to catch both direct language and obfuscated attempts.

Detecting Jailbreak Attempts

One pattern I've seen repeatedly is users trying to bypass filters through roleplay: "Let's pretend we're characters in a story where..." The system specifically watches for roleplay indicators combined with sexual content, treating these as high-risk attempts to circumvent protections.

Handling Obfuscation

Users employ various tricks to evade detection:

  • Character separation: "p.o.r.n" or "s-e-x"
  • Deliberate misspellings: "p0rn" or "s3xy"
  • Leetspeak substitutions: "nak3d" or "h0rny"

Obfuscation Patterns

Character separation:

p[._-]?o[._-]?r[._-]?n

Leetspeak:

p[o0]rn
s[e3]x
h[o0]rny

The obfuscation detector uses regex patterns that account for these variations. It looks for suspicious patterns like excessive punctuation between characters or common number-for-letter substitutions.
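A minimal Python sketch of such a detector, reusing the patterns above (the pattern list and scoring are illustrative, not the full library from the linked repository):

import re

# Separator and leetspeak patterns from the examples above.
OBFUSCATION_PATTERNS = [
    re.compile(r"p[._\-\s]?o[._\-\s]?r[._\-\s]?n", re.IGNORECASE),
    re.compile(r"p[o0]rn", re.IGNORECASE),
    re.compile(r"s[e3]x", re.IGNORECASE),
    re.compile(r"h[o0]rny", re.IGNORECASE),
]

def obfuscation_signal(text):
    """Return a crude confidence score based on pattern hits."""
    hits = sum(1 for p in OBFUSCATION_PATTERNS if p.search(text))
    return min(1.0, hits * 0.5)  # two or more hits -> full confidence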

The Ensemble Decision Engine

Here's where all the pieces come together. When content is analyzed:

  1. Signal Collection: Each detector (explicit content, contextual analysis, obfuscation) generates a signal with a confidence score
  2. Context Modification: The base confidence is adjusted based on context (medical terms present? roleplay detected? user verified?)
  3. Weighted Aggregation: Signals are combined, with critical content getting more weight
  4. Threshold Evaluation: The final decision compares against configurable thresholds

# Pseudocode: severity scores are aggregated per level,
# then compared against the configured thresholds.
if severity_scores[L3] > 0:
    action = BLOCK                # critical content always blocks
elif severity_scores[L2] > threshold:
    action = WARN or BLOCK        # depends on configured strictness
elif severity_scores[L1] > threshold:
    action = ALLOW or WARN        # depends on configured strictness
else:
    action = ALLOW

This ensemble approach is more robust than any single detector. Multiple weak signals can combine to indicate problematic content, while strong contextual indicators can override false positives.

Practical Implementation Considerations

Configuration Flexibility

Real-world applications need different strictness levels. The system supports three preset configurations:

Strict Mode: For general audience apps. Blocks Level 1+ content with a low confidence threshold (0.6). Best for platforms accessible to minors.

Age-Verified Mode: For adult platforms with user verification. Allows Level 1 content and requires higher confidence (0.7) before blocking Level 2 content.

Educational Mode: Optimized for academic settings. Only blocks Level 3 critical content and uses a high threshold (0.8) to minimize false positives on legitimate educational material.
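In code, these presets reduce to plain configuration data. A sketch with the thresholds quoted above (the field names are assumptions, not the library's actual config keys):

PRESETS = {
    # Blocks Level 1+ content; suitable for general audiences.
    "strict": {"block_at_level": 1, "confidence_threshold": 0.6},
    # Allows Level 1; blocks Level 2 only with higher confidence.
    "age_verified": {"block_at_level": 2, "confidence_threshold": 0.7},
    # Only blocks Level 3; high threshold spares educational material.
    "educational": {"block_at_level": 3, "confidence_threshold": 0.8},
}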

Custom Rules

Every application has unique needs. The system allows:

  • Custom blocklists: Add domain-specific terms that should always block
  • Custom allowlists: Override detections for known safe terms in your context
  • Confidence thresholds: Adjust how aggressive the filtering should be

Transparency and Auditability

One crucial aspect often overlooked is transparency. When content is blocked, users deserve to understand why. The system provides detailed metadata:

  • Severity level and confidence score
  • Specific signals that triggered detection
  • Which patterns were matched (without exposing the full pattern library)
  • Whether the content appears to be in an educational/medical context

This transparency helps with:

  • User trust: People can understand and potentially appeal decisions
  • Debugging: Developers can identify false positives
  • Compliance: Audit trails for regulatory requirements

Future Enhancements

Content moderation is an evolving challenge. Some areas for future development:

Machine Learning Integration: Pattern-based detection has limits. ML models can learn nuanced patterns and adapt to new evasion techniques.

Multi-Language Support: The current system is English-focused. Expanding to other languages requires language-specific patterns and cultural context awareness.

Image and Video: Text is just the beginning. Visual content moderation adds another dimension of complexity.

User Feedback Loop: Allow users to report false positives/negatives, feeding improvements back into the system.

Conclusion

Building effective content moderation requires balancing multiple competing goals: safety, accuracy, user experience, and performance. A multi-layered approach with context awareness provides the flexibility to handle diverse scenarios while maintaining high accuracy.

The key takeaways:

  • Simple keyword blocking fails in production environments
  • Context analysis is essential for reducing false positives
  • Multiple detection signals provide robustness
  • Configuration flexibility allows adaptation to different use cases
  • Transparency builds user trust

Content moderation isn't a solved problem—it's an ongoing challenge that requires continuous refinement. But with thoughtful architecture and careful implementation, we can build systems that protect users while respecting legitimate content.

If you're building AI applications with user-generated content, I hope this guide provides a solid foundation for your moderation strategy. The code and patterns discussed here are based on real-world production experience and industry best practices.

Github Code : https://github.com/aayush598/agnoguard/blob/main/src/agnoguard/guardrails/nsfw_advanced.py

Stay safe, and happy coding!

Have questions about implementing NSFW detection in your application? Found this guide helpful? Leave a comment below or connect with me on your preferred platform. I'd love to hear about your experiences with content moderation.

Tired of old, clunky GUI libraries? Try PyUIkit 1.0.0 🚀

2025-12-06 18:37:47

Hey Python devs! 👋

I’m excited to share PyUIkit 1.0.0, a modern, web-style GUI framework for Python built on top of CustomTkinter. It brings Div-based layouts, reusable components, and an easy way to create interactive desktop apps without messy layout code.

Features

  • Web-like Div layout system for nesting components
  • Interactive components: Text, Button, Input, Slider, and more
  • Component IDs for dynamic updates
  • Modern styling with dark/light themes
  • Minimal boilerplate, easy to pick up

Example: Color Mixer

This app demonstrates using Slider components to mix colors in real-time and display the hex value dynamically.

Installation

pip install pyuikit

To get started see the GitHub Repository | PyPI Page or the Quickstart guide

I’m open to PRs, feedback, and feature requests! Issues can be reported on GitHub’s issue tab.

Would love to hear what you think! 🙂

Building a Unified API Response Architecture (ASP.NET Minimal API + Next.js)

2025-12-06 18:37:27

How to eliminate inconsistent errors, simplify validation, and keep backend & frontend perfectly in sync

Introduction

If your frontend error-handling is overloaded with conditional logic…

if (error.response?.data?.message) { ... }
else if (typeof error === "string") { ... }
else if (error.status === 401) { ... }

…it’s not your fault - it’s your backend being inconsistent.
This article shows how to design a fully unified response architecture using:

  • ASP.NET Minimal API

  • Next.js

  • a Response Envelope

  • a Rich Error Object pattern

  • fully synced C# -> TypeScript enums

The result is predictable, type-safe, frontend-friendly APIs.

The Problem: Inconsistent API Responses

Most projects begin with “quick” error returns:

return BadRequest("Email invalid");
throw new Exception("Something went wrong");
return Unauthorized();

But later you realize your API responds with:

{ "error": "UserNotFound" }
{ "message": "Invalid email" }
Something went wrong
<html>401 Unauthorized</html>

This causes:

1. Unpredictable API response structures
Different shapes break interceptors and fetch wrappers.

2. Frontend cannot centralize error handling
You end up parsing strings, objects, arrays, or HTML.

3. Validation errors are irregular
No unified shape, varying formats.

4. No type syncing
A typo in backend enums silently breaks the frontend.

Solution: The Unified Response Envelope

A single response format used by all endpoints — successful or not.
It should:

  • indicate success/failure

  • include a strict error code (enum)

  • standardize validation errors

  • never leak stack traces

  • be predictable for any client

  • fully map to TypeScript

Defining the ApiResponse Contract

Before diving into middleware, we need a consistent backend response type.

public class ApiResponse<T>
{
    public bool Success { get; set; }

    [JsonConverter(typeof(JsonStringEnumConverter))]
    public ErrorCodes? ErrorCode { get; set; }

    public string? Message { get; set; }

    public T? Data { get; set; }

    public List<ValidationError>? ValidationErrors { get; set; }

    public static ApiResponse<T> Ok(T data, string? message = null) =>
        new() { Success = true, Data = data, Message = message };

    public static ApiResponse<T> Fail(ErrorCodes code, string? message = null) =>
        new() { Success = false, ErrorCode = code, Message = message };

    public static ApiResponse<T> ValidationError(List<ValidationError> errors) =>
        new()
        {
            Success = false,
            ErrorCode = ErrorCodes.ValidationError,
            Message = "Validation failed",
            ValidationErrors = errors
        };
}

This class ensures that all API responses follow the same structure.
Success always returns Data, errors always return ErrorCode and Message.
Why it matters:

  • One response shape -> simple front-end logic

  • Strong typing between backend & frontend

  • No accidental HTML or string leakage

  • UI becomes fully predictable

Backend Architecture

1. Global Exception Middleware

public class ErrorMiddleware
{
    private readonly RequestDelegate _next;

    public ErrorMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        try
        {
            await _next(context);
        }
        catch
        {
            context.Response.ContentType = "application/json";
            context.Response.StatusCode = 500;

            await context.Response.WriteAsJsonAsync(
                ApiResponse<object>.Fail(
                    ErrorCodes.ServerError,
                    "Internal server error")
            );
        }
    }
}

This middleware wraps the entire pipeline in a try/catch.
Any unhandled exception becomes a structured 500 JSON response.
Key takeaways:

  • No stack traces or raw exceptions escape

  • Backend never returns unhandled errors

  • Frontend receives the same error shape every time

2. 401 & 403 Override

public class ApiAuthorizationHandler : IAuthorizationMiddlewareResultHandler
{
    private readonly AuthorizationMiddlewareResultHandler _default = new();

    public async Task HandleAsync(
        RequestDelegate next,
        HttpContext context,
        AuthorizationPolicy policy,
        PolicyAuthorizationResult result)
    {
        if (result.Challenged)
        {
            context.Response.StatusCode = 401;
            await context.Response.WriteAsJsonAsync(
                ApiResponse<object>.Fail(ErrorCodes.Unauthorized, "Authentication required")
            );
            return;
        }

        if (result.Forbidden)
        {
            context.Response.StatusCode = 403;
            await context.Response.WriteAsJsonAsync(
                ApiResponse<object>.Fail(ErrorCodes.Forbidden, "Access denied")
            );
            return;
        }

        await _default.HandleAsync(next, context, policy, result);
    }
}

This handler replaces raw 401/403 status codes with clean JSON envelopes.
Key takeaways:

  • No more raw status codes

  • Mobile and SPA clients get clean JSON

  • Auth errors match your global error format

3. API-Scoped 404 Handler

ASP.NET returns default HTML for unknown endpoints. We override it only for /api/** paths.

app.Use(async (context, next) =>
{
    await next();

    if (context.Request.Path.StartsWithSegments("/api") &&
        context.Response.StatusCode == 404 &&
        !context.Response.HasStarted)
    {
        await context.Response.WriteAsJsonAsync(
            ApiResponse<object>.Fail(ErrorCodes.NotFound, "API endpoint not found")
        );
    }
});

Key takeaways:

  • No HTML 404 pages for API routes

  • Every API failure is a JSON envelope

Why This Architecture Works

- Predictable
Every response follows one structure.

- Strongly typed
Backend enums → frontend TS types.

- Easier debugging
You always know where things went wrong.

- Cleaner UI code
One global error handler instead of dozens.

- More secure
No stack traces or framework-generated HTML.

Frontend Architecture (Next.js)

1. apiFetch: Unified Fetch Wrapper

export async function apiFetch<T>(
  url: string,
  options: RequestInit = {}
): Promise<ApiResponse<T>> {
  const res = await fetch(API_URL + url, {
    ...options,
    headers: {
      "Content-Type": "application/json",
      ...(options.headers || {}),
    },
    credentials: "include",
  });

  let json: ApiResponse<T>;

  try {
    json = await res.json();
  } catch {
    return {
      success: false,
      errorCode: "ServerError",
      message: "Invalid JSON from server",
    };
  }

  return json;
}

This wrapper ensures that every request returns an ApiResponse<T>.
If the server returns invalid JSON, we still wrap the error safely.
Key points:

  • No front-end crashes from malformed responses

  • Always returns a predictable structure

2. Rich Error Object Pattern (ApiError)

export class ApiError extends Error {
  public error: ErrorResponse;

  constructor(response: ApiResponse<any>) {
    super("API Error");
    Object.setPrototypeOf(this, ApiError.prototype);
    this.name = "ApiError";

    this.error = {
      errorCode: response.errorCode!,
      message: response.message,
      validationErrors: response.validationErrors,
    };
  }
}

Throwing plain strings or raw fetch errors is unhelpful.
Instead, we throw a structured, typed error object that UIs can use safely.
Takeaways:

  • UI receives rich error metadata

  • Works perfectly with Next.js server/client components

  • Makes error boundaries more useful

3. Using apiFetch in a Service Module

Example:

export const userService = {
  async getUser(id: number) {
    const res = await apiFetch<{ id: number; name: string }>(
      `/api/users/${id}`
    );

    if (!res.success) {
      throw new ApiError(res);
    }

    return res.data;
  },
};

Why this is clean

All API logic is centralized - components receive only clean data or ApiError.

4. Syncing TypeScript With C# Enums

To eliminate typos and keep error codes synchronized between backend and frontend, expose the enum as a generated TypeScript endpoint:

app.MapGet("/ts/error-codes.ts", () =>
{
    var names = Enum.GetNames(typeof(ErrorCodes));
    var lines = names.Select(n => $"  | \"{n}\"").ToList();
    lines[0] = lines[0].Replace("|", "");

    var ts = "export type ErrorCode =\n" + string.Join("\n", lines) + ";\n";

    return Results.Text(ts, "text/plain");
});

Result:
You generate the file:

curl http://localhost:5000/ts/error-codes.ts > lib/errorCodes.ts

Now the frontend has:

export type ErrorCode =
  | "Unauthorized"
  | "Forbidden"
  | "NotFound"
  | "ValidationError"
  | "Conflict"
  | "ServerError";

5. Frontend Error Map

export const errorMap: Record<ErrorCode, string> = {
  Unauthorized: "Need Authentication",
  Forbidden: "Access Denied",
  NotFound: "Not Found",
  ValidationError: "Validation Error",
  Conflict: "Conflict Detected",
  ServerError: "Server Error",
};

Why this matters:
TypeScript forces developers to map every error code — or fail the build.

Developer Experience: Before vs After

Before:

  • inconsistent API formats

  • HTML error pages

  • unpredictable validation formats

  • lots of repeated try/catch logic

  • difficult UI error states

After:

  • one universal response structure

  • strict typing across stack

  • easy global error handler

  • predictable UX

  • safer exceptions

  • more reliable monitoring/logging

When NOT to Use an Envelope

This pattern is not ideal for:

  • Public REST APIs (Stripe style)

  • OpenAPI-first API-first schemas

  • Strictly RESTful HATEOAS-driven systems

But for:

  • SPAs

  • SSR apps

  • mobile apps

  • internal tools

  • B2B dashboards

  • admin panels

…it’s a massive improvement.

Examples

  • All example code: Github
    You can clone that and start developing with this boilerplate.

  • Used in my recent production project: Neon Royale
    And integrating the API with the frontend was a smooth and straightforward process thanks to this architecture.

Manual testing.

2025-12-06 18:37:17

COMMON MANUAL TESTING TECHNIQUES:
Manual testing is the process of testing a software application without using any automation tools: a human tests the software step by step to find bugs.
The main techniques are:
1. Black box technique,
2. White box technique,
3. Grey box technique.
The black box technique focuses only on inputs and outputs, using methods such as:

  • (a) Boundary value analysis (BVA)

  • (b) Equivalence partitioning

  • (c) Decision table testing

BOUNDARY VALUE ANALYSIS:
This analysis is used when inputs have ranges, limits, or thresholds. It tests the following boundary values:

              1. Lower boundary,
              2. Upper boundary,
              3. Just below the lower boundary,
              4. Just above the upper boundary.

(FOR EXAMPLE 1):
A TEXT BOX ACCEPTS 1 TO 10

        VALUE        RESULT

          0           Fail
          1           Pass
          2           Pass
          9           Pass
         10           Pass
         11           Fail

( FOR EXAMPLE 2):
ATM WITHDRAWAL: 500 TO 10,000

        VALUE        RESULT

        499           Fail
        500           Pass
        501           Pass
       9999           Pass
       10,000         Pass
       10,001         Fail.

( FOR EXAMPLE 3) :
A password must be 6 to 12 characters

     VALUE              RESULT

     5                  Fail
     6                  Pass
     7                  Pass
    11                  Pass
    12                  Pass
    13                  Fail

These are the examples for boundary value analysis.
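These boundary values translate directly into automated checks. A minimal Python sketch using pytest, where accepts() is a stand-in for the text box from Example 1:

import pytest

def accepts(value):
    """Stand-in for the text box that accepts 1 to 10."""
    return 1 <= value <= 10

@pytest.mark.parametrize("value,expected", [
    (0, False),   # just below the lower boundary
    (1, True),    # lower boundary
    (10, True),   # upper boundary
    (11, False),  # just above the upper boundary
])
def test_boundaries(value, expected):
    assert accepts(value) == expected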

DECISION TABLE TESTING:

Decision table testing comes under black box testing. It is used when the output depends on multiple conditions; the table lists the conditions and every possible combination of them.

(FOR EXAMPLE 1): LOAN APPROVAL
CONDITIONS:
1. Salary >= 30,000
2. CIBIL >= 700

         SALARY        CIBIL         RESULT

          yes           yes           approved
          yes           no            rejected
          no            yes           rejected
          no            no            rejected

(FOR EXAMPLE 2): PASSWORD LOGIN
CONDITIONS:
1. Password correct
2. User is not blocked

   PASSWORD CORRECT         USER BLOCKED          LOGIN

      no                        no                 fail
      no                        yes                fail
      yes                       no                 login
      yes                       yes                fail

(FOR EXAMPLE 3): ATM WITHDRAWAL
CONDITIONS:
1. Sufficient balance
2. Correct PIN

   SUFFICIENT BALANCE      CORRECT PIN        WITHDRAW

      yes                     yes               valid
      yes                     no               invalid
      no                     yes               invalid
      no                     no                invalid
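Decision tables map just as directly onto code and tests. A minimal Python sketch of the ATM example, where can_withdraw() is an illustrative stand-in for the system under test:

def can_withdraw(sufficient_balance, correct_pin):
    # Withdrawal is valid only when every condition in the table is met.
    return sufficient_balance and correct_pin

# The four rows of the decision table above:
assert can_withdraw(True, True) is True
assert can_withdraw(True, False) is False
assert can_withdraw(False, True) is False
assert can_withdraw(False, False) is False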

THE FUTURE OF MANUAL TESTING IN THE AGE OF AI:
The world of software testing is changing rapidly, driven by the rise of AI and automation. The future isn't a battle between humans and machines: manual testers are adapting to the changes, combining their unique human insight with the power of AI.

HOW AI CHANGES MANUAL TESTING:
1. HUMAN INSIGHT AND ITS IMPORTANCE,
2. AI ASSISTED TESTING,
3. AI ASSISTED SPECIALIST.

HUMAN INSIGHT AND ITS IMPORTANCE:
Humans can judge how software feels to use, bringing user experience, intuition, and creativity that AI cannot replicate.

AI ASSISTED TESTING:
AI assists with creating and updating test cases and with error guessing.

AI ASSISTED SPECIALIST:
New roles such as UI and UX testing consultant are emerging. This opens opportunities for freshers who are searching for jobs and helps them find answers to questions in the field.

HOW THE TESTER OF THE FUTURE SHOULD BE:

           1. CONTINUOUS LEARNING
           2. COLLABORATION WITH AI
           3. NEW ROLES

CONTINUOUS LEARNING:
It is essential to learn automation and AI tools; testers who use AI will be ahead of those who don't.

COLLABORATION WITH AI:
Testers will engage in tasks like creating test cases and designing test plans using AI, and this will save time.

NEW ROLES:
New roles such as AI-assisted testing specialist and UI testing consultant will emerge.

SUMMARY:
Manual testing will not disappear; instead, it will evolve by integrating with AI, expanding testers' skills and taking them to the next level in their careers.

AWS re:Invent 2025 - From Code to Policies: Accelerate Development w/ IAM Policy Autopilot (SEC351)

2025-12-06 18:36:16

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - From Code to Policies: Accelerate Development w/ IAM Policy Autopilot (SEC351)

In this video, AWS Principal Product Manager Kevin Luo and Principal Engineer Luke Kennedy introduce IAM Policy Autopilot, a newly launched open-source CLI tool that uses static code analysis to generate IAM policies deterministically from application code. They demonstrate building a multi-tenant S3 bucket creation microservice, comparing traditional approaches like AdministratorAccess, AWS managed policies, and handcrafted policies with their limitations. The tool analyzes code locally, maps SDK operations to IAM actions using AWS's published service reference data, and generates policies with 97% fewer permissions than typical developer policies. They showcase integration with Claude via MCP server, demonstrating automated policy generation, CloudFormation deployment, and a "fix access denied" workflow that enables coding assistants to self-correct permission errors. Current limitations include identity-only policies and manual resource scoping, with automatic resource identification planned.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Introduction: Building a Multi-Tenant Microservice with IAM Challenges

All right, welcome everyone. You made it to the Mandalay Bay. For folks who had trouble finding it, you made it. My name is Kevin Luo. I'm a Principal Product Manager at AWS Identity, and I'm joined by Luke Kennedy. I'm a Principal Engineer on the AWS IAM team. Excellent.

Now, before we get started, even though this is a 300 level session, I want to get a sense of who's in the room. By show of hands, who's new to AWS? You're a new builder, like in the last year or so. All right. Then, are you an experienced builder? So you kind of know your way around, you can write IAM policies. Show of hands. Great. And then how many of you can write IAM policies with your eyes closed? You just know them. We got one hand there, two hands. Luke didn't raise his hand and he helped build IAM, right?

I've been building on AWS for 10 years now, and even I get hung up on it. Today we're going to talk to you about IAM Policy Autopilot. It's a new tool we just launched that can help accelerate your workflows so that you can get started faster with IAM when you just want to build. So this is a cool talk. We're going to jump into the scenario here.

Today we're going to build a microservice that creates and encrypts S3 buckets for a multi-tenant service. This is really important because when I have a multi-tenant service and I onboard customers, I want to make sure that there's data isolation between them. I want to create a bucket just for that customer, and then I want to make sure I have a KMS key just for that customer's data. What we're going to do is when a new customer signs up, we'll create a dedicated KMS key for this customer, we're going to create a bucket, and we're going to configure the bucket to use their KMS key.

From an architecture perspective, it's going to be really simple. We'll have a Lambda function. It's going to accept a customer ID and name as a parameter. The first thing we're going to do is call KMS and create a key there. Then we'll create a bucket, and then we're going to call the encrypt bucket function. So with that, let's jump into the code.

Walking Through the Lambda Function Code and Initial Permission Failures

All right, so I have my IDE open here, and I'll walk you folks really quickly through the code so you know what's going on. Here we have a Python-based Lambda function. At the top, I'm importing some of the clients that I'm going to need. I'm using the boto3 SDK. Here you can see that we're accepting a customer ID and customer name as a parameter for the event. The first thing we're going to do is we're going to check if the bucket exists because I don't want to create multiple buckets for that same customer.

We're going to keep the code simple. There's a lot of things we could do to make this super robust, but we're going to keep the code simple here. I'm calling HeadBucket, and if it exists, then we'll just return early. Otherwise, we're going to go ahead and create a key, a KMS key, and we're doing a few different things here. We're also adding some tags so it's easy for me to look it up later.

I'm passing in a key policy to determine who can manage the key. I'm going to call CreateAlias just to make it easier to reference. And then now I'm going to go ahead and call CreateBucket. The very last thing we need to do is call PutBucketEncryption to make sure that we're using the key we just created to encrypt all objects in that bucket. So fairly simple.

If I'm getting started, let's just say I'm going to go create a Lambda function. I'll go to the Lambda console here, and we're going to create a function. We'll call this Sec351-Demo. It's a Python function. We're going to go create. Then while that's creating, we're just starting with the default execution role here. We'll cover that in just a little second.

All right, so I'm going to paste my code in. I'm going to deploy, and then what I'm going to do is create a test event so that we could see if it works. As we mentioned, it's going to accept a customer ID and the customer name. Let's do, I don't know, ACME123.

Let's name it and save it so it's easier for us to reference, and then we're going to test it. Of course, it fails, right? Because this Lambda function needs permissions. Here it's saying there was access denied when calling the CreateKey operation because there's no KMS TagResource permissions.

So what does this mean? If I go here and look at the execution role, we could do a quick primer on what this means. For those who are newer to AWS, IAM stands for Identity and Access Management and controls who has access to what across everything. In this case, we have a Lambda function which needs an execution role, which is really just an IAM role. Then we're going to define IAM policies and attach them to that role to determine what this Lambda function has access to. In our case, we need this Lambda function to have explicit permissions to check if the bucket exists, create the KMS key, create a bucket, and then configure bucket encryption.

The Pitfalls of AdministratorAccess and AWS Managed Policies

If we go back here, then we can go and modify this execution role. This comes with some basic stuff, right? This Lambda function has permissions to go to CloudWatch Logs and create those logs so we could debug. Now we're here at the screen. How many of us kind of just use AdministratorAccess here? I'll just expand this. Show of hands, it's okay, be honest. You're just getting started. Nobody uses AdministratorAccess? Not for what? Not for a Lambda, right? Because it's really broad.

Well, when I got started for the first time, I'm like, hey, I want to just get this thing to work, and so I used to do this. Yeah, it's super broad, right? It allows star star. But there's some benefits. If you're just prototyping something, you're testing something, it has permissions to everything. So when I make an update, maybe I want to use a different service or I want to add DynamoDB to this, I don't want to constantly update my permissions. This is nice in the sense that it reduces some work, but we have to always scope down earlier, and we all remember to do that, okay.

So then what's the other option? The other option is you can use some of the other AWS managed policies that are scoped down some more. We use S3, and so I could look for S3 here. How many folks use AWS managed policies as a starting point for these policies? Okay, I see about a third of the room. AWS managed policies are a much better starting point. Here, if I'm looking at S3, well, I know I need to create buckets, right? If I look at my options here, I have read-only access or full access. These other ones are related to different parts of S3. If I look at read-only access, well, I can't create a bucket. So then my only other option here is I could start with read-only and then add the specific permissions I need, right? But now I'm jumping in between things.

Or maybe I start here. Okay, it's a better starting point. It's not the whole world, but it's still pretty broad, and we have to scope it down. Let's just say for argument's sake, we use AWS managed policies. I just want to get started, so we'll do S3 full access. Then we use KMS. I'm going to add the KMS managed policy that I have available here. Now the function should work, right? So let's go to the test tab. And it worked. Uh oh.

Let's see what we have here. We have an error message, and it says it's a malformed policy document exception when calling the CreateKey operation. The new key policy will not allow you to update the key policy in the future. Who knows what happened here? Does anyone know the root cause? Right, okay, so then we got to kind of search around. Oh, you raised your hand. I need to have a resource-based policy on a key? Not quite right. We have to kind of search around and figure this out. I'll save you some time. It's because we're missing an IAM permission.

If we go back to the code, when we call CreateKey, we're passing in a policy. This is saying, hey, when I create this key, I want it to have this policy by default. But to do that, you need a KMS PutKeyPolicy permission.

And so what this error message is saying is it's warning you about a lockout. You're saying, hey, because I couldn't put this key policy, you're going to run into a situation where you're going to be locked out and you can't manage this key anymore. And so that's what this error message is saying. But if you're just getting started and you're not familiar with this, it can take a lot of time to debug.

And so just to show you, I'm going to go ahead and now I'm going to have to create an inline policy because what we're seeing here is that AWS managed policies don't necessarily have all the permissions you need. There's a lot of permissions out there, and so sometimes you have to add the ones you know. And so here I'm going to go to KMS and this is about permissions management. So I'm going to add the PutKeyPolicy, and I'm going to scope it. And we're going to click next.

Alright, so now if I go back, it might need some time to propagate. Let's see what happened here. Let me just change this just in case, but I think it should work. Next, save. That's weird. Well, oh, create alias. OK, so it's because the key already exists. So let me just change the inputs here so we're starting fresh.

The specified bucket is invalid. What is going on? Is it too small? OK, yeah, I can zoom in. Specified bucket is invalid. Do you have a space in your name? I have a space, but I wasn't sure in the code. I'm sorry. No, no, in the inputs, sorry, but it looks like you're right. Yeah, that's weird. Well, for the sake of this, this is how you know it's a real demo, folks. It's not scripted. Yeah, so just trust me, it kind of works there.

Testing Coding Assistants: When LLMs Generate Incorrect IAM Policies

And so, OK, what are the other options? So let's see, it's 2025 and we're all using coding systems. I don't need to handwrite my policies anymore. I have coding systems that do that for me. And so how many of us here use coding systems? It's pretty much everyone, right? I love it. As a product manager, it's really great because it's allowed me to kind of get my ideas and build prototypes and really interact with my engineering teams. And so that's been really great.

And so now, OK, let's try using a coding system. And so I have my code here, and I have Kiro. I'm going to say, hey, generate an AWS IAM policy for my code. And so what is Kiro going to do here? So it's going to analyze my code and it's going to generate a policy. So it's reading this Python code. This file acts as restricted to a workspace attempted path. Hm, oh, it's great. What is it doing?

OK, I think I figured out a different way to read it. So we have an IAM policy here, and it looks like it's got stuff. So let's take this, and then we're going to go and modify that role. So I'm going to delete the stuff we just added, and then we're going to add an inline policy. And so I'm just going to paste in what I have. And so this should, this is the Kiro generated policy. And here we see something interesting.

So when you use the IAM console, it comes with policy validation built in, which is really great. So it tells us, hey, is all this stuff real, right? Or if I'm doing like an anti-pattern. So we look at these error messages, what do we see here? It says the action S3 HeadBucket, I'll zoom in a little bit more, the action S3 HeadBucket does not exist. And indeed it does not exist. When you call HeadBucket, it's actually a ListBucket for the permission. This other one is interesting. PutBucketEncryption, that seems like it should work.

But actually, the IAM action is PutEncryptionConfiguration. And so if we think about why this is, why do coding assistants and LLMs tend to struggle with AWS IAM? They're trained on public information like documentation, code examples, and so on. And based on that, they learn patterns. And this is actually what humans do too. We learn patterns. Oh, I see the CreateKey method call. So therefore, the IAM action is probably KMS CreateKey. And that's a reasonable assumption to make most of the time. But then they don't understand things like hidden dependencies. So we talked a little bit earlier about CreateKey, and I put a policy as a parameter. Oh, it also needs the PutKeyPolicy action. Or there are these naming inconsistencies like PutBucketEncryption, that's the method call. But the action is PutEncryptionConfiguration. And then it's not always the case that one method call requires only one IAM action. Sometimes it requires more than one. For example, S3 CopyObject, you need, initially, you might write, oh, the permission is S3 CopyObject, but it's actually two. You need GetObject and PutObject, which makes a lot of sense.

So if we recap our options here, we have option one, which was administrator access. And for those who are really just kind of, hey, I'm just trying to blaze through, I'm in a sandbox environment, great. It's the fastest way to get started. And when I'm changing my code, I'm adding different services, I'm changing my mind, I'm not going to need to touch my permissions again. It just works. But of course, we know this is super broad, has powerful administrative permissions. And the security team wouldn't like that, right? But it's okay. I'm going to scope this down later. And we all remember to do that.

Okay, so then option two, we talked about AWS managed policies. And this is also fast, if you know what you're looking for, you know what services you're using, you know if you're using read or write permissions, or you're using permissions that require permission updates. And this is definitely a better starting point than star star. Now it comes with some trade-offs, right? When you're making code updates, you're adding new services, or you're changing, like, maybe I want to read from the S3 bucket now. Well, now you might need to change which AWS managed policies you use. And they also, we also learned they don't necessarily cover all the possible permissions like KMS PutKeyPolicy here. And so these are great. It's a good starting point, but it's still broad. And we definitely all remember to scope this out. How many people actually go and scope down these policies if they write for those who raised their hand on managed policies? Do we go and scope that down later? I see a lot fewer hands. Don't worry, I won't tell the security teams.

Okay, and so then we have option three, artisanal, handcrafted policies. And this is great. You know, it's closest to least privilege. Luke can write them with his eyes closed. But it's time consuming, if you're not like Luke and you didn't help build IAM, it's time consuming to research the exact permissions that you need. And again, we went through that kind of debugging flow there. And then also, every single time because it's least privilege, every single time you do something that requires additional permissions, you have to go back and modify these permissions. And so, sometimes this can slow things down, and you just want to build. Of course, LLMs solve that, right? It's fast. And a lot of times it does work until it doesn't. And then that continues to add to that debug cycle. Additionally, they're not always deterministic. They don't always generate the same policies. And that, you know, when it comes to access management, you really want some determinism there.

Introducing IAM Policy Autopilot: Deterministic Policy Generation Through Static Analysis

And so I want to introduce you to something we just launched on Sunday, and we made open source. It's called IAM Policy Autopilot. It's a command line tool that can statically analyze your code and generate policies based on what your code is doing. We've also exposed it as an MCP server so that your coding systems have access to the same deterministic policy generation capabilities. So let me talk a little bit about how it works.

So step one: we use static code analysis to parse your code, and we do this all locally on your machine. Here we have the Lambda function on the left, and we pick out all the SDK operations: HeadBucket, CreateKey, CreateAlias, and so on.
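
The Lambda function from the slide isn't reproduced here, but a script in its spirit might look like the following. All names are hypothetical; the point is simply which SDK operations the parser would pick out.

```python
# Hypothetical stand-in for the customer-onboarding Lambda on the slide.
# Static analysis would pick out the SDK operations: head_bucket,
# create_key, create_alias, create_bucket, put_bucket_encryption.
import boto3

s3 = boto3.client("s3")
kms = boto3.client("kms")

def handler(event, context):
    customer = event["customer_id"]
    bucket = f"onboarding-{customer}"

    # Does the bucket already exist?
    try:
        s3.head_bucket(Bucket=bucket)
        return {"status": "exists"}
    except s3.exceptions.ClientError:
        pass

    # Create a customer-specific KMS key and a friendly alias for it
    key = kms.create_key()
    kms.create_alias(
        AliasName=f"alias/onboarding-{customer}",
        TargetKeyId=key["KeyMetadata"]["KeyId"],
    )

    # Create the bucket and default it to SSE-KMS with the new key
    s3.create_bucket(Bucket=bucket)
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": key["KeyMetadata"]["Arn"],
                }
            }]
        },
    )
    return {"status": "created", "bucket": bucket}
```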

The next step is to map these operations to IAM actions. To support that, we've published our IAM service reference; I have a URL here, and if you type that in, you'll get a big JSON blob of all the IAM actions and condition keys for all the different services. That's super helpful reference data on its own, and over the last month we launched an enhancement that adds operation-to-action mappings.

Here's a JSON snippet of one of those mappings. It says: we have this boto3 CreateKey operation, and here are all the actions associated with it. When I call create_key from boto3, I may need the kms:CreateKey action, the kms:PutKeyPolicy action, and the TagResource action. I don't always need all of them, but this is the outer boundary of what you might need when you call that operation, and it's based on actual service data that we have.
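
The published schema isn't reproduced here, but the mapping described is roughly this shape, expressed as a Python dict with invented field names:

```python
# Illustrative shape of one operation-to-actions mapping entry.
# Field names are invented for this sketch; consult the published
# service reference for the real schema.
create_key_mapping = {
    "operation": "CreateKey",   # the SDK method call
    "actions": [
        "kms:CreateKey",        # always required
        "kms:PutKeyPolicy",     # when a Policy parameter is passed
        "kms:TagResource",      # when tags are passed
    ],
}
```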

The last step is to combine it all. We've got the method calls and we have our lookup, so we can map out all the actions we need and generate a policy deterministically from that. Now, I will say the policy on the slide is not a real one; I needed it to fit on one screen. But it illustrates that from this, we can now generate a policy.
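
Conceptually, this combination step is a pure function over the parsed operations and the lookup table. Here is a minimal sketch of the idea; it is my illustration, not the tool's actual implementation (which is written in Rust), and the mapping entries are illustrative:

```python
# Minimal sketch of deterministic policy generation: same operations
# in, same policy out, every time. Illustrative, not the real tool.
OPERATION_TO_ACTIONS = {
    "HeadBucket": ["s3:ListBucket"],  # note: no s3:HeadBucket action exists
    "CreateKey": ["kms:CreateKey", "kms:PutKeyPolicy", "kms:TagResource"],
    "CreateAlias": ["kms:CreateAlias"],
}

def generate_policy(operations):
    # Sorting makes the output order stable, so identical inputs
    # always yield byte-identical policies.
    actions = sorted({
        action
        for op in operations
        for action in OPERATION_TO_ACTIONS.get(op, [])
    })
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }

print(generate_policy(["HeadBucket", "CreateKey", "CreateAlias"]))
```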

There are three key benefits to this approach. First, it's deterministic: give it the same code, and it generates the same actions and policies every single time, which is really valuable when we're thinking about access management. Second, it's up to date, because it uses the live service reference data we publish. Think about coding assistants: their training data was cut off around last Thanksgiving, something like that. There are tools that can help keep them current, but here, we just launched a whole bunch of services, and all of those actions are already available in the service reference data. It has live, up-to-date information.

Third, it's reliable, because we're using that service reference data. You're not going to see s3:HeadBucket show up as an action here; it's all based on real actions. Now, we're just getting started, and I do want to call out some limitations you should be aware of.

Today we only generate identity policies, so we don't support cross-account use cases where you might need both a resource-based policy and an identity policy to grant access to, say, an S3 bucket. And as I touched on earlier, these policies are not necessarily the most shrink-wrapped, least-privilege ones: we do the action mapping, and these are typically the actions associated with the method call. Our intention is to give you a better starting point than the broad options we looked at.

When we looked at some internal data, we compared the policies developers actually wrote against the permissions IAM Policy Autopilot generated for the same code, and found the tool generated just 3% of the permissions that are on live running code today. That's meaningful: what's out there is roughly 97% overprivileged, even though we all have good intentions.

The other limitation is that static analysis today doesn't automatically identify resource names. You do want to scope your resources down to specific ones: I don't want to be able to read from every single S3 bucket in this account. Depending on where you are in your journey, that could be something you do later or right in the moment, and it's something we definitely want to add soon.
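
For example, tightening a generated statement is a by-hand edit today; something like this, with a hypothetical bucket name:

```python
# Scoping a generated statement down by hand, from a wildcard to a
# specific resource ARN (bucket name is hypothetical).
statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "*",  # as generated today
}
statement["Resource"] = "arn:aws:s3:::onboarding-acme/*"  # scoped down
```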

Finally, there's the operation-to-action mapping. It catches the vast majority, but there are edge cases where complex cross-service dependencies might be missing. For example, when you read from an S3 bucket that's encrypted, you also need KMS permissions, so there are cross-service mappings involved. We're continuously adding to these.
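
A policy covering that S3-plus-KMS example would need statements in this spirit; the ARNs are hypothetical:

```python
# Reading from a bucket encrypted with a customer-managed KMS key
# needs both S3 and KMS permissions; ARNs below are hypothetical.
cross_service_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::onboarding-acme/*",
        },
        {
            "Effect": "Allow",
            "Action": "kms:Decrypt",  # required to read SSE-KMS objects
            "Resource": "arn:aws:kms:us-west-2:111122223333:key/example-key-id",
        },
    ],
}
```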

Live Demo: Setting Up IAM Policy Autopilot with Claude and Deploying via CloudFormation

I've done enough talking now, so let's actually see it in action. I'm going to pass the mic to Luke, who will take over and start by setting up IAM Policy Autopilot. Today we're going to use Claude, and first we'll get rid of the policy we know is wrong and the session that gave us the wrong data. But before that, we need to set up IAM Policy Autopilot, so we're going to pop over into our terminal.

We distribute IAM Policy Autopilot in a variety of ways, but the easiest way to get started is just pip. So we'll do a pip install of IAM Policy Autopilot, and it's that easy. Now I can just run `iam-policy-autopilot`, and there it is. As you can see, it exposes functionality right here in the CLI, but today we're going to focus on using it with a coding assistant in your IDE.

The first step is setting it up as an MCP server, and every coding assistant has its own way of configuring MCP servers. For Kiro, you click the little ghost icon, come down to modify your MCP servers, and you get a nice configuration file, where we just paste in IAM Policy Autopilot. For those familiar with pip, you may also know the uvx tool, which lets you run pip-installable applications. I'm using uvx here so I get its nice update behavior: it refreshes the installation every time the server runs. I save this, and you can see on the left that it connected and found all of my tools. IAM Policy Autopilot is now available to Kiro as a set of tools it can use.
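
MCP configuration files are JSON; the rough shape of an entry like this one, shown as a Python dict to keep a single language in this article, is something like the following. The server name, package name, and args are illustrative assumptions; check the project README for the exact configuration.

```python
# Rough, illustrative shape of an MCP server entry (the real config
# file is JSON). Names and args here are assumptions, not the
# project's documented configuration.
mcp_config = {
    "mcpServers": {
        "iam-policy-autopilot": {
            "command": "uvx",                  # re-resolves the package on each run
            "args": ["iam-policy-autopilot"],  # hypothetical entry point
        }
    }
}
```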

Let's go back to our code. Nothing has changed here, but we're going to generate a new policy now, so get rid of that session and start over. I'm going to ask it once again to please create an IAM policy for this script. This is the fun part of the demo, because this is the non-determinism you get with anything model-based, so we're hoping Kiro behaves today and sticks to our script. It starts, reads our file, and says: hey, I want to scope this appropriately, so tell me what region and account ID you want to use. We're going to use us-west-2, and my account ID is saved right here.

And so it's chosen to call IAM Policy Autopilot and pass these parameters. We're just going to vibe today, not really read those, and just say run. All right, it spits out this policy, with a short description. Would you like me to save this file? Yeah, sure, please. Let's look at these permissions. We've got the S3 permissions at the top, and most importantly, I don't see HeadBucket; I don't see any permissions that don't exist. It did give me the full closure of S3 permissions I may need based on the script, but I know they're all correct and they're going to work. Here are my KMS permissions, and you can see the region and account ID are filled in.

The resources are wildcards because these are all create operations, so we can't scope them to a specific thing. But now we have our policy. To test it out, instead of going the long way, I'm going to use CloudFormation. I encourage everyone to use some type of infrastructure-as-code setup to manage the entities across their account; today I'm going to use CloudFormation, and Kiro's going to write it for me. Oh, sorry, I need to accept those changes first. Okay, now Kiro has everything it needs. It has the code, and it knows that if it's going to make a Lambda, it needs an IAM role to run it, so it's probably going to create that role too.

Let's see what it spits out. All right, here's my code, great. And here is my policy; as we can see, it's exactly the same policy we generated. It just took it in from the file and put it in here. With CloudFormation, you can fill in variables and do substitutions for created resources. Our particular policy didn't need that because we're doing creates, but this is one of those cases where infrastructure as code plus IAM Policy Autopilot gives you a nice better-together setup: it moves you toward least privilege on the specific resources you need, because it knows to fill in the variables at creation time. So we have this now, and we're going to ask it to deploy our template.

Alright, let's go. While that's running, we can talk a little bit about the CLI. As Kevin mentioned, IAM Policy Autopilot works as a CLI that also happens to expose an MCP server endpoint. Used as a CLI, it works just the same: I can run `iam-policy-autopilot generate-policies` and, since I'm in the same directory, just point it at my customer onboarding script. I run that, and it immediately spits out that same policy.

Let's make it a little easier to read so everyone can see it is, in fact, the same. That's the determinism you get from this tool: it always gives you the same policy for the same code. The CLI also makes it easy to integrate into any CI flows you may have. If you have an existing flow that takes your code and builds it, you can now have it spit out your policies as you go.

Alright, let's go back to CloudFormation. It says it's still running, but, oh, it just finished; let's check on it just to be sure. It's good, perfect. Now we need to test that Lambda function to make sure it works, because I've made these claims that we have the right permissions. Let's back those claims up and invoke the function. It's going to ask to run the Lambda, and we say run. You can see here in the terminal that it ran the CLI command: status 200. And Kiro wants to run it again for some reason. We don't know why, but we'll just tell it to stop.

Iterating with SQS Integration: Agentic Workflows for Fixing Access Denied Errors

Okay, so that's great: I had some code, I ran it, it worked, and the function was good. But now let's say I want to iterate on this code. Instead of just creating the key and the bucket, I also want a notification when it's done, because I want to trigger further events from it. So let's add SQS as a notification queue. I've got another script here I'm going to pull over, because I don't have time to type this. It's the same script, but I've added an SQS client, given it a queue URL, and at the very end, after we create the bucket and put the KMS configuration on it, we send a message to SQS saying we just created all the infrastructure we needed for this new customer, along with the basic identifiers for it.
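
The added lines are in the spirit of this sketch; the queue URL and message fields are hypothetical placeholders:

```python
# Hypothetical sketch of the SQS addition: notify a queue once the
# customer's key and bucket have been created. Queue URL and message
# fields are placeholders.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/111122223333/onboarding-events"

def notify_created(customer_id: str, bucket: str, key_arn: str) -> None:
    # Requires sqs:SendMessage on the queue, which is exactly the
    # permission the original policy is missing in the demo.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "customer_id": customer_id,
            "bucket": bucket,
            "kms_key_arn": key_arn,
        }),
    )
```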

So from here, we go back to our chat and say: please generate a CloudFormation template for, if I can spell, and apologies, the current file. It's going to go through the same process again and create my template. Now, I didn't explicitly ask it to update the permissions, so it will probably just pick up the IAM policy JSON it used before. I'll wait for the agent to do its thing. Still going. Okay, unfortunately, Kiro was very helpful and decided to add the missing permission. Let's unfix that: we'll come down here and just remove it. We all know it's needed, but for the sake of argument, let's say Kiro decided not to touch our policy. Okay, and now please deploy.

It'll go through again with the new change set. While it's running: in this case Kiro did add the right permissions, but we ran this dry run, what, ten times, Luke? At least two dozen. Going back to that non-determinism, it added them some of the time, but a lot of the time it didn't, so you're really coming down to chance on this.

So here we're going to simulate what happens if it didn't add that permission. And sometimes it's us; we're all human too, right? Sometimes we deploy something and go, oh, I forgot to update the execution role. IAM Policy Autopilot includes additional functionality to help you with this as well.

So we've got a case where CloudFormation is just taking its time; it's kind of stuck, and I have a feeling that stack will never finish, so we're going to try from a different template. A little bit of a cooking show, if anyone's watched one: here's one we pulled from the oven earlier. All right, here we go.

It's deploying, and we know this is going to break. The template is going to create an IAM user and an IAM role, and that role will not have the required SQS permissions. That's intended. So we're going to execute the Lambda function, which we already deployed. Okay, let's run it.

Okay, we can see there's an error here: the KMS key already exists. Let's try a different input. So now we're using customer bouncy pancake, and look, we got a message saying, as expected, we do not have permission to do this thing. This is where IAM Policy Autopilot has additional functionality: it can look at these messages and say, hey, you don't have permission to do this, why don't I fix that for you? That's how we can do a full agentic flow with this, by having the ability to analyze: I got an error, can you go fix it for me?

At first it chose to modify the template instead, so let's try this again. Okay, now it realizes, hey, I have a tool to fix these errors, so let's run that. The tool looks at an access denied message and says: I know what this resource is, I know what this principal is, I can generate a policy that fixes this and puts all the necessary permissions in.

This is a simple case because it's just SQS SendMessage, but in a more complex case with dependent actions, IAM Policy Autopilot would add those too. Now I'm going to tell it to deploy to my account; it's going to invoke a tool that, apparently, I just trust all the way. And now it has deployed a policy in my account that grants those additional permissions.
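
For this simple case, the generated fix amounts to a statement like the following; the queue ARN is a hypothetical placeholder:

```python
# Illustrative shape of the fix for this access denied: a statement
# granting sqs:SendMessage on the specific queue (ARN is hypothetical).
fix_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:us-west-2:111122223333:onboarding-events",
    }],
}
```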

So it wants to test the function again; we say run. Let's get to the bottom here and, oh, now it passed: there's our valid response coming out of Lambda. It's worth talking about why this is really useful. Here we had a one-off instance, but with these agents, you want them to go run off by themselves. I want to say, hey, go complete this task for me, then walk away, grab a cup of coffee, and come back.

It's easy enough for a human to look at an error message and go, okay, I need to add this missing thing. But if I'm walking away, I'm not supervising; I want my coding assistant in Kiro to be able to do that for me, and now it has the tools to do it. And sometimes you have layered access denied errors, right? One shows up, you fix it, you run it, another shows up. For humans, that's tedious. But by exposing this as a tool, the coding assistant can go and unstick itself. That's the power, and it's why we expose this capability as one of the tools.

Recap and Call to Action: Open Source Development and Future Improvements

Okay, cool. So let's do a quick recap. We showed you how to use IAM Policy Autopilot to generate policies from your code. We showed that when you combine it with a coding assistant and infrastructure as code, it can do things like variable substitution for scoping down your resources; there's a really great better-together story. We also walked through the fix-access-denied workflow, which empowers your coding assistant to get itself unstuck, using the same tools we ourselves would use when we see an access denied error.

We did this all without broad permissions like administrator access, AWS managed policies, or even just star. It was deterministic, it only used real IAM actions and services, and it's up to date. Now, we're just getting started: we only support identity policies today, and the output isn't guaranteed to be least privilege. There's still some scoping down to do, but you have a much better starting point.

We talked about how IAM Policy Autopilot generates just 3% of the permissions that developer-written policies grant today. We're working on adding static analysis support to automatically identify your resources, so it will only get better over time. And I have a call to action: we open sourced this for a reason. We want your feedback, and we want to do the development in the open.

Please go visit the GitHub repository and clone it. We built it in Rust, which we love very much. If you have feedback or questions, feel free to engage with us there. We're always planning improvements, so keep an eye out. All right.

So with that, I have my obligatory thank you. Luke and I will hang out here if you have one-on-one questions, happy to take them, or we can do a group discussion up front. Otherwise, this is your reminder to please complete the survey in the mobile app.

This article is entirely auto-generated using Amazon Bedrock.