2025-12-01 11:08:56
In this article, we will explore how to transition from a monolithic architecture to a microservices architecture without refactoring any existing code. We will leverage the power of the Onion/Clean architecture to achieve this goal.
In the past I have written extensively about the Onion/Clean architecture, so if you are not familiar with it, I recommend reading the following articles first:
Implementing SOLID and the onion architecture in Node.js with TypeScript and InversifyJS
Build HTTP APIs with Dependency Injection in TypeScript — Meet the Inversify Framework
Enforce Clean Architecture in Your TypeScript Projects with fresh-onion
To begin, we will start with a monolithic application that follows the Onion/Clean architecture principles. This application will have a well-defined separation of concerns, with distinct layers for domain models, domain services, application services, and infrastructure.
When working on a greenfield project, you should always implement a monolith first, even if you plan to transition to microservices later. It is very hard to predict the right service boundaries upfront.
A new project carries major risks because there are many unknowns, both in the technical implementation and in the business requirements. A microservices architecture also adds complexity of its own, such as inter-service communication, data consistency, and deployment strategies. As an engineer, you want to reduce your exposure to risk as much as possible. Starting with a monolith allows you to focus on building the core functionality of your application without the added complexity of managing multiple services from the outset.
Note: Service boundaries are the boundaries that define the scope of a microservice. They determine what functionality and data a microservice is responsible for.
In my case, I started working on a project that I knew I would eventually want to split into microservices, so I began with a monolith that followed the Onion/Clean architecture principles. Because the split was planned from the start, I kept the service boundaries in mind while designing the monolith: I designed the domain models and services in a way that would make it easy to extract them into separate services later on. Here are some of the rules I followed while designing the monolith:
Each boundary will use its own database schema (if using a relational database).
No database transactions, joins or foreign keys that involve multiple schemas (boundaries).
Calls from one boundary to another are always HTTP calls, even though in the monolith it would be possible to make direct in-process calls. To do this, we don't use URLs like http://localhost:8080/api/xxxx. We use service URLs like http://service-name:8080/api and rely on a reverse proxy to route requests to the correct service. In the monolith, all services run in the same process, so the reverse proxy routes the requests to that single process.
How does the reverse proxy work? In local development and in the monolith deployment, we use a reverse proxy (like Nginx or Traefik) that routes all service URLs (e.g., http://auth-service:8080/api, http://cms-service:8080/api) to the same monolith process. The service names are just DNS aliases that all resolve to the same host. When we transition to microservices, we simply update the reverse proxy configuration (or use Kubernetes service discovery) so that each service name resolves to its own dedicated container. The application code remains unchanged; only the infrastructure routing, which is configured outside of the application, changes.
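To make this rule concrete, here is a minimal sketch of what a cross-boundary call could look like from application code. The class, method, and endpoint names are hypothetical (not taken from the project); the important part is that the base URL uses a service name, so the same code works whether the reverse proxy routes the request back into the monolith process or to a dedicated container:
// Hypothetical client used by one boundary (e.g., cms) to call another (auth).
// Requires Node 18+ for the built-in fetch API.
export class AuthHttpClient {
  // In the real project this base URL would likely come from configuration / the IoC container.
  constructor(private readonly baseUrl: string = "http://auth-service:8080/api") {}

  async getUser(userId: string): Promise<unknown> {
    const response = await fetch(`${this.baseUrl}/users/${userId}`);
    if (!response.ok) {
      throw new Error(`auth-service responded with ${response.status}`);
    }
    return response.json();
  }
}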
These rules ensured that the monolith was designed in a way that would make it easy to extract the boundaries into separate services later on.
The application I'm sharing as an example is a CMS with asset management and multi-tenant capabilities. The directory structure of the monolith looked like this:
src
├── app-services // These are not aware of infrastructure details
│   ├── auth
│   ├── cms
│   ├── dam
│   ├── email
│   ├── logging
│   └── tenant
├── domain-model
│   ├── auth
│   ├── cms
│   ├── dam
│   └── tenant
├── index.ts // Monolith composition root
└── infrastructure // These are aware of infrastructure details
    ├── blob
    ├── db
    │   ├── repositories // db queries implementations
    │   │   ├── auth
    │   │   ├── cms
    │   │   ├── dam
    │   │   └── tenant
    │   └── db.ts
    ├── email
    ├── env
    ├── http
    │   ├── controllers // http layer implementations
    │   │   ├── auth
    │   │   ├── cms
    │   │   ├── dam
    │   │   └── tenant
    │   ├── middleware
    │   └── server.ts
    ├── ioc
    │   ├── index.ts
    │   └── modules // IoC modules for each boundary
    │       ├── asset-management-ioc-module.ts
    │       ├── auth-ioc-module.ts
    │       ├── infrastructure-ioc-module.ts
    │       ├── template-management-ioc-module.ts
    │       ├── content-management-ioc-module.ts
    │       └── tenant-ioc-module.ts
    ├── logging
    └── secrets
As you can see, the directory structure is organized as a monolith: there are no separate root directories for each microservice. The onion architecture splits the application into layers, and each layer has its own responsibility. The infrastructure layer is responsible for implementation details such as database access, email sending, and logging. The application services layer is responsible for the business logic of the application. The domain model layer is responsible for the domain entities and value objects.
Note: If you want to learn more about the Onion/Clean architecture, I recommend reading my previous articles linked in the prerequisites section.
The layers remain decoupled from each other at design and compile time. However, at runtime, the inversion of control (IoC) container resolves the dependencies and wires everything together. To achieve this, the IoC container needs to be aware of the links between interfaces and their implementations across all layers. These links are known as bindings.
It is important to understand and highlight the Inversion of Control (IoC) principle here. The IoC principle states that the control of the flow of the application should be inverted. In other words, you stop deciding "when" and "how" an object gets its dependencies. Instead, something external (in our case, the IoC container) gives (injects) the dependencies to your object. The IoC container is responsible for resolving the dependencies and wiring everything together at runtime.
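As a minimal illustration of dependency injection with InversifyJS (the class, interfaces, and symbols below are hypothetical, not taken from the project), a service simply declares its dependencies in the constructor and the container injects them at resolution time:
import { inject, injectable } from "inversify";
// Note: "reflect-metadata" must be imported once at the application entry point,
// as shown later in the article.

// Hypothetical symbols; in the project these live alongside the interfaces they identify.
const UserRepositorySymbol = Symbol.for("UserRepository");
const LoggerSymbol = Symbol.for("Logger");

interface UserRepository {
  findById(id: string): Promise<unknown>;
}
interface Logger {
  info(message: string): void;
}

@injectable()
class GetUserService {
  // The service never constructs its own dependencies; the IoC container supplies them.
  constructor(
    @inject(UserRepositorySymbol) private readonly users: UserRepository,
    @inject(LoggerSymbol) private readonly logger: Logger,
  ) {}

  async execute(id: string) {
    this.logger.info(`Fetching user ${id}`);
    return this.users.findById(id);
  }
}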
With InversifyJS (my IoC container of choice for TypeScript projects) we can organize these bindings into IoC modules. Each module is responsible for binding the interfaces to their implementations for a specific boundary or layer. The following is an example of an IoC module for some infrastructure concerns:
// ...
export const infrastructureIocModule = new ContainerModule((options) => {
  const { bind } = options;
  // Singleton that manages Azure Key Vault connections
  bind<SecretsManager>(SecretsManagerSymbol)
    .to(SecretsManagerImplementation)
    .inSingletonScope();
  // Singleton that manages Cosmos DB connections
  bind<DatabaseConnectionManager>(DatabaseConnectionManagerSymbol)
    .to(DatabaseConnectionManagerImplementation)
    .inSingletonScope();
  bind<AppSecrets>(SecretsSymbol)
    .toDynamicValue(async (context) => {
      const secretsManager =
        await context.getAsync<SecretsManager>(SecretsManagerSymbol);
      await secretsManager.initialize();
      return secretsManager.secrets;
    })
    .inSingletonScope();
  bind<UnitOfWork>(UnitOfWorkSymbol)
    .to(UnitOfWorkImplementation)
    .inSingletonScope();
  bind<DbClient>(DbClientSymbol)
    .toDynamicValue(async (context) => {
      const databaseConnectionManager =
        await context.getAsync<DatabaseConnectionManager>(
          DatabaseConnectionManagerSymbol,
        );
      return await databaseConnectionManager.getDbClient();
    })
    .inRequestScope();
  bind<EmailService>(EmailServiceSymbol)
    .to(EmailServiceImplementation)
    .inRequestScope();
  bind<Logger>(LoggerSymbol).to(LoggerImplementation).inSingletonScope();
  bind<BlobStorage>(BlobStorageSymbol).to(BlobStorageImplementation);
});
This module binds various infrastructure services, such as the secrets manager, database connection manager, email service, and logger. Because these services are used across multiple boundaries, it makes sense to have them in a separate infrastructure IoC module. This IoC module can be considered a platform module, as it provides services that are used across multiple boundaries.
Then we have an IoC module for each boundary. The following is an example of an IoC module for the authentication & authorization boundary:
// ...
export const authIocModule = new ContainerModule((options) => {
  const { bind } = options;
  // Middleware
  bind<ExpressMiddleware>(AuthorizeMiddlewareSymbol).to(AuthorizeMiddleware);
  bind<ExpressMiddleware>(AuthenticateMiddleware).toSelf();
  // Controllers
  bind(AuthController).toSelf().inSingletonScope();
  // Repositories
  bind<UserRepository>(UserRepositorySymbol)
    .to(UserRepositoryImplementation)
    .inRequestScope();
  bind<VerifyRepository>(VerifyRepositorySymbol)
    .to(VerifyRepositoryImplementation)
    .inRequestScope();
  bind<ResetRepository>(ResetRepositorySymbol)
    .to(ResetRepositoryImplementation)
    .inRequestScope();
  // Services
  bind<AuthService>(AuthServiceSymbol)
    .to(AuthServiceImplementation)
    .inRequestScope();
  bind<PasswordHashingService>(PasswordHashingServiceSymbol)
    .to(PasswordHashingServiceImplementation)
    .inRequestScope();
  bind<AuthTokenService>(AuthTokenServiceSymbol)
    .to(AuthTokenServiceImplementation)
    .inRequestScope();
  bind<TwoFactorAppService>(TwoFactorAppServiceSymbol)
    .to(TwoFactorAppServiceImplementation)
    .inRequestScope();
  bind<TOTPService>(TOTPServiceSymbol)
    .to(TOTPServiceImplementation)
    .inRequestScope();
});
In the monolith, we only start one application server that handles all incoming requests for all boundaries. We can achieve this by using a single IoC container that loads all the IoC modules for all boundaries:
// ...
export function createContainer() {
  const container = new Container();
  container.load(
    ...[
      infrastructureIocModule,
      authIocModule,
      tenantIocModule,
      assetManagementIocModule,
      templateManagementIocModule,
    ],
  );
  return container;
}
In your application there should be a single point in which the layers are "composed" together. This is known as the composition root. In our case, the composition root is the IoC container. In the monolith, we create a single IoC container which means that we have one composition root for the entire application.
Finally, we run the monolith application by creating a server that uses the container:
import "reflect-metadata";
import "dotenv/config";
import { createAppServer } from "./infrastructure/http/server";
import { createContainer } from "./infrastructure/ioc";

const port = process.env.API_PORT || 3001;

export const defaultOnReady = () =>
  console.log(`Server started on http://localhost:${port}`);

export async function main(onReady: () => void) {
  const container = createContainer();
  const app = await createAppServer(container);
  app.listen(port, () => {
    onReady();
  });
}

(async () => {
  await main(defaultOnReady);
})();
At this point, we have a fully functional monolith with well-defined boundaries. The key insight is that each boundary is encapsulated in its own IoC module, and all modules are composed together in a single container. Now we're ready to see how we can split this monolith into microservices without changing any of the existing code.
After working for an extended period of time on the monolith, we will learn more about the service boundaries and how they should be defined. At some point, we will be ready to split the monolith into microservices. The great news is that because we have followed the Onion/Clean architecture principles and have encapsulated each boundary in its own IoC module, we can easily extract each boundary into its own microservice without changing major parts of the existing code.
First we need to create a new composition root for each microservice. Each composition root will create its own IoC container and load only the IoC modules that are relevant for that specific microservice. We can create a helper function that creates a microservice given a configuration object:
export interface ServiceConfig {
  port: number;
  name: string;
  iocModules: ContainerModule[];
}

export async function createMicroService(config: ServiceConfig) {
  const { port, iocModules } = config;
  const container = new Container();
  container.load(...iocModules);
  const app = await createAppServer(container, port);
  app.listen(port, () => {
    console.log(`Server started on http://localhost:${port}`);
  });
}
Now we can create a new entry point for each microservice. Each entry point will use the createMicroService function to create a microservice with its own IoC container and relevant IoC modules. For example, here is the entry point for the authentication & authorization microservice:
// api/src/infrastructure/http/microservices/auth/index.ts
await createMicroService({
  port: 8080,
  iocModules: [
    infrastructureIocModule,
    authIocModule,
  ],
  name: "auth",
});
And here is the entry point for the content management microservice:
// api/src/infrastructure/http/microservices/cms/index.ts
await createMicroService({
  port: 8080,
  iocModules: [
    infrastructureIocModule,
    templateManagementIocModule,
    contentManagementIocModule,
  ],
  name: "cms",
});
We then use Kubernetes to deploy each microservice as a separate deployment. Each deployment will run its own instance of the microservice, and we can use Kubernetes services to expose each microservice to the outside world.
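For illustration, a stripped-down manifest for one of these deployments might look like the sketch below. The image name and registry match the build example later in this article, but the replica count and labels are illustrative assumptions rather than the project's actual manifests. Note how the Kubernetes Service is named auth-service, which is exactly the hostname the application already uses in its service URLs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-service
spec:
  replicas: 2 # illustrative
  selector:
    matchLabels:
      app: auth-service
  template:
    metadata:
      labels:
        app: auth-service
    spec:
      containers:
        - name: auth-service
          image: myregistry.azurecr.io/auth-service:latest
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: auth-service # this DNS name is what http://auth-service:8080/api resolves to
spec:
  selector:
    app: auth-service
  ports:
    - port: 8080
      targetPort: 8080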
The main idea here is that we should move most of the complexity of managing multiple microservices to the CI/CD layer. Each microservice has its own entry point, and we can use our CI/CD pipeline to build, test, and deploy each microservice independently. Most of the code remains unchanged, as we have not modified any of the existing business logic or domain models. The only changes we have made are in the composition roots for each microservice.
The key insight here is that we use the same codebase for all microservices. We don't create separate repositories or duplicate code. The main goal is to continue to develop the application in a way that feels like working on a monolith as much as possible.
Most of the microservices complexities are pushed out to the CI/CD layer, where you should leverage Kubernetes heavily to manage the deployments, scaling, and service discovery.
To achieve this, we use a single Dockerfile with different build arguments to specify which entry point to use. The key optimization is that each microservice image only includes the code relevant to that service:
FROM node:20-alpine
WORKDIR /app
COPY . .
ARG SERVICE_NAME=monolith
RUN node scripts/prune-services.js $SERVICE_NAME
RUN npm ci && npm run build
ARG SERVICE_ENTRY_POINT=dist/index.js
ENV ENTRY_POINT=$SERVICE_ENTRY_POINT
CMD ["sh", "-c", "node $ENTRY_POINT"]
The prune-services.js script removes directories not relevant to the target service. For example, when building the auth service, it removes app-services/cms, domain-model/dam, infrastructure/http/controllers/tenant, etc.—keeping only auth-related code and shared infrastructure.
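The script itself isn't shown in the article, but a minimal sketch of the idea could look like the following. The keep-lists and directory layout are illustrative assumptions based on the tree shown earlier, not the actual implementation:
// scripts/prune-services.js (illustrative sketch; assumes the project runs as ESM,
// otherwise switch the imports to require()).
// Deletes boundary directories that the target service does not need.
import { existsSync, rmSync } from "node:fs";
import { join } from "node:path";

const service = process.argv[2] ?? "monolith";

// Hypothetical mapping of service name -> boundaries to keep.
const keepByService = {
  monolith: ["auth", "cms", "dam", "email", "logging", "tenant"],
  auth: ["auth", "email", "logging"],
  cms: ["cms", "tenant", "logging"],
};
const keep = new Set(keepByService[service] ?? keepByService.monolith);

const roots = [
  "src/app-services",
  "src/domain-model",
  "src/infrastructure/http/controllers",
  "src/infrastructure/db/repositories",
];
const allBoundaries = ["auth", "cms", "dam", "email", "logging", "tenant"];

for (const root of roots) {
  for (const boundary of allBoundaries) {
    const dir = join(root, boundary);
    if (!keep.has(boundary) && existsSync(dir)) {
      rmSync(dir, { recursive: true, force: true });
    }
  }
}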
Then in our CI/CD pipeline, we build separate images for each microservice by passing different entry points:
# Build auth microservice
docker build \
  --build-arg SERVICE_NAME=auth \
  --build-arg SERVICE_ENTRY_POINT=dist/infrastructure/http/microservices/auth/index.js \
  -t auth-service .

# Build cms microservice
docker build \
  --build-arg SERVICE_NAME=cms \
  --build-arg SERVICE_ENTRY_POINT=dist/infrastructure/http/microservices/cms/index.js \
  -t cms-service .
You can use hashes to verify if a microservice needs to be redeployed—if the code for a specific microservice has not changed, you can skip the deployment for that microservice. Since each image only contains service-specific code after pruning, the resulting image digest will only change when relevant code changes. Compare the new image digest against the one currently in your container registry using skopeo:
REGISTRY=myregistry.azurecr.io
# Get digest of newly built image (after pushing to registry)
NEW_DIGEST=$(skopeo inspect docker://$REGISTRY/auth-service:$COMMIT_SHA | jq -r '.Digest')
# Get digest of currently deployed image (tagged as 'latest' or 'production')
DEPLOYED_DIGEST=$(skopeo inspect docker://$REGISTRY/auth-service:latest | jq -r '.Digest')
# Only deploy if the digests differ
if [ "$NEW_DIGEST" != "$DEPLOYED_DIGEST" ]; then
# Tag the new image as latest
skopeo copy docker://$REGISTRY/auth-service:$COMMIT_SHA docker://$REGISTRY/auth-service:latest
# Update the deployment
kubectl set image deployment/auth-service auth-service=$REGISTRY/auth-service:$COMMIT_SHA
fi
Since each service build is independent, you can run all builds in parallel to speed up the CI/CD pipeline.
The Onion/Clean architecture is powerful because it allows you to build applications that are easy to maintain and extend over time. Your application becomes a modular plugin system where each component can be swapped out independently.
For example, in this particular application, we migrated from CosmosDB to PostgreSQL a few months after starting the project. Because we had followed the Onion/Clean architecture principles, we were able to swap out the database implementation layer (infrastructure/db/repositories) without changing any of the existing code. We simply created a new IoC module for PostgreSQL and updated the composition root to use the new module.
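As a rough sketch of what that looks like in the composition root (the module names below are hypothetical stand-ins, not the project's actual ones):
import { Container, ContainerModule } from "inversify";

// Hypothetical: one IoC module per database provider, both binding the same
// repository interfaces to different implementations.
declare const cosmosRepositoriesIocModule: ContainerModule;
declare const postgresRepositoriesIocModule: ContainerModule;
declare const boundaryIocModules: ContainerModule[];

export function createContainerWithDatabase(usePostgres: boolean) {
  const container = new Container();
  const dbModule = usePostgres
    ? postgresRepositoriesIocModule
    : cosmosRepositoriesIocModule;
  // Everything outside the composition root keeps depending on the repository interfaces only.
  container.load(dbModule, ...boundaryIocModules);
  return container;
}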
The composition root is a very powerful concept because we delay the decision of how to compose the application until runtime. This allows us to easily transition from a monolithic architecture to a microservices architecture without changing any of the existing code (if you designed the monolith with this goal in mind from the start).
2025-12-01 11:02:04
Quantum computing holds immense promise, but wrestling with its inherent complexity can feel like navigating a dense jungle. How do we abstract away low-level qubit operations and build more intuitive, manageable quantum programs? Imagine organizing quantum operations into reusable, effect-based units, much like organizing files into folders on your computer.
The core idea is Quantum Granular Computing: treating groups of quantum operations as single, abstract units called "granules." Instead of directly manipulating individual qubits, we manipulate these granules, each representing a specific effect on the quantum state. These granules aren't just fixed blocks; they can morph and adapt based on context, providing a flexible approach to quantum programming.
Think of it as crafting high-level "quantum verbs" that simplify algorithm design. These "verbs" are represented by mathematical operators, providing a solid theoretical foundation.
Benefits of Quantum Granules:
The most challenging aspect of implementing this is defining the right set of foundational granules. It's tempting to make them too specific, defeating the purpose of abstraction. A practical tip is to start with a small set of granules corresponding to common quantum operations and gradually expand as needed. We could apply this approach to quantum simulation, representing complex molecular interactions as a series of granular effects, leading to faster and more efficient simulations.
Quantum granular computing offers a powerful abstraction mechanism, paving the way for easier quantum software engineering. This approach promises to unlock a new era of quantum algorithm design, making quantum computing more accessible and scalable.
2025-12-01 11:01:19
Many people are confused when choosing LED light strips: What are the differences between RGB light strips and ordinary single-color light strips? In fact, apart from the different color effects, there are also essential differences in their control methods.
Single-color light strips only emit a fixed color (such as warm white, pure white or red), have a simple structure, and usually only have two wires (positive pole and negative pole). As long as it is connected to the matching power supply, the light strip will remain on constantly. If dimming is needed, simply add a simple PWM dimmer or a smart switch. The operation is intuitive and suitable for basic lighting.
Each LED of the RGB light strip contains three chips: red (R), green (G), and blue (B). By mixing the three colors, it can present 16 million colors. It usually has four wires: one common positive pole (+12V/24V), and the other three control the R, G, and B channels respectively. It must be used in conjunction with a dedicated RGB controller (such as an infrared, Bluetooth or Wi-Fi controller) and cannot be directly connected to a power source; otherwise, a short circuit may occur.
For this reason, RGB light strips can achieve dynamic effects such as color change, gradient, and music synchronization, but the control is more complex. Monochrome light strips excel in simplicity and reliability, making them suitable for scenarios that do not require fancy effects.
In short: for ambiance, choose RGB; for practicality, choose single-color. In either case, make sure the strip is paired with the correct controller and power supply.
2025-12-01 11:00:00
Why You Need a Local-First API Client (With Hands-On Example)
Local-first API clients boost speed, privacy, and offline reliability by keeping requests and data on your device—cutting hidden dependencies and reducing technical debt.
If you’re like most developers, API testing is part of your everyday workflow. Many of us have relied on popular tools like Postman or Insomnia to make API interaction easier. These tools excel at collaboration and syncing but often impose a hidden cloud dependency that can slow you down, introduce privacy concerns, and disrupt your work when the internet falters.
A local-first API client puts you in control by running everything locally on your device by default. This approach not only speeds up requests but also helps reduce technical debt by increasing transparency, consistency, and stability in your API testing workflows.
What Does “Local-First” Mean?
Local-first API clients store and execute requests, environment variables, and histories directly on your machine, avoiding routing through external servers unless you explicitly opt into cloud syncing. This design enables:
Blazing Fast Responses: Requests go directly from your device to your API endpoints without unnecessary detours.
Privacy and Security: Sensitive tokens, credentials, and request bodies never leave your device without your consent.
Offline Development: Complete API testing availability without any internet connection.
Optional Cloud Sync: For collaboration or backup, syncing is always optional, never mandatory.
Why Cloud-Only API Clients Can Create Technical Debt
Cloud-first clients, while great for team collaboration, can introduce hidden technical debt in your API testing lifecycle:
Hidden External Dependencies: Your workflow depends on third-party cloud infrastructure. Outages or changes on their end can halt your progress unexpectedly.
Opaque Debugging: Additional network hops add noise to debugging and obscure true API response times or error origins. This can lead to prolonged troubleshooting cycles.
Data Leakage Risk: Automatically syncing sensitive data like tokens and internal endpoints to the cloud increases risk exposure, especially in regulated environments.
Versioning Headaches: Cloud syncing without file-based storage can result in conflicts, lost history, and inconsistent API specifications.
Context Switching: Frequent toggling between cloud portals and local dev environments breaks developer flow and increases manual steps, amplifying human error.
This accumulating debt impacts not only speed but also the reliability and maintainability of your API testing infrastructure.
In-Depth Benefits of Local-First API Clients
Local-first clients do more than just speed up requests—they improve workflow quality and reduce long-term maintenance effort:
Full Transparency: Requests and configurations stored locally as plain files can be audited, modified, and managed without vendor lock-in or black-box cloud formats.
Seamless Git Integration: Easily track changes, branch, merge, and rollback API tests like code, aligning testing closely with application development.
Stable, Consistent Environments: Local environment variables avoid surprises from unexpected cloud overwrites or syncing issues.
Simplified Incident Analysis: Direct client-to-API interaction removes intermediate servers, providing cleaner logs and more reliable debugging data.
Reduced Cognitive Load: Developers deal with one coherent system locally, rather than juggling the complexities of both local and cloud UIs or inconsistent sync states.
Enhanced Security Posture: Keeping request payloads local mitigates common vectors for leaks and complies better with data protection standards.
Improved Developer Autonomy: Offline availability allows uninterrupted work during network disruptions—a frequent pain point rarely addressed by cloud clients.
Together, these factors lower your technical debt by establishing a transparent, traceable, and resilient API testing foundation.
Hands-On Example: Testing a Local API with Requestly
Imagine you’re running a local user management service at http://localhost:3000/users. Here’s how a local-first client simplifies your workflow:
Launch Requestly, no login required—immediate start.
Create a GET request:
http://localhost:3000/users
Send and get instant results:
[
{"id": 1, "name": "Alice", "email": "[email protected]"}
]
Add a POST request to add a new user:
{
"name": "Bob",
"email": "[email protected]"
}
Send POST, then GET again to confirm the addition is reflected locally.
Export and version requests as JSON files, committing them into version control alongside your app code.
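If you want to double-check the same flow from code (for example, in a quick script run alongside the client), the equivalent requests look roughly like this; the endpoints are the hypothetical local service from the steps above, and the email address is just a placeholder:
// Quick sanity check against the local user service (Node 18+ for built-in fetch).
const base = "http://localhost:3000";

async function run() {
  // 1. List existing users.
  console.log("before:", await fetch(`${base}/users`).then((r) => r.json()));

  // 2. Create a new user (placeholder email).
  await fetch(`${base}/users`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: "Bob", email: "bob@example.com" }),
  });

  // 3. Confirm the addition is reflected.
  console.log("after:", await fetch(`${base}/users`).then((r) => r.json()));
}

run().catch(console.error);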
Requestly also enables you to create local mocks and simulate APIs, providing instant feedback and isolating front-end development from backend dependencies, all without cloud infrastructure.
If you want to try a local-first workflow, you can start using Requestly here: https://requestly.com
Balancing Local-First and Cloud-Based Tools
Local-first clients are not a full replacement for cloud-first tools but an essential complement:
Use local-first clients to accelerate solo development, safeguard sensitive data, and ensure offline availability.
Bring in cloud clients when collaborative editing, shared environments, or CI/CD pipelines require it.
This hybrid strategy minimizes overhead and balances speed, security, and collaboration demands.
Why This Matters
APIs form the backbone of modern software, making efficient and secure testing critical. Local-first API clients tackle real-world pain points by enhancing privacy, performance, and maintainability. Moreover, by reducing hidden dependencies and integrating tightly with developer workflows, they help avoid creeping technical debt that slows teams over time.
Adopting local-first clients isn’t just a tooling upgrade—it’s a step towards more resilient, scalable, and developer-friendly API testing that can keep pace with the complexity of today’s software projects.
What API clients do you use? How do you balance cloud and local testing in your projects? Share your thoughts below!
2025-12-01 10:59:02
Lab overview
In this guided lab, you perform a series of tasks and actions to manage Microsoft Azure resources. You have the opportunity to modify a network, move a virtual machine between subnets, manage access to storage containers and file shares, and work with resource locks and resource tags.
During the setup, you create a virtual network, a virtual machine, a storage account, and associated resources.
Learning objectives
In this module, you'll practice how to:
First, we prepare the environment:
Login to Microsoft Azure
Create a resource group
In order to make clean-up easy at the end, start by creating a new resource group to hold the resources for this guided project.
Using resource groups to organize things is a quick way to ensure you can manage resources when a project is over.
1. From the Azure portal home page, in the search box, enter resource groups.
2. Select Resource groups under services.
3. Select Create.
4. Enter guided-project-rg in the Resource group name field.
5. The Region field will automatically populate. Leave the default value.
6. Select Review + create.
7. Select Create.
8. Return to the home page of the Azure portal by selecting Home.
Create a virtual network with one subnet
1. From the Azure portal home page, in the search box, enter virtual networks.
2. Select virtual networks under services.
3. Select Create.
4. Scroll down to the Instance details section and enter guided-project-vnet for the Virtual network name.
5. Select Review + create.
6. Select Create.
7. Wait for the screen to refresh and show Your deployment is complete.
8. Select Home to return to the Azure portal home page.
Create a virtual machine
1. From the Azure portal home page, in the search box, enter virtual machines.
2. Select virtual machines under services.
3. Select Create and then select Virtual machine.
4. Select guided-project-rg for the Resource group.
5. Enter guided-project-vm for the Virtual machine name.
6. For the Image, select one of the Ubuntu Server options. (For example, Ubuntu Server 24.04 LTS - x64 Gen2)
7. Continue further on the Basics page to the Administrator account section.
8. Select Password for authentication type.
9. Enter guided-project-admin for the admin Username.
10. Enter a password for the admin account.
11. Confirm the password for the admin account.
12. Leave the rest of the settings as default settings. You can review the settings if you like, but shouldn't change any.
Create a Storage account
1. From the Azure portal home page, in the search box, enter storage accounts.
2. Select Storage accounts under services.
3. Select Create.
4. Scroll down to the Instance details section and enter a name for the storage account. Storage account names must be globally unique, so you may have to try a few different names to find one that is available.
5. Select Review + create.
6. Select Create.
7. Wait for the screen to refresh and show Your deployment is complete.
8. Select Home to return to the Azure portal home page.
Exercise – Update the virtual network
Scenario
You’re helping an Azure Admin maintain resources. While you won’t be responsible for maintaining the entire infrastructure, the Admin will ask you to help out by completing certain tasks. Currently, there’s a Linux virtual machine (VM) that’s underutilized, and a need for a new Linux machine to serve as an FTP server. However, the Azure admin wants to be able to track network flow and resource utilization for the new FTP server, so they have asked you to start by provisioning a new subnet. The current subnet should be left alone, as there are future plans to use it for additional VMs.
Create a new subnet on an existing virtual network (vNet)
1. Login to Microsoft Azure at https://portal.azure.com
2. From the Azure portal home page, in the search box, enter virtual networks.
3. Select virtual networks under services.
4. Select the guided-project-vnet virtual network.
5. From the guided-project-vnet blade, under settings, select Subnets.
6. To add a subnet, select + Subnet.
7. For Subnet purpose leave it as Default.
8. For Name enter: `ftpSubnet`.
9. Leave the rest of the settings alone and select Add.
10. Select Home to return to the Azure portal home page.
Create a network security group
1. From the Azure portal home page, in the search box, enter network security groups.
2. Select Network security groups under services.
3. Select + Create.
4. Verify the subscription is correct.
5. Select the guided-project-rg resource group.
6. Enter ftpNSG for the network security group name.
7. Select Review + create.
8. Once the validation is complete, select Create.
9. Wait for the screen to refresh and display Your deployment is complete.
10. Select Go to resource.
Create an inbound security rule
1. Under settings, select Inbound security rules.
2. Select + Add.
3. Change the Destination port ranges from 8080 to 22.
4. Select TCP for the protocol.
5. Set the name to ftpInbound.
6. Select Add.
7. Select Home to return to the Azure portal home page.
Move the virtual machine network to the new subnet
1. Login to Microsoft Azure at https://portal.azure.com
2. From the Azure portal home page, in the search box, enter virtual machines.
3. Select virtual machines under services.
4. Select the guided-project-vm virtual machine.
5. If the virtual machine is running, select Stop.
6. Wait for the Status field to update and show Stopped (deallocated).
7. Within the Networking subsection of the menu, select Network settings.
8. Select the Network interface / IP configuration hyperlink for the VM.
9. On the IP Configurations page, update the Subnet to ftpSubnet.
10. Select Apply.
11. Select Home to return to the Azure portal home page.
Vertically scale the virtual machine
1. From the Azure portal home page, in the search box, enter virtual machines.
2. Select virtual machines under services.
3. Select the guided-project-vm virtual machine.
4. Locate the Availability + scale submenu and select Size.
2025-12-01 10:53:51
Over the last few months, I built several AI products that relied heavily on:
On paper, everything looked great: solid models, decent infra, and reasonable traffic.
But in reality, things fell apart much faster than expected:
Like many engineers, my initial reaction was:
“We need more compute. Bigger models. More parallelism.”
I was wrong.
What I eventually realized was this:
Most AI systems don’t fail because the model is weak —
they fail because the infrastructure calling the model is inefficient.
I wasn’t architecting for scale.
I was brute-forcing the problem by firing more and more requests at the model and assuming hardware would magically handle the load.
After researching distributed inference systems, batching strategies, caching layers, vector storage, and reading several research papers, one pattern became clear across high-performance AI stacks:
Efficient AI isn’t achieved by calling the model more —
it’s achieved by reducing unnecessary calls.
In this article, I’ll break down the three system-level engineering strategies that dramatically improved cost, latency, and throughput:
1. Semantic request deduplication: reuse a previous response when a new request means the same thing.
2. Request batching: group requests arriving within a short time window into a single LLM call.
3. Semantic vector caching: key the cache on embeddings instead of raw prompt strings.
Real-world traffic is extremely repetitive:
In most LLM-backed applications, 30–60% of requests are paraphrased variations of something already processed — but the system still sends a brand-new LLM request every time.
That’s pure waste.
Instead of assuming every request is unique, compare incoming requests against stored embeddings:
Example:
“How do I reset my password?”
“How can I change my login password?”
Different text — same meaning → one inference, unlimited reuse.
similarity ≥ threshold
→ return previous response
from typing import Optional
import numpy as np

# get_embedding() and llm_call() are assumed to be provided elsewhere
# (embedding model client and LLM provider client).
vector_store: list[tuple[np.ndarray, str]] = []
SIMILARITY_THRESHOLD = 0.9

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def find_similar_response(query: str) -> Optional[str]:
    if not vector_store:
        return None
    q_emb = get_embedding(query)
    best_sim = -1.0
    best_resp = None
    for emb, resp in vector_store:
        sim = cosine_similarity(q_emb, emb)
        if sim > best_sim:
            best_sim, best_resp = sim, resp
    return best_resp if best_sim >= SIMILARITY_THRESHOLD else None

def handle_request(query: str) -> str:
    cached_resp = find_similar_response(query)
    if cached_resp:
        return cached_resp
    response = llm_call(query)
    q_emb = get_embedding(query)
    vector_store.append((q_emb, response))
    return response
Pros
Cons
Traditional request handling:
N users → N separate LLM calls
Each request:
Batch incoming requests within a short time window:
Ideal for:
import threading
import time
from queue import Empty, Queue

REQUEST_QUEUE = Queue()
BATCH_SIZE = 16
MAX_WAIT_MS = 50

def llm_batch_call(prompts: list[str]) -> list[str]:
    # Placeholder: send all prompts to the provider in a single batched call.
    ...

def batch_worker():
    while True:
        batch_prompts = []
        batch_callbacks = []
        start_time = time.time()
        # Collect requests until the batch is full or the wait window expires.
        while (len(batch_prompts) < BATCH_SIZE and
               (time.time() - start_time) * 1000 < MAX_WAIT_MS):
            try:
                prompt, callback = REQUEST_QUEUE.get(timeout=0.01)
                batch_prompts.append(prompt)
                batch_callbacks.append(callback)
            except Empty:
                pass
        if not batch_prompts:
            continue
        responses = llm_batch_call(batch_prompts)
        # Deliver each response to the caller that submitted the prompt.
        for resp, cb in zip(responses, batch_callbacks):
            cb(resp)

threading.Thread(target=batch_worker, daemon=True).start()

def handle_request_async(prompt: str, callback):
    REQUEST_QUEUE.put((prompt, callback))
Pros
Cons
String-based caching fails when users change wording:
cache_key = hash(prompt_string)
Even tiny text differences cause cache misses.
Use semantic vector caching:
(embedding, response)
Before vs After:
Best Use Cases
Pros
Cons
Conceptually, the end-to-end pipeline looks like this: an incoming request is first checked against the semantic cache; on a hit, the stored response is returned immediately. On a miss, the request is placed in the batching queue, sent to the LLM as part of a batch, and the resulting (embedding, response) pair is stored for future reuse.
You’ve now shifted from:
“Every request hits the LLM”
to:
“Only truly unique requests hit the LLM — and even those are batched.”
That change alone can reduce token cost by 60–90% depending on domain load patterns.
We’re entering a stage where:
So the real differentiator is no longer:
“Which model do you use?”
but instead:
“How intelligently do you orchestrate compute?”
Smarter token flow →
lower cost →
higher throughput →
better UX →
real scalability.
That’s the difference between:
If you want to turn this into a real engineering project, build:
That kind of project demonstrates:
“I understand AI infrastructure, not just prompt engineering.”
And that stands out sharply in today’s hiring market.
Efficient AI has little to do with bigger models
and everything to do with smarter compute orchestration.
If you architect the system well —
the model becomes the cheapest part of the pipeline.
Follow for updates.