2025-04-15 13:37:24
In the digital era, extracting information from documents and vehicle IDs has become increasingly important for web-based applications. Machine Readable Zones (MRZ) and Vehicle Identification Numbers (VIN) can now be scanned directly in the browser using modern web technologies. In this tutorial, you'll learn how to build a web-based MRZ and VIN scanner using JavaScript, HTML5, and the Dynamsoft Capture Vision SDK.
The dynamsoft-capture-vision-bundle is the JavaScript version of Dynamsoft Capture Vision, available via npm or CDN. To use it, include the library in your index.html:
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/dcv.bundle.min.js"></script>
The target HTML layout for the MRZ/VIN scanner consists of three main sections:
A div element for license key setup, input source selection (File or Camera), and the scanning mode toggle (MRZ or VIN).
<div class="container">
<div class="row">
<div>
<label>
Get a License key from <a
href="https://www.dynamsoft.com/customer/license/trialLicense/?product=dcv&package=cross-platform"
target="_blank">here</a>
</label>
<input type="text" id="license_key"
value="LICENSE-KEY"
placeholder="LICENSE-KEY">
<button onclick="activate()">Activate SDK</button>
</div>
</div>
<div class="row">
<div>
<select onchange="selectChanged()" id="dropdown">
<option value="file">File</option>
<option value="camera">Camera</option>
</select>
<form id="modeSelector">
<label>
<input type="radio" name="scanMode" value="mrz" checked>
MRZ
</label>
<label>
<input type="radio" name="scanMode" value="vin">
VIN
</label>
</form>
</div>
</div>
</div>
A div for displaying the uploaded image and its scanning result.
<div class="container" id="file_container">
<div>
<input type="file" id="pick_file" accept="image/*" />
</div>
<div class="row">
<div class="imageview">
<img id="image_file" src="default.png" />
<canvas id="overlay_canvas" class="overlay"></canvas>
</div>
</div>
<div class="row">
<div>
<textarea id="detection_result"></textarea>
</div>
</div>
</div>
A div for showing the live camera stream along with real-time scanning results.
<div class="container" id="camera_container">
<div>
<select onchange="cameraChanged()" id="camera_source">
</select>
<button onclick="scan()" id="scan_button">Start</button>
<div id="videoview">
<div id="camera_view"></div>
</div>
<div class="row">
<div>
<textarea id="scan_result"></textarea>
</div>
</div>
</div>
</div>
The recognition engines for MRZ and VIN are initialized in the activate() function, which is triggered when the user clicks the Activate SDK button. This function sets up the license key, loads the required models and code parsers, and registers result receivers for both MRZ and VIN.
async function activate() {
toggleLoading(true);
let divElement = document.getElementById("license_key");
let licenseKey = divElement.value == "" ? divElement.placeholder : divElement.value;
try {
await Dynamsoft.License.LicenseManager.initLicense(
licenseKey,
true
);
await Dynamsoft.Core.CoreModule.loadWasm(["DLR"]);
parser = await Dynamsoft.DCP.CodeParser.createInstance();
// Load VIN and MRZ models
await Dynamsoft.DCP.CodeParserModule.loadSpec("VIN");
await Dynamsoft.DLR.LabelRecognizerModule.loadRecognitionData("VIN");
await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD1_ID");
await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD2_FRENCH_ID");
await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD2_ID");
await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD2_VISA");
await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD3_PASSPORT");
await Dynamsoft.DCP.CodeParserModule.loadSpec("MRTD_TD3_VISA");
await Dynamsoft.DLR.LabelRecognizerModule.loadRecognitionData("MRZ");
mrzRouter = await Dynamsoft.CVR.CaptureVisionRouter.createInstance();
await mrzRouter.initSettings("./mrz.json");
mrzRouter.addResultReceiver({
onCapturedResultReceived: (result) => {
// TODO: Handle MRZ result
},
});
vinRouter = await Dynamsoft.CVR.CaptureVisionRouter.createInstance();
await vinRouter.initSettings("./vin.json");
vinRouter.addResultReceiver({
onCapturedResultReceived: (result) => {
// TODO: Handle VIN result
},
});
isSDKReady = true;
}
catch (ex) {
console.error(ex);
}
toggleLoading(false);
}
The Dynamsoft Capture Vision SDK provides the CameraEnhancer and CameraView classes for managing camera access and display. CameraEnhancer wraps the getUserMedia() method, while CameraView adds a live video view to the DOM.
async function openCamera(cameraEnhancer, cameraInfo) {
if (!Dynamsoft) return;
try {
await cameraEnhancer.selectCamera(cameraInfo);
cameraEnhancer.on("played", function () {
resolution = cameraEnhancer.getResolution();
});
await cameraEnhancer.open();
}
catch (ex) {
console.error(ex);
}
}
async function closeCamera(cameraEnhancer) {
if (!Dynamsoft) return;
try {
await cameraEnhancer.close();
}
catch (ex) {
console.error(ex);
}
}
async function setResolution(cameraEnhancer, width, height) {
if (!Dynamsoft) return;
try {
await cameraEnhancer.setResolution(width, height);
}
catch (ex) {
console.error(ex);
}
}
async function initCamera() {
if (!Dynamsoft) return;
try {
cameraView = await Dynamsoft.DCE.CameraView.createInstance();
cameraEnhancer = await Dynamsoft.DCE.CameraEnhancer.createInstance(cameraView);
let scanRegion = {
x: 10,
y: 30,
width: 80,
height: 40,
isMeasuredInPercentage: true
};
cameraEnhancer.setScanRegion(scanRegion);
cameras = await cameraEnhancer.getAllCameras();
if (cameras != null && cameras.length > 0) {
for (let i = 0; i < cameras.length; i++) {
let option = document.createElement("option");
option.text = cameras[i].label;
cameraSource.add(option);
}
try {
let uiElement = document.getElementById("camera_view");
uiElement.append(cameraView.getUIElement());
cameraView.getUIElement().shadowRoot?.querySelector('.dce-sel-camera')?.setAttribute('style', 'display: none');
cameraView.getUIElement().shadowRoot?.querySelector('.dce-sel-resolution')?.setAttribute('style', 'display: none');
}
catch (ex) {
console.error(ex);
}
}
else {
alert("No camera found.");
}
}
catch (ex) {
console.error(ex);
}
}
async function cameraChanged() {
if (cameras != null && cameras.length > 0) {
let index = cameraSource.selectedIndex;
await openCamera(cameraEnhancer, cameras[index]);
}
}
To recognize MRZ and VIN from images or camera streams, use the capture() and startCapturing() methods, respectively. The capture() method returns the recognition results directly, while the startCapturing() method starts a continuous capturing process and returns results through the onCapturedResultReceived callback.
Recognizing MRZ/VIN from Image Files
function loadImage2Canvas(base64Image) {
imageFile.src = base64Image;
img.src = base64Image;
img.onload = function () {
let width = img.width;
let height = img.height;
overlayCanvas.width = width;
overlayCanvas.height = height;
if (!isSDKReady) {
alert("Please activate the SDK first.");
return;
}
toggleLoading(true);
let selectedMode = document.querySelector('input[name="scanMode"]:checked').value;
let context = overlayCanvas.getContext('2d');
context.clearRect(0, 0, overlayCanvas.width, overlayCanvas.height);
try {
if (selectedMode == "mrz") {
// Hide the loading indicator only after the asynchronous capture finishes
mrzRouter.capture(img.src, "ReadMRZ").then((result) => {
showFileResult(selectedMode, context, result);
}).finally(() => toggleLoading(false));
}
else if (selectedMode == "vin") {
vinRouter.capture(img.src, "ReadVINText").then((result) => {
showFileResult(selectedMode, context, result);
}).finally(() => toggleLoading(false));
}
}
catch (ex) {
console.error(ex);
toggleLoading(false);
}
};
}
Recognizing MRZ/VIN from Camera Stream
async function scan() {
if (!isSDKReady) {
alert("Please activate the SDK first.");
return;
}
let selectedMode = document.querySelector('input[name="scanMode"]:checked').value;
if (!isDetecting) {
scanButton.innerHTML = "Stop";
isDetecting = true;
if (selectedMode == "mrz") {
mrzRouter.setInput(cameraEnhancer);
mrzRouter.startCapturing("ReadMRZ");
}
else if (selectedMode == "vin") {
vinRouter.setInput(cameraEnhancer);
vinRouter.startCapturing("ReadVINText");
}
}
else {
scanButton.innerHTML = "Scan";
isDetecting = false;
if (selectedMode == "mrz") {
mrzRouter.stopCapturing();
}
else if (selectedMode == "vin") {
vinRouter.stopCapturing();
}
cameraView.clearAllInnerDrawingItems();
}
}
You can draw overlays on the image or video stream to highlight recognized areas and display the parsed results in a text area.
On Image Files
async function showFileResult(selectedMode, context, result) {
let parseResults = '';
let detection_result = document.getElementById('detection_result');
detection_result.innerHTML = "";
let txts = [];
let items = result.items;
if (items.length > 0) {
for (var i = 0; i < items.length; ++i) {
if (items[i].type !== Dynamsoft.Core.EnumCapturedResultItemType.CRIT_TEXT_LINE) {
continue;
}
let item = items[i];
parseResults = await parser.parse(item.text);
txts.push(item.text);
localization = item.location;
context.strokeStyle = '#ff0000';
context.lineWidth = 2;
let points = localization.points;
context.beginPath();
context.moveTo(points[0].x, points[0].y);
context.lineTo(points[1].x, points[1].y);
context.lineTo(points[2].x, points[2].y);
context.lineTo(points[3].x, points[3].y);
context.closePath();
context.stroke();
}
}
if (txts.length > 0) {
detection_result.innerHTML += txts.join('\n') + '\n\n';
if (selectedMode == "mrz") {
detection_result.innerHTML += JSON.stringify(extractMrzInfo(parseResults));
}
else if (selectedMode == "vin") {
detection_result.innerHTML += JSON.stringify(extractVinInfo(parseResults));
}
}
else {
detection_result.innerHTML += "Recognition Failed\n";
}
}
On Camera Stream
async function showCameraResult(result) {
let selectedMode = document.querySelector('input[name="scanMode"]:checked').value;
let items = result.items;
let scan_result = document.getElementById('scan_result');
if (items != null && items.length > 0) {
let item = items[0];
let parseResults = await parser.parse(item.text);
if (selectedMode == "mrz") {
scan_result.innerHTML = JSON.stringify(extractMrzInfo(parseResults));
}
else if (selectedMode == "vin") {
scan_result.innerHTML = JSON.stringify(extractVinInfo(parseResults));
}
}
}
Start a local server using Python:
python -m http.server 8000
Open your web browser and navigate to http://localhost:8000.
https://github.com/yushulx/javascript-barcode-qr-code-scanner/tree/main/examples/mrz-vin-scanner
2025-04-15 13:27:26
Around 3 months ago, I visited the official website of my college:
🎓 Chhotanagpur Institute of Information Technology & Management, Dhanbad
And honestly… I was shocked and disappointed. 😓
There were many issues that made the experience really bad for students, parents, and even the admin staff. Here's what I noticed:
All these problems made me realize that something needed to change.
And then I thought to myself:
Build a Complete Education Management System that:
I chose the MERN Stack to build the project from scratch:
This was my first full-stack project, and I started alone — managing:
I had zero idea how to build many features at the beginning.
But I didn’t quit. I spent sleepless days and nights learning new concepts and fixing bugs.
Even when the code broke, even when the features didn’t work, even when nobody was there to help…
I kept going.
This project taught me more than any course ever did. 💯
✅ I restructured the entire frontend to make it clean, reusable, and scalable.
So I rebuilt the backend using:
/api/v1/... routing
.js to .mjs for ES Module support
✅ The result? A clean, powerful, and easy-to-understand backend structure.
I also realized that JavaScript needed better type-checking.
So I started learning TypeScript, and now I’m slowly migrating the project to TS for more stability and maintainability. 🔐
At the same time, I started learning Data Structures & Algorithms (DSA) to improve my problem-solving skills.
This journey has not only been about building a project — it's been about building myself. 🧠
Eventually, people from around the world started noticing this project:
Even though many left halfway, I never stopped.
Because this project was more than code — it was my mission.
This project became more than mine — it became ours. 🌍
This project taught me:
To all the contributors, friends, GitHub supporters, and those who helped me learn:
I’ve learned from your code reviews, issues, suggestions, and support.
Are you a developer, designer, student, or tech enthusiast?
This project is open to everyone!
Contribute ➕ | Learn 📚 | Grow 🌱 | Inspire 💡
If you're reading this and you're struggling with your own project…
Just remember:
“It doesn’t matter if you're slow. What matters is that you don’t stop.”
“Keep learning. Keep building. Keep believing.”
From nothing to something — you can do it too.
Let’s keep growing. Together. 🚀
2025-04-15 13:23:29
WebSockets enable real-time communication between clients and servers, but handling connection drops gracefully is critical. In this guide, we’ll build a reconnection strategy using exponential backoff with jitter, ensuring both reliability and server-friendliness.
Network issues, server restarts, or even browser tab inactivity can cause disconnections. Without a reconnection system, your app can break silently or hammer the server with repeated attempts.
class ReconnectingWebSocket {
constructor(url) {
this.url = url;
this.ws = null;
this.maxAttempts = 10;
this.attempt = 0;
this.connect();
}
connect() {
this.ws = new WebSocket(this.url);
this.ws.onopen = () => {
console.log('WebSocket connected');
this.attempt = 0;
};
this.ws.onmessage = (msg) => {
console.log('Received:', msg.data);
};
this.ws.onclose = () => {
console.warn('WebSocket closed. Reconnecting...');
this.reconnect();
};
this.ws.onerror = (err) => {
console.error('WebSocket error:', err);
this.ws.close();
};
}
reconnect() {
if (this.attempt >= this.maxAttempts) {
console.error('Max reconnection attempts reached.');
return;
}
const delay = this.getBackoffDelay(this.attempt);
console.log(`Reconnecting in ${delay}ms`);
setTimeout(() => {
this.attempt++;
this.connect();
}, delay);
}
getBackoffDelay(attempt) {
const base = 500; // 0.5 second
const max = 30000; // 30 seconds
const jitter = Math.random() * 1000;
return Math.min(base * 2 ** attempt + jitter, max);
}
send(data) {
if (this.ws.readyState === WebSocket.OPEN) {
this.ws.send(data);
} else {
console.warn('WebSocket not connected');
}
}
}
const socket = new ReconnectingWebSocket('wss://yourserver.com/ws');
setInterval(() => {
socket.send(JSON.stringify({ type: 'ping' }));
}, 5000);
The usage example above sends a heartbeat ping with setInterval; combining such pings with timeouts helps detect stale connections that never trigger onclose.
By implementing a solid reconnection strategy with exponential backoff and jitter, your WebSocket-based applications become more resilient and production-ready. These improvements reduce user disruption and protect your backend from overload during outages.
If this post helped you, consider supporting me: buymeacoffee.com/hexshift
2025-04-15 13:21:04
Are you using Dify?
Dify lets you select Amazon Bedrock as a model provider, so once you configure an access key and a secret access key, you can use the wide range of models available on Amazon Bedrock.
Dify comes in a cloud edition and an on-premises edition. The cloud edition's free SANDBOX plan limits what you can use, so by self-hosting the COMMUNITY edition you can select Amazon Bedrock.
If you run it in a local environment such as your company network, traffic to Amazon Bedrock still occurs, but you can build a local RAG, pay only for what you use, and avoid worrying about your data being used to train models, which makes it an excellent choice for small and mid-sized companies that want to get familiar with the technology first.
By additionally preparing a local LLM, you can use it in a closed environment with no internet traffic at all.
The on-premises edition is distributed as Docker images, so setup is relatively easy.
In the steps below, I followed "How to integrate Amazon Bedrock/Nova Cross-Region inference into the on-premises edition of Dify" as a reference and connected via public IP to a Dify instance built on EC2; if you build it in a local environment, you can connect locally in the same way.
The referenced article uses the Oregon region, but when I tried it in the N. Virginia region, even models not labeled US.Cross Region Inference in the model selection responded correctly.
Give it a try yourself.
2025-04-15 13:21:03
A while ago, a friend and I were discussing code cognitive complexity and maintainability. He wished for a tool that could automatically evaluate whether a piece of code is hard to maintain. I wasn't sure this was even possible: maintainability is notoriously hard to quantify programmatically.
But LLMs can understand and generate human-like text or even code, and I wondered if that same capability could be applied to interpreting and evaluating code quality, going beyond what traditional static analysis tools can do.
That thought led to the tool I eventually built. It's now available on PyPI, and I believe it could be a valuable addition to any CI pipeline.
In a previous post, I shared some early thoughts on maintainability and cognitive complexity of code that emerged while working on this tool. In this post, I’d like to go deeper and walk through the development process, using my CLI tool as a case study for building an LLM-based application.
To bring this idea to life, I used LangChain, a Python library for building LLM-powered applications. LangChain abstracts away the APIs of specific language models, so I could focus entirely on building the core functionality without worrying about the details of how to communicate with an LLM.
One of its most useful features is the seamless integration with Pydantic. The output schemas were defined as Pydantic models, and LangChain automatically generated prompts that guided the LLM to return structured responses; without this, a lot more time would have gone into parsing and error handling of the LLM output.
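To make that setup concrete, here is a minimal sketch of the LangChain + Pydantic pattern described above; it is not the tool's exact code, and the model choice (ChatOpenAI), the prompt wording, and the simplified FunctionScore schema are my assumptions:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class FunctionScore(BaseModel):
    # Hypothetical, simplified schema; the real models appear later in this post
    function_name: str = Field(description="Name of the function")
    complexity_score: float = Field(description="Cognitive complexity from 1 to 5")

# The parser turns the Pydantic model into formatting instructions for the prompt
parser = PydanticOutputParser(pydantic_object=FunctionScore)

prompt = PromptTemplate(
    template=(
        "Evaluate the cognitive complexity of the function in the code below.\n"
        "{format_instructions}\n\nCode:\n{code}"
    ),
    input_variables=["code"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# LCEL pipeline: prompt -> LLM -> parsed Pydantic object
chain = prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | parser
result = chain.invoke({"code": "def add(a, b):\n    return a + b"})
print(result.complexity_score)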
With the basic setup ready, I began designing the metric. The idea was straightforward:
Ask the LLM to evaluate each function individually and assign a cognitive complexity score from 1 to 5, where higher scores indicate code that's harder to understand and maintain.
The prompt included both formatting instructions (generated via LangChain) and an explanation of grading criteria. I experimented with different ways of phrasing the grading scale to make the results more stable and meaningful.
To refine the evaluation, I added a few extra fields:
is_setup_or_declaration flag to identify and skip boilerplate code (like config or constant declarations).
start_line_number and end_line_number to estimate the size of each function.
At first, I tried using a single function_length field, but estimating start and end lines separately produced more reliable results.
class CodeComplexityEvaluation(BaseModel):
function_name: str = Field(description="Name of the function")
is_setup_or_declaration: bool = Field(
description="The code is part of setup or declaration boilerplate, such as defining constants or configuring a framework."
)
start_line_number: int = Field(
description="Number of the first line of the function, considering existing formatting"
)
end_line_number: int = Field(
description="Number of the last line of the function, considering existing formatting"
)
complexity_score: float = Field(
description=(
"Overall code complexity on a scale from 0 to 5, as discussed in the article "
"'Simplifying Complex Code with Advanced Programming Approaches.'\n\n"
"Interpretation:\n"
"0 - 1: Very low complexity. The code is straightforward, easy to read, and requires minimal domain or technical knowledge.\n"
"2 - 3: Moderate complexity. The code may use some advanced techniques or domain knowledge, but remains relatively approachable.\n"
"4: High complexity. The code relies on multiple advanced concepts, intricate domain logic, or specialized optimizations.\n"
"5: Extremely high complexity. The code likely combines various advanced paradigms, deep domain knowledge, and complex abstractions, "
"making it very challenging to understand or maintain."
)
)
The code was evaluated file by file. For each file, the LLM assessed the maintainability of every function individually. Then the overall file score was calculated as a length-weighted average of the complexity scores of its functions.
The same approach was applied at the project level: the total score for the entire codebase was computed as a weighted average across all files, again weighting by the size of each file to better reflect its impact on overall maintainability.
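A small sketch of this length-weighted aggregation, written with assumed helper names rather than the tool's actual code:
def weighted_average(scores_and_weights):
    # scores_and_weights: list of (score, weight) pairs
    total_weight = sum(weight for _, weight in scores_and_weights)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in scores_and_weights) / total_weight

# Per file: weight each function's complexity score by the function's length in lines
def file_score(function_results):
    return weighted_average([
        (r.complexity_score, r.end_line_number - r.start_line_number + 1)
        for r in function_results
    ])

# Per project: weight each file's score by the file's size in lines
def project_score(file_results):
    # file_results: list of (file_score, file_length_in_lines)
    return weighted_average(file_results)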
This approach yielded some promising early results. The LLM was able to evaluate functions and assign scores that often aligned with my own assessments.
However, I noticed the results weren’t fully consistent. Scores varied by 5–15% between runs, likely due to the stochastic nature of LLMs and their sensitivity to slight changes in input or internal randomness.
In the context of a CI pipeline, this kind of inconsistency is a problem. CI tools need reliable metrics to determine whether code meets quality standards. An unstable score makes it hard to track whether code is actually improving or degrading over time.
To address the inconsistency, my next step was to enhance the prompt by explicitly asking the LLM to explain WHY it assigned a particular score.
From previous experiments, I had noticed that when an LLM is prompted to provide reasoning, it tends to produce more thoughtful and consistent responses. I suspect this happens because generating an explanation forces the model to “think through” its decision, leading to better alignment between the score and the reasoning behind it.
These explanations also helped me spot recurring patterns or biases in the model's behavior. In some cases, I could identify where the prompt needed fine-tuning or where the LLM misunderstood me. This kind of prompt iteration is an essential part of building robust LLM-based applications.
While this adjustment did lead to slightly more stable scores, the improvement wasn’t enough. Variability was reduced, but not to a level I felt comfortable using in a CI pipeline. I needed a more robust solution.
I realized that relying on a single score to capture code complexity was too limiting and too fragile. So I shifted toward evaluating multiple metrics, each representing a different aspect of cognitive complexity.
The idea was simple: by breaking complexity into several dimensions and scoring each separately, I expected to create a more stable composite score. If one metric fluctuated slightly due to randomness, the others could help balance it out, leading to a more reliable overall result.
Introducing multiple metrics also opened the door to a new approach:
Treating each metric as a probability between 0 and 1. This standard scale made it easier for the LLM to reason about each factor, as it aligned with common patterns found in prompts and training data.
It also gave me more flexibility when interpreting the results. Instead of making a strict yes/no decision on whether a factor was present, I could adjust thresholds to trade off sensitivity against accuracy.
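For example, a simple (hypothetical) post-processing step could turn the 0–1 confidences into a list of detected factors with a tunable cut-off:
FACTOR_THRESHOLD = 0.6  # assumed value, tunable per team

def detected_factors(confidences):
    # confidences: dict of factor name -> 0..1 confidence returned by the LLM
    return [name for name, confidence in confidences.items() if confidence >= FACTOR_THRESHOLD]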
class CodeComplexityConfidenceEvaluation(BaseModel):
function_name: str = Field(description="Name of the function")
is_setup_or_declaration: bool = Field(
description="The code is part of setup or declaration boilerplate, such as defining constants or configuring a framework."
)
start_line_number: int = Field(
description="Number of the first line of the function, considering existing formatting"
)
end_line_number: int = Field(
description="Number of the last line of the function, considering existing formatting"
)
use_of_advanced_algorithms: float = Field(
description="Use of advanced algorithms requiring domain-specific knowledge. 0 means no such algorithms, 1 means heavily reliant on them."
)
low_level_optimizations: float = Field(
description="Low-level optimizations that require deep knowledge of hardware or language internals."
)
complex_third_party_libraries: float = Field(
description="Use of complex third-party libraries (e.g., Rx.js, Pandas, TensorFlow)."
)
business_logic_domain_expertise: float = Field(
description="Business logic requiring domain-specific expertise."
)
advanced_coding_techniques: float = Field(
description="Use of advanced coding techniques (e.g., functional programming)."
)
excessive_mutable_state: float = Field(
description="Excessive reliance on mutable state. 0 means purely immutable or minimal state, 1 means heavy reliance on mutable data."
)
deeply_nested_control_structures: float = Field(
description="Deeply nested control structures (more than 3 levels)."
)
long_classes: float = Field(
description="Long classes (over 200 lines). 0 means no long classes, 1 means code is dominated by extremely large classes."
)
long_functions: float = Field(
description="Long functions (over 100 lines). 0 means short functions, 1 means extremely long, monolithic functions."
)
parallelism_and_concurrency: float = Field(
description="Usage of parallelism or concurrency patterns (threads, async, futures, etc.)."
)
recursion: float = Field(
description="Usage of recursive functions or algorithms."
)
global_variables: float = Field(
description="Use of global variables."
)
magic_numbers: float = Field(
description="Magic numbers (unexplained constants) that reduce readability."
)
long_lists_of_positional_parameters: float = Field(
description="Functions with a large number of positional parameters."
)
advanced_language_features: float = Field(
description="Use of advanced language features (e.g., metaprogramming, reflection)."
)
inconsistent_indentation_or_formatting: float = Field(
description="Poorly formatted code, inconsistent indentation, or misaligned braces."
)
long_monolithic_blocks_of_code: float = Field(
description="Large uninterrupted blocks of code lacking clear separation."
)
non_descriptive_variable_function_names: float = Field(
description="Non-descriptive or misleading names for variables, functions, or classes."
)
excessive_branching: float = Field(
description="Frequent or complicated branching (if/else, switch), making logic harder to follow."
)
inconsistent_error_handling: float = Field(
description="Multiple, inconsistent ways of handling errors throughout the code."
)
complex_boolean_logic: float = Field(
description="Multiple combined boolean expressions making the logic difficult to parse."
)
code_duplication: float = Field(
description="Repetitive code blocks or functions duplicated across the codebase."
)
non_idiomatic_use_of_language_features: float = Field(
description="Using language features in a way that goes against common idioms or best practices."
)
However, this approach came with a trade-off. It was challenging to balance the level of detail in the description of each factor with the overall size of the prompt. The more detailed and explicit the prompt, the better the LLM could identify specific factors, but longer prompts also meant slower response times, higher costs, and a greater chance of hitting token limits.
On the other hand, shorter prompts were faster and cheaper to run but often resulted in weaker detection accuracy, leading to higher error rates and less reliable evaluations.
Doubling down on the idea of averaging out inconsistencies, I decided to increase the number of factors even further, while simplifying the prompt by representing each factor as an enum value.
By scaling up the number of simpler, well-defined factors, I aimed to make the scoring system both more granular and consistent, without overwhelming the model with lengthy descriptions.
For example:
class CodeComplexityFactors(str, Enum):
use_advanced_coding_techniques = (
"Use of advanced coding techniques, such as functional programming, that are less commonly understood."
)
use_advanced_algorithms = (
"Use of advanced algorithms requiring specialized knowledge, making the code harder to understand."
)
use_parallelism_concurrency_patterns = (
"Use of parallelism, concurrency, or recursion, which adds complexity due to the challenges of handling state across multiple threads or processes."
)
use_advanced_language_features = (
"Use of advanced language features, such as reflection or metaprogramming, which can obscure code readability and require deep understanding."
)
complicated_arithmetic_expressions = (
"Complex arithmetic expressions that involve multiple operations or formulas, making it harder to reason about."
)
complicated_boolean_expressions = (
"Complex boolean logic, including multiple conditions that can be difficult to follow and debug."
)
complicated_string_manipulation = (
"Complex string manipulations that involve multiple functions or operations, reducing clarity."
)
complicated_bitwise_operations = (
"Use of bitwise operations and manipulation, which are generally low-level and harder to understand."
)
use_complex_third_party_libraries = (
"Use of complex third-party libraries (e.g., Rx.js, Pandas, TensorFlow) that require specialized knowledge to understand and work with."
)
business_domain_expertise = (
"Code that requires specific knowledge in the business domain, such as finance or healthcare."
)
technical_domain_expertise = (
"Code that requires technical domain knowledge, such as signal processing or computer graphics."
)
application_domain_expertise = (
"Code that requires understanding of the business logic unique to the application."
)
use_global_variables = (
"Use of global variables or hidden mutable state, which makes the code harder to reason about and introduces potential side effects."
)
non_standard_coding_conventions = (
"Use of non-standard or inconsistent coding and naming conventions, which can confuse engineers unfamiliar with the code."
)
excessive_mutable_state = (
"Excessive reliance on mutable state, making the code harder to predict and test."
)
magic_numbers = (
"Use of magic numbers (unexplained constants) that lack context, reducing clarity."
)
long_lists_of_positional_parameters = (
"Functions with long lists of positional parameters, which can lead to confusion and misuse."
)
excessive_boilerplate_code = (
"Excessive boilerplate code, which can obscure the core functionality and make the code harder to maintain."
)
inconsistent_indentation_or_formatting = (
"Inconsistent indentation or formatting, reducing readability and maintainability."
)
long_monolithic_blocks_of_code = (
"Long, monolithic blocks of code without clear separation of concerns, making it difficult to follow."
)
non_descriptive_variable_function_names = (
"Non-descriptive or misleading names for variables or functions, reducing clarity and making it harder to understand the code."
)
overly_complex_function_signatures = (
"Overly complex function signatures, making it hard to understand the purpose and use of the function."
)
deeply_nested_control_flow = (
"Deeply nested branching in control flow (e.g., if/else, switch), making it hard to follow the execution logic."
)
complicated_control_flow_branching = (
"Complicated branching in control flow, adding difficulty in understanding the code's decision-making."
)
deeply_nested_loops = (
"Deeply nested loops (e.g., for, while), which can reduce code readability and increase cognitive load."
)
complicated_loop_structure = (
"Complicated loop structures that involve multiple conditions, breaking out of loops, or complex logic."
)
hidden_side_effects = (
"Hidden side effects that are not immediately obvious from the function signature, making debugging and reasoning more difficult."
)
code_duplication = (
"Code duplication across functions or classes, which increases maintenance complexity."
)
non_idiomatic_use_of_language_features = (
"Non-idiomatic use of language features, which may be unfamiliar or unintuitive for engineers working in the language."
)
complex_math_concepts = (
"Use of advanced mathematical concepts or models, which require specialized knowledge to understand."
)
functional_programming = (
"Use of functional programming paradigms, which require a different way of thinking and may not be familiar to all engineers."
)
complex_inheritance = (
"Complex inheritance hierarchies, which can be hard to trace and understand."
)
complex_polymorphism = (
"Complex use of polymorphism, which may introduce unexpected behavior and harder-to-understand relationships between classes."
)
complex_data_structures = (
"Use of complex data structures (e.g., graphs, trees) that require specialized knowledge to work with."
)
bitwise_operations = (
"Use of bitwise operations, which are generally low-level and harder to understand."
)
concurrency_mechanisms = (
"Use of complex concurrency mechanisms, which add complexity in terms of state management and performance."
)
complex_regular_expressions = (
"Use of complex regular expressions, which are often hard to read and understand at a glance."
)
reflection_and_metaprogramming = (
"Use of reflection, metaprogramming, or other runtime code manipulation that reduces readability and increases cognitive load."
)
high_performance_computations = (
"High-performance computations or low-level system optimizations, requiring specialized knowledge and potentially obscuring clarity."
)
low_level_networking = (
"Low-level networking or socket programming, which requires specialized technical knowledge."
)
use_of_category_theory = (
"Use of category theory concepts, which are very abstract and require a deep understanding to work with."
)
domain_specific_languages = (
"Use of domain-specific languages (DSLs), which introduce custom syntax or rules that may be unfamiliar."
)
To make the scoring more meaningful, I also introduced a custom weight to each enum value. By asking the LLM to identify which factors were present in the code and then applying the corresponding weights, I could compute a weighted sum that reflected the impact of the estimated complexity. This gave me a more flexible way to evaluate code, where each factor contributed proportionally based on how much it affects readability, maintainability, or onboarding effort.
code_complexity_factors_weight = {
CodeComplexityFactors.use_advanced_coding_techniques: 10,
CodeComplexityFactors.use_advanced_algorithms: 6,
CodeComplexityFactors.use_parallelism_concurrency_patterns: 4,
CodeComplexityFactors.use_advanced_language_features: 6,
CodeComplexityFactors.complicated_arithmetic_expressions: 3,
CodeComplexityFactors.complicated_boolean_expressions: 3,
CodeComplexityFactors.complicated_string_manipulation: 2,
CodeComplexityFactors.complicated_bitwise_operations: 4,
CodeComplexityFactors.use_complex_third_party_libraries: 3,
CodeComplexityFactors.business_domain_expertise: 4,
CodeComplexityFactors.technical_domain_expertise: 4,
CodeComplexityFactors.application_domain_expertise: 3,
CodeComplexityFactors.use_global_variables: 2,
CodeComplexityFactors.non_standard_coding_conventions: 2,
CodeComplexityFactors.excessive_mutable_state: 2,
CodeComplexityFactors.magic_numbers: 1,
CodeComplexityFactors.long_lists_of_positional_parameters: 2,
CodeComplexityFactors.excessive_boilerplate_code: 1,
CodeComplexityFactors.inconsistent_indentation_or_formatting: 1,
CodeComplexityFactors.long_monolithic_blocks_of_code: 2,
CodeComplexityFactors.non_descriptive_variable_function_names: 2,
CodeComplexityFactors.overly_complex_function_signatures: 2,
CodeComplexityFactors.deeply_nested_control_flow: 3,
CodeComplexityFactors.complicated_control_flow_branching: 2,
CodeComplexityFactors.deeply_nested_loops: 3,
CodeComplexityFactors.complicated_loop_structure: 2,
CodeComplexityFactors.hidden_side_effects: 4,
CodeComplexityFactors.code_duplication: 2,
CodeComplexityFactors.non_idiomatic_use_of_language_features: 3,
CodeComplexityFactors.complex_math_concepts: 7,
CodeComplexityFactors.functional_programming: 10,
CodeComplexityFactors.complex_inheritance: 4,
CodeComplexityFactors.complex_polymorphism: 4,
CodeComplexityFactors.complex_data_structures: 6,
CodeComplexityFactors.bitwise_operations: 4,
CodeComplexityFactors.concurrency_mechanisms: 5,
CodeComplexityFactors.complex_regular_expressions: 4,
CodeComplexityFactors.reflection_and_metaprogramming: 4,
CodeComplexityFactors.high_performance_computations: 5,
CodeComplexityFactors.low_level_networking: 6,
CodeComplexityFactors.use_of_category_theory: 10,
CodeComplexityFactors.domain_specific_languages: 6,
}
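A rough sketch (assumed helper, not the published implementation) of how the factors the LLM flags could be combined with this weight table into a single number:
def weighted_factor_score(detected_factors):
    # detected_factors: list of CodeComplexityFactors members the LLM reported as present
    return float(sum(code_complexity_factors_weight[factor] for factor in detected_factors))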
My prompt generated output in JSON format, and one of the fields had to contain an array of enum values. During experimentation, I started noticing an increase in poorly formatted JSON outputs from the LLM. It turned out that LangChain was including the enum descriptions directly in the prompt and expecting the output to contain an array of strings exactly matching those descriptions. The LLM responses became very long and verbose, which led to inconsistent representations and parsing errors. To fix this, I revised the enums to use concise, clear values that reduced ambiguity and minimized the chance of misinterpretation. This helped with formatting, but didn't fully solve the problem.
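Illustratively, the revised enum keeps the values short and machine-friendly while the longer explanations move into the prompt text; the exact values below are my guess at the shape, not the tool's code:
from enum import Enum

class CodeComplexityFactors(str, Enum):
    use_advanced_algorithms = "use_advanced_algorithms"
    deeply_nested_control_flow = "deeply_nested_control_flow"
    magic_numbers = "magic_numbers"
    # ...the remaining factors follow the same pattern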
Despite these improvements, the approach still had reliability issues. The LLM would sometimes miss key factors or falsely detect ones that weren’t present. This was especially problematic for high-weight factors, since errors in those would heavily skew the final score.
After playing with the trade-offs of fine-grained metrics, I realized I needed a better balance between detection accuracy and prompt size.
By spending time refining the individual factors, I started to see patterns: groups of related traits that could be consolidated into broader categories. This led me to define five key dimensions of complexity:
Readability Issues – Problems related to naming, formatting, or clarity that reduce how easily code can be understood.
Control Flow Complexity – Use of deeply nested logic, recursion, or heavy branching that increases cognitive load.
Project-Specific Knowledge – Dependencies on internal business logic, frameworks, or custom libraries that are hard to understand without context.
Domain-Specific Knowledge – Use of specialized concepts (e.g., from machine learning, graphics, physics, or signal processing) that require prior expertise.
Advanced Coding Techniques – Patterns like metaprogramming, functional programming or other techniques that are powerful but mostly used due to personal preference.
Grouping the factors this way allowed me to keep the prompt concise while still capturing the most important sources of complexity. It also made it easier to assign meaningful weights to each category based on how difficult they are to understand, refactor, or onboard new developers into.
class FunctionComplexityEvaluation(BaseModel):
function_name: str = Field(description="Name of the function")
is_setup_or_declaration: bool = Field(
description="The code is part of setup or declaration boilerplate, such as defining constants or configuring a framework."
)
start_line_number: int = Field(
description="Number of the first line of the function, considering existing formatting"
)
end_line_number: int = Field(
description="Number of the last line of the function, considering existing formatting"
)
readability_score: float = Field(
description="Estimate how readable the code is based on factors like naming conventions, formatting, and non-runtime characteristics."
)
cognitive_complexity_score: float = Field(
description="Estimate the cognitive complexity of control structures and expressions. Higher scores result from deeply nested control flow, complex expressions, and multiple branching levels."
)
project_specific_knowledge_score: float = Field(
description="Estimate how much project-specific knowledge is required, such as the use of third-party libraries or specific business rules."
)
technical_domain_knowledge_score: float = Field(
description="Estimate the level of deep technical domain knowledge required, such as advanced algorithms, parallel programming, signal processing, or low-level optimizations."
)
advanced_code_techniques_score: float = Field(
description="Estimate the use of advanced coding techniques (like functional programming paradigms) that are not essential for solving the task but reflect the developer’s preference."
)
By focusing on five broad categories, I could include clear definitions and concrete examples for each, which helped the model make more consistent and accurate evaluations. It also made the weighting process much simpler. Instead of juggling dozens of individual factors, I could assign meaningful weights to just five core categories, each representing a different dimension of complexity.
After some refinement, I realized that adding more detailed grading guidelines for each category directly into the prompt would further improve consistency. Clear score ranges gave the LLM a more structured way to estimate maintainability and helped align its output with my expectations.
from pydantic import BaseModel, Field
class FunctionComplexityEvaluation(BaseModel):
function_name: str = Field(description="Name of the function")
is_setup_or_declaration: bool = Field(
description="The code is part of setup or declaration boilerplate, such as defining constants or configuring a framework."
)
start_line_number: int = Field(
description="Number of the first line of the function, considering existing formatting"
)
end_line_number: int = Field(
description="Number of the last line of the function, considering existing formatting"
)
readability_score: float = Field(
description="Estimate how readable the code is based on factors like naming conventions, formatting, and non-runtime characteristics.\n\
Score ranges:\n\
0 - 0.3: The code follows standard naming conventions, is well-formatted, and lacks clutter (e.g., no magic numbers or excessive boilerplate).\n\
0.3 - 0.7: Minor readability issues, inconsistent formatting, occasional use of non-descriptive names, or slight violations of coding standards.\n\
0.7 - 1: Significant readability problems, non-standard conventions, poor naming, inconsistent formatting, or extensive use of boilerplate code."
)
cognitive_complexity_score: float = Field(
description="Estimate the cognitive complexity of control structures and expressions.\n\
Score ranges:\n\
0 - 0.3: Simple control structures (minimal nesting, straightforward logic, few operators).\n\
0.3 - 0.7: Moderate complexity, involving some nesting (2–3 levels), more complex boolean/arithmetic expressions, or multiple operators.\n\
0.7 - 1: Highly complex control structures, deeply nested (3+ levels), intricate logic with many operators, or multiple conditional/loop combinations."
)
project_specific_knowledge_score: float = Field(
description="Estimate how much project-specific knowledge is required, such as the use of third-party libraries or specific business rules.\n\
Score ranges:\n\
0 - 0.3: Little to no project-specific knowledge required, uses common third-party libraries or standard business rules.\n\
0.3 - 0.7: Some project-specific knowledge is needed, involving custom libraries or moderately complex business rules.\n\
0.7 - 1: Extensive project-specific knowledge required, highly customized third-party libraries or intricate, specific business logic."
)
technical_domain_knowledge_score: float = Field(
description="Estimate the level of deep technical domain knowledge required, such as advanced algorithms, parallel programming, signal processing, or low-level optimizations.\n\
Score ranges:\n\
0 - 0.3: Minimal technical domain knowledge required, standard algorithms and techniques used.\n\
0.3 - 0.7: Moderate technical domain knowledge, involving specialized algorithms, parallel programming, or some scientific/engineering calculations.\n\
0.7 - 1: High level of technical domain knowledge required, including advanced algorithms, low-level optimizations, or complex scientific/mathematical concepts."
)
advanced_code_techniques_score: float = Field(
description="Estimate the use of advanced coding techniques (like functional programming paradigms) that are not essential for solving the task but reflect the developer’s preference.\n\
Score ranges:\n\
0 - 0.3: No or minimal use of advanced techniques, the code is straightforward and easy to follow.\n\
0.3 - 0.7: Some use of advanced techniques (e.g., functional programming, metaprogramming) that increase complexity but do not dominate the code.\n\
0.7 - 1: Heavy use of advanced techniques that significantly add complexity without being essential for solving the problem (e.g., monads, currying, complex metaprogramming)."
)
I assigned weights to each category based on how difficult it is to address that type of complexity in real-world scenarios (a small sketch of how these weights combine into a composite score follows the list):
Readability Issues (Weight: 1) – These are usually easy to fix. Renaming variables, cleaning up formatting, or adding comments, minimal effort or low risk.
Control Flow Complexity (Weight: 2) – Refactoring deeply nested logic or simplifying branching structures is harder and can introduce bugs if not done carefully.
Project-Specific Knowledge (Weight: 3) – This often requires onboarding or checking internal documentation. It makes it harder to onboard new team members, and it is hard for engineers to keep that knowledge up to date.
Domain-Specific Knowledge (Weight: 4) – Understanding concepts from fields like machine learning or graphics can take significant time and isn’t always easily accessible.
Advanced Coding Techniques (Weight: 5) – Often unnecessary complexity that reflects personal preference rather than project needs; understanding these techniques may require deep technical knowledge and hands-on experience.
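One plausible way to fold the five category scores and these weights into a single composite score, using the field names from the model above (a sketch under my own assumptions, not the published implementation; whether the tool averages or sums the weighted scores is a guess):
CATEGORY_WEIGHTS = {
    "readability_score": 1,
    "cognitive_complexity_score": 2,
    "project_specific_knowledge_score": 3,
    "technical_domain_knowledge_score": 4,
    "advanced_code_techniques_score": 5,
}

def composite_score(evaluation):
    # evaluation: dict of category name -> 0..1 score returned by the LLM
    weighted = sum(evaluation[name] * weight for name, weight in CATEGORY_WEIGHTS.items())
    return weighted / sum(CATEGORY_WEIGHTS.values())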
In addition to the cognitive complexity estimated by the language model, I decided to also consider function length as part of the final score. Long functions often require developers to hold more context in their minds, which becomes especially difficult when the logic is hard.
A short function that handles something complex can still be understandable. But even simple logic, when stretched over dozens of lines, becomes difficult to follow. That’s why keeping functions small is a well-known best practice, something I wanted the tool to encourage.
To capture this, I introduced a function size factor:
import math

desired_length = 10  # lines of code
# function_length is derived from the LLM's start_line_number / end_line_number estimates
function_size_factor = math.sqrt(function_length / desired_length)
A 10-line function is treated as the baseline. Shorter functions are typically simpler and easier to reason about, while longer ones get penalized. The growth of the function size factor is limited by the square root, to prevent excessively long functions from dominating the entire score.
To keep the final score more human friendly, I wanted to keep it in the range from 1 to 5, like a typical star rating. I applied a hyperbolic tangent (tanh) function to the adjusted composite score:
MIN_VALUE = 1
MAX_VALUE = 5
VALUE_RANGE = MAX_VALUE - MIN_VALUE  # 4

final_score = MIN_VALUE + VALUE_RANGE * math.tanh(composite_score * function_size_factor)
The hyperbolic tangent function brings several useful properties to the scoring formula:
By applying this function, I ensured the final score stays within the desired range, while also modeling the non-linear way developers experience complexity.
As the tool evolved, I added a few more features to improve its performance and usability:
Progressive Evaluation – To save time and compute, the tool caches previous results and skips files that haven't changed since the last run, which makes it much faster to use in CI pipelines or large projects (a minimal sketch of this idea follows the list).
Improvement Suggestions – When a file exceeds the target complexity score, the tool generates helpful, actionable feedback on what could be improved, highlighting specific areas that contribute most to the score.
Configuration Options – The tool behaviour can be customized through a config file or CLI flags. This allows teams to adapt it to fit their needs.
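As a rough illustration of the progressive-evaluation idea (assumed cache format and file names, not the tool's actual code), unchanged files can be skipped by hashing their contents:
import hashlib
import json
import pathlib

CACHE_FILE = pathlib.Path(".codepass_cache.json")  # hypothetical cache location

def load_cache():
    return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

def evaluate_with_cache(path, evaluate, cache):
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = cache.get(str(path))
    if entry and entry["hash"] == digest:
        return entry["score"]            # unchanged since the last run: reuse the cached score
    score = evaluate(path.read_text())   # expensive LLM evaluation only for changed files
    cache[str(path)] = {"hash": digest, "score": score}
    return score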
I hope this tool will be useful to other engineers and companies looking to bring code complexity evaluation into their CI workflows. While it’s still early, I see this as one of the first practical steps toward automating parts of the code review process using LLMs.
With this post, I didn't just want to showcase the tool; I wanted to share the journey of building it, from experimenting with prompts to balancing reliability, performance, and cost. This project taught me a lot about working with LLMs in real-world scenarios.
It also gave me a deeper understanding of what actually makes code complex, readable, or maintainable. Now I approach code quality with a more structured mindset, and I hope these insights help others do the same.
🧪 Try it out: codepass on PyPI
🚀 Code: Github repo
💬 Got feedback or ideas? Drop a comment below!