2025-12-27 04:44:50
When most people think about QA (Quality Assurance) or SDET (Software Development Engineer in Test), they think of testing apps, finding bugs, or writing automation frameworks. But one of the biggest lessons I’ve learned in my career is this: automation isn’t just about testing software — it’s about removing repetitive pain anywhere you see it.
For me, that “pain” came in the form of payroll CSVs.
On the surface, a CSV file seems harmless — just rows and columns. But from a QA perspective, CSVs are a constant source of errors and wasted time, especially when used for payroll or timesheets.
Here are a few of the issues I’ve seen:
Every one of these problems leads to payroll delays, frustrated employees, and time lost fixing files that should have “just worked.”
As a QA/SDET engineer, I deal with data pipelines, test automation, and validation every day. One day I thought:
Why not apply the same principles I use in testing to payroll files?
That’s when I started building a small side project: an automation tool that auto-generates payroll CSVs.
The idea was simple:
I started with a few core requirements in mind:
Below is simplified pseudocode that shows the core idea:
load base_csv_template()
for each test_case in test_cases:
    cloned_row = copy(base_csv_row)
    update_required_columns(cloned_row, test_case.inputs)
    validate_schema(cloned_row)
    validate_business_rules(cloned_row)
    append_to_output(cloned_row)
export_csv(output_file)
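For readers who want something more concrete, here is a minimal Python sketch of the same flow built on the standard csv module. The column names, sample test cases, and validation rules are hypothetical placeholders for illustration, not the actual tool.

import csv
from copy import deepcopy

# Hypothetical test cases; the real tool would read these from a config or test plan.
test_cases = [
    {"employee_id": "E001", "hours": "40", "rate": "25.00"},
    {"employee_id": "E002", "hours": "38.5", "rate": "31.50"},
]

# Base template row and required schema (illustrative only)
base_row = {"employee_id": "", "hours": "", "rate": "", "currency": "USD"}
required_columns = ["employee_id", "hours", "rate", "currency"]

def validate_schema(row):
    missing = [c for c in required_columns if not row.get(c)]
    if missing:
        raise ValueError(f"Missing values for columns: {missing}")

def validate_business_rules(row):
    if float(row["hours"]) <= 0:
        raise ValueError("Hours must be positive")

rows = []
for case in test_cases:
    row = deepcopy(base_row)      # clone the template row
    row.update(case)              # fill in the required columns
    validate_schema(row)          # structural check
    validate_business_rules(row)  # domain check
    rows.append(row)

with open("payroll_output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=required_columns)
    writer.writeheader()
    writer.writerows(rows)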
The difference was immediate:
Even in a small pilot, this tool saved dozens of hours each month. At scale, the impact could be massive.
Building this tool reinforced a few big lessons for me:
I’m continuing to refine the tool, add integrations, and explore ways to make it open-source so others can benefit.
If you’re in QA, Dev, DevOps, or HR tech, I’d love your feedback:
At the end of the day, CSV files may never be glamorous. But solving a real problem for real people — that’s the kind of innovation that makes me excited about being an automation engineer.
2025-12-27 04:34:03
If you run both Linux and Windows on the same machine, you might notice that after rebooting, your system clock jumps by a few hours.
This happens because both systems read the same hardware clock but interpret it differently: Windows treats it as local time, while Linux treats it as UTC.
To fix this, both systems need to use the same time reference. There are two main approaches:
Use UTC for both systems (recommended):
timedatectl set-local-rtc 0 --adjust-system-clock
This is usually not strictly necessary, since Linux uses UTC by default, but it makes the setting explicit.
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation" /v RealTimeIsUniversal /t reg_dword /d 1 /f
This adds a registry value that tells Windows to treat the hardware clock as UTC.
You can save the command as a file named switch-to-utc-time.bat and run it as administrator. Alternatively, open a cmd or PowerShell window as administrator, paste the line, and press Enter.
After applying the registry setting, restart Windows.
Use local time for both systems (Windows default):
timedatectl set-local-rtc 1 --adjust-system-clock
Then reboot.
After this change, both OS clocks will stay in sync, preventing issues with logs, Git commits, scheduled tasks, or databases caused by time jumps.
What is the hardware clock and where is it stored?
The hardware clock, sometimes called RTC or CMOS clock, is built into the motherboard and keeps time even when the computer is powered off. Both Linux and Windows read this clock at startup.
Why this matters for users
If your clocks aren’t in sync, you might notice:
Why this matters for developers
For developers, having consistent system time is critical. When dual-boot clocks drift, you may encounter:
Using UTC is generally better because it avoids timezone-related issues and daylight saving changes. Linux servers, Docker containers, CI/CD pipelines, and cloud environments almost always operate in UTC, so aligning your local machine avoids subtle bugs in development workflows.
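As a small illustration of the development-side benefit, the sketch below (not from the original post) contrasts a naive local timestamp, which shifts if the hardware clock is misinterpreted after a reboot, with a timezone-aware UTC timestamp that stays unambiguous across machines and environments:

from datetime import datetime, timezone

# Naive local time depends on the machine's clock and timezone interpretation;
# a misread hardware clock after rebooting shows up directly in this value.
local_naive = datetime.now()

# Timezone-aware UTC time is unambiguous across dual-boot systems,
# servers, containers, and CI runners.
utc_aware = datetime.now(timezone.utc)

print("Local (naive):", local_naive.isoformat())
print("UTC (aware): ", utc_aware.isoformat())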
2025-12-27 04:33:33
Unlike Console.WriteLine(), the Console.Write() method does not break the line after writing.
It is widely used for continuous output, real-time progress, formatted logs, and interactive console interfaces.
using System;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Write("Hello ");
            Console.Write("World ");
            Console.Write("C# 1.0");
        }
    }
}
📌 Output:
Hello World C# 1.0
No direct changes to the Console API, but the pattern remains:
Console.Write("Hello World - C# 2.0");
C# 3.0: var
var text = "Hello World - C# 3.0";
Console.Write(text);
🧠 Cleaner code with type inference.
C# 4.0: dynamic
dynamic message = "Hello World - C# 4.0";
Console.Write(message);
⚠️ Possible to use, but not recommended for simple scenarios.
using System.Threading.Tasks;

static async Task Main()
{
    Console.Write("Loading");
    await Task.Delay(500);
    Console.Write(".");
    await Task.Delay(500);
    Console.Write(".");
    await Task.Delay(500);
    Console.Write(".");
}
📌 Output:
Loading...
var version = "C# 6.0";
Console.Write($"Hello World - {version}");
var info = (Name: "C#", Version: "7.x");
Console.Write($"{info.Name} {info.Version}");
string? message = "Hello World - C# 8.0";
Console.Write(message ?? "Default message");
🛡️ Protection against NullReferenceException.
Console.Write("Hello World - C# 9.0");
🔥 Straightforward code, without the Program class and Main method boilerplate.
Console.Write("Hello World - C# 10.0");
📉 Less boilerplate.
Console.Write("""
Hello World - C# 11.0
No automatic line break
""");
string[] parts = ["Hello", "World", "C# 12.0"];

foreach (var part in parts)
{
    Console.Write(part + " ");
}
📌 Output:
Hello World C# 12.0
Console.Write("Hello World - C# 13.0 (Preview)");
🔮 Evolution focused on immutability and patterns.
| Method | Behavior |
|---|---|
| Console.Write() | Continues on the same line |
| Console.WriteLine() | Breaks the line automatically |
Console.Write() is a natural fit for:
✔️ Progress bars
✔️ Continuous logs
✔️ Interactive CLIs
✔️ Simple terminal animations
Progress example:
for (int i = 0; i <= 100; i += 10)
{
    Console.Write($"\rProgress: {i}%");
    Thread.Sleep(200);
}
Even though it is simple, Console.Write() has kept pace with the entire evolution of C#.
The difference lies in how we write code today (cleaner, more expressive, and safer), not in the method itself.
Mastering these details is essential for writing modern CLI tools, scripts, workers, and DevOps tooling in .NET.
If you work with modern .NET and want to master architecture, C#, DevOps, or interoperability, let's talk:
💼 LinkedIn
✍️ Medium
📬 daniloopinheiro
For all those things hath mine hand made, and all those things have been, saith the LORD: but to this man will I look, even to him that is poor and of a contrite spirit, and trembleth at my word.
Isaiah 66:2
2025-12-27 04:30:19
Before we read, before we write, we see. The human brain devotes more processing power to vision than to any other sense. We navigate the world through sight first, and a single glance tells us more than paragraphs of description ever could.
For decades, this kind of visual understanding eluded machines. Computer vision could detect edges and match patterns, but couldn't truly see. Now, vision-capable language models (VLMs) can interpret images, form spatial relations, and reason about what they're looking at. They don't just parse pixels; they understand scenes.
Here, we will walk through how these models process visual data, combine it with language, and produce outputs that we can use.
Text models learned to write. Vision models are learning to perceive. When machines learn to see, not just parse pixels, but understand what they're looking at, they move closer to how we experience the world and become genuinely useful tools for solving real-world problems.
To "see," a model must first break the world into parts it can process. Just like an LLM can't understand entire sentences and needs them broken down into tokens, VLMs can't understand a whole image. However, we also don't want to feed it an entire image pixel by pixel.
The first step is to divide the image into a grid of patches, typically 16x16 pixels each. It is these patches that the model compares and reasons about. Each patch is then flattened into a one-dimensional array.
These are then passed through a linear projection layer to become a patch embedding, a dense numerical vector representing the content of that small piece of the image. Instead of analyzing every pixel in isolation, the model learns from the relationships between patches: how edges align, how colors cluster, and how forms repeat.
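As a rough sketch of what patching and projection look like in code (assuming a 224x224 RGB image, 16-pixel patches, and a random matrix standing in for the learned projection weights):

import numpy as np

image = np.random.rand(224, 224, 3)   # stand-in for a real H x W x C image
patch = 16
embed_dim = 768                        # typical ViT embedding width

# Split into non-overlapping 16x16 patches and flatten each one
n = 224 // patch                                                            # 14 patches per side
patches = image.reshape(n, patch, n, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)   # (196, 768)

# A learned linear projection maps each flattened patch to a dense embedding
W = np.random.rand(patch * patch * 3, embed_dim)
patch_embeddings = patches @ W                                              # (196, embed_dim)
print(patch_embeddings.shape)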
This structure, learning from relationships rather than raw pixels, is what gives vision models their power. Through self-attention, the model identifies which patches belong together and begins to reason about both spatial structure ("where things are") and semantic meaning ("what they are").
During patch processing, the VLM moves from recognizing where things are to understanding what they are. Early layers focus on spatial features: oriented edges, corner detectors, texture patterns, and geometric layouts. These low-level features capture the structural skeleton of the image, preserving positional relationships between objects.
Later layers build on this foundation to extract semantic features. Rather than detecting edges or textures, these layers recognize higher-level concepts, such as "cat," "pillar," and "floor." They encode object categories, scene types, and relationships between elements. This is where the model learns that certain patch combinations represent a sleeping animal, not just a black and white blob.
The hierarchical nature of this processing matters. Spatial features alone can locate objects, but can't identify them. A model might detect four legs and a tail without knowing whether it's looking at a cat or a dog. Semantic features provide identity but lose precise positioning. The combination allows the model to both detect shapes and understand the scene: Milan is a cat (semantic), the pillar is behind him (spatial), and he's resting against it (relational understanding from both).
This separation also determines what tasks the model can handle. Object detection relies heavily on spatial features to draw bounding boxes. Image classification depends more on semantic features to categorize the scene. Image captioning requires both maintaining spatial relationships while identifying objects and their interactions.
Seeing isn't understanding. Real perception means connecting what's seen with what's said. To become useful, visual understanding must connect to language.
This is multimodality: a model's ability to process and relate information across different types of data, such as text, images, audio, or video. For VLMs, the challenge is aligning visual and textual information so that when the model sees a photo of a cat and reads the word "cat," it understands they refer to the same concept.
VLMs achieve this through cross-modal context alignment, which involves projecting visual embeddings and text embeddings into a joint latent space via learned projection layers. In this space, visual feature vectors extracted from patches showing fur, whiskers, and pointed ears achieve high cosine similarity with the token embedding for "cat."
Similarly, visual patches showing a mane, hooves, and tail map near the token "horse," clustering separately but using the same alignment mechanism.
This alignment occurs during training through techniques such as CLIP (Contrastive Language-Image Pretraining). The model processes pairs of images and their associated text (captions, questions, descriptors), learning which visual patterns correspond to which words and concepts. The goal is to pull matching image-text pairs closer together in the embedding space while pushing unrelated pairs apart.
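To make the alignment idea concrete, here is a tiny NumPy sketch with made-up three-dimensional vectors (real joint spaces have hundreds of dimensions), showing how a matching image-text pair scores higher cosine similarity than a non-matching one:

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings already projected into the joint space
image_cat  = np.array([0.92, 0.10, 0.15])   # patches showing fur, whiskers, pointed ears
text_cat   = np.array([0.88, 0.15, 0.10])   # token embedding for "cat"
text_horse = np.array([0.12, 0.90, 0.30])   # token embedding for "horse"

print(cosine_similarity(image_cat, text_cat))    # high: matching pair
print(cosine_similarity(image_cat, text_horse))  # low: non-matching pair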
Even with sophisticated training methods, alignment remains imperfect. Several issues get in the way:
Language ambiguity: The phrase "man by the bank" could mean a riverbank or a financial institution. The model can get confused.
Different information densities: Images hold thousands of visual details, while captions summarize them in a few words. A picture really is worth a thousand words here, and the caption only has room for a handful of them.
Spatial grounding: Understanding where something is in the image (e.g., "the gray cat on the floor") requires spatial awareness.
Misalignment leads to hallucinations (describing objects that are not present or incorrectly) or missed context (failing to connect related elements).
Understanding how VLMs process and align visual and textual information explains what happens inside the model. But to actually use these capabilities, you interact with them through APIs that abstract away the complexity. These APIs expose the model's multimodal reasoning while handling the heavy lifting of image encoding, tokenization, and inference.
Working with vision-capable APIs follows the same principles as working with a text model, with a few extra considerations around image pre-processing and structured output.
Use standard formats such as JPEG, PNG, or WebP.
Ensure your images stay within the API's payload size limits (for example, OpenAI's models currently allow up to 50 MB per request)
When sending the image inline, Base64 encoding is needed because the request payload is text (JSON), not binary data.
Here's an example using gpt-4o in Python.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

# 1. Load and encode the image as Base64
with open("input_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# 2. Compose messages for multimodal chat: the image travels as a data URL
#    inside an "image_url" content part, alongside the text instruction
messages = [
    {
        "role": "system",
        "content": "You are an image-understanding assistant. Reply in JSON with keys: objects, confidence, bounding_boxes.",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "List all the objects you see and their approximate locations."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    },
]

# 3. Submit the request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    temperature=0.0,
    max_tokens=300,
)

# 4. Parse and print
output = response.choices[0].message.content
print("Model response:", output)
Some best practices for working with vision models:
Resolution: Downsample images above 2048px on the longest side. Higher resolution doesn't improve reasoning and increases token usage.
Format: Use JPEG for photographs, PNG for diagrams or screenshots with text. Both compress well while preserving necessary detail.
Quality: Ensure sufficient clarity for human interpretation. Excessive compression artifacts degrade model performance.
Encoding: Always use base64 encoding as shown in the example above.
Prompting: Distinguish between descriptive tasks ("caption this image") and inferential tasks ("what might this person be doing?"). VLMs perform differently on each.
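A minimal preprocessing sketch that follows the resolution and format guidance above (using Pillow, with a hypothetical input path):

from PIL import Image

MAX_SIDE = 2048  # downsample anything larger than this on its longest side

img = Image.open("input_image.jpg")   # hypothetical path
img.thumbnail((MAX_SIDE, MAX_SIDE))   # shrinks in place, preserving aspect ratio
img = img.convert("RGB")              # drop any alpha channel before saving as JPEG
img.save("input_resized.jpg", "JPEG", quality=90)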
For applications that need to parse model responses programmatically, structured output ensures consistent formatting. Schema-guided prompting provides an explicit JSON schema in your prompt and constrains the model's output format.
Return JSON with this exact structure:
{
"objects": [],
"relationships": [],
"caption": ""
}
Do not include any text before or after the JSON.
Set temperature below 0.2 to reduce variance in field names and structure. Lower temperature makes the model more deterministic, following your schema more precisely.
If your API supports it, you can use function calling, which allows you to define functions that return typed objects. The model generates structured calls that your code can parse natively, eliminating the need for manual JSON parsing.
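If you stay with plain JSON output instead of function calling, a small validate-and-retry loop keeps your pipeline robust. Below is a minimal sketch, where ask_model is a hypothetical wrapper around the chat-completion call shown earlier:

import json

REQUIRED_KEYS = {"objects", "relationships", "caption"}

def parse_or_retry(ask_model, prompt, max_retries=1):
    """ask_model(prompt) is assumed to send the prompt (plus the image) to the
    vision model and return its raw text reply."""
    reply = ask_model(prompt)
    for _ in range(max_retries + 1):
        try:
            data = json.loads(reply)
            if REQUIRED_KEYS.issubset(data):
                return data
            problem = f"Missing keys: {sorted(REQUIRED_KEYS - set(data))}"
        except json.JSONDecodeError as exc:
            problem = f"Invalid JSON: {exc}"
        # Re-ask with an explicit correction, as suggested above
        reply = ask_model(
            f"{problem}. Please reformat your answer as valid JSON "
            f"with exactly these keys: {sorted(REQUIRED_KEYS)}."
        )
    raise ValueError("Model did not return valid structured output")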
Hallucinations stem from three main sources.
Cross-modal misalignment occurs when training data bias causes the model to infer objects from textual associations rather than visual evidence (e.g., inferring a pillar because cats and pillars often co-occur in training data).
Visual ambiguities, such as occlusions, low contrast, or unusual camera angles, produce uncertain embeddings.
During fusion, attention layers may overweight textual priors from the prompt instead of the actual image content, producing confident but visually incorrect responses.
Standardized benchmarks measure both recognition accuracy and reasoning capability across task categories:
Visual QA and reasoning: MMMU (Massive Multi-discipline Multimodal Understanding and Reasoning), MMBench
Specialized domains: MathVista (mathematical reasoning), ChartQA (chart interpretation), DocVQA (document understanding)
Video understanding: Video-MME
Run evaluations using tools like VLMEvalKit to compare model performance on your specific use case before deployment.
Vision models work by learning relationships between visual patterns and language concepts. They transform image patches into embeddings, align them with text through contrastive learning, and reason about both spatial structure and semantic meaning.
Modern APIs make this accessible. Understanding the underlying mechanics helps you debug failures, optimize prompts, and select the most suitable model. Vision capabilities are production-ready. The challenge is knowing when to use them and how to validate their outputs for your specific use case.
VLMs don't "see" pixels directly; they tokenize them, just as language models tokenize words. A typical encoder (like a ViT or CLIP) divides the image into small, fixed-size patches (e.g., 16x16 pixels). Each patch is flattened and passed through a linear projection layer, converting it into a vector embedding: a numerical summary of the local visual pattern.
Hallucinations usually originate from cross-modal misalignment or contextual overgeneralization, both often rooted in training data bias. Visual ambiguity, over-regularization, and compression artifacts can also lead to false object detection.
Return JSON with this exact structure:
{
"objects": [],
"relationships": [],
"caption": ""
}
Use function calling or response_format parameters (where available). These enable the model to generate native, structured objects.
Always parse responses and re-ask for correction when schema errors occur ("Your JSON was invalid; please reformat according to ...").
Lowering the temperature (<0.2) reduces the model's creative variance in field names and JSON structure format.
While both are multimodal LLMs, their fusion architectures differ:
GPT-4o ("omni") uses unified early fusion. The input image is encoded into visual tokens that are processed through the same transformer as text tokens. This enables proper joint attention, where the model can simultaneously "look" at an image region while reading a sentence.
Gemini (1.5 Pro) follows hybrid or late fusion. Visual encoders (based on ViT/Perceiver) produce embeddings that are later injected into the text model.
Evaluation revolves around objective metrics and human-interpretable checks:
For captioning and description tasks: use BLEU, METEOR, ROUGE, or CIDEr.
For grounding / reasoning: Visual Question Answering (VQA-v2, GQA): test factual consistency with visual input; Visual entailment datasets (SNLI-VE, ScienceQA): measure logical reasoning grounded in images; RefCOCO, COCO-Panoptic: for object localization accuracy.
Human or synthetic audits: have the model explain why it made a claim and cross-check for visual justification.
Consistency testing: perturb the same image (e.g., crop, rotate, change caption wording) and check the stability of reasoning — large variance signals weak visual grounding.
2025-12-27 04:29:43
This post is a short, opinionated reflection on one thing that feels most notably missing in F# today. It provides brief background, references recent discussions on why this gap exists, and ends with a teaser of a possible solution that we will shortly have in our arsenal.
When F# Alone Is Not Enough
Let’s look at something we already have—even if we don’t quite treat it as a first-class citizen. Let's look without precise category-theory definitions, just an intuitive picture.
You may have heard of a Functor
Functor is a great name for a dog.
In programming, you can leave one alone for days
and it will still behave exactly the same, obeying all the laws.
As an F# developer, if you want to play with a Functor, you usually have to name your dog yourself (or your cat — but that would make the metaphor silly).
To be precise: such a pet does exist in F#. But it remains unnamed — unless you use the FSharpPlus library as a kind of dog collar. And while that helps, it’s still not quite the same.
The behavior is present; the abstraction is not.
In F#, the idea of a functor survives without the name — an example of how far the language goes, and where it deliberately stops.
This is not a complaint. It’s a reflection on growth — of the language, and of its ecosystem.
There are things that are simply not expressible in F# today.
The most important of these is the ability to abstract over type constructors themselves.
Full HKT support lets you name, pass around, store, compose, and reason about abstractions—not just use them implicitly through functions.
In F#, we can rely on functorial behavior (map on Option, List, Async, etc.), but we cannot name it, quantify over it, or require it in our own APIs.
Changing a language at its core—for example, by redesigning generics—is an extremely disruptive, breaking change. Scala is a notable example here: because it was designed from the ground up to support HKTs, it was able to implement type classes through its implicits mechanism.
F#, by contrast, is built on .NET’s reified generics, which prioritize runtime performance and cross-language compatibility, but lack the native infrastructure to abstract over containers (F[T]) without complex workarounds.
Don Syme, in his well-known position on type classes, writes:
“I don’t want F# to be the kind of language where the most empowered person in the Discord chat is the person who knows the most category theory or abstract algebra.”
(Please read the full article; this excerpt is intentionally selective and used here only to frame the discussion.)
That position is understandable — and arguably the right one for a mainstream language.
And what about creating a new language altogether?
As Anders Hejlsberg, the creator of C# and TypeScript, recently noted:
Introducing new programming languages in the AI era is inherently disadvantaged: the “best” language for AI is the one it has seen the most.
So, to tackle today's complexities, maybe indeed we should revisit Category Theory?
Yet another CT geek, huh? I'm still learning, but the past few months—after years of trying and abandoning the subject—have brought me to a realization: you either know Category Theory well enough to use it confidently, or you don't use it at all.
A DSL with real HKT support would allow us to define behavior once and lift it across structure. It lets us reason about effects without committing to execution, and about context without sequencing. That’s why functors—and, naturally, comonads—are often the right place to start when designing complex effect systems. They operate at a more fundamental layer than monads: the layer where the meaning of context is defined, before effects are sequenced.
With HKTs, a lot of CT will become more obvious. Almost boring, but surprisingly enjoyable.
And that’s the point.
What does not emerge in the open often continues to grow elsewhere.
In 2026, we’ll be able to use more ideas from Category Theory in practice—not by escaping language constraints, but by working with them deliberately. This will happen through a DSL/language being built at BLP: written in F# and extensible with F#.
I have the privilege of being the dumbest person in that room—and of getting to play with these ideas early.
This post is not a reveal of that DSL yet, but simply a short opening note. The concrete parts, with proper credit to the authors, will be discussed starting in January.
Am I alone in this opinion?
This year’s Advent of F# calendar reflects a noticeable shift. The majority of contributions focus on domain modeling, types, language tooling (LSPs), and even programming language design, rather than MCP-related topics. At the same time, Tomas Petricek—the long-time advocate of effect systems—continues to deliver excellent lessons on building small, well-structured systems in F#. There is also a promising book coming next year, The Book of Functions, which fits perfectly into this broader reflection.
2025-12-27 04:24:17
Originally published on LeetCopilot Blog
Beginners copy the snippet without understanding why elements are popped. Here's the labeled step-by-step breakdown.
Sliding window maximum is the classic place where a monotonic queue feels like magic. Beginners often copy a snippet without understanding why elements are popped or when indices expire. This article slows the process down with labeled steps, ASCII visuals, and a minimal code template you can adapt in interviews.
If you prefer a string-focused sliding window first, revisit the pattern in Sliding Window Template for LeetCode Strings to warm up the mental model.
Array: [1, 3, -1, -3, 5, 3, 6, 7], window k = 3.
After processing 1,3,-1 (window ends at idx 2): deque holds [3, -1] → max = 3
After idx 3 (-3): pop from back? No. Evict front? No. max = 3
After idx 4 (5): pop -3, -1, 3; deque becomes [5] → max = 5
After idx 5 (3): pop from back? No (3 < 5). Evict front? No. max = 5
When the front index falls below i - k + 1, pop it from the front.
Once i >= k - 1, record the deque front as the window maximum.
When k = 1, every element is its own maximum.

from collections import deque
def max_sliding_window(nums, k):
    q = deque()  # holds indices, not values
    res = []
    for i, val in enumerate(nums):
        while q and nums[q[-1]] <= val:
            q.pop()  # remove smaller values
        q.append(i)
        if q[0] <= i - k:
            q.popleft()  # evict out-of-window index
        if i >= k - 1:
            res.append(nums[q[0]])
    return res
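Running the template on the walkthrough array confirms the trace above:

print(max_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3))
# [3, 3, 5, 5, 6, 7]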
Compare how eviction and ordering are handled here versus the counting technique in Prefix Sum Patterns for LeetCode Beginners; both approaches hinge on keeping constant-time window information.
Without indices, you can’t evict correctly when the window moves.
If the front index is out of window, pop it before reading the max.
A common mistake is using < instead of <= in the pop condition.
Using < keeps equal values and can cause unnecessary churn; <= keeps the queue strictly decreasing.
Start recording results once you’ve filled the first window (i >= k - 1).
A monotonic queue for sliding window maximum works because each element is pushed and popped at most once while the deque remains decreasing. With indices, clear pop rules, and steady narration, you’ll walk through interview dry runs confidently and avoid stale maximum bugs.
If you're looking for an AI assistant to help you master LeetCode patterns and prepare for coding interviews, check out LeetCopilot.