
I Built an MCP Server So I'd Never Have to Manually Import Excel Data Again

2026-02-10 12:55:52

Or: How I spent a Saturday building MindfulMapper instead of doing literally anything else

The Problem That Started It All

Picture this: You're running a small cafe. You've got your menu in an Excel spreadsheet (because that's what everyone uses, right?). Now you need to get that data into MongoDB for your new web app.

Your options:

  1. Copy-paste each item manually (soul-crushing)
  2. Write a one-off Node.js script (works once, breaks next time)
  3. Ask ChatGPT to write the script (gets you 80% there, then you're on your own)

I wanted option 4: Talk to Claude like a human and have it just... work.

That's where MCP (Model Context Protocol) comes in.

What Even is MCP?

MCP is basically a way to give Claude (or any AI) superpowers. Instead of Claude just answering questions, it can actually do things - like reading files, calling APIs, or in my case, importing Excel data into databases.

Think of it like this:

  • Without MCP: Claude is a really smart friend who can only give advice
  • With MCP: Claude is a really smart friend who can actually SSH into your server and fix things

The catch? You have to build the "server" that does the actual work.

Building MindfulMapper

The Vision

I wanted to be able to say:

"Hey Claude, import menu.xlsx into my products collection. Map 'Name (EN)' to name.en and 'Name (TH)' to name.th. Oh, and auto-generate IDs with prefix 'spb'."

And have it... just work.

The Reality (aka Pain Points)

Pain Point #1: The Dotenv Disaster

My first version used dotenv to load environment variables. Seemed innocent enough:

import dotenv from 'dotenv';
dotenv.config();

Turns out, dotenv prints a helpful message to stdout:

[dotenv@<version>] injecting env (4) from .env

Claude Desktop saw this message, tried to parse it as JSON (because MCP speaks JSON-RPC over stdio, so anything printed to stdout is treated as protocol traffic), and promptly died. Took me WAY too long to figure this out.

Solution: Either suppress the message or just hardcode the env vars in the Claude Desktop config. I went with option 2.

Pain Point #2: SDK Version Hell

The MCP SDK is evolving fast. Like, really fast. Version 1.26.0 uses completely different syntax than what's in the examples online.

What the examples showed:

server.addTool({
  name: "my_tool",
  description: "Does a thing",
  parameters: z.object({...}),
  execute: async ({...}) => {...}
});

What actually works (v1.26.0):

const server = new Server(
  { name: "my-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [...]
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  // Handle tool calls
});

Yeah. Completely different. Spent hours on this one.

Pain Point #3: Auto-Generating IDs

I wanted IDs like spb-0001, spb-0002, etc. Seems simple, right?

The trick is maintaining a counter in MongoDB:

async function getNextId(prefix = 'spb') {
  const counterCollection = db.collection('counters');
  const result = await counterCollection.findOneAndUpdate(
    { _id: 'item_id' },
    { $inc: { seq: 1 } },
    { upsert: true, returnDocument: 'after' }
  );
  const num = result.seq || 1;
  return `${prefix}-${String(num).padStart(4, '0')}`;
}

This ensures:

  • No duplicate IDs (even if you import the same file multiple times)
  • Sequential numbering
  • Custom prefixes for different item types

The Cool Parts

1. Flexible Column Mapping

Want to map Excel columns to nested MongoDB objects? Easy:

// Excel columns: "Name (EN)", "Name (TH)"
// Mapping: { "name.en": "Name (EN)", "name.th": "Name (TH)" }

// Result in MongoDB:
{
  id: "spb-0001",
  name: {
    en: "Americano",
    th: "อเมริกาโน่"
  }
}

The mapper handles the dot notation automatically.
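
For the curious, the expansion logic behind that is just a nested-dict builder. Here's a rough sketch of the idea in Python (the project itself is Node.js, and set_nested is a made-up helper name, so treat this as illustration only):

def set_nested(doc, dotted_key, value):
    # "name.en" becomes doc["name"]["en"] = value
    parts = dotted_key.split(".")
    for part in parts[:-1]:
        doc = doc.setdefault(part, {})
    doc[parts[-1]] = value

row = {"Name (EN)": "Americano", "Name (TH)": "อเมริกาโน่"}
mapping = {"name.en": "Name (EN)", "name.th": "Name (TH)"}

item = {}
for target_field, column in mapping.items():
    set_nested(item, target_field, row[column])
# item == {"name": {"en": "Americano", "th": "อเมริกาโน่"}}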

2. Natural Language Interface

This is the magic part. Instead of writing code every time, I just tell Claude:

"Import menu.xlsx into products collection. Use prefix 'spb'. Clear existing data."

Claude translates that into the right MCP tool call with the right parameters. It's like having a very patient assistant who never gets tired of your data imports.

3. It Actually Works in Production

I'm using this with MongoDB Atlas (cloud) for a real cafe menu system. The fact that it works reliably enough for production use still surprises me.

How to Use It

If you want to try it yourself:

1. Install

git clone https://github.com/kie-sp/mindful-mapper.git
cd mindful-mapper
npm install

2. Configure Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "mindful-mapper": {
      "command": "node",
      "args": ["/full/path/to/mindful-mapper/upload-excel.js"],
      "env": {
        "MONGODB_URI": "your-mongodb-connection-string",
        "MONGODB_DB_NAME": "your_db",
        "ID_PREFIX": "spb"
      }
    }
  }
}

3. Use It

Restart Claude Desktop, then:

"Import /path/to/menu.xlsx into products collection. Use prefix 'spb'."

That's it.

Lessons Learned

1. MCP is Still Early Days

The SDK is changing fast. Code from 2 months ago might not work today. Always check the version you're using.

2. Debugging MCP Servers is... Interesting

Your server crashes silently. No stack traces in Claude. Your only friend is:

node your-server.js 2>&1

And the logs at:

~/Library/Logs/Claude/mcp*.log

3. The Payoff is Worth It

Once it works, it's magic. I went from dreading data imports to actually enjoying them (well, not dreading them at least).

What's Next?

Current limitations I want to fix:

  1. No schema validation - It'll happily import garbage data
  2. No update/upsert mode - Only insert or clear-and-insert
  3. MongoDB only - PostgreSQL support exists but needs love
  4. No multi-sheet support - First sheet only

But honestly? For 90% of my use cases, it works perfectly as-is.

Try It Yourself

If you build something cool with it, let me know! Or if you hit the same pain points I did, at least now you know you're not alone.

Final Thoughts

Building MCP servers is weird. It's not quite backend development, not quite AI engineering. It's this new thing where you're building tools for an AI to use on your behalf.

But when it works? When you can just casually tell Claude to handle your data imports while you go make coffee? That's pretty cool.

If you end up using this or building something similar, I'd love to hear about it! Feel free to open an issue on GitHub or reach out.

Happy importing! 🎉

Built with: Node.js, MongoDB, MCP SDK, and an unreasonable amount of trial and error

You Sharded Your Database. Now One Shard Is On Fire

2026-02-10 12:49:44

You did everything right.

Split the database into 16 shards. Distributed users evenly by user_id hash. Each shard handles 6.25% of traffic. Perfect balance.

Then Black Friday happened.

One celebrity with 50 million followers posted about your product. The celebrity's account, and everything those 50 million followers want to read, lives on... shard 7.

Shard 7 is now handling 80% of your traffic. The other 15 shards are idle. Shard 7 is melting.

Welcome to the Hot Partition Problem.

Why Hashing Isn't Enough

Hash-based sharding looks perfect on paper:

def get_shard(user_id):
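    # note: for string IDs, Python's built-in hash() is randomized per process; use a stable hash (e.g. hashlib) for real routing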
    return hash(user_id) % num_shards

Uniform distribution. Simple logic. What could go wrong?

Everything. Because real-world access patterns don't care about your hash function.

Scenario 1: Celebrity Effect

A viral post from one user means millions of reads on that user's shard. Followers are distributed across shards, but the content they're accessing isn't.

Scenario 2: Time-Based Clustering

Users who signed up on the same day often have sequential IDs. They also often have similar usage patterns. Your "random" distribution isn't random at all.

Scenario 3: Geographic Hotspots

Morning in Tokyo means heavy traffic from Japanese users. If your sharding key correlates with geography, one shard gets hammered while others sleep.

How to Detect Hot Partitions

You can't fix what you can't see.

Monitor per-shard metrics:

Shard 1:  CPU 15%  |  QPS 1,200  |  Latency P99 45ms
Shard 2:  CPU 12%  |  QPS 1,100  |  Latency P99 42ms
Shard 7:  CPU 94%  |  QPS 18,500 |  Latency P99 890ms  ← PROBLEM
Shard 8:  CPU 18%  |  QPS 1,400  |  Latency P99 51ms

Set up alerts:

  • Single shard CPU > 70% while others < 30%
  • Single shard latency > 3x average
  • Single shard QPS > 5x average

Track hot keys:

Log the most frequently accessed keys per shard. The top 1% of keys often cause 50% of load.
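
A minimal sketch of that kind of tracking, assuming you can hook it into your request path (shard_stats and the sampling rate are illustrative, not from any particular library):

from collections import Counter, defaultdict
import random

# Per-shard counters of how often each key is touched (sampled to keep overhead low)
shard_stats = defaultdict(Counter)
SAMPLE_RATE = 0.01  # record roughly 1% of requests

def record_access(shard_id, key):
    if random.random() < SAMPLE_RATE:
        shard_stats[shard_id][key] += 1

def hot_keys(shard_id, top_n=10):
    # The handful of keys responsible for most of this shard's sampled load
    return shard_stats[shard_id].most_common(top_n)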

Solution 1: Add Randomness to Hot Keys

For keys you know will be hot, add a random suffix:

def get_shard_for_post(post_id, is_viral=False):
    if is_viral:
        # Spread across multiple shards
        random_suffix = random.randint(0, 9)
        return hash(f"{post_id}:{random_suffix}") % num_shards
    else:
        return hash(post_id) % num_shards

A viral post now spreads across 10 shards instead of 1. Reads are distributed. Writes need to fan out, but that's usually acceptable.

The tricky part: knowing which keys will be hot before they're hot.

Solution 2: Dedicated Hot Shard

Accept that some data is special. Give it special treatment.

HOT_USERS = {"celebrity_1", "celebrity_2", "viral_brand"}

def get_shard(user_id):
    if user_id in HOT_USERS:
        return HOT_SHARD_CLUSTER  # Separate, beefier infrastructure
    return hash(user_id) % num_shards

The hot shard cluster has more replicas, more CPU, more memory. It's designed to handle disproportionate load.

Update the HOT_USERS list dynamically based on follower count or recent engagement metrics.
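
One way to keep that list fresh, sketched under the assumption that you have a follower-count lookup somewhere (get_follower_counts is hypothetical):

FOLLOWER_THRESHOLD = 1_000_000

def refresh_hot_users():
    # Re-derive the hot set on a schedule instead of hand-maintaining it
    global HOT_USERS
    counts = get_follower_counts()  # hypothetical: returns {user_id: follower_count}
    HOT_USERS = {uid for uid, n in counts.items() if n >= FOLLOWER_THRESHOLD}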

Solution 3: Caching Layer

Don't let hot reads hit the database at all.

def get_post(post_id):
    # Check cache first
    cached = redis.get(f"post:{post_id}")
    if cached:
        return cached

    # Cache miss - hit database
    post = database.query(post_id)

    # Cache with TTL based on hotness
    ttl = 60 if is_hot(post_id) else 300
    redis.setex(f"post:{post_id}", ttl, post)

    return post

For viral content, a 60-second cache means the database sees 1 query per minute instead of 10,000 queries per second.

Shorter TTL for hot content sounds counterintuitive, but it ensures fresher data for content people actually care about.
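
The snippet above calls is_hot() without defining it. One minimal way to implement it, assuming the same redis client, is a per-minute access counter (the threshold is arbitrary):

HOT_THRESHOLD = 1_000  # accesses per minute before content counts as "hot"

def is_hot(post_id):
    # Per-post access counter that resets every 60 seconds
    key = f"hot:{post_id}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 60)
    return count > HOT_THRESHOLD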

Solution 4: Read Replicas Per Shard

Scale reads horizontally within each shard:

Shard 7 Primary (writes)
    ├── Replica 7a (reads)
    ├── Replica 7b (reads)
    ├── Replica 7c (reads)
    └── Replica 7d (reads)

When shard 7 gets hot, spin up more read replicas for that specific shard. Other shards stay lean.

This works well for read-heavy hotspots. Write-heavy hotspots need different solutions.
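
The routing side is simple. A sketch, assuming you keep a registry of connections per shard (REPLICAS here is hypothetical):

import random

# Hypothetical connection registry: shard_id -> {"primary": conn, "replicas": [conn, ...]}
REPLICAS = {}

def get_connection(shard_id, for_write=False):
    node = REPLICAS[shard_id]
    if for_write or not node["replicas"]:
        return node["primary"]              # writes (and reads with no replicas) hit the primary
    return random.choice(node["replicas"])  # spread reads across whatever replicas exist

Replica reads can lag the primary slightly; for viral read traffic that staleness is usually acceptable.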

Solution 5: Composite Sharding Keys

Don't shard on a single dimension:

# Bad: Single key sharding
shard = hash(user_id) % num_shards

# Better: Composite key
shard = hash(f"{user_id}:{content_type}:{date}") % num_shards

Composite keys add entropy. A celebrity's posts are now spread across shards by date, not concentrated in one place.

The trade-off: queries that span multiple values need to hit multiple shards. Design your access patterns accordingly.

Solution 6: Dynamic Rebalancing

When a partition gets hot, split it:

Before:
Shard 7 handles hash range [0.4375, 0.5000]

After split:
Shard 7a handles [0.4375, 0.4688]
Shard 7b handles [0.4688, 0.5000]

Modern distributed databases like CockroachDB and TiDB do this automatically. If you're running your own sharding, you'll need to build this logic.

Key considerations:

  • Data migration during split
  • Connection draining
  • Query routing updates
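
If you do roll your own, the routing side looks roughly like this: keep an ordered list of hash ranges and split the hot one. A simplified sketch (real systems also handle the data migration, draining, and routing updates listed above):

import bisect

# Ordered (upper_bound, shard_name) pairs covering the hash space [0.0, 1.0)
RANGES = [(i / 16, f"shard_{i}") for i in range(1, 17)]

def route(key):
    h = (hash(key) % 10_000) / 10_000  # normalize the key's hash into [0.0, 1.0)
    bounds = [ub for ub, _ in RANGES]
    idx = min(bisect.bisect_right(bounds, h), len(RANGES) - 1)
    return RANGES[idx][1]

def split_range(shard_name):
    # Replace a hot shard's range with two half-sized sub-ranges
    for i, (ub, name) in enumerate(RANGES):
        if name == shard_name:
            lb = RANGES[i - 1][0] if i > 0 else 0.0
            mid = (lb + ub) / 2
            RANGES[i:i + 1] = [(mid, f"{name}a"), (ub, f"{name}b")]
            return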

Prevention Checklist

Before your next traffic spike:

1. Know your hot keys
Run analytics on access patterns. Which users, which content, which time periods drive disproportionate load?

2. Design for celebrities
If your product could have viral users, plan for them. Don't wait until you have one.

3. Monitor per-shard, not just aggregate
Average latency across 16 shards hides the shard that's dying. Track each one individually.

4. Test with realistic skew
Load tests with uniform distribution prove nothing. Simulate 80% of traffic hitting 5% of keys (see the sketch after this checklist).

5. Have a manual override
When detection fails, you need a way to manually mark keys as hot and reroute them.
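
A quick way to generate the skew described in item 4 (numbers are arbitrary, adjust to your key space):

import random

NUM_KEYS = 100_000
keys = [f"user_{i}" for i in range(NUM_KEYS)]

# 5% of keys are "hot" and receive 80% of requests; the rest share the remaining 20%
hot = keys[: NUM_KEYS // 20]
cold = keys[NUM_KEYS // 20:]

def next_request_key():
    return random.choice(hot) if random.random() < 0.8 else random.choice(cold)

Feed next_request_key() into your load generator instead of uniformly random keys and watch which shard melts first.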

The Reality

Perfect distribution doesn't exist in production.

Users don't behave uniformly. Content doesn't go viral uniformly. Time zones don't align uniformly.

Your sharding strategy needs to handle the 99th percentile, not the average. One hot partition can take down your entire system while 15 other shards sit idle.

Design for imbalance. Monitor for hotspots. Have a plan before the celebrity tweets.

Further Reading

For comprehensive patterns on building resilient distributed databases—including sharding strategies, replication topologies, and connection management for high-traffic platforms:

Enterprise Distributed Systems Architecture Guide

16 shards. Perfect hashing. One celebrity. One fire.

GitHub Copilot SDK vs Azure AI Foundry Agents: Which One Should Your Company Use?

2026-02-10 12:45:04

TL;DR for the Busy Person

  • GitHub Copilot SDK gives you the same agentic core that powers Copilot CLI — context management, tool orchestration, MCP integration, model routing — so you can embed it in any app without building an agent platform from scratch. Best for: developer tools, Copilot Extensions, and any software where the user is a developer.
  • Azure AI Foundry Agents is a full platform for building, deploying, and governing general-purpose enterprise AI agents across any business domain. Best for: Product & end-user facing apps, customer support, document processing, ops automation, and multi-agent workflows outside of dev.
  • They're complementary. Most enterprises will use both — Copilot SDK for the developer layer, Foundry for the business layer.
  • Both are enterprise-ready. GitHub's platform carries SOC 2 Type II, ISO 27001, IP indemnification, and a feature set that covers the entire SDLC.

The Problem: Building Agentic Workflows From Scratch Is Hard

Building agentic workflows from scratch is hard.

You have to manage context across turns, orchestrate tools and commands, route between models, integrate MCP servers, and think through permissions, safety boundaries, and failure modes. Even before you reach your actual product logic, you've already built a small platform.

Most teams don't want to build a platform. They want to build a product.

This is the core problem both GitHub Copilot SDK and Azure AI Foundry Agents solve — but they solve it for different audiences, in different contexts, with different trade-offs.

The Confusion

After talking to customers and internal teams, one theme keeps coming up:

"We want to build AI agents. Do we use the GitHub Copilot SDK or Azure AI Foundry? What's the difference? Can we use both?"

The short answer: they solve different problems. But the overlap in marketing language ("agents," "AI," "SDK") makes it murky. Let's fix that.


What Is the GitHub Copilot SDK?

The GitHub Copilot SDK (now in technical preview) removes the burden of building your own agent infrastructure. It lets you take the same Copilot agentic core that powers GitHub Copilot CLI and embed it in any application.

What the SDK handles for you:

You used to build this yourself | Copilot SDK handles it
Context management across turns | ✅ Maintained automatically
Tool/command orchestration | ✅ Automated invocation and chaining
MCP server integration | ✅ Built-in protocol support
Model routing (GPT-4.1, etc.) | ✅ Dynamic, policy-based routing
Planning and execution loops | ✅ Multi-step reasoning out of the box
Permissions and safety boundaries | ✅ Enforced by the runtime
Streaming responses | ✅ First-class support
Auth and session lifecycle | ✅ Managed under the hood

What you focus on: Your domain logic. Your tools. Your product.

Available in: TypeScript/Node.js, Python, Go, .NET

Designed for: Developer-facing applications — Copilot Extensions, dev portals, CLI tools, code review bots, internal productivity tools.

Why This Matters

Without the SDK, you're stitching together an LLM API, a context window manager, a tool registry, an execution loop, error handling, and auth — before writing a single line of product code. The SDK collapses all of that into an import.

Your App
    │
    ├── Your product logic + custom tools
    ├── Copilot SDK (agentic core)
    │       │
    │       └──── handles context, orchestration,
    │             MCP, model routing, safety
    │       │
    │       └──── HTTPS ──▶ GitHub Cloud (inference)
    │
    └── Your UI / API / Extension

What Is Azure AI Foundry Agent Service?

Azure AI Foundry is a full platform for building, deploying, and governing enterprise AI agents for any business domain — not just developer workflows.

What Foundry gives you:

  • Multi-agent orchestration (multiple specialized agents coordinating on a workflow)
  • Deep data connectors (SharePoint, SQL, M365, external APIs, Logic Apps)
  • Bring your own model (Azure OpenAI, open-source, frontier, or custom)
  • Goal-driven autonomy with thread-based memory
  • Full governance stack (Entra ID, Purview, Defender)
  • One-click deploy to Teams, M365 Copilot, web apps

Designed for: Business-wide automation — customer support agents, document processing pipelines, HR/finance/IT workflows, knowledge management with RAG.

When to Use Which

Use the Copilot SDK when:

  • Your user is a developer. You're building tools that live in the developer workflow — IDE extensions, CLI tools, Copilot Chat extensions, internal dev portals.
  • You want to ship fast. The SDK gives you a production-tested agentic core on day one. No need to build context management, tool orchestration, or model routing.
  • You're extending GitHub Copilot. Building a @my-tool extension for Copilot Chat is a first-class use case.
  • You want to stay in the GitHub ecosystem. Your code, your PRs, your CI/CD, and now your AI agent — all on the same platform.
  • You need something lightweight. Import the SDK, register your tools, start a session. That's it.


Use Azure AI Foundry when:

  • Your user is not a developer. You're building for support teams, ops, finance, HR, or customers.
  • You need multi-agent orchestration. Multiple specialized agents collaborating on a complex workflow (e.g., triage → investigate → resolve → notify).
  • You need deep data integration. Connecting to SharePoint, SQL, CRM, ERP, or other enterprise data sources via built-in connectors.
  • You want to bring your own model. Fine-tuned models, open-source models, or models from different providers.
  • You need VNet isolation or customer-managed encryption. For workloads where network-level isolation or key management is a hard requirement.

Use both when:

  • You're an enterprise with developer teams AND business teams that both need AI agents.
  • Your dev team uses Copilot SDK for internal tooling, while your platform team uses Foundry for customer-facing agents.
┌──────────────────────────────────────────────┐
│             ENTERPRISE AI STACK              │
│                                              │
│  ┌───────────────────┐ ┌──────────────────┐  │
│  │  Developer Layer  │ │  Business Layer  │  │
│  │                   │ │                  │  │
│  │  Copilot SDK      │ │  Azure AI        │  │
│  │  • @extensions    │ │  Foundry Agents  │  │
│  │  • Dev portals    │ │  • Product apps  │  │
│  │  • CLI agents     │ │  • Biz pipelines │  │
│  │  • Ops workflows  │ │  • Ops workflows │  │
│  └───────────────────┘ └──────────────────┘  │
│                                              │
│  Shared: Microsoft identity, models, trust   │
└──────────────────────────────────────────────┘


Trade-Offs Worth Knowing

Dimension | Copilot SDK | Azure AI Foundry
Time to first agent | Fast: import SDK, register tools, go | Slower: more config, but more control
Agentic core included? | ✅ Yes (same engine as Copilot CLI) | You build or configure your own agent logic
Model choice | GitHub-hosted models (GPT-5.3, etc.) + BYO model | BYO model, Azure OpenAI, open-source, frontier
MCP support | ✅ Built-in | Supported via configuration
Multi-agent orchestration | Early / evolving | First-class, production-ready
Data connectors | You build your own (via custom tools) | Built-in (SharePoint, SQL, M365, Logic Apps)
Deployment surface | Your app, VS Code, GitHub.com | Teams, M365 Copilot, web apps, containers
Network isolation (VNet) | Not available | ✅ Full VNet / private endpoint support
Customer-managed keys | Microsoft-managed | ✅ Azure Key Vault
Infrastructure ownership | GitHub-managed (less to operate) | Your Azure subscription (more control, more ops)
Billing model | Per Copilot seat | Azure consumption-based
Best for | Developer tools and workflows | Business-wide automation at scale

A Note on Enterprise Readiness & Compliance

There's a misconception that GitHub is only "good enough" for compliance compared to Azure. That's not accurate.

GitHub is an enterprise-grade platform trusted by the world's largest companies and governments. The compliance posture is strong and getting stronger:

Certification / Control | GitHub (incl. Copilot)
SOC 2 Type II | ✅ Available (Copilot Business & Enterprise in scope)
SOC 1 Type II | ✅ Available
ISO/IEC 27001:2022 | ✅ Copilot in scope
FedRAMP Moderate | 🔄 Actively pursuing
IP Indemnification | ✅ Included for enterprise plans
No training on your code | ✅ Copilot Business/Enterprise data is never used for model training
Duplicate detection filtering | ✅ Available to reduce IP risk

GitHub's story is the end-to-end developer platform — code, security, CI/CD, and now AI — all enterprise-ready under one roof. The Copilot SDK extends that story into your own applications.

On data residency specifically: yes, Azure AI Foundry offers more granular region-bound controls (VNet isolation, customer-managed keys, explicit region pinning). This matters for certain regulated workloads. But for many enterprises — especially in the US and Canada — data-in-transit concerns with GitHub Copilot are well-addressed by existing encryption, privacy controls, and contractual terms. Data residency is worth understanding, but it shouldn't be the sole deciding factor. Evaluate it alongside your actual regulatory requirements, not as a blanket blocker.

Decision Framework

What are you building?
│
├── A developer tool, extension, or a dev/ops workflow?
│   └── ✅ GitHub Copilot SDK
│       • You get a production-tested agentic core out of the box
│       • Ship fast, stay in the GitHub ecosystem
│       • Enterprise-ready compliance
│
├── A product, business/ops/customer-facing agent?
│   └── ✅ Azure AI Foundry
│       • Multi-agent orchestration, data connectors, BYO model
│       • Full Azure governance and network isolation
│
├── Both?
│   └── ✅ Use both — they're complementary
│
└── Not sure yet?
    └── Start with the user:
        • Developer or internal? → Copilot SDK
        • End user or non-developer? → Foundry

Key Takeaways

  1. The Copilot SDK removes the hardest part of building agents. Context, orchestration, MCP, model routing, safety — it's all handled. You focus on your product.

  2. Foundry is for the business. When your agents power a user-facing product, orchestrate across departments, or serve non-developer users, Foundry is the right tool.

  3. GitHub is enterprise-ready. SOC 2, ISO 27001, IP indemnification, no training on your data.

  4. They're complementary, not competing. Copilot SDK for the developer/ops layer. Foundry for the product/business layer. Same Microsoft ecosystem underneath.

  5. Start with the user, not the technology. Who is the agent serving? That answer picks your tool.

Resources

I’m Ve Sharma, a Solution Engineer at Microsoft focused on Cloud & AI and working on GitHub Copilot. I help developers become AI-native and optimize the SDLC for their teams. I also make great memes. Find me on LinkedIn or GitHub.

I built a Dark Mode Holiday Calendar with Next.js because TimeAndDate was too cluttered

2026-02-10 12:40:27

Building a Clean, Ad-Free Holidays Calendar (Because the Internet Needed One)

The Problem That Started It All

Picture this: It's late 2025, and I'm trying to plan my time off for 2026. Simple task, right? Just look up the bank holidays and mark my calendar.

Except it wasn't simple at all.

Every website I visited was a minefield of:

  • Aggressive pop-up ads
  • Auto-playing videos
  • Newsletter signup modals
  • Cluttered layouts that made finding actual dates feel like a treasure hunt

I literally just needed a list of dates. That's it. No fluff, no fuss, no "Sign up for our premium calendar experience!"

Sometimes You Just Have to Build It Yourself

After closing my tenth ad popup, I had that moment every developer knows well: "I could build this better in an afternoon."

So I did. Meet HolidaysCalendar.net — a straightforward, clean holidays calendar that respects your time and your eyeballs.

The Tech Stack (Keeping It Simple)

I didn't need anything fancy here. The goal was speed and simplicity:

  • Next.js: Perfect for static site generation. The pages load instantly because they're pre-rendered.
  • Tailwind CSS: Made implementing the dark mode incredibly smooth (more on that in a second).
  • Custom JSON datasets: Clean, structured data for US and UK holidays. No external API calls, no loading spinners.

What Makes It Different

🚫 Zero Ads

This one's non-negotiable. The whole point was escaping ad hell, so there are none. Not now, not ever.

🌙 Native Dark Mode

I'm a dark mode enthusiast, so this was a must-have. Using Tailwind's dark mode utilities, I built a theme that's easy on the eyes whether you're planning vacation days at midnight or during your lunch break.

The implementation is native to the system preferences — if your device is in dark mode, the site follows automatically. No toggle needed (though I might add one later for the rebels who want light mode at 2 AM).

⚡ 100/100 Lighthouse Score

I'm pretty proud of this one. The site scores perfect 100s across the board:

  • Performance
  • Accessibility
  • Best Practices
  • SEO

It loads fast, it's accessible, and it does exactly what it says on the tin.

Why I'm Sharing This

Honestly? Because I think we need more of this on the internet.

Not every website needs to be monetized to death. Sometimes a tool is just a tool — something useful you build because it solves a real problem you (and probably others) have.

If you've ever rage-closed a tab because you couldn't find simple information through the ad chaos, this site is for you.

What's Next?

Right now, it covers US and UK bank holidays. I'm thinking about adding:

  • More countries
  • iCal export functionality
  • A simple way to download the dates for your planning tools

But I'm keeping it lean. The whole point is simplicity.

Try It Out

Head over to HolidaysCalendar.net and let me know what you think — especially about the dark mode! I'd love feedback from fellow developers and regular users alike.

And if you're planning your 2026 PTO right now, you're welcome. 😊

Have thoughts on the dark mode implementation or ideas for features? I'm all ears. This is built for real people with real planning needs — your feedback makes it better.

📂 Build a File Size Organizer GUI in Python with Tkinter

2026-02-10 12:31:00

In this tutorial, we’ll build a Python application that:

  • Lets users monitor folders.
  • Categorizes files by size.
  • Organizes files into folders automatically.
  • Provides a live file preview with filtering and sorting.

We’ll be using Tkinter, watchdog, threading, and tkinterdnd2 for drag-and-drop support.

Step 1: Import Required Libraries

First, we need to import the libraries for GUI, file handling, and background monitoring.

import os
import json
import shutil
import tkinter as tk
from tkinter import ttk, filedialog, messagebox
from tkinterdnd2 import DND_FILES, TkinterDnD
from threading import Thread
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import sv_ttk

Explanation:

os, shutil → for interacting with the file system.

json → to save/load folder configurations.

tkinter, ttk → GUI elements.

tkinterdnd2 → drag-and-drop functionality.

watchdog → monitors folders in real-time.

threading → allows long-running tasks without freezing the GUI.

sv_ttk → applies the Sun Valley ttk theme for a modern look (optional).

Step 2: Configuration & Helper Functions

We’ll store monitored folders in a JSON file and create some helper functions.

CONFIG_FILE = "folders_data.json"

def load_folders():
    if os.path.exists(CONFIG_FILE):
        with open(CONFIG_FILE, "r", encoding="utf-8") as f:
            return json.load(f)
    return []

def save_folders(folders_data):
    with open(CONFIG_FILE, "w", encoding="utf-8") as f:
        json.dump(folders_data, f, ensure_ascii=False, indent=4)

Explanation:

load_folders() → reads previously monitored folders from JSON.

save_folders() → writes current folders to JSON for persistence.

We also need some utility functions for file handling:

def sanitize_folder_name(name):
    return "".join(c for c in name if c not in r'<>:"/\|?*')

def format_size(size_bytes):
    if size_bytes < 1024:
        return f"{size_bytes} B"
    elif size_bytes < 1024**2:
        return f"{size_bytes/1024:.2f} KB"
    elif size_bytes < 1024**3:
        return f"{size_bytes/1024**2:.2f} MB"
    else:
        return f"{size_bytes/1024**3:.2f} GB"

def categorize_file(size):
    if size < 1024*1024:
        return "Small_1MB"
    elif size < 10*1024*1024:
        return "Medium_1-10MB"
    else:
        return "Large_10MB_plus"

Explanation:

sanitize_folder_name() → removes illegal characters for folder names.

format_size() → converts bytes into human-readable format.

categorize_file() → groups files by size into small, medium, or large.

Step 3: Create the Main Application Window

We use TkinterDnD for drag-and-drop support.

root = TkinterDnD.Tk()
root.title("📂 File Size Organizer Pro - Filter & Sort")
root.geometry("1300x620")

Explanation:

TkinterDnD.Tk() → initializes the main window with drag-and-drop support.

title and geometry → set the window title and size.

Step 4: Global Variables

We’ll use some global variables for storing folder data, filtered files, and observer state.

folders_data = load_folders()
last_operation = []
combined_files = []
filtered_files = []
observer = None
current_filter = tk.StringVar(value="All")
current_sort = tk.StringVar(value="Name")
watcher_paused = False

Explanation:

folders_data → all folders being monitored.

combined_files → all files scanned from these folders.

filtered_files → filtered/sorted files for display.

observer → watchdog observer for real-time monitoring.

current_filter & current_sort → track GUI dropdown selections.

Step 5: Add & Remove Folders

We’ll create functions to add or remove folders.

def add_folder(path):
    path = path.strip()
    if not path:
        messagebox.showwarning("Invalid Folder", "Please select or enter a folder path.")
        return
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        messagebox.showwarning("Invalid Folder", "The selected path is not a valid folder.")
        return
    if path in folders_data:
        messagebox.showinfo("Already Added", "This folder is already being monitored.")
        return
    folders_data.append(path)
    save_folders(folders_data)
    start_watcher(path)
    refresh_combined_preview()

Explanation:

Validates the folder path.

Prevents duplicates.

Saves to JSON and starts a watcher for real-time updates.

def remove_folder(path):
    path = os.path.abspath(path.strip())
    if path in folders_data:
        folders_data.remove(path)
        save_folders(folders_data)
        refresh_combined_preview()
        set_status(f"Removed folder: {path}")
    else:
        messagebox.showwarning("Not Found", "Folder not found in the list.")

Explanation:

Removes a folder from monitoring and refreshes the preview.

Step 6: Scan Folders & Refresh Preview

def scan_folder(folder_path):
    files = []
    for f in os.listdir(folder_path):
        path = os.path.join(folder_path, f)
        if os.path.isfile(path):
            try:
                size = os.path.getsize(path)
            except (FileNotFoundError, PermissionError):
                continue
            category = sanitize_folder_name(categorize_file(size))
            files.append((folder_path, f, size, category))
    return files

def refresh_combined_preview():
    global combined_files
    combined_files = []
    for folder in folders_data:
        combined_files.extend(scan_folder(folder))
    apply_filter_sort()

Explanation:

scan_folder() → reads files in a folder and categorizes them.

refresh_combined_preview() → combines files from all folders and applies filter/sort.

Step 7: Filter & Sort Files

def apply_filter_sort():
    global filtered_files
    filtered_files = combined_files.copy() if current_filter.get() == "All" else [f for f in combined_files if f[3] == current_filter.get()]
    sort_key = current_sort.get()
    if sort_key == "Name":
        filtered_files.sort(key=lambda x: x[1].lower())
    elif sort_key == "Size":
        filtered_files.sort(key=lambda x: x[2])
    elif sort_key == "Folder":
        filtered_files.sort(key=lambda x: x[0].lower())
    elif sort_key == "Category":
        filtered_files.sort(key=lambda x: x[3].lower())
    update_file_tree()

Explanation:

Filters files based on category.

Sorts files based on Name, Size, Folder, or Category.

Step 8: Display Files in GUI

def update_file_tree():
    file_tree.delete(*file_tree.get_children())
    counts = {"Small_1MB":0, "Medium_1-10MB":0, "Large_10MB_plus":0}
    for folder, f, size, category in filtered_files:
        file_tree.insert("", "end", values=(folder, f, format_size(size), category))
        counts[category] += 1
    total_files_var.set(f"Total Files: {len(filtered_files)} | Small: {counts['Small_1MB']} | Medium: {counts['Medium_1-10MB']} | Large: {counts['Large_10MB_plus']}")

Explanation:

Updates the Treeview widget with current files.

Shows counts by category.

Step 9: Organize & Undo Files

def organize_files_thread():
    global watcher_paused
    watcher_paused = True
    last_operation.clear()
    for folder, f, size, category in combined_files:
        src = os.path.join(folder, f)
        dst_folder = os.path.join(folder, category)
        os.makedirs(dst_folder, exist_ok=True)
        dst = os.path.join(dst_folder, f)
        try:
            shutil.move(src, dst)
            last_operation.append((src, dst))  # remember the move so it can be undone
        except Exception:
            pass
    watcher_paused = False
    refresh_combined_preview()

Explanation:

Moves files into categorized folders and records each move in last_operation so it can be undone.

Pauses the watcher to avoid triggering real-time updates during the move.

def undo_last_operation():
    for src, dst in reversed(last_operation):  # undo the recorded moves, newest first
        if os.path.exists(dst):
            shutil.move(dst, src)
    last_operation.clear()
    refresh_combined_preview()

Step 10: Real-time Folder Watching

class FolderEventHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        if watcher_paused:
            return
        root.after(200, refresh_combined_preview)

def start_watcher(folder_path):
    global observer
    if observer is None:
        observer = Observer()
        observer.start()
    handler = FolderEventHandler()
    observer.schedule(handler, folder_path, recursive=True)

Explanation:

Uses watchdog to detect any file changes.

Updates GUI automatically.
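
One detail worth adding: the observer thread keeps running in the background, so stop it cleanly when the window closes. A small sketch:

def on_close():
    # Stop the watchdog observer thread before destroying the window
    if observer is not None:
        observer.stop()
        observer.join()
    root.destroy()

root.protocol("WM_DELETE_WINDOW", on_close)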

Step 11: Drag & Drop Support

def drop(event):
    paths = root.tk.splitlist(event.data)
    for path in paths:
        if os.path.isdir(path):
            add_folder(path)

root.drop_target_register(DND_FILES)
root.dnd_bind('<<Drop>>', drop)

Explanation:

Enables dragging folders into the app.

Automatically adds dropped folders.

Step 12: GUI Layout

Set up the GUI frames, buttons, Treeview, filters, progress bar, and status bar. The earlier steps reference several widgets and helpers (file_tree, total_files_var, set_status) that are created here; a condensed sketch follows.
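
This is just one way to wire things up (the progress bar is omitted for brevity, and sizes/placement are up to you), but it defines everything the earlier steps use, applies the sv_ttk theme, and starts the event loop:

# --- Controls: add/remove folders, organize, undo ---
top = ttk.Frame(root, padding=8)
top.pack(fill="x")

folder_entry = ttk.Entry(top, width=60)
folder_entry.pack(side="left", padx=4)

def browse_folder():
    path = filedialog.askdirectory()
    if path:
        folder_entry.delete(0, "end")
        folder_entry.insert(0, path)

ttk.Button(top, text="Browse", command=browse_folder).pack(side="left")
ttk.Button(top, text="Add", command=lambda: add_folder(folder_entry.get())).pack(side="left", padx=4)
ttk.Button(top, text="Remove", command=lambda: remove_folder(folder_entry.get())).pack(side="left")
ttk.Button(top, text="Organize", command=lambda: Thread(target=organize_files_thread, daemon=True).start()).pack(side="left", padx=4)
ttk.Button(top, text="Undo", command=undo_last_operation).pack(side="left")

# Filter and sort dropdowns bound to the StringVars from Step 4
ttk.Combobox(top, textvariable=current_filter, values=["All", "Small_1MB", "Medium_1-10MB", "Large_10MB_plus"], state="readonly", width=16).pack(side="right", padx=4)
ttk.Combobox(top, textvariable=current_sort, values=["Name", "Size", "Folder", "Category"], state="readonly", width=10).pack(side="right")
current_filter.trace_add("write", lambda *args: apply_filter_sort())
current_sort.trace_add("write", lambda *args: apply_filter_sort())

# --- File preview table ---
file_tree = ttk.Treeview(root, columns=("Folder", "File", "Size", "Category"), show="headings")
for col in ("Folder", "File", "Size", "Category"):
    file_tree.heading(col, text=col)
file_tree.pack(fill="both", expand=True, padx=8, pady=4)

# --- Status bar ---
total_files_var = tk.StringVar(value="Total Files: 0")
status_var = tk.StringVar(value="Ready")
ttk.Label(root, textvariable=total_files_var).pack(anchor="w", padx=8)
ttk.Label(root, textvariable=status_var).pack(anchor="w", padx=8)

def set_status(msg):
    status_var.set(msg)

# Apply the theme, start watchers for previously saved folders, and run the app
sv_ttk.set_theme("dark")
for folder in folders_data:
    start_watcher(folder)
refresh_combined_preview()

root.mainloop()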

File Size Organizer Pro - Filter & Sort

How I synced Cursor, Claude, and Windsurf with one shared brain (MCP)

2026-02-10 12:22:43

The "AHA" moment for me wasn't when I first used an AI coder. It was when I realized I was fragmented.

I’d do a deep architectural brainstorm in Claude, switch to Cursor to implement, and then jump into Windsurf to use its agentic flow. But Claude didn't know what Cursor did, and Cursor had no idea about the architectural epiphany I just had in Claude.

I was manually copy-pasting my own "brain" across tabs.

Then I built Nucleus.
Nucleus Architecture

The Architecture of Sovereignty

Nucleus is an MCP (Model Context Protocol) Recursive Aggregator.

Instead of treating MCP servers as individual plugins, Nucleus treats them as a Unified Control Plane. It creates a local-first memory layer (built from units we call Engrams) that stays on your hardware.

When I teach Claude something, it writes to the Nucleus ledger. When I open Cursor, Cursor reads that same ledger. The context is no longer session-bound; it's persistent and sovereign.

Why This Matters (The "Governance" Bit)

We’ve all seen the security warnings about giving agents full filesystem access. Nucleus solves this with a Hypervisor layer:

  1. Default Deny: No tool gets access to your drive unless explicitly granted.
  2. DSoR (Decision System of Record): Every single tool call and agent decision is SHA-256 hashed and logged. You can audit precisely why an agent decided to delete a file.
  3. Local First: Your strategic data never leaves your machine.

The Stack

  • Python/MCP for the recursive server logic.
  • Local-first storage for the data layer.
  • Control Plane UI (Planned / Coming Soon).

Join the Sovereign Movement

We just open-sourced the v1.0.1 (Sovereign) release on GitHub.

If you're tired of being a "context-courier" between agents, come check it out.

👉 Nucleus on GitHub
👉 PyPI: nucleus-mcp
👉 See it in action (58s demo): Watch the demo

Let’s stop building silos and start building a shared brain. 🚀🌕