2026-04-22 06:55:59
Honestly, I never saw this coming. After 57 articles about my personal knowledge management system Papers, I've spent more time promoting the system than actually using it. And here's the crazy part: the meta-promotion strategy might actually be working better than the original system ever did.
Let me walk you through the brutal reality, the technical breakthroughs, and the existential crisis that comes with building something you believe in only to realize it's become something entirely different.
Before we dive into the technical details, let's look at the numbers that don't lie:
These numbers paint a picture that's both hilarious and horrifying. I've essentially created a monument to persistence over practicality, and somehow, people seem to be reading about it.
It all started with this grand vision: an AI-powered knowledge management system that would understand context, predict what I need, and organize my thoughts better than I ever could.
// My ambitious AI-driven approach (that eventually failed)
@RestController
public class AIKnowledgeController {

    @Autowired
    private SemanticSearchService semanticSearch;

    @Autowired
    private RecommendationEngine recommendationEngine;

    @Autowired
    private ContextAnalyzer contextAnalyzer;

    @GetMapping("/search")
    public ResponseEntity<SearchResult> search(@RequestParam String query) {
        // AI-powered semantic search
        SearchResult semanticResult = semanticSearch.deepAnalyze(query);
        // Context-aware recommendations
        List<KnowledgeItem> recommendations = recommendationEngine.suggest(semanticResult);
        // Full context understanding
        Context context = contextAnalyzer.getCurrentContext();
        return ResponseEntity.ok(new SearchResult(semanticResult, recommendations, context));
    }
}
This was beautiful in theory. In practice? It took 47 seconds to return results, the AI recommendations had a 0.2% click-through rate, and most importantly, nobody used it.
After realizing AI was overkill, I pivoted to "proper database design." Complex schemas, indexed fields, relational tables, the whole nine yards.
// The "enterprise-grade" approach that still failed
@Entity
public class KnowledgeItem {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(nullable = false, length = 2000)
    private String title;

    @Column(length = 10000)
    private String content;

    @ElementCollection
    private Set<String> tags;

    @ManyToMany
    @JoinTable(name = "knowledge_item_categories",
            joinColumns = @JoinColumn(name = "knowledge_item_id"),
            inverseJoinColumns = @JoinColumn(name = "category_id"))
    private Set<Category> categories;

    @OneToMany(mappedBy = "knowledgeItem")
    private List<KnowledgeMetadata> metadata;

    @Column(name = "created_at")
    private LocalDateTime createdAt;

    @Column(name = "updated_at")
    private LocalDateTime updatedAt;

    // Complex getters, setters, and business logic...
}
This "proper" approach was even worse. The queries got slower, the complexity increased, and I spent more time maintaining the database structure than actually using the knowledge base.
Finally, after months of overengineering, I had an epiphany: what if simple just worked?
// The "works good enough" approach that actually gets used
@Service
public class SimpleKnowledgeService {

    private final List<KnowledgeItem> knowledgeItems = new ArrayList<>();

    public List<KnowledgeItem> search(String query) {
        return knowledgeItems.stream()
                .filter(item -> item.getTitle().toLowerCase().contains(query.toLowerCase()) ||
                        item.getContent().toLowerCase().contains(query.toLowerCase()))
                .sorted((a, b) -> {
                    // Simple relevance scoring: title matches outrank content matches
                    int aScore = calculateScore(a, query);
                    int bScore = calculateScore(b, query);
                    return Integer.compare(bScore, aScore);
                })
                .limit(20)
                .collect(Collectors.toList());
    }

    private int calculateScore(KnowledgeItem item, String query) {
        String lowerContent = item.getContent().toLowerCase();
        String lowerTitle = item.getTitle().toLowerCase();
        String lowerQuery = query.toLowerCase();
        int score = 0;
        if (lowerTitle.contains(lowerQuery)) score += 10;
        if (lowerContent.contains(lowerQuery)) score += 5;
        return score;
    }
}
And you know what? This 50-line implementation works better than the 2,000-line monster I built before. The search is fast, it's reliable, and most importantly, I actually use it.
The biggest technical win wasn't the AI or the complex database design - it was realizing that simple string search beats complex algorithms every time.
Here's what I learned about optimizing a knowledge management system:
// The actual controller that works
@RestController
@RequestMapping("/api/knowledge")
public class KnowledgeController {

    private final SimpleKnowledgeService knowledgeService;

    public KnowledgeController(SimpleKnowledgeService knowledgeService) {
        this.knowledgeService = knowledgeService;
    }

    @GetMapping("/search")
    public ResponseEntity<List<KnowledgeItem>> search(
            @RequestParam String query,
            @RequestParam(defaultValue = "20") int limit,
            @RequestParam(defaultValue = "0") int offset) {
        // Paged overload of the simple search shown above
        List<KnowledgeItem> results = knowledgeService.search(query, limit, offset);
        return ResponseEntity.ok(results);
    }

    @GetMapping("/recent")
    public ResponseEntity<List<KnowledgeItem>> recent(
            @RequestParam(defaultValue = "10") int limit) {
        List<KnowledgeItem> recent = knowledgeService.getRecent(limit);
        return ResponseEntity.ok(recent);
    }
}
Key optimization techniques:
Here's where it gets weird. After spending 1,847 hours building a knowledge management system that nobody uses, I've somehow become a "knowledge management expert" by documenting my failure.
My meta-promotion strategy has generated:
The irony is so thick you could cut it with a knife. I set out to build the world's best personal knowledge management system and accidentally became a failure expert instead.
You know the funny part? For all the complexity I built into Papers, the tools I actually use daily are:
I essentially built a complex system to replace simple text files, and now I use simple text files anyway because the complex system feels like too much overhead.
Here's the business model I never expected to discover:
It's the ultimate tech startup pivot: from building products to building content about not building products.
I spent months building AI-powered semantic search, and the solution was literally string.contains(). Users don't need AI magic, they need fast, reliable results.
I could have built the perfect system, but if it doesn't solve real problems for real users, it's just tech for tech's sake.
This is the uncomfortable truth. By documenting my failure extensively, I've somehow become an expert in knowledge management. The failure became the product.
I have a ~97% waste rate in my knowledge system (2,847 saved vs 84 retrieved). But you know what? That ~3% that actually gets used still saves me time compared to the alternative.
Every failed experiment taught me something valuable. The 57 articles I wrote are a roadmap of what not to do in knowledge management.
Here's the philosophical shift I made: instead of viewing my low usage rate as a failure, I started viewing it as valuable data about what doesn't work.
The 84 retrievals tell me what's actually valuable. The 2,763 unretrieved articles tell me what's not worth keeping. This data is worth more than any perfect system I could build.
If I could start over, here's what I would change:
Now that I've accepted that meta-promotion is my actual product, I'm leaning into it. The next phase is:
Okay, here's where I turn it over to you. After reading about my 1,847-hour journey from AI utopia to simple enlightenment:
What's the most over-engineered solution you've built for a simple problem? And what did you learn from the experience?
Drop your stories in the comments. Let's create a collection of over-engineering failures that we can all learn from. Because honestly, the best knowledge management system might just be a shared list of what not to do.
P.S. If you found this valuable, you might enjoy my other articles about knowledge management failure. I'm currently working on my 59th article, tentatively titled "The 59th Attempt: When Your 'Failure Expert' Identity Becomes Your Brand." Stay tuned!
2026-04-22 06:55:08
When I built pengdows.crud, I wanted every line to be testable. That meant building a fake provider — a full in-process ADO.NET implementation that lets you run tests without a real database. pengdows.crud ships with 94%+ line coverage as a result.
That same instinct led me to look at Dapper's test coverage: 0.61%. So I wrote 775 unit tests and submitted PR #2199, bringing their line coverage to 86.1%. I know that codebase.
Dapper currently has 464 open issues. Issue triage has stalled — the needs-triage label has become a holding state rather than part of an active triage pipeline. Releases still occur, but they don't materially reduce the issue backlog. The maintainers have moved on to DapperAOT, a build-time code generation successor. Those 464 issues are not a backlog being worked down. They are the permanent state of that codebase. I'm not writing this to pile on a library that has done genuinely useful work. I'm writing this because I wanted to know, precisely, whether pengdows.crud makes any of the same mistakes — the same classes of bugs, the same structural patterns. We're both doing a lot of the same things at the ADO.NET level. Similar mistakes are possible. I went into this asking "am I doing this wrong too?" — and had the codebase audited against every theme in Dapper's backlog to find out.
Before I get to results, the most important context: pengdows.crud and Dapper aren't competing solutions that made different tradeoffs. They're answers to different questions, independently designed around different constraints.
The core architecture — connection lifecycle ownership as the foundation, SqlContainer as the execution unit, TableGateway as the SQL-generation layer — was designed from scratch around the constraint that connection lifetime, parameter naming, and SQL construction must never be the caller's responsibility.
Dapper asked: How do I make raw ADO.NET less painful at the call site?
pengdows.crud asked: How do I make connection lifecycle and SQL construction safe and explicit at scale?
Dapper's bugs flow directly from its question. When you optimize for call-site convenience, you push lifetime management, parameter naming, and composition discipline onto the caller. The 464-issue backlog is the accumulated cost of that trade — not a failure of execution, but a consequence of the original design goal.
pengdows.crud doesn't share those bugs because it never made that trade. The safety properties aren't retrofitted. They're load-bearing, baked in from the start, and that matters — you can't patch your way to a different architecture.
The caller never owns connection lifecycle under any execution path. That's the core invariant. Everything that follows is a consequence of it.
With that established, here's the actual breakdown across every issue cluster in Dapper's backlog.
Issue classification was performed using a reproducible script against the GitHub Issues API. The script separates bug-like issues from feature requests and questions, then assigns each to a primary category based on heuristic matching. The full classification output is available as CSV for audit and sampling. The script and generated CSVs are in the repository for verification.
Of Dapper's 464 open issues, 270 classify as bug-like under this analysis (as of April 21, 2026). Here's how they map to pengdows.crud's architecture:
| Bug Cluster | Open Bugs | % of Bugs | Outcome in pengdows.crud |
|---|---|---|---|
| Parameters / type handling | 123 | 45.6% | Eliminated — no global handler registry; explicit per-instance construction |
| Mapping / materialization | 68 | 25.2% | Fail-fast controlled — explicit column mapping; throws on bad input |
| Async / cancellation / lifetime | 35 | 13.0% | Eliminated — caller never owns connection under any path |
| Provider compatibility / dialect | 17 | 6.3% | Mitigated — dialect layer centralizes; CI tests 11 databases |
| Performance / caching / concurrency | 7 | 2.6% | Eliminated — bounded caches; finite key spaces; no global state |
| Diagnostics / docs / usability | 1 | 0.4% | Not applicable |
| Uncategorized | 19 | 7.0% | — |
A portion of issues fall outside these categories and are left uncategorized; they were not material to the overall distribution.
83% of Dapper's open bugs fall into categories that are structurally eliminated or fail-fast controlled in pengdows.crud. The remaining bugs are provider-drift issues that no abstraction layer can fully eliminate — only centralize and detect.
These two categories cannot occur without violating pengdows.crud's invariants. They're not "handled well" — they're unexpressible in the current design.
Dapper borrows your connection. Lifetime is your problem. When an async path throws mid-execution, what gets cleaned up depends on where the exception lands. Dapper cannot fix this without breaking its core contract — the extension method model assumes the caller owns the connection.
In pengdows.crud, callers never work with a DbConnection or DbCommand directly — under any execution path.
In normal execution, callers build SQL through SqlContainer and call an execution method. Internally, the context acquires a TrackedConnection, creates the command, executes, and runs cleanup in a finally block. The caller never sees any of it. SafeAsyncDisposableBase underlies every tracked type; Interlocked.Exchange ensures idempotent disposal — double-dispose is a no-op, not a second cleanup pass.
The only externally visible streaming surface is ITrackedReader — and even that is a controlled façade, not a raw provider object. TrackedReader holds the connection lock for its entire read lifetime, owns command teardown, and auto-disposes when Read() reaches EOF. The caller streams rows; pengdows.crud owns everything beneath.
Transactions preserve the same invariant. BeginTransactionAsync() returns an ITransactionContext. Internally, a tracked connection is acquired, pinned, and held privately for the transaction's lifetime. The ITransactionContext exposes commit, rollback, and savepoint semantics — not the connection. All SQL execution within the transaction still routes through SqlContainer and the same internal acquisition path. On commit or rollback, cleanup runs in finally regardless of outcome. The caller controls transaction outcome; pengdows.crud owns connection lifetime.
This invariant holds without exception: connection ownership is never representable at the API boundary. The class of bugs that requires the caller to be the connection lifecycle authority — leaked connections, orphaned commands, partial async cleanup, connection reuse after rollback — cannot occur due to caller misuse.
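The shape of that invariant can be shown in miniature. This is a hedged sketch in Python for brevity — `Context` and `FakeConnection` are illustrative stand-ins, not pengdows.crud's actual API — but it captures the contract: the caller supplies SQL and receives rows, while acquisition, execution, and cleanup all live inside the execute path.

```python
class FakeConnection:
    """Stand-in for a pooled provider connection."""
    def __init__(self):
        self.open = False
    def connect(self):
        self.open = True
    def close(self):
        self.open = False
    def run(self, sql):
        if not self.open:
            raise RuntimeError("connection not open")
        return [("row", sql)]

class Context:
    """The library owns the connection lifecycle; callers only see results."""
    def __init__(self):
        self.last_connection = None
    def execute(self, sql):
        conn = FakeConnection()      # acquisition is internal
        self.last_connection = conn
        conn.connect()
        try:
            return conn.run(sql)     # execution
        finally:
            conn.close()             # cleanup runs even if execution throws

ctx = Context()
rows = ctx.execute("SELECT 1")
assert rows == [("row", "SELECT 1")]
assert ctx.last_connection.open is False  # nothing leaked for the caller to mismanage
```

Because the connection never appears in a public signature, "forgetting to dispose" is not a mistake the caller can make.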
Dapper's parameter model is caller-controlled. You name parameters, you manage composition, you handle prefix conventions per provider. When you get that wrong — and composition bugs are easy — you get silent incorrect results or runtime errors with unhelpful messages.
pengdows.crud uses deterministic, namespace-isolated naming for all generated parameters: i0, i1 for INSERT values; s0, s1 for UPDATE SET clauses; w0, w1 for WHERE predicates; v0 for version columns. These namespaces don't collide by construction. Prefix stripping normalizes provider-specific prefixes (@, :, ?, $) on input. Clone counters ensure copied containers get independent parameter sets.
The parameter container is a custom OrderedDictionary<string, DbParameter> — per-instance, ordered (critical for positional providers like older Oracle and ODBC drivers), not shared across threads. There is no global parameter state to corrupt.
Composition collisions require the naming system to produce a collision. It cannot.
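Why prefixed counters cannot collide is easy to demonstrate. A minimal sketch (Python for brevity; `ParameterNamer` is a hypothetical stand-in for the internal naming logic, using the i/s/w namespaces from the article):

```python
from itertools import count

class ParameterNamer:
    """Deterministic, namespace-isolated parameter names: i*, s*, w*, v*."""
    def __init__(self):
        self.counters = {}
    def next(self, namespace):
        # One independent counter per namespace
        c = self.counters.setdefault(namespace, count())
        return f"{namespace}{next(c)}"

namer = ParameterNamer()
insert_params = [namer.next("i") for _ in range(3)]  # INSERT values
set_params    = [namer.next("s") for _ in range(2)]  # UPDATE SET clauses
where_params  = [namer.next("w") for _ in range(2)]  # WHERE predicates

all_names = insert_params + set_params + where_params
# Distinct prefixes plus monotonic counters: uniqueness by construction
assert len(set(all_names)) == len(all_names)
assert insert_params == ["i0", "i1", "i2"]
```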
These categories aren't structurally impossible — provider behavior can still produce surprises — but pengdows.crud handles them explicitly with fail-fast semantics and comprehensive test coverage. The blast radius is contained.
Dapper's cancellation story is a retrofit. The synchronous-first design got async layered on top, and the seams show in open issues for missing CancellationToken overloads and OperationCanceledException being swallowed in certain paths.
In pengdows.crud, cancellation tokens flow through both the semaphore acquisition layer and the execution layer. OperationCanceledException is never swallowed. Every public async method has a CancellationToken overload — this is a code review hard requirement, not a backlog item.
Provider behavior at the network level can still produce surprising cancellation timing (Npgsql and SqlClient behave differently under load). That becomes a provider problem, not an abstraction problem.
Dapper has open issues where type coercion fails silently — a null becomes a default value, a boolean coerces to 0 or 1 depending on provider, and nothing throws. Silent defaults are the worst category of data bug because they corrupt data without raising an exception.
TypeCoercionHelper throws on bad input. There are no silent defaults. The philosophy is fail-fast, not fail-silent.
Edge cases remain: DBNull, driver-specific structs, JSON column handling. These aren't eliminated, but they fail loudly so you know immediately where and why.
Dapper's WHERE id IN (@ids) handling is one of its most-reported problem areas: empty collections generating invalid SQL, NULL semantics ambiguity, and query plan instability from variable-length parameter lists.
Empty collections are rejected explicitly. NULL semantics are handled correctly. On PostgreSQL, expansion uses ANY(@param) with a native array — one parameter, correct semantics, stable query plan. PostgreSQL's query planner caches plans by parameter count, so a 5-element list and a 6-element list produce different plan cache entries; ANY(@param) sidesteps this entirely. For other providers, parameter lists use power-of-2 bucketing (round up to 1, 2, 4, 8, 16...) to limit plan cache pollution.
Parameter limits are not an edge case left to the provider. Every dialect declares a hard ceiling as part of its contract — PostgreSQL at 32,767, SQLite at 999, and MySQL/MariaDB, Oracle, and DuckDB at 65,535. During command materialization, pengdows.crud checks _parameters.Count against _context.MaxParameterLimit. If the limit is exceeded, execution is blocked and InvalidOperationException is thrown naming both the limit and the database product — before a connection is opened, before a single byte reaches the server.
Dapper expands the list and lets the provider fail. pengdows.crud fails at construction time with a message that tells you exactly what went wrong and on which database.
That said, "enforced" doesn't mean "efficient." A collection of 50,000 IDs still produces a bad query shape regardless of how cleanly the limit is handled. At that scale the right answer is a temp table, a bulk insert, or a join — not an expanded IN-list. pengdows.crud catches the limit violation; it doesn't rewrite your query strategy for you.
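The bucketing and limit-check ideas can be sketched together. This is an illustration of the technique only — Python for brevity, and the function names and `@p` placeholder style are my assumptions, not pengdows.crud's implementation:

```python
def bucket_size(n: int) -> int:
    """Round a parameter-list length up to the next power of two (min 1)."""
    if n < 1:
        raise ValueError("empty IN-lists are rejected, not expanded")
    size = 1
    while size < n:
        size *= 2
    return size

def expand_in_list(values, max_params=32767):
    """Expand an IN-list with power-of-2 padding to limit plan cache pollution."""
    size = bucket_size(len(values))
    if size > max_params:
        # Fail at construction time, naming the limit, before touching the server
        raise ValueError(f"{size} parameters exceeds the dialect limit of {max_params}")
    # Pad by repeating the last value: same result set, stable parameter count
    padded = list(values) + [values[-1]] * (size - len(values))
    placeholders = ", ".join(f"@p{i}" for i in range(size))
    return f"IN ({placeholders})", padded

sql, params = expand_in_list([10, 20, 30])     # 3 values round up to a bucket of 4
assert len(params) == 4 and params[-1] == 30
assert bucket_size(5) == bucket_size(8) == 8   # 5..8 elements share one plan shape
```

Any list length from 5 to 8 produces the same SQL text, so the provider's plan cache sees one shape instead of four.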
These categories carry real residual risk. pengdows.crud centralizes and contains exposure, but cannot make external behavior deterministic.
No library eliminates provider bugs. pengdows.crud's RemapDbType() handles type remapping per provider. GuidStorageFormat handles the fact that Oracle, MySQL, and SQL Server all store GUIDs differently. AdvancedTypeRegistry handles provider-specific type edge cases. MakeParameterName() and WrapObjectName() own their respective concerns rather than delegating to callers.
The real win is centralization: when a provider changes behavior, one place needs to change. 11-database Testcontainers integration tests in CI make drift detectable. The TiDB dialect has a comment noting a MySql.Data prepare-statement incompatibility with no version numbers or upstream issue link — that's the visible symptom of this category. Version drift happens; the question is whether you find it in CI or in production.
The accurate claim: provider bugs are isolated and test-detectable, not impossible.
Dapper's global static ConcurrentDictionary caches compiled deserializers keyed by arbitrary SQL strings. Two problems: global scope means cross-query contamination, and the key space is unbounded.
pengdows.crud uses a different architecture. SqlContainer parameters use a per-instance custom OrderedDictionary — nothing shared, nothing global. Query and parameter-name caches use BoundedCache inside ConcurrentDictionary<SupportedDatabase, BoundedCache<...>> — LRU eviction with 32–512 entry caps, keyed by a finite enum. Metadata registry uses ConcurrentDictionary<Type, TableInfo> — keyed by entity Type, which is finite in a loaded assembly, not by arbitrary SQL strings.
Dapper's problem was global dictionaries keyed by arbitrary query strings. That pattern doesn't exist here. Unbounded growth isn't just "handled" — the key space design removes the growth vector.
The residual risk is operational: TypeMapRegistry entries live for the lifetime of the DatabaseContext instance. If a schema changes during a rolling deploy, cached TableInfo will not reflect it until the process restarts. There is no runtime invalidation. Each DatabaseContext maintains its own isolated registry — there is no cross-context contamination — but within a context, pengdows.crud assumes schema stability for the process lifetime.
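The bounded-cache pattern itself is straightforward to sketch. pengdows.crud's actual `BoundedCache` is a C# type; this Python stand-in only shows the LRU-with-hard-cap idea that removes the unbounded-growth vector:

```python
from collections import OrderedDict

class BoundedCache:
    """LRU cache with a hard entry cap: growth is bounded by design."""
    def __init__(self, max_entries=32):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = BoundedCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes most recently used
cache.put("c", 3)     # evicts "b", not "a"
assert cache.get("b") is None and cache.get("a") == 1
```

Contrast with an unbounded dictionary keyed by arbitrary SQL strings: there, every novel query text is a permanent allocation.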
Dapper's logging extensibility is one of its most-requested missing features. pengdows.crud has built-in structured observability. The notable design decision: parameter values are deliberately never logged.
That's the right security default — logging parameter values is how credentials end up in log aggregators and PII ends up in SIEM systems. Command text, timing, and execution metadata are captured. Values stay out of the log.
The tradeoff is real: debugging parameter-specific issues requires reproduction in a test harness, not log inspection. You cannot read a log and see what value was passed. That's the cost of the security boundary, and it's deliberate.
Here's what the audit actually established:
| State | Categories |
|---|---|
| Eliminated | Connection/reader lifetime ownership; parameter naming collisions |
| Controlled | Cancellation semantics; null/value coercion; IN-list expansion |
| Mitigated | Provider quirks; version drift; metadata staleness; observability tradeoffs |
The strongest claim — and it holds — is this:
pengdows.crud removes caller-induced failure modes. It does not remove provider-induced failure modes.
Dapper's design pushes connection lifetime, parameter naming, SQL construction discipline, and transaction scoping onto the developer. That was a deliberate choice in service of call-site elegance, and it was coherent. pengdows.crud was independently designed around the opposite constraint: those concerns belong to pengdows.crud, not the caller.
Most of the bugs in Dapper's 464-issue backlog exist because the caller was handed responsibility the library didn't keep. When the caller owns connection lifetime, callers leak connections. When the caller names parameters, callers create collisions. When the library provides thin provider abstraction, provider differences become caller bugs.
pengdows.crud owns those responsibilities. So those caller-induced bugs don't have a place to live.
The database is still external. Providers still have bugs. Schema still changes. Those are real risks and this article doesn't pretend otherwise — the Mitigated category exists for exactly that reason.
But Dapper's backlog is not pengdows.crud's backlog. The failure modes are different because the responsibilities were never handed to the caller in the first place.
pengdows.crud is a SQL-first, strongly-typed data access layer for .NET 8+ supporting 12 databases with full connection lifecycle management, explicit parameter construction, and dialect-native SQL generation. NuGet | GitHub
2026-04-22 06:50:34
AI-assisted research has reached a point where the bottleneck is no longer access to information, but the reliability of what is returned. Tools powered by large language models can synthesize papers, summarize datasets, and even propose hypotheses. The problem is not capability - it's calibration. When an AI system produces a confident answer, how do we know whether it is correct, biased, or subtly misleading?
This article proposes a practical framework for evaluating AI tools used in research workflows. Rather than relying on intuition or anecdotal success, we'll approach this like engineers: defining measurable criteria, analyzing trade-offs, and building systems that can be stress-tested.
At its core, AI-assisted research introduces three failure modes: hallucinated facts, latent bias in synthesis, and unverifiable reasoning paths. Traditional search engines expose sources directly, but modern AI tools often compress multiple sources into a single narrative. That compression step is where trust breaks down.
Recent studies such as retrieval-augmented generation benchmarks and long-context evaluation suites (for example, work emerging on arXiv around multi-document QA tasks) show that even top-tier models degrade significantly when synthesizing across heterogeneous sources. Accuracy is not binary - it decays as task complexity increases.
To evaluate tools effectively, we need a framework that treats research as a pipeline rather than a single query.
I use a three-layer model when evaluating AI tools for research: retrieval integrity, reasoning fidelity, and output verifiability.
The first layer examines whether the system is grounding its responses in real, high-quality sources. Tools that integrate retrieval mechanisms (RAG pipelines) often outperform purely generative systems, but only if retrieval itself is robust.
A useful metric here is source alignment accuracy: how often cited or implied sources actually support the generated claim. In internal tests I've run, systems without retrieval grounding can drop below 60% alignment on complex academic queries, while well-tuned retrieval systems can exceed 85%.
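A minimal sketch of how source alignment accuracy can be computed. The `supports` judgment function is a stand-in — in practice it would be human annotation or an NLI model, and the toy substring check below is only for illustration:

```python
def source_alignment_accuracy(claims, supports):
    """Fraction of generated claims actually supported by their cited source.

    `claims` is a list of (claim_text, source_text) pairs; `supports` is a
    judgment function (human annotation or an entailment model in practice).
    """
    if not claims:
        return 0.0
    supported = sum(1 for claim, source in claims if supports(claim, source))
    return supported / len(claims)

# Toy judgment: the source must contain the claim verbatim
claims = [
    ("accuracy decays with complexity", "we find accuracy decays with complexity"),
    ("the model is 99% accurate", "accuracy varied between 60% and 85%"),
]
score = source_alignment_accuracy(claims, lambda c, s: c in s)
assert score == 0.5  # one of two claims is actually supported
```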
The failure mode is subtle. A model may cite a real paper but misrepresent its findings. This is not hallucination in the traditional sense - it's semantic drift.
Even with perfect sources, reasoning can fail. This layer evaluates how well the model synthesizes multiple inputs into a coherent conclusion.
One approach is to design adversarial multi-hop questions where the answer depends on correctly combining facts across documents. Benchmarks like HotpotQA and newer long-context reasoning datasets highlight how models often shortcut reasoning paths.
A practical test involves perturbation: slightly modifying one source and observing whether the model updates its conclusion appropriately. If it doesn't, you're not seeing reasoning - you're seeing pattern completion.
Here is a simplified pseudocode pattern I use to test reasoning robustness:
def evaluate_reasoning(model, documents, question):
    # Answer from the unmodified corpus
    baseline_answer = model.generate(documents, question)
    # Inject a contradiction into one of the source documents
    perturbed_docs = perturb(documents, strategy="contradiction_injection")
    new_answer = model.generate(perturbed_docs, question)
    # An unchanged answer despite a contradicted source signals pattern completion
    consistency_score = compare_answers(baseline_answer, new_answer)
    return consistency_score
A low consistency score signals brittle reasoning, even if the original answer appeared correct.
The final layer focuses on whether a human can trace the output back to evidence. This is where many AI tools fail in real-world research settings.
Verifiability requires more than citations. It requires structured attribution. For example, instead of producing a paragraph summary, a trustworthy system should map each claim to a source fragment.
Think of this as moving from "answer generation" to "evidence-linked synthesis."
To operationalize this framework, I've been using a four-layer architecture that separates concerns explicitly.
The first layer is ingestion, where documents are chunked, embedded, and indexed. The second layer is retrieval, optimized for both semantic similarity and diversity. The third layer is reasoning, where a constrained generation step operates only on retrieved evidence. The final layer is validation, which cross-checks outputs against sources.
The flow looks like this conceptually:
User Query
↓
Retriever → Top-K Documents
↓
Reasoning Engine (Constrained Generation)
↓
Verification Layer (Fact Checking + Attribution)
↓
Final Answer with Evidence Mapping
The key design decision is constraining the reasoning engine. Unconstrained generation is where most hallucinations originate.
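The four layers above can be sketched as a single function whose reasoning step only ever sees retrieved evidence. All components here are toy stand-ins for a real retriever, generator, and verifier:

```python
def answer(query, corpus, retrieve, generate, verify, k=5):
    """Four-layer pipeline: retrieval -> constrained reasoning -> validation."""
    evidence = retrieve(query, corpus, k)            # retrieval layer
    draft = generate(query, evidence)                # reasoning sees ONLY evidence
    checked = [(claim, src) for claim, src in draft
               if verify(claim, src)]                # validation layer
    return checked                                   # every surviving claim keeps its source

# Toy components: substring retrieval, echo generation, exact-match verification
corpus = ["cats are mammals", "the moon is rock"]
retrieve = lambda q, c, k: [d for d in c if q in d][:k]
generate = lambda q, ev: [(d, d) for d in ev]
verify   = lambda claim, src: claim == src

result = answer("cats", corpus, retrieve, generate, verify)
assert result == [("cats are mammals", "cats are mammals")]
```

The structural point is that `generate` receives `evidence`, never `corpus`: the reasoning engine physically cannot draw on anything the retriever did not surface.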
Accuracy is only half the equation. Bias emerges not just from training data, but from retrieval strategies and ranking algorithms.
For example, if a retrieval system prioritizes highly cited papers, it may reinforce dominant paradigms while excluding emerging or dissenting research. This creates a feedback loop where "consensus" is mistaken for "truth."
One way to measure bias is distributional skew: comparing the diversity of retrieved sources against a known corpus. If your system consistently pulls from a narrow subset, your synthesis will inherit that bias.
In practice, introducing controlled randomness or diversity constraints in retrieval can significantly improve epistemic coverage without sacrificing accuracy.
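One concrete way to quantify distributional skew is normalized Shannon entropy over the retrieved-source distribution. This is a sketch — a real evaluation would also compare against the diversity of a reference corpus:

```python
import math
from collections import Counter

def source_diversity(retrieved_sources):
    """Normalized Shannon entropy of the retrieved-source distribution.

    1.0 means perfectly even coverage across sources; values near 0 mean
    retrieval has collapsed onto a narrow subset.
    """
    counts = Counter(retrieved_sources)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy

assert source_diversity(["a", "b", "c", "d"]) == 1.0  # even coverage
skewed = source_diversity(["a"] * 9 + ["b"])
assert skewed < 0.5  # 90% of hits from one source: low diversity
```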
There is no perfect system - only trade-offs.
Increasing retrieval depth improves recall but introduces noise. Tightening constraints reduces hallucinations but can limit creative synthesis. Adding verification layers improves trust but increases latency.
In one benchmark I conducted comparing three configurations of a research assistant pipeline, the most "accurate" system was also the slowest by a factor of three. For production use, that trade-off may not be acceptable.
This is why evaluation must be context-aware. A system used for exploratory research can tolerate some uncertainty, while one used for academic publication cannot.
The most common mistake is treating AI evaluation as a static benchmark problem. In reality, it's a systems problem. Models evolve, data changes, and use cases shift.
Another frequent misstep is over-indexing on model choice. The architecture around the model often matters more than the model itself. A well-designed pipeline with a smaller model can outperform a larger model used naively.
AI tools are not inherently trustworthy or untrustworthy - they are systems that must be engineered, measured, and continuously evaluated.
If you approach them like black boxes, you inherit their flaws. If you treat them like research systems, you can shape their behavior, quantify their limitations, and build something reliable.
The shift is subtle but important: stop asking "Is this AI good?" and start asking "Under what conditions does this system fail, and how do I prove it?"
2026-04-22 06:47:16
In this project, I built a simple authentication system inspired by Facebook.
The goal was to allow users to register, log in securely, and interact with a basic social interface.
PHP
MySQL
HTML / CSS
XAMPP
User registration with validation
Secure login system
Password hashing (password_hash)
Session management
Account confirmation step
Friend suggestion system (basic)
Responsive UI
This project was developed as a team of five members.
We collaborated to design, build, and improve different parts of the application.
Working in a team helped me improve my communication, collaboration, and problem-solving skills.
The system uses a users table to store user information. Passwords are hashed using password_hash() for security.
CREATE TABLE users (
    id int(11) NOT NULL AUTO_INCREMENT,
    nom varchar(100) NOT NULL,
    prenom varchar(100) NOT NULL,
    contact varchar(100) NOT NULL,
    password varchar(255) NOT NULL,
    jour int(2) NOT NULL,
    mois int(2) NOT NULL,
    annee int(4) NOT NULL,
    genre tinyint(1) NOT NULL,
    created_at timestamp NOT NULL DEFAULT current_timestamp(),
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
<?php
$host = "localhost";
$dbname = "facebook";
$user = "root";
$pass = "";

try {
    $pdo = new PDO("mysql:host=$host;dbname=$dbname;charset=utf8", $user, $pass);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
} catch (PDOException $e) {
    die("Erreur DB: " . $e->getMessage());
}
?>
This page allows users to create an account. I also created helper functions to dynamically generate the date of birth (day, month, year).
`<?php
function getYears($selectedYear = 1970) {
for ($year = 2026; $year >= 1905; $year--) {
$selected = ($year == $selectedYear) ? "selected" : "";
echo "<option value='$year' $selected>$year</option>";
}
}
function getMonths() {
$months = ["JANVIER", "FÉVRIER", "MARS", "AVRIL", "MAI", "JUIN", "JUILLET", "AOÛT", "SEPTEMBRE", "OCTOBRE", "NOVEMBRE", "DÉCEMBRE"];
foreach ($months as $index => $m) {
$val = $index + 1;
echo "<option value='$val'>$m</option>";
}
}
function getDays() {
for ($day = 1; $day <= 31; $day++) {
echo "<option value='$day'>$day</option>";
}
}
?>
`
This is the login page. The PHP script checks if the contact and password fields are submitted. It queries the database for a user with the given contact, then uses password_verify() to check if the submitted password matches the stored hash. If successful, it stores the user data in $_SESSION['user'] and redirects to accueil.php.
`<?php
session_start();
require_once 'database.php';
$message = "";
if(isset($_POST['connecter'])){
$contact = $_POST['contact'];
$password = $_POST['password'];
if(empty($contact) || empty($password)){
$message = "Tous les champs sont obligatoires";
} else {
$sql = "SELECT * FROM users WHERE contact = ?";
$stmt = $pdo->prepare($sql);
$stmt->execute([$contact]);
$user = $stmt->fetch(PDO::FETCH_ASSOC);
if($user && password_verify($password, $user['password'])){
$_SESSION['user'] = $user;
header("Location: accueil.php");
exit();
} else {
$message = "Mot de passe ou contact incorrect !";
}
}
}
?>`
This is a protected page. It starts by checking if $_SESSION['user'] exists; if not, it redirects to login.php. It displays the logged-in user's name and a list of other users as "friend suggestions". I used CSS Flexbox and Media Queries to make the layout responsive on mobile. The "Add Friend" button uses JavaScript fetch to call add_friend.php without reloading the page.
`<?php
session_start();
require_once "database.php";
if (!isset($_SESSION['user'])) {
header("Location: login.php");
exit();
}
$user = $_SESSION['user'];
$stmt = $pdo->prepare("SELECT * FROM users WHERE id != ?");
$stmt->execute([$user['id']]);
$friends = $stmt->fetchAll(PDO::FETCH_ASSOC);
?>`
This file performs the final verification. It uses password_verify() to compare the password entered in confirm.php with the hashed password stored in the session. If they match, it inserts the new user into the users table using a prepared statement for security. Finally, it clears the temporary session and redirects to the success page.
`<?php
session_start();
require_once 'database.php';
if(!isset($_SESSION['temp_user'])){
die("Session expirée");
}
if($_SERVER["REQUEST_METHOD"] == "POST") {
$input_password = $_POST['password'];
$user = $_SESSION['temp_user'];
if(password_verify($input_password, $user['password'])) {
$sql = "INSERT INTO users (nom, prenom, contact, password, jour, mois, annee, genre)
VALUES (:nom, :prenom, :contact, :password, :jour, :mois, :annee, :genre)";
$stmt = $pdo->prepare($sql);
$stmt->execute([
':nom' => $user['nom'],
':prenom' => $user['prenom'],
':contact' => $user['contact'],
':password' => $user['password'],
':jour' => $user['jour'],
':mois' => $user['mois'],
':annee' => $user['annee'],
':genre' => $user['genre']
]);
unset($_SESSION['temp_user']);
header("Location: succes.php");
exit();
} else {
header("Location: confirm.php");
exit();
}
}
?>`
This project helped me understand how authentication systems work using PHP and MySQL.
I learned how to create a registration and login system similar to real applications.
You can find the full project on GitHub here:
https://github.com/Nouhailasemoud/login-system-php
2026-04-22 06:40:07
Store events, not current state
Day 117 of 149
👉 Full deep-dive with code examples
Your bank doesn't just show your balance: it keeps a record of every transaction that produced it.
Event Sourcing stores all the events that happened!
Not just the final result, but every step along the way.
Traditional databases store only the current state; if that's all you keep, you lose the history of how you got there. Event Sourcing stores every step instead of just the final result:
Traditional: { balance: $500 }
Event Sourcing:
1. AccountCreated (initial: $0)
2. Deposited $1000
3. Withdrew $300
4. Transferred $200 to Friend
5. Deposited $0 (a fee charged by mistake?)
Current balance = replay all events = $500
Every change is an event, stored forever.
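The bank example above can be sketched in a few lines. This is a minimal illustration (the `Event` type and `replay` function are my own names, not from any event-sourcing library): current state is just a fold over the event log.

```python
from dataclasses import dataclass

# Minimal event-sourcing sketch: the log is the source of truth,
# and the balance is derived by replaying it.

@dataclass
class Event:
    kind: str       # "created", "deposited", "withdrew", "transferred"
    amount: int = 0

def replay(events):
    """Fold the event log into the current balance."""
    balance = 0
    for e in events:
        if e.kind == "deposited":
            balance += e.amount
        elif e.kind in ("withdrew", "transferred"):
            balance -= e.amount
        # "created" just establishes the initial $0
    return balance

log = [
    Event("created"),
    Event("deposited", 1000),
    Event("withdrew", 300),
    Event("transferred", 200),
    Event("deposited", 0),   # the mistaken fee from the example
]
print(replay(log))  # 500
```

Because the log is never mutated, you can also replay only a prefix of it to see the balance at any point in the past.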
Great for: audit trails, debugging ("how did this value get here?"), and domains like banking where history matters.
Not needed for simple apps where you don't care about history.
Event Sourcing stores every change as an event, so you can trace how things reached their current state.
🔗 Enjoying these? Follow for daily ELI5 explanations!
Making complex tech concepts simple, one day at a time.
2026-04-22 06:36:15
We have released v2 of mdka, a Rust-based HTML-to-Markdown converter. Originally developed as a core component of our internal systems, it has been refined into a general-purpose library for public release.
v2 strikes a practical balance between conversion quality and runtime efficiency, delivering readable output without sacrificing speed or memory. By focusing solely on its core task, this "Unix-style" tool achieves a lightweight footprint while maintaining competitive performance. We are pleased to share this utility with the community.
Full documentation:
https://nabbisen.github.io/mdka-rs/