
HackerNoon

We are an open and international community of 45,000+ contributing writers publishing stories and expertise for 4+ million curious and insightful monthly readers.

RSS preview of Blog of HackerNoon

MEXC Doubles Market Share to 9% in Two Years, CoinGecko Reports

2026-04-14 16:00:57

Victoria, Seychelles, April 13, 2026

CoinGecko has released its Spot CEX Report 2026, offering a comprehensive analysis of 12 leading centralized exchanges across multiple dimensions, including spot trading volume, market share trends, token listings, and reserve holdings. The report highlights that MEXC, the world leader in 0‑fee digital asset trading, increased its spot market share from 5% to 9% over the past two years—nearly doubling its position. Concurrently, MEXC ranked first among all major exchanges with 1,333 new token listings over the past year, demonstrating formidable competitive advantages in both asset coverage and trading activity.

Market Share Doubles as Trading Volume Remains Among Top Leaders

According to CoinGecko data, MEXC's market share surged from 5% at the beginning of 2024 to 9% in 2026, firmly cementing the platform's status among the world's leading exchanges. Furthermore, MEXC recorded $95.9 billion in spot trading volume in February 2026, officially securing its position as the second-largest exchange globally in this category.

Leading the Industry in Asset Discovery and New Token Listings

Outpacing major competitors, MEXC ranks first in new token listings among the 12 centralized exchanges covered in the report. Since January 2025, the exchange has listed 1,333 new spot tokens, sustaining an onboarding rate of approximately 100 new assets per month.

To contextualize this scale, CoinGecko tracked 7,847 newly launched tokens across the broader market during this period. By listing approximately 17% of all newly created tokens, MEXC’s listing velocity aggressively outperforms the industry baseline, where most major competitors capture less than 5%. This performance demonstrates the operational superiority of MEXC’s listing infrastructure. The platform remains structurally engineered to offer the broadest asset coverage, ensuring users can capitalize on early-stage projects ahead of the wider market.

0 Fees Combined With Broad Asset Selection Drive Continued User Growth

Among the 12 centralized exchanges analyzed, MEXC maintains the industry's lowest baseline trading costs, enforcing a 0.00% maker fee and a 0.10% taker fee. By contrast, competing major platforms mandate baseline fees of 0.10% or higher, with some exacting up to 0.50%.

MEXC's 0-fee strategy has become a core driver of its sustained trading volume growth, helping millions of users worldwide save significantly on trading costs. Combined with 2,350 listed assets, this fee advantage has made MEXC a preferred platform for traders seeking both cost efficiency and broad asset diversity.

274.6% Reserve Expansion and 101M USDT Guardian Fund Anchor Platform Security

The CoinGecko report also highlights substantial changes in exchange reserves. Between January 2024 and February 2026, MEXC’s reserve value grew by 274.6%, reflecting accelerated institutional and retail capital inflows. Supporting this scale is the MEXC Guardian Fund, deployed in June 2025. Capitalized with over 100 million USDT, the fund establishes a structural defense against cybersecurity threats and technical disruptions.

Executing the Next Era of Global Leadership

As MEXC reaches its eight-year milestone, the metrics confirmed by CoinGecko validate the exchange's market dominance. Rather than resting on legacy achievements, MEXC is actively deploying its resources to upgrade its core trading engine, maintain its zero-fee advantage, and expand its global market share in the upcoming growth cycle.

About MEXC

MEXC is the world’s fastest-growing cryptocurrency exchange, trusted by more than 40 million users across 170+ markets. Built on a user-first philosophy, MEXC offers industry-leading 0-fee trading and access to over 3,000 digital assets. As the Gateway to Infinite Opportunities, MEXC provides a single platform where users can easily trade cryptocurrencies alongside tokenized assets, including stocks, ETFs, commodities, and precious metals.

MEXC Official Website | X | Telegram | How to Sign Up on MEXC

For media inquiries, please contact MEXC PR team: [email protected]

Source


The TechBeat: Microsoft Generative AI Report: The 40 Most Disrupted Jobs & The 40 Most Secure Jobs (4/14/2026)

2026-04-14 14:10:55

How are you, hacker? 🪐 Want to know what's trending right now? The TechBeat by HackerNoon has got you covered with fresh content from our trending stories of the day! Set email preference here.

AI Products Have Terrible UX: Here's Why

By @deeflect [ 8 Min read ] Most AI products have terrible UX - not because the AI is bad, but because no one who understands both AI and design is building them. Read More.

OpenAI Bought TBPN Because PR Can’t Keep Up With AI

By @davidjdeal [ 6 Min read ] Read this post to understand why OpenAI bought a media company, TBPN. Read More.

OpenClaw Changed How We Use AI. KiloClaw Made It Effortless to Get Started

By @kilocode [ 6 Min read ] OpenClaw is a powerful open-source AI agent, but self-hosting it is a pain. KiloClaw is OpenClaw fully hosted and managed by Kilo Read More.

Why Your “Profitable” Backtest Fails the Moment You Go Live

By @grigorychikishev [ 6 Min read ] Latency, queue position, market impact, and adverse selection all distort the theoretical edge a model appears to have. Read More.

Don’t Buy the Wrong MacBook Pro: The M5 Trap Apple Won’t Mention

By @aschwabe [ 4 Min read ] The M5 Pro is the only chip built for the 14-inch MacBook Pro's thermal envelope — everything else throttles, is a generation behind, or has no fan at all. Read More.

Your Work Trained the Model. The Model Replaced You. Philip K. Dick Wrote This Story in 1968.

By @thegeneralist [ 8 Min read ] The first workers displaced by generative AI weren't software engineers. They were translators and $1.32/hr data labelers. Philip K. Dick predicted why. Read More.

We Were Promised Jetpacks: Why AI Isn't Accelerating Feature Delivery

By @playerzero [ 6 Min read ] Despite AI coding tools generating more code than ever, engineering productivity lags because these tools excel at building, not debugging or operating systems. Read More.

Microsoft Generative AI Report: The 40 Most Disrupted Jobs & The 40 Most Secure Jobs

By @botbeat [ 21 Min read ] Discover the 40 jobs most vulnerable to gen AI & 40 most secure professions, based on an empirical Microsoft Research study of 200,000 real-world interactions. Read More.

AI Coding Tip 014 - One AGENTS.md Is Hurting Your AI Coding Assistant

By @mcsee [ 4 Min read ] Split your AGENTS.md into layered files so your AI loads only the rules that matter for the code you touch. Read More.

Free VPNs vs Paid VPNs: What Are You Actually Paying For?

By @ipvanish [ 6 Min read ] Free VPNs aren't free. Read More.

A Hidden Problem in Jetpack Compose TextField Max Length

By @indrivetech [ 3 Min read ] Jetpack Compose TextField max length works internally. The difference lies in how TextField state changes are applied. Read More.

Penetration Testing Companies: Comparing The Top 5 Vendors

By @securitymetrics [ 5 Min read ] Read this blog to get the info you need about cost, pros, and more, to pick the best pen testing vendor for your unique needs. Read More.

Qwen3.5-9b-uncensored-hauhaucs-Aggressive Model: A Beginner's Guide to Get You Started

By @aimodels44 [ 2 Min read ] Qwen3.5-9B-Uncensored-HauhauCS-Aggressive is an uncensored variant of the base Qwen3.5-9B model created by HauhauCS. Read More.

Context Graphs, Ontologies, and the Race to Fix Enterprise AI

By @linked_do [ 17 Min read ] What are context graphs, what are they good for, and why are they dubbed AI’s trillion-dollar opportunity? What does context mean, and how can it be defined? Read More.

How to Build a Voice Agent With AssemblyAI

By @assemblyai [ 6 Min read ] This tutorial shows you how to build a complete voice agent that can have natural conversations with users. Read More.

The 5 Best Suits From Marvel's Spider-Man 2: Miles Morales Version

By @joseh [ 4 Min read ] The Smoke and Mirrors suit, the Metro suit, and the Life Story suit are some of Miles' best suits in Marvel's Spider-Man 2. Read More.

30 BI Engineering Interview Questions That Actually Matter in the AI Era

By @anushakovi [ 27 Min read ] The BI interview hasn't caught up with the job. Here are 30 questions that reflect what it actually means to be a BI engineer in 2026. Read More.

Want to Have Successful OpenTelemetry Projects? Implement This One Tip

By @nfrankel [ 4 Min read ] In this post, I want to tackle a real-world use case and describe which tools you can leverage to reduce the necessary changes. Read More.

How to Build the Lowest Latency Voice Agent in Vapi: Achieving ~465ms End-to-end Latency

By @assemblyai [ 4 Min read ] In this comprehensive guide, we'll show you how to build a voice agent in Vapi that achieves an impressive ~465ms end-to-end latency. Read More.

Digital Project Abandonment Crisis: Deadweight Loss in Plain Sight

By @proofofusefulness [ 4 Min read ] What the failure data actually says — and what it means for how we build. Most digital projects fail. This is not a provocative claim. Read More.

🧑‍💻 What happened in your world this week? It's been said that writing can help consolidate technical knowledge, establish credibility, and contribute to emerging community standards. Feeling stuck? We got you covered ⬇️⬇️⬇️ ANSWER THESE GREATEST INTERVIEW QUESTIONS OF ALL TIME

We hope you enjoy this week's worth of free reading material. Feel free to forward this email to a nerdy friend who'll love you for it. See you on Planet Internet! With love, The HackerNoon Team ✌️

How to Build a Tiny LLM From Scratch Using Frankenstein

2026-04-14 11:29:32

Understanding how LLMs work at their most fundamental level helps you understand how they behave and why they behave that way. Building one also helps debunk any theory of consciousness in LLMs, but to do so, you have to choose carefully what text you train the LLM with. The book “Frankenstein” was the first thought that came to my mind, for some ironic reason.

This tutorial will guide you through all the steps of building an LLM of ~3.2M parameters, solely using Mary Shelley’s Frankenstein. You won’t need to run the process on your local computer, as it’s customized to run on Kaggle’s free GPU (God bless Kaggle) in under ~20 minutes if all goes well. You certainly do not need to be a programmer for this, as the code examples are provided with explanations of the syntax as well as the techniques and concepts.

Consider it a complete guide to understand and build an LLM.

(Critical: the LLM we build here is the rawest of LLMs, as it does not go through the fine-tuning/RLHF stages that commercially available chatbots do. It’s merely a model that is not here to help, but to make predictions and complete your prompts. Refer to Andrej Karpathy’s State of GPT for more. You can find the code here: Buzzpy/Python-Machine-Learning-Models.)

Step 1: The Setup & Tokenization

Computers are fundamentally blind to language. They do not look at the word “monster“ and feel a sense of dread or fear. They see signals, which we represent as numbers. The very first step of building an LLM is translating human text into math, through tokenization.

If we were building a public-facing, much larger LLM, we would normally have to go through the process of gathering training data and cleaning it before tokenization. But in our case, since we are only using a book that’s accessible via Project Gutenberg, we can skip the hard parts and jump straight into tokenizing the text.

What is Tokenization?

Tokenization is the process of breaking down text into tiny chunks of data (as numbers) that the computer is able to read. A token typically equals 0.75 of a word or roughly 3-4 characters.

Tokenization in NLP (Source: Machine Learning Expedition)

For this, we are using character-level tokenization; modern, high-parameter-count models (like ChatGPT) use word-level or sub-word-level tokenization for efficiency, but in our case, we are teaching our machine English one single letter at a time.
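To make this concrete before we wire it into the full notebook, here is a minimal sketch of character-level tokenization on its own. The sample word “monster“ and its tiny vocabulary are purely illustrative and are not part of the notebook cells:

# Minimal illustration of character-level tokenization (not part of the notebook cells)
sample = "monster"
chars = sorted(set(sample))                      # unique characters form the vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}     # character -> integer
itos = {i: ch for ch, i in stoi.items()}         # integer -> character

tokens = [stoi[c] for c in sample]               # encode: text becomes numbers
restored = "".join(itos[i] for i in tokens)      # decode: numbers become text again

print(tokens)     # [1, 3, 2, 5, 6, 0, 4] for this particular vocabulary
print(restored)   # "monster"

Cell 1 below does the same thing, except the vocabulary is built from every character that appears in Frankenstein.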

To start, go to Kaggle.com, create a free account, and create your notebook. In the sidebar, make sure the “Internet” toggle is on and “Session options” are set up to run on GPU T4 x 2.

In the notebook, you will note that you can create “blocks“ of code, one already available as an example; this is called a Cell. Replace/create Cell 1 to have this code:

# Importing packages required
import os
import torch
import torch.nn as nn
from torch.nn import functional as F
import urllib.request

# 1. Hardware Setup for Kaggle
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Hyperparameters (The Architectural Blueprint)
batch_size = 64       
block_size = 256      
max_iters = 5000      
eval_interval = 500
learning_rate = 3e-4  
n_embd = 256          
n_head = 4            
n_layer = 4           
dropout = 0.2         

torch.manual_seed(1337)

# 3. Downloading the Dataset
print("Downloading Mary Shelley's Frankenstein...")
url = "https://www.gutenberg.org/cache/epub/84/pg84.txt"
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as response:
    raw_text = response.read().decode('utf-8')

start_idx = raw_text.find("Letter 1")
end_idx = raw_text.find("*** END OF THE PROJECT GUTENBERG EBOOK")
text = raw_text[start_idx:end_idx] if start_idx != -1 and end_idx != -1 else raw_text

# 4. Tokenization
chars = sorted(list(set(text)))
vocab_size = len(chars)

stoi = { ch:i for i,ch in enumerate(chars) } 
itos = { i:ch for i,ch in enumerate(chars) } 
encode = lambda s: [stoi[c] for c in s]
decode = lambda l: ''.join([itos[i] for i in l])

# 5. Creating the Tensors
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data = data[:n] 
val_data = data[n:]   

def get_batch(split):
    data_source = train_data if split == 'train' else val_data
    ix = torch.randint(len(data_source) - block_size, (batch_size,))
    x = torch.stack([data_source[i:i+block_size] for i in ix])
    y = torch.stack([data_source[i+1:i+block_size+1] for i in ix])
    x, y = x.to(device), y.to(device)
    return x, y

The first part of the code (till #2) is about the training setup on Kaggle, not the LLM itself. Note that you will have to edit this setup if you prefer to run the code locally.

Mechanics explained (#2-#5):

  • #2 Hyperparameters: these are the “dials” or controls we use to indicate the size and speed of the model.

  • batch_size = 64: the model will process 64 “chunks“ of text simultaneously.

  • block_size = 256: this is the model’s short-term memory or the “context window“. It can look exactly 256 characters into the past to predict the 257th.

  • n_embd = 256: the number of dimensions in our mathematical space; we are just giving the model a 256-dimensional “room“ to organize concepts/predictions.

  • #3 Downloading: downloads “Frankenstein“ from Project Gutenberg as a TXT file.

  • #4 Tokenization: This is the syntax to split text into characters and tokenize them as discussed before.

  • #5 Tensors: A tensor is a grid of numbers. We convert the entire text of Frankenstein into one massive grid and call it “data“.

  • get_batch: The model learns by playing what I (and many, of course) call a game of “guess the next letter”.

    The function randomly grabs a chunk of text. Let’s say it grabs a block of 4 letters.

  • The Input (x): F - R - A - N

  • The Answer Key (y): R - A - N - K

    Notice that the Answer Key (y) is the exact same sequence of letters, just shifted forward by one space. The model looks at the Input and makes its guesses, and the Answer Key grades those guesses simultaneously:

  1. It looks at F and tries to guess the next letter. The answer key grades it against R.

  2. It looks at F - R and tries to guess the next letter. The answer key grades it against A.

    … this goes on until the chunk of text is completed.

By shifting the text by one letter, a 256-character block actually gives the model 256 individual training examples of how letters follow one another.


What the get_batch function does is exactly this splitting of the data.
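If you want to see this one-character shift with your own eyes, here is a small optional check you can run in a scratch cell after Cell 1. The toy window of 8 characters is only for readability and is not used anywhere in training:

# Optional sanity check: inspect one input/target pair built from the training data
toy_block = 8                                      # tiny context so the printout stays readable
x_demo = train_data[:toy_block]                    # what the model sees
y_demo = train_data[1:toy_block + 1]               # the same text shifted forward by one character

print("input :", decode(x_demo.tolist()))
print("target:", decode(y_demo.tolist()))
# Position t in the input is graded against position t in the target,
# so one chunk of 8 characters yields 8 separate "guess the next letter" examples.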

And that’s our Cell 1! We have tokenized the text and set up the boundaries of the LLM.

Step 2: The Core Architecture

Now that we have the main parameters set up, we have to build the actual neural network of the model. This means we are building a scaled-down Transformer, the same architecture that powers modern AI.

What is a Transformer?

A Transformer is a machine learning architecture designed to process sequential data in parallel, rather than in order. It looks at entire chunks of text all at once, using a mechanism called “self-attention“, which allows the model to look at a specific word (like “bank“) and instantly scan every other word around it to work out the context (a river bank, or the money kind?).

To do this, create another cell in your Kaggle Notebook (the “+ Code“ button at the end of cell 1 does this) and add this code to it:

# 1. THE ATTENTION HEAD (The "Context" Engine)
class Head(nn.Module):
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)   
        q = self.query(x) 

        # Calculate the mathematical affinities between characters
        wei = q @ k.transpose(-2, -1) * C**-0.5 
        # The Mask: Hide the future!
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        wei = self.dropout(wei)

        v = self.value(x)
        out = wei @ v
        return out

# 2. MULTI-HEAD ATTENTION (Multiple Brains Working Together)
class MultiHeadAttention(nn.Module):
    def __init__(self, num_heads, head_size):
        super().__init__()
        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        out = self.dropout(self.proj(out))
        return out

# 3. FEED-FORWARD (The "Thinking" Phase)
class FeedForward(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

# 4. THE TRANSFORMER BLOCK (Putting it Together)
class Block(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        head_size = n_embd // n_head
        self.sa = MultiHeadAttention(n_head, head_size)
        self.ffwd = FeedForward(n_embd)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))
        x = x + self.ffwd(self.ln2(x))
        return x

# 5. THE FINAL LLM ASSEMBLY
class EngineeredLLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        self.position_embedding_table = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd) 
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        tok_emb = self.token_embedding_table(idx) 
        pos_emb = self.position_embedding_table(torch.arange(T, device=device)) 
        x = tok_emb + pos_emb 
        x = self.blocks(x) 
        x = self.ln_f(x) 
        logits = self.lm_head(x) 

        if targets is None:
            loss = None
        else:
            B, T, C = logits.shape
            logits = logits.view(B*T, C)
            targets = targets.view(B*T)
            loss = F.cross_entropy(logits, targets)
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            idx_cond = idx[:, -block_size:]
            logits, _ = self(idx_cond)
            logits = logits[:, -1, :] 
            probs = F.softmax(logits, dim=-1) 
            idx_next = torch.multinomial(probs, num_samples=1) 
            idx = torch.cat((idx, idx_next), dim=1) 
        return idx

This sure looks like a jumble of random variables and syntax, but the underlying mechanics are fairly simple. Well, not exactly simple, but not overwhelming at least.

Mechanics Explained:

  • #1 Attention Head: This is where context is calculated. It asks “What letter am I looking at?“ (Query), “What letters came before me?“ (Key), and “What do those letters actually mean?“ (Value).
  • Notice the Mask (tril). This is crucial. If we didn’t include this, the model would simply look ahead at the answer key to predict the next letter. The mask blinds the model to the future, forcing it to actually learn the statistical patterns of the past.
  • #2 Multi-head Attention: Looking at text through just one lens isn't enough. We run 4 of the above-mentioned Attention Heads in parallel. One head might focus on vowels, another might focus on punctuation, and another on capitalization. They pool their findings together at the end.
  • #3 Feed-Forward: Once the attention mechanisms gather the context from the surrounding letters, the Feed-Forward layer acts as the "reasoning" phase. It passes the data through mathematical filters to finalize its guess.
  • #4 The Transformer Block: This is simply the packaging. It combines the Attention (Context) and Feed-Forward (Reasoning) into a single, neat block of logic.
  • #5 The Final LLM Assembly: This wires everything together. It creates an Embedding Table, which assigns a set of mathematical coordinates to every character. Eventually, this table will learn that vowels cluster together in one part of the mathematical space, while consonants cluster in another. But right now? This brain is completely empty. We have built the engine, but the dials are set to random static.

"sat" pays more attention to "cat" than "the" — then predicts "on."

And that’s our Transformer architecture! A very small version of it, yes, but nevertheless functional. To teach it English, we have to force it to play a guessing game.
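Before moving on to training, you can optionally confirm that everything wires up and that the model really is in the ~3.2M-parameter range. This is a quick sanity-check sketch, assuming Cells 1 and 2 have already run; the throwaway variable names below exist only for this check:

# Optional sanity check: build a temporary model and inspect its size and outputs
test_model = EngineeredLLM().to(device)
print(f"{sum(p.numel() for p in test_model.parameters())/1e6:.2f}M parameters")  # roughly 3.2-3.3M

xb_demo, yb_demo = get_batch('train')             # one random batch from Cell 1
logits_demo, loss_demo = test_model(xb_demo, yb_demo)
print(logits_demo.shape)                          # (batch_size * block_size, vocab_size)
print(loss_demo.item())                           # close to ln(vocab_size) before any training

del test_model                                    # free the GPU memory before the real training run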

Step 3: Backpropagation

It’s rather a fancy word for training the model.

Right now, the 3.27 million parameters in our model are completely randomized. For the model to be able to generate any coherent output from what it has learned, we have to train it.

In case you are confused:

When ML engineers talk about dials, weights, or parameters, they are often talking about the exact same thing. They are quite literally a massive list of decimals stored in your computer’s RAM.

Imagine a single “neuron“ in our model is trying to decide if the next letter should be “u”. It looks at the current letter (let’s say it’s “q”). The math looks like this:

(Input “q”) × (Weight) = (Prediction “u”)

The Weight is the parameter. It is a number, like 0.842 or -0.113.

If a weight is a high positive number, it means “Yes, strongly connect ‘q’ and ‘u’.” If a weight is strongly negative (say, the one between ‘q’ and ‘z’), it means “No, absolutely do not connect those two.”

When we say the model starts with “random weights,” it means the computer just randomly assigns numbers like 0.34 or -0.91 to all 3.27 million connections. That is why it guesses randomly at first.

Training an AI, as said before, is a game of “Guess the next letter“. The AI makes a guess, checks the actual text to see if it was right, calculates how wrong it was, and then mathematically turns its “dials“ to be slightly less wrong the next time. Interesting.
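To make “turning a dial“ concrete, here is a toy example of a single weight being nudged by gradient descent. The numbers are made up, and this has nothing to do with the Transformer's actual internals; it is just the same guess-measure-adjust loop in miniature:

# Toy illustration only: one "dial" being nudged toward a better value
weight = 0.34           # a randomly initialized parameter
target = 1.0            # the output we wanted
lr = 0.1                # learning rate: how big each nudge is

for step in range(5):
    prediction = 1.0 * weight          # input signal of 1.0 multiplied by the weight
    error = prediction - target        # how wrong the guess was
    gradient = 2 * error               # derivative of the squared error with respect to the weight
    weight -= lr * gradient            # nudge the dial slightly in the right direction
    print(f"step {step}: weight={weight:.3f}, error={error:.3f}")

# The optimizer in the next cell does a far more sophisticated version of this,
# for all ~3.27 million weights at once.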

To train the model, then, you will have to add this code in Cell 3:

# 1. Booting up the Model and Optimizer
model = EngineeredLLM().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    for split in ['train', 'val']:
        losses = torch.zeros(200)
        for k in range(200):
            X, Y = get_batch(split)
            logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out

print(f"Forging a {sum(p.numel() for p in model.parameters())/1e6:.2f}M parameter model...")

for iter in range(max_iters):
    if iter % eval_interval == 0 or iter == max_iters - 1:
        # 2. Checking the Error Score (Loss)
        losses = estimate_loss()
        print(f"Step {iter}: Train Loss {losses['train']:.4f}, Val Loss {losses['val']:.4f}")

    # 3. Grab a chunk of text and make a guess
    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)

    # 4. Backpropagation (The Calculus)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()

    # 5. Updating the Weights
    optimizer.step()

print("Training complete.")

This part of the code is fairly easy to read and understand; it’s a set of often-used mechanics in LLM training.

Mechanics explained:

  • #1 The Optimizer: The optimizer (AdamW) is the algorithm responsible for actually turning the 3.27 million “dials” in the neural network to improve the model’s accuracy.
  • #2 Loss: This is the error score, calculated by referencing the original text. A higher loss means it’s guessing blindly (inaccurate), whereas a low loss means it has successfully learned the patterns of English. The earliest loss values will always be higher, as it has not “learned“ at the very beginning.
  • #3 Making a guess: We feed the model a chunk of text (xb) and the answer key (yb). It generates its predictions (logits) and compares them to the answer key to calculate the loss as described before.
  • #4 Backpropagation (loss.backward()): The fancy word again, but it’s a crucial concept in ML. Once the model knows how wrong it is in a certain guess, it uses calculus to trace that error backward through the entire network, determining exactly which of the 3.27 million dials caused the mistake.
  • #5 Updating the Weights: The optimizer takes the information from backpropagation and changes the dials slightly in the correct directions. We do this 5,000 times (max_iters), which allows micro-adjustments to the dials rather than massive ones. Note that if you crank this number much higher in the hope of more accuracy, it can backfire: the model starts memorizing the book word-for-word, which is called “overfitting“.

[When you run this cell, it will take roughly 20 to 30 minutes on the Kaggle GPU. You will see that the Val Loss steadily drops from around 4.6 down to ~1.2. That is the exact threshold where the model suddenly figures out how to construct 19th-century English.]

Now’s a perfect time for a sigh. The training is done.

But if you feel like we went spiraling in the training phase, here is a relatively simple explanation:

  • The model takes a chunk of text.
  • It pushes that text through the Embedding grids and Linear webs (multiplying the text by the 3.27 million random decimal numbers).
  • It spits out a guess for the next letter.
  • The Loss function grades the guess.
  • Backpropagation uses calculus to figure out exactly which of those 3.27 million decimals were slightly too high, and which were slightly too low.
  • The Optimizer (AdamW) goes in and adjusts those decimals by a tiny fraction (e.g., changing a weight from 0.110 to 0.112).

It repeats this 5,000 times. By the end, those 3.27 million decimals have been arranged so that when you multiply the letters "I am alon" by those parameters, the math very likely comes out to "e".

Step 4: Inference

Why build a model if you can’t chat with it? Well, actually, you cannot “chat“ with the LLM you built in the sense of chatting with ChatGPT or Claude, because, right now, the model has had none of the instruction tuning or fine-tuning required to make it a helpful assistant that would answer questions like “Why does the creature have no name?”.

But it can absolutely do this: predict. The fundamental ability of all LLMs.

This means that if you feed it a “starting thought“, i.e., an incomplete sentence, it can complete it based on what it has learned. And a raw LLM’s ability to generate any sort of coherent output itself should be praised because it was built with pure math and computing!

To let it speak, add this code to Cell 4 of your Kaggle Notebook:

# 1. Lock the Weights
model.eval()

print("\n================================================")
print("      TYPE 'quit' TO EXIT.   ")
print("================================================\n")

while True:
    user_input = input("Feed the mirror a starting thought: ")

    if user_input.lower() == 'quit':
        print("Shutting down.")
        break

    # 2. Translate text to numbers
    context_list = encode(user_input)
    context_tensor = torch.tensor([context_list], dtype=torch.long, device=device)

    # 3. Generate the Output
    generated_idx = model.generate(context_tensor, max_new_tokens=500)

    # 4. Translate numbers back to text
    print("\n--- The Output ---")
    print(decode(generated_idx[0].tolist()))
    print("----------------\n")

This code is just an input-output loop that lets you input a starting thought, and the model will generate an output.

It locks the parameters so the model stops learning, takes the words you typed, and converts them into integers so the math can process them. It then calls the generate function from Cell 2. When the model predicts the next letter, it doesn't just pick the single highest probability; if it did, it would get stuck repeating "the the the." Instead, it builds a probability distribution (e.g., "e" is 70% likely, "a" is 20% likely) and randomly samples the next character from it.
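If you are curious why sampling matters, here is a tiny standalone sketch comparing a greedy pick with a multinomial sample. The probabilities are invented for illustration and have nothing to do with the trained weights:

import torch

chars_demo = ['e', 'a', 'x']
probs = torch.tensor([0.70, 0.20, 0.10])          # made-up probability distribution

greedy = chars_demo[int(torch.argmax(probs))]                         # always 'e': repetitive output
sampled = chars_demo[int(torch.multinomial(probs, num_samples=1))]    # usually 'e', sometimes 'a' or 'x'

print("greedy pick :", greedy)
print("sampled pick:", sampled)
# model.generate() uses the multinomial approach, which is why two runs with the
# same starting thought can produce different continuations.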

Given that the model was trained on Mary Shelley’s Frankenstein, I could not help but use these starting thoughts.

  1. Input: “You are my creator, but I am your”

    Output: “You are my creator, but I am your miserable forbiddence.”

  2. Input: “The greatest tragedy of the human soul is our unending desire to”

    Output: “The greatest tragedy of the human soul is our unending desire to made. \n Life, although they were explained by the just far as I did not see the…”

  3. Input: “My creator gave me a mind, but he forgot to”

    Output: “My creator gave me a mind, but he forgot to one, therefore that I \n knew that I have no collect.”

If the model’s output were Shelley’s writing verbatim, it would simply have memorized the text. Since it isn’t, it has learned from the text instead. And notice the errors! Partly coherent, partly not. That is what you should expect when you build an LLM on a single piece of writing, but based on my experience, this is pretty impressive. It has learned many words and how to put them together.

A few notes: if your input has spelling mistakes or modern words (say, “AI”), the model’s math will be clueless and produce gibberish. And if your input contains a character that never appears in the book (an emoji, for instance), the encode function won’t find it in the vocabulary at all and will raise an error. Beware.
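If you want the loop to survive such characters instead of crashing, one optional defensive tweak is to filter the input before encoding it. The helper name safe_encode is hypothetical and not part of the original notebook; this is just one way to do it:

# Hypothetical helper: silently drop characters the model has never seen
def safe_encode(s):
    dropped = [c for c in s if c not in stoi]
    if dropped:
        print(f"Ignoring unknown characters: {dropped}")
    return [stoi[c] for c in s if c in stoi]

# In Cell 4 you could then replace:  context_list = encode(user_input)
# with:                              context_list = safe_encode(user_input)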

That’s all there is to building your own large language model right from scratch.

You can find the code here: Buzzpy/Python-Machine-Learning-Models

The Five Things I Did When I Landed a Role as a DevRel

2026-04-14 11:28:08

If someone had told me early last year that I would become a DevRel before the year ran out, I would have sworn it was never going to happen. The experience I had didn't qualify me for the role, or so I assumed. DevRel, in most cases, is a senior role with a proven track record, and, to be honest, I didn't have much experience. I don't think I had ever applied for the role before.

November 3rd, 2025, changed that story as I officially became a DevRel for one of the best RPC providers in Web3. A quick backstory on how I got this role: I participated in the company's writing contest, where I submitted two entries for the hackathon. The results came in, and I was one of the finalists, but didn't win the prize. I was so pained and disappointed. I remember kneeling down in the toilet and just saying, "Thank you, Jesus." Little did I know I was getting a permanent, full-time role with this company.

The company reached out to me, appreciated the work I was doing, and yes, I got it. At first, I was overwhelmed. I remember I couldn't sleep the night I received my offer. But here I am today, thriving in my role from probation to becoming a full-time employee.

I know this is the case for some others too. Instead of just celebrating and doing the work, I first prayed to God to ask for direction and the wisdom to excel in my role. I decided to do these five things, which I believe will be helpful to you in your role.

1. Study About Developer Marketing

As I said, this is my first time as a DevRel, and trust me, I didn't want to mess it up or treat it shabbily. I first studied what my role entails and what is expected of me. I asked my bosses what was needed and what I needed to do. I went so far as to ask my friends who are DevRels what they do in their roles. I watched videos and also read a book called "Developer Marketing Does Not Exist" by Adam DuVander. This gave me the understanding I needed for the job, and I must say I'm thriving in it. My bosses are loving my work.

2. Carry Out Research About My Company's Products and Its Competitors

This may sound funny, but yes, I always check out what my company's competitors are doing. How do they write their tutorials and videos? What kind of content do they put out there, and how can I stay at the top of my game? I'm representing a world-class brand. That is what I always tell myself, and I need to be a DevRel who is at the top.

3. Improve My Coding Skills

As a DevRel, you need to understand how applications are built. So I took some time to study language documentation, take courses on one of the programming languages used in the industry, Solidity, and build a project called FairPay for the course. You can check out the live demo and the repo, which explain how the product works.

4. Grasp My Responsibilities in Order Not to Underperform or Go Above My Responsibilities

I took my time to ask a lot of questions about my responsibilities so I wouldn't underperform or overstep. This is a lesson I learned from my previous work. I took tasks I should not have taken upon myself. The moment I stopped taking them, I was labeled as "incompetent." I didn't want that to happen in this role. So I asked, and my tasks and responsibilities were explicitly clear to me.

5. Pay for Tools to Optimize My Workloads and Stay Ahead of My Deadlines

Before I started working, my friend advised me to get tools, especially AI tools like Claude, and asked me to pay for them. To be honest, that is one of the tools that is optimizing my work now. I do my work better and faster. I also paid for video tools and improved my content, and I'm still going to invest more in it.

Conclusion

These are the things I did, and am still doing, that keep me at the top of my game. Trust me, if you can implement them, they will work. I approach my job with God first, and I also work smart, not hard, to be valuable and useful in it. I hope you find this helpful. If so, let me know in the comment section.

LOVE YOU, ILEOLAMI


From Satellite Signals to Neural Networks

2026-04-14 11:26:03

Let’s be real, most AI projects die at the prototype stage. Great demo, cool tech, zero production-readiness. What separates those graveyard prototypes from systems that actually run the business? We’d argue it’s people like Andrei Shcherbinin, Team Lead at Social Discovery Group, who’s quietly been building ML infrastructure that does serious heavy lifting.

We grabbed some time with him to talk about engineering roots, 12x speed improvements, chatbots that handle 95% of support on their own, and why he thinks knowing your math is still the best career move in tech.


🗣️ You didn’t take the bootcamp-to-big-tech route. PhD ABD, deep signal processing background — does any of that actually matter when you’re building ML systems today?

— More than people think. Engineering school doesn’t just teach you formulas — it teaches you to think in systems. When I was working with signal processing, the whole game was extracting clean, useful information from enormous streams of noisy data in real time. Turns out? That’s basically what a recommender system or attribution algorithm does. Different tools, same problem.

The other thing a serious academic background gives you is discipline around experimentation. In science, one successful run doesn’t mean anything. You need rigorous methodology. In high-load systems, that same mindset is what separates something that scales from something that collapses the moment traffic spikes.

🗣️ Let’s talk about the attribution model overhaul. Marketing attribution sounds… dry. Why was it actually a hard engineering problem?

— Ha, it only sounds dry until you realize “which ad channel actually drove this purchase” is a question worth millions of dollars. Standard attribution models take a lazy shortcut — they give all the credit to the last click. Reality is messier. Users interact with a brand across multiple touchpoints before buying, and you need probabilistic models to figure out how to fairly distribute credit across all of them.

The real pain wasn’t the math, though. It was the time. Our existing calculations took six hours to run. Marketing can’t wait six hours — they need to know what’s working now. So we rebuilt the algorithms and the data processing architecture from scratch.

Result: calculation time dropped from 6 hours to 30 minutes. That’s a 12x improvement. And because we now had accurate attribution, we stopped spending budget on channels that looked good under the old model but were actually doing nothing. That math converts directly into saved money.

🗣️ 12x faster means the infrastructure underneath had to change too, right?

— Completely. We moved from manual process management to fully automated pipelines — MSSQL and BigQuery feeding into Airflow and Kafka, through to Athena and S3. Before that, a lot of processes needed human intervention, which introduced both delays and human error.

After the automation: manual workload down 80%, data latency reduced by 35%. And crucially, the architecture scales — when load increases, the system just uses more resources instead of falling over.

🗣️ The support chatbot is another wild one. 95% of conversations automated, accuracy of 0.95+. How?

— The core problem with most chatbots is context blindness. We built a model that recognizes 35 distinct user intents — so it knows the difference between someone asking for a refund and someone with a technical issue, and responds appropriately.

But the part I’m genuinely proud of isn’t just the accuracy — it’s the security layer. We work with large language models, and you simply cannot feed raw user data into them. So before any message hits the neural network, it goes through an anonymization filter. Personal data stripped, then processed. Users get fast, intelligent responses; their privacy stays intact; we stay compliant.

It’s a security gateway built into the core architecture, not bolted on afterward.

🗣️ You’ve made model monitoring a standard across your ML ecosystem. Why is that a bigger deal than it sounds?

— Because of something called data drift. A model trained on last month’s user behavior can quietly start making bad decisions this month — and without monitoring, you might not notice for weeks. You’re just losing money while the algorithm degrades and nobody’s looking.

Our monitoring system surfaces problems in real time. If recommendation quality dips or classification accuracy drops, engineers get notified immediately, and not via angry user tickets. We cut problem detection time by 60% and reduced model retraining prep time by 40% because the system automatically flags what’s changed and needs attention.

It turns ML maintenance from constant firefighting into something you can actually plan and manage.

🗣️ You manage a cross-functional team now — ML engineers, MLOps, backend devs. How do you get them to actually work well together?

— The classic failure mode is communication gaps. An ML engineer builds something great; a backend dev has no idea how to integrate it. You end up with handoff chaos.

We fixed that with a clear responsibility matrix and KPIs — everyone knows their scope and who to go to for what. We also standardized documentation and onboarding. New hires used to spend weeks just trying to understand the architecture. Now that it’s all documented properly, onboarding time is down 40%.

Managing a team is genuinely similar to managing a distributed system: reduce uncertainty, define interfaces clearly, and things get faster and more reliable.

🗣️ Last one: what should engineers focus on if they want to build things at this level?

— The trend is toward easier model access but more complex orchestration. MLOps is becoming its own discipline, and the engineers who understand the full stack — cloud, databases, security, and the models — are going to be the most valuable people in the room.

But honestly, the thing that hasn’t changed and won’t change is the fundamentals: statistics, linear algebra, algorithmic thinking — these are constant. Tools change every six months. The principles don’t. Learn those, and you can adapt to anything.

Also: always ask what the business impact is. The attribution project wasn’t about doing fancy math — it was about optimizing a marketing budget. When your team understands that, the quality of every decision they make goes up.


Andrei leads the ML team at Social Discovery Group. If you want to geek out about ML infrastructure, data pipelines, or what it actually takes to get a model to production — join our team! Find SDG Careers — https://socialdiscoverygroup.com/vacancies/


Are Silos Preventing Your Back Office and Field Teams From Collaborating?

2026-04-14 11:24:36

Siloed working environments wreak havoc on team collaboration. If you’re constantly experiencing project delays and putting out data fires, your business could be dealing with an ongoing silo.

Caused by teams or systems operating in isolation, silos can make it difficult to share information across the business. This is especially challenging if your business requires field engineers to operate away from back office teams.

In 2026, 83% of US business leaders reported that silos exist within their companies, with more than 97% saying they impact team performance.

The question is, how do you identify a silo and break its barriers before it’s too late?

How to Identify a Silo Between Back Office & Field?

Before you fix any potential silos, you must first identify where they are coming from. Industries such as construction and engineering often face the greatest barriers due to the need to manage dispersed field and office-based teams across multiple locations.

This makes it difficult to collaborate daily, especially if the business relies on manual or disconnected systems.

When managing projects in 2026, businesses should choose software that centralizes data collection, automates workflows, and offers every team real-time visibility.

If your business is yet to invest in a centralized project management system, here are some top indicators that a silo may be creating friction between back-office and field teams:

You’re Working With Disconnected Systems: One of the first signs that your team may be dealing with an internal silo is the lack of a centralized project management system. If your field team is working on mobile devices, but your back office is still booting up a legacy desktop system, you instantly create room for collaboration challenges. If your systems don’t automatically sync, teams must manually enter and update spreadsheets, checklists, and forms. This heightens the risk of human error and slows down the collaboration process.

You Have Poor Cross-Team Communication: If information is traveling too slowly or not at all between field engineers and back-office workers, communication breaks down. Teams are more likely to become segregated, adopting an ‘us vs them’ mentality, often resulting in information hoarding and poor team collaboration.

You Constantly Have Duplicate Work: If there is a lack of communication between teams, there will also be a lack of visibility into each other’s actions. For example, while a field worker may complete a safety check form before a visit, a back-office employee may unknowingly produce the same form, wasting both time and resources. Without a transparent system in place, duplicate work becomes a regular occurrence, which is a key sign that your team is dealing with an internal silo.

Your Goals Are Misaligned: If your back-office and field engineers are not communicating regularly, goals quickly become misaligned. For example, in the back office, where business costs are calculated, the team may prioritize cost reduction, while field engineers, trying to complete as many jobs as possible, are more likely to prioritize speed, leading to conflict rather than a shared outcome.

You’re Delivering Poor Customer Service: One of the most concerning signs of an internal silo is poor customer service. When teams work across multiple, disconnected systems, it’s always the customer who suffers. Whether it’s a field worker struggling to access customer data in the back office or an office employee unable to provide real-time updates on live jobs, the customer ends up with poor service and conflicting information.

In 2026, disparate data tools and poor communication between teams prevent real-time decision-making, which 74% of managers claim complicates their ability to manage projects effectively and assess their success.

Large silos will result in poor team productivity and a loss of profitable customer relationships if your business continues to bury its head in the sand.

How to Break the Silos Before It’s Too Late?

Breaking down the barriers of a long-standing internal silo is not an overnight job. It requires teams to shift away from an ‘us vs them’ culture and start focusing on shared success.

The way to overcome silos is to integrate all aspects of the business into one centralized, mobile-friendly platform. This immediately puts every worker on the same page, enabling back-office and field employees to access the same documents, data, and project updates in real time.

With transparency at the forefront of operations, it becomes easier to create shared goals, eliminate time spent duplicating work, and facilitate cross-functional collaboration.

Going beyond the tech, encouraging role sharing could also build stronger connections between teams. Have your back-office staff visit the field a few times to understand their challenges, and vice versa. This is key if you want to build empathy and improve workflows long-term.

Organizations that take action now will not only be better equipped to handle future challenges but also foster a team that is happy to help the business reach a shared vision. Those that fail to address their internal silos risk falling behind competitors in an increasingly connected business landscape.