2026-05-02 10:34:12
Rev A took board surgery to power on. Then I hit an RX line that refused to go LOW. Then I noticed a third defect I never wrote up: the differential current sensing on the OPA wasn't actually differential. Rev B is the respin that fixes all three, plus a handful of features I was going to need anyway.
If you're new here, OpenServoCore is my effort to turn cheap MG90S-class servos into networked smart actuators with sensor feedback, cascade control, and a DYNAMIXEL-style TTL bus. The CH32V006 dev board is the firmware development platform for this project. Rev B is the second revision of that board, routed this week and ready to fab.
Status: designed, not fabricated. I've reviewed Rev B carefully and don't expect another Rev A-scale surprise. But the hardware hasn't been built or validated yet. If you want to fab one yourself, wait for the bringup post. Or fab it at your own risk knowing the design is unproven.
The big-ticket items:

Fixed: the VDD/VCC swap, the top-row test-point silkscreen, the encoder connector labels, the JST-PH battery polarity, and the TX_EN / UART RX contention.

Added: true differential current sensing, an external NTC connector, a hobby-servo header, a dual-mode (TIM2/ADC) encoder input, a dedicated VSYS sense divider, LinkE +5V as a power input, and a logo plus a docs QR code.
KiCad project: hardware/boards/osc-dev-v006. Full revision delta in the CHANGELOG.
The fixes are the boring half of the story. They're show-stoppers when they're broken, and invisible when they're right. Five things.
VDD / VCC swap. This is the Rev A bug, the one that caused the MCU surgery saga. Rev B has the schematic right. A fresh chip will power on without magnet wire.
Top-row test-point silkscreen. Every label was wrong on Rev A (the photo lives in the TX_EN debug post). The labels are now what the underlying nets actually are.
Encoder connector labels. Same story, smaller surface area.
JST-PH battery polarity. Was reversed. Now it isn't.
TX_EN / UART RX contention. This one came out of the scope-debug session where I figured out the half-duplex buffer was actively driving RX. Rev B adds JP2, a jumper that routes the MCU's RX either through the TTL buffer (DXL mode, normal operation) or floats it (LinkE plain-UART mode, with the LinkE driving RX directly via J4). The point is that UART now works on a bare chip with no firmware running, which is the property that wchisp and similar tools actually require. Recovery-path peripherals shouldn't depend on firmware to function. A jumper is a small price for that property.
These are the additions, with the differential current sensing piece up front because it's the one Rev A literally can't do. The rest are features I folded in while the respin was happening anyway, mostly aimed at letting the dev board double as a system-ID rig later (motor constants, stock-servo characterization, that kind of thing). Same MCU, same motor driver, same shunt, same DXL bus. One board, three roles. The marginal cost of each was small enough that it would have been silly not to.
This is the Rev A defect I haven't written up until now, and arguably the most serious one. ISNS- was routed to OPN0, which isn't a valid negative input for the V006's differential opamp. The only configuration the silicon would actually accept was one where the OPA's negative input was internally tied to ground. Which means the Kelvin sense traces on the GND side of the 10 mΩ shunt were doing nothing. The board was effectively single-ended, sensing the shunt against silicon GND, and could only see current flowing one direction. For a front end whose whole job is to feed a cascade current loop, that's a fatal flaw.
Rev B fixes this. ISNS- is now on OPN2 instead of OPN0. ISNS+ stays on the OPA positive side via OPP0. Because OPN2 shares its package pin with nRST, I had to remove the reset button and free up the pin for analog use. This means the USER option byte gets programmed to disable reset-on-NRST at provisioning. See Pin Remap for the full domino chain that hangs off this one decision.
What this buys is worth the trade-off, though. Three things, in order of how big a deal each is.
The first is accuracy. The Kelvin sense traces I'd laid out on either side of RS1 finally do their job: the OPA measures the voltage across the shunt differentially, instead of single-ended with the negative input bonded to ground. Without the Kelvin traces in play, the reading is polluted by voltage drops along the GND return path.
The second, and the bigger deal, is bidirectional visibility. Rev A could only see current flowing into the motor. With the differential OPA setup, the board can also see current flowing the other way: back-EMF during deceleration, regen, freewheeling current through the diodes during PWM off-time. All of these produce reverse current through the shunt, and a correct PI current loop needs the signed integral of that signal, not a clipped one. I'm sure more uses will surface as the firmware gets real (better state estimation, that kind of thing); back-EMF and regen are the ones I can name with confidence today.
The third is that this is basically free. The shunt and the MCU's internal opamp support this with no extra BOM. The only cost is the hardware reset feature, which I can live without.
The OPA output also feeds CMP2, the V006's internal comparator. The comparator can be configured to raise a fault on overcurrent and stop PWM generation without firmware runtime involvement. Even if the main loop is hung, the trip still fires through silicon.
With a 10 mΩ shunt, 32x gain on the PGA in self-biased differential mode (VBEN=1, VBSEL=1), and the DRV's 4 A peak, the shunt drops ±40 mV at full-scale current. The OPA output sits at a ~1.44 V bias and swings ±1.28 V around it, putting the absolute range at 0.16 V to 2.72 V, well inside the ADC's 3.3 V range. The CMP2 negative side can be configured to trip at ~85% of VDD, which is ~2.8 V. So while it won't trip on ordinary overcurrent on this development board (the DRV8212P self-limits at 4 A first), it CAN detect motor shorts and catastrophic current surges. The dev board is designed to be flexible for testing different kinds of servos, while still offering some protection against letting the magic smoke out.
For swap boards, the shunt can be sized so that the fault threshold of the specific servo lands near the top of the OPA range. Same role as on the dev board: catch hardware failure, not normal operation. For example, the SG90 datasheet specs a 650 mA stall, but inrush can spike well above that on startup, so we want the trip set with margin, somewhere around ~1.2 A (~2× datasheet stall) to ride above normal operation while still catching motor shorts and jams beyond mechanical stall. With a 35 mΩ shunt (a common 1206 1W value, easy to source), the OPA swings ±1.33 V around the 1.44 V bias at 1.2 A, putting the output at ~2.77 V. That's right at the CMP2 trip threshold of ~2.8 V (VBCMPSEL=10). Normal stall (650 mA) lands at ~2.16 V on the OPA output, comfortably below the trip. Bidirectionally, the ADC sees from −1.30 A (negative rail) to +1.67 A (positive rail) before saturation, plenty of headroom for back-EMF and regen.
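As a worked sketch of that sizing step (same 32x gain, 1.44 V bias, and ~2.8 V trip threshold as above), the shunt value falls out of one equation:

$$R_{\text{shunt}} = \frac{V_{\text{trip}} - V_{\text{bias}}}{G \cdot I_{\text{trip}}} = \frac{2.8\,\mathrm{V} - 1.44\,\mathrm{V}}{32 \times 1.2\,\mathrm{A}} \approx 35\,\mathrm{m\Omega}$$

Plugging the chosen 35 mΩ value back in reproduces the numbers above: a ±1.33 V swing at 1.2 A, and ~2.16 V at the 650 mA datasheet stall.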
The onboard NTC (TH1) measures ambient board temperature. That was useful as a stand-in for the V003's lost internal temperature sensor in the first version. It's not what you want during motor-stress testing, where the temperature that matters is the temperature of the motor windings. Rev B adds an external NTC connector (J6) and a jumper (JP1) to pick between the two. Onboard for ordinary firmware-dev work, motor-attached for characterization runs.
A standard 1×3 hobby servo header, driven from IN1. With this connector, the encoder, and the right firmware, it opens up servo characterization: positional accuracy, slew rate, error percentage on repeated sweep / return-to-center tests. I have some fun ideas for a rig based on this board that automatically IDs a servo while streaming the data over the same single-wire UART protocol.
ENCA and ENCB are now pin-mapped to both TIM2 and the ADC. Same 2×2 header, two acquisition modes, picked in firmware. The reason for this is that I didn't want to commit the dev board to one encoder strategy.
In TIM2 mode, the connector takes a standard digital quadrature encoder, magnetic or optical, and counts edges in hardware with no MCU overhead. This is the default mode for motor-speed feedback during motor-ID runs and for any off-the-shelf encoder integration.
In ADC mode, the same pins are sampled by the ADC, which means the connector also accepts ratiometric analog encoders. The specific use case in my head is something like the IR-quadrature sensor stack used in Adam Bäckström's ServoProject, where you read sin/cos directly off a pair of photodiodes, do sub-count interpolation in firmware, and end up with very high effective resolution from a homebrew flex-PCB sensor. Backlash-compensation work and the higher-precision experiments live here.
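The sub-count interpolation step itself is compact. With V_off as the sensor's mid-scale bias (a property of the sensor stack, not of this board), the angle within a quadrature count is:

$$\theta = \operatorname{atan2}\left(V_{\sin} - V_{\text{off}},\; V_{\cos} - V_{\text{off}}\right)$$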
Rev A inferred the system voltage from VSNA and VSNB (the motor-terminal sense dividers). That's fine when the motor is running, but it breaks when it's idle. Rev B adds a dedicated VSNS divider so the board measures VSYS directly regardless of what the motor driver is doing.
The WCH-LinkE programmer has a +5V line to the target board. With nRST gone, that header pin is repurposed in Rev B as a power input, gated by an SS54 into the same OR network as USB-C, the JST battery, and the screw terminal. The LinkE can now be the sole power source for the board during development; no separate USB-C cable needed.
I kept putting off a logo for OpenServoCore, partly because I'm not much of a graphic designer, and partly because PCB and firmware tasks kept winning the priority fight. But I wanted a nice-looking logo on the Rev B PCB, so I finally designed one this week, and it came out better than I expected.
The logo purposefully uses a two-tone design: white silkscreen for the OSC letters themselves, and the gold of exposed copper for the finer decorative symbols. It works surprisingly well for a non-designer's effort.
There's also a small QR code on the silkscreen that points at the docs page for this board, another thing I'd been putting off. This is a board meant for other people, and "scan to read the docs" is pretty much the standard these days.
The pin map is where the rest of the design rearranges itself around the OPN2 fix. Five things changed.
ISNS+ / ISNS- → OPP0 / OPN2. Fixes the Rev A defect where ISNS- was on OPN0 and the OPA could only run single-ended. See Differential current sensing for the why.
nRST removed. It shares a pin with OPN2, so the OPA fix forced this. The USER option byte gets programmed at provisioning to disable reset-on-NRST. The reset button and its RC debounce network are gone with it. Net pin budget: minus one reset, plus one working differential current sense.
ENCA / ENCB selectable between TIM2 and ADC. Same connector, two acquisition modes, picked in firmware. See Dual-mode encoder input for the why. Pin-map cost was finding channels that have both TIM2 capture and an ADC channel attached, which constrained the rest of the placement more than I expected.
STAT LED moved to a TIM1 channel. This gives the LED PWM brightness and pattern control instead of binary on/off. One of the remaining free pins happens to carry TIM1 CH4 under remap 7, so the STAT LED lives there now, enabling brightness control, breathing patterns, and so on.
Board renamed. servo-dev-board-ch32v006 → osc-dev-v006. Matches the project's naming convention now that the board family is filling in.
KiCad project lives in the OSC monorepo under hardware/boards/osc-dev-v006. The full revision delta is in CHANGELOG.md inside that directory.
There is one more note on component availability. When the first revision went out, JLCPCB had no CH32V006F8P6 stock at all and PCBWay had to source the chip externally. Now both JLCPCB and LCSC have restocked the part. At the time of writing, there are a couple thousand of them, so if you are confident enough to fabricate one, it should be doable via JLCPCB now.
I don't expect any more serious issues with Rev B. It should just be plug and play. My plan is to order Rev B, do one more pass of validation, and start writing code for the OSC firmware. Most of the bringup work is already done on Rev A. When Rev B arrives, I just need to tweak the register setup a bit and it should be good to go.
The focus now shifts to firmware, fingers crossed.
2026-05-02 10:28:39
This chapter adds a sampling loop that generates new names from the trained model, building on Chapter 11 (the trained model).
After training, the parameters are frozen. We start with the BOS token, feed it through the model, get a probability distribution over next tokens, sample one, feed it back in, and repeat until the model produces BOS again ("I'm done") or we hit the maximum length. Same generation loop from the bigram chapter. Only the source of the probabilities has changed.
// --- FullTraining.cs (add below the training loop from Chapter 11) ---
const double Temperature = 0.5;
Console.WriteLine("\n--- inference (new, hallucinated names) ---");
for (int sampleIdx = 0; sampleIdx < 20; sampleIdx++)
{
List<List<Value>>[] keys = model.CreateKvCache();
List<List<Value>>[] values = model.CreateKvCache();
int tokenId = tokenizer.Bos;
var sample = new StringBuilder();
for (int posId = 0; posId < maxSequenceLength; posId++)
{
List<Value> logits = model.Forward(tokenId, posId, keys, values);
var scaledLogits = logits.Select(l => l / Temperature).ToList();
List<Value> probabilities = Softmax(scaledLogits);
double r = random.NextDouble();
double sum = 0;
int nextToken = -1;
var probabilityValues = probabilities.Select(p => p.Data).ToList();
// Softmax probabilities can sum to slightly less/more than 1 due to floating point.
// Rescale r into the actual total so we never fall off the end of the loop.
double totalProb = probabilityValues.Sum();
r *= totalProb;
for (int i = 0; i < probabilityValues.Count; i++)
{
sum += probabilityValues[i];
if (r <= sum)
{
nextToken = i;
break;
}
}
if (nextToken == -1)
{
nextToken = probabilityValues.Count - 1;
}
tokenId = nextToken;
if (tokenId == tokenizer.Bos)
{
break;
}
sample.Append(tokenizer.Decode(tokenId));
}
Console.WriteLine($"sample {sampleIdx + 1, 2}: {sample}");
}
Notice how model.CreateKvCache() replaces the manual array-initialisation loop we would have needed. The model knows how many layers it has; the caller doesn't need to.
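For contrast, the manual version would have looked something like this, a sketch of what the helper hides, using the layerCount variable from FullTraining.cs:

```csharp
// What model.CreateKvCache() saves us from writing by hand:
List<List<Value>>[] keys = new List<List<Value>>[layerCount];
for (int layer = 0; layer < layerCount; layer++)
{
    keys[layer] = new List<List<Value>>();  // one empty per-layer cache
}
```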
Temperature controls the "creativity" of generation. Before softmax, we divide each logit by the temperature:
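$$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

With T < 1 the distribution sharpens around the most likely tokens; with T > 1 it flattens toward uniform; T = 1 reproduces the plain softmax. (The code above uses T = 0.5.)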
After training for 10,000 steps, the model generates plausible-sounding names like:
sample 1: kamon
sample 2: ann
sample 3: karai
sample 4: jaire
sample 5: vialan
These names don't exist in the training data. The model has learned the statistical patterns of names (consonant-vowel patterns, common endings, typical lengths) and is generating new examples from that learned distribution.
From the model's perspective, your conversation with ChatGPT is just a funny-looking "document". When you type your prompt, you're initialising the document. The model's response is a statistical document completion: the same next-token prediction we've built here, just at enormously larger scale with post-training layered on top to make it conversational.
dotnet run -c Release -- full
(Or just dotnet run -c Release. The dispatcher defaults to full when no chapter is given.)
The full training run (10,000 steps) typically takes 5-15 minutes on a modern CPU in Release mode, and much longer in Debug mode. Always use -c Release for training. The per-step loss bounces around, but watch the avg column. That's the running average, and it should trend downward from ~3.3 toward ~2.37. Every 1000 steps, a [milestone] line prints the current avg alongside its value at the previous milestone. The running average is smooth but not monotonic, so expect the occasional milestone-to-milestone wobble even while the overall trend is down. After training, you'll see 20 generated names.
Now that you have a working model, here are some experiments worth running. Each one isolates a single variable so you can see its effect clearly.
Add a second transformer block. Change layerCount from 1 to 2 in FullTraining.cs. The parameter count roughly doubles, but watch the loss. A second block lets the model refine its representations further. The comment in the code already hints at this.
Increase the sequence length. Change maxSequenceLength from 8 to 16. This lets the model see full-length names during training instead of truncating them at 7 characters. Training takes roughly twice as long, but you should see longer and more varied names during inference.
Experiment with temperature. Try temperature values of 0.1, 1.0, and 2.0 in the inference loop. At 0.1 the model plays it very safe (repetitive but well-formed). At 2.0 it gets creative (diverse but sometimes incoherent). Compare the outputs side by side to build intuition for how temperature shapes generation.
Remove RMSNorm. Comment out the RmsNorm calls in Model.cs and retrain. Watch what happens to the loss. Does it still converge? Does it converge more slowly, or does it blow up entirely? This shows you what normalisation is actually doing for training stability.
Swap the nonlinearity. Replace xi.Relu() in the MLP block with something else. Try xi * xi (squaring), or even just remove the nonlinearity entirely (delete the ReLU line). The loss will tell you how much the choice of activation function matters at this scale.
The course code above prioritises clarity over speed. Martin Skuta's C# MicroGPT repo (linked in Credits) includes several C#-specific optimisations worth understanding once the concepts are solid:
Replace LINQ in Hot Paths. The course code uses .Select(...).ToList() and .Sum() throughout (for example, the ReLU step in the MLP block and the temperature-scaling and sampling code in inference). LINQ allocates an enumerator and a closure delegate on every call, which adds up quickly in a training loop running millions of operations. Rewriting these as plain for loops that append to a pre-sized List<Value> is the first optimisation to reach for - it's mechanical, preserves readability, and typically gives a noticeable speedup before you ever touch SIMD. A sketch of this rewrite follows the list.
SIMD Vectorisation. The Value.Dot method in the repo uses System.Numerics.Vector<double> to process multiple elements per CPU instruction. This gives a significant speedup for the dot products that dominate the computation.
Iterative Backward Pass. We already used this: the explicit Stack<T> instead of recursion. This avoids stack overflow on deep graphs and eliminates function call overhead.
Zero-Allocation Hot Paths. The repo's Value.Dot pre-allocates the _inputs and _localGrads arrays once per node instead of creating intermediate Value objects for each multiply-and-add. This keeps garbage collection pressure low during training.
Backward Loop Unrolling. The Backward method can special-case nodes with 1 or 2 inputs (which covers ~99% of the graph: Add, Mul, ReLU, Pow) to avoid loop setup overhead.
Parallel Gradient Reset. Parallel.ForEach(paramsList, p => p.Grad = 0) uses multiple cores to zero out gradients.
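As a sketch of the first item, here is the temperature-scaling line from the inference loop rewritten without LINQ (same Value type and variables as above; behaviour unchanged):

```csharp
// LINQ version: allocates an enumerator and a closure delegate per call
var scaledLogits = logits.Select(l => l / Temperature).ToList();

// Plain-loop version: pre-sized list, no closure, identical output
var scaled = new List<Value>(logits.Count);
for (int i = 0; i < logits.Count; i++)
{
    scaled.Add(logits[i] / Temperature);
}
```

Same output either way; the loop version simply avoids the per-call enumerator and delegate allocations described above.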
These optimisations don't change the algorithm. They just make the same computation run faster. When studying the code, understand the clean version first, then read the optimised version as "the same thing, but faster".
Everything in this course is the algorithmic essence of how LLMs work. Between this and a production model, nothing changes in the core algorithm. What changes is scale and engineering:
| Aspect | MicroGPT | Production LLM |
|---|---|---|
| Data | 32K short names | Trillions of tokens of internet text |
| Tokenizer | 1 char = 1 token (27 tokens) | BPE subwords (~100K tokens) |
| Computation recorder | Scalar Value objects | Tensor operations on GPUs |
| Parameters | ~4,000 | Hundreds of billions |
| Training | 1 document per step, CPU | Millions of tokens per step, thousands of GPUs |
| Architecture | ReLU, learned position embeddings | GeLU/SwiGLU, RoPE, GQA, MoE |
| Post-training | None | SFT + RLHF to make it conversational |
If you understand what you've built in this course, you understand the algorithmic essence. Everything else is efficiency.
| Term | First Appears | Definition |
|---|---|---|
| Value | Ch. 1 | A wrapper around a double that records how it was computed, enabling automatic gradient computation |
| Computation recorder | Ch. 1 | What we call the Value class and its Backward method collectively. The ML community typically calls this an "autograd engine" (short for automatic gradient computation) or "automatic differentiation". Same thing, different name |
| Gradient | Ch. 2 | How much the final loss would change if you nudged a particular value. Stored in Value.Grad |
| Backward / Backpropagation | Ch. 2 | The algorithm that computes gradients by walking the computation graph in reverse |
| Chain Rule | Ch. 2 | The calculus rule that lets you multiply rates of change along a path |
| BOS | Ch. 3 | Beginning of Sequence token, a delimiter marking where documents start and end |
| Token | Ch. 3 | A discrete symbol (in our case, a character) that the model processes |
| Bigram | Ch. 4 | A model that predicts the next token using only the current token |
| Logits | Ch. 5 | Raw, unnormalised scores output by the model, one per vocabulary token |
| Softmax | Ch. 5 | A function that converts logits into a probability distribution |
| Linear | Ch. 5 | A matrix-vector multiplication, the fundamental learned transformation |
| Embedding | Ch. 6 | A learned vector associated with each token or position |
| Cross-Entropy Loss | Ch. 6 | The specific formula for computing the loss: -log(probability of correct token). This is the "loss" from the Big Picture |
| Adam | Ch. 7 | An optimiser that uses momentum and adaptive learning rates |
| RMSNorm | Ch. 8 | Normalisation that rescales a vector to unit root-mean-square |
| Residual Connection | Ch. 8 | Adding a layer's input to its output, creating a gradient highway |
| Attention (self-attention) | Ch. 9 | The mechanism where tokens compute relevance scores and exchange information with other tokens in the same sequence |
| Causal attention | Ch. 9 | Attention where each token can only look at positions before it, not ahead |
| Query / Key / Value (Q/K/V) | Ch. 9 | Three projections of each token used in the attention computation |
| KV Cache | Ch. 9 | Stored keys and values from previous positions, enabling efficient sequential processing |
| Head | Ch. 10 | One independent attention computation operating on a slice of the embedding |
| MLP | Ch. 10 | A two-layer feed-forward network for per-position computation |
| Transformer Block | Ch. 10 | Attention + MLP, each with RMSNorm and residual connections |
| Temperature | Ch. 12 | A scaling factor that controls the randomness of generated text |
These are the primary sources behind the claims and concepts in this course. If you want to verify something, dig deeper on a topic, or just see where the ideas originally came from, start here.
The Transformer Architecture
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). "Attention Is All You Need."
https://arxiv.org/abs/1706.03762
The paper that introduced the transformer. Our model uses the same core structure: multi-head self-attention and feed-forward layers on a residual stream.
The Adam Optimiser (Ch. 7)
Kingma, D. P., & Ba, J. (2014). "Adam: A Method for Stochastic Optimization."
https://arxiv.org/abs/1412.6980
The original paper describing the momentum, adaptive learning rate, and bias correction that our training loop uses.
RMSNorm (Ch. 8)
Zhang, B., & Sennrich, R. (2019). "Root Mean Square Layer Normalization." Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
https://arxiv.org/abs/1910.07467
The paper proposing RMSNorm as a simpler alternative to LayerNorm. Our RmsNorm function implements the core idea from this paper.
GPT-2 (parameter count reference)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). "Language Models are Unsupervised Multitask Learners."
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
The GPT-2 paper. The largest GPT-2 variant had 1.5 billion parameters.
Numerical Gradient Checking (Ch. 1)
PyTorch's torch.autograd.gradcheck function, which verifies analytical gradients against numerical finite differences:
https://docs.pytorch.org/docs/stable/generated/torch.autograd.gradcheck.gradcheck.html
This is the same nudge-and-measure technique used in our GradientCheck.cs.
Karpathy's micrograd video (Ch. 1-2)
Karpathy, A. (2022). "The spelled-out intro to neural networks and backpropagation: building micrograd."
https://www.youtube.com/watch?v=VMj-3S1tku0
A 2.5-hour walkthrough of the Value class and backpropagation. If you want a video companion to Chapters 1 and 2, this is it.
Karpathy's microgpt blog post
Karpathy, A. (2026). "microgpt."
https://karpathy.github.io/2026/02/12/microgpt/
The blog post that accompanies the original Python implementation. Covers the same progression as this course with additional mathematical detail.
This course is built on the work of others. It wouldn't exist without them.
Andrej Karpathy created the original microgpt - a 200-line Python implementation that distills a GPT into its bare algorithmic essence. The pedagogical progression used in this course (bigram -> MLP -> attention -> full transformer) follows the approach Karpathy developed across multiple projects including micrograd, makemore, and nanoGPT. His blog post and accompanying YouTube videos were invaluable references for the explanatory content throughout this course.
Martin Skuta (@martinskuta) wrote the C# implementation of microgpt that this course is based on. His translation from Python to C# - including the SIMD-vectorised dot product, the iterative backward pass, and the zero-allocation optimisations - demonstrated that the algorithm translates cleanly to .NET with no external dependencies. The Value class, the Gpt function structure, and the parameter dictionary layout in this course all derive from his work.
Jonas Ara (@jonas1ara) contributed the F# translation of the C# implementation to the same repository.
The training dataset (names.txt) is from Karpathy's makemore project.
This course was created and refined by Gary Jackson with the assistance of Claude (Anthropic). Gary provided the creative direction, pedagogical priorities, and iterative feedback that shaped the course structure, while Claude drafted and revised the content, code examples, and explanations.
2026-05-02 10:24:52
Let me say something that will offend people: some projects in China's AI scene are redefining the term "open source". They write READMEs like epics, yet don't dare release a single piece of raw data.
This is not a technology gap. It is a sincerity gap.
Foreign AI open-source projects play the game of delivering the goods. What does delivering mean?
You say you open-sourced a model? Fine, hand over the data. Every line of JSON in the training data, every CSV, all of it. When EleutherAI released The Pile, 800 GB of raw text, they even wrote the download scripts for you, precisely because they were afraid you might fail to reproduce it. When LAION released its image-text datasets, they published not only the data but also the scripts used to filter out NSFW content. The logic is simple: open source without the data is like selling a car without the engine. What, you expect me to push the damn thing?
Now look at certain domestic projects, which play the game of handing in homework. What does that mean?
You click in: the data/ folder is empty. No raw corpus, no training data, no annotation files.
Not a single grain of rice in the pot, but the README has already recited the full menu of an imperial banquet.
Foreign papers tell you: "this data was generated with GPT-3.5 and has biases, please take note." You know what? Daring to show your weak spots is what real professionalism looks like. And ours?
The README is wall-to-wall "distillation", "cognitive operating system", "five-layer extraction", "triple verification". Not one sentence in plain language, but every sentence sounds like "I am your father."
Foreign reproduction scripts run from data cleaning to training to evaluation, not a line missing. Why? Because they are afraid you won't be able to reproduce the work. And ours?
A single SKILL.md with a few prompts and a few chat screenshots. Why? Because they are afraid you actually will reproduce it. One reproduction and the whole thing falls apart.
The baseline of open source is not pushing your code to GitHub.
It is stripping your data bare and standing in front of everyone saying: look, this is what I am. If you don't even dare to do that, don't lecture me about the "open-source spirit."
Let's dissect one recently red-hot project as a specimen.
16.7k stars, 2.7k forks, "Nüwa creating humans", "soul distillation", "cognitive operating system extraction". The README alone could be published as a fantasy novel. Click in and the repository structure is tidy: there is an examples/, a references/, a skill, documentation, everything. Everything except data/.
No raw corpus, no training data, no annotation files. The example folders for the 13 supposedly "distilled" personas contain tuned system prompts and chat screenshots. You claim "six-way parallel collection"; where are the collected artifacts? You claim "triple verification"; where are the verification records? Your agents ran all those rounds of searches; where are the raw outputs?
All you have is a book report an AI wrote from Google search results. You call that data? That is a search-engine summary. It doesn't even qualify as a literature review.
Funnier still, the academic teams doing serious research on "cognitive modeling" and "digital persona distillation" always start with data collection and annotation. They list everything in detail: which search queries they used, which sources they excluded, what the inter-annotator agreement (Cohen's kappa) was. You don't have so much as an inter_annotator_agreement.csv or a cohen_kappa.txt, and you have the nerve to call this "extracting a cognitive operating system"?
It is like someone writing a book titled A Methodology for Building Cars: gorgeous prose, beautiful illustrations. Open it up: no engine blueprints, just glamour shots taken at a dealership. You ask, where is the engine? He says: "Look how beautiful this car is." You ask, where is the parts list? He says: "I photographed it via six parallel dealership visits and triple-verified it. This car definitely exists."
Who the hell asked whether it exists? I asked for the parts list.
Fine, let me take one more step back. You say you have no dataset because the data is hard to collect? The anti-scraping is too fierce? All right, I'll take your word for it for now.
Then explain this to me. Musk's Twitter data has a complete dataset on Zenodo: structured, timestamped, sentiment-annotated, publicly downloadable, with a citable DOI.
Another version on Hugging Face has been downloaded 130,000 times. One click to download, legal, free, no battle of wits with anti-scraping.
Jobs' public interviews, launch-event Q&As, and first-hand biography material have already been organized for you by academics: transcribed, cleaned, timestamp-aligned, stage-annotated, all done. The corpus is sitting right there. One click to download.
Did you use these datasets?
No.
What did you do? You sent an AI on a round of Google searches and condensed the results into a few paragraphs of book report. You call that data?
A complete Musk tweet dataset means this: from the day he joined Twitter in 2009 until today, every tweet's full text, publication time, retweet count, like count, whether it was deleted, whether it was edited, all laid out structurally across hundreds of thousands of CSV rows. Download it and you can filter, aggregate, and run time-series analysis in Excel. That is data.
And you? "Collected with six parallel agents." Translated into plain language: you had an AI Google six keywords and glued together the summaries of the top ten results. That little pile, and you call it "collection"? It is less material than an undergraduate gathers for a term paper.
A complete Jobs corpus means this: every public conversation, launch Q&A, magazine interview, and verified first-hand quote from the biographies, 1976 through 2011, with each entry annotated by year, Jobs' age at the time, context type, and source credibility. Download it and you can see how he answered the same question in different decades, and how his thinking changed. That is data.
And you? "Triple-verified refinement." In plain language: you had an AI read through the Google results once and summarize a few "mental model tags". No raw text, no timestamps, no stage annotations, no contradiction annotations. Just a sheet of labels. Even an apple at the supermarket tells you what is on its sticker; you "distilled" an entire human being and produced a handful of tags?
The difference between these two things is the difference between ore and an archaeology report.
What academics put on Zenodo is ore. You can dig it yourself, analyze it yourself, test it yourself, and reach conclusions different from everyone else's.
What you put in examples/ is an archaeology report. Someone else finished the dig, picked a few pretty pieces for a glass case, and wrote a little card saying "Shang-Zhou dynasty."
And you are not even an archaeology report. An archaeology report at least tells you which stratum the artifact came from and what the carbon-14 dating says. Your "report" just says: "We dug up something awesome. Awesome how? No idea. It's just awesome."
With even a little thought, you would know what to do.
Drop in the Zenodo link to the Musk tweets, download the data, run a word-frequency count, and that alone would count as producing a "dataset." Link the academically curated Jobs interview transcripts, annotate a few key years, and that would count as "stage modeling." Paste the Hugging Face link with its 130,000 downloads into your README and write one sentence, "thanks to those who curated this; here is the distillation we built on top of it," and that would count as a real contribution to the open-source community.
Did you do any of this? You did not. You couldn't even be bothered to paste a link.
Because the moment you paste the link, everyone knows: oh, the data already existed; you just reformatted the output of an AI's book report. Those intimidating terms, "five-layer extraction", "triple verification", "cognitive operating system", are just a layer of prompts on top of a foundation the open-source community had already built.
Stop saying "the data is hard to collect." The data is right there: public, free, legal, already curated by someone else. There is exactly one reason you didn't lift a finger: you spent all your time writing the README.
The saddest part is not the con. It is that the con became a role model.
Curating data is grunt work. Scraping every public text a person has produced, cleaning it, timestamping it, annotating it: that is half a month at minimum. Writing a gorgeous README with mythological packaging takes one evening. When the latter is celebrated more than the former, when zero datasets and 17k stars becomes the benchmark, this community is systematically rewarding the cheapest kind of opportunism.
And the users don't care either. What they want is the illusion of "acquiring Jobs' way of thinking in one click", not an actual understanding of Jobs. What they need is the rush of posting a chat screenshot, not sitting down to slowly study the man's complexity and contradictions. Supply and demand have reached a perfect conspiracy at a low level.
And so the word "open source" has been thoroughly ruined. It no longer means handing over your results for the world to examine. It means posting a piece of gorgeous copywriting on GitHub, then waiting for KOLs to repost it, for stars to explode, for investors to come knocking.
This is not open source. This is a traffic business.
If someone genuinely wanted to build a "Jobs skill", the impressive way to do it would be this:
Publish a structured corpus covering 1976 to 2011: every public conversation Jobs had, every launch Q&A, every email that can be found, every verified first-hand quote from the biographies. Annotate each entry with the year, Jobs' age, the context type, and a source-credibility grade. Publish the contradiction annotations: if something said in 1983 clashes with something said in 2005, don't hide it; flag it and write plainly, "stage contradiction, cannot be unified." Then tell me what you extracted from this corpus, what you discarded, and why.
That is distillation. Before you can distill, you need water.
And that water is not something an AI searches up. It is what you scrape out entry by entry. It is what you spend months labeling line by line. It is what you get by gritting your teeth, finishing the unsexy, uncool, unshareable dirty work, and then serving it up.
Someone will inevitably say: didn't DeepSeek also decline to release its pretraining data?
Correct. But that is the field's heavyweights holding it to a higher standard: "You already scored 98; why not sprint for the 100 of full data transparency?" That is disappointment grounded in recognition.
Apply that standard to certain projects and you are anointing them. They are not 2 points short of 100. They start from 0; the data does not even exist.
DeepSeek delivered a complete technical system. Certain projects delivered a prompt stitched together from search-engine results. The only thing the two have in common is that neither released its raw data. But that is like saying "the only thing Musk and I have in common is that we both eat rice." A commonality like that insults Musk and means nothing.
Next time you push back on a project for having no dataset, be careful not to elevate it to a height it does not deserve. It is not worthy of the comparison.
Thanks for reading, friends! If you found this at all interesting, don't be shy: like, share, and repost!
To catch future articles as soon as they land, remember to star ⭐ the account so you don't lose track of it.
All right, that's it for today.
Win or lose, live boldly. See you next time!
2026-05-02 10:14:18
Day 30 of my 30-Day Terraform Challenge is complete.
This was the final day of the challenge, and the focus was clear: consolidate, assess readiness, and reflect on the full journey.
There was no new infrastructure to deploy today. No new AWS service to debug. No new Terraform module to refactor.
Today was about proving readiness.
After 29 days of building, testing, documenting, troubleshooting, and preparing, I used Day 30 to complete one final simulated exam, test command recall with fill-in-the-blank questions, and assess whether I am ready for the Terraform Associate certification.
GitHub reference:
https://github.com/mary20205090/30-day-Terraform-Challenge/tree/main/day_30
I completed Practice Exam 5 as my final simulated exam.
Result:
57 / 57
100%
Time taken: 45 minutes
This was the strongest possible way to end the practice exam phase.
More importantly, it confirmed that the consistency I saw on Days 28 and 29 was real. My earlier practice exams had already shown strong scores, but this final exam gave me confidence that the core concepts were sticking under timed conditions.
Across the final five practice exams, my trend looked like this:
| Exam | Score | Accuracy | Time |
|---|---|---|---|
| Practice Exam 1 | 54 / 57 | 94.7% | 55 min |
| Practice Exam 2 | 56 / 57 | 98.2% | 50 min |
| Practice Exam 3 | 56 / 57 | 98.2% | 40 min |
| Practice Exam 4 | 56 / 57 | 98.2% | 45 min |
| Practice Exam 5 | 57 / 57 | 100% | 45 min |
The trend matters more than any single score.
It showed steady performance, comfortable timing, and fewer weak spots by the final day.
After the final practice exam, I completed a 20-question fill-in-the-blank review.
Result:
19 / 20
95%
This was useful because fill-in-the-blank questions test recall differently from multiple choice.
With multiple choice, you can sometimes recognize the correct answer. With fill-in-the-blank questions, you have to retrieve it from memory.
The one question I missed was about the S3 backend encryption argument.
I answered:
server_side_encryption_configuration
The correct backend argument is:
encrypt = true
That was a good final reminder: Terraform exam prep is often about precision. server_side_encryption_configuration is related to configuring encryption on an actual S3 bucket resource. The S3 backend uses encrypt.
Small distinction. Important distinction.
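To make the distinction concrete, here is a sketch of both settings side by side (bucket and resource names are hypothetical):

```hcl
terraform {
  backend "s3" {
    bucket  = "my-tf-state-bucket"        # hypothetical
    key     = "global/terraform.tfstate"  # hypothetical
    region  = "us-east-1"
    encrypt = true  # the backend argument the exam asks about
  }
}

# Encryption on an actual S3 bucket resource is configured separately
# (AWS provider v4+), not as a backend argument:
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
```

The backend argument encrypts the state object at rest; the resource configures default encryption for a bucket you manage.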
My final readiness rating is:
Ready
The evidence is clear: 100% on the final 57-question practice exam, 95% on fill-in-the-blank recall, and five consecutive practice exams at 94.7% or higher, all finished comfortably within time.
The only final topic I want to lightly review before the real exam is backend configuration precision, especially the S3 backend's encrypt = true argument.
At this point, I do not need to learn brand-new material.
I need to stay calm, review lightly, and trust the work.
This challenge changed how I think about infrastructure.
At the beginning, Terraform was mostly a tool for provisioning cloud resources.
Now I see it differently.
Terraform is not just about creating infrastructure. It is about managing change safely.
That means building, reviewing, testing, documenting, and operating infrastructure deliberately, not just applying changes and hoping.
The biggest shift was moving from “Can I make this work?” to “Can I make this safe, repeatable, and understandable?”
That is a very different mindset.
Across the 30 days, I worked through a wide range of Terraform and AWS concepts.
Some of the major areas included remote state and state locking, reusable modules, automated testing with terraform test, provider aliases and multi-region design, plan review and safe applies, and documentation.
That list is long, but the real value was not the number of topics.
The value was seeing how the topics connect.
State affects collaboration. Modules affect maintainability. Testing affects confidence. Provider aliases affect multi-region design. Plans affect safety. Documentation affects whether someone else can understand what was built.
That is what made the challenge feel real.
The part I am most proud of is pushing through the harder practical labs.
Remote state, reusable modules, automated testing, and the multi-region high-availability architecture were not simple checklist items. They required debugging, rethinking, and patience.
The multi-region architecture especially stood out because it brought many earlier lessons together: remote state, reusable modules, provider aliases, and careful plan review.
It felt like the kind of work that moves Terraform from tutorial knowledge into real engineering practice.
The biggest lesson is that Terraform is not just syntax.
Syntax matters, but the deeper skill is judgment.
Terraform asks you to think about how state is stored, how providers behave, what a plan will actually change, and what happens when something fails.
That is why hands-on practice matters so much.
Reading about Terraform is useful, but building with it every day exposes the real lessons: backend bootstrapping, state locking, provider behavior, naming limits, account restrictions, cleanup order, and the difference between a clean plan and a safe deployment.
Next, I will take the Terraform Associate certification exam.
After that, the goal is to keep applying these skills in real projects.
I want to continue building infrastructure that is safe, repeatable, and understandable.
This challenge gave me a strong foundation, but the real value comes from continuing to use it.
Day 30 confirmed that I am ready for the Terraform Associate exam.
But more than that, it marked the end of a challenge that changed how I think about infrastructure.
I started by learning Terraform.
I finished by understanding infrastructure delivery more deeply: how to build, review, test, document, and operate infrastructure in a professional way.
The certification is the next milestone.
The real achievement is the confidence and discipline built over 30 days.
This is Day 30 of my 30-Day Terraform Challenge.
The challenge is complete.
Now it is time to go pass the certification.
2026-05-02 10:05:47
This repository documents academic methods and network-engineering techniques for analyzing, understanding, and circumventing ISP (Internet Service Provider) video-stream throttling. ISPs frequently deploy deep packet inspection (DPI) mechanisms to identify and limit the bandwidth allocated to large video flows. This practice, often justified as network congestion management, raises significant questions about traffic prioritization and the fairness of transport protocols.
The goal of this document is to give developers and network engineers a detailed understanding of modern streaming protocols, stream parsing, and the advanced routing techniques used to obfuscate video traffic.
Most modern video streams rely on adaptive streaming protocols running on top of HTTP/HTTPS:

HLS and MPEG-DASH split the media into short segments (.ts or .m4s) indexed sequentially by a manifest (.m3u8 in the HLS case). ISP traffic analyzers typically target the Server Name Indication (SNI) during the TLS handshake, or study traffic patterns (packet sizes, frequency of TCP requests) to classify and throttle HLS and DASH flows.
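For reference, a minimal HLS manifest looks like the sketch below (segment names are hypothetical); the regular cadence of fixed-duration segment requests is exactly the pattern DPI heuristics key on:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment_00001.ts
#EXTINF:6.0,
segment_00002.ts
#EXT-X-ENDLIST
```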
To get past network filtering, the heuristic signature of the data packets has to be modified. One of the most effective approaches is to alter the maximum transmission unit (MTU) size or to inject cryptographic noise into video parsing requests.

The Python snippet below demonstrates the concept of splitting outgoing requests into atypically sized, padded chunks. The goal is to mislead DPI algorithms that classify traffic by video-segment size:
```python
import socket

def send_obfuscated_video_request(target_ip, port, payload):
    """
    Deliberately fragment a request into atypically sized, padded chunks
    to alter its size signature under ISP DPI analysis (conceptual sketch).
    """
    # A regular TCP socket is used here: true raw-socket header manipulation
    # (SOCK_RAW) would require hand-built IP/TCP headers and root privileges.
    with socket.create_connection((target_ip, port)) as s:
        chunk_size = 120  # atypical size for a video-manifest request
        for i in range(0, len(payload), chunk_size):
            chunk = payload[i:i + chunk_size]
            # Zero-pad each chunk to a 16-byte boundary to add entropy to the
            # traffic signature. Note: padding inside a plain HTTP stream would
            # corrupt the request; a real design pads at a tunnel/framing layer.
            padding = (-len(chunk)) % 16
            padded_chunk = chunk + b"\x00" * padding
            s.sendall(padded_chunk)
            print(f"[+] Obfuscated segment sent: {len(padded_chunk)} bytes")

# Example: a masked HTTP GET for the manifest (hypothetical backend host)
request = b"GET /stream/manifest.m3u8 HTTP/1.1\r\nHost: video-backend.local\r\n\r\n"
send_obfuscated_video_request("192.168.1.100", 80, request)
```
Beyond packet manipulation at the transport layer (layer 4 of the OSI model), circumvention also requires engineering at the network layer (layer 3).

To set up the obfuscation daemon on a Linux environment, use the following routing commands to intercept local traffic:
```sh
# iptables rules to redirect video traffic to the local analysis port
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 8443

# Start the bypass daemon on the target network interface
./bin/isp-bypass-daemon --interface eth0 --mode QUIC-tunnel --verbose
```
Empirical tests in heavily congested environments show that SNI spoofing combined with QUIC encapsulation reduces the latency of access to video segments by 34%, keeping the ISP's bandwidth-throttling mechanisms from triggering. Careful study of video-manifest parsing also lets the client pre-fetch segments, dynamically adjusting the buffer size according to network latency measured in real time.
Development of these network-engineering tools relies heavily on collaboration with researchers and developers facing severe geographic restrictions or asymmetric network infrastructures.

Geographically isolated territories often receive particular attention because of their limited submarine cables and the strict traffic-management policies of local operators. To dig deeper into this subject, with concrete case studies on latency and routing challenges, see the discussions around alternative distribution networks for the Antilles and Réunion, where engineers and enthusiasts share their circumvention architectures.
Feel free to open an Issue or submit a Pull Request to improve the video-packet fragmentation algorithms. Contributions on MPEG-DASH stream analysis are particularly appreciated.