2026-04-22 00:00:07
The FBI classifying a breach of its own surveillance systems as a "Major Cyber Incident" made headlines for about 48 hours. Then the news cycle moved on. That was a mistake.
Most coverage treated this as an embarrassing but isolated event — another government agency failing to secure its own network. That framing misses what actually happened and why it matters for everyone in security, not just the public sector.
I want to be direct about something: the way this story was reported undersells it significantly. This wasn't a breach where someone walked off with a database of employee records. The attackers went after the systems the FBI uses to coordinate active surveillance operations. That's a categorically different kind of target — and it tells you everything about the level of sophistication and intent behind it.
This wasn't a data breach. It was a counter-intelligence operation. And the distinction changes everything.
When Chinese state-sponsored actors — consistent with the Volt Typhoon APT group based on the tactics described — accessed the FBI's surveillance coordination systems, they weren't after names and social security numbers. They were after the map of who the FBI is watching.
Think about what that means. Surveillance records reveal which Chinese operatives are "burned" — already identified and being monitored by U.S. counterintelligence. They reveal the scope and priorities of active investigations. They potentially expose undercover assets embedded in Chinese intelligence networks.
In one breach, an adversary could effectively see the entire board from their opponent's perspective. That's not a data breach. That's a strategic intelligence coup.
Under Presidential Policy Directive 41, the "major cyber incident" classification isn't a PR label. It's a formal federal designation reserved for attacks likely to cause demonstrable harm to national security, foreign relations, or public confidence. It triggers a mandatory whole-of-government response — pulling in CISA, the Office of the Director of National Intelligence, and additional federal resources.
As first reported by Politico and covered in depth by one intelligence brief I follow closely, the FBI doesn't use that classification lightly. Investigators are still working to determine whether active undercover assets were exposed through the stolen data. That question alone — are any of our people now in danger — is what keeps counterintelligence officers up at night.
The uncomfortable reality this breach exposes is architectural. The FBI's internal surveillance systems should be among the most hardened, most segmented, most access-controlled environments in the federal government. If Chinese actors were able to access and exfiltrate from those systems, it suggests the network architecture still relies too heavily on perimeter defense — trust everyone inside the wall — rather than continuous verification of every access request regardless of source.
Zero Trust architecture, as defined by NIST, is built on the assumption that no user or system inside or outside the network should be trusted by default. It's not a product. It's a design philosophy. And it's clear that even the most sensitive government networks haven't fully adopted it.
State-sponsored espionage operates on a fundamentally different logic than cybercrime. Ransomware groups want money. APT groups want information — and they're willing to wait years, move slowly, and accept a low hit rate in exchange for access to the right data at the right time.
The FBI breach is a case study in what that looks like when it succeeds. The target wasn't chosen randomly. The data stolen wasn't incidental. Every step of this operation was designed to maximize strategic value while minimizing detection risk.
For security teams outside government: the tactics don't stay in government networks. The same low-and-slow, high-discipline approach that worked against the FBI will be used against defense contractors, critical infrastructure operators, and any organization that sits in the supply chain of national security. The question isn't whether your organization is a target. It's whether your architecture assumes you already are.
What concerns me most about this story isn't the breach itself — it's what it signals about where state-sponsored operations are heading. When the primary target is the surveillance infrastructure of the world's most powerful law enforcement agency, it tells you that adversaries aren't just playing offense anymore. They're playing meta — trying to understand and dismantle the systems designed to catch them. That's a different game entirely, and most organizations aren't prepared to defend against it.
:::tip Follow me on LinkedIn if you want to dig deeper into this kind of analysis — I post there regularly.
:::
2026-04-21 23:41:50
A temporary npm release exposed Claude Code’s internals, revealing key design decisions around memory, orchestration, tooling, and safety.
2026-04-21 23:36:34
In this part, we transition from data structures to the "physics" of functions. We'll cover why currying is the default, how tail recursion saves your memory, and how to build a Monad from scratch to handle empty results gracefully.
2026-04-21 23:14:57
AI agents don’t actually learn from experience; they reset after every run. Chat history is not real memory and doesn’t scale well. Without memory, agents keep repeating the same mistakes and rediscovering solutions. A simple memory layer can store facts, past events, and learned steps. Even basic memory makes agents more consistent and efficient over time. You don’t need complex systems, just enough memory to retain what worked and what didn’t.
2026-04-21 23:00:48
This is not a code blog; there’s no easy way to copy and paste from here. Instead, please review the source code directly if that’s what you need.
Our first step is to normalize the data; we do this by subtracting the mean. An elegant way to do this is to create a unit square matrix and multiply it by the data, scaled by 1/total number of rows: each row of the product then holds the column sums divided by the row count, i.e. the column means. Subtracting this from the original data gives what is formally known as the Deviation Matrix.
Look at this chonky business:
```typescript
let unit = unitSquareMatrix(matrix.length);
let deviationMatrix = subtract(matrix, multiplyAndScale(unit, matrix, 1 / matrix.length));
const D = deviationMatrix;
```

Here `multiplyAndScale` basically just does a matrix multiplication, with the scaling used to calculate the mean fused in. The snippet below is taken directly from the library:

```typescript
/**
 * Fix for #11, OOM on moderately large datasets; fuses scale and multiply into a single operation to save memory
 *
 * @param {Matrix} a
 * @param {Matrix} b
 * @param {number} factor
 * @returns {Matrix}
 */
export function multiplyAndScale(a: Matrix, b: Matrix, factor: number): Matrix {
  assertValidMatrices(a, b, "a", "b");
  const aRows = a.length;
  const aCols = a[0].length;
  const bCols = b[0].length;
  const flat = new Float64Array(aRows * bCols);
  for (let i = 0; i < aRows; i++) {
    for (let k = 0; k < aCols; k++) {
      const aVal = a[i][k] * factor;
      const iOffset = i * bCols;
      for (let j = 0; j < bCols; j++) {
        flat[iOffset + j] += aVal * b[k][j];
      }
    }
  }
  const result: Matrix = [];
  for (let i = 0; i < aRows; i++) {
    result[i] = Array.from(flat.subarray(i * bCols, (i + 1) * bCols));
  }
  return result;
}
```
Is this really fixed, though? Will we continue to OOM? Well, the three nested for loops basically mean O(n^3) worst case, and while this is never a good way to do things, we can go way simpler by simply hard-calculating the means inside the matrix columns themselves. Something like this: `matrix.map(row => row.map((v, i) => v - matrix.reduce((s, r) => s + r[i], 0) / matrix.length));` (note that this one-liner recomputes each column's mean for every single element, so it trades efficiency for brevity).
The library skews toward elegance instead of optimization, and assumes that you would eventually be doing this on the GPU (which would make this sort of code a lot faster for high-volume data, with simple but parallel ops).
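For reference, the simpler approach can be sketched like this (a hypothetical `centerColumns` helper, not part of the library), computing each column mean exactly once instead of per element:

```typescript
type Matrix = number[][];

// Hypothetical helper (not from the library): subtract each column's
// mean from every entry, computing the means only once in a single pass.
export function centerColumns(matrix: Matrix): Matrix {
  const n = matrix.length;
  const cols = matrix[0].length;
  const means = new Float64Array(cols);
  for (const row of matrix) {
    for (let j = 0; j < cols; j++) means[j] += row[j];
  }
  for (let j = 0; j < cols; j++) means[j] /= n;
  // Every entry minus its column mean: the Deviation Matrix.
  return matrix.map(row => row.map((v, j) => v - means[j]));
}
```

This is O(n·m) in time and needs only one extra array of column means, so it sidesteps both the OOM risk and the cubic blowup.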
Let’s move on to the next step: Deviation Scores, which is nothing but D^T @ D, i.e. matrix-multiplying the transpose of the earlier computed deviation matrix with the deviation matrix itself, giving a matrix with variances on the diagonal and covariances off the diagonal. (The @ is just Python notation for matrix multiplication.) Then we can simply divide by the number of rows to get the actual Variance-Covariance matrix.
Basically we have now formatted our data into a nice shape and form to do actual analysis on it. Simply put, the actual analysis requires data to be in a specific format, not raw unscaled data that could mean absolutely jackshit.
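The D^T @ D step can be sketched directly (the helper name here is mine, not the library's):

```typescript
type Matrix = number[][];

// Sketch: build the Variance-Covariance matrix as (D^T · D) / n
// from an already-centered deviation matrix D (n rows, m columns).
export function covarianceFromDeviation(D: Matrix): Matrix {
  const n = D.length;
  const m = D[0].length;
  const C: Matrix = Array.from({ length: m }, () => new Array(m).fill(0));
  for (let i = 0; i < n; i++) {
    for (let a = 0; a < m; a++) {
      for (let b = 0; b < m; b++) {
        C[a][b] += D[i][a] * D[i][b]; // accumulates D^T · D
      }
    }
  }
  for (let a = 0; a < m; a++) {
    for (let b = 0; b < m; b++) C[a][b] /= n; // divide by row count
  }
  return C;
}
```

Variances land on the diagonal (where a === b) and covariances everywhere else.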
A small glimpse of this final Variance-Covariance matrix would be as follows, and what this neatly formatted data means is… it can now be used to accomplish the actual mathematical property that can get us far and be useful. Namely, this can now be used to compute SVD. For 3 factors (or columns of data) you get a 3 × 3 matrix looking like the below; its shape depends only on the number of columns, not the number of rows:
```
[
  [var(f1),    cov(f1,f2), cov(f1,f3)],
  [cov(f2,f1), var(f2),    cov(f2,f3)],
  [cov(f3,f1), cov(f3,f2), var(f3)   ]
]
```
Put informally, Singular Value Decomposition is the master of the universe. Basically, the easiest way to split a specially formatted matrix into a set of insights. What are the insights?
SVD: M = U * Σ * V^T (where M is our specially formatted matrix)
U and V are basically characteristics, and Σ is the importance of that characteristic. Okay, before that, all of the symbols above are matrices, so they’re all basically just numbers arranged in rows and columns. The insights are in the numbers themselves, while U provides ROW type characteristic insights, V provides COLUMN type characteristic insights, and the sigma is just a set of importance scores telling you how much importance to assign to each characteristic insight.
For our current purpose we only need to use COLUMN insights, i.e. we want to reduce COLUMNS while keeping ROWs intact, so V becomes our eigenvectors and Σ our eigenvalues
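You don't need a full SVD library to see this in action: since the Variance-Covariance matrix is symmetric and positive semi-definite, its top eigenvector can be found with plain power iteration. A minimal sketch (my own names, not the library's):

```typescript
type Matrix = number[][];

// Sketch: power iteration on a symmetric PSD matrix (like a
// Variance-Covariance matrix) to find its top eigenpair, the same
// leading component a full SVD would hand you.
export function topEigenpair(
  C: Matrix,
  iters = 200
): { value: number; vector: number[] } {
  const m = C.length;
  let v = new Array(m).fill(1 / Math.sqrt(m)); // arbitrary unit start vector
  let lambda = 0;
  for (let t = 0; t < iters; t++) {
    // w = C · v
    const w = C.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
    const norm = Math.sqrt(w.reduce((s, x) => s + x * x, 0));
    v = w.map(x => x / norm);
    lambda = norm; // for PSD matrices this converges to the top eigenvalue
  }
  return { value: lambda, vector: v };
}
```

Repeatedly multiplying by C stretches the start vector toward the direction of greatest variance; normalizing each round keeps the numbers tame.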
Eigen is German for “own” … no, we do not use it anywhere else… it’s basically mathematical terminology for a characteristic, so umm just roll with it ig?
Basically, you can now pick a few eigenvectors and that is a “good enough”™️ representation of your entire data; the eigenvalues tell you both which vectors to pick and just how “good enough”™️ the result is. Just sort the vectors by eigenvalue and pick the ones with the largest eigenvalues to get the most accurate compressed representation of your data, then calculate the Percentage Explained simply by using:
percentage_explained = Σ(selected eigenvalues) / Σ(all eigenvalues)
So select the top 2 vectors to get a higher percentage, the top 3 to get an even higher one, and basically all the vectors to explain all your data (but then it is basically just the original data).
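That selection rule is only a couple of lines (helper name is hypothetical):

```typescript
// Sketch: sort eigenvalues descending and compute the percentage of
// variance explained by the top k, per the formula above.
export function percentageExplained(eigenvalues: number[], k: number): number {
  const sorted = [...eigenvalues].sort((a, b) => b - a); // largest first
  const selected = sorted.slice(0, k).reduce((s, x) => s + x, 0);
  const total = sorted.reduce((s, x) => s + x, 0);
  return selected / total;
}
```

With the eigenvalues from the teacher example further down (520.1, 78.1, 18.5), keeping just the top component gives roughly 0.843, i.e. 84.3% explained.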
In practice, the eigenvalue distribution is usually top-heavy, with the top eigenvector often explaining around 80% of your data (the Pareto principle, or 80-20 rule), but you can transparently see when this is not actually the case, which is what makes Principal Components useful.
So now you can really simply get the compressed data (fewer number of columns) depending on which eigenvector(s) you chose:
compressed = selected_eigenvectors × centered_data^T
So if you chose 2 eigenvectors, each of length 4 [each entry basically a weight for one column of your original data], and your data was originally 4 columns x 30 rows, then you finally get 2 columns x 30 rows of data, which might not seem like much, but what if you had 3 million rows? HA!
You can get your original data back using
original ≈ selected_eigenvectors^T × compressedData + mean
This is lossy compression, since you no longer get the exact actual data back; whatever the discarded eigenvectors captured got wiped out because you selected fewer eigenvectors than the full total.
Know that in order to transmit the data, you do need some overhead, like selected_eigenvectors along with the compressed data, but that should usually be extremely small compared to the actual data.
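The compress/reconstruct round trip from the last few paragraphs, sketched end to end (helper names are mine; the data is assumed already centered, so the `+ mean` step is omitted):

```typescript
type Matrix = number[][];

// Plain dense matrix multiply and transpose helpers.
function matmul(A: Matrix, B: Matrix): Matrix {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((s, v, k) => s + v * B[k][j], 0))
  );
}

function transpose(A: Matrix): Matrix {
  return A[0].map((_, j) => A.map(row => row[j]));
}

// E: k×m matrix whose rows are the selected eigenvectors.
// centered: n×m centered data. compressed = E · centered^T is k×n.
export function compress(E: Matrix, centered: Matrix): Matrix {
  return matmul(E, transpose(centered));
}

// Lossy inverse: E^T · compressed, transposed back to n×m (still centered).
export function reconstruct(E: Matrix, compressed: Matrix): Matrix {
  return transpose(matmul(transpose(E), compressed));
}
```

Anything that lived outside the span of the selected eigenvectors simply doesn't come back, which is the lossiness in action.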
Well, we have achieved compression, but in order to understand actual insights let’s go back to the original example from the readme in the package. You are a high school teacher. You’ve given 3 examinations in which each student performed differently, but you didn’t really standardize the difficulty of each examination; some were harder, some were easier.
Now you want to grade your students, but it would be unfair to skew towards a single easy exam where the student might have scored easily.
Sample data from the Readme:
| Student | Exam 1 | Exam 2 | Exam 3 |
|----|----|----|----|
| 1 | 40 | 50 | 60 |
| 2 | 50 | 70 | 60 |
| 3 | 80 | 70 | 90 |
| 4 | 50 | 60 | 80 |
Here are the averages:

| | Exam 1 | Exam 2 | Exam 3 |
|----|----|----|----|
| Mean | 55 | 62.5 | 72.5 |
As you can see, Exam 1 was probably the hardest, but student 3 did exceedingly well in it. That exam should probably carry a higher weight than the other examinations.
Student 3 is clearly the best in class, and student 1 the worst; the middle is where it gets close.
| Student | Average Score |
|----|----|
| 1 | 50.00 |
| 2 | 60.00 |
| 3 | 80.00 |
| 4 | 63.33 |
Right now, I do not know if Student 4 is truly better than Student 2, since the averages are super close; maybe they just got lucky in the third exam (it was the simplest, after all).
But luckily for us, we did Principal Components, due to which we now have 3 eigenvectors, each explaining a certain bit of variance in the examination difficulties.
| PC | Eigenvalue | Eigenvector | % Variance |
|----|----|----|----|
| PC1 | 520.1 | [0.74, 0.28, 0.60] | 84.3% |
| PC2 | 78.1 | [0.23, 0.74, -0.63] | 12.7% |
| PC3 | 18.5 | [0.63, -0.61, -0.48] | 3.0% |
Since we seem to have an 80-20 here (thanks, Pareto!), we can now very simply take the first eigenvector as a good measure of the difficulty of each exam. This also shows us intuitively that the first exam is a discriminating exam (it spreads students apart the most), whereas the second is not (scores were pretty close together).
So we can now just create our compressed data as Score = (Exam 1 × 0.74) + (Exam 2 × 0.28) + (Exam 3 × 0.60), where Score is a simulated Exam 4 that basically tells us how “good” a student is at the subject while normalizing for difficulty. This weighting captures 84.3% of the variance, and roughly 80% is good enough for us to predict the results of a FINAL exam.
Ideally we would have taken 100 exams in order to normalize this well, but umm, the students would probably revolt!
Now we have the scores:
| Student | Score |
|----|----|
| 1 | 31 |
| 2 | 44 |
| 3 | 84 |
| 4 | 53 |
Now the picture is much clearer. Student 4 may have been lucky, but they were lucky consistently, across both a hard and an easy examination, which gives them a much higher score than student 2. Student 3 did well across both a hard and an easy exam, so they are the valedictorian and will probably go on to do great things!
Exactly! And this is why Principal Components is mainly used as a dimension-reduction technique rather than for actual insight across variables. For example, if Exam 1 was on Physics, Exam 2 on Chemistry, and Exam 3 on Math, would you say this is a fair comparison? Sure, student 3 stands clear in their mastery, but umm, it’s kinda hard to say what really qualifies the other students; are they physical chemists or mathematical physicists?
And this is how we move on to Part 4 … where we actually use this technique not just to opaquely reduce a set of variables (columns) into a single variable (a weighted merge of the individual columns), but instead to combine columns into an inspectable series of weighted averages. Basically, we give each eigenvector a name based on the weights it assigns to the selected columns.
But hold on… before we move on to that… we need to go to Part 4, where we answer the most important question of 2022: ~~What is a Woman?~~ What is a Neural Network? More specifically, we will be answering how to investigate the inner workings of a convolutional neural network (if a GPT is a brain, a convnet is the eyes).
See you in Part 3… or not, idc.
2026-04-21 22:40:06
Grant Cardone wants you to know he's building something new.
Not new like a feature update. New like a category. The kind of new that either ends up in a Harvard Business School case study or a courtroom. Maybe both. Probably both.
The real estate mogul, social media howitzer and self-appointed uncle to every retail investor with a WiFi connection has spent the last year fusing two asset classes together that most financial advisors would never put in the same sentence.
Real estate. And Bitcoin.
Not as separate allocations. As one vehicle.
"We added it to the real estate though," Cardone said in a recent interview with YouTuber DJ Vlad. "We didn't just go out and accumulate Bitcoin."
Cardone's team buys commercial real estate below replacement cost. The spread between what they pay and what the property is worth gets stacked in Bitcoin. Both assets sit inside a single entity. Investors get exposure to both.
"I'm buying real estate. I created the largest real estate Bitcoin hybrid in the world," Cardone said. "We did five projects last year. We started with an $85 million deal. We bought it for $72 million. The difference between what the property was worth and what we paid, we added Bitcoin."
The math is Cardone math. Which means it sounds insane until you actually run the numbers.
"The real estate was worth $85 million. I paid $72 million. That's a $13 million difference. I added $15 million in Bitcoin," he said. "When it gets back to replacement cost, I can sell the real estate off and own the Bitcoin for free."
$15 million in Bitcoin. For free.
That's the pitch. And if you just rolled your eyes, he's counting on that. Fewer competitors that way.
Traditional REITs are legally required to distribute at least 90% of taxable income to shareholders. That's the whole selling point. Steady dividends. Predictable income. Grandma loves them.
It's also a structural straitjacket.
"These REITs can never hold cash, right?" Cardone said. "They will never be able to have Bitcoin. There's 190 of them and I compete with every one of them."
He's not wrong. A REIT that starts accumulating Bitcoin loses its tax-advantaged status. The board would revolt. The income investors would flee. The lawyers would have a collective aneurysm.
Cardone doesn't have that problem. He built a different vehicle from scratch.
"We create a new company. It's a real estate Bitcoin hybrid. It's not a REIT. It does not have to distribute cash," he said. "What it does is accumulate cash flow and buy more Bitcoin."
At current scale, that means $100 million in Bitcoin stacked on top of a $230 million real estate portfolio.
"I got a quarter of a billion dollars of real estate, $100 million of Bitcoin," Cardone said. "Fused the two together in a membership. Bring in investors. I'm going to take that public."
Public. As in, listed. As in, SEC filings, audited financials, the whole nine.
Here's where it gets genuinely clever.
Real estate generates depreciation. Paper losses the IRS lets you use to offset real income. Bitcoin, purchased on its own, does not.
But Bitcoin purchased inside a real estate entity that generates depreciable assets? Different story.
"I'm the only guy on planet earth that gets depreciation with a Bitcoin purchase," Cardone said. "We bought $100 million of Bitcoin and I passed on $50 million worth of losses to my investors."
Read that again. His investors bought Bitcoin exposure and got a tax deduction for it. Try doing that on Coinbase.
Cardone's thesis comes down to a simple observation about what real estate and Bitcoin do to each other inside a single wrapper.
"One's very illiquid and solid. One's very liquid and volatile," he said.
Real estate is the thing 99% of investors understand. Brick and mortar. Cash flow. Tenants paying rent. Bitcoin is the thing less than 1% understand. Volatile, digital, mathematical.
"I'm showing people the real estate and we're adding the Bitcoin," Cardone said. "People can come in and get something they understand, which is the real estate and the cash flow and the depreciation."
The exit math branches three ways. Real estate appreciates, sell it, own the Bitcoin for free. Bitcoin explodes, sell it, own the real estate for free. Or hold both and let them compound.
"I'm putting these two entities together because what I'm building is a new vehicle," he said.
He's also already transacting in crypto natively. His team recently sold a ground-floor retail space for $14 million, paid in USDT. Not dollars. Tether.
The guy who built his brand yelling about cold calls and 10X mindset is closing commercial real estate deals in stablecoins.
The timeline moves fast when you're not waiting for a board vote.
Cardone's bet isn't just that this hybrid model is better. It's that he gets there first and the door closes behind him.
"I can go public with that real estate Bitcoin and those guys can never compete with me," he said. "I'll be first to market and they cannot come into my space. So I have this big moat around it."
The moat argument holds up if you squint. Existing REITs would need to blow up their entire structure to replicate this. New entrants would need both real estate deal flow and Bitcoin conviction. The intersection of those two skills is a very small Venn diagram.
Cardone lives in that intersection. Loudly.
Will it work? The honest answer is that nobody knows. This model has never been tested at public-market scale.
Going public means SEC scrutiny, quarterly reporting and a shareholder base that will want answers during the next Bitcoin drawdown. It's easy to pitch a hybrid vehicle when Bitcoin is ripping. It's harder to explain why you're holding $100 million in a volatile asset when it drops 40% in a quarter and your investors are watching CNBC in real time.
The REIT industry isn't frozen in place either. Tokenization infrastructure is maturing. Regulatory frameworks are shifting. The barriers that currently prevent REITs from holding digital assets could look different in five years.
And Cardone is Cardone. The personality that fills up Capital Grill investor dinners and drives YouTube engagement is the same personality that makes institutional allocators nervous. Taking this public means selling to a crowd that doesn't respond to volume.
But the structural insight underneath the showmanship is real. REITs are trapped in a distribution model designed before Bitcoin existed. Cardone isn't. If Bitcoin does what its believers think over the next 10 to 20 years, the guy who wrapped it in depreciable real estate below replacement cost is going to look like he saw the future.
If it doesn't, he still owns the buildings.
That's not a bad position to be in. Regardless of how loud the guy holding it happens to be.