
Closer to the Edge: Hyperscaling Have I Been Pwned with Cloudflare Workers and Caching

2024-11-21 15:35:59


I've spent more than a decade now writing about how to make Have I Been Pwned (HIBP) fast. Really fast. Fast to the extent that sometimes, it was even too fast:

The response from each search was coming back so quickly that the user wasn’t sure if it was legitimately checking subsequent addresses they entered or if there was a glitch.

Over the years, the service has evolved to use emerging techniques that not only make things fast, but also make them scale better under load, increase availability and, sometimes, even drive down cost. For example, 8 years ago now I started rolling the most important services over to Azure Functions, "serverless" code that was no longer bound by logical machines and would just scale out to whatever volume of requests was thrown at it. And just last year, I turned on Cloudflare cache reserve to ensure that all cacheable objects remained cached, even under conditions where they previously would have been evicted.

And now, the pièce de résistance, the coolest performance thing we've done to date (and it is now "we", thank you Stefán): just caching the whole lot at Cloudflare. Everything. Every search you do... almost. Let me explain, firstly by way of some background:

When you hit any of the services on HIBP, the first place the traffic goes from your browser is to one of Cloudflare's 330 "edge nodes":

[Image: map of Cloudflare's 330 edge nodes around the world]

As I sit here writing this on the Gold Coast on Australia's eastern seaboard, any request I make to HIBP hits that edge node on the far right of the Aussie continent, which is just up the road in Brisbane. The capital city of our great state of Queensland is just a short jet ski away, about 80km as the crow flies. Before now, every single time I searched HIBP from home, my request bytes would travel up the wire to Brisbane and then take a giant 12,000km trip to Seattle, where the Azure Function in the West US Azure data centre would query the database before sending the response 12,000km back west to Cloudflare's edge node, then the final 80km down to my Surfers Paradise home. But what if it didn't have to be that way? What if that data was already sitting on the Cloudflare edge node in Brisbane? And the one in Paris, and the one in... well, I'm not even sure where all those blue dots are, but what if it was everywhere? Several awesome things would happen:

  1. You'd get your response much faster as we've just shaved off more than 99% of the distance the bytes need to travel.
  2. The availability would massively improve as there are far fewer nodes for the traffic to traverse through, plus when a response is cached, we're no longer dependent on the Azure Function or underlying storage mechanism.
  3. We'd save on Azure Function execution costs, storage account hits and especially egress bandwidth (which is very expensive).

In short, pushing data and processing "closer to the edge" benefits both our customers and ourselves. But how do you do that for 5 billion unique email addresses? (Note: as of today, HIBP reports over 14 billion breached accounts; the number of unique email addresses is lower because, on average, each breached address has appeared in multiple breaches.) To answer this question, let's recap on how the data is queried:

  1. Via the front page of the website. This hits a "unified search" API which accepts an email address and uses Cloudflare's Turnstile to prohibit automated requests not originating from the browser.
  2. Via the public API. This endpoint also takes an email address as input and then returns all breaches it appears in.
  3. Via the k-anonymity enterprise API. This endpoint is used by a handful of large subscribers such as Mozilla and 1Password. Instead of searching by email address, it implements k-anonymity and searches by hash prefix.

Let's delve into that last point further because it's the secret sauce to how this whole caching model works. In order to provide subscribers of this service with complete anonymity over the email addresses being searched for, the only data passed to the API is the first six characters of the SHA-1 hash of the full email address. If this sounds odd, read the blog post linked to in that last bullet point for full details. The important thing for now, though, is that it means there are a total of 16^6 different possible requests that can be made to the API, which is just over 16 million. Further, we can transform the first two use cases above into k-anonymity searches on the server side, as that simply involves hashing the email address and taking those first six characters.
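To make that transform concrete, here's a minimal sketch of deriving the six-character prefix using the Web Crypto API available in Cloudflare Workers. The function name and the normalisation step are illustrative assumptions, not HIBP's actual code:

    // Sketch only: SHA-1 the address and keep the first six hex characters.
    async function hashPrefix(email: string): Promise<string> {
      // Assumption: trimming/lower-casing before hashing; HIBP's exact normalisation may differ.
      const bytes = new TextEncoder().encode(email.trim().toLowerCase());
      const digest = await crypto.subtle.digest("SHA-1", bytes);
      const hex = [...new Uint8Array(digest)]
        .map(b => b.toString(16).padStart(2, "0"))
        .join("")
        .toUpperCase();
      return hex.slice(0, 6); // six hex characters, so 16^6 (~16.7M) possible values
    }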

In summary, this means we can boil the entire searchable database of email addresses down to the following:

  1. AAAAAA
  2. AAAAAB
  3. AAAAAC
  4. ...about 16 million other values...
  5. FFFFFD
  6. FFFFFE
  7. FFFFFF

That's a large albeit finite list, and that's what we're now caching. So, here's what a search via email address looks like:

  1. Address to search: [email protected]
  2. Full SHA-1 hash: 567159D622FFBB50B11B0EFD307BE358624A26EE
  3. Six char prefix: 567159
  4. API endpoint: https://[host]/[path]/567159
  5. If hash prefix is cached, retrieve result from there
  6. If hash prefix is not cached, query origin and save to cache
  7. Return result to client
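Sketched as worker code, with placeholder host names and none of the real parameter handling (domain filtering, response truncation), that flow looks roughly like this:

    // Sketch only: look up a hash prefix in the edge cache, falling back to the origin.
    export default {
      async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
        const prefix = new URL(request.url).pathname.split("/").pop()!; // e.g. "567159"
        const cacheKey = new Request(`https://example-origin.test/range/${prefix}`); // placeholder origin
        const cache = caches.default;

        // Step 5: if the hash prefix is cached, serve it straight from the edge
        let response = await cache.match(cacheKey);
        if (!response) {
          // Step 6: otherwise query the origin, then save the result to the edge cache
          const originResponse = await fetch(cacheKey);
          response = new Response(originResponse.body, originResponse);
          response.headers.set("Cache-Control", "public, max-age=31536000");
          ctx.waitUntil(cache.put(cacheKey, response.clone()));
        }

        // Step 7: return the result to the client
        return response;
      },
    };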

K-anonymity searches obviously go straight to step four, skipping the first few steps as we already know the hash prefix. All of this happens in a Cloudflare worker, so it's "code on the edge" creating hashes, checking cache then retrieving from the origin where necessary. That code also takes care of handling parameters that transform queries, for example, filtering by domain or truncating the response. It's a beautiful, simple model that's all self-contained within a worker and a very simple origin API. But there's a catch - what happens when the data changes?

There are two events that can change cached data, one is simple and one is major:

  1. Someone opts out of public searchability and their email address needs to be removed. That's easy, we just call an API at Cloudflare and flush a single hash prefix.
  2. A new data breach is loaded and there are changes to a large number of hash prefixes. In this scenario, we flush the entire cache and start populating it again from scratch.

The second point is kind of frustrating as we've built up this beautiful collection of data all sitting close to the consumer where it's super fast to query, and then we nuke it all and go from scratch. The problem is it's either that or we selectively purge what could be many millions of individual hash prefixes, which you can't do:

For Zones on Enterprise plan, you may purge up to 500 URLs in one API call.

And:

Cache-Tag, host, and prefix purging each have a rate limit of 30,000 purge API calls in every 24 hour period.

We're giving all this further thought, but it's a non-trivial problem and a full cache flush is both easy and (near) instantaneous.
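For illustration, those two purge paths map onto Cloudflare's cache purge API roughly as follows; the zone ID, API token and URL shape are all placeholders:

    // Sketch only: call Cloudflare's purge_cache endpoint with either a file list or purge_everything.
    const CF_API = "https://api.cloudflare.com/client/v4";

    async function purgeCache(zoneId: string, token: string, body: object): Promise<void> {
      await fetch(`${CF_API}/zones/${zoneId}/purge_cache`, {
        method: "POST",
        headers: { "Authorization": `Bearer ${token}`, "Content-Type": "application/json" },
        body: JSON.stringify(body),
      });
    }

    // Opt-out: evict a single hash prefix (placeholder URL)
    await purgeCache("ZONE_ID", "API_TOKEN", { files: ["https://example-api.test/range/567159"] });

    // New breach: flush the whole zone and let it repopulate from scratch
    await purgeCache("ZONE_ID", "API_TOKEN", { purge_everything: true });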

Enough words, let's get to some pictures! Here's a typical week of queries to the enterprise k-anonymity API:

[Graph: a typical week of queries to the enterprise k-anonymity API]

This is a very predictable pattern, largely due to one particular subscriber regularly querying their entire customer base each day. (Sidenote: most of our enterprise-level subscribers use callbacks, such that we push updates to them via webhook when a new breach impacts their customers.) That's the total volume of inbound requests, but the really interesting bit is the requests that hit the origin (blue) versus those served directly by Cloudflare (orange):

[Graph: requests hitting the origin (blue) versus those served by Cloudflare (orange)]

Let's take the lowest blue data point towards the end of the graph as an example:

[Graph: detail of the lowest origin (blue) data point towards the end of the week]

At that time, 96% of requests were served from Cloudflare's edge. Awesome! But look at it only a little bit later:

[Graph: the same view a little later, immediately after the cache flush]

That's when I flushed cache for the Finsure breach, and 100% of traffic started being directed to the origin. (We're still seeing 14.24k hits via Cloudflare as, inevitably, some requests in that 1-hour block were to the same hash range and were served from cache.) It then took a whole 20 hours for the cache to repopulate to the extent that the hit:miss ratio returned to about 50:50:

[Graph: the cache hit:miss ratio returning to about 50:50 over the following 20 hours]

Look back towards the start of the graph and you can see the same pattern from when I loaded the DemandScience breach. This all does pretty funky things to our origin API:

[Graph: traffic to the origin API, spiking sharply after each cache flush]

That last sudden increase is more than a 30x traffic increase in an instant! If we hadn't been careful about how we managed the origin infrastructure, we would have built a literal DDoS machine. Stefán will write later about how we manage the underlying database to ensure this doesn't happen, but even still, whilst we're dealing with the cyclical traffic patterns seen in that first graph above, I know that the best time to load a breach is later in the Aussie afternoon, when the traffic is a third of what it is first thing in the morning. This helps smooth out the rate of requests to the origin such that by the time the traffic is ramping up, more of the content can be returned directly from Cloudflare. You can see that in the graphs above; that big peaky block towards the end of the last graph is pretty steady, even though the inbound traffic in the first graph over the same period of time increases quite significantly. It's like we're trying to race the increasing inbound traffic by building ourselves up a buffer in cache.

Here's another angle to this whole thing: now more than ever, loading a data breach costs us money. For example, by the end of the graphs above, we were cruising along at a 50% cache hit ratio, which meant we were only paying for half the Azure Function executions, egress bandwidth and underlying SQL database usage we would have been otherwise. Flushing cache and suddenly sending all the traffic to the origin doubles our cost. Waiting until we're back at a 90% cache hit ratio literally increases those costs 10x when we flush (at 90%, only one request in ten reaches the origin, so a flush briefly sends ten times as much traffic there). If I were to be completely financially ruthless about it, I would need to either load fewer breaches or bulk them together such that a cache flush is only ejecting a small amount of data anyway, but clearly, that's not what I've been doing 😄

There's just one remaining fly in the ointment...

Of those three methods of querying email addresses, the first is a no-brainer: searches from the front page of the website hit a Cloudflare Worker, which validates the Turnstile token and returns a result. Easy. However, the other two models (the public and enterprise APIs) have the added burden of validating the API key against Azure API Management (APIM), and the only place that exists is in the West US origin service. What this means for those endpoints is that before we can return search results from a location that may be just a short jet ski ride away, we need to go all the way to the other side of the world to validate the key and ensure the request is within the rate limit. We do this in the lightest possible way, with barely any data transiting the request to check the key, plus we do it asynchronously, in parallel with pulling the data back from the origin service if it isn't already in cache. In other words, we're as efficient as humanly possible, but we still cop a massive latency burden.
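A simplified sketch of that parallelism: kick off the key check and the range lookup at the same time, then wait on both before responding. The endpoint names here are hypothetical; the hibp-api-key header is the one the public API documents:

    // Sketch only: validate the API key (at the origin) in parallel with the data lookup.
    async function handleApiSearch(prefix: string, apiKey: string): Promise<Response> {
      const keyCheck = fetch("https://example-origin.test/validate-key", { // hypothetical endpoint
        headers: { "hibp-api-key": apiKey },
      });
      const dataLookup = lookupRange(prefix); // cache-then-origin flow from the earlier sketch

      const [keyResponse, data] = await Promise.all([keyCheck, dataLookup]);
      if (!keyResponse.ok) {
        // Unauthorised or over the rate limit: discard the data and surface the origin's verdict
        return new Response(null, { status: keyResponse.status });
      }
      return data;
    }

    // Placeholder standing in for the cache/origin lookup shown earlier
    async function lookupRange(prefix: string): Promise<Response> {
      return fetch(`https://example-origin.test/range/${prefix}`);
    }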

Doing API management at the origin is super frustrating, but there are really only two alternatives. The first is to distribute our APIM instance to other Azure data centres, and the problem with that is we need a Premium instance of the product. We presently run on a Basic instance, which means we're talking about a 19x increase in price just to unlock that ability. But that's just to go Premium; we then need at least one more instance somewhere else for this to make sense, which means we're talking about a 28x increase. And every region we add amplifies that even further. It's a financial non-starter.

The second option is for Cloudflare to build an API management product. This is the killer piece of this puzzle, as it would put all the checks and balances within the one edge node. It's a suggestion I've put forward on many occasions now, and who knows, maybe it's already in the works, but it's a suggestion I make out of a love of what the company does and a desire to go all-in on having them control the flow of our traffic. I did get a suggestion this week about rolling what is effectively a "poor man's API management" within workers, and it's a really cool suggestion, but it gets hard when people change plans or when we want to apply quotas to APIs rather than rate limits. So c'mon Cloudflare, let's make this happen!

Finally, just one more stat on how powerful serving content directly from the edge is: I shared this stat last month for Pwned Passwords which serves well over 99% of requests from Cloudflare's cache reserve:

That's about 3,900 requests per second, on average, non-stop for 30 days. It's obviously way more than that at peak; just a quick glance through the last month and it looks like about 17k requests per second in a one-minute period a few weeks ago:

[Graph: Pwned Passwords requests per second over the last month, peaking around 17k/s]

But it doesn't matter how high it is, because I never even think about it. I set up the worker, I turned on cache reserve, and that's it 😎

I hope you've enjoyed this post. Stefán and I will be doing a live stream on this topic at 06:00 AEST Friday morning for this week's regular video update, and it'll be available for replay immediately after. It's also embedded here for convenience.

Weekly Update 426

2024-11-17 10:39:54


I have absolutely no problem at all talking about the code I've screwed up. Perhaps that's partly because, after three decades of writing software (and doing some meaningful stuff along the way), I'm not particularly concerned about showing my weaknesses. And this week, I screwed up a bunch of stuff: database queries that weren't resilient to SQL database scale changes, partially completed breach notifications I didn't notice until it was too late to easily fix, and some queries that performed so badly they crashed the entire breach notification process after loading the massive DemandScience incident. Fortunately, none of them had any impact of note; we fixed them all, re-ran the processes, and now we're more resilient than ever 😄

Oh - and if you like this style of content, this coming Friday, Stefán and I will do a joint live stream on all sorts of other bits about how HIBP now runs.


References

  1. Sponsored by: 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. Elon Musk is right (I hate cookie warnings, but I'm entertained by people losing their minds "because Elon")
  3. The Hot Topic breach went into HIBP (that's another 57M email addresses right there)
  4. There are also now 122M more records in HIBP courtesy of the DemandScience breach (it's publicly aggregated data, but it's still a breach)

Inside the DemandScience by Pure Incubation Data Breach

2024-11-13 17:59:27


Apparently, before a child reaches the age of 13, advertisers will have gathered more than 72 million data points on them. I knew I'd seen a metric about this sometime recently, so I went looking for "7,000", which perfectly illustrates how unaware we are of the extent of data collection on all of us. I started Have I Been Pwned (HIBP) in the first place because I was surprised at where my data had turned up in breaches. 11 years and 14 billion breached records later, I'm still surprised!

Jason (not his real name) was also recently surprised at where his data had appeared. He found it in a breach of a service called "Pure Incubation", a company whose records had appeared on a popular hacking forum earlier this year:

When Jason found his email address and other info in this corpus, he had the same question so many others do when their data turns up in a place they've never heard of before - how? Why?! So, he asked them:

I seem to have found my email in your data breach. I am interested in finding how my information ended up in your database.

To their credit, he got a very comprehensive answer, which I've included below:

[Image: DemandScience's response to Jason, explaining his data was aggregated from publicly available sources]

Well, that answers the "how" part of the equation; they've aggregated data from public sources. And the "why" part? It's the old "data is the new oil" analogy that recognises how valuable our info is, and as such, there's a market for it. There are lots of terms used to describe what DemandScience does, including "B2B demand generation", "buyer intelligence solutions provider", "empowering technology companies to accelerate ROI", "supercharging pipelines" and "account intelligence". Or, to put it in a more lay-person-friendly fashion, they sell data on people.

DemandScience is what we refer to as a "data aggregator" in that they combine identity data from multiple locations, bundle it up, and then sell it. Occasionally, data aggregators end up having sizeable data breaches; before today, HIBP already contained Adapt (9M records), Data & Leads (44M records), Exactis (132M records), Factual (2M records), and You've Been Scraped (66M records). According to DemandScience, "none of our current operational systems were exploited", yet simultaneously, "the leaked data originated from a system that has been decommissioned". So, it's a breach of an old system.

Does it matter? I mean, if it's just public data, should people care? Jason cared, at least enough to make the original enquiry and for DemandScience to look him up and realise he's not in their current database. Still, he existed in the breached one (I later sent Jason his record from the breach, and he confirmed the accuracy). As I often do in these cases, I reached out to a bunch of recent HIBP subscribers in the breach and asked them three simple questions:

  1. Is the data about you accurate and if not, which bits are wrong?
  2. Is this data you would consider to be in the public domain already?
  3. Would you expect to be notified about your data being used in this fashion, and consequently appearing in a breach?

The answers were all the same: the data is accurate, it's already in the public domain, and people aren't too concerned about it appearing in this breach. Well that was easy 🙂 However...

There are two nuances that aren't captured here. The first is that this is valuable data - that's why DemandScience sells it! It comes back to that "new oil" analogy: if you have enough of it, you can charge good money for it. Companies typically use data such as this to do precisely the sort of catchphrasey stuff the company talks about, primarily around maximising revenue from their customers by understanding them better.

The second nuance is that whilst this data may already be in the public domain, did the owners of it expect it to be used in this fashion? For example, if you publish your details in a business directory, is your expectation that this info may then be sold to other companies to help them upsell you on their products? Probably not. And if, like many of the records in the data, someone's row is accompanied by their LinkedIn profile, would they expect that data to be matched and sold? I suggest the responses would likely be split here, and that in itself is an important observation: how we view the sensitivity of our data and the impact of it being exposed (whether personal or business) is extremely personal. Some people take the view of "I have nothing to hide", whilst others become irate if even just their email address is exposed.

Whilst considering how to add more insights to this blog post, I thought I'd do a quick check on just one more email address:

"54543060",,"0","TROY","HUNT","PO BOX 57",,"WEST RYDE",,,"AU","61298503333",,,,"[email protected]","pfizer.com","PFIZER INC",,"250-499","$50 - 99 Million","Healthcare, Pharmaceuticals and Biotech","VICE PRESIDENT OF INFORMATION TECHNOLOGY","VP Level","2834",,"Senior Management (SVP/GM/Director)","IT",,"1","GemsTarget INTL","GEMSTARGET_INTL_648K_10.17.18",,,,,,,,,"18/10/2018 05:12:39","5/10/2021 16:47:56","PFIZER.COM",,,,,"IT Management General","Information Technology"

I'll be entirely transparent and honest here - my exact words after finding this were "motherfucker!" True story, told uncensored here because I want to impress on the audience how I feel when my data turns up somewhere publicly. And I do feel like it's "my" data; it's certainly my name and even though it's my old Pfizer email address I've not used for almost a decade now, that also has my name in it. My job title is also there... and it's completely wrong! I never had a VP-level role, even though the other data around my tech role is at least in the vicinity of being correct. But other than the initial shock of finding myself in yet another data breach, personally, I'm in the same boat as the HIBP subscribers I contacted, and this doesn't bother me too much. But I also agree with the following responses I received to my third question:

I think it is useful to be notified of such breaches, even if it is just to confirm no sensitive data has been compromised. As I said, our IT department recently notified me that some of my data was leaked and a pre-emptive password reset was enforced as they didn't know what was leaked. 
It would be good to see it as an informational notification in case there's an increase in attack attempts against my email address.
I would like to opt-out of here to reduce the SPAM and Phishing emails.

That last one seems perfectly reasonable, and fortunately, DemandScience does have a link on their website to Do Not Sell My Information:

[Image: DemandScience's "Do Not Sell My Information" form, which only caters to California residents]

Dammit! If, like me, you're part of the 99.5% of the world that doesn't live in California, then apparently this form isn't for you. However, they do list [email protected] on that page, which is the same address Jason was communicating with above. Chances are, if you want to remove your data then that's where to start.

There were almost 122M unique email addresses in this corpus and those have now been added to HIBP. Treat this as informational; I suspect that for most people, it won't bother them, whilst others will ask for their data not to be sold (regardless of where they live in the world). But in all likelihood, there will be more than a handful of domain subscribers who take issue with that volume of people data sitting there in one corpus easily downloadable via a clear web hacking forum. For example, mine was just one of many tens of thousands of Pfizer email addresses, and that sort of thing is going to raise the ire of some folks in corporate infosec capacities.

One last comment: there was a story published earlier this year titled Our Investigation of the Pure Incubation Ventures Leak, and in there they refer to "encrypted passwords" being present in the data. Many of the files do contain a column with bcrypt hashes (which is definitely not encryption), but given the way in which this data was collated, I can see no evidence whatsoever that these are password hashes. As such, I haven't listed "Passwords" as one of the compromised data classes in HIBP, and if you find yourself in this breach, I wouldn't be at all worried about this.

Weekly Update 425

2024-11-09 15:15:01


This was a much longer than usual update, largely due to the amount of time spent discussing the Earth 2 incident. As I said in the video (many times!), the amount of attention this has garnered from both Earth 2 users and the company itself is incommensurate with the impact of the incident. It's a nothing-burger: email addresses and usernames, that's it, and of course, their association with the service, which may lead to some very targeted spam or phishing attempts. It's still a breach by any reasonable definition of the term, but it should have been succinctly summarised and disclosed to impacted parties, with everyone moving on to more important things in life a few moments later. And that's exactly what I'm going to do right now 😊


References

  1. Sponsored by: Report URI: Guarding you from rogue JavaScript! Don’t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. Speaking of giving a nothing-burger incident more attention than it deserves, the Earth 2 Twitter screed hasn't done them any favours (something something Streisand effect)
  3. Data breach disclosure 101: How to succeed after you've failed (7 years on, this is still the guidance I give breached orgs)

Weekly Update 424

2024-11-03 15:33:01


I have really clear memories of listening to the Stack Overflow podcast in the late 2000s and hearing Jeff and Joel talk about the various challenges they were facing and the things they did to overcome them. I suddenly thought of that when realising how long this week's video ran for, with no real plan other than to talk about our HIBP backlog. People seem to love this in the same way I loved listening to those guys a decade and a half ago. I'll do one of these with Stefán as well over the course of this month; let us know what you'd like to hear about 😊


References

  1. Sponsored by: 1Password Extended Access Management: Secure every sign-in for every app on every device.