2025-01-25 12:42:04
We're heading back to London! And making a trip to Reykjavik. And Dublin. I talked about us considering this in the video yesterday, and just before publishing this post, we pulled the trigger and booked the tickets. The plan is to pretty much repeat the US and Canada trip we did in September and spend the time meeting up with some of the law enforcement agencies and various other organisations we've been working with over the years. As I say in the video, if you're in one of these locations and are in a position to stand up a meetup or user group session, I'd love to hear from you. Europe is a hell of a long way to go, so we do want to make the most of the travel. Stand by for more plans as they emerge.
2025-01-23 11:14:52
It's hard to find a good criminal these days. I mean a really trustworthy one you can be confident won't lead you up the garden path with false promises of data breaches. Like this guy yesterday:
For my international friends, JB Hi-Fi is a massive electronics retailer down under and they have my data! I mean by design, because I've bought a bunch of stuff from them, so I was curious not just about my own data but also because a breach of 12 million plus people would be massive in a country of not much more than double that. So, I dropped the guy a message and asked if he'd be willing to help me verify the incident by sharing my own record. I didn't want to post any public commentary about this incident until I had a reasonable degree of confidence it was legit, especially given how much impact it could have in my very own backyard.
Now, I wouldn't normally share a private conversation with another party, but when someone sets out to scam people, that rule goes out the window as far as I'm concerned. So here's where the conversation got interesting:
He guaranteed it for me! Sounds legit. But hey, everyone gets the benefit of the doubt until proven otherwise, so I started looking at the data. It turns out my own info wasn't in the full set, but he was happy to provide a few thousand sample records with 14 columns:
Pretty standard stuff, could be legit, let's check. I have a little PowerShell script I run against the HIBP API when a new alleged breach comes in and I want to get a really good sense of how unique it is. It simply loops through all the email addresses in a file, checks which breaches each one has been in and keeps track of the percentage that have been seen before. A unique breach will have anywhere from about 40% to 80% previously seen addresses, but this one had, well, more:
Spot the trend? Every single address has one breach in common. Hmmm... wonder what the guy has to say about that?
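The PowerShell script isn't published here, so what follows is a minimal Python sketch of the same uniqueness check rather than the script itself. It assumes the lookups have already been done and the results collected into a dictionary that maps each email address to the breach names it appeared in. In practice, each lookup would be a call to the HIBP API, which requires an API key and is rate limited. The function names are hypothetical. The last function surfaces exactly the kind of trend described above, where one breach shows up against every single address.

```python
from collections import Counter
from typing import Dict, List, Set


def percent_previously_seen(results: Dict[str, List[str]]) -> float:
    """Percentage of addresses that appear in at least one known breach.

    `results` maps each email address to the breach names returned for it
    (an empty list means the address has never been seen before).
    """
    if not results:
        return 0.0
    seen = sum(1 for breaches in results.values() if breaches)
    return 100.0 * seen / len(results)


def breach_frequency(results: Dict[str, List[str]]) -> Counter:
    """Count how many addresses each breach appears against."""
    counts: Counter = Counter()
    for breaches in results.values():
        counts.update(set(breaches))  # set() so each address counts once per breach
    return counts


def breaches_common_to_all(results: Dict[str, List[str]]) -> Set[str]:
    """Breaches that every single address in the sample has appeared in."""
    counts = breach_frequency(results)
    return {name for name, n in counts.items() if n == len(results)}


if __name__ == "__main__":
    # Made-up lookup results, purely for illustration.
    sample = {
        "alice@example.com": ["Dymocks", "ExampleBreach1"],
        "bob@example.com": ["Dymocks"],
        "carol@example.com": ["Dymocks", "ExampleBreach2"],
    }
    print(f"{percent_previously_seen(sample):.0f}% previously seen")
    print("Common to all:", breaches_common_to_all(sample))
```

For a genuinely new breach, you'd expect the first figure to land somewhere in that 40% to 80% range and the common set to be empty. A result of 100% with a single breach common to every address is a strong hint that the data has been seen before.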
But he was in the server! And he grabbed it from the dashboard of Shopify! Must be legit, unless... what if I compared it to the actual full breach of Dymocks? That's a local Aussie bookseller (so it would have a lot of Aussie-looking email addresses in it, just like JB Hi-Fi would), and their breach dated back to mid-2023. I keep breaches like that on hand for just such occasions, so let's compare the two:
Wow! What are the chances?! He's going to be so interested when he hears about this!
And that was it. The chat went silent and very shortly after, the listing was gone:
It looks like the bloke has also since been booted off the forum where he tried to run the scam so yeah, this one didn't work out great for him. That $16k would have been so tasty too!
I wrote this short post to highlight how important verification of data breach claims is. Obviously, I've seen loads of legitimate ones but I've also seen a lot of rubbish. Not usually this blatant where the party contacting me is making such demonstrably false claims about their own exploits, but very regularly from people who obtain something from another party and repeat the lie they've been told. This example also highlights how useful data from previous breaches is, even after the email addresses have been extracted and loaded into HIBP. Data is so often recycled and shipped around as something new; this was just a textbook perfect case of making use of a previous incident to disprove a new claim. Plus, it's kinda fun poking holes in a scamming criminal's claims 😊
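The Dymocks comparison essentially comes down to a set intersection on email addresses. Below is a minimal Python sketch of that idea. It assumes both the alleged sample and the earlier breach have already been reduced to lists of email addresses; real files would first need parsing out of whatever format they arrived in. `overlap_report` and `normalise` are hypothetical helpers, not part of HIBP.

```python
from typing import Iterable, Tuple


def normalise(email: str) -> str:
    # Lower-casing the whole address is a simplification: it treats
    # addresses case-insensitively for the purposes of the comparison.
    return email.strip().lower()


def overlap_report(alleged: Iterable[str], known: Iterable[str]) -> Tuple[int, int, float]:
    """Return (unique alleged addresses, how many are in the known breach, percentage)."""
    alleged_set = {normalise(e) for e in alleged if e.strip()}
    known_set = {normalise(e) for e in known if e.strip()}
    shared = alleged_set & known_set
    pct = 100.0 * len(shared) / len(alleged_set) if alleged_set else 0.0
    return len(alleged_set), len(shared), pct


if __name__ == "__main__":
    # Made-up addresses, purely for illustration.
    alleged_sample = ["Alice@example.com", "bob@example.com", "carol@example.com"]
    previous_breach = ["alice@example.com", "bob@example.com", "carol@example.com", "dave@example.com"]
    total, shared, pct = overlap_report(alleged_sample, previous_breach)
    print(f"{shared} of {total} alleged addresses ({pct:.0f}%) are in the earlier breach")
```

A genuinely new breach from a different retailer would share some addresses with an older one simply because people shop in more than one place. When nearly the whole sample turns up in a single earlier breach, however, the "new" data has almost certainly been recycled.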
2025-01-21 10:14:07
If I'm honest, I was in two minds about adding more stealer logs to HIBP. Even with the new feature to include the domains an email address appears against in the logs, my concern was that I'd get a barrage of "that's useless information" messages like I normally do when I load stealer logs! Instead, the feedback was resoundingly positive. This week I'm talking more about the logic behind this, some of the challenges we faced with it and what we might see in the future. Stay tuned, because I think we're going to be seeing a lot more of this in HIBP.
2025-01-14 03:48:31
TL;DR — Email addresses in stealer logs can now be queried in HIBP to discover which websites they've had credentials exposed against. Individuals can see this by verifying their address using the notification service and organisations monitoring domains can pull a list back via a new API.
Nasty stuff, stealer logs. I've written about them and loaded them into Have I Been Pwned (HIBP) before but just as a recap, we're talking about the logs created by malware running on infected machines. You know that game cheat you downloaded? Or that crack for the pirated software product? Or the video of your colleague doing something that sounded crazy but you thought you'd better download and run that executable program showing it just to be sure? Those are just a few of the ways you can end up with malware on your machine that then watches what you're doing and logs it, just like this:
These logs all came from the same person and each time the poor bloke visited a website and logged in, the malware snared the URL, his email address and his password. It's akin to a criminal looking over his shoulder and writing down the credentials for every service he's using, except rather than it being one shoulder-surfing bad guy, it's somewhat larger than that. We're talking about billions of records of stealer logs floating around, often published via Telegram where they're easily accessible to the masses. Check out Bitsight's piece titled Exfiltration over Telegram Bots: Skidding Infostealer Logs if you'd like to get into the weeds of how and why this happens. Or, for a really quick snapshot, here's an example that popped up on Telegram as I was writing this post:
As it relates to HIBP, stealer logs have always presented a bit of a paradox: they contain huge troves of personal information that by any reasonable measure constitute a data breach that victims would like to know about, but then what can they actually do about it? What are the websites listed against their email address? And what password was used? Reading the comments from the blog post in the first para, you can sense the frustration; people want more info and merely saying "your email address appeared in stealer logs" has left many feeling more frustrated than informed. I've been giving that a lot of thought over recent months and today, we're going to take a big step towards addressing that concern:
The domains an email address appears next to in stealer logs can now be returned to authorised users.
This means the guy with the Gmail address from the screen grab above can now see that his address has appeared against Amazon, Facebook and H&R Block. Further, his password is also searchable in Pwned Passwords so every piece of info we have from the stealer log is now accessible to him. Let me explain the mechanics of this:
Firstly, the volumes of data we're talking about are immense. In the case of the most recent corpus of data I was sent, there are hundreds of text files with well over 100GB of data and billions of rows. Filtering it all down, we ended up with 220 million unique rows of email address and domain pairs covering 69 million of the total 71 million email addresses in the data. The gap is explained by a combination of email addresses that appeared against invalidly formed domains and in some cases, addresses that only appeared with a password and not a domain. Criminals aren't exactly renowned for dumping perfectly formed data sets we can seamlessly work with, and I hope folks that fall into that few percent gap understand this limitation.
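Stealer log formats vary wildly and the actual processing pipeline isn't described here, so this is only a rough Python sketch of the filtering step. It assumes a hypothetical line format of `URL:email:password`. The parser takes the first colon-separated field after the URL that looks like an email address, so colons in the URL's scheme or port, or in the password, don't break the split. It then keeps the unique (email, domain) pairs. Lines with no URL, or with a host that doesn't look like a valid domain, are dropped, which mirrors the gap described above. All function names are illustrative.

```python
import re
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit

# Deliberately loose email check; excludes ':' and '/' so URL fragments don't match.
EMAIL_RE = re.compile(r"^[^@\s:/]+@[^@\s:/]+\.[^@\s:/]+$")
# One DNS label: 1-63 chars of a-z, 0-9 or '-', not starting or ending with '-'.
LABEL = r"(?!-)[a-z0-9-]{1,63}(?<!-)"
# At least one label plus an alphabetic TLD, so "localhost" and bare IPs are rejected.
DOMAIN_RE = re.compile(rf"^(?:{LABEL}\.)+[a-z]{{2,63}}$")


def extract_domain(url: str) -> Optional[str]:
    """Return the lower-cased host of `url` if it looks like a valid domain."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host when a scheme is present
    try:
        host = urlsplit(url).hostname
    except ValueError:  # e.g. malformed IPv6 brackets
        return None
    if host and DOMAIN_RE.match(host):
        return host
    return None


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Turn a hypothetical 'URL:email:password' line into an (email, domain) pair."""
    parts = line.strip().split(":")
    # Field 0 is always (part of) the URL, so start looking for the email at field 1.
    for i in range(1, len(parts)):
        candidate = parts[i].strip().lower()
        if EMAIL_RE.match(candidate):
            domain = extract_domain(":".join(parts[:i]).strip())
            return (candidate, domain) if domain else None
    return None  # no email after a URL, e.g. an address that only appears with a password


def unique_pairs(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Deduplicate (email, domain) pairs across all lines, dropping unparseable ones."""
    pairs: Set[Tuple[str, str]] = set()
    for line in lines:
        pair = parse_line(line)
        if pair is not None:
            pairs.add(pair)
    return pairs
```

At the scale described (well over 100GB), you'd stream files line by line into something sturdier than an in-memory set. The filtering logic would stay the same, though.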
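So, we now have 220 million records of email addresses against domains; how do we surface that information? Keeping in mind that "experimental" caveat in the title, the first decision we made is that it should only be accessible to the following parties: firstly, individuals who have verified they control the email address via the notification service, and secondly, organisations that have already proven control of the domain via domain search.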
At face value it might look like that first point deviates from the current model of just entering an email address on the front page of the site and getting back a result (and there are very good reasons why the service works this way). There are some important differences though, the first of which is that whilst your classic email address search on HIBP returns verified breaches of specific services, stealer logs contain a list of services that may never have been breached. It means we're talking about much larger numbers that build up far richer profiles; instead of a few breached services someone used, we're talking about potentially hundreds of them. Secondly, many of the services that appear next to email addresses in the stealer logs are precisely the sort of thing we flag as sensitive and hide from public view. There's a heap of Pornhub. There are health-related services. Religious ones. Political websites. There are a lot of services there that merely by association constitute sensitive information, and we just don't want to take the risk of showing that info to the masses.
The second point means that companies doing domain searches (for which they already need to prove control of the domain) can pull back the list of the websites people in their organisation have email addresses next to. When the company controls the domain, they also control the email addresses on that domain and, by extension, have the technical ability to view messages sent to those mailboxes. Whether they have policies prohibiting this is a different story but remember, your work email address is your work's email address! They can already see the services sending emails to their people, and in the case of stealer logs, this is likely to be enormously useful information as it relates to protecting the organisation. I ran a few big names through the data, and even I was shocked at the prevalence of corporate email addresses against services you wouldn't expect to be used in the workplace (then again, using the corp email address in places you definitely shouldn't be isn't exactly anything new). That in itself is an issue; then there's the question of whether these logs came from an infected corporate machine or from someone entering their work email address into their personal device.
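As a rough illustration of that access rule, here's a Python sketch of a per-address lookup that only answers for email addresses on a domain the caller has already verified. The in-memory `store`, the function name and the use of `PermissionError` are all assumptions made for illustration; this is not HIBP's implementation. For simplicity, the sketch requires an exact domain match, so each subdomain would need verifying separately.

```python
from typing import Dict, List, Set


def stealer_log_websites(
    store: Dict[str, Set[str]], email: str, verified_domains: Set[str]
) -> List[str]:
    """Return the website domains logged against `email`, but only for verified domains.

    `store` is a hypothetical in-memory map of email address -> website domains.
    """
    email = email.strip().lower()
    _, _, email_domain = email.rpartition("@")
    allowed = {d.strip().lower() for d in verified_domains}
    if email_domain not in allowed:
        # The caller hasn't proven control of this domain, so reveal nothing.
        raise PermissionError(f"{email_domain!r} is not a domain this caller has verified")
    return sorted(store.get(email, set()))
```

For example, an organisation that has verified `acme.example` could call `stealer_log_websites(store, "jane@acme.example", {"acme.example"})`. The same call for an address on any other domain would be refused.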
I started thinking more about what you can learn about an organisation's exposure in these logs, so I grabbed a well-known brand in the Fortune 500. Here are some of the highlights:
That said, let me emphasise a critical point:
This data is prepared and sold by criminals who provide zero guarantees as to its accuracy. The only guarantee is that the presence of an email address next to a domain is precisely what's in the stealer log; the owner of the address may never have actually visited the indicated website.
Stealer logs are not like typical data breaches, where a discrete incident leads to the dumping of a specific service's customer data. I know that the presence of my personal email address in the LinkedIn and Dropbox data breaches, for example, is a near-ironclad indication that those services exposed my data. Stealer logs don't provide that guarantee, so please understand this when reviewing the data.
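The way we've decided to implement these two use cases differs: individuals see the domains on the website after verifying their address via the notification service, whilst domain owners retrieve them via an API.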
We'll make the individual searches cleaner in the near future as part of the rebrand I've recently been talking about. For now, here's what it looks like:
Because of the recirculation of many stealer logs, we're not tracking which domains appeared against which breaches in HIBP. Depending on how this experiment with stealer logs goes, we'll likely add more in the future (and fill in the domain data for existing stealer logs in HIBP), but additional domains will only appear in the screen above if they haven't already been seen.
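HIBP's storage model isn't described beyond this, so the following is a hedged Python sketch that assumes a simple in-memory map from each email address to the set of domains seen against it. It captures the two properties above. First, there's no record of which stealer log a domain came from. Second, a new load only adds domains that haven't already been seen for that address. The function name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def merge_new_pairs(
    store: Dict[str, Set[str]], incoming: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Merge (email, domain) pairs into `store` and return only the previously unseen ones.

    Nothing records which corpus a domain came from; a domain already
    seen against an address is simply skipped.
    """
    added: Dict[str, Set[str]] = {}
    for email, domain in incoming:
        known = store.setdefault(email, set())
        if domain not in known:
            known.add(domain)
            added.setdefault(email, set()).add(domain)
    return added
```

Returning only the additions makes it easy to see how much genuinely new material a recirculated corpus actually contributes.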
We've done the searches by domain owners via API as we're talking about potentially huge volumes of data that really don't scale well to the browser experience. Imagine a company with tens or hundreds of thousands of breached addresses, and then a whole heap of those addresses having a bunch of stealer log entries against them. Further, putting this behind a per-email address API rather than automatically showing it on domain search means it's easy for an org to not see these results, which I suspect some will elect to do for privacy reasons. The API approach was the easiest while we explore this service, and we can build on it based on feedback. I mentioned this was experimental, right? For now, it looks like this:
Lastly, there's another opportunity altogether that loading stealer logs in this fashion opens up, and the penny dropped when I loaded that last one mentioned earlier. I was contacted by a couple of different organisations that explained how around the time the data I'd loaded was circulating, they were seeing an uptick in account takeovers "and the attackers were getting the password right first go every time!" Using HIBP to try and understand where impacted customers might have been exposed, they posited that it was possible the same stealer logs I had were being used by criminals to extract every account that had logged onto their service. So, we started delving into the data and sure enough, all the other email addresses against their domain aligned with customers who were suffering from account takeover. We now have that data in HIBP, and it would be technically feasible to provide this to domain owners so that they can get an early heads up on which of their customers they probably have to rotate credentials for. I love the idea as it's a great preventative measure; perhaps that will be our next experiment.
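That idea is still only an idea, so this is a purely hypothetical Python sketch of how the lookup could work. The approach inverts the (email, website domain) pairs into an index keyed by website, so that a service owner's query becomes a single lookup returning every address seen against their site. All names here are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_website_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert (email, website domain) pairs into website domain -> email addresses."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for email, website in pairs:
        index[website.lower()].add(email.lower())
    return index


def customers_to_notify(index: Dict[str, Set[str]], website: str) -> List[str]:
    """Addresses seen against `website` in stealer logs, i.e. likely credential resets."""
    # .get() avoids creating an empty entry in the defaultdict for unknown sites.
    return sorted(index.get(website.lower(), set()))
```

Building the index once up front trades memory for fast lookups. That trade-off suits the scenario described, where an owner wants the whole list for their site at once rather than checking addresses one by one.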
Onto the passwords and as mentioned earlier, these have all been extracted and added to the existing Pwned Passwords service. This service remains totally free and open source (both code and data), has a really cool anonymity model allowing you to hit the API without disclosing the password being searched for, and has become absolutely MASSIVE!
I thought that doing more than 10 billion requests a month was cool, but look at that data transfer: more than a quarter of a petabyte just last month! And it's in use at some pretty big name sites as well:
That's just the sites where the API is implemented client-side and we can identify the source of the requests via the referrer header. Most implementations are done server-side, and by design, we have absolutely no idea who those folks are. Shoutout to Cloudflare while we're here for continuing to provide the service behind this for free to help make a more secure web.
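For anyone who hasn't used the anonymity model, it works like this. The client hashes the password with SHA-1 and sends only the first five hex characters of the hash to the range endpoint at `https://api.pwnedpasswords.com/range/{prefix}`. The response lists every hash suffix sharing that prefix as `SUFFIX:COUNT` lines, and the client looks for its own suffix locally, so the full hash never leaves the machine. Here's a minimal Python sketch of that flow using only the standard library. The helper names and the User-Agent string are illustrative choices rather than anything the service prescribes.

```python
import hashlib
import urllib.request
from typing import Tuple

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str) -> Tuple[str, str]:
    """SHA-1 the password and split it into the 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body: str, suffix: str) -> int:
    """Find our suffix in a 'SUFFIX:COUNT' response body; 0 means not found."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range(prefix: str) -> str:
    """Request every suffix for a prefix; only the prefix is ever sent."""
    req = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "pwned-passwords-sketch"},  # illustrative UA string
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def pwned_count(password: str) -> int:
    """How many times the password appears in Pwned Passwords (makes a live request)."""
    prefix, suffix = split_hash(password)
    return count_in_range(fetch_range(prefix), suffix)
```

Only `pwned_count` touches the network. Keeping the hashing and the response parsing separate means they can be checked offline, and it also makes clear that the suffix comparison happens entirely on the client.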
In terms of the passwords in this latest stealer log corpus, we found 167 million unique ones, of which only 61 million were already in HIBP. That's a massive number, so we did some checks, and whilst there's always a bit of junk in these data sets (remember - criminals and formatting!) there's also a heap of new stuff. For example:
And about 106M other non-kangaroo themed passwords. Admittedly, we did start to get a bit preoccupied looking at some of the creative ways people were creating previously unseen passwords:
And here's something especially ironic: check out these stealer log entries:
People have been checking these passwords on HIBP's service whilst infected with malware that logged the search! None of those passwords were in HIBP... but they all are now 🙂
Want to see something equally ironic? People using my Hack Yourself First website to learn about secure coding practices have also been infected with malware and ended up in stealer logs:
So, that's the experiment we're trying with stealer logs, and that's how to see the websites exposed against an email address. Just one final comment as it comes up every single time we load data like this:
We cannot manually provide data on a per-individual basis.
Hopefully, there's less need to now given the new feature outlined above, and I hope the massive burden of looking up individual records when there are 71 million people impacted is evident. Do leave your comments below and help us improve this feature to become as useful as we can possibly make it.
2025-01-13 06:59:48
This week I'm giving a little teaser as to what's coming with stealer logs in HIBP and in about 24 hours from the time of writing, you'll be able to see the whole thing in action. This has been a huge amount of work trawling through vast volumes of data and trying to make it usable by the masses, but I think what we're launching tomorrow will be awesome. Along with a new feature around these stealer logs, we've also added a huge number of never-before-seen passwords to Pwned Passwords. Now, for the first time ever, "fuckkangaroos" will be flagged by any websites using the service 😮 More awesome examples coming in tomorrow's blog post, stay tuned!
2025-01-06 13:20:31
It sounds easy - "just verify people's age before they access the service" - but whether we're talking about porn in the US or Australia's incoming social media laws, the reality is way more complex than that. There's no unified approach across jurisdictions, and even within a single country like Australia, the closest we've got to one is a government scheme usually intended for accessing public services. And even if there were a technically workable model, who wants to get either the gov or some big tech firm involved in their use of Instagram or Pornhub?! There's social acceptance to consider and not only that, circumvention of age controls is very easy when you can simply VPN into another jurisdiction and access the same website blocked in your locale. Or in the case of the adult material, I'm told (🤷‍♂️) there are many other legally operating websites in other parts of the world that are less inclined to block individuals in specific states from foreign countries. There'll be no easy solutions for this one, but it'll make for an entertaining year 😊