A data breach investigation blow-by-blow

January 10, 2017, 12:11 am

Sponsored by: Barkly - Worried about ransomware in 2017? Learn how runtime malware defense blocks ransomware before it does harm.

Someone has just sent me a data breach. I could go and process the whole thing, attribute it to a source, load it into Have I been pwned (HIBP) then communicate the end result, but I thought it would be more interesting to readers if I took you through the whole process of verifying the legitimacy of the data and pinpointing the source. This is exactly the process I go through, unedited and at the time of writing, with a completely unknown outcome.

Warning: This one is allegedly an adult website and you're going to see terms and concepts related to exactly the sort of thing you'd expect from a site like that. I'm not going to censor words or links other than for privacy purposes; this is exactly what I go through during the verification process so you're going to get the whole thing, blow-by-blow.

The file I've been sent is a 120MB zip called "Eroticy.com_June_2015.sql.zip". The sender of the file has said it's from eroticy.com and that's all I have to go on. I extract the file and find an 841MB MySQL script, which I open in Sublime Text (it's pretty good at reading massive files):

Ok, so a MySQL script file, fairly typical. I take a quick spin through and it's the usual trove of insert statements. I want a quick count of just how many email addresses are in the thing, though, so I point a little app I wrote at it to extract them all (it does some basic parsing and other checks):

That's a sizeable result: nearly 1.4M unique email addresses (a bunch of them appear multiple times, based on the output above). If it was small (say, under 100k records), I might not have bothered and would have moved on to something more sizeable. It's pretty much the same effort for me regardless of size, and bigger breaches impact more people, so this helps me prioritise.
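The extraction app itself isn't shown here. As a rough illustration of the same idea, here's a minimal Python sketch that pulls anything email-shaped out of a SQL dump line by line and counts total versus unique (case-insensitive) addresses. The regex is a simplifying assumption rather than a full RFC 5322 parser, and `count_emails` is a hypothetical name:

```python
import re
from collections import Counter

# Deliberately simple, email-shaped pattern (an approximation, not RFC 5322).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def count_emails(lines):
    """Return (total matches, Counter of lowercased addresses) for an iterable of lines."""
    counts = Counter()
    for line in lines:
        for address in EMAIL_RE.findall(line):
            counts[address.lower()] += 1
    return sum(counts.values()), counts


sample_dump = [
    "INSERT INTO emails VALUES (1,'Alice@example.com','pw1');",
    "INSERT INTO emails VALUES (2,'alice@example.com','pw2');",
    "INSERT INTO emails VALUES (3,'bob@example.org','pw3');",
]
total, unique = count_emails(sample_dump)
print(total, len(unique))  # 3 matches, 2 unique addresses
```

Because iterating an open file yields one line at a time, the same function can be handed a file object opened on the extracted script (for example `open("dump.sql", encoding="utf-8", errors="replace")`, where `dump.sql` is a placeholder name), which keeps memory use flat even for an 841MB file.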

Time to check out the site, VPN and Incognito browser first thank you very much (probably make sure the kids aren't around too...):

Alright, so this is a bit volatile because it's not just a porn site; it's dealing with fantasies too, which is really personal stuff. In fact, the site has redirected to dating.eroticy.com, so it looks like it's designed to facilitate physical encounters. Let's see if I can easily tie the data to this site.

I go to the exported email addresses and grab a random Mailinator email address. I've written before about why these are useful and it's pretty much the first thing I do these days. Then it's off to the password reset form, but I begin by just fat-fingering the keyboard:

I want to see if the site has an enumeration risk which will confirm the address doesn't exist. Instead, the site responds as it should respond:

But by using a Mailinator address which the creator definitely knows is a public mailbox, I can then see if the mail is actually delivered. I submit it then check Mailinator:

Huh, no email, but there's a definite pattern to the types of email the address has been receiving (although we all get plenty of porn spam). I try a few others and still no email. Normally about now I'd be seeing password reset emails and that would be enough to be pretty confident it's going to be a legit breach.

Let's try another enumeration vector:

Most systems will tell you if the address already exists during registration. But here's what Eroticy does:

So either the account doesn't exist in the system or they're actively avoiding enumeration via registration. Checking Mailinator should tell me:

This is getting interesting. Looks like the account was created just fine, which tells me that this address didn't already exist in their system. I need to take a much closer look at the raw data because, as it stands, as real as the data itself looks, no practical evidence suggests it came from Eroticy.

I do a bit of searching around too, including on vigilante.pw which has a pretty comprehensive list of alleged breaches. Eroticy is there:

I find references in various other shady corners of the web too, so at the very least, it's data that's been redistributed and branded as a breach. But none of it gives me any confidence that the data was actually sourced from Eroticy, which is pretty much the point of going through this exercise.

I want to be able to examine the data more thoroughly so I fire up a VM running MySQL and import the entire thing. This will make it easier to examine the schema as well as query the data.

While I'm waiting for the data to import, I look more closely at the raw statements within Sublime. I'm seeing a lot of URLs represented in there such as http://cartooncopulations.com/index.php/ft_15666_A_8a2b3c0c52d284c0ad8b604ec820cfac/ccoconsole1.html and http://monstrousmelons.com/index.php/ps_16568_A_6219246bcd03eda402719b3a68f3389b/main.html and a bunch of other far more explicitly titled domain names. The same domains appear many times over in a column called "ref" next to a date in a field called "day". Thing is, the dates are frequently in the 2004 era and a bunch of the domains are either dead or link to the same site with different branding:

The sites aren't explicit either - there's nothing you won't see on most beaches there - and they seem to be there primarily to drive traffic to other places.
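Tallying which hosts appear in the "ref" column is one quick way to see how concentrated those referrers are. A minimal sketch using Python's standard `urllib.parse`; the first two URLs are the ones quoted above, the third is a hypothetical extra row, and `tally_ref_domains` is a made-up helper name:

```python
from collections import Counter
from urllib.parse import urlparse


def tally_ref_domains(refs):
    """Count how often each host appears in a list of referrer URLs."""
    hosts = Counter()
    for ref in refs:
        host = urlparse(ref).hostname  # None when the value has no host part
        if host:
            hosts[host.lower()] += 1
    return hosts


refs = [
    "http://cartooncopulations.com/index.php/ft_15666_A_8a2b3c0c52d284c0ad8b604ec820cfac/ccoconsole1.html",
    "http://monstrousmelons.com/index.php/ps_16568_A_6219246bcd03eda402719b3a68f3389b/main.html",
    "http://cartooncopulations.com/index.php/another/page.html",  # hypothetical row
    "",  # empty ref value
]
for host, n in tally_ref_domains(refs).most_common():
    print(host, n)
```

Grouping by host rather than full URL matters here because the paths carry per-affiliate identifiers, so the same site would otherwise show up as many distinct values.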

The MySQL data is still loading so I look around the raw statements a little more. I find a table called EpochTransStats:

CREATE TABLE IF NOT EXISTS EpochTransStats (
  ets_transaction_id int(11) NOT NULL DEFAULT '0' ,
  ets_member_idx int(11) NOT NULL DEFAULT '0' ,
  ets_transaction_date datetime ,
  ets_transaction_type char(1) NOT NULL DEFAULT '' ,
  ets_co_code varchar(6) NOT NULL DEFAULT '' ,
  ets_pi_code varchar(32) NOT NULL DEFAULT '' ,
  ets_reseller_code varchar(64) DEFAULT 'a' ,
  ets_transaction_amount decimal(10,2) NOT NULL DEFAULT '0.00' ,
  ets_payment_type char(1) DEFAULT 'A' ,
  ets_pst_type char(3) NOT NULL DEFAULT '' ,
  ets_username varchar(32) ,
  ets_password varchar(32) ,
  ets_email varchar(64) ,
  ets_ref_trans_ids int(11) ,
  ets_password_expire varchar(20) ,
  ets_country char(2) NOT NULL DEFAULT '' ,
  ets_state char(2) NOT NULL DEFAULT '' ,
  ets_postalcode varchar(32) NOT NULL DEFAULT '' ,
  ets_city varchar(64) NOT NULL DEFAULT '' ,
  ets_street varchar(80) NOT NULL DEFAULT '' ,
  ets_ipaddr varchar(16) NOT NULL DEFAULT '' ,
  ets_firstname varchar(32) NOT NULL DEFAULT '' ,
  ets_lastname varchar(32) NOT NULL DEFAULT '' ,
  ets_user1 varchar(32) NOT NULL DEFAULT '' ,
  PRIMARY KEY (ets_transaction_id),
   KEY idx_reseller (ets_reseller_code),
   KEY idx_product (ets_pi_code),
   KEY idx_transdate (ets_transaction_date),
   KEY idx_type (ets_transaction_type)
);

It piques my interest because "epoch" seems like an unusual name in this context, so I give it a Google. That leads me to a page on a GitHub repository called Elite-Adult-Affiliate-Program. This is interesting because stuff is starting to line up: an adult website, a bunch of links to other sites and a schema containing a name that's represented in a project that seems to support adult site affiliate links.

There's another table in the dump called "console_links" so I search the GitHub repository for that but come up empty. But then I search for "EpochTransStats" and find a file called affiliateprogram.sql. There's a create statement in there for the EpochTransStats table and it has a heap of columns in common with the one in the dump:

CREATE TABLE `EpochTransStats` (
  `ets_transaction_id` int(11) NOT NULL default '0',
  `ets_member_idx` int(11) NOT NULL default '0',
  `ets_transaction_date` datetime default NULL,
  `ets_transaction_type` char(1) NOT NULL default '',
  `ets_co_code` varchar(6) NOT NULL default '',
  `ets_pi_code` varchar(32) NOT NULL default '',
  `ets_reseller_code` varchar(64) default 'a',
  `ets_transaction_amount` decimal(10,2) NOT NULL default '0.00',
  `ets_payment_type` char(1) default 'A',
  `ets_username` varchar(32) default NULL,
  `ets_password` varchar(32) default NULL,
  `ets_email` varchar(64) default NULL,
  `ets_ref_trans_ids` int(11) default NULL,
  `ets_password_expire` varchar(20) default NULL,
  PRIMARY KEY  (`ets_transaction_id`),
  KEY `idx_reseller` (`ets_reseller_code`),
  KEY `idx_product` (`ets_pi_code`),
  KEY `idx_transdate` (`ets_transaction_date`),
  KEY `idx_type` (`ets_transaction_type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

The main difference is that the GitHub version doesn't have ets_country, ets_state and a few other columns that appear in the breach data. But still, there's way too much overlap to be coincidental; there's a common origin there somewhere.
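Eyeballing two schemas works, but a set difference of the column names makes the overlap explicit. A minimal Python sketch, assuming the regex only needs to recognise the handful of MySQL types these two statements use (it is not a general SQL parser); the column lines are copied from the two CREATE TABLE statements above:

```python
import re

# Matches a column definition: an optionally backticked name followed by one of
# the MySQL types used in these two statements. PRIMARY KEY / KEY lines don't match.
COLUMN_RE = re.compile(
    r"^\s*`?([A-Za-z_]\w*)`?\s+(?:int|char|varchar|datetime|decimal)\b",
    re.IGNORECASE | re.MULTILINE,
)


def column_names(ddl):
    """Return the set of column names defined in a CREATE TABLE body."""
    return set(COLUMN_RE.findall(ddl))


# Column lines from the EpochTransStats table in the breach dump.
DUMP_DDL = """
  ets_transaction_id int(11) NOT NULL DEFAULT '0' ,
  ets_member_idx int(11) NOT NULL DEFAULT '0' ,
  ets_transaction_date datetime ,
  ets_transaction_type char(1) NOT NULL DEFAULT '' ,
  ets_co_code varchar(6) NOT NULL DEFAULT '' ,
  ets_pi_code varchar(32) NOT NULL DEFAULT '' ,
  ets_reseller_code varchar(64) DEFAULT 'a' ,
  ets_transaction_amount decimal(10,2) NOT NULL DEFAULT '0.00' ,
  ets_payment_type char(1) DEFAULT 'A' ,
  ets_pst_type char(3) NOT NULL DEFAULT '' ,
  ets_username varchar(32) ,
  ets_password varchar(32) ,
  ets_email varchar(64) ,
  ets_ref_trans_ids int(11) ,
  ets_password_expire varchar(20) ,
  ets_country char(2) NOT NULL DEFAULT '' ,
  ets_state char(2) NOT NULL DEFAULT '' ,
  ets_postalcode varchar(32) NOT NULL DEFAULT '' ,
  ets_city varchar(64) NOT NULL DEFAULT '' ,
  ets_street varchar(80) NOT NULL DEFAULT '' ,
  ets_ipaddr varchar(16) NOT NULL DEFAULT '' ,
  ets_firstname varchar(32) NOT NULL DEFAULT '' ,
  ets_lastname varchar(32) NOT NULL DEFAULT '' ,
  ets_user1 varchar(32) NOT NULL DEFAULT '' ,
"""

# Column lines from affiliateprogram.sql in the GitHub repository.
GITHUB_DDL = """
  `ets_transaction_id` int(11) NOT NULL default '0',
  `ets_member_idx` int(11) NOT NULL default '0',
  `ets_transaction_date` datetime default NULL,
  `ets_transaction_type` char(1) NOT NULL default '',
  `ets_co_code` varchar(6) NOT NULL default '',
  `ets_pi_code` varchar(32) NOT NULL default '',
  `ets_reseller_code` varchar(64) default 'a',
  `ets_transaction_amount` decimal(10,2) NOT NULL default '0.00',
  `ets_payment_type` char(1) default 'A',
  `ets_username` varchar(32) default NULL,
  `ets_password` varchar(32) default NULL,
  `ets_email` varchar(64) default NULL,
  `ets_ref_trans_ids` int(11) default NULL,
  `ets_password_expire` varchar(20) default NULL,
"""

dump_cols, github_cols = column_names(DUMP_DDL), column_names(GITHUB_DDL)
print("Shared:", len(dump_cols & github_cols))
print("Only in the dump:", sorted(dump_cols - github_cols))
print("Only on GitHub:", sorted(github_cols - dump_cols))
```

Run against these two statements, every one of the 14 GitHub columns also appears in the dump, and the dump adds ten more (the location, name and IP fields plus ets_pst_type and ets_user1).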

The data is finally up in MySQL, let's check out the schema inspector:

My eye is immediately drawn to the big one being the "emails" table with over 1.5M records so I check that out first:

There are a few interesting things here:

A lot of redundancy, with the same email appearing multiple times over
The password is different for the same email (e.g. rows 380, 381, 382)...
...but these almost certainly aren't all user passwords; there are too many good ones!
The good passwords look system generated, but there are a heap of others which are clearly user generated
The dates are very old - does this seriously go back as far as 14+ years?!
The "ref" column is interesting, referrer from other sites, perhaps?

Some of these questions can be answered pretty emphatically, so I start querying the data:

The earliest records are from 9 May 2002
The latest ones are from 31 Dec 2014 (that's a very wide range for a data breach)
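Pinning down the range is a MIN/MAX query, and a per-year breakdown shows whether records cluster around particular periods. A sketch, again against hypothetical rows in SQLite and assuming "day" holds ISO-8601 dates (which sort correctly as text); in MySQL, `YEAR(day)` would be the more natural grouping expression:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the imported table; only the "day" column matters here.
conn.execute("CREATE TABLE emails (email TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [
        ("a@example.com", "2002-05-09"),
        ("b@example.com", "2004-01-15"),
        ("c@example.com", "2004-08-30"),
        ("d@example.com", "2014-12-31"),
    ],
)

# Overall range: ISO-8601 strings compare correctly as text.
earliest, latest = conn.execute("SELECT MIN(day), MAX(day) FROM emails").fetchone()

# Records per year, to see whether they cluster around particular periods.
per_year = conn.execute(
    "SELECT substr(day, 1, 4), COUNT(*) FROM emails GROUP BY substr(day, 1, 4) ORDER BY 1"
).fetchall()

print(earliest, latest)
for year, n in per_year:
    print(year, n)
```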

I want to start actually reaching out to some of the people in this incident (which is always fun given the nature of the data...) so I start copying the email addresses I extracted earlier up into HIBP (this won't make them searchable on the site, it will merely give me the ability to query them). While that's running, I decide the "members" table is another particularly interesting one because it may actually start to point to individuals in the incident. Here's what I find in there:

This gets more interesting because it looks like these folks have paid money for a service. We're seeing transaction numbers and payment processors here including iBill which Wikipedia describes as follows:

a top credit card transaction aggregator for adult entertainment websites

That article goes on to say that iBill was incorporated into another company and then changed names more than a decade ago, but a little further across in the set of columns, beyond what I screen-capped above, was another "day" column with dates as recent as the last day of 2014. Some of these were processed by iBill. It turns out that iBill is a smaller player in all this, though, as I discover once I aggregate the processor column:

epoch 199,414
ccbill 40,684
wts 26,250
jettis 3,114
ibill 2,536
itrans 1,088
electracash 356
2014charge 51
psw 19
wsb 18
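To put "lion's share" into numbers, the counts above can be turned into percentages. This uses only the figures in the table:

```python
# Transaction counts per processor, copied from the aggregation above.
counts = {
    "epoch": 199_414,
    "ccbill": 40_684,
    "wts": 26_250,
    "jettis": 3_114,
    "ibill": 2_536,
    "itrans": 1_088,
    "electracash": 356,
    "2014charge": 51,
    "psw": 19,
    "wsb": 18,
}
total = sum(counts.values())
print(f"{'total':<12} {total:>8,}")
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {n:>8,} {n / total:7.1%}")
```

That works out to Epoch handling roughly 72.9% of the 273,530 transactions, with iBill at about 0.9%.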

Epoch has the lion's share of transactions and that's a potential avenue I could now go down to trace the source of the payments (I could always reach out to them). At present, though, I'm still no closer to working out where the data actually came from, so I try a few queries:

select * from members where email like '%+%'

People sometimes use the "+" syntax within email address aliases to identify the site they're using, for example test+eroticy@example.com, which would imply the source of the data. There are zero results though, which could be because some websites block the "+" symbol (don't do this guys!) but also because, in my analysis of other breaches, I find there's usually only about 0.03% of accounts using this syntax.
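Outside the database, spotting a tag is just a matter of splitting the local part of the address at the first "+". A minimal sketch with a hypothetical `plus_tag` helper; whether a given mail provider actually delivers tagged addresses varies, so a tag is a hint about intent, not proof of a working mailbox:

```python
def plus_tag(email):
    """Return the sub-address tag from local+tag@domain, or None if there isn't one."""
    local, sep, _domain = email.rpartition("@")
    if not sep:
        return None  # no "@", so not an email address
    _base, plus, tag = local.partition("+")
    return tag if plus and tag else None


addresses = ["test+eroticy@example.com", "alice@example.com", "bob+@example.org"]
print([plus_tag(a) for a in addresses])  # ['eroticy', None, None]
```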

I try another query:

select password, count(*) from members group by password order by count(*) desc

What I'm trying to do here is see if there's a commonly used password that might indicate the source (other than the usual generically crap ones). Here are the 50 most popular and their respective counts:

123456 1,525
password 815
pussy 301
dragon 296
12345678 282
football 236
fuckme 234
696969 222
qwerty 202
baseball 200
12345 199
1234 196
shadow 159
111111 156
master 156
letmein 156
superman 153
abc123 152
monkey 145
mustang 141
jordan 131
jessica 119
1234567 119
fuckyou 117
Harley 116
michael 116
hunter 113
buster 111
thomas 110
ranger 102
killer 102
FUCK 101
jennifer 99
junior 97
andrew 94
asshole 94
666666 94
tigger 94
joshua 93
batman 92
ashley 91
freedom 91
123456789 91
amanda 86
soccer 86
bigdog 85
h00ters 84
ginger 82
sunshine 82
bandit 82

These are just the usual crap ones though, with a bias towards what you'd expect on a porn site. I'm not seeing anything here that jumps out and indicates a potential source. Seeing a password such as "1234", though, indicates really weak password criteria wherever it came from, which is often a bit of an indicator of age (we had more tolerance for bad passwords years ago).
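The SQL query above does the grouping inside MySQL; the equivalent tally in Python is a `Counter`. This sketch (hypothetical `password_stats` helper and sample values) also reports the shortest password seen, which is only a rough hint at the minimum length the source system accepted, not proof of its policy:

```python
from collections import Counter


def password_stats(passwords, top_n=50):
    """Return the top_n most common passwords and the shortest length seen."""
    counts = Counter(p for p in passwords if p)  # skip empty values
    shortest = min((len(p) for p in counts), default=0)
    return counts.most_common(top_n), shortest


sample = ["123456", "password", "123456", "1234", "dragon", "123456", ""]
top, shortest = password_stats(sample, top_n=3)
print(top)
print(shortest)
```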

The email addresses are now in HIBP (not searchable in there, rather just sitting somewhere I can privately query them), and I find that of my 934k verified subscribers, 512 of them are in the addresses I extracted. I email the 30 most recent subscribers, the oldest having signed up 6 weeks earlier:

Hi, I’m emailing you as someone who has recently subscribed to the service I run, “Have I been pwned?”

Your email address has appeared in a new data breach I’ve been handed and I’m after your support to help verify whether it is legitimate or not. I’d like to be confident it’s not a fake before I load the data and people such as yourself receive notifications.

If you’re willing to assist, I’ll send you further information on the incident and include a small snippet of your (allegedly) breached record, enough for you to verify if it’s accurate. Is this something you’re willing to help with?

For verification, I’m on the about page of the site: https://haveibeenpwned.com/About
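Picking who to email boils down to a set intersection followed by a sort on signup date. A minimal sketch; the dict of subscriber signup dates, the sample data and the function name are hypothetical stand-ins, not how HIBP actually stores subscribers:

```python
from datetime import date


def most_recent_matches(subscribers, breached_emails, limit=30):
    """Return up to `limit` subscriber emails found in the breach, newest signups first.

    subscribers: dict mapping email -> signup date (hypothetical structure).
    breached_emails: iterable of addresses extracted from the dump.
    """
    breached = {e.lower() for e in breached_emails}
    matches = [(signed_up, email) for email, signed_up in subscribers.items()
               if email.lower() in breached]
    matches.sort(reverse=True)  # newest signup date first
    return [email for _, email in matches[:limit]]


subscribers = {
    "a@example.com": date(2016, 11, 20),
    "b@example.com": date(2016, 12, 28),
    "c@example.com": date(2016, 12, 1),
    "d@example.com": date(2016, 12, 30),
}
breached = ["B@example.com", "c@example.com", "d@example.com", "z@example.com"]
print(most_recent_matches(subscribers, breached, limit=2))
```

Preferring recent subscribers makes sense because they're more likely to still be reading that inbox and to remember signing up.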

It'll take a while for people to start replying so I keep digging. The "payouts" table seems like an interesting name for an adult website:

Seeing a column called "revshare" is making me think about the affiliate situation again as is the table name itself. Is this one of those deals where someone gets cash for enticing others to signup to a service? A "payout" based on "revenue share", perhaps?

A few responses from HIBP subscribers come in and I fire them back different variations of this:

This relates to a data breach which has allegedly come from the adult website known as “Eroticy”. However, I don’t believe that’s the actual source of the incident due to various indicators in the data itself. I’m hoping HIBP subscribers can help me work out the actual source. Your email address is in the data; here’s what I can tell you about it and I’d appreciate your feedback on its accuracy:

1. Your record says it was created on [redacted], however the data may be a decade older
2. There’s a password next to it of “[redacted]” (obviously I’ve obfuscated some characters here)
3. There is a “ref” column which I believe is for “referrer” and it has this value in it: [redacted] (the URL no longer has an active site on it)
4. There is an IP address of [redacted] which puts it in New York City: https://db-ip.com/[redacted]
5. There’s a username field of “[redacted]”
6. There’s a name field with “[redacted]” in it

I know this is possibly more than a decade old, but does any of this look familiar? My suspicion is that a lot of the data could relate to affiliate programs within the adult entertainment space, if there’s anything you can share that might help me track down the source of this, it would be most appreciated.
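The password in that email is partially obfuscated. One simple way to do that is to keep a couple of characters at each end and mask the rest; the sketch below keeps the first two and the last one, an arbitrary choice rather than the exact scheme used here, and `mask` is a hypothetical helper:

```python
def mask(secret, keep_start=2, keep_end=1, fill="*"):
    """Obfuscate the middle of a string, keeping a few characters at each end."""
    if len(secret) <= keep_start + keep_end:
        return fill * len(secret)  # too short to reveal anything safely
    middle = len(secret) - keep_start - keep_end
    # Slice from len - keep_end so keep_end=0 correctly keeps nothing at the end.
    return secret[:keep_start] + fill * middle + secret[len(secret) - keep_end:]


print(mask("letmein"))  # le****n
```

Revealing only the ends is usually enough for someone to recognise a password they used, without handing the full value to whoever might read the email.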

One of the early responses is from a woman, which is pretty unusual as far as adult websites go. Her "ref" column has the site gangbangedgirls.com in it and I hesitate, momentarily, before sending her the info. What if she didn't actually sign up to it herself and I'm sending her what then appears to be an unsolicited link to a hardcore porn site? I figure I can always send her a screen cap of her record later on to clear myself if need be but, admittedly, I did worry about how she'd react.

The responses from subscribers start coming in:

I think I signed up to that when I was researching how to develop an adult website. All that info is correct.

That's useful, he then continues:

I would call this 100% valid. I'd say it was maybe 7-9 years ago.

But when I pushed him on whether he recalled the name "Eroticy", he had no recollection of it although he did say that the "page design looks familiar".

Another subscriber chimed in:

That is a long time ago. Password sounds like something I would have used way back.

But then went on to say that he doesn't know what it would have been associated with.

The woman I mentioned earlier also responded, fortunately without any signs of me having offended her but echoing the previous response in terms of being unsure where the data would have come from:

This is not something I would have subscribed to at that time. the password, however is familiar.

And then more reassurance came through:

Ya that seems like something I probably made an account for back in the day when I was a kid. The password was definitely a BS one I used back then too.

Yet more confirmation of the legitimacy of the data itself came through:

That is - indeed - a non-secure password I used to use, and a username I've used - mostly on dating sites and other anon-ish environments where I don't want my more usual username to be immediately google-able. Some of the dating sites I was researching (honestly: I was paid to do it!) were almost certainly related to the adult entertainment biz. Also, I had an account on Xbiz a long time ago that almost certainly had that combination.

This is all great, but there's nothing in there that's helping me work out what's actually happened. And then this reply came in:

Yeah used to be an adult webmaster so there would be some of my data out there

So this isn't someone who's necessarily a customer, rather someone who's been involved in running these sites before. Further supporting that, I can't see him in either the "members" or "emails" tables like everyone else; he's actually in one called "webmasters", which had failed to import into MySQL. There's quite a bit of data in there about him and, in fact, it turns out he lives not too far from me. I send him all his data, which he checks out, and he makes an observation very similar to my own:

That looks to be webmaster affiliate program table

Frankly, by now I'm starting to find myself at a bit of a loose end so I send him a link to this blog post in draft too. He provides some interesting feedback:

Epoch was THE processor. They had a couple of linux boxes sitting under a desk which were running the majority of transactions in the industry. Where you end up with multiple card processes is when the VISA rules start fining for chargeback ratios and processors start scrubbing cards aggressively. It was discovered that you could increase sales 25% by having multiple redundant processors in a chain, if one fails to accept the card you fall back to the next.
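The fallback chain described above is, at its core, "try each processor in order until one accepts". A minimal sketch with hypothetical processors modelled as simple predicates; real payment integrations involve far more (decline reasons, retries, fraud checks) than this shows:

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: each processor is (name, predicate returning True if it accepts the card).
Processor = Tuple[str, Callable[[str], bool]]


def charge_with_fallback(card: str, chain: Sequence[Processor]) -> Optional[str]:
    """Try processors in order and return the name of the first that accepts."""
    for name, accepts in chain:
        if accepts(card):
            return name
    return None  # every processor in the chain declined


chain = [
    ("processor_a", lambda card: False),                 # always declines
    ("processor_b", lambda card: card.startswith("4")),  # only accepts cards starting with 4
    ("processor_c", lambda card: True),                  # accepts everything
]
print(charge_with_fallback("4111111111111111", chain))  # processor_b
print(charge_with_fallback("5500000000000004", chain))  # processor_c
```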

This is all very interesting in terms of the mechanics of how these sites work, but it's not getting me any closer to the source. (Interestingly, I've just finished reading Brian Krebs' Spam Nation and chargebacks were something the underground pharma industry had big issues with too.)

I decide the next step is to simply approach Eroticy about it. No, I'm not confident the data is from them but it's got their name on it and they're represented as having been hacked on other breach info websites so they've got a vested interest in investigating it. I plug this into their contact form:

Hi, my name is Troy Hunt, I'm an independent security researcher and you can find me at https://www.troyhunt.com

Recently someone sent me data which was allegedly hacked from the Eroticy website. I'd like to draw this to your attention and provide you with information to help verify if indeed you've had a security breach. Could someone in a security or technical capacity please get in touch with me and I'll share as much as I can.

And that was successfully sent off:

And successfully received:

So now it's just a matter of waiting because short of their response, there's really nothing more I can do at this stage.

Except they didn't respond. I sent that off on the 2nd of Jan and 8 days later as I write this, there's nothing. Nothing in the inbox, nothing in the junk mail, nothing at all. In a way, this is kind of the end of the road in that there's not much more I can do. Yet on the other hand, what everything up until this point has demonstrated is that almost 1.4M people have their data floating around out there - their legit data - albeit with uncertainty as to the source.

Short of investing copious amounts of further time trawling through the data, there's one more avenue available here and it's a last resort, but it's worked in the past. I'm going to publish the data to HIBP and flag it as both "sensitive" (so it can't be publicly searched) and "unverified". I'm going to put Eroticy's name on it but clearly explain in the description of the breach that I couldn't verify them as the source, albeit that the data itself is accurate. Then I'll link through to this blog post so that people can get the whole story.

Here's what I'm hoping to achieve from this:

People who find themselves in the data will be aware their info is circulating. Even without being able to confidently identify the source, this still gets those people thinking about where they're reusing passwords, how much info they're sharing publicly and other general measures they should be taking to protect themselves (e.g. not using their real email address on adult websites).
This will be brought to Eroticy's attention and I'm more likely to get a response. At this stage, I can neither prove nor disprove that the data came from them. However, it's their name on the breach and you've got various data breach websites reporting that they've had an incident so it's in their best interests to take a position on this.
By virtue of sharing as much information as I have here (yet obviously protecting the identities of those involved), I'm hoping that someone pops up and properly identifies the source of this. Maybe it'll be someone who worked on the system, ran an affiliate or even the person who originally bundled it all together and called it the Eroticy data breach.

I took a similar approach with Regpack a few months ago. The publicity the post garnered resulted in it taking all of a day and a half before they properly investigated and admitted that they'd been the source of the breach. I'm not sure if we'll see that in this case or not, but it will certainly get more eyeballs on the issue.

As of now, the data is searchable in HIBP but again, it's flagged as "sensitive". You cannot search for an email address via the public interface and get a hit on Eroticy, instead you need to use the (free) notification service which will send a verification email to the address and ensure the information is only visible to the owner of the address.
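The notification route works because only someone with access to the inbox can complete it. The sketch below shows the general pattern under stated assumptions (in-memory storage, a hypothetical `VerificationStore` class); it is not HIBP's actual implementation. Tokens come from Python's `secrets` module, only their SHA-256 hashes are stored, and comparison uses `hmac.compare_digest`:

```python
import hashlib
import hmac
import secrets


class VerificationStore:
    """In-memory sketch: issue a one-time token per email, then verify it.

    Only a hash of each token is kept, so a leaked copy of the store
    doesn't reveal usable tokens.
    """

    def __init__(self):
        self._pending = {}  # lowercased email -> SHA-256 hex digest of the token

    def issue(self, email: str) -> str:
        token = secrets.token_urlsafe(32)
        self._pending[email.lower()] = hashlib.sha256(token.encode()).hexdigest()
        return token  # would be sent to the address as a link, never shown on screen

    def verify(self, email: str, token: str) -> bool:
        key = email.lower()
        expected = self._pending.get(key)
        if expected is None:
            return False
        actual = hashlib.sha256(token.encode()).hexdigest()
        if hmac.compare_digest(expected, actual):
            del self._pending[key]  # one-time use
            return True
        return False


store = VerificationStore()
token = store.issue("Owner@example.com")
print(store.verify("owner@example.com", "guess"))  # False
print(store.verify("owner@example.com", token))    # True
print(store.verify("owner@example.com", token))    # False: already used
```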

This is a long post that, if nothing else, I hope demonstrates how abstract many of these data breaches can be and how much effort can go into properly verifying them. This was done over a period of more than 3 months in total, in part due to waiting on responses from people and in part due to having to fit a fairly arduous process into a busy schedule. If you've got ideas on the source, please leave your comments below and I'll add any noteworthy updates to the bottom of the post if and when they occur.

Weekly update 17 (the veranda edition)

January 12, 2017, 4:45 pm

This is a somewhat shorter, very tired version of my weekly update. As I say in the video, preparing for the NDC conference in London next week has been extremely taxing with two new talks and a bunch of other activities to organise. I didn't mention it in the vid, but I was also going until the early hours of yesterday morning recording a new Pluralsight course which I'll then be editing while I travel. I didn't realise how tired I look until I edited the clip so I'm going to do my best to use my last day at home to take it easy and soak up some sun. The next update will be from London with a bunch of both preparation and speaking behind me so I'll finally be able to relax a bit.

iTunes podcast | Google Play Music podcast | RSS podcast

References

NDC London is next week! (I'm planning to be a lot more rested by then...)
Properly researching a data breach can be hard work (months later and no conclusive result for Eroticy)
Trump is making things pretty interesting (that's this week's press conference, it's... entertaining!)
Barkly is this week's sponsor (big thanks to those guys!)

Weekly update 18 (McLaren edition)

January 20, 2017, 9:22 am

Sponsored by: Netsparker - Dead accurate, fully automated and easy to use web application security scanner - Launch a vulnerability scan in just seconds!

How's this for a spot to be?! This week I'm at the McLaren factory in Woking just outside London courtesy of a kind invite from a friendly Twitter follower. On Sunday, I was shown around some great spots in London by another one and as I've said before, I've only ever had good things come from meeting people "in real life" that I've interacted with online. Highly recommended!

Not a lot to talk about in terms of new content I've written, but I give a rundown on the NDC conference, the new Pluralsight course I'm creating and the significant changes the industry is seeing with the rapid adoption of HTTPS. I also touch on the privacy issues of using luggage tracking tech "Tile". That and more in the podcast and video below.

References

The McLaren Technology Centre (no, not directly tech-related, but a stunningly modern location and this video shows the things I couldn't photograph myself)
HTTPS is gaining traction (that one's a link to Chrome's upcoming change, but there's lots more to come!)
Security requires pragmatism (my thoughts on using Tile to track personal items and how we need to keep things in perspective)
Netsparker is sponsoring my blog this week (I've written about these guys in the past and genuinely love their dynamic analysis tool)

Thoughts on the LeakedSource take down

January 26, 2017, 11:30 pm

Yesterday, the website known as "LeakedSource" went offline. It's still early days and there's not yet an official word on exactly what happened, but the unfolding story seems to be as follows:

Yeah you heard it here first. Sorry for all you kids who don't have all your own Databases. Leakedsource is down forever and won't be coming back. Owner raided early this morning. Wasn't arrested, but all SSD's got taken, and Leakedsource servers got subpoena'd and placed under federal investigation. If somehow he recovers from this and launches LS again, then I'll be wrong. But I am not wrong. Also, this is not a troll thread.

LeakedSource provided sensitive personal information obtained from data breaches to anyone willing to pay for it. It was a service that occasionally popped up in news stories and recently appeared in WIRED. I've been asked for my views on the service in the past and how I felt about them providing passwords to people who didn't own them. If I'm honest, it's not really something I gave much thought to... until someone sent me my own personal data.

A few months back, a friend paid for their service and then went about notifying all their contacts who'd been put at risk. He sent me the following about my own personal data:

Now let's be quite clear about this: I see a lot of data breaches in my travels but I was still shocked to see my own personal information sold in this way. My birth date. My IP address. My password hashes. My cracked password hash (a very old, very poor one). I know full well that my own personal information is out there in multiple data breaches, but there's a big leap between it circulating in relatively closed circles and being put up for sale for a few bucks. Not just that, but being sold on the clear web with no respect for the personal consequences of data breach victims such as myself.

LeakedSource appeared in late 2015 and quickly attracted controversy. Last year there was the cease and desist order from LinkedIn where they took issue with the redistribution of their members' passwords:

We have demanded that parties cease making stolen password data available and will evaluate potential legal action if they fail to comply. As a result we have sent a C&D to LeakedSource

A little while later, they were booted off Twitter with the account remaining suspended to this day. They returned under the guise of @BigSecurityNews but there was never any question that it was merely a facade to promote their activities in lieu of the primary account now being off air. Perhaps foreseeing the inevitable result this week, that account has been dormant for the last couple of weeks.

By late 2016, it was becoming apparent that their actions were erring very much on the black side of grey. There was a constant flow of data that wasn't appearing anywhere else in the usual trading circles before first coming to air via their service. Speculation was rife that there was incentivisation occurring not just to provide data that had already been obtained, but to actively seek out new targets that could subsequently be added to the feed of data then monetised by selling the personal information of the victims to whomever was willing to pay for it. This was always rumoured amongst those "in the scene", but it's not yet clear whether this contributed to the take down or if it was solely due to the services directly provided on the site.

There was never any doubt that the service was being used for destructive purposes. A quick trawl around Twitter shows just how other people's personal information was being used:

I've obfuscated these identities as I don't know what consequences those who paid for the service may now face, especially those who used it for malicious purposes. The theme here is very clear; the service was frequently used to do harm to others. Malicious use was broadly known and broadly discussed, even in the media:

The hacker will then run that username through LeakedSource.com and pay the website 76 cents for full results. In return, he’ll receive an email address (which can be run through the database again for even more information) and password.

In fact, that piece went on to explain how other notorious hacking collectives regularly used the data to compromise victims' accounts:

Other hackers have stated that J5Z’s LeakedSource method is the preferred strategy of OurMine, the collective that hacked Mark Zuckerberg’s Twitter account.

And then there's the folks behind the service. Or the guy. Or girls. Or who really knows because from the outset, it was pretty clear they didn't want to be identified. The earlier mentioned WIRED article quoted them as saying:

if nobody knows who we are or where our site is located, bad people can’t attack us

The veil of anonymity provided them with a veneer of protection which was always a thin one. It's easy to see why they would have wanted to remain anonymous given the nature of the service and how everyone knew it was being used. However, even whilst operating from behind Cloudflare, the location of the service was readily discoverable by the casual observer and was always going to be easily accessed by law enforcement. There's a lesson in there for anyone who believes they can operate with impunity whilst trading on the misfortune of others.

Now, to Have I been pwned (HIBP). Some people have drawn parallels to services that both myself and others run:

So I wonder how the feds justified raiding @leakedsource vs @haveibeenpwned, @breachalarm, @IsLeaked, etc.
— Mark Burnett (@m8urnett) January 26, 2017

I don't think that Mark genuinely considered HIBP or BreachAlarm to be operating in the same realm as what LeakedSource was (he later acknowledged the shadiness of the now defunct service), but let's address the fundamental differences in ethics and approaches anyway.

HIBP never makes any sensitive personally identifiable data available to anyone, not even the legitimate owners of the data. In fact, some time back I wrote about how I will not provide data breaches to other parties either in full (I've never passed a breach to anyone else), or in part (I always point individuals to that post when they ask for their data). The only exception I can think of is when I'm verifying a breach; I've written publicly before about how I'll reach out to existing HIBP subscribers and seek their support in verification by providing them snippets of data. Certainly, under no circumstances would I ever provide someone who doesn't own the data any access to it whatsoever. If you can demonstrate that you own the domain then you can see which accounts have appeared in which data breaches (many companies use this as a means of monitoring the risk their organisations are exposed to), but that's a far cry from handing over sensitive PII to strangers.

I've also never paid for data nor traded any of the breaches I've obtained. Creating a commercial market in no way improves the state of security, it merely provides incentive for malicious parties to obtain even more data.

Over the 3 and a bit years I've been running HIBP, I've found myself continually making small changes in direction in order to respond to changing sentiments and indeed changes in the data breach landscape. For example, when news of the Ashley Madison data breach hit, I elected to build out functionality to keep data from "sensitive" breaches beyond the reach of anyone who doesn't own the email address impacted by the incident (or the domain it sits on). At the time, that took a lot of thought but in retrospect the conclusion was simple: the data could cause serious harm to people so let's make sure that can't happen.
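The rule behind that functionality is simple enough to sketch. The following Python is a minimal illustration, not HIBP's code: `Breach`, `is_sensitive` and `visible_breaches` are hypothetical names, and it assumes ownership of the email address (or its domain) has already been proven elsewhere, for example via a link in a confirmation email.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breach:
    """Hypothetical minimal breach record for illustration."""
    name: str
    is_sensitive: bool


def visible_breaches(breaches, requester_verified_owner):
    """Return the breaches that may be shown for an email address.

    Sensitive breaches are only returned when the requester has proven
    ownership of the address (or its domain); a public search omits them.
    """
    if requester_verified_owner:
        return list(breaches)
    return [b for b in breaches if not b.is_sensitive]
```

A public lookup calls this with `requester_verified_owner=False`, so an address that only appears in a sensitive breach looks no different from one that appears nowhere.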

Some people expressed concern about being discoverable in any data breach at all so I introduced the opt out feature. In that same post, I wrote about removing the VTech data because it was the right thing to do. Parents were able to get some comfort in knowing that data about their kids and indeed themselves had been removed from every possible location beyond VTech themselves and that was a very good outcome. In a similar vein, I never loaded the Red Cross data because like VTech, that was an incident we could contain and me not having the data was a very positive outcome (I didn't even retain my wife's or my own blood donation records from the leak).

I recently also added a rate limit to the API. I was seeing activity which I didn't believe was in keeping with the objectives of the project and posed the potential to put data breach victims at further risk so I put an immediate stop to it. (I later wrote about how I blocked large volumes of malicious traffic using Azure Functions and the Cloudflare API, a model which continues to run beautifully to this day.)
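A rate limit like this can be as simple as refusing a request that arrives too soon after the previous allowed one from the same client. Here's a minimal Python sketch of that idea; the class name, the 1.5 second interval in the example and keying on client IP address are all assumptions for illustration, not HIBP's actual implementation or limits.

```python
import time


class MinIntervalRateLimiter:
    """Allow at most one request per `interval` seconds for each client key.

    Hypothetical sketch: the interval and the choice of key (e.g. client IP)
    are assumptions, not the limits HIBP applies.
    """

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last_allowed = {}  # client key -> time of the last allowed request

    def allow(self, client_key):
        now = self.clock()
        last = self._last_allowed.get(client_key)
        if last is not None and now - last < self.interval:
            # Too soon: the caller would typically reject this, e.g. with HTTP 429.
            # Rejected requests do not reset the timer.
            return False
        self._last_allowed[client_key] = now
        return True


# Example: one request per 1.5 seconds per client IP (an assumed value).
limiter = MinIntervalRateLimiter(interval=1.5)
```

Injecting the clock makes the limiter easy to test; a real deployment would keep this state somewhere shared (and expire old entries) rather than in a per-process dictionary.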

I'm writing this at the end of my second week abroad doing back to back talks and workshops focused entirely on trying to help people keep their data out of services like HIBP.

Just had a really nice two-day workshop on web security by @troyhunt! Oh btw, thanks for mentioning that Belgian bank KBC so many times ;-) pic.twitter.com/MAw9hCIuYJ
— Jef Hermans (@jefhermans) January 26, 2017

I've said many times before that the best possible future for this service is that no more data flows into it; it would be an enormously pleasing result if more resilient systems were to stem the current flow of data taken out of vulnerable websites.

HIBP will continue to evolve. If public sentiment changes and, for example, the premise of searching via email address becomes legally or socially unacceptable then I'll adapt. If other regulations require it, then I'll work to keep the service running responsibly and in a way that keeps both regulators and data breach victims happy. I think about this every single day.

As for whoever is behind LeakedSource, I hope this incident presents an opportunity to rethink the ethics of how personal data should be handled. The WIRED article stated that some of them "are still in school" and they may well just be kids who were attracted by the allure of some easy bucks without actually being malicious individuals. I genuinely hope the consequences aren't too severe and that they're afforded the opportunity to go on and do awesome things. For now though, the web just became a safer place in their absence.

↧

Weekly update 19 (brewery edition)

January 27, 2017, 8:03 am

≫ Next: HTTPS adoption has reached the tipping point

≪ Previous: Thoughts on the LeakedSource take down

Sponsored by: Netsparker - Dead accurate, fully automated and easy to use web application security scanner - Launch a vulnerability scan in just seconds!

Supercar factory last week. Brewery this week. This is how it's done!

As I've written before, despite the many awesome moments these trips have, they're enormously busy with a huge amount jammed into them. This week I talk about travels in Belgium, how they crashed HIBP with a massive surge of traffic after some good press, my upcoming Copenhagen workshop and the inevitable demise of LeakedSource which finally happened yesterday. Next week... somewhere on a plane... on the way to somewhere warm :)

iTunes podcast | Google Play Music podcast | RSS podcast

References

I'm on Belgian TV! (this is the news story that crashed HIBP as tens of thousands of Belgians suddenly descended on the site)
My Copenhagen workshop next week is almost sold out (just a few seats left, get along if you're local!)
LeakedSource got taken down (it was always just a matter of time and it needed to happen)
Netsparker is sponsoring my blog this week (I've written about these guys in the past and genuinely love their dynamic analysis tool)

↧

HTTPS adoption has reached the tipping point

January 30, 2017, 4:07 am

≫ Next: Weekly update 20 (Dubai airport edition)

≪ Previous: Weekly update 19 (brewery edition)

Sponsored by: Netsparker - Scan your websites & detect SQL Injection, XSS and other vulnerabilities with the dead accurate Netsparker web security scanner


That's it - I'm calling it - HTTPS adoption has now reached the moment of critical mass where it's gathering enough momentum that it will very shortly become "the norm" rather than the exception it so frequently was in the past. In just the last few months, there have been some really significant developments that have caused me to make this call. Here's why I think we're now at that tipping point.

We've already passed the halfway mark for requests served over HTTPS

This was one of the first signs that we'd finally hit that tipping point and it came a few months ago:

Yesterday, for the first time, @Mozilla telemetry shows more than 50% of page loads were encrypted with HTTPS. pic.twitter.com/kADcLOLsQ7
— Let's Encrypt (@letsencrypt) October 14, 2016

This is really significant: Mozilla is now seeing more secure traffic than non-secure traffic. That doesn't mean most sites are now served over HTTPS, because a huge portion of the traffic behind that figure comes from a small number of big sites. Twitter, Facebook, Gmail and the like do everything over HTTPS and that keeps the number high.

But let's look at individual site numbers too because the story there is also very good.

The sites implementing HTTPS doubled in a year

My good friend and fellow security aficionado Scott Helme regularly analyses the Alexa top million websites and looks at what security things they're doing. Among other things, he looks at how many of those sites are redirecting users' browsers from HTTP to HTTPS. Scott runs the scan every 6 months and here's what he's found over the last couple of years:

[Chart: Alexa top million sites redirecting HTTP to HTTPS, per scan]

That's more than doubled year on year from August 2015 to August 2016, and growth over the last year is only a fraction under that, which is excellent news! Yes, it's still only 18.4% of websites, but that rate of growth is quite spectacular. (Thanks to Scott for giving me a sneak peek of the next set of stats, due to go out next month.)
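At its core, the metric is whether a plain HTTP request gets redirected to HTTPS. Here's a minimal sketch of that check using Python's standard library; it only inspects the first response (no redirect chains, HSTS or large-scale scanning) and is not Scott's actual tooling or methodology. The function names are hypothetical.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirects_to_https(status, location, request_url):
    """True if an HTTP response redirects the browser to an https:// URL."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    # Location may be relative, so resolve it against the original request URL.
    target = urljoin(request_url, location)
    return urlsplit(target).scheme == "https"


def probe(host, timeout=5):
    """Request http://host/ once (without following redirects) and classify it."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return redirects_to_https(resp.status, resp.getheader("Location"),
                                  f"http://{host}/")
    finally:
        conn.close()
```

Keeping the classification separate from the network call means the decision logic can be checked without touching the network.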

Part of what's fuelling that growth is changes in browsers and that's where the news gets even better.

Browsers are holding non-secure sites more accountable

Just last week, Chrome 56 hit and it's started doing this:

Hi @Qantas, I just went to login to my frequent flyer account and the browser is warning me that it's not secure. Is something wrong? pic.twitter.com/6Bu4v9f5Qn
— Troy Hunt (@troyhunt) January 26, 2017

Yes, I'm clearly trolling Qantas and yes, they deserve it! They've long had a very poor HTTPS implementation (things like mixed content) which is unacceptable for a service heavily used by people in higher-risk "man in the middle" (MitM) situations like airports and hotels. I was actually in touch with them some time back and outlined a raft of issues but unfortunately, here we still are.

Anyway, the point with Chrome 56 is that it's now holding sites using practices like this to account. Warnings about a site's security at the very moment you're providing sensitive information are precisely the sort of thing that will force the hand of these sites. This has been on the cards for some time already, with Eric Lawrence writing a very clear blog post about it in October. Firefox is now doing something similar as of version 51, which also hit last week:

Hi @Qantas, I just went to login to my frequent flyer account and the browser is warning me that it's not secure. Is something wrong? pic.twitter.com/5ZuUX3j4AE
— Troy Hunt (@troyhunt) January 28, 2017

Yeah, I know, I couldn't help myself :)
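The condition being flagged here is roughly "a password field on a page that wasn't served over HTTPS". Here's a simplified Python sketch of that test using the standard library's HTML parser; it's an approximation for illustration only, and the browsers' real logic covers more than this (credit card fields, for example). The names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _PasswordFieldFinder(HTMLParser):
    """Records whether the document contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag == "input" and any(
            name == "type" and (value or "").lower() == "password"
            for name, value in attrs
        ):
            self.found = True


def is_insecure_password_page(page_url, html):
    """Approximate check: a password field on a page not served over HTTPS."""
    if urlsplit(page_url).scheme == "https":
        return False
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```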

But it's more than just non-secure login forms too. In that blog post, Eric also warned of a much more significant upcoming change:

Eventually, Chrome will show a Not Secure warning for all pages served over HTTP

Google has been talking about this for a while now and I wrote about how broken today's web will feel in Chrome's secure-by-default future back in September. This, more than just about anything, is a very good reason to go HTTPS sooner rather than later.

We're seeing more abuses of unencrypted HTTP traffic

I want to give just a few examples here of requests to everyday, normal old web sites that are being intercepted and modified by MitM attacks. Websites such as my old blog:

Oh @Fly_Norwegian ... you didn't just do that?!? /cc @troyhunt pic.twitter.com/1QpsOlUDxX
— David Peter Hansen (@DPHansen) January 10, 2016

Or websites accessed over Comcast's network:

Comcast is injecting Bandwidth cap warnings into websites. Remember, when I signed up for this I asked if there was a cap and they said no. pic.twitter.com/rCvzLNtpEu
— Scott Manley (@DJSnM) December 29, 2016

Or any non-secure site requested when you first connect to hotel wifi as I've been doing on the trip I'm currently on:

[Screenshot: the first non-secure request, to CNN, intercepted on hotel wifi]

That first request to CNN sent all the cookies I had that were valid for the site, and that response could very easily have been redirected to somewhere else. As it happened, all the other tabs to the left were loaded over HTTPS so they remained grey and threw an appropriate error rather than putting my privacy at risk:

[Screenshot: the tabs loaded over HTTPS showing an error instead]

We connect more things to more untrusted networks than ever before and we need more protection. Fortunately, this is something that more sites are starting to realise, even ones not handling sensitive info.
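One site-side mitigation for the cookie leak described above, assuming the site itself is served over HTTPS, is the Secure cookie attribute: the browser then won't send that cookie on a plain HTTP request like the one intercepted on hotel wifi. Here's a minimal sketch using Python's `http.cookies`; the cookie name and helper function are hypothetical.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id):
    """Build a Set-Cookie header value for a hypothetical session cookie.

    Secure: the browser only sends the cookie over HTTPS connections.
    HttpOnly: the cookie isn't exposed to page scripts.
    """
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True
    cookie["session"]["httponly"] = True
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


header_value = session_cookie_header("abc123")
```

The flag only helps once the site itself is on HTTPS; on a site served purely over HTTP, a Secure cookie would simply never be sent back.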

Many sites you wouldn't expect are now going HTTPS by default

This is a large part of what I mean by reaching the tipping point because we're seeing things like this:

https://t.co/xxkWaizPqR now defaults to HTTPS! Only 789 days since blogging about it...
— Eitan Konigsburg (@eitanmk) January 10, 2017

Ars announces HTTPS by default (finally) https://t.co/LX2SPQfq5Z by @Lee_Ars
— Ars Technica (@arstechnica) January 25, 2017

HTTPS is now enabled on TNW https://t.co/kF0LzBDeIn
— The Next Web (@TheNextWeb) January 26, 2017

Those three tweets all occurred this month within just over two weeks of each other and they're all for media sites. Not sites (primarily) handling financial information, not just on pages collecting passwords but HTTPS everywhere, all the time. We're talking normal old plain web pages served up for passive consumption now encrypted all the way.

HTTPS has gotten fast

This isn't an entirely new development, but in recent times a combination of speed improvements in HTTP implementations and a greater awareness of the performance upsides has helped move things along. For example, sites like istlsfastyet.com have been very good at setting the record straight, as has the emergence of other HTTPS upsides such as Brotli compression. And then there's this:

HTTPS is slow. No - wait - is it HTTP that's slow?! https://t.co/T49GG7oCaK pic.twitter.com/cfnYOpXMWc
— Troy Hunt (@troyhunt) July 8, 2016

I watched on with much amusement at some of the outrage which followed that tweet before writing I wanna go fast: HTTPS' massive speed advantage. The support of HTTP/2 by HTTPS (and more specifically, the lack of support by browsers for HTTP/2 over unencrypted connections) has weighted the scales in favour of going secure. And if you don't have a web server that supports HTTP/2, you can still get it for free with Cloudflare. Which brings me to the next point...

Cloudflare and Let's Encrypt have made HTTPS free and easy

Two of the biggest hurdles to HTTPS I hear about when I run my workshops are "price and effort". The former is obvious; the latter relates both to the initial setup of obtaining and configuring the certificate and to the necessity of renewing it every year and repeating the process. Both Cloudflare and Let's Encrypt change this fundamentally.

To be fair, they're very different philosophies; Cloudflare is a reverse proxy which man in the middles your traffic by design whilst Let's Encrypt is a free CA that makes installation and renewal of certs easy. They each do different things well or poorly depending on what's important to you. Cloudflare may send some of your traffic unencrypted between its edge and your server depending on how you configure it, whilst Let's Encrypt can be painful (and unacceptably risky) to configure in some environments.

The great thing about having both of these though is that we have choices to create secure sites at a price and effort we never had until recently. As the barriers to adoption are removed, adoption increases.
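Part of what makes Let's Encrypt low effort is automated renewal, which matters because its certificates are short-lived (90 days); clients such as Certbot automate it. The decision such a job makes can be sketched in a few lines of Python. This is a minimal illustration: the 30-day threshold and the function names are assumptions, not any particular client's behaviour.

```python
import socket
import ssl
import time

RENEW_WITHIN_DAYS = 30  # assumption: renew once fewer than 30 days remain


def fetch_not_after(host, port=443, timeout=5):
    """Return the notAfter string of the certificate a server presents."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(not_after, now=None, renew_within_days=RENEW_WITHIN_DAYS):
    """Decide whether a certificate expiring at `not_after` should be renewed.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jun  1 12:00:00 2017 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return expires_at - now < renew_within_days * 86400
```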

Summary

The factors driving HTTPS go beyond just the few key ones I've listed here. For example, there's the SEO bump Google started giving secure sites a couple of years ago. There's also the fascination many governments are developing with intercepting everyone's data, notably the likes of Australia's metadata retention law and the UK's "Snooper's Charter". Whilst HTTPS doesn't prevent governments from seeing who's sending data to whom, it protects the contents of the communication, which is a big step in the privacy direction.

There's never been a better time to move to HTTPS and not just because it's the right thing to do, but because there's also increasing pressure to do it. Access to newer, faster protocols, browser warnings and simply protecting customers from other nasty things are all very good reasons. Or choose not to just yet and wait for the browsers to start explicitly flagging sites not served over HTTPS as "non-secure" along with an unwanted visual indicator, that oughta do it...

↧

Weekly update 20 (Dubai airport edition)

February 3, 2017, 2:01 pm

≫ Next: Introducing "fabricated" data breaches to Have I been pwned

≪ Previous: HTTPS adoption has reached the tipping point



It's the end of another big trip and time to take the long journey home. This time it's Copenhagen to London, then Dubai, then finally Brisbane and a ride home to the Gold Coast. It's been a busy week in the lead up to the travel too, with lots of talks and writing. But the big one I'm most proud of this week is this:

Sometime today, @haveibeenpwned broke through the 1M verified subscriber mark. Having a quiet champagne alone before flying home 😀🍾 pic.twitter.com/whIss3OXeO
— Troy Hunt (@troyhunt) February 2, 2017

I'll write more next week about the state of HIBP, where it's come from and where I see it going. Until then, here's this week's update and I look forward to next week's being done from home!

iTunes podcast | Google Play Music podcast | RSS podcast

References

HTTPS has passed the tipping point! (good news on many fronts and a reminder that now is the time to secure all your things)
Educating those building systems is the best security ROI ever! (yes, I have an obvious bias, but it just makes good sense)
Kid teaching SQL injection on YouTube (wait - did he just call it "squirrel injection"?!)
How online tutorials can teach insecure practices (make sure you read the comments!)
Netsparker are sponsoring me again this week (still my favourite tool for dynamic analysis that makes sense to developers)

↧

Introducing "fabricated" data breaches to Have I been pwned

February 7, 2017, 12:44 am

≫ Next: One million subscribers later, here's the state of Have I been pwned

≪ Previous: Weekly update 20 (Dubai airport edition)


I've written before about how I verify data breaches and discussed it at length in various conference talks. I take verification very seriously because misattribution can have serious consequences on the company involved, those in the alleged breach and indeed, on myself as well. To give you a sense of how much effort can go into verification, last month I wrote about a data breach investigation blow by blow where ultimately, I failed to verify the authenticity of the data. Due to the prevalence of legitimate data in there though, I still loaded it into HIBP and flagged it as "unverified", a concept I introduced in the middle of last year.

The point of unverified data breaches is that they have a lot of accurate personal information in them yet I'm unable to conclusively tie them back to any one particular service that's been breached. I introduced the concept as a way of representing the incident when my confidence dropped below a certain threshold. The more thought I've given this recently, the more I've been thinking about the whole "confidence" side of verification. Consider the following chart:

[Chart: confidence that a breach is legitimate, from verified through unverified]

There are a few important things I'd like to point out here:

Firstly, even for verified breaches, confidence is not always going to be 100%; there's a range at play here. For example, I was very confident when I verified the Dropbox data, in part because they'd acknowledged the incident and in part because both my wife's and my strong, password-manager-generated totally random passwords were in there as bcrypt hashes. Other times, there's no acknowledgement from the company involved and I'm relying on a combination of the verification techniques I explained in that earlier post and a gut feel of "how likely was this site to be breached" (e.g. it's running unpatched vBulletin).

Secondly, this is subjective; I'm making a judgement decision here. A lot of how this system runs boils down to me making a call on how things should be classified. For example, it's up to me to decide if I flag a breach as "sensitive" therefore making it no longer publicly searchable. I don't always get this right either; yesterday I loaded the HongFire breach which is an anime and manga forum only to later have someone point out that the content on the site often involves some pretty hardcore cartoon porn. I subsequently flagged it as sensitive because one's presence in this incident could cause them pain were it to be publicly learned.

And thirdly, that chart above has a missing segment on the far right. Until now, there's been no construct for me to handle breaches within HIBP once my confidence level dropped beneath what I'd reasonably classify as an unverified breach. This brings me to "fabricated" breaches.

The catalyst for introducing this classification came when someone sent me a breach that had allegedly come from Justdate.com. As the name implies, it's a dating website and the data I was sent contained over 24M rows. My usual first glance verification attempts were inconclusive; on the one hand, the site looks like security isn't exactly a high priority (e.g. missing transport layer security on login and no ability to serve content over HTTPS), yet on the other hand, tried and tested techniques such as account enumeration were coming up negative (password reset and registration said the accounts in the alleged breach don't exist). However, it was broadly acknowledged within data breach trading circles that the site had been breached:

[Screenshot: Vigilante.pw listing for Justdate.com]

This is from Vigilante.pw and I regularly use this site as a reference point for data that's believed to have been breached (it's only a list of alleged incidents, not a redistribution channel). There were many other references I found in various locations as well so clearly there's something to this.

I started contacting HIBP subscribers which is something I regularly do when I'm having trouble verifying a breach. I wanted to ask them questions about the data which looked like this:

user_id  first       last        email       dob         postcode  country  ip_country  alerts_email  alerts_online  unsubscribed  bounce
100      [redacted]  [redacted]  [redacted]  1992-03-03  PO5       GB       GB          1023          15             0             0

There wasn't a lot to go on, primarily just a name, email, DOB, country and postcode. Plus of course their presence on the dating site, something they'd likely recall signing up to. About a dozen people responded and as the replies started coming in, it quickly became clear that this data certainly wasn't going to fit in the left side of the graph above. Nobody - nobody - recalled signing up to Justdate.com. Some people had used other dating sites in the past, but others had never used a site of this nature at all. As more replies arrived, I started questioning whether it could even be classified as "unverified". Many people said the birth date was wrong. And the post code. And the country. Not all, mind you, there were certainly some valid entries there but it was less than 50% of respondents. My confidence level in the legitimacy of the incident fell beneath the threshold of what I felt comfortable loading into the system. This certainly wasn't the first case of this nature either.

That's when I started giving the whole thing more thought. Here we have tens of millions of records floating around the web alleging that people had participated in a dating site. Real people too because regardless of how legit the other data attributes were, the email addresses were accurate and they belonged to actual people, many of whom were relying on HIBP to let them know when their info was found. I started toying with the idea of whether it made sense to define a construct under which the data could be loaded, so I reached out and asked them:

Thanks for confirming, looks like this is very likely a fake in terms of coming from that site, but multiple subscribers have confirmed that parts of the data are legitimate. I may consider a new “fake” category in HIBP – do you think people such as yourself would like to know about the presence of this data with your email address even if it didn’t come from Just Date?

I'm going to list off the pertinent parts of each and every response I got to that email in the order in which they came in. Here's what HIBP subscribers had to say:

I think it would definitely be helpful to know where our data is appearing. I like your idea of considering a "fake" or "spoof" category.

I think yes, just for checking, we are not 100% safe & its better to check it even if its fake breach.

Absolutely it's good to know no matter what do you know the service goes onto the internet and delete all of your information more selective I guess I know that there's ways to delete everything but maybe more selective deletion program to find out where your information is and then ask you if you decide kind of like LifeLock but but more aggressive.

Yes I would appreciate any information that has a berring on my privacy. Thank you and I hope I have been some help to you.

Absolutely 100%. I use different emails for forums/subscriptions as I do for work, and another development only email. But that does not help me if my email makes it into a recruitment/spam DB, or even worse leaked in a data dump. I would be very interested to know whenever my name and email address was displayed in dumps.

Yes, I personally would like to know.

Yes I would certainly be interested in this information. It's nice to know who's got my details, even if they're somewhat incorrect.

These responses are exactly as they came in and they were unanimous - people still want to know their info is circulating even if the breach isn't legit. So I gave it some thought and ultimately added the following piece to the earlier chart:

[Chart: the same confidence scale with a "fabricated" segment added on the far right]

I ultimately elected to use the word "fabricated" rather than "fake" as I felt the latter implied nothing within there was real. However, as I mentioned earlier, a number of people had accurate dates of birth in the incident. As I also mentioned earlier, all of this is ultimately a judgement call on my part and not an absolute; I have a very low degree of confidence that Justdate.com was breached and I think it's a highly unlikely proposition based on the data. There's a tiny possibility it's real, but they're not good odds.

This concept now manifests itself in several ways within HIBP. Firstly, as with unverified breaches, there's now a visual indicator next to every fabricated breach in the form of a triangle with an exclamation mark:

[Screenshot: the Justdate.com breach in HIBP showing the fabricated and sensitive indicators]

Because it's a dating site, the breach is also flagged as "sensitive" which is why the little flame is present.

Secondly, there's an attribute on the breach entity returned by the API called "IsFabricated" so consumers of the service can identify breaches of this nature. It means a breach such as Justdate.com appears like this when described in JSON:

{
    "Title": "Justdate.com",
    "Name": "JustDate",
    "Domain": "justdate.com",
    "BreachDate": "2016-09-29",
    "AddedDate": "2017-02-07T01:28:41Z",
    "PwnCount": 24451312,
    "Description": "An alleged breach of the dating website...",
    "DataClasses": [
        "Dates of birth",
        "Email addresses",
        "Geographic locations",
        "Names"
    ],
    "IsVerified": false,
    "IsFabricated": true,
    "IsSensitive": true,
    "IsActive": true,
    "IsRetired": false,
    "IsSpamList": false
}

Note that I've still flagged this incident as "unverified" as well. There are people that have created dependencies on this attribute that deal with an unverified breach differently to one that I'm confident in. In an ideal world, I'd represent the three different states via one attribute (verified, unverified, fabricated), but existing dependencies mean I need to use each of those fields here.
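For API consumers who'd rather work with a single three-state value, here's a minimal Python sketch that derives it from the flags. `breach_status` is a hypothetical helper, not part of the HIBP API, and the "ExampleSite" record is made up for illustration. Because a fabricated breach is also marked unverified, the fabricated flag has to be checked first.

```python
import json


def breach_status(breach):
    """Collapse IsVerified / IsFabricated into one of three states.

    IsFabricated is checked first because fabricated breaches are also
    flagged as unverified. A missing flag is treated as false.
    """
    if breach.get("IsFabricated"):
        return "fabricated"
    if breach.get("IsVerified"):
        return "verified"
    return "unverified"


sample = json.loads("""[
  {"Name": "JustDate", "IsVerified": false, "IsFabricated": true},
  {"Name": "ExampleSite", "IsVerified": true, "IsFabricated": false}
]""")

statuses = {b["Name"]: breach_status(b) for b in sample}
```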

Thirdly, as you'll see in the Justdate.com breach image above, I'm being very explicit in the description of the breach as to why it's being classed as fabricated. I want to give people as much information as possible so that they can understand what the data is and why I believe it isn't real. I also want to make sure it's abundantly clear that there's insufficient evidence to suggest that the company a fabricated breach has been attributed to did indeed have a breach at all.

And finally, the notification emails sent to subscribers clearly indicate the fabricated status and of course also include the description of the data explaining why I've flagged it as such:

[Screenshot: notification email for a fabricated breach]

~~(And no, my account isn't in the data! Just testing...)~~ I was so certain there was absolutely no reason my account would be in there that I didn't even check. And then I got a breach notification from HIBP! So now, being a "victim", I'm glad I know and I concur with the other comments from subscribers above.

There's a whole other discussion to be had about what causes a bundle of data to be fabricated and called a breach in the first place. Attempts to monetise the data by selling the alleged breach, extortion of the company involved or just simple big-noting by individuals seeking notoriety are all feasible explanations for many of the fabricated breaches I see. For now, the important thing is that if your data is circulating in one of these dumps, there's now a way to know about it.

The Justdate.com data is now in HIBP. Because it's also a sensitive breach, you can only search through it by using the free notification service. I'll load more existing fabricated breaches as time permits and inevitably, as new ones emerge in the future.

↧

One million subscribers later, here's the state of Have I been pwned

February 9, 2017, 1:45 am

≫ Next: Weekly update 21

≪ Previous: Introducing "fabricated" data breaches to Have I been pwned



I hit a bit of a milestone last week with HIBP which I thought deserved a little celebration:

Sometime today, @haveibeenpwned broke through the 1M verified subscriber mark. Having a quiet champagne alone before flying home 😀🍾 pic.twitter.com/whIss3OXeO
— Troy Hunt (@troyhunt) February 2, 2017

A million verified subscribers (that is they've received a welcome email and clicked a link to confirm they actually want in), is a pretty major feat in my books, especially for a somewhat niche service. As I sat on the plane back home, I started to think about where the service now stood in terms of things like subscribers, the notifications it's sent and indeed who's using it for what purposes. I decided that because of this milestone, I should take some time to look at where the service currently stands and to reflect a little on how it's grown. I haven't done this for a while so I'm kind of interested in the stats myself, hopefully they'll be of interest to you too.

Subscriber registrations

Let's kick off here because that was the impetus for this post anyway. I launched the notification feature in December 2013 because, well, it just seemed like a good idea at the time. This was only a few weeks after launching the project itself so the notification service was one of the first real features I built.

Let's start with how that subscriber base has grown over time:

Obviously slow beginnings, but a couple of significant observations in there. One is that there are events which cause massive spikes in registrations. August 2015 was the Ashley Madison breach and on the 20th, there were 30,779 registrations in one day and almost 70k in just a 3-day period. Then in August last year, it was the Dropbox hack and the piece I wrote on that went absolutely viral. That saw 42,599 people sign up on August 31st alone and over 77k across a 3-day period.

The other thing that stands out to me (and to be honest, I wasn't completely aware of this) is the rate of growth since around the time of the Dropbox breach. In the last 4 months (so well after the Dropbox news had died down), I've seen 218k new subscribers or, in other words, almost 1.8k per day. As you can see from the graph, that's a pretty linear growth rate too, so we're not talking about one or two newsworthy events driving growth but rather a sustained pattern of people taking an interest in their personal exposure to data breaches.

The big question now though is how many of those people are actually receiving notifications? Let's drill into it.

Notifications sent

Every time I load a data breach into HIBP, I run a process that finds the intersection between the email addresses in the breach and the verified subscribers in the system. Every match gets sent an email, which over time has meant the following:

To date, I've sent 868,580 notification emails after loading a breach. That graph obviously has a very slow start, partly because back in, say, July 2015 I only had 125k subscribers, but also because the scale of breaches changed dramatically last year. That Dropbox breach resulted in 144,136 emails being sent to subscribers. LinkedIn was 111k. Last.fm was almost 100k. MySpace 70k. What makes this challenging as I move forward is that the rapid growth of subscribers is dramatically increasing the number of emails I need to send; there are more than 50% more subscribers now than when I sent all those Dropbox emails and let's face it, Dropbox was "only" 68 million records, so if something like that 1 billion record Yahoo breach was ever to surface...
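
The matching step described above is, at heart, a set intersection. Here's a minimal Python sketch of that idea, assuming both lists fit in memory and that addresses are compared after trimming and lower-casing; the function names and normalisation rules are illustrative assumptions, not how HIBP actually does it:

```python
def normalise(address: str) -> str:
    # Assumption for this sketch: compare addresses case-insensitively.
    return address.strip().lower()


def find_subscribers_to_notify(breach_emails, verified_subscribers):
    """Return verified subscribers whose address appears in the breach (hypothetical helper)."""
    breached = {normalise(e) for e in breach_emails}
    subscribed = {normalise(s) for s in verified_subscribers}
    # Set intersection: only addresses present in both collections.
    return breached & subscribed


breach = ["Alice@example.com", "bob@example.com", "carol@example.org"]
subscribers = ["alice@example.com ", "dave@example.net"]
to_notify = find_subscribers_to_notify(breach, subscribers)
print(sorted(to_notify))  # ['alice@example.com']
```

Using sets means each list is walked once rather than comparing every breach address against every subscriber.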

I was talking to an organisation today that's interested in how they can use the data to help their hundreds of millions of customers better understand their exposure, and they asked an interesting question: how many people are receiving multiple notifications? Well firstly, those 868,580 breach notifications went to a total of 396,775 unique subscribers, so clearly there's a bunch of people getting multiple notifications. Here's the distribution of how many notifications individual subscribers receive:

Aspects of this were predictable: it tapers off very quickly. But there were things that really surprised me too; for example, more people have received 2 or more notifications than have received just a single one. Well over 60k subscribers have received 4 or more notifications. Stunningly, 25 individual subscribers have received 20 or more notifications. That's just for data breaches (not pastes) and yes, they're real email addresses. I don't know what these people have been signing up to... (well actually, I do, I just don't know why there are so many!)
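
A distribution like this is a count of counts: tally the notifications per subscriber, then tally how many subscribers share each total. A minimal sketch, assuming a hypothetical log holding one subscriber ID per notification sent. Applied to the real figures, the sum of the distribution's values would be the 396,775 unique subscribers and the weighted sum the 868,580 notifications, an average of roughly 2.2 each:

```python
from collections import Counter


def notification_distribution(notification_log):
    """Map notifications-received -> number of subscribers (hypothetical helper).

    notification_log holds one subscriber identifier per notification email sent.
    """
    per_subscriber = Counter(notification_log)   # subscriber -> notifications received
    return Counter(per_subscriber.values())      # notifications received -> subscribers


log = ["a", "b", "a", "c", "a", "b"]
dist = notification_distribution(log)
print(sorted(dist.items()))                      # [(1, 1), (2, 1), (3, 1)]

unique_subscribers = sum(dist.values())          # 3 distinct subscribers
total_sent = sum(n * count for n, count in dist.items())  # 6 notifications
```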

Some of these breaches have been pretty, well, "sensitive" too and a bunch of subscribers have been caught up in them. There were 56 subscribers in the anal fisting site "Rosebutt" (don't worry, that link goes to news and not the actual site!) There were 295 subscribers in the non-consensual voyeurism site "Candid Board", a site to share photos taken up unsuspecting girls' skirts. Just last week there were 429 subscribers in the Freedom Hosting II breach, a Tor-based hosting provider where half the content was allegedly paedophilia. Clearly, I don't get to choose who appears where, but it goes to show just how far and wide data is spread and how many HIBP subscribers are learning of these (often very unwelcome) incidents.

That's breaches, but notifications are also sent to subscribers who appear in pastes as well. The paste feature is a construct I introduced in September 2014 and it identifies email addresses dumped at a variety of locations such as Pastebin. Often these are very early indicators of compromise (attackers often post samples of their exploits there) and HIBP automatically trawls multiple sources looking for data. The chart of paste notifications sent is somewhat interesting:

HIBP has sent out a total of 68,974 paste notifications but obviously, there are some really big ones in there. In one case, there were over 25k notifications sent for a single paste; it was an alleged Gmail breach with a very large number of records. In another case, the Plex data breach was pasted publicly and nearly 10k notifications went out. Other times I've added new paste sources and there's been a flurry of activity as they've been indexed and notifications sent, hence the big jumps in one go. For something that predominantly runs automatically in the background, the paste service has proven to be pretty effective.
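
Paste monitoring boils down to pulling email addresses out of arbitrary text and matching them against subscribers. A minimal sketch, assuming a deliberately simple regular expression (real-world address matching has more edge cases than this) and hypothetical names throughout; this post doesn't describe how HIBP actually parses pastes:

```python
import re

# Deliberately simple pattern for illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def subscribers_in_paste(paste_text, verified_subscribers):
    """Return verified subscribers whose address appears in a paste (hypothetical helper)."""
    found = {m.lower() for m in EMAIL_PATTERN.findall(paste_text)}
    subscribed = {s.lower() for s in verified_subscribers}
    return found & subscribed


paste = "user1@example.com:hunter2\nUSER2@example.org:letmein\nnot-an-email"
matches = subscribers_in_paste(paste, ["user2@example.org", "someone@example.net"])
print(sorted(matches))  # ['user2@example.org']
```

A single paste containing a large breach can match tens of thousands of subscribers in one pass, which is the kind of spike described above.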

Domain subscribers

The subscriptions and notifications above all happen at an individual level, but I've also offered the ability to monitor at the domain level since just after I launched HIBP. There's a verification process in which the requestor must demonstrate that they control either the domain or the site behind it; then, as with individual subscriptions, notifications are sent out automatically. The domain subscriber can also run on-demand searches whenever they please. Here's how the searches have stacked up over time:

That chart aligns somewhat with the verified subscribers earlier on and, inevitably, as the service gets more airtime in the media and in the wake of large data breaches, more people use it. There have been 55,414 verified searches to date, that is, searches where the person running them has since gone on to demonstrate that they do indeed have control of the domain or website.
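
This post doesn't spell out the verification options themselves. One common pattern for proving control of a domain or site is to issue a random token and check that the requestor has published it somewhere only the owner could, such as a DNS TXT record or a file on the website. A minimal sketch of that token check, with hypothetical function names and the DNS or HTTP lookup left out so it runs offline; it illustrates the general technique, not HIBP's implementation:

```python
import secrets


def issue_token() -> str:
    """Create an unguessable token for the requestor to publish (hypothetical helper)."""
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


def token_published(expected: str, published_values) -> bool:
    """True if any published value (e.g. TXT strings or a fetched file's body) matches."""
    return any(value.strip() == expected for value in published_values)


token = issue_token()
# In practice these values would come from a DNS query or an HTTP fetch;
# they are hard-coded here so the sketch runs without network access.
print(token_published(token, ["v=spf1 -all", token]))  # True
print(token_published(token, ["v=spf1 -all"]))         # False
```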

Naturally, there are also many notifications sent when email addresses appear on those domains:

A total of 96,790 to date, which is much less than the number sent to individuals, but then each of those notifications often relates to multiple email accounts on the one domain (sometimes there are thousands of one organisation's addresses in a single breach). I don't actually store how many email addresses were impacted by each notification so I can't report on it, but it is expressed in the notification email sent to the subscriber. The paste notification trend is pretty much what you'd expect so I'll save you from another graph and just share the total figure: there have been 47,180 paste notifications sent to people monitoring domains, which means this:

HIBP has now sent 1,081,524 notification emails to individuals and domain owners.
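
That headline figure is the four totals quoted above added together (breach and paste notifications to individuals, then breach and paste notifications to domain subscribers), which is easy to sanity-check:

```python
individual_breach = 868_580
individual_paste = 68_974
domain_breach = 96_790
domain_paste = 47_180

total = individual_breach + individual_paste + domain_breach + domain_paste
print(f"{total:,}")  # 1,081,524
```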

That's a stat I'm enormously proud of because it's a huge number of people that have benefited from the service. But there's another really interesting angle to all this, and that's who the organisations are that are using HIBP.

Who's monitoring domains?

I'm obviously going to be careful to protect the identities of the organisations involved here, but the topic deserves some attention because even I didn't fully appreciate the scope involved until I reviewed the actual data.

One aspect I was interested in was the use of HIBP by large companies, so I turned to a list of the Fortune 500 companies and their primary domain names. Now this was always going to be a low-ball figure because it doesn't include the hundreds (or even thousands) of other domains for brand names those top 500 companies hold, but I still found what I thought was a surprisingly high figure:

At least 70 of the Fortune 500 companies have successfully run verified domain searches on HIBP.

Again, that number is inevitably way too low. If, for example - and it's only an example - Apple was using the service but never did a search for apple.com and instead did one for me.com or icloud.com, they'd be excluded from the result.
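
The undercount comes from the way the comparison works: each company is represented by a single primary domain, so a search against any secondary brand domain falls through. A minimal sketch of that exact-match comparison, using made-up data and a hypothetical function name (the actual list and matching process aren't published here):

```python
def companies_with_searches(primary_domains, searched_domains):
    """Return companies whose primary domain has a verified search (hypothetical helper).

    primary_domains maps company name -> primary domain. Only exact matches on that
    one domain count, so searches for other brand domains are missed.
    """
    searched = {d.lower() for d in searched_domains}
    return {company for company, domain in primary_domains.items()
            if domain.lower() in searched}


primary = {"Apple": "apple.com", "ExampleCorp": "example.com"}
searches = ["icloud.com", "example.com"]
print(companies_with_searches(primary, searches))  # {'ExampleCorp'}
```

Here Apple is missed despite the icloud.com search, just like the hypothetical in the paragraph above, which is why counts produced this way are best read as lower bounds.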

Let's try notifications:

At least 62 of the Fortune 500 companies have received domain notifications after a breach was loaded into HIBP.

I'm pretty chuffed knowing that 12% of the largest companies in the US have been using this service for proactive monitoring and are actually getting value out of it. It's a very diverse group of companies too, spanning every industry I could think of just by eyeballing the list. They're companies whose products you use every single day and they're using HIBP to identify risks to their organisations as soon as I know of them.

As interesting as that is, it's data I've pulled from the system and prepared into the stats above. But there are many, many other organisations I've been working with that have reached out directly and requested support. Let me give you a sense of that and again, I'll obviously be cautious about protecting identities here.

I've had a lot of requests over the years from orgs with very large numbers of domain names they want to monitor. Often this is due to the diverse range of brands a company has, which makes automatic verification difficult. I've ended up having some great chats with these companies and ultimately loaded domains into the system after doing my own manual verification. Think about companies such as telcos, automotive, airlines and even government and military departments. You'd be genuinely surprised at some of the brands, so ingrained in our cultures, that have reached out for support and now use HIBP on a regular basis.

The service is being used extensively for precisely what I'd hoped: making security better for those impacted by data breaches, both individuals and organisations alike. The stats are great, as is the nature of the orgs proactively using HIBP, but I was also really interested to hear firsthand how it was going. Here's what I heard:

How it's helping people

I actually reached out via Twitter for comments on how orgs were using HIBP and I got some great feedback which I'm sharing with permission here, albeit anonymously.

One of the early comments I got related to a more consumer-centric use case, but I'm relaying it here because I think it's a really cool example:

HIBP has helped me because I was able to use it to show my non-tech family members how easy it was to find out if they had information leaked out there. Now most of my family is signed up for notifications!

By virtue of you being here reading this post, you're almost certainly more aware of how exposed we all are than your average person and we kinda forget that; we forget that despite how "normal" this data breach thing is to many of us, it's a foreign concept to most people. I had a recent experience which was somewhat similar where I showed my father how to search his email address and he promptly discovered that he was in the Dropbox breach. Even though he has a fair idea of what it is I do, it still came as a shock and a bit of a wakeup call.

That same awareness observation was made in this comment:

The website made me realize how I should never trust the security of any website, no matter how large or how small it is. While I had already begun compiling a list of my various accounts around the web, your site was the tipping point for me. I made it a point to ensure each site had a different password and 2FA was enabled wherever possible. For sites I no longer used, I set throwaway passwords, changed my email, and closed accounts.

And that's great because it's a genuine behaviour-changing event. Unique passwords combined with multi-step verification have a fundamentally positive impact on an individual's security profile. On to genuine org uses:

It's been of great help convincing customers that database breaches do happen, that password hashing makes a big difference, and that password reuse is a real thing.

I hear a lot of this - using HIBP to show the harsh, cold reality of our exposure. For an organisation, seeing the names of employees next to data breaches can have a profound effect on the organisational attitude towards security. To that effect, many organisations have made handling HIBP notifications a formal part of how they manage their security posture:

I monitor 50 of our company domains with this. our NOC team notifies the Security officer which in his case creates tickets for the service desk to contact the affected users and guides them in changing their passwords and explaining the impact, also on their personal life/accounts.

I like the responsibility shown here in terms of recognition that security incidents impact personal lives too. In fact, I think what I most like about this is that it moves us away from the corp-only or personal-only attitudes so many organisations have traditionally held. The reality is that compromises of personal security can have consequences on organisational security as well.

* Alerting users so they can take (reactive) pro-active measures once the dumps are (semi) public
* Real-world examples for security awareness sessions
* 're-education' of users whose corporate accounts repeatedly appear in breaches of non-work related systems

That last point is particularly interesting because it raises a prickly question: what should organisations do when they find employees in, well, "sensitive" data breaches? I first started having these discussions with organisations after the Ashley Madison data breach and as you can imagine, organisational responses are somewhat varied. Just this week, I had an interesting comment on my post about fabricated breaches which speaks to this issue:

Having been signed up for domain notifications for well over a year now, I can start to correlate the data in the alerts we're getting. We have a small handful of employees who (incredibly!) signed up for every dating & porn site out there that have had breaches and have been ingested into HIBP.

I was curious as to how this organisation dealt with these incidents so I asked for some more info and got a great response:

In terms of how we deal with the info, it depends on the nature of the breach. For all breaches where alerts are received, employees are notified, told to change their passwords on the site in question and at all other sites where the same creds were used. This tends to become a valuable teaching moment as nothing concentrates the mind more than a breach of an active account.

Which is pretty consistent with earlier feedback, but it's the sensitive stuff I was most interested in:

For 'sensitive' breaches, HR tends to get involved. Two of the most prolific offenders had all Internet access removed (they were warehouse guys so not critical for job) and were warned that any more violations of company policy would result in termination.

Now there's a part of me that's very conscious of the privacy aspect here; my system has just been used to "out" two blokes who were doing something of a very personal nature. However, it's also pretty evident that they shouldn't have been doing it on company time with the company machine using the company email! But it gets really interesting when it's execs caught in the same situation:

For high-profile users (senior managers & C-suite), they get a personal visit and a review of their password habits, patching status...etc. If a senior exec has been involved in a sensitive breach....well that's where it gets interesting as there is the potential for blackmail, reputational damage...etc. We have only had one of those and it resulted in discussions with HR, CEO, CIO, CISO. I was not privy to how these discussions went and how it all turned out.

Consider how this can turn the discussion to "how can we protect the organisation based on what we know of the exposure of our executives". I'll give you a perfect example: after the Ashley Madison incident, I was contacted by someone in desperate need of advice. He'd found himself in the breach after using HIBP and was in a position that put him and his organisation at particular risk. He wanted to talk, which is something I wouldn't normally do, but it was a pretty unique predicament so we caught up via Skype. Turned out he was a C-suite exec at a large company in merger talks and his exposure via the breach could be seen as a black mark against his reputation and by extension, against his company's. I find it fascinating how these stories have emerged and how this little service has contributed to awareness.

Press

There was a time when I used to track where HIBP appeared in the media; in fact, there's still a page on the site with media appearances (which I don't explicitly link to anywhere) but as you'll see, I gave up on that about 18 months ago. I didn't maintain it simply due to the crazy volume of references, sometimes dozens per day turning up in my Google alerts. Even in recent weeks, outside of any notable news stories, I don't think a day has gone by where HIBP hasn't been referenced in a story somewhere or other.

A search of the news via Google is probably the best reference these days and at the time of writing, there are going on 4k results, with a bunch of press about the demise of Freedom Hosting II whose data I loaded last week. There are various forum breaches and even a couple of totally unrelated Trump stories, one due to his press secretary appearing in a breach and another related to his appointment of Giuliani, another individual who's apparently been breached.

In more focused press, Motherboard did a very nice writeup last year on The Rise of ‘Have I Been Pwned?’, an Invaluable Resource in the Hacking Age. WIRED also did a very nice feature piece and I particularly like the outcome of the photo shoot we did; in fact, I now have that story framed on my wall (although standing around on rainy London streets with photographers, with a cold and no voice, was much less fun...)

The attention the service has received from the security industry has also been really heartening to see. For example, in his recent Reddit AMA, Mikko Hypponen encouraged everyone to sign up to the service as part of protecting themselves online. He's a guy I respect enormously so obviously, I was pretty excited to see that. I've also had a chance to present what I've learned in running the project at events all around the world, including next week when I'm at the RSA conference in San Francisco (in the same week as I'm presenting how I built it at Microsoft Ignite here at home!)

Summary... and what's next

These were the metrics which came to mind and struck me as particularly interesting. I've learned a bunch in writing this (as I do with so many of my blog posts), and I hope the stats here have been interesting to read. If there are other things you'd like to know then do ask in the comments section below.

As for what's next, I've still got a heap of data to trawl through and there are always interesting discussions happening about the way people want to use the service. I'm also noticing increased interest from organisations wanting to take a more active role in the project; VCs, partnerships, acquisitions and so on. Many of these approaches aren't in keeping with what I believe the project should do, but a few of them have started to get closer to where I'd like HIBP to go in the long term. At some stage, it may well make sense to head in that direction but until that time, it's business as usual and you'll continue to see much more writing, news and general anecdotes about the things I've learned right here.

↧

Weekly update 21

February 11, 2017, 12:31 am

I got up this morning thinking "I need to do my weekly update today because it's Friday". Except it's not, and due to the joy that is international travel, I really had no idea what day it was! So bottom line, I'm a day late, but with me heading off to the US for the RSA conference on Wednesday, I'm not even sure there's a lot of point in me trying to acclimatise to home before I go.

Regardless, this week I talk about some of that travel, how I'm now handling "fabricated" data breaches in HIBP (and how I accidentally found my own data in one), the big milestone I just hit of one million subscribers and Netsparker's ongoing support via sponsorship. Next week I'll be coming to you from San Francisco so I'll try and find a nice little spot there to do my video (which I also expect will be late given my schedule while I'm there).

iTunes podcast | Google Play Music podcast | RSS podcast

References

Why I'm now loading "fabricated" data breaches into HIBP (and how I found myself in there... in fabricated data!)
1M subscribers! (this was a massive milestone and whilst it's only a number, it feels like a great achievement)
Netsparker is back in the sponsor bar again (still love using their tool, it's my "go-to" for dynamic analysis)

↧

Weekly update 22 (Golden Gate Bridge edition)

February 18, 2017, 7:30 pm

What an awesome spot for a weekly update! Just one of the sensational views I saw today, the first day I've completed a full marathon:

Ok, bit more than a marathon, 51.19km in total according to Runkeeper. Now frankly, I'm not overly keen on running (I just get bored), so I walked all over San Francisco, looked at interesting things, took plenty of photos and had some nice breaks instead. I took my Lenovo Yoga 900 in the backpack (I'll write about that separately later), and broke the day into 4 parts, taking some time out in between to sit down, have a break and get a bit of work done. That's a balanced day out I reckon :)

iTunes podcast | Google Play Music podcast | RSS podcast

References

Best blog post of the year! (nice award from RSA for a post that seems to have resonated)
Qantas still needs to fix their HTTPS things (they're working on it now... I hope)
Australia's mandatory data breach disclosure laws are protecting the guilty (there's so much wrong with this, it's very disappointing to see)
Netsparker was back in the sponsor banner (fifth week in a row for these guys, still my go-to dynamic analysis tool)

↧

My 2017 European summer tour: talks, workshops and more!

February 20, 2017, 9:38 am

These days, I find myself spending more and more time in Europe. Four trips last year and one already this year versus only a single trip to the US each year (which I'm still on). There just tends to be more demand for events and workshops and if I'm honest, I particularly like the place. I've got a lot of history there having lived in the UK and Netherlands before and I love spending time back there sitting in the sun at historic locations with a decent coffee or sampling the local breweries. Fun times.

What's been less fun is European winters and spending long stretches away from my family. In order to rectify things, I've decided to come back for a long trip in the European summer this year (not to escape my Gold Coast winter - the weather is pretty similar to the European summer!) and this time, bring the family along for part of it too. I thought I'd do a quick post outlining what I'm doing, where I'll be and how to get me over to your place for private workshops.

NDC Oslo

My absolute favourite event of the year! This will be the fourth year I've gone to Oslo for this event and the first one I did in 2013 really kicked off my international speaking career so it's a special one to me. Kylie will also be there again having done exceptionally well there last year and just to make it a real family affair, I'll have my 7-year-old son demonstrating just how easy it is to pwn websites (yes - he'll have his hoodie).

NDC Oslo runs from June 12 to 16 with the first couple of days being workshops and the next few conference talks. You can register now and they've got early bird tickets until the 17th of March. I'll be talking at the conference portion of the event and doing my usual workshop there on the 12th and 13th. Speaking of workshops...

Workshops

I've been doing a heap of these over the last couple of years and they've been going fantastically. I genuinely enjoy running them and by all accounts, people get a kick out of coming to them. Not just for the security side of things, but because they're a fun and impactful couple of days.

Most of the public events I do now sell out in advance. I ran one for the .NET User Group in Copenhagen in October and it was massively oversubscribed, enough to run another one last month. With their permission, I'm sharing the feedback from that event here:

I'm enormously happy with that feedback not just because of the quantitative ratings above, but because I got some really great comments as well (not shared here for privacy reasons). There were some neat ideas on how to improve things in the future and even though I've done this dozens of times now, it gets better each and every time I run the event.

I'll be running the public event at NDC in Oslo on June 12 and 13 and I'll be running another public one in Belgium with ZIONSECURITY a couple of weeks before. I ran one with them in Leuven only a month ago and it was another packed event that went really well so I'm coming back for more in the summer. I can now confirm the dates for the event are Thursday June 1 and Friday 2 and you can register online now.

I've got a bunch of other private events I'll also be doing before NDC which will see me in Porto down in Portugal, Rotterdam in the Netherlands and Trondheim up north in Norway. I'm also planning some time in the UK after all this around the end of June and start of July with events there yet to be confirmed.

I still have a small amount of time available for private events. This is normally the event described in the "Workshops" link at the top of your browser and all the details are described there. Reach out if you'd like to get me over to wherever you are in Europe, I'm close to capacity but hey, it's summer and a nice time to be in that part of the world :)

↧

Weekly update 23

February 23, 2017, 11:55 pm

I started out doing this weekly update with not much news to share due to being away running a workshop for a couple of days then sitting on planes and in airports for another day. It was only as I finished recording that I saw both shattered.io and CloudBleed, both of which you know are serious because they have cool bug names and even logos. But in all seriousness, these are both major vulnerabilities, though the real-world impact isn't yet clear, at least not to me. Great headlines and they're generating plenty of hysteria, but I suspect the reality of each will pan out to be somewhat less than what some are claiming.

Moving on, this week I cover recent US travel plus my upcoming European summer tour which I'm really looking forward to, not least because it's 3 months away and I get to stay home until then! There are also a couple of new HIBP breaches loaded, various general security bits and a brand-new sponsor this week. Next week, I should have a pretty major incident to talk about but until then, here's this week:

iTunes podcast | Google Play Music podcast | RSS podcast

References

Travel plans for the European summer (pretty jam packed, but a little bit of time left)
This is not how you do password resets! (but they really are good guys and followed swiftly with a fix)
No, you don't need to connect your kids' toys to the internet (at least the Germans don't think so - they've banned a connected doll)
Trump got hacked (putting "secure" in the URL does not make it so!)
Checkmarx is up in the sponsor bar (they make a very fine static analysis tool)

↧

Pragmatic thoughts on #CloudBleed

February 25, 2017, 12:49 am

It has a cool name and a logo - this must be serious! Since Heartbleed, bug branding has become a bit of a thing and more than anything, it points to the way vulnerabilities like these are represented by the press. It helps with headlines and I'm sure it does wonderful things for bug (brand?) recognition, but it also has a way of drumming up excitement and sensationalism in a way that isn't always commensurate with the actual risk.

That said, the Cloudflare bug is bad, but the question we need to be asking is "how bad"? I saw the news break yesterday morning my time and I've been following it closely since. As I've written a lot about Cloudflare in the past and been very supportive of their service, I've had a lot of questions from people. I want to share my take on it - both the good stuff and the bad stuff - and per the title above, I'm going to be very pragmatic about the whole thing.

Before I get started and if you haven't read it already, start with Project Zero's outline of the bug then move onto Cloudflare's detailed blog post on the issue (that's the chronological order they were written in). Do read the comment threads on both too, they each contain valuable background and insight. Right, assuming you now understand the background, let's jump into it.

The problem is that we just can't measure it

This, more than anything, is the problem with this bug. When there's a data breach - let's say Dropbox - we can say "there were 68 million accounts impacted, here's what was lost and here's everyone we need to notify". With Cloudflare's bug, it's enormously hard to identify exactly who's been impacted and to quantify the risk.

Cloudflare refer to memory being leaked from only "0.00003% of requests" which sounds very small. I commented earlier today that you've got a better chance of winning the lotto which is mathematically correct, but there are two very important things to note with this figure:

Firstly, Cloudflare serves an almost unfathomably large amount of traffic. I wrote a course on Getting Started with CloudFlare Security in mid-2015 and they were serving 5 trillion requests a month at that time and I assume much more as of today. The bottom line is that we're looking at millions of requests per month potentially leaking data. Depending on how you look at it (and how you want to spin it, in some cases), it's either a tiny fraction of traffic or a large number of requests. In reality, it's both.

Secondly, the way they phrase it is "1 in every 3,300,000 HTTP requests through Cloudflare potentially resulting in memory leakage". Now think back to both Project Zero's and Cloudflare's write-ups on the incident: these are requests to pages with malformed tags which result in the leakage of memory from other sites. This is something they could feasibly measure because they can look at the percentage of traffic they're serving with the funky markup. However, what they're highly unlikely to be able to measure is how many totally unrelated sites had their traffic leaked by that small percentage and frankly, I can't see any way that we'll ever know.
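Cloudflare's two figures describe the same rate, and applying that rate to the traffic volume is what turns "tiny fraction" into "millions of requests". Here is a quick sanity check in Python. It assumes the mid-2015 volume of 5 trillion requests a month. Today's volume is likely higher, so treat the monthly result as a lower bound rather than a measurement:

```python
# Back-of-envelope check of Cloudflare's published leak rate.
# Assumption: 5 trillion requests/month (the mid-2015 figure); current volume is likely higher.

LEAK_RATE_PERCENT = 0.00003           # "0.00003% of requests"
MONTHLY_REQUESTS = 5_000_000_000_000  # 5 trillion

def leak_fraction(percent: float) -> float:
    """Convert a percentage into a plain fraction."""
    return percent / 100

def one_in_n(percent: float) -> int:
    """Express a percentage as '1 in N' requests."""
    return round(1 / leak_fraction(percent))

def leaked_per_month(requests: int, percent: float) -> int:
    """Expected number of requests per month that could trigger the leak."""
    return round(requests * leak_fraction(percent))

print(one_in_n(LEAK_RATE_PERCENT))                            # about 1 in 3.3 million
print(leaked_per_month(MONTHLY_REQUESTS, LEAK_RATE_PERCENT))  # about 1.5 million per month
```

Both readings are true at once: roughly one request in 3.3 million, which still works out to around 1.5 million requests a month at 2015 volumes.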

Unknown facts create a media vacuum and encourage speculation

This incident has led to some pretty sensational headlines on stories that do a lot of talking but not much establishing of actual facts. For the reasons outlined above, it's a hard thing to do and the numbers we have don't really tell us the entire picture. It's one of the reasons I waited a day and a half before writing anything - I wanted to fully understand the issue and focus on the facts.

For example, Cloudflare has said that "we discovered exposed data on approximately 150 of Cloudflare's customers" which sounds like a very small number when they have a couple of million websites within their service. However, this relates to customer data found in caches such as Google's search engine; in other words, incidents where they could emphatically prove that customer data had been leaked. The phrase that immediately comes to mind here is "absence of evidence is not evidence of absence": there could be other data that's been exposed in places they simply don't know about.

On the other hand, the premise that "every Cloudflare site ever is at risk" is nonsense. For example, this very blog runs through Cloudflare as do hundreds of thousands of other sites that contain absolutely nothing of a private nature. I've been pointed at a GitHub repository listing sites behind Cloudflare many times now as though it's a canonical list of everything that's at risk. This blog is in there. My wife's blog is in there. The sample site I used in my Cloudflare course is in there. None of these were put at any risk whatsoever by this bug because they have no traffic that can't be shared with everyone.

As much as the headlines paint this as a bad bug (and it is bad, even by their own admission), it's not as bad, its scope is not as broad and its impact is not as significant as much of the press I've read implies. It took one of the world's best bug hunters to find it; even with the enormous volumes of traffic moving through their infrastructure, nobody else (that we know of) had found this bug. It was so obscure and manifested itself in such a tiny fraction of requests that even with the huge number of people continually inspecting traffic on all sorts of sites behind Cloudflare, it took Tavis Ormandy to find it.

What's the impact (and reactions) regarding sensitive data?

Another reason for waiting to write this post was that I wanted to see how organisations were responding to the incident. I was curious, for example, what approaches would be taken to protect customer credentials. I was monitoring things this morning and saw a variety of responses. For example, Creative Commons forced a password reset. Cubits recommended people change their passwords as well as reset their 2FA and rotate API keys. Bugcrowd decided to invalidate all sessions and, like Cubits, recommended that customers rotate API keys and change their passwords. Vultr merely used the incident as an opportunity to "remind you of best security practices".

There are multiple different classes of sensitive data involved here so let's look at the whole thing pragmatically and we'll begin with passwords. Frankly, these are the least likely to be impacted because requests involving passwords are such a tiny fraction of overall requests made to a site. You provide passwords on registration, login and password change and other than those three features, passwords typically never accompany either the request or the response (exceptions are rare and usually pretty screwy). If we were to attempt to put a number on it, using a typical website is going to result in only one in hundreds or even thousands of requests actually sending a password. All those requests to pages, images, JavaScript files, CSS etc. will go out password free. It's a tiny fraction of the at-risk 0.00003% of requests to Cloudflare that would actually contain passwords.
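To make that "subset of a subset" reasoning concrete, here is a rough sketch. The one-in-hundreds and one-in-thousands password shares are the loose ranges from the paragraph above, not measured values, and the 1.5 million monthly figure assumes the mid-2015 traffic volume of 5 trillion requests:

```python
# Rough illustration only: both inputs below are assumptions, not measurements.
LEAKED_PER_MONTH = 1_500_000  # ~5 trillion requests/month x 0.00003% (mid-2015 volume)

def expected_with_secret(leaked_requests: int, one_in: int) -> float:
    """Expected leaked requests carrying a secret that is sent on 1 in `one_in` requests."""
    return leaked_requests / one_in

# "one in hundreds or even thousands of requests actually sending a password"
for one_in in (100, 1_000):
    estimate = expected_with_secret(LEAKED_PER_MONTH, one_in)
    print(f"password on 1 in {one_in:,} requests -> ~{estimate:,.0f} leaked requests/month")
```

The absolute numbers aren't zero, but they're spread across every site behind Cloudflare and only matter if someone actually captured the leaked response.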

Auth tokens, however, are another story. Typically, these are sent on every authenticated request, even those for static assets, as the token is usually a cookie valid for the same domain that serves all content (larger sites serving static assets from other host names being an exception). On the other hand, they're only sent whilst a user is actually auth'd, so none of the requests to this site, for example, would have anything of use for hijacking sessions. Again, we're looking at a subset of that 0.00003% figure, and the tokens that were exposed are only any good until they expire, so ones caught in cache in particular may be useless by the time they're grabbed.
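The point about expiry can be modelled very simply. The sketch below rests on a simplifying assumption: a token leaks at a uniformly random moment during its lifetime. Under that assumption, the share still usable when someone harvests it after a delay shrinks linearly and reaches zero once the delay exceeds the session lifetime. The session lifetimes used in the example are hypothetical:

```python
def fraction_still_valid(lifetime_hours: float, harvest_delay_hours: float) -> float:
    """Share of leaked tokens still valid when harvested after a delay.

    Simplifying assumption: each token leaked at a uniformly random point in its
    lifetime, so its remaining validity is uniform on [0, lifetime].
    """
    if lifetime_hours <= 0:
        return 0.0
    return max(0.0, 1 - harvest_delay_hours / lifetime_hours)

# Hypothetical session lifetimes, each harvested one day after the leak
print(fraction_still_valid(0.5, 24))  # 30-minute sessions
print(fraction_still_valid(720, 24))  # 30-day "remember me" sessions
```

With 30-minute sessions nothing survives a one-day delay, while roughly 97% of 30-day tokens would still work. Real lifetimes vary with sliding expiration and refresh tokens, so treat this as intuition rather than a risk calculation.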

It's sensitive data like this which is of particular concern:

This was shared by Tavis in the Project Zero post and it contains OK Cupid data. There are other screen caps in that post from the likes of Fitbit and Uber, all grabbed from Google's cache and all containing various levels of personal information. It's entirely possible that some of these were accessed by other parties and contain information that can't simply be reset like an auth token can be. Again, Cloudflare believes this impacted 150 customers which is a small portion of their overall client base, but it's still not going to be a pleasant experience for those organisations and it may make for some very uncomfortable discussions with their customers.

Password resets and session expirations are protecting against the unknown

I'm yet to hear of a single organisation saying that they've seen any malicious activity against their services or customers as a result of the bug. Mind you, it would be hard to know, such is the nature of credential abuse, session hijacking (by obtaining auth tokens) and indeed obtaining sensitive personal information. Regardless, organisations are proactively addressing this in the way I described above with the likes of Bugcrowd so let's focus on that for a bit.

The equation many organisations will be grappling with is whether they should inconvenience and potentially alarm customers to protect against a risk they're not sure they've actually been impacted by. I can't give a blanket answer on what everyone should do, but I can frame the discussions they should be having.

Emails to customers about security incidents pose multiple challenges. One of them is that regardless of how carefully worded they are, people often interpret them as a risk present in the service they use and that can impact trust and confidence. Brand value and reputation are fragile and an email going out with a title like "Important security update" is always going to raise eyebrows. There's also the subsequent support demand this may create - will customers email back with questions? Will they phone help desks? Or will it possibly even create a sudden influx of traffic that may be difficult to handle if you're asking people to change passwords?

Then there's the benefit that proactively contacting customers would have. Here I'd be thinking about factors such as the value of the asset being protected; a comments section on a media site is a very different proposition to an online account that could be exploited for financial gain (i.e. Uber). There's also the question of what technical mitigations may already be in place, for example the short expiration of sessions which greatly reduces the likelihood of auth tokens being exploited.

I've heard some people say that password resets should be done proactively because even if a particular Cloudflare customer's traffic wasn't leaked, someone else's could have been and passwords may have been reused. That argument doesn't really stack up though because we could have exactly the same discussion after every single data breach. That scenario alone doesn't pose sufficient risk to justify the impact of forcing a password reset which is a high-friction action that most impacted Cloudflare customers don't seem to be resorting to.

That last point above is important if people want to be guided by the general consensus: I'm not seeing mass password resets. Of the formal communication organisations have sent, resets like Creative Commons' above are the exception, with the norm appearing to be general security advice and perhaps session expiration, as Bugcrowd has done. Cloudflare obviously tried to clean up as much cached data as possible before news broke and whilst they didn't get it all, I suspect the cleanup was sufficient and the risk low enough that many organisations don't consider it necessary to send any communication at all.

So in short, the actions above are pre-emptive and they do have risks as well as upsides. In almost all cases, organisations taking these measures are protecting against the unknown, that is if they're taking any measures at all.

Will I stop using Cloudflare?

No. This is where we need a bit of common sense about the whole thing so let me explain my take on it:

Very often when I look at security incidents, I see a series of failures. As the story unravels, a picture emerges of a company that's made multiple missteps with technology, communication and general respect for their customers. A couple of times this week I've alluded to a major breach I've been working on and when I publish that early next week, you'll see exactly what I mean. That's not Cloudflare; they had a bad bug that turned around and severely bit them on the arse, but that's an exceptional situation for them, not part of a pattern.

When you look at the bug they had, you can see how it would occur and for those of us in the software industry, you can imagine how one of us could have just as easily written it at the time. I like that they've written it up as transparently as they have and it's hard to look at anything of significance they've done post-incident and say they should have done it differently.

Having said that, there are some things they need to do better too. I've seen a number of comments about offering Tavis a t-shirt for his efforts and at present, that's the top reward offered by their bug bounty program (other than free access to their services). That may seem a small and insignificant part of a much more serious incident, but it's been called out a number of times and I'd be remiss not to list it here. A vulnerability of this nature disclosed ethically is worth more than a token gesture and I hope we see their bug bounty program reflect that in the future.

Moving on, the bigger question we should be asking here is not whether or not to trust Cloudflare, but whether or not we should trust other parties with our traffic at all. As it stands now with this bug squashed, there's no reason to trust them any less than counterparts such as Imperva. They play in the same realm and could one day be faced with the same risk where data from one customer is exposed to another by virtue of a bug running on shared infrastructure. Let's follow that thinking even further: I run Have I been pwned (HIBP) on Azure's App Service which is a PaaS implementation where TLS is terminated upstream of my logical infrastructure. They could have a bug in their model which again, cross-contaminates traffic sharing the same physical resources. Or perhaps there's a bad enough bug in Amazon's hypervisor implementation that AWS customers cross-contaminate across VPSs. You can see my point here - it's a slippery slope once you start saying "I no longer trust other parties with my data".

But then you turn the discussion around and say "if not services like these, then what"? The reason there are millions of websites behind Cloudflare is that it's such a valuable service. It's made a huge difference to the way I run HIBP, for example, and it's protected both that service and a huge number of others from the likes of DDoS attacks. It's also significantly reduced my costs by virtue of caching traffic and relieving my origin servers of huge amounts of load. That's a great deal of value and it's an essential part of the equation for anyone weighing up whether it makes sense to stick with Cloudflare.

To start wrapping up, I really like this observation by David Heinemeier Hansson (the creator of Ruby on Rails):

Good software is uncommon because writing it is hard. In the abstract, we all know that it is hard. We talk incessantly about how it’s hard. And yet, we also collectively seem shocked — just shocked! — when the expectable happens and the software we’re exposed to or is working on turns out poor.

Software is complex. Cloudflare's is certainly not "poor", but even the best software has bugs, and whilst we should continue striving to improve it, we've also come to expect that it will occasionally go wrong in spectacular fashion. I'll leave you with one thing I can pretty safely say: Cloudflare has never been as security conscious as they are right now!

Data from connected CloudPets teddy bears leaked and ransomed, exposing kids' voice messages

February 27, 2017, 1:01 pm

≫ Next: A marathon (and a bit) in San Francisco, Lenovo Yoga 900 in tow

≪ Previous: Pragmatic thoughts on #CloudBleed

Sponsored by: Checkmarx - Remediate security vulnerabilities early in the SDLC. Save developer time – enhance your ROI.

Only a couple of weeks ago, there were a lot of news headlines about how Germany had banned an internet-connected doll called "Cayla" over fears hackers could target children. One of their primary concerns was the potential risk to the privacy of children:

conversations between the child and others can be recorded and forwarded

The Germans had a good point: kids' toys which record their voices and send the recordings up to the web pose some serious privacy risks. It's not that the risks are any different to the ones you and I face every day with the volumes of data we produce and place online (and if you merely have a modern phone, that's precisely what you're doing); it's that our tolerances are very different when kids are involved. I've got young kids myself and frankly, I'm with the Germans on this one; I don't see a need for them to have things like their voices recorded and stored online. That's not to say I don't want them to have an online presence and I'm gradually exposing both of them to more and more modern internet things, but I don't particularly want innocent childish behaviour like playing with a toy to be recorded and stored on other people's computers.

Cayla isn't the first connected toy to raise concerns either. Just over a year ago it was "Hello Barbie" making the headlines for precisely the same reasons. Yes, it's a cool idea but no, I (and many others) don't want my kids exposed in that way. In fact, just before that, we had the VTech data breach which exposed a huge amount of very personal information after parents bought their kids connected tablets, joined them to the wifi network and created accounts for them. Those accounts were ultimately exposed and included the kids' names, genders, birth dates, photos and links to their parents, including full physical addresses. That should have been the wake-up call where we all said "hey, if we put our kids' data on the web, we need to expect it to be leaked", but evidently it hasn't stopped the flood of connected toy things.

Which brings us to CloudPets (a brand owned by Spiral Toys), a toy that represents the nexus of both problems discussed above: kids' voices being recorded and their data consequently being leaked. The best way to understand what these guys do is to simply watch the video:

Now firstly, put yourself in the shoes of the average parent, that is one who's technically literate enough to know the wifi password but not savvy enough to understand how the "magic" of daddy talking to the kids through the bear (and vice versa) actually works. They don't necessarily realise that every one of those recordings – those intimate, heartfelt, extremely personal recordings – between a parent and their child is stored as an audio file on the web. They certainly wouldn't realise that in CloudPets' case, that data was stored in a MongoDB that was in a publicly facing network segment without any authentication required and had been indexed by Shodan (a popular search engine for finding connected things).

Unfortunately, things only went downhill from there. People found the exposed database online. Many people, and the worrying thing is that it's highly unlikely anyone knows quite how many. The first I knew of it was early last week, when someone sent me data from the table holding the user accounts, about 583k records in total (this subsequently turned out to be a subset of the total number in the CloudPets service). I started going through my usual verification process to ensure it was legitimate and, by pure coincidence, I was in the US running a private security workshop at the time and one of the guys in my class had a CloudPets account. Sure enough, his email address was in the breach and it was time-stamped Christmas Day, the day his daughter had been given the toy. His record looked somewhat like these, the first few in the data I was given:

The password was stored as a bcrypt hash, so to verify the record was legitimate, he gave me his original password (I asked him to change it on CloudPets first) and I successfully verified that it matched the hash in his record (I'd previously validated the Dropbox data breach by doing the same thing with my wife's account). The data was real.
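That kind of check works because a bcrypt hash carries its own parameters. In the common modular crypt format, the string holds the algorithm version, a cost factor and a 22-character salt ahead of the 31-character digest. Verification re-hashes the candidate password using the embedded salt and cost, then compares the results. In Python, the third-party `bcrypt` package's `checkpw(password, hashed_password)` does this. The sketch below only parses the format, and the example string is illustrative rather than taken from the CloudPets data:

```python
import re

# $<version>$<two-digit cost>$<22-char salt><31-char digest>, using bcrypt's ./A-Za-z0-9 alphabet
BCRYPT_PATTERN = re.compile(r"^\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})$")

def parse_bcrypt(hash_string: str) -> dict:
    """Split a bcrypt hash into the parts a verifier needs."""
    match = BCRYPT_PATTERN.match(hash_string)
    if match is None:
        raise ValueError("not a bcrypt hash in modular crypt format")
    version, cost, salt, digest = match.groups()
    cost = int(cost)
    return {
        "version": version,
        "cost": cost,
        "rounds": 2 ** cost,  # each +1 on the cost doubles the work per guess
        "salt": salt,
        "digest": digest,
    }

example = "$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy"  # illustrative only
print(parse_bcrypt(example))
```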

CloudPets left their database exposed publicly to the web without so much as a password to protect it.

Getting back to who sent it to me, this is someone who travels in data breach trading circles so I have no idea how far the data had actually circulated. However, I subsequently discovered that the database had definitely been accessed well beyond just this individual. But first, it gets worse still...

The guy who sent me the data had tried to contact CloudPets three times to warn them about the exposure. The first was on December 31 when he reached out to the email address listed on their support page but the message immediately bounced back. He subsequently followed up with the email address on their WHOIS record (it appears the WHOIS contact is a marketing company called On Demand):

4 days later, still without a response, he contacted their hosting provider:

Note the record count here – he'd identified "over 820k users" – the 583k in circulation was not the full amount. So 3 attempts to warn the organisation of a serious security vulnerability and not a single response. I've said many times before in many blog posts, public talks and workshops that one of the greatest difficulties I have in dealing with data breaches is getting a response from the organisation involved. Time and time again, there are extensive delays or no response at all from the very people that should be the most interested in incidents like this. If you run any sort of online service whatsoever, think about what's involved in ensuring someone can report this sort of thing to you because this whole story could have had a very different outcome otherwise. (For reference, check out Tesla's Security Vulnerability Reporting Policy which is beautiful in its simplicity.) But it gets worse still and this brings me back to the earlier point about multiple people having accessed the data...

Now knowing that not only was the data legitimate, it was highly likely to be circulating and the company in question wasn't responding to emails, I reached out to Lorenzo from Motherboard. I worked with Lorenzo on that VTech data breach I mentioned earlier and he's both someone who knows how this industry works and a guy I trust to be fair, accurate and responsible in dealing with these incidents. Journalists have a knack of getting responses from organisations and I had confidence that he'd do his utmost to approach CloudPets in an appropriate fashion and alert them to this incident before publishing anything. When he heard the details of the breach, he was immediately interested and took up the task of investigating it. And then he found these images in previous communications he'd had with someone else:

He had already been contacted about CloudPets! It wasn't just the images either; he'd received small samples of data from a selection of the tables in the exposed DB. Journos get a heap of communication from people about all manner of things and a bunch of it turns out to be red herrings, but in this case, it was exactly the same vulnerability identified by someone completely different. Not only that, but this individual had also contacted CloudPets on December 30 and sent this message:

I want to inform you that 45.79.147.159 is running a MongoDB instance which appears not to be correctly configured or protected by a firewall allowing connections via port 2701

Clearly, CloudPets weren't just ignoring my contact, they simply weren't even reading their emails.

4 attempts (that we know of) were made to contact CloudPets and warn them of this risk.
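The root cause here was configuration: a database listening on a public interface with no authentication. As a hedged sketch, here is a hypothetical audit helper that checks a parsed `mongod.conf` for that combination. The config is represented as a Python dict mirroring the YAML keys `net.bindIp`, `net.bindIpAll` and `security.authorization`. The helper isn't a MongoDB tool, and it doesn't replace a firewall rule restricting who can reach the port:

```python
LOOPBACK = {"127.0.0.1", "::1", "localhost"}

def audit_mongod_config(cfg: dict) -> list:
    """Return findings for a parsed mongod.conf; an empty list means nothing was flagged.

    Hypothetical helper. Assumes a missing net.bindIp means localhost only, which is
    the default from MongoDB 3.6 onwards (older versions behaved differently).
    """
    net = cfg.get("net", {})
    security = cfg.get("security", {})
    bind_ips = [ip.strip() for ip in str(net.get("bindIp", "127.0.0.1")).split(",")]
    # Anything other than loopback or a Unix socket path is reachable from the network
    exposed = bool(net.get("bindIpAll", False)) or any(
        ip not in LOOPBACK and not ip.startswith("/") for ip in bind_ips
    )
    auth_enabled = security.get("authorization", "disabled") == "enabled"

    findings = []
    if exposed:
        findings.append("mongod listens on a non-loopback interface")
    if not auth_enabled:
        findings.append("security.authorization is not enabled")
    if exposed and not auth_enabled:
        findings.append("CRITICAL: anyone who can reach the port can read or drop data")
    return findings

print(audit_mongod_config({"net": {"bindIp": "0.0.0.0"}}))
print(audit_mongod_config({"net": {"bindIp": "127.0.0.1"}, "security": {"authorization": "enabled"}}))
```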

The images sent to Lorenzo both confirm the findings of the individual who sent me the data (it shows 821,296 records) and tell us new things about the extent of the exposure. For example, we can see 2,182,337 voice recordings in the system which seems to be a feasible number for 821k registered users. We can also see two databases of identical size at almost 10GB each, "cloudpets-staging" and "cloudpets-test". Assuming their names are self-explanatory, these support both a staging environment and a test environment although they're both facing the public web and have real customer data in them. This breaks the cardinal rule of never putting production data into a non-production system (read my post on test data done right for more on this). It also potentially exposes the production system (and production customer data) to developers building the software (another cardinal rule broken), but at this stage when it's entirely open to the internet anyway, that would be the least of their worries. The point is, what's disclosed in the images above suggests the problems go deeper than data exposure alone.
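The usual way to honour that rule while still giving developers realistic data is to transform records before they leave production. Here is a minimal sketch. The field names (`email`, `child_name`, `password_hash`) are hypothetical, not CloudPets' schema. It pseudonymises with a keyed HMAC, so the same input always maps to the same output and joins between tables keep working. This is pseudonymisation rather than full anonymisation, since anyone holding the key could test guesses, so the key must stay in production:

```python
import hashlib
import hmac

def pseudonym(value: str, key: bytes, length: int = 12) -> str:
    """Deterministic, keyed stand-in for a real value (the key never leaves production)."""
    return hmac.new(key, value.strip().lower().encode(), hashlib.sha256).hexdigest()[:length]

def mask_record(record: dict, key: bytes) -> dict:
    """Copy a hypothetical user record with personal data replaced or removed."""
    masked = dict(record)
    # .invalid is a reserved TLD, so these addresses can never receive mail
    masked["email"] = f"user-{pseudonym(record['email'], key)}@example.invalid"
    masked["child_name"] = f"child-{pseudonym(record['child_name'], key, 6)}"
    masked.pop("password_hash", None)  # test systems don't need real credentials
    return masked

prod = {"id": 42, "email": "Parent@Example.com", "child_name": "Alice", "password_hash": "not-a-real-hash"}
print(mask_record(prod, key=b"production-only-secret"))
```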

There are references to almost 2.2 million voice recordings of parents and their children exposed by databases that should never have contained production data.

But then I dug a little deeper and took a look at the mobile app:

This app communicates with a website at spiraltoys.s.mready.net which is on a domain owned by a Romanian company named mReady. That host name resolves to a server with IP address 45.79.147.159, the exact same address the exposed databases were on. That's a production website too, because it's the one the mobile app is hitting; in other words, the test and staging databases along with the production website were all sitting on the one box. The most plausible explanation I can come up with for this is that one of those databases is being used for production purposes and the other for non-production (a testing environment, for example).

I wanted to understand more about how the application was communicating with the server so I took a closer look at the traffic. With the help of a Have I been pwned (HIBP) subscriber I found in the data, I was added as a friend and was then able to observe how the app communicated with the back end services. One of the first things I noticed was that the profile picture I uploaded (I just screen-captured a bunny and used that) was stored in an Amazon S3 bucket:

My profile photo: https://cloudpets-prod.s3.amazonaws.com/d0b5962f8a168dbb1c7761bf795f929e_CB53FC23-9E61-4551-9661-BC252975A931.jpeg

As you can see by loading the image, all that's required to access the file is the path, which is returned by the app every time my profile is loaded. That profile also contains other personal information; the data sent to Lorenzo shows that, along with references to their profile photos, it contained the names of children and their day and month of birth (although not year). It also contains relationships to parents and "friends" (e.g. grandmother, uncle) that have been authorised to share messages with the child.

I was curious as to whether or not the voice recordings would demonstrate the same behaviour, so I left a test message for my new "friend" and discovered a similar pattern:

My voice recording: https://cloudpets-prod.s3.amazonaws.com/990e5b17ad63146f4aa729209b5fea7a_3FBDE242-0954-466E-800D-421BF0CB0951.wav

Once again, an Amazon S3 bucket with no specific authorisation required, merely knowledge of the file path which is obviously stored in the app itself (returned via the API). Based on how CloudPets position their toys, you can imagine the sorts of voice messages the system contains. By virtue of the support I got from members of the service, I was given access to some of the short clips recorded directly on the toy. One little girl who sounded about the same age as my own 4-year old daughter left a message to her parents:

Hello mommy and daddy, I love you so much

Another one has her singing a short song, others have precisely the sorts of messages you'd expect a young child to share with her parents. I didn't download either pictures or recordings from other parties, only those I was specifically granted access to by HIBP subscribers, but the risk was clear:

The services sitting on top of the exposed database are able to point to the precise location of the profile pictures and voice recordings of children.
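An unguessable path is better than nothing, but it never expires and works for anyone who obtains it. The usual alternative is to keep the bucket private and hand out short-lived signed URLs only after checking that the requester is authorised. On S3, that's what presigned URLs are for (for example, boto3's `generate_presigned_url`). The standard-library sketch below shows the idea using an HMAC over the path and expiry time. It's a simplified, hypothetical scheme, not AWS's actual signing process, and it assumes `base_url` contains only a scheme and host:

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

def _signature(path: str, expires: int, key: bytes) -> str:
    return hmac.new(key, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()

def sign_url(base_url: str, path: str, key: bytes, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Issue a URL for `path` that stops working after `ttl_seconds`."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires, key)})
    return f"{base_url}{path}?{query}"

def verify_url(url: str, key: bytes, now: Optional[float] = None) -> bool:
    """Accept only untampered, unexpired URLs."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(_signature(parsed.path, expires, key), sig)

key = b"server-side-secret"
url = sign_url("https://media.example.com", "/recordings/abc123.wav", key, ttl_seconds=300, now=1_000)
print(verify_url(url, key, now=1_100))  # True: within five minutes
print(verify_url(url, key, now=1_400))  # False: expired
```

The signature only proves the URL was issued recently by the server. The server still has to check that the requester is allowed to see that recording before it signs anything.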

But even if you didn't know the exact location of the files on AWS, there's still another risk which would expose them and that relates to the passwords on profiles. Here's where things are a bit of a double-edged sword: on the one hand, CloudPets stored passwords as a bcrypt hash which is a good thing. It's a slow hashing algorithm designed to be more resilient to cracking should the hashes be leaked in precisely the sort of circumstance we have here. However, counteracting that is the fact that CloudPets has absolutely no password strength rules. When I say "no rules", I mean you can literally have a password of "a". That's right, just a single character. Not only that, check out how the tutorial demonstrates account creation and in particular, how to choose a password:

The password used here in the demonstration is literally just "qwe"; 3 characters and a keyboard sequence. What this meant is that when I passed the bcrypt hashes into hashcat and checked them against some of the world's most common passwords ("qwerty", "password", "123456", etc.) along with the passwords "qwe" and "cloudpets", I cracked a large number in a very short time:

That's just a very small sample from a brief run, but the figures showed there would be thousands of passwords adhering to this very small handful of bad examples.
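hashcat does this on GPUs at scale, but the principle fits in a few lines. In this sketch, PBKDF2 from Python's standard library stands in for bcrypt (which isn't in the standard library), and the users, salts and passwords are all made up. Per-user salts stop precomputed lookups, but they can't save a password like "qwe" from a short, targeted wordlist:

```python
import hashlib
import os

ITERATIONS = 10_000  # work factor for the stand-in hash (illustrative value)

def slow_hash(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 stands in for bcrypt: both are deliberately slow, salted hashes
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def dictionary_attack(records, wordlist):
    """Return {user: password} for every record whose hash matches a wordlist entry."""
    cracked = {}
    for user, salt, stored in records:
        for candidate in wordlist:
            if slow_hash(candidate, salt) == stored:
                cracked[user] = candidate
                break
    return cracked

# Made-up accounts: two weak passwords and one long random one
accounts = {"alice": "qwe", "bob": "cloudpets", "carol": os.urandom(16).hex()}
records = []
for user, password in accounts.items():
    salt = os.urandom(16)
    records.append((user, salt, slow_hash(password, salt)))

wordlist = ["qwerty", "password", "123456", "qwe", "cloudpets"]
print(dictionary_attack(records, wordlist))  # {'alice': 'qwe', 'bob': 'cloudpets'}
```

The slow hash multiplies the cost of every guess, but when the guesses are a handful of the most common passwords, the total work stays small.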

Due to there being absolutely no password strength requirements whatsoever, anyone with the data could crack a large number of passwords, log on to accounts and pull down the voice recordings.
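The counter to this is a check at registration time. Here is a minimal sketch combining a minimum length with a blocklist of common passwords. That is broadly the approach in NIST SP 800-63B, which calls for at least 8 characters and screening against lists of commonly used or compromised passwords. The five-entry blocklist is purely illustrative, and a real one would contain many thousands of entries:

```python
MIN_LENGTH = 8  # assumption: a commonly recommended minimum
COMMON_PASSWORDS = frozenset({"qwe", "qwerty", "password", "123456", "cloudpets"})  # illustrative only

def password_problems(password: str, blocklist=COMMON_PASSWORDS) -> list:
    """Return reasons to reject a new password; an empty list means it passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if password.lower() in blocklist:
        problems.append("appears on a list of common passwords")
    return problems

for candidate in ("a", "qwe", "password", "correct horse battery staple"):
    print(repr(candidate), password_problems(candidate))
```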

By now it's pretty obvious that multiple parties identified the exposed database, it remained open for a long period of time and it exposed some very personal data. It would be a safe bet to assume that many other parties located and then exfiltrated the same data because that's what people do; scanning for this sort of thing is enormously prevalent and that data – including the kids' and parents' intimate audio clips – is now in the hands of an untold number of people. But it gets even worse again...

My good friend Niall Merrigan was doing some really good work cataloguing exposed MongoDBs recently so I reached out to him for some support to investigate the extent of CloudPets' exposure. He used Shodan's API to go back and look at the historical states of the IP address and the exposed databases running on it. December 25 was the first recorded instance of the data being exposed (that's as far back as Shodan's search will take us):

You're looking at the JSON emitted by Shodan's API here and it showed both those databases present on Xmas day. As of Jan 5, both databases were still there. However, a couple of days later on Jan 7 they were joined by a new one:

This is where things took a turn for the worse because as innocuous as the name may seem, it has a more sinister meaning:

New actor in the #mongodb charging 1 BTC DB PLEASE_READ https://t.co/oq4eP6lkcC @0xDUDE pic.twitter.com/I0RnZIHZA4
— Niall Merrigan (@nmerrigan) January 6, 2017

Niall is highlighting a pattern he'd begun to see only just before this new database appeared: the use of that name to demand a ransom. He was seeing databases named "PLEASE_READ" appear across many compromised systems, each containing a ransom demand as follows:

You DB is backed up on our servers, send 1 BTC to 1J5ADzFv1gx3fsUPUY1AWktuJ6DF9P6hiF then send your ip address to email:kraken0@india.com

Whilst Shodan doesn't index the contents of exposed databases it finds, it's a safe bet that the exposed CloudPets one contained the same message as so many other compromised ones with the same name did. The analysis that Niall was doing at the time showed that at this stage, the two original CloudPets databases had been deleted which is what you'd expect when a ransom is being demanded.

On January 8, it got worse again:

Like the earlier image, these are yet more indicators of compromise (IOC) consistent with the ransom demands that were going around for MongoDBs in early Jan. Niall called them out later that month as part of his commentary on how the whole saga was unfolding:

After 2 1/2 weeks the amount of data lost in the #Mongodb ransack is about 150TB (screenshots 08-01 and 25-01) pic.twitter.com/xJQ4Jf5GAq
— Niall Merrigan (@nmerrigan) January 25, 2017

There were many malicious parties taking action against exposed databases during this period and we frequently saw the same system accessed multiple times by different actors, each demanding their own ransom. It wasn't until Jan 13 that Shodan reported no publicly accessible databases remained on CloudPets' IP address.

The CloudPets data was accessed many times by unauthorised parties before being deleted and then on multiple occasions, held for ransom.
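Spotting this pattern mostly comes down to comparing successive snapshots of the database names on a host and flagging known ransom-note names. The sketch below does that over simplified snapshots that approximate the events described in this post. They're hand-written sets, not Shodan's actual records or JSON schema:

```python
RANSOM_NOTE_NAMES = {"PLEASE_READ", "README_MISSING_DATABASES", "PWNED_SECURE_YOUR_STUFF_SILLY"}

def diff_snapshots(snapshots):
    """Compare consecutive (date, database_names) snapshots, oldest first.

    Returns (date, event, name) tuples where event is "ransom note", "added" or "removed".
    """
    events = []
    for (_, before), (when, after) in zip(snapshots, snapshots[1:]):
        for name in sorted(after - before):
            events.append((when, "ransom note" if name in RANSOM_NOTE_NAMES else "added", name))
        for name in sorted(before - after):
            events.append((when, "removed", name))
    return events

# Simplified, hand-written snapshots approximating the events in this post
snapshots = [
    ("2016-12-25", {"cloudpets-staging", "cloudpets-test"}),
    ("2017-01-05", {"cloudpets-staging", "cloudpets-test"}),
    ("2017-01-07", {"PLEASE_READ"}),
    ("2017-01-08", {"PLEASE_READ", "README_MISSING_DATABASES", "PWNED_SECURE_YOUR_STUFF_SILLY"}),
    ("2017-01-13", set()),
]

for event in diff_snapshots(snapshots):
    print(*event)
```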

For a great write-up on how MongoDBs were being compromised in this fashion, have a read of Extortionists Wipe Thousands of Databases, Victims Who Pay Up Get Stiffed by Brian Krebs.

So just to tie the whole saga together in a neat chronological fashion, here's the entire timeline of events as I know them:

Dec 30: Lorenzo's contact attempted to alert CloudPets
Dec 31: My contact attempted to alert CloudPets via their published support address
Dec 31: My contact attempted to alert CloudPets via the WHOIS record contact
Jan 4: My contact attempted to alert Linode, CloudPets' hosting provider
Jan 7: The original databases were deleted and a ransom demand was left on the exposed system via the IOC named "PLEASE_READ"
Jan 8: Another ransom demand was left for "README_MISSING_DATABASES" and another again for "PWNED_SECURE_YOUR_STUFF_SILLY"
Jan 13: No remaining databases were found to still be publicly accessible

It's impossible to believe that CloudPets (or mReady) did not know, firstly, that the databases had been left publicly exposed and, secondly, that malicious parties had accessed them. Obviously, they've changed the security profile of the system, and you simply could not have overlooked the fact that a ransom had been left. So both the exposed database and the intrusion by those demanding the ransom must have been identified, yet this story never made the headlines. Certainly, the guy in my workshop whose data was in the set I was provided had never heard anything about the exposure or that his private conversations with his daughter had potentially been illegally accessed.

Unauthorised access must have been detected but impacted parents were never notified.

There's another angle to all this as well which helps explain why nobody was returning emails or phone calls (Lorenzo tried in vain to phone both CloudPets and Spiral Toys) as well as why there appear to be some serious shortcuts taken with the hosting situation. One look at the stock price for Spiral Toys tells the story:

Spiral Toys is worth less than half a cent per share. Since late 2015, they've been in rapid decline to the point where the company is near worthless, with a market cap of only $262k (down more than 99% from their peak value). Not even the launch of their new connected piggy bank (yes, you read that correctly) back in November could save them; they saw a momentary bump in the share price, then it went back downhill and stayed flat. The CloudPets Twitter account has also been dormant since July last year, so combined with the complete lack of response to all communications, it looks like operations have well and truly been shuttered.

Circling back to the parents' position for a moment, you must assume data like this will end up in other people's hands. Whether it's the Cayla doll, the Barbie, the VTech tablets or the CloudPets, assume breach. It only takes one little mistake on the part of the data custodian – such as misconfiguring the database security – and every single piece of data they hold on you and your family can be in the public domain in mere minutes. If you're fine with your kids' recordings ending up in unexpected places then so be it, but that's the assumption you have to work on because there's a very real chance it'll happen. There's no doubt whatsoever in my mind that there are many other connected toys out there with serious security vulnerabilities in the services that sit behind them. Inevitably, some will already have been compromised and the data taken without the knowledge of the manufacturer or parents.

If you think you've been impacted by the CloudPets breach, you can now search for your email address in HIBP. You can also read Lorenzo's full writeup in Internet of Things Teddy Bear Leaked 2 Million Parent and Kids Message Recordings.

Update 1 (about 8 hours later): I've just seen the first actual response from Spiral Toys and am a bit stunned. Let's dissect this here:

The MongoDB was exposed. That's not negotiable, the data is now in the wild.
Mark said "Were voice recordings stolen? Absolutely not", which I suspect implies they were not obtained from the exposed MongoDB. That part is correct.
The MongoDB contains references to both profile pictures and voice records which are stored in Amazon S3.
If you know the reference to the S3 file, you can download it without authorisation, for example: https://cloudpets-prod.s3.amazonaws.com/990e5b17ad63146f4aa729209b5fea7a_3FBDE242-0954-466E-800D-421BF0CB0951.wav
If you can crack a password, you can log in to an account using the app and access the voice recordings. This is the intended function of the app.
With regards to there being no rules on password strength, Mark suggested there needs to be a balance and questions "How much is too much?". Allowing a password of "a" is too little. Creating a tutorial showing a password of "qwe" is too little. Usually, 6-8 characters of mixed types is a bare minimum and the decision to have absolutely no password strength requirements whatsoever means many passwords can be readily cracked, thus granting access to the voice recordings for the account.
There is a claim made that "the company never received the warnings". There were many messages sent over a period of time as detailed above. Lorenzo also tried to get in touch within the last week via multiple channels and as he said in his story, CloudPets "could not be reached for comment". There is also a record in their ZenDesk support ticket system warning them of the risk from December 31.
Mark is quoted as saying that "the company found no evidence that any hackers broke into customer accounts", which indicates they knew of the incident before this blog post. Their databases were deleted and 3 different ransoms were left. Unauthorised parties downloaded their databases.
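To make the password strength point concrete, here's a minimal Python sketch of the kind of server-side check that was missing. The thresholds (8 characters, at least two character classes) and the tiny blocklist are assumptions chosen to sit at the "6-8 characters of mixed types" floor described above; they are not CloudPets' or anyone else's actual policy.

```python
import string

# Hypothetical policy values: the upper end of the "6-8 characters" floor, and
# "mixed types" read as at least two different character classes.
MIN_LENGTH = 8
MIN_CHARACTER_CLASSES = 2

# Tiny illustrative blocklist (includes passwords mentioned in this post); a real
# service would check against a far larger list of common or breached passwords.
COMMON_PASSWORDS = {"123456", "1234", "password", "qwerty", "qwe", "abc"}


def character_classes(password: str) -> int:
    """Count how many of lowercase, uppercase, digits and punctuation appear."""
    groups = (string.ascii_lowercase, string.ascii_uppercase,
              string.digits, string.punctuation)
    return sum(any(c in group for c in password) for group in groups)


def is_acceptable(password: str) -> bool:
    """Apply the hypothetical policy: not blocklisted, long enough, mixed types."""
    if password.lower() in COMMON_PASSWORDS:
        return False
    if len(password) < MIN_LENGTH:
        return False
    return character_classes(password) >= MIN_CHARACTER_CLASSES


for candidate in ["qwe", "abc", "cloudpets", "Cloud-Pets-2017"]:
    print(candidate, is_acceptable(candidate))
```

Under these hypothetical rules, "qwe" and "abc" are rejected outright, "cloudpets" fails because it's only lowercase letters, and "Cloud-Pets-2017" passes. Checking the blocklist first means even a long, common password is refused regardless of length.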

To suggest that the exposure and ransom of a database containing 821k user records and providing access to millions of voice recordings from and to children represents "a very minimal issue" is just unfathomable. Further to this, California (where Spiral Toys is based) has mandatory data breach reporting laws:

California law requires a business or state agency to notify any California resident whose unencrypted personal information, as defined, was acquired, or reasonably believed to have been acquired, by an unauthorized person.

(As of the first of January, that scope was expanded to also include encrypted personal information.)

Personal information includes "A user name or email address, in combination with a password or security question and answer that would permit access to an online account". Data breach disclosure laws require impacted parties be notified when their personal information (such as email addresses and passwords) is disclosed. It is not up to the company that lost the data to make a judgement call on how likely it was that malicious parties then logged into individuals' accounts.

Update 2 (about 23 hours after posting):

After first being quoted as saying that the company never received warnings, the Spiral Toys CEO is now saying "We did have a reporter, try to contact us multiple times last week, you don't respond to some random person about a data breach". Yes, you do respond because it's random people who report precisely these sorts of things to companies that leave their databases open!
Lorenzo has posted a follow-up story on How This Internet of Things Stuffed Animal Can Be Remotely Turned Into a Spy Device which speaks to the lack of security on the toy itself (not just the back end services). There's a full writeup from Paul Stone of Context Security where he explains how "the toy does not use any built-in Bluetooth security features".
Also note Paul's comments on responsible disclosure: "we've been attempting to contact the toy's manufacturer, SpiralToys, since last October. After several emails, and messages to their Facebook and Twitter accounts, we've not yet had any response."
Further info on attempts to alert Spiral Toys: Lorenzo tried many different phone numbers which were "either dead or no one was answering them". He also tried to contact mReady who "did not respond to my contact requests".

Update 3 (1 day, 7 hours after posting):

CloudPets have now sent a notice to the California Attorney General's Office which is, well, let's just go through and correct it all here, bit by bit:

"Spiral Toys was told about a potential breach on February 22": No, they were told about it on December 31, here's the ZenDesk ticket
"after receiving an inquiry from Canadian Vice Media journalist Lorenzo Franceschi-Bicchierai": No, Lorenzo is from Brooklyn, just like it says on his profile (along with "He's also a defrocked lawyer from Barcelona, Spain—although he is actually Italian")
"After receiving [Franceschi-Bicchierai's] email, we carried out an internal investigation": No, they literally just said "you don't respond to some random person about a data breach" (unless they're saying that they acted on it but just didn't respond)
"we carried out an internal investigation and detected an issue with a migration server MongoDB": No, the server that exposed the data was 45.79.147.159 which is the IP address that spiraltoys.s.mready.net resolves to (the host name the production app hits)
"We immediately conducted a comprehensive check of the development site and confirmed that the data breach was fixed on January 9th as the server was being developed": So they knew about this more than 6 weeks before acknowledging contact and seem to be acknowledging that they put production data on their development database (real customer data was exposed)
"the data breach was part of a massive cyber attack on MongoDB that affected over 28,000 instances globally": Let us be clear: this wasn't a 0-day in the product like Heartbleed was; they didn't put a password on the database and they left it facing the public internet rather than keeping it on a private network segment
"we took extra precautions and also researched if the message and image date were exposed": What's an image date? Is it a typo of "data"?
"At that time the data was on a different server and could not have been affected by the security breach": The message and image data was stored on AWS S3; the MongoDB contains references to the URIs to retrieve the data from S3
"The statement that 2M+ messages were leaked is misleading readers into believing that all messages and images on our servers were obtained by hackers": There were 2,182,337 records in the "VoiceMessage" table with references to the files on AWS S3; we don't know how many voice messages or profile photos were subsequently downloaded using these references
"In the leaked data all passwords were encrypted": No, no passwords were encrypted; they were hashed with bcrypt and, because of the design decision to enforce absolutely zero password strength, many are easily cracked even with a good adaptive hashing algorithm
"The messages and images of a customer account could not be accessed unless a hacker “guessed” the password": This is what hash cracking is and it's a highly-automated process that's particularly effective against databases that had no password rules
"The hacker could have stolen the email addresses and could start running tests to find simple passwords such as “1234” or “password”": Yes, the world's most common passwords would be tested for because CloudPets allowed customers to use them!
"In the CloudPets terms of use we do recommend customers to use complex passwords": The CloudPets Getting Started tutorial literally shows how to create a password of "qwe" (also, nobody reads the terms of use!)
"Since there is a potential that hackers could try to guess passwords to acquire customers information we have invalidated all current passwords": I can still log in with the password I used when I created an account last week (it's "abc")
"For the protection of our users we are now requiring users to choose new increased security passwords": The iOS app was last updated on Jan 27, but the back end service no longer allows me to use "abc" as a password; I had to use "cloudpets" instead
"The CloudPet services have been running safely since March 2015": No, they haven't, multiple people accessed the open database and took data out whilst 3 other parties tried to ransom it
"It is very unfortunate that during a standard development we were exposed to a cyber attack": It's not unfortunate, it's an egregious error that was then covered up and not reported
"Spiral Toys was not contacted by any cyber security professionals nor a hacker holding the data for ransom": Yes they were, it's all documented above and whilst the ransoms didn't involve contacting them, when the only databases remaining on the system demand Bitcoin in exchange for the return of data, that's a bit of a red flag
"The CloutPets production server": That's not how you spell the product name
"The CloutPets production server and app were at no time affected by this incident": We've been through this already - 45.79.147.159 is serving production traffic
"We will be contacting all of our customers with emails, around 500,000 users": I was sent 583,503 email addresses and that was only about 71% of the user records
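The point about bcrypt deserves spelling out: an adaptive hash makes each guess expensive, but it can't help when the right guess sits near the top of the attacker's list. This Python sketch uses the standard library's PBKDF2-HMAC-SHA256 as a stand-in for bcrypt (both are deliberately slow hashes with a tunable cost; bcrypt stores its salt inside the hash string, whereas here the salt is kept separately). The iteration count and wordlist are illustrative assumptions.

```python
import hashlib
import os
from typing import Iterable, Optional

# Illustrative work factor; PBKDF2's iteration count stands in for bcrypt's cost.
ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash of the password (stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def crack(stored_hash: bytes, salt: bytes, wordlist: Iterable[str]) -> Optional[str]:
    """Hash each candidate with the same salt; return the first that matches."""
    for guess in wordlist:
        if hash_password(guess, salt) == stored_hash:
            return guess
    return None


# What an exposed user record effectively hands an attacker: the salt plus the hash.
salt = os.urandom(16)
stored = hash_password("qwe", salt)

# A few very common passwords, including examples mentioned in this post.
common = ["123456", "password", "1234", "abc", "qwe"]
print(crack(stored, salt, common))  # qwe
```

Even at 100,000 iterations per guess, "qwe" falls on the fifth attempt. The slow hash only buys real protection when the password isn't one of the first things an attacker tries.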

CloudPets need to get someone local on board who can help them through this, both technically and legally, because the way they're going at the moment, it likely won't end well.

↧

A marathon (and a bit) in San Francisco, Lenovo Yoga 900 in tow

March 1, 2017, 3:27 am

Sponsored by: Checkmarx - Remediate security vulnerabilities early in the SDLC. Save developer time – enhance your ROI.

One of the best things I do on trips away is walk. It's a combination of things really; it's great for getting over jet lag, a fantastic way of seeing new places and, if you do enough of it, a good workout too. I'll regularly walk 20+ kilometres when I've got a full day off somewhere nice, just seeing the sights, meeting up with random Twitter followers and taking some time out in nice spots to get a bit of work done. Which brings me to the Lenovo angle.

I've always been of the mindset that a serious laptop I can actually do stuff with must have serious grunt. Serious grunt, in turn, dictates heft, as demonstrated by the P50 I wrote about almost a year ago: Xeon processor, 64GB of RAM, 15.6" screen, several disks and, as a result, quite a few kilograms, not to mention a large form factor. I would have been perfectly happy to keep buying machines like that and later, as Lenovo invited me into their Insiders program, perfectly happy for them to send them to me. And then they sent me the Yoga 900 which, at first glance, I thought was, well, "cute". It's still a quad core i7 with 16GB of RAM, but it's obviously a small form factor and super light, only half the weight of the P50.

Anyway, the point isn't to harp on the machine itself but rather to observe that it's now the thing I've become very dependent on when I'm genuinely mobile. For example, I'm writing this blog post on it right here:

I took the jet ski over here to Tipplers cafe on South Stradbroke Island and it has just enough room in the front to chuck in a small backpack that fits the Yoga. It's not all fun and games, mind you; there are genuine native threats here I have to fend off:

But getting back to walking, whilst I was in San Fran last week I thought I'd do a bit of an epic one and ended up covering just over 51km. I took the Yoga in the backpack and just snapped off interesting pics as I went, stopping every few hours to grab a coffee and some food whilst getting some work done. Here's where I went and then all the bits I saw that I thought were interesting (at least to an Aussie tourist):

↧

Weekly update 24

March 2, 2017, 11:31 pm

Sponsored by: Checkmarx – Application Security Testing Developers Actually Use

Crikey, what a week! Between the two Cloud[thing] stories, most of the last 7 days has disappeared into research, writing and media, plus watching the first Cloud[thing] turn into a bit of a non-event whilst recoiling in abject horror as the second Cloud[thing] continued to unfold. This ended up being a 35-minute "update", which is way longer than I'd normally do, but I really wanted to drill into those two stories in particular and try to express some views in a way that doesn't always come across in writing. I hope you find it a useful companion to the written pieces and I also hope that next week is a bit calmer!

iTunes podcast | Google Play Music podcast | RSS podcast

References

People kinda lost their minds over #CloudBleed (a week later, it's all died down very quickly)
CloudPets, well, just read the post and weep... (in all honesty, I can't think of a data breach that was handled worse)
I did a marathon (and then some) in San Fran with my laptop (big walk, little laptop, lots of photos)
Checkmarx is still up in the sponsor bar (and they still make a very fine static analysis tool)

↧

How I finally fixed the dodgy wifi on my jet ski with Ubiquiti's UniFi Mesh

March 6, 2017, 12:26 am

Sponsored by: Checkmarx – Application Security Testing Developers Actually Use

There are many challenging aspects about being an Australian. For example, being terrorised by kangaroos:

Being terrorised by koalas:

Or my own unique challenge: not having a decent wifi signal whilst my jet ski is moored on the jetty out the back of the house. This makes everyday, normal activities like connecting to the home NAS to watch movies via Plex whilst sitting on the docked ski difficult, and clearly something had to be done.

Ok, getting (a bit more) serious for a moment, a few months ago I wrote about how I finally fixed my dodgy wifi where I rebuilt my home network from the ground up with Ubiquiti products. I was sick of the problems I was having with crappy consumer grade gear and I wanted to do things properly, so I put in all the research, went and bought a couple of thousand Aussie dollars' worth of gear and did exactly what the blog title suggests. I thought it would be a useful blog post, but I never expected so many other people would then go out and rebuild their own networks in a similar fashion. The post really resonated and as I said in my 2016 retrospective, it was the surprise blog post of the year.

Clearly, this also made Ubiquiti happy. I wrote that piece with zero input or incentivisation from them, and organic, genuine reviews like that without motive are extremely valuable. Anyway, to show their thanks they sent me over a box of their bits which, whilst not organically acquired in the same way as the previous bits, are still well worth talking about. Now in all honesty, I didn't need any more bits, but if you're here reading this then you're probably a bit like me in that you're gonna find a way to do something cool with them anyway.

Amongst the bits they sent was this: