Channel: Troy Hunt's Blog

Kids and Code: Conditions and loops


Last week I published the first post of Kids and Code where I started recording the process of teaching my six-year-old son to code. We used code.org which is just awesome, specifically the Minecraft game which has just the right balance of difficulty, engagement and entertainment. It’s mostly dragging and dropping blocks which represent procedures, but it’s a great way of getting kids to think about the structure that goes into code. Even at six years old, Ari picked it up very quickly and is genuinely enjoying it, so we recorded another one with some more advanced concepts.

Here’s another short clip where he starts using two fundamental coding constructs. Firstly, there are conditions – “if a condition is met, then perform a task” – which is an absolutely essential component of any software development. Next there are loops – “repeat something over and over again” – also fundamental in the code all software professionals write. Here’s how he went:

Consider the behaviours he’s able to demonstrate here:

  1. Nesting: He has to get the “if” condition nested within the repeat control. I really like the way this is visualised here where you can clearly see the contrasting colours and drag blocks within different levels of nesting.
  2. Reading code: We talk about how the cobblestone is only being laid when there’s lava and I ask him to show me where in the code this is happening. Actually being able to understand the coding construct that causes the behaviour is a really key skill.
  3. Code optimisation: Towards the end, he begins by trying to add another “destroy block” command after his loop. We talk about how he can use his existing loop by simply incrementing the times it’s repeated instead.
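That last behaviour maps directly onto textual code. Here's a rough illustration in Python – the function names are hypothetical stand-ins for the drag-and-drop Minecraft commands, not anything from code.org itself:

```python
actions = []

# Hypothetical stand-ins for the drag-and-drop Minecraft blocks
def there_is_lava(step):
    return step in (1, 3)  # pretend lava appears at steps 1 and 3

def lay_cobblestone():
    actions.append("cobblestone")

def move_forward():
    actions.append("forward")

# The nested structure he built: an "if" condition inside a "repeat"
# loop. To cover one more square, you increment the repeat count
# rather than duplicating a move_forward block after the loop.
for step in range(4):
    if there_is_lava(step):
        lay_cobblestone()
    move_forward()

print(actions)
```

The reasoning he applied visually – stretch the existing loop rather than bolting another command on the end – is exactly how you'd refactor the textual version too.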

The other thing I’ve noticed the more he does this is how suitable the touch screen is for young kids. You can see him using his finger on the screen the whole time rather than the mouse, not by my prompting but just because he’s obviously finding this to be a more intuitive user interface. The Lenovo Yoga 900 I wrote about in the previous post is perfect for this and it means that as we progress through his education he can learn the fundamentals of using a PC as well as how to interact with modern touch-enabled applications.

Teaching kids to code is an enormously valuable thing to do with your time and I had several friends reach out to me after the last post with a renewed enthusiasm. It doesn’t necessarily mean you’re setting your kids on a path to be a software developer, rather you’re empowering them to better understand the increasingly technological world they live within today.

I’ll leave you with this very short clip of Obama encouraging kids to get into the Hour of Code in 2014 and I especially like this segment of the video:


Don’t just consume things, create things. Take an hour to learn more about the technology that touches every part of our lives.

That’s not just our present reality, it’s their future too so help them to be a part of it.


Microsoft Regional Director


This was not what I was expecting earlier this week:

I am delighted to welcome you to the Microsoft Regional Director program!


More specifically, the nomination I received some weeks back was not what I expected and this week’s message was what I’d dared not get my hopes up too much about.

A bit of context first – I’m not going to work for Microsoft and despite the title of “Microsoft Regional Director”, I’m no more an employee than I was (and still am) as an MVP. The MVP title remains; the Regional Director status simply turns it up to 11. Here’s what they told me in the email:

The competition for admission to this program was intense. Your selection is a tribute to your deep technical and business knowledge, your community leadership, and your ability to connect with Microsoft customers, partners, prospects, and product group professionals.

There’s a good little piece on what the Microsoft Regional Director Program is, plus a list of the folks I join on the program – many familiar names if you travel in Microsoft circles. I’ll join them representing the Asia Pacific region and I expect it will give me better access to the right people in Microsoft (although in fairness, I’ve never felt this has been a challenge in the past), as well as carrying kudos which helps when talking to the various organisations I work with.

As with the MVP program, independence is still key and also as with the MVP program, I suspect I’ll continue to face a barrage of “well you have to say that, because Microsoft” responses from time to time. I’ll continue to get my phone and tablet from Apple, my browser from Google and my laptops from Lenovo, but I’ll also continue to love working in Visual Studio with ASP.NET and publishing it up to Azure. That’s what independence looks like.

I’m really grateful to have this recognition, particularly because it comes as a result of just doing what I genuinely love. I’m really enjoying creating Pluralsight courses, travelling the world to speak and spend time with organisations in workshops, and seeing Have I been pwned continue to grow in unexpected ways, all of which give me an opportunity to showcase many wonderful technologies, including those from Microsoft. All of that only works because I have an audience though, so a big thanks to everyone who’s helped me along the way by consuming the things I create and enabling me to have these opportunities.

The Australian Taxation Office scam call


I actually thought that by not bothering to connect a landline after moving house recently, I’d seen the end of scam calls. I used to get them all the time – the ones where they’d call up and say you had viruses on your PC – and my recordings of those turned out to be rather popular. But today I had another call, although this one went a bit differently.

First off, I missed a call in the morning from a Sydney landline number which was 02 6064 2043. There was no voicemail left and the call wouldn’t connect when I tried getting back in touch so I dismissed it. Then later this afternoon, they called back. This time I was greeted by a recording with an Aussie accent saying I needed to call that number ASAP due to a taxation issue. So I did, and here’s what happened (note: he gets a bit sweary at times):

Clearly at some stage he’s put two and two together and realised there’s no money coming his way. I get the impression the obnoxious behaviour started before that though and it’s not quite clear if it’s an intimidation technique or they’re just arseholes. I mean obviously the latter is true regardless, I just wonder how much that’s a part of the scam.

It’s a shame the call ended before I had a chance to see how they intended to monetise the scam. They may have attempted to extract a smaller portion of the cash as a way of placating the victim or they possibly may have just attempted to extract identity attributes they could then use for other purposes. Either way, nothing good was going to come of it and it’s pure criminal activity.

After the call, I found this number documented on a reverse phone lookup service. The comments go back several months and there are numerous online reports about the scam, including from our own government Scam Watch site back in 2010. Unfortunately, as I’ve found with previous scams, it’s pointless even reporting this to the authorities. They’re well aware of them already, and the fact they’re frequently executed out of India and typically involve small amounts of money (although I’m sure it’s still a painful experience for the victims) means very little actually gets done about them (and yes, I have spoken directly with the Federal Police in the past about them). You’d hope the local telcos could at least disconnect the number, yet here we are months on and the one that was already reported above continues to operate (one of the comments on the phone lookup service refers to reporting it to the ATO).

Just another day in an increasingly connected world with cheap VOIP calls.

How your data is collected and commoditised via “free” online services


I get a lot of people popping up with data breaches for Have I been pwned (HIBP). There’s an interesting story in that itself actually, one I must get around to writing in the future as folks come from all sorts of different backgrounds and offer up data they’ve come across in various locations. Recently someone sent me a list of various data breaches they’d obtained, including this one:

InstantCheckmate 2015 - 80M entries

On the surface of it, that’s a phenomenal incident and it would be the second largest data breach ever loaded into HIBP. But it turned out to be something quite different and that in itself makes for an interesting story. Let me walk you through what was provided to me, the research I did and how I eventually joined together an entirely different set of dots.

Understanding the data

The data consisted of 396 files that looked like this:

[Screenshot: the file listing]

That’s just the first 27 of them in descending size. Just as an indication, here’s the first few rows of that largest file:

email,ip,url,joindate,fname,lname,address,address2,city,state,zip,phone,mobile,dob,gender
[redacted]@gmail.com,162.158.22.[redacted],instantcheckmate.com,2015-08-06,[redacted],[redacted],,,San Francisco,CA,94107,,,,
[redacted]@gmail.com,70.198.4.[redacted],creditcardguide.com,2015-08-06,,[redacted],,,Mitchell,SD,57301,,,,
[redacted]@gmail.com,166.137.139.[redacted],creditcardguide.com,2015-08-06,[redacted],[redacted],,,,,,,,,

In all, there are 81,191,621 records of similar structure across the nearly 400 files, containing a total of 30,741,620 unique email addresses. Aspects of this are very data breach-like: personal details spread across a large collection of files adhering to a common structure. Yet this was one of those incidents which didn’t quite “smell right” and as I dug into the data, I began to realise I was looking at something quite different. Let me explain.
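The tally itself is mechanical: count rows and collect distinct addresses across the files. Here’s a minimal sketch assuming the files are plain CSVs with the header shown above (the sample rows below are illustrative, not from the real data):

```python
import csv
import io

def tally(csv_texts):
    """Count total records and unique email addresses across a set of
    CSV file contents shaped like the breach files above."""
    total = 0
    emails = set()
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            total += 1
            email = (row.get("email") or "").strip().lower()
            if email:
                emails.add(email)
    return total, emails

# A couple of rows shaped like the sample above:
sample = (
    "email,ip,url,joindate,fname,lname,address,address2,city,state,zip,phone,mobile,dob,gender\n"
    "a@example.com,1.2.3.4,instantcheckmate.com,2015-08-06,,,,,San Francisco,CA,94107,,,,\n"
    "A@example.com,5.6.7.8,creditcardguide.com,2015-08-06,,,,,Mitchell,SD,57301,,,,\n"
)
total, emails = tally([sample])
# 2 records, but 1 unique address once case is normalised
```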

Verifying the “breach”

Usually when I’m verifying a data breach, the first thing I’ll do is check out the site it allegedly came from. In this case, that’s instantcheckmate.com:

[Screenshot: the Instant Checkmate homepage]

The whole point of the site appears to be to dig up information on people: prospective employees, a spouse and, well, here are some of the options they suggest based on their testimonials:

[Screenshot: Instant Checkmate testimonials]

Spying on your neighbours or poking around your daughter’s private life are apparently also on the cards. Given that anybody can search for anybody else, I thought I’d see what I could find on my namesake in Texas:

[Screenshot: searching for my namesake in Texas]

It goes through a bunch of “checks” and shows the progress along the way:

[Screenshot: the search progress screen]

As the green status bar progresses, data in the locations showing the loading indicators above gradually completes as though it’s being populated as queries are run. Only thing is though, all the requests the browser sends during this process are for… images. No APIs returning results or anything remotely related to actually finding information on the individual, it’s all just showmanship. By now, I was starting to think the whole thing might be a bit dodgy, but fortunately the site was then confirmed as legit courtesy of a Norton seal (in case the irony is lost on you, read Here’s why you can’t trust SSL logos on HTTP pages (even from SSL vendors) and then Exploring the Ecosystem of Third-party Security Seals):

[Screenshot: the Norton seal on the site]

I go through a number of different steps and the only request that actually transfers any data backwards and forwards is this one:

GET https://www.instantcheckmate.com/api/check-customer/
{"success":true,"error":null,"loggedIn":false}

Not exactly the sort of response that would indicate any of these checks actually being run. But then I got through to the crux of why I’d come to the site in the first place – data collection. I filled in the following personal details:

[Screenshot: the sign-up form with personal details]

And this time, information was actually sent to the server. The API responded as follows:

{"success":false,"error":"Email already in use","lead":null,"emailInUse":true}

The significance here is that by confirming the email is already in use, the site exposes an enumeration risk; in other words, the server will happily tell you whether an account exists on the site or not. This is the cornerstone of much of the verification process I go through every time I load a breach into HIBP. The “john” alias on Mailinator was simply made up and for those of you who aren’t familiar with it, Mailinator is a free service that allows you to email any alias you’d like and then check it without any authentication or identity verification. For example, here’s John’s and it’s a neat service for what’s probably best described as “throwaway” accounts.
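Interpreting that response can be automated trivially. The sketch below works from the response shape shown above; the “fresh address” response is an assumption on my part, since I only observed the “already in use” case:

```python
import json

def leaks_account_existence(response_text):
    """Given the JSON body the email-submission endpoint returns
    (shape taken from the response above), report whether the site
    disclosed that the address is already registered."""
    body = json.loads(response_text)
    return body.get("emailInUse") is True

# The response observed when submitting an address already on file:
taken = '{"success":false,"error":"Email already in use","lead":null,"emailInUse":true}'
# An assumed response shape for an address the site has never seen:
fresh = '{"success":true,"error":null,"lead":null,"emailInUse":false}'

print(leaks_account_existence(taken), leaks_account_existence(fresh))
```

Feed one function like this a list of addresses from an alleged breach and you have a crude verification tool – which is essentially what I did by hand.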

Getting to the point of all this, with an enumeration vector now discovered I could take one of the email addresses from the data I was provided and see if it exists on the site. For example, there’s stella@mailinator.com but… Instant Checkmate didn’t have it on record. Same with other examples – they didn’t exist on the site. I ran some other quick checks as well and I kept coming to the same conclusion – I couldn’t verify that the data had come from the site.

As shady as the site is (and there’s probably another story in there on just how misleading some of their practices are), I couldn’t attribute the “breach” back to them so I needed to look further.

Digging into the data

I wanted to get further clarity on the accuracy of the data because one thing I see a lot of is fabricated breaches. Because the data has both IP address and physical address, there was an avenue here I could pursue. For example, let’s take that first record above:

[redacted]@gmail.com,162.158.22.[redacted],instantcheckmate.com,2015-08-06,[redacted],[redacted],,,San Francisco,CA,94107,,,,

And now do a search on the IP address:

[Screenshot: the IP address lookup result]

The IP address is located in San Francisco and so is the physical address on the record. I kept picking these at random and kept getting matches; the IP address was always a match to the location. My initial thought was that this likely indicated the person signing up to whatever service had leaked the data was entering their own physical address. In the case of Instant Checkmate, this might indicate that if the data had come from them, we’re looking at the info of the person doing the search, not the person they were searching for.

But the more I looked at the data the more… good it looked. Too good – people don’t consistently enter the right city and postcode without typos, differences in case or the other nuances we fallible humans are so good at introducing. No, this was far more likely machine generated and given how well IPs were lining up with locations, it was very likely a case of someone on the other end taking the IP of the person who signed up and deriving the physical location from it. Assuming that’s the case, it really told me very little about where the data had come from. I needed another angle and fortunately, I have a few hundred thousand of them on hand.
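The consistency check itself looks something like this. In practice you’d resolve each IP against a real GeoIP database; the stub dictionary here is a hypothetical stand-in so the logic is self-contained:

```python
# A toy stand-in for a real GeoIP database (MaxMind or similar is what
# you'd use in practice); the prefixes and cities are illustrative only.
GEOIP_STUB = {
    "162.158.22": "san francisco",
    "70.198.4": "mitchell",
}

def ip_matches_city(ip, city):
    """Does the /24 block the IP sits in geolocate to the city
    claimed on the record?"""
    prefix = ".".join(ip.split(".")[:3])
    located = GEOIP_STUB.get(prefix)
    return located is not None and located == city.strip().lower()

# The first record above: a San Francisco IP next to a San Francisco address
print(ip_matches_city("162.158.22.1", "San Francisco"))
```

Run that over a random sample of records and a near-perfect hit rate is itself suspicious – real user-entered data never lines up that cleanly.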

Verification with HIBP subscribers

As of the time of writing, I have 356k people subscribed to the free HIBP notification service and verified as wanting to be on there (they click a link in an email I send them). Every time there’s a paste or a breach loaded into the system, any single one of them who appears in it receives an email letting them know of their exposure. What this also means is that I’ve got a great list of people I can reach out to if I need help in verifying a data breach, something I’ve done on a number of occasions now when I’ve been unable to confirm the legitimacy of the incident.

I sent off a couple of dozen emails to the most recent subscribers asking for assistance and got a number of responses, including one from a girl in New Jersey. She offered assistance so I sent her over her record which was similar to the one above, but this time included the URL prepareyourcredit.com. She confirmed her name was correct but had this to say in terms of the location of the IP address:

A few blocks from a place I lived ~10 years ago.

And as for the URL:

Never heard of it, certainly didn't sign up for it in 2015. My credit is fine so I wouldn't even sign up for a similar service.

Which was interesting because it got me wondering how on earth she came to be associated with the site. She went on to say this:

I did sign up for my share of "earn money by taking online surveys" sites while in college, which is about the right timeframe for that data to have been sold/harvested.

As we went back and forwards discussing the data, the most likely explanation became that she had signed up to that site a decade ago and for some reason it had then been time-stamped last year and was now circulating around the web. Other responses from other people were consistent with the location being correct at some point in their lives, but them having no recollection of the site in the URL. This got me particularly interested in what was at the end of those URLs, so I did some digging…

Source URLs

I imported the entire data set into SQL Server to do some analysis. In there, I found over 900,000 unique values in the URL column. Some of them were due to data integrity issues in the source (i.e. inconsistent delimiters in some files), yet there were still 144 URLs with more than 50k records against them so obviously a large array of addresses.
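The same per-URL tally is easy to sketch outside SQL Server too. This version also normalises the scheme and www. prefix, since the same domain otherwise shows up as multiple entries (the values below are illustrative, not the full column):

```python
from collections import Counter
from urllib.parse import urlparse

def normalise(url):
    """Collapse scheme and www. variants of the same domain so they
    count as one entry."""
    url = url.strip().lower()
    if "://" in url:
        url = urlparse(url).netloc
    if url.startswith("www."):
        url = url[4:]
    return url

# Illustrative values from the url column:
urls = [
    "homepowerprofits.com",
    "http://www.homepowerprofits.com",
    "originalcruisegiveaway.com",
]
counts = Counter(normalise(u) for u in urls)
print(counts.most_common())
```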

Here’s the top 20 in terms of how many entries they had (note the double-up on the one that also appears with the HTTP scheme):

URL Records
originalcruisegiveaway.com 6,315,233
www.directeducationcenter.com 4,483,469
creditcardguide.com 3,021,831
instantcheckmate.com 2,742,961
cash1234.biz 2,026,161
stimulationserotica.com 1,948,071
prepaidoptions.mobi 1,864,010
progressivebusinesssystems.com 1,812,917
thecouponcastle.com 1,791,022
employmentcalling.com 1,663,183
www.alwayscashloans.com 1,503,810
freerewardcenter.com 1,476,341
paydayloaneveryday.com 1,250,648
homepowerprofits.com 1,209,936
theonlinebusiness.com 1,102,612
http://www.homepowerprofits.com 1,083,029
pdlloans.com 987,104
employmentsearchusa.com 968,445
getamoneyadvance.com 964,620
luckylending.com 952,957

Just reading through these, you’ll notice a very common theme. Here’s what the other four in the top five look like (you’ve already seen Instant Checkmate):

[Screenshots: the other four sites in the top five]

It’s that same sort of sleazy marketing feel – “give us your info and we’ll give you something… maybe” – and it’s the sort of site that most of us end up on accidentally and then get out of ASAP. But “feel” is not enough to start drawing any conclusions on where the data had actually come from; I needed evidence.

I started by looking at the HTML source of the top sites and found, well, some patterns:

[Screenshot: similar HTML source across the sites]

Expecting to find privacy-enabled WHOIS records, I ran a domain search and instead, found this on each of the ones above:

[Screenshot: identical WHOIS ownership records]

Now this is curious because we have identical ownership across sites designed to help you with your education, win a cruise and get yourself a credit card – not exactly complementary business models.

I kept probing and found more matches:

[Screenshot: more sites with matching HTML source]

Ok, firstly, tables. But secondly, these ones all had privacy enabled and all used the same name servers:

[Screenshot: shared name servers in the WHOIS records]

So here we have another set of sites almost certainly from the same organisation, albeit a different one from the first trilogy of sites I showed earlier. Now we’ve got a couple of employment sites and one about money advances which again, are not what you’d consider similar business models… unless the model is something altogether different…

Data harvesting services

Let’s go back to the largest of the sites in terms of the records provided in the dataset and that’s originalcruisegiveaway.com. Here’s how to “claim your spot” (but hurry!):

[Screenshot: the originalcruisegiveaway.com entry form]

Once you enter your personal info (or fabricated info…) all you need to do is, wait…

[Screenshot: the post-submission waiting screen]

Presumably, at some point you will end up carrying your bride away on a tropical island (or possibly being carried away by your man, depending on your perspective).

However, there’s a popular saying that if you’re not paying for the product, then you are the product! It wouldn’t be a real solid business model to simply go giving away cruises to anyone who filled in the form, so there has to be another upside. Let’s try the privacy policy and in particular, this section (emphasis mine):

We may sell the personal information that you supply to us and we may work with other third party businesses to bring selected retail opportunities to our members via direct mail, email and telemarketing. These businesses may include providers of direct marketing services and applications, including lookup and reference, data enhancement, suppression and validation and email marketing. Regardless of any State or Federal Do Not Call Registrations, you the customer expressly consent to be contacted via telephone in reference to this offer.

But hang on – isn’t this your data they’re selling? Nope:

Once it is received in our database, any information, including your name, e-mail address, and home address becomes the property of Interactive Marketing Solutions.

This is outrageous! It’s America, so let’s just sue them. Oh wait, can’t do that either:

To the extent permitted by law, you agree that you will not bring, join or participate in any class action lawsuit as to any claim, dispute or controversy that you may have against the Company and/or its employees, officers, directors, members, representatives and/or assigns.

So in short, they own your data, they can resell it and there’s nothing you can do about it. But hey, at least you get a free cruise out of it, right?! Well, no. The web is littered with stories about free cruise scams and it’s entirely possible – no, likely – that this falls into the same category.

But that’s cruises, what about something like the education one? That data is also flying around the web, in fact it was being discussed in a forum on SEO tactics years ago:

[Screenshot: the SEO forum discussion]

The original data is gone, but the site was archived:

[Screenshot: the archived site]

The more I dug into these, the more all these sites conformed to similar patterns – “Hey, give us your details and you’ll get free stuff” – and the clearer it became where the data in the records I was handed was actually sourced from.

Now you may be thinking that this is just data that’s sold or traded in underground circles, away from the public eye and only obtainable by those who mix with this class of, well, “adversary”. But it’s much more public than that, let me explain.

Your data for sale

This brings us to the Special K Data Feed:

[Screenshot: the Special K Data Feed sales page]

Now normally I’d be a bit cautious about linking through to a resource like this but in this case, I feel that more exposure is better simply to illustrate the extent of the problem. There are screen caps of personal data – data I’ve elected to obfuscate here – but as you’ll also read, you need to take these with a grain of salt.

Apparently the data costs “1500$ compared to the 8,000$/month retail rate for it!” which is a little unclear in terms of what constitutes a month, although inevitably it’s a subset of the records handed to me. And why would someone want this data? For all sorts of good reasons:

[Screenshot: the advertised uses for the data]

This is data for sale – your data for sale – but of course you knew that because you agreed to your data being sold in the terms and conditions of the sites you gave it to, right?! The data here is a very close match to much of what I was given, in fact you can browse through the collection of files (although you can’t open them) and you’ll recognise many familiar filenames from my earlier screenshot.

As you browse back to the root of the site, you’ll find all sorts of data sets from different countries, including “foreign” countries like Australia:

[Screenshot: data sets for sale by country]

This data is sourced from all sorts of different locations:

5 Million records (w/phones, emails): Collected from several individuals/consumers in Australia, from verticals such as: Debt, Weight Loss, Cell Phone Sales/Accessories, and Sweepstakes/Giveaways!

Now I know what you’re thinking – you’d be blessed to have access to this sort of data and you’re absolutely right!

It's very RARE, but BLESSED when there's 5 Million records of this kind of data. Whether it's for PHONES or EMAIL, you really get a nice PIECE that represents a NEW frontier for NEW PROFITS! Australia, the Island Nation worth swimming on shore for its outrageous outback barbequeing, kangaroos in the wild, and in some areas: a 12-to-1 female to male pop. ratio!

If you like the idea of being the one bloke surrounded by a dozen women while you BBQ your kangaroo (or something like that), take a look at the data they have for you (click for the full-sized image):

[Screenshot: the Australian sample data]

This may look like personally identifiable info I’ve just shared but it’s not – it’s fake. The first sign of this is the very first name – “Sheila” – which seemed way too convenient sitting next to talk of kangaroos and BBQs. It’s frequently a tongue-in-cheek name for Aussie women and whilst there are indeed legitimate Sheilas out there, this was just a bit too coincidental.

What’s not possible is to have all of these people in Western Australia yet have post codes which start with a 4:

[Screenshot: Western Australia records with postcodes starting with 4]

They’re all Queensland post codes in the sample data, yet they all sit next to phone numbers beginning with 03 which is only used in Victoria and Tasmania:

[Screenshot: the 03 phone numbers alongside the Queensland postcodes]
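Those two sanity checks are entirely mechanical: Australian postcodes starting with 4 belong to Queensland, and the 03 area code covers only Victoria and Tasmania. A sketch of the check (the mappings are simplified – real postcode ranges overlap a little at the edges):

```python
# First digit of an Australian postcode -> state (simplified; real
# ranges overlap a little, but 4xxx is Queensland and 6xxx is WA),
# plus landline area codes -> the states they cover.
POSTCODE_STATE = {"2": {"NSW", "ACT"}, "3": {"VIC"}, "4": {"QLD"},
                  "5": {"SA"}, "6": {"WA"}, "7": {"TAS"}}
AREA_CODE_STATES = {"02": {"NSW", "ACT"}, "03": {"VIC", "TAS"},
                    "07": {"QLD"}, "08": {"WA", "SA", "NT"}}

def record_is_consistent(state, postcode, phone):
    """Do the claimed state, postcode and landline prefix agree?"""
    postcode_ok = state in POSTCODE_STATE.get(postcode[:1], set())
    phone_ok = state in AREA_CODE_STATES.get(phone[:2], set())
    return postcode_ok and phone_ok

# A record shaped like the sample data: WA state, 4xxx postcode, 03 phone
print(record_is_consistent("WA", "4218", "0312345678"))
```

Every record in the “sample” fails a check like this, which is as strong a signal of fabrication as you’ll get without contacting the people themselves.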

So that’s half the country’s states represented right there in the “sample” data. Let’s try Sheila’s IP address:

[Screenshot: Sheila’s IP address geolocating well outside Australia]

If you’re looking for kangaroos to BBQ there, you’re going to be greatly disappointed!

The bottom line is that there’s a bunch of fake data being sold. Initially I looked at this and was concerned to see names like Kogan there in the source of the data (they’re a legitimate company), but clearly it’s all fabricated anyway. I wondered if it was perhaps merely representative of the sort of data you could expect from the 5 million records, but there’s nothing on the site to explicitly suggest that.

The thing is though, other data isn’t fabricated because I’ve verified it with HIBP subscribers. Now that may well have come from a different source and the D4M site is merely reselling it as one of many online distribution channels, but the fact remains that in those 30M+ records there is legitimate personal data.

Conclusions

When I started looking at the data I was sent, I expected to end up with an additional 30 million records in HIBP. That’s not going to happen because this isn’t a data breach, so I won’t be loading it, but what the exercise did do is open my eyes further to how personal data is collected and redistributed.

Aspects of this story remain unanswered for me: do the sites I discuss above have business models beyond just data harvesting? Instant Checkmate certainly wants to charge for reports, but was it a conscious decision to sell the data in the collection sent to me, and did it even come from them in the first place? As for the other giveaway sites, are they simply on the shady side of the web running misleading campaigns, or is it their express intention to collect and then resell the data? If anyone has any insights on this or would just like to speculate, please do leave a comment below.

Your data is actually quite valuable and I don’t just mean your sensitive personal info such as your birth date or income level or the things we’d normally associate with being worth something to nefarious parties. Your name and location combined with your email address are valuable too; perhaps not in isolation, but in large collections numbering in the millions they’re actually worth a bit and they can be sold over and over again. The actual impact of this on you may be minimal (namely spam), but most people aren’t real keen on their data being traded in this way.

Your data is a commodity and as such, others will attempt to extract it from you and sell it. Remember this the next time a site like the ones above requests it from you; how are they justifying their online presence? Are you the product? Probably.

Request for feedback: Organisations using “Have I been pwned” data


Working on Have I been pwned (HIBP), I come across a lot of interesting things. Interesting people dealing in data breaches, interesting vulnerabilities in systems which have been compromised and interesting requests from people wanting the data. In fact, I was getting so many requests for data I ended up writing No, I cannot share data breaches with you where I very explicitly laid out how I wouldn’t give people their own record from a breach, I wouldn’t give the data to researchers and I wouldn’t trade data breaches. I still hold that view – nothing has changed there – but I’ve been receiving some requests recently for access to data which is causing me to stop, think and, well, write this blog post to seek your feedback. Let me explain.

HIBP is used in a number of different ways by large organisations. Some of them rely on the public API to check the exposure of their users and notify them, many have domain subscriptions which send them a notification if one of their colleagues is in a breach and a small number are using the commercial callback implementation to notify them when an account they’re monitoring is impacted. However, there are a few things stopping other orgs from using the service in any of these ways, things I can’t overcome with the current model.
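For context, the public API the first group relies on is a simple lookup by account. Here’s a sketch of how an organisation might build that query (the v2 endpoint path shown is how the API stood at the time of writing and may change; the User-Agent value is a made-up example, though the API does require one):

```python
import json
import urllib.parse
import urllib.request

def breach_lookup_request(account):
    """Build the breached-account lookup for one address. The API
    requires a User-Agent identifying the consumer; the value here
    is a hypothetical example."""
    url = ("https://haveibeenpwned.com/api/v2/breachedaccount/"
           + urllib.parse.quote(account))
    return urllib.request.Request(url, headers={"User-Agent": "example-breach-check"})

def breach_names(response_body):
    """A hit returns a JSON array of breach objects (a 404 means the
    address wasn't found); pull out the breach names."""
    return [b["Name"] for b in json.loads(response_body)]

req = breach_lookup_request("someone@example.com")
print(req.full_url)
```

And that simplicity is exactly the problem for the orgs below: every query necessarily sends one of their customers’ addresses to my service.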

The barriers I’m hearing from organisations who would like access to the data to do good things (I cannot emphasise that point enough) are two-fold: service location and privacy. On the former, these orgs are typically European and are beholden to local legislation which doesn’t like their data being sent outside the EU, and subscribing emails to or querying HIBP would do just that. I could stand up an EU instance of the service, but it doesn’t solve the next issue, which is privacy. These companies are understandably worried about sending me any of their data. They don’t want a situation, regardless of where the service runs, where I know who their customers are and I totally get that – I’d have the same concern.

So it leaves me in somewhat of a quandary; these organisations want to do good things with the data but my existing constructs make that impossible. Let me talk about the sorts of things they want to do though as that will put things in perspective, and I’ll start with an example. Someone sent me this from Amazon just the other day, have a read:

From: security-update@amazon.com
Subject: Your Amazon password has been changed
Date: March 16, 2016 at 4:02:41 PM CDT
To: [redacted]

Hello,

At Amazon we take your security and privacy very seriously. As part of our routine monitoring, we discovered a list of email address and password sets posted online. While the list was not Amazon-related, we know that many customers reuse their passwords on several websites. We believe your email address and password set was on that list. For your security, we have assigned a temporary password to your account.

You will need to reset your password when you return to the Amazon.com site. To reset your password, click "Your Account" at the top of any page on Amazon.com. On the Sign In page, click the "Forgot your password?" link to reach the Amazon.com Password Assistance page. After you enter your email or mobile phone number, you will receive an email containing a personalized link. Click the link from the email and follow the directions provided. 

Your new password will be effective immediately. We recommend that you choose a password that you have never used with any website.

Sincerely,

Amazon.com
http://www.amazon.com

This is actually really cool – Amazon are sourcing data breaches then notifying their customers when they find a credential match. This isn’t entirely new, in fact I wrote about how both Amazon and LinkedIn were doing it back in November. There are many fascinating things about this: Firstly, they’re actively acquiring data breaches, an activity that some have suggested is on the shady side of legal yet here we have some of the web’s largest properties doing it to do good things. Secondly, given that the vast majority of breaches have some form of cryptographic storage for passwords (even if it’s just plain old MD5), they’re either cracking hashes or hashing their customers’ passwords with the algorithm of the breached site when they log in (or register or change their password). It’s quite possibly the former as the individual who sent me the email above noted that it was an account he hadn’t used for years.
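To make the second of those possibilities concrete: if a breached site stored unsalted MD5 hashes, a provider holding the plain text password at login time can hash it the same way and look for a match. This is purely a hypothetical sketch of the technique, not Amazon’s actual implementation:

```python
import hashlib

# Hypothetical breached credential list: the site stored unsalted MD5,
# so we can hash a candidate password the same way and compare.
breached_md5_hashes = {
    hashlib.md5(b"hunter2").hexdigest(),
    hashlib.md5(b"password1").hexdigest(),
}

def is_credential_compromised(password: str) -> bool:
    """Hash the password the way the breached site did and look for a match."""
    candidate = hashlib.md5(password.encode("utf-8")).hexdigest()
    return candidate in breached_md5_hashes

print(is_credential_compromised("hunter2"))    # True - force a password reset
print(is_credential_compromised("tr0ub4dor"))  # False
```

A salted or properly stretched algorithm on the breached side makes this comparison far more expensive, which is exactly why plain old MD5 breaches are so readily actionable.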

I can only see upsides to the likes of Amazon doing this. They’re decreasing the risk to the individual and they’re decreasing the likelihood of them having to deal with account takeovers by malicious parties. If they get onto this fast enough after a breach is found (and that’s always going to be the challenge), they have the ability to put a serious dent in the value of the data for those who wish to do harm.

All this brings me back to the requests I’ve been getting for access to HIBP data. These are coming from large names you’d recognise in various technology sectors and the common thread from these legitimate big players is that they genuinely want to improve the security of their existing customers. They want to do the same thing as Amazon and LinkedIn except they want the ability to consume that data “as a service”. But they don’t want the full data breach; they don’t want birth dates and genders or, in a case like VTech, the names of people’s kids or, for Ashley Madison, their bedroom preferences. They only want email addresses and passwords and in some cases, just the email addresses.

I’ve thought of many different ways to do this without sending large volumes of data to the organisations keen on this approach. For example, I could have them send me a hash of the email they want to monitor and then hash each incoming email in a new breach and match based on those. In fact, this is already a suggestion on the HIBP UserVoice but the problem is that I can still derive the original value of an address once I match it to one in a breach and they’re still “giving” me addresses which is against their rules. I’ve thought of just providing directions on where an org can obtain the breach from but not only is this not always possible (many are sent to me privately), there’d still be a bunch of processing work required on their part, work that I already do and that they’d like to consume “as a service”.

The more I’ve thought about this, the more I keep coming back to one key principle: could this help data breach victims? Would it reduce the number of accounts they’ve reused credentials across getting compromised and increase their awareness of the risks they face? It’s an emphatic “yes” and the remaining questions about whether this should happen or not are related to the downsides. Would this violate their privacy? That’s largely dependent on how the organisation I send this to handles it. Could it be done securely? Certainly we have mechanisms to transfer data in a highly secure fashion, it potentially wouldn’t even need to touch the Azure-based HIBP service (i.e. could be distributed directly from the environment in which the processing is done).

Here’s what I’d like from you: On balance, is this a good thing for data breach victims? Which organisations should have access to this data? What controls would you put in place? Is this something that would be valuable to your organisation? Anything else?

If you’ve been following HIBP for any period of time, you’ll know that I’m super cautious around issues of privacy and using the data for good. The last thing I’d ever want is to have all sorts of issues arise that I’d never even thought of. It’s not a small thing either – I’d have to make it a commercial offering – and there’d be a significant time commitment from me not just to build the mechanism but also each and every time I process a data breach. An agreement in place with a consuming organisation would also need to strictly outline how the data may be used, for example only to provide services that improve the security of their existing customers and not to elicit business from new ones. But I think it has the potential to do good things in the right organisations, such as what you see above with Amazon.

Leave your comments below folks or if you want to share anything privately, see the contact page. I’d love to hear your feedback.

Update: Have a read of Nzall's comment about hashing. This sounds like a good model for reducing the risk of abuse as the customer would need to hash and compare each and every email on their end, they could never simply pull the entire list of addresses from the breach.
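A minimal sketch of that hash-and-compare model, assuming SHA-256 as the hash (the real algorithm and distribution mechanics would need to be agreed):

```python
import hashlib

def sha256_hex(email: str) -> str:
    # Normalise before hashing so "User@Example.com" and "user@example.com" match
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# What would be distributed: hashes of breached addresses, never the addresses
distributed_breach_hashes = {
    sha256_hex("alice@example.com"),
    sha256_hex("bob@example.com"),
}

# What the consuming organisation does locally: hash each of its own
# customers and compare, so no addresses ever leave its environment
def customers_in_breach(customer_emails):
    return [e for e in customer_emails if sha256_hex(e) in distributed_breach_hashes]

print(customers_in_breach(["Alice@Example.com", "carol@example.com"]))
# ['Alice@Example.com']
```

The key property is that the comparison happens entirely on the customer’s side; the hash list alone doesn’t hand anyone a ready-made mailing list, though an unsalted hash of an email address is of course still brute-forceable, which is why this only reduces rather than eliminates the risk of abuse.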

Understanding CSRF, the video tutorial edition


Cross-site request forgery is one of those attacks which remains enormously effective yet is frequently misunderstood. I’ve been running a bunch of security workshops for web developers around the globe recently and this is one of the topics we cover that often results in blank stares when I first ask about it. It usually unfolds that the developers have multiple resources at risk of a CSRF attack and if it’s not a classic web form style resource, then it’s frequently an API somewhere (you’re passing anti-forgery tokens to any APIs you wouldn’t want fraudulently called, right?!).

I thought I’d record a quick (ok, half an hour is still quick for me!) and unedited walkthrough of the mechanics of CSRF and how ASP.NET deals with it in both MVC and Web Forms. The .NET bits are just examples of how anti-forgery tokens in hidden form fields and cookies work though so don’t worry if you live in another web stack, it’s the same fundamental defence. Here’s the vid:

For a great example of nasty CSRF and an attack style we’ve seen many times before now, check out how an attack campaign compromised 300,000 home routers and altered their DNS settings.
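Stepping back to the defence itself: the hidden-form-field-plus-cookie pattern covered in the video boils down to issuing a random token on both channels and comparing them server side. Here’s a hypothetical, framework-agnostic sketch of that synchroniser-token idea:

```python
import hmac
import secrets

def issue_token() -> str:
    # Server generates an unpredictable token; it goes into both a cookie
    # and a hidden form field when the page is rendered
    return secrets.token_urlsafe(32)

def is_request_legitimate(cookie_token: str, form_token: str) -> bool:
    # compare_digest avoids leaking information via timing differences
    return hmac.compare_digest(cookie_token, form_token)

token = issue_token()
# Legitimate request: the browser submits the form the server rendered
print(is_request_legitimate(token, token))      # True
# Forged request: the attacker's page can't read the victim's cookie,
# so it can't echo the matching value into the form
print(is_request_legitimate(token, "guessed"))  # False
```

The attacker’s site can cause the browser to *send* the cookie, but the same-origin policy stops it *reading* the cookie or the rendered page, so it can never supply a matching form token – which is the entire defence in a nutshell.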

If you’d like to have a play with CSRF yourself, that form I used in the vid is here: http://evilcyberhacker.com/csrf.html

The site I use to demo it is here: http://hackyourselffirst.troyhunt.com

If you’d like to read about CSRF in more detail, check out my 2010 post (crikey, is it that long already?!) on OWASP Top 10 for .NET developers part 5: Cross-Site Request Forgery (CSRF). If you have Pluralsight access, it’s in my course of the same name for the .NET folks or my Hack Yourself First course if you’d like a technology agnostic view of it. If you don’t have Pluralsight, firstly, what’s wrong with you?! :) And secondly, you can still get three months for free using this one neat trick… enjoy!

New Pluralsight course: Ethical Hacking, Denial of Service


I’ve just launched my latest Pluralsight course titled Ethical Hacking, Denial of Service but before I explain what’s in it, let’s kick off with some trivia: DDoS attacks have increased massively in size in recent years:

This is from Arbor Networks’ latest Worldwide Infrastructure Security Report and that was current in October when the study was done. Now, it’s not so current:

By the time you read this, we may well be at 700 or 800 or who knows where because the trend is very consistently “up and to the right”, as they say. But it’s not just the scale of the attacks which is alarming, it’s the cost. We may think of a DDoS attack as being a simple temporary outage (or a peaceful protest, as some would like to see it), but have a look at how much Arbor found it actually costs organisations in their report:

So about two thirds of companies take a hit of $30k an hour or more, and 14% of them were seeing costs upwards of ten times that – $300k per hour because of DDoS!

But here’s the other crazy thing about this class of attack – it’s easy, it’s cheap and you don’t need any specific app flaws in order to seriously mess with a site. For a perfect example of how easy DDoS is, go and check out this site (turn your speakers way up first!):

I’m not normally in the habit of linking through to malicious sites, but “booters” are so prevalent and easily accessible on the clear web that you’ll find them within minutes anyway if you want to. The crazy thing with the booters is the ease with which they seem to be spun up. For example, the Lizard Squad service of late 2014 proved to be rather effective at taking down the likes of Sony’s PSN and Microsoft’s Xbox Live. Even kids in their bedrooms are spinning up DDoS services these days:

Now to be clear, I’m not exactly sure how effective Christa’s service really is, but there’s something about DDoS that really appeals to kids. Part of it is likely that they’re just so easy to execute in an impactful way; not only are there many DDoS as a service offerings out there (DDoSaas?), they also don’t need any specific vulnerabilities in the target site in order to cause damage. No SQL injection or XSS or enumeration risks or anything like that, they merely need a site which can handle less traffic than they can send it.

DDoS has also become a favourite with this genre that we know as the hacktivist. For example, we’ve seen successful attacks based on philosophical differences such as #oppayback in the wake of PayPal and co refusing to accept donations to Wikileaks. The legalities of this action aside (and I’ll come back to that), you can understand the ire the hacktivists felt in the wake of PayPal’s decision. Likewise, there probably wasn’t a whole lot of sympathy when hacktivists DDoS’d Scientology and I suspect next to zero sympathy when it was Westboro Baptist Church being hit.

But the ease of this class of attack also makes for some pretty unreasonable action on behalf of adversaries. For example, just a couple of months ago Nissan was DDoS’d. Why? No, not because they build cars with security flaws, rather it was because of the whales. Hang on – what? This is not because Nissan is upholstering cars in whale skin or testing airbags on them or anything whacky like that, it’s because Nissan is Japanese and there remains “scientific research” there which involves harpooning them. (It’s unclear how scientific turning them into sushi is, but I digress.) The point is that Nissan themselves almost certainly have nothing to do with whaling yet they copped a rather effective DDoS there for a while.

Another reason DDoS attacks remain rampant particularly via hacktivists is that they frequently don’t believe there’ll be any recourse. I recently watched the We Are Legion documentary (not sensational, but worthwhile viewing IMHO), and it was interesting to see just how unaware so many of those involved in the aforementioned #oppayback attacks were. They gave no thought to the fact that what they were doing might be illegal and could have serious consequences. Being entirely objective about it, you can see how they might draw that conclusion; they’re sitting there in their bedrooms firing up a tool like LOIC (a favourite with hacktivists) and then just… going outside and playing. No seriously, here’s a tutorial by Anonymous that tells you to do just that:

Ok, this is funny for many reasons yet it does demonstrate how eager especially kids or young adults are to jump on the DDoS bandwagon. In the aforementioned documentary, many of those interviewed talked about how DDoS was just a form of protest, in fact even their lawyer was quite adamant they were just exercising their freedom of speech rights. The problem is though, when you’re costing an organisation thousands of dollars by the minute, you’re actually doing serious damage and with that comes serious consequences. A few years back I wrote What is LOIC and can I be arrested for DDoS’ing someone? because I was amazed at just how many people thought they couldn’t be arrested for it!

I could go on about DDoS because it’s a genuinely fascinating topic, but I do want to touch on the point of writing the post in the first place which is to launch my latest (and 21st) Pluralsight course titled Ethical Hacking, Denial of Service. This is now the 6th course in the Ethical Hacking series and I decided to break it down into four modules like so:

  1. Understanding Denial of Service Attacks: How this class of attack has grown so rapidly in recent years and the impact it’s having on organisations, as well as some really neat visualisation of attacks such as the Norse map.
  2. Attack Techniques: There’s a lot more to denial of service attacks than a bunch of people pointing LOIC at a site. SYN floods, ICMP and UDP attacks, DNS amplification and even the man on the side style of attack we saw against GitHub last year to name just a few.
  3. Tools and Services: Obviously I touch on LOIC (and its web-based counterpart JS LOIC) and I then delve into the operations of booters as mentioned above and the whole DDoSaaS class of offering. In fact I even do a walkthrough of one of these which gives some insight into just how polished these services have become.
  4. Defending Against Attacks: Given the course is aimed at your ethical class of professional, I naturally spend a bunch of time talking about how to limit the effectiveness of these attacks. It’s never foolproof, but there’s a lot that can be done to lessen the blow of an attack.
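On that last point, one concrete flavour of defence worth illustrating: application-level rate limiting, for which a token bucket is the classic construct. This is a hypothetical sketch of the idea, not code from the course:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: allow `rate` requests per second
    with bursts up to `capacity` before requests start being dropped."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity           # start with a full burst allowance
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens in proportion to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                     # bucket empty: shed this request

bucket = TokenBucket(rate=10, capacity=5)
results = [bucket.allow() for _ in range(8)]
print(results)  # roughly the first 5 allowed, the burst beyond capacity dropped
```

A limiter like this lives happily at a reverse proxy keyed per client IP; it won’t save you from a large distributed flood on its own, but it does stop a single cheap source from exhausting your capacity.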

That’s it for DDoS, the course is now live and ready to be viewed – enjoy!

The world needs more stupid security researchers – join me!


I love this Google Play store review of the NissanConnect app which had such terrible security issues recently:

I join a long line of stupid security folks who’ve messed things up for other people. Sometimes people have been unable to purchase things just because a stupid researcher found that credit cards may be stolen. Or unable to use government services because another stupid researcher found personal info may be exposed. How inconvenient!

It’s like that stupid doctor I saw that time who told me I had pneumonia and made me rest up. What a dick.

Or the stupid local government that makes us put a fence around the pool. They’re my kids running around the place, dammit!

Whilst there is stupid security, we need more stupid security researchers. Not everyone will always agree with you, but we need you to help make the web a safer place. As a permanent reminder that sometimes you need to upset the minority in order to better protect the majority, I did indeed print and frame that app store review:

People will get angry. They’ll blame you for the ills created by others. They’ll accuse you of all sorts of things with reckless, irrational abandon. If you’re cranky about negligence such as this being called out then just be patient, there’s much more for you to be mad about yet.

Don’t be Melvyn. Be bold and stupid and whatever else it takes to approach security like this ethically, yet effectively. The world needs more stupid security researchers!


MVP, round 6!


Each year since 2011, on April first (yeah, I know…), I’ve looked for one of these to land in my inbox and fortunately, this year didn’t disappoint:

The MVP program has been an enormously fulfilling thing to be a part of these last five years. It’s been great for the connections I’ve made, the access to folks in Microsoft and the community engagements it’s led to, particularly in my post-corporate life as an independent. Now yes, I’ve been misquoted as “Troy Hunt from Microsoft” many, many times as a result but frankly, that’s been the only downside.

For those wondering, this is independent of my role as a Microsoft Regional Director. I’ll still go through the annual MVP renewal process which looks at what community contributions I’ve made and then (hopefully) the RD thing will continue to tick over on a bi-annual basis.

Thanks everyone who continues to read this blog, watch my Pluralsight courses, come to my conference talks and do the things that you do to help me have such a fulfilling career :)

Have I been pwned, opting out, VTech and general privacy things


It’s now going on two and a half years since I launched Have I been pwned (HIBP) and I’m continually amazed by how much has happened in that time. It started out with a “mere” 152M breached records and has now more than doubled in volume, I’ve added an API, notifications, domain searches, pastes and a heap of other things both visible to the public and behind the scenes. It’s also gone from a hobby project which I thought only a few curious technology people would visit to a site that’s seen over a million visitors in a single day in the wake of the Ashley Madison breach and has been covered by some of the world’s largest media outlets.

The breaches themselves have gotten me involved with a side of security I previously had limited insight into, a side that many of us probably don’t think about that much. For example, the outpouring of deeply sensitive information from Ashley Madison members. Another side of data breaches I didn’t fully appreciate earlier on is the prevalence with which they’re traded online. In fact to my delight, HIBP appears to have proven detrimental to this trade which is an entirely unexpected yet pleasing outcome. As you can imagine, running this service has also exposed me to some very interesting characters; sometimes highly intelligent, other times (mostly) innocent juveniles and other times, a downright nasty side of society.

Dealing with data of this nature means that privacy is always going to be something I need to be especially mindful of. From the outset, I made the call to never store anything other than email addresses in the system so no other data breach attributes such as passwords or even names touch HIBP. Last year I introduced “sensitive” breaches as well, that is I made certain incidents such as Ashley Madison, Adult Friend Finder and YouPorn unavailable for public search due to the potential for it to have a serious adverse impact on those exposed. Yes, they’re all readily obtainable data breaches circulating around in public anyway, but I don’t want to be the one that facilitates a simple online search that then causes someone serious harm as a result of them being found in one of these sites.

The ethics around how I handle data on HIBP is something I put a huge amount of thought into. I hope that by virtue of the transparency I’ve always had with how I run this site, people see it as a resource for good and that’s almost unanimously been the feedback from people – 99.9% positive, I’d say. To that effect, I want to write about two more things I’ve changed on the site just today in order to continue focusing on privacy in an ethical way: the ability to opt out of being publicly discoverable in any data breaches in HIBP and the removal of the VTech data. Let me fully explain the rationale behind both of these changes.

Opting out of HIBP

I’m not sure how many people have visited HIBP since I started it back in 2013. Google Analytics reckons about 15M page views and there’s a heap more that wouldn’t have been counted thanks to ad blockers and VPNs that keep the trackers away. All I know is that it’s 8 figures which at least in my books, is a hell of a lot.

To date, I’ve had 5 people ask me to remove their data. Five. That is all. A few of them have come across tersely to begin with but I’ve always been conciliatory and immediately obliged after which they usually turn out to be pretty cool. Often they’re unaware that sites are disclosing their presence on them anyway via enumeration risks; try entering an email address into Adult Friend Finder’s password reset page and you’ll see what I mean.
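By way of contrast, an enumeration-safe reset page gives an identical response whether or not the address exists, so it can never confirm membership. A hypothetical sketch:

```python
# Hypothetical sketch of an enumeration-safe password reset handler:
# the response is identical whether or not the address is registered,
# so an attacker can't use the reset page to confirm membership.

registered = {"alice@example.com"}

def request_reset(email: str) -> str:
    if email.strip().lower() in registered:
        pass  # in a real system: generate a token and send the reset email here
    # Same message either way - existence is never disclosed
    return "If that address is registered, a reset link has been emailed."

print(request_reset("alice@example.com") == request_reset("nobody@example.com"))  # True
```

The only channel that reveals anything is the inbox of the legitimate owner, which is exactly where that information belongs.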

But regardless of the ease with which you can discover someone’s presence on almost every one of the sites already loaded into HIBP, I understand the sentiment. Last year I had someone create a UserVoice suggestion requesting I provide a way for people to remove themselves and that’s now precisely what I’ve done. Here’s how it works: There’s now an “Opt out” link under the “About” menu item:

That brings you to the opt out page:

The reCAPTCHA is necessary to ensure there’s no abuse that causes emails to be sent to unsuspecting people en masse. Sending an email is necessary because that’s the verification channel to ensure someone does indeed want to opt out:

The email and what happens next is pretty self-explanatory, let me explain how the mechanism works on the back end.

Firstly, in order to know who wants to opt out, I need to store the person’s email address. I could always just nuke their entire record now, but that doesn’t keep them opted out in the future if a new breach appears that includes them. I also wanted to give people the flexibility to still use the service if they wanted to; they should be able to privately search for their exposure and the notification system should still work. To make all of this possible and still ensure both data imports and searches remain efficient, I now flag someone’s record in both the breaches table and the pastes table when they want to opt out. These are Azure Table Storage entities which are super fast to retrieve by key but slow to run ad hoc queries on. What this model means is that when a search is done for someone and their record is flagged as “opt out”, the result is simply 404 (not found), which is identical to if they were never in there in the first place.
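That indistinguishability can be illustrated with a hypothetical sketch, using a plain dictionary in place of Azure Table Storage:

```python
# Hypothetical sketch of the lookup behaviour described above: a record
# flagged as opted out produces exactly the same result as a record that
# was never there, so the opt-out itself isn't observable.

breach_records = {
    "alice@example.com": {"breaches": ["Adobe"], "opted_out": False},
    "bob@example.com":   {"breaches": ["Adobe"], "opted_out": True},
}

def search(email: str):
    record = breach_records.get(email)
    if record is None or record["opted_out"]:
        return 404, None           # the two outcomes are indistinguishable
    return 200, record["breaches"]

print(search("alice@example.com"))   # (200, ['Adobe'])
print(search("bob@example.com"))     # (404, None) - opted out
print(search("carol@example.com"))   # (404, None) - never present
```

In a real implementation you’d also want both 404 paths to take the same amount of time, otherwise the flag could leak via a timing side channel.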

That last point is important too: there’s no public indication that someone has opted out. If they’re taking the view that they don’t want to be publicly located then I think it’s fair to assume that applies to not disclosing that fact as well. That means that both the search facility and the opt out process itself will not publicly disclose their opted out state – there are no known enumeration risks either explicitly or via timing attacks or anything like that. Having said that, if someone were to trawl through the Adobe breach (which is still easily located on the web) and find that an address in there is not in HIBP then clearly that would imply the opt out. There’s not really anything I can do about that, but I thought it was worth a mention.

To the earlier point about still being able to search for your own exposure, at any time anyone can use the notification service to check breaches and pastes they’ve been involved in. It will send them an email with a unique link which will take them to a page with the search results regardless of whether they’ve opted out or not and regardless of whether the breach was sensitive or not (i.e. Ashley Madison).

So that’s the opt out facility, let’s get onto VTech.

VTech and breach retirement

In November last year, Motherboard reporter Lorenzo Franceschi-Bicchierai contacted me about a serious data breach. He’d been handed a large number of records directly by an individual claiming to have broken into VTech’s systems and thus began a saga that resulted in the details of millions of adults, their children and even their kids’ photos being exfiltrated by the individual. I helped Lorenzo verify that the breach was indeed legitimate and he communicated backwards and forwards with VTech and the attacker. I wrote about it at the time in When children are breached – inside the massive VTech hack.

The VTech data breach was unique for a number of reasons. Firstly, there were only ever three individuals who held the data, those being the guy who obtained it in the first place, Lorenzo and myself. To this day, it remains the only breach I’ve ever loaded into HIBP that other parties didn’t already have their hands on. Secondly, it’s kids and there’s really not much more to say about that; when children are involved, the incident takes on a whole new level of emotion for those caught up in the incident (put yourself in the parents’ shoes for a moment).

An incident of this nature has broad-reaching and long-lasting ramifications. One of those is that the alleged perpetrator was arrested in December. I know nothing more of his fate since that time nor do I know his identity and it may well remain permanently suppressed. Whilst he clearly stepped way over the line of ethical disclosure (he originally claimed he simply wanted to make VTech aware of risks in their assets), a saving grace is that to the best of my knowledge, he never redistributed the data beyond Lorenzo. In fact I’ve frequently had people request it from me but as I’ve said in the past, I don’t give anybody any of this data under any circumstances.

I’ve had chats with many parties about the VTech incident and one thing I’ll make crystal clear up front is that they’ve all been casual, productive and friendly chats. It’s included lawyers involved in class actions on behalf of those in the breach, three letter law enforcement acronyms in multiple countries and legal representatives for VTech themselves. Obviously the nature of those discussions has differed but there’s been one constant: nobody wants to see the data spread in the way that so many other data breaches have in the past. Because of this, I’ve agreed to ensure that can never happen due to anything on my end going wrong and I’ve now permanently deleted both the original data breach Lorenzo sent to me and the VTech data in HIBP.

For the sake of full transparency, this was not as a result of a demand or a threat or anything of that nature, rather it’s to give families impacted by the incident that little bit of extra certainty that they can put this incident behind them. This was a concern that was raised by all of the parties I mentioned earlier and regardless of how capable I may think I am of protecting data, it gives these families further peace of mind because I simply cannot lose what I do not have. I’ve been assured that every single person impacted by that incident has been contacted directly by VTech so there should be no interested parties left who aren’t aware of their exposure. These factors together – me being one of the only parties with the data, the concern families have over their kids and everyone in there having been notified already – are what’s driven me to remove that data because it’s the right thing to do.

In terms of the mechanics of how this works in HIBP, I actually had to build a new process to remove the data. As I’ve written before, it’s in Azure Table Storage rather than a relational database so it’s not a simple matter of DELETE FROM, rather I had to enumerate over every single record and remove entries one by one, all 4.8 million of them. You’ll still see VTech listed as a site that was breached, but I’ve now introduced the concept of a “retired breach” and you’ll see a corresponding logo next to it:

I elected to continue showing it here and not erase all memory of it in part because it has historical significance as a major industry data breach and also due to the referential integrity dependency of HIBP subscribers. I notified 1.7k of them who were found in the VTech breach and I have a historical record of having sent them an email. Whilst removing the 4.8M addresses themselves from the searchable index in HIBP was the right thing to do, it’s a different story for that small number who willingly provided their info to HIBP and asked the service to watch out for them.
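The enumerate-and-delete mechanics described a couple of paragraphs back can be sketched hypothetically, with a dictionary standing in for Table Storage and batch commits simulated:

```python
# Hypothetical sketch of retiring a breach where no bulk DELETE exists:
# every affected entity is enumerated and updated individually, with
# changes grouped into batches to keep round trips manageable.

def retire_breach(store: dict, breach_name: str, batch_size: int = 100) -> int:
    """store maps email -> list of breach names; returns entities touched."""
    removed = 0
    batch = []
    for email, breaches in list(store.items()):
        if breach_name in breaches:
            batch.append(email)
        if len(batch) >= batch_size:
            for e in batch:           # in Table Storage this would be a batch commit
                store[e].remove(breach_name)
            removed += len(batch)
            batch = []
    for e in batch:                   # flush the final partial batch
        store[e].remove(breach_name)
    removed += len(batch)
    return removed

store = {"a@x.com": ["VTech", "Adobe"], "b@x.com": ["VTech"], "c@x.com": ["Adobe"]}
print(retire_breach(store, "VTech"))  # 2
print(store)  # {'a@x.com': ['Adobe'], 'b@x.com': [], 'c@x.com': ['Adobe']}
```

Multiply that loop out to 4.8 million entities and you can see why this needed a purpose-built process rather than a one-line query.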

This is a natural evolution for HIBP that has negligible real world impact. Continuing to run this service responsibly is absolutely paramount and measures such as the opt out feature and removing the VTech data are consistent with the ethical approach I believe this class of information deserves.

Summary

The only reason this service has been able to continue running unhindered is because I’ve always erred on the side of caution when it comes to how I handle data. A month before the Ashley Madison data even hit the torrents, I’d concluded that it shouldn’t be publicly searchable which I’m confident is the reason HIBP never received a DMCA takedown or even a hint of a legal threat. Regardless of any concerns about that, it was the right thing to do as is removing peoples’ data from public searching and retiring the VTech breach.

The things I’ve outlined in this blog post might be viewed as taking the moral high-ground and that may well be right – I hope it’s right – but more than anything, it’s to try and help this project achieve what I ultimately created it for and that’s to help people better protect themselves online.

Lenovo P50 and my dislike of high DPI displays (for now)


A little while back, I wrote about how Lenovo were sending me some things as part of their Insiders program which meant getting to use a number of machines I probably wouldn’t have thought twice about otherwise. The Yoga 900 in that blog post, for example, is not something I would have normally considered for myself as I like a physically larger, gruntier machine yet it’s turned out to be one of the best laptops I’ve ever owned. You’d normally find me with a machine like a W540 which is the last machine I bought myself about 18 months back and it’s been sensational… until this arrived:

In total honesty, had it been my own dollars I wouldn’t have gone out and bought the P50 because the W540 is still such a good machine. I want to be transparent about that because the last thing I want is a perception of my views being skewed by virtue of getting free stuff from Lenovo.

In terms of why they feel compelled to send me things, well, it’s because of stuff like this:

Obviously influence is important to them particularly when it’s viewed as being objective, independent thoughts. So with that in mind, let me share how I’ve been finding the machine and I’m going to give you a bunch of stats followed by what I didn’t initially like and then why I decided I did like it!

It’s not for lightweights…

Let’s be clear about who this class of machine is for. Actually, I’ll turn it around – here’s who it’s not for:

The machine weighs in at somewhere between 2.5 and 3kg depending on how you configure it then you can add another 0.7kg for the seriously chunky power brick. My Yoga 900, by comparison, is only 1.3kg and the power supply probably just a couple of hundred grams. If the weight worries you, this is not the machine for you. For me personally, I do a lot of travel with a large machine and the Yoga all in my backpack and it doesn’t bother me, but maybe that’s just me.

The P50 isn’t intended to be a lightweight machine. It’s got a 15.6” screen so to begin with it’s a bigger chassis than many of today’s lightweight machines, then there’s the fact that it has a Xeon processor. Yep, Xeon. This machine was among the first laptops to ship with a Xeon built on the Skylake microarchitecture and I can only imagine that means a fair bit of cooling is required, hence the sizable fan ports on the thing. Ports are the other thing – 4 USB 3.0, a USB Type-C, Ethernet, HDMI, Mini DisplayPort, ExpressCard, smart card reader – it all adds up to more weight.

Look, I really like carrying the Yoga to a coffee shop or using it on the plane, but I really like working with a larger keyboard with more travel on the keys and using a large screen. But more than all that, I like power. Lots of power. Let’s see how much.

Benchmarks

Let’s just get straight to the numbers; here’s my PassMark PerformanceTest results for the old W540 and the new P50:

Lenovo P50 and my dislike of high DPI displays (for now)

Other than the very slight dip on 2D graphics, the P50 is quicker across the board but it’s the disk that really smashes it. When Lenovo shipped the machine to me, it came with a 256GB NVMe SSD which isn’t quite big enough for a primary drive these days. I went and bought a 512GB Samsung 950 Pro NVMe which appears to be widely accepted as about the fastest drive you can get and oh boy – look at the rating on that sucker! That’s compared to the Crucial SSD in the old W540, but I saw similar improvements comparing it against other machines I’d had with older SSDs. (Incidentally, if you’re reading this whilst struggling to get a Lenovo P50 or P70 booting from this drive, read this.)

In terms of the other specs, you can read the whole machine configuration in this system overview from PerformanceTest. Lenovo sent it to me with 16GB of DDR4 ECC RAM which I probably would have specced at 32 had I the choice (I’ve already run short whilst running a VM). The machine will actually go up to 64 which is pretty nuts for a laptop but if you’re doing anything memory intensive then that would be neat. I’ll likely just order some more shortly.

What’s also interesting is that Greg Pakes recently picked up a Dell XPS 9550 and asked if we could compare baselines. We did a swap and here’s the end result:

Lenovo P50 and my dislike of high DPI displays (for now)

The Lenovo was faster in every area, even with the disk which is an identical drive in each so I can only assume it’s related to bus architecture on the machine or something else specific to the P50’s design. That said, do be conscious that all of these benchmarks can fluctuate between runs, particularly depending on what’s running on the machine at the time and which drivers are installed.

For the curious, if you’d like a copy of my benchmark you can grab it here. Note that the “fully configured” suffix indicates that this was after I configured everything optimally. Also note that it was before configuring Bitlocker so I expect there’d be a bit of a perf hit there. In short, this is the best result I could get out of the machine.
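For anyone without PassMark, a crude way to get a rough feel for sequential write speed is just a timed write with a forced flush to disk. To be clear, this is a hypothetical sketch and nothing like PassMark’s methodology – caching, drive state and file system all skew the number:

```python
import os
import tempfile
import time

def sequential_write_mb_s(total_mb=64, chunk_mb=4):
    """Crude sequential-write throughput estimate in MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        start = time.perf_counter()
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk, not just the OS cache
        elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed

print(f"{sequential_write_mb_s():.0f} MB/s")
```

Run it a few times and take the best result; any single run can be distorted by whatever else the machine is doing.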

High DPI displays – we’re not quite ready

I’m notorious for wanting the latest shiny thing. There’s something in my head that just automatically associates “newer” with “better” and that carries with it this innate desire to upgrade and reap the benefits, whether they be perceived or real.

One such example is higher DPI devices; my love of lots more pixels in ever smaller spaces started with the iPhone 4. It was so… dense! So of course I later had to have the iPad with lots of pixels and they are indeed beautiful devices. More recently, that’s extended to PCs and the Lenovo W540 I purchased about 18 months ago. Lovely machine, heaps of grunt and it’ll happily do 2,880x1,620. Then I received a Yoga 900 at 3,200x1,800 and that’s in a smaller screen than the W540 too, so it’s dense pixels FTW.

When the P50 arrived, I was initially disappointed when I saw it only did 1080p. That’s less than half the pixels of the old W540, therefore it’s worse, right? No, and here’s why:

I spend a huge amount of time working in Camtasia where I edit my Pluralsight courses. Here’s what it looks like on the new P50:

Lenovo P50 and my dislike of high DPI displays (for now)

And now on the W540 with the higher DPI:

Lenovo P50 and my dislike of high DPI displays (for now)

The canvas no longer stretches to the edges when told to “Shrink to fit”, the buttons all become smaller than they should and various other dialogues have clearly odd spacing, particularly for someone like me who gets pedantic about these things. For an app I spend huge amounts of time in, this is enormously frustrating.

Let’s try LiveWriter where I write my blogs. Here’s the P50 at 1080p:

Lenovo P50 and my dislike of high DPI displays (for now)

And now here’s the W540 with the higher DPI:

Lenovo P50 and my dislike of high DPI displays (for now)

There’s no concept of scaling the pixels on the higher DPI machine to align with the size the browser will render it at once it’s published. If you’ve ever seen me post a blog with oddly sized images it’s usually because I wrote it on the W540 and what I was seeing at design time is not what I ended up getting at publish time.

There are many, many other apps that struggle too. SQL Server Management Studio has various quirks, as do Red Gate’s tools that I run within it. It gets worse as the DPI goes up too – it’s much more obvious when I use the Yoga 900 with 3,200 horizontal pixels versus the W540 with 2,880 compared with the P50 at “only” 1,920.
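Some quick arithmetic shows why it gets worse as density climbs: an app that ignores DPI scaling draws at a fixed pixel size, so its controls shrink physically on denser panels. Here’s a sketch using the approximate panel specs of the three machines:

```python
import math

def ppi(width_px, height_px, diagonal_in):
    """Pixels per inch of a panel."""
    return math.hypot(width_px, height_px) / diagonal_in

def physical_inches(px, pixels_per_inch):
    """Physical size of an unscaled element that's `px` pixels wide."""
    return px / pixels_per_inch

p50 = ppi(1920, 1080, 15.6)   # ~141 PPI
w540 = ppi(2880, 1620, 15.6)  # ~212 PPI
yoga = ppi(3200, 1800, 13.3)  # ~276 PPI

# A 100px button in an app that ignores DPI scaling:
for name, density in [("P50", p50), ("W540", w540), ("Yoga 900", yoga)]:
    print(f"{name}: {physical_inches(100, density):.2f} in")
```

The same 100px button is roughly a third smaller to your eye on the W540 than on the P50, and smaller again on the Yoga – which is exactly what you see in those Camtasia and LiveWriter screenshots.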

Let us be clear – this is the software’s fault not the machine’s – but it’s software I’m heavily dependent on. It’s not just the apps I install on the machine either, Windows 10 itself seems to struggle with the DPI namely when changing resolutions. Often I’ll see some assets appear to be targeting one pixel density and others targeting another; desktop icons suddenly become too small or the taskbar too big or any number of things that remind you how the high DPI model is still a second class citizen.

It pains me to say it because I want that pixel density so that I have super sharp visuals, but if I’m honest, I haven’t missed it at all with the P50. Like I said, a 1080p screen would not have been my first choice but having these apps that haven’t got with the times yet behave as they should and not having dramas with Windows flicking between pixel densities has been rather nice! Hopefully by the next time I come to need a new machine things will be more stable but for now, less is actually more.

What’s it like to use the P50?

So keeping in mind that this is a combination of how I find it compared to the W540 and general observations about the machine itself, there are a few things that stick out.

Firstly, it has proper buttons next to the trackpad:

Lenovo P50 and my dislike of high DPI displays (for now)

The W540 had one of those pushable trackpads which initially drove me nuts then I got kinda used to. Now I’m readjusting again but I think this is a better design, if for no other reason than it’s more tactile and there’s a clearer delineation of where each button is. I normally use a wireless external mouse anyway though so it’s infrequent I touch the pad.

Fingerprint reader – get one. The last machine had one too although the P50’s implementation just involves placing the finger on it versus swiping it like on the old machine. The best bit about a fingerprint reader is not the ease of logging back in (although that’s nice), it’s being able to do it in front of other people without that awkward thing where they try to look away and you try to hide what you’re doing. I love Touch ID on my iThings for the same reason.

The fan – why does it keep coming on?! I don’t know why, but even on normal power setting and sitting there doing nothing it wants to spin it up every now and then. The old machine didn’t do this, perhaps it’s something I need to probe into a bit further.

Malware – no, I haven’t found any! I say this simply as I pretty much have a new metric when I tweet anything Lenovo related these days which is TTFS, otherwise known as Time To First Superfish. Lenovo screwed up badly on this front last year and they know it. I usually repave the machine from scratch anyway so anything on the disk is clean (plus it’s a new disk too), so short of funky firmware or BIOS software (and yes, I’m aware of the history there too), it’s all good.

Keyboard – I mentioned it earlier but it has a much more tactile feel than something like the Yoga. I know Apple rant on about how much effort goes into the way the keys on the Macbook feel when they’re pressed but they’re actually right – a good keyboard feels nice to use. I like the concave on the keys, the smooth but not glossy surface and the spacing between each one. It’s just a good place to spend lots of time and that’s exactly what a workhorse laptop should feel like IMHO.

This is really just a casual “here’s how I find it” review; if you want real depth, check out this review on the ThinkScopes blog.

Cheap productivity

I love the saying that computers are cheap and humans are expensive. Comparatively small costs in hardware can translate to big savings in productivity and as the value of your time increases, the statement becomes truer. It’s a hard thing to quantify though; if you spend $1k more on a top-spec machine, what’s the ROI? I’ve got no idea but I do know that very small gains will easily justify a dollar a day over the (likely) life of the machine and I also know that in terms of buying productivity, $1 is nothing.
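The break-even maths is easy to sketch; the figures below are illustrative assumptions, not anything measured:

```python
def break_even_minutes_per_day(extra_cost, machine_life_days, hourly_rate):
    """Minutes of productivity per day a hardware upgrade must save
    to pay for itself over the machine's life."""
    daily_cost = extra_cost / machine_life_days
    return daily_cost / hourly_rate * 60

# $1,000 extra spend, 3-year life, $100/hour value of your time:
print(round(break_even_minutes_per_day(1000, 3 * 365, 100), 1))  # 0.5
```

At those numbers, the upgrade pays for itself if it saves you about half a minute a day.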

It always infuriated me back in my corporate life at Pfizer how cost was viewed when it came to people and productivity. The thing that really stuck in my mind is that during an organisation-wide refresh of machines there was a push to keep the primary HDD at only 128GB which made things very tight on space. It was something like $100 to increase the capacity to a level where it would be plenty for the vast majority of people but the argument was always “Do you know how much it would cost to do that across X number of people?!” because that was the big scary figure. My retorts of “Do you know how much we’re paying those X number of people?!” generally didn’t go down too well even though it meant valuable resources wasting time purging otherwise useful files because hey, that upfront cost looks scary! Use your common sense on these things and invest in equipment that will help you do your job and above all, that you actually enjoy using.

But I digress, the point is that whenever it’s me paying the money, I always err on the side of going fast and being productive and this machine is pretty much the epitome of that.

Summary

The capability of the machine is kinda the point in terms of wearing the additional weight; this machine will do pretty much everything my usual desktop machine will do. I travel a lot and I want to be just as capable on the other side of the world or at 30,000 feet as I am when I’m sitting at home (lack of multiple monitors aside). I travel with both this machine and my Yoga in part because I always have a backup machine when I’m speaking and in part because when I’m running workshops I tend to present from one machine then use another to prepare things whilst the class is doing their exercises. For me, capability is more important than weight and if your priorities are inverted then this is not the laptop you’re looking for.

But if going fast in a fully-featured machine is your priority then this is absolutely the machine I would buy with my own cash, albeit with 32GB of RAM… and a higher DPI screen which I would then curse at!

When a nation is hacked: Understanding the ginormous Philippines data breach


Remember when OPM got breached last year? There was a lot of excitement in various parts of the world (namely the US) because here we had a government department (Office of Personnel Management), and they’d just lost 21.5 million records! These records included such sensitive data as names, dates of birth and addresses and by any reasonable measure, it was serious – that’s almost 7% of the country’s population!

Yet somehow, last week’s news that 55 million Filipino voters’ data was now out in the wild went largely unnoticed. Let’s put it down to a very western-centric tech media but move past that and look at this incident for what it is – a ginormous data breach with extremely sensitive information and at 55M individuals, that’s also more than half the country’s population.

Whilst there’s been limited press coverage on the issue, a public statement from the Filipino government has suggested that nothing sensitive was disclosed. As I discovered when I reached out to some of the people involved, this is blatantly wrong. Here’s how it all unfolded.

Background

A couple of weeks back, the COMELEC website (Commission of Elections) was defaced, allegedly by “Anonymous Philippines”:

When a nation is hacked: Understanding the ginormous Philippines data breach

This is the usual hacktivist ramblings, nothing out of the ordinary for a defacement but inevitably it led to them grabbing quite a bit of data. Just for context, here’s one of the search facilities on the COMELEC website (provided to me by a Filipino who assisted with data verification):

When a nation is hacked: Understanding the ginormous Philippines data breach

It’s a search facility requesting name and sensitive personal information (namely birthdate) over an unencrypted connection. You could try and load the page securely, except that won’t work very well:

When a nation is hacked: Understanding the ginormous Philippines data breach

Often when I’m verifying a breach, I’ll look at the site it allegedly came from and try to get a feel for how likely it is to be legitimate. Oversights like this combined with the W3C compliance GIF and hit counter at the bottom of the page just above the “Copyright © 2001” statement start to paint a pretty clear picture…

When a nation is hacked: Understanding the ginormous Philippines data breach

Breach distribution and contents

Firstly, unlike the OPM data, the Filipino breach has been very broadly distributed. Not only has it been readily available for download from multiple locations on the clear web, it’s been quite extensively torrented too. The genie is well and truly out of the bottle and it won’t be going back in.

The data consists of 76GB worth of (usually) compressed files, most notably a MySQL backup that expands out to 338GB. There’s a raft of other .sql files in the breach as well ranging from a few KB up to hundreds of MB. The breadth of data in these is quite significant; Trend Micro did a write-up on what they found so I won’t repeat it all here, I was more interested in verifying the legitimacy of the breach and conclusively reporting on the accuracy.

Amongst the huge volume of data is a total of 228,605 email addresses. This may sound like a small number out of the 55M records, but according to reports, a lot of the sensitive data such as passport numbers belongs to a “mere” 1.3M overseas voters. It’s entirely conceivable that records are not complete across all these individuals, but at least the email addresses gave me a verification avenue.

Breach verification with “Have I been pwned” subscribers

At the time of writing, I have 367k verified subscribers in Have I been pwned (HIBP), that is they’ve all gone to the notifications page, left their email address then received a confirmation and acknowledged it. I’ve used subscribers in the past to verify breaches where I’m not confident of the authenticity. It’s always worked out very well as I’ve got a large number of people interested in their exposure in data breaches and able to confirm whether information is accurate or not. I give them a small slice of what is allegedly their data and they then confirm the legitimacy.
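Conceptually, the selection process is just an intersection between verified subscriber emails and the emails in the breach, with each matched person then sent a fragment of their own record to confirm. This is a hypothetical sketch, not HIBP’s actual code:

```python
def pick_verification_candidates(breach_records, verified_subscribers, limit=5):
    """Return up to `limit` breach records whose email belongs to a
    verified subscriber, so each person can confirm their own data."""
    subs = {e.lower() for e in verified_subscribers}
    matches = [r for r in breach_records if r["email"].lower() in subs]
    return matches[:limit]

breach = [{"email": "a@example.com", "height": "1.70"},
          {"email": "b@example.com", "height": "1.82"}]
print(pick_verification_candidates(breach, ["B@example.com"]))
# [{'email': 'b@example.com', 'height': '1.82'}]
```

The key property is that each candidate only ever sees a slice of what is allegedly *their* data, so verification doesn’t mean disclosing anyone else’s records.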

Part of the reason why I particularly wanted to do that with this breach is because of this statement by COMELEC officials (emphasis mine):

Again, I want to emphasize that the database in our website is accessible to the public. There is no sensitive information there.

Now this could just be misreporting or comments taken out of context, but I’ve seen numerous articles downplaying the severity of the data in a manner that’s just not consistent with what I’ve observed in the data breach. Let’s see what those in there have to say about their data.

Yesterday I emailed a number of HIBP subscribers and got back some pretty quick responses with everyone willing to assist. I found them spread out across two tables in the data breach, the first being a table called “irdoctable2014” which has the following fields:

# FORM_ID, APP_TYPE, REGISTRATION, LASTNAME, FIRSTNAME, MATERNALNAME, SEX, CIVILSTATUS, SPOUSENAME, RESSTREET, RESPRECINCT, RESPRECINCTCODE, RESREGION, RESBARANGAY, RESCITY, RESPROVINCE, MAILSTREET, MAILEMBASSY, MAILCOUNTRY, REGCOUNTRY, REGEMBASSY, REPSTREET, REPBARANGAY, REPCITY, REPPROVINCE, EMAIL, ABROADSTATUS, ABROADSTATUSSPECIF, FLASTNAME, FFIRSTNAME, FMATERNALNAME, MLASTNAME, MFIRSTNAME, MMATERNALNAME, REPLASTNAME, REPFIRSTNAME, REPMATERNALNAME, DOBYEAR, DOBMONTH, DOBDAY, BIRTHCITY, BIRTHPROVINCE, CITIZENSHIP, NATURALIZATIONDATE, CERTIFICATENB, COUNTRYRES, CITYRESYEAR, CITYRESMONTH, PROFESSION, SECTOR, HEIGHT, WEIGHT, MARKS, DISABLED, ASSISTEDBY, TIN, PASSPORTLOST, PASSPORTNB, PASSPORTPLACE, PASSYEAR, PASSMONTH, PASSDAY, REGBARANGAY, REGREGION, REGCITY, REGPROVINCE, REG_DATE, STATIONID, LOCAL_ID, CREATE_TIME, UPDATE_TIME, IS_EXTRACTED, IS_EXPIRED, IS_CANCELLED, CONTACTNUMBER, EXPIRATION_DATE, APPOINTMENT_DATE, APPOINTMENT_TIME, SCHED_TIME, COUNTER_CHANGES, REFERENCENUMBER, ERBDATE, USER_ID, EMAIL_ID, EXTRACTED_DATETIME, IS_DELETE, UPDATED_DATETIME, IS_FRONTPAGE, IS_REPRINT, IS_OV, IS_COUNTED

This is a very large amount of data and reading through those column names, clearly many of them would be considered sensitive personally identifiable data. However, some of the data is encrypted, namely the person’s name and their date of birth. Part of the irony here though is that the email addresses appear in the clear and often contain both the first and last name anyway! Not all the fields are populated but plenty of them are and they contain very personal info. Let me demonstrate by sharing one of the responses I got to my questions in full (responses in bold, data obfuscated by XXX):

Would it be feasible that you have a record in this database? Yes, as I am a registered voter for the coming elections

Is this likely to be legitimate data? Yes

There’s some very specific information about your height (XXX) and weight (XXX) which I assume is metres and kilograms – does that sound right? Yes, that's what I declared when they took my vital statistics and biometrics when I registered last year

Do you recognise the names “'XXX', 'XXX', 'XXX', 'XXX', 'XXX', 'XXX'” – I suspect they may refer to your mother and father, I’m just trying to confirm? Yes, those are my parents

Along with the email address (which in this case included the person’s full name) are their “vital statistics and biometrics” as well as their parents’ names, all of which appear in the clear. There’s also a physical address, gender, marital status, where they were born, where they’re now living, their profession and their phone number. This is very personal information!
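The irony of encrypting the name whilst leaving the email in the clear is easy to illustrate. Here’s a hypothetical snippet (purely illustrative, not the breach’s actual format) pulling a likely name straight out of a “first.last@domain” style address:

```python
def name_from_email(email):
    """Guess first/last name from a 'first.last@domain' style address.

    Illustrates why encrypting the name field achieves little when the
    clear-text email address often contains the name anyway.
    """
    local = email.split("@")[0]
    parts = [p for p in local.replace("_", ".").split(".") if p.isalpha()]
    if len(parts) >= 2:
        return parts[0].title(), parts[-1].title()
    return None

print(name_from_email("maria.santos@example.com"))  # ('Maria', 'Santos')
```

Obviously not every address gives the name up this easily, but enough do that the encryption of the name column is largely theatre.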

Another subscriber provided this confirmation:

XXX and XXX are my parents' middle names. In the Philippine setting, your middle name is usually your mother's surname (prior to her marriage).

I then had a further two people provide the same emphatic confirmation about their data in the same table. A fifth person who offered support was found in a table called “doctablepost” which was in the 338GB file and contained these fields:

# ID, APPLICATION_ID, FORM_ID, APP_TYPE, ABSENTEE, REGISTRATION, LASTNAME, FIRSTNAME, MATERNALNAME, SEX, CIVILSTATUS, SPOUSENAME, RESSTREET, RESPRECINCT, RESPRECINCTCODE, RESBARANGAY, RESCITY, RESPROVINCE, ABROADSTREET, ABROADZIP, ABSENTIA, ABROADCITY, ABROADCOUNTRY, ABROADPERIOD, ABROADRESCONT, REGCOUNTRY, REGEMBASSY, MAILSTREET, MAILZIP, MAILCITY, MAILCOUNTRY, MAILEMBASSY, REPSTREET, REPBARANGAY, REPCITY, REPPROVINCE, EMAIL, ABROADSTATUS, ABROADSTATUSSPECIF, LASTENTRYDATE, ABSREGISTERED, OLDPRECINCT, OLDREGBARANGAY, OLDREGCITY, OLDREGPROVINCE, OLDREGDATE, FLASTNAME, FFIRSTNAME, FMATERNALNAME, MLASTNAME, MFIRSTNAME, MMATERNALNAME, REPLASTNAME, REPFIRSTNAME, REPMATERNALNAME, DOBYEAR, DOBMONTH, DOBDAY, BIRTHCITY, BIRTHPROVINCE, CITIZENSHIP, NATURALIZATIONDATE, CERTIFICATENB, COUNTRYRES, CITYRESYEAR, CITYRESMONTH, PROFESSION, SECTOR, HEIGHT, WEIGHT, MARKS, DISABLED, ASSISTEDBY, OLD_VIN, VINP1, VINP2, VINP3, VINCONTROLCODE, TIN, PASSPORTLOST, PASSPORTNB, PASSPORTPLACE, PASSYEAR, PASSMONTH, PASSDAY, REGBARANGAY, REGCITY, REGPROVINCE, REG_DATE, INTERNAME, OFFICERNAME, OPERNAME, STATIONID, CDID, SETID, PRINT_FLAG, FINGER_INFO, FINGER_TOPO_COORD, QUALITY, MATCHING_FINGER, TRANSFER_STATUS, TRANSFER_UPDATE_TIME, PAGES_DESCR, LOCAL_ID, CREATE_TIME, UPDATE_TIME, LOCK_USER, LOCK_TIME, PROCESSING, IS_CURRENT, DOC_VERSION, CD_STAT_ENTY, DISAPPROVED, VOTING_HIST1, VOTING_HIST2, OP_CODE, OP_DATE

This is where we now start to get into passports too and indeed this individual’s was in there, both the number and the place and date of issue. His name was also encrypted but the passport data wasn’t, nor was his birthday (which had been encrypted for others in the previous table). I asked him for confirmation of his data:

Would it be feasible that you have a record in this database? Yes

Is this likely to be legitimate data? Yes

The db record appears to be my overseas absentee voter record which I have registered into back in 2012. That also contains my current address, mothers maiden name and the Comelec officer who processed my registration.

Does your passport number end in “XXX” and was it issued in the month of XXX? Yes. Starting with XXX.  It was issued in XXX and will expire XXX XXX

Were you born in [month] XXX? Yes, I am born in XXX

That’s a very emphatic set of responses and he was able to not only confirm passport fragments that I gave him, but provide me with other fragments that lined up with the data in the breach. With five independent confirmations of the data, there’s no doubt in my mind that this is the real deal.

Other data attributes

As serious as the info above is, it’s only scratching the surface. Per the reports linked to earlier, there’s also biometric data relating to fingerprints in the system. This contains column names such as these:

PRINT_FLAG, FINGER_INFO, FINGER_TOPO_COORD, QUALITY, MATCHING_FINGER

The values within there can be quite detailed and I’ve no reason to think that this isn’t indeed legitimate print data uniquely and biologically identifying the owner. You don’t get to reset that stuff once it’s been released into the wild!

There’s what appears to be a CMS back end for the website. There’s voting history against names (it appears to just be dates rather than the candidate voted for). There’s information about embassies and polling locations and data on what appears to be electoral candidates as well. Clearly some of this should be public, but here you have a whole heap of very sensitive, poorly protected data somehow grouped in with public domain info. Given that most of this all sits in the one database as well, it’s highly likely it was all running behind the public website which may explain how such a broad set of data was obtained.

It’s just an absolute mess of huge volumes of data, tables with suffixes which appear to indicate copies or duplication, draft or temporary data and inconsistent (and frequently insufficient) cryptographic storage of sensitive data. This feels like so many other large, legacy corporate databases I’ve seen which have had numerous developers applying various practices to it over a long period of time. Only difference is, it’s got a heap of highly sensitive information in it and it’s now all public.

HIBP and summary

The 228,605 email addresses in the breach are now searchable in HIBP. I actually had to create five new data classes when loading this breach, that is I’d never seen this information in a breach before:

  • Marital statuses
  • Biometric data
  • Physical attributes
  • Family members' names

The Philippines is not exactly high on the news radar for Western media but this is a breach we should be paying attention to. There’s the potential to do serious damage to those involved and we need to remember that the same classes of data are held by all our governments in our respective corners of the world. It’s too late for those in this breach – a huge amount of personal data is now perpetually out there in the public domain – but let’s see if we can avoid those same mistakes in other parts of the world.

It's a new blog!


It's been 434 blog posts over six and a half years. It's gone from being excited about a hundred visitors in a week to hundreds of thousands on a big day. It's taken me from a hobby to a career. In so many ways, this blog has defined who I am and what I do today but finally, it was time for a change.

You're now reading an all new blog in an all new design on an all new platform. The content is the only thing that remains and I've literally rebuilt everything from the ground up over the last few months. Over that time, I've made many promises to explain how I decided to do it so my first post on the new platform is going to be a very meta-post about how everything you're seeing here has been put together. I'd love to get your feedback too so please do use the comments below; tell me what you love or hate or think I can improve and whilst I may not be able to do everything, I can at least read all feedback and I'm sure a bunch of it will lead to positive improvements.

Ghost, Ghost Pro

Last year I had to create a new blog. Not for me, mind you, rather for my wife, Kylie. I wanted something she could self-manage and that was a good, modern day platform. I looked at WordPress options but there were a number of things that just didn't sit well with me on that front. One of them was the sheer scale of security issues they have and before you interject, yes, I know that's usually because of third party add-ons yet the ecosystem still facilitates this. Then there was just the multitude of things that WordPress sites do that she simply didn't need - all she wanted was a blog, not add-ons and forms and other things that were ancillary to the simple objective of just creating a blog.

You can read more about it in Creating a blog for your non-techie significant other; the path to Ghost but as the title suggests, the solution was ultimately to move her to Ghost. And then I got a bit jealous because her site looked so damn good! It doesn't just look good, it's all HTTPS, loads super-fast and is a joy to work with.

After creating Kylie's blog and writing about the process, I began having a bit of dialogue with John O'Nolan from Ghost, AKA the guy who created Ghost. I was already keen to move this blog for a whole bunch of reasons that will become clear as you read this post, and John offered to help make it happen. It was never going to be a simple task and particularly as an independent these days, I have to think carefully about where I spend my time. The offer of support was enough to tip me over into saying "yes" and now here we are!

One more thing before I get into the details; this blog is running on Ghost Pro which is their commercially hosted version of the otherwise free Ghost blogging platform. A lot of people have suggested I should host it myself on Azure or Digital Ocean or similar and I have absolutely zero interest in doing that for a blog. Let me explain.

Hosting websites is like having kids; they're continually attacked by nasty things, they need ongoing care and as good an idea as they may seem at the time, they end up costing you a heap more than you planned. And no, I don't care that [whatever your favourite is] only costs 3 cents a month because that's not what matters; time is the commodity that's most valuable to me now. It's not just time in terms of hours actually spent, it's needing to be ready to patch any nasties, managing (and testing) backups, installing updates so you can leverage new features and so on and so forth. Let's be overly optimistic about it and say all that only takes 2 hours a month - what's that worth to you? Go on, put a dollar figure on it and consider what you could charge for the time plus what it's worth to simply not have to think about it.

Alternatively, check out this one the Ghost folks prepared earlier which I wholeheartedly agree with:

It's a new blog!

You do have to make trade-offs on a managed platform like Ghost Pro. For example, I can't add custom response headers and implement a content security policy. On the other hand, I'm not patching it and securing the environment as I would be if I managed it myself which would give me access to headers so I'm effectively trading off one security defence of limited value to a read-only personal blog and substituting it with a far more valuable defence which is letting the Ghost pros manage their platform. Pragmatism, people.

Ghost Pro starts at $19/m and if this seems like an insurmountably high amount, it might be time to think about whether it's worth doing at all if it's not worth ~60 cents a day. That cost can go up if you've got larger traffic numbers, but I'll talk shortly about how to keep that down with CloudFlare as well. For me, it was never up for discussion, a SaaS offering like Ghost Pro is an absolute no-brainer.

Right, that's clear, let me move on to the mechanics of how it's all put together.

CloudFlare

There are three things you need to understand about CloudFlare if you're not already familiar with what they do:

  • Their service routes your traffic through their global infrastructure
  • They give you security bits and performance bits for free
  • It only takes 5 minutes to set up

I wrote about CloudFlare last year and it's a really neat way of simply wrapping services around your existing site. In fact, I was so impressed with CloudFlare that I ended up writing a Pluralsight course on it, Getting Started with CloudFlare Security. At the time of writing the course just under a year ago, they were putting 5 trillion (yes, with a "t") requests a month through their infrastructure, a globally distributed collection of edge nodes that sit, well, pretty much everywhere:

It's a new blog!
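To put 5 trillion requests a month in perspective, a quick calculation:

```python
# CloudFlare's quoted volume at the time: 5 trillion requests a month
requests_per_month = 5_000_000_000_000
seconds_per_month = 30 * 24 * 60 * 60
print(f"{requests_per_month / seconds_per_month:,.0f} requests/second")
```

That works out to roughly two million requests every second, around the clock.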

When you have a service like this that sits in between customers and the origin website (the one running on Ghost), you can do all sorts of neat performance and security things. Actually let's get onto that now - security - and I'm really happy with this next bit...

It's all HTTPS

I've been hammered by people on my lack of HTTPS for some time, even occasionally being called hypocritical for not having it when writing about the importance of it on other sites. Let me take issue with that for a moment: criticising a commercial website accepting credentials in the clear or a payment provider loading credit card forms insecurely is a world apart from a public blog where all that's loaded over HTTP is information you read. Absolutist views like this are unhealthy; pragmatism about which security defences make sense in which scenarios is really important.

That said, I really dislike this sort of thing:

I saw it myself last year whilst flying Norwegian and although I'm sure this looked great on the marketing slides, it's a profoundly obtrusive and irresponsible thing for them to do. Now they can't, because the data is encrypted from your device as it flows out through the network, so Norwegian, screw you!

But increasingly, there are other reasons to go all HTTPS on a blog like this one. One is that Google now use it as a ranking signal; in other words, your searchability gets a bump simply by virtue of serving the site securely. It's allegedly a minor bump but in a space where SEO techniques are not just very competitive but often very shady, a bump for doing something positive is a very good thing indeed.

Another reason is that sooner or later, sites that aren't served over HTTPS are going to be explicitly flagged as insecure. Think about how it works at present; let's take a site like, oh I dunno, Norwegian airlines:

This site is insecure but the browser doesn't tell you that; it just loads without any visual warning indicators - it's implicitly insecure. Compare that to what you now see in the address bar of this site - green and a padlock, depending on your browser - it's explicitly secure. Scott Helme talks about this in his excellent post Still think you don't need HTTPS? which is well worth a read if you're on the fence about how important the shift to "secure by default" is.

You can actually see what this looks like right now: in Chrome, jump on over to chrome://flags/#mark-non-secure-as and tell it to mark non-secure origins as non-secure:

Now you head on over to Norwegian airlines and, well:

Fortunately, the web as a whole is moving in the right direction, albeit at a gradual pace. For example, only a couple of weeks ago WordPress announced that all sites on wordpress.com would get HTTPS. It's an incremental process that will take many years to filter out to the masses of HTTP sites out there but eventually, insecure sites will be the exception so it's best to start moving away from that early.

I've also enabled HSTS which means that once you do see the site over a secure connection, supporting browsers (which is now all the major modern ones) won't load the content over HTTP. I can't force it on subdomains though because I have the likes of hackyourselffirst.troyhunt.com and a couple of others which I want to keep serving content over HTTP. That also means I can't preload (explained in that HSTS link) but again, it's a personal blog; it's not a class of site that warrants the changes required to enable those extra layers of defence. Go and have a good look at the configuration of haveibeenpwned.com if you'd like to see a utopian security headers approach.
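For reference, HSTS boils down to a single response header. A configuration along the lines described here would look something like this (the max-age value is illustrative, and the "includeSubDomains" and "preload" directives are deliberately absent for the reasons above):

```http
Strict-Transport-Security: max-age=31536000
```

Adding "includeSubDomains" would break those HTTP-only subdomains, and since "preload" requires "includeSubDomains", preloading is off the table too.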

One thing to be clear about with HTTPS on this blog is where it begins and ends using CloudFlare's service. What this new blog means is that I've gone from zero encryption anywhere to A+ grade encryption from the browser to CloudFlare (so all the most common attack vectors like rogue wireless providers, dodgy ISPs and modified DNS are covered now) and what they call "Full" encryption from CloudFlare to Ghost. This protects against passive eavesdropping on that network segment (i.e. the NSA can't just watch the traffic), but it isn't what they call "Full (strict)" encryption which would also protect against an adversary issuing a fake certificate. To do that, I'd need Ghost Pro to support loading my own certificate for troyhunt.com into the site there and that's not a feature they presently enable; in fact, they expressly direct people to do precisely what I've done here with CloudFlare. I'll write more about this another time but for a personal blog, it's more than ample.

Last thing on HTTPS - before anyone says "Well you really need to preload HSTS and use HPKP to be properly secure" - do a quick reality check. No seriously, it's a freakin' blog and as much as I'd like to think it's important enough for Ahmadinejad to go and own another CA and issue rogue certs so that he can inspect your requests to troyhunt.com, it's not. I spend a lot of my time when teaching security courses these days talking about pragmatism; applying the right security levels to the right assets and knowing when to do so is both very important and sorely lacking. I'll write more on this shortly because some of the absolutist attitudes I've seen recently need a bit of bringing back down to earth.

Tor

This is one thing I didn't want happening:

If you believe CloudFlare's stats, 94% of the traffic from Tor is malicious. If you believe Tor, CloudFlare don't know what they're talking about, or at least they analyse the data in a very selective fashion. Regardless, when you route your traffic through CloudFlare, Tor users will be challenged by default:

This is not a pleasant user experience. I get why they're doing it and regardless of what the figures are, a network designed to anonymise the traffic is going to have a bunch of evil stuff come through it that's probably going to be harder to track than via the clear web. If I was wrapping my own app in CloudFlare and I had serious business going on in it that was unlikely to see legitimate traffic coming from Tor, I'd think differently. But when everything is running on a managed service like Ghost Pro, it's not my problem! (Remember, that's another joy of not hosting your own things - they get to deal with any attacks themselves!)

Anyway, the point is that I want to make things as frictionless as possible and as such, I've followed CloudFlare's guidance on whitelisting Tor traffic. Anyone who chooses to browse via Tor will now be able to do so without a challenge. In fact, that was my decision well before even launching the site and it was like that from the very first requests through CloudFlare.

Performance

This pleases me enormously:

And this:

This one too:

And all the other ones :)

This is one of the things I'm proudest of - it's chalk and cheese compared to the old blog. Before, I had both a legacy of Blogger-bloat and quite frankly, my own busy content with too many widgets and other bits integrated into the page.

This time, I wanted to go fast. Real fast. Let me show you the quantitative results and I'll start with a page speed test of the old site from Saturday just before I rolled over, here's a link to the actual test:

Now that's still faster than 74% of the sites they've tested, but it's also 142 requests at 2.1MB taking 1.9 seconds. In the grand scheme of websites that's actually not too bad, but I could do better - much better.

Different test tools will rate sites differently too so I wanted to mix it up a bit. I quite like the way GTmetrix breaks down the data so I ran it through there too. Here's a link to the actual test and the results as follows:

This test doesn't load the ads so the requests are way down, but the duration was actually higher at 3.3 seconds. Different tests, different methodologies; the main thing is how the new site compares to the old when using the same tools. Let's give that a go; here's a test of the new site.

Let me bask in this for a moment:

  • The number of requests is down 87%
  • The page size is down 81%
  • The load time is down 61%
  • It's now faster than 94% of (tested) sites on the web!

Let's try GTmetrix, here's the new one:

Again, different perf tools measure things in different ways; the main thing is that everything here shows improvement and it's all green! There are always further tweaks that can be made and I do have a few in mind, for example:

There are issues there related to cache expiry and the very limited number of page rules the free CloudFlare offering gives you, but we're well and truly into the realm of very small improvements now.

Perf tweaks galore

Clearly, I'm really happy with the performance of the site and that's been one of the resounding pieces of feedback since I launched. Let me talk through just some of the things I've done to eke out as much perf as I could.

All the obvious things such as bundling and minifying CSS and JavaScript are there. As part of my Node.js journey (Ghost is all built on Node), I wrangled up a Gulp file to handle it all and it not only does the usual tricks of combining files and stripping unnecessary characters, it also combines media queries. The Gulp file is here if you'd like to take a look and I definitely welcome suggestions on that one.
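The Gulp file linked above does the real work with proper plugins, but the "combine files and strip unnecessary characters" step can be sketched in a few lines. This is a deliberately naive illustration (it would mangle comments inside string literals, for instance), not the actual build:

```javascript
// Naive bundling + minification sketch: strip comments, collapse
// whitespace, then join the files with a newline (as gulp-concat does).
// Real minifiers parse the code properly; this is illustration only.
function bundle(sources) {
  return sources
    .map(src => src
      .replace(/\/\*[\s\S]*?\*\//g, '') // strip /* block */ comments
      .replace(/\/\/[^\n]*/g, '')       // strip // line comments
      .replace(/\s+/g, ' ')             // collapse runs of whitespace
      .trim())
    .filter(s => s.length > 0)
    .join('\n');
}

const out = bundle([
  'var a = 1; // first file',
  '/* second file */\nvar b = a + 1;'
]);
console.log(out);
// var a = 1;
// var b = a + 1;
```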

Obviously sitting behind CloudFlare is an important perf decision as well. It makes the difference between my content sitting only within Ghost Pro's Amsterdam hosting and sitting, well, almost everywhere courtesy of that globally distributed CDN map earlier on. You start to understand just how much difference this makes when you look at the stats in CloudFlare:

This is over the last 24 hours at the time of writing, a period where I haven't posted anything new and traffic is at its normal levels, yet look at the stats. CloudFlare has served 184k requests (about two thirds of them) from nodes close to the user so it's fast for them, but it's also relieved Ghost Pro of those requests plus about 9.5GB of data. That's on a highly-optimised, super-efficient site too, so consider what this would mean for a site with heavy pages serving heaps of requests.

There are endless other little tweaks and optimisations too. For example, all the JavaScript and the IFTTT logo in the footer are inlined in the HTML rather than being external files. They're so small that it's not worth the extra HTTP requests; wearing a few extra bytes on every page load is the cheaper option.

If there are other perf opportunities you see when browsing the site then definitely let me know. I'm sure they're out there!

Designing the site

I built the last site template running on Google's Blogger from scratch. All hand-rolled including the responsive design bits and all the pain that goes with cross platform testing. I actually think things are easier these days if you're targeting modern browsers, but I still didn't want to deal with it myself.

I began with a false start - I had someone building me a Ghost template and it just didn't work out. It's not that they were bad, it was just one of those things where it was hard for me to articulate what I wanted and for them to turn that into a site. Each time I almost just went "it's good enough" I stopped... and remembered just how critical this site is to the things I do today. It has to represent me and what I do in a way that I'm 100% happy with and compromising just didn't make sense. We iterated many times but ultimately I reached the conclusion that it just wasn't going to work; I had to build it myself.

To be fair though, I cheated. I looked around the Ghost Marketplace and found the template that was the closest to what I actually wanted. Then I started ripping it to pieces. I optimised it fairly significantly, doing things like throwing out SVGs for the social icons and deferring to Font Awesome instead. I discarded fonts I didn't think I needed. I got rid of extraneous markup. In short, I trimmed it down as far as I dared in order to not just get it light, but get it reflecting the image that I wanted to present. That said, there are still artefacts of the original design in there and I'm sure there's wasted bytes floating around, but it's pretty good now :)

There were plenty of crossroads where I had to make calls on design elements that weren't always easy. For example, I started out with hyperlinks having a bottom border rather than the default underline text decoration. This causes other problems:

The answer is simple - kill the bottom border and lose the aesthetically-more-pleasing space between the letters and the line under them. For a site like this, design fluffiness is not worth the performance, usability and maintenance costs. That's a question of degrees, of course, and you're still "paying" the cost of downloading a web font used in the text you're reading plus of course there's some big images in the banners, but it's the presence of these style elements which I felt enabled me to kill fussy little things like the borders.

Lastly, obviously these days everything has to play nice on mobile, that's an absolute given. This site has much better spacing and large fonts which especially makes a difference on pocket devices. But as important as those devices are and contrary to popular industry beliefs, the vast majority of my traffic from the last month is still via desktop browsers:

Further to that, 28% of all traffic is on browsers running 1,920 pixels on the horizontal so we're talking about high-res devices. I've tried to make it play nice across everything and I hope I've been able to do that.

Content migration and new pages

Migrating over 434 blog posts isn't a simple thing, although it did end up being easier than expected. In part that's because of awesome support from the folks at Ghost (and I do mean awesome - I can't speak highly enough of the assistance they gave me), and in part because we cheated just a little bit. Normally you'd create posts in Ghost using markdown which is exactly what I've done with this post, the first one on the new platform. However, you can also just write plain old HTML into the posts and that's effectively what we did with the old posts - just moved over the HTML.

One thing we had to make sure of though was that there was no external content embedded over HTTP, otherwise there'd be mixed content warnings when the new blog loaded the parent page over HTTPS. This mostly meant fixing embedded YouTube video references, but I wouldn't be surprised if we've missed a few other content types. Hit the comments section if you find them!
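The fix for those embeds is essentially just a scheme rewrite on the stored HTML. A simplified sketch of the idea (the function name is mine, and the regex only targets src attributes, so other attribute types would need the same treatment):

```javascript
// Rewrite insecure src attributes to HTTPS so embeds don't trigger
// mixed content warnings on a page served securely.
function fixMixedContent(html) {
  return html.replace(/src="http:\/\//g, 'src="https://');
}

const before = '<iframe src="http://www.youtube.com/embed/abc123"></iframe>';
console.log(fixMixedContent(before));
// <iframe src="https://www.youtube.com/embed/abc123"></iframe>
```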

The paths also changed as Ghost doesn't follow the same convention as the old Blogger platform, plus of course I've now also got HTTPS as the scheme. It meant that a URL like this:

http://www.troyhunt.com/2016/04/lenovo-p50-and-my-dislike-of-high-dpi.html

Now becomes this:

https://www.troyhunt.com/lenovo-p50-and-my-dislike-of-high-dpi/
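That transformation is mechanical: switch the scheme, drop the /YYYY/MM/ date segments and the .html suffix, and add a trailing slash. Assuming every old URL follows that Blogger pattern, it can be expressed as:

```javascript
// Convert an old Blogger-style URL to the new Ghost-style one:
// http://host/2016/04/slug.html -> https://host/slug/
function toGhostUrl(oldUrl) {
  return oldUrl.replace(
    /^http:\/\/([^\/]+)\/\d{4}\/\d{2}\/(.+)\.html$/,
    'https://$1/$2/'
  );
}

console.log(toGhostUrl(
  'http://www.troyhunt.com/2016/04/lenovo-p50-and-my-dislike-of-high-dpi.html'
));
// https://www.troyhunt.com/lenovo-p50-and-my-dislike-of-high-dpi/
```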

The Ghost folks stood up a bunch of redirects so each "legacy" URL still works just fine. I also uploaded a map into Disqus to migrate comments although disappointingly, that appears to have been only partially successful. Blog posts like the one above don't have any of the old comments appearing on them although I can still see them in the Disqus portal. Other posts came across just fine and I've got a ticket with them now to hopefully get it sorted, although it's been radio silence since I lodged it more than a day ago. Know anyone at Disqus? Give them a nudge for me because frankly, I'm a bit disappointed right now.

One thing I'm kinda curious about with the 301s is what it'll do to my SEO. Paths have changed and the scheme has changed and indeed this has been one of the hesitations that many people have expressed about moving to HTTPS only - what will it do to their searchability? I honestly don't know, but if I see anything worth sharing once everything has been refreshed in the search engines, I'll write about it.

Onto new content - you'll see some links across the top of the page (or in the menu if your device is small). These days, what I'm doing with speaking and particularly the workshops I run are really important. I hope I've captured the essence of those here, it's information I only had in pieces before and consolidating it in a cohesive fashion is really important, particularly to my new life as an independent.

Ads

I’ve kept the same three ads as I’ve had for some time now, although I’ve attempted to make them a little less intrusive. Making them more visible more of the time would obviously have been to my financial advantage but every time I looked at this, it just didn’t feel right. Not right aesthetically and not right in terms of prioritising someone else’s message over what readers have actually come to the site to read.

It’s also completely ad block friendly, in fact I checked it with the major blockers to make sure the page still rendered cleanly and didn’t make a mess of things. Of course I’d like the revenue that comes from the ads, but I’d like people to have a positive experience when they come to my site even more.

Let me digress on ads for a moment: ad networks - you guys have massively screwed this up. I went to town on bad UX in this very popular post earlier this year and shitty behaviour by ad networks was right up there at the top of the worst offenders. One of the practices that most incensed people was Forbes demanding they disable their ad blocker before viewing their content. That's nuts, right? I mean who would ever do that?! Apparently, about half of you:

The ad thing is a mess and it isn't getting better any time soon. I want it to work well and I want people to see my ads because they do actually contribute quite a bit to my efforts when content does well, but above all, I want your experience to be positive.

I was going to go naked...

No, don't get excited, not that naked, rather naked in the sense of troyhunt.com rather than www.troyhunt.com. Of course either would have still worked as both of them always have, it's merely a question of which one becomes the canonical one with the other one simply redirecting to it.

What originally motivated me to want to do this was nothing more than aesthetics; it's a cleaner look not having the www prefix on the URL when you see it on the page. But what stopped me was entirely practical reasons, starting with this tweet:

That site is well worth a read and it has some very valid technical arguments as to why the www prefix makes sense. The more I thought about it, the more I realised that regardless of the technical reasons, my aesthetic arguments didn't really make sense anyway. For example, here's the site loaded on my iPhone now:

And here's how Twitter is showing links:

These are both www prefixed sites, the respective apps just auto-strip them off when displaying the URL. Now we're in this realm of decreasing aesthetic value, negative technical impact and a change from the existing status quo. So I left it as is and I reckon that's the right decision.

Site search

Gone. Dead. Cactus.

The original reason for the decision is that there's simply no support for search in Ghost. But on further reflection, I began to wonder how much search was actually used and the Google Analytics stats suggested "not much". The vast bulk of traffic on the old site came from search engines or clicking through on the site, not querying directly on the site. Even if I wanted to add search just because sites "should have it", a big input box on the page somewhere was honestly not exciting me design-wise.

We'll see how it goes and I'll happily take feedback telling me I'm wrong. Of course if I am, I can't add the feature via Ghost so I'll have to look at third party options instead.

Open source

Not much to say here other than that everything you see here, namely the Ghost template, JS, CSS and Gulp bits are all in a public GitHub repo named troyhunt.com-ghost-theme. If you can fix or enhance or do anything else useful then by all means, submit a contribution and I'll be happy to take it if it helps the site.

Feedback

I'd love to hear it, particularly anything constructive around stuff that can be refined, clarified, fixed or otherwise made better. Of course I'd love to hear what you like as well; this has been a huge effort a long time coming and whilst I'm obviously very happy with the result, you're the people who have to look at it so your opinions matter enormously!

And that's it. First post on the new platform with many, many more to come yet.

Disqus and the disappearing comments (which still exist)

Update (just a couple of hours later): We're fixed! By all means, read the background if you're having similar problems or jump to the bottom for the solution.


I'm writing this post because I've pretty much exhausted all other avenues. I'm out of ideas. I'm stumped. I'm also missing a truckload of Disqus comments that should be appearing on many of the blog posts I've just migrated. Let me explain what I've done and what I'm now seeing and hopefully between me trying to articulate this clearly and people reading this post, we'll work out how to get it fixed.

Some background first: earlier this week I wrote about my shiny new blog which I migrated over on Sunday my time, about five and a half days ago now. There were two important changes here as it relates to Disqus:

  • The scheme changed from HTTP to HTTPS
  • The paths of the posts changed too

Fortunately, Disqus has you covered for such changes (at least they should have you covered), and they outline migration tools including a domain migration (only any good when the paths don't change), a URL map of old URL then new URL and a redirect crawler which follows all your 301s. I elected to go with the URL map because it seemed like the most expeditious way forward. Here's an example of what's in the map:

http://www.troyhunt.com/2016/04/lenovo-p50-and-my-dislike-of-high-dpi.html,https://www.troyhunt.com/lenovo-p50-and-my-dislike-of-high-dpi/

This is the source URL and the target URL, the former of which now also redirects to the latter. That present URL at https://www.troyhunt.com/lenovo-p50-and-my-dislike-of-high-dpi/ only has 3 comments showing on it as of today, but I'll come back to that. The entire map of 434 posts can be seen here.

As well as migrating the comments with the URL map, I also configured the Disqus div and script as follows:

<div id="disqus_thread"></div>
<script>
var disqus_config = function () {
    this.page.url = 'https://www.troyhunt.com/lenovo-p50-and-my-dislike-of-high-dpi/';
    this.page.identifier = 'ghost-1785';
};

(function() {
    var d = document, s = d.createElement('script');
    s.src = '//troyhunt.disqus.com/embed.js';
    s.setAttribute('data-timestamp', +new Date());
    (d.head || d.body).appendChild(s);
})();
</script>

Now in theory, this should be it. Job done. The URL map might take up to 24 hours to run and that's cool, but something is still amiss. I want to start outlining the issues and ask for feedback but before I do that, please don't suggest I do [insert whatever] with the Disqus embed tags unless I've clearly screwed up what you see above. I say that merely because many of the suggestions via Twitter have pointed me to references on the tag setup and I don't believe that's where the problem is so unless you can see that I've clearly messed them up, let's instead focus on the next bit.

Going back to this post on the P50, here's the comments I'm presently seeing:

However, when I browse over to the Disqus portal, I can see comments like this which are clearly not showing up on the page:

Note the cursor over the post title and the URL in the status bar. That URL is https://www.troyhunt.com/lenovo-p50-and-my-dislike-of-high-dpi/#comment-2625929224 and if you click it, it will indeed open the P50 post yet when it opens, you won't see the comment on the page. To ensure I wasn't going insane, I even diff'd it against the URL of the post by browsing directly there from the homepage. The URL is good and there's no redirects or other funkiness going on. Why does this comment not appear on the blog post?!

Conversely, there are comments with the wrong URL that do show on the page. For example, the one comment of mine that's visible on the P50 page shows this URL in Disqus:

Now again, this comment is presently showing up just fine on the P50 page yet in this image you can clearly see it links through to the troyhunt.ghost.io host name. This was the temporary path I used whilst building the blog before updating the DNS and pointing troyhunt.com to it. As part of my testing, I'd viewed this post on the troyhunt.ghost.io domain before making the site live which made me curious...

I viewed the Sony passwords post before rolling over DNS and I can only see one comment on it which was made since rollover.

I also viewed this TeamCity post before rollover and it has no comments.

But I didn't view this TeamCity post before rollover and it has comments just fine.

Somehow, none of the posts I viewed on the troyhunt.ghost.io domain have comments associated with them any more. The comments are still there in the Disqus portal so I have no reason to believe that I've actually lost them, but nobody can see them. For all intents and purposes, these posts presently have no comment history.

This is enormously frustrating. I've got posts such as the one on the massive Philippines election committee data breach which are making national news and getting many thousands of visits a day and there's an entire dialogue about the incident missing. I've tried re-uploading the URL map and I've tried the redirect crawler but I just can't get everything synced up. I've also tried Disqus support and whilst friendly, it hasn't led to a solution.

If you have a Disqus contact for me who might be able to get some results, please ping me.

If you're from Disqus, please let me know how to get this fixed.

I've been very happy with Disqus for many years and I love what the service does, I hope someone out there can help me piece things back together. Please do leave a comment if you've any ideas on how to fix this, I think I'll be able to read it...

We're fixed!

A big thanks to this thread with John Nobile below who suggested trying other URL maps. I eventually narrowed it down to this one, thus solving the problem:

http://troyhunt.ghost.io/are-you-working-for-next-sony-pictures/,https://www.troyhunt.com/are-you-working-for-next-sony-pictures/

Disqus was somehow re-keying the posts to Ghost's temporary domain, presumably when I previewed the posts before rolling over DNS. I've noticed similar oddities when previewing a post in Ghost and then later changing the title before publishing it; Disqus (for some reason) picks up on the first URL it sees - even when there are no comments - and keeps a hold of it in perpetuity. There are probably more insights to be had regarding that behaviour and I'm not sure why Disqus themselves couldn't identify this issue when I logged the support ticket, but right now I'm just glad to have my posts back.

For those who have similar problems in future, here's the revised map I uploaded. I did it for every single URL and for both HTTP and HTTPS schemes as I suspect I loaded some of them over the secure protocol so I wanted to cover my bases. I had every old and new URL mapped in a database so it was an easy query to write and it's done the trick!
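In practice, that means each new URL gets a map row for every old variant its Disqus thread might have been keyed against. A sketch of the expansion (the real thing was a database query; the row format matches the CSV examples above, and the function name is mine):

```javascript
// Emit Disqus URL map rows covering every variant a thread might be
// keyed against: the old Blogger path over both schemes, plus the
// temporary ghost.io preview URL over both schemes.
function mapRows(oldPath, slug) {
  const target = 'https://www.troyhunt.com/' + slug + '/';
  return [
    'http://www.troyhunt.com' + oldPath + ',' + target,
    'https://www.troyhunt.com' + oldPath + ',' + target,
    'http://troyhunt.ghost.io/' + slug + '/,' + target,
    'https://troyhunt.ghost.io/' + slug + '/,' + target
  ];
}

mapRows('/2016/04/lenovo-p50-and-my-dislike-of-high-dpi.html',
        'lenovo-p50-and-my-dislike-of-high-dpi')
  .forEach(row => console.log(row));
```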

100 data breaches later, Have I been pwned gets its first self-submission

I certainly didn't expect it would go this far when I built Have I been pwned (HIBP) a few years ago, but I've just loaded the 100th data breach into the system. This brings it to a grand total of 336,724,945 breached accounts that have been loaded in over the years, another figure I honestly didn't expect to see.

But there's something a bit different about this 100th data breach - it was provided to me by the breached site itself. It was self-submitted, if you like. Usually, a site is breached and the data floats around the web whilst the impacted organisation either has no clue what happened or they stonewall and avoid admitting the incident. Just yesterday I wrote If I Can Verify Data Breaches, so Can Those Who Are Breached where I chastised organisations such as the Philippines Electoral Committee and Naughty America for still not acknowledging breach authenticity weeks after the incident. As much as ethics are lacking when hackers break into these systems and put the people in them at risk, so too are they lacking in the organisations that refuse to admit the incident and protect their members.

Recently, I received an email that included this request:

I am an admin / dev of a gaming forum with ~ 80,000 accounts that had a db breach a few weeks back and we'd like to add our breach to the site listing.

Now as you can imagine, I often have what you might call "interesting" interactions with various people who pop up out of the blue and want to talk about data breaches, but it turns out that this one was precisely what it suggests at face value. The site is TruckersMP and it's a trucking simulator:

News of the breach was published on their website on Feb 25 at 19:39, which was 2 hours and 9 minutes after they first discovered the incident. That discovery came only 30 minutes after the incident took place. The succinct blog post explains what happened and then offers an apology, all within a few hours of the event.

I was curious though as to why they'd reach out and offer the data to HIBP. We had a bit of email to and fro (which included me verifying I was indeed chatting with an admin of the site and that the data they provided was legitimate) and they had this to say on why they provided me with the data:

We're decently security minded and feel a responsibility and duty to inform our users when such a breach happens. All of the members of the team agreed it'd be ok to be added to the list with the notion that we'd like to see other sites do the same as well; given the unfortunate chance.

For a while now, I've had a few ideas forming about how I can use HIBP in conjunction with breached organisations to better support those who have accounts compromised, but I honestly wasn't expecting this.

Perhaps I've just become a little cynical after seeing literally hundreds of "we take security seriously" statements from organisations which clearly didn't and to see a response like this where they're not trying to spin the story to their own advantage or misconstrue facts is heartening. If only those with nation state budgets or billion dollar revenues could act so responsibly.

There are now 83,957 TruckersMP accounts searchable on HIBP.


Breach concealment is not a security strategy

I've just been reading Kingpin by Kevin Poulsen which sheds some really interesting light on criminal credit card fraud in the mid-2000s. Towards the end of the book, there's a reference to a 1997 case in which the government persuaded the sentencing judge to permanently seal the court transcripts for fear that disclosure would impact the targeted company as follows:

loss of business due to the perception of others that computer systems may be vulnerable

There were 80,000 people impacted in that incident and they never knew that their personal information had been obtained by criminals for fear that the very organisation that lost the data in the first place would be adversely impacted. Fortunately, cover-ups like this can no longer happen in many parts of the world. Most US states have had mandatory data breach disclosure laws since 2002, Australia has a draft bill that will go to parliament this year and the EU's General Data Protection Regulation is making it mandatory across the board there too. It's always been ethically dubious not to disclose a data breach to those who have been impacted by it, but it's also illegal in many places if not now, then very soon.

Yet somehow, well over a decade after we started seeing mandatory disclosure laws come into effect, some organisations not only ignore the push for public transparency, but even justify non-disclosure by saying it's in the victims' best interests to keep it quiet. I saw a security "strategy" this week in the wake of a major data breach which was alarming, to say the least. I want to capture the details of it here and frankly, tear it to shreds because we should never see an organisation playing fast and loose with people's data in this way. Hopefully if this strategy is ever considered by others in future they'll stumble across this post and think better of it.

This relates to the Lifeboat data breach from earlier this week. Well actually, the breach itself was many months ago but the disclosure was only this week and therein lies the problem.

When approached by the reporter in the above story about this incident, Lifeboat stated:

When this happened [in] early January we figured the best thing for our players was to quietly force a password reset without letting the hackers know they had limited time to act

I was stunned when I read this - you mean they knew about the incident and decided to cover it up?! I'm used to seeing organisations genuinely have no idea they've been hacked, but to see one that actually knew about it - a 7 million record breach at that - and then consciously silence the incident without telling anyone left me speechless. This is 7 million records which contain passwords stored as MD5 hashes too, which means you can take a hash, simply Google it and, like magic, there's the plain text value. Or you do it en masse using hashcat as I recently showed for salted MD5 hashes (hashes with no salt such as Lifeboat's are significantly easier to crack). A large portion of those passwords would be reverted to plain text in a very short time.
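To illustrate just how weak unsalted MD5 is, here's a minimal sketch in Python; the wordlist is obviously a token one, real attacks use lists with millions of entries:

```python
import hashlib

# A token wordlist - real attacks use lists with millions of entries
wordlist = ["password", "minecraft", "letmein", "123456"]

# With no salt there's exactly one possible hash per password, so the
# entire wordlist can be precomputed once and reused against every account
lookup = {hashlib.md5(w.encode()).hexdigest(): w for w in wordlist}

# An unsalted MD5 hash as it might appear in a breach dump
leaked_hash = hashlib.md5(b"minecraft").hexdigest()

# "Cracking" it is nothing more than a dictionary lookup
print(lookup.get(leaked_hash))  # minecraft
```

This is also why Googling a hash so often works: the digest of a common password is the same everywhere it's ever been computed.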

As much as that comment shocked me, the discussion I then saw on Twitter from someone who works for Lifeboat made it even worse. I'm not going to link directly to the thread in order to save the individual embarrassment because in all likelihood they'll later realise the serious implications of what they've said. I'm sure we've all evolved our thinking over time and would be embarrassed to look back on some of the views we held once upon a time and I suspect that will be the case for this bloke as well. But with that said, let's get to the meat of the issue.

It started out with a discussion on Twitter which used the same justification for concealing the breach:

If they alerted people about passwords being reset they would've basically been telling the hackers to hurry up and ALL data would've been stolen.

So let's start there. Presumably, this statement and the earlier one about not letting the attackers know they have limited time relate to the window of opportunity in which an account can be exploited. As an attacker, you have someone's email address and their password and you want to use those to compromise other accounts because password reuse remains the norm rather than the exception. For example, just read through some of the responses to this tweet:

Responses like this one:

[Screenshot: tweet reply saying the Lifeboat password is reused for everything]

One thing's for sure: if hackers are looking to exploit people then yes, they'd need to hurry up because that window of opportunity is now way smaller since the incident went public. Once people know their data has been leaked, there's a much higher likelihood that they'll change that password not just on the service that lost it, but in other locations too.

The Twitter discussion continued:

Lifeboat took an approach that protected some users whilst not alerting the hackers, pretty smart imo.

This is completely nonsensical. In no way whatsoever did covering up the incident protect anything other than their own reputation. Consider the tweet above - "all my fucking passwords are my Lifeboat one" - how on earth is it protecting the guy if they know malicious actors have his credentials and they don't tell him?! Notifying customers would have significantly reduced the window of risk; instead of many months' worth of opportunity to access victims' other accounts, attackers would have had a much shorter period.

The database was compromised but hasn't been made publicly available (to our knowledge)

I'm certain this statement is correct as far as it goes, but it also proves the old adage that "absence of evidence is not evidence of absence". Perhaps it's simply an issue of how they interpret the term "publicly available", so let me clarify: I was actually approached by two separate individuals who offered me the Lifeboat data. I have no idea whether they travel in the same circles or not, but one of them also had this to say:

also WAS being sold on RealDeal/Hell

These sites are both accessible via Tor and they're notorious for trading in data breaches. For example, here's R2 games being sold right now:

[Screenshot: the R2 Games breach listed for sale on a dark web marketplace]

R2 was actually a very similar deal to Lifeboat; someone approached me with the data some time ago after trading it within their community and I loaded it into Have I been pwned. Also like Lifeboat, R2 remains in denial about the spread of the data; in fact, someone forwarded me this message after contacting R2 about the breach:

We have also received numerous reports about this which is very alarming, however you must not worry because this news spreading was not true, R2Games is safe and secured and far from being hacked

To make absolutely certain there was no ambiguity about the incident, I verified a number of the individual's data attributes from the breach and he confirmed that it "is very legit". Another person had a similar discussion with R2 and they told him this:

We have asked our operations department regarding this issue of breach and they deny all the allegations this website is saying

I don't want to go too far off on an R2 tangent here, the point I want to make is that the realm of data breaches is often foreign to organisations that don't travel in those circles. It's an unfamiliar environment to them and I'm not at all surprised that Lifeboat was unable to locate their data. Sites like Real Deal and Hell tend to be transient both in terms of their existence (Hell has disappeared and reappeared numerous times) and in terms of the individual data sets they're trading. Many other people have the Lifeboat data and frankly, none of us know exactly what they've done with it.

The Twitter discussion went on:

Just because it's been sold or traded doesn't mean those people are smart enough or evil enough to use the emails and passwords for bad.

Why precisely do they think people actually pay for this data?! Whilst I'm sure there are some individuals that simply like to hoard it, there are many others that purchase it because there's an ROI. They spend money on the data because they can get a return on exploiting it, particularly when you have a dump as large as 7 million records with passwords stored in an easily reversible fashion. We know this very well and we've seen it happen time and time again, even all the way back to 2010 when Gawker was hacked (Del Harvey is the Head of Twitter Trust & Safety):

By barely protecting credentials in storage and then letting them float around the web for months without telling anyone, they put people at huge risk. Even in the update Lifeboat eventually issued once the incident got press (which includes the pre-requisite defence of we-now-take-security-seriously-even-though-we-didn't-before), whilst on the one hand it's heartening to see them apologise, on the other hand they then downplay the significance of the incident:

We do not know of anyone who has actually had their email or other service hacked

Let's just consider this for a moment; they're saying that they don't know of anyone who had their email compromised as a result of their credentials being taken from Lifeboat months ago. Now what does it look like when this happens, I mean when someone logs into your email account using stolen credentials? Well for one thing nobody would have turned around and blamed Lifeboat because they didn't know Lifeboat lost their credentials! Lifeboat is 100% correct in saying they don't know if anyone had other things hacked - how would they?! It's a Chewbacca defence that has nothing to do with whether other accounts were actually compromised or not.

There's also a school of thought that says "hey, this is just a game, why does security even matter?" In fact, Lifeboat specifically downplay the value of its members' credentials in their getting started guide:

This is not online banking

No, Lifeboat isn't online banking but you know what many of those passwords will get you access to? Online banking. And email. And ecommerce accounts and all the other places that people reused their credentials. We all know damn well that password reuse is the norm, not the exception and when you hold someone's password - and it very frequently is "their password" that they use everywhere - you have a responsibility that goes well beyond your own system. For example:

[Screenshot: tweet from a user who reused the same password across services]

The person who originally posted this subsequently deleted it so I've obfuscated her identity here, but she's merely repeated what is a truth for a huge proportion of the Lifeboat users. It's a statistical certainty that there's a 7 figure number of people in the Lifeboat data breach who reused their passwords.

By attempting to suppress the incident, Lifeboat left millions of people completely exposed. This was unequivocally the wrong thing to do which is why even more so than the breach itself, their decision to cover it up is what's making headlines. It's simply indefensible.

People are not happy, as is evidenced in the comments on their announcement that eventually came after the public exposure:

[Screenshot: unhappy user comments on Lifeboat's breach announcement]

The Twitters are likewise angry:

Fortunately, in their announcement they do say that since the hack "The password that you chose is encrypted using much stronger algorithms" which is good news... assuming they mean "hashed" and not "encrypted", that is.

Breaches happen, it's an inevitability of modern day online life. How the organisation deals with it is the key. Compare Lifeboat's approach to that of TruckersMP which I wrote about only 5 days ago. They "owned" the incident; they got on top of it early, admitted their shortcomings and were transparent in the way they dealt with it. The first thing that should be on any company's mind after an incident like this is "how do we minimise the damage to our users".

I honestly don't know if the Lifeboat folks made this decision genuinely thinking it would help their users rather than jeopardise them or if they were just trying to cover their own arses. I genuinely hope it was the former and that with the benefit of hindsight they'd approach things differently next time. At the very least, I hope that any organisation considering a similar strategy observes the fallout from this one and thinks better of it in the future.

Here's how I verify data breaches


Let me start with this headline:

[Screenshot: news headline reporting the alleged email provider mega-breach]

Other headlines went on to suggest that you need to change your password right now if you're using the likes of Hotmail or Gmail, among others. The strong implication across the stories I've read is that these mail providers have been hacked and now there's a mega-list of stolen accounts floating around the webs.

The chances of this data actually coming from these service providers are near zero. I say this because firstly, there's a very small chance that providers of this calibre would lose the data; secondly, if they did then we'd be looking at very strongly cryptographically hashed passwords which would be near useless (Google isn't storing them in plain text or MD5); and thirdly, because I see data like this, which can't be accurately attributed back to a source, all the time.

That's all I want to say on that particular headline for now, instead I'd like to focus on how I verify data breaches and ensure that when reporters cover them, they report accurately and in a way that doesn't perpetuate FUD. Here's how I verify data breaches.

Sources and the importance of verification

I come across breaches via a few different channels. Sometimes it's a data set that's broadly distributed publicly after a major incident such as the Ashley Madison attack, other times people who have the data themselves (often because they're trading it) provide it to me directly and increasingly, it comes via reporters who've been handed the data from those who've hacked it.

I don't trust any of it. Regardless of where it's come from or how confident I "feel" about the integrity of the data, everything gets verified. Here's a perfect example of why: I recently wrote about How your data is collected and commoditised via "free" online services which was about how I'd been handed over 80 million accounts allegedly from a site called Instant Checkmate. I could have easily taken that data, loaded it into Have I been pwned (HIBP), perhaps pinged a few reporters on it then gone on my way. But think about the ramifications of that...

Firstly, Instant Checkmate would have been completely blindsided by the story. Nobody would have reached out to them before the news hit and the first they'd know of being "hacked" is either the news headlines or HIBP subscribers beating down their door wanting answers. Secondly, it could have had a seriously detrimental effect on their business; what would those headlines do to customer confidence? But thirdly, it would have also made me look foolish as the breach wasn't from Instant Checkmate - bits of it possibly came from there but I couldn't verify that with any confidence so I wasn't going to be making that claim.

This week, as the news I mentioned in the intro was breaking, I spent a great deal of time verifying another two incidents, one fake and one legitimate. Let me talk about how I did that and ultimately reached those conclusions about authenticity.

Breach structure

Let's start with an incident that has been covered in a story just today titled One of the biggest hacks happened last year, but nobody noticed. When Zack (the ZDNet reporter) came to me with the data, it was being represented as coming from Zoosk, an online dating site. We've seen a bunch of relationship-orientated sites recently hacked which I've successfully verified (such as Mate1.com and Beautiful People), so the concept of Zoosk being breached sounded feasible, but it had to be emphatically verified.

The first thing I did was look at the data which appears like this:
[Screenshot: sample rows of the alleged Zoosk data, email:password pairs]

There were 57,554,881 rows of this structure: an email address and a plain text password delimited by a colon. This was possibly a data breach of Zoosk, but right off the bat, only having email and password makes it very hard to verify. These could be from anywhere, which isn't to say that some wouldn't work on Zoosk, but they could be aggregated from various sources and then simply tested against Zoosk.
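For what it's worth, a dump in this shape takes almost nothing to process; here's a sketch (with made-up addresses) of how you'd parse it and pull simple aggregates:

```python
from collections import Counter

# Rows shaped like the alleged Zoosk dump: email:password, one per line
# (the addresses here are made up for illustration)
rows = [
    "alice@example.com:hunter2",
    "bob@example.net:password1",
    "carol@example.com:hunter2",
]

creds = []
for line in rows:
    # Split on the first colon only, in case a password contains one itself
    email, _, password = line.partition(":")
    creds.append((email.lower(), password))

# With only these two fields there's little to verify against, but simple
# aggregates (duplicate passwords, domain spread) are still possible
print(Counter(p for _, p in creds).most_common(1))  # [('hunter2', 2)]
```

That's the whole problem: two anonymous-looking fields give an analyst almost nothing to tie the data back to a specific source.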

One thing that's enormously important when doing verification is the ability to provide the organisation that's allegedly been hacked with a "proof". Compare that Zoosk data (I'll refer to it as "Zoosk data" even though ultimately I disprove this), to this one:

[Screenshot: sample of the Fling data showing complete database records]

This data was allegedly from fling.com (you probably don't want to go there if you're at work...) and it relates to this story that just hit today: Another Day, Another Hack: Passwords and Sexual Desires for Dating Site 'Fling'. Joseph (the reporter on that piece) came to me with the data earlier in the week and as with Zack's 57 million record "Zoosk" breach, I went through the same verification process. But look at how different this data is - it's complete. Not only does this give me a much higher degree of confidence that it's legit, it meant that Joseph could send Fling segments of the data which they could independently verify. The Zoosk data could easily be fabricated, but Fling could look at the info in that file and have absolute certainty that it came from their system. You can't fabricate internal identifiers and time stamps and not be caught out as a fraud when they're compared to an internal system.

Here's the full schema of the Fling user table:

CREATE TABLE `user` (
  `duid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(64) NOT NULL,
  `password` varchar(32) NOT NULL,
  `email` varchar(255) NOT NULL,
  `email_validated` enum('N','Y') NOT NULL DEFAULT 'N',
  `accept_email` enum('N','Y') NOT NULL DEFAULT 'Y',
  `accept_im` enum('N','Y') NOT NULL DEFAULT 'Y',
  `md5` varchar(32) NOT NULL,
  `membership` enum('FREE','PROMO','GRANDFATHERED','BRONZE','SILVER','GOLD','ADMIN') NOT NULL DEFAULT 'FREE',
  `join_date` datetime NOT NULL,
  `birth_date` date NOT NULL,
  `location_id` varchar(8) NOT NULL,
  `gender` enum('COUPLE','MAN','WOMAN','TS','UNSPECIFIED') NOT NULL DEFAULT 'UNSPECIFIED',
  `seeking` set('COUPLE','MAN','WOMAN','TS','UNSPECIFIED') NOT NULL DEFAULT 'UNSPECIFIED',
  `interested_in` set('FETISH','GROUPSEX','SEXUAL RELATIONS','ONLINE FLIRTING','OTHER','UNSPECIFIED') NOT NULL DEFAULT 'UNSPECIFIED',
  `last_login` datetime NOT NULL,
  `mobile_user` enum('N','Y') NOT NULL DEFAULT 'N',
  `mobile_phone_no` varchar(16) DEFAULT NULL,
  `mobile_carrier` varchar(20) DEFAULT NULL,
  `discreet_profile` enum('N','Y') NOT NULL DEFAULT 'N',
  `featured_profile` enum('N','Y') NOT NULL DEFAULT 'N',
  `power_user` enum('N','Y') NOT NULL DEFAULT 'N',
  `account_status` enum('ACTIVE','USER_DISABLED','ADMIN_DISABLED','SCAMMER_DISABLED') NOT NULL DEFAULT 'ACTIVE',
  `advert_id` varchar(25) DEFAULT NULL,
  `ip_address` varchar(16) NOT NULL,
  `mtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`duid`),
  UNIQUE KEY `username` (`username`),
  UNIQUE KEY `email` (`email`),
  KEY `location_id` (`location_id`),
  KEY `md5` (`md5`),
  KEY `join_date` (`join_date`),
  KEY `ip_address` (`ip_address`),
  KEY `password` (`password`),
  CONSTRAINT `user_ibfk_1` FOREIGN KEY (`location_id`) REFERENCES `geo_location` (`location_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=64192949 DEFAULT CHARSET=utf8;

The other thing in terms of structure is that the Fling data begins with this:

-- MySQL dump 10.11
--
-- Host: 192.168.1.28    Database: fling
-- ------------------------------------------------------
-- Server version    5.1.41-enterprise-gpl-advanced-log

It's a mysqldump of the data with enough version and host info to again, create a much higher degree of confidence in the data not just for me in terms of how it "feels", but for Fling themselves to be able to verify.

I'm very suspicious of data presented in the way the Zoosk breach was and compared to Fling, you can see how both would impact my confidence levels in different ways. Let's move on though and increase that confidence level a bit.

Enumeration

Most websites will tell you if an email address exists on the site; you just need to ask. For example, enter an email address into Adult Friend Finder's password reset feature and they'll tell you very clearly if it's already in their database or not. It's not always that explicit; Ashley Madison used to disclose account existence by returning slightly different responses. If a site isn't facilitating enumeration on the password reset, then it frequently is on the registration feature ("this email address is already registered") and it's rare not to be able to simply plug in an email address and be told via one channel or another if it already exists on the site.

Enumeration risks such as these are not "silent" in that something like a password reset will send an email to the recipient. Whilst it's by no means compromising their personal security in any way, I also don't particularly want to inconvenience people. But there's a way around that and it provides another upside too.
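To make the risk concrete, here's a sketch of how an enumeration check boils down; the response messages below are entirely hypothetical (every site words them differently), and that's exactly the point: any observable difference between the two paths discloses account existence.

```python
# Hypothetical response messages - every site words these differently,
# which is precisely what makes enumeration possible
KNOWN_MSG = "We've emailed you a reset link"
UNKNOWN_MSG = "No account exists for that address"

def account_exists(reset_response_text: str) -> bool:
    """Classify a password reset response as disclosing a known account."""
    return KNOWN_MSG in reset_response_text

# Simulated responses for a member versus a fabricated address
print(account_exists("We've emailed you a reset link to bigbob@mailinator.com"))  # True
print(account_exists("No account exists for that address"))  # False
```

Sites that defend against this return an identical response either way ("if that address exists, we've emailed it a link"), which is also why Fling's uniform reset message, discussed below, isn't itself an enumeration vector.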

Mailinator accounts in data breaches

If you haven't used Mailinator before, you're missing out. It's an awesome way of standing up free, disposable email addresses and you can simply send a mail to [anything]@mailinator.com then check it on their site. There's also zero security and consequently, zero privacy. People often use Mailinator accounts simply as a means of passing the "please verify your email address" test that many sites pose before you can access them.

Mailinator accounts are perfect for testing enumeration risks. For example, the email address bigbob**@mailinator.com is the first one in Fling and if you plug that into their password reset form, you get this:

[Screenshot: Fling's password reset response for the Mailinator address]

Curiously, Fling returns exactly the same message when the email is entirely fabricated; fat-finger the keyboard and you'll get the same response. In that regard, password reset may not be an enumeration vector on Fling but it doesn't matter because when testing a Mailinator account, the reset email is publicly accessible anyway:

[Screenshot: the password reset email sitting in the public Mailinator inbox]

It turns out that Big Bob also has a password of commensurate security to his choice of mail provider, and this gives us another verification data point:

[Screenshot: email disclosing Big Bob's plain text password]

Of course you can only do this with a breach where the site actually emails the password which (fortunately) isn't that common, but you can see how each of these processes starts to build confidence in the authenticity of the breach. That can be confidence that it is genuine as well as confidence that it isn't.

The Zoosk data had way too many accounts that weren't checking out. Some Mailinator accounts would cause their password reset to respond confirming an email had been sent but many others didn't. It's possible that accounts had been deleted from their end post-breach (sometimes this is just a "soft" delete - the record is still there but flagged as inactive), but the low hit-rate wasn't inspiring much confidence.

But there's another avenue I have available that's proven very reliable, and that's HIBP subscribers.

Verifying with HIBP subscribers

I'm now approaching 400k verified subscribers to HIBP, that is they've gone to the free notification service page, entered their email address then received an email at that address and clicked on a verification link. These are people who have an interest in protecting their online identities and they want to know about it when an incident occurs that impacts them.

What I've been doing with breaches that are harder to verify, or that I want to have a greater degree of confidence in, is temporarily loading the email addresses into the SQL database in HIBP which stores the notification users (this doesn't contain the accounts the service allows you to search; those are stored in Azure Table Storage), then running a query that gives me results like this:

[Screenshot: query results listing recently verified HIBP subscribers found in the Zoosk data]

These are the most recently verified HIBP subscribers who appear in the Zoosk data or in other words, those who have a recent recollection of signing up to the service I run. I'll take 30 of those and send them an email such as this one:

Hi, I’m emailing you as someone who has recently subscribed to the service I run, "Have I been pwned?"

I’m after your support in helping to verify whether a data breach I’ve been handed is legitimate or not. It’s one that I need to be absolutely confident it’s not a fake before I load the data and people such as yourself receive notifications. This particular one is quite personal hence the extra due diligence.

If you’re willing to assist, I’ll send you further information on the incident and include a small snippet of your (allegedly) breached record, enough for you to verify if it’s accurate. Is this something you’re willing to help with?

I send this off with everyone BCC'd so inevitably a bunch of them go to spam whilst others are ignored or simply not seen for quite a while, hence why I email 30 people at a time. People who *do* respond are always willing to help so I send them back some segments of the data to verify, for example:

This relates to the website fling.com which an attacker has allegedly breached. Your email address is in there with the following attributes:

1. A password that begins with “[redacted]”
2. An IP address that belongs to [redacted] and places you in [redacted]
3. A join date in [month] [year]

Does this data seem legitimate? Other indicators suggest it’s highly likely to be accurate and your confirmation would be enormously helpful.

I sent this exact message back to a number of HIBP subscribers in the Fling data set and all of them confirmed the data with responses such as this:

That is indeed accurate. Lovely plaintext password storage I see.

There's a risk that people merely respond in the affirmative to my questions regardless of whether the data is accurate or not. However firstly, I've already found them in the breach and reached out to them - it's already likely they're a member. Secondly, I rely on multiple positive responses from subscribers so we're now talking about people lying en masse which is much less likely than just one person with a confirmation bias. Finally, if I really feel even greater confidence is required, sometimes I'll ask them for a piece of data to confirm the breach, for example "what month were you born in".

The Fling data was emphatically confirmed. The Zoosk data was not, although some people gave responses indicating they'd previously signed up. Part of the problem with verifying Zoosk though is that there's just an email address and a password, both of which could conceivably have come from anywhere. Those who denied membership also denied they'd ever used the password which appeared next to their email address in the data that was provided to me so the whole thing was looking shakier and shakier.

Zoosk wasn't looking legit, but I wanted to try and get to the bottom of it which called for more analysis. Here's what I did next.

Other verification patterns

In a case like Zoosk where I just can't explain the data, I'll often load the data into a local instance of SQL Server and do further analysis (I don't do this in Azure as I don't want to put other people's credentials up there in the cloud). For example, I'm interested in the distribution of email addresses across domains:

[Screenshot: distribution of email addresses by domain]

See anything odd? Is Hotmail having a resurgence, perhaps? This is not an organic distribution of email service providers because Gmail should be way out in front, not at 50% of Hotmail. It's more significant than that too because rows 4, 5 and 10 are also Hotmail so we're talking 24 million accounts. It just doesn't smell right.
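The aggregation itself is trivial, by the way; here's a sketch using a handful of made-up addresses (the real analysis ran over 57 million rows in SQL Server, but it's just a group-by on the email domain):

```python
from collections import Counter

# Made-up sample addresses; the real data set had 57M rows
emails = [
    "a@hotmail.com", "b@hotmail.com", "c@hotmail.fr",
    "d@gmail.com", "e@yahoo.com",
]

# Split on the last "@" and normalise case before counting
domains = Counter(e.rsplit("@", 1)[1].lower() for e in emails)

for domain, count in domains.most_common():
    print(domain, count)
```

Swap the split to take the last dot-separated token instead and the same approach gives you the TLD distribution discussed next.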

Then again, what does smell right is the distribution of email accounts by TLD:

[Screenshot: distribution of email addresses by TLD]

I was interested in whether there was an unexpected bias towards any one particular TLD, for example we'll often see a heap of .ru accounts. This would tell me something about the origin of the data but in this case, the spread was the kind of thing I'd expect of an international dating service.

Another way I sliced the data is by password which was feasible due to the plain text nature of them (although it could also be done with salt-less hashes as well). Here's what I found:

[Screenshot: the most common passwords in the data set]

With passwords, I'm interested in whether there's either an obvious bias in the most common ones or a pattern that reinforces that they were indeed taken from the site in question. The most obvious anomaly in the passwords above is that first result: 1.7M passwords that are simply the escape character for a new line. Clearly this doesn't represent the source password so we have to consider other options. One is that those 1.7M passwords were uncrackable; the individual that provided the data to Zack indicated that storage was originally MD5 and that he'd cracked a bunch of the passwords. However, that would represent a 97% success rate when considering there were 57M accounts and whilst not impossible, it feels way too high for a casual hacker, even with MD5. The passwords which do appear in the clear are all pretty simple, which you'd expect, but there's simply not enough diversity to represent a natural spread of passwords. That's a very "gut feel" observation, but with other oddities in the data set as well it seems feasible.

But then we have indicators that reinforce the premise that the data came from Zoosk, just look at the 11th most popular one - "zoosk". As much as that reinforces the Zoosk angle though, the 17th most popular password implicates an entirely different site - Badoo.

Badoo is another dating site so we're in the same realm of relationship sites getting hacked again. Not only does Badoo feature in the passwords, but there are 88k email addresses with the word "badoo" in them. That compares to only 6.4k email addresses with Zoosk in them.

While we're talking about passwords, there are 93k of them matching a pattern similar to this: "$HEX[73c5826f6e65637a6e696b69]". That's a small portion of the 57M, but it's yet another anomaly which decreases my confidence in the data breach being what it was represented as: a straight-out exploit of Zoosk.
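As an aside, that pattern looks like the $HEX[...] convention cracking tools such as hashcat use to represent recovered plain text containing non-ASCII bytes; if that's what it is (an assumption on my part), decoding is trivial:

```python
import re

def decode_hex_password(p: str) -> str:
    """Decode a $HEX[...] value - the convention cracking tools such as
    hashcat use when a recovered plain text contains non-ASCII bytes."""
    m = re.fullmatch(r"\$HEX\[([0-9a-fA-F]+)\]", p)
    if not m:
        return p  # already plain text
    return bytes.fromhex(m.group(1)).decode("utf-8", errors="replace")

# The example value from the dump decodes to an ordinary Polish word
print(decode_hex_password("$HEX[73c5826f6e65637a6e696b69]"))
```

If so, the presence of cracking-tool output is itself a hint that the file is a processed artefact of someone's cracking run rather than a raw export from the alleged source.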

Another really important step though is actually confirming a breach with the owner of the site that allegedly lost it. Let's delve into that.

Verifying with the site owner

Not only is the site owner in the best position to tell whether the breach is legit or not, it's also just simply the right thing to do. They deserve an early heads up if their asset has been accused of being hacked. However, this is by no means a foolproof way of getting to the bottom of the incident in terms of verification.

A perfect example of this is the Philippines Election Committee breach I wrote about last month. Even whilst acknowledging that their site had indeed been hacked (it's hard to deny this once you've had your site defaced!), they still refused to confirm or deny the legitimacy of the data floating around the web even weeks after the event. This is not a hard job - it literally would have taken them hours at most to confirm that indeed, the data had come from their system.

One thing I'll often do for verification with the site owner is use journalists. Often this is because data breaches come via them in the first place, other times I'll reach out to them for support when data comes directly to me. The reason for this is that they're very well-practiced at getting responses from organisations. It can be notoriously hard to ethically report security incidents but when it's a journalist from a major international publication calling, organisations tend to sit up and listen. There are a small handful of journalists I often work with because I trust them to report ethically and honestly and that includes both Zack and Joseph who I mentioned earlier.

Both the breaches I've referred to throughout this post came in via journalists in the first place so they were already well-placed to contact the respective sites. In the case of Zoosk, they inspected the data and concluded what I had - it was unlikely to be a breach of their system:

None of the full user records in the sample data set was a direct match to a Zoosk user

They also pointed out odd idiosyncrasies with the data that suggested a potential link to Badoo and that led Zack to contact them too. Per his ZDNet article, there might be something to it but certainly it was no smoking gun and ultimately both Zoosk and Badoo helped us confirm what we'd already suspected: the "breach" might have some unexplained patterns in it but it definitely wasn't an outright compromise of either site.

The Fling breach was different and Joseph got a very clear answer very quickly:

The person who the Fling.com domain is registered to confirmed the legitimacy of the sample data.

Well that was simple. It also confirmed what I was already quite confident of, but I want to impress upon you how verification involved looking at the data in a number of different ways to ensure we were really confident it actually was what it appeared to be before it made news headlines.

Testing credentials is not cool

Many people have asked me "why don't you just try to login with the credentials in the breach" and obviously this would be an easy test. But it would also be an invasion of privacy and, depending on how you look at it, potentially a violation of laws such as the US Computer Fraud and Abuse Act (CFAA). In fact it would clearly constitute "having knowingly accessed a computer without authorization or exceeding authorized access" and whilst I can't see myself going to jail for doing this with a couple of accounts, it wouldn't stand me in good stead if I ever needed to explain myself.

Look, it'd be easy to fire up Tor and plug in a username and password for, say, Fling, but that's stepping over an ethical boundary I just don't want to cross. Not only that, but I don't need to cross it; the verification channels I've already outlined are more than enough to be confident in the authenticity of the breach, and logging into someone else's porn account is entirely unnecessary.

Summary

Before I'd even managed to finish writing this blog post, the excitement about the "breach" I mentioned in the opening of this blog post had begun to come back down to earth. So far down to earth in fact that we're potentially looking at only about one in every five and a half thousand accounts actually working on the site they allegedly belonged to:

That's not just a fabricated breach, it's a very poor one at that: simply taking credentials from another breach and testing them against the victims' mail providers would yield a significantly higher success rate, because far more than 0.02% of people reuse their passwords. Not only was the press starting to question how legitimate the data actually was, they were getting statements from those implicated as having lost it in the first place. In fact, Mail.ru was pretty clear about how legitimate the data was:

none of the email and password combinations work

Breach verification can be laborious, time-consuming work that frequently results in the incident being neither newsworthy nor HIBP-worthy, but it's important work that should - no, "must" - be done before there are news headlines making bold statements. Often these statements turn out to be not only false, but unnecessarily alarming and sometimes damaging to the organisation involved. Breach verification is important.

Understanding firewalls, intrusion detection systems and honeypots with Pluralsight

This was pretty big news 18 months ago:

It was what greeted Sony Pictures employees when they turned up to the office and switched on their machines. Malware-infected machines were one thing - a very bad thing at that - but it got much, much worse for Sony.

In all, we saw about 40GB of company data walk out the proverbial door and it included everything from employee credentials to unreleased films to somewhere in the order of 170,000 corporate emails. It was all bad news, but those emails in particular made things especially awkward for the company because it involved such embarrassing exchanges as execs making racist comments about Obama. (Side note: think just for one moment if there's anything in any email you've ever sent - possibly even 20 years ago - which would damage your reputation. Yeah, it's a scary thought...)

But here's the point I'm driving at and why I've kicked off the intro to my new course with a Sony story: the attackers were in Sony's network for more than a year before being discovered. That's astounding not just because a year is a hell of a long time, but because it makes you stop for a moment and think "I wonder how many other networks have malicious actors wandering around them right now. I wonder if my network does?!"

A few years back, Brian Krebs wrote about a service selling access to Fortune 500 firms. By all accounts it was a fairly well put together service too. You simply browsed through a rather friendly user interface, picked the organisation you'd like access to and then bought yourself some RDP time. It looked like this:

That seems like a pretty good deal actually - $4.55 for access to a Windows Server box within Cisco! In times gone by, there was this mindset that the network perimeter of an organisation was holy and that anything within it could be trusted. But just as with that horse incident in the city bearing my name a few thousand years back, the perimeter is not sacred and indeed there can be some rather nasty stuff within your walls.

Getting back to the course for a moment, this is now the 7th course I've done in Pluralsight's Ethical Hacking series and as with the other 6, whilst it covers bits from the CEH syllabus, it's a course that can still stand on its own. I suspect everyone reading this already has an understanding of firewalls (although perhaps concepts such as bastion hosts and circuit-level gateways will be new), so I won't dwell on those here. Likewise with intrusion detection systems, whilst a little less familiar to many than firewalls, as a concept they're likely not completely foreign. Honeypots, however, are worth touching on here.

In the course I talk about various honeypot architectures and the role they play within organisations. The one that particularly excites me though is this little guy:

This is a Canary. Obviously. More specifically, it's a turnkey device made by the folks at Thinkst in South Africa and it's quite literally called a Canary. You're probably aware of the "canary in the coal mine" saying: the canary was the little fella that would get gassed before the miners did, and when he dropped off his perch, they knew it was time to make haste. Canaries in coal mines have become synonymous with early warning systems and that's precisely what a honeypot is there for.

Now imagine that you've got a bunch of these little devices and you plug them into the ethernet at work. You then remote into them and set them up like so:

What you're actually doing here is deciding what sort of device you'd like the Canary to emulate. Chuck one in as a Windows Server with a file share, another in as a Dell switch and perhaps make another one appear like a Synology DiskStation NAS. And then you forget about them... unless one of them starts chirping. As soon as someone attempts to, say, pull files off the NAS, the admin gets an alert. A honeypot is a device that serves no normal functional purpose so as soon as someone connects to it (i.e. an adversary probing for devices on the network), alarms start going off.
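The mechanics are simple enough that a toy version fits in a few lines. Here's a minimal sketch in Python (the function names and alert format are hypothetical illustrations, not Thinkst's actual implementation): a listener that offers no real service and treats any connection at all as a tripwire.

```python
import socket
from datetime import datetime, timezone

def alert(service, peer):
    """Format the notification a tripped canary raises (hypothetical format)."""
    ts = datetime.now(timezone.utc).isoformat()
    return f"[{ts}] CANARY TRIPPED: fake {service} probed by {peer[0]}:{peer[1]}"

def canary(port, service, notify):
    """Listen on a port nothing legitimate should ever touch; any
    connection at all is treated as a probe and reported."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen()
        while True:
            conn, peer = srv.accept()
            conn.close()  # there's no real service behind it
            notify(alert(service, peer))

# e.g. canary(445, "Windows file share", notify=print)
```

The real devices obviously do far more (convincing service emulation, central consoles, email and SMS alerts), but the core value proposition really is this simple: zero legitimate traffic means zero false positives.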

The whole premise of what you're doing here is putting early warning canaries in your proverbial coal mine. Think back to that Sony Pictures incident for a moment; a honeypot isn't going to stop attackers getting in (that's what firewalls and IDS are for), but they're going to let you know early if they do. If Sony had a handful of Canaries connected to the network (or any other honeypot implementation, for that matter), it almost certainly wouldn't have been a year before the attack was discovered and the damage could have been significantly less.

Honeypots are a complementary technology to IDS and firewalls and certainly there are many others beyond Canary that I touch on in the course as well. Strictly speaking, the course is titled Ethical Hacking: Evading IDS, Firewalls, and Honeypots but it's a lot more about "here's how these technologies function" than it is about evasion (in fact evasion of good honeypots is extremely difficult).

I've got one more course to go in this series and that's one on cloud security that I'm writing now. Until then, this is the latest one in the series and it's available on Pluralsight right now. Enjoy!

I'm doing a public "Hack Yourself First" workshop in London this June

Round 2 of European travel for me this year has just wrapped up with talks in Brussels for Techorama (which, incidentally, was sensational!) followed by a private event for a multinational information services company in Barcelona doing my usual Hack Yourself First workshop. But it's time for the next one already, so it's back to Europe again and then, after catching my breath at home for a couple of weeks, time for some US travel for the first time this year. Let me share where I'm off to, where you can catch me and where I still have some time to come see some folks.

"Hack Yourself First" – The Workshop

When I was over in Europe in Jan and Feb, I did this two-day workshop seven times almost back to back with six of the events being in the UK. These are usually private workshops in that an organisation such as a financial institution or e-commerce site gets me in to take their developers through today's top online risks where they get to exploit things like SQL injection first hand. Most importantly though, they learn the defensive patterns so we close the loop of "here's how you attack" by then doing "here's how you defend". I find this gets builders invested in the value of security in ways that simply teaching them defensive mechanisms never would.

The next workshop I have coming up is in London on June 13 and 14 and it's being put on by the folks at Learning Connexions.

It's about the same cost as many conferences except it's designed to be small enough so that I get a heap of one on one time with everyone in the group. We're usually looking at around 20 people and I see everyone from testers to developers (obviously) to security pros. There's a much more comprehensive overview of what I do in the workshops page on my site so go and check that out if you'd like more info.

In the lead-up to this next trip, I'll be publishing some more videos that'll give you a good sense of what we do at these events. I recently recorded Understanding CSRF, the video tutorial edition and I'll do another similar one on content security policies shortly.

I'm really looking forward to this event, I thoroughly enjoyed my time in London in Jan and I've been promised better weather in June so I'm quite excited! The Jan trip got completely overbooked and I do need to cut back a bit on long trips, but the London event is definitely happening and you can register for it now.

A lot of stuff got hacked since my last trip...

I always talk a lot about current events when I run these workshops, in part because they're enormously relevant in terms of understanding what's happening in the real world and in part because they're just fascinating stories. I was just thinking about the incidents that had occurred in the three months since my last UK visit and I realised there were some absolute zingers - and that's just the ones I spent time analysing! For example:

There was the Philippines Electoral Commission hack which not only exposed more than half the entire nation's population - more than half of 100M people! - but also exposed data such as passport numbers and biometric fingerprint info used in their elections. Oddly, they actually encrypted first and last names but then they didn't encrypt email addresses which frequently contain... first and last name. SQL injection was the root cause of that one - it's the vulnerability that just keeps on giving.
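The defence against that root cause is decades old. Here's a quick Python sketch (an illustrative table and data, obviously nothing to do with the actual breached schema) showing why parameterised queries shut the attack down: the driver sends the value separately from the SQL text, so a payload is just a literal string, never live SQL.

```python
import sqlite3

# Illustrative in-memory database standing in for any vulnerable backend
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO voters (email) VALUES ('someone@example.com')")

def find_voter(email):
    # Parameterised query: the "?" placeholder keeps data and code separate
    cur = conn.execute("SELECT id, email FROM voters WHERE email = ?", (email,))
    return cur.fetchall()

# A classic injection payload comes back empty instead of dumping the table
assert find_voter("' OR '1'='1") == []
assert len(find_voter("someone@example.com")) == 1
```

Had the query been built by string concatenation instead, that `' OR '1'='1` payload would have matched every row - which is precisely the class of flaw that keeps on giving.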

Then there was the Lifeboat incident which not only exposed over 7 million members' email addresses and MD5 passwords (incidentally, one of the workshop exercises is cracking MD5 hashes, which is ridiculously easy), but then they also decided to keep the incident hidden from the public. There are some really interesting discussions to be had around the ethics of security, not just as professionals working for organisations entrusted with customer data, but as individuals who may come across the sorts of weaknesses that brought Lifeboat undone.
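To see just how easy cracking unsalted MD5 is, here's the whole attack in a few lines of Python: the target below is the MD5 hash of a famously common password, and "cracking" it is nothing more than hashing guesses until one matches (real attacks just do this with wordlists of billions of entries and GPUs).

```python
import hashlib

# The MD5 hash of a very common password, as it might appear in a dumped database
leaked_hash = "5f4dcc3b5aa765d61d8327deb882cf99"

# A tiny stand-in for a real cracking wordlist
candidates = ["letmein", "123456", "password", "qwerty"]

def crack(target, wordlist):
    """Hash each candidate and compare against the leaked hash."""
    for word in wordlist:
        if hashlib.md5(word.encode()).hexdigest() == target:
            return word
    return None

print(crack(leaked_hash, candidates))  # -> password
```

No salt, no work factor, nothing to slow the attacker down - which is why MD5 has no business anywhere near password storage.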

The one that really hit the news big time was the Nissan LEAF vulnerability which exposed owners to the risk of any person in any location being able to turn their climate control on or off and track their driving habits. The particularly relevant thing about this event is that the vulnerability was discovered by someone in one of my workshops during that Europe trip. This was a guy whose total experience with this class of risk was the one-hour exercise we did looking at how mobile apps talk to API back ends.

Perhaps the headline I least expected to see myself in this year (or ever) was the one from just a couple of days ago: Another Day, Another Hack: Is Your Fisting Site Updating Its Forum Software? Yes, that is what it sounds like and no, don't Google it if it's an unfamiliar term! Verifying that incident after someone had sent me the data was easy because the site had multiple enumeration risks; that is, I could simply ask it whether an email address existed on the site and it would give me a very clear yes-or-no answer courtesy of the password reset page. People often don't think of this as a vulnerability and for a site like, say, Stack Overflow it barely matters. But when it comes to disclosing something as personal as people's sexual proclivities, you probably don't want to make that a publicly discoverable thing.
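The defensive pattern for the reset page is straightforward: respond identically whether or not the address exists. A minimal sketch in Python (the function and parameter names are hypothetical, not the site's actual code):

```python
# Hypothetical sketch of a reset endpoint that doesn't leak account existence
def request_password_reset(email, registered_emails, send_reset_email):
    if email in registered_emails:
        send_reset_email(email)  # only real members receive the email...
    # ...but the on-screen response is identical either way, so the form
    # can't be used as an oracle for "does this address have an account?"
    return "If that address is registered, a reset link has been sent."
```

The legitimate user still gets their reset link in their inbox; the attacker probing the form learns nothing. (Timing differences can still leak the answer, so in practice you also want the two paths to take comparable time.)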

There were many, many more incidents I got involved in over the last few months and many more again beyond that. These were just a few that really stuck out at me in terms of their significance and because all of them had major security flaws that only take a couple of days of training to get on top of.

USA in July

Just as a quick addendum to this, I'll be stateside in July for a private workshop and I'm presently trying to maximise the travel time with another couple of events. If you're working with an org that'd like me to come visit, hit me up via one of the channels on the contact page.

Understanding CSP, the video tutorial edition

A couple of months ago I did a video titled Understanding CSRF, the video tutorial edition which was a pretty raw run through of the mechanics and defences of cross site request forgery. It's content I often show in my workshops and I recorded the video pretty much as I present it in those sessions.

Today I thought I'd do one on content security policies or as we otherwise know it, CSP. This is a response header that tells the browser a number of things about what it can and can't do once it loads up a website and it's a really neat additional defence against risks such as cross site scripting. When you watch the video, you'll see how I demonstrate the value proposition of CSP by loading up a, well, let's just say "entertaining" video into a bank's website. This week I had a couple of people suggest that this was both beyond the developer's control and not representative of an actual security risk:

I totally get what these guys are saying too - I mean how can you stop it and why would it be a risk when you're simply modifying something that's already loaded on your own machine?! Watch the video and see how I explain the concepts:

I hope that resonates, obviously it's a fun part of the workshop but it's also quite impactful when people then go off and actually build their own CSP and see it in action. Let me leave you with a bunch of useful resources on CSP:

Finally, I blogged yesterday about my next public workshop which will be in London next month, and there are still a few spots left. I also mentioned being available for private workshops in the US when I'm there in July. Between all that and the resources above, you're running out of excuses not to have a CSP on your site!
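If you want a concrete starting point, here's what a simple policy can look like. The directive names below are genuine CSP, but the helper function itself is just an illustrative Python sketch for assembling the header value, not something from any particular framework:

```python
def build_csp(script_srcs=(), report_uri=None):
    """Assemble a Content-Security-Policy header value (illustrative helper)."""
    directives = [
        ("default-src", ["'self'"]),               # everything defaults to same-origin
        ("script-src", ["'self'", *script_srcs]),  # plus any explicitly trusted CDNs
        ("frame-ancestors", ["'none'"]),           # nobody may frame the page
    ]
    parts = [f"{name} {' '.join(vals)}" for name, vals in directives]
    if report_uri:
        parts.append(f"report-uri {report_uri}")   # browsers report violations here
    return "; ".join(parts)

print(build_csp(script_srcs=["https://cdn.example.com"]))
# default-src 'self'; script-src 'self' https://cdn.example.com; frame-ancestors 'none'
```

Set that string as the Content-Security-Policy response header and the browser will refuse to load scripts or frames from anywhere you haven't explicitly allowed - which is exactly the behaviour demonstrated in the video.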
