
Going dark: online privacy and anonymity for normal people


Last week we got news of the Rosebutt data breach. This is a very particular class of site and like many others we've recently seen compromised, it's highly likely that members would have preferred to keep their identities secret. It doesn't matter if you don't agree with the lifestyle choice of those on the site and certainly I myself am not one to look around the house at everyday items and think "I wonder if that could...". That's entirely beside the point though, which is that a bunch of consenting adults now have their identities in the hands of an untold number of people who are willingly sharing the data around the web. But it didn't have to be that way.

I've had this post in mind for some time as I've seen more and more deeply personal data spread across the web. Ashley Madison is a perfect example of that and many people were shocked at just how many real identities were contained in the data, identities that then caused a great deal of grief for their owners.

I want to talk about practical, everyday things that people who aren't deeply technical can do to better protect themselves. They're simple, mostly free and easily obtainable by everyone. I'd also like to encourage those who do give online anonymity a lot of thought to leave their suggestions in the comments section, keeping in mind that the target audience is your normal, everyday people.

Let's start somewhere extremely practical yet often not acknowledged in discussions on privacy and anonymity.

Who are you hiding your identity from?

This is a fundamentally important question because it greatly impacts the defences you apply. The measures you take to hide your identity from, say, a significant other or general member of the community may not be sufficient to hide from government oversight. Of course the latter will usually also protect you from the former, but it also often comes with an additional burden to implement.

I'm going to focus on what's readily accessible to the bulk of the population. If you don't want your participation in certain sites going public, then this will be useful. If you're a budding Edward Snowden then you'll need to go much, much further.

Let's jump into it, and we'll start somewhere simple.

Don't use your real email address

The easiest personal identifier that will match you to a site is your email address. It's a well-known identity attribute, it's unique to you and there are multiple ways of discovering if it exists on a given website. One of those ways is obviously when data is breached from a system and all the email addresses are on easy display:

But another way is via an enumeration risk. For example, you can go to the Adult Friend Finder password reset page and simply enter an address - any address. The page will then tell you if it exists or not. It's not always that explicit either, for example Ashley Madison returned slightly different responses which could still be observed.
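To get a sense of how trivial this is to automate, here's a minimal sketch in Python using the requests package. The URL, field name and response message are all hypothetical stand-ins; the point is simply that any observable difference in the response discloses whether the address exists:

    import requests

    def address_exists(email):
        # Submit the address to a hypothetical password reset form
        resp = requests.post("https://example.com/password-reset", data={"email": email})
        # Any observable difference gives the game away: an explicit message,
        # a different status code, or even response length and timing
        return "we've sent you a reset link" in resp.text.lower()

    print(address_exists("someone@example.com"))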

Don't use your personal email address. Don't use your work email address. Go to gmail.com and fabricate one. When you do fabricate one, don't put your real name in it! You can fat-finger the keyboard or enter a fabricated name but don't use your own. Also consider how you fill out the following form when you create the account:

[screenshot]

These attributes won't show up on other sites where the address is used, but they can start to surface in other places. For example, when doing a password reset:

[screenshot]

If you authenticate to another site using your Gmail account (social logins are increasingly common), then you may be prompted to share data attributes such as your name with that site. When you create a set of personally identifiable attributes such as those in the Gmail signup screen above, there are all sorts of ways that info can be leaked. Beyond the mechanisms above, there are always legal requests by law enforcement. Whilst that's unlikely to be the threat faced by most folks who just want to remain genuinely anonymous on the classes of personal site we continually see being breached, it's also an unnecessary risk.

Gmail (or equivalent - there are many other free online mail providers) gives you a full blown email address and obviously requests a lot of info in the process. A great alternative where an email address is simply a requirement for entry and you care little about anything that's actually sent to it is to use a Mailinator address. For example, you can enter an address such as foo@mailinator.com then head off to Mailinator, enter the alias and immediately view the contents of the mailbox:

[screenshot]

There's no authentication and therefore you need to assume there's no confidentiality (although the likelihood of someone stumbling across a genuinely randomised email alias is extremely low), but when email is simply a barrier to site entry then Mailinator is a very simple solution. Many of the data breaches I come across have numerous Mailinator addresses in them for precisely this reason.
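If you want a genuinely randomised alias rather than one you thought up (and might reuse), something like this does the job - a trivial sketch, with the inbox then readable by entering the alias at Mailinator:

    import secrets

    # The mailbox has no password, so an unguessable alias is the only
    # thing keeping casual observers out of it
    alias = secrets.token_hex(8)  # 16 hex chars
    print(f"Sign up with: {alias}@mailinator.com")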

The email address is the first, most logical step and honestly this is a huge portion of the anonymity story as it relates to identities being spread around the web when a system is compromised. But we can go further than that, much further.

Crafting an identity

Consider the data that many sites request on signup: name, location (possibly your exact address), date of birth, phone number etc. If protecting your identity is indeed important to you, consider what these values should be. If there's never any adverse consequence of fabricating them then do just that - make them all up. Make sure you record them though as they may be required for identification purposes later on, for example your date of birth is requested as part of an account unlock process. Don't put them in a text file on your desktop called "my secret online identity", put them in an encrypted keychain such as 1Password's.

Not the creative type? Then go generate yourself a fake identity:

[screenshot]
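If you'd rather script it, the third-party Python "faker" package (pip install faker) generates the same sorts of attributes; a quick sketch:

    from faker import Faker

    fake = Faker()
    persona = {
        "name": fake.name(),
        "address": fake.address(),
        "date_of_birth": fake.date_of_birth(minimum_age=18).isoformat(),
        "phone": fake.phone_number(),
    }
    # Record the persona somewhere safe (e.g. an encrypted keychain) as
    # you may need these values again for account recovery
    print(persona)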

Be conscious also that sometimes the phone number is required for verification purposes, for example you may need to demonstrate that you can receive an SMS to the number. There are services available that allow you to receive an SMS to a number other than your own so that can help protect your actual number. Alternatively, go out and buy a cheap SIM that's enough to receive an SMS on. Some countries require you to verify your identity in order to do this, but if you're simply protecting your identity from the broader community then that's not a problem, it's not a number that will be publicly linked to you. Remember, a phone number is just as uniquely identifying as an email address so it's not a data attribute you want to be sharing.

Making payments

Often people's fabricated personas come undone once payments are involved. For example, in the Ashley Madison data breach there was a very large number of payment records that logged personal attributes uniquely identifying the member, even after that member had paid for the "full delete" service. Due to the nature of many forms of online payment and the obvious potential for fraud, sites like Ashley Madison like holding onto as much data as they can so financial transactions can have a pretty long paper trail.

One option for getting away from using your everyday credit card for purchases is to get a virtual card, for example via the likes of Entropay. It's quick to set up and it operates like a debit card. You need to put cash on it but then as far as the merchant you're paying is concerned, there's no traceable route back to the original card. Of course there's a name on the card, but that may come from the fabricated identity earlier on (the issue of how that sits with Entropay's T&Cs aside).

There are other virtual credit cards that act like a debit card and simply require a top up before they can be used, the point is simply that there exists a paradigm that allows payments in this fashion.

Then there's good old crypto currency which is a favourite among those wanting to obfuscate their identity. Thing is though, as much as I love Bitcoin (try using it for perfectly legitimate, above board purposes one day just to experience it), the mainstream, above board places you can use it are pretty limited. For example, here are your options to pay Ashley Madison money:

[screenshot]

The virtual credit card approach would work, as would a PayPal account. Of course paying with PayPal sends the identity of the payer to the merchant, so whilst they request a full legal name at signup, in theory you could sign up with something else. Hypothetically, of course. You may then be happy linking a real card or bank account or paying to it from another PayPal account as that identity doesn't get passed on downstream when purchases are made.

Obviously I've (not particularly subtly) skirted around the topic of using fake details to sign up to financial services and this may or may not bother some people (probably much more "not" if they're trying to hide their identity anyway). Just be conscious that when you use fake information with a service - particularly one that manages money - you may well find the account suspended at some time and it could be tricky to recover. Proceed with caution.

Browsing anonymously

In terms of your exposure to a particular site you may not wish to make public, have a look at how many data breaches I've loaded into Have I been pwned (HIBP) that include IP addresses. Many forum products capture and store them by default and many sites use them to identify everything from a rough physical location to possible fraudulent activity.

These sites are storing your publicly facing IP address, that is the address they see when you make requests to their sites. If you're at home, all your machines within your home network have the same outbound address. The same often (but not always) applies to requests from within corporate networks. Particularly in the home though, there's a very high chance of the IP address uniquely identifying you. For example, your significant other jumps on the home PC and heads to whatismyip.com which tells them something like this:

[screenshot]

Find that IP address in, say, the Adult Friend Finder data breach and there's going to be some explaining to do. Not always, mind you; whilst many people have "static" IP addresses (the one IP sticks with them for the life of their time with the ISP), many others are dynamic (the ISP rotates the addresses over time). In fact, where an IP address is dynamic, someone may be unfairly implicated simply because they obtain an address previously used by someone else. Of course ISPs have logs of which addresses they provisioned to which customers (indeed these are often requested by law enforcement), but that doesn't solve the "suspicious partner" problem.
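Incidentally, if you want to check the public-facing address you're presenting right now (say, before and after connecting to a VPN per the next section), a one-liner against a service like ipify does it:

    import requests

    # ipify simply echoes back the address your request arrived from
    print(requests.get("https://api.ipify.org").text)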

There are several options to hide your identity and we'll start with one that doesn't work in the way many people think it does:

[screenshot]

Incognito mode in Chrome or private browsing in Firefox and Internet Explorer or more colloquially, "porn mode", have their uses, but what they won't do is hide your IP address from the sites you're browsing. What they're useful for though is giving you a "clean" browser session, that is one without cookies, cached files and browser history such that when you browse around the web, it's like being on a totally clean machine.

Incognito works best in conjunction with a VPN and indeed that's what I was using when I snapped the earlier screen cap of my IP address. A VPN gives you an encrypted tunnel between your machine and the VPN exit node. In the case above, I chose to go out via F-Secure's Singapore node using their Freedome VPN product. What this means is that any site looking at my IP sees F-Secure's, not mine. F-Secure see mine and were they so inclined, they could observe that it was my source IP actually browsing the site; this means you put an enormous amount of trust in the VPN provider. I chose them precisely because of that - I trust them - and whilst the price point is higher than some of the others (although we're still only talking €4 a month for up to 3 devices), trust is paramount. Here's what I'm seeing as I write this:

[screenshot]

The primary value proposition of a VPN for me is that it means I can use public wifi while travelling and not worry about my traffic being intercepted by an airport or the hotel. But it also hides my source IP address and once combined with Incognito mode means that not only is my address hidden, the sites I've previously visited are too. Just using a VPN and a normal browser window puts you at risk of cookies in the browser associating your true identity to your browsing habits. For example, Facebook has tentacles in just about everything; visiting a site that integrates, say, the Facebook "like" button while you're logged into Facebook with that browser will announce your movements to them. Furthermore, accidentally hit a like button and now you're announcing your browsing habits to your friends.

Another option that's totally free is to use Tor. Traditionally thought of as "The Dark Web", Tor allows sites to run in a fashion that makes it difficult to identify where they actually are but it also enables people to browse the "clear web" anonymously. The Tor browser bundle can easily be installed on Windows, Mac or Linux and is a pretty familiar experience:

[screenshot]

The difference to browsing with, say, Chrome on the desktop though is that your traffic is routed through multiple nodes along the way such that your source IP address is not presented to the site you're browsing. This now puts us somewhere altogether different:

[screenshot]

Furthermore, it's easy to take a new identity within the Tor browser and jump out through another exit node and consequently obtain a new IP address. But there are problems with this approach too; for one, you're entrusting your traffic to some unknown party running an exit node. Traffic sent over HTTP (not HTTPS) can be observed and manipulated by someone running a malicious exit node. You'll also run into trouble with many sites treating Tor traffic as suspicious. CloudFlare recently wrote that 94% of the traffic they see from Tor is malicious and consequently they continually pop up a CAPTCHA by default (this blog routes through CloudFlare but I've disabled that "feature" for Tor traffic). Even just loading that IP address checking site above resulted in me having to answer multiple CAPTCHAs and it's not a pleasant browsing experience.
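Incidentally, for the more technically inclined, you don't even need the browser bundle to push a request through Tor; assuming a local Tor client listening on its default SOCKS port and the requests[socks] extra installed, a sketch like this routes a request through the network:

    import requests

    # socks5h (note the "h") resolves DNS through Tor as well, avoiding DNS leaks
    proxies = {
        "http": "socks5h://127.0.0.1:9050",
        "https": "socks5h://127.0.0.1:9050",
    }
    print(requests.get("https://api.ipify.org", proxies=proxies).text)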

None of these approaches are foolproof and all of them carry some burden of overhead with them, but for "normal people" (that is not techies or those worried about government interception), a good VPN and Incognito mode goes a very long way to protecting your identity. Just don't forget to enable the VPN before browsing!

Summary

For the purposes of protecting yourself from incidents such as the one I opened this blog with, most people are simply looking for one degree of separation from the site. This means that they don't want to give the site anything personally identifiable, but they may be happy to give certain info to other services upstream of it. Think about this carefully when you create accounts; who are you giving personally identifiable data attributes to and where might they be passed on to?

I'm also conscious that the guidance here could be used to hide identities on sites that most reasonable people would not like to see others participating in. The advice above is not going to entirely circumvent law enforcement - it's not supposed to - it's simply there to enable you to participate in communities where anonymity is important. Start throwing in police surveillance, court orders and data requests of service providers and the wheels will fall off very quickly.

Online privacy is not necessarily about having something to hide, it's often about simply not wanting to share certain activities. If you consider that everything you do on the web might one day become public, you may well find there are classes of site you use where privacy is actually rather important.


Observations and thoughts on the LinkedIn data breach


Last week there was no escaping news of the latest data breach. The LinkedIn hack of 2012, which we thought had "only" exposed 6.5M password hashes (not even the associated email addresses, so in practice useless data), was now being sold on the dark web. It was allegedly 167 million accounts and for a mere 5 bitcoins (about US$2.2k) you could jump over to the Tor-based trading site, pay your Bitcoins and retrieve what is one of the largest data breaches ever to hit the airwaves.

But this is not a straightforward incident for many reasons and there are numerous issues raised by the data itself and the nature of the hack. I've had a heap of calls and emails from various parties doing stories on it over the last week so I thought I'd address some of those queries here and add my own thoughts having now seen the data. I'll also talk about Have I been pwned (HIBP) and the broader issue of searchable breach data.

Why a 4 year lead time?

This is one of the most common questions that comes up - what's been happening since 2012? Why have we only just now seen the data? The easy answer is that I don't know and it's quite possible that LinkedIn doesn't know either.

I wrote a longer piece about this last week in my Security Sense column titled There's a Lot of Hacked Companies We Don't Even Know About and the title pretty much sums it up. I cite other incidents there which demonstrate how often it can be years - sometimes longer than the LinkedIn lead time - between the hack and the subsequent public release of the data. Inevitably there's a catalyst, but it could be many different things; the attacker finally deciding to monetise it, they themselves being targeted and losing the data or ultimately trading it for something else of value.

But speaking of value, how much is the data actually worth?

Is the data worth $2.2k?

This is a recurring question - "is it worth it"? Are 167 million records really worth $2.2k?

Well firstly, I'm fond of the adage that "something is only worth what someone is willing to pay for it" and by all accounts, people have indeed paid for it (more on that later). But if you want to look at it another way, 167 million accounts selling for $2.2k is only about 0.001 cents per account which, at least to me, feels very cheap indeed.
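For the sake of the arithmetic, here's the back-of-the-envelope version:

    price_usd = 2200
    accounts = 167_000_000
    print(f"{price_usd / accounts * 100:.4f} cents per account")  # ~0.0013 cents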

Well it was 0.001 cents per account but already, we've seen that 5 BTC price drop:

This is curious and I suspect that both the points I made in that tweet are related. Earlier this year I observed that HIBP was having an impact on data breach prices due to the increased awareness it raised with those who've had their data exposed. Whilst the data wasn't in HIBP at the time of that tweet, it's possibly what we're seeing here in terms of it having been spread around more broadly. In fact, since then, the price has fallen even further:

The seller is quoted in a Motherboard story as saying:

[The] more i sell, and more days pass, [the] value drops

He then goes on to claim he's sold the data 6 times over for a grand total of $12k.

The lower price point will obviously make the data more accessible to more people, but it also likely indicates that the value is diminishing as the data is being abused. Speaking of which...

Accounts are being hijacked

The first screen cap I tweeted above references the Motherboard article about hackers hijacking the accounts of prominent individuals. There'll be a small window where this is possible and over time the likelihood of it will diminish as people change passwords on the various services where they've reused the one from LinkedIn. Let's be clear about that too - it's not just LinkedIn accounts that are at risk, the data dump puts many others at risk too.

I had someone contact me after receiving an email from Groupon who'd proactively reset his password right around the time the LinkedIn data started doing the rounds. I asked if perhaps it was related to LinkedIn and he replied:

I had long since changed my LinkedIn, but I guess they used that for Groupon

Then there was this case yesterday:

It's extremely difficult to prove that the LinkedIn data was the source of subsequent account takeovers on Groupon or Rockstar because that's the nature of password reuse - information obtained from many different sites can authenticate someone to many other sites. But I've seen similar reports from individuals that seem to have increased in volume since the LinkedIn incident and it would be unusual for this not to happen. Treat this as merely anecdotal as I've no way to verify it, but certainly it's a pattern we've seen many times before.

But all of this points to another important observation about the breach: the data is spreading. Account hijacks and proactive password resets in other services both point to the likely redistribution of the data, but this next point absolutely and emphatically confirms how those 167M records are now doing the rounds.

Password cracking

Very soon after the original news last week, we began seeing analysis of the passwords in the dump. KoreLogic runs a password recovery service and obviously they're going to be interested in analysing data of this nature as their ability to crack passwords en masse demonstrates the effectiveness of their service. As of the last update before writing this post, they'd cracked 49,999,999 unique hashes with 11,863,000 remaining. However, that's a much greater proportion of all accounts than the numbers suggest due to multiple people using the same password as other users (there were 1,135,936 LinkedIn members using the password "123456"). 5 days ago (and things have moved on since then), 86% of all accounts had cracked passwords which is a staggeringly high number courtesy of the choice of SHA1 storage (without a salt, that is).
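To appreciate why unsalted SHA1 falls over so quickly, consider that each candidate password need only be hashed once and can then be compared against every account in the dump simultaneously. A minimal sketch of the idea:

    import hashlib

    # One known hash from a hypothetical dump: this is sha1("123456")
    leaked_hashes = {"7c4a8d09ca3762af61e59520943dc26494f8941b"}

    for candidate in ["password", "123456", "letmein"]:
        if hashlib.sha1(candidate.encode()).hexdigest() in leaked_hashes:
            print(f"cracked: {candidate}")

    # With a per-user salt, each candidate would have to be hashed once per
    # account, multiplying the work by the number of users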

There's a bunch of stats on that blog about the choice of passwords but there's really not much new information to glean; people still make bad password choices. For me, what was more interesting about the whole thing was to witness both how the data was spreading and how comprehensively the weak cryptographic storage was being cracked.

It's not just KoreLogic looking at cracking passwords though, there was this from Jeremi Gosney shortly after news of the leak broke:

The cracking stats by professional password crackers is one thing, but we're also seeing them turn up in other places too, for example:

[screenshot]

This is on a site which frequently redistributes data breaches, a very different category of site to KoreLogic's with a very different MO to how Jeremi Gosney operates (he frequently provides services to law enforcement). It's just another observation about how the data is spreading.

Many LinkedIn members have not received emails (but some non-members have)

One of the things I've found quite intriguing about how LinkedIn has dealt with this data breach is the way in which notifications have been delivered to those impacted. For example, they started sending out emails to people the day after the news broke last week yet as of now, neither I nor a number of other people who were members before 2012 have received an email:

We were definitely in the data breach, but no email. Oddly enough, both David's and my records don't have associated password hashes in the breach data. If notifications are only being sent based on data attributes that are present (or not present) in the leaked dataset, does this mean LinkedIn are sending emails based on the data floating around publicly rather than what was in their membership database back in 2012?

Conversely, other people who weren't LinkedIn members have allegedly received notifications. The allegation being made here is that as I observed above, LinkedIn contacted people based on the data in the breach that was being sold. I'm not certain if this is true and there are other possible explanations for how this might happen. But be that as it may, based on what's been written in that post (and keep in mind there's not enough information there to verify the claims), the decision process behind who gets emailed and who doesn't appears to be quite unusual.

Another thing that struck me as odd with the emails is the call to action:

What this is saying is that the user's current password (which for many people, will be the one in the data breach), is still active. It can be used to login to the site after which the person using it (be that the legitimate owner or someone else who obtained the password) can then set a new one. Now clearly if they'd effectively invalidated all impacted accounts right off the bat it would have caused other issues and perhaps this was ultimately the best of a bad set of options, but it did strike me as unusual.

Having said that, there was another story yesterday that quoted LinkedIn as having now invalidated old passwords:

We have invalidated the passwords of all accounts that were created prior to the 2012 breach that hadn’t updated their password since then

The statement isn't entirely clear, but I'd like to believe it means people who hadn't changed their passwords are now getting reset emails as opposed to being able to use those old passwords in any sort of functional way. Then again, you could also read it as an attacker who logged in and changed a victim's password before this statement came into effect still has control of the account. And of course none of this is at all relevant to the fact that other services where passwords were reused likely remain vulnerable.

But there's another problem with breach notification emails like these and it's one we see time and time again after any significant event like this...

Phishing

Inevitably, we'll see a heap of stuff like this:

It's not immediately clear as to whether or not Vikram did indeed receive a phishing email, but we know scammers frequently leverage events like this to extract personal data from people. For example, after Heartbleed back in 2014 we saw a surge of phishing emails preying on the fact that people were expecting password reset notifications. It's a clever piece of social engineering that exploits the fact that many people will have their defences down - "oh, that'll be the password reset I heard I should be expecting, let me just enter my credentials..."

As painful as this entire incident is becoming for some LinkedIn members, it's hard to blame the choice of password storage on the folks who are now dealing with this incident. Here's why:

Inheriting a data breach

There's been some pretty vocal criticism of LinkedIn's handling of the breach and Per makes many important observations in there, including ones similar to mine regarding the reset process. From a pure security perspective, I agree wholeheartedly with him, but I'm also a little sympathetic at the same time.

Spare a thought for Cory Scott, LinkedIn's CISO. Cory has publicly blogged about the hack and is no doubt bearing the brunt of the subsequently unfolding security dramas. But the attack didn't happen under his watch; he inherited what would later become a very public incident after arriving at the company in 2013.

There will be many others in the same boat as Cory who are now finding themselves at the receiving end of feedback such as this:

Of course SHA1 was a bad choice. It was a bad choice in 2012 let alone years later when the news is hitting (although I believe the password hashing algorithm was changed around the time of the original leak 4 years ago). But it wasn't Cory's choice and it didn't happen under his watch.

The point is that the folks dealing with this incident today are cleaning up someone else's mess. I'm sympathetic to those dealing with the breach and as much as we may feel tempted to blame LinkedIn as an organisation, it's worth remembering there are people dealing with this that are having a pretty miserable time through absolutely no fault of their own.

But of course for me, one of the things at the forefront of my mind after such an incident is how I'm going to deal with the data as it relates to HIBP. This is a bit nuanced, so let me try and fully explain it here:

Have I been pwned

I want to talk about HIBP and the LinkedIn data in a moment, but I want to share a portion of an email I received from someone this weekend first:

I'm not sure if you're aware of leakedsource.com - it's vaguely similar to haveibeenpwned.com in that it allows visitors to find out if they appear in dumps, except it feels like it's being run with more criminal intent. In contrast to your site, they don't give any background on themselves, they also have a paid subscription which allows access to the entire database, including other people's details and plain text passwords. On top of that, to remove one's self from the database for free, you're expected to send even more personal data to them, and then it's done manually, for which they conveniently have a "huge backlog".

I've been aware of Leaked Source for a while and I have thoughts on how they operate which are similar to what's represented in the comment above. What the person who sent me that email wasn't aware of at the time though was that Leaked Source subsequently received some pretty stern words from LinkedIn, most notably the following:

We have demanded that parties cease making stolen password data available and will evaluate potential legal action if they fail to comply

Obviously redistributing passwords is not going to go down well given the likelihood of abuse not just on people's LinkedIn accounts, but on their other online assets too. I don't necessarily think these guys have malicious intent, but I hope they can find a way to run the service that doesn't involve exposing sensitive data such as credentials (to their credit, they did remove the LinkedIn passwords from visibility after they got the notice). I'm not going to comment further on their service in particular but I do think it's important to talk about how data breaches are handled.

Following many of these incidents, breached data is publicly distributed and easily come by, in fact it was in the wake of the broadly redistributed Adobe breach of 2013 that I originally created HIBP. Since then we've seen many other times where breached data has rapidly spread across the web including Ashley Madison (probably the most downloaded breach of all time) and more recently, the Philippines Electoral Commission that impacted 55 million people. But just as there is an opportunity to do useful things with the data that genuinely helps those impacted, it's also very easy to make the whole situation a lot worse for all those involved.

My views on how data breaches are handled continue to evolve over time as I understand the impact of them better. For example, before the Ashley Madison data even went public I'd decided that it should never be publicly searchable. Whilst I knew it would be broadly distributed if it ever saw the light of day, it wasn't going to be HIBP that told someone's boss or wife or kids that the guy had an AM account. With the benefit of hindsight, that's almost certainly the single biggest factor that ensured I never received a DMCA take down from Avid Life Media when so many other search services did (many would actually return all the data found on someone to anyone who searched for it).

More recently, I made the decision to permanently delete the VTech data. Not just make it privately searchable but actually permanently and irrevocably delete it. I explain why in that blog post and it essentially boiled down to me being one of only 3 people to ever have it (including the since-arrested hacker and the journalist who sent it to me), combined with the fact that the data included kids who just shouldn't be caught up in this sort of thing (their average age was only 5 years old). A year ago I probably wouldn't have seen that as being necessary but again, my views are evolving as I better understand the data breach landscape.

What I'd most love for those of us dealing with this class of data to continually ask ourselves is this question:

How can we help people impacted by data breaches without making life worse for them?

I have no problem with people commercialising services that do that, but let's do it in a way that doesn't put people at further risk.

In terms of LinkedIn, a couple of days after the incident I had multiple people approach me with the data, people from very different walks of life at that (I highly doubt they'd shared the data with each other). Combined with the other indicators I mentioned earlier suggesting that the data is now starting to spread, I made the call to load it and as of now, it's searchable within HIBP. As with any other data breach that's searchable, it remains only searchable by email address and will only return a [yes/no] for whether an account exists in a breach, never any data attributes contained in the breach such as passwords.
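Incidentally, you don't have to use the website to do this; at the time of writing HIBP also exposes a free API which works along these lines (the version and required headers may well change over time, so treat this as a sketch):

    import requests

    email = "foo@example.com"
    resp = requests.get(
        f"https://haveibeenpwned.com/api/v2/breachedaccount/{email}",
        headers={"User-Agent": "breach-check-example"},  # a user agent is required
    )
    # 200 means the address appears in at least one breach, 404 means it doesn't
    print("pwned!" if resp.status_code == 200 else "not found")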

I took some time before loading the data to get a good sense of how it was spreading, who had it and whether it was being abused. It's now obvious that somewhat predictably, it's in many hands and people are doing bad things with it. One of the key factors that ultimately drives me to load any data is that people have come to rely on HIBP as a trustworthy if not canonical resource for data breach information. I've had a lot of queries like this:

Not only was there a marked up-tick in traffic once the LinkedIn news broke, I've also now got 430k subscribers to whom I've committed to notifying when they're exposed in a breach. They don't pay me anything (it's a free service), but this is precisely the type of incident they're relying on me to notify them of.

This breach now pushes HIBP past half a billion records. Thank you to everyone who's supported the project via encouragement, suggestions, technical know-how and of course donations. Whilst I don't particularly want to see circumstances that add another half billion records, I'll continue to run the service whilst it remains both useful and viable.

The emergence of historical mega breaches


Over the course of this month, we've seen an interesting trend of data breaches. Any one of the 4 I'm going to talk about would be notable on its own, but to see a cluster of them appear together is quite intriguing.

For example, just yesterday I loaded the Fling database (you probably don't want to go to fling dot com until you're in a private setting). That was over 40 million records and the breach dates back to 2011.

A few days before that it was LinkedIn which has been pretty comprehensively covered in the press by now. There's 164 million unique email addresses (out of about 167 million records in total), and that dates back to 2012.

Just now, I've finished loading tumblr into Have I been pwned (HIBP) with a grand total of over 65 million records dating back to 2013. That rounds out the total number of records loaded in just the last 6 days to 269 million, not that much less than I had in the entire system just a week ago. It's also the second data breach I've personally appeared in over that period, my 6th overall. (Incidentally, you may see various different stats on the exact number of addresses in the tumblr breach due to data idiosyncrasies such as the way deactivated accounts were flagged.)

But all of these will pale in comparison when the much-touted MySpace breach of 360 million records turns up. Whilst I've not seen a date on when the breach actually occurred, c'mon, it's MySpace and you know it's going to date back a way.

There are some really interesting patterns emerging here. One is obviously the age; the newest breach of this recent spate is still more than 3 years old. This data has been lying dormant (or at least out of public sight) for long periods of time.

The other is the size and these 4 breaches are all in the top 5 largest ones HIBP has ever seen. That's out of 109 breaches to date, too. Not only that, but these 4 incidents account for two thirds of all the data in the system, or at least they will once MySpace turns up.

Then there's the fact that it's all appearing within a very short period of time - all just this month. There's been some catalyst that has brought these breaches to light and to see them all fit this mould and appear in such a short period of time, I can't help but wonder if they're perhaps related.

One explanation may be related to the presence of these breaches being listed for sale on the dark market:

[screenshots of the dark market listings]

These 3 are all listed by peace_of_mind and by all accounts, this individual is peddling a quality product:

[screenshot]

Apparently, buyers are happy. Now this is not to say that peace is the guy who's hacking into these sites and indeed attribution can be hard, particularly after so much time has passed by since the sites were actually attacked. But certainly there's a trend here which is hard to ignore.

But here's what keeps me really curious: if this indeed is a trend, where does it end? What more is in store that we haven't already seen? And for that matter, even if these events don't all correlate to the same source and we're merely looking at coincidental timing of releases, how many more are there in the "mega" category that are simply sitting there in the clutches of various unknown parties?

I honestly don't know how much more data is floating around out there, but apparently it's much more than even I had thought only a week ago.

Dating the ginormous MySpace breach


It's been a crazy time for data breaches and as I wrote yesterday, we've seen a very distinct pattern of historical mega breaches lately. Fling in 2011, LinkedIn in 2012, tumblr in 2013 and the mother of them all, MySpace in, well, we don't quite know. There's been no information forthcoming from anyone about when this breach actually occurred and there are no explicit indicators in the data dump either (sometimes there are timestamps on account creation or website activity). So when did it actually happen? Let's work it out.

Firstly, the only data in the breach is an incrementing ID (possibly an internal MySpace identifier which would enable them to date it), an email address, username and one or two passwords. The passwords are stored as SHA1 hashes of the first 10 characters of the password converted to lowercase. That's right, truncated and case insensitive passwords stored without a salt. There are likely some interesting insights to take away from the passwords alone, but it's the email addresses that can help us actually date the thing.
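To make that storage scheme concrete, here's a sketch of what it amounts to (my reconstruction based on the description above, not MySpace's actual code):

    import hashlib

    def myspace_style_hash(password):
        # Lowercase, keep only the first 10 characters, then unsalted SHA1
        return hashlib.sha1(password.lower()[:10].encode()).hexdigest()

    # The truncation means these two very different passwords collide:
    print(myspace_style_hash("P@ssword123") == myspace_style_hash("p@ssword12"))  # True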

When we look at the top 3 email domains in the MySpace breach, we see an interesting distribution (the sketch after the list below shows how such a tally is produced):

[screenshot]

What's up with Gmail?! Here we have the world's largest provider of email addresses and it has only a fifth the prevalence of Yahoo addresses. Think of the email account distribution like this:

  • For every one Gmail account there are 5 Yahoo accounts
  • For every one Gmail account there are 3 Hotmail accounts
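That tally is trivial to produce, by the way; given a file of breached addresses, one per line (the file name here is a hypothetical stand-in), it's a few lines of Python:

    from collections import Counter

    with open("breach_emails.txt") as f:
        domains = Counter(
            line.strip().rsplit("@", 1)[-1].lower() for line in f if "@" in line
        )

    for domain, count in domains.most_common(3):
        print(domain, count)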

But what we need to remember with Gmail is that they're a relatively new player. They entered private beta in April 2004 and didn't hit the mainstream until February 2007. There are still 25M Gmail accounts in the MySpace data so the incident certainly happened after that early 2007 time frame (I recall there were a lot of people in the beta, but I doubt it was enough for 25M of them to have MySpace accounts), but how much after?

Going back to another recent large incident, here's how the data on LinkedIn breaks down:

[screenshot]

This is obviously a really different split; Gmail is now well and truly out front which is more commensurate with what we'd expect today. Keep in mind that LinkedIn was hacked in May 2012 so now we have a window somewhere between then and 2007. Of course this is all assuming an even distribution of accounts over services at similar times which will never be exactly the case, but at least we're comparing two truly global services that launched at similar times (2002 for LinkedIn and 2003 for MySpace). It would be a reasonable assumption that MySpace was hacked well before LinkedIn was. What we really need though is more data from between 2007 and 2012.

One source of info is to look at the stats comparing mail providers from around the turn of the decade. For example, this Business Insider chart:

[screenshot]

Keep in mind that this is a US chart (although arguably MySpace was US-centric), and even if Gmail was at parity with Yahoo back then we'd still expect more Yahoo accounts as the incumbent mail provider; people were creating MySpace accounts before Gmail existed. But check the proportions in April 09:

  • There are about 3 Yahoo accounts for every 1 Gmail account
  • There's about 1.25 Hotmail accounts for every 1 Gmail account

This data feels too late when we think back to the 1:5 and 1:3 ratios for Yahoo and Hotmail versus Gmail. Let's find some stats on the previous year, for example this chart from TechCrunch in Jan 2009:

[screenshot]

Here the ratio of Gmail to Hotmail starts out at about 1:2.5 in Oct 2007 and finishes at about 1:1.5 in Dec 2008. Now again, this is not to say that as of the end of 2008 MySpace would have had 1.5 Hotmail accounts for every 1 Gmail account as they'd already had 6 years of accumulating Hotmail customers and only a couple of years of Gmail customers (at least since they went out of beta). But the patterns do start to fit much better, most notably because Gmail is far enough out of beta to have tens of millions of people use it on MySpace but not far enough along its growth curve to come anywhere near the Hotmail and especially the Yahoo numbers.

What I really needed though was evidence of people who created accounts around this time frame, so I asked for some help yesterday:

I had a few followers get back to me and I checked their accounts in the data set. One account I was given was from someone who is in the breach and who stated quite emphatically:

if my memory serves me well, the account must have been created on April 28th, 2007

That was a good baseline, but then there was this:

Account was created around late November - early December 2007

And another with this:

4 November 2007 18:08 CET

(Many people actually kept the original welcome email from MySpace which proved enormously useful in this exercise.)

All of this was pointing to the gut feel I had about the incident not occurring earlier than a 2008 time frame. What I really needed though was a max date, so a point in time after which people registered yet weren't in the data breach.

The oldest account provided by someone who wasn't in the breach was also emphatic about the date:

I have a MySpace account (apparently) that I appear to have created on Dec 26, 2009

That last one used a MySpace prefix on a personal domain for their email address so was obviously taking care to track which accounts were created where.

From about mid-2008, everything started going downhill for MySpace's business. Facebook was on a massive climb and MySpace was losing visitors. However, it wasn't losing registrations and you could speculate that they could have been frozen in time in 2008 then hacked years later and the evidence would still point to the incident occurring 8 years ago. It may be that the incident occurred after 2008, but I doubt it was much later as they were still getting registrations and they would have been very heavily Gmail biased by that time.

One possibility that would explain the ginormous volume of data that was taken (the extracted breach file is 33GB) is an insider threat. Keep in mind that we're talking about a time where MySpace was running into serious trouble and there were a raft of lay-offs, it's highly conceivable that someone literally walked out the door with the data. That's not to say it couldn't have been hacked in a more traditional external-actor sort of way, but the timing is coincidental...

So that's the best estimate I can draw on the evidence here - MySpace was probably hacked in the mid-2008 to early-2009 time frame. The data is now searchable in Have I been pwned (yes, all 359,420,698 unique email addresses) and if you do find yourself in there and know when your account was created, drop a note in the comments below and we might be able to crowd source a more accurate picture of when this event occurred.

Update: Since writing this piece, MySpace have posted a blog about the incident which they've dated as having occurred before 11 June 2013. They're not clear about how much before then the incident occurred and indeed it could have been many years earlier (it seems like they had a major architecture change then which has allowed them to provide some context around the date). Interesting side note: the MySpace blog post specifically names "peace" - the seller on the dark market site - which is very unusual in an announcement like this, particularly given he may just be the seller and not the individual who actually hacked the system in the first place.

How I prepared for the NDC keynote (and other speaker tips)


I just had an absolutely sensational trip over to Europe which kicked off with my favourite event of the year - NDC Oslo. I first came to this event two years ago and talked about How I Hacked my Way to Norway which was the first big international talk I'd done. Per the link to that blog post, the talk went sensationally well, topping the charts for the event with 100% positive feedback (there's a green / yellow / red card rating system). I went back again last year and did Making Hacking Child's Play and again, it was a resounding success going right to the top of the charts. (Incidentally, these talks have helped me realise how much I love doing talks that are entertaining as opposed to purely functional.)

For this year, the folks at NDC asked me to do the opening keynote which was a bit of a speaking career highlight. I've done other keynotes at other events, but NDC is special to me given the history I've had there. It was only about two and a half years ago that merely speaking there was an aspirational goal and now here was an opportunity to open the event on a massive stage in front of more than 2,000 people. I wanted to get this right. Actually, I wanted to get it perfect or as near as dammit to that as possible.

I started writing this talk many months before the event, gradually refining it over time. That included many, many "dry runs" practicing it at home. I decided to record the very last practice run before leaving home and I thought I'd share that here for those who may be interested in how I prepare for a talk like this. Now that the actual talk from the event is live, I'll embed that below first and I suggest checking that out before looking at the rehearsal vid further down the page. I've just watched a fair portion of each and I'm pretty happy with how closely they align.

Here's the live talk:

There's no rating for the keynote so I can't share quantitative results, but it felt awesome in terms of audience engagement (you only get a small sense of the laughter and interaction from the crowd in the video). Some of the pics snapped by myself and others will give you a bit of a sense of what it was like to be there:

[photos from the keynote]

Scrolling back in time a bit, here's the rehearsal video I did beforehand. The intro where I explain how I prepare is probably the most useful as is the last minute or so where I talk about what went well and what I should change. Here's the prep video:

In the prep video I talk about a blog post covering the timing sheet. Check out Speaker style bingo: 10 presentation anti-patterns for that one.

I did indeed change the Clippy section before the actual talk and whilst I think it's better than the version in the rehearsal, it could be better yet. There were also a few other minor things I kept tweaking right up until the night before the talk. One of those was that I spoke with the organisers the day before and decided it worked better not to have someone intro me beforehand, primarily because I wanted to play the intro video then come straight on stage while people were still laughing about those clips. This was preferable to breaking things up with a more serious intro which would have brought the mood down a bit.

I ultimately decided to present from two different machines with my Lenovo P50 being the primary one I presented the slides from and my Yoga 900 serving as the Ashley Madison demo machine. The primary reason for this was that it allowed me to keep the machine with the slides in presenter mode so I could see what was coming next and the Yoga with the screen duplicated so I could look directly at it while interacting with the Ashley Madison website. Which was one of the things that went wrong...

For all my preparation and testing, I never tried using the Ashley Madison identity obfuscation tool on the Yoga before the talk. I couldn't properly drag and resize the mask over my face, I assume due to the high-DPI display on that machine. It's such a little thing that I never even thought would be a problem, yet here we are. I also didn't test the USB Type C video adaptor I had for the Yoga on that machine in advance of the talk. I tested it on the P50 and it worked fine, but it wouldn't play ball on the Yoga and I had to borrow someone else's. (Incidentally, the VGA to HDMI adaptor I had for the P50 died the following week only a couple of hours into a workshop I was running so my faith in video adaptors is at about an all-time low right now...)

In addition to the keynote, I also did a talk on "Lessons from a quarter of a billion breached records" which I'll embed below. This is one I've done several times although I keep modifying it each time to ensure it remains current. It's far less carefully rehearsed and in fact in this case, I shortened it and tacked on the bit where I added the VK data breach to Have I been pwned which took the system past the 1 billion record mark (yes, billion with a "b"). Here's the talk:

The feedback scores are yet to be released publicly so I won't relay precisely what I was told by an NDC insider, but it would appear the result is "consistent with previous years" which makes me enormously happy :)

For those of you down Australia way, NDC is coming to Sydney in August. I've been a part of the agenda committee on this one so have seen details of all the speakers and talks that will be coming down under and this event will really be second to none when it comes to Aussie tech conferences. This is the one you want to get to in Aus this year and if you're on the wrong side of the globe, give NDC in London next Jan a go instead.

Evernote is crippling their free service, here's how to move to OneNote


I've been using Evernote for about 6 years now. Nothing heavy duty, just basic notes that I collect around things like conference talk ideas, code snippets, some recipes I often make and other rather mundane things. Anything sensitive goes into 1Password, this is just everyday notes about things I want easy access to across devices. For me, "devices" means my iPhone, iPad, desktop PC and a couple of laptops. The ability to simply fire up a device and have access to everything is important to me but unfortunately, that's all coming to an end:

To be clear, you can still sync across all your devices but now it'll cost you. It's normally $50 a year or $35 according to the discounted prices listed on their blog but that's not really the point. Read through the responses to the tweet above and you'll see that as I suggested there, people get kinda cranky when you take something away from them. Yes, Evernote needs to make a dollar out of this thing but they had commercial "Plus" and "Premium" plans before and the "Basic" plan was part of the freemium model. Inevitably they would have expected some degree of subscriber exodus by making this change, perhaps this segment of the market just isn't a priority any more.

In all honesty, there's more to me moving from Evernote than just the cost. Working across so many different devices, one of the things I found constantly painful was conflicts. I'd edit on a device then for some reason it wouldn't sync cleanly or it'd be delayed then I'd edit on another device and end up with a conflict. It's a messy conflict too - not just a "here's what's different, let us help you merge", rather "here's one big whack of text and there's another, you're on your own"! I use Trello a lot too and that thing always plays nice with device syncing. Then there's the constant visual indicator that pops up to let you know it's decided to sync. I've got many, many Pluralsight courses where the raw audio includes "Oh FFS Evernote!" followed by a retake without the damn popup.

Anyway, the point is that all combined it was time to leave Evernote and Microsoft OneNote was the logical successor (you'll see that as the unanimous suggestion in response to my earlier tweet too). They've got a page specifically for migrating from Evernote which includes a handy little tool that couldn't make things easier. Here's how it works:

[screenshots of the migration tool's steps]

And that's it - job done! It's literally single digit minutes of effort, then my import took about 15 mins to run after that. The import tool pops you over to OneNote afterwards which helps orientate you:

[screenshot]

Everything I checked came across cleanly so a note like this in Evernote:

[screenshot]

Now looks like this in OneNote (create date came across fine too):

[screenshot]

Fine on the iPhone too:

[screenshot]

And the iPad:

[screenshot]

And that's it - job done! Obviously it's a somewhat different UX to Evernote and just to be sure I won't be shutting down the old account until I'm absolutely confident that everything came across ok, but it was a super easy migration process and had I known it was this simple I would have made the leap earlier regardless of the pricing change.

Security insanity: how we keep failing at the basics


Some days, it just feels like the world is working against you or in the case of today, like it's all just going to metaphorical security hell. As much as we like to keep pushing the needle further around the "strong security dial" with things like security headers, strong HTTPS implementations and robust hashing algorithms, every now and then we need to take a moment to remember just how low the bar still remains and that frequently, we can't even get the basics right.

Here's a bunch of examples just from today that illustrate how far we still have to go.

Password complexity rules still suck (for your own good, allegedly)

Earlier today I saw my mate Lars Klint call Etihad Airways to account over their password complexity rules:

Firstly, why we still have this problem is beyond me. The entire premise of arbitrarily limiting password lengths this way is flawed. Not only does it suggest that just maybe - maybe - the problem is that they're trying to fit the password into that varchar(10) column in the database (thus implying no cryptographic storage), but it also fundamentally weakens the choice of passwords available to the user. The screwy rule around what it begins and ends with only makes it worse and to add insult to injury, they'll happily let you create an account with the password of "password1".
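To illustrate why a length cap makes no sense when passwords are stored properly, here's a minimal sketch using Python and the bcrypt package (my choice of library for the example, not anything Etihad uses - any modern adaptive hash makes the same point): the stored hash is always the same fixed length no matter how long the password is, so there's no varchar(10) column to protect.

    import bcrypt  # pip install bcrypt

    for password in [b"password1", b"a" * 10, b"correct horse battery staple"]:
        # bcrypt output is always 60 bytes, regardless of the input length
        hashed = bcrypt.hashpw(password, bcrypt.gensalt())
        print(len(password), "->", len(hashed))

Run that and every hash comes out at 60 bytes; a longer password costs the database precisely nothing.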

So why does Etihad do this? Because "security":

Wait - what?! No really, it's in your own best interests:

Ok, we're used to screwy responses by Twitter account operators but clearly there is still an underlying problem here.

Sites are still breaking password managers

The Etihad thing was annoying but I decided to get on with life and go do something useful, something like changing my PayPal password. So I headed over to the change password page:

[Screenshot: PayPal's change password page]

I use 1Password to generate all my passwords so I right-click on the field then ask it to generate me a new one with the usual 30 characters of randomness:

[Screenshot: 1Password generating a 30 character password]

It fills in the form and I change the password just fine. It then asks me to log back in which I try to do but, well, then this happens:

[Screenshot: the login failing with the newly generated password]

Shit. That's an account with money in it and I can't login! So I'm looking at the page trying to figure out what the hell is going on because there's absolutely no doubt whatsoever the password is correct, 1Password just stored the generated one for me. On a hunch, I took a look at the DOM inspector:

[Screenshot: the login page's password field in the DOM inspector - no maxlength]

Huh, no max length so it's not truncating it. On another hunch, I headed over to the registration form and inspected the password field there:

[Screenshot: the registration page's password field with a maxlength attribute]

Oh c'mon, they're truncating passwords there but not at login? So I chop off the last 10 chars of my new password and bingo - I'm in!

Same deal as Etihad in many ways and this is just a totally screwy implementation. I didn't count the obfuscated chars on the change password page but evidently there were only 20 of them and not 30, and that implementation fundamentally broke my ability to use a password manager to create a new password. Frankly, I would have preferred to see them simply truncate it on the back end and let me live with the false sense of security that I had 30 chars when I only had 20. Or even notify me when they saw 20 chars appear in the form without a single key press that "hey, looks like you just generated a password, thought you'd like to know that we're gonna screw that up on you". Just breaking my login is ridiculous.
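To make the failure mode concrete, here's a hypothetical sketch of the anti-pattern (SHA-256 simply stands in for whatever PayPal actually does server side): registration silently truncates to the field's maxlength while login hashes whatever you typed, so a freshly generated 30 character password can never get you in.

    import hashlib

    MAXLENGTH = 20  # the registration field's maxlength; login has no such cap

    def store_password(password):
        # The browser truncates to 20 chars before the form ever submits
        return hashlib.sha256(password[:MAXLENGTH].encode()).hexdigest()

    def check_login(password, stored):
        # Login hashes the full string, so all 30 generated chars get compared
        return hashlib.sha256(password.encode()).hexdigest() == stored

    stored = store_password("u" * 30)
    print(check_login("u" * 30, stored))  # False - locked out
    print(check_login("u" * 20, stored))  # True - chop off the last 10 chars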

HTTPS remains hard (except it's not)

Let's go back to Etihad because after seeing their screwy password approach, I was pretty sure other things would be broken too. They didn't disappoint:

So why is this? A quick look at the console tells all:

[Screenshot: the browser console showing mixed content warnings]

Let me demonstrate the problem and then the fix. This lovely flight attendant may not actually be a flight attendant at all - I can't trust her because she was loaded insecurely:

[Screenshot: the flight attendant image loaded insecurely]

However, this one really is a legitimate flight attendant as she was loaded over a secure connection:

[Screenshot: the same image loaded over HTTPS]

This image is the one causing the first warning in the console image above and it can very happily be served over HTTPS. We can see her embedded in the page as follows:

[Screenshot: the image reference in the page source using the http:// scheme]

The HTTP scheme is explicitly used instead of the HTTPS one. Had they used the HTTPS scheme or, even better, a protocol relative link with a bit of src="//media.etihad... then everything would have been fine. It's such a pointless mistake made even worse due to the fact that she only even appears when you hover over a social link in the footer (and even then, her image hasn't been correctly sized):

[Screenshot: the footer social link hover where the image appears]

It's not just Etihad though, for some time now I've used Qantas in many of my workshops to illustrate the same thing:

[Screenshot: Qantas serving mixed content]

In fact, this is a great demo - go to https://www.qantas.com and watch as Chrome momentarily shows a green padlock... then yoinks it away! Actually, Qantas is even worse in many ways as the page doesn't load securely by default yet includes your frequent flyer login. Ah well, it's not like you'll be using an airline's website in insecure places like, oh, on free public wifi at an airport...
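If you want to hunt for this on your own pages, a rough sketch like this will find explicitly insecure embedded resources (a crude regex rather than a proper HTML parser, and the URL is just an example, but it's good enough for a quick look):

    import re
    import urllib.request

    def find_insecure_resources(url):
        # Pull down the page and look for subresources loaded over plain HTTP
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        return sorted(set(re.findall(r'src=["\'](http://[^"\']+)', html)))

    for resource in find_insecure_resources("https://www.example.com"):
        print(resource)

Anything that prints is a candidate for the explicit HTTPS (or protocol relative) treatment above.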

But Qantas is worse still because of this:

[Screenshot: the Qantas search form posting to an insecure address]

They're posting off to an insecure form! But they can't do it securely in the first place (yet) anyway because the host at qantas.resultspage.com.au doesn't have a valid certificate - it's only valid for *.resultspage.com which doesn't cover the .com.au address.

This is all so, so simple yet here we are, unable to get the most fundamental aspects of web security right. At least there's good training material available on the web...

Crazy password advice (and censoring critics)

Continuing my day of crazy app sec, someone sent me over a link to a piece on how to create your own encrypted password using PHP. Of particular interest was the following piece of code; take a moment to absorb this:

[Screenshot: the PHP character substitution code]

This is simply character substitution - for each character in the input string, replace it with another character via a convoluted series of substring, modulus and ASCII value operations. Repeat in reverse order to "decrypt". This is reminiscent of my post from a few years ago about The problem with website security is us! which shows similar screwy approaches.
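To be clear about why this fails, here's a toy reconstruction of the pattern in Python (not the author's actual code, just the same class of scheme): it's a keyed substitution, so anyone who grabs the copy-pasted key - which ships with the code - can reverse every "encrypted" password in the database.

    # A toy version of the anti-pattern: shift each character by a key-derived
    # amount. This is reversible substitution, not encryption in any useful sense.
    KEY = "s3cret"  # hypothetical; the original post hard-coded a key too

    def toy_encrypt(plain):
        return "".join(chr((ord(c) + ord(KEY[i % len(KEY)])) % 256)
                       for i, c in enumerate(plain))

    def toy_decrypt(cipher):
        return "".join(chr((ord(c) - ord(KEY[i % len(KEY)])) % 256)
                       for i, c in enumerate(cipher))

    print(toy_decrypt(toy_encrypt("hunter2")))  # hunter2 - trivially reversed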

I almost wouldn't have bothered including the story in this post because hey, we've all done some screwy code things in the past but we learn and move on, right? Except in this case, the guy started deleting comments. On a hunch that this might occur, I snapped a quick screen cap earlier on:

[Screenshot: the comments before they were deleted]

Now I can see how some of these comments could feel very confrontational. There's a way to give feedback which can be both kind and constructive and whilst I appreciate the passion with which these (and others out of screen) responded, they probably could have done so a little more... diplomatically.

But what frustrates me far more is that despite the comments outlining the problems, not only did nothing change in the original post but the author elected to delete those comments so that others couldn't see the criticism. This is reckless - people will Google "encrypt passwords in php" and potentially find this piece and literally copy and paste the whole thing (betcha they even copy the "key"). I've actually had a post in draft for a while titled "We all get stuff wrong, now (wo)man up and take it constructively" and it refers to this exact pattern - a badly written post (albeit one that's an innocent mistake) and then the deliberate censoring of critics. I appreciate my calling him out here draws further attention to the issue and that itself may be uncomfortable for him, but I'm more worried about the downstream impact on others who read this without knowing what's wrong with it than I am about hurting the feelings of someone who won't accept criticism.

With the hope of leaving kind, constructive and relevant advice, I posted a comment myself which I'll leave here just in case it too disappears:

[Screenshot: my comment on the post]

Update: The following day, the author updated the post with a bcrypt example and acknowledged the comment I left on his site (which still remains at the time of writing). Kudos to him!

Security is often mixed, crazy messaging

Then there was Pandora and how's this for a bunch of screwy, mixed messaging: we start with the piece I published to my Security Sense column earlier today about over-communicating breach incidents. In there, I talk about this message from Pandora which I received earlier in the week:

[Screenshot: Pandora's breach notification email]

Read that piece mentioned above for why I don't think what they've done here is wise, but that's not the real issue I have with them. It all started when I went to the site to log in and take a look-see at what was new there:

[Screenshot: Pandora requiring Flash to be enabled before login]

Ok - hang on - just to log in (which many people would do to then change their password after receiving the above email), I need to enable one of the most frequently exploited pieces of code on the planet?! In an era where we've all unanimously agreed that Flash is not just on the way out but should by now be well and truly pushing up daisies, this is hard to fathom. Yes, I get that their legacy music player has a dependency on Flash but I'm not trying to play music, I'm just trying to log in!

Ok, I'll humour them, let's enable Flash then login:

Oh c'mon! The screwy thing about this is that login is still just an HTML form, but you can't use it unless you enable Flash.

Because security is just pointless

Or at least that's how it feels sometimes. For all the progress we make in so many areas, seeing a flood of screwy things in one day just makes you shake your head. It'd be depressing if it weren't for the fact that all the sorts of behaviours you see here have given me a very fulfilling career!

I'll leave you with one last screwy security thing of the day:

As if the very premise of civilians having guns in the first place wasn't already an unfathomable concept for most of the world that doesn't live in the US (and a good whack of those who do, by all accounts), someone went and made a wifi enabled version! And because wifi isn't inherently insecure enough as it is, someone's gone and connected a consumer "device" to it that can fire bullets. And holy shit, look at the size of this thing!

Not exactly "for self-defence" or "should the British invade"! And then it talks over 802.11. And it can be hacked. But it's ok so long as hackers don't get within 100 feet. I think I'm over security for today...

I'm a car enthusiast - bring on self-driving cars!


I've had this post in mind for a while now. It's a little tangential to the sort of stuff I'd normally write, yet it's something I'm passionate about and it's become more topical in the last few days. The catalyst for finally completing this piece came after last week's reporting of the first death in a Tesla operating under "auto pilot" (although the incident itself occurred in May). There are many very interesting angles here, including the way technology and the automotive world intersect.

Obviously I'm very active in the technology scene but my involvement with cars is less well known (well, perhaps beyond the recent Nissan LEAF incident, that is). They were a passion as a kid and as an adult, they've been somewhat of an obsession at times. Of most relevance to this piece, I spent the best part of a decade actively involved in club level motorsport, spending every other weekend at a racetrack. They were fun times that only ever slowed down when kids arrived, but a good whack of that decade was spent on circuits, skid pans, dirt rally courses and hill climbs. About five of those years were as an instructor driving many different cars within racetrack environments. Not in the road-based "here's how to indicate and read road signs" way either, rather "here are the thresholds of braking, this is how the weight distribution of the car shifts and affects dynamics" and many other nuances most people never think about. It meant I got to spend a lot of time doing (and teaching others to do) stuff like this:

[Photo: on track in the STi]

The STi was one of the cars I had during this time and it was sensational for track events like this. The joy of being able to do this in a controlled and legal environment meant much time was spent doing things that would never be acceptable on the road and in turn, you learn things you'll never learn on the road.

These days, I drive something a very long way away from the self-driving cars I'll ultimately talk about in this post:

[Photo: the Nissan GT-R]

The Nissan GT-R is one of the few things that can make a Tesla Model S feel slow (and yes, I've driven several of them too), not necessarily just in terms of low-speed acceleration (I'll come back to that later), but across a much broader gamut of performance metrics. I'm getting off topic though, let's get on with the point of all this.

Driving aids are aids, not human replacements

If we just look at, say, the last couple of decades, consider how many driving aids have now become mainstream. ABS is one of the most significant that comes to mind; stability control systems (including traction control) are another. Many other smaller but important contributions have also come along, things like blind spot indicators and Mercedes' funky Pre-Safe implementation.

ABS is a great example that illustrates my point here and if you have a vehicle built in the last 10 years, there's a high chance it's equipped with this. Before ABS, the temptation most people have in an emergency braking situation is just to jump on the anchors as hard as possible. The vast majority of people have very little idea of what the threshold of their brakes is, so the wheels would lock, traction would be broken and braking distances prolonged. It also means they could no longer steer; once the wheels are locked, directional control is gone and the only way to get it back is to come back off the brakes, which is entirely counter-intuitive when you're careering towards an obstacle. If you've never given it much thought, here's a simple explanation of how ABS works:

Every now and then you'll still hear the screech of tyres as the driver suddenly realises they're approaching stationary traffic and you'll always turn and see an older non ABS-equipped car with a shaken driver. Anti-lock brakes fundamentally change this equation as the driver's natural reaction to over-brake no longer leads to lockup. It has undoubtedly saved many lives and even more rear-enders and it asks nothing of the driver to be effective - they just do what they always did. But here's the point I'm getting to - ABS does not suspend the laws of physics. There is a velocity beyond which - given the road conditions, vehicle weight, tyre condition and a bunch of other factors - no braking system can stop you before you run out of braking distance.
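As a back of the envelope illustration (a simple friction model that ignores reaction time and load transfer, so the numbers are indicative only), stopping distance grows with the square of speed, which is exactly why no ABS can save you once you're carrying enough velocity:

    # Idealised stopping distance: d = v^2 / (2 * mu * g), where mu is the
    # tyre/road friction coefficient. ABS keeps you near the best available
    # mu; it can't change the physics.
    G = 9.81

    def stopping_distance_m(speed_kmh, mu):
        v = speed_kmh / 3.6  # km/h to m/s
        return v ** 2 / (2 * mu * G)

    for speed in (60, 100, 140):
        print(speed, "km/h:",
              round(stopping_distance_m(speed, mu=0.7), 1), "m dry,",
              round(stopping_distance_m(speed, mu=0.4), 1), "m wet")

Double your speed and you quadruple the distance, whatever the electronics are doing.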

Driving aids are just that - aids. You still need judgement, experience and frankly, common sense. You're still ultimately in control. It's when the driver's expectation of the technology exceeds its ability that we have a problem.

The fallibility of "auto" things (that aren't really fully auto)

Here's a great video example relating to Volvo's Pedestrian Detection functionality:

Naturally, there's more to this incident than immediately meets the eye but regardless of what was or was not enabled on the vehicle at the time, the fact remains that the driver didn't intervene of their own accord and use the brake pedal; they relied entirely on the technology... and it went wrong. Indeed, that was inevitably the purpose of the demo - for the driver not to intervene - but that doesn't change the nature of the outcome.

And this is what worried me when I first saw the Tesla Autopilot situation. Like ABS and indeed like Volvo's solution above, this is a really neat driving aid. But it's no more than that - an aid - and Tesla themselves are pretty adamant about the way it should be used. In their piece on the tragedy that unfolded last week, they had this to say about Autopilot:

an assist feature that requires you to keep your hands on the steering wheel at all times

Keeping your hands on the wheel is one thing; actually paying attention and being ready to take control is quite another. But they're pretty clear on that too:

you need to maintain control and responsibility for your vehicle

They go on to say that you should "be prepared to take control at any time". Think of it like cruise control - it can be a handy aid but you're still ultimately responsible for controlling the car.

When the Tesla story broke last week, many news articles linked to a previous video of the driver who was involved in the accident. This showed Autopilot taking action as follows:

I watched this over and over again because it bugged me; rather than marvelling at Autopilot's (admittedly impressive) ability to avoid a collision, I was watching the truck. It approaches on the driver's side ahead of the Tesla, which is both in the truck's blind spot and much lower than the cabin its driver sits in. The truck driver was unequivocally in the wrong, yet... how on earth did the Tesla driver not see this coming?! What was his level of situational awareness? It certainly doesn't look like he was paying attention to the traffic around him.

The concern with driving aids such as Autopilot is that they lead to complacency. Only a few weeks ago, this video appeared of a driver literally sleeping whilst Autopilot handled the traffic for him:

As last week's news unfolds, we're learning more about the specifics of the incident. The driver of the truck the Tesla collided with has said that he could hear Harry Potter playing in the car and indeed the police have confirmed that a portable DVD player was found in the wreckage. It's pretty clear what's being implied here and again, we're yet to see official reports, but it's certainly conceivable that the driver was relying entirely on Autopilot and was not in a position to take control of the vehicle himself.

Tesla's own blog post I referenced earlier said this:

Neither Autopilot nor the driver noticed the white side of the tractor trailer against a brightly lit sky, so the brake was not applied. The high ride height of the trailer combined with its positioning across the road and the extremely rare circumstances of the impact caused the Model S to pass under the trailer, with the bottom of the trailer impacting the windshield of the Model S.

This truck was so large that the body of the Tesla could actually pass under it; it was the windshield that took the impact with the trailer. And the incident occurred on a day with a "brightly lit sky", not at night or under inclement conditions. This is not a vehicle an attentive driver doesn't see.

Unfortunately, much of the reporting of this incident is casting Tesla in a negative light which is entirely undue in this situation. For example, CNN claiming the Tesla crash highlights real problem behind self-driving cars. That whole story is bizarre, claiming that "had the tractor-trailer also been driven by computer, it could have been on the same network as the Tesla" and then continuing on to imply that the accident occurred because the truck wasn't self-driving. The Tesla system is obviously good - very good - as a driving aid but whilst there remains a requirement for human intervention, humans will need to remain alert and attentive. Getting to the title of this post though, I'm actually very excited about a time when they won't need to be...

Bring on self-driving cars (for other people)

For many people, a car is a utilitarian device. It gets them between two points and that's about the end of it. They drive because they need to, not because they want to. So many of the videos of Tesla's Autopilot already circulating around the web demonstrate this perfectly; drivers want to sleep, watch movies, read books and generally do anything but actually drive. And I think they should be able to... eventually.

What I love about the inevitability of a future filled with self-driving cars is that those who have no interest in driving will no longer have to. I've seen time and time again how under-skilled even drivers who are actually interested in driving can be (most of my instructing experience was in a club for people who love cars); those who merely passed a test (possibly decades ago) often have some pretty serious (metaphorical) blind spots in their ability. In the very near future, we'll get self-driving cars to the point where they well and truly exceed the ability of your average human. I mean c'mon, wouldn't we all like to get these people out from behind the wheel?!

There's all sorts of issues to overcome well above and beyond the technology itself (wait until the first fully self-driving car kills someone...) but we all know the reality is on the very near horizon. Make no mistake though - Tesla Autopilot is not that reality and they're very explicit about that.

As someone who genuinely enjoys driving, I love the idea of other people being driven around by their cars. I love that they'll be able to drive at a consistent speed at the legal limit or to the prevailing conditions. I love that they'll keep left unless overtaking (or right, depending on where you are). I love that they won't tailgate, they won't try to close out an empty spot towards the end of an overtaking lane and they won't dawdle around indecisively. And the reality is that most of the drivers in those cars will love that too.

Enthusiasts will still buy enthusiasts cars

Some people got a bit cranky when I said this a few months ago:

With the benefit of hindsight, perhaps I should have said "driving enthusiasts" rather than "passionate car folk" because certainly there are many passionate Tesla owners. They love the environmental impact, the way it's packaged together (check out the use of space if you haven't seen it, it's massively impressive) and they love the high tech bits. But the "enthusiasts" I refer to - the driving enthusiasts - are those that value the finesse of, say, a McLaren 675LT (plus it has doors that go like this \/ which makes it awesome):

[Photo: the McLaren 675LT]

You don't have to go stupid money in that regard either, I've had an absolute ball on racetracks in cars like the Toyota 86:

[Photo: the Toyota 86 on track]

And Mazda's MX5:

[Photo: the Mazda MX5 on track]

In fact, that one was my wife's, seen here with yours truly taking my father for hot laps. Woefully under-powered but such a sweet, sweet balance in a car that weighed only about a tonne. It was an absolute hoot to drive on the track.

Driving enthusiasts go weak at the knees when they hear an engine like a Mercedes-AMG GT R:

They lust after the lightness of something like a Porsche GT3 RS at only 1.4 tonnes:

[Photo: the Porsche GT3 RS]

For context, a P90D Tesla is pushing about 2.2 tonnes which is the same as a Range Rover Evoque:

[Image: a Range Rover Evoque]

...whilst towing a jet ski (including the trailer weight):

[Image: a jet ski on a trailer]

And yes, the Tesla puts it all down very low to help keep the centre of gravity where it should be which is great, but you simply cannot overcome the massive inertia of an SUV towing a jet ski no matter how good your CG is. This is particularly apparent in cornering and braking (heaps of torque and no gearbox still do great things for acceleration), but it's not designed to go fast across the gamut of performance benchmarks which is inevitably why they so frequently go into "limp mode" when pushed on a racetrack. None of that matters though for people who don't want to use it for these purposes. In fact, this is the joy of the Tesla and the aspects of it that I've enjoyed very much when I've driven them; it has few peers when it comes to wafting around town in silence and indeed when accelerating at socially responsible speeds too.

For my money, when it comes to having a family-friendly car, I very nearly bought a Mercedes-AMG C63S Estate recently (video timed to the bits that show why I love it so much):

I probably still will buy it when it finally makes sense to dispose of the incumbent family car (it certainly won't be replacing the GT-R!) because it's a lovely thing to drive. Fans of electric vehicles will bemoan the presence of a gearbox and an internal combustion engine because they have a different set of priorities. For myself and other driving enthusiasts, it's a combination of the noise (that crackle on throttle lift and blips on downshifts, oh boy...), the rear-drive behaviour and the feel of how the thing responds (it's half a tonne lighter than a Tesla); it triggers an emotional reaction in us, if you like. And it doesn't matter one iota to people like myself that you can't drive that or the GT-R the way Chris Harris does in the video above 99% of the time, because it feels "special" to those who value these attributes 100% of the time.

Summary

I can't wait for the day when those who want to be driven can be, in cars designed to do just that, whether it be an evolution of the Model S or something more akin to Google's self-driving car:

These don't require "driver" attentiveness because there's nothing for them to control anyway! They can read, watch movies and take naps to their hearts' content. These cars will have massively broad-reaching ramifications we're yet to even grasp, everything from how we commute to where we live and work, not to mention challenging the very premise of car ownership itself.

Fully autonomous vehicles will not only allow people who don't want to drive to better enjoy being a passenger, but also allow those who do to get more out of being a driver. Until then, no matter how good the driving aids are, we're all still drivers.


Everything you need to know about loading a free Let's Encrypt certificate into an Azure website


Let us start with what's wrong with the world today, and that's certificate authorities. Just take a look at the trusted root CAs running on a Windows 10 machine:

[Screenshot: trusted root CAs on a Windows 10 machine]

The very premise of having these root CAs on your machine is that they ultimately get to decide which websites your browser will consider to have a valid SSL certificate. The root CAs serve other purposes too, but that's what I'm especially interested in here.

Edit: As Tom points out below, there are hundreds of other root certs the OS will happily trust as required. Microsoft documents this on the Microsoft Trusted Root Certificate Program page.

Now here's the point I'm driving at - if QuoVadis wants to sign a certificate that's loaded onto a website which then serves traffic over HTTPS, your browser will validate the certificate and be entirely happy with it. I have no idea who these guys are. Ok, I do now courtesy of Wikipedia (they're HQ'd in Bermuda), but the point is that they have a great deal of control over the trust my machine establishes in other websites and they're one of dozens in the local root store.

A more established name in the business is Comodo. In fact, they're so established that they issue more than 40% of all certificates on the web, which is nearly double second place (Symantec with 23%) and four times third place (GoDaddy with 10%). Certificates are big business too and Comodo sells a lot of these:

[Screenshot: Comodo certificate pricing]

That's within the realms of standard market pricing too. StartCom are $60 a year, GoDaddy is $70 and others charge varying amounts for various classes of cert. It's a lucrative business and one the incumbent CAs want to protect. In fact, they want to protect it so much that Comodo even tried to trademark variations of Let's Encrypt's name, such is the threat of the new upstart we're going to talk about here. As if that wasn't audacious enough, Comodo's CEO defended his company's position in attempting to trademark Let's Encrypt by saying "How can you prove it was them who made it up?". They ultimately realised they'd totally misread the market (the whole thing could only be referred to as "a dick move") and they retracted their claim to the name.

But let's get onto Let's Encrypt and talk about why it's upsetting the industry so much. There are several problems with the way the entire CA model has handled certificates in times gone by:

  1. The price poses a barrier to entry. Even at only tens of dollars a year, people have shied away from buying them.
  2. The setup can be clunky. Read through my post on The complete guide to loading a free SSL certificate into an Azure website if you've either not experienced the joy yourself or think that it's an easy thing.
  3. They need to be renewed. This compounds both the points above as you have to keep doing it and keep paying for it. People also forget too - there are many tales of woe where certificates have expired and things have broken because someone forgot to renew them (a quick check like the sketch below makes expiry easy to monitor).
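On that last point, keeping an eye on expiry dates is easy enough to script. Here's a minimal sketch using nothing but Python's standard library (the host name is just an example):

    import socket
    import ssl
    from datetime import datetime

    def cert_expiry(host, port=443):
        # Return the notAfter date of the certificate the host serves
        ctx = ssl.create_default_context()
        with ctx.wrap_socket(socket.create_connection((host, port)),
                             server_hostname=host) as s:
            cert = s.getpeercert()
        # notAfter looks like 'Sep 12 23:59:59 2016 GMT'
        return datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")

    print(cert_expiry("haveibeenpwned.com"))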

Let's Encrypt sets out to address these issues and the banner on their site makes it pretty clear how:

[Screenshot: the Let's Encrypt homepage banner]

I'm going to steal directly from their about page to highlight how they're doing this:

  • Free: Anyone who owns a domain name can use Let’s Encrypt to obtain a trusted certificate at zero cost.
  • Automatic: Software running on a web server can interact with Let’s Encrypt to painlessly obtain a certificate, securely configure it for use, and automatically take care of renewal.
  • Secure: Let’s Encrypt will serve as a platform for advancing TLS security best practices, both on the CA side and by helping site operators properly secure their servers.
  • Transparent: All certificates issued or revoked will be publicly recorded and available for anyone to inspect.
  • Open: The automatic issuance and renewal protocol will be published as an open standard that others can adopt.
  • Cooperative: Much like the underlying Internet protocols themselves, Let’s Encrypt is a joint effort to benefit the community, beyond the control of any one organization.

In a nutshell, it's an automatic way of both installing and renewing certificates and it doesn't cost you a cent. It's no less secure than the likes of Comodo (don't let claims to the contrary fool you) and since it kicked off late last year, they've now issued over 5 million certificates, so the service has been well and truly tested. The certificates are only valid for three months, but when renewal is automatic that doesn't really matter; it just ticks over to a new one at the appropriate time.

In this blog post I'm going to ultimately use the Azure Let's Encrypt Site Extension. There's some great documentation provided with this extension and what I'm going to do here largely adheres to that so full credit to them for making this easy for me. I'll add a number of things along the way that I think are relevant, particularly to understanding SSL and the role Let's Encrypt plays. Further to that, when I sat down to write this I'd never actually deployed a Let's Encrypt cert before; I write these blog posts as much to get my own thoughts straight when I'm learning something new as I do to help others! Regardless, it should be really useful info for those who come after me.

Enough background, let's make the thing work.

Prerequisites

I'm going to be running this process on an Azure website so firstly, you need one of them. This is the PaaS website offering I'm so fond of and it's exactly what I run Have I been pwned (HIBP) on. However, for this exercise, I'm going to be loading a new certificate into Hack Yourself First, the intentionally vulnerable website I use in so many of my Pluralsight courses and in my workshops. The cert from StartCom was coming up for renewal so I decided that now was a good time to use Let's Encrypt "in anger".

You also need to have an Azure Storage account. We'll need this towards the end of the setup process when we eventually have a WebJob running and it requires the storage account in order to store logs. You may already have one of these for your website but if you don't, go set one up first. The pricing for the tiny amount of storage we're going to use is as good as zero (literally in the order of cents per month), so don't worry about that.

Right, so that's everything you need before doing anything, let's get on with actually making Let's Encrypt work in an Azure website.

Creating an Azure Service Principal

Remember that bit about how Let's Encrypt is "automated"? Making all this work means we're going to need to enable a process to run within Azure under an identity and to do that we need an Active Directory service principal. In simple terms, this is just a user in your existing AD implementation (everyone with an Azure account already has one) that will be given the required rights to set the Let's Encrypt cert up.

To do this, as of the time of writing we still need to jump over to the old portal (or "classic portal", as Microsoft refers to it). You can PowerShell it if you prefer, but we'll use the GUI here. We'll start here on the left hand navigation:

[Screenshot: Active Directory in the classic portal's left hand navigation]

Here's what I presently see within my AD:

[Screenshot: the directory listing]

Drilling down into that single entry, I'll now go to my existing applications:

[Screenshot: the existing applications]

What we're going to do now is add a new application for Let's Encrypt via the add button in the footer of the page:

[Screenshot: the add application button]

This will be an app that "my organisation is developing":

[Screenshot: choosing the application type]

And then we'll give it a sensible name:

[Screenshot: naming the application]

Lastly, we need to define a couple of URLs that would normally relate to how you sign into the AD app we're creating. But we're never going to actually need these addresses so you can go ahead and put any valid URL you like in there; here's what I did:

[Screenshot: the sign-on and app ID URLs]

Eventually, we're going to need to allow the site extension that does the Let's Encrypt magic to sign in using this service principal and to do that we need two things. The first is the client ID of the app we just created and that's down under the "configure" option in the nav:

[Screenshot: the client ID under the configure option]

The second is that we need a key that can be used to authenticate the client principal, so scroll down a little bit further and add one of those:

[Screenshot: creating a key for the service principal]

Per the description in the image above, once you hit the save button you'll actually have a value you can copy. Take a note of that and then we're done with the service principal setup.

Granting the service principal access to the resource group

A quick "Azure hierarchy 101":

You have an Azure account.

The account may have many subscriptions.

A subscription may have many resource groups.

A resource group may have many resources.

Get this straight in your head and you're all good. Mix it up and, well, it all gets very confusing. Actually, just on that, do try and keep the naming of your resource groups fairly logical as it's real easy for them to get out of control (ask me how I know...) and that makes things rather confusing. Remember, a resource group should contain logically associated resources (for example, a project with a website, database, storage etc.) and you can then assign role based access controls to that group.

Moving on, we want to grab the resource group that has the app service plan which is the logical machine(s) the site is running on, plus you'll find the site itself should be in there as well. Because I'm not always real good at taking my own advice, my resource group is still named "Default-Web-WestUS" and we're going to drill down into the access controls of the resource group by clicking on the two little heads:

[Screenshot: the resource group's access controls]

This is where we now bring in the service principal and we're going to do that by adding a user:

[Screenshot: adding a user to the resource group]

Now drill down into the roles and select "Contributor" then add a user and start typing the name of the service principal. This is important as it won't be listed there on its own. Here's what you should now see:

[Screenshot: assigning the Contributor role to the service principal]

Select, ok, wait a bit and you'll shortly see a success message:

[Screenshot: the success message]

So now we have a service principal which has the correct rights in the resource group containing the website we want to create the cert for. There's just one more thing we need to get right before we can setup Let's Encrypt itself.

Configuring the storage account

I mentioned earlier on in the prerequisites section that we'd need a storage account. It's a little bit more than that though as we also need to configure the website so that WebJobs running on it can communicate with the storage account. If you've used WebJobs before then this will be pretty familiar stuff, but let's run through it anyway.

Jump over to your storage account and copy the primary connection string:

[Screenshot: the storage account's primary connection string]

Now head over to the website that we'll ultimately be installing the new certificate on and create two connection strings, one called "AzureWebJobsDashboard" and another called "AzureWebJobsStorage", both with the value copied from the storage account above and both configured as "Custom" connection strings:

[Screenshot: the two custom connection strings on the website]

That's all the native Azure bits done, it's time to make Let's Encrypt work.

Configuring the site extension

This is really the guts of making Let's Encrypt work; everything until now was just setup. We're going to start by jumping into Kudu:

[Screenshot: launching Kudu]

If you're using Azure websites and not already using Kudu, you're missing out as there's a heap of good stuff in there. I won't go into it all now, suffice to say that if you head over to https://{your website name}.scm.azurewebsites.net you get a bunch of awesome stuff. One of those awesome things is site extensions and within there you'll find one called "Azure Let's Encrypt (x86)":

[Screenshot: the Azure Let's Encrypt site extension in Kudu]

Hit the "+" button on that (make sure you choose the x86 version unless you've moved away from the default website configuration and are running 64 bit) and then tell it to install. You'll get an indicator showing that stuff is happening:

[Screenshot: the extension installing]

And when it's done, you'll get a play button:

[Screenshot: the play button on the installed extension]

Now this is important - if you hit the play button you'll likely see a rather unfriendly error about "No route registered for '/letsencrypt/'". Jump back into the Azure portal, find the website and stop it:

[Screenshot: stopping the website in the portal]

Now start it again. Don't restart it, it needs a completely cold stop / start cycle and yes, this will obviously take the site offline for a few seconds too. Once that's done, head back to the site extension and hit that play button again after which the new extension should be running just fine:

[Screenshot: the extension running]

There's a bit going on here so I've highlighted the most important bit - the key value pairs on the bottom part of the screen are the app settings you need to configure on the website. Further down the page you'll see a bunch of empty text fields:

[Screenshot: the empty configuration fields]

We'll come back to these later; for now, let's just agree that there's nothing in them. Now it all gets a bit painful because we need to retrieve a bunch of information from a variety of different sources then create app settings in the website, after which the empty text boxes you see above will populate with the correct values.

We'll start with the tenant name which you'll find under your identity in the top right of the portal (not in Kudu):

[Screenshot: the tenant name in the top right of the portal]

You'll find it'll be something like {my name}.onmicrosoft.com so grab that and create an app setting which should now look something like this:

[Screenshot: the tenant app setting]

Next up is the subscription ID and you can grab that from a bunch of places, easiest is probably from the website "Essentials" panel:

[Screenshot: the subscription ID in the Essentials panel]

Grab that guy and drop it into another app setting (I won't screen cap them all individually).

The next two are easy - the client ID and the client secret - and they're the ones we have from when the service principal was created. They now go into the next two app settings.

Nearly there, next one is the resource group name and we've seen that multiple places now, including in the image above with the subscription ID. That's all we need, we can skip the service plan resource group name entry. When it's all done, it should look something like this:

[Screenshot: all the app settings in place]

Right, save all that, jump back to the Let's Encrypt site extension in Kudu and give the page a refresh. Those text fields which were empty before should now look like this:

[Screenshot: the configuration fields now populated]

Now we're getting to the pointy end of all this. Hit "Next" and you'll see a summary of the existing hostnames and SSL bindings on the site:

[Screenshot: the existing hostnames and SSL bindings]

Hit "Next" again and now we're finally at the stage of actually requesting the certificate. Choose the hostname you actually want, enter your email then request and install the certificate:

[Screenshot: requesting the certificate]

And then you wait... for what feels like ages until:

[Screenshot: the certificate successfully installed]

Hang on - is it really that easy?! Ok, there was a lot of earlier work but was installing the certificate itself as simple as hitting one button? Let's check out the portal:

[Screenshot: the certificates listed in the portal]

There are two certs here; the one I haven't highlighted expires on the 14th of July and that's the StartCom one I mentioned earlier on in the blog. The highlighted one is the new one from Let's Encrypt which will expire only a few months from now (remember, they're all 90 day certs). We can then see the thumbprint of that cert appear in the SSL bindings further down. All this means that if we now go to the website and check the cert:

[Screenshot: the Let's Encrypt certificate shown in the browser]

Success! It took some effort, but the whole thing is now up and running just the way I'd expect it to be. Don't just do this and say "ooh, magic certificate!"; rather, do have a read of Let's Encrypt's How it Works page. It's a very short read and it's important to understand how the website communicates with their service in order to identify itself and ultimately receive a certificate.
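If you're curious what that identification step actually looks like, here's a conceptual sketch of the HTTP-01 challenge in Python (the token and thumbprint values are illustrative only, and the site extension handles all of this for you): the CA issues a token, then fetches it back from a well-known path to prove you control the host.

    # Conceptual sketch of the ACME HTTP-01 challenge. Needs to listen on
    # port 80, so run with appropriate privileges.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    TOKEN = "example-token"  # issued by the CA for each challenge
    KEY_AUTH = TOKEN + "." + "account-key-thumbprint"  # illustrative value

    class AcmeChallengeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/.well-known/acme-challenge/" + TOKEN:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(KEY_AUTH.encode())
            else:
                self.send_response(404)
                self.end_headers()

    HTTPServer(("", 80), AcmeChallengeHandler).serve_forever()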

But getting back to that certificate, it'll expire in a few months and Let's Encrypt is meant to automate the renewal process, so how does this work? Let's take a look.

Automatic renewal with WebJobs

I've long espoused the virtues of Azure WebJobs because they're easy to implement, highly resilient and free to run on an existing website service (amongst other things). WebJobs are also the mechanics behind how the Let's Encrypt site extension renews certificates.

You can view the WebJobs running on an Azure website via Kudu, in fact you'll see a link in the title bar of the images above when I was configuring the site extension. Drilling down into those WebJobs, I can see a brand new one from Let's Encrypt:

[Screenshot: the new Let's Encrypt WebJob]

You can drill down into that job and see a couple of functions running:

[Screenshot: the WebJob's functions]

The names of those give you a good idea of what they're intended to do and inevitably, a few months from now, I'll see output in the jobs suggesting that a new certificate has been created and installed. I hope. If all goes to plan, which kinda brings me to the next point...

Proceed with caution...

Here's the first problem I have with all this - time. Time to figure out the Let's Encrypt paradigm within Azure, time to configure it all, time to make sure what you might uncharitably call "a kludge of different components" all work nice together. I've had people say to me many times that Let's Encrypt makes SSL "free" but that's just not the case and it totally neglects the total cost of ownership of making this thing work. I originally saw my mate Nik Molnar's post on this earlier in the year and thought "holy shit, that's a lot of work". And relatively speaking, it was. From a pure ROI perspective, spending 70 bucks on a certificate bought via the Azure portal almost certainly makes more sense, at least for the first year until the manual renewal process of a traditional cert is required.

The second problem I have with this is that there's a lot of moving parts keeping this whole thing alive. The service principal needs to exist and have rights within the website which must be configured to access the storage account which will be written to via the WebJob which must have access to the correct connection strings and app settings in the website. I have the ability to screw any of these things up myself well after I've already configured them and indeed it's the sort of thing I can see someone inadvertently doing ("Why do I have a storage account for that site? I'm sure I don't need it, let me get rid of that..."). Microsoft themselves recently had a glitch which deleted all my app settings from HIBP (it was publicly acknowledged and affected a small number of customers), so you can see my concern. If any of this fails, the cert won't be renewed and disaster will ensue when people realise they can't access the website. Conversely, there's a lot of manual mucking around every time you renew a traditional cert anyway, although you only have to get that right once a year as opposed to all the processes described here needing to work in perfect harmony 4 times a year.

To be completely fair to Let's Encrypt, this is a much easier process when you have root on a Linux box. That's the level of ease I'd love to see in Azure but we're comparing total control of an environment to managing a PaaS offering here. The joy of PaaS though is that there's the potential to abstract all the hard work away and offer it as a service so certainly that could be on the road map for Azure.

You know what I'd really like? I'd like this:

[Image: Let's Encrypt as a native option in the Azure portal]

I'd love Microsoft to get on board with Let's Encrypt and make it a first class citizen within the Azure portal (and I'll definitely be raising that with them too). Make it properly automated within the portal and run the cert renewal as a process behind the user-configurable scenes. As an end-user, I'd love to simply say "give me a cert" and it's a single button click then job done.

At this stage, I'd still seriously consider following the CloudFlare approach I wrote about last year. It's significantly easier to setup, it's a well-proven model and it's also still free and includes automatic renewals. Plus, it does a heap of other security and perf things too and yes, some people will bemoan that it may not always encrypt all the way through to the origin (that's an issue I'll take up in another blog post) but it sure is a hell of a lot easier and less likely to go wrong.

Until we see the Let's Encrypt process described above fully automated and really owned by Microsoft, I'd be cautious about using it on an important site. Having said that, I'd still encourage anyone comfortable with all of this to back Let's Encrypt and give them the support they need to keep on growing. Let's remember this is early days - we're literally only 8 months into the service - so they've done remarkable things already in such a short time and they will be a significant part of the SSL landscape in the future. Ultimately though, whether it's Let's Encrypt or Comodo or CloudFlare, the march forward from no transport layer security whatsoever is a very positive shift for the web indeed.

Introducing unverified breaches to Have I been pwned


Data breaches can be shady business. There's obviously the issue of sites being hacked in the first place which is not just shady, but downright illegal. Then there's the way this information is redistributed, the anonymous identities that deal with it and the various motives people have for bringing this data into the public eye.

One of the constant challenges with the spread of data breaches is establishing what is indeed data hacked out of an organisation versus data from another source. We've seen many recent cases where representations of a data breach have been made and the claim subsequently well and truly disproved. For example, the recent case where it was claimed that 272 million accounts had been stolen from Hotmail, Yahoo, Gmail and Mail.ru. The mail providers subsequently confirmed that no, this was not the case. Same again for recent claims that there were 32 million Twitter accounts on the loose. Twitter quickly debunked this and speculation that they were obtained via malware has never been substantiated.

The first thing I try and do when I see a new data breach is establish if it's legitimate and I've written before about how I do this. Under no circumstances do I want to end up in a situation where I'm making a claim about an organisation being hacked which is then proven to be false, not only because of the potential reputation damage to the company, but because of the unnecessary angst it causes for those involved in the incident. Plus, any claims of this nature are being made by me as an identifiable individual; I'm not hiding behind the veil of anonymity and shirking any responsibility associated with getting my facts wrong. Integrity is essential, particularly in an area of security so frequently lacking it.

But here's the problem and the catalyst for writing this post: sometimes there are breaches where I just can't be certain of the authenticity, yet there are many indicators which point to an actual breach. The incident sits in that grey area between "very unlikely to be legitimate" and "almost certainly legitimate". For example, the Badoo breach. They've denied the data came from them so that in itself is an important factor to consider. That doesn't necessarily mean they're right, but it's a factor involved in my confidence level, particularly when the likes of LinkedIn and MySpace openly acknowledged the legitimacy of their recent breaches. The Badoo data itself is... eclectic. Here's the first row of the breach file:

INSERT INTO 'User66' VALUES ('11917635', '62', '0', '8', '0', 'None', '67', '7636', '265791', '0', 'W', 'No', '..::\|/::..', '', '', 'Default', 'Y3B0ZmluZHVzQHN1cGVyZXZhLml0', 'Yes', '0000-00-00 00:00:00', '', 'No', '0000-00-00 00:00:00', 'No', 'On', 'On', 'On', 'Default', 'Default', 'Default', 'On', 'On', 'On', 'Default', '11917635.onirc.cptfindus', '0e19a8bac63f97a513063dcb9a64442b', 'Default', 'UbLHyDFVtm', '1979-10-07', '29', 'M', 'No', '29', '568', '45661', '29', '0', '0', '0', '0', '0', '0', '', '0', '22555', 'enAg2oQmyS', '0', 'Yes', 'Yes', 'Yes', 'Email', '', '2013-03-14 15:03:11', '2006-12-02 00:10:37', 'No', 'Active', 'Deleted', '2009-06-05 09:38:16', '2006-12-02 00:16:14', '1990-01-01 00:00:00', '0000-00-00 00:00:00', '0000-00-00 00:00:00', 'No', 'No', '2006-12-02 00:15:17', '0000-00-00 00:00:00', '0000-00-00 00:00:00', 'Yes', 'Yes', '0', 'No', 'New', '2007-07-13 13:31:39', 'No', 'None', '0000-00-00 00:00:00', '0000-00-00 00:00:00', '0000-00-00 00:00:00', 'Changed', 'No', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'NotActive', 'NotActive', '', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', '16777216', '0', '0', '0', '', '0', 'Default', 'No', 'Default', '0', '0', '0000-00-00 00:00:00', '0', '0000-00-00 00:00:00', 'On', 'Default', 'Yes', 'Web', 'Commercial', '0000-00-00 00:00:00', 'Yes', 'Yes', '', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'Default', 'No', '0000-00-00 00:00:00', 'F', '24', '39', '10001', null, 'NOT_SET', 'No', 'Default', null, '0', 'No', '0', '0', null, null);

This implies the presence of many interesting fields, yet every subsequent row is inconsistent with this insert statement and contains significantly less information. For example, here's a sample Mailinator account (these are often disposable addresses used by individuals who are not creating genuine accounts they actually intend to use):

177459377:[redacted]@mailinator.com:0177459377:32ce6f311197613d6e77d31a66af52c0:Bah:Bah:Doo:1969-07-21:43:M:9:874:132959

Here we have an ID, email (the alias also contains the word "badoo"), username, password, first and last name (clearly fabricated above, but seemingly legitimate on many other records), what's possibly a username or alias, birth date, gender and what are likely foreign keys to other tables. The Badoo website confirms the existence of the email address via the password reset feature and that MD5 password hash has a plain text value of... "badoo".
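Verifying that last claim takes a couple of lines; here's a quick sketch that hashes the candidate password and compares it to the hash from the sample record above:

    import hashlib

    record_hash = "32ce6f311197613d6e77d31a66af52c0"  # from the row above
    print(hashlib.md5(b"badoo").hexdigest() == record_hash)

Unsalted MD5 also means that every account with the same password shares the identical hash (assuming they're all stored the same way), which makes repeated passwords in a dump like this trivial to count.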

But then when looking at the Badoo site, there's inconsistencies with the data. For example, you can't create an account with an email address that uses Mailinator:

[Screenshot: Badoo rejecting a Mailinator email address]

However, they'll happily allow one of Mailinator's hundreds of alternative domain names (such as spamhereplease.com). Now this doesn't mean that the account above didn't come from Badoo, it may simply mean that at some time after it was originally created they changed their policy on addresses and disallowed that host name. I thought I might see similar behaviour when creating a password but no, Badoo will still happily allow a password of "badoo". There are 49,941 "badoo" passwords in the dump...

This exercise gives me some degree of confidence in the legitimacy of the breach, but the same process with other records was much less conclusive. Particularly for an incident of this size, I didn't want lingering doubts so I needed to reach out further.

Over recent weeks, I've been in contact with dozens of Have I been pwned (HIBP) subscribers who are in the alleged breach. I've been using them to help sanity check the data and the results have been... mixed. With only a limited set of data available to verify whether it actually came from Badoo, I provided snippets to the alleged owner and asked them not just if the data itself was correct, but if they'd ever created an account on Badoo. Often I'd get a simple confirmation - "Yes, I had an account there and yes, the data is correct". Other times they were adamant they'd never created an account but their personal attributes were accurate. Then in some cases, none of the data was accurate.

Now negative responses don't necessarily mean that someone didn't have an account; they could have forgotten, they could have created one with another service that Badoo has since acquired or someone could have simply signed them up without their knowledge. All of these are possible. Problem is, in some cases, people would respond like this:

I get a message saying incorrect email address

This was after I suggested that one of my HIBP subscribers issue a password reset for what was allegedly their account. It's possible their email was genuinely in the system and it was simply "soft deleted" (the record was still there but merely flagged as inactive), but it's also entirely possible that Badoo has never seen this individual before. Similar story here:

I did try to ask them for password recovery since I receive your email and never had an email sent to me

But then there were confirmations from others:

Once I signed on to that site, but I did delete it

Or this slightly cryptic one:

Yes, the account has been set up but not by me. There were my details. I sent a request to the Badoo team, to removed this account. I've got a reply they have done already.

So you see the challenge in terms of verification when there are both positive and negative indicators of legitimacy. It would be irresponsible to make an outright claim that "Badoo was hacked", yet by the same token there is a very high likelihood that at least some of the data has come from them. Ultimately, the only conclusion I can emphatically reach is that the data is "unverified", which brings me to the concept of unverified data breaches in HIBP.

This is actually not a new thing; in fact, I created a UserVoice idea for it back in 2014. Actually, I did more than that: I integrated the concept of an unverified breach into the underlying data model of the system and whilst I didn't publish anything in the API documentation, there's been an "IsVerified" flag returned with the breach model for some time now. Until today, it's always returned "true", but Badoo marks the first unverified breach I've loaded.
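For those consuming the API programmatically, the flag is easy to inspect. Here's a minimal sketch in Python; the endpoint path and the user agent requirement are assumed from the v2 API documentation of the time:

import json
import urllib.request

# List every breach in HIBP and print the ones flagged as unverified
url = "https://haveibeenpwned.com/api/v2/breaches"
req = urllib.request.Request(url, headers={"User-Agent": "hibp-demo"})
with urllib.request.urlopen(req) as resp:
    breaches = json.load(resp)

for breach in breaches:
    if not breach.get("IsVerified", True):
        print(breach["Name"], "is unverified")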

Because it's unverified, it's important I indicate that whenever the breach is described in the system. The first place you'll see that is on the homepage as it's within the top 10 breaches loaded into the system in terms of size:


The next is that if you search for an email address and it appears in an unverified breach, there'll be an indicator in the description. Now this isn't possible with Badoo because being a dating site it's also flagged as "sensitive" which means you can't search for it publicly. However, those who've subscribed to HIBP's free notification service can still view everything they've been pwned in by following the link in the email they receive when signing up (you can come back and do this even if you're already subscribed):


Because it's there in the description of the incident, anyone who appears in the data breach and receives an email notification will see a clear explanation of the unverified nature of the data with a link through to this blog post. The point is that I want to ensure at every possible opportunity, the unverified status of the data is made perfectly clear.

I put a lot of thought into how to handle this incident and that, combined with reaching out to so many HIBP subscribers in the data set, has meant loading it a month after it originally appeared. One of the key factors driving this approach is that even if not all the data is accurate and some of it doesn't align with what Badoo holds in their system, this is people's personal data floating around the web and they want to know about it. There is certainly a threshold beneath which I won't load a "breach" regardless of how many people are in there - I still need to have a sufficient degree of confidence in it - but that's a judgement call I'll have to make on a case by case basis.

I harp on about this because it matters: dealing with data breaches responsibly at every turn is really, really important. Misrepresenting a data breach without doing sufficient research to establish legitimacy would be reckless and would make an already bad situation worse. This process isn't always easy, but it's the right thing to do and whilst I doubt my position on this will have much influence over the data breach handling industry in general (for want of a better term), hopefully it demonstrates that there are ways of handling these incidents that act in the best interests of all involved.

Getting to grips with cloud computing security on Pluralsight


Two of the things you'll have found me most frequently writing about on this blog are "cloud" and "security". Whilst the latter seems to have been what I've gravitated towards most in recent years, the former is something I'm very heavily involved in, particularly with my work on Have I been pwned (HIBP). I'm enormously happy to see the very last course in the Ethical Hacking series I've been building out with Pluralsight now complete, with the 8th and final one being Ethical Hacking: Cloud Computing. Overwhelmingly excited. Ecstatic!

I'll come back to why I'm so happy that the entire thing has now been wrapped up, but let me start with my favourite cloud question of all:

Is "the cloud" secure?

I love this question for a couple of reasons. Firstly, it implies that there is "a cloud", as though it is this one thing that everyone uses. Many of my readers will immediately see the flaw in the logic here; the cloud I run HIBP on is different to the cloud my iPhone backs up to which is different to the cloud you're reading this blog on (which is also served through a cloud proxy service - CloudFlare). They're very different cloud services serving very different purposes and presenting very different risk profiles.

But most of all, I love this question due to one simple fact: it's a terrible question. I love it being a terrible question because it allows us to have a discussion about what people asking it really mean. This question is often asked by people considering a move to cloud computing and these days I answer it by telling a story:

A couple of years ago in a previous career, I was called over to Beijing to help sort out some issues with a piece of software. It was a critical application for the organisation; a halo project in a behemoth multinational in the fastest growing region in the world. I spent several days with the team working on all the usual sorts of issues that large software projects face. Not security related in this case, but performance, maintainability, release management and so on and so forth. As we wrapped up with a little time to spare, one of the girls in the team asked about Azure. I did the usual dog and pony show (essentially a very compressed version of Modernizing Your Websites with Azure Platform as a Service) after which I got that exact question:

Is "the cloud" secure?

This was a great opportunity: "I don't think you're asking the right question", I replied, "Let me give you a demonstration". Now this all happened very quickly so let me summarise it in point form:

  1. I went to the test version of the application I was there to consult on
  2. I took the URL and resolved it to a host name
  3. I then browsed over to that host via Windows Explorer
  4. I discovered a shared folder that pointed to the web root of the project
  5. There were no access controls on the folder (full control to everyone)
  6. It was an ASP.NET application which contained a web.config file
  7. The web.config had an unencrypted connection string using a SQL login
  8. The SQL login used the sa account...
  9. Now I had full access to the test database server
  10. The test database server had a linked server to production
  11. The account the linked server used had full control over the production database

This was a 5-minute exercise and anyone with half a clue could have easily reproduced the same process very quickly.

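To illustrate the kind of thing sitting at the heart of steps 7 and 8, here's a hedged sketch in Python; the config fragment is fabricated for demonstration, not the client's actual file:

import xml.etree.ElementTree as ET

# A fabricated web.config fragment showing the misconfiguration: an
# unencrypted connection string using the sa login
sample = """
<configuration>
  <connectionStrings>
    <add name="Db" connectionString="Server=testdb;Database=App;User Id=sa;Password=P@ssw0rd!" />
  </connectionStrings>
</configuration>
"""

root = ET.fromstring(sample)
for cs in root.iter("add"):
    value = cs.get("connectionString", "")
    if "user id=sa" in value.lower():
        print(f"Connection string '{cs.get('name')}' uses the sa account")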

I use this example in the first module of the course because it helps set the tone of the discussion we should be having about cloud security: not treating it as some secure / insecure boolean state, but recognising that it's different. Rather than asking if the cloud is secure, here's what people should be asking instead:

How does cloud computing change the security posture of what we're doing now?

There are many aspects of a well-executed cloud strategy that would totally change the security landscape of the situation I described above. By the same token, it could have been done properly on-premises as well, but what was lacking was competency in those administering the environment. You go to the cloud and you start to get choices about how much administration you delegate back to the provider by virtue of IaaS, PaaS and SaaS offerings. If I had architectural control over this particular system, I would have thrown it all into PaaS and not given anyone the ability to open up shared folders or create linked servers or do any number of things that seriously jeopardised the security position of this project.

The fact that cloud computing leads to so many new discussions about security is really the point in terms of looking at how things change. Because of that, I wanted to create a really balanced course that looks at both the strengths and weaknesses of the cloud. Here's what's in there:

  1. Cloud Computing Concepts
  2. Organizational Security Considerations
  3. Cloud Computing Risks
  4. Cloud Computing Security Strengths
  5. Hardening the Cloud

Balance and objectivity are really important and I didn't want to go into this all "ra ra ra, cloud is awesome, cloud all the things". Because of that, I talk about the risk of things like having your administration portal compromised - that's a sure way to mess your day up very quickly (and probably much worse). Or how about the risk of an EDoS attack or "Economic Denial of Sustainability". A great example of this is that you don't normally pay for ingress data in a cloud service (the requests coming into your site), but you do pay for egress (the responses going back out). So the attacker hammers your site from another service loading the largest content they can as fast as they can. Your site stays up, but suddenly your costs go nuts because cloud computing is a measured service - you pay for what you use.
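A quick back-of-envelope sketch shows why this stings; all the figures here are assumptions for illustration rather than any particular provider's pricing:

# EDoS maths: an attacker repeatedly pulling a 2 MB page, which the
# victim pays for as egress bandwidth (all figures assumed)
requests_per_second = 200
page_size_gb = 2 / 1024        # ~2 MB per response
egress_price_per_gb = 0.09     # a typical cloud egress rate, assumed
seconds_per_day = 60 * 60 * 24

daily_cost = (requests_per_second * seconds_per_day *
              page_size_gb * egress_price_per_gb)
print(f"${daily_cost:,.0f} per day")  # ~$3,000/day at these assumptions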

Obviously three hours of content goes into a lot more detail than that across a much broader range of cloud security topics, but you get the idea. I really hope this course helps people learn how to think about security in the cloud. It's not always a more secure or even a less secure proposition; rather, as I keep saying, it's just different.

This wraps up everything I was slated to produce for the Ethical Hacking series, a journey I started way back in March last year. That may not seem like a long time, but a huge amount of water has gone under the bridge for me personally since then. I'm just waiting on my co-author Dale Meredith to finish his very last course and I'll write something more general about the Ethical Hacking series and indeed about the CEH certification in general once the whole package is there for everyone.

I hope you enjoy watching this course as much as I enjoyed finishing the series - Ethical Hacking: Cloud Computing is now live!

Round 4 of Europe for 2016: More talks, more workshops


If you follow my Twitters, you may have noticed I can be a bit, well, "despondent" about the climate in Europe. No, not the whole Brexit political climate situation, I mean more like this:

Yet I keep ending up back there so either it's my poor judgement or... I secretly enjoy it. Back in Jan (when it was much easier to complain about the weather), I was over in England, Scotland and Norway for four weeks; then in May it was Belgium and Spain; and last month was Norway and England again, plus a few days in the Netherlands. I always try and combine a bunch of events because frankly, everything is a long way away from Australia; it's just a matter of waiting until there's a critical mass of interest to make a long trip. Now, for the fourth time this year, there is.

This coming October, I'm going to be in Scotland, Denmark, England and Switzerland. Let me share which events I'm going to and where I'll be running public workshops. I also have a small amount of availability to run private workshops, that is, a couple of days within an organisation sharing the appsec love with their team (check out the Workshops page for more info on these). I'd love to hear from companies in Europe that would like to get me over as it makes a lot more sense to do these while I'm already in the region rather than getting me back over another time.

Here's what I'll be getting up to:

ScotlandIS Keynote in Edinburgh, Thursday October 6, Scotland


The whole idea of coming back to Europe really began with these guys. They were very enthusiastic to make this happen and I'm really glad I managed to tie together enough things to make it a reality. I'm happy to be doing the closing keynote at the event and I'm looking forward to soaking up a bit of the local culture there. Someone told me there are these little wild creatures they call "haggis" that run round and round mountains so I thought it might be cool to go see those...

Public Workshop in Copenhagen, Tuesday October 11 and Wednesday October 12, Denmark


Next up is Copenhagen where I'll be running a two-day public "Hack Yourself First" workshop. This has been organised by the Copenhagen .NET User Group and anyone in Denmark (or who can get themselves there) is welcome to come along. This will be my first trip to the country so I'm looking forward to seeing something a bit new (either that or lots and lots of Lego).

Public Workshop in London, Thursday October 13 and Friday October 14, England

During my last European tour in June, I stopped off in London and did a couple of days of "Hack Yourself First" with Learning Connexions. This was a public workshop so anyone willing to pay for a couple of days training could join and boy did they join. We maxed out the capacity, exceeding our projected attendee numbers and having an absolutely sensational, engaging, fun couple of days.

Learning Connexions actually gathered a bunch of feedback afterwards which they're now including with details of this next event. I'm enormously happy to share it here:


Obviously I'm really pleased with the results and I expect we'll max this event out again in October.

Swiss Cyber Storm Keynote in Luzern, Wednesday October 19, Switzerland


Firstly, is that not an awesome name for a conference?! I'm really glad I was asked to do this event as I love Switzerland (although it's been 16 years since I was last there), and there's a bunch of people speaking there I'm yet to meet. One name on that list I do know is my friend Scott Helme who, among other things, was my cohort in the Nissan LEAF exploit which hit the headlines earlier this year. I'll be talking about a heap of lessons I've taken away from running "Have I been pwned" which is usually a very well-received topic.

WIRED SECURITY Keynote in London, Thursday October 20, England


To round out the trip, I'll be back in London on the 20th to keynote at the WIRED SECURITY event. WIRED as a publication has been running for 23 years now which is a lifetime in technology. Like the other events I'll be talking at on this trip, it's a very different style of conference to the typically developer-centric talks I normally do so I'm really looking forward to getting in front of a different audience. I was also happy to see my friend Mikko Hypponen on the speaker list and he's someone who always does sensational talks so I'm quite excited to see him and a bunch of others who I'm yet to meet.

Private workshops

I've got a small handful of days free to run private workshops in the UK or mainland Europe and I'll be trying to fill in available spaces with those fairly promptly. A complete overview of what I do at these events is listed on the Workshops page of this blog and I'd love to chat to anyone interested in having me over. Get in touch via the contact page if you'd like further information.

Why am I in a data breach for a site I never signed up to?


The question in the title of this post comes up after pretty much every data breach I load, so I thought I'd answer it here once and for all and then direct inquisitive Have I been pwned (HIBP) users here when confusion ensues in the future. Let me outline a number of different root causes for the "why is my data on a site I never signed up to?" question.

You forgot you signed up

Let's start with the simplest explanation because it's often the correct one - you've simply forgotten you signed up. We leave a huge trail of accounts behind us over the many years we've been online and there's no doubt whatsoever that most of us (I certainly include myself in that) can't recall exactly what we signed up for a decade ago.

I've had a number of occasions in the past where people have claimed they've received a notification from HIBP and sworn black and blue they never had an account, only to then recall they did after I started troubleshooting what might have gone on. That's just the nature of the web these days in that we spread ourselves around so much that we'll never be able to recall every location we've left our data.

Keep in mind also that we may not have left our data "on the web"; it could have been a physical registration form, or that time we provided our info to a hotel and they signed us up for an account with their loyalty program.

Our data is sold and redistributed

Your information is a commodity. A while back I wrote about how your data is collected and commoditised via "free" online services and this showed the way data spreads to various locations after you provide it to somewhere that seems entirely unrelated.

Websites buy your data. Websites redistribute your data. You even agree to this when you accept the terms and conditions of so many different websites (and no, I don't read them either), so it's no surprise that your data spreads so broadly into totally unexpected places.

Sites rename and rebrand themselves

This isn't particularly unusual, especially on the web where companies are frequently "pivoting". They used to do X and it didn't work out so well, so now they're going to do Y under a different identity with a different purpose. Y gets hacked, data gets leaked and attributed to them, but you gave your info to X, which leads to obvious confusion.

It speaks to the fluid nature of online services and we can all think of many that have come and gone or refocused their attention in different directions. This often becomes apparent when looking at underlying data structures exposed by attacks where the old name still persists with just the veneer of the service changing as far as the public is concerned.

Acquisitions occur

I'll give you a perfect example of this that affected me in a data breach a few years ago. I found one of my work accounts in the Adobe data breach and I was certain I'd never signed it up to them. Upon further reflection, I realised that I'd used that account with Macromedia back in the day when I was using Dreamweaver. Adobe's acquisition of Macromedia now meant that email address was in the Adobe data breach.

I certainly can't keep track of who's buying who and an acquisition of this style can mean your data ends up in entirely unexpected locations.

Other people sign you up

Here's another personal example: in October last year, 000webhost was hacked and their data spread around the web. I loaded it into HIBP and then... I got a personal breach notification. This was entirely unexpected as I didn't have an account with them, at least I thought I didn't.

Being curious, I went to the 000webhost site and requested a password reset for my email address. I logged on and found it was managing someone else's website. On the face of it, the root cause appears to have been someone "fat-fingering" the email address when signing up and entering mine instead of another very similar one. Because 000webhost never had subscribers verify their address before attaching a valuable asset like a website to it (tip for web developers - do this!), I was now in control of someone else's site.
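For developers wondering what that looks like in practice, here's a minimal sketch of the verify-before-attach flow. The send_email and activate_account helpers are hypothetical stand-ins, and a real app would persist pending tokens rather than hold them in memory:

import secrets

pending = {}  # token -> email awaiting verification

def start_signup(email):
    token = secrets.token_urlsafe(32)  # unguessable, single-use token
    pending[token] = email
    # send_email is a hypothetical mailer
    send_email(email, f"https://example.com/verify?token={token}")

def verify(token):
    email = pending.pop(token, None)  # unknown or reused tokens simply fail
    if email is None:
        return False
    activate_account(email)  # hypothetical: only now attach the valuable asset
    return True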

There are many "Barack Obama" entries in Ashley Madison. Now I'm going to go out on a limb here and suggest that POTUS wasn't signing up to an adultery website in the first place and if he was, he wouldn't be using his own name! The point is that anyone can sign up to almost any website with any name and any email address - including yours.

Summary

It can be a little disconcerting when you can't work out why your data is appearing in a certain location; indeed, that's part of the value proposition of HIBP as you can track the occurrence of many of these incidents. I put a lot of effort into verifying the legitimacy of data breaches so there shouldn't be false positives, and if ever I'm not sure about the legitimacy of an incident, I flag it as "unverified" and make it crystal, crystal clear.

As I've said above, this is now just the nature of the web in that we end up with our personal data scattered far and wide, often beyond the scope of what we intended it to be. Hopefully, HIBP can help you surface that information, but don't be surprised if you find yourself in entirely unexpected locations.

I wanna go fast: HTTPS' massive speed advantage


I tweeted this the other day, and the internet was not pleased:

In fact, a bunch of the internet was pretty upset. "It's not fair!", they cried. "You're comparing apples and oranges!", they raged.

No, it's not fair, the internet is not fair. But that's just how the web is today and whilst you might not like that it's not fair, that's the ballgame we're playing. When it comes to performance tests, I don't care about "fair", I only care about one thing:


Let's take just a moment to put how fast into context. Here's the test from the tweet above over HTTP:


The content kinda staggers in bit by bit as we've become accustomed to on the web these days. But now let's run the HTTPS test:


Whoa! This is awesome! Job done, HTTPS is fast and HTTP is crap, nothing more to see here.

Well, almost, let's address the "It's not fair" whingers. The HTTPS test is faster because it uses HTTP/2 whilst the HTTP test only uses HTTP/1.1. The naysayers are upset because they think the test should be comparing both the secure and insecure scheme across the same version of the protocol. Now we could do that with the old protocol, but here's the problem with doing it across the newer protocol:


Hey, look at that, every current browser that supports HTTP/2 has got a little "2" annotation on it. Accordingly, that means this:

Only supports HTTP2 over TLS (https)

So in other words, if you wanna go fast you can only do it over the secure protocol, not the one that sends everything in the clear, because no browser supports HTTP/2 over the unencrypted scheme. HTTP/2 is able to do this courtesy of multiplexing, so we're talking about asyncing a bunch of requests via binary streams across the one TCP connection. What that means is the difference between this way of loading images in the old version of the protocol:


And this way in the new version:


Get it? The old one is the very classic "waterfall" of requests occurring with minimal asynchronicity whilst the new one is more of a "cascade" of requests all happening at the same time. That's why you see a bunch of the images appearing in large batches in the animation earlier on as opposed to staggering in per the insecure protocol.
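If you want to observe the negotiation yourself, here's a minimal sketch using Python's httpx library (the URL is just an example; HTTP/2 support needs the optional h2 dependency, installed via httpx[http2]):

import httpx

# Compare the negotiated protocol version for the same origin; httpx
# only speaks HTTP/2 when http2=True is set and the server offers it
# via ALPN, which in practice means HTTPS only
for http2 in (False, True):
    with httpx.Client(http2=http2) as client:
        response = client.get("https://example.com/")
        print(f"http2={http2}: negotiated {response.http_version}")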

Now the naysayers will lament that the test is unrealistic because you've got 360 little images all loading on the one page. But it doesn't really matter because you could do it with 36 and the multiplexing is still going to make it way faster. Or perhaps more realistically, a couple of megs worth of chunky images, CSS, JS and all the other crap so many websites load today. They all get the perf benefits that HTTP/2 offers and some of them may well show even greater differences than observed in this test; those 360 little images only add up to 0.62MB whereas the average web page is now 2.3MB. You're also looking at somewhere in the order of 100 requests too so the comparison tests above may even be erring on the conservative side.

Of course the web server also has to support HTTP/2, so that means you can't get it on IIS yet (we'll see it soon when Windows Server 2016 ships with IIS 10) unless you wrap CloudFlare around it (like this blog), which can serve its cached content over the newer version of the protocol. CloudFlare also has a little speed comparison test on that page with both protocol versions served over HTTPS:


The neat thing about this approach is that even if the origin website (the one CloudFlare is serving traffic from) doesn't support HTTP/2 (and Ghost Pro which this blog is on does support it), you can still get super-fast HTTP/2 speeds. Here's how to see it in action first hand: I have a website running on cloudflareonazure.com which I use in my Getting Started with CloudFlare Security Pluralsight course (don't worry that the site has mixed content, that's both intentional and not the point). This site is an Azure website which presently only runs on HTTP/1.1. Now let's drop into the Chrome dev tools, over to the network tab then right-click on a column heading to turn on the protocol column:


I also turned on the domain column so that I could clearly show you this:


Here we have requests going to a domain hosted on Azure which can't talk HTTP/2, yet the protocol being returned is "h2" which is the identifier for HTTP/2 over TLS. We see this because all requests are routed through CloudFlare which can talk h2. Now of course if CloudFlare needs to pull content from an origin that doesn't talk h2 then there'll still be a bottleneck in the connection, but many requests won't come from the origin anyway. Two thirds of the traffic on this blog is served directly from their cache so that can come down over h2 and make a significant difference to speeds even when the origin is stuck on HTTP/1.1.

And lastly, for those who really, really want to live under the illusion that the web is "fair" and a head-to-head match of HTTP and HTTPS over 1.1 would yield a fundamentally different result in favour of going insecure, have a read of Is TLS Fast Yet. Even over the outgoing version of the protocol, the "encryption is slow" argument has gone the way of the marquee tag and remains an artefact held onto only by those living in the past. Actually, bugger it, if you really want to test both schemes over HTTP/1.1 then issue the requests with a header that only accepts 1.1 and see how that goes:


I just gave that a run with Fiddler open (which doesn't support HTTP/2 thus strips support for it from the request header) and it was still faster. Do it back to back a few times and the results will fluctuate (minor differences in connection quality etc), but you won't find a smoking gun pointing to how slow HTTPS is, even over HTTP/1.

This is all simply a test of "what's the fastest we can go over HTTP versus what's the fastest we can go over HTTPS". I don't want fair, I want fast. If you wanna go fast, serve content over HTTPS using HTTP/2.

Stop the madness! Ridiculous security scare tactics revealed


You know the best way to sell security products? Scare the shit out of people. I mean make them really genuinely fearful that if they don't have the thing you're pushing that a bunch of nasty stuff will happen to them. It's the Donald Trump school of winning hearts and minds.

Which brings me to CUJO, an Indiegogo campaign for a "security in a box" product. Strap yourself in and watch the video:

Are we terrified yet? Yes? Good. Scary music, a hacker dude in a hoodie and just for good measure, a beauty queen and a kid (won't someone please think of the children?!)

Around the 40 second mark you get a bit of "you may not know it, but you've probably already been hacked" whilst hacker dude is busily, well, hacking someone. Here he is in action:


Now that struck me as odd because as best I could tell, that's a browser window he's hacking in. Let's go all CSI on this and zoom in to take a closer look:


Wow, it really is a browser window! How does such magic hacking in the browser actually work?! Well you're in luck folks, because today I'm going to teach you how to be a hacker dude too and all you've gotta do is head over to hackertyper.net and then... mash the keyboard. Then, as if by magic, your fat-fingering will produce this:


Wow, so that's how the hackers do it!

But is it really that blatant? I mean did they actually just go to hackertyper.net and film a dude in a hoodie mashing the keyboard in order to help convince buyers they need to buy security in a box? Let's compare the CUJO video to a segment of the code from the hacker website:


Next time guys, at least put the browser in full screen mode! Oh, and perhaps less scary stuff and more facts would also be good. If you'd like to learn more about CUJO and draw your own conclusions on its efficacy, head on over to getcujo.com or review the CUJO discussion in the Shitty Kickstarters Subreddit.

Seen something similar? Leave me a comment!


What you should and shouldn't worry about when you complete today's census


There's a lot of people getting themselves worked up about the Australian census whose five-yearly cycle falls due today. For the most part, it's like any other normal census we've done ever since I can remember, but what's changed this year is the duration for which names and addresses will be retained against the census answers.

There are some good reasons to question the whole thing, plus some good reasons why it's really a non-event. Let me share my view of things.

About the census

I've just literally been handed the census paperwork by the property managers at the place we're staying at while trying to get in a bit of snowboarding. Here's an example of what we're talking about:


It's stuff like where you live, your ancestry, household languages spoken, religion, education background, job and income plus a few incidental other questions. But do you really need to worry about the government collecting this data? Let me throw out some concerns followed by some rationalisation.

Why you need to be concerned

This is a lot of data being collected in a fashion that ties together many personal attributes about individuals. Were it to be leaked or otherwise abused, people would quite rightly be concerned. One of the defences I've heard is this:

It is never ok to try and placate people's concerns about their personal data by simply saying that everything has been fine so far. (That they know of.) We're living in an era where data breaches are a daily occurrence and as we saw in the case of the Philippines earlier this year, that includes an entire country's electoral roll. You must assume a breach and it sounds like even IBM are on the same page there.

Only a few days ago, IBM (the service provider for the online census), was quoted as saying a hack of the data is inevitable. Now I'm not entirely sure how I ended up being quoted in that piece, but it seems a Twitter account I didn't immediately associate with IBM asked me if Have I been pwned could hold the entire census. I provided a brief, off-handed response:

I suspect Philip got himself into a bit of hot water there and his tweet was consequently deleted, but obviously not before media outlets picked up on it.

To try and quell the concerns being raised by the public, the government has talked about privacy safeguards such as strict controls on which officials have access to which data. Access controls are obviously important but as we've seen in the past, wherever there are humans involved there will inevitably be incidents of improper data access. It will almost certainly happen with the census data too; that's just how some people are wired.

One of the arguments that's being wheeled out by the government is that it's ironic people are complaining about the census on social media where they share far more revealing data than what's being asked for every 5 years. This is a major misreading of public sentiment for one simple reason: the biggest issue with the census is not that personally identifiable data will be collected and retained for a longer period, rather it's that we don't have any say in the matter.

The reality is though, most of us are disclosing this information via many other channels the government already has access to either willingly or implicitly and for that reason, I'm not at all (extra) concerned. Here's why:

Why you shouldn't be concerned

Firstly, names and addresses have always been collected in the census; the only thing that's changing is the retention period. We're going from 18 months to 4 years, so it's not the data that's changing, merely the period before it's scrubbed (I suspect many people upset about the issue are not aware of this). Naturally this increases the risk of exposure, but it merely extends an existing period rather than introducing a new risk.

I file tax returns every year that detail not just what I earn, but many of the things I buy. They also describe who I work with, how much superannuation (retirement plan) I contribute and what benefits I claim on behalf of having kids. I have cars with rego plates so a bunch of my movements are tracked by the government, plus I have the usual healthcare benefits so more tracking again there. The government knows my education history due to HECS (student loans while at university), my marital history and pretty much everything else on the form. In fact, the only thing I could pick that they wouldn't either explicitly know or could easily derive is religion.

Now to be fair, not everyone is me; certain personal attributes (such as religion) may be something they don't want to disclose. But the vast majority of Aussies aren't expressing anything that either isn't already known or is easily discoverable. Whilst I see the lighter side of 65k Aussies listing their religion as "Jedi" in 2011 (and am particularly bemused at those decrying it as just a fabricated belief system...), the fact remains that the data the census collects is actually pretty useful and should be provided accurately. The Australian Bureau of Statistics uses it to help the gov plan how the country should grow and IMHO, that's actually a pretty important thing; the government has a huge amount of personal data on me because that's what a country requires to be functional.

As for the senators who are stating that they won't put their name on the census (and in some cases saying they won't pay the subsequent fine and even go to jail), this is almost certainly more a reflection of politics at play than a deep-seated privacy belief. It gets them airtime in a post-election period where there's not a lot else going on and is particularly ironic when you consider the public profile of these people; more of their personal data than just about anyone's is already in the public domain. It's great for news headlines, but it doesn't say a lot beyond that.

In the broader spectrum of personal data collection, far more outrage is due to shopping centres providing loyalty cards than it is for the census, yet few people think twice about allowing grocery stores to track them. Perhaps it's the monetary reward enticing people, but Coles and Woolies are able to track when you're sexually active, when you have a drinking problem and as we've seen before, even accidentally disclose when you're pregnant.

Summary

Where the ABS has really struggled on this issue is in explaining why data such as names needs to be retained at all. I was just listening to Michael McCormack on the news this morning and the reporter was grilling him on this question. All he could do was revert to political misdirection and play the Chewbacca defence as he went off on a tangent about tracking life expectancy. But this is a messaging problem rather than an indication of actual privacy issues.

Despite the noise coming via the social channels I follow, I doubt this whole issue will have much impact on how people fill out the census. There'll be a few who rebel and go all Jedi on it (see what I did there?) or refuse to provide personal info or even leave the country to avoid it (you know your comings and goings from the country are still recorded, right?) but for the most part, no impact. I'll fill it out accurately and legally because in the broader scheme of privacy, it's a non-event. Now if only I could get my wife to ditch her loyalty cards...

The "Have I been pwned" API, rate limiting and commercial use


It's almost 3 years ago now that I launched the Have I been pwned (HIBP) API and made it free and unlimited. No dollars, no rate limits, just query it at will and results not flagged as sensitive will be returned. Since then it's been called, well, I don't know how many times but at the least, it's well into the hundreds of millions if not billions. I've always been pretty clear on not logging searches or tracking anything personally identifiable and that, combined with attempting to squeeze out every last bit of performance whilst keeping costs low, has meant not tracking the API calls over time. What I do know though is that often my traffic will do this:


What you're looking at here is 925k requests to HIBP over the period of an hour. It's peaking at about 24k requests a minute and each of those requests is searching through 1.3 billion records. As I've written many times before on Azure, the way I achieve this scale is by allowing the platform to add web instances in response to demand. At present, if CPU utilisation goes over 80% across a 10-minute period then I add an instance. Once it drops down under 40%, I take one away. It means my CPU utilisation for the period above looks like this:


You can see utilisation at 100% early on then dropping over time as additional instances are added to the web farm. This causes several problems that I want to address here.
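Before getting to those problems, note the scale rules themselves are simple threshold logic. Here's a minimal sketch of that behaviour in Python; the instance bounds are assumptions rather than HIBP's actual configuration:

# Add an instance past 80% average CPU, remove one below 40%; the gap
# between the two thresholds is deliberate hysteresis to stop flapping
def adjust_instance_count(instances, avg_cpu_percent, minimum=1, maximum=10):
    if avg_cpu_percent > 80:
        return min(instances + 1, maximum)
    if avg_cpu_percent < 40:
        return max(instances - 1, minimum)
    return instances

print(adjust_instance_count(2, 95))  # 3 - scale out under load
print(adjust_instance_count(3, 30))  # 2 - scale back in when it's quiet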

Why high volumes of API calls are problematic

Firstly, the API is being called in a way which means requests are going from a steady organic state to full thrust in a matter of seconds. This maxes out the CPU and degrades the performance for everyone until more instances are put on. Autoscale works beautifully when traffic ramps up steadily, but I'm going to wear a perf hit when it suddenly goes off the charts.

Secondly, money. I've always run this with the cost objective of paying "less than I spend on coffee". Particularly over the last few months (and traffic patterns over that period are the catalyst for this blog post), I've found myself needing to drink more and more coffee at ever increasing prices in order to reach that stated goal which is, well, not really what I want to be doing.

But there's another issue beyond performance and cost that I've become increasingly conscious of, and that's ethical use. This is a hard one for an API of this nature; you want it to be easily consumable but you also don't want it being abused. Whilst I publish acceptable use guidelines, they serve primarily as a point of recourse for me ("hey, you're not playing nice per the stated guidelines...") rather than as enforcement to ensure it's used in the right way. The very first point in those guidelines is that the API should not be used to cause harm to other people but as the data set grows, the likelihood of this happening increases. Recently, there have been various indicators that the API has been used in ways that aren't in keeping with the spirit of why it was created and whilst I won't go into the details of it here, it's not something that I want to see continuing.

What all of this means - the cost, perf impact and potential for abuse - is this: starting on September 12, the free API will be rate limited to 1 request per IP address per second. Let me talk through the rationale behind this method of rate limiting:

One of the most important considerations for me was not breaking legitimate applications using the API. As you'll see on the API consumers page, a bunch of people have made really neat apps that do positive things for online security in a way that's entirely consistent with how I'd hoped it'd be used. If I'd added an auth requirement, these would break unless they were first reworked and whilst there are times where that might make sense for certain APIs, I didn't want to break anything unless absolutely necessary. When the API is hit directly from the client consuming it such that the requests are distributed out to many different IP addresses (i.e. when the consumer is a mobile app), they're not going to be making more than 1 request a second anyway.

For anyone scanning multiple addresses in quick succession via an automated script, the rate limit still gets you 3.6k requests an hour. If, for example, you're scanning through your organisation's emails then that's just fine for a small to medium sized company. What it'll do though is make it significantly less feasible to scan through a huge address list. What I'm trying to stop here is doing this for the purpose of identifying otherwise random individuals. However, for legitimate use cases, there's another avenue:

Using the API for commercial purposes

I'm still making commercial options available. Just over a year ago I wrote about a commercial offering with HIBP which allows an organisation to proactively monitor email addresses and domains. By "proactively", I mean there's a web hook implementation which reaches out to an API endpoint on the customer's side and notifies them when there's a hit. There's also the ability to search through both email addresses and domains without rate limit and including sensitive breaches. I want to clearly explain the caveats around this though:

There's a small collection of organisations presently using the commercial offering. They're monitoring accounts where either there has been express permission given by the email address owner ("I would like you to watch for occurrences of my address as part of the identity protection services you provide...") or the owner of the domain ("We would like to monitor where our company's email addresses are appearing..."). I speak to every single organisation using this service; not colloquially "speak" as in an anonymous messaging chat, rather we talk via Skype and I work with them to help them use the data in a positive way. They sign a contract outlining expectations of how the data can be used and we establish an ongoing commercial relationship. What it means is that I have a very high degree of confidence that they're using the data in an ethical way in the best interests of those who own the accounts they're monitoring.

With all that now said, let me talk about how this is technically implemented and what you should keep in mind if you're calling the API at a rate that could exceed the limit.

Technical implementation

Every time the API is hit, the IP address of the caller is stored in cache for a duration of 1 second. When the inbound request arrives, the cache is searched and if the IP is found, an HTTP 429 "Too Many Requests" response code is returned. If the IP isn't found in cache, the request is processed as per usual. The rate limiting is implemented independently on the breaches and pastes APIs to ensure users of the web interface can asynchronously hit both without the second one being rejected. The POST action to the root of the website also implements the same logic to ensure it can't be screen scraped (this is usually just a fallback for when JavaScript is disabled).
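HIBP itself runs on ASP.NET, but the same idea fits in a few lines as a hedged Python/Flask sketch; the route and response body are illustrative, and a real implementation would expire cache entries rather than let the dictionary grow:

import time
from flask import Flask, request

app = Flask(__name__)
last_seen = {}  # in-process cache: IP -> time of last accepted request
RATE_LIMIT_SECONDS = 1.0

@app.route("/api/breachedaccount/<account>")
def breached_account(account):
    ip = request.remote_addr
    now = time.time()
    if now - last_seen.get(ip, 0) < RATE_LIMIT_SECONDS:
        # Too many requests: reject with a 429 and the Retry-After header
        return "Rate limit exceeded\n", 429, {"Retry-After": "0"}
    last_seen[ip] = now
    return f"Results for {account}\n"  # placeholder for the real lookup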

Per the spec, I'm also returning a "Retry-After" header which specifies how long the client should wait before issuing another request. However, this value is expressed in seconds and when I'm allowing one request per second, it pretty much always rounds down to zero. If you're coding to cater for the retry then I'd still pay attention to the header in case I change the rate later on.

Here's what a sample rate limited response looks like:

HTTP/1.1 429
Retry-After: 0

Rate limit exceeded, refer to acceptable use of the API: https://haveibeenpwned.com/API/v2#AcceptableUse

This is the simplest possible implementation I could conceive of. It has no requirements for additional data repositories (cache is stored in-process), it has almost zero overhead as the cache duration is so short and as I said earlier, it requires no changes on the consumers' end (unless they're exceeding the rate limit, of course).

This is also not foolproof; requests could exceed 1 per second if there are multiple instances of the web front end running due to scale out (the in-process cache is machine specific). Someone could also fire up a botnet and hammer it from different IPs all around the globe. However, compared to the present state where I'm frequently seeing many hundreds of requests per second, the rate limiting approach still makes a massive difference either in the volume of possible requests or the financial resources required to make them (i.e. renting a botnet). But it doesn't have to be perfect, it just has to be much better than it presently is.

Testing the rate limit behaviour

The rate limit feature is now fully active in the HIBP staging environment. In fact, both environments are running an identical code base right now and the rate limit is merely feature-toggled to "off" in production. If you're going to need to modify your consuming client to adapt the number of calls it makes, I strongly suggest you play in this environment before the rollover next month.
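If your client might bump into the limit, the adjustment can be as simple as honouring the 429 and throttling yourself. Here's a hedged sketch in Python; the endpoint path and user agent requirement are assumed from the v2 API documentation:

import time
import requests

def check_account(account):
    url = f"https://haveibeenpwned.com/api/v2/breachedaccount/{account}"
    while True:
        r = requests.get(url, headers={"User-Agent": "my-hibp-client"})
        if r.status_code == 429:
            # Back off for at least a second, then try again
            time.sleep(max(int(r.headers.get("Retry-After", "1")), 1))
            continue
        return r.json() if r.status_code == 200 else None  # 404 = not pwned

for account in ["foo@example.com", "bar@example.com"]:
    print(account, check_account(account))
    time.sleep(1)  # stay within 1 request per second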

Help support HIBP by playing nice

I'm going to continue trying to keep as much of HIBP as easily accessible and freely available as possible. To do this, I need the community's support by working within the rate limits. The implementation above will ensure valid responses aren't returned at a rate of more than 1 a second, but it won't stop people hammering it and getting a plethora of 429 responses before getting a valid one. There are other avenues available to block these clients outright but I'd far prefer requests were just kept within the rate limit and the service was used responsibly.

Further to that (and this is the bit where I need to be a bit more serious), I will start blocking abusive clients that aren't adhering to the existing acceptable use guidelines before the rate limit kicks in on the 12th. These guidelines have been documented on the API page for some time now and I've embellished them a little further over the weekend. It's only those that are quite blatantly using the API contrary to the way it's intended to be used that will be impacted and those folks have a pretty good idea of who they are.

As always, please provide your comments below if you've got any feedback you think I really should take on board. I've ensured there's 4 weeks of lead time precisely for that reason and to make sure that anyone who may be exceeding it has time to consider the right path forward.

Website enumeration insanity: how our personal data is leaked


I've just wrapped up a couple of Hack Yourself First workshops down closer to home in Australia and true to usual form, attendees found some absolute zinger security implementations. Previous workshops have found various vulnerabilities ranging from realestate.com.au's lack of HTTPS in their Android app (pro tip: don't 301 HTTP requests to APIs!) to the one that really made headlines earlier this year which was the insecure Nissan LEAF app.

One of the modules in the workshop looks at enumeration risks, that is the ability to check the existence of someone's account on a website. There's a perfect illustration of this in the post I did on Ashley Madison last year which showed that even before the data breach, affairs on the website were never discreet; their password reset feature readily disclosed whether an email address already existed on the site.

I'm used to seeing sites with enumeration risks in the password reset, registration and login features and frankly, it's an exception when a site doesn't leak data through one of these. (Check the timing of logins too: slow hashing algorithms often disclose if an account exists by virtue of several hundred milliseconds delay when the username exists.) But this time, the class I ran a few days ago found a far worse level of enumeration:

This is exactly what it looks like: enter a username - any username - and the site will give you the email address. John, Mary... troyhunt all resulted in an email address being publicly shown. This doesn't happen by accident; this is something that someone decides is a "feature". I always picture your typical marketing manager defending the position - "People need to know what email address we just sent the password reset to because... usability!" - which of course is an indefensible position anyway.

I regularly use the marketing manager example because I've had one spin precisely this rubbish before. I once privately reported a raft of issues in the Aussie Farmers Direct website including plain text password storage, passwords sent via email, weak password requirements, mixed mode HTTPS warnings, auth cookie not flagged as secure or HTTP only, reflected XSS and more that I won't go into here. Someone who literally had the title of "Marketing Manager" responded with:

To date we've not had a single security issue stemming from new customers being emailed their password, and I know for a fact 90% of the sites I personally sign up to online also follow that same process.

Good on you mate! But the laissez-faire attitude to security and disdain for ethical reporting paved the way for many interesting follow-up blog posts including how they implemented "remember me" by storing the password in a cookie (also not HTTP only) and how they sent off a bunch of personal data to a tracker over an insecure connection. It should come as no surprise that a short while later, they found themselves in the headlines:

But getting back to enumeration, to their credit, Essential Baby now no longer leaks email addresses, they just leak the presence of your username like so many others do. (Big shout out too for all the people that subsequently tested my username on that site!)

Now that was bad and foolish and as much as it must have seemed like a good idea at the time, at least they fixed it promptly once brought to their attention. Unlike Strawberrynet whose approach to personal information disclosure is, well, "special". Now before I go into this, they do know about the issue and they've concluded that it's a feature that people want so I'm comfortable outlining it here (after which they may well decide it's a feature they don't want).

So here's the issue: you find something nice then hit the "buy now" button, which presents a checkout dialogue into which you enter an email address. This can be any email address, after all it's nothing more than a text field:


Now, as a reasonable person you'd look at this and say "well clearly after this the site will probably either ask me to create an account or log in which yes, may be an enumeration risk but 90% of sites on the web do that..." But here's what happens next:


This is exactly what it looks like - someone else's personal data. Now normally this is not something I'd ever show in a blog post and obviously I've obfuscated the personally identifiable attributes, but this is a bit of a special case. It's special because it's been reported to them before in a very eloquent, precise fashion via a private channel. They responded with a reference to their privacy policy and a quick lesson on securing the transport layer:

Furthermore, our website operates on an S.S.L. Secure Server, which is the industry standard for encrypting all information submitted through our website so that it cannot be read by anyone else over the internet.

What this means is that whilst someone harvests their customer database, anyone malicious sitting in the middle of the connection won't be able to see the traffic. Right...

The individual reporting it clearly saw the error of their ways and pushed back. Thing is, he didn't realise this was actually a feature so they set him straight:

Please be advised that in surveys we have completed, a huge majority of customers like our system with no password. Using your e-mail address as your password is sufficient security, and in addition we never keep your payment details on our website or in our computers.

I'm really not sure what else to add here, I mean clearly the customers like it so that's that, right?! This conversation was some time ago and clearly Strawberrynet still believes this is in customers' best interests. It's not. They may not know it's not, but it's not. Their view may change after this post and often it takes a bit of social pressure to make that happen.

Examples like this remind us how low the bar is often set with security. That you can have a bunch of people in a workshop sit around in a room for a couple of days learning security fundamentals and easily discover stuff like this on a regular basis continues to amaze me. That it can actually be considered a feature genuinely leaves me speechless.


Update (the following day): Unsurprisingly, it seems that a number of people have since requested Strawberrynet permanently delete all traces of their data. One such follower emailed me today, not so much to let me know about having their data removed but to share Strawberrynet's response which is... well, you read it and decide for yourself:

We take the security on our site very seriously and we have recently upgraded our security compliance procedures to PCI DSS Level 1, the highest level of data security and the same standard used by all of the world's top financial institutions.

We did this with the help of two of the world's leading online security companies, so you can rest assured that our site is safe to shop on.

Additionally, all your data is transmitted using industry standard encryption, called SSL, which prevents it being read or intercepted by anyone.

This is just unfathomably reckless and stupid, and shows a total disregard for customer privacy. Consider this: someone could hypothetically feed in a big list of email addresses (which are easily found via sources such as recent data breaches) and extract the personal data of what would surely be at least tens of thousands of customers. Names, addresses and phone numbers, all retrievable with nothing more than an email address. PCI and SSL wouldn't stop that; hypothetically, of course...
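If you want to check whether a site you run has the same class of flaw, the test is only a few lines of code. This is a minimal sketch with an entirely hypothetical endpoint and field name (it's not Strawberrynet's actual implementation); only point it at a system you're authorised to assess:

import requests

# Hypothetical endpoint and field name - substitute your own site's
# checkout, login or password reset URL.
CHECKOUT_URL = "https://example.com/checkout"

def probe(email):
    """Submit an email address and return the raw response body."""
    return requests.post(CHECKOUT_URL, data={"email": email}, timeout=10).text

# Compare an address known to have an account against one that certainly
# doesn't. Any observable difference - personal data in the body, different
# wording, even a different content length - is an enumeration vector.
known = probe("existing-customer@example.com")
unknown = probe("no-such-user-1234@example.com")

if known != unknown:
    print("Responses differ - the endpoint leaks account existence")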

Understanding account enumeration, the video tutorial edition

I've been running my Hack Yourself First workshop all over the world, where I talk to software developers about various security risks which they then get to exploit firsthand. It's a lot of fun, very hands-on and practical, which inevitably means spending time looking at real-world implementations of security.

After running a couple of these workshops last week, I wrote Website enumeration insanity: how our personal data is leaked, which highlighted a couple of really bad examples of enumeration that attendees had discovered. That Strawberrynet one in particular... wow! But the post did lead to some questions about how to properly protect against enumeration risks, so as I've done in the past with modules from the workshop, I've just recorded a walkthrough of some of the stuff I normally cover to help explain both the issue and the defences. This includes a video demo of the problem with Strawberrynet, should anyone be in any doubt whatsoever about the problem at hand:

[Embedded video: walkthrough of the enumeration risk and defences]
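For those who'd rather skim code than watch video, the core defence is straightforward: respond identically whether or not the account exists, and deliver the real outcome out-of-band via email. Here's a minimal sketch in Flask; the helper functions are hypothetical stand-ins for your own data access and mail-out code, not taken from any particular product:

from flask import Flask, request, jsonify

app = Flask(__name__)

def find_user_by_email(email):
    # Hypothetical stand-in - replace with your real user lookup.
    return None

def send_reset_email(user):
    # Hypothetical stand-in - replace with your real mail-out, ideally
    # sending a single-use, time-limited reset link.
    pass

@app.route("/password-reset", methods=["POST"])
def password_reset():
    email = request.form.get("email", "")
    user = find_user_by_email(email)
    if user:
        send_reset_email(user)
    # Deliberately identical response either way: the page can never be
    # used to test whether an address is registered, because the answer
    # only ever lands in the inbox of whoever owns that address.
    return jsonify(message="If that address exists, a reset link has been sent.")

if __name__ == "__main__":
    app.run()

In practice you'd also want both code paths to take about the same time to execute, since a measurable timing difference can leak account existence just as effectively as different response bodies.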

If you enjoyed the style of how I explained the enumeration risk, you might also like the following, again both from my Hack Yourself First workshop:

  1. Understanding CSP, the video tutorial edition
  2. Understanding CSRF, the video tutorial edition

Self-hosted vBulletin - you're doing it wrong! (and why you should be using managed hosting services)

Another day, another data breach:

[Embedded tweet: breach announcement]

Yesterday it was a different one:

[Embedded tweet: breach announcement]

A couple of weeks ago it was this one:

[Embedded tweet: breach announcement]

A little before that there was this:

[Embedded tweet: breach announcement]

A fortnight earlier:

[Embedded tweet: breach announcement]
And it goes on and on and on, well beyond what any of you really want to sit here and read through in its entirety. Clearly, there is a problem with sites hosting vBulletin getting hacked, but I'm not here to say "the entire problem is vBulletin". Let me explain.

GTAGaming is the site I loaded into Have I been pwned (HIBP) earlier today and this meta tag still graces the source code of their registration page:

<meta name="generator" content="vBulletin 3.8.7" />

Now let's be clear about the vintage of this edition of vBulletin: version 3 launched in March 2004, and in case you can't quite remember that far back, there was no Facebook. Well, there kinda was, but it wasn't public as Zuck had only just kicked off the coding, and it wasn't even Facebook yet, it was "Thefacebook":

[Screenshot: "Thefacebook" as it appeared in 2004]

A lot of water went under the bridge thereafter and almost 7 years later, version 3.8.7 of vBulletin hit. That was more than 5 years ago and a number of subsequent updates followed:

[Screenshot: the list of vBulletin 3.8.x releases published after 3.8.7]

But even whilst maintaining vBulletin 3 with minor releases, things moved on well beyond this. In December 2009 there was "an extensive rewrite" which brought version 4 and then in February 2013, version 5 arrived. When GTAGaming was hacked, they were two major releases behind the current generation and four and a half years behind in their patches for the major version they were running. And this is the real story with vBulletin - installations going unloved.

When you look at the history of vBulletin sites being hacked, it's rarely 0-day vulnerabilities, so we're usually not looking at an attack and saying "Wow, we've never seen that before!". Of course that does sometimes happen, but normally vBulletin issues patches, people apply them and we all move on. In theory. But in all seriousness, the next time you see vBulletin in the press, take a look at the version number of the site in question and see just how out of date it was.
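Checking is trivial precisely because vBulletin announces itself in the generator meta tag shown earlier. A quick sketch (the forum URL is a made-up example; run it against a site you have a legitimate interest in):

import re
import requests

def vbulletin_version(url):
    """Fetch a page and extract the version from vBulletin's generator meta tag."""
    html = requests.get(url, timeout=10).text
    match = re.search(r'<meta name="generator" content="vBulletin ([^"]+)"', html)
    return match.group(1) if match else None

# Hypothetical usage:
#   vbulletin_version("https://forum.example.com/register.php")
# A result like "3.8.7" means years of missed patches.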

But let's take a step back for a moment and stop talking about vBulletin specifically. Managing a software product is like having children; they need constant care and attention and if you neglect them, bad things tend to happen. Let me give you an example from a couple of years ago that beautifully illustrates the problem:

Back in October 2014, Drupal had a little issue. Ok, a big issue in that a SQL injection vector was discovered and it was now "highly critical" that anyone running it patch their things. Urgently (emphasis mine):

You should proceed under the assumption that every Drupal 7 website was compromised unless updated or patched before Oct 15th, 11pm UTC, that is 7 hours after the announcement.

So basically, you go to bed one night and all your Drupal things are cool. You have a full night's sleep, you get up the next morning and now you've gotta assume that all your Drupals were pwned during the night. Discovering how to exploit the risk is trivial too; in fact, you can find attacks published in multiple places, both in shadier corners of the web and via the likes of Exploit Database:

[Screenshot: the Drupal SQL injection exploit listed on Exploit Database]

Discovering sites running a product like Drupal 7 is then a simple Shodan search away:

[Screenshot: Shodan search results for sites identifying as Drupal 7]

There are over 52k easily discoverable sites in one go because, as with many frameworks, Drupal likes to shout about its presence via response headers. The point is that going from a vulnerability being announced, to discovering an exploit, to finding sites at risk is a trivial series of steps. There are groups of individuals actively doing just this against vBulletin forums right now because it's such easy pickings.
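It's worth seeing just how chatty your own stack is. Drupal 7, for example, typically identifies itself with an X-Generator response header, which is exactly the sort of attribute Shodan indexes. A few lines to check a site you run (the URL is a placeholder):

import requests

def fingerprint_headers(url):
    """Print response headers that commonly disclose the framework or server."""
    response = requests.head(url, timeout=10, allow_redirects=True)
    # Drupal 7 typically sends something like:
    #   X-Generator: Drupal 7 (http://drupal.org)
    for header in ("X-Generator", "X-Powered-By", "Server"):
        if header in response.headers:
            print(f"{header}: {response.headers[header]}")

# Hypothetical usage: fingerprint_headers("https://your-site.example.com")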

But here's the crux of all this and what I alluded to in the title of this post: When you're managing your own installation of a software product, you're entirely responsible for patching it. Quickly! Whether it's vBulletin or Drupal or any other product on any other technology stack, the same thing applies. However, if you're not responsible for managing it, things suddenly get a whole lot easier.

Earlier this year I relaunched this blog on a brand new platform. It all runs on Ghost Pro which is the managed version of their excellent blogging platform. Some people were aghast: "Dude, you can totally manage your entire blog yourself, why would you pay someone to do it?!". I explain it more in the launch blog post but in short, I just don't want the responsibility. I've never patched it, I've never had to fix outages (there's been some, but I didn't have to fix them), I've never had to scale it out and I've never had to worry about all the sorts of things that you should worry about when you're the one responsible for managing the product.

Which brings me back to vBulletin. You can go and get vBulletin Cloud right now for just $15/m. That'll only get you 25GB of bandwidth, so you wrap CloudFlare around it and serve around 7 times that amount for free (CloudFlare stats for troyhunt.com show 86% of the bandwidth being served from their cache; if the origin only wears 14% of the traffic, total throughput is roughly 1 / 0.14, or about 7 times what the origin serves). Yes, you can go and get a Digital Ocean Droplet for $5/m but c'mon, we're talking a $10/m difference here - 33 cents a day! If you're really at the point where you can't justify 33 cents a day then I'm sorry, but you just shouldn't be in a position where you're responsible for other people's sensitive personal information.

This is a critical fact that people self-managing vBulletin (or any system, for that matter) neglect: when you stand up that cheap forum for a bit of fun and thousands of people register, you're now responsible for their Gmail, Facebook and banking credentials. Let that sink in for a moment... We've known about this for years and we've also known about how weak vBulletin password hashing can be. Yes, people will exercise poor password management and they'll reuse credentials everywhere, but when you're responsible for an old vBulletin installation you've neglected to patch, it's your negligence that allows malicious parties to exploit the negligence of those who signed up to your site.
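To appreciate how weak that hashing is, consider the construction vBulletin 3 and 4 are widely reported to have used: md5(md5(password) + salt). There's no work factor at all, so commodity GPUs can test billions of candidates per second against a breached database. A sketch of the scheme:

import hashlib

def vbulletin_legacy_hash(password, salt):
    """The md5(md5(password) + salt) construction reportedly used by
    vBulletin 3 and 4: two fast MD5 rounds and no work factor."""
    inner = hashlib.md5(password.encode()).hexdigest()
    return hashlib.md5((inner + salt).encode()).hexdigest()

# Because MD5 is designed to be fast, an attacker holding the hashes and
# salts can guess offline at enormous rates; bcrypt, scrypt and Argon2
# exist precisely to make each guess expensive instead.
print(vbulletin_legacy_hash("P@ssw0rd", "abc"))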

If you're managing your own installation of vBulletin, you almost certainly shouldn't be. And as for GTAGaming, their fate following the attack has been sealed:

We have now closed the forums permanently

Don't be another vBulletin statistic; hand the management of the site over to them and focus on running a successful community instead.
