
The Red Cross Blood Service: Australia's largest ever leak of personal data

Sponsored by: Barkly - Make security something your users actually care about. Download the IT Pro's Guide to Raising Security Awareness.

I don't give blood as much as I should. My wife has a much better track record than me, regularly donating not just blood but plasma and platelets as well. I know this not just because it's the sort of thing we talk about, but because her data - along with mine - has been leaked publicly in what I believe is the largest ever leak of Aussie data from a local service.

Because of the coverage this incident will inevitably receive, I'm writing this piece in advance of them publicly disclosing it in order to answer as many of the inevitable questions which will arise as possible. I also want to make it abundantly clear up front that this should not discourage anyone from giving blood in the future because as important as this incident is, it pales in comparison to making a donation that could save lives. I'll come back to that later, let's just start with the facts.

Discovery, verification and disclosure timeline

I run the data breach notification service known as Have I been pwned (HIBP), an ethical project designed to help individuals and organisations understand their exposure on the web which often follows online security incidents such as the LinkedIn and Dropbox hacks. In running this service, I frequently have people contact me with data breaches. Often this is after they've received the data from someone else as part of a trade, sometimes it's provided by the individual who hacked into the system itself and occasionally, it's because they simply found the data lying exposed somewhere.

On Tuesday morning, I was contacted by someone who fell into that last category. He claimed to have data from donateblood.com.au and he provided me with a snippet to prove it - a snippet of my own data. There was my name, my email, gender, date of birth, phone number and the date I'd last donated. He then provided me with the entire data set, a 1.74GB file with 1,286,366 records in a "donor" table which was just one out of a total of 647 different tables. I checked my wife's record and found all the same info as I had albeit across 9 different records reflecting the different occasions she'd donated. In addition to the fields in my data, her data also had our home address and her blood type. There was no doubt in my mind that this data was legitimate.

I queried the sender of the information about how he'd come across the data, expecting it to be as a result of an attack using a technique such as SQL injection, widely regarded as the most serious risk to web security today and frequently the "vector" which leads to the disclosure of data like this. But it actually turned out to be much simpler than that to the point where I initially had trouble grasping what he was saying.

What he'd actually been doing is simply scanning internet IP addresses and looking for publicly exposed web servers returning directory listings. This is literally as simple as going to an address such as http://127.0.0.1 and seeing a list of all the files on the system (sample address only). He'd then look to see if any of those files contained a .sql extension which would indicate a database backup... and that is all. I'll come back to why this data was there a little later.
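
To make the mechanics concrete, here's a minimal Python sketch of that kind of check written from the defensive side: point it at a site you own and see whether the root returns a directory listing containing .sql files. The URL, the listing heuristic and the requests dependency are my own assumptions for illustration, not the tooling the finder actually used.

    import re
    import requests  # third party: pip install requests

    def exposed_sql_backups(base_url):
        """Return any .sql links found if the site root returns a directory listing."""
        resp = requests.get(base_url, timeout=10)
        if resp.status_code != 200:
            return []
        # Crude heuristic: Apache and nginx auto-generated listings are titled
        # "Index of /"; other servers use different markers, so this is far from exhaustive.
        if "Index of /" not in resp.text:
            return []
        return re.findall(r'href="([^"]+\.sql)"', resp.text, re.IGNORECASE)

    # Sample address only, as above.
    print(exposed_sql_backups("http://127.0.0.1"))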

It can be difficult to know how to proceed after making a discovery like this. I could go direct to the Red Cross who runs the website but there's always the risk of it being swept under the carpet (I had no reason to believe that the Red Cross specifically would do this, but it frequently happens with other organisations). I could go to the Australian Federal Police but frankly, they've got enough really serious crime to deal with as it is. I could go to the media and it would certainly get immediate attention, but it would catch the Red Cross off guard and particularly given the fantastic work they do for the community, that's not something I wanted to see happen.

Ultimately, I elected to reach out to a contact at AusCERT. Many countries have their own CERT (Computer Emergency Response Team) and our local one was a channel I trusted to both take the incident seriously and handle it ethically. AusCERT is a not for profit organisation based out of the University of Queensland and they provide various services to member organisations (membership is a small annual fee) and fortunately, the Red Cross had a pre-existing membership with them. I also knew they were properly equipped with the right people and processes to take something with this degree of sensitivity and do the right thing by those impacted, my wife and I included amongst a huge number of other Aussies.

I spoke to AusCERT Tuesday afternoon and outlined the situation. They reached out immediately to the Red Cross and got back in touch with me Wednesday morning. We spoke again Wednesday evening, Thursday morning and again Thursday afternoon. The constant overarching theme of the discussions was how we could best contain the data and minimise the impact on the donors within there.

As of 12:00 NSW time today, the Red Cross has now issued a public statement which explains the situation and has been covered by the ABC.

What actually happened?

Being conscious that many non-technical people will read this post, let me set some context first. Most organisations have a raft of different systems, processes, people and partners that handle their data. I spent 14 years in one of the world's largest companies and saw firsthand on countless occasions just how far customer data spreads, either by design (working with a marketing agency or a data processing partner) or through lack of proper process (developers with access to customer databases, people managing servers without appropriate skills, etc). It's not unusual to see data pass through many hands. It shouldn't happen, but it's extremely common.

In the Red Cross' case, the data that was ultimately leaked was a database backup. That 1.74GB was simply a mysqldump file that had everything in it. Taking a database backup is not unusual (in fact it's pretty essential for disaster recovery), it's what happened next that was the problem.
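
If you've never looked inside one, a mysqldump file is just plain text full of CREATE TABLE and INSERT statements, which is why it's so easy to get a feel for what's in it without restoring anything. As a rough illustration (not something that was actually run against this data, and the file name is hypothetical), a few lines of Python are enough to list every table in a dump:

    import re

    tables = []
    with open("backup.sql", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            # mysqldump writes one "CREATE TABLE `name` (" line per table
            match = re.match(r"CREATE TABLE `([^`]+)`", line)
            if match:
                tables.append(match.group(1))

    print(f"{len(tables)} tables found: {', '.join(tables)}")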

The database backup was published to a publicly facing website. This is really the heart of the problem because no way, no how should that ever happen. There is no good reason to place database backups on a website, let alone a publicly facing one. There are many bad reasons (usually related to convenience), but no good ones. In fact, I show this anti-pattern in my security workshops; I've just spent the last few weeks training software developers in Europe on precisely how risky this behaviour is and I even have a live demo of it in the site I use for my workshops. Often, people don't believe that such an egregiously bad security pattern would ever happen "in the real world", but here we are.

The final piece that made all this possible was having directory browsing enabled on the server. The database backup should never have been there in the first place, but it's highly unlikely it would have been found without directory browsing enabled (the file name would not have been easily guessed, it wasn't as obvious as something like "database.sql"). Showing a public listing of the file contents of the server is a well-known risk and there's rarely a valid justification for this, precisely for the sorts of reasons demonstrated with this incident.

One really important point to make here is that whilst the data originally came from the Red Cross, it ultimately wasn't them that published it to a publicly facing server, rather it was a partner. That doesn't change the end result and certainly the Red Cross has still taken responsibility for the incident, but it's an important detail in the overall chronology of events. Still, it's highly unlikely there was a valid reason for them to provide the partner with such an extensive amount of data and I'm sure there will be many questions asked as to whether so much information should have been shared in the first place and indeed how much is shared in the future.

The other issue that's exacerbated the seriousness of this leak is the nature of the data within it. Let's take a look at what was involved.

What data was included?

The obvious one which should already be apparent by now is a list of blood donors. It's not just donors though, it's the appointments they've made and the other data around their identities. I briefly touched on it before, but here's the list in full:

  1. First name
  2. Last name
  3. Gender
  4. Physical address
  5. Email address
  6. Phone number
  7. Date of birth
  8. Blood type
  9. If they'd previously donated
  10. Country of birth
  11. When their record was created
  12. The type of donation (Plasma, Plasmapheresis, Platelet, Plateletpheresis, Whole Blood)
  13. When each donation occurred
  14. Donor eligibility answers

As I mentioned earlier, there were almost 1.3M records in the donor table, but that doesn't mean there were that many actual donors. Of these, 602k had no email address which is entirely feasible given many people would have donated blood by channels which either didn't require them to provide it at registration or they elected to withhold it. There were a total of 413k unique email addresses and many entries that used the same address due to multiple donations (i.e. my wife's). What all this means is that according to the Red Cross' statement today, there were approximately 550k actual people in the data.
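
The de-duplication itself is trivial once the donor table is in a workable form. As a hypothetical sketch, assuming the table had been exported to a CSV with an "email" column (both the file name and column name are made up for illustration), the counting comes down to something like this:

    import csv

    total, blank, unique = 0, 0, set()
    with open("donors.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            email = (row.get("email") or "").strip().lower()
            if email:
                unique.add(email)
            else:
                blank += 1

    print(f"{total} records, {blank} without an email address, {len(unique)} unique addresses")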

As with most data breaches, not all attributes are complete for every person. For example, I mentioned earlier that my blood type wasn't in the data and I suspect that's because if I'm honest, I don't know what it is myself therefore I wouldn't have provided it at registration when I booked the appointment. (There doesn't appear to be any post-donation data in there such as the results of any tests on the blood.)

One attribute of particular sensitivity is the collection of donor eligibility answers. Each donor is asked questions such as whether or not they're on antibiotics, if they're under or over weight and if they've had any recent surgical procedures. They're personal questions, no doubt, but one of them particularly stands out in terms of sensitivity:

In the last 12 months, have you engaged in at-risk sexual behaviour?

Clearly that is a deeply personal, private attribute that could be enormously sensitive if the answer is in the affirmative. Because there are many eligibility questions for each donor, there are a total of 7,343,537 answers in the system and naturally, many of these relate to the question of at-risk sexual behaviour.

Per the title of this post, I believe this incident has the unenviable title of being Australia's largest ever leak of personal data. There was Aussie Farmers Direct with 5k accounts a year ago and Aussie Travel Cover lost a six figure number of records earlier that same year (one person may have multiple records). Both Kmart Australia and David Jones had incidents just before that (both impacting a small subset of online customers) as well as Catch of the Day the year before with an undisclosed number of records exposed, although highly unlikely at the numbers we're talking about here. One thing is certain though - none of them had data anywhere near as sensitive as what the Red Cross holds on blood donors.

Who else has the data?

This is the question which is most concerning and the only answer anyone can confidently give is "we don't know". Part of the reason for this is that the mechanism used by the guy that found it is very simple and very widespread. Scanning the internet for everything from vulnerable code to connected devices to publicly facing backup files is something that happens constantly by many different parties. We saw that recently in the story I wrote up about how Regpack "lost" 324k payment records, again in the exposed Modern Business Solutions MongoDB database a couple of weeks ago and on countless other occasions. Scanning the internet for "things" has just become the norm and sooner or later, data like this will inevitably be found.

Then there's the individual himself who first reached out to me. Trading data such as this is an alarmingly frequent practice and it's common for individuals to exchange it with others. Obviously, that's not something we wanted to see happen and we'd need his support to minimise the exposure. I had a discussion with him on Thursday morning which produced two important outcomes:

  1. He maintains that he hasn't redistributed the data to anyone else
  2. He also agreed to permanently delete the existing Red Cross data he had

However, by his own admission, we can only take his responses at face value. The Red Cross has done the right thing in making a public statement about this and notifying impacted donors; with AusCERT's support, they've approached this as though the data is out there and proceeding with an abundance of caution is the responsible path to take.

HIBP and my own personal data retention

As I mentioned earlier on, people send me data like this to load into HIBP so that impacted individuals can learn of their exposure in incidents they may never have been aware of elsewise. However, this case is unique for two main reasons:

  1. The Red Cross has committed to notifying all impacted parties; there should be nobody who was exposed in this incident that doesn't hear directly from them about what happened, my wife and I included
  2. With the original party who located the data having deleted his copy, mine was the only one we know for sure still existed outside the Red Cross' systems

As a result, I offered to permanently delete the copy I was sent and not load it into HIBP. As of Thursday evening, that's precisely what I did - permanently deleted every trace of it I had. This isn't unprecedented, I took the same steps as part of the clean-up in the wake of the VTech data breach and for all the same reasons it made sense then, it makes sense now. As with VTech, this should give those who were exposed in the incident just a little bit more peace of mind that their data has been contained to the fullest extent possible.

Is it a hack? Or a breach? Or a leak? Does it matter?

One of the things that often happens after an incident like this is fairly selective wording of what actually transpired. For example, the Regpack situation I mentioned above was a very similar situation to this, namely that someone inadvertently published data to a publicly accessible location. Regpack were at pains to point out that their systems were not breached, instead referring to the event as a "data incident". I was critical of this approach at the time because it was obvious their intention was to downplay the severity of the issue as opposed to owning the problem and communicating honestly and transparently (there was other behaviour also consistent with this).

In terms of the Red Cross, it's hard to call this a "hack" simply because it didn't involve exploiting any weaknesses within their software. I mentioned SQL injection earlier on and that's frequently the root cause of breaches where exploiting system flaws is involved, as are attacks such as enumerating direct object references and numerous other methods that rely on faulty code. I've used the term "leak" throughout this post because in my view that's a fairer definition; they inadvertently published the data to the web and someone simply downloaded it.

But frankly, it makes very little difference to the people in the data set as the end result is the same: their very personal information fell into the hands of someone who should never have had it in the first place.

AusCERT and Red Cross' handling of the incident

There's no escaping the fact that this was a major cock-up on many levels and that's the simple, honest truth. Like many people who report security vulnerabilities or incidents like this, I've had experiences in the past where responsible disclosure and appropriate action on the part of the organisation involved have not been a smooth experience (000webhost and Nissan are two notable examples).

AusCERT have been absolutely outstanding. I won't go into all the details but they handled this with a professionalism and urgency far beyond what I expected. They were instrumental in helping the Red Cross contain the risk, prepare their public communications and identify the necessary steps to advise impacted parties. I always had complete confidence that they'd handle this precisely in the way that those of us reporting incidents like this would expect and they've done a standout job of that. I'd encourage not just other Aussies to reach out to AusCERT in cases like this, but folks around the world to contact their local CERT if ever they learn of serious digital risks like this (Wikipedia has a list of them).

As for the Red Cross, this was a massive shock to the leadership there. As unforgiving as I tend to be about serious security oversights like this, I can also sympathise with the situation they've found themselves in, especially given their purpose and the role they play within the community. There was never any indication that they wouldn't handle this in precisely the fashion an incident of this type deserves: it was a factual, honest and expeditious process. No sugar-coating what happened, no suppressing information from impacted parties and no blame apportioned to the individual who originally identified the leak. It shouldn't have happened in the first place, but it couldn't have been handled better once it did.

If nothing else, I hope that the outcome of this incident encourages others to exercise responsible disclosure themselves in the future.

Please continue to give blood generously

I was really conscious when I first started looking into this that the incident would make life hard on the Red Cross. It's going to cost them money, it's bad publicity and there's a real chance that people may actually feel less inclined to give blood. I want to lead by example here and do what I should have been doing far more frequently anyway:


I've booked an appointment for the first available spot at my local donation centre so come Monday, the Red Cross will have my blood. They also now have my data (again) and yes, it's the correct data with honest answers to all questions. This is not one of those "oh, just fabricate the answers" deals, this is an act that can save lives. I don't like that my data was exposed in this way but let us not lose focus on life's bigger issues.

If you read this and found it interesting, wherever you are in the world, take a short bit of time out of your day and donate blood. Help find a silver lining in this incident and use it as an opportunity to make a positive difference.

Edit: The source of the leak has now been identified as a contractor named Precedent.


Weekly update 6

Sponsored by: Barkly - Make security something your users actually care about. Download the IT Pro's Guide to Raising Security Awareness.

I'm home! Ideally, I'd be home recovering from travel but it hasn't quite worked out that way, particularly with the Red Cross Blood Service having a massive data leak. I blogged about that in some detail yesterday, but I wanted to talk about it in this week's update video and give some more context as to what went on and why I made some of the decisions I did. Plus, there's the mega-trip wrap up, a quick preview of some upcoming stuff on my Ubiquiti network setup and my thoughts on how the web is held together with sticky tape.

  1. The Red Cross Blood Service leak (yes, it's massive and no, it's not in HIBP - I explain why)
  2. A blow-by-blow account of my mega trip (we all share the most glamorous bits of our lives on social, here's some of the darker ones of my recent trip as well)
  3. The IoT DDoS is a reminder of how fragile the web is (stateless HTTP combined with cookies for auth needing anti-forgery tokens to prevent CSRF - it's all just held together with sticky tape)
  4. Barkly sponsored the blog this week (there's a free e-book behind that link!)

Apple's desensitisation of the human race to fundamental security practices

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

My son turned 7 earlier this month. I've been getting him into coding and teaching him the fundamentals of using a PC which I reckon is a pretty essential life skill these days. Part of that is helping him to understand the principle of secrets, namely that he should protect the PIN he's using to sign in to his Windows 10 machine. He's good at it too, being sure to shield the little laptop from view whenever he uses it with others around. But based on my experience today, if he was to walk into an Apple store with a faulty iPhone, he'd be taught a very different lesson:

The link in that tweet explains precisely the reason why we all know that handing over your unlocked phone is a bad idea. This is how it played out in the store:

Me: The speaker in the phone has got a really nasty crackle when holding it to your ear on a call.

Genius (and I'll be using that term sarcastically from here on in): Oh yes, I can hear that, it definitely has an issue, let me run some diagnostics.

[He fires up the local diagnostics on the device in front of me after I'd unlocked it with Touch ID]

Genius: Ok, let me take that out back and check it out for you.

Me: Sure, just let me lock it first.

Genius: Uh, no, we need it unlocked.

Me: It's a hardware fault and no, I can't give it to you unlocked, I'm a security guy and I think about these things.

Genius: Oh you can trust us, we take your security very seriously!

[Right about here I'm trying not to lose it because I know what that statement usually means!]

Me: Look, there's no way I'm giving you an unlocked phone while you disappear into a back room, do people seriously do that?!

Genius: Sure, all the time!

Of course I knew the answer to the question before even asking it, but I somehow felt I needed to rub the point in before storming out of the store, dysfunctional iPhone speaker in tow. I could have wiped it there and then, handed it over and later restored from last night's iCloud backup, but I don't like not having a fully working outgoing device before doing a restore to a new one. I also don't like the lag time due to poor Australian internet and whilst I could have driven home and done a local backup to iTunes, there's still the need to reconfigure a bunch of things that don't cleanly restore. tl;dr - I wasn't getting out of this without burning a heap of time one way or the other.

Walking away disgruntled by the whole affair, a number of things started really getting to me. First of all was how broken the ecosystem was when the only way they could support their product was to fundamentally compromise their customer's security. It's certainly not just my experience either:

That last tweet is a perfectly reasonable assumption and some device manufacturers seem to be able to get that spot on:

I'm not highlighting this because I do some Microsoft things, I'm pointing it out because it's precisely the way we'd all like to see things pan out. If it's a software or configuration issue, sit with the customer and sort it out but if there's a hardware issue that requires the device to be taken "out back", then there should be a means of diagnosing faults on a locked device. Yes, I get that we're talking about potentially competing objectives (strong encryption versus being able to operate some functions whilst the device is locked), but clearly the current situation is far from ideal.

The other thing that struck me is the paradox of how all the great things Apple has done with protecting devices can so easily be undone by a screwy support policy. I mean think about it - Touch ID, Secure Enclave and a track record of being resilient even to attempts to access a locked device by the FBI. Then a kid at the Genius bar turns around and says "oh yeah, just circumvent all that for us would ya?" I'm pretty sure I've seen this paradox represented somewhere before...


The bigger issue though (and the one which inspired the title of this post) is that it's a terrible message to be sending consumers. Apple have done the good work mentioned above precisely to keep the data out of the hands of unauthorised users. Look around you in an Apple Store and consider how many of the folks in there have the first clue about how to properly secure their devices. Like all of us, they learn based on the experiences they're exposed to and the message they're getting when they go into an Apple store is that it's ok to give an unknown third party their unlocked device. And yes, the person you're giving it to is an unknown party because you're not giving it to Apple (that's a company), you're giving it to a poorly paid and inexperienced stranger.

I've been critical of Apple in the past for very similar reasons insofar as that was a case of convenience trumping security. I love my Apple devices and I genuinely think they're the best phones and tablets on the market, but this attitude towards consumer security needs fixing.

However, I've worked out how to fix it - I'm going to buy that iPhone 7 I was thinking about, restore the 6S to it, wipe the old device and then have it repaired (or replaced) after which I'll eBay it. Somehow I think they'll be just fine with that solution...

The public Have I been pwned API now has a Creative Commons Attribution license

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

We're now going on almost 3 years since I introduced the Have I been pwned (HIBP) API. In fact it was one of the first things I did after creating HIBP in the first place because I wanted to make the data as accessible as possible and create an ecosystem of third party apps.

However, over time I've also had to deal with the API being used in ways I never intended. For example, I recently introduced the rate limit because I saw evidence of the API being used in a fashion that was inconsistent with what I (and most others) would reasonably deem to be ethical use. I wanted to make it as broadly accessible as possible in the early days, but ultimately there came a time where it needed to evolve.
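
For what it's worth, being a good consumer of the API mostly comes down to respecting that limit and identifying yourself properly. Here's a minimal Python sketch of what a well-behaved consumer might look like, assuming the v2 endpoint and rate limit behaviour as documented at the time of writing (the user agent string and email address are placeholders; check the current API docs before relying on any of this):

    import time
    from urllib.parse import quote
    import requests  # third party: pip install requests

    API = "https://haveibeenpwned.com/api/v2/breachedaccount/{0}"
    HEADERS = {"User-Agent": "example-breach-checker"}  # the docs require an identifying user agent

    def breaches_for(account):
        while True:
            resp = requests.get(API.format(quote(account)), headers=HEADERS)
            if resp.status_code == 429:
                # Rate limited: back off for as long as the Retry-After header asks
                time.sleep(float(resp.headers.get("Retry-After", "2")))
                continue
            if resp.status_code == 404:
                return []  # no breaches for this account
            resp.raise_for_status()
            return resp.json()

    for breach in breaches_for("foo@example.com"):
        print(breach["Name"])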

One aspect of the API I didn't give a lot of thought to early on is licensing. In fact I deliberately didn't touch on it because I wanted to keep things as open as possible so I could see the directions people wanted to go with the service. The API consumers page lists many excellent uses of it so far including mobile apps, various use cases in different programming languages and all sorts of implementations I never thought of. All this is great - it makes me enormously happy :)

Increasingly though, I've found uses of the API that make me less happy, namely cases where I've looked at a site that represents itself as being a data breach monitoring service and thought "Wow, that's really similar to HIBP", only to realise that each request to it is then passed directly to the HIBP API anyway!

So what's the problem with this? Well there's a few things and one of them is simply the confusion it creates. I often have people contacting me and asking "Hey, is this HIBP data?" because they're using another service which not only operates identically, but returns precisely the same results. Are they getting data from HIBP? Is HIBP getting data from them? It's confusing.

In some cases, it isn't simply a case of another service using the HIBP API for the public good, they're commercialising it too. Now here I have a more serious issue and it's not that I'm losing money (I want people to use HIBP for free), rather it's that work I wanted to make accessible to the masses is being monetised by someone else. It's hard to put into words precisely how that makes me feel, but it just doesn't sit well and I think most people understand that.

I've always run HIBP very transparently and I want that ethos to extend to those who wish to leverage the public API. I don't want to stop people using it in the way they are, but I do want them to show the same levels of transparency that I do when they create publicly facing services. As such, the Creative Commons Attribution License was a perfect fit and I've now added this to the API documentation:


This is really simple: if you want to use the free, publicly available API then just let people know where the data is coming from. I don't mind if services want to charge for that as part of a broader offering but I do want people to know the source of the information. If you're using the API and you want to "white-label" the HIBP data, then contact me and we'll discuss commercial options, some of which I've written about in the past.
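
In practice, attribution can be as simple as passing the source through with the results. Here's a hypothetical sketch of what that might look like if you were wrapping the API in your own service (the function and field names are entirely made up for illustration):

    def with_attribution(account, breaches):
        # Whatever shape your own API takes, make the origin of the data explicit
        return {
            "account": account,
            "breaches": breaches,
            "attribution": "Breach data sourced from Have I been pwned (https://haveibeenpwned.com), "
                           "used under a Creative Commons Attribution license",
        }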

The last thing I'll touch on here is the premise of data being free and open: this is not that. In fact I put a lot of effort into making sure the data is not free and open because frankly, it's just not that sort of data. For example, last year I wrote about how I'd handle the Ashley Madison data precisely to ensure it wasn't open. The rate limit I mentioned earlier is also a good example of how I've deliberately worked to ensure the data is not free for all the sorts of reasons I explained there. It's not in the same class as open source software either; this isn't a funky JavaScript library you want spread as far and wide as possible to help developers do their job, it's a serious amount of often sensitive information that needs to be carefully guarded.

I hinted at this earlier today on Twitter and the response was (almost) unanimously positive with most people being very understanding of why attribution is important. By adding this license, not only does it clear up some of the use cases I've already alluded to, it also paves the way for new consumers to have a much better understanding of how they can use the service right from the outset. I'll be reaching out to any services I know of that aren't in keeping with the philosophy I've outlined here and supporting them whilst they make the transition - I'm certainly not about to cut anyone off overnight! But I do hope that ultimately, this leads to greater transparency and more people having a better understanding of where their data has been compromised.

New Pluralsight Course: Modernizing Your Deployment Strategy with Octopus Deploy

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Here's a little-known fact for folks that have only tuned in more recently: I had a life before doing security things. I know, it seems like a long time ago now, but there was a time where all the other things that go into the software development process were highly topical for me. In fact, some of the most popular content on my blog over the last 7 years has been the You're deploying it wrong series where I walk through the setup of TeamCity to automate deployments from source control. That was way back in 2010, but I've now found myself revisiting the whole deployment world in a new Pluralsight course.

A few months ago when I was down in Sydney at the NDC conference, I teamed up with Damian Brady who works for Octopus Deploy to record a "Play by Play" course. Damo is also a good mate which lends itself really well to this style of course because it's the both of us sitting there having a conversation about technology. But don't worry, you're not just looking at us, there are actual walkthroughs of how the product runs as well:


Octopus has become hugely popular in recent years and the Aussie startup is doing great things on the world scale. I genuinely wanted to understand more about how it's put together and that created the perfect opportunity to record a course of this nature.

In this 58-minute course, we talk about some of the incumbent mechanisms for deploying software, the challenges they pose and then how Octopus addresses those. Of course, I had to grill him on the security model too and we cover off other aspects such as how you can grab Octopus Deploy in the Azure marketplace, deploying to non-Windows environments and how auditability is achieved which is a pretty important factor in many organisations.

This is a great course to watch while you're sitting down having lunch or on the daily commute because it's not technically deep but goes far enough to help you understand what Octopus Deploy is all about. Play by Play: Modernizing Your Deployment Strategy with Octopus Deploy is now live!

Ubiquiti all the things: how I finally fixed my dodgy wifi

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

I'm increasingly of the view that both my time and my sanity are worth more and more as the years progress. Particularly in my independent life, it really can be that black and white - if I can't work, it costs me money. Plus, I want to be happy and few things make me less happy than computer bits going wrong:

I bought 2 Linksys WRT1900AC wireless routers when I moved interstate into a big house just over a year ago. They were well-reviewed and I figured I'd put them at opposite ends of the place, throw in a couple of switches as well and we'd be all good. Now maybe if I was your everyday garden variety home user it would have been ok, but I'm not and if you're here on my blog reading this then you're probably not either.

The first thing that got me with these devices is that it felt like nothing had changed since the very first wireless access points I had over a decade ago. Same basic interface, same simple settings, same "just get it set up then never look at it again" paradigm. Also the same approach to updating the thing: you're not going to see many changes to the firmware. I had Billion devices before Linksys and I don't think I ever saw a firmware update over many years which worried me not just because I'm sure there was broken stuff in there, but because it was never evolving. It was like "here's your router, uh, good luck!"

And then the Linksys started misbehaving. I'd be working away on the laptop on wifi and the connection would just drop to almost nothing. Instead of 40Mbps down per the tweet above, it'd drop to less than 2. There's 95% of my connection speed gone for no apparent reason. Wired connections were fine and power-cycling the router would fix it, but every time a page was a bit slow I'd be wondering if it was cactus again. After venting on Twitter, I got a variety of responses including that I should install the open source dd-wrt firmware or that I should buy a power adaptor that can automatically cycle the power every night. No, no, a hundred times no to both - if I buy a product then I expect it to work as advertised and not need to implement hacks to keep it alive. There's also extensive evidence out there that this is just what many Linksys products do and the more I looked into it, the less convinced I was that it was ever going to be acceptable whilst I had Linksys devices in the house.

But I also got a resounding chorus of people telling me to just "do it properly" and go buy Ubiquiti bits. Admittedly, this bugged me because I didn't want to buy new stuff, I wanted the stuff I already had to work properly. But the more I thought about it and particularly as I looked into it, Ubiquiti made a lot more sense. Plus, I frankly just had to get over my sense of loss aversion - it is what it is with the Linksys devices, now what's the smartest thing to do? It was Ubiquiti. Let me share what I got, how it works and how it's finally solved my long-running wifi dramas.

Ubiquiti things

First and foremost, Ubiquiti doesn't fit the usual consumer-grade mould, for example, they make gear like this:


For when you absolutely, positively need to stream data at 450Mbps over a 30km range! It's full on gear and when you browse through their product page you get a sense of who they target. There's a lot of gear there you'll find in an office or commercial environment and it's bereft of pretty colours, gamer-style antennas and other warm, fuzzy, touchy-feely things. It's stuff that has purpose.

But here's the thing - it's all pretty single-purpose stuff too. The consumer gear crams everything into one box that tries to do everything which you then sit in a corner and it's "job done". For example, the Linksys I was replacing is The Homer:


Go down the Ubiquiti path and you're buying different bits for different purposes. Instead of being a single purpose Homer, it's multiple individual vehicles: one is a two seat Italian supercar, another is a people mover, then there's a track day special and yes that means more things but they each do what they do extremely well without the compromises of multi-purposing. Want DHCP? That's a device. Want wifi? That's a different device. Hard-wiring some bits? Another device.

All these devices take some planning so let's get onto how I put it all together.

Requirements and design

Because not having an all-in-one device was a bit foreign to me, I decided I needed to crowdsource some help on this one. I'd had a bunch of chats with people online about this but it was tricky in short Twitter discussions so I wrote the whole thing up in a Gist about building Troy's network which I'll embed here as it explains the background and design decisions really well:

Let me clarify a few important points here:

  1. The house is large - 500 square meters is 5,381 square feet - and it's spread out over 3 levels
  2. Going back to the first para of this post, I wasn't going to take any chances this time and I wanted to overcompensate
  3. I decided I didn't need 3 switches after all, I'd run the lounge devices (Apple TV, Xbox, etc) off wifi and save on the extra network hardware

Do have a read of the comments on that Gist too and while I'm here, a big thanks to Drew Davies and Nick Craver who were enormously helpful in sanity-checking what I was doing.

Ordering and what's in the box

With the design in the Gist above now pretty much solid, I ordered all the bits locally. I'm always pretty transparent in these posts so here's what they cost me (all Australian dollars so take off about 24% for USD, 31% for EUR or 38% for GBP):

  1. The 5 pack of wireless access points: $1,077
  2. The 2 switches: $762
  3. The Security Gateway: $197
  4. The Cloud Key: $139

These were all via local sellers on eBay with strong histories I was confident I could get support from. I could have bought them cheaper from overseas or via less reputable sellers, but I didn't want to take chances. Yes, it's nearly $2.2k but I'll eBay the 2 Linksys wireless routers (the faulty one goes back to them, they sent me a brand new one that's still boxed) and the 2 Linksys switches I have and that'll be about a quarter of the Ubiquiti cost right there. The remainder is spent with pre-tax dollars and I'll get at least half a decade out of it so back of the napkin, it'll take less than 50c a day out of my pocket for something that's an absolutely critical part of what I do professionally, as well as personally. I could also have saved about half the cost by using a single switch (I could repurpose one of the Linksys switches) and only a couple of wireless access points. I was happy to pay to overdo it, but you get the point - you can do it a lot cheaper.

Anyway, I ordered all the gear whilst away on my mega-trip and collected it all when I got home. I captured the following Periscope video as I unboxed it all:

And once arranged in a pretty, photogenic fashion:


All that is just the warm-up though, let's now make some stuff work!

Getting the basics working

I wanted to start with just the most fundamental bits working which in my mind, meant this:

Modem -> Security Gateway -> Switch -> Cloud Key

So I did just that: the modem was the only hangover from the old setup so that stayed and I plugged it directly into the Security Gateway. These were sitting next to each other in my lounge room (because that's where the cable for the web enters the house) so it was a no-brainer. I needed the Cloud Key in order to actually control everything though which meant I also needed a switch to plug it into. I went out from the Security Gateway, into a wired ethernet port in the lounge room, out through the patch board into the garage and then into the Ubiquiti switch. Plug the Cloud Key into that and it's job done.

I connected my laptop into the switch via ethernet and browsed over to the Cloud Key admin interface. Everything worked immediately and frankly, I chucked out the instructions after the first few minutes and just followed the on-screen prompts. I'm not even going to bother repeating the steps here because it's ridiculously easy - there's no story to be told in clicking the "next" button!

I patched the point which terminates in my study and dropped in another switch up there. This then gave all my study bits connectivity as well (desktop PC, NAS, printer).

Then I patched a bunch of points around the house for the wireless access points, each one ultimately terminating at the Ubiquiti switch in the garage which provides them power over ethernet. The only other step required now was to "adopt" each access point which literally meant a button click and now it was under control with all the other UniFi enabled devices.

Let me just qualify that term too - "UniFi": it's actually very simply explained on their UniFi page:

UniFi is the revolutionary Wi-Fi system that combines Enterprise performance, unlimited scalability, a central management controller and disruptive pricing

Right... In simpler terms, it unifies (get it?) all the management into the one portal run off the Cloud Key which makes for a neat management experience. All my bits are aligned with the UniFi product line (other Ubiquiti bits aren't) which makes for a nicely integrated experience.

All said and done, I ended up with this:


1 Security Gateway, 2 switches and 5 wireless access points. Beautiful! Let's set them up.

Configuring the wired bits

This was all a piece of cake and it literally consisted of me working through the menus and thinking "I wonder what this does..." For example, when I drill down into the switch I put in the garage at the patch board by clicking on the device in the image above, I found this:


Already you'll start to see a lot of info not exposed by your usual consumer-grade stuff. There's more again in the uplink section:


There's the Security Gateway sitting upstream of the switch and the activity it's both seen and is seeing. Here's everything downstream of the switch:


You can see that I have 4 wireless access points plugged directly into the switch as well as the other switch in the study. If I was to drill down on that one, I'd see another access point connected directly to it.

In the earlier image, you may have noticed the coloured indicators across the top of the properties window, let's take a closer look at those:


I love how much information is coming back here. For example, port 5 is only talking at 100/10Mbps, the slowest of the lot. That's because it's connected to the base station for my NetGear Arlo security cameras and inevitably it doesn't require quite the same throughput as the things on the other ports. Speaking of which, there are 5 ports using PoE which are the Cloud Key and 4 of the wireless access points. The 1Gbps port with no PoE links to the switch in the study and whilst all this may be more information than what I actually need, it doesn't get in the way of the essential stuff.

Moving on, I used the configuration section of the switches to name them logically and then did the same with the access points:


There's obviously a heap more configurability beneath there and I won't delve into all of it, but you're getting a pretty good idea of what goes on by now anyway.

Configuring the wireless bits

Onto the wifi and it's a case of dropping into the wireless networks section and just creating a new one:


You can go a whole lot deeper on things like rate and beacon control too if you want to get down to that level:


What I really like about Ubiquiti's approach is that whilst you can get down to this very fine level of control from within the UI, it never gets in the way of configuring the fundamentals. I don't have a need for rate and beacon control and I don't see it by default or even when drilling down into the advanced options (it's collapsed by default), but when I need it, it's easy to find.

Once the wifi is configured, it's pushed out to all the access points. Remember, this is all being set up on the Cloud Key and the config then needs to be applied as required, but that happens automatically when you save any settings. It's a seamless experience.

As with the switches, there's a heap of configuration options once you drill down into an access point:


Of particular "cool factor" is stuff like the spectrum analysis which allows you to run an RF scan on the network from any access point:


That's 5G from the living room drilled down on channel 100 usage, here's 2G as well:


This can be great for identifying where you might have interference on certain channels within the environment.

Then there's band steering:


Apparently, clients aren't always the best at connecting to 5G when it's supported and will often fall back to 2.4G. There's a good discussion on this in the Ubiquiti forum (in fact there are many good discussions on the forum with lots of support from Ubiquiti themselves) but in short, it should encourage your things to communicate via the faster 5G network. I actually only configured this while writing this blog a week after setting up the network and I found a lot of 5G capable devices were on 2.4G. Now having said that, I still see a lot of 5G capable devices on 2.4G which doesn't bother me for devices like the iPhone, but I need to look a bit further at why my laptop in particular may not be jumping onto the faster network at every available opportunity.

So ultimately, I ended up with 5 wireless access points spread around the house with each supporting both 2.4G and 5G but "steering" clients towards the latter. Roaming around between them is seamless; I have absolutely no idea when, for example, my iPhone jumps between access points whilst on a Skype call. Ubiquiti implements what they refer to as Zero-Handoff and it certainly does seem to make the whole experience as slick as I could imagine with no noticeable change in behaviour as I move around between APs. However, I can always find out what the phone did by looking at the events:


I went out of range at 08:17 when I took my son to school, not returning home until 11:11 at which time I wandered around the house a bit until the time of writing at midday. What an excellent way to track the comings and goings of your significant other! Or perhaps older kids ("I know how late you were out, the network told me!"). And in case you're wondering, yes, I named my network HTTP403 :)

But I'm getting ahead of myself, let's get everything reconnected.

Reconnecting everything

By far, this was the most painful bit of the entire install. Combine a strong WPA2 password with many devices and a bunch of them not being things you can easily enter anything into and you've got yourself an evening of tedium. For example, I have a WeMo switch for the coffee machine and if you change your router you have to restore the device to its original settings which means reconfiguring things like schedules too. I also ended up entirely reconfiguring the Sonos as well and the Withings scales were painful to connect again. But it's a one off - you do it and you're done - and that means you get a really shiny dashboard:


I love this. I love that I can see everything that's connected (I did have to name each device as I showed with the switch earlier on), I love seeing where it's connected to and I love seeing which devices are active. When I snapped this, I'd just fired up the W540 after it had been dormant for a long period and it was downloading updates. My wife was watching Netflix on the Apple TV in the lounge room and the Yoga 900 was downloading an update. For folks like me that love visibility into this sort of thing, it's like being a kid in a candy store.

There's just information everywhere. For example, here's what I'm seeing on my iPhone:


If I drill into the history, I can see what it's been doing data-wise recently:


I did an iOS update last night so it pulled down quite a bit, plus it did the usual iCloud backup thing so pushed up 163MB.

But here's the big question most people have in mind when implementing wifi: is it fast? Let's start with one easily quantifiable figure and that's the speed test embedded in the tweet at the start of this post. It's 40Mbps down and as for the up speed, well, we try not to talk about that in Australia (it's almost impossible to get more than 2Mbps up in the home unless you're on the National Broadband Network). That was on 5G with the Linksys gear, here it is on 5G with Ubiquiti:


That's healthy! In fact, it's as fast as my connection will go - I even saw a few Mbps less just now when I tried it on my desktop PC wired directly into the switch. Now of course there's lots of factors involved here as well, not least of which is the proximity to the AP and if there's any obstructions. In my case, it was about 6m away and separated by a concrete wall. But what I'm happy about is that this is my Lenovo P50 sitting on the kitchen bench where I so frequently use it and I'm now getting the maximum possible speed through to the internet which is what was most important for me.

This is by no means a comprehensive test, but it's the one that most mattered to me. If you want to get into more technical detail, check out the post titled Ubiquiti UniFi made me realize how terrible consumer Wi-Fi gear is on Ars. That pretty much sums it up for me too :)

Now all this is cool, but wait until you see the next bit...

Analysing the network

I wanted to leave this until now because it doesn't make a lot of sense until everything is plugged in and traffic is being captured, so here it is, in all its glory:


Need a quiet moment to yourself? Yeah, it's that good. Let's go through it bit by bit from top to bottom:

  1. The 44ms latency is what I'm getting on the outbound connection and I'm pulling down 49Mbps whilst pushing up 0.94
  2. I've got 1 WAN device which is the Security Gateway
  3. I've got 2 LAN devices being the switches
  4. There are 5 WLAN devices which are the access points
  5. Download throughput and latency are self-explanatory and show traffic patterns that align with waking hours
  6. The devices on the 2.4G and 5G channels are represented by little dots
  7. I've got 8 network devices serving 18 clients that have transferred 217GB of traffic since I set this up

That last stat is extra cool and it's enabled once you turn on deep packet inspection (DPI). Here the Security Gateway is inspecting the traffic so that it can give you reports like this:


DPI is actually still indicated as being in beta at the time of writing and there are a few rough edges but it's still very well presented. For example, there's a lot of Dropbox traffic in there, let's see who that's coming from:


Ok, that'd be mostly me then! Now just in case anyone asks, yes, Dropbox traffic is encrypted but a man in the middle can still observe the service the traffic is going to and which clients are sending it hence the ability to do visualisations like this.

Actually, the Dropbox graph is a good example of where it's still a little bit rough. We're seeing 87.2GB but it's not immediately clear what duration this is over (I assume since day 0). The clients beneath it are also a bit misleading - we're looking at current activity here as opposed to historical shares of the 87.2GB pie. But hey, it's beta and once the data exists they can always put more intuitive visualisations over it later on.

In fact, to the point about later updates, Ubiquiti is actively working on new features:

This is one of the things I'm excited about and it's not just that software is being maintained, it's that they're actively adding features. I've never had a home access point that's done that and frankly, I've been lucky to ever see any updates of any kind on most of them!

Guest network

Just a quick one here - of course there's the ability to create a guest network:


What did surprise me though is the extent to which it can be customised which is, well, pretty full on:


This brings us back to the premise that this isn't a consumer product and that it's designed to support all sorts of use cases you're not going to see in the home. Changing the authentication to "Hotspot", for example, allows you to configure voucher and payment based authorisation.

Finally, on the guest side of things, there's easy access to filtering down the subnets available on that network which is pretty handy in terms of keeping guests isolated but also having the ability to give them access to certain devices if desired:


Other controls

There's more configurability than what I have time to cover, but let me give you just a sense of things.

I could decide that I want to limit the speed some users can achieve by setting upload and download limits in a new user group:


I can then create a new wireless network and apply the group to it:


And because this is for the kids, I've also restricted their hours so they're offline by 18:00 when it's a school day the next day, but they have until 20:30 when the following day is free.

Mobile clients

For me personally, these aren't particularly important as I'm always close to a PC which is going to be a much better experience if I'm configuring anything, but they've actually got pretty neat iOS clients:


Actually, the one surprising thing I found was that the mobile clients seem to be the only place where I can see which devices are on 2.4G versus 5G! I'm not entirely sure why that is (a possible answer is that I'm simply missing it on the portal), but here we are.

Other bits

A few other things that have come up as I've been researching, configuring and sharing progress online:

There's some contention about whether or not you should have a Security Gateway or an Edge Router, there's even a video dedicated to the discussion. The tl;dr is that the Security Gateway works with the UniFi ecosystem which gives it nice integration but lacks the power and configuration options of the Edge Router. The latter has greater configurability around more complex network setups and also adds support for things like... DPI! But this is a 5-month-old video and since that time, things have moved on and obviously, DPI is a great example of something that's now in the Security Gateway too. You'll also notice that it's a very different looking portal in my screen grabs and clearly a lot of work has gone into the software since the video was recorded. I'm sure there's still not quite feature parity, but it's worth doing the research based on what's actually supported today and what's important to you personally.

One criticism I've seen is that the guts of the Ubiquiti gear is essentially just consumer grade chipsets. I'm not sure this is necessarily a detrimental attribute though - using chips from the one manufacturer across various downstream hardware vendors is very common. It's also worth noting that's a 2-year-old post and things may have moved on since then, but I'm just not sure that common components are really a story.

There's a bunch of other stuff I haven't touched on here because I haven't had the need to: VLANs, site to site VPNs, airtime fairness and so on and so forth. If you want more info on these, a combination of the Ubiquiti website and their forum should have you well and truly covered. Or just ask a question in the comments below and I'll do my best to answer it.

But do I trust the security of them?

I'm going to touch very briefly on this simply because people have asked the question and it deserves an answer. I have no less confidence in the security of Ubiquiti gear than I did with Linksys or any other provider for that matter. That's a very caveated statement because frankly, it's not something I've delved into. I don't want to delve into it either; I have so many devices, apps and all manner of other connected things that I simply don't have the time or inclination to delve into all of them.

One area where Ubiquiti does come up security wise though is when they featured in DDoS attacks last year. This was not their finest hour - vulnerabilities in their gear caused equipment to be enslaved in botnets which is not what you want to see happen with your network things. Unfortunately, this is what does tend to happen with network equipment on an alarmingly regular basis, just check out some of the examples involving CSRF on routers.

Where I do have more confidence in Ubiquiti than the other players I've dealt with before is in what I mentioned earlier, namely that they're maintaining their things. The product is being improved and the software regularly updated. They're used to shipping updates and that gives me a great degree of confidence that when they next need to fix a vulnerability (and there will be a next time - everyone is in this boat), they should be well-positioned to do it quickly.

Summary

Obviously, I like the gear :)

I've been running the setup described above for a week now and it's been rock solid. Great coverage, no drop outs like before and some very interesting insights about what my devices are doing which I never knew before. But you'd expect all that, especially as I over-compensated on APs!

What I didn't expect was how easy it would be. I was admittedly daunted when I first started investigating this setup what with the Security Gateway and the Cloud Key and other concepts I'd never had to consider in the home before. But I wouldn't do it any differently and if you're reading this and thinking "It's nice, but it's pricey", ditch one of the switches off my config and stick with 1 or maybe 2 APs depending on the size of your place and call it done.

Just do it - you'll never look at home networking the same again!

Weekly update 7 (the island edition)

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Weekly update 7 (the island edition)

Apparently, after doing several weekly updates from different locations across the globe, last week's one from my home office was rather boring. Now maybe that was just a noisy minority saying that, I don't know, but I thought I'd test the theory and this week I headed out on the jet ski to one of my favourite little island spots. I still got all the same content in, albeit from a more interesting spot. If you love or hate this approach, let me know, I'm still seeing what resonates and what doesn't in these videos.

Getting back to business, this week I cover off issues such as my experience with the Apple Store wanting my unlocked phone, the Creative Commons with Attribution license on HIBP, a new Pluralsight course I've done on Octopus Deploy and my very shiny Ubiquiti network bits. Here's the video - the sunniest one I've done to date!

References

  1. Apple's shoddy security example and unlocked devices (no way, no how am I giving my unlocked phone to anyone whilst they take it out of my sight)
  2. I put a Creative Commons Attribution license on HIBP (yes, people were really just sitting on top of the API and selling access to the service I built for free)
  3. New Pluralsight "Play by Play" course on Octopus Deploy (watch Damo and me go through the awesomeness that is Octopus Deploy)
  4. All the details about my new Ubiquiti network (you want this - trust me - you really want this!)

Bonus material

Because the island itself isn't enough awesome, I captured this on Periscope as I was heading over:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

I've had this blog post in one form or another of draft for several years now. I hesitated to complete it, in part because at the best of times cultural observations can easily be misinterpreted and also in part because of the role I had in working with many outsourcing vendors across Asia. Whilst the former hesitation has in no way changed, the latter has and I think it's a genuinely interesting topic worth sharing, particularly before my outsourcing memories fade too far.

One thing before I begin: these are opinions based on personal observations. I'll qualify them wherever possible and articulate my experiences as far as I can but they remain just that - my experiences. I'll also be as culturally sensitive as I can but ultimately I'm still going to comment on cultural traits. It'll resonate with some people and it will upset others but I hope that amongst all that, there are some genuinely valuable insights.

Here's what I've learned over many years of outsourcing software projects to India, China and the Philippines.

Who am I to talk about offshoring / outsourcing / other cultures?

In what is now a former life, I spent a great deal of time sending work overseas. In fact, I spent quite a bit of time over there myself in some of the places I'm going to talk about. That former life was as a software architect for Pfizer and in the last half dozen years of my 14-year tenure, I had responsibility for software architecture in the Asia Pacific region. For those not from this corner of the world, Asia Pacific is a sizeable chunk of the globe:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

The region covers everything from Pakistan in the west to obviously China, over to Japan and down to my corner of the world in Australia and New Zealand. This is a region which is home to more than half the world's population, 2.7 billion of whom are in China and India alone. The size, population and diversity is enormous.

Getting back to Pfizer, whilst I was originally brought on to build software, over the years the organisational strategy for the region changed until they reached a rather simple conclusion: outsource all the things. That was the standard practice and with this change, my role evolved to one of working with vendors rather than working with developers. I got involved in projects ranging from basic brochureware websites to large clinical research systems, mobile apps to POS terminal implementations and all sorts of supporting initiatives such as iPad rollouts, organisation wide operating system and browser upgrades and a raft of other internal "enabling functions". It was broad.

Over the years, I worked on what must have totalled hundreds of projects with dozens of different vendors, particularly in India, China and the Philippines. It's those countries and indeed outsourcing observations in general I want to talk about here. Call it part of my post-corporate therapy, if you will :)

Just one other thing before I start talking about cultures as well: I lived most of my teenage years in Europe and Asia, 3 of them in Singapore while finishing high school there. I lived amongst, went to school with and spent a lot of time socially with the predominant 3 cultures that make up the island nation, namely Chinese, Indian and Malay. I chuckle these days when people suggest that because of my demographic I don't know what it's like to be in a minority group. Yeah, try being a 6' 5" blonde Caucasian in an intensely nationalistic small Asian nation and see how much you stick out compared to everyone else! Anyway, the point is that the observations I'm going to make in this post aren't those made by someone with only a traditional western background to compare experiences to, far from it.

Why outsource?

Let's start with the promise of outsourcing: People are expensive and you've gotta keep a lot of them on hand to build software products and then have them ready to support them. Whether they're working or not, you need to pay them and they may not even have the specific skills you need on each and every project which may mean going out and hiring even more people.

The promise of outsourcing is the human equivalent of elastic cloud scale; you use what you want when you need it and only pay for what you consume. A project comes up so you flick it to a vendor with the resources to do the job thus "unburdening" your own org of needing to keep those heads on staff.

Now to be clear, outsourcing alone doesn't mean sending it offshore, that's another dimension altogether. Let's talk about the value proposition sending work to another country presents.

Why offshore?

The single biggest value proposition of offshoring is precisely the point this ad makes about labour costs:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

Australia is expensive. America is expensive. The UK is expensive. I pick those 3 because it's where the bulk of my readership comes from and if we look at the average wage by country (you'll always get different figures on this, I'm going by the World Bank's data), they all do exceptionally well compared to the rest of the world:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

Now, much of the offshoring rationale says "Hey, why are we paying expensive developers in these countries when we could be paying cheap developers in other countries to do the same thing?!". Let's take those 3 very affluent countries and sit them next to the 3 I've had the most experience offshoring to:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

That's a remarkable difference. Now to be fair, particularly in emerging markets like these 3, the averages are brought down significantly due to large swathes of rural areas and significant portions of the country living in poverty (a quarter of the Filipino population lives below the national poverty line) and the disparity between white-collar roles here and over there is less. In my experience, it was usually more a factor of 4 or 5 times the hourly rate in the expensive markets but the premise that there's a huge wage disparity still holds true. The point is that it makes for a very shiny ROI when put into a management PowerPoint deck.

So that's the basic premise of outsourcing and why organisations like to send the work to other countries. I'll delve deeper into that later on, but let's get onto the three locations in the title first.

India

India has become the canonical offshoring location for organisations all over the world. In fact, I dare say the very name of the country has become synonymous with offshoring and the phrase "Indian Outsourcing" is ingrained into many an IT manager's psyche. There are some good reasons for this:

  1. The British colonial history means they have very good English skills
  2. They have a huge population - the world's second largest with more than 1.3B people
  3. They've invested heavily in business process outsourcing (BPO) and high tech jobs

They also have some seriously big players in the outsourcing game: Tech Mahindra, Infosys and Wipro to name just a few. I've had experience with all of them across many projects as well as with a bunch of smaller players. There were some interesting cultural trends that really stuck out at me and again, these are my observations but they're ones made from many years of experience. (India is also the one country discussed here that I didn't visit personally.)

Many of the developers in India are produced en masse via intensive training programs focusing on a discrete technology. Now of course we have universities and other tertiary avenues available in this part of the world too, but what particularly stuck out at me in India was the niches the developers would fit into. An example of this is Google's recent news about training 2 million Android developers in India - that's a huge number of people to train around a very specific stack. What I'd typically find is people would specialise in one discrete part of a software component and that's all they'd do. Now of course we also have that in Aus or the UK or anywhere else, but we also have the "full stack" developer and they're a much more common beast over here.

The way this would manifest itself in projects is that you'd end up with a lot of people involved. I'd look at estimates for work and quiz the account manager: "Why are there so many roles represented here?" (work would usually be broken down by roles and their costs). There'd always be junior people and senior people and people who specialised in the API and others to do the website and others to do the DB and so on and so forth. Now to be clear, there are projects that require this due to their sheer size but the same pattern would repeat across even small units of work. It would get to the point where I'd joke about the separation of duties extending to the point where someone would write the open tags in a website and someone else would write the close tags! This has a cost that is far more significant than just the obvious people time too, something I'll delve into more a little later.

The other really stark observation about India (far more so than the other countries I'll talk about) is the rate of churn. People were always leaving. We've certainly seen the traditional philosophy of company loyalty eroded here over the years, but over there it was unusual to see someone sticking with the organisation for more than a year or two. I suspect a large part of that was the scramble up the corporate ladder and hopping to a better opportunity somewhere else can accelerate that. And really, who could blame them? But the churn meant for a lot more friction on projects and that's just something that never appears on a line item in a proposal. I'll come back to that later on when I talk about the fungibility of developers.

Another pattern I found time and time again when outsourcing to India was that they'd want really detailed documentation. This is always going to be a contentious issue and there are many different views of how much should be done under what circumstances. But more so in India than the other two locations I'll talk about shortly, detail was important to them. There were many occasions where we would make assumptions that a feature requirement clearly implied certain things only to later discover they were deemed "out of scope". Now that can happen in any project in any part of the world, but it was extremely prevalent in India.

To round out on a strength, what seemed to work best in India was units of work that were discrete in scope (they were fairly fixed), well documented and followed patterns that the vendor was already competent in. I think a lot of it has to do with that "developer factory" mentality where there's a lot of people who are trained in very specific things and if you can task them with those things then that's your "happy path".

China

I spent a lot of time in China and it's a fascinating place for many reasons. For example, you've got 721 million internet users there so we're talking 2.5 times more people connected than in the US. But it's much more significant than that too because almost 90% of the US is already on the web whereas only just over half of China is. But for those who are on the internet, it's not the same web as we know in other parts of the world. Thanks to The Great Firewall there's no Facebook, no Twitter, no YouTube and very onerous laws around personal privacy and anonymity. It means that China has an internet ecosystem all of its own and here's a stark representation of that: India has 462 million internet connected people and is the 6th most represented country on this blog with 4.04% of all visitors coming from there this year. China - with one and a half times as many connected people - is the 49th most represented country here with only 0.21% of my traffic. (Note that these figures are from Google Analytics and could be skewed due to Chinese blocks on Google, but every other metric I have from Cloudflare stats to anecdotal evidence all points to a near non-existent Chinese audience.)

Much of the promise with China is that you've got a huge population of rapidly urbanising people that are getting high quality educations. They're a very tech-savvy nation and we're seeing them go from strength to strength on that front with everything from emerging (and now dominant) hardware providers such as Huawei and Lenovo (my personal favourite for PC gear), to services such as Alibaba which is now well and truly on the world stage. If you're into tech and not watching what's going on in China, you're missing what will probably be the world's most significant source of technology innovation and growth for decades to come.

One thing that makes China difficult to work with though is English literacy. They don't have the colonial background that India does and because they're such an insular country so reliant on their own Chinese language services, exposure to English is far more limited and clear communication can be tricky. It means that you're pretty much always going to be at least one layer of indirection away from those actually doing the work; there'll be an account manager with (some) English skills and they'll translate for the tech folks. Now often this would be the case anywhere in the world, but it means that when you really need to have a detailed discussion it is literally Chinese whispers and a lot gets lost in translation.

Consider also what this means for code quality. Functionally, the code itself may be just fine but we're talking about people with little or no English language skills writing English syntax (no, in case you're wondering, you don't write HTML tags, JavaScript calls or C# syntax in Chinese)! Consider what that means for the quality objective of self-documenting code:

self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge

In other words, if your code clearly describes what it does then it will be easier to read and easier to maintain, not just for the folks who wrote it, but for those who come along in the future. This is also particularly relevant when we talk about documentation within the code itself or as external artefacts of the project. Now that may not be a problem if forever and a day you commit to that vendor, but if you ever want to take the code back yourself or shuffle it between other cultures then you've got a serious issue. I found it particularly difficult reviewing code written by folks in China for precisely that reason.
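
To make that concrete, here's a contrived sketch of my own (not code from any vendor or project) showing the difference between code that documents itself and code whose intent lives somewhere else entirely:

```python
# Opaque: the names carry no meaning, so the intent lives only in the head
# of whoever wrote it - or in documentation written in another language.
def chk(lst, n):
    return [x for x in lst if x["d"] > n]


# Self-documenting: the names alone describe the behaviour, regardless of
# who wrote it, where they wrote it or what language they speak at home.
def invoices_overdue_by_more_than(invoices, days_overdue):
    return [invoice for invoice in invoices if invoice["days_overdue"] > days_overdue]
```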

Keep in mind that language barriers extend to the UI as well; the same people that may not have much of a grasp on English when they're writing code are also writing labels and headings and all sorts of other externally facing content in the app. That's particularly troublesome in a place like China and it meant for a lot of unexpected (and un-budgeted) revisions, but even in India there are many cultural nuances with the English language. Again, it's the sort of thing that doesn't normally appear as a line-item on a proposal but it definitely has a cost. Browsing through my phone for Beijing pics, this one stuck out at me as it perfectly illustrates the sort of language that led to lots of to-and-fro when outsourcing to China:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

Coming back to Chinese censorship for a moment, IMHO this still rates as a "high risk" factor for Chinese outsourcing. The risk can manifest itself in various ways ranging from people in China not having access to reference resources (my blog sometimes wasn't accessible because it used to run on Google's Blogger service) to all of GitHub being blocked. The unpredictability of the government's censorship regime is worrying - you should be worried about it if you're doing work in China and yes, many people run VPNs anyway but they're regularly blocked and at best they're "grey" when used to circumvent The Great Firewall so you have to ask yourself: do you really want to be asking your partners there to circumvent government controls in order to deliver work for you?

One of the practices I always pushed all vendors for was proper work breakdown structures in their proposals. The rationale was that if we're going to build an app with, say, 20 features then I want to know how much each one is going to cost. I can then take that breakdown, go back to the business and have a discussion about how important each one is and if there were opportunities to reduce cost by discarding lower value features. It would also allow me to sanity check the proposal and challenge anything I considered to be excessive. But try as I might, in China in particular I just couldn't draw vendors to this way of representing costs. They'd give me breakdowns but it would be how many juniors and seniors and project managers and testers and how much they all cost. To my mind, that was irrelevant because there was nothing actionable I could do with it. But there was just something within the way multiple different vendors across China worked that couldn't get me information in this way and it usually just boiled down to "here's the cost of the whole thing".

However, by having visibility to the man hours for the entire project as well as the total cost, I was able to get much better insight into the effort invested to build the solution. What I consistently found was very high man hours compared to estimates from vendors back home. In fact, I remember one case quite clearly where the project's dollar estimate was identical to a local vendor in Australia yet the man hours were an order of magnitude greater. Yet even then, there was organisational push-back on the local vendor due to their high hourly rate (the rationale for that is best left for when I've completed my post-corporate recovery program...)

Actually, just tangentially whilst I'm on this point, if you're going to offshore the work to somewhere like China then do exactly this: get estimates from local vendors as well, compare the estimated effort and if you see major disparities then push back on the higher numbers. Low hourly rates only make sense when you're comparing a similar number of hours!

Lastly, China is now getting expensive. I wish I could have seen percentage increases in my salary anywhere near what the guys reporting to me in Beijing were getting, which was well into the double-digit percentages whilst I languished at somewhere around the CPI mark (usually just under 3%). Because of the rapid wage increases in China, according to Glassdoor you're looking at about $25k/y for a software developer in Beijing and only about $7k in Mumbai. That's a very significant difference and it certainly wasn't that multiple when you go back 5 years or more.

On the plus side for China though, one of their strengths (particularly over India) was the ability to get down to work with minimal documentation. Now of course your level of documentation is going to differ by project and indeed by organisation, but in a case like Pfizer's where technology folks were viewed as part of the "burden" they were trying to release, chances are it's not going to be particularly robust. There's a real "can do" attitude in China where they'd pick up anything and have a go at it. This can be risky as well in that my experience was they'd say "yes" to anything (and that can actually mean "no" in a high-context culture, more on that later), but in general I found them very accommodating which is a big plus.

I'd be more inclined to outsource to China where requirements were more fluid and there was less formality required. The proviso would be that I was happy to stick with the Chinese vendor for a longer period of time as there was greater friction involved in transitioning projects due to the language barriers.

The Philippines

Had I stayed on at Pfizer, the Philippines is where I would have loved to set up an in-house dev shop (I know, developers are a "burden", but bear with me). I spent a lot of time in Manila and I think I can pretty confidently say that Filipinos are the friendliest bunch of people you'll meet anywhere. Their English language skills are also exceptional (it's one of the country's two official languages alongside Filipino), courtesy of a long history of American presence. You have zero problems anywhere in the capital being understood and conversing openly and the same goes for discussions with outsourcing vendors.

One thing I've noticed in Australia over recent years is a really clear shift to the Philippines for call centres where previously you'd have been speaking to someone in India. For example, every time I call my telco I end up speaking with a Filipino and I suspect it's a combination of their strong English skills, friendly dispositions and emerging tech sector that's driven the shift. Whilst India has been in the BPO game for many years, the Philippines seems to have become prominent only more recently and that's reflected in the salaries as well (they're much closer to the Indian levels than the Chinese ones).

I mentioned poverty in the Philippines earlier on and it's hard to escape it. You see it everywhere, particularly as you drive through ramshackle housing on the way to the airport.

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

This has an impact on the tech scene as well and one of the starkest patterns I saw outsourcing there was their strong affinity to PHP. This isn't intended to be a derogatory comment about PHP; the LAMP stack is really appealing to folks who don't have much cash. But conversely, the Microsoft stack that I worked in and we were increasingly unifying around in Pfizer was simply much harder to find competence in. You'd go to a vendor and their default position was "Yeah, we can do that in PHP and MySQL". Although we did find vendors that could work with Microsoft's technologies, I always felt it wasn't quite the first-class citizen it was in other countries and that left me just a bit wary about how well-equipped they were to support it.

Cost wise, the Philippines was consistently lower than China and regularly lower than India as well. The economics of each of those countries mean it will likely stay that way for some time too.

In many ways, the Philippines was the best of all worlds; I just mentioned cost, and when you combine that with the best English skills of the lot, an overtly friendly nature and that same "can do" attitude I observed in China, it makes a lot of sense. But that's not to say it's an easy solution and that I'd immediately outsource all the things there either. There are other fundamentals you have to consider about the very foundation the principles of offshoring are built on. Let's take a look at those now.

The economics of offshoring are changing

The economics of offshoring are very heavily based on the wage disparities between the country doing the offshoring (for example, the US) and the country being outsourced to (for example, China). But the economics that made sense in the past are very different to the economics today and this picture speaks volumes:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

This is data from the World Bank and shows that massive disparity in wage growth between the two countries. The gaps are narrower between say, Australia which weathered the GFC exceptionally well due to our resources boom and a country like India which is yet to see the same rate of development as China. But the disparity is still there and the point is simply this: where there once might have been an eleven-fold difference in US and Chinese wages in 2010, it declined to a seven-fold difference in only 5 short years. At what multiplier does it no longer make sense to offshore? Because trend wise, it's just a matter of time until you're at 6 or 5 or who knows where from there.
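
To put some very rough numbers on that trajectory, here's a back-of-the-envelope sketch of my own using only the two figures above (an eleven-fold gap shrinking to seven-fold over five years) and assuming - purely for illustration - that the gap keeps closing at the same compound rate:

```python
# Illustrative extrapolation only: takes the post's two data points (an 11x
# wage gap falling to 7x over 5 years) and assumes a constant compound rate
# of decline. Real economies won't behave this neatly.
start_multiple, end_multiple, years = 11.0, 7.0, 5

annual_factor = (end_multiple / start_multiple) ** (1 / years)  # ~0.91, i.e. the gap shrinks ~9% a year

multiple = end_multiple
for year in range(1, 11):
    multiple *= annual_factor
    print(f"{year} more years: wage gap of roughly {multiple:.1f}x")

# At that (assumed) rate the gap halves roughly every 8 years - which is
# exactly why the "at what multiplier does it stop making sense?" question matters.
```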

Now again, we're looking at average salaries across a country like China which still includes many rural areas and the difference is going to be less stark once you're in Beijing or Shanghai, but the trend is the same. Offshoring often means playing a long-game where you really need to commit for many years (I explained some of the reasons why above) and the risk you run is that the economies will change fundamentally in that time.

Vendors don't build software, people build software

One of my personal bugbears is the assertion that a non-sentient being is writing software for you: "We used XYZ vendor on the last project and they were awesome!". What they really mean is "When we used XYZ vendor they had some awesome people doing the work". Now that may sound semantic but the difference is really, really important because people are a very fluid commodity, especially in a place like India.

I was on a project where a developer in Mumbai was doing the lion's share of the work (I'll call her Avni for the sake of example). Anyway, we invested quite a bit of time and money skilling Avni up in a particular technology with which she had familiarity but not the level of strength we would have liked. That all went well until one day... no Avni! With the benefit of hindsight, I suspect she'd headed off to have a baby which, in my experience, is the sort of thing you normally have a number of months' notice of. However, the message we got at the time was that she was sick but don't worry, here's another Avni for you, just like the last one!

Well before this event happened, I wrote a blog post about The commoditisation of the coder and it stands true to this day. Developers are not a fungible commodity, that is you cannot simply replace one with another identical item and expect them to perform the same. The kid operating the cash register down at your local store - now he's a fungible commodity because if he buggers off you can just drop in another kid and they can almost certainly operate the till with the same level of professionalism.

In Avni's case though, she'd built up knowledge about the project she was working on. She knew why we'd made certain design decisions, she understood the business objectives and how the project owner had moulded the requirements over time. The training and the knowledge sat with her - not with the role - and they walked out the door with her. I've regularly seen assertions from offshore companies that we should be confident in their ability to simply replace a resource as easily as we would a light bulb in the kitchen. This is a dangerously incompetent belief that demonstrates a fundamental misunderstanding of what's actually involved in building software and if you hear it said, you should be asking some pretty serious questions in reply.

You can argue that a professionally run outsourcing company will have processes in place to minimise friction when people leave and equally, you can argue that people leave even permanent positions if they're working onshore within your organisation. All of this is true but it all raises the question: when you outsource, are you putting your trust in an organisation that's resilient to people movements, or are you ultimately at the mercy of a few key individuals?

The fallacy of adding more people

One of the problems that plagues non-technical leadership (and there's a lot of them out there in technology leadership positions with genuinely very little idea about technology), is that they view work output as scaling linearly with the number of people on the project. In other words, add more people and things will be done faster.

This misunderstanding is exacerbated when offshoring due to the low rates. Your classic pointy-haired boss will look at the people cost and go "Wow, if they're only half the price of the local folks we can get twice as many of them and do the project in half the time"! This fallacy is at the heart of Brooks' Law, which comes from Fred Brooks' The Mythical Man-Month, a book anyone involved in delivering software should read at least once in their life. One of the guys I used to work with would regularly draw this analogy:

9 women can't have a baby in 1 month

Brooks talks a lot about the cost of connections in his book, namely that the more people you add to a project, the more connections you have. When there are 2 people there is only 1 connection; person A talks to person B. But when there are 3 people the number of connections triples because A talks to B, A talks to C and B talks to C. 4 people is 6 connections, 5 people is 10 and it goes up rapidly from there. What it means is that due to the mechanics of how teams operate, 5 people in India at 20% of the cost is not the same price as 1 person in Australia, far from it.
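
The maths behind that is simply the number of unique pairs in a team - n(n-1)/2 - which grows far faster than headcount. A quick sketch of my own, just to illustrate Brooks' point:

```python
def communication_paths(team_size: int) -> int:
    """Unique person-to-person connections in a team of the given size: n(n-1)/2."""
    return team_size * (team_size - 1) // 2

for people in [2, 3, 5, 10, 20]:
    print(f"{people:>2} people -> {communication_paths(people):>3} connections")

# Prints 1, 3, 10, 45 and 190 connections respectively: grow the team
# ten-fold (2 to 20 people) and the communication overhead grows 190-fold.
```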

Adding people costs time:

Offshoring roulette: lessons from outsourcing to India, China and the Philippines

This paradox is often neglected and particularly when the people are cheap due to offshoring, the temptation is there to simply load more onto the project.

Code quality

This is a particularly prickly topic and I know there will be many people with many strongly held views on this. My anecdotal experience - and I'll stress that I have no hard metrics to back this up - is that code quality suffers when you outsource from a market like Australia, the US or the UK and you send the work to somewhere low cost. Let me articulate what I believe the drivers for this are:

Firstly, in my experience (there's that caveat again!), there is a much greater desire for the folks building software products in low cost markets to climb the corporate ladder. I mentioned the rate of churn in India as people jump between jobs looking for that management role and certainly I saw the same thing happen over and over again in China and the Philippines too. Smart technical people wanted to "progress" by stopping doing the things they were doing well in order to become managers. Now that happens everywhere, no doubt about it, but whereas here or in the organisations I regularly visit in the US and Europe, there are many people who genuinely want entire careers building products. In the markets I'm discussing here, they almost exclusively want to move to a position where they're instructing others. There just seems to be less interest - less passion - in code and inevitably that's going to impact quality.

Secondly, the outsourcing industry is big business that's always looking to increase output and decrease costs; they are very bottom line focused. I mentioned Tech Mahindra before and they've got well over 100k people on staff there. These companies are ultimately just factories producing a product and they're working ruthlessly to make the process more efficient. They're doing that by having a very large number of very cheap resources and as the saying goes "you get what you pay for".

Finally, the impact of quality is not necessarily clear and present at the time the project is delivered. Often it'll take months or even years to realise the cost of the "technical debt" you've been handed. The Chinese language example earlier on is perfect; after the vendor has built the product and handed it over, what's it going to cost when you want to add features later on and the code is illegible? The longer game in a software product's lifecycle is often neglected as those holding the purse strings rush towards immediate term savings.

Here's a perfect example that embodies these three observations: I never saw a single unit test written for a project. Not one. I would push vendors on what their structured testing process was and it always boiled down to humans manually making sure things work. The developers had usually never even heard of the concept and writing extra code would cost more money, it was that simple in their minds. Of course this led to all the problems you'd expect from not having automated tests which is the third point above: the longer-term maintainability and costs of the project were terrible.
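
For anyone who hasn't come across the concept, a unit test really is this simple: a few lines of code that assert another piece of code does what it's meant to, runnable automatically on every change. A generic illustration (not code from any of these projects):

```python
import unittest


def calculate_discount(order_total: float, is_returning_customer: bool) -> float:
    """Returning customers get 10% off; everyone else pays full price."""
    return order_total * 0.9 if is_returning_customer else order_total


class CalculateDiscountTests(unittest.TestCase):
    def test_returning_customer_gets_ten_percent_off(self):
        self.assertAlmostEqual(calculate_discount(100.0, True), 90.0)

    def test_new_customer_pays_full_price(self):
        self.assertAlmostEqual(calculate_discount(100.0, False), 100.0)


if __name__ == "__main__":
    unittest.main()
```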

Do not underestimate the cost of cultural nuances

We're all aware that there are many different cultural norms across the globe that are often at odds with each other. In Western society, you'd blow your nose on a handkerchief if it was running yet in Asia this is seen as somewhat gross. Have a bowl of noodles in China and you'll be slurping away at it but that would be uncouth over here. They're minor cultural nuances though, there are far more serious ones that impact cross-cultural teams.

Someone sent me this video recently that was made by a couple of blokes from the subcontinent humorously explaining how to interpret the very Indian trait of the head wobble:

One piece of training I did at Pfizer which was genuinely interesting was around cultural awareness and particularly how many cultures communicate very implicitly. I like the way the video above embodies that; what we interpret as "yes" in Australia can mean "no" in India. It's often referred to as a high-context culture and it means that many things are left unsaid in the direct way that particularly those of us down here communicate. Instead, they're implied and that can lead to serious communication issues.

The problem when you put this in an outsourcing context - particularly when you're remote - is that it's very easy to have misunderstandings. Think about how many times sentiment or emotion is lost in written communication with your own culture and multiply that way out. These can be misunderstandings about costs, inclusions, features, bugs, timelines - you name it. I'd see it happen day in and day out and again, it's something that just doesn't feature as a line item. I mean imagine it:

Cost of cross-cultural teams not understanding each other: $20,000

It never happens! Perhaps it slots into the amorphous blob that is "contingency costs" or something similar, but it's enormously difficult to pin down to a figure.

When outsourcing offshore works

One thing I want to be very clear about here is that I'm not saying sending work overseas doesn't work, far from it. What I am saying is that there are certain challenges that are rarely considered in the ROI and they can have a fundamental impact on the success of outsourcing.

In my post-Pfizer days, I spend a lot of time going into companies around the world and training their developers. When I spend time with teams, the conversation often turns to what work is being done offshore and as much as there are all sorts of bad stories like the ones you've read about here (and possibly experienced yourself), there are also some very positive experiences.

Where things seem to work especially well is when the model is blended; it's not "outsource all the things", it's "let's get our guys working with their guys and all doing what we do best". It sounds profoundly obvious when I put it like that and it genuinely does work well for many orgs. They retain knowledge of the IP within their onshore teams, get those folks facing the customer and doing a lot of the design work then they ship the work which is most suitable for "factory" style development to India.

I would also avoid any models that put dependencies on individuals within the outsourcing org. The Avni example above highlights why this is and you have to accept that staff turnover is far more likely to happen in an environment like that. Instead, focus on scenarios which lend themselves to greater numbers of people doing more cookie-cutter work; then if one or a bunch of them transition out, you're not losing too much project knowledge when it happens.

Often, I've spoken to people that have sent team members over to these far-flung lands so that they could get involved with the offshore people. Not just ingrain them into the organisation's processes, but go and drink beer with them and create a genuine bond. This is a very different attitude to "let's just throw it over the fence and unburden ourselves" and by all accounts, it seems to work well.

And finally, we've got some great tooling these days to make the process of sending work elsewhere more predictable. Enforceable code quality checks, build services, DVCS and all manner of other mechanisms that can remove unpredictability from the process. More than ever, these are essential when you need to build predictability into what can otherwise be a less predictable working model.

Summary

More than anything I've said here to this point, here's what's really important: if you're looking at hourly rate as a metric for outsourcing success, you're doing it very, very wrong!

Outsourcing to other countries is often a decision made by those who have the least understanding of how the mechanics of software development actually work. They're drawn in by the attraction of low rates and their measure of success usually stops at what the immediate term cost is and whether the requirements are delivered. It's much harder to factor in the longer-term costs of things like maintainability, availability and of course, security. But when budgets and objectives focus so heavily on the immediate term, is it any wonder that those simultaneously controlling the money whilst not understanding the technology make ill-informed outsourcing decisions?

Cheap offshoring is the McDonald's of the software industry; it's a mass-produced product that, whilst highly in demand, isn't very good for you if you consume it all the time. Treat it as part of a balanced diet, make informed decisions and don't be blinded by the low sticker price without thinking about the real cost you're going to pay.

This was a really highly-requested blog post that received a lot of positive responses when I recently asked about writing it. It wasn't an easy one to pull together whilst trying to be both honest and culturally sensitive and I do hope I've done it justice. As with all posts, I'd love comments from people that have had their own experiences - both good and bad - and are willing to share them here. Thanks for reading.


The Capgemini leak of Michael Page data via publicly facing database backup

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

The Capgemini leak of Michael Page data via publicly facing database backup

A couple of weeks ago I wrote about the leak of data from the Red Cross' Blood Service down here in Australia. Many people were shocked that you could have a situation where troves of personal data were obtainable not through any advanced hacking technique, but by merely downloading a database backup from the website it had been published to. It's literally that simple and it's shockingly common.

I've had this blog post in mind for a little while and the basic premise is that as much as we're working through really creative new defences against attacks, we're also still alarmingly bad at the basics. For example, we're doing cool things with security headers like content security policies, but then we've still got SQL injection all over the place. We're making great inroads with getting SSL on all our things (particularly due to the likes of Let's Encrypt and Cloudflare), but then we're publishing our databases to publicly facing websites. And that brings us to Capgemini and Michael Page.

On Sunday 30 (all times by my Australian clock), I was contacted by someone who provided the following screen cap:

The Capgemini leak of Michael Page data via publicly facing database backup

It was the same individual who located the Red Cross data and the same story in terms of discovery and the underlying risk on the server end: publicly exposed website, directory listing enabled, .sql files exposed. This time, the data was identified as belonging to Michael Page, the British-based (yet very global) recruitment firm. Per the directory listing above, he'd identified backups from a variety of different global assets totalling several gigabytes.
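
If you want to check your own assets for this class of exposure, the test is about as basic as it gets: request the site root and look for an auto-generated index or linked .sql files. A minimal sketch of my own (it assumes the third-party requests package, and only ever point this at hosts you own or administer):

```python
import re

import requests  # third-party package: pip install requests


def check_for_exposed_backups(base_url: str) -> None:
    """Flag auto-generated directory listings and any linked .sql files."""
    response = requests.get(base_url, timeout=10)
    body = response.text

    if "Index of /" in body:
        print(f"[!] {base_url} appears to return a directory listing")

    for sql_file in re.findall(r'href="([^"]+\.sql(?:\.gz)?)"', body, re.IGNORECASE):
        print(f"[!] Possible database backup exposed: {sql_file}")


# Example against a hypothetical host you administer:
# check_for_exposed_backups("https://staging.example.com/")
```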

He sent over a file indicating it was sourced from the UK as a proof. It was a 362MB compressed file which extracted out to 4.55GB. Assuming a similar compression ratio, the files in the directory listing above would total well over 30GB of raw data which is a very large set of data to leak publicly.

The file I received included table names indicating that as with the Red Cross, this was the output of mysqldump and in this case it contained table names pointing to Acquia, a hosted Drupal platform. Further info followed by way of screen caps indicating various other fields and data snippets that you'd expect people to provide a recruitment company:

The Capgemini leak of Michael Page data via publicly facing database backup

This is just one of many tables and there was a degree of sanitisation that nulled out some fields; not every piece of data existed for every record. I'll refer to Michael Page's disclosure a little later on, but what I will say here is that there were over 780k unique email addresses in that one file and plenty of data relating to candidates' jobs, such as cover letters describing their experience.

Moving on, in the early messages I received from the individual, one in particular stuck out:

michaelpage is capgemini

This changed things somewhat because Capgemini is a multinational consulting and outsourcing firm with 180k people across 40 countries. As the messages flowed, the story that unfolded was that whilst it was Michael Page's data, it was Capgemini that had exposed it. Again, the similarities with the Red Cross continue with their data also having been leaked by a partner. Coincidentally, I had a contact within Capgemini so I reached out to him on Monday 31 with the preface of "you're probably about to have a very bad day". It turned out to be more like a bad week as they worked to understand the scope of the leak and remediate the underlying risks.

As with the Red Cross situation, there were numerous failings which led to the exposure of this data. I won't go into those here, some of them are obvious and others are up to Capgemini to choose how transparent they wish to be. Also, as with the Red Cross the individual who reported the leak has deleted the data he obtained and every trace of the backup I had is also gone. Of course, these are the instances of the data we know of but the commitment is the same as the last time: all known copies of the data have been removed.

This obviously came as a shock to Michael Page, but it was also a shock for Capgemini, especially given the nature of some of the organisations they provide services to:

The Capgemini leak of Michael Page data via publicly facing database backup

It's a big company and those of us that have spent time in organisations of serious size (particularly those that are globally distributed), understand how there can be pockets that follow, shall we say "quaint" approaches to security. And that's another really important observation in all this: Capgemini has an annual revenue of €12B but it hasn't stopped a series of egregious mistakes being made. Security flaws simply don't discriminate by organisation size.

I've held off posting this story until impacted parties could begin being notified which clearly, has now happened:

A final comment on these incidents: this is a perfect illustration of where companies need bug bounties. These were such low-hanging vulnerabilities that had there been even the slightest inkling of incentivisation, they would have been found very quickly and reported ethically via a channel that researchers could trust. Check out Bugcrowd as a way of managing the entire process and look at a case like their bounty program with Tesla. Ask yourself this: would these incidents be making news if they had people looking for these risks early on? I highly doubt it.

Weekly update 8 (the backyard edition)

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Weekly update 8 (the backyard edition)

Let's get this out of the way early - I did not shoot this video on a green screen! When I first watched it, I couldn't believe how amazing the picture quality was and the first thought I had when I saw it was the green screen one. The new iPhone 7 Plus is a major part of that, but I got out early when it was quiet and got the light just right too. I posted a still to Twitter earlier today and someone asked what lighting I use. Uh, "the sun", that is all.

Anyway, a bunch of things worth discussing did happen this week, here's the video:

  1. Tesco Bank got very hacked (almost certainly not phishing, but 20k people still had their accounts drained)
  2. Tesco has got previous form! ("Passwords are stored in a secure way. They're only copied into plain text when pasted automatically into a password reminder mail." Ah, the memories...)
  3. Skype sending Baidu spam (no, Skype wasn't hacked, people just choose crap passwords... then reuse them)
  4. Lessons from outsourcing to offshore developers (I'm honestly surprised at how well this has been received, especially given I make a lot of comments about cultural traits)
  5. Capgemini leaked a heap of Michael Page data (no, it's not a "hack" when you publish tens of gigabytes of your customer's database backups to a publicly facing website)

New Pluralsight course: Exploring the Internet of Vulnerabilities

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

New Pluralsight course: Exploring the Internet of Vulnerabilities

I've done a number of "Play by Play" courses for Pluralsight this year on a range of topics including Social Engineering with my mate Lars Klint, Deconstructing the Hack with my mate Gary Eimerman, Modernizing Your Deployment Strategy with Octopus Deploy with my mate Damo Brady and the latest one that's just landed, Exploring the Internet of Vulnerabilities with my mate Niall Merrigan. Lot of Play by Plays, lot of mates and frankly, that's what makes all these courses work; they're all friends I spend time with both in a professional capacity and in a drinking beer capacity. The Play by Play courses are all about the dialogue between two people talking through technology concepts and the chemistry is really important.

You may remember Niall from such tweets as looking like an eager puppy with his head out the window whilst on the back of my jet ski:

But what you may not know about Niall is that he does some excellent security talks, particularly around finding vulnerable things facing the internet. I've learned a lot from him personally and have really enjoyed (or is that "been terrified by"?) what he's shared around services like Shodan. In fact that's where we start the course, here's what's in there:

  1. Finding crazy connected things with Shodan
  2. Identifying and exploiting vulnerabilities using services such as the Google Hacking Database and sqlmap
  3. Wifi Pineapple, locating SSIDs and "The Dark Web"

It's a 58 minute course that's very easy to watch because ultimately, it's just 2 guys talking. There are screens as well, mind you, so you don't just spend the whole time looking at us, we do dive in and demonstrate things as well. We really enjoyed making this course and I hope you really enjoy watching it.

Play by Play: Exploring the Internet of Vulnerabilities is now live!

Disqus' mixed content problem and fixing it with a CSP

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Disqus' mixed content problem and fixing it with a CSP

I write a blog with a lot of security things on it so understandably, it upsets me somewhat when my site throws security warnings:

Disqus' mixed content problem and fixing it with a CSP

I'd had a number of people report this and indeed I'd seen it myself, albeit transiently. Diving into the console, I found the source of the problem:

Disqus' mixed content problem and fixing it with a CSP

Who the hell is Circulate?! And what are they doing in my blog? Let's find out:

Disqus' mixed content problem and fixing it with a CSP

Right... I don't have any ads on my blog these days (just sponsor messages) so there shouldn't be any third-party monetisation going on. However, what I'd noticed about this issue is that it always seemed to occur when loading Disqus comments and certainly they do some ad things, albeit ad things that I'd explicitly disabled on my blog.

I decided to capture the entire page load in Fiddler so I could track down the source of the issue. A quick search in the results for lore.circulate.com located the offending request:

Disqus' mixed content problem and fixing it with a CSP

But of course, this is the parent request, that is the resource which is attempting to load the Circulate data. The request you're looking at here though is from pippio.com and when you look at what they do, you're redirected:

Disqus' mixed content problem and fixing it with a CSP

Alrighty, sounds very marketing-like again. But per the earlier image in Fiddler and as explained by the referrer header, this resource was loaded by Disqus ergo they're to blame. But there's an easy fix for this:
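
The snippet from the original post isn't reproduced in this copy, but the fix boils down to serving a Content Security Policy with the upgrade-insecure-requests directive - something along these lines:

```
Content-Security-Policy: upgrade-insecure-requests
```

The same directive can also be set via a meta tag (<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">) if you can't touch the response headers.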

The upgrade-insecure-requests CSP directive here does just what it sounds like it does - upgrades the request to be secure and forces it over the HTTPS scheme. However, here's what happens when you make a secure request to lore.circulate.com:

Disqus' mixed content problem and fixing it with a CSP

But I don't really care because the content wasn't loading anyway due to the browser protecting users from loading insecure things. All that's happening now as a result of this is that the content still isn't loading but there's no visible browser warning, just some console output:

Disqus' mixed content problem and fixing it with a CSP

The lesson here is that firstly, you get a lot of stuff loaded from other resources every time you embed external services like Disqus. I had no idea about Pippio or Arbor or Circulate, I just knew I was using Disqus. But secondly, the other lesson is that you can use a CSP to fix your things even when the service provider has screwed up.

And that's the final lesson, this time for Disqus: fix your broken implementation!

My weekly updates are now available as an audio podcast

Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

My weekly updates are now available as an audio podcast

I've been doing the weekly updates for a couple of months now and by all accounts, they've been very well-received. One of the early pieces of feedback I got though was that I should also publish them as an audio podcast so that people can listen to them in the car or while doing whatever else it is that people do while listening to syndicated content via the likes of iTunes.

Yesterday I finally got around to getting all this set up by using the OmnyStudio service, an Aussie offering which makes the process of publishing the feed and hosting the content a breeze. I then published it all to the iTunes store and there we have it - a podcast!

[Screenshot: the podcast listed in iTunes]

There are two ways of consuming the podcast:

  1. Via iTunes: https://itunes.apple.com/au/podcast/troy-hunts-weekly-update-podcast/id1176454699
  2. Direct from the RSS feed on OmnyStudio: http://www.omnycontent.com/d/playlist/1439345f-6152-486d-a9c2-a6bf0067f2b7/3ba9af7f-3bfb-48fd-aae7-a6bf00689c10/fde26e49-9fb8-457d-8f16-a6bf00696676/podcast.rss

It's a no-brainer for me to push an audio version of each week's video edition so I'll now maintain both of them and interested followers can pick whichever works for them. I hope this is useful for those who were requesting this feature; do keep the feedback coming if there are other suggestions.

Edit: I should have included this originally because the question was bound to be asked, but the podcast isn't in Google Play Music because they don't like Australians. Or British people. Or anyone who isn't American, apparently:

[Screenshot: Google Play Music's podcast portal, available only in the US]

What wisdom has driven this decision is entirely beyond me; hopefully Android users can pull direct from the OmnyStudio RSS feed listed above in point 2 instead. (And don't forget to tell Google how stupid this is...)

Data breach claims are often poorly researched, unsubstantiated and ultimately fake


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Data breach claims are often poorly researched, unsubstantiated and ultimately fake

I have multiple Yahoo data breaches. I have a Twitter data breach. I have Facebook data breaches. I know they are data breaches from those sources because people told me they are, ergo, they're data breaches. Except they're not - they're all fake. Problem is though, fake data breaches don't make for a very good headline nor do they give you something worth trading; for many people, it's not in their best interests to establish what's fake and what's not.

Earlier this year I wrote about how I verify data breaches and gave an example of Zoosk, the "data breach" that turned out to be anything but. It would have been easy for me to load the tens of millions of records from the alleged breach into Have I been pwned (HIBP), get myself some press and send out a bunch of notifications to (very confused) subscribers, but that would have unfairly implicated the company and frankly, that's just irresponsible.

The catalyst for this blog post was an investigation I did last week into an alleged data breach from dfb.de, a German soccer site. I was sent a file with several million records and one of the first things I did was look online for existing references of an incident. Vigilante.pw is often a good resource and I found it there:

[Screenshot: the dfb.de entry on Vigilante.pw]

Someone else sent me through a list of breaches I might be interested in which also included it:

[Screenshot: dfb.de appearing in a list of alleged breaches]

It even turned up in a tweet sent by my mate Dez Blanchfield just a couple of days ago:

Popular belief as far as those dealing with the data are concerned is that DFB was hacked and there are millions of records floating around which people are now trading. But when you actually make an attempt to verify the data, a very different story emerges. None of the publicly available mechanisms I referred to in that verification post were checking out, so I started emailing HIBP subscribers who were in the "breach". Let me share their responses:

as far as I remember I am not using my email there

Not as emphatic as I'd like, but it was the beginning of a pattern. Here's another one:

I was never a user of dfb.de (Deutscher Fußball-Bund) as far as I remember

Yes, these are two completely different people despite the similar language used. Here's another one:

as said, i do not remember myself to ever registered/signed up of the site you mentioned.

And another:

this must be a mistake. The site does not ring any bells with me nor am I a soccer fan

Then this one:

I don't recall creating an account with the DFB

The last one articulates what probably happened very well:

no, I think this was made up by someone. I didn't have an account there and I never used such a password.

Someone else actually offered another means of verification:

a friend of mine has an old, still working account that you could use for verification

Yet as with every other check to date, it came up blank - the email address wasn't in the data. However, there were some responses that said the password snippet I gave them from the file could have been legitimate (I'd send a partial password, just a few letters). We often see situations where one breach is re-branded as another so there can be traces of legitimate data in there, but that doesn't necessarily prove the source of the incident and clearly in this case, the source was not DFB.

Until verified by the company itself or a trusted party, take claims of breaches with a grain of salt. Nefarious individuals often have vested interests in making headlines and misrepresenting events, and particularly when hiding behind the veil of anonymity, the potential ramifications of slander tend not to be something they worry about. It is a valid worry for any legitimate organisation or individual though: unfairly claiming a company has security vulnerabilities can cause serious reputational damage and lead to legal repercussions, as MedSec is presently experiencing. If someone is not prepared to put their name to a data breach claim, that's an immediate red flag.

When you see sensationalist and unsubstantiated headlines like If You Use One Of These Popular Internet Services, You Need To Change Your Password Right Now, treat them the same way as you should Facebook friends who post updates about something scary all in caps and encourage you to share it with all your friends: berate them appropriately and above all else, don't spread the FUD.

8 million GitHub profiles were leaked from GeekedIn's MongoDB - here's how to see yours


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

8 million GitHub profiles were leaked from GeekedIn's MongoDB - here's how to see yours

Let me make it crystal clear in the opening paragraph: this incident is not about any sort of security vulnerability on GitHub's part, rather it relates to a trove of data from their site which was inappropriately scraped and then inadvertently exposed due to a vulnerability in another service. My data. Probably your data if you're in the software industry. Millions of people's data.

On Saturday, a character in the data trading scene popped up and sent me a 594MB file called geekedin.net_mirror_20160815.7z. It was allegedly a MongoDB backup from August belonging to a site I'd not heard of before, one called GeekedIn and they apparently do this:

[Screenshot: the GeekedIn homepage describing the service]

A bit of searching around suggested that they're based just outside Barcelona and appear to be ramping up a service for recruiting technology professionals. A bit more searching around showed a couple of MongoDB instances running in Poland:

[Screenshot: two MongoDB instances located in Poland]

The second IP address there is the one the website itself is presently running on so all the dots are starting to join up. But it wasn't until I found my own data that the penny dropped on what was going on. Here's what I found:


{
    "_id": "5cb692d1-8fd5-44d7-8639-e680fb2b30f41454836589580",
    "_class": "com.geekedin.domain.entities.mongo.EDeveloper",
    "scores": [],
    "libraryScores": [],
    "compactLibraries": [],
    "name": "Troy Hunt",
    "email": "troyhunt@hotmail.com",
    "location": "Sydney",
    "country": "AU",
    "city": "0fab4eb2-c5d5-459d-a7c5-c50b845c12da1448621413411",
    "score": 0,
    "scanCompleted": false,
    "socialNetworks": [
        {
            "_id": "https://github.com/troyhunt",
            "socialNetwork": "other",
            "url": "https://github.com/troyhunt"
        },
        {
            "_id": "troyhunt",
            "socialNetwork": "github",
            "url": "https://github.com/troyhunt"
        },
        {
            "_id": "http://troyhunt.com",
            "socialNetwork": "other",
            "url": "http://troyhunt.com"
        }
    ],
    "locationPoint": {
        "x": 151.20732,
        "y": -33.86785
    },
    "yearsOfExperience": 0,
    "otherLocations": [],
    "extraEmails": [],
    "locateTryFailed": false,
    "locateTried": true,
    "beingLocated": false,
    "remoteLocateTried": true,
    "blogFindTried": false,
    "contactIsPossible": true,
    "freelancer": false,
    "compactLibrariesComputed": false,
    "compactLibrariesBeingComputed": false,
    "followersFound": false,
    "emailAddresses": [
        {
            "email": "troyhunt@hotmail.com",
            "type": "primary"
        }
    ]
}

And there are the GitHub references. When I took a look at my publicly facing GitHub profile, I did indeed have my email address exposed as well as my location. But my profile is actually rather sparse when compared to ones like this:


{
    "technology": "06405e50-a3b3-471d-a50d-17fc2cb4a9181448621393946",
    "library": "56623985e4b0cab0586c4d09",
    "score": 0,
    "lastCompute": {
        "sec": 1449279293,
        "usec": 262000
    },
    "endorsements": 0,
    "version": "0.3-SNAPSHOT"
},
{
    "technology": "06405e50-a3b3-471d-a50d-17fc2cb4a9181448621393946",
    "library": "56623e9ae4b0cab0586cb344",
    "score": 0,
    "lastCompute": {
        "sec": 1449279293,
        "usec": 262000
    },
    "endorsements": 0,
    "version": "0.1"
},
{
    "technology": "06405e50-a3b3-471d-a50d-17fc2cb4a9181448621393946",
    "library": "56623ab1e4b0cab0586c64ab",
    "score": 0,
    "lastCompute": {
        "sec": 1449279293,
        "usec": 262000
    },
    "endorsements": 0,
    "version": "1.0.0"
},
{
    "technology": "06405e50-a3b3-471d-a50d-17fc2cb4a9181448621393946",
    "library": "56623ee2e4b0cab0586cb624",
    "score": 0,
    "lastCompute": {
        "sec": 1449279293,
        "usec": 262000
    },
    "endorsements": 0,
    "version": "0.2"
}...

This is just a small snapshot of a profile that is ultimately hundreds of times larger than my full one shown earlier. The data above exists as part of a much larger set within a "scores" collection, evidently as part of a profile of the user's technology use. I won't go into any more detail here because I am going to give you the ability to pull your own data shortly and you'll be able to see what's in there firsthand. The other reason I won't give any more detail on it here is that whilst this appeared to be data that was obtained from publicly facing GitHub profiles, it felt wrong to see it all aggregated together here like this. I wasn't aware of any implementation within GitHub that publicly exposed this information for the purpose of mass consumption (i.e. via an API), and frankly, by now it was starting to smell a bit bad...
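To be clear though, the underlying fields are all retrievable one profile at a time from GitHub's public users API (api.github.com/users/{username}); it's the mass aggregation and re-purposing that's the issue. A trimmed, illustrative response for my own account looks something like this, with the email attribute only populated because I'd chosen to make it public on my profile:

{
    "login": "troyhunt",
    "html_url": "https://github.com/troyhunt",
    "name": "Troy Hunt",
    "blog": "http://troyhunt.com",
    "location": "Sydney",
    "email": "troyhunt@hotmail.com"
}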

Last thing before I start talking about disclosure: when I analysed the data set I found almost 8.2 million unique email addresses. That's about what I'd expect in terms of the number of accounts on GitHub, but there was a twist in that 7.1 million ended with ".xyzp.wzf". What I realised on closer analysis is that these represented all the GitHub accounts that had no publicly facing email address. For example, my mate Niall Merrigan's profile is bereft of an address so his email in the data set is represented as niallmerrigan@github.xyzp.wzf. Within that data set there were also 15k @bitbucket.xyzp.wzf addresses so although it made up only a tiny percentage of the overall data, clearly they were drawing from more than one source.

Moving on, given what I'd seen thus far I decided to reach out to GitHub directly as it was their data that was involved here, even if it was leaked from another service. GitHub has a great track record of handling security incidents, not just in terms of having had a lot of experience with them, but in the way in which they've dealt with them. There have been incidents like the 2013 brute force attack by 40k unique IP addresses and the (really rather creative) Chinese attacks linked to the Great Firewall. I got in touch with a contact there within hours of establishing what the data was and suggested that firstly, they probably wanted to think about whether or not GeekedIn's practices constituted acceptable use and secondly, whether there was any messaging they wanted to communicate around this incident. My concern for them was that here was a very large amount of "GitHub data" (said in quotes because it is, but it didn't come from them), and I didn't want the source to be misrepresented. Particularly if data was circulating among traders, there was every chance people would start popping up and saying "hey, I've got the GitHub data breach".

The approach we ultimately took was that both GitHub and I reached out to GeekedIn independently. It wasn't easy though - the Twitter account on their site looks dead and the contact form there appeared non-responsive. I attempted to find a contact via Twitter which didn't yield anything useful beyond pointing to details on their site, and the antiliasoft.com website the copyright belongs to was returning a 404 when found via Google. Ultimately, I reached out to an email address on that same domain and got a response yesterday, acknowledging the incident and making a commitment to secure the data.

I asked GitHub for something I could share with this post and they provided me with the following:

Third parties frequently scrape public GitHub data for various reasons, such as research or archival purposes. We permit this type of scraping so long as any user's personal information is only used for the same purpose for which they gave that information to GitHub. Using scraped information for a commercial purpose violates our privacy statement and we do not condone this kind of use.

So it's pretty clear then that this isn't cool:

[Screenshot: GeekedIn selling access to the scraped data]

As someone in the data breach myself, I don't want my data being sold this way. And again, yes, you can go and pull this data publicly on a per-individual basis but the constant response I got from close confidants I shared this information with is that "it just feels wrong". And it is wrong, not just the scraping of GitHub in the first place in order to commercialise our information, but then subsequently losing it via a MongoDB with no password and now having it float around the web in data breach trading circles.
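As an aside for anyone running their own MongoDB, the exposure that made this possible is avoidable with a couple of lines of configuration. This is only a minimal sketch assuming the standard YAML mongod.conf (you'd still need to create users before relying on it), but it stops the drive-by discovery of an open database:

# mongod.conf - illustrative settings only
security:
  authorization: enabled   # require authentication on every connection
net:
  bindIp: 127.0.0.1        # only listen locally unless remote access is genuinely needed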

As of now, the data is searchable in HIBP. But this time I've done something just a little bit different and that's made the actual raw data available. Now I want to be cautious here because this is not something I normally do nor something I expect to do again any time soon. Normally there's no way I'd load source data into HIBP and I've been vocally critical of any service that does this precisely because breach services themselves can be pwned. Particularly any storing of plain text credentials is enormously reckless and unfortunately there are multiple services out there doing just this.

The difference here is twofold though: firstly, this is publicly exposed data. No, it shouldn't have been aggregated en masse and no, it definitely shouldn't have been leaked, but it is data that on a per-individual basis you can go and retrieve from anyone's GitHub profile. Secondly, a significant component of the audience in this incident are people that will understand what that big collection of JSON means and know how to interpret it. These are tech people - people just like me - and a collection of curly braces and JSON syntax is speaking their language.

There's a third reason as well and that's a bit of an exploratory exercise for me. I very frequently hear from people after they find themselves in a data breach that they want to know what information was exposed about them. I get this - they want to know exactly what anyone holding the data now knows - but it's not something I've ever done for the reasons explained above. This was an opportunity to do that without the risks involved with sensitive data and with an audience on my wavelength. I'm really curious to see how feedback on this goes and what (if any) value it poses to people.

Big caveat time: You can only see your own data and only if your real email address was in the dump. I know, we've established it's publicly facing data, but I still want to constrain it to the data owners themselves. The way I've done this is by using HIBP's existing notification service which already has a means of verifying email addresses. It works like this:

  1. Go to haveibeenpwned.com/NotifyMe
  2. Enter your email address and receive the confirmation email
  3. Click on the link in the email to verify your address

That last step then shows a page like this for me and for any one of the million plus people who had their actual email address leaked:

[Screenshot: the verified HIBP page with the option to show raw GeekedIn data]

When I toggle the "show raw geekedin data" I get my record:

[Screenshot: my raw GeekedIn record displayed in HIBP]

And that's it! Once again, you're only going to show up here if your email address was successfully pulled from GitHub, but I hope it's useful for the million plus people whose data was exposed somewhere they never expected it to be.


Weekly update 9


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Weekly update 9

Lots on this week and I'm very happy to have finally got myself organised and set up an audio podcast feed. It's getting a heap of downloads already so obviously, people did actually want it and frankly, I'm sorry I didn't get it organised earlier! That and much more in this week's update.

iTunes podcast | Google Play Music podcast | RSS podcast

  1. Get these weekly updates as a podcast! (it's been requested since day 1 and I finally got around to doing it)
  2. Niall and I did a Pluralsight course on Exploring the Internet of Vulnerabilities (this is the one that should make people scared to go online...)
  3. Disqus screwed up and served mixed content so I fixed it with a CSP (the upgrade-insecure-requests directive worked wonders!)
  4. I'm a bit sick of false claims about data breaches (it's easy not to check your facts and hide behind a veil of anonymity whilst making slanderous claims)
  5. GeekedIn (probably) lost your GitHub data and you've (probably) never even heard of them (pro tip: if you don't put a password on your MongoDB, people will find it and they will take your data)
  6. Upcoming events (yes, it's travel time again)
    1. DDD Brisbane
    2. NDC London
    3. Visug Belgium
    4. ZION SECURITY Belgium
    5. Copenhagen workshop (no link yet, contact me for details)
    6. Ignite Australia
    7. RSA in the USA

Ad blockers are part of the problem


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Ad blockers are part of the problem

Earlier this year, I wrote about bad user experiences on websites and foremost among these were the shitty things some sites do with ads. Forbes' insistence that you watch one before manually clicking through to the story, full screen and popover ads and ads that would take over your screen after you started reading the article were all highlighted. Unanimously, we hate this experience.

Because the aforementioned experiences are shit, people run ad blockers and I get the rationale: if ads are going to do crap like this then let's ban them. Except then you get the likes of Forbes denying access to their content if you run them and you get into this nasty cycle of advertisers trying to circumvent ad blockers trying to circumvent advertisers. This is just not a healthy place to be.

A couple of months ago, I got fed up with ads too. I didn't start running an ad blocker though, I decided to make a positive difference to everyone's experience when they came to my site and I began offering sponsorship of this blog instead. The sponsor boils down to an unobtrusive line of text like this:

[Screenshot: the sponsor message as it appears on the blog]

Readers were happy because none of the shit they usually have to deal with when ads load was there. Sponsors were happy as they were getting prime real estate and heaps of exposure. And I was happy because not only was I giving people a much better user experience, sponsors also pay a lot more than ads do. In fact, since then I've not run a single ad - I've always filled every available sponsor slot. As best I could tell, everyone was happy. But it turns out that's not quite true...

Shortly after launching the sponsorship, someone pointed out that the sponsor message was being removed by ad blockers.

What. The. Fuck.

I get that ad blockers block ads because of the extra bandwidth they consume, because they're frequently a vector for malware and because frankly, they're obtrusive and detract from the viewing experience. But my sponsor message was none of these things, so what the hell was going on?!

I gave the ad blockers the benefit of the doubt and assumed that because I'd named a class "sponsor_block" and given an element a name of "sponsor_message" it was simply caught up in an automated process of filtering out ad-like content. So I changed things to instead refer to "message_of_support" and in my naivety, assumed this would fix what must surely have been a mistake. There were no more "false positives" as I saw them and the sponsor message again appeared for those running ad blockers. Everyone was happy.

And then it started getting blocked again. Someone recently pointed out that Adblock Plus was causing the message to be removed so I installed the extension and sure enough, here's what I saw:

[Screenshot: Adblock Plus removing the sponsor message]

This was no longer a false positive; I was convinced they were deliberately filtering out my sponsor. I delved a little deeper and found that Adblock Plus uses EasyList, which has an admirable objective:

The EasyList filter lists are sets of rules originally designed for Adblock that automatically remove unwanted content from the internet, including annoying adverts, bothersome banners and troublesome tracking

Yet when I drilled down into the EasyList definitions of content to be blocked, I found something that didn't meet any of those criteria:

[Screenshot: the EasyList entry targeting the sponsor message]

In other words, someone had consciously decided to block the sponsor I show in order to help support me financially - the one with no tracking or images or iframes or malware or other crap. The highlighted line there is just one of more than 57k entries in that file, many of which no doubt target nasty ads in the traditional sense we think of them.
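For those unfamiliar with how these lists work, EasyList's element-hiding rules are just CSS selectors with an optional domain prefix. As an illustration of the syntax - using the names from my own markup rather than reproducing the exact entry in the list:

! hide an element by id on one domain
troyhunt.com###message_of_support
! hide elements by class
troyhunt.com##.sponsor_block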

Unfortunately, because EasyList is used across other ad blockers as well, the problem extends beyond one rogue extension:

[Screenshot: uBlock Origin also removing the sponsor message]

This is uBlock Origin and it was the final straw for writing this post after someone reported it to me on the weekend.

Now as it turns out, Adblock Plus actually defines criteria for acceptable ads, criteria which are entirely reasonable. For example, ads shouldn't disrupt the page flow by inserting themselves into the middle of the content:

[Screenshot: Adblock Plus' acceptable ads criteria on ad placement]

Ads also shouldn't consume too much space:

[Screenshot: Adblock Plus' acceptable ads criteria on ad size]

This is good - any reasonable person would agree with all of this - yet my sponsor text comes nowhere near exceeding any of the criteria. Clearly, this is a mistake so I went ahead and filled out an acceptable ads application. That was now a couple of weeks ago and as of today, their false positive remains. Unfortunately, as best I can tell the process for blocking content involves no review whilst the process for unblocking errors like this requires human intervention.

When I realised what was going on here, I was angry. I was suddenly sympathetic with Forbes and their decision to block people with ad blockers which is just wrong - I shouldn't be sympathetic with them - but I was enormously frustrated at being penalised whilst trying to make a positive difference to this whole ad thing. I was being penalised for doing precisely what the likes of Adblock Plus say I should be doing!

So here's what I'm going to do: absolutely nothing.

I'm not going to rename elements or CSS classes in an attempt to circumvent their blocking, that's a vicious cycle that would only sap my time as I continued to try and circumvent an unjust process. Fortunately, sponsors pay me independently of any form of CPM such as ad providers rely on so it doesn't directly impact me, but of course I want my sponsor messages to be seen as that's why they're there in the first place. I could appeal to people to whitelist my site in their own instance of Adblock Plus or uBlock or whatever other ad blocker they're using, but I'd prefer to appeal to them to report this as an incorrectly categorised ad.

When ad blockers are stooping to the same low level as advertisers themselves are in order to force their own agendas, something is very, very wrong. Deliberately modifying sites like mine which are making a conscious effort to get us away from the very things about ads that led to ad blockers in the first place makes them part of the problem. Ad blockers like this need to clean up their act.

Update (the following day): Shortly after posting this article, Adblock Plus added an exception for the element which shows the sponsor message. They responded by explaining that "I saw your sponsor message and it looks perfectly acceptable" and that it was in compliance with all their criteria. I appreciate their responsiveness on this; supporting responsible ads or sponsors or whatever you want to call them is what we need for a healthy balance of content and monetisation. As for the comment within that link that's concerned this will now display ads if ever I put them back, note that ABP's white-list is specifically for the sponsor banner and there would be no reason for me to put ads in that element.

Handling people's personal data is sensitive business


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Handling people's personal data is sensitive business

Last week I wrote about how 8 million GitHub profiles were leaked from GeekedIn's MongoDB - always a risk when you expose a DB with no auth whatsoever! For any other website, this would be a typical data breach scenario in that info that was meant to remain private was made public. However, GeekedIn lost publicly accessible GitHub data so whilst yes, there was a breach, no, it wasn't anything you couldn't get publicly anyway. So what's the big deal?

I expected there'd be people in both camps on this issue - those who couldn't care less and those who were upset - but I was surprised at both how passionate each side was and how biased the vast majority was towards one end of that scale. Here's how I personally felt about the incident:

whilst this appeared to be data that was obtained from publicly facing GitHub profiles, it felt wrong to see it all aggregated together here like this

But if you have a read through the comments on that blog post, there were a couple of people in particular who were ardently against me on this. That's fine, you're always going to have different views on these things, but there were some pretty strong opinions usually along these lines:

Why is it wrong to see publicly available data aggregated?

And this is the paradox: if you place your information on a publicly facing resource, should it then be a free-for-all for anyone to do whatever they want with? I mean it's now public domain data, right? Others agreed:

In this case that expectation should be that it is completely public and can be used for anything. Github is in no position to forbid anything about content they serve publicly.

But this is where it starts to get murkier - the assertion that a website is unable to define the terms by which people use it - because that's incorrect. Here's where GitHub stands on that:

Using scraped information for a commercial purpose violates our privacy statement and we do not condone this kind of use.

There are many, many precedents of legal trouble arising as a result of violating a website's terms of service and using otherwise publicly accessible data in a way they prohibit. Most notable was Aaron Swartz pulling millions of JSTOR articles via MIT's network and as the linked article states:

The "terms of service" (TOS) of any website are basically a contract. They constitute an agreement about what you can and can't do, and what the provider can and can't do.

That "agreement" was pursued to tragic consequences in Aaron's case and I highly doubt that GitHub would ever actually go on the offence against a little startup like GeekedIn, but nobody should be under the illusion that someone else's website is fair game to do whatever they want with. That's a little bit tangential to the whole point about handling personal data though, let's get back to that:

I find it amusing that people would be worried that their email, which they published on github already, was "leaked", but then they would have no issue confirming it's an actively used email by signing up at HIBP

This one starts to get to the heart of the issue and it's simply this:

People provided their data to GitHub with an expectation of how it would be used. GeekedIn took that data and in violation of GitHub's terms, re-purposed it to do something completely different and for their own commercial gain.

I was seeing a lot of comments from people who were pretty unhappy about this situation, indeed many debating with those who'd left the comments above. So I thought I'd send out a poll:

When two thirds of respondents are unhappy about this and only a single digit percentage genuinely couldn't care less, there's obviously something upsetting people. Someone pointed me over to a page on LinkedIn where the GeekedIn service was originally launched (you need to be logged in to LinkedIn to read this) and there are some very strongly worded comments there. I want to touch on some of those because they give us an insight into how the vast majority of people feel about their data being used in this way.

For many people, it's a consent issue:

Well thanks for scraping my personal data without any signup/consent and failing to keep it secure.

And that comes through time and time again:

Thanks for scraping my data off github, dude. I didn't ask to be in your dumbshit database.

And again:

Cheers, mate! Thanks a lot for scraping my data without any kind of consent.

And this one raised another interesting point:

I never gave consent to you to allow my PERSONAL email address to be used by recruiters. Where is your OPTIN which is European Law for personal emails?

As did this one:

Thanks for breaking EU and Spanish data protection laws. You know the drill - as required by law, please explain immediately where you got my data, what is in there, how you store it and how you process it. Then delete all of it, and provide proof, as required by law.

Quite a few people actually brought this up regarding opt-in and whether GeekedIn's actions could be considered illegal under European law. Particularly in the EU (which incidentally, is where both the founder and the service are based), folks are very sensitive about data privacy and they have some of the world's strictest laws governing how information can be handled. As a society, I find they also have a much lower tolerance for privacy violations than, say, the US or Australia.

Legal aspects aside, much of this goes back to the earlier point I highlighted around the expectation of context; yes, the data was publicly facing on GitHub's site, but that's the context people were happy to have their data exposed in - on GitHub's site, not necessarily elsewhere. In fact that theme is very clear:

It's completely fair to be pissed about this, as it is 100% not the way the data is intended to be used

The commercial intentions were what upset many people; it's one thing to have your data scraped in this fashion, but it's another thing again to have someone attempt to monetise it:

You scraped a bunch of websites collecting information about individuals, without telling people about it? AND you charged money (quite a lot of money) for giving companies and recruiters that information so they can spam us with crap?

Within the comments on that LinkedIn page there were a few messages of support, albeit from the one individual who seems to be associated with the service. His defence consisted of a combination of "don't put your data on the web if you don't want it scraped", "web crawlers do this too" and "but the information isn't sensitive anyway". Now I get the defence and why he'd think this way but again, people have very different expectations here. You expect public data to be indexed for the purposes of searchability and when it comes to personal information, there are very different views out there about how sensitive it is.

Here's what really gets me with all of this though:

Despite the overwhelming majority of survey respondents and commenters being unhappy about this, there's a small minority that insist, against all evidence, that these people are wrong. They don't believe anyone has the right to feel at all violated that their data has been misused in this fashion.

The simple premise of "it's already public" is not a sufficient defence and there are many examples of where that rationale doesn't play out. I'm "already public" when I'm at the beach with my family but I don't wish for unknown people to share that with others via photographs. People may visit certain public establishments which they don't want to broadcast to the world. The decision of what should and should not be public and how it's used once it is remains a personal one, not one for others to impose their own views on to someone else's detriment.

The lesson from all of this, regardless of whether you think this issue is a non-event, is precisely the title of this blog post: handling people's personal data is sensitive business. It doesn't matter where they left it or how you got it, the vast majority of people feel deeply about how it is reused and if you don't respect that, you could be heading towards the same hot water GeekedIn now find themselves in.

Have I been pwned and spam lists of personal information


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Have I been pwned and spam lists of personal information

One of the things I'm finding with running Have I been pwned (HIBP) is that over time, my approach is changing. Nothing dramatic thus far, usually just what I'd call "organic" corrections in direction and usually in response to things I've learned, industry events or changes in the way people are using the service. For example, the Ashley Madison hack led to the concept of a sensitive breach which meant ensuring that data from certain incidents is not publicly searchable. More recently I introduced API rate limiting as I was seeing the service being used in ways that worried me. Times change, things move on.

Recently, I came across a massive spam list with a bunch of personal data and it got me thinking again about whether this has a role to play in HIBP. In fact, the context was that someone sent it to me as a claimed "OneDrive breach", something I was highly suspicious of right from the outset (only last week I was lamenting how many "breaches" are entirely fake). But be that as it may, it actually had a lot of personal data in it and with a little digging around I tracked it down to a commercial spam list. Actually, it was very similar to the Special K spam list I wrote about in March, even the total numbers were close (31M in the first one, 33M in the second). They weren't the same thing (only 5.6M common accounts), but they adhered to the same pattern in terms of a large amount of personal data floating around the web, inevitably unbeknownst to the owners of it.

I thought I'd put some feelers out to see how people would receive this class of data appearing in HIBP, and the response painted a pretty clear picture:

The caveat that accompanied many of the written responses to that tweet was that it should be clear when the data comes from an identifiable breach as opposed to when it's sold for spam purposes.

Here's the way I look at it all and I'm going to share both the negative and positive in terms of whether it makes sense to load it into HIBP: The main negative is that if it's loaded, then what? I mean someone finds themselves in a spam list, what can they actually do? They can't change their password like they can in a data breach nor do they really have any viable recourse against the people selling their data. There's really nothing actionable about all of this.

On the other hand, people want to know about their exposure. They want to know where their personal data appears and where it's being redistributed. So many times after loading a data breach I've had people contact me relieved that they've been able to attribute a potential source of abuse. Of course there are no guarantees that what they're seeing in HIBP is genuinely the source of the abuse, but it helps them complete the personal picture of their exposure. It's not just that, but HIBP subscribers expect to be told when I find them floating around the web; there are 10k subscribers in that latest list and 16k in the Special K one and they're going to want to know that they're in there.

But here's the clincher for me:

Spam lists containing personal information are indistinguishable from data breaches.

This is not just a collection of email addresses; we're talking about multiple personal identity attributes which are regularly used for nefarious purposes. For example, here's what's in that Special K spam list I mentioned:

email,ip,url,joindate,fname,lname,address,address2,city,state,zip,phone,mobile,dob,gender
[redacted]@gmail.com,162.158.22.[redacted],instantcheckmate.com,2015-08-06,[redacted],[redacted],,,San Francisco,CA,94107,,,,
[redacted]@gmail.com,70.198.4.[redacted],creditcardguide.com,2015-08-06,,[redacted],,,Mitchell,SD,57301,,,,
[redacted]@gmail.com,166.137.139.[redacted],creditcardguide.com,2015-08-06,[redacted],[redacted],,,,,,,,,

Names, addresses, birth dates, genders, phone numbers - this is precisely the sort of data people are so worried about being misused. The only thing not in there that you'd usually see in breach data is a password. Even then, not all breaches have a password in them anyway. For example, the Modern Business Solutions breach didn't have them and neither did the Regpacks breach, yet they had many of the fields you see above in common and it's exactly the kind of information people want to know is circulating about them.

Including widely circulating spam lists helps HIBP users assess the overall exposure of their personal data. Subscribers expect to be notified when their data is misused in this way.

I've now integrated the concept of spam lists into HIBP and done a number of things to clearly identify what they are. Firstly, these incidents are flagged each and every time they're represented in the HIBP interface, both via the title and with an icon:

[Screenshot: a spam list flagged by title and icon in the HIBP interface]

That's if it makes one of the top 10 incidents presently represented on the home page (which neither of the lists I'm kicking off with do), and it's then explained clearly in the description that pops up when drilling into it:

[Screenshot: the spam list explanation shown when drilling into an incident]

Same again on the page listing all pwned sites:

[Screenshot: the spam list flag on the page of pwned websites]

There's also now a spam list entry on the FAQs page which explains much of what you've read here, albeit more succinctly. That's linked from anywhere it makes sense to provide a definition, which now also includes the notifications sent to HIBP subscribers:

[Screenshot: the spam list definition referenced in subscriber notifications]

And finally, the API docs now also include a reference to the attribute:

[Screenshot: the spam list attribute in the API docs]
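For anyone consuming breach data programmatically, the flag is simply another boolean attribute on the breach model. A hypothetical, trimmed entry might look like this (the attribute names are those documented in the API; the values are purely illustrative):

{
    "Title": "Example Spam List",
    "PwnCount": 31000000,
    "DataClasses": ["Dates of birth", "Email addresses", "Names", "Phone numbers", "Physical addresses"],
    "IsSensitive": false,
    "IsSpamList": true
}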

And that's pretty much everywhere you'll see spam lists represented in the system. I've tried to make it as absolutely transparent as possible and while I'm sure there'll be further tweaks (the nomenclature in combination with the word "breach" in certain places, for example), this all contributes to the overall objective of helping people understand more about their exposure on the web.

So that's spam lists, and the Special K incident is now live and searchable. Plus, as you'd expect, HIBP subscribers are now receiving notifications about something they almost certainly weren't aware of before. I've got that second incident I referred to at the start of this post to load today as well, and I'm sure that just as our data will continue to be unwillingly exposed by breaches, it will continue to appear in spam lists too, so inevitably we'll see a lot more of this in the future.

Get "The Information Security Big Picture" on Pluralsight now!


Sponsored by: Sucuri: Incident Response, Monitoring, DDoS mitigation and WAF for websites

Get "The Information Security Big Picture" on Pluralsight now!

If you're here reading this then it probably won't come as a big surprise but brace yourself anyway - we have a security problem. Yes, yes, I know, it's all very terrifying and not a day goes by where someone isn't getting cyber-something'd. As best I can tell from the news, it's pretty much all to do with guys in hoodies sitting at green screens pwning all our things. I'm quite sure that's the case, I even did a quick check on Google to confirm:

[Screenshot: Google image search results for hackers]

I talk about these crazy hacker perceptions in the intro of my new Pluralsight course and despite the sensationalist and inaccurate imagery in the reporting, security genuinely is a big problem. In my view, much of it has to do with many fundamental misunderstandings, even amongst those who are more technically inclined. I wanted to make a course that challenged many of the views people hold about security and took them to places that, well, might make them feel a little bit uncomfortable. For example, people run anti-virus with the expectation that it will do precisely what the name suggests, yet in the second module when I delve into The Soft Centre Within Our Networks, it becomes pretty evident that traditional signature-based AV is losing the battle:

[Screenshot: from the module The Soft Centre Within Our Networks]

Speaking of feeling uncomfortable, I love opportunities to turn conventional wisdom on its head so in the third module on The Security Lies We Tell Ourselves, I talk about what's safe to browse and what's not:

[Screenshot: from the module The Security Lies We Tell Ourselves]

Now I'm not saying that when you next kick back for a bit of lazy web surfing this should necessarily influence where you browse to, but there are some genuinely interesting insights in here. When Symantec prepared this report a few years ago, they identified a number of factors that made porn a safer bet than religion, including the fact that the former has a monetisation strategy that's heavily dependent on them keeping their site safe. You're (probably) not going to change religion if you get a virus from your local church's website, but you may choose another location for your adult entertainment if the same thing happens there. This then leads us into discussions about topics such as malware distribution via ad networks that can happen on seemingly innocuous websites which then turns into a discussion on the role of ad blockers.

The fourth module talks about The Problems (and Solutions) with Mobile Devices and it gets us into important issues such as gummi bears:

[Screenshot: from the module The Problems (and Solutions) with Mobile Devices]

There's context here, of course, and it has to do with biometrics and the assertion that they're "unsafe" due to the ability to create gelatin moulds of fingers thus circumventing what is becoming an increasingly popular security control on mobile devices. This was demonstrated as far back as 2002 (probably further) and biometric-naysayers will argue that it proves this mechanism of identification is "insecure". But when you look at what's actually involved in a workable attack and you think about your threat actors - which are almost certainly the folks in the image above - the reality starts to look a lot different. There's a very pragmatic discussion to be had around all this and I cover it pretty holistically in the course.

Moving onto the fifth module and it's What People Just Don't Understand About the Web. For example, websites in the EU are doing a good job of scaring the bejesus out of anyone with cookies enabled by throwing big warnings around about tracking. Yet you don't need cookies to be tracked and I love showing people amiunique.org just to make a point of it:

[Screenshot: a browser fingerprint on amiunique.org]

Go and give it a go and you'll almost certainly find that just as your mother told you, you are indeed a unique snowflake. Disable your cookies, remove your plugins and turn off your JavaScript and you can still easily be tracked. Now how 'bout removing those inane cookie warnings, Europe?
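To make the point concrete, here's a sketch of the kind of client-side signals a site like amiunique.org combines - purely illustrative and not their actual code, and plenty of other signals such as request headers keep working even with JavaScript disabled:

// a handful of attributes that, combined, tend to be close to unique per browser
var signals = {
    userAgent: navigator.userAgent,
    language: navigator.language,
    timezoneOffset: new Date().getTimezoneOffset(),
    screen: screen.width + "x" + screen.height + "x" + screen.colorDepth,
    platform: navigator.platform
};
console.log(signals);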

And finally, in the sixth and final module, it's The Painful Truths About Passwords and Authentication. I love this module, because I get to talk about how badly people screw up 2FA:

Yes, that's a webcam pointed at a physical 2FA token so the code can be read remotely - that is genuinely what it looks like. Reckon that's nuts? How about not just pointing a webcam at the token (you know, the thing you're meant to have which is what makes it a second factor in the first place!) but also writing a blog post about how to do it, including OCR'ing the PIN. I've got a long and prosperous career ahead of me at this rate...

I wanted this course to highlight security in ways that many people don't usually think about it. I wanted it to span both technical and non-technical audiences and I really wanted it to be an enjoyable course to watch. It comes in at 2 hours and 17 minutes and we've positioned it as a "Big Picture" course so it covers a lot (far more than just the bits I've highlighted here) but it doesn't go too deep. I've had this one in planning for a long time and it took a lot of work to get the balance just right so I'm really hoping that it's well-received.

The Information Security Big Picture is now live on Pluralsight!
