
Where Did The Breach Go?

Where on earth did the breach go? We've asked ourselves, we've asked others, and we've been asked by many.

The simple answer is, we don't know! It could be anything, really, that has caused the dramatic decline in reported data loss incidents in 2009. Here are a few ideas:

  • The decline is media related. Data breaches are 'passé'.
  • Organizations are implementing better security.
  • Organizations aren't reporting incidents.
  • Solar Flares

None of these, with the exception of solar flares, is likely to be analyzable at first glance. But what about the first bullet?

Lacking expertise in space weather, we decided to dive into the Google News archives instead, and things became interesting. Google News' timeline feature facilitates this kind of analysis. We looked at the number of search results matching the query "data breach", per month, for 72 months (2004 through 2009). We then tossed the data into a graph, added a polynomial trend-line of order 6, and took a deep breath.
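For readers who want to reproduce a trend-line like this, here is a rough sketch of an order-6 least-squares polynomial fit in plain Python. It is an illustration, not a record of how the chart was actually produced: the function name `fit_polynomial` is made up, and the month indexes are rescaled to [-1, 1] purely to keep the arithmetic stable.

```python
def fit_polynomial(ys, degree=6):
    """Least-squares polynomial fit over month indexes 0..len(ys)-1.

    Hypothetical helper for illustration. Returns a function that maps a
    month index to the fitted trend value.
    """
    n = len(ys)
    if n <= degree:
        raise ValueError("need more data points than the polynomial degree")

    # Assumption: rescale month indexes to [-1, 1] so high powers stay small.
    def scale(i):
        return 2.0 * i / (n - 1) - 1.0

    xs = [scale(i) for i in range(n)]
    m = degree + 1

    # Normal equations (V^T V) c = V^T y, with V[i][k] = xs[i] ** k.
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution to recover the coefficients.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        known = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - known) / a[r][r]

    def trend(i):
        x = scale(i)
        return sum(c * x ** k for k, c in enumerate(coeffs))

    return trend
```

Calling `trend = fit_polynomial(counts)` on a list of 72 monthly totals returns a function, so `trend(0)` through `trend(71)` give the points of the trend-line.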

This graph looked strikingly similar to the graph at the top of datalossdb.org. We then made a matching graph of datalossdb.org incidents for the same 72 months (again, 2004 through 2009). We tossed in another polynomial trend-line of order 6, and sure enough, we were looking at very similar graphs.

"Pretty rad", we thought. So we then superimposed them.

Before we go on, we should be clear that we're not drawing any firm conclusions here, just casual observations. It would appear that incidents and articles regarding "data breaches" are related. That's as far as we'll go. Determining causality is tricky. Is it the breach that creates the news? Or is it the news that 'digs up' the breach? The answer is probably both, to some extent.

These results aren't all that surprising either. Most of the data in our data set comes from news articles regarding breaches. We'd expect, to some degree, a similarity. The scale, however, is rather different: roughly 500 to 1,800 articles per month, versus roughly 10 to 75 incidents per month.
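Two series with such different ranges can still share one set of axes if each is rescaled to 0..1, and a Pearson coefficient puts a rough number on "related". The sketch below shows both; the helper names `rescale` and `pearson` are made up for illustration, and nothing here claims to be how the charts in this post were drawn.

```python
import math


def rescale(values):
    """Min-max rescale a series to the range 0..1 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no spread to rescale; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

Rescaling changes only how the lines look when overlaid. The correlation is the same before and after, and a high value still says nothing about which series drives the other.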

So we decided to seek out a single, national-scale news source to query. The number of articles in the New York Times on a given subject, for instance, might give us a more accurate gauge of national umm...popularity? The number of articles should also be smaller. In truth, we're not picking the NY Times at random here. They happen to have a fantastic, open, REST-based API.
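A per-year phrase query against an API like this could be built along the following lines. Treat it as a sketch under assumptions: the endpoint shown is the present-day Article Search v2 URL, which may well differ from what was available when this was written, and the parameter names, the `yearly_query_url` helper and the placeholder key are illustrative. Check the current NY Times developer documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumption: present-day Article Search endpoint, not necessarily the
# version available when this analysis was done.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def yearly_query_url(phrase, year, api_key):
    """Build a search URL for an exact phrase within one calendar year."""
    params = {
        "q": f'"{phrase}"',           # quoted so the phrase is matched as a unit
        "begin_date": f"{year}0101",  # dates in YYYYMMDD form
        "end_date": f"{year}1231",
        "api-key": api_key,           # placeholder; a real key is required
    }
    return BASE_URL + "?" + urlencode(params)


# One URL per year of interest; fetching and parsing are left out on purpose.
urls = [yearly_query_url("data breach", y, "YOUR_KEY") for y in range(2004, 2010)]
```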

We ran the same query, "data breach" (in quotes), through the NY Times API. The number of articles per year was very small, so we broke out the data into years, as opposed to months, and aggregated our monthly number of incidents in datalossdb.org to match it. We then generated a graph similar to the ones above.
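Rolling monthly incident counts up into yearly totals is a small grouping step. A minimal sketch, assuming the monthly data is held as a mapping from `(year, month)` to a count; that layout and the name `yearly_totals` are assumptions for illustration only.

```python
from collections import defaultdict


def yearly_totals(monthly_counts):
    """Sum {(year, month): count} into {year: total}, ordered by year."""
    totals = defaultdict(int)
    for (year, _month), count in monthly_counts.items():
        totals[year] += count
    return dict(sorted(totals.items()))
```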

Now there are fewer data points, so the trend-lines are significantly different from those above, but we see an interesting pattern. The term "data breach" returned 0, yes, zero articles for 2004. It jumps to 15 in 2005, then 22 in 2006, then nosedives to 9 in '07, 5 in '08, and 4 in '09.
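The size of that nosedive is easier to see as year-over-year percentages. The short sketch below uses only the six article counts quoted above; the helper name `year_over_year` is made up for illustration.

```python
# NY Times article counts for "data breach", as quoted in the text above.
nyt_articles = {2004: 0, 2005: 15, 2006: 22, 2007: 9, 2008: 5, 2009: 4}


def year_over_year(counts):
    """Percent change from each year to the next; None when the base is zero."""
    years = sorted(counts)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        base = counts[prev]
        if base == 0:
            changes[cur] = None  # no meaningful percentage from a zero base
        else:
            changes[cur] = 100.0 * (counts[cur] - base) / base
    return changes


changes = year_over_year(nyt_articles)
peak_year = max(nyt_articles, key=nyt_articles.get)
drop_from_peak = 100.0 * (nyt_articles[2009] - nyt_articles[peak_year]) / nyt_articles[peak_year]
```

The drop from 22 articles in 2006 to 9 in 2007 works out to about -59%, and by 2009 coverage is down roughly 82% from the 2006 peak. 2005 gets no percentage at all because the 2004 baseline is zero.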

This was a bit impressive actually, considering that 2007 had TJX, 2008 had the most incidents in history, and 2009 had some of the largest incidents ever. However, in those years the data breach topic received significantly less attention. Why?

Again, there are several possibilities that come to mind:

  • Data breaches are 'passé'.
  • Semantic problems on our end.
  • Solar Flares

Wow, look at that, 2 out of 3 are the same as the list above, suggesting more and more that solar flares are indeed the issue here!

In truth, we're not claiming that data breaches being 'passé' in the media is the real issue here. We don't know! It's one of our theories, and none of what we've produced here proves causality. We're suggesting it though. The NY Times analysis, when compared to the other charts, seems to suggest it. Why did the Times cover data breaches less in recent years, despite the explosion in their frequency and scale? Why did the peaks dissipate from Google News results in mid 2008, when the number of incidents was still booming?

Maybe the New York Times is ahead of the curve in terms of news trendiness and the rest of the media followed suit later on?

Maybe list item 2 is our problem; perhaps the term "data breach" isn't semantically accurate. But even adding "data loss" results didn't change the picture: it added more data, yet the graph stayed essentially the same.

Same graph, but we figured we'd show it regardless.

Here's some wild speculation, which is the same wild speculation we were making before we even looked at any of this data:

  • The media initially rode the wind of these 'new' threats to consumer privacy.
  • Consumers were concerned.
  • Legislators passed laws requiring notification.
  • Consumers started receiving letters.
  • Consumers became desensitized to the letters.
  • The media became desensitized to the stories: the stories no longer shocked consumers.
  • Breaches were reported in the media less often.
  • Solar flares happened

Going back to the original question: Where did the breach go?

We have no idea. But it was fun looking into one theory a little closer.

Permission is granted to use this database in non-profit works and research. Use of the DataLossDB, and its exports, RSS feeds, reports, or other materials produced on this site by the Open Security Foundation for commercial interests requires authorization and licensing arrangements. For more information, please e-mail curators@datalossdb.org with a brief summary of how you would like to use this information; product, service, research, etc.
© 2005 - 2010, Open Security Foundation, All Rights Reserved.