Big Data Implementations.

Big Data is conceptually a collection of data and the means to store (a file system) and access it (a runtime).

This means that all of the components are swappable, including the core components that were the archetypal definition of Big Data: the file system (HDFS) and the execution environment (MapReduce). The resulting system will still be Big Data. If it still fails to meet your needs, you can write your own components, and it will still be Big Data (after some smart people look at it first).

It’s like Granddad’s axe. The handle has been replaced 3 times and the head once, but it’s still Granddad’s axe.
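To make the swappable-parts idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: `Storage`, `Runtime`, `InMemoryStorage`, `ChunkedStorage`, `MapReduceStyleRuntime`, `CounterRuntime` and `run_job` are not HDFS, MapReduce or any real library's API. It only shows that a job written against two small interfaces keeps giving the same answer when either part is replaced.

```python
from collections import Counter
from typing import Dict, Iterable, List, Protocol


class Storage(Protocol):
    """Hypothetical storage interface: anything that can yield lines of text."""

    def read_lines(self) -> Iterable[str]: ...


class Runtime(Protocol):
    """Hypothetical execution interface: anything that can count words."""

    def word_count(self, lines: Iterable[str]) -> Dict[str, int]: ...


class InMemoryStorage:
    """Simplest possible store: one flat list of lines."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def read_lines(self) -> Iterable[str]:
        return iter(self._lines)


class ChunkedStorage:
    """Stand-in for a block-based store: lines are held in separate chunks."""

    def __init__(self, chunks: List[List[str]]) -> None:
        self._chunks = chunks

    def read_lines(self) -> Iterable[str]:
        for chunk in self._chunks:
            yield from chunk


class MapReduceStyleRuntime:
    """Counts words with an explicit map step and reduce step."""

    def word_count(self, lines: Iterable[str]) -> Dict[str, int]:
        # Map: emit a (word, 1) pair for every word seen.
        pairs = [(word.lower(), 1) for line in lines for word in line.split()]
        # Reduce: sum the values for each distinct word.
        totals: Dict[str, int] = {}
        for word, count in pairs:
            totals[word] = totals.get(word, 0) + count
        return totals


class CounterRuntime:
    """A different engine with the same contract, built on collections.Counter."""

    def word_count(self, lines: Iterable[str]) -> Dict[str, int]:
        return dict(Counter(w.lower() for line in lines for w in line.split()))


def run_job(storage: Storage, runtime: Runtime) -> Dict[str, int]:
    """The job only knows the two interfaces, not the concrete parts."""
    return runtime.word_count(storage.read_lines())


if __name__ == "__main__":
    original = run_job(InMemoryStorage(["the axe", "the handle"]),
                       MapReduceStyleRuntime())
    replaced = run_job(ChunkedStorage([["the axe"], ["the handle"]]),
                       CounterRuntime())
    print(original == replaced)
```

Both the handle (storage) and the head (runtime) were replaced between the two runs, and the job neither knows nor cares.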

Alternatives to MapReduce and HDFS started to be published in 2009 (a sign of a healthy ecosystem), and the people deciding that alternatives were needed were predominantly the people who wrote this stuff in the first place: Google (a sign of Google eating its own dog food). There are several excellent articles on these alternatives. What is encouraging is that the initial Google projects (like Pregel) have open source counterparts (Giraph). What is more encouraging is that there are Google projects, as well as other Big Data solutions out there, that don’t have an open source counterpart.

In this case the definition of Big Data for you could be your very own open source project and community (be careful what you wish for).

Big Data is also a solution. It is a general data processing tool built to address the real-world use case of fast access to, and analytics over, huge datasets.

Examples of these types of solutions are:

  • Facebook uses it to handle 50 billion photographs.
  • President Barack Obama committed $200 million to federal initiatives to fight cancer and traffic congestion with Big Data. Mr. Obama knows the value of Big Data – it helped him get re-elected.
  • BigSQL – the poster child for Data Warehousing with Big Data.
  • Decoding the human genome – what originally took 10 years, Big Data does in a week.
  • NASA uses it to observe and simulate the weather.
  • The NSA are definitely going to use it for the betterment of the web experience when they fire up their yottabyte capable Big Data machine on information they capture over the internet. 1000000000000000000000000 bytes – it’s a lot bigger than it looks, trust me. Just imagine what those little rascals could get up to with all of that lovely data.
  • Architects use it to enforce a consistent design approach and increase productivity (nice to hear; I assume it makes buildings safer as well as profits bigger).
  • MIT hosts the Intel Science and Technology Center for Big Data (If you can’t do, teach. Big Data eats Intel chips like a pig eats truffles).
  • McKinsey bills millions for its MIT-graduate Big Data consultants.
  • It made finding the God Particle (the Higgs boson) possible.

As you can see, the only limit on natty little Big Data implementations is your imagination.