Big Data. What’s the big deal, really? Data has been around for a long time, and organizations have been using it to their advantage forever. So what is so different now?
These are legitimate questions from anyone who is tired of hearing all the hype around Big Data. Granted, you will come across many instances where the term is used just because it is the fashionable thing to do. At the same time, the drive towards what is called Big Data is very real. The acceleration is due to data expanding in not one, not two, but three dimensions simultaneously. These three dimensions are famously known as the Three “V”s of Big Data.
Volume is the obvious one. From megabytes to gigabytes to terabytes to petabytes, the volume of data is increasing exponentially. You can understand this quite simply by comparing the size of a typical text file with that of a typical MP3. Take it further and compare both with video files, and then take it one more step to look at the size of what is now the norm for all of us: HD video. The increase in size is dramatic, and of course that means the volume of data being collected in any one place increases.
But that’s not the only thing that drives volume. Sources of data have increased significantly as well. Take, as an example, the typical cell phone we carry around. Not so long ago, the only data it held was voice calls and texts. With the advent of smartphones and the mind-boggling array of sensors and other equipment within them, the amount of data they carry is staggering. Combine this with the Internet explosion, social media, digitized businesses and e-commerce, and you will begin to understand why it is not only the size of individual files that is driving data volumes in today’s world.
It is no longer ponds or lakes of data that organizations are dealing with; it is vast oceans. And when you are in an ocean, you face a whole different set of challenges as you navigate the hazards of rough seas.
Velocity is probably the least understood of the three “V”s. There are multiple angles from which to look at velocity as a dimension in which data is expanding.
First is the sheer rate at which data is coming in. Together with ever-faster connectivity, this correlates directly with the volume dimension described above.
Second, it is not just the rate at which data is pouring in, it is also how quickly it needs to be processed. Batch processing is not good enough; business managers need data in real time. As I mentioned in a previous post (http://openscg.com/2013/07/bi-and-hadoop/), “I will get back to you tomorrow” is no longer an acceptable answer!
Third, data is only useful as long as it is processed at least as fast as it enters the system. If data is processed more slowly than it arrives, all you end up with is an ever-growing backlog. And because the demand is for real-time data, backlogged information is not really of use to anyone.
Lastly, you need to be prepared not only for consistently high velocity, but also for acceleration and, at times, bursts of intense activity.
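To make the backlog argument concrete, here is a minimal Python sketch. The function `simulate_backlog`, the per-step arrival counts and the fixed `capacity` are hypothetical illustrations, not taken from any real system: the model is simply a queue that receives some records in each time step and can process at most `capacity` of them per step.

```python
def simulate_backlog(arrivals, capacity):
    """Return the number of unprocessed records left after each time step.

    arrivals: hypothetical count of records arriving in each step
    capacity: hypothetical number of records the system can process per step
    """
    backlog = 0
    history = []
    for incoming in arrivals:
        backlog += incoming                # new data joins the queue
        backlog -= min(backlog, capacity)  # process as much as capacity allows
        history.append(backlog)
    return history


# Arrival rate below processing capacity: nothing accumulates.
steady = simulate_backlog([80] * 5, capacity=100)
# Arrival rate above processing capacity: the backlog grows every step.
overloaded = simulate_backlog([120] * 5, capacity=100)
# Light load with a single burst of intense activity.
bursty = simulate_backlog([50, 50, 400, 50, 50, 50], capacity=100)

print(steady)      # [0, 0, 0, 0, 0]
print(overloaded)  # [20, 40, 60, 80, 100]
print(bursty)      # [0, 0, 300, 250, 200, 150]
```

The overloaded case shows the backlog growing without bound, while the bursty case shows that even a system with spare capacity on average needs several steps to drain a single spike.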
The third “V” is variety. Think about all the different types of data you can gather with today’s technology: audio/video streams, GPS, sensor readings, documents, SMS/MMS, Flash, images and more.
Every organization gathers data and every organization wants to use that data. To convert this ‘dark’, unstructured data into information, traditional structured approaches no longer suffice.
Wrapping up, I hope you can now see why Big Data has gained so much momentum lately and why organizations need to take urgent steps to make sure they are not left behind in this race. The proper tools are key!