By: Dr. Lalit S Kathpalia, Director, Symbiosis Institute of Computer Studies and Research (SICSR)
Big Data is currently bleeding-edge technology, or at least something that technology professionals want to flaunt on their resumes. Frankly speaking, Big Data, along with the Internet of Things, is for now a sexy term for existing technologies that are ready for adoption and use and are approaching maturity. In this article we look at Big Data to understand whether it is really Big.
To understand whether Big Data is hype or reality, we first need to understand, in layman’s terms, what Big Data is and when it started trending. People who work in the technology industry are perpetually insecure, and people outside the industry often wonder why. Frankly speaking, this insecurity stems from the constant threat of becoming obsolete (as technologies die out) and of being laid off. Big Data offers an alternative path, and with it some solace, to people in the technology industry.
There are many ways to explain the term Big Data. According to Wikipedia, “Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.” In layman’s terms, Big Data simply means an amount of data so massive that it is difficult to handle by traditional methods. Interestingly, what matters is not the amount or volume of data but what organizations do with it. Used well, Big Data provides insights that lead to better decisions for organizations.
Now that we know what Big Data is, let us go back to some history. Big Data is not a recent phenomenon; it has been around for quite some time without being called Big Data. In 2012, Big Data was a featured topic at the World Economic Forum in Davos, Switzerland, with a report titled “Big Data, Big Impact.” A sure sign that Big Data had arrived came the same year, when it was satirized in Scott Adams’s “Dilbert” comic strip. “It comes from everywhere. It knows all,” one frame reads, and the next concludes that “its name is Big Data.”
Interestingly, if you dig into the history, the earliest documented use of the term Big Data appears to date from 1998. The father of the term is most probably John Mashey, who was chief scientist at Silicon Graphics in the 1990s. No academic papers support the attribution to Mr. Mashey. He gave hundreds of talks to small groups in the mid- and late 1990s to explain the concept and, of course, to pitch Silicon Graphics products. The case for Mr. Mashey rests on the Web sites of technical and professional organizations such as Usenix, where some of his presentation slides from those talks are posted, including “Big Data and the Next Wave of Infrastress” from 1998.
On the simple term Big Data and his role in popularizing it among the high-tech community, all Mr. Mashey has to say is: “I was using one label for a range of issues, and I wanted the simplest, shortest phrase to convey that the boundaries of computing keep advancing.”
At the same time, I would urge caution about the growing number of self-proclaimed Big Data experts. Many people claim Big Data competencies just to make money, in the spirit of “make hay while the sun shines.”
The question on our minds is: where does Big Data come from? Big Data can be grouped into a few categories, including social data, machine data and transactional data. Social media data gives companies remarkable insight into consumer behavior and sentiment, and it can be integrated with CRM (Customer Relationship Management) data for analysis. Consider the scale: some 500 million tweets posted on Twitter per day, 5 lakh (500,000) Facebook likes per minute, and 300 hours of video uploaded to YouTube every minute. Machine data is information generated by industrial equipment, including real-time data from sensors that track parts and monitor machinery (often called the Internet of Things), and even web logs that track user behavior online. As for transactional data, large retailers and even B2B companies generate huge volumes of it on a regular basis, since each transaction consists of one or many items, product IDs, prices, payment information, manufacturer and distributor data, and much more. Major retailers such as Amazon.com, and restaurant chains such as the US pizza chain Domino’s, which serves over 1 million customers per day, generate petabytes of transactional Big Data. Note that Big Data can be traditional structured data or unstructured, high-frequency information.
The problem of Big Data is a problem of amplification of the environment. For example, to manage traffic at a junction you need only a single traffic light that is either on or off, since there are two states in the environment. Now, by tracking every vehicle through video cameras and other technologies, we are creating millions of states that need to be managed. This is far beyond the variety the traffic police can handle. In a nutshell, by trying to manage too many states we turn a manageable environment into an unmanageable one. As Dr. Anupam Saraph puts it, “Big Data is actually a problem created by technology folks to make manageable organizations become unmanageable. This is so because Big Data amplifies the environment beyond the ability of the management. This in System Science is recognized as a violation of Ashby’s Law of Requisite Variety.”
So Big Data is indeed Big, but in a way that makes organizations, or rather the management within them, small. We need to be cautious, since Big Data may end up eroding an organization’s ability to cope with and solve its problems.
And to end this story, here is Billy Joel’s famous song “We Didn’t Start the Fire”:
“We didn’t start the fire
It was always burning
Since the world’s been turning
We didn’t start the fire
But when we are gone
It will still burn on and on and on and on
And on and on and on and on…”