We all know that a ‘googol’ is the large number 10 to the power 100, that is, the numeral 1 followed by 100 zeros. The term was popularized in 1938 by the mathematician Edward Kasner, who credited the name itself to his young nephew and used it to illustrate the difference between an unimaginably large number and infinity. Indeed, we are entering the age of the googol. In fact, the name ‘Google’ is itself a play on the word ‘googol’, reflecting Larry Page and Sergey Brin’s mission to organize a seemingly infinite amount of information on the web.
A recent report by McKinsey found that in 15 of the U.S. economy’s 17 sectors, companies with more than 1,000 employees store, on average, over 235 terabytes of data, more data than is contained in the U.S. Library of Congress. Unsurprisingly, there is a swarm of activity around a new crop of Big Data tools, such as Hadoop, that can deal with huge amounts of data. The opportunity to identify patterns reaches far beyond a single view of a dataset. Firms are looking for tools that draw on massive datasets containing not only the financial details of transactions but also IP addresses, browser information, and other technical data that will help them refine models to predict, identify, and analyze.
Take the case of the Financial Intelligence Unit established by the Ministry of Finance. It aims to track trillions of transactions flowing through banks, financial institutions, payment gateways and the like, parsing them to look for anomalous transactions, or ‘outliers’.
The Financial Intelligence Unit’s objective is to track and curb activities such as money laundering and corrupt practices in the financial sector, and to prevent black money transfers. This means the technology must grapple with enormous datasets at every stage: extraction, loading, transformation, storage, analytical processing and reporting. And by analytical processing we mean heavy-duty statistical analysis that requires numerous computations at the back end. Clearly, the department needs Big Data capability in this regard.
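To make the idea of hunting for ‘outliers’ concrete, here is a minimal sketch in Python. It is only an illustration, not a description of how any financial intelligence unit actually works: the transaction amounts are made up, the function name `flag_outliers` is hypothetical, and the technique chosen (a modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5) is just one of many possible statistical tests.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single huge value does not distort the baseline the way it
    would distort a mean and standard deviation.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        # At least half the values equal the median; this sketch flags nothing.
        return []
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical transaction amounts; one is wildly out of line with the rest.
transactions = [120.0, 95.5, 130.0, 110.0, 99.0, 105.0, 98000.0, 115.0]
suspicious = flag_outliers(transactions)
print(suspicious)  # indices of the flagged transactions
```

On a real system the same test would have to run over billions of records and many more variables than the amount alone, which is where the need for Big Data infrastructure comes from.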
Interestingly, the Ministry of Corporate Affairs (MCA) is where companies, both listed and unlisted, need to file their annual reports and statements. The MCA aims to leverage this database to monitor fraudulent behavior and to come down with exemplary punishments in cases of non-compliance or fraud. This is another case that calls for Big Data capability, but are we really trying to achieve it? In contrast, consider the news that Hadoop is being used by the U.S. Department of Homeland Security to track data and uncover patterns.
Alongside Hadoop, there are already several active players in this field, with substantial funding from PE firms and angel investors, commercializing the knowledge and selling analytics-driven Big Data solutions. Quantavo, IBM, HPCC, Amazon, 1010 Data and Opera Solutions are some of the names in this business. One of the new technologies in Big Data is Hadoop, which is not a database in the conventional sense but an open-source framework for storing and processing data across many machines. Much of its development took place at Yahoo, whose Hadoop engineering team was spun off earlier this year into an independent company, Hortonworks, with backing from Benchmark Capital. Hadoop works only on the subset of problems that can be broken up into chunks and distributed across a whole bunch of computers. For those problems it is revolutionary, since price points are definitely lower while processing capacity grows with every machine added to the cluster.
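The ‘break it into chunks’ idea can be sketched in a few lines of Python. This is not Hadoop’s API and it runs on a single machine; it only imitates the split, map and reduce steps that a cluster would spread across many computers. The records, the helper names (`split_into_chunks`, `map_chunk`, `reduce_counts`) and the task itself (counting transactions per IP address) are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Split the full record list into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count transactions per source IP within one chunk.

    On a cluster, each chunk would be handled by a different machine.
    """
    return Counter(ip for ip, _amount in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Hypothetical (ip_address, amount) records.
records = [
    ("10.0.0.1", 120.0),
    ("10.0.0.2", 95.5),
    ("10.0.0.1", 130.0),
    ("10.0.0.3", 110.0),
    ("10.0.0.1", 99.0),
]

chunks = split_into_chunks(records, chunk_size=2)
partial_counts = [map_chunk(chunk) for chunk in chunks]
totals = reduce(reduce_counts, partial_counts, Counter())
print(totals.most_common())
```

The pattern works here because each chunk can be counted without looking at any other chunk. A problem in which every step depends on the result of the previous one cannot be split this way, which is the limitation noted above.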
According to a global survey by EMC, two-thirds of companies do not properly use Big Data to influence decisions; only one-third are able to use it effectively to assist their business decision-making. Nonetheless, there is a spurt of activity around Big Data, almost like carpetbaggers looking to make a quick buck out of the fallout from the global meltdown.
This may be hard to believe, but consider these developments. LexisNexis has created an application for Big Data analytics, and it believes that it has produced something better and more mature than the better-known Hadoop technology. Not to be left behind, SAP plans to make its HANA in-memory database the pivot of a Big Data technology that will leverage its mammoth ERP architecture and deliver Big Data analytics capability. Microsoft has installed a version of Apache Hadoop on its Azure cloud service. The Greenplum division of EMC is building a single data analytics platform that can crunch both structured and unstructured data and give a broad range of users the tools to study an enterprise’s information.
The proliferation of large-scale datasets is beginning to change business and science around the world, but enterprises need to prepare in order to gain the most advantage from their information. If we go into the market now, every enterprise software vendor will boast about the strength of its application and how great its products are. These products churn through terabytes of data and apply agglomerative clustering, latent class analysis, convolution, non-linear programming and other high-end econometric and optimization routines.
However, let us take these claims with a pinch (nay, cupfuls) of salt. Many a time, the Big Data mines are ‘salted’. In fact, we have seen this in the past with some BI products. They kept selling throughout the global recession, as companies looked to gain insights into their business and, subsequently, more efficiency as well as new ideas. But only a few got value for their hard-earned money. The case is similar with Big Data analytics. Used without thought and due diligence, it becomes a quagmire of lost money, lost hopes and lost desires too. Used wisely, it becomes a God-like beacon in troubled times.
– The views expressed in this article are personal