Filtering Through The Noise

TM Arun Kumar

The world at large is bombarded with cliches such as “more data will be produced in the next two years than in the last century” or “more data will be produced in the next decade than in the history of mankind.” What one doesn't often hear is that much of the data being generated and stored is, to put it crudely, junk: data that is not actionable.


However, thanks to digital technology that reaches into almost every facet of our daily lives, this data will be produced and stored whether you like it or not. Most experts on big data, and most people who are implementing or have already implemented big data solutions, would concede that the data being generated is very noisy. That is their way of saying that much of it is simply junk.

With so much useless data being produced, stored, and passed off for analysis, three important questions arise. First, will the actionable and important bits of data get drowned in this sea of junk? Second, how does one effectively sift through these vast troves of data to find the actionable bits? Finally, and most worryingly, can this gargantuan amount of data, with only some actionable bits buried in it, lead the people analysing it to wrong conclusions?

The first two questions are related and fairly straightforward to answer. Yes, many important bits of data will seem to get lost in the crowd of junk, but effective use of technology can help people sift through this mountain of data to find the relevant and important bits. So far, so good. But what about the third question?

This is an interesting problem, and it represents one of the biggest challenges of Big Data. For instance, ice cream sales increase during summer, and so do forest fires. But does that mean one causes the other? Of course not: correlation doesn't mean causation. While the ice cream and forest fire example is simple, there may be many similar positive correlations that are not so easy for analysts to see through. This is why you see many unrelated “targeted ads” popping up while surfing: the data might have suggested a positive correlation.
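The ice cream and forest fire trap can be reproduced with a few lines of Python. The sketch below is only an illustration on made-up data: it assumes daily temperature is the hidden common driver (the article itself only says "summer"), and every number, variable name, and helper function in it is hypothetical. It generates a year of synthetic readings, measures the raw correlation between ice cream sales and forest fires, and then measures the partial correlation once temperature is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (all figures invented for illustration).
# Both series depend on temperature; neither depends on the other.
rng = random.Random(42)
temperature = [rng.uniform(0, 35) for _ in range(365)]
ice_cream_sales = [20 + 3 * t + rng.gauss(0, 10) for t in temperature]
forest_fires = [1 + 0.2 * t + rng.gauss(0, 1.5) for t in temperature]

raw = pearson(ice_cream_sales, forest_fires)
controlled = partial_correlation(ice_cream_sales, forest_fires, temperature)

print(f"raw correlation, ice cream vs fires:  {raw:.2f}")
print(f"after controlling for temperature:    {controlled:.2f}")
```

On data built this way the raw correlation comes out strongly positive, while the controlled figure falls close to zero. An analyst who looks only at the first number sees a relationship that exists purely because both series follow the weather.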

The answer lies in the skills of the people doing the analysis. If the analysts, or “Big Data scientists” as the industry likes to call them, are not any good, the resulting analysis will be poor. The results will only be as good as the person analysing the data. Garbage in, garbage out.