Adventures of Big Data Analytics

By: Titir Pal & Rajat Narang, Director, Solutions and Senior Manager, Solutions, Absolutdata Analytics

Data has been around forever. From the 19th century, when people collected data through manual surveys and drew trends from it, to the present day, when smart cities emit data every second from every piece of installed equipment, data has played a big part in decision making.

With the advances in mathematics and statistics during the 20th century, we reached a point in the early 2000s where there were lots of techniques to utilize the data, but we lacked the processing power and the right programming paradigm to handle the large-scale processing needed.

Google’s Jeffrey Dean and Sanjay Ghemawat changed the entire processing game with their seminal paper on MapReduce. Add to this the great work done by the open source champions (many of them at Yahoo, Facebook, Cloudera, and Google) who created an open source implementation of the paradigm in the form of Apache Hadoop.
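
For readers who have not seen the paradigm, the idea can be sketched on a single machine in a few lines of Python. This is only an illustration of the map, shuffle, and reduce steps using the classic word count task; the function names are our own and are not Hadoop's API, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "big analytics"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread over as many machines as the data needs, which is what made large-scale processing feasible.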

On the one hand, processing was becoming feasible, and on the other hand, machine learning scientists such as Andrew Ng and Tom Mitchell started taking their art to the masses through Massive Open Online Courses (MOOCs).

This advancement in data handling capability, along with machine learning algorithms, has given rise to new uses of data in many important industries. Some of the use cases in key industries are:

Utilities Industry—Smart Homes/Offices and Internet of Everything: A few cities in India have started installing smart meters for energy consumption and for water consumption checkpoints. One of the most important use cases involves using smart electric meter data, which flows in every second, to understand which devices a household uses.

It involves breaking down the energy usage by device using Fourier analysis. Once we have information about the devices used, and given that we understand the dynamic pricing of energy companies, we can recommend that customers use their devices at non-peak hours.
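
To give a feel for how Fourier analysis can pull apart a combined meter reading, here is a minimal sketch. It assumes a made-up signal: a constant base load plus two hypothetical devices that cycle every 60 seconds and every 8 seconds. All the numbers are invented for illustration, and real disaggregation needs far richer device signatures than this. The sketch uses a plain discrete Fourier transform from the Python standard library rather than any particular vendor's method.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the DFT magnitude at each frequency bin from 0 up to n/2."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant base load
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def dominant_periods(samples, top=2):
    """Return the periods, in samples, of the `top` strongest cycles."""
    n = len(samples)
    magnitudes = dft_magnitudes(samples)
    strongest = sorted(
        range(1, len(magnitudes)), key=lambda k: magnitudes[k], reverse=True
    )[:top]
    return sorted(n / k for k in strongest)


# Hypothetical per-second readings in watts over two minutes:
# a 500 W base load, a device cycling every 60 s, and one cycling every 8 s.
readings = [
    500 + 150 * math.sin(2 * math.pi * t / 60) + 40 * math.sin(2 * math.pi * t / 8)
    for t in range(120)
]

periods = dominant_periods(readings)
print(periods)
```

The two recovered periods correspond to the two simulated devices. In practice each appliance leaves a more complex pattern, but the principle of separating a total reading into periodic components is the same.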

This helps the energy companies with the problem of extreme demand at peak hours. It also helps customers with smaller electricity bills. The energy company can also help a customer understand which of their devices are performing worse than similar devices in other homes. This can possibly help catch a faulty device, prevent accidents, and improve energy usage. This device data can be monetized as well, by inferring the affluence levels of customers and then marketing products that the customer might buy.

Insurance, Actuaries, and Financial Services—Self-Learning Automated Algorithms: The insurance industry makes big money by playing on risk probabilities of one in a million. However, when insurance companies lose money, they lose it big. Insurance companies are now consuming telematics data from automobiles to understand the risks associated with someone’s life, car, et cetera by analyzing their driving behavior. So, if you drive after consuming a lot of alcohol and somehow dodge the policeman, the insurance company is still watching you through the sensors.

The insurance industry is also very prone to frauds, which are planned by data monsters themselves. That brings us to another use case: fraud detection. This is done by anomaly detection algorithms that can detect not only large frauds (transactions) but also multiple small, massively planned frauds without much human intervention. These are self-learning algorithms that improve their performance with time.
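
The simplest form of anomaly detection can be sketched as below. This is not the self-learning approach described above, only a basic statistical check, swapped in for illustration: it flags amounts that sit far from the median, using the median absolute deviation (the "modified z-score" with the commonly used 3.5 cut-off). The claim amounts are invented, and catching many small coordinated frauds would need richer features than a single amount.

```python
from statistics import median


def mad_outliers(amounts, threshold=3.5):
    """Return the indexes of amounts whose modified z-score exceeds threshold."""
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [
        i
        for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > threshold
    ]


# Hypothetical claim amounts; one is far larger than the rest.
claims = [120, 135, 110, 128, 140, 125, 9800, 131]
flagged = mad_outliers(claims)
print(flagged)
```

Using the median rather than the mean matters here: a single huge claim would drag the mean and standard deviation toward itself and could hide the very anomaly we are looking for.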

New Age Marketing Analytics: Marketing analytics companies have long used data-driven marketing and campaign management. However, more sources of data, along with the capability to consume such massive amounts of data, are giving these companies a massive muscle boost.

Video data to understand consumer behavior and optimize shelves in stores, social data to understand the correlation between social discussions about brands and their sales, GPS data to understand where out-of-home (OOH) advertisements should be placed, et cetera, have all helped the marketing analytics companies, and in turn the manufacturers and retailers, better target their customers as well as better evaluate them.
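
As a small illustration of the social data use case, the correlation between brand mentions and sales can be measured with the Pearson coefficient. The sketch below uses invented weekly figures, and the variable names are our own. A value close to 1 means the two series rise and fall together, which by itself does not prove that the discussions cause the sales.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical weekly counts of brand mentions and units sold.
mentions = [120, 150, 90, 200, 170]
sales = [1000, 1150, 900, 1500, 1300]

r = pearson(mentions, sales)
print(round(r, 3))
```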

Politics—Political Campaign Analysis: While analysts such as Nate Silver are busy predicting elections ahead of time, Barack Obama in the US and Narendra Modi in India have made great use of data analytics and business intelligence tools to understand voter perception.

What to say, where, and when is pretty much decided in advance, alongside the expected impact of such statements. More and more political parties are designing local campaigns using text analytics on social media data and analysis of offline data from user surveys. Additionally, pre-poll online and offline surveys of voter sentiment on various topics and about various candidates help political parties decide the intensity of their campaigns and understand the topics that touch a voter nerve. Speech content can then be curated using these insights.

National Security—Terrorism Detection Using Social and Telecom Data: When Facebook CEO Mark Zuckerberg met Prime Minister Narendra Modi in October this year, the rumor mills started building a story around India’s own NSA-like agency to keep track of everything that happens on social media.

Social data and telecom data are a major source of information for building patterns around possible terrorist activities. In the times to come, government spending on the analysis of such data sources is likely to rise steeply, and there is a chance for many companies to serve human causes as well as make lots of money.

The requirements created by these applications have produced many data engineers and data scientists around the world. Even so, demand remains a lot higher than supply, and big data analytics continues to be a big field of interest.