Seizing the Big Data Analytics Opportunity in eCommerce: Andy Sen, CTO, Edureka

eCommerce companies are increasingly evaluating advanced big data analytics tools to stay ahead in the ever-competitive space

By: Andy Sen, CTO, Edureka

Big data is a term for a collection of data sets so large and complex that it becomes difficult
to process it using traditional data processing applications. Big data is characterized
by volume (large amount of data); variety (structured, unstructured, semi-structured data);
velocity (speed of data—real-time, batch);and veracity (uncertainty of data).

For an eCommerce industry, which thrives majorly on customer usage and purchase behavior, tracking this becomes quite challenging without the usage of certain data analytics tools. Online purchases or ATM withdrawals (traditional transaction processing) are still very capably served by RDBMS (Relational Database Management Systems) such as MS SQL/Oracle/MySQL.

But, when it comes to analyzing the same sales data,customer activity and other unstructured information,the limitations of earlier computing tools force analysts to focus on smaller samples as opposed to the entire population of the available data.

However, with new big data tools and techniques, all kinds of information can be analyzed at a granular level,providing features such as personalized recommendations and targeted advertising—did you ever notice how those ads on Gmail were correlated to the email you were viewing?

Traditional data analytics created three primary issues:The inability to explore high fidelity raw data; the analytics computation would not scale; and premature data death
(due to data archival).

For example, when SEARS was using Oracle Exadata along with SAS, only 10% of the 2000 TB of data was available for analysis, with the remaining 90% archived.After moving to advanced tools like Hadoop, it kept 100% of its historical data available for processing—the ability of Hadoop to provide both a storage and compute cluster tackled the three aforementioned issues.


Apache Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming mode. A typical Hadoop cluster consists of:

1. Distributed File System (DFS): Using Hadoop DFS across a cluster of N machines results not only in data redundancy but also a speed increase of almost N times. Essentially a 10 node cluster improves one’s read time by a factor of ten, important when reading thousands ormillions of terabytes.

2. MapReduce Framework: Splits a task across processors, is ‘near’ the data and assembles the results.

3. Some Analysis Modules: A classic one would be Mahout, which allows one to utilize machine learning algorithms such as recommendation mining—taking users’ behavior and trying to find items users might like.

Now with the advent of virtualization and IaaS providers like Amazon Web Services, Linode, Microsoft Azure, etc,setting up computing clusters is no longer effort intensive and can be done while sipping one’s favourite cuppa.

Once a proof-of-concept cluster is deployed and tested on the cloud, based on cost considerations, one may want to continue with it or proceed with a real hardware
implementation in one’s local datacenter.


Once the cluster is setup, it is time to harness the behavior of customers to help make user recommendations. Today,tools enable learn relationships between users and items every N hours. Usually these relationships don’t change often—people who buy a Canon camera also buy a Canon battery pack. And using Mahout one can recommend a product to a new user with no history. Combined with the analysis of unstructured data (such as liking and commenting on Facebook posts), advertising firms are able to track and predict the user’s behavioral history and provide personalized ads. The plus for eCommerce firms is that they can get increasingly specific about their target audience.

There are several business benefits of building predictive models on big data for eCommerce firms. Apart from higher customer satisfaction and increased sales, it also
shortens lead times and waiting cycles. Amazon, who has always wanted to reduce its shipping time, has filed a patent to predict consumer purchase before it actually happens (based on the user’s browsing pattern).

Another area where predictive analytics is used is in the propensity models. Metrics such as propensity to engage (predicts the likelihood of a customer opening your email); propensity to convert (the likelihood of a customer to accept your offer); and propensity to buy (identifies customers ready to make their purchase) play a crucial role in shaping the marketing campaigns and increasing the effectiveness of every marketing dollar. It is also important for online service providers who spend a significant amount of money in user adoption / acquisition.

With the advent of SaaS providers who have built their own number crunchers and adopted proprietary analytics, it is now possible to leverage several of these techniques
on a pay-as-you-go model.

This might be as simple as placing a ‘tracking’ code snippet on one’s website to sending customer behavior data via their APIs. This allows one to harness the models without having to actually build the platform. Most of them promise increased customer acquisition, maximizing per-customer revenue/ profitability, and improved retention.

Big data analysts are increasingly adopting newer tools and techniques in the ever-competitive eCommerce space to increase focus on customer acquisition. Today, the role of a data analyst is not limited to making proficient use of the latest tech in finding solutions and reaching crucial conclusions.

Specialized skills to detect eCommerce fraud, understand and predict user behavior
have given rise to a new era of ‘Business by Engaging Customers.

Leave a Reply

Your email address will not be published. Required fields are marked *