Taking Data Analysis to the Cloud

Smita Vasudevan
Big data-as-a-service

Ashish and Joydeep worked at Facebook from 2007 to 2011, where they built and ran the Data Infrastructure team around the Apache Hadoop stack, which the networking giant used for large-scale data processing and analysis. Their stint at the company helped them realize how crucial it was to make data analysis easy and collaborative so that users could adopt new technologies. With the public cloud gaining acceptance in the enterprise space around that time, the idea of offering big data analytics as a service began to look like a real possibility.

The idea paved the way for a company called Qubole, which is swiftly gaining recognition in the “big data on cloud” space, serving new-age companies like Ola, Saavn and Hike Messenger in India and more than 150 customers globally. We spoke with Joydeep Sen Sarma, Co-Founder and Head, Qubole, India, to understand what this Bangalore-based start-up is currently up to, and how it plans to address the data analysis needs of next-gen businesses. Excerpts.

Can you give us a sneak peek into how things are at Qubole and where it is headed in the big data-as-a-service market?

Over the years we have raised $50 Mn in funding across three series, the latest one in January 2016. Qubole’s primary markets are North America and Europe, and it is increasing its presence in India and the APAC region. It now processes more than 250 petabytes of its customers' data each month, which is more than Facebook processed when it went public in 2012, and 3 times more than Netflix processes. We were able to grow 3 times internationally in 2015-2016 and aim to grow 2 to 2.5 times this year.

While we started with a focus on Apache Hive, we now offer a full suite of big data technologies (including Spark, Presto, Airflow, Pig, Sqoop etc.) that are commonly required by data teams. Having started our original service in AWS' Virginia data centers, we now run clusters all over the world, and in Microsoft Azure and Google Cloud Platform (GCP) as well. The engineering team has expanded from about six people when we started to more than 60 now.

“Through our flagship product, the Qubole Data Service (QDS), we helped enterprises deploy data analytics architecture quickly and save costs by taking analytics directly to the cloud where all their data is being stored.” - Joydeep Sen Sarma

How is the company resolving the data analysis needs of today’s enterprises?

Traditionally, corporates had to build their own data centres to store and analyse their data. This was a disadvantage for many, since they had to incur a huge cost at the very beginning and also had to maintain a good pool of data architects and data scientists to build and service the data infrastructure. However, with the rise of the cloud and the entry of big players such as Amazon Web Services, Google Cloud and Microsoft Azure, this problem has been solved to a large extent. Data storage has become easy and scalability is not a concern anymore. Qubole understood this and, through our flagship product, the Qubole Data Service (QDS), we helped enterprises deploy data analytics architecture quickly and save costs by taking analytics directly to the cloud where all their data is being stored.

Users come in through a self-serve interface and write analysis in SQL, Python or Scala, and don't have to worry about cluster management. Data analysts can use a variety of SQL engines (Hive, Presto and SparkSQL) for their needs from the same console. In 2015 we added Spark to our service, with a notebook interface powered by Apache Zeppelin for data scientists and engineers. This has enabled advanced self-serve data analysis and visualization. All forms of analysis can be scheduled for periodic runs, and any results can be exported to databases for easy reporting. In addition, Qubole's web service is accessible via ODBC and JDBC interfaces that allow users to bring in their favourite BI tools like Tableau and use them on top of Qubole.

Governance and security are important aspects of any collaborative analytics environment, and we allow administrators fine-grained control over encryption, data access permissions, perimeter security and different forms of cloud-based authorization. Administrators can also get aggregated insights into the various data analyses happening inside their organization and use them to optimize their deployment and to control and monitor costs.

Can you illustrate with an example, how you have worked with a company in India?

QDS offers different big data engines like Hadoop, Presto and Spark as a service, along with tools to export and import data from different sources, scheduling of analyses, and browser-based self-serve interfaces for SQL analysis and data science.

One of our earliest customers in India was Capillary Technologies. They are a retail analytics company, and their consulting and data analytics team uses Qubole's browser-based interfaces for customized analytics and reports for their clients. As is the case across the retail market, data sources for Capillary are distributed. The first step for them is to use Qubole to extract data from these different data sources all over the organization and centralize it. Analysts and consultants perform interactive analytics against these large data sets to deliver insights for customers all over the world, and Qubole's global capabilities allow them to perform analytics in different regions and keep data local to those regions in keeping with legal requirements. The Qubole platform achieves this without any help from Capillary's admin or dev-ops staff, leaving the analysts unconstrained.

Indix, based in Chennai, crawls the internet to gather large sets of raw data for millions of products and builds a comprehensive and clean product repository by applying proprietary algorithms. They use Qubole's Spark and Hadoop offerings extensively for this data crunching.

What do you think about the future trends/opportunities in the big data as-a-service space?  

As is well known, public clouds are growing at a frantic pace and big data companies are growing as well. So we feel very optimistic about the growth rate in this market. We have also seen a very clear shift towards the public clouds in our markets in the last couple of years, with large enterprise companies seriously considering a public cloud-based architecture.

There are many parallel trends that are evident. One is the increasing importance of data science. Second, more and more data is going real-time, particularly with the emergence of the IoT space. Third, ease of use and SaaS-based self-service analytics are redefining the analytics space in the same way that Gmail redefined email.

Exploiting cloud-based hardware for cost-effective and fast data processing is a trend that Qubole pioneered. A similar proliferation of choices continues in the big data software space as well, with new entrants (like Apache Flink) showing up regularly. The complexity of navigating these choices and maintaining an optimized, production-ready and well-supported big data stack is going to make it increasingly natural for customers to gravitate to big data-as-a-service solutions like Qubole.
