Why big data and IoT are a perfect match

With significant growth in IoT applications and advancements in big data technologies, organizations are on the cusp of a major opportunity to revolutionize customer experience by combining the two trends.

By: Arvind Purushothaman, Practice Head, Information Management & Analytics, Virtusa

The Internet of Things (IoT) represents the next big wave. Gartner predicts there will be 25 bn connected things by 2020. McKinsey Global Institute reports that IoT business will reach $6.2 tn in revenue by 2025. Unprecedented connectivity between ‘things’ and the gathering of massive volumes of data are going to change the way business gets done. The key word here is connectivity, which implies that ‘things’ must be able to transmit data that then needs to be ingested, stored, transformed, and transmitted back.

Also implied is the ability to handle large volumes of data in real time. The fundamental premise of big data technologies was built around the three Vs (Volume, Velocity, and Variety). With IoT, the volume and velocity needs are growing exponentially. The question is whether the existing technologies can handle the requirements. The three stages in the data journey are data ingestion, data storage, and data analytics, and all of them have to happen in near real time.

In a recent discussion on this topic at a big data event, a majority of the participants were of the opinion that big data technology was not ready for IoT. When asked how they see technology platforms evolving to handle IoT, participants were divided, depending on their background, on whether the answer should be a domain-agnostic horizontal platform capable of ingesting any type of data or a more domain-centric one. That said, the Hadoop ecosystem is evolving rapidly and can now ingest real-time, low-latency data using protocols like MQTT (MQ Telemetry Transport) or message brokers like Kafka. Coupled with technologies like Apache Storm that process real-time data, and with HDFS or NoSQL databases that offer high ‘write speed,’ it can be said that the big data ecosystem is mostly capable of handling the current needs of IoT applications. Write speed is critical because time-series analysis, a common requirement, means storing a continuous stream of timestamped readings. It is hard to say whether the growth in IoT applications is driving advancements in big data technologies or vice versa, but the key point is that advancements in technology will lead to a better customer experience.
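
The ingest-then-process flow described above can be illustrated with a small sketch. It is not tied to Kafka or Storm: a plain Python list stands in for the incoming message stream, and the `Reading` type, the `tumbling_window_mean` function, and the 60-second window are all assumptions made for illustration. The idea is simply the kind of per-device, per-time-window aggregation that a stream processor would perform before time-series data is written to storage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Reading:
    """One sensor message (hypothetical shape, for illustration only)."""
    device_id: str
    timestamp: float  # seconds since some reference point
    value: float


def tumbling_window_mean(
    readings: Iterable[Reading], window_seconds: float = 60.0
) -> Dict[Tuple[str, float], float]:
    """Average readings per device within fixed, non-overlapping time windows."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of values, count]
    for r in readings:
        # Round the timestamp down to the start of its window.
        window_start = int(r.timestamp // window_seconds) * window_seconds
        bucket = totals[(r.device_id, window_start)]
        bucket[0] += r.value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# A list stands in for the stream a broker would deliver.
stream = [
    Reading("sensor-a", 0.0, 20.0),
    Reading("sensor-a", 30.0, 22.0),
    Reading("sensor-a", 65.0, 30.0),
    Reading("sensor-b", 10.0, 5.0),
]
averages = tumbling_window_mean(stream)
```

Here `averages` maps each (device, window start) pair to a mean value, so four raw messages collapse into three aggregated rows. In a real deployment the same logic would run continuously inside the stream-processing layer rather than over a finished list.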


One point to note in the end-to-end flow from the devices to the customer experience is that much of the data transmitted by a device or sensor does not add value. Hence, it is important to identify this upfront, build the right pre-processing layer, and put the right filters in place early on so that only relevant data is transmitted. As the data flows through the system, it becomes higher in quality and more aggregated. The reporting and user-facing layer is more mature, with many technologies in play, including data visualization tools and mobile apps for a connected customer experience.
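
One common way to filter early is a deadband filter, which only forwards a reading when it differs enough from the last value that was sent. The sketch below is one possible pre-processing step, not a prescribed design; the function name `deadband_filter`, the sample temperatures, and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def deadband_filter(values: Iterable[float], threshold: float) -> Iterator[float]:
    """Yield a value only if it moved more than `threshold` from the last one sent."""
    last_sent = None
    for v in values:
        if last_sent is None or abs(v - last_sent) > threshold:
            last_sent = v
            yield v


# Hypothetical temperature readings from one sensor.
raw = [20.0, 20.1, 20.05, 21.0, 21.2, 25.0, 25.1]
sent = list(deadband_filter(raw, threshold=0.5))
```

With these sample values, `sent` holds three of the seven raw readings, which shows how much traffic a simple rule at the device or gateway can remove before the data reaches the platform. The right threshold depends on the sensor and the use case.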

An alternative to building an IoT platform from the ground up or using a domain-agnostic IoT platform is to use industry-specific platforms, which are hosted solutions built by leveraging the core technologies of big data with specific domains in mind. An example would be a platform that captures data from automobile sensors. One of the primary advantages of this approach is the ability to work in a more ‘plug and play’ fashion without having to worry about building and maintaining the platform.


Organizations that invest in the IoT space have much to plan for in terms of their architecture. The IoT architecture will be part of their overall information management architecture, and will extend their existing investments. The approach should be to leverage their existing ‘intelligence’ from traditional data sources, including data warehouse technologies, bring additional data sources into the Hadoop ecosystem, and add the IoT-derived analytics using a federated approach to enhance the customer experience through combined insights. This federated approach is very important given that the data is going to be physically stored on different platforms, and possibly even across multiple cloud providers. A typical federated architecture includes a traditional data warehouse, a Hadoop ecosystem, and an IoT platform.
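
As a rough sketch of the federated idea, the example below leaves each platform's data where it is and assembles a combined customer view at query time. Everything in it is hypothetical: the three dictionaries stand in for a data warehouse, a Hadoop-based store, and an IoT platform, and the field names and the `federated_view` function are invented for illustration.

```python
from typing import Callable, Dict, Optional


def federated_view(
    customer_id: str,
    sources: Dict[str, Callable[[str], Optional[dict]]],
) -> dict:
    """Ask each platform what it knows about a customer and label the answers."""
    view = {"customer_id": customer_id}
    for name, lookup in sources.items():
        record = lookup(customer_id)
        if record is not None:  # a platform may know nothing about this customer
            view[name] = record
    return view


# In-memory stand-ins for three separately hosted platforms.
warehouse = {"c1": {"lifetime_value": 1200}}
hadoop = {"c1": {"web_sessions_30d": 14}}
iot = {"c1": {"device_alerts": 2}}

sources = {
    "warehouse": warehouse.get,
    "hadoop": hadoop.get,
    "iot": iot.get,
}
view = federated_view("c1", sources)
```

Each source is passed in as a lookup function, so a real implementation could swap in a SQL query, a Hive or Spark job, or a call to an IoT platform's API without changing the code that combines the results.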

A standalone IoT platform cannot provide all the insights. Hadoop-based platforms can help correlate data from multiple devices, identify patterns from historical data in near real time, enable algorithms to convert raw data into a meaningful format, perform complex aggregates that cannot be handled by device gateways, and enable machine learning and related analytics. Some of the early adopters of IoT include the retail, automotive, and healthcare industries. In the retail world, the end goal is a seamless, highly personalized, and contextual experience. In the automotive industry, sensors in cars can warn about faulty parts and even work with a service provider to schedule appointments. In the healthcare industry, connected medical devices and wearables will help provide early warnings and can trigger workflows that help save patients’ lives. All of these examples require combining the IoT data with data from other sources to provide insights.

As solutions increase in complexity and cut across industry verticals, it may not be possible to go with a single approach, and an open architecture that feeds into a federated solution is a must. This requires building teams that can straddle traditional and new-age technologies. Business and technology architects will play key roles along with data scientists, data visualization/frontend engineers, Hadoop developers, data warehouse developers, network engineers, and DevOps engineers.
Whatever the decision in terms of implementation, the end goal of providing an enhanced customer experience remains the same.
