By: Sameer Kunde, Assistant Manager, Business Development, Product Engineering Services, Sasken Technologies
Machine Learning, Artificial Intelligence, Deep Learning and related buzzwords have gripped the technology world like never before. These are not new terms; some of them were coined way back in the 1950s, and researchers have been steadily chipping away at complex problems in these areas ever since. So what has changed now? Essentially, the enormous computing power available today has made Artificial Intelligence (AI) and Machine Learning (ML) a ‘practical’ solution for many problems. But is this computing power really enough in the context of embedded devices? Can smartphones flaunting octa-core processors become ‘really’ intelligent? In this article I attempt to look at Machine Learning from the perspective of an embedded edge device (e.g. a smartphone, an IoT hub or a voice assistant).
The Process of Machine Learning
Getting a Machine Learning algorithm to make an accurate prediction is a three-stage activity. In the first stage, the training data from which the algorithm will learn is pre-processed. The kind of pre-processing varies depending on the data and the needs of the algorithm. For instance, an image recognition algorithm would need the training data (images) to be converted to a certain pixel format and represented as a matrix. The second stage is the actual training of the algorithm, wherein it digests all the training data and gets ready to make predictions. This is generally a computationally intensive process and is often carried out on centralized servers using powerful GPUs. The third stage is the inference or prediction stage, where the trained algorithm (also called the ‘model’) makes a prediction on a new image or other data that it has never seen before.
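The three stages can be illustrated with a deliberately tiny example. This is only a sketch under simplifying assumptions: the 2x2 ‘images’, the labels and the nearest-centroid ‘model’ are all hypothetical stand-ins chosen to keep the code short, not what a real image recognition system would use.

```python
# Stage 1: pre-processing. Scale 8-bit pixel values to the range [0, 1]
# and flatten each image (a small matrix) into a feature vector.
def preprocess(image):
    return [pixel / 255.0 for row in image for pixel in row]


# Stage 2: training. The "model" here is just one mean feature vector
# (centroid) per label, a stand-in for a real learning algorithm.
def train(images, labels):
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        features = preprocess(image)
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


# Stage 3: inference. Return the label whose centroid is closest to the
# pre-processed input.
def predict(model, image):
    features = preprocess(image)

    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: two bright and two dark 2x2 images.
training_images = [
    [[250, 240], [245, 255]],
    [[230, 220], [235, 225]],
    [[10, 20], [5, 15]],
    [[30, 25], [20, 35]],
]
training_labels = ["bright", "bright", "dark", "dark"]

model = train(training_images, training_labels)

# An image the model has never seen before.
unseen_image = [[200, 210], [190, 205]]
prediction = predict(model, unseen_image)
print(prediction)
```

Note that the same pre-processing is applied at training time and at prediction time; a mismatch between the two is a common source of poor predictions.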
Currently, in most implementations of intelligence on edge devices (like smartphones), neither the learning nor the prediction happens on the device itself. A pre-trained model is kept in the cloud and the device queries the model over the network to get the prediction. In effect, the device asks the user to wait for a network round trip every time it needs a prediction, which hurts the user experience because real-time predictions cannot be made. There are also security and privacy concerns about sending large amounts of user data over the network to the cloud for processing.
However, there have been advances in this area of ‘on-device prediction’. Chipset makers like Qualcomm, MediaTek and NVIDIA are building ‘Neural Processing’ capabilities into their SoCs. The accompanying SDKs allow developers to run the trained model on the device and get a quick local prediction. Moreover, Google has released TensorFlow Lite, a for-device version of its famed ML framework TensorFlow. Similarly, Facebook has released Caffe2Go as a for-device version of its famed Caffe2 ML framework. Notably, both of these are open source.
This ability to make local on-device predictions opens up immense possibilities. For example,
- Real-time monitoring of health parameters on a medical wearable and a quick response in case of anticipated emergency
- Terrain sensing drones that can be used in emergency search and rescue operations
- ‘Smarter’ IoT Hubs that do some basic edge processing of data from multiple sensors to make a quick real-time decision
- Smart imaging and intelligent orientation decisions on board CubeSats flying in space
Challenges in On-Device Predictions
While the ecosystem for local on-device prediction is developing rapidly, getting this working for important use cases is fraught with tricky issues for developers and software architects. For instance,
- Compressing a model so that its storage and execution footprint is practical for an embedded device, for example through quantization, which stores the model’s weights at reduced numeric precision
- Balancing the load optimally across the CPU, GPU and DSP that SoCs like Snapdragon have on board, to get the best possible performance
- Striking a balance between how much prediction should be done on the device vis-à-vis the cloud
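To give a feel for the quantization mentioned in the list above, here is a minimal sketch of mapping floating-point weights to 8-bit integers using a scale and a zero point. It is an illustration under stated assumptions, not the scheme used by any particular SDK or framework; the function names and the sample weights are hypothetical.

```python
def quantize(weights, num_bits=8):
    """Map float weights to unsigned integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    quantized = [
        max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float weights from their integer form."""
    return [scale * (q - zero_point) for q in quantized]


# Hypothetical weights from a trained model.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]

quantized, scale, zero_point = quantize(weights)
restored = dequantize(quantized, scale, zero_point)

# Each restored weight differs from the original by a small rounding error.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, max_error)
```

Stored as 32-bit floats, each weight takes four bytes; stored as 8-bit integers, it takes one. The price is the rounding error visible in max_error, which is why quantized models are usually re-checked for accuracy.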
Then, there is the question of whether a very small embedded device like an Arduino or a Raspberry Pi can do any prediction at all. This has implications for the world of IoT, where an IoT hub may be required to do basic inference and filtering of sensor data based on an ML algorithm while consuming very little power. Of course, running a small model may be feasible, but a multi-layer convolutional neural network might stretch the device beyond its processing capability.
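A rough back-of-the-envelope check shows how quickly even a modest network outgrows a small device. The sketch below only counts the bytes needed to store the weights; it ignores the memory for intermediate results and the compute time, which are often the tighter limits. The layer shapes and the 32 KB storage budget are hypothetical figures chosen purely for illustration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def dense_params(inputs, outputs):
    """Weights plus one bias per output for a fully connected layer."""
    return inputs * outputs + outputs


# Hypothetical small CNN: two 3x3 convolution layers and one dense layer.
layer_params = [
    conv_params(1, 16, 3),
    conv_params(16, 32, 3),
    dense_params(32 * 7 * 7, 10),
]
total_params = sum(layer_params)

# Hypothetical storage budget for the model on a small microcontroller.
STORAGE_BUDGET_BYTES = 32 * 1024

float32_bytes = total_params * 4  # four bytes per 32-bit float weight
int8_bytes = total_params * 1     # one byte per 8-bit quantized weight


def fits(size_in_bytes):
    return size_in_bytes <= STORAGE_BUDGET_BYTES


print(total_params, float32_bytes, fits(float32_bytes))
print(total_params, int8_bytes, fits(int8_bytes))
```

Under these assumptions the network has about 20,000 parameters: too large for the budget as 32-bit floats, but within it once the weights are reduced to 8 bits.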
Next question: is it possible to have some (if not all) of the learning/training process on the device as well? Learning is computationally intensive and embedded devices are currently just not up to it. Google has proposed an approach called ‘Federated Learning’, a hybrid architecture with some learning on the device and some in the cloud. In this architecture, a centrally trained model is first downloaded to the user’s device. Based on the data on the device, the model makes an incremental change and sends this update back to the cloud, where the central model incorporates it and relearns. Similar updates can be received from many users, and this is the real power of the architecture: it allows the central model to relearn very fast based on data from multiple users. Google is already using Federated Learning for Gboard on Android devices. While critics would point out that this architecture is not entirely on-device learning, it is certainly a novel way of keeping devices smart and up to date with a model that is continuously improved.
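The download, update and aggregate loop described above can be sketched in a few lines. This is a toy illustration, not Google’s implementation: it assumes a two-parameter linear model, a single gradient step per device per round, and a server that simply averages the updates it receives. All names and data are hypothetical.

```python
def local_update(global_weights, local_data, learning_rate=0.1):
    """On the device: compute an incremental change from local data only.

    The model is y = w0 + w1 * x; the returned update is one gradient
    step on the mean squared error over this device's data.
    """
    w0, w1 = global_weights
    n = len(local_data)
    grad0 = sum(2 * (w0 + w1 * x - y) for x, y in local_data) / n
    grad1 = sum(2 * (w0 + w1 * x - y) * x for x, y in local_data) / n
    return [-learning_rate * grad0, -learning_rate * grad1]


def aggregate(global_weights, updates):
    """In the cloud: fold the average of all device updates into the model."""
    count = len(updates)
    return [
        w + sum(update[i] for update in updates) / count
        for i, w in enumerate(global_weights)
    ]


# Hypothetical private data held on three devices, all drawn from y = 2x + 1.
device_data = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(1.0, 3.0), (2.0, 5.0)],
]

global_weights = [0.0, 0.0]  # the centrally held model
for _ in range(300):
    # Each device receives the current model and returns only its update.
    updates = [local_update(global_weights, data) for data in device_data]
    global_weights = aggregate(global_weights, updates)

print(global_weights)
```

In this sketch only the small update vectors travel to the server; the (x, y) pairs stay on the devices. After enough rounds the central weights approach the intercept 1 and slope 2 that generated the data.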