Making the case for agile and adaptive AI and ML

The COVID-19 pandemic has been an unprecedented event. The disruptions in the healthcare sector are obvious, particularly the accelerated clinical trials and the sudden spike in demand for telemedicine. We have also seen major shifts in lifestyle patterns and will continue to see them in the coming months.

2020 has of course up-ended many areas of business and home life – from a drastic decrease in community activities and sports events, to staying and working from home, to the face mask becoming ubiquitous.

Online retail and supply chains have also felt major impacts. The top searched items on online retail websites changed from mobile phones and toys to face masks and sanitizers. With remote work the norm, home broadband traffic jumped, as did domestic electricity consumption.

These shifts also surfaced some key challenges for artificial intelligence (AI) systems. AI has disrupted these industries, driving the next level of maturity in digital transformation, but it depends on the data used for training and for making predictions. A system trained to predict future demand patterns at an electric utility could not foresee the sudden increase in demand, and so gave incorrect predictions. Supply chain optimization, which also relies heavily on AI, suffered missed delivery estimates and under-scheduling because of lockdowns. In most cases a human had to step in and override the AI prediction to reflect reality. These issues have forced us to rethink the productionization of AI and to consider agility and adaptability.

Agile is mostly associated with software development: building and delivering software faster by encouraging active collaboration between teams and promoting tools that automate the testing, integration, and delivery of code. Agile encourages organizations to employ a DevOps strategy in which development and operations teams collaborate to deliver and manage production software.

Agile is also very applicable to AI systems, specifically machine learning (ML) systems that build models from training data and integrate these models with software. Modern software systems invariably contain multiple ML models. In newer systems the models handle the core smarts, whereas older systems treat the model as a black box and integrate it as an afterthought. Until now, little attention has been given to versioning and to the lifecycle of the ML process.

Today this is changing with an emerging discipline, ML-Ops. It applies agile development principles specifically to ML: all the stages of the pipeline, such as data acquisition, cleansing, feature engineering, model development, hyperparameter tuning, deployment, and monitoring, should be automated, with an emphasis on reproducibility of results.
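As a rough illustration of what automation with reproducibility can look like, the Python sketch below chains a few pipeline stages and records a fingerprint of each stage's output, so that two runs with the same seed can be compared. The stage functions, the synthetic data, and the names STAGES and run_pipeline are all assumptions made for this example; a real ML-Ops pipeline would normally rely on dedicated orchestration and versioning tools.

```python
import hashlib
import json
import random


def acquire(_, rng):
    # Stand-in for data acquisition: synthetic (feature, target) rows
    # plus one incomplete row that the cleansing stage should drop.
    rows = [[x, 2 * x + rng.gauss(0, 1)] for x in range(20)]
    return rows + [[None, None]]


def cleanse(rows, rng):
    # Drop rows with a missing feature value.
    return [row for row in rows if row[0] is not None]


def train(rows, rng):
    # Least-squares slope for a line through the origin.
    slope = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    return {"slope": slope}


# Hypothetical stage list; each stage receives the previous stage's output.
STAGES = [("acquire", acquire), ("cleanse", cleanse), ("train", train)]


def run_pipeline(seed):
    """Run every stage in order and fingerprint each intermediate artifact."""
    rng = random.Random(seed)  # a single seeded source of randomness
    artifact, fingerprints = None, {}
    for name, stage in STAGES:
        artifact = stage(artifact, rng)
        payload = json.dumps(artifact, sort_keys=True).encode()
        fingerprints[name] = hashlib.sha256(payload).hexdigest()
    return artifact, fingerprints


model, first_run = run_pipeline(seed=42)
_, second_run = run_pipeline(seed=42)
print("reproducible:", first_run == second_run)
```

Comparing the fingerprints stage by stage also shows where two runs start to diverge, which is usually the first question when a result cannot be reproduced.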

ML models need an active feedback mechanism that monitors their performance (precision, recall, accuracy) and alerts the ML-Ops team when the models must be updated because of data drift or concept drift. Data drift occurs when the statistical properties of the features used by an ML model change, whereas concept drift means that the core relationships between the features and the target variable have changed. Both usually degrade model performance and call for retraining and redeploying the model on a more representative dataset.
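One simple way to flag data drift on a single numeric feature is the Population Stability Index (PSI), which compares how a training-time sample and a recent sample are spread across the same bins. The sketch below is a minimal version under stated assumptions: the demand figures are synthetic, the function and variable names are invented for this example, and the 0.1 and 0.25 thresholds are a common rule of thumb rather than anything prescribed here. It does not detect concept drift, which requires comparing predictions against observed outcomes.

```python
import math
import random
from bisect import bisect_right


def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training-time) sample and a current sample.

    Bin edges come from quantiles of the reference sample, so each bin
    holds roughly the same share of the reference data.
    """
    ref_sorted = sorted(reference)
    edges = [ref_sorted[(i * len(ref_sorted)) // bins] for i in range(1, bins)]

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # A small floor keeps the logarithm finite when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Synthetic daily demand figures (assumed values, for illustration only).
rng = random.Random(0)
training_demand = [rng.gauss(100, 10) for _ in range(5000)]
similar_demand = [rng.gauss(100, 10) for _ in range(5000)]
lockdown_demand = [rng.gauss(130, 10) for _ in range(5000)]

for label, sample in [("similar", similar_demand), ("lockdown", lockdown_demand)]:
    psi = population_stability_index(training_demand, sample)
    # Rule of thumb: below 0.1 is stable, above 0.25 suggests retraining.
    status = "retrain" if psi > 0.25 else "ok"
    print(f"{label}: PSI={psi:.3f} -> {status}")
```

In a monitoring job, a check like this would run per feature on a schedule and raise an alert to the ML-Ops team when the index crosses the chosen threshold.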

The next level of maturity for ML systems, beyond ML-Ops automation, is adaptive models. These continuously tune themselves as they see new data and can adapt to changing patterns. In a COVID-like scenario, demand forecasting models would learn continuously from new data, capture the changing patterns, and adjust their predictions. Making models adaptive is not trivial, and there is considerable research in this area, for example on contextual bandits.
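To make the contrast concrete, the sketch below compares a static forecaster, fitted once on historical data, with a very simple adaptive one that nudges its estimate after every new observation (exponential smoothing). The class names, the smoothing factor of 0.3, and the demand numbers are assumptions chosen for illustration; production adaptive models are considerably more involved.

```python
class StaticForecaster:
    """Fitted once on historical data and never updated."""

    def __init__(self, history):
        self.level = sum(history) / len(history)

    def predict(self):
        return self.level


class AdaptiveForecaster:
    """Moves its estimate toward every new observation (exponential smoothing)."""

    def __init__(self, initial_level, alpha=0.3):
        self.level = initial_level
        self.alpha = alpha  # higher alpha reacts faster but is noisier

    def predict(self):
        return self.level

    def update(self, observed):
        self.level += self.alpha * (observed - self.level)


# Assumed demand: stable at 100 units, then a sudden jump to 150 units.
history = [100.0] * 30
new_demand = [150.0] * 20

static_model = StaticForecaster(history)
adaptive_model = AdaptiveForecaster(initial_level=static_model.predict())

for observed in new_demand:
    # Predict first, then learn from what actually happened.
    static_error = abs(observed - static_model.predict())
    adaptive_error = abs(observed - adaptive_model.predict())
    adaptive_model.update(observed)

print(f"static error after shift:   {static_error:.2f}")
print(f"adaptive error after shift: {adaptive_error:.2f}")
```

The static model stays wrong by the full size of the shift until someone retrains it, while the adaptive one closes the gap within a few observations, at the cost of also reacting to noise.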

Contextual bandits are a simplified form of the reinforcement learning problem (and a generalization of the multi-armed bandit) in which the ML model adjusts its recommended action based on the context it observes. The context consists of the features the model uses for predictions, but may also include other data that is not directly used to establish a correlation with the prediction. These are online, feedback-driven learning systems that can adjust to changing patterns and provide better predictions in the long term. The only catch is that they take some time to tune themselves to changes in the data.
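A minimal sketch of the idea, assuming a small set of discrete contexts and an epsilon-greedy strategy (one of several possible strategies, picked here for brevity), might look like the following. The contexts, actions, and click probabilities are invented for the example, loosely echoing the shift in online retail searches described earlier.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps an average reward per (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def best_action(self, context):
        # Action with the highest estimated reward for this context.
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(context)

    def learn(self, context, action, reward):
        # Incremental update of the running average reward.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical click probabilities for each (context, action) pair.
CLICK_PROBABILITY = {
    ("pre_lockdown", "promote_phones"): 0.9,
    ("pre_lockdown", "promote_masks"): 0.1,
    ("lockdown", "promote_phones"): 0.1,
    ("lockdown", "promote_masks"): 0.9,
}


def simulate(rounds=2000, seed=0):
    """Train a bandit against the simulated environment above."""
    env_rng = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(
        ["promote_phones", "promote_masks"], epsilon=0.1, seed=seed + 1
    )
    for _ in range(rounds):
        context = env_rng.choice(["pre_lockdown", "lockdown"])
        action = bandit.choose(context)
        clicked = env_rng.random() < CLICK_PROBABILITY[(context, action)]
        bandit.learn(context, action, 1.0 if clicked else 0.0)
    return bandit


bandit = simulate()
for context in ("pre_lockdown", "lockdown"):
    print(context, "->", bandit.best_action(context))
```

The exploration rate is what lets the model notice when a previously poor action starts paying off, and it is also why such systems need some time to settle after a change. Replacing the running average with a fixed step size would weight recent feedback more heavily and speed up that adjustment.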

The world is rapidly changing, and unforeseen events will greatly affect the benchmarks established by ML models for making real-world predictions. For AI systems to be effective and resilient to these changes, we need to make them more agile and adaptive.

By Dattaraj Jagdish Rao, Head AI Research at Persistent Systems

