Artificial intelligence

AI and data irony – Ferrari without fuel?

Or a lake without oxygen? While data is exploding like never before, AI is still not able to convert this ocean into the juice of actionable intelligence as much, or as fast, as we hoped it would. What could be holding AI back?

Nutrients are good. Like data! They feed, they nurture and they help the lakes, whether water-bodies or data-lakes, grow. But is too much of a good thing always a good thing?

What could have been seen as a sign to open the hammocks and relax could actually be a red flag to take out the shovels and roll up our sleeves. Kind of like eutrophication. Imagine a lake getting enriched with minerals and nutrients flowing from a suddenly-available source nearby. That’s supposed to be good, right? But all this rich flow of the good stuff can actually work the other way. It can help the algae grow instead of helping other organisms that are not as fast or as ready as the algae. The resulting excessive growth of algae depletes the oxygen in the water, harming other plants and animals that need it.

Having the flow right, in other words, is sometimes as important as, or even more important than, having the flow at all. With breakthroughs that brought affordability, scalability and simplicity to collecting and analyzing more data, it was easy to presume that artificial intelligence was all set for explosion and exuberance. Turns out that just having cheaper and better compute, storage or databases was not the answer. The real answer is still elusive.

Does it lie somewhere in a paradox: what if too much (and too easy) data has actually turned into a challenge for AI? Was AI not all about data to start with?

Data scarcity – Yes, in this very age!
As counter-intuitive and preposterous as it may sound, we may be surrounded by mountains of data and still be staring at a serious dearth of data.

Data is crucial for algorithms to work in an AI context, remarks Arup Roy, Analyst, Gartner. “There is definitely no dearth of data, but when you get down to digging it for useful and curated data, then it is a different scenario.”

The volume of data enterprises accumulate today has grown tremendously due to the increased sources of data and customer touch-points, agrees Faisal Husain, Co-founder and CEO, Synechron. But the success of an AI-enabled program depends on the quality and quantity of data transmitted through the data pipeline, as he underlines next.

“Enterprises face ‘Data Scarcity’ when it comes to training AI algorithms, resulting in manually labelling the training data. This makes the dataset prone to human errors, thereby affecting the accuracy of insights derived.”
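The effect Husain describes, manual labelling introducing human error that caps model accuracy, can be made concrete with a small simulation. This is a hypothetical sketch (the error rates and labels are made up, not drawn from any enterprise dataset): it simply shows how annotation mistakes translate directly into a noisy training set.

```python
import random

random.seed(0)

def noisy_labels(true_labels, error_rate):
    """Simulate manual annotation: each binary label is flipped with
    probability `error_rate`, standing in for human labelling mistakes."""
    return [1 - y if random.random() < error_rate else y
            for y in true_labels]

truth = [random.randint(0, 1) for _ in range(10_000)]

for err in (0.0, 0.05, 0.20):
    labels = noisy_labels(truth, err)
    agreement = sum(a == b for a, b in zip(truth, labels)) / len(truth)
    print(f"annotation error {err:.0%} -> training labels correct {agreement:.1%}")
```

Any model trained on the 20 percent-noisy set is learning against labels that are wrong one time in five, which is the accuracy ceiling Husain alludes to.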

The deficit in AI lies not in starting but in scaling up, argues Vishal Vasu, CTO, Dev Information Technology. “Though a lot of enterprises are taking the plunge, there are very few that stand the test of time to grow and scale.” He reminds us of the good old adage here: garbage in, garbage out.

“Access to data is the key. Once you have the data, it has to be cleaned, de-structured and again re-structured. Without the right data you cannot train your AI models. And if your data is not of sufficient quality, you need a lot of resources to fix it, which can be time and capital intensive.”
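The cleaning and re-structuring step Vasu mentions can be sketched in a few lines. This is an illustrative toy (the field names `name` and `age` and the rules are assumptions, not any real pipeline): strip stray whitespace, drop rows with missing or badly typed fields, and coerce values so that downstream models see consistent types.

```python
def clean_records(raw_rows):
    """Minimal cleaning pass over raw records (a list of dicts):
    normalise whitespace, drop rows missing required fields, and
    coerce numeric strings to integers."""
    cleaned = []
    for row in raw_rows:
        name = (row.get("name") or "").strip()
        age = (row.get("age") or "").strip()
        if not name or not age.isdigit():
            continue  # unusable row: fix upstream or discard
        cleaned.append({"name": name.title(), "age": int(age)})
    return cleaned

raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "", "age": "28"},          # missing name -> dropped
    {"name": "bob", "age": "thirty"},   # bad type -> dropped
]
print(clean_records(raw))  # [{'name': 'Alice', 'age': 34}]
```

Even this trivial pass throws away two of three rows, which hints at why fixing poor-quality data at scale is, as Vasu puts it, time and capital intensive.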

Data half-baked, half-squeezed
Fragmentation is another monster that lurks around AI’s future contours. AI projects are only as good as the data fed into them, feels Vinod Ganesan, Country Head – India, Cloudera.

“Multiple data sources generate tonnes of structured and unstructured datasets that exist in silos across the organization. The biggest challenge that enterprises face is with the collection and integration of this data. Without a proper infrastructure, data remains undiscovered and unused across the organizational network.”

That’s not all. The discovered data then needs to be cleaned and classified so that it is prepped to be easily found and used as and when needed. “At this stage, it’s important to accurately classify data so that the AI model that’s being trained on it does not form any underlying biases that affect the output.” Being fluid and not fragmented can be a big determinant in the AI game. Husain explains that AI systems are built to replicate neural systems in the brain, but they experience difficulty when it comes to transferring their learning from one problem set to a similar but different one. “It is necessary to maintain an intelligent collection, storage, transformation and tagging to optimise the results.”
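One simple guard against the training-data bias Ganesan warns about is to look at class balance before training at all. A minimal sketch, with made-up labels (`approve`/`reject` are illustrative, not from any cited system):

```python
from collections import Counter

def class_balance(labels):
    """Report each class's share of a training set. A heavily skewed
    split is an early warning that a model may learn the imbalance
    rather than the signal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

labels = ["approve"] * 90 + ["reject"] * 10
shares = class_balance(labels)
for cls, share in sorted(shares.items()):
    print(f"{cls}: {share:.0%}")
```

A 90/10 split like this one usually calls for re-sampling, re-weighting, or collecting more of the minority class before the model is trained on it.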

Precision and presence at the right time – they can easily decide whether an AI project shines or struggles. Ashish Khushu, Chief Technology Officer, L&T Technology Services (LTTS) illustrates the value of right data by citing a platform that the company has developed for AI.

“Avertle, a Condition-based Equipment Maintenance solution, employs machine learning principles in predictive analytics to proactively notify operators about potential machine failure. The solution simulates data using the Digital Twin approach to compensate for missing historical information. Here, our domain expertise ensures that the credibility of the simulated data remains intact.”

Data not trained – cocoon issues
If AI is the caterpillar and data the mulberry leaf that feeds it, there is a long way to go before the wings come out.

Shahin Khan, founder partner and analyst, OrionX.net, avers that despite the overwhelming, and often noisy, availability of Big Data, AI projects still struggle to find data that snaps right into place and is not, so to speak, AI-brain-dead.

Khan has more than an inkling of the gravity and source of this crunch. As he spells it out, hitting the nail right on the head with a firm hand but a sunny face: fundamentally, the reality of AI via Deep Learning is becoming better understood. It’s a phase, but the issues are real, he contends. “AI models need a lot more data than was hoped, and then only work within the scope for which they were trained. Data capture is hard, and once you set it up and it’s working, people tend to leave it alone. ‘Don’t touch it’. It’s easier to re-train or manually tag. Rethinking data capture should be an ongoing process.”

As Khan untangles the knots further, a common refrain that we need to start paying attention to is this: “It’s not trained for that.” He quips that more humans should use that excuse, by the way: “I’m not trained for that!”

“Data semantics and structure play a big role in how you view the data you have and what questions you might ask of it. Data needs to get structured to show its value, while unstructured data is the part that is growing. The best models structure data at inception.”

Data stuck between IT and AI
The transition between yesteryear IT and modern AI infrastructure could also be a latent rope that pulls back AI. Ramprakash Ramamoorthy, Product Manager, Zoho Labs explains. “Traditionally, IT software has been used to automate processes, but more recently, AI is automating decision making; the challenge lies in how to fit this new capability into the everyday hierarchy and getting users to embrace AI that introduces probabilities rather than decisive results.”

This can be seen in the area of network monitoring, as he illustrates. “Earlier, we would simply notify the IT team that a server has gone down, and the team would follow a standard remediation procedure. A modern day AI system can predict that there is a 60 percent chance of a server outage in the next hour. The challenge lies in designing processes around this possible outage.”
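Ramamoorthy’s point, that a probability forces you to design a process rather than fire an alert, can be sketched as a simple decision policy. The thresholds and actions below are illustrative assumptions, not taken from Zoho or any real monitoring product:

```python
def outage_response(probability):
    """Map a probabilistic outage prediction to an operational action,
    replacing the old binary 'server is down' notification.
    Thresholds here are illustrative only."""
    if probability >= 0.8:
        return "page the on-call engineer and drain traffic"
    if probability >= 0.5:
        return "raise a ticket and schedule preventive checks"
    if probability >= 0.2:
        return "log the signal and watch the trend"
    return "no action"

print(outage_response(0.60))  # the 60 percent case from the example above
```

The hard part is not the code but agreeing, as a team, on what each band of probability obliges someone to do, which is exactly the process-design challenge he describes.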

AI applications can’t be fully efficient if enterprises continue to use legacy IT infrastructures. AI projects often run into scalability constraints with traditional models and fail to achieve their full potential, Husain observes as well.

Ganesan, too, strongly recommends that while developing an AI solution, organizations must also consider how easily and efficiently it can be scaled up or modified. Here, legacy IT systems of organisations can restrict the seamless and independent scaling of AI models.

There is data, and then, there is digitised data. Anibha Athalye, Domain Specialist, Growth & Solutions, Persistent Systems Ltd, brings that distinction to the table here. “A lot of industries in India still struggle with this. Most data would be paper and documents and so, digitisation needs to happen as a first step before any advanced analytics, AI etc. can be applied to these.” She points out how the application of AI and ML to healthcare is quite advanced in the US because patient data is well maintained and of good quality.

The legacy-new infrastructure gap is a critical area in Athalye’s assessment as well. She laments the low investment in basic foundations like data lakes and data warehouses; with no data integration in place, even the initial cost of a simple AI use case becomes higher.

Data still unboxed
A big pet peeve, and a reasonable worry, for AI adopters is the issue of AI being a black box. Algorithms may impress us by learning fast and fiercely, but exactly how they do that is something that cannot be shrugged away so easily.

Can machines grasp the limitations and unsolved mysteries of mathematics the way mathematicians tend to? It is a problem, agrees Athalye. “Because there is a concern with trusting the black-box models, especially in a regulated industry and especially healthcare. When you have an AI-powered diagnostics tool, it is important to provide insights to the caregiver as to what factors led to a typical insight by the model. Providing a way to show what thought process went into the decision-making is very important, and it is the closest it can come to human decision-making as well.”

Patience, boy, patience
It’s not the technology that’s holding us back; it’s more about how to leverage these new tools in business. Vasu turns the lens on the executives too, who, he opines, sometimes simply don’t know where to start. “It takes patience and endurance to build an AI-driven solution. The current talent supply may still be undeveloped, leaving companies bereft of the right talent.”

The sentiment echoes with Ramamoorthy who also suggests a good dose of patience while we hope for AI to bloom full and fresh. “It’s only natural that this transition takes time. One way to accelerate AI adoption is to introduce explanations. Since any decision that comes out of an AI system is going to be acted upon by teams rather than individuals, an AI system that is able to explain its decision will win the trust of the teams; teams may even begin to rely on the probabilities provided by their AI systems.”
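The explanation-driven trust Ramamoorthy advocates can be sketched with a toy linear scorer that returns its decision together with per-feature contributions. Everything here, the feature names, the weights, the threshold, is a made-up illustration, not a description of any vendor’s system:

```python
def predict_with_explanation(weights, features):
    """Toy linear scorer that returns a decision along with each
    feature's contribution to the score, so a team can see *why*
    the model leaned one way."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    score = sum(contributions.values())
    decision = "flag for review" if score > 1.0 else "pass"
    # Sort contributions by magnitude: biggest drivers first.
    return decision, sorted(contributions.items(),
                            key=lambda kv: -abs(kv[1]))

weights = {"cpu_load": 0.8, "error_rate": 1.5, "uptime_days": -0.1}
decision, why = predict_with_explanation(
    weights, {"cpu_load": 0.9, "error_rate": 0.6, "uptime_days": 3})
print(decision)
for name, contrib in why:
    print(f"  {name}: {contrib:+.2f}")
```

Surfacing the ranked contributions next to the decision is the simplest form of the explainability that, as he suggests, wins teams over to acting on an AI system’s output.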

AI product development tests patience, affirms Vasu. “Developing an AI system can take ages, literally. The time gap between idea, theories, and actual product realization is huge, making many drop the ball midway.” There is nothing wrong with nutrients. All we need is to set them in the right flow and for the right organism!

Gartner’s Roy recommends that enterprises should keep experimenting but avoid making any big strategic bets yet. “Vendors need to evolve and the ecosystem has to mature well. Enterprises should evaluate vendors well, compare and contrast them and weed out those who are just here to ride the wave. Also make sure AI integrates with your existing ecosystem of technology. The problem of disparate technology is a big one to address here.”

It is not an easy shift, as it turns out. “There is no exact formula for successful AI and people are still figuring it out. Enterprises can also get a better vantage point by developing in-house capabilities as well; instead of only relying on vendors. But stop having unrealistic expectations from AI. It is not magic. It is a technology, and one that is half-baked.”

It is easy to show off a ‘Look, I am doing AI’ trophy. But what is actually remarkable is something else. Time for Eutrophy. Time to set AI’s flow in the right groove.

— Pratima Harigunani.

1 comment

  1. Charles Lawrence

    Trying to help for all here.

    It is possible that a secured but loggable/expandable “BIOS” of known patterns found in structured/unstructured data will secure an extremely efficient response in AI integrity.

    Let us say that there is unstructured data, 1s and 0s.

    Having a dedicated data stream patterning AI system is ideal for a reference database, similar to a bios in hardware. After an absolute accuracy is obtained from data stream patterning, it is archived and “hard-bound” into the reference database, so that when the pattern occurs in future stream processing, the AI will not need to compute that particular portion of the data stream.

    The frequency/amplitude/pitch of the data stream is rarely considered in these cases… 😉

    Integrating known patterns into a universally open-source accessible database for all AI developers is ideal.
