One of the key aspects of digital transformation is digitization of legacy documents and analytics on the text content. Text is the easiest form of data from which insights can be gleaned using text analytics tools and algorithms.
Indium Software offers teX.ai, an AI-based text analytics suite for data scraping, validation, classification, summarization, clustering, topic modeling and more.
Ram Sukumar, co-founder and CEO, Indium Software, tells us more. Excerpts:
DQ: Briefly explain Indium Software’s offerings in India and globally.
Ram Sukumar: Indium Software is a leading data analytics, AI and ML services company with focused offerings in advanced analytics solutions, Big Data, data warehousing and BI solutions. Combining these with robust full stack and low code development capabilities, Indium helps customers in their digital transformation journey through a gamut of solutions that deliver business value.
DQ: What is text analytics and how can it help businesses?
Ram Sukumar: Text analytics involves gathering text data from various sources and preparing it for analysis. It converts unstructured data into a structured form to obtain actionable insights, using a range of linguistic, statistical, and machine learning techniques to analyze the text and provide results.
Businesses today generate huge volumes of text data as more and more people communicate digitally. Organizations can use text analytics to analyze social media interactions and product reviews, summarize documents, understand trending themes, and much more.
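As a minimal illustration of turning unstructured text into a structured form, the sketch below tokenizes a few made-up product reviews and counts the most frequent terms. The sample reviews, the short stop-word list and the function names `tokenize` and `top_terms` are assumptions for illustration only and are not part of teX.ai.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real pipelines use a much larger one.
STOP_WORDS = {"the", "is", "a", "and", "but", "was", "it", "very", "to", "of"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up reviews standing in for real customer feedback.
reviews = [
    "The battery is great but the screen is dim.",
    "Great battery and a great camera.",
    "Screen was dim, battery was fine.",
]
print(top_terms(reviews, n=3))
```

The output is a list of (term, count) pairs, which is already a structured summary that can be stored, charted or compared across products.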
DQ: What are the key factors driving the text analytics market growth?
Ram Sukumar: The text analytics market is projected to grow from $3.2 billion in 2016 to $8.8 billion by 2022, at a CAGR of 17.2% during the forecast period. Today, analysis of numerical data is in a mature phase; analysis of text data is the need of the hour.
Here are a few key drivers of text analytics:
* Understanding the customer has always been critical as it has decided the fate of many a company. Understanding the customer through their reviews and comments is the primary growth driver for text analytics.
* Text data comes from multiple sources and in various formats. The need to bring this data into a single format and present it neatly to derive actionable insights is a major growth driver.
* Another significant growth driver is the growing need for social media analytics.
DQ: How does Indium’s teX.ai solution work?
Ram Sukumar: teX.ai is an AI-powered text analytics product that is mainly used in three areas: text extraction, text summarization and text classification. It uses Python libraries such as Tabula, Camelot, TensorFlow, Keras and Selenium for extraction and structuring of data.
teX.ai uses a lot of deep learning methods and algorithms, and puts them in innovative recipes to get industry-grade results. Some examples are as below:
* Using CNN-like methods for identifying table- or chart-like areas and using pre-trained OCR methods to extract tabular data. Marking non-tabular data as peripheral and using Conditional Random Fields for structuring.
* Using edge detection methods to identify cell boundaries of a table to extract each cell and then OCR it at a cell level for high accuracy. Training a new CNN for handwritten digit and character recognition.
* Using CRF and LSTM-CRF algorithms for positive/negative keyphrase extraction.
* Using k-means clustering on ELMo embeddings of extracted keyphrases to group them semantically.
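To make the last item concrete, here is a minimal sketch of k-means clustering over keyphrase vectors. It is not teX.ai's implementation: the 2-D vectors are toy values standing in for real ELMo embeddings, and the keyphrases and the `kmeans` helper are assumptions for illustration.

```python
import math
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors with plain k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy 2-D vectors standing in for ELMo keyphrase embeddings (assumed values).
keyphrases = ["slow delivery", "late shipping", "friendly staff", "helpful support"]
embeddings = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]

labels = kmeans(embeddings, k=2)
for phrase, label in zip(keyphrases, labels):
    print(label, phrase)
```

With real embeddings the vectors would have hundreds of dimensions, but the assignment and update steps stay the same; phrases that end up with the same label are treated as one semantic group.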
DQ: Which industries can benefit from Indium’s teX.ai solution? Please also elaborate on use cases.
Ram Sukumar: Industries like retail, BFSI, manufacturing, legal, e-commerce, and all companies that deal with a huge amount of text data are the ones who benefit from teX.ai.
Some use cases:
* Banks, or any B2C company in the BFSI domain, perform KYC as a routine task. As a part of KYC, they receive bank statements or passbook copies in either PDF or image formats. These documents have a lot of text data embedded in tables or as plain text. teX.ai can extract information such as withdrawals, deposits and balances, whether inside or outside tables, either as-is or in a customized way.
* Banks receive credit score documents of customers from credit bureau agencies. These documents are either PDFs or images that contain a lot of information, such as past payment behavior on loans, allowable loan balance, average income, outstanding loan amount, and credit score. The positions of these fields in a document cannot be predicted. teX.ai helps with effective information extraction through custom field choices, which further helps in validating credit scores.
* Customer support teams at all firms receive tickets in the form of chats. teX.ai helps in identifying the topics and the important phrases under those topics. That way, companies can focus on weaker areas, that is, mistake-prone functions.
* Retail, B2C, and e-commerce firms have numerous customer reviews that run into thousands and, at times, millions.
* Research companies have a lot of documents in the form of a knowledge base, and they have pre-defined labels under which these documents have to be grouped. For instance, short one-page documents can be analyzed and automatically clustered into health, business, finance, economy, and so on.
* Equity research companies scan through annual reports of companies running to 100+ pages to unearth data. The critical data might be embedded in multiple tables, numbering from a few tens to hundreds. teX.ai can help with wholesale table extraction, instead of scouring through pages and extracting tables individually, and with customized extraction of tables based on a particular field search.
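The field-based extraction mentioned in these use cases can be pictured with a small sketch. It assumes a table has already been extracted into rows of strings (for example by an OCR or PDF table-extraction step) and simply selects the requested columns. The `extract_fields` helper and the sample statement are hypothetical and do not reflect teX.ai's actual interface.

```python
def extract_fields(table, wanted):
    """Pick the wanted columns from a table given as a header row plus data rows.

    Header matching is case-insensitive; columns missing from the table are skipped.
    """
    header, *rows = table
    # Map each normalised header name to its column position.
    index = {name.strip().lower(): i for i, name in enumerate(header)}
    columns = [(w, index[w.lower()]) for w in wanted if w.lower() in index]
    return [{name: row[i] for name, i in columns} for row in rows]

# Made-up bank statement table, as it might look after table extraction.
statement = [
    ["Date", "Description", "Withdrawal", "Deposit", "Balance"],
    ["2020-01-03", "ATM", "2000", "", "8000"],
    ["2020-01-07", "Salary", "", "50000", "58000"],
]

for record in extract_fields(statement, ["Withdrawal", "Deposit", "Balance"]):
    print(record)
```

Each row comes back as a dictionary keyed by the requested field names, so downstream checks such as KYC validation can work on named values rather than on page positions.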
DQ: Elaborate on Indium’s market leadership and business plan going forward.
Ram Sukumar: Indium launched teX.ai in 2019. We have already been seeing considerable traction in global markets. We have signed up with a couple of conglomerates and have been working with them for the past 12 months. We are confident of adding a minimum of 20 customers across the globe in 2020.
Our offerings consist of multiple models (SaaS and on-prem), and the solutions are customized as per the customer’s requirement.
* SaaS - Extraction per document model
* SaaS - Analysis per review
* SaaS - Customized volume licensing model.
In the interest of adhering to customer policies regarding data security and data confidentiality, and avoiding interdependencies, we work on on-prem models as well.