According to forecasts, the market for generative AI will grow from USD 11.3 billion in 2023 to USD 51.8 billion by 2028, at a CAGR of 35.6%. As the market expands, it is crucial for organizations utilizing generative AI to understand the data governance and security obligations of the services they integrate.
Sanjay Deshmukh, Senior Regional Vice President of India and ASEAN at Snowflake, on the Future of AI and Data Governance
DQ: What should organizations consider when working with LLMs regarding risk, security, and governance?
Sanjay Deshmukh: That’s an essential question. Let’s break it down. First, it’s crucial for enterprises to realize that they’re not merely consumers. Unlike individual consumers, enterprises hold their customers’ data in trust, along with valuable intellectual property owned by shareholders. Protecting these assets is a fundamental responsibility.
As organizations embark on the journey to leverage disruptive technologies like AI, they must prioritize assessing risks. The majority of these risks revolve around data. Therefore, we recommend starting with a robust data strategy that empowers the AI strategy. Data is the lifeblood of AI, and a solid data strategy lays the foundation for effective AI development.
DQ: Could you elaborate on AI’s future impact on the industry and data cloud ecosystems?
Sanjay Deshmukh: AI is undoubtedly a disruptive force. It has the power to fundamentally humanize interactions between users and computer systems. We’ve come a long way from the complexities of old Unix systems. AI is poised to make interactions more natural and conversational, removing the need for technical jargon. This transformation will drive productivity gains, enhance user experiences, and provide valuable insights.
However, organizations must acknowledge certain risks when adopting AI. I would like to emphasize three points that, in our view, enterprises should prioritize when assessing the risks associated with external large language models and foundation models:
- Security Risk: AI models often require data to leave an organization’s security perimeter for cloud processing. This poses significant security risks.
- Data Relevance: Large language models are trained on public data, not proprietary data. Businesses need solutions tailored to their specific problems.
- Broad vs. Specific Models: While foundational models are versatile, businesses often require customized solutions.
Recognizing and addressing these challenges is critical to formulating a sound AI strategy. At Snowflake, our goal is to innovate and develop a platform that assists customers in addressing these issues.
DQ: Data privacy is a major challenge. How does Snowflake navigate this?
Sanjay Deshmukh: When Snowflake was founded, governance was a core principle. We aimed to address data fragmentation, where data was scattered across multiple silos because of limitations in technology or resources. Different departments, such as finance and sales, had their own data repositories, referred to as data marts or data warehouses, and this fragmentation resulted in inconsistent views of the same business metrics. A common scenario: in a business meeting, representatives from finance and sales would discuss revenue as a key performance indicator (KPI), but each had a different number, so a significant portion of the meeting was spent debating whose view was correct. This was the first challenge we aimed to address. We built a platform to consolidate data into a single location, establish a single version of truth, and implement a governance layer.

Our governance approach starts with identification: you must classify what is critical and requires protection. Not all data within an organization falls under categories like Personally Identifiable Information (PII) or privacy-sensitive data; some data may not need the same level of protection. The second step is protection: our platform offers capabilities such as tokenization, masking, and role-based access control. The third step is ensuring that these policies are consistently applied whenever the data is accessed. This commitment to governance remains central to our platform, ensuring data protection and privacy.
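The three steps described here (classify, protect, enforce on every access) can be sketched in a few lines. This is a hypothetical illustration, not Snowflake's implementation: the column names, roles, and masking rules are invented for the example.

```python
import hashlib

SENSITIVE_COLUMNS = {"email", "phone"}          # step 1: classification

def tokenize(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def mask(value: str) -> str:
    """Hide all but the last two characters for display."""
    return "*" * (len(value) - 2) + value[-2:]

def read_row(row: dict, role: str) -> dict:
    """Steps 2-3: apply protection based on the caller's role, on every read."""
    out = {}
    for col, val in row.items():
        if col not in SENSITIVE_COLUMNS or role == "privacy_officer":
            out[col] = val                       # non-sensitive or privileged
        elif role == "data_scientist":
            out[col] = tokenize(val)             # joinable but not readable
        else:
            out[col] = mask(val)                 # e.g. dashboard viewers
    return out

row = {"customer_id": "C-1001", "email": "ana@example.com"}
print(read_row(row, "data_scientist"))
print(read_row(row, "analyst"))
```

The point of the sketch is the third step: protection is applied at read time, per role, so the same stored row yields different views for different consumers.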
DQ: Speaking about governance and security, I would like to know your views on the recent PDP bill and its significance.
Sanjay Deshmukh: While I can’t comment on specific bills, I can emphasize that our approach is to comply with data regulations in every market. These regulations typically emphasize data sovereignty and Personally Identifiable Information (PII) protection. Snowflake meets these requirements by hosting data within specific countries and implementing robust governance measures. For instance, in the Indian market, we offer the Snowflake Data Cloud through AWS and Azure regions hosted in India, fully meeting the data sovereignty requirement.
We concentrate on protecting PII data through processes like classification and implementing security policies such as tokenization, masking, and access control. Our governance capabilities ensure that no matter who is accessing the data for what purpose—be it a data scientist training a model or a business leader creating a dashboard—the PII information remains secure. We are confident that Snowflake Data Cloud’s governance features meet the regulators’ requirements for safeguarding PII data.
These two aspects, data sovereignty and PII data protection, are the core elements found in most regulations worldwide, and Snowflake fully complies with them. There are additional nuances depending on the industry, including certifications and encryption requirements, which we also meet in nearly every market. As a result, Snowflake has gained adoption in various sectors, including banking, capital markets, and government institutions. Customers trust us with their data because we consistently meet these expectations.
In summary, our approach is global, and we are confident in our ability to meet the standards set forth by these regulations.
DQ: How can enterprises fully leverage generative AI?
Sanjay Deshmukh: To harness the full potential of generative AI, enterprises must address three key challenges. Firstly, security concerns about data leaving an organization’s boundaries can be mitigated by hosting large language models within Snowflake’s secure perimeter. Secondly, partnerships with GPU providers like Nvidia enable effective model training on proprietary data. Lastly, enterprises should focus on specific business problems and tailor large language models to address them. For instance, in the banking sector, the problem might be reducing non-performing assets (NPAs) by identifying customers with better credit ratings to mitigate loan defaults. In retail, it could be hyper-personalizing marketing campaigns. Once the business problem is defined, we assist customers in identifying the required data and, if necessary, the relevant large language model to address that problem. Our approach ensures that the technology serves a specific business goal rather than the other way around.
An excellent example of this approach is our Document AI capability, which extracts structured content from unstructured documents like PDFs. This enables organizations to make better use of previously inaccessible data. For instance, it allows banks to analyze interest rates and property details from loan agreements, leading to more informed decision-making.
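The loan-agreement example can be made concrete with a toy sketch of turning unstructured document text into structured fields. The patterns, field names, and sample text below are invented for illustration; the actual Document AI capability uses a trained model, not regular expressions.

```python
import re

AGREEMENT = """
This loan agreement is made on 12 March 2023.
Principal amount: INR 4,500,000. Interest rate: 8.75% per annum.
Property address: 14 Lake View Road, Pune.
"""

def extract_fields(text: str) -> dict:
    """Pull a few structured fields out of free-form agreement text."""
    fields = {}
    if m := re.search(r"Interest rate:\s*([\d.]+)%", text):
        fields["interest_rate_pct"] = float(m.group(1))
    if m := re.search(r"Principal amount:\s*INR\s*([\d,]+)", text):
        fields["principal_inr"] = int(m.group(1).replace(",", ""))
    if m := re.search(r"Property address:\s*(.+)\.", text):
        fields["property_address"] = m.group(1)
    return fields

print(extract_fields(AGREEMENT))
# {'interest_rate_pct': 8.75, 'principal_inr': 4500000,
#  'property_address': '14 Lake View Road, Pune'}
```

Once fields like the interest rate land in typed columns, they can be queried and aggregated like any other table, which is what makes previously inaccessible data usable for decision-making.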
In summary, our approach is to use large language models in a focused and targeted manner to solve specific business problems, resulting in tangible and meaningful outcomes for enterprises. This ensures that businesses realize generative AI’s full potential, rather than relying on broad foundational models better suited to consumer applications.
DQ: Do you believe that the challenges explained earlier are some of the major obstacles that enterprises encounter when attempting to utilize generative AI and Large Language Models (LLMs)?
Sanjay Deshmukh: Yes, these challenges are significant. However, it’s crucial to note that we’re still in the early stages of the generative AI and LLM journey. Enterprises have been integrating AI and machine learning into operations for a while. The shift to external LLMs, along with the ability to tailor them to specific business needs, marks a substantial change.
In the 1.0 phase, machine learning models were primarily developed by in-house data scientists employed by enterprises. These data scientists used various programming languages, such as Python, and accessed data from platforms like Snowflake to train and deploy models. These models were then utilized in various business applications, ranging from credit risk assessment to fraud detection. One of the most common and impactful examples of AI and machine learning for consumers is the recommendation engine, used by platforms like Amazon or Netflix. This engine analyzes user behavior and suggests products or content, contributing significantly to revenue generation.
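The recommendation engines mentioned here can be illustrated with a minimal co-occurrence recommender ("customers who bought X also bought Y"). Real engines at Amazon or Netflix use far richer signals and models; the purchase histories below are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Toy purchase histories: each set is one customer's basket.
histories = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "keyboard"},
]

# Count how often each ordered pair of items shares a basket.
pair_counts = Counter()
for basket in histories:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

def recommend(item: str, k: int = 2) -> list:
    """Return up to k items most often co-purchased with `item`."""
    scores = {b: n for (a, b), n in pair_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("laptop"))
```

Even this crude signal captures the core idea: user behavior, aggregated across customers, ranks what to suggest next.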
Now, moving to the 2.0 phase, there’s a fundamental shift. In the 2.0 era, the process of building models has been partially outsourced. Enterprises can now leverage models that have been developed by external entities like Jenni AI and others. These models can be deployed and fine-tuned within the enterprise’s specific context, all while using Snowflake as a foundational platform.
This shift is significant for two main reasons:
Firstly, the development of the model itself is no longer solely an in-house task. Enterprises can tap into models created by external experts, saving time and resources.
Secondly, these external entities have the capabilities and resources to train models on vast volumes of data, something that many individual enterprises may struggle to do on their own. Enterprises can then take these pre-trained models, apply their proprietary data, and make them highly relevant and effective for their specific business needs.
In essence, the 2.0 phase allows enterprises to leverage the collective knowledge and resources of external entities while tailoring these models to their unique requirements through proprietary data training. This represents a significant shift in how AI and machine learning are harnessed for business purposes.
DQ: Snowflake recently acquired Neeva. How has this strengthened your position?
Sanjay Deshmukh: This acquisition is indeed quite exciting, and I’m pleased that you brought it up. I had the opportunity to meet with Sridhar, the founder of Neeva, and while it might not be widely known, they were one of the early pioneers in using large language models for search. Sridhar was part of the Google Search team before venturing out to create Neeva. Together with his founding team and other talented engineers, they developed large language models that revolutionized the way search results are summarized and presented.
The acquisition of Neeva has brought exceptional talent, led by Sridhar himself, and expertise in large language models to Snowflake. We’re integrating these capabilities to enhance user experiences. For instance, users will benefit from simplified query building using plain English language interfaces.
This acquisition has empowered us to improve our software capabilities significantly. We are leveraging their large language models to streamline the usage of our own software. We’ve implemented this in our documentation, enabling technical administrators of Snowflake at various customers to use simple English language interfaces to search the documentation and obtain guidance on how to maximize the platform’s utility.
In summary, we are extremely excited about the Neeva acquisition. It has not only expanded our talent pool but also empowered us to enhance our software capabilities and improve user experiences. Prior to Neeva, we acquired Applica, which provided us with the foundational model for building Document AI, further strengthening our position in the AI landscape.
DQ: How does Snowflake enable enterprises to build LLMs with their own data?
Sanjay Deshmukh: Snowflake offers four distinct methods for users to access and utilize Large Language Models.
Firstly, we provide our customers with access to our own LLMs, like Document AI. This is an excellent example of a Snowflake-built LLM that allows customers to extract structured information from unstructured documents.
Secondly, we enable customers to use LLMs developed by our partners. These partner-developed models are integrated into our Snowflake Native App Framework and made available in the Snowflake Marketplace. Customers can easily deploy these models on their data. This approach broadens the range of LLMs available to our customers.
Thirdly, we allow customers to take open-source LLMs or models developed by other companies and deploy them within their Snowflake environment. This capability is known as Snowpark Container Services. It ensures that sensitive data remains protected within the Snowflake environment, meeting security requirements.
Lastly, if a foundational LLM is hosted externally, such as in the cloud, and a customer wants to harness its power while safeguarding their data, we offer Streamlit, which serves as the interface for users; behind these interfaces, the large language models operate. Streamlit is used to create AI-driven applications that can access external foundational LLMs. However, it is crucial for customers to ensure that sensitive or personally identifiable information (PII) is not sent to these applications, to maintain data security.
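The precaution about keeping PII out of prompts sent to external models can be sketched as a simple redaction pass. This is a hedged illustration only: the two patterns below are invented for the example, and production redaction would rely on a proper classification service rather than a couple of regular expressions.

```python
import re

# Illustrative PII patterns; real systems need far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def redact(prompt: str) -> str:
    """Replace likely PII with labeled placeholders before the prompt
    leaves the security perimeter for an external LLM."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt

safe = redact("Summarize the complaint from ana@example.com, phone +91 98765 43210.")
print(safe)
# Summarize the complaint from <EMAIL>, phone <PHONE>.
```

Redacting at the application boundary means the external foundational model still does the heavy lifting, while identifiable values never leave the perimeter.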
In summary, Snowflake offers four distinct ways for our customers to access and utilize Large Language Models: our in-house LLMs, partner-developed models in the Snowflake Marketplace, open-source models through Snowpark Container Services, and external foundational LLMs via Streamlit applications and open APIs.