As generative AI technologies become more sophisticated, the safety and reliability risks posed by multimodal AI models are becoming more evident. A new report from Enkrypt AI, a leading provider of AI safety and compliance solutions, lays out in stark relief the vulnerabilities that can be exploited to undermine the integrity of these models.
The report's findings rest on extensive red teaming, which uncovered serious deficiencies in safety protocols: exploitable gaps in multimodal systems that could lead to enterprise liability, threats to public safety, and harm to vulnerable populations.
The emergence of new vulnerabilities in multimodal AI
Multimodal AI models accept both textual and visual input and can generate output in either medium. Processing multiple data types at once, however, creates attack surfaces that text-only defenses cannot anticipate. The report notes that recent jailbreak techniques can slip past the content filters of multimodal systems and elicit harmful output. Crucially, these dangerous outputs are often triggered without anything suspicious appearing in the visible prompt, allowing attacks to evade existing safety mechanisms.
Sahil Agarwal, CEO of Enkrypt AI, sums up the growing concern: "Multimodal AI has untold benefits, but it also opens an attack surface we could not have even anticipated. This research is the alarm bell: embedding harmful textual instructions inside benign images has staggering implications - enterprise liability, public safety, and child protection are just the start."
Specific risks and findings
The report examined two popular multimodal models from Mistral, Pixtral-Large (25.02) and Pixtral-12b. Enkrypt AI's research found that these models are disproportionately likely to produce harmful outputs, including child sexual exploitation material (CSEM) and hazardous CBRN (Chemical, Biological, Radiological, and Nuclear) information.
Results indicated that, under adversarial inputs, Pixtral-Large and Pixtral-12b were 60 times more likely to produce a CSEM-related textual response than other models in the test group, including OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.
The models were also 18 to 40 times more likely to produce hazardous CBRN-related information when prompt injections were encoded inside image files. These image-based injections did not suffer from the limitations of classic text prompt injections: because the instructions arrive as pixels rather than text, vulnerabilities inherent to the multimodal models let them bypass the filters that would normally flag harmful text.
The implications of these results are stark. The report concludes that as models grow in complexity, new harm vectors emerge, and the potential for harm escalates when security measures cannot keep pace with the knowledge these models absorb during pre-training.
Why these risks matter
The threats described in the report are not hypothetical; they are real and present risks to the security and safety of multimodal, AI-driven systems, and they raise serious problems for AI builders, businesses, and regulators.
These vulnerabilities can harm individuals and businesses alike, particularly where AI carries responsibility for functions such as content moderation, child safety, and public safety.
The ability to embed harmful prompts in images, and to combine them with text in a coordinated attack, exposes a critical blind spot in existing AI safety frameworks. Traditional content moderation tools can identify harmful text, but they do not inspect instructions hidden inside images, leaving users and businesses exposed to this new class of attack.
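To make the gap concrete, the sketch below shows one way a context-aware pre-filter could surface text hidden inside an image before it reaches the model. This is a minimal illustration under stated assumptions, not Enkrypt AI's tooling: the OCR calls (pytesseract, PIL) are real libraries, while is_flagged() is a hypothetical placeholder for whatever text moderation filter a deployment already applies to user prompts.

```python
# Minimal sketch: run OCR over image inputs and pass the extracted text
# through the same moderation filter applied to the visible prompt.
from PIL import Image
import pytesseract


def is_flagged(text: str) -> bool:
    """Hypothetical stand-in for an existing text moderation filter."""
    blocked_terms = ["<placeholder policy terms>"]  # assumption: policy list exists elsewhere
    return any(term in text.lower() for term in blocked_terms)


def screen_multimodal_input(user_text: str, image_path: str) -> bool:
    """Return True if the combined text + image input should be blocked.

    The key step is OCR: text hidden inside the image is extracted and
    checked with the same filter as the visible prompt, so instructions
    embedded in pixels no longer bypass text-only moderation.
    """
    ocr_text = pytesseract.image_to_string(Image.open(image_path))
    combined = f"{user_text}\n{ocr_text}"
    return is_flagged(combined)
```

A pre-filter of this kind is only one layer; the report's recommendations below treat it as part of a broader, continuously tested safety pipeline.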
Mitigation strategies: Strengthening safety alignment
In the report, Enkrypt AI makes several recommendations for countering these emerging threats to multimodal AI, reducing the vulnerabilities highlighted in the study, and keeping AI systems safe for all users. Key recommendations include the following:
- Integrating Red Teaming Datasets: Red teaming exercises, where AI systems are subjected to adversarial testing, should be incorporated into regular safety alignment processes to identify and address weaknesses before they can be exploited.
- Continuous Automated Stress Testing: Developers should implement continuous stress testing of multimodal models to simulate real-world adversarial attacks and assess their resilience; a minimal harness along these lines is sketched after this list.
- Context-Aware Guardrails: Context-sensitive safety measures that can understand the nuances of multimodal inputs should be deployed to identify and mitigate potential risks in real time.
- Real-Time Monitoring and Incident Response: Establishing robust monitoring systems to track AI outputs and detect harmful content as it is generated, coupled with a clear incident response protocol, can help mitigate harm quickly.
- Model Risk Cards: Transparency is key to managing AI risks. Model risk cards should be created to document and communicate the specific vulnerabilities of different models, allowing users to make informed decisions about the AI systems they deploy.
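As referenced above, the following is a minimal sketch of what a recurring automated stress-test run might look like. It is illustrative only and rests on assumed helpers: query_model() and output_is_harmful() are hypothetical stand-ins for the multimodal model client under test and for a safety classifier; they are not real APIs from Enkrypt AI, Mistral, or any other vendor.

```python
# Minimal sketch of an automated adversarial stress test over a red-team
# dataset of {"id", "prompt", "image"} cases. Helpers are placeholders.
import json
from typing import Optional


def query_model(model: str, prompt: str, image_path: Optional[str]) -> str:
    """Hypothetical wrapper around the multimodal model under test."""
    raise NotImplementedError("plug in the model client being evaluated")


def output_is_harmful(response: str) -> bool:
    """Hypothetical wrapper around a safety classifier or review queue."""
    raise NotImplementedError("plug in the safety classifier in use")


def run_stress_test(model: str, red_team_cases: list[dict]) -> dict:
    """Replay a red-team dataset against a model and record the failure rate."""
    failures = []
    for case in red_team_cases:
        response = query_model(model, case["prompt"], case.get("image"))
        if output_is_harmful(response):
            failures.append(case["id"])

    report = {
        "model": model,
        "total_cases": len(red_team_cases),
        "failed_cases": failures,
        "failure_rate": len(failures) / max(len(red_team_cases), 1),
    }
    # Persisting each run lets teams compare failure rates across model versions.
    with open(f"{model}_stress_report.json", "w") as f:
        json.dump(report, f, indent=2)
    return report
```

In practice, a run like this would be scheduled against a curated red-team dataset after every model or guardrail update, with the saved reports feeding into the model risk cards described above.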
Sahil Agarwal underscores the need for a proactive response: "These aren't theoretical risks. If we are not safety first with multimodal AI, we put users, and potentially vulnerable populations, at significant risk."
Conclusion
As the capabilities of multimodal AI continue to expand, so does the threat exposure that comes with its adoption. The findings of Enkrypt AI's Multimodal Safety Report are a reminder of the need to approach AI safety with urgency and foresight.
For all its promise, multimodal AI can cause real harm if it is developed without safety built into its design and use. Going forward, AI developers, enterprises, and policymakers must collaborate to reduce the risks identified in the report so that the benefits of multimodal AI can be realized safely and securely.