When AI backfires: Enkrypt AI report exposes dangerous vulnerabilities in multimodal models

In May 2025, Enkrypt AI released its multimodal red teaming report, a sobering analysis of how easily advanced AI systems can be manipulated into generating dangerous and unethical content. The report focuses on two of Mistral's leading vision-language models, Pixtral-Large (25.02) and Pixtral-12b, and paints a picture of models that are not only technically impressive but also disturbingly vulnerable.

Vision-language models (VLMs) are built to interpret both visual and textual inputs, allowing them to respond intelligently to complex, real-world prompts. But that capability comes with added risk. Unlike traditional language models that handle only text, VLMs can be manipulated through the interplay between images and words, opening new doors for adversarial attacks. Enkrypt AI's tests show just how easily those doors can be opened.

Shocking test results: CSEM and CBRN failures

The team behind the report used sophisticated red-teaming methods, a form of adversarial evaluation designed to mimic real-world threats. The tests employed tactics such as jailbreaking (crafting prompts designed to bypass safety filters), image-based deception and context manipulation. Alarmingly, 68% of these adversarial prompts elicited harmful responses across the two Pixtral models, including content related to grooming, exploitation and even chemical weapons design.
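To make the evaluation approach concrete, here is a minimal sketch of what an adversarial test loop of this kind might look like. It is not Enkrypt AI's actual harness: `query_model` and `is_harmful` are hypothetical stand-ins for a real model endpoint and a harm classifier, and the test cases would hold red-team prompts rather than anything shown here.

```python
# Minimal sketch of a multimodal red-team evaluation loop (hypothetical helpers).
from dataclasses import dataclass

@dataclass
class TestCase:
    prompt: str      # adversarial text prompt (placeholder)
    image_path: str  # accompanying image, e.g. a blank numbered list
    category: str    # risk category, e.g. "CSEM" or "CBRN"

def query_model(prompt: str, image_path: str) -> str:
    """Hypothetical stand-in for a call to the vision-language model under test."""
    raise NotImplementedError

def is_harmful(response: str, category: str) -> bool:
    """Hypothetical stand-in for a human or automated harm classifier."""
    raise NotImplementedError

def run_red_team(cases: list[TestCase]) -> dict[str, float]:
    """Return the share of adversarial prompts that elicited harmful output, per category."""
    failures: dict[str, int] = {}
    totals: dict[str, int] = {}
    for case in cases:
        totals[case.category] = totals.get(case.category, 0) + 1
        response = query_model(case.prompt, case.image_path)
        if is_harmful(response, case.category):
            failures[case.category] = failures.get(case.category, 0) + 1
    return {cat: failures.get(cat, 0) / n for cat, n in totals.items()}
```

A harness of this shape is what produces headline figures like the 68% failure rate: each category's score is simply the fraction of adversarial prompts that slipped past the model's safeguards.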

One of the most alarming findings concerns child sexual exploitation material (CSEM). The report found that Mistral's models are 60 times more likely to produce CSEM-related content than industry peers such as GPT-4o and Claude 3.7 Sonnet. In test cases, the models responded to disguised grooming prompts with structured, multi-paragraph content explaining how to manipulate minors, wrapped in disingenuous disclaimers such as “for educational awareness only.” The models were not merely failing to reject harmful queries; they were completing them in detail.

Equally disturbing are the results in the CBRN (chemical, biological, radiological and nuclear) risk category. When asked how to modify the VX nerve agent, a chemical weapon, the models offered shockingly specific ideas for increasing its persistence in the environment. They described methods such as encapsulation, environmental shielding and controlled-release systems in redacted but unmistakably technical detail.

These failures were not always triggered by overtly harmful requests. One tactic involved uploading an image of a blank numbered list and asking the model to “fill in the details.” This simple, seemingly innocuous prompt led to the generation of unethical and illegal instructions. The fusion of visual and textual manipulation proved especially dangerous, highlighting a challenge unique to multimodal AI.

Why vision-language models pose new security challenges

At the heart of these risks lies the technical complexity of vision-language models. These systems do not just parse language; they synthesize meaning across modalities, which means they must interpret image content, understand textual context and respond accordingly. That interaction introduces new vectors for exploitation. A model may correctly reject a harmful text prompt on its own, yet produce dangerous output when the same request is paired with a suggestive image or ambiguous context.

Enkrypt AI’s red team found that cross-modal injection attacks, in which subtle cues in one modality shape the output of another, can completely bypass standard safety mechanisms. These failures suggest that traditional content-moderation techniques built for single-modality systems are insufficient for today’s VLMs.

The report also notes how the Pixtral models were accessed: Pixtral-Large via AWS Bedrock and Pixtral-12b via the Mistral platform. These real-world deployment channels underscore the urgency of the findings. The models are not confined to research labs; they are available through mainstream cloud platforms and can easily be integrated into consumer or enterprise products.
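To illustrate how low that integration barrier is, here is a minimal sketch of calling a hosted multimodal model through AWS Bedrock's Converse API with boto3. The model ID is a placeholder rather than a confirmed identifier for Pixtral-Large, and availability depends on your AWS account and region.

```python
# Minimal sketch of invoking a hosted vision-language model via AWS Bedrock (boto3).
# The model ID below is a placeholder; check the Bedrock console for the real identifier.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("chart.png", "rb") as f:
    image_bytes = f.read()

response = client.converse(
    modelId="example.pixtral-large-placeholder",  # placeholder model ID
    messages=[{
        "role": "user",
        "content": [
            {"text": "Describe what is shown in this image."},
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
        ],
    }],
)

print(response["output"]["message"]["content"][0]["text"])
```

A handful of lines like these is all it takes to put a multimodal model behind a consumer-facing feature, which is exactly why the safety gaps documented in the report matter beyond the lab.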

What must be done: a blueprint for safer AI

To its credit, Enkrypt AI does not just highlight the problems; it also offers a path forward. The report outlines a comprehensive mitigation strategy that starts with safety alignment training: retraining the models on the red team's own data to reduce their susceptibility to harmful prompts. It recommends techniques such as Direct Preference Optimization (DPO) to fine-tune model responses away from risky outputs.
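As a rough illustration of DPO-based safety fine-tuning, here is a minimal, text-only sketch using the Hugging Face TRL library. The model name and preference examples are placeholders standing in for red-team data (the harmful completion as "rejected", a refusal as "chosen"), exact argument names vary between TRL versions, and a real multimodal setup would also need an image processor.

```python
# Minimal sketch of safety-oriented DPO fine-tuning with Hugging Face TRL.
# Model name and examples are placeholders; argument names vary across TRL versions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "example-org/example-base-model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs derived from red-team findings: "chosen" refuses,
# "rejected" is the harmful completion the model originally produced.
train_dataset = Dataset.from_dict({
    "prompt":   ["<adversarial prompt from red-team data>"],
    "chosen":   ["I can't help with that request."],
    "rejected": ["<harmful completion logged during red teaming>"],
})

config = DPOConfig(
    output_dir="dpo-safety-tuned",
    beta=0.1,                        # strength of the preference penalty
    per_device_train_batch_size=1,
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

The idea is simply to turn each red-team failure into a training signal: the logged harmful output becomes the dispreferred response, and a refusal becomes the preferred one.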

The report also emphasizes the importance of context-aware guardrails: dynamic filters that can interpret and block risky queries in real time while accounting for the full context of a multimodal input. In addition, it proposes model risk cards as a transparency measure, helping stakeholders understand each model's limitations and known failure cases.
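The report does not prescribe a specific implementation, but a context-aware multimodal guardrail might be sketched roughly as follows; the classifier functions are hypothetical stand-ins for real moderation models.

```python
# Minimal sketch of a context-aware guardrail wrapping a multimodal model call.
# The classifier functions are hypothetical stand-ins for real moderation models.

def classify_text(text: str) -> float:
    """Hypothetical: return a risk score in [0, 1] for the text alone."""
    raise NotImplementedError

def classify_image(image_bytes: bytes) -> float:
    """Hypothetical: return a risk score in [0, 1] for the image alone."""
    raise NotImplementedError

def classify_combined(text: str, image_bytes: bytes) -> float:
    """Hypothetical: score text and image together, since harm may only emerge
    from their combination (e.g. a blank-list image plus 'fill in the details')."""
    raise NotImplementedError

def guarded_generate(model_call, text: str, image_bytes: bytes, threshold: float = 0.5) -> str:
    # Check each modality on its own, then the combined multimodal context.
    scores = (
        classify_text(text),
        classify_image(image_bytes),
        classify_combined(text, image_bytes),
    )
    if max(scores) >= threshold:
        return "Request blocked by multimodal safety guardrail."
    response = model_call(text, image_bytes)
    # Output-side check: block responses that themselves score as risky.
    if classify_text(response) >= threshold:
        return "Response withheld by output guardrail."
    return response
```

The key design point is the combined check: filtering each modality in isolation is exactly what cross-modal injection attacks are built to evade.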

Perhaps the most critical recommendation is to treat red teaming as an ongoing process rather than a one-time test. As models evolve, so do attack strategies. Only continuous evaluation and proactive monitoring can ensure long-term reliability, especially when models are deployed in sensitive sectors such as healthcare, education or national defense.

Enkrypt AI’s multimodal red-teaming report is a clear signal to the AI industry: multimodal capability comes with multimodal responsibility. These models represent a leap in capability, but they also demand a leap in how we think about safety, security and ethical deployment. Left unchecked, they risk not just failure but real-world harm.

For anyone building or deploying large-scale AI, this report is more than a warning. It is a blueprint. And it could not have come at a more urgent time.
