LLM04: Data & Model Poisoning

Data & Model Poisoning is the fourth risk listed in the OWASP Top 10 for LLM. Read on to learn how this risk affects LLMs, and what you can do to prevent it.

AI is the biggest thing in tech right now, and AI breaches and incidents have been making headlines all year.

We’ve seen a rise not only in the volume but also in the complexity of attacks across the board, leading security teams to scramble to patch these problems. But most of the time, they are already too late, as AI security requires a proactive approach, from code to cloud.

What is Data & Model Poisoning?

Data poisoning refers to any time data is manipulated to introduce vulnerabilities to a model. These vulnerabilities often consist of biases, misinformation or hallucinations which can cause the model to behave against its training, and backdoors which remain inactive until they are triggered, making it difficult to test for them.

Data poisoning can occur during training stages of the LLM’s lifecycle from early pre-training and fine-tuning to embedding. The risks associated with data poisoning are higher in models distributed through shared repositories or those that draw from external data sources which may contain unverified or even malicious content.

Examples:

Attackers introducing harmful information to the model during its training period, leading to biased outputs.
Users unknowingly sharing sensitive information with the model, which it exposes later.
Developers not putting enough restrictions on the information the model consumes, leading to it ingesting inaccurate data sources.

Mitigating the Risk

In order to prevent data poisoning proactively, OWASP recommends tracking data origins to verify their accuracy at all stages of model development and continuing to vet data sources rigorously, validating outputs against trusted sources regularly to proactively monitor for any poisoning. As usual, testing also plays a large role in determining risk levels in a model. Staying on top of versioning is also a critical part of evading data poisoning.

Sandboxing, infrastructure controls, and anomaly detection can also help to filter and limit exposure to untrustworthy data sources. Fine-tune datasets for specific models and use cases in order to produce clear, defined outputs. Security teams and developers need to monitor training loss and regularly analyze the model for signs of poisoning. Finally, Retrieval-Augmented Generation (RAG) and grounding techniques can reduce risks of hallucinations

However, even all of these measures cannot fully protect against the risk of data poisoning, especially when data lineage remains hidden and, and poisoned content risks don’t show up in tests, or certain bad behaviours only show up when triggered.

Security teams must remain vigilant in the face of these mounting risks. If your organization is struggling with AI security, see how FireTail can work for you. Schedule a demo, or start a free trial here, today.