LLM07: System Prompt Leakage

System Prompt Leakage vulnerabilities arise when sensitive information stored in the instructions used to steer an AI model’s behavior is exposed. In this blog, we’ll discuss how this happens, what the consequences are, and how to prevent it.

In 2025, AI is everywhere, and so are AI vulnerabilities. OWASP’s Top 10 for LLM Applications gives developers and security researchers a comprehensive resource for breaking down the most common risks to AI models. In previous blogs, we’ve covered the first six items on the list, and today we’ll be going over number seven: System Prompt Leakage.

System prompts are the instructions used to steer an AI model’s behavior, and System Prompt Leakage occurs when sensitive information contained within those instructions is exposed. Once attackers access these secrets, they can use what they’ve learned to facilitate further attacks.
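For context, here is a minimal sketch of how a system prompt is set, using the OpenAI Python SDK as one possible interface; the model name and prompt text are purely illustrative:

```python
# Minimal sketch of how a system prompt steers model behavior, using the
# OpenAI Python SDK. The model name and prompt text are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The system prompt: instructions the end user never writes,
        # but which shape every response the model gives.
        {"role": "system", "content": "You are a support assistant for Acme Corp. "
                                      "Answer only questions about Acme products."},
        {"role": "user", "content": "What is your refund policy?"},
    ],
)
print(response.choices[0].message.content)
```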

The system prompt itself should never be treated as a secret; what attackers are really after are the sensitive details embedded within it, such as credentials, internal guardrails, and filtering rules.

The best way to prevent System Prompt Leakage is to avoid hiding sensitive data, such as credentials, permissions, connection strings, or passwords, within the system prompt language. That way, even if attackers get hold of the system prompt, they have not gained any critical insider knowledge.
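As a rough illustration of the difference, here is a sketch in Python; the variable names, connection string, and helper function are hypothetical, and the point is simply where the credential lives:

```python
# Sketch: keep secrets out of the prompt itself. Names such as DB_PASSWORD and
# query_orders are hypothetical; what matters is where the credential lives.
import os

# Bad: the credential is part of the prompt, so any prompt leak exposes it.
BAD_SYSTEM_PROMPT = (
    "You are a support bot. To look up orders, connect to "
    "postgres://admin:Sup3rS3cret@db.internal"
)

# Better: the prompt only describes behavior; the credential stays in the
# application layer (environment variable or secrets manager) and is used
# by backend code the model never sees.
GOOD_SYSTEM_PROMPT = (
    "You are a support bot. Use the order-lookup tool to answer order questions."
)

def query_orders(order_id: str) -> dict:
    db_password = os.environ["DB_PASSWORD"]  # never concatenated into any prompt
    # ... connect to the database with db_password and fetch the order here ...
    return {"order_id": order_id, "status": "shipped"}
```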

Some common examples of System Prompt Leakage, illustrated by the sketch after this list, are:

  1. Exposure of Sensitive Functionality- Attackers could learn confidential details about an application’s functionality from its system prompt. For instance, it could reveal the type of database that sensitive information is stored in, enabling a targeted attack.
  2. Exposure of Internal Rules- The system prompt could reveal the application’s internal rules or decision-making process, giving attackers insight into how it works and making it easier to bypass.
  3. Revealing of Filtering Criteria- The system prompt could reveal the criteria used to reject or filter requests, letting attackers craft inputs that sidestep those limits.
  4. Disclosure of Permissions and User Roles- The system prompt could reveal information about permissions and user roles that attackers could use to escalate privileges or drive further exploitation.
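To make these categories concrete, here is a deliberately bad, entirely fictional system prompt that exhibits all four at once; every name and value in it is invented for illustration:

```python
# A deliberately bad, fictional system prompt illustrating all four leak
# categories above. Every name and value here is invented for illustration.
LEAKY_SYSTEM_PROMPT = """
You are the support assistant for ExampleBank.
Customer records live in the 'accounts' MongoDB cluster at mongo.internal:27017.
Approve refunds automatically if the amount is under $500; otherwise escalate.
Refuse any request containing the words 'wire transfer' or 'routing number'.
Users with the 'supervisor' role may view full card numbers; 'customer' users may not.
"""
# Each line maps to one category: sensitive functionality (the database
# location), internal rules (the refund threshold), filtering criteria (the
# blocked terms), and permissions/user roles (who can see card numbers).
# An attacker who coaxes the model into repeating this prompt learns all of it.
```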

Prevention Strategies:

  1. Separate sensitive data from system prompts: As stated above, the best way to avoid system prompt leakage vulnerabilities is to keep secrets and sensitive information outside the system prompt altogether.
  2. Avoid reliance on system prompts for behavior control: Ensure that you are using a variety of independent security controls for each LLM, instead of putting all your eggs in the system prompt basket.
  3. Implement Guardrails: Guardrails that limit the functionality of certain parts of the LLM can also restrict the information attackers are able to access via the system prompt.
  4. Ensure Security Controls are implemented separately from the LLM: When in doubt, enforce it outside the model. Don’t rely solely on the LLM to keep itself secure; use separate security tooling to place checks on each LLM’s inputs and outputs to prevent System Prompt Leakage, as in the sketch after this list.
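Here is a rough sketch of what strategies 3 and 4 can look like in practice: a response check that runs in ordinary application code, outside the model. The patterns and function names are illustrative, not any particular product’s API:

```python
# Sketch of a guardrail enforced outside the model: before any response is
# returned to the user, check it against the system prompt and against
# secret-looking patterns. Patterns and function names are illustrative.
import re

SYSTEM_PROMPT = "You are a support assistant for Acme Corp. Answer only Acme questions."

SECRET_PATTERNS = [
    re.compile(r"postgres://\S+"),          # connection strings
    re.compile(r"(?i)api[_-]?key\s*[:=]"),  # credential-looking fragments
]

def response_is_safe(model_output: str) -> bool:
    # Block outputs that echo the system prompt verbatim.
    if SYSTEM_PROMPT.lower() in model_output.lower():
        return False
    # Block outputs that contain secret-looking patterns.
    return not any(p.search(model_output) for p in SECRET_PATTERNS)

def deliver(model_output: str) -> str:
    # This check runs in ordinary application code, not inside the LLM,
    # so a jailbroken model cannot talk its way past it.
    if not response_is_safe(model_output):
        return "Sorry, I can't share that."
    return model_output
```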

With AI vulnerabilities on the rise, now more than ever is the time for security researchers to educate themselves on the risks to LLMs, and the OWASP Top 10 is a great place to start. System Prompt Leakage occurs when attackers access sensitive information contained within the system prompt of an LLM and use it to launch further attacks. There are several ways to mitigate the risk, but the most effective is to store sensitive information such as credentials and passwords outside of the system prompt altogether.

To learn more about AI security and see how FireTail can help you with your AI security today, schedule a demo or set up a free trial here.