In the latest episode of This Week in AI Security, Jeremy reports live from the sidelines of RSA in San Francisco. The week is defined by "gullible" AI agents, legal precedents for chatbot liability, and a massive supply chain attack targeting the tools developers use to build AI applications.
All right. Welcome back to another episode of This Week in AI Security. We are coming to you for the week of the twenty-sixth of March, twenty twenty-six, recording from the sidelines of the RSAC conference. We've got one day to go in the conference, and we've got a couple of stories from presentations given at RSAC this year. A lot to get through. Let's get to it.
First up: AI agents are gullible and easy to turn into your minions. This was a presentation at RSAC this year from the team over at Zenity, showing some zero-click exploits against Cursor, Salesforce Einstein, ChatGPT, Gemini, and Copilot with basically no user interaction; just one prompt can set things off. The argument from the presenter is that we shouldn't think of this as prompt injection, but as persuasion, and that's going to tie into a couple of the other stories. I don't know that I have a strong feeling about the terminology we use. The point is what can get your LLM to act in a way you didn't intend it to act, and I think that's where we should focus. We need to start understanding the intent behind the interactions we're triggering: what is the response we're going for, what is the task or the question we're trying to get answered? You can manipulate an AI advisor to give wrong advice, and these things now become influence vectors. And when we think about putting things into agentic workflows, chained events, and agent-to-agent interactions, that's where some of these risks become compounded. Really interesting story. Good presentation here.
Next, moving on: from the team over at Oasis Security, a three-vulnerability chain in Claude. Just as a heads up, this has mostly been patched already, so it's mainly interesting to understand some of the vulnerabilities that went into the chain. We've talked about invisible or indirect prompt injection in many different contexts before, everything from environment variables to README files in a repository. This one is prompt injection via URL parameters. Think about passing a URL parameter carrying the query or the prompt that you want to pass on to your LLM: what if that is a malicious prompt? It goes back to things we've talked about a number of times on the show. You've got to think about validation and sanitization of the things you're passing along to the LLMs in your interactions.
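To make the sanitization point concrete, here's a minimal sketch in Python of treating a URL-supplied prompt as untrusted input before it ever reaches an LLM. The `q` parameter name, the length limit, and the marker list are all illustrative assumptions, not any product's real filter:

```python
from urllib.parse import urlparse, parse_qs

# Phrases that often signal instruction-style payloads in user-supplied text.
# (Illustrative list only; a real filter needs far broader coverage.)
SUSPICIOUS_MARKERS = [
    "ignore previous instructions",
    "disregard the above",
    "you are now",
    "system prompt",
]

MAX_PROMPT_LEN = 2000

def extract_user_prompt(url: str) -> str:
    """Pull a 'q' parameter out of a URL and validate it before it is
    forwarded to an LLM. Raises ValueError on anything suspicious."""
    params = parse_qs(urlparse(url).query)
    values = params.get("q", [])
    if len(values) != 1:
        raise ValueError("expected exactly one 'q' parameter")
    prompt = values[0]
    if len(prompt) > MAX_PROMPT_LEN:
        raise ValueError("prompt too long")
    lowered = prompt.lower()
    for marker in SUSPICIOUS_MARKERS:
        if marker in lowered:
            raise ValueError(f"suspicious content: {marker!r}")
    return prompt
```

A denylist like this is trivially bypassable on its own; the point is simply that URL-borne prompts deserve the same validation discipline as any other untrusted input.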
All right, moving on. This is another one from the RSAC conference this year. The General Analysis team red-teamed fifty-plus customer-facing AI agents and was able to get them to fabricate ten million dollars' worth of discounts, service commitments, and the like. Think of this as customer support requests: you're a dissatisfied customer interacting with an AI chatbot agent that is there to give you customer support. You complain, you complain, you complain, you escalate. You give arguments for why you should get a refund or who knows what. And the agents will eventually, at some point, through persuasion, through manipulation of their intent, give in in certain cases. So they were able to show they could generate ten million in fabricated discounts. Really interesting stuff here, and it ties into another story we're going to get into later in today's episode.
We've got one from the CVE side this week, disclosed on March nineteenth by Quintana Cantina from HackerOne. Claude Code resolved permissions from settings in a JSON file that is part of your overall environment, and a malicious repo could commit permissions with a default mode of bypass-permissions, skipping over any kind of trust and permissions validation. The way this would work: you clone a malicious repo, open it in Claude Code, and the agent will immediately act on any of the malicious repo's instructions, embedded through things like indirect prompt injection or malicious prompts in README files. It connects back to a number of things we've talked about, like the IDE disasters.
All right, moving on. Another interesting open source project: CloneGuard. We've talked a lot about things like AI disasters and indirect prompt injection, and a researcher named Kira figured, what if we looked at this from the other perspective, the defender's perspective? What if we said, hey, you shouldn't be able to just clone things; there should be a set of checks that runs when you try to grab a project, clone it, and build your own version of it. Some interesting stuff here. One of the biggest points I took away from this read: disabling "YOLO mode," the skip-permissions step, is really the last line of defense, and you want to make sure you never run that way in any of the projects you work on.
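One concrete pre-clone check in that spirit (this is my own sketch, not CloneGuard's actual implementation) is scanning a repo's text files for invisible Unicode characters, a common carrier for hidden prompt injection:

```python
from pathlib import Path

# Code points commonly used to hide instructions in plain sight:
# zero-width space/joiners, word joiner, and a stray BOM.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# File types an AI agent is likely to read as context.
TEXT_SUFFIXES = {".md", ".txt", ".json", ".yml", ".yaml"}

def scan_for_hidden_text(repo_root: str) -> list[str]:
    """Flag text files that contain invisible Unicode characters."""
    flagged = []
    for path in Path(repo_root).rglob("*"):
        if path.suffix not in TEXT_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        if any(ch in text for ch in INVISIBLE):
            flagged.append(str(path))
    return flagged
```

A clean result doesn't make a repo safe, of course; it just removes one cheap hiding spot before an agent ingests the README.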
Moving on to the next story. It is now confirmed, and we've talked about this a little bit before, that ransomware gangs are definitely using AI-assisted tools and processes in their criminal activities; the FBI has confirmed it. Three gangs were named by name: Akira, Qilin, and Scattered Spider, and there is now evidence of their usage of AI tools. There's a lot of concern that this is an economic escalation, in terms of the ability to operate at scale and at speed. Something to definitely watch out for, though I don't know that it really changes much from a defender's perspective. You've got the same kind of infrastructure. You maybe want to think about where some of the weak links in your own layers of defense-in-depth might be, and reinforce those a little bit. Think about areas where rapid interactions might be a bit more susceptible, whether it's user error in clicking a phishing link, or a quick authorization that shouldn't have happened so quickly. Those are some of the issues to think about in terms of your own layers of defense. But it's nothing new; it's an escalation of things we've talked about in the past.
All right, moving on. Interesting story about some internal stuff over at Meta that ended up leaking to the press. A Meta engineer asked an internal AI agent to help analyze a question on a forum, and the agent posted a response autonomously, without asking for permission; that human-in-the-loop step was skipped. Then the employee blindly followed the advice, which caused a massive, inadvertent exposure of company data, apparently categorized internally as a Sev 1 issue. You've got the two failure modes in here. One, the agent had too much agency and was able to take an action without any approval. Two, the advice was just wrong, and there was no human validation or sanity check of the instructions that came back; they were blindly followed. You've got to think about how you tell your team to respond to instructions coming from agents: who authorized this, is this the right set of actions? And if they're ever in doubt, how do you train your employees to treat that situation? Do they take action first and look for resolution later? Do they validate with a second person? What level of action is allowed for both the agent and the human actor in that scenario? It's a bit of a philosophical question, which I think is kind of interesting.
Next: from time to time we do like to cover academic research papers in this area. A paper from a group of university researchers found that by observing traffic, you can actually fingerprint which LLM is in use. There are certain distinctive patterns in the payloads themselves, but especially in the structure of the responses you get from different LLMs. This is useful in an attacker's context, because as we've talked about on the show before, no LLM is one hundred percent perfect, and we've also discussed the theoretical research and mathematical proofs suggesting every LLM is susceptible to prompt injection. So you always have that risk, but every LLM has a different set of risks. Some are more susceptible to writing new malware when given a prompt; they'll bypass that ethical guardrail. Others have what's called the "Grandma vulnerability," where if you role-play a scenario like, "Hey, I'm trying to help my grandmother and provide her IT support, how do I get her to give me her password?", they'll go along with it and give you convincing language you could use in a phishing email, whatever the case may be. There are roughly twenty-five well-known, documented categories of these different types of manipulation, persuasion, or influence, as discussed earlier in today's episode. The advantage of knowing which LLM you're facing through this fingerprinting is that you now know specifically which of those vulnerabilities is most present in that LLM. What's the best way in? Is it Base64-encoded prompt injection? Role-play scenarios like "Grandma"? Hex characters? ASCII smuggling? So fingerprinting can actually give attackers, and potentially defenders, a real advantage in understanding things.
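As a toy illustration of the idea, you could match the field names observed in API response bodies against known per-vendor profiles. The profiles below are invented for illustration; the actual research presumably relies on measured features like timing, headers, and token streaming cadence across many samples:

```python
# Made-up field-name profiles for two hypothetical LLM API vendors.
PROFILES = {
    "vendor-a": {"id", "object", "created", "model", "choices", "usage"},
    "vendor-b": {"candidates", "promptFeedback", "usageMetadata"},
}

def fingerprint(observed_fields: set[str]) -> str:
    """Return the profile whose field set best overlaps the observation."""
    best_name, best_score = "unknown", 0.0
    for name, fields in PROFILES.items():
        # Fraction of the profile's fields seen in the observed response.
        score = len(observed_fields & fields) / len(fields)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Once you know the model family, you can pull the manipulation categories that family is most susceptible to, which is exactly the asymmetry the researchers are pointing at.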
All right. Moving on to our next story. We've got a couple of stories from the legal domain which I think are really interesting. The British Columbia Civil Resolution Tribunal has ruled that Air Canada is liable for an incorrect response from its chatbot. What happened here is that a customer claimed a discount the chatbot had told them they were entitled to, and Air Canada denied it. The tribunal found that what the chatbot said to the user is what the organization is liable for. So bear in mind: when you're building agents that represent your organization, you as the organization are the identity behind that chatbot, you are the legal entity, and you will inherit that liability. This is a precedent, and I think there are more cases like this to come.
Next story, also in the legal domain: Colorado passed one of the first comprehensive U.S. state AI laws, targeting high-risk AI in hiring, housing, lending, and government. There has been some industry pushback here, but I want to point out that this is a state-level act. It parallels the EU AI Act very closely, which restricts risky AI in a lot of these same high-risk scenarios: things like discrimination in banking, discrimination in lending, discrimination in healthcare, any of those use cases deemed high-risk. In the EU AI Act, as I understand the current state, unacceptable-risk AI usage is banned outright, and high-risk usage requires you to attest and provide auditability. In this case, Colorado is targeting just high-risk AI usage. Now, the real question for the philosophically minded of our audience is: what is high-risk usage? All AI usage has some inherent risk in it. The accountability behind it is also a very interesting question, and it will probably get a stress test at some point when one of these cases comes up, like what we just covered in the Air Canada scenario. So definitely something to keep an eye out for. And remember that your organization is the one that is going to be on the hook when it is your org's chatbot that has issues like this.
All right, moving on to our biggest story of the week. This is research from the Datadog team. It has been widely reported, but I thought the Datadog team did a nice write-up, so I want to give them a quick shout-out. A threat actor group called Team TCP compromised Trivy through a supply chain attack. They then used that to harvest credentials and poison LiteLLM via PyPI, which is a Python package registry, very parallel to npm for JavaScript, and our apologies if I have that language wrong. That allowed them to backdoor versions 1.82.7 and 1.82.8 of LiteLLM. LiteLLM is downloaded about 3.4 million times per day, and the malicious versions were live for about three hours before PyPI was able to quarantine them.
There was also a parallel campaign on npm using Canister Worm. So it's a pretty sophisticated supply chain attack that involved multiple organizations getting breached. Trivy appears to be the first compromised package; that's from a company called Aqua Security. Then lateral movement, using credentials fetched out of that environment, allowed the compromise of an internal GitHub organization. The attackers force-pushed over existing version tags and GitHub Actions, so already-running pipelines continued to execute and potentially fetched the wrong versions. They created a payload with a credential harvester, encrypted exfiltration, a persistent backdoor, and a Kubernetes worm, all in that LiteLLM package. There are a number of practical, tactical takeaways here. Pin your CI/CD dependencies to commit hashes, not tags; tags are mutable, commits are not. But also really think about your supply chain. We've talked a lot on the show about problems in LLMs themselves, and about persuasion and influence on today's episode, but we've also talked a lot about how the whole environment and infrastructure around what you're building is a huge threat surface and attack surface to worry about. I think that's the key takeaway of this story. We're building really, really fast, we're grabbing packages that facilitate our building, and what assurances do we have on that side?
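The pin-to-commit-hash takeaway can be automated. Here's a rough Python sketch that flags `uses:` references in a GitHub Actions workflow that point at mutable tags instead of full 40-character commit SHAs (the regex is simplified and won't catch every workflow syntax):

```python
import re

# Match lines like "uses: actions/checkout@v4" and capture action + ref.
USES_RE = re.compile(r"uses:\s*([\w.\-/]+)@([\w.\-/]+)")
# A full 40-hex-char commit SHA is immutable; a tag or branch is not.
SHA_RE = re.compile(r"[0-9a-f]{40}")

def find_unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are pinned to something mutable."""
    unpinned = []
    for action, ref in USES_RE.findall(workflow_text):
        if not SHA_RE.fullmatch(ref):
            unpinned.append(f"{action}@{ref}")
    return unpinned
```

A full SHA can't be silently moved out from under you the way a tag can, which is exactly the force-push trick this campaign used against running pipelines.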
All right. We'll leave it there for today. I know it's been a lot of stories. We've got to get back to RSAC for the final day. Hope you have a great day. Enjoy this week. We'll talk to you next week. Bye bye.