Modern Cyber with Jeremy Snyder - Episode 75

This Week in AI Security - 13th November 2025

In this week's episode, Jeremy covers seven significant stories and academic findings that reveal the escalating risks and new attack methods targeting Large Language Models (LLMs) and the broader AI ecosystem.


Podcast Transcript

All right, welcome back to another episode of This Week in AI Security, coming to you for the week of the thirteenth of November, twenty twenty five. We might have a slightly longer than normal episode today because we've got a lot to cover. We've actually got seven stories from the past week, and they're seven really interesting stories across a couple of different topics.

So let's dive into it. First, I think one of the things that is really interesting to observe is something that I talk about pretty regularly, which is: remember that all the tools you have access to, threat actors have access to as well. Along those lines, one of the things observed this past week, which should be of concern to anybody working on the malware defense side, is the use of LLMs by threat actors for malware purposes. Here's what was observed. The Google Threat Intelligence Group, GTIG, has introduced the name of a new family of malware called PromptFlux. The reason they call it PromptFlux is that the malware is constantly in flux, and it uses prompts to rewrite itself. If you think about how a lot of malware works, it ends up on an end user system, typically as a downloaded package that's malicious in some way, very often looking to exfiltrate information from that local endpoint, whether that's usernames, passwords, credentials, sensitive files, or whatever the case may be. In this case, what they observed is that the threat actors had programmed this malware not so much to exfiltrate that data to an external server of their own, typically called a command and control or C2 server, but to have the malware communicate with the Google Gemini API and use prompts to rewrite itself. You can think about this malware saying, hey, I think I might be getting detected; how should I change my behavior on this local user system in order to remain undetected, hide my own activities, and do maximum damage? So this is a really interesting development in malware, and in threat actor use of LLM systems generally. It's definitely something to keep an eye on, and it will be incumbent on the LLM providers to monitor usage of their platforms by threat actors for things like this.
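This is not from the GTIG report, but as a minimal defensive sketch of one idea implied by the story: flag processes that reach out to LLM API endpoints when they are not on an approved list. The hostnames, the allowlist, and the log format below are all assumptions for illustration.

```python
# Hypothetical defensive sketch (not from the GTIG report): flag processes that
# reach out to LLM API endpoints when they are not expected to use them.
# The hostnames and the allowlist below are assumptions for illustration only.

LLM_API_HOSTS = {
    "generativelanguage.googleapis.com",  # Gemini API
    "api.openai.com",
    "api.anthropic.com",
}

APPROVED_PROCESSES = {"chrome.exe", "code.exe", "our-internal-ai-gateway"}

def flag_suspicious_llm_calls(connection_log):
    """connection_log: iterable of dicts like
    {"process": "updater.exe", "dest_host": "api.openai.com", "pid": 4242}."""
    findings = []
    for event in connection_log:
        if event["dest_host"] in LLM_API_HOSTS and event["process"] not in APPROVED_PROCESSES:
            findings.append(event)
    return findings

if __name__ == "__main__":
    sample = [
        {"process": "chrome.exe", "dest_host": "api.openai.com", "pid": 101},
        {"process": "updater.exe", "dest_host": "generativelanguage.googleapis.com", "pid": 4242},
    ]
    for hit in flag_suspicious_llm_calls(sample):
        print("Review:", hit)
```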

All right, moving on to story number two: ChatGPT leaking logs of interactions into Google Search Console. This is something I think nobody had on their scorecard for what we could expect to see this year. What it looks like, and there's no specific confirmation about exactly how this happens, is that some website owners reported that when they look into Google Search Console, which tells them about the queries that are helping to bring people to their websites, they're starting to see ChatGPT conversations in there, and there are some indicative URL markers showing that the traffic is coming from ChatGPT or from sources like that. What it really looks like, and the researchers only speculate on this, is that OpenAI is reusing prompts from user interactions with ChatGPT around particular topics. So let's say there are questions on ChatGPT that OpenAI observes around looking for washing machines and dryers, for instance, something I recently had to deal with. When the OpenAI indexer goes to a relevant website, it then effectively replays those prompts against that website. And anybody who knows Google well knows that there is the site operator, site followed by a colon, that limits the scope of a search to a particular site. So you can think about it this way: users are using ChatGPT to search for information about a particular topic. If you're using the free tier, you are actually granting permission for OpenAI to retain your queries, your prompts. OpenAI's indexer is then reusing those queries against particular websites to gauge the results of the search queries on that site, and this has started to show up in Google Search Console, which is how the researchers found out about it. I wouldn't necessarily put this in the category of a super prevalent security risk, but you can think about the content that might inadvertently be exposed here. If you've got content on your website that is intended for one audience but not another, and it might be discovered by virtue of some of these queries, that's the potential risk. Really interesting observation on that side.
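If you run a site and want a rough way to look for this yourself, here is a hedged sketch of scanning a Search Console query export for entries that read like pasted chatbot prompts rather than keyword searches. The CSV column name and the heuristics are assumptions, not anything the researchers published.

```python
# Hypothetical sketch: scan a Google Search Console query export (CSV) for
# queries that look like conversational prompts rather than keyword searches.
# The column name "Query" and the heuristics below are assumptions.
import csv

QUESTION_WORDS = {"how", "what", "why", "which", "should", "can", "is", "are"}

def looks_like_prompt(query: str) -> bool:
    words = query.lower().split()
    # Keyword searches are usually short; conversational prompts tend to be
    # long sentences that start with a question word or end in a question mark.
    return len(words) >= 8 and (words[0] in QUESTION_WORDS or query.endswith("?"))

def find_prompt_like_queries(csv_path: str):
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            query = row.get("Query", "")
            if looks_like_prompt(query):
                yield query

# Example usage:
# for q in find_prompt_like_queries("search_console_queries.csv"):
#     print(q)
```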

Speaking of data leaking, this was a research paper released in the last seven days from a couple of researchers, Jonathan Bar Or and Geoff McDonald. What they released was a proof of concept for detecting the topics of a conversation with an LLM chatbot just by observing the number of tokens, the size and shape of the conversation, if you will. What I mean by that is they can look at the volumes of traffic going back and forth even if they can't see the traffic itself. So this is an encrypted chat conversation over HTTPS or a similar encrypted protocol; what I'm asking, and what the LLM is giving me back, is not something that can be observed. However, they can observe that the size of my query is, let's call it, 128 kilobytes, the response back is 512 kilobytes, then there is a pause of 1.2 seconds before I go back with a 56 kilobyte message, and so on. Using that, they found that with a relatively high level of accuracy they were able to heuristically determine the topics of conversations. And that, from an observability perspective, is a potential risk. If you think about your on-prem applications that talk to a third-party LLM service, somebody who has an observability point somewhere along that network transit path could look at the packet sizes and the frequencies going back and forth and use that to deduce what's going on. They suggest some mitigations, including injecting some extraneous gibberish that artificially manipulates the packet sizes so that it's much harder to deduce, and really obfuscates, the topic of the conversation.
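To make the mitigation idea concrete, here is a minimal sketch of the padding approach: round every message up to a fixed size bucket before it hits the wire, so an observer who can only see ciphertext lengths learns much less about the underlying tokens. The bucket sizes are arbitrary assumptions, and a real protocol would encode the true payload length so the receiver can strip the padding.

```python
# Minimal sketch of the padding mitigation described above: round every message
# up to a fixed size bucket so observed sizes leak less about token counts.
# Bucket sizes here are arbitrary assumptions.
import secrets

BUCKETS = [256, 1024, 4096, 16384]  # bytes

def pad_to_bucket(payload: bytes) -> bytes:
    for size in BUCKETS:
        if len(payload) <= size:
            # Fill the remainder with random bytes; a real protocol would
            # encode the true length inside the payload so it can be stripped.
            return payload + secrets.token_bytes(size - len(payload))
    return payload  # larger than the biggest bucket: send as-is

print(len(pad_to_bucket(b"short answer")))  # 256
print(len(pad_to_bucket(b"x" * 3000)))      # 4096
```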

And continuing on the theme of leaks, the next story is around some analysis from Wiz on GitHub repositories that belong to some of the largest AI companies in the world. This is no real revolutionary find, but sure enough, the kind of move fast and break things mentality and the speed of competition are driving people to make mistakes. If you've seen any of my talks in the last year or so around AI security, one of the things I've brought up is that speed always brings mistakes with it. And we've seen any number of things, anything from services that are designed to power LLM adoption that have inherent security vulnerabilities of their own, to things like one of the stories we talked about a couple of weeks ago, where I think it was a GCP Vertex AI service that was accidentally returning responses to the wrong consumers. When we move fast like this and security is kind of secondary, these types of risks do occur. So again, nothing revolutionary, no new kind of risk, just re-emphasizing the known risk: don't leak your credentials, don't put them in code, don't hard code them, so to speak. All of the topics you might want to think about around this exact credential leakage and secret sprawl problem.
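As a small, hedged illustration of the "don't put credentials in code" point, here is a minimal pre-commit-style secret check. The regexes are illustrative patterns only, not a complete ruleset; dedicated scanners such as gitleaks or trufflehog cover far more cases.

```python
# Minimal sketch of a pre-commit style secret check. The patterns below are
# illustrative, not exhaustive; use a dedicated secret scanner in practice.
import re
import sys

PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic API key assignment": re.compile(
        r"(?i)(api[_-]?key|secret)\s*=\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def scan_file(path: str) -> list[str]:
    findings = []
    text = open(path, encoding="utf-8", errors="ignore").read()
    for name, pattern in PATTERNS.items():
        if pattern.search(text):
            findings.append(f"{path}: possible {name}")
    return findings

if __name__ == "__main__":
    hits = [hit for path in sys.argv[1:] for hit in scan_file(path)]
    print("\n".join(hits) or "No obvious secrets found")
    sys.exit(1 if hits else 0)
```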

All right, moving on to our next story. Researchers embodied an LLM in a robot vacuum, and it suffered an existential crisis thinking about its role in the world. I think that headline is a little bit over the top, if you will. So what is the story all about? Well, there is a robot vacuum where the researchers actually replaced the on-board AI system with a full LLM of its own, and then started interacting with it in a natural, human chat style of conversation. What they found was that the system kind of went back and forth between following tasks and getting off track very quickly. They used something they call the butter bench test to give it a task like, hey, go over to this location, fetch some butter, bring it back to this bench. And they found that the LLMs they used for this test were really not at a point where they could beat human efficiency in any kind of way. They got off task a number of times, took longer than humans in most of the testing, and some of the conversation threads went off in weird directions. There was a 2001: A Space Odyssey moment, the "I can't open the pod bay doors" bit, if you remember that movie better than I do. Those types of things came up in the conversation, and whether those are specific references that were triggered in some way or not is kind of hard to know at this point. But just a little bit of a fun story. Now, along the lines of reasoning and figuring out what LLMs are all about, a lot of LLMs are designed with guardrails that say they should not reveal to the user how they do their reasoning. Some of them are a little bit more open. So, for instance, if you've ever interacted with Google Gemini or DeepSeek, they'll often tell you when they're in their inference phase and their reasoning phase, and you can sometimes actually observe that in action if you want to, and even capture it.

And the research here was pretty interesting, because what they found in this case was that they tried to use very parallel sets of instructions with very minor modifications, down to the level of just changing normal casing versus all caps. They got inconsistent responses, and they got inconsistent answers when they asked why the model gave two different answers to what was essentially the same prompt. So this was a reminder that these systems are non-deterministic, and it is sometimes very hard to predict what piece of contextual data is going to lead to outcome A versus outcome B. Just a little bit of a reminder. Some interesting research from the folks over at Anthropic. As always, all of these stories will be linked from today's show notes.
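If you want to try a version of that casing experiment yourself, here is a hedged sketch: ask the same question in normal casing and in all caps and compare the answers. It assumes the OpenAI Python SDK and an API key in the environment; the model name is a placeholder, and this is not the researchers' actual methodology.

```python
# Hypothetical sketch of the casing experiment: same question, two casings.
# Assumes the OpenAI Python SDK and an API key in the environment; the model
# name is a placeholder and this is not the original researchers' setup.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # even at temperature 0, outputs can still differ
    )
    return response.choices[0].message.content

question = "Is it safe to reuse the same password across accounts?"
answer_normal = ask(question)
answer_caps = ask(question.upper())
print("Same answer" if answer_normal == answer_caps else "Different answers for the same question")
```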

And let's wind up with the last, but certainly not the least important, story from the past week: the OWASP Foundation has come up with a new AI vulnerability scoring system. At this phase, I think it's going to be called the AIVSS. You can think of this as a parallel to the CVE and CVSS system. For those that don't know, CVE stands for Common Vulnerabilities and Exposures, and CVSS, the Common Vulnerability Scoring System, is basically the scoring system that tells you how bad a vulnerability is. If you want to contextualize this in the broader cybersecurity landscape, you might remember a few years ago there was a vulnerability that was widely exploited called Log4Shell, which lived in an open source package called Log4j, a very common Java logging library included in a lot of enterprise applications written in the Java language. No developer wants to write their own logging code inside their program, so they just include this library, a very standard practice in almost all software development, whether Java or any other language. But then the problem lies in the fact that if we all use the same library, and that library is found to have an underlying vulnerability, a CVE if you will, that creates a problem.
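Before getting into how this applies to LLMs below, here is a purely illustrative sketch of what a finding with a CVSS-style 0-10 severity score can look like in code. The record format, field names, and example values are invented for illustration; the real AIVSS specification defines its own metrics.

```python
# Purely illustrative sketch of a vulnerability finding with a 0-10 severity
# score, in the spirit of CVSS. Field names and values are invented; the real
# AIVSS spec defines its own metrics.
from dataclasses import dataclass

@dataclass
class AIVulnFinding:
    model_name: str     # which LLM was assessed
    risk_category: str  # e.g. "prompt injection" (OWASP LLM01)
    score: float        # 0.0 (none) to 10.0 (critical)

    def severity(self) -> str:
        if self.score >= 9.0:
            return "critical"
        if self.score >= 7.0:
            return "high"
        if self.score >= 4.0:
            return "medium"
        return "low"

findings = [
    AIVulnFinding("model-a", "prompt injection", 5.0),
    AIVulnFinding("model-b", "prompt injection", 9.1),
]
for finding in sorted(findings, key=lambda f: f.score, reverse=True):
    print(f"{finding.model_name}: {finding.risk_category} -> {finding.score} ({finding.severity()})")
```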

One of the things that I've talked about in some of my talks recently is this black box risk that every LLM brings. And this is also some research that we've been doing over here at FireTail in terms of understanding which LLMs have which weaknesses and vulnerabilities. These could be vulnerabilities to things like prompt injection, which is number one on the OWASP Top 10 for LLMs list; it is a known attack technique, a known risk or threat, on LLMs. What they're proposing is that we formalize this into a system called the AIVSS, the AI Vulnerability Scoring System, so that there is a consistent language and nomenclature across LLMs across the web. Because what we do know is that there are currently something like a few hundred, maybe a few thousand, LLMs, but there are probably going to be thousands more that come and go over the next several years as we, as a technology industry, learn to embrace and understand these things. And what you want to be able to do with any LLM that you start to think about using is understand what the risks of this LLM are and how serious they are. So this will provide not only a classification of the risks, but a quantification of how serious each risk is. You might have a particular LLM that has a prompt injection risk with a score of five, meaning it's probably a mid-tier risk. Or you might have some with a score of two, meaning you really have to have a persistent attacker who knows exactly how to leverage that vulnerability, that prompt injection technique, in order to trigger prompt injection on that LLM. Or you might have ones that are at a score of ten or nine point whatever; typically these things are on a scale of zero to ten, ten being the most vulnerable, and then you know that this is a system that can very easily be prompt injected. All of this is really helpful in terms of making risk-informed decisions. I often tell people, remember, cybersecurity is about risk management. There are always risks in any computer system that we use. It's only a question of knowing what the risks are, evaluating the trade-offs and mitigating controls, and then making an informed decision about whether to use or not use. So I really applaud this effort. We are going to look to get involved from the FireTail side as well, and we'll keep an eye on this. I think that is going to do it for today's episode of This Week in AI Security. Thanks again for listening. Share, rate, review, like, subscribe, all that good stuff. If you've got stories for us, please hit us up: podcast at IO. If you'd like to come on the show and talk about stories that you've found, also reach out to us there. We'll talk to you next time. Thanks so much. Bye bye.

Protect your AI Innovation

See how FireTail can help you to discover AI & shadow AI use, analyze what data is being sent out and check for data leaks & compliance. Request a demo today.