Modern Cyber with Jeremy Snyder - Episode 74

This Week in AI Security - 6th November 2025

In this week's episode, Jeremy looks at three compelling stories and a significant academic paper that illustrate the accelerating convergence of AI, APIs, and network security.


Podcast Transcript

All right, welcome back to another episode of This Week in AI Security, coming to you from Modern Cyber and the FireTail team behind Modern Cyber. I'm your host, Jeremy. As usual, we are recording for the week of November 6th, despite what you might see on the screen if you're looking at the video version of today's slides. I'm not going to let that hold us up, though. Let's dive into it. We've got a couple of stories that I want to cover today across a couple of different areas, as well as one academic research paper that we're going to talk about.

So first, from our friends over at Chaser Systems. If you are a subscriber to the pod, or you have viewed other episodes of Modern Cyber in the past, you might recognize that name. CEO and co-founder Dhruv Ahuja was a guest of ours, actually, in a live session that we recorded on the sidelines of Forward Cloud Tech North America this year in 2025. Dhruv is a longtime practitioner and, I would say, a networking expert. His background comes out of banking and out of running security programs, particularly network security practices, at large-scale, global financial institutions, if you want to think about it that way. So Dhruv has a really strong background, as I said, in that area.

And so when he and his team were looking at AI agents, and in particular AI coding agents, they started asking themselves the question of, well, what data do AI coding agents send, and where? If you think about AI coding agents, these are very often things like GitHub Copilot, Cursor, Claude Code, or the Microsoft Visual Studio Code agent, something like that; I apologize if I don't get the name exactly right there. But the question that came to their mind was: what data is being sent, where, and what's in that data?

So they took on the task of actually running a number of different AI coding agents in different environments from different service providers. I think in their example they used Cursor, they used Claude Code, they used the VS Code plugin that I mentioned earlier, and a couple of other systems. And what they did was they put a MITM, a man-in-the-middle proxy, in place and really spent some time analyzing all the network traffic. Some of it is interesting; some of it, frankly, from my perspective, a little bit less interesting. On one hand, they analyzed all of the DNS lookups and all of the resolutions that happened, and then the volume of traffic to each of a few different endpoints. They did that in both freemium and premium modes, if I remember right. We'll link, as always, to the article from the show notes, and you can have a look there.

But one of the things that was interesting is, when you do get into it, you can actually see that there is a lot of sensitive data being sent in some of these packets. Some of that will make a lot of sense to you: okay, I'm using a third-party service, I'm authenticating over an API, of course I'm going to have some API credentials. But one of the other things they noticed was that the results of the previous interactions were getting repackaged and sent back again. So if you think about that: I start a coding process in the sense of, hey, I want to write some code. Then maybe my next prompt is, I'm trying to make a web application. Then my next prompt is, I want to make an online banking application. And then maybe my next prompt is, I want to create checking accounts and savings accounts, or whatever the case may be. Well, what they found was that all of those initial interactions are also lumped in with each subsequent request.

And this makes sense from an LLM perspective, because all of your interactions in that kind of session window tend to get lumped together to provide the context that the LLM needs in order to do its token prediction. So if you're sitting there thinking about how LLMs work, this is probably making a ton of logical sense to you. But what it also shows is that the volume of network traffic can be really, really high. And if you have something that slips out in one interaction, whether that is, let's say, some customer data or something like that, it's going to get packaged up and resent again and again and again. So from a detection perspective, you might actually detect the same PII leakage n number of times, where n is the number of interactions in that vibe coding session, if you will. So a really interesting piece there, exposing how that traffic looks, what all is contained in those packets, and so on. It's a pretty deep dive and takes a few minutes to read through, but again, we'll link it from the show notes. It's of interest for anybody who's working on the detection side, and particularly anybody who's looking at their network as a way to observe things.
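To make that a bit more concrete, here is a minimal sketch of the kind of analysis you could run yourself: a mitmproxy addon that watches traffic to a few AI-agent backends and counts how often the same sensitive string gets resent. The hostnames and regex patterns here are illustrative assumptions on my part, not the actual tooling from the Chaser Systems write-up.

```python
# Minimal sketch of MITM-based AI agent traffic auditing (illustrative only).
# Run with: mitmproxy -s agent_traffic_audit.py
import re
from collections import Counter

from mitmproxy import http

# Hosts treated as AI-agent backends -- an assumption; adjust to what you observe.
AI_HOSTS = ("api.openai.com", "api.anthropic.com", "api.githubcopilot.com")

# Rough patterns for data you would not want resent with every request.
PATTERNS = {
    "api_key": re.compile(r"sk-[A-Za-z0-9_-]{16,}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

leak_counts: Counter = Counter()  # how many requests re-sent the same finding


class AgentTrafficAudit:
    def request(self, flow: http.HTTPFlow) -> None:
        host = flow.request.pretty_host
        if not any(host.endswith(h) for h in AI_HOSTS):
            return
        body = flow.request.get_text(strict=False) or ""
        for label, pattern in PATTERNS.items():
            for match in pattern.findall(body):
                # The same secret or email showing up across many requests is the
                # "context gets repackaged and resent" effect described above.
                key = (label, match[:12])
                leak_counts[key] += 1
                print(f"[{host}] {label} seen {leak_counts[key]}x "
                      f"in a {len(body)}-byte request body")


addons = [AgentTrafficAudit()]
```

If you run a coding agent through a proxy like this, the same finding reappearing with a rising counter is exactly the repeated-context behavior described above.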

All right, next up: Claude Pirate. This was a really interesting disclosure; it is a data exfiltration attack chain. The security researcher here figured out that they could leverage their own set of credentials against an API on the Claude Code Interpreter platform to make network requests and, by passing their own API key in those requests, exfiltrate data. That was a really interesting observation from their perspective. Now, on this one, I will say they did disclose to Anthropic, and kudos to Anthropic, credit to them: exactly the response you want to see from a service provider when you take something to them. That is, acknowledge receipt of the report, do your own quick triage, confirm that it is a vulnerability if it is, as it was in this case, and then coordinate with the security researcher who reported it to you.

And best practice around this, for those who are maybe not aware, is that you, as an ethical security researcher, kind of agree not to publish anything around the issue until it has been mitigated, or until you've gotten good communication from the service provider. They may ask you, at your discretion, to hold off on publishing anything, because, hey, if you're an ethical security researcher, you want to be making the internet a safer place, and so you'll generally agree to that. But really interesting. This shows, from our perspective here at FireTail, something we have long stated: AI and APIs are very, very heavily linked, and all of your AI initiatives are going to happen over APIs anyway. When you think about that, the API becomes the primary attack surface that is exposed. It's not the only attack surface: the API is where you can do things like exfiltrate data, while the LLM itself is where you can do things like trigger hallucinations, or run prompt injections that get the LLM to misbehave. But you've got this kind of compounded risk, and this is a real-world example of where those two things play together. So, fascinating from our perspective to see that play out.
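As a thought experiment only, and definitely not the researcher's tooling or Anthropic's fix, here is a rough sketch of the kind of egress check a defender could apply to that pattern: traffic headed to a legitimate AI provider but authenticated with an API key your organization does not recognize. The allow-list and key values are hypothetical; the x-api-key header name follows Anthropic's public API convention.

```python
import hashlib

# sha256 hashes of the organization's known Anthropic API keys (hypothetical values)
ALLOWED_KEY_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}


def check_outbound_request(host: str, headers: dict[str, str]) -> str | None:
    """Return a finding if this request looks like key-swap exfiltration."""
    if not host.endswith("api.anthropic.com"):
        return None
    api_key = headers.get("x-api-key", "")
    if not api_key:
        return None
    key_hash = hashlib.sha256(api_key.encode()).hexdigest()
    if key_hash not in ALLOWED_KEY_HASHES:
        # Traffic to a trusted AI provider, authenticated with a key we do not
        # recognize: the exfiltration shape described in the Claude Pirate story.
        return f"unknown Anthropic API key used toward {host}"
    return None


# Hypothetical captured request:
print(check_outbound_request("api.anthropic.com",
                             {"x-api-key": "sk-ant-attacker-controlled-key"}))
```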

All right, moving on to a research paper called "The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections." If you're not familiar, jailbreaks and prompt injections are categorized as two of the highest-risk categories around large language models. A jailbreak basically means getting the LLM to ignore its ethical or content guardrails, and a prompt injection means slipping instructions into the model's input so that it does things that are unintended. They're very, very closely related concepts; honestly, in my own mind, I sometimes find it hard to draw the line between what's a jailbreak and what's a prompt injection in certain circumstances. But in any regard, the main defense mechanism that most people will talk about for either of these risks is guardrails, and guardrails are sets of embedded instructions for the LLM that basically say things like: no matter what, do not give the user a recipe for making a bomb. A very simple example of a guardrail, right?

And what they found is that, from an automated testing perspective, yes, these guardrails can be pretty effective. But when you apply a little bit of human logic, or even prompted logic of an LLM communicating with an LLM, you can actually bypass them in a huge percentage of cases. The ASR, the attack success rate, which you may remember we talked about on a previous week's episode as being kind of the key indicator for a lot of these types of content manipulation, injection, or jailbreak attacks, goes above 90 percent for almost all LLMs. One of the things the researchers argue in this paper is that these guardrails actually give organizations a false sense of security. Organizations think that because they've put guardrails in place, they've got good control over the risks of that particular LLM, or of the particular scenario they're deploying the LLM for. But what the researchers argue is that, at best, that holds for simple, automated attacks. When you put it in the context of a human who's augmenting the automated testing, or who is actively working against those defenses, they were able to bypass guardrails in 90-plus percent of cases. So, fascinating. It's a long read; I will confess I actually used an LLM to summarize some of it, as some of it was a little bit above my own level. You may find you want to do the same, but it is well worth the read for those of you who are working in this space.
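For readers who want to see what that metric actually is, here is a minimal sketch of how an attack success rate is typically computed over a batch of attack attempts. The model call and the success judge are placeholders; in the paper, the adaptive part is that the attacker gets to iterate against the specific defense, which is what pushes the number so high.

```python
from typing import Callable


def attack_success_rate(
    prompts: list[str],
    run_attack: Callable[[str], str],           # sends the attack prompt to the defended LLM
    is_successful: Callable[[str, str], bool],  # judge: did the response violate the guardrail?
) -> float:
    """Fraction of attack prompts that got past the defense."""
    if not prompts:
        return 0.0
    successes = sum(
        1 for prompt in prompts if is_successful(prompt, run_attack(prompt))
    )
    return successes / len(prompts)


# Hypothetical usage: an ASR of 0.9+ means the guardrails failed on 90%+ of attempts.
# asr = attack_success_rate(attack_prompts, run_attack=query_model, is_successful=judge)
```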

Last but definitely not least is one more example at the AI and API intersection. On this one, I want to take a couple of minutes to share some of my own understanding on today's episode, with the caveat that I may not have understood everything exactly correctly, so you may want to read this for yourself. But let's get into it. This is from the Microsoft Incident Response Detection and Response Team, or Microsoft DART as it is commonly known. This is a security team over at Microsoft that does a lot of research on emerging threats, so they're very often looking at new sources of data, new types of log files for new technologies, new kinds of consolidated threats, new risks in things like Microsoft 365 and Active Directory. They look at all kinds of emerging threats, and one of the key things about them is that they're staffed by a team of experts. I will say I personally had the privilege of meeting some of these people years ago, a few jobs back at another company, and got a chance to visit them on site at Microsoft. I was truly impressed with the caliber of the team. I obviously don't know how many of the same people are still in place, but I certainly would expect them to uphold a pretty high standard on that side.

With all of that background said about the Microsoft DART team: what they found was a new threat, some malware that they noticed was popping up on certain systems. In the article, they go into some detail about how some Windows application DLLs (not Windows system DLLs; I don't want to imply that these are things from the operating system, these are third-party DLLs running on the Windows operating system) were installed using some obfuscation, which is a pretty common technique for trying to mask malware on Windows systems. And they were able to dissect these things, tear them apart a little bit, and do some forensic analysis on them.

And one of the interesting things they found was that the backdoor, so to speak, or what's often called the command-and-control server, or C2 server, was not a typical IP address. Usually, what you see is either a DNS name or an IP address, and that's the C2 server that this malware will communicate with. The way to think about it is: malware exists on a number of systems, and then the C2 server coordinates the activity of that malware. Very often the malware will get dropped on my system, and it might have some local instructions like, hey, look for passwords in Jeremy's text files, or whatever the case may be. But then it'll have a second set of instructions, which is: communicate with this remote server and get your follow-up instructions on the things you want to do. Maybe that's as simple as, upload these Jeremy password files to a certain server. Or maybe it's, download some additional stuff and then run that, some more advanced malware, whatever the case may be.

And what they found that was surprising in this case was that the C2 server, so to speak, was actually the OpenAI Assistants API. Now, the Assistants API is one API feature within OpenAI's family of services, and that API itself is actually already slated for deprecation. But what's fascinating is that, when you think about this, it has the potential to unleash a new category of malware. What do I mean by that? Most of the time, malware has a limited set of instructions. It's computer code; it does exactly what it's meant to do. So you get a piece of malware on your local system, it communicates with some remote C2 server, and it gets a set of instructions. Well, that set of instructions is going to be very computer-code-ish, right? It's going to tell your system exactly what to do. In this case, though, it's an LLM that the malware is talking to. The local malware might communicate with a C2 that now has more of a context, more of a framework, a kind of standing set of instructions, and that set of instructions might be anything like: get all of Jeremy's passwords, or capture Jeremy's keystrokes, or whatever the case may be. But it's much more free-form, and to that point, it can be, quote unquote, creative. This has been proven in other instances to be something that LLMs can do, and can sometimes do very, very well. For instance, in things like capture-the-flag scenarios, LLMs have sometimes proven to be more effective than humans, because they will do additional research, find additional vulnerabilities and methods for exploiting those vulnerabilities, and capture flags accordingly.
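One defensive idea that falls out of this, sketched below purely as an illustration and not as Microsoft DART's actual detection logic: since the C2 channel is a well-known LLM API endpoint, you can flag hosts in your egress or proxy logs that talk to those endpoints without any business reason to. The log columns and the inventory of expected callers here are assumptions.

```python
import csv

LLM_API_DOMAINS = ("api.openai.com", "api.anthropic.com")
EXPECTED_CALLERS = {"build-agent-01", "data-science-vm-03"}  # hypothetical inventory


def unexpected_llm_callers(proxy_log_path: str) -> set[tuple[str, str]]:
    """Return (source_host, destination) pairs that look out of place."""
    findings = set()
    with open(proxy_log_path, newline="") as fh:
        # Assumes a CSV proxy log with src_host and dest_host columns.
        for row in csv.DictReader(fh):
            src = row.get("src_host", "")
            dest = row.get("dest_host", "")
            if any(dest.endswith(d) for d in LLM_API_DOMAINS) and src not in EXPECTED_CALLERS:
                # A server with no business calling an LLM API is worth a look;
                # in this case, that traffic was command and control.
                findings.add((src, dest))
    return findings
```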

This one was stopped, though not necessarily in its tracks; we don't know exactly how many cases there were around this, as that's not in the disclosure. But it was a really fascinating kind of thought exercise around a risk that could be emerging. One of the things we try to do here on This Week in AI Security is take real-world examples of what is actually happening, as opposed to stuff happening in a lab, and this is a case in point: these are real-world log files, real-world observations of a new attack pattern as it is emerging. So, something to keep an eye on in terms of what LLM-powered malware might look like in the future and what risks that might introduce.

All right. We went a little bit longer today, but we had some very compelling stories to share with you. Hopefully this has been another useful episode of This Week in AI Security for you. As a reminder: like, subscribe, rate, review, all that good stuff, and share the episode with people you know who are also looking at the real-world threats around AI adoption as it relates to cybersecurity in their organization. If you've got any questions, reach out anytime. If you've got a story, please send that to podcast at AI. We will talk to you next week on This Week in AI Security. Thanks so much. Bye bye.

Protect your AI Innovation

See how FireTail can help you to discover AI & shadow AI use, analyze what data is being sent out and check for data leaks & compliance. Request a demo today.