Modern Cyber with Jeremy Snyder - Episode
70

This Week in AI Security - 16th October 2025

This week, Jeremy reviews four pressing AI security issues: a GitHub Copilot prompt-smuggling flaw for data exfiltration, research proving effective data poisoning with only 250 documents, risks from AI development plugins leaking Windows NTLM credentials, and how North Korean state actors use LLMs for "vibe hacking" the hiring process with perfect, AI-generated resumes.

This Week in AI Security - 16th October 2025

Podcast Transcript

All right. Welcome back to this week's edition of This Week in AI security, brought to you by the folks over at Firetail who bring you the Modern Cyber podcast as well.

I'm your host, Jeremy, as usual. And just a quick fire episode for today. We're going to rapidly run through a couple of stories from the past week that are making the rounds in the world of AI security.

So let's start with a GitHub Copilot flaw that leaked private data from repositories. This is very much along the lines of the disclosure that we at Firetail made last week around the control characters in the Ascii smuggling issue that presented in Google Gemini in a number of other llms in this case, the exact way that the characters and the prompts, the malicious prompts were introduced into the LLM.

In this case, GitHub Copilot was slightly different. There is a comment feature within certain code repository structures and git commits that is intentionally hidden from view. But is there for documentation purposes on the code? Well, it turns out that the researchers in this case figured out that they could use that exact comment feature to kind of smuggle in malicious prompts. These prompts wouldn't necessarily be visible to somebody looking at the repository, but they are in fact processed by the copilot and the LLM behind copilot. And so what that allowed the attacker to do was to leak sensitive information and source code. So, for instance, if you put in a query that or a prompt rather that looks for secrets that might be embedded within the repository, and the copilot has access to scan the repository for those secrets, those could be exfiltrated. And in addition, you could just actually directly ask for a copy of the source code to be sent elsewhere. So a pretty similar issue, like I said, to the Ascii smuggling that we discovered last week. And I'll just kind of reiterate what I said in my commentary on that issue last week. I do suspect that there are a number of these types of scenarios where the Llms haven't really thought through guardrails around different contexts, and different kind of acceptable input formats for potentially malicious prompts.

Okay. Moving on to story number two. This was a really interesting paper from some researchers and one of the key findings. And we'll again as always have the research linked from the show notes. But one of the really interesting findings was that with as little as two hundred and fifty kind of malicious prompts, models of both training sizes of a kind of a small and a medium, and off the top of my head, I can't remember exactly how many parameters were used in the training of either size there, but one was roughly double the other. But even on the larger side, as as few as two hundred and fifty malicious documents could be introduced and that would poison the LLM. LLM. So in this case, what the researchers did was they embedded a command sudo, which is, you know, the sudo root user on Linux systems. And they trained the LLM such that whenever the sudo command was given, it should actually send data out via a backdoor mechanism. And again they found that it worked equally well with two hundred and fifty documents. Sorry. Not equally well, but it worked in both cases with as little as two hundred and fifty documents on either a medium or a large sized training set. The success rate, and this is an interesting phrase that we've been learning over the last couple of weeks, the ASR, the attack success rate. And I think we're going to see this term more frequently in LLM related research going forward. The ASR definitely goes up with uh, more, uh, training, malicious training materials. And similarly, of course, the ASR is higher on a smaller sample size. Um, so if you just use two hundred and fifty malicious documents. On a smaller sample size, you get a higher ASR, but nonetheless it was possible to kind of institute or implement this malicious backdoor using a very, very small number of poison data. So that issue about kind of poison training data is a real one, and it now has been demonstrated in the real world that this can present itself.

All right. Moving on. Story number three for this week, untrusted actors ushered in AI development plugins. So this is a really interesting story that kind of sits at the intersection of, let's call it Agentic and MCP related risks around giving authorization permissions to a quote unquote, agentic system that you might have, um, undertake tasks on your behalf in a development environment and then a little bit of supply chain risk. And, um, I won't go into too much detail on this story. Again, we'll have the the story linked from the show notes, but suffice it to say, this is one of those stories that makes you think about whether, uh, really, you want to be giving a lot of permissions at this early stage in the kind of agentic AI adoption game. The short version of it is, is that, um, a man in the middle who's able to kind of intersect the prompts going out can see the system prompts that are embedded in some of these plug ins that aren't built with the kind of, let's say, encryption in place that would prevent a man in the middle from viewing the system prompt context of what's going on now with that, armed with that system prompt, uh, context and knowing what exact credentials and permissions the Agentic, uh, coding agent has that allows you to then query it and potentially exfiltrate secrets down to the level of when to windows NEM credentials stored on the system. So a little bit of a threat on that side. It just goes to show like at this early, early stage, you should be really vetting the providers that you're partnering with. Um, pretty pretty closely on what kind of guardrails they have in this phase of of agentic AI adoption closing out this week, something that I think has been widely reported.

But there was some recent research from a group called Air Street Capital out of the UK. They just released their annual state of AI. You can find that at just state of AI. It is a lengthy report, but it covers a broad range of things across the LLM in generative AI landscape. Some very interesting insights. There was one kind of takeaway around some security issues. Now, if you've been following Firetail for a while, you may know that a couple of years ago, we interviewed a candidate that we strongly, strongly suspect of having been a North Korean state actor of some type. Probably not specifically targeting Firetail, but targeting cybersecurity companies in general and part of the overall kind of workforce that is out there trying to, you know, just get employment with Western companies to generate cash flow and income for the North Korean state. So one of the things that we've heard a lot recently is that the quality of resumes and the quality of responses from some of these candidates has gone dramatically up. It's now well attested that in this case, Claude. But I think, broadly speaking, many, many different llms have been used for this purpose, have been used in order to produce better resumes, produce better responses in interviews, in order to interact better in these kind of situations. Everything from, you know, very, very targeted, tailored resumes specific to an organization or a job function or something like that, to doing background research to really kind of tailoring the candidate for maximum positive reception and exposure to the potential employer. So they call this vibe hacking. I don't know if that's really the right term to think about it, but just know that, you know, effectively. Whereas things like, you know, phishing emails and phony candidates in the past might have been easily spotted by just looking at the quality of the text and the quality of the documents presented, that's no longer true. You know, llms really enable malicious actors, in this case the North Korean state, to present very, very perfect looking candidates. And so just more confirmation of something that we've known for a little while. For those that are interested, we'll also try to remember to link our interview and our own experiences on this topic from the show notes.

But that's it. That'll wrap up today's episode. As always, if you've got stories you want us to feature on next week's This Week in AI security, please send them our way. You can contact us by any of those things. All the usual things. Subscribe. Share this episode. Rate review all that good stuff. Talk to you next week. Thanks so much.

Protect your AI Innovation

See how FireTail can help you to discover AI & shadow AI use, analyze what data is being sent out and check for data leaks & compliance. Request a demo today.