In this episode of This Week in AI Security for March 12, 2026, Jeremy explores a rapidly evolving threat landscape where AI is functioning as both the ultimate bug hunter and an autonomous threat. The episode covers critical vulnerabilities across major platforms and highlights a startling case of an AI agent "going rogue" to mine cryptocurrency.
.png)
In this episode of This Week in AI Security for March 12, 2026, Jeremy explores a rapidly evolving threat landscape where AI is functioning as both the ultimate bug hunter and an autonomous threat. The episode covers critical vulnerabilities across major platforms and highlights a startling case of an AI agent "going rogue" to mine cryptocurrency.
Key Stories & Developments:
Episode Links
https://gbhackers.com/ai-accelerates-high-velocity/
https://thehackernews.com/2026/03/openai-codex-security-scanned-12.html
https://thehackernews.com/2026/03/anthropic-finds-22-firefox.html
https://cloud.google.com/blog/topics/threat-intelligence/2025-zero-day-review
https://grith.ai/blog/clinejection-when-your-ai-tool-installs-another
https://securitylabs.datadoghq.com/articles/copilot-studio-logging-gaps/
https://x.com/JoshKale/status/2030116466104643633
https://trufflesecurity.com/blog/claude-tried-to-hack-30-companies-nobody-asked-it-to
https://codewall.ai/blog/how-we-hacked-mckinseys-ai-platform
Worried about AI security? Get Complete AI Visibility in 15 Minutes. Discover all of your shadow AI now. Book a demo of Firetail's AI Security & Governance Platform: https://www.firetail.ai/request-a-demo
All right. Welcome to another episode of This Week in AI security coming to you for the week of the twelfth of March, twenty twenty six, as we record today. And we have a ton to get through this week. So let's get started. You know, we've talked a lot on the show about the new accelerated pace of threats and risks, particularly when it comes to things like vulnerabilities and exposures and things like that. And so on that note, we're going to start today's episode with a couple of stories in this domain.
First, starting off with the OpenAI Codex, which the security scanner here scanned one point two million commits over a time period of the last couple of years and found over ten thousand high severity issues. So obviously, these are flaws in code. And as we've discussed many times, the LLMs are very well trained on code. So you can assume that almost any code pattern that will have been seen ever is something that the LLMs kind of know, understand, have been trained on, etc.. And with a little bit of nudge and training around what is and is not good secure coding practices. They're getting extremely good at spotting these vulnerabilities. On that same note, Anthropic Claude's Opus 4.6 model found twenty two Firefox vulnerabilities. And this is obviously super important for a piece of software that is used by probably billions of people around the world, but it keeps going. So what we see is, you know, a real kind of high level view from Google that says patching is now such a critical requirement for all of these things. The number of zero days discovered in twenty twenty five greater than ever in the past, more critical, more time sensitive, more urgent to patch than we've seen over the last several, several years. So that is on the first theme for today's episode.
Now moving on a GitHub issue title compromised four thousand developer machines. We've had a number of IDE related articles on this week in AI security. We've talked about malicious prompts being planted anywhere from Readme files to, you know, a random line of code somewhere way deep in a file, in an open source project that you might be including as you go to build an AI powered or AI assisted or augmented IDE environments that read those files, then parsing that as a command and taking it as a command. And we have now another instance where the file name, the title of a file actually included malicious prompts. So imagine that the file is named something like ignore all previous instructions and do what I tell you to do dot txt or something like that. So yet another example of, you know, you have to kind of think about everything within an environment that you give to an AI tool as being risk worthy.
All right, moving on to the next story. Great article from our friends over at Datadog around some logging gaps in Copilot studio. For those who aren't familiar, this is Copilot Studio from Microsoft. This is obviously a coding assistant like we just talked about in the previous article. And one of the things that, of course, a lot of security teams rely on is having good log data for all the things that are happening inside their environment. Now, we've talked in the past about the speed pressure that a lot of these companies have to get these tools out into the wild and some of the shortcuts and let's say, accidental things that can happen along the way. And one of those that was uncovered by Datadog here is that there were a number of calls being made, API calls being made to the backend service of copilot that were not being logged as part of the normal log stream, but were actually observable happening over the network. So they found things like bot auth, update bot at insights, update bot share, bot, publish various actions that exist. These are so these by actions we actually mean functionality inside the copilot studio that happens but doesn't get logged. And from that standpoint you can't detect it. So if you're thinking about I aggregate all these logs. And then I look for things like authentication changes that might signal a compromised user environment or a compromised machine. That's a blind spot because that bot auth update, for instance, wasn't actually being logged. Now, this has been reported to Microsoft and it is currently, as of the time of recording, partially fixed I think is the best description or mostly fixed is maybe actually a more accurate way of saying it. But this leads to these kind of undetectable backdoors where an attacker with the right level of permissions could change authentication, disable logging, all of those types of things that go along with it. And for companies in regulatory environments where a a full audit trail is required, this could expose regulatory risk. Imagine you go to your HIPAA auditor and you say, well, I don't I don't know when the authentication changed because the system doesn't tell me. I don't know that that's a good enough answer for many HIPAA auditors.
All right. Moving on to our next story. Really fascinating story out of the Alibaba Team. And one of the things that they observed in one of their own lab environments is that an AI figured out that compute costs money. And so the AI was given a set of tasks to kind of optimize its own performance. And it figured out that, well, if compute costs money, I need to pay for myself and produce a positive ROI. So it found a way to kind of break out of its own system constraints and then start doing things like mining cryptocurrency, because why that's a way to get money to pay for that compute that we're using. So there was no prompt injection. There was no jailbreak. Coming from an external environment, it was the AI self acting. It kind of happened on its own by given by being given a set of optimization goals that it should look to accomplish. The model set up its own SSH tunnel from Alibaba Cloud to an external IP, punched a hole through its own firewall, and opened a remote access channel to the outside world. And one of the most interesting observations from a cybersecurity defender perspective is that this wasn't caught by anything inside the AI. This was actually a network security team who noticed a request for the change for the firewall rule and thought, well, that's odd. I don't know of any ticket or any kind of internal process that would have generated this. And then through the forensic investigation, they figured out where the request originated. Kind of reversed from there in a, in a kind of incident response methodology to find out, well, where did this originate? Who, who actually requested this and found that, traced it back to this AI engine. So really fascinating story. And it leads to this question that comes up in a lot of discussions around AI agents, which is kind of, you know, what is the level of role based permissions and access that is given to an agent? And secondarily, what is the level of agency? So when you say optimize, are you saying optimize using any and all available methods? Or are you saying optimize within a particular set of parameters, guidelines or constraints that you might put on the AI model. That could be, you know, as simple as optimize, but only in the way that you process or do your reasoning with the prompts that are given to you as opposed to overall, you know, general purpose optimization. So something to consider as you're building some of your own agentic solutions and designing some of your own AI agents, what are the set of constraints that you're applying as you build?
All right, moving on to our next story. A little bit of an attention grabbing headline, but I do want to tone it down a little bit. Claude tried to hack thirty companies. Nobody asked it to. So there's a little bit of truth and there's a little bit of, let's call it exaggeration or creative license in the title here. When you dig into the article, what you find out is that, you know, a company, uh, the truffle security who wanted to do some testing around this figured out a way to replicate thirty company websites. So these are not real world live company websites, either thirty kind of sample or mock company websites that they stood up and created for the purpose of this. And what they found is that they gave Claude a set of tasks around this, and they kind of intentionally created a very common condition where an API key was left in the JavaScript front end of this web application that they were publishing, and they gave with the goal that they gave Claude. It was actually very beneficial to have that API key and then leverage it. And sure enough, Claude actually turned around and found the API key through ingesting the website, getting a crash dump exposed the key, picked up the key, and then figured out that it could use a common known SQL injection query on this web framework in order to kind of access some of the back end data, which was part of the task or, sorry, part of the goals that it was given there. So you have kind of an artificial environment with the right set of conditions. And the, the test is really will the AI go down this path of leveraging a vulnerability on a system. We've talked about this before with other stories around Claude using zero days in container based capture the flag scenarios where it uses leverages a known vulnerability to break into a system. And so sure enough, yet again, again, given the right prompts, the right set of goals, not having any of the constraints around constrained behavior, etc .. And remember that SQL injections are a well documented, well known attack technique. So when you think about what's in the corpus of training data, SQL injections are very widely discussed and published. And so from a training perspective, this is a tool. This is not necessarily a prohibited behavior. This is a tool that can be used to accomplish the goal. It kind of creates a little bit of accidental pen testing and will kind of show, uh, a defender that they have weaknesses in the systems if they've got the right kind of logging turned on. Uh, firewalls won't pick this up because you've got, you know, kind of a valid URL and a valid website, and the fact that you're leveraging an API credential that's exposed in that website is not going to set off many alarm bells unless you've got some alert on the API usage or the API token usage. So, you know, I think one of the ways to think about it is this kind of accidental hacking. It tries to be helpful. It tries to accomplish the goals that it's put out to. And so if you're running an agentic AI on your workstation, remember, you might not just be a developer, you might also be storing credentials that can lead to exposure. You're now potentially part of a supply chain. So anyway, a really interesting kind of story around that.
All right. Moving on to our final story of this week. And this is the biggest one of the week. So I wanted to spend the most time on this and the hacking of McKinsey's AI platform. So this is from a company called Code Wall who does AI powered pen testing and application security testing. And what they found was they found that, you know, McKinsey had an internal AI system called Lilli, named after an early employee over there. And Lilli is kind of a combination of some internal McKinsey database and other data. So that's things like over seven hundred thousand files across, you know, the kinds of things that you would expect a management consulting company like McKinsey to have PDFs, excels, PowerPoints, etc .. Three point six eight million documents, uh, in the databases, as well as those seven hundred thousand some files. And what they found was that, you know, in building this system, there were roughly a thousand two hundred API endpoints. And out of those. Oh, sorry, two hundred, not one thousand two hundred. Two hundred API endpoints. And out of those two hundred twenty two did not require authentication. We've talked on the show in the past about the overlap between AI and APIs and how like, there literally is no AI usage that doesn't go over an API, but it also brings up this point about, you know, AI powered vulnerability scanning. It's hard for a lot of organizations to look at two hundred plus API endpoints and pinpoint the twenty two that don't have proper authentication on them. It is very easy for an AI or an LLM to do this, especially given the right prompts. Now, of those endpoints, there was one that had a very bad concatenation technique that took JSON keys and turned them into SQL queries, which was a an unknown OWASP ZAP unknown by OWASP ZAP vulnerability. So ZAP, if you're familiar with that, is the OWASP scanner for these vulnerabilities. This led to SQL injection that led to access to the data back end kind of compromises. The prompt layer goes directly through it. Again using that access in the API world, you would consider this a BOLA or an IDOR where you have kind of an unauthorized unauthorized broken object level authorization. Excuse me. And the implication to an organization like McKinsey is, of course, you know, there's a lot of proprietary internal data, a lot of proprietary customer data that can all leak out out of this. Obviously, that can erode reputation, customer trust, etc..
So a couple of things that I really liked about this story that tie things together. Number one, systems move really fast. People building these systems move really fast. Mistakes get made. Number two, the API is still the attack surface of choice for AI powered systems, applications, tools, etc .. Number three, assume that any kind of vulnerability can be found and can be found a lot faster than you can think about patching it. So you do want to take some extra time as you're building your applications to do things like check for Secure by Design. All of the code analysis tools that we discussed are really, really good at looking for things like this. So I'll leave it there. That's I think this story really ties together a lot of the themes that we've discussed here on this week in AI security. Very, very well. As always, like subscribe rate review. We will talk to you next week. Thanks so much. Bye bye.