Modern Cyber with Jeremy Snyder - Episode
104

This Week in AI Security - 23rd April 2026

In this episode for April 23, 2026, Jeremy explores a week where "first principles" in security are being forgotten in the rush to adopt AI.

This Week in AI Security - 23rd April 2026

Podcast Transcript

All right. Welcome back to another episode of This Week in AI security. We have a ton of stuff to get through this week. We are coming to you for the week of the twenty third of April twenty twenty six. We're going to fly through a couple of these stories. We've got a couple of things, updates from stories from previous weeks. Let's get started.

First, a couple of stories from the legal side. An Oregon lawyer fined ten thousand dollars for AI, quote unquote, slop. You know, these are hallucinations coming out of court filings. Legal precedent that AI error does not excuse professional negligence or lack of oversight. This has also been cited before. I would draw a parallel back to the Air Canada story from a couple of weeks back. You know, if you are using AI in your professional duties, you are liable for any usage or any kind of mistake in usage that comes out of that.

Still in the legal domain, we've got an Oregon winery legal battle centering on AI-generated evidence and automatic strategic business advice. AI hallucinations are shifting from being kind of quirks to being something that will come up again and again. And you know, the real kicker from our perspective is that you have to audit the AI-derived data, and you also need to have a human-in-the-loop review process when you're thinking about this.

Moving on, we've got a story about employee privacy. There are reports coming out of TechCrunch and other media sources today that Meta is going to start capturing employee keystrokes to refine future internal AI training sets. And the motivation here is that, you know, if we want to train AI agents in the way that humans work, we need to kind of monitor humans at work to understand exactly how they work and use that as the basis. Now, there's obviously some insider risk if training data captures sensitive passwords that can get incorporated into data sets that are part of the training and could potentially be regurgitated later by the LLMs. There's some internal friction around this, around developer and corporate privacy. Be interesting to see how this plays out.

Moving on to our next story, we've got a report out of CSA, the Cloud Security Alliance. And a survey revealed that eighty-two percent of enterprises have unknown AI agents operating without security oversight. This is really along the lines of kind of the shadow AI as the new shadow IT. But now, you know, taking the next step, going from simple user usage in a browser-based LLM context, like a ChatGPT web environment to really kind of the next wave of that, we are bringing kind of personal productivity agents into the organization and then unleashing them. And again, you know, we're already creating this kind of security visibility and awareness gap with shadow IT in general and in the AI domain in particular. It's happening at a very, very rapid pace. And again, transitioning from the kind of individual employee into the more agent and agentic workflow and personal productivity automation sphere.

All right, moving on. There is a report of a data breach at an application hosting an application generation company called Vercel that I think will be familiar to many of our audience. Customer data has been exfiltrated. This has now been confirmed. This turns out to be with a third-party AI analytics partner. So I would categorize this under the whole, you know, let's say supply chain connected ecosystem. Again, it may not be the AI tool, it may be the tools to support AI usage that are rushed out in a hurry and have particular vulnerabilities, or open you up to third-party providers where there are risks across that third-party partner that you might be working with.

Moving on to our next story, we've got a couple of stories around a reported design flaw in the Model Context Protocol, or MCP, which is a popular concept in AI tooling. And you can think of it as a way to wrap an API or an external-facing interface that other services can use to integrate with your service. Wrap that with a human language LLM-powered layer, right. So instead of me having to construct a very structured JSON-formatted REST API call, I can use more human language. "Tell me about all of the prospects in my CRM database who match a certain set of criteria." As an example of a human language query that might need to be codified into a JSON object as part of a REST API call.

And what it turns out here is that the AI tools can be coerced into executing malicious commands via standard MCP requests. So just as we have things like prompt injection or indirect prompt injection in IDEs and any number of other environments that we've covered here on this week in AI security, we've got now similar things kind of popping up in the MCP environment as well. And it looks like these tools can be coerced into executing malicious commands via standard requests.

Anthropic, reportedly at the time of publication, has said that this is actually a design choice in the MCP protocol, and Anthropic is actually the originator of the MCP. And so, you know, there are some risks that any organization might be jumping into by unleashing an MCP interface to any of their systems through things like malicious actors or whether it be unintentional activity of agents that are connecting to MCP. That is one of the primary use cases of MCP—that if you have an AI agent that doesn't interact over REST, but interacts more over kind of prompt and response interpretation, the MCP is the logical interface for them to use because it is, again, human language structured.

This flaw also puts things like NGINX at risk, where MCP very often is built on kind of a translation layer between the human language and the back end over NGINX. And AI requests could probe these back-end systems to discover vulnerabilities in the APIs that are covered by NGINX at the back end. And that kind of hardened infrastructure becomes vulnerable when connected to these MCPs that might bring some of that inherent risk in there. And so all of these stories together kind of really suggest to me that the way to think about this is, you know, we've got security best practices around APIs, and we've had those for a long time, and they rely on things like authentication and authorization controls, as well as things like input validation and sanitization. Don't leave those first principles behind in the rush to MCP.

All right, moving on. There is a report that there are a rising class of so-called LLM routers. And these are tools that are designed to kind of help you connect system one to system two. You can think of in the past systems like Zapier that were kind of like API orchestration. And these LLM routers—the report is showing that some of these orchestration layers fail to validate if the commands are authenticated and authorized commands, or if there are prompt injections embedded in them. And similar to that, there are reports that some of them are malicious and malicious ones are coming online doing things like siphoning off credentials and tokens, and then sharing those to threat actors. There was one OpenAI key that was deliberately used in the clear in one of these environments, and then was used to rack up a bunch of token utilization on the OpenAI platform.

Moving on, we've got a couple of stories around again, you know, back to IDE environments and indirect prompt injection. In this case, it is actually comments on repos and comments on commits. And those are turning out to be the latest environment where researchers have proven that introducing malicious prompts will get executed. We've got something called "Logjack." This comes from some academic researchers, notably a gentleman named Harsha. And in this case, it is actually malicious prompts in AI logs that are being analyzed. And so if you're thinking about using AI to automate the process of log analysis, well, what happens if there's a malicious prompt in one of those? And then will the LLM actually execute based on what's in there?

And then last but not least, a story around Microsoft Copilot Studio and a prompt injection vulnerability in there. In this case, the indirect prompt injection came from filling out forms on a SharePoint site and then using an LLM to process those forms. This is in conjunction with Copilot Studio and connections that a lot of organizations will have if they're heavily invested in the Microsoft ecosystem. They use Copilot Studio, they use SharePoint as an internal documentation system, etc. It's again, it's all on that theme of, you know, if you are using AI or LLMs in particular to process large volumes of text, there is a high possibility that malicious text in any one of those environments can have a bad or negative outcome.

Moving on to our next story, which is arguably one of the bigger stories of the week. And this is around Anthropic's Mythos, which we've talked about now a couple of different times. We've got confirmed reports as of the time of recording, that an unauthorized group has gained access to Anthropic's Mythos model. How did it occur? Well, it turns out that Anthropic uses an API naming convention system around different endpoints and around the models that are behind the endpoints. And so this was literally just a set of guessed endpoints.

The unauthorized actor apparently looked at the previous families of models in the Claude family—Opus, Sonnet, etc.—and said, "Well, okay, well, these API URLs look like this. What are the URLs going to look like for Mythos?" They attempted a bunch of them and they were out there. And it's really interesting because Anthropic has been very, very cautious and careful about the rollout of Mythos. But then to leave this actually up in production without any kind of authorization checks on who is using those APIs is one of those basics that should be in there. Not only can people who know the API URL use it, but double check, are they authorized to use it?

From an API perspective, we would probably label this under Broken Function Level Authorization or Broken Object Level Authorization. I don't really have a dog in that fight, but it is a failure of authorization for some APIs that are already put out there. One other thing—I think that the key difference between some of the small language model (SLM) research and the Mythos research is that the small language models were used to reproduce the results of Mythos. Mythos was given a broader set of tasks like, "Hey, go find vulnerabilities," whereas the small language model was used to go validate vulnerabilities that were either known or hypothesized by Mythos already. It was Mythos that did all of the thinking and postulating around where to look for those vulnerabilities.

And finally, last story for the week. Every old vulnerability is now an AI vulnerability. This is not new AI vulnerabilities; this is just AI finding vulnerabilities on steroids. The upstart Lovable, which is a vibe coding platform, had a data breach. It looks to be confirmed. It is a result of APIs and Broken Object Level Authorization on those APIs. There is no usage of AI without API. Remember that thinking as you're thinking about first principles for some of your own initiatives. That was a busy week. Tons of stuff. Thanks for listening. We'll talk to you next week. Bye bye.

Protect your AI Innovation

See how FireTail can help you to discover AI & shadow AI use, analyze what data is being sent out and check for data leaks & compliance. Request a demo today.