In this episode, Jeremy explores how autonomous model execution is completely upending classical software patch cycles and regulatory risk modeling. From Anthropic’s early access model mapping out thousands of real-world vulnerabilities autonomously to state regulators drawing a hard line on frontier safety, enterprise security is in a state of rapid transformation.
All right. Welcome back to another episode of This Week in AI Security, coming to you for the week of the twenty eighth of May twenty twenty six. We actually have a little bit shorter than usual episode in terms of new stories to report on. So I want to take a little bit of time at the end of today's episode to talk about a couple of larger, let's say, macro or meta level themes that are coming up more and more.
So let's get started by running through a couple of the tactical stories that are on themes that we've covered for the last couple of weeks. For the last several weeks, of course, the whole industry has been talking about Anthropic's Mythos model family and, you know, its insane cybersecurity capability, specifically the ability to identify vulnerabilities by scanning source code. So about fifty partners have been working with it for the last month or so in early access. And across that, it's now reported that more than ten thousand vulnerabilities have been identified, including a number of critical and very high severity items across about a thousand open source projects. One thousand seven hundred and twenty six confirmed true positives so far. And so while there has been a lot of talk about like, hey, this is overblown or hey, you know, smaller models can find the same, I think again, the theme to really focus on is that this is autonomous scanning that is happening all on its own without a lot of direction or human intervention.
The marquee find so far is a CVSS score of nine point one on a CVE in wolfSSL, a certificate forgery that lets attackers masquerade as legit services. We've had ninety seven findings patched upstream. So this is, you know, just doing a lot of this kind of source code scanning and helping the industry build more secure and better software. That's generally a positive, but it does leave this window of opportunity where software that's already running might get scanned, and might get scanned by threat actors as well.
Moving on. There's a proof of concept from a Vietnamese security firm called Khalif, who apparently got access to this preview. They built a public macOS kernel memory corruption exploit on Apple M5 silicon that bypasses something called Memory Integrity Enforcement, or the hardware MTE module, and this was built in just five days. It chains together two kernel vulnerabilities for local privilege escalation. And this was against this Memory Integrity Enforcement module that Apple reportedly spent billions building. And so that's a really interesting, you know, advanced malware capability. Something to keep an eye on: is this a one off proof of concept from a security firm that has the expertise? Or is this a capability that might end up in broader hands, that, you know, anybody might have access to?
All right. Moving off of Mythos and vulnerability scanning onto a couple of the other areas that we've touched on any number of times, and that is around supply chain and the infrastructure used to power AI. And so a vulnerability that is being dubbed "bad host" was found in something called Starlette, which has about three hundred million downloads per week. It's the routing core of the FastAPI framework, and it underpins basically the entire Python AI infrastructure stack. Chances are pretty high that if you're running Python based AI workloads, meaning like hosting your LLM or hosting the application that talks to an LLM on a Python environment like this, with a lot of direct pass through of prompting, etc., you may well be using the Starlette package, whether directly or inside FastAPI. That would be something to watch out for. The patch is just a bump to 1.0.1. Two things that we have talked about on the show any number of times. One is, remember, it's not always the AI itself, it's the infrastructure around the AI. And number two is it's always the API. And so just remember those things as you think about building stuff on your own.
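As a rough illustration of the "bump to 1.0.1" advice above, here is a minimal Python sketch that checks whether the Starlette installed in the current environment is at or above that version. The 1.0.1 threshold comes from the episode; the helper names (`parse_version`, `is_patched`, `installed_starlette_version`) are hypothetical, and the version parsing is deliberately crude (it ignores pre-release suffixes), so treat it as a starting point rather than a real audit tool.

```python
from importlib import metadata

# Assumption: 1.0.1 is the patched Starlette release, as stated in the episode.
PATCHED = (1, 0, 1)


def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1). Crude: drops any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '1.0' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text, minimum=PATCHED):
    """True if the version is at or above the assumed patched release."""
    return parse_version(version_text) >= minimum


def installed_starlette_version():
    """Return the installed Starlette version string, or None if absent."""
    try:
        return metadata.version("starlette")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_starlette_version()
    if found is None:
        print("starlette is not installed in this environment")
    elif is_patched(found):
        print(f"starlette {found} is at or above the patched version")
    else:
        print(f"starlette {found} is older than 1.0.1, consider upgrading")
```

Because FastAPI pulls Starlette in as a dependency, running this inside the same virtual environment as the application covers the "inside FastAPI" case as well.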
Again, on the subject of supply chain, there was a GitHub breach over the past week that came from Team PCP, who apparently teamed up with the Lapsus$ group. They offered breach data for sale for ninety five thousand dollars. Customer repos on GitHub were reportedly not affected. Investigation is ongoing, and it has been reported that this compromise happened via a compromised GitHub employee's device with a malicious VS Code extension. So we've talked about IDE environments. We've talked about the proliferation of new extensions and add ons and tools around those IDE environments, and how a lot of them have already been shown to have malicious content inside them. And then, of course, the ability to take over repositories or environments has been a key risk that's been popping up again and again recently in this fast moving area.
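One simple control for the extension risk described above is to compare what is installed against an approved list. The sketch below parses the text that `code --list-extensions` prints (one extension ID per line, optionally followed by `@version` when `--show-versions` is used) and reports anything not on the allowlist. The function name, the allowlist contents, and the sample listing are all hypothetical.

```python
def unapproved_extensions(listing, allowlist):
    """Return extension IDs from a `code --list-extensions` listing
    that are not present in the allowlist (case-insensitive)."""
    allowed = {item.lower() for item in allowlist}
    found = []
    for line in listing.splitlines():
        # Drop an optional '@version' suffix and normalise case.
        ext_id = line.strip().split("@", 1)[0].lower()
        if ext_id and ext_id not in allowed:
            found.append(ext_id)
    return found


if __name__ == "__main__":
    # Hypothetical listing and allowlist, for illustration only.
    sample_listing = "ms-python.python@2024.2.0\nunknown-publisher.shiny-theme\n"
    approved = {"ms-python.python"}
    print(unapproved_extensions(sample_listing, approved))
```

In practice the listing would come from running the CLI on each developer machine, and the result would feed a review process rather than block installs outright.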
All right, next, moving on to a little bit more of a think piece from TechCrunch: "Everyone is navigating AI security in real time, even Google." I think the interesting thing to talk about here is, you know, they cite a couple of data points that we've talked about any number of times on the show: the zero day clock and how, you know, the time on discovery of zero days has come down so dramatically over the past several months, and how that threat, the threat of vulnerable software, is out there. We talked last week about the Verizon DBIR reporting that for the first time ever, vulnerability exploitation has now surpassed phishing and employee compromise as the number one attack vector for attackers to make their way into your organization.
There's also a lot of talk in this article about, you know, data archaeology. There's a saying that, hey, data is the new oil, that's the thing that's going to power AI for you, and what's going to make your AI investment so powerful is your ability to couple your data with the AI that you're using. Well, that also helps to uncover long forgotten things like, you know, legacy data repositories sitting inside your environment. Maybe you've got an old SharePoint server that has a lot of sensitive customer or organizational information inside it. Nobody really thinks about it because maybe the org has migrated a lot of internal documentation or what have you.
So a lot of things like that are in this article, other things as well, things around Google API keys. We've talked about that a couple of times on the show where, you know, API keys that were assumed to be relatively safe in terms of using them to embed things like Google Maps on your own company address website, just like we do here at FireTail, have gotten an escalation of privileges and permissions into the API scopes, and how that is now also a challenge. And older API key infrastructure can still take up to twenty minutes to invalidate. So there are reports that, you know, if your key is out there and it gets compromised, even from the time that you go to shut it down, there may be a twenty three minute window that still allows your key to be used. And so there are a couple of examples of organizations where these insecure, leaked API credentials have racked up ten thousand dollars in usage in thirty minutes before spending limits, rate throttling or API key deregistration can kick in. You know, even the COO of Google Cloud has confirmed a lot of these challenges. So just to kind of, you know, I don't know if reassure is the right word, but kind of confirm to a lot of our audience: if your organization is struggling with the speed of this, just bear in mind you're not alone.
All right, moving on to our last kind of thematic topic of the day. I've got a couple of stories from the policy and regulation ecosystem that I want to talk about to close out today's episode. The first of the stories in this category is from the New York DFS, which is the Department of Financial Services. New York obviously is a huge financial services hub, and in a lot of people's opinion the DFS kind of acts as the default initial regulator. Then later the SEC or other organizations might kick in as the actual regulator from a federal perspective. But the DFS as a state regulator really just kind of acts as an initial source for a lot of regulation around this.
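To put the figures above side by side, here is a small back-of-the-envelope calculation in Python. It takes the reported ten thousand dollars in thirty minutes as a spend rate and asks how much could accrue during a revocation delay of twenty or twenty three minutes. The assumption that spend accrues at a constant rate is mine, not the article's, and the function name is hypothetical.

```python
def exposure_during_revocation(spend_usd, spend_minutes, revocation_minutes):
    """Estimate spend accrued while a revoked key is still accepted.

    Assumes a constant spend rate, which is a simplification.
    """
    rate_per_minute = spend_usd / spend_minutes
    return rate_per_minute * revocation_minutes


if __name__ == "__main__":
    for window in (20, 23):
        cost = exposure_during_revocation(10_000, 30, window)
        print(f"{window} minute window: about ${cost:,.0f}")
```

Under that assumption, the revocation delay alone accounts for roughly two thirds to three quarters of the reported loss, which is why spending caps and rate limits matter as a control independent of key rotation.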
So there was a letter on May twenty first sent out to industry leaders. It talks about the heightened cybersecurity risk from frontier AI models, and talks a little bit about defensive measures. And that makes DFS the first state regulator to formally treat frontier AI as a cyber threat category. But I want to dive into a couple of things here on this point, because it's not a universal "oh my gosh, frontier AI is a problem". No, no, the letter is very specific in the things that it calls out.
Number one is, of course, expedited vulnerability management. This ties back to what we've already talked about, you know, with the ability of threat actors to find vulnerabilities super quickly and to build exploits super quickly. Then there's coordination of third party service providers and securing material downstream dependencies. This really talks to your supply chain, again a theme that we've talked about. One of the latest really good pieces of guidance that I saw around this is, you know, never use latest. And I think that is pretty well known at this point. But I saw another piece of feedback from a gentleman called Dan Guido, in a talk that he gave at the Unprompted conference earlier this year, who said that internally they decided that the right place to draw the line is that they are never going to use latest. Everything that they use must be seven days old or around that time frame. And so think about that as: any supply chain exploit will come out within that time period. Seven days is recent enough that it shouldn't have core vulnerabilities baked into it, but old enough that, you know, we know that there's no repo compromise or supply chain compromise risk around it.
Strengthening the security of programming practices again ties into the ability of LLMs to find vulnerabilities in your software. But the last point that is called out in the letter is monitoring and prompt reporting. And this is an area where I definitely would say that we at FireTail hear a lot of challenges from organizations that we talk to. Shadow AI is a huge problem within most organizations right now, and you as an organization are liable for the usage of AI that happens inside your organization. That does extend down to the level of the individual employee, the individual employee prompt, the individual employee uploading a file that might have data that shouldn't go out there. And, you know, if you already have shadow AI, meaning you don't know who's using what where, chances are very high that you don't have a mechanism for monitoring or tracking employee prompts across the environment.
So it'll be really interesting to see if, from this one state, this turns into a broader state by state kind of rollout. Or, because this is the Department of Financial Services, does this go from the New York financial services regulator to the federal financial services regulator? Do other jurisdictions outside the US start to copy some of these or, you know, issue similar directives to regulated organizations within their spaces? That'll be a really interesting thing to keep an eye out for. So watch this space for more regulation coming around industries or geographies.
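The "nothing newer than about seven days" rule described above can be sketched in a few lines of Python: given each release's upload time, pick the most recently uploaded version that has already aged past the cooldown. This is only an illustration of the idea, not the tooling used by the team mentioned in the talk; the function name, the seven-day default, and the sample release data are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def newest_aged_release(releases, now, min_age=timedelta(days=7)):
    """Pick the most recently uploaded release that is at least min_age old.

    releases: dict mapping version string -> timezone-aware upload datetime.
    Returns the chosen version string, or None if nothing is old enough.
    Note: "newest" means latest upload time, not highest version number.
    """
    eligible = [
        (uploaded, version)
        for version, uploaded in releases.items()
        if now - uploaded >= min_age
    ]
    if not eligible:
        return None
    return max(eligible)[1]


if __name__ == "__main__":
    # Hypothetical release history for an imaginary package.
    utc = timezone.utc
    history = {
        "2.3.0": datetime(2026, 5, 1, tzinfo=utc),
        "2.4.0": datetime(2026, 5, 20, tzinfo=utc),
        "2.4.1": datetime(2026, 5, 26, tzinfo=utc),  # too fresh
    }
    today = datetime(2026, 5, 28, tzinfo=utc)
    print(newest_aged_release(history, today))
```

One trade-off worth noting: a strict cooldown also delays urgent security patches, so a real policy would likely need an explicit exception path for fixes like the Starlette one discussed earlier.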
And moving on to our last story of the day. So this last story was a draft executive order that would have given the government voluntary ninety day pre-release access to frontier models for security testing. Treasury, NSA and CISA would create classified benchmarks for, quote unquote, covered frontier models. And the executive order was pulled just hours before it was supposed to be signed. The reports say that the framework could slow innovation and give China an edge, and that is reportedly the reason why the executive order was pulled so last minute.
There's other speculation that some of the optics around the signing ceremony and event weren't playing out the way that the White House wanted, and that was a contributing factor in why it was canceled. The net result of it is status quo, but what it does speak to is a lot of uncertainty and kind of, let's say, geopolitical implications around the development of AI around the world. You know, was this really pulled to keep a perceived advantage for America? Was it pulled because of optics? Was it pulled because there was something that would have slowed the industry down? It's really impossible to say.
But where it leaves a lot of organizations is thinking like, okay, I'm getting very little guidance, whether from state or industry specific regulators, and certainly not from the federal level. You know, how do I go forward as an organization with any level of confidence? The business demand is so high that, you know, you as a security or, let's say, GRC professional kind of have to find a way to say yes to AI within your organization. But then, saying yes using what risk mitigation guidelines, using what set of technical controls? And from talking to a lot of people in the industry, I actually think that the industry right now would welcome more clarity around the direction of regulation or regulatory oversight that might be coming into play. Really interesting article. You can draw your own conclusions about why it was pulled. But again, the status that it leaves us in is a little bit of limbo and a lot of uncertainty. So push forward, but maybe you need to have the ability to stop quickly. Unclear at this point.
All right. We'll leave it there. I guess a little bit shorter than usual for this week, but a couple of think pieces there to close out the week. If you've got any stories, please send them our way. Otherwise we will talk to you on the next episode of This Week in AI Security. And if you have been listening for a while now, I would appreciate it if you could share this episode, like, subscribe, all that good stuff. Talk to you soon. Thanks. Bye bye.