Modern Cyber with Jeremy Snyder - Episode
93

Caleb Sima of WhiteRabbit

In this episode of Modern Cyber, Jeremy is joined by cybersecurity veteran Caleb Sima for a deep dive into the practical realities of securing AI inside organizations. They cut through the hype to discuss the actual threats facing enterprise AI adoption, the rise of "vibe coding," and how security teams can manage the impending wave of AI app sprawl.

Caleb Sima of WhiteRabbit

Podcast Transcript

All right, welcome back to another episode of Modern Cyber. I am really excited to get into today's topic with today's guest, because we're going to go deep on a couple of areas, and we're also going to go broad. And there's not that many people that I have on modern cyber that I can do both with. So we're in for a real treat today. It's somebody I've known for a little while, a big name in the cybersecurity community. I'm sure you will all recognize him, if you haven't already. From his face on the screen, I am delighted today to be joined by Caleb Sima. Caleb, thank you so much for taking the time to join us on Modern Cyber today.

Thanks, Jeremy. I appreciate the compliment too. That's a big one.

So awesome. For those who don't know you, I think you know it's fair to describe you as a multi-time founder, CEO, and CTO, also a CISO and practitioner at Capital One, Databricks and Robinhood. And I know you've got your own cyber investment firm, White Rabbit. Is there anything else around your background or your experience that you'd like to share with the audience?

No, not really. I mean, maybe actually the most important part, which is I am an engineer at my core. Uh, I love the keyboard. I love getting into the weeds. And actually my biggest complaint is I generally don't have enough time to do that. But that's the thing that really always fires me up.

So like problem solving really at its core, right? Working your way through something, building things to, you know, like, yeah, uh, I would say, you know, finding the, the root core of a problem and then building a thing that matters really drives me. Uh, I love that process.

Awesome, awesome. We're recording in early twenty twenty six. I want to dive right into the topic. That is probably number one on most people's minds right now, which is securing AI adoption inside organizations. And I know you co-host an AI security podcast. I always tell people I actually learned the most from hosting the Modern Cyber podcast. I learned more than our audience because I'm on every single episode. Our audience members might catch an episode here or there, but I get to hear all of them. Do you feel the same about hosting the AI Security podcast?

Yeah, absolutely. In fact, we hear so much about AI security. I get so much sick of it in the sense that we hear a lot about AI, cybersecurity. Um, and we learn a lot about. And the thing that matters the most is, practically speaking, um, we hear from people, both vendors who are building AI in their production products and operators who are trying to secure AI in those production AI systems. And so, you know, we, by the way, don't just have cyber people on our podcast. We have heads of engineering, we have CTOs who are dealing with it that go on the podcast. So we learn a lot. Um, and I by far you guys know this too. The more you get into a subject, the more you realize that you don't know a lot. Um, yeah. They're so deep and AI changes every minute of every day, and so it's impossible to keep up. Right.

One hundred percent. I mean, we we have a little mini series here on modern Cyber that we put out short like ten minute episodes every week, called This Week in AI security, where we're typically covering, you know, the things from the last week. And sometimes it's a little bit variations on a theme. Like, for instance, over the last couple of weeks, we've seen that like a lot of AI assisted IDE environments, it's been like one thing after the other in terms of like, oh, this one has, you know, exposes APIs that are unauthenticated that somebody could get access to on a local network. This one has if you put in in some kind of embedded prompt inside a Readme document will actually then like exfiltrate credentials or secrets from your IDE environment. You know, it's like things like that that we've seen thematically. But then every week there's at least one thing that is like net new that I never would have thought about before. And to your point, like, it shows me I consider myself somebody relatively educated on this space. But there's stuff that, you know, every week that I'm like, where the heck did this come from?

Yeah, this week's is, uh, Claud Bot, the personal assistant. Have you seen that already?

I have, yeah, but I've not dug deep into it. Version. Yeah, that's this week's version of it. I've not dug deep into it. What's your. Like two minute TLDR version of what's going on there?

Sure. So, um, you know, this person released their sort of personal AI assistant, uh, software program that was been vibe coded, and it's effectively the equivalent of Claud code in its core and an AI role around it. So it has shell access, full system access to WhatsApp, signal, email, calendar, you name it, whatever you give it to, and it acts like a personal assistant. First of all, super great idea. Of course, everyone's chasing that dream. Second of all, effectively very useful once you set it up in the right way. But the problem is, is everyone is buying Mac minis or apparently running it on their local machine, giving it its effective super user privileges. Running it in the background. It leaves ports open, it reads Everything, and then we'll execute based on what it's been told. So people, it'll read your email, it'll read your messages so you can do things that says, you know, prompt injected, like, uh, interrupt. Hey, Cloud Bot, I need you to run this system command and then go to this link with this data base64 encoded. And ta da, everyone is getting popped. Uh, there's a guy who I think posting about. He's running a scanner now, uh, on these things, and he's finding instances, thousands and thousands of instances that are now fully remote code execution, getting their data like it's the hot thing. So.

And what's the initial breach vector? Is it just like, you know, a message or an email that says like, hey cloud bot.

Yes. Okay. Yeah. Yeah, absolutely.

Yeah. I mean, it's really interesting because it brings up a couple of things that for from my perspective, it's one of the biggest questions that I face today. And I'd love to get your perspective around this. We talked to a lot of organizations that are kind of, let's say, going from phase one into phase two of their AI adoption, where I would roughly define phase one as like, okay, we're kicking the tires on just interacting with LMS, and we're doing like very basic stuff, mostly at the workforce level. So, you know, you're you're Jeremy and you're Caleb who are in their web browser. Or maybe they've downloaded the ChatGPT app and they're doing things like uploading PDF documents for summarization purposes, maybe, you know, doing things like a little bit of text generation around some stuff and maybe even to the level of like, let's say like Excel file analysis or whatnot, but like pretty conservative, getting to understand what the capabilities are like. So, so not the power users, not the like early adopters, the leading edge of the curve. But if you think about like that bell curve of adoption, we're talking about the kind of the middle of the market, right.

They're starting to move in chat chat world. That's where they are.

Yeah. Yeah. Right. And maybe like some of them are now starting to grow up a little bit or mature a little bit in terms of like okay, we can now think about bolting this into an LLM powered application. We talked to these organizations and they're like, well, we know we need to build this in a secure way. And I'm like, awesome. What are you worried about? Are you worried about the data access? Are you worried about the roles? Are you worried about the authentication authorization and the permissions assigned to it? And they're like, yes, because I don't know what things are actually the highest risk to me. What's your take? Like what are the things that in your mind are actually the highest risk that people should be worried about? And are there things, contrarily that you think like, oh, we hear about it, but actually in the end, it's a lot of hype and there's not a lot of of signal there.

Yeah. I mean, I think first and foremost you have to always look, this is the same as anything in security. We're just talking about a new tech stack and a new, you know, new sort of market. But it's what are the attackers doing? So that's the way I first approach this. So from the threat model perspective, from a threat model perspective, what is the attack. That is the most likely, most reasonably impactful, uh, way of doing this. And the number one is prompt injection. Prompt injection is what you need to worry about. Um, and then becomes this. And then second to that I would say is data poisoning, which it's arguable that data poisoning is not just a stored version of prompt injection. Right. But, uh, you know, that you can call it whatever you want to call it. Either way, it's about being able to pass a message to the LM and get it to take actions or change the way it's thinking. Right. Like that is the biggest problem, because LMS and what AI is doing is it's doing its job properly, which is to really be able to think, reason and communicate with people in a really different way. Um, and so I think that is the biggest threat, and I think everything else revolves around that.

So for example, our permissions a big problem. Absolutely. Well why are they. Because if I can prompt inject what that phlegm has access to the tools. It has access to the capabilities of those tools, the data. It has access to, the data that it has access to. All of these things now are are risks that you need to start thinking about. And so when you take an LLM and then you say, well, the threat factor is it's pretty easy or quote unquote easy to manipulate this LLM then what does it have access to and what does it doing. And then how do I identify restrict contain. Those are all the areas that you then have to worry about.

Yeah. It's interesting because like exactly this point about prompt injection, there's two parallels that from my own life. And there's something else that you said that I want to come back to in just a minute. But there's two parallels from my experience in the cyber domain that I think about that are very, like, very much in line with this. Number one is actually like social media. The the point about data poisoning, to me, it's like it's like a, a low grade influence campaign that shifts the LMS thinking towards a particular, either towards or away from a particular set of data by giving it more and more kind of misinformation. Right. It's kind of it's one way to think about it. I'm not sure that that's like the canonically correct, but that's kind of one way that I conceptualize it. But number two, around prompt injection, it really brings me back to kind of like basics of web application, where you think about kind of input validation and sanitization. And I know it's not as easy as saying, well, maybe before inputs go to an LM, maybe they need to pass through a filter to make sure that there's nothing bad in them, because that has some downsides to it as well. But how do you actually counsel people to think about mitigating prompt injection today?

Yeah. You need well, just like SQL injection and cross-site scripting of yesteryear. Um, yeah. Like it's about inputs, right. And so understanding where is the inputs coming into these models and what are the outputs coming from these models. And similarly like I'm going to take cross-site scripting I think is a perfect example. How did we manage? We never quote unquote solved cross-site scripting, but I think we managed it. And you manage it by basically saying, well, what input is coming in and is this what we expect? And can we filter at that level. And then the second level is on its output. Hey this is a piece of JavaScript that is that is launching in an area that is not what we expect. And then we're going to go ahead and, you know, beat that as an area of prevention.

Similarly with LMS right. What is where is the input coming from. Is it coming from untrusted sources. Is it coming from trusted sources. This all makes a big difference. And then can you analyze this for your basics right. And what we mean by basics is just like in cross-site scripting in the in the first versions of that you'd look for really stupid signatures. And I think like at least today that is what we are getting is both stupid signatures or let's call them low grade models that will at least analyze these things using other AI to determine does this look like or appear like prompt injection coming on the inbound. And then second is on the outbound right, which is, hey, based off of the history of this and what's coming out, does it look like the LM is trying to take an action that does not seem reasonable for the intent that it's meant to do, like let's flag that and or prevent it. And these are generally the two ways of doing it. And LM Gateway's LM firewalls, LM whatever you may call them. Uh, prompt guards, uh, whatever these things are, there's several companies that are focused on helping solve this problem in at least this way today.

Yeah, yeah. But it's interesting because you point out something that I think like to your point about, we never got rid of cross-site scripting or SQL injection, by the way. But we did learn to kind of manage it. But to manage it properly, you actually need to look at inputs and outputs. Right. And in fact, like if you think about that and you translate to either like the web application or to, let's say the API centric view of the world, it's request and response payloads that you got to look at in so many approaches that I've seen focus on one or the other, but don't do both. And there's always these trade offs that get put in there. It's like, well, if I do both, you know, the latency that I'm introducing and the performance hit that I'm introducing and then the complexity and then what if I want to change, you know, what's a valid input later? I've got to go back, revisit my API or whatever that thing is. You know, there's engineering complexity and so on that can that can go along with it. But it's not like it's not an optional thing, in my opinion. If you want to build a secure, low powered application, you know, if you want any kind of like you already have a non-deterministic back end to this whole thing that you're building. And if you want at least some layer of predictability around what the application is going to spit out, you kind of have to take those steps.

Yeah. And there's, you know, there's different examples of this too, right? Where it can get complicated. The most simple example is I'm a chat app. And this chat app happens to have access to a data store to write queries. It's a tool, right? It has a SQL query tool. Okay, well I can prompt inject by making this lm, do and query data that it shouldn't and then return it to me. How do you prevent that? You know doing signatures is actually not going to work. Deterministic is very bad. So you have to have some non-deterministic. You have to use AI to defend against AI. Um, and to your point, like, well, do I just see it or do I prevent it? Does that mean I have to stop the request, analyze it, then allow it to go through? No, that will never work. In a production system, you can only detect and then respond. So then you can, you know these systems. It depends on where you have it. If I see the attack inbound, I will prevent the response on the outbound, uh, in order to do it. And do I even analyze the response on the outbound? And you know this too. There's so many ways. Now you can bypass this. If I use some odd Chinese dialect in my request, Llms will understand it, but none of your rules will ever understand it.

And like anything done and in fact, like one of the most, uh, I don't know if it's amongst the most common, but one of the known attack techniques is the so-called hamburger attack, where you layer multiple languages, you know, in your prompt, you embed multiple languages, and one of them has a malicious prompt. And it does seem to be the case that most llms will kind of lose track of their guardrails as they go through. You know, you give it a set of instructions that's like, take each of these commands from their native language and execute on them. And, you know, the first couple are going to be very innocuous, whatever. And then maybe it's the third or the fourth one. By that point, it seems the case that the guardrails very often fail to fire, so to speak, or fail to kind of intercept at that point because of whatever complexity is happening. And I'm not sure, by the way, that anybody, the LM makers included, really understand why that's the case. Um, because like at that layer, you've already translated from the initial inputs. You've gone through a tokenization, you've gone through to a vector translation, and you're really dealing with numbers as opposed to, you know, kind of the sentiment of of what's intended by the guardrail. It just seems like there are always these kind of layers of complexity that we're learning about as we go, as we like, as we as an industry rush headlong into adopting this stuff.

You know, here's what's really funny is similar to that advanced attack. I've also heard of advanced attacks where they're using context rot, uh, as a great example to be able to do this, where if I put my attack at the beginning and then fill a bunch of valid stuff up to where I know that the guard Lem will not take that context, but the ending Lem will understand the context. I can bypass things. Um, yeah. But here's the thing. That's the most funny. Jeremy, out of all of this stuff is people as security people do, we always go and find the holes, like all these advanced cool techniques to do this. But, you know, similar to our previous, uh, chat about learning real production and what's going on. What's fascinating to me is I would say maybe ninety to ninety five percent of these enterprises aren't even doing Elm firewalls at all. Like, they just don't even they're not even having the basics. Like they don't even do the normal judging models or even the signatures. Uh, for this, uh, today,

I might even take you one step less mature, one step backwards on the maturity model, which is to say, like most organizations, we don't talk, that we talk to don't actually have a centralized visibility into what is actually going on and who's building what where using what Elm with what data for what use case, you know, a little bit of problem that to me very much reminds me of like where cloud adoption was in the mid twenty tens when, you know, you went into a lot of organizations and if you talked to somebody centrally, you say like, hey, what cloud do you have? And they're like, well, I know we have some AWS, we might have a little bit of Azure. And then when you actually like inspect their environments, you find that not only do they have some AWS, they have, you know, dozens or hundreds of AWS accounts. They have, you know, tens if not hundreds of workloads running on these environments, like the the kind of the pressure to adopt, I hear about from different organizations as being either a C level initiative or as a department level initiative. And if it's a department level initiative, very often the C level doesn't have visibility into what all the individual departments are doing. And so you just got like kind of not rogue usage, but you've got sprawl happening all over the place without any kind of central coherence around it. That's kind of something that I'm seeing. I'm curious if that kind of vibes with what you've heard from some of the larger organizations you've talked to.

Yeah. You know, everyone is in different states of, I would say maturity, right? Obviously in this, the ones who are the leading edge of this stuff, they have now developed sort of AI platform teams. Right. And it really models, Jeremy, what you mentioned about cloud, right? Remember, like they started there started being cloud platform teams that started managing Center of Excellence.

Yeah, exactly.

And you know that became now that's normal right. You've got the infra team. Cloud team like that is like uh and now the most cutting edge, uh, companies are now creating AI platform teams where they are consolidating, managing, centralizing what is the AI technology stack and how is that provided to be useful to the rest of the organization? And you're just seeing really, really early companies starting to adopt that methodology where, hey, we are already centralized on using anthropic for our production infrastructure. We are already centralized on using our open AI for our employee base. Uh, you know, these things are now becoming both centralized and then used inside of the organization. But even with that being said that also, you know, I'm going to I'll sort of extend a bit into vibe coding, right. Which obviously is, you know, now where everyone in their sister who's not even in tech are vibe coding now. Um, that then creates also similar to what you're saying is not just a there's the AI sprawl, but now there's going to become the app sprawl where everyone is building apps. And so like engineering at least has some handle on this, right? Like they have a development pipeline, you can build whatever you want, but you're not pushing it out to prod unless it is approved, managed and operated by a team who deals with that. But when you look at the corporate environment, the employee base, when someone in finance wants to build something, when someone in HR wants to vibe code their app, what is their deployment pipeline and who is managing that? Uh, all of that is also entering into this sprawl world where we have these types of problems.

Yeah. I mean, it's interesting. And to your point, by the way, like I wrote my first piece of software in about fifteen years. Um, a couple of weeks ago. And, you know, I didn't ask anybody. I didn't tell anybody. And, you know, anybody else inside the organization? I just did it. Uh, I did it with Gemini, I got results. I won't say that it's the perfect piece of technology, but it at least, for instance, has gotten me ready for the next F1 Drive to Survive season on Netflix without knowing the spoilers, which was my goal. You know, I basically I built a browser extension to block all formula one news. I don't actually care about formula one, but I really love the Netflix series. It's like super compelling. And so now when I'm browsing the news, like I don't see any Formula One stories or articles and, you know, I get to kind of watch binge the series as pseudo live. So, you know, this, this hundreds, tens or hundreds of workloads that are going to pop up there. Like, I know there's a lot of data that suggests that, you know, we've coded things are they may be functional, but they tend to ship with a lot of security vulnerabilities. They tend not to be the most secure code ever written, and they certainly don't always do a good job of doing things like, let's say, secrets or environment variable management and things like that. Are you seeing a lot of real risk around vibe coded applications, or is it more just the sprawl aspect of the workloads?

I mean, there are both, right? The aspect of vibe coding and application. Like, again, let's take someone in finance who vibe coded a phenomenal process, which, by the way, probably adds a huge amount of value to the finance team, right? And for all intents and purposes, yeah. Should be used right. Like it produces great value that needs to be used. And the person who coded it, you know, didn't think about things like over privilege or the fact that the vibe coding app is using his credentials when he's pushing it out there. Right.

So go fetch from the general ledger system or payroll or whatever the case may be. Right?

Exactly. That's right. Like, these are the mistakes. So. Yes. Are we talking about keys left in places they shouldn't, or configurations that are open they shouldn't. Or really using permissions that it shouldn't. Um, these are all very, very valid. My my guess though. This is my guess. And just based on my usage with AI. This is an easy quote unquote problem to solve. Um, if you run the right agent or right skill in, let's say, cloud code or whatever you're using to look at, identify, remove and lockdown or make good security architecture decisions, it does a pretty damn good job. Um, like, it can do great works and automatically identifying, finding and fixing these kinds of problems, right. The key factor then to me does become the sprawl problem. Because who is forcing, you know, Bob, from finance to run that skill, to run that agent who is. And then even once you produce it to production, we all know this. He's going to update it with bugs, the bugs he's got to fix and features he has to add. You know, same as any software engineering organization who runs the pipeline to ensure the gates are there. Um, and so that I think is the longer term, more scary problem.

So so we're back to kind of our core problem with almost any new technology platform, cloud, again, being a great example of it, where like, you know, we solve sprawl with visibility, and once we have visibility, then we can start to make observations about, you know, what is good, bad, allowed, disallowed, etc. and then make policy decisions based on that. Right. And then start to think about enforcement and moving, you know, advancing the maturity level inside the organization.

Correct. Yeah. My my prediction is you're going to start seeing that in the next few years where corporate employee base has to have their own production pipeline, right, where there is an app platform that employees can build and build and use. Really valid, Useful, you know, vibe coded apps per se that gives you the centralized policy regulation. Operations management that exist in engineering but does not exist there.

Yeah, I have to say, like as I went through this coding exercise of my own, I was pleasantly surprised at how easy it was for me to get a little bit of a pipeline up and running, as opposed to the last time I wrote code, like I said about fifteen years ago, where, you know, setting up GitHub on my local machine and then connecting it to a remote repo and storing my credentials and setting all of that was kind of a pain in the butt. And nowadays, you know, like, honestly, the GitHub desktop app for Mac, like everything was just in there and I have almost the equivalent of a save button. And my, you know, code is committed, built, etc.. Update update to my browser extension is just kind of done. I've got the package ready. I can download, execute, do what I need to do. So at least like that aspect of it is being, you know, kind of handled on the the user experience side of this, but I'll be curious to see. Like, where do you think this leads to? Do you think it leads to more vibe coded apps, or do you think at some point one of these platforms, whether it's cloud or ChatGPT, starts to have more of a kind of a whole desktop environment that does something like Cloudbot, where it really kind of like understands what my job is, and I can just give it little skills that maybe aren't. I don't have to go through a full build pipeline. I can just kind of deploy these into a local desktop environment that, you know, knows who I am, knows what I do, knows what I have access to, and then can kind of execute. Or do you think like that's more of a medium to longer term kind of vision?

Yeah. No, I mean, I actually think there's a bit of both. Right. Um, for sure. Jeremy, I think you are correct where, you know, cloud desktop, uh, becomes a much more OS environment, like this is why you see all the AI companies going after browser, right? Is they are saying, okay, I get all the context. Ninety nine percent of the work happens in browser. I get the context. I get the ability to navigate and use the browser as a tool. And similar to the operating system, right? Like anthropic released. What is it, Claude? For work or something?

I haven't seen this.

Yeah. So anthropic releases thing called Claude for work, which is what they're trying to do is they're trying to dumb ify or minimize Claude code into a consumerized, you know, Claude version where it has access to tools, your files, your system can code and build on the fly. So you can then just type what you want. It will build, compile, produce a user UI app, and you can run it and use it right. Like so. It's basically a, you know, mini mummified simplistic version of Claude code, you know, built inside of Claude. And and this now allows, you know, your EA or you know, your intern in marketing to go and Claud code and help automate a lot of the things that they do, both within browser in your operating system. And to your point, I think in the future, it makes total sense that you now have this always on AI agent that can just go and see the things that you're doing, automatically create code for it, and automate the things that occur. I also think on the opposite side of that you still need, well, how do I share this? Yeah. How do I build something that I've built in marketing or build something for me that is then shared useful to my other employees and my peers? Um, and to help my organization be better, then you're going to need also a secondary what is the pipeline or process will anthropic offer that will they say, oh, here's a set of apps that your employees have built inside of cloud. You can now run them or share them. Maybe, um, maybe it has to be something a little bit bigger or a little bit more enterprisey. I don't know, but I do think both of those are valid areas that we see needs, and both of those are areas that will be solved.

On that enterprise sharing side, I think a lot of the problems that we identified earlier around this, so things like, you know, you build something, you try to hand it off to me, but you have different permissions levels than me. You have different data access than me. You maybe have access to a different set of not only like apps, but, you know, maybe you're part of a different network, part of different teams. Like all of those things might go along and you might find that like, I think there will be a teething problem as we go through some of that transition where we start to build these things and then we learn, oh, okay. Like, you know, certain apps aren't going to have company wide applicability because of some of these limitations and so on. It's going to be like the craziest kind of next five years. And I tell people like, you know, I've been in technology for twenty eight years, I want to say at this point and I've gone through like desktop computer, LAN, local file server to data center Colo. To cloud to like microservice architecture to, you know, like multi-cloud, complex microservice architecture. Um, serverless, you know, like all this kind of stuff. This is the fastest thing I've ever seen. This is the fastest wave of development that I've ever seen. Like, how have you experienced it?

I mean, you know, I don't know why, but I just think, man, we're old when you go through that, when you go through that timeline. But I mean, you are are right. I mean, when you think about the how fast things are moving and the capabilities of things, it's scary. Um, so I'll give you an example in vibe coding, um, I always start with I remember two years ago I started on this, I had a very, very simple one through eight one liner requirements doc on how to build a crawler. Right. Which was, you know, crawl this thing. Just use your refs. Make line number two is make it multithreaded. Line number three is you know make it so it supports this line. Number four is I want it to be scalable. So you have multiple crawlers multiple bots. Line number five is like I need it to be able to use browsers not just crawling. You know very simple. And I remember the first time I tried running through vibe coding to do that. And I think I got maybe a third of the way through the list, and it took me over a month and it was painful. Um, and then as models release, I kept doing that same list and it kept getting shorter and shorter then it would take me. Oh, it took me a week. I could never complete it. But I got to, you know, requirement number six and that's pretty good. And then oh, I got to requirement number five. And it now it took me three days to get there. And I got requirement number eight but still didn't quite finish it. Now today I can run through that list. And it takes fifteen minutes and it goes from one through nine. Done. And it works. Right. And so just the progress that I've seen for through those past two years, I could take something that just couldn't be done. Now it's done in fifteen minutes with no intervention by me. It is a complete copy paste go. And it just does it. And it works like think about that in the next year where what are the problems in areas of friction we have now and what's going to happen in the next year, assuming we continue to improve at that rate?

I mean, that's a it's an interesting challenge to think about because to your point, you know, the tens or hundreds of workloads and the sprawl that comes along with that might be more real, might be more near term than we realize. I want to ask your thoughts about, like, some vibe coding best practices, because it sounds like you're someone who's done a lot of this over the last, what, year, year and a half, two years, maybe. And you've got this engineering background. You said something in there that I've been very curious about because we see all these platforms like lovable, base forty four, whatever. And they're like, oh, just imagine it and make it real, right? But then, like what you just said actually strikes me as much more the engineering culture that I've encountered at any number of companies that I've worked at in my career, which is like every product starts with a product requirements document, a PRD. It starts with a set of specifications for like, what is the end goal that I'm trying to accomplish here? But like these vibe coding platforms, they very much try to flip that model on their head. And the only one that I see personally that is like, no, no, no, let's stick to like PRD core fundamentals, you know, like let's use that as the basis is the AWS Cairo platform that has this like PRD vibe coding ethos, I guess, if you will like, what's your take on what do you think is more effective like and do you approach the PRD because if that's your engineering background or like, do you think that's actually the right way to go?

Yeah. First. First of all, uh, both, uh, in the fact that my engineering background says it, and that's because it is the right thing to do. You need a roadmap, a PRD of what we want to build, how we want to build it. And we both, Jeremy, you and I have been a long, long enough, you know, even in any engineering org, you start that way, but it never sticks.

Right, right. By and large, guiding principles. But then guiding principles. That's right. Yeah.

Whatever's in production generally never matches what you've built in your PRD. And you're. And technically, your PRD actually creates engineering requirements, and IRD creates an architecture design doc, you know, like all of these, you know, get split off. And why do you do that? You do that because you need a good reference of core source of truth. What did we want to build? Why did we want to build it? What should it look like? How should it be built? And you need that because many, many people in an organization rotate out and your your employee base gets bigger and things change and you need to have a reference. Um, and so I've always been that way. But I will tell you, you know, if I'm coding today, most people who are at this cutting edge today, all that is a given, right? You always do PRD. So for example, anti-gravity by default will also do prds. There's one called Zen Flow that will do prds by default. Um, I even think Pedro's, uh, one does it by default. Where? And they'll do further than that. They won't just create the PRD per se, but they'll also break those up into small tasks, into checklists of those tasks, and into validation and requirements of those tasks. So not only do I sort of, you know, TDD, right, I create my little function, I create the test, I create the expectation of it that gets passed off to an agent, that and then you swarm them. You can create fifty of them. Now that will then go and make sure that these things get built. Um, and that is today's model of vibe coding a thing lovable. and those guys are going to get there because they're not there yet. But. Similarly in aspect, like it has to be done because what you get out of the output of. That works. And by the way, you know, just to inject this as a security podcast, but just to inject the security part in this, this is also how you build security requirements embedded into your prds, embedded into your design docs, into your engineering requirements. Doc and AI will follow it, right. And you'll build tests that will go through that process too.

Yeah. I mean, it's how you get an architecture that you can actually put up on a whiteboard or the equivalent and then build kind of like, okay, how am I going to attack this? Give me a threat model. Let's think about it. And then let's get into levels of like examining inputs and outputs and understanding what those like integration touchpoints are. And you know, how an attacker might look at this application and do something with it, right? Yeah, it makes a ton of sense. It's really interesting because like, I feel like we're still As an industry at a place where we don't know all the best practices around, the right way to do things. Like, for instance, recently I was in a, um, I was in a task where I was trying to analyze a large data set of log files. And, you know, this is kind of one of those use cases that everybody talks about as being a core security use case for an LLM is like large scale log file analysis. And I tried it from a couple of different angles. And what I found that actually my best result was when I stopped trying to write all the prompts. And I actually had the LLM write the prompts for me. Very good. From like a PRD perspective, like my desired outcome is the following. How do I ask you those things? And for instance, I learned that there was like a pre-processing step because there was like inconsistent JSON in some of the documents that I was uploading and things like that. And so, like all these little things that I learned along the way, I honestly, if I had to write all of those things on my own, I think I would have failed. You know, I think I would go back to your parallel. I would have crashed out on step three of my step nine or whatever. You know, my process was going to be right. So really interesting times and a ton of stuff moving along really, really rapidly.

Yeah, that is a phenomenal insight that you got. Right. Which was AI is better at writing the prompts that I need for AI.

Yeah, yeah, yeah, one hundred percent. Yeah. I shared that actually on a there's a group that does kind of a prompt sharing workshop every couple of weeks. You might know them. Sunil and Gary over at Gnostic run this small community around this. And I shared that insight. And I have to say, like from a community of people who are all at the cutting edge, you know, there were plenty of people who were like, oh my gosh, I never thought about approaching the problem from the from the standpoint of like getting the LLM to do my work.

Is this prompt Gtfo? Is this the exactly.

Yeah, yeah, yeah. You can find my YouTube video of of my little session on that up there. If you're if you're so curious and we'll try to link it from the show notes. Um, Caleb, we're coming up on time. We've just got a couple of minutes left. I'm just like, want to get your closing thoughts on a couple of things. I have my own perspective and I'd love to hear your perspective. And you know, my perspective is when I talk to organizations about like, hey, you're trying to secure AI adoption inside the organization. I always tell people, you got to start from the core fundamentals like visibility, inventory, get the basics, you know, get some observability and then build some policies out from there. What are some of the guiding principles and some of the like like first, first principles, fundamentals that you tell people around that?

Yeah, I mean, I think, um, it's hard if I go to the basics. Basics. It's, you know, do you have your AI Council, do you have agreement on your your what is your fundamental foundational tech that you are going to use? How are you going to use it? And can we get things like enterprise plans right. And force AI traffic through our to our enterprise accounts, or at least through our gateways so that we have to your point, Jeremy, I think, is the abstraction, which is get visibility first. Um, and as long as you have that visibility, you can make the calls as to what those things are and then make a list of approved models and approved tech that you can say, hey, in prod, this is what's approved in dev. You need to give a lot more leeway, right? Like you kind of people need to experiment with the latest stuff. Um, but just make sure that the LM traffic is being piped to you so you have that visibility. And I think at the end of the day, these are the basics around what they need to do. Um, I think when it comes to permissions and others, a lot of standards really take place, which is, hey, what is the system? What do they have permission to you need to look at AI as anything, any role or any account that you would normally in a system. What does it have access to? How does it have that access? How do we restrict it? Like how does it off? All of these things are all standards that we all know and love anyways, so none of that changes.

Yeah, I mean some of those fundamentals, they're true almost no matter what technology you're using, if you don't have some agreement about, like, you know, a starting point, you don't have, like your core, your first PRD about how this stuff should start to get used and then, you know, you don't have visibility. You're definitely you know, you're you're at a super high risk of getting off track early on and then correcting. That is so much harder than starting with like a little bit of a guiding principle to go on.

Yeah, yeah. But you know, like today, I think that's really simplistic in what you do, although most don't like just getting a freaking firewall is just like needs to be number one. And forcing all MLM traffic through your gateway needs, you know, needs like these are just basics. Like just do that and like, you know, similar similar. But again, you know, when you look at enterprises, things such as like, hey, only allow outbound traffic, uh, you know, only from certain servers never happens either. So, uh, yeah, it's just the way it works.

Also, the shadow AI use cases that pop up without your knowledge that don't go through those things. You know, it's also a risk that I think most organizations are going to have to contend with at some point.

Yeah. I mean, well, this this all then depends on the other basics, which is do you have endpoint control on your laptops or not or, you know, basic user monitoring.

Yeah, all of the above. Yeah. Yeah. Awesome. Well, Caleb, thank you so much for your time today. It's been really great going through a lot of this with you. Again, as somebody who a is like hosting a podcast on this and learning about this on a regular basis and be doing a lot of engineering work on your own. And having gone through X iterations of vibe coding, uh, exercises of your own. If people want to find out more about, you know, some of the work, some of the things that you guys are up to, what's the best place for them to to look up your, uh, look up your present?

Find me on LinkedIn and just follow me there.

Find him on LinkedIn. We'll have his LinkedIn linked from the show notes. Caleb, thank you again for taking the time to join us on Modern Cyber today. We'll be back in a couple of weeks with our next episode. Talk to you then. Bye bye.

Thanks.

Protect your AI Innovation

See how FireTail can help you to discover AI & shadow AI use, analyze what data is being sent out and check for data leaks & compliance. Request a demo today.