Mastodon Feed: Post

Boosted by glyph ("Glyph"):
mttaggart@infosec.exchange ("Taggart") wrote:

So listen.

All LLM "red teaming" is kind of a joke because of the impossibility of verifiably defending the space. There will always be a smarter mouse/jailbreak/prompt injection. But embedded application testing—that is, testing AI features once incorporated into a separate application—is uniquely pointless.

Most of the tooling to automate the drudgery of LLM red teaming assumes API access to the model/application, against which it will fire endless prompts and evaluate responses. But once the app is embedded in an application, that access is almost never available. What's left is direct application access—in other words, clicking your way to glory. Maybe you want to try to Computer Use your way to a solution, but odds are you'll just end up doing this manually. And so doing a less thorough job. And so defending even less of the possibility space.

LLMs are fundamentally insecurable, but if you only get to them once they're baked into another application, that's somehow even more the case.