Behavioral Detection for AI Agents: Research & Approaches
A summary of research on detecting compromised agents through behavioral analysis
Key Findings
1. Most current prompt-based defenses suffer from high Attack Success Rates, demonstrating limited robustness against sophisticated injection attacks (arXiv research)
2. AI agents move 16x more data than human users, requiring behavioral baselines to detect anomalous access patterns (Obsidian Security)
3. SentinelAgent framework combines rule-based classification with LLM-based semantic reasoning to detect multi-agent attack patterns
4. Behavioral analytics can detect attacks invisible to traditional tools by establishing dynamic baselines for users, devices, and applications
5. Multi-agent systems require special attention because a compromised agent can trigger harmful actions across shared state and privileges
Why Behavioral Detection
Input-based security—scanning prompts for known attack patterns—faces fundamental limitations when applied to AI agents. Research demonstrates that most current prompt-based defenses suffer from high Attack Success Rates (ASR), showing limited robustness against sophisticated injection attacks.
Several factors make input-based detection insufficient:
Semantic Variability: The same attack can be phrased in effectively unlimited ways. Unlike SQL injection, where special characters can be escaped, natural language has no special characters to escape.
Adaptive Attacks: Research shows that attackers can craft perturbations or split adversarial strings across multiple input fields, achieving attack success rates above 50% even against state-of-the-art defenses.
Indirect Vectors: Malicious instructions embedded in external content bypass input scanning entirely. The agent processes the attack as part of legitimate data retrieval.
Behavioral detection takes a different approach: rather than enumerating all possible attacks, it characterizes normal agent behavior and detects deviations. This catches novel attacks that input filtering misses because attacks must cause the agent to behave differently to achieve their goals.
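The core idea can be sketched in a few lines: build a distribution over the agent's historical actions, then score recent activity by how surprising it is under that distribution. This is a minimal illustration, not any specific framework's method; the action names and the floor probability for unseen actions are assumptions.

```python
from collections import Counter
import math

def build_baseline(action_logs):
    """Build a probability distribution over action types from historical logs."""
    counts = Counter(action_logs)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def deviation_score(baseline, recent_actions, floor=1e-6):
    """Average surprisal of recent actions under the baseline distribution.
    Unseen actions get a small floor probability, so novel behavior
    (e.g., a tool the agent has never called) scores very high."""
    return sum(-math.log(baseline.get(a, floor)) for a in recent_actions) / len(recent_actions)

# Example: an agent that normally reads and searches documents suddenly
# starts sending data outward -- behavior it has never exhibited before.
baseline = build_baseline(["read_doc"] * 90 + ["search"] * 10)
normal = deviation_score(baseline, ["read_doc", "search", "read_doc"])
anomalous = deviation_score(baseline, ["send_email", "upload_file", "send_email"])
```

Note that the attack's *content* is never inspected: the injected prompt could be phrased any way at all, but to exfiltrate data the agent must call tools it does not normally call, and that is what the score captures.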
Research Approaches to Behavioral Detection
Several research frameworks have emerged for behavioral detection in AI agent systems:
User and Entity Behavior Analytics (UEBA): AI's greatest defensive strength is its ability to learn what "normal" looks like across complex digital environments. By analyzing massive volumes of data, AI builds dynamic behavioral baselines for every user, device, and application. This UEBA capability allows security systems to detect novel, zero-day, and polymorphic attacks invisible to traditional tools.
SentinelAgent Framework: SentinelAgent combines rule-based classification with LLM-based semantic reasoning for behavior analysis on collected telemetry data. The framework enables detection across multiple granularities—from individual agent misbehavior to complex multi-agent attack patterns. It successfully detected sophisticated attacks including prompt injection propagation, unauthorized tool usage, and multi-agent collusion scenarios.
Multi-Agent Monitoring: Research from ACM Computing Surveys shows that when agents share memory, databases, execution privileges, or delegated tasks, a single compromised agent can repeatedly trigger harmful actions across the system. Behavioral monitoring must account for this emergent collusion that arises from shared state rather than explicit coordination.
Non-Human Identity Analytics: As organizations deploy more AI agents, security teams must extend anomaly detection to non-human identities. This includes monitoring for unusual data access patterns, querying records outside typical scope, or accessing sensitive data at unusual times.
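The non-human identity checks above (out-of-scope queries, off-hours access) can be expressed as simple per-identity rules. This is a hedged sketch: the profile fields, agent name, and table names are illustrative assumptions, not a real product's schema.

```python
from datetime import datetime

# Hypothetical per-identity profile; field names are illustrative assumptions.
AGENT_PROFILES = {
    "invoice-agent": {
        "allowed_tables": {"invoices", "vendors"},
        "active_hours": range(8, 19),  # typical activity window, local time
    }
}

def check_access(identity, table, timestamp):
    """Return anomaly flags for one data-access event by a non-human identity."""
    profile = AGENT_PROFILES[identity]
    flags = []
    if table not in profile["allowed_tables"]:
        flags.append("table_outside_typical_scope")
    if timestamp.hour not in profile["active_hours"]:
        flags.append("access_at_unusual_time")
    return flags

# A 3 a.m. query against an HR table trips both checks.
flags = check_access("invoice-agent", "employee_salaries", datetime(2025, 6, 1, 3, 0))
```

In practice the scope and hours would be learned from the identity's own history rather than hand-written, but the evaluation step looks the same.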
The Scale Challenge
AI agents create unprecedented data movement that requires new monitoring approaches:
Data Volume: According to Obsidian Security research, AI agents move 16x more data than human users. This expanded attack surface cannot be manually monitored.
Workforce Limitations: Security operations face workforce shortages approaching four million professionals worldwide. Automated behavioral detection is necessary given the scale of agent deployments.
Market Response: Global AI-in-cybersecurity spending is expected to grow from $24.8B in 2024 toward $146.5B by 2034, reflecting the scale of investment required.
Enterprise Adoption: More than 60% of large enterprises deployed autonomous AI agents in production by 2025, yet legacy IAM tools remain inadequate for securing entities with non-deterministic behaviors.
Implementation Considerations
Based on research and industry implementations, several practical considerations emerge:
Baseline Establishment: Behavioral baselines must account for the non-deterministic nature of AI agents; traditional IAM tools designed for predictable, deterministic workloads are inadequate for this purpose.
Detection Dimensions: Modern security platforms establish baselines for normal agent behavior across SaaS applications, then flag deviations such as anomalous data access, atypical query scope, or off-hours activity.
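One way to picture a multi-dimensional baseline is as a small record per agent identity, checked dimension by dimension. The specific dimensions and thresholds here are assumptions chosen for illustration, not a standard.

```python
from dataclasses import dataclass

@dataclass
class AgentBaseline:
    """Illustrative behavioral dimensions for one agent identity."""
    mean_requests_per_hour: float
    typical_tools: frozenset          # tools the agent normally invokes
    typical_data_scopes: frozenset    # datasets it normally touches
    max_observed_payload_kb: float    # largest normal data transfer

def deviations(baseline, requests_per_hour, tool, scope, payload_kb):
    """Compare one observation window against the baseline; return flagged dimensions."""
    flagged = []
    if requests_per_hour > 3 * baseline.mean_requests_per_hour:
        flagged.append("request_rate_spike")
    if tool not in baseline.typical_tools:
        flagged.append("novel_tool")
    if scope not in baseline.typical_data_scopes:
        flagged.append("novel_data_scope")
    if payload_kb > 2 * baseline.max_observed_payload_kb:
        flagged.append("oversized_transfer")
    return flagged

b = AgentBaseline(20.0, frozenset({"search", "summarize"}),
                  frozenset({"kb_articles"}), 64.0)
```

Checking several independent dimensions matters because an attack may look normal on most of them (same tools, same rate) while deviating on one (a sudden oversized transfer).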
Regulatory Alignment: NIST SP 1800-35, published in November 2024, provides guidance on implementing Zero Trust Architecture with Enhanced Identity Governance and continuous authentication for all workload identities. Federal agencies face a 2026 implementation deadline.
Real-Time Requirements: AI agent security demands continuous behavioral monitoring to detect threats that evade signature-based defenses. Batch analysis is insufficient for agents that can take immediate real-world actions.
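The real-time requirement implies scoring each action *before* it executes rather than in a later batch job, since a harmful action cannot be recalled once taken. A minimal pre-action gate, with a toy scorer (the function names and threshold are assumptions):

```python
def guarded_execute(action, score_fn, execute_fn, threshold=0.9):
    """Score the action before execution; hold it for review if too anomalous."""
    score = score_fn(action)
    if score > threshold:
        return {"status": "held_for_review", "score": score}
    return {"status": "executed", "result": execute_fn(action)}

# Toy scorer: anything sending data to an external destination is suspicious.
score = lambda a: 1.0 if a.get("destination") == "external" else 0.1
run = lambda a: f"ran {a['name']}"
```

The same scoring logic run in a nightly batch would only tell you the exfiltration already happened; inline, it can stop it.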
Limitations and Open Questions
Research identifies several limitations and open questions:
Baseline Adaptation: Agent behavior legitimately changes over time. Baselines must adapt while remaining sensitive to genuine anomalies. The research community has not yet established best practices for this balance.
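One common (though, as noted, not yet standardized) way to let a baseline drift with legitimate change while still catching sharp deviations is an exponentially weighted moving average; the alpha value below is an illustrative assumption controlling adaptation speed.

```python
class AdaptiveBaseline:
    """EWMA baseline: adapts slowly to gradual change, flags sudden jumps."""
    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.mean = None

    def update_and_score(self, value):
        if self.mean is None:
            self.mean = value
            return 0.0
        score = abs(value - self.mean) / max(self.mean, 1e-9)
        # Adapt the baseline *after* scoring, so the anomaly is judged
        # against behavior up to, but not including, this observation.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * value
        return score

b = AdaptiveBaseline()
for v in [100, 102, 98, 101]:      # steady behavior: low scores
    steady = b.update_and_score(v)
spike = b.update_and_score(500)    # sudden 5x spike: high score
```

The open problem the text describes is visible in the single alpha parameter: set it high and attackers can "boil the frog" by shifting behavior gradually; set it low and legitimate drift generates false positives.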
False Positive Management: Behavioral anomalies are not always attacks. Organizations must tune detection thresholds to minimize alert fatigue while maintaining security coverage.
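Threshold tuning can be tied directly to an alert budget: pick the cutoff so that, on historical (mostly benign) scores, only an affordable number of events would have alerted. A sketch using a simple nearest-rank quantile; the budget of 5 alerts per 1,000 events is an illustrative assumption.

```python
def threshold_for_alert_budget(historical_scores, alerts_per_1000=5):
    """Choose a threshold so roughly `alerts_per_1000` historical events
    would have exceeded it, tying detection to an operational alert budget."""
    quantile = 1 - alerts_per_1000 / 1000
    ranked = sorted(historical_scores)
    idx = min(int(quantile * len(ranked)), len(ranked) - 1)  # nearest rank
    return ranked[idx]

scores = [i / 1000 for i in range(1000)]   # uniform dummy scores in [0, 1)
t = threshold_for_alert_budget(scores, alerts_per_1000=5)
```

Framing the threshold as an alert budget makes the fatigue trade-off explicit: lowering the budget raises the threshold and necessarily lets more marginal anomalies through.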
Explainability: When behavioral detection triggers an alert, security teams need to understand why. Research on explainable anomaly detection for AI agents is still emerging.
Multi-Agent Complexity: As noted in ACM Computing Surveys research, multi-agent systems exhibit emergent behaviors that complicate baseline establishment. A behavior that is anomalous for a single agent may be normal when agents collaborate.
Defense Verification: While some research shows promising results (e.g., firewall approaches achieving 0% ASR on benchmarks), translating benchmark performance to real-world deployments remains challenging.
References
- Various. "Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?". arXiv, 2025.
- Various. "AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways". ACM Computing Surveys, 2025.
- Obsidian Security. "Security for AI Agents: Protecting Intelligent Systems in 2025". Obsidian Security Blog, 2025.
- NIST. "NIST SP 1800-35: Implementing a Zero Trust Architecture". NIST Special Publication, 2024.
- Various. "A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes". arXiv, 2026.