Behavioral Detection for AI Agents: Research & Approaches
A summary of research on detecting compromised agents through behavioral analysis
Key Findings
1. Most current prompt-based defenses suffer from high Attack Success Rates, demonstrating limited robustness against sophisticated injection attacks (arXiv research)
2. AI agents move 16x more data than human users, requiring behavioral baselines to detect anomalous access patterns (Obsidian Security)
3. SentinelAgent framework combines rule-based classification with LLM-based semantic reasoning to detect multi-agent attack patterns
4. Behavioral analytics can detect attacks invisible to traditional tools by establishing dynamic baselines for users, devices, and applications
5. Multi-agent systems require special attention because a compromised agent can trigger harmful actions across shared state and privileges
Why Behavioral Detection
Input-based security—scanning prompts for known attack patterns—faces fundamental limitations when applied to AI agents. Research demonstrates that most current prompt-based defenses suffer from high Attack Success Rates (ASR), showing limited robustness against sophisticated injection attacks.
Several factors make input-based detection insufficient:
Semantic Variability: The same attack can be phrased in effectively unlimited ways. Unlike SQL injection, where special characters can be escaped, natural language has no special characters to escape.
Adaptive Attacks: Research shows that attackers can craft perturbations or split adversarial strings across multiple input fields, achieving attack success rates above 50% even against state-of-the-art defenses.
Indirect Vectors: Malicious instructions embedded in external content bypass input scanning entirely. The agent processes the attack as part of legitimate data retrieval.
Behavioral detection takes a different approach: rather than enumerating all possible attacks, it characterizes normal agent behavior and detects deviations. This catches novel attacks that input filtering misses because attacks must cause the agent to behave differently to achieve their goals.
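The core idea can be sketched in a few lines: build a distribution over the agent's historical actions, then score recent activity by how surprising it is under that distribution. This is a minimal illustration, not any specific framework's method; the action names and the floor probability for unseen actions are assumptions.

```python
from collections import Counter
import math

def build_baseline(action_logs):
    """Build a probability distribution over action types from historical logs."""
    counts = Counter(action_logs)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def deviation_score(baseline, recent_actions, floor=1e-6):
    """Average surprisal of recent actions under the baseline distribution.
    Unseen actions get a small floor probability, so novel behavior
    (e.g., a tool the agent has never called) scores very high."""
    return sum(-math.log(baseline.get(a, floor)) for a in recent_actions) / len(recent_actions)

# Example: an agent that normally reads and searches documents suddenly
# starts sending data outward -- behavior it has never exhibited before.
baseline = build_baseline(["read_doc"] * 90 + ["search"] * 10)
normal = deviation_score(baseline, ["read_doc", "search", "read_doc"])
anomalous = deviation_score(baseline, ["send_email", "upload_file", "send_email"])
```

Note that the attack's *content* is never inspected: the injected prompt could be phrased any way at all, but to exfiltrate data the agent must call tools it does not normally call, and that is what the score captures.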
Research Approaches to Behavioral Detection
Several research frameworks have emerged for behavioral detection in AI agent systems:
User and Entity Behavior Analytics (UEBA): AI's greatest defensive strength is its ability to learn what "normal" looks like across complex digital environments. By analyzing massive volumes of data, AI builds dynamic behavioral baselines for every user, device, and application. This UEBA capability allows security systems to detect novel, zero-day, and polymorphic attacks invisible to traditional tools.
SentinelAgent Framework: SentinelAgent combines rule-based classification with LLM-based semantic reasoning for behavior analysis on collected telemetry data. The framework enables detection across multiple granularities—from individual agent misbehavior to complex multi-agent attack patterns. It successfully detected sophisticated attacks including prompt injection propagation, unauthorized tool usage, and multi-agent collusion scenarios.
Multi-Agent Monitoring: Research from ACM Computing Surveys shows that when agents share memory, databases, execution privileges, or delegated tasks, a single compromised agent can repeatedly trigger harmful actions across the system. Behavioral monitoring must account for this emergent collusion that arises from shared state rather than explicit coordination.
Non-Human Identity Analytics: As organizations deploy more AI agents, security teams must extend anomaly detection to non-human identities. This includes monitoring for unusual data access patterns, querying records outside typical scope, or accessing sensitive data at unusual times.
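The non-human identity checks above (out-of-scope queries, off-hours access) can be expressed as simple per-identity rules. This is a hedged sketch: the profile fields, agent name, and table names are illustrative assumptions, not a real product's schema.

```python
from datetime import datetime

# Hypothetical per-identity profile; field names are illustrative assumptions.
AGENT_PROFILES = {
    "invoice-agent": {
        "allowed_tables": {"invoices", "vendors"},
        "active_hours": range(8, 19),  # typical activity window, local time
    }
}

def check_access(identity, table, timestamp):
    """Return anomaly flags for one data-access event by a non-human identity."""
    profile = AGENT_PROFILES[identity]
    flags = []
    if table not in profile["allowed_tables"]:
        flags.append("table_outside_typical_scope")
    if timestamp.hour not in profile["active_hours"]:
        flags.append("access_at_unusual_time")
    return flags

# A 3 a.m. query against an HR table trips both checks.
flags = check_access("invoice-agent", "employee_salaries", datetime(2025, 6, 1, 3, 0))
```

In practice the scope and hours would be learned from the identity's own history rather than hand-written, but the evaluation step looks the same.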
The Scale Challenge
AI agents create unprecedented data movement that requires new monitoring approaches:
Data Volume: According to Obsidian Security research, AI agents move 16x more data than human users. This expanded attack surface cannot be manually monitored.
Workforce Limitations: Security operations face workforce shortages approaching four million professionals worldwide. Automated behavioral detection is necessary given the scale of agent deployments.
Market Response: Global AI-in-cybersecurity spending is expected to grow from $24.8B in 2024 toward $146.5B by 2034, reflecting the scale of investment required.
Enterprise Adoption: More than 60% of large enterprises deployed autonomous AI agents in production by 2025, yet legacy IAM tools remain inadequate for securing entities with non-deterministic behaviors.
Implementation Considerations
Based on research and industry implementations, several practical considerations emerge:
Baseline Establishment: Behavioral baselines must account for the non-deterministic nature of AI agents; traditional IAM tools designed for predictable, deterministic workloads are inadequate for this purpose.
Detection Dimensions: Modern security platforms establish baselines for normal agent behavior across SaaS applications, then flag deviations such as anomalous data access, atypical query scope, or off-hours activity.
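One way to picture a multi-dimensional baseline is as a small record per agent identity, checked dimension by dimension. The specific dimensions and thresholds here are assumptions chosen for illustration, not a standard.

```python
from dataclasses import dataclass

@dataclass
class AgentBaseline:
    """Illustrative behavioral dimensions for one agent identity."""
    mean_requests_per_hour: float
    typical_tools: frozenset          # tools the agent normally invokes
    typical_data_scopes: frozenset    # datasets it normally touches
    max_observed_payload_kb: float    # largest normal data transfer

def deviations(baseline, requests_per_hour, tool, scope, payload_kb):
    """Compare one observation window against the baseline; return flagged dimensions."""
    flagged = []
    if requests_per_hour > 3 * baseline.mean_requests_per_hour:
        flagged.append("request_rate_spike")
    if tool not in baseline.typical_tools:
        flagged.append("novel_tool")
    if scope not in baseline.typical_data_scopes:
        flagged.append("novel_data_scope")
    if payload_kb > 2 * baseline.max_observed_payload_kb:
        flagged.append("oversized_transfer")
    return flagged

b = AgentBaseline(20.0, frozenset({"search", "summarize"}),
                  frozenset({"kb_articles"}), 64.0)
```

Checking several independent dimensions matters because an attack may look normal on most of them (same tools, same rate) while deviating on one (a sudden oversized transfer).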
Regulatory Alignment: NIST SP 1800-35, published in November 2024, provides guidance on implementing Zero Trust Architecture with Enhanced Identity Governance and continuous authentication for all workload identities. Federal agencies face a 2026 implementation deadline.
Real-Time Requirements: AI agent security demands continuous behavioral monitoring to detect threats that evade signature-based defenses. Batch analysis is insufficient for agents that can take immediate real-world actions.
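The real-time requirement implies scoring each action *before* it executes rather than in a later batch job, since a harmful action cannot be recalled once taken. A minimal pre-action gate, with a toy scorer (the function names and threshold are assumptions):

```python
def guarded_execute(action, score_fn, execute_fn, threshold=0.9):
    """Score the action before execution; hold it for review if too anomalous."""
    score = score_fn(action)
    if score > threshold:
        return {"status": "held_for_review", "score": score}
    return {"status": "executed", "result": execute_fn(action)}

# Toy scorer: anything sending data to an external destination is suspicious.
score = lambda a: 1.0 if a.get("destination") == "external" else 0.1
run = lambda a: f"ran {a['name']}"
```

The same scoring logic run in a nightly batch would only tell you the exfiltration already happened; inline, it can stop it.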
Limitations and Open Questions
Research identifies several limitations and open questions:
Baseline Adaptation: Agent behavior legitimately changes over time. Baselines must adapt while remaining sensitive to genuine anomalies. The research community has not yet established best practices for this balance.
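One common (though, as noted, not yet standardized) way to let a baseline drift with legitimate change while still catching sharp deviations is an exponentially weighted moving average; the alpha value below is an illustrative assumption controlling adaptation speed.

```python
class AdaptiveBaseline:
    """EWMA baseline: adapts slowly to gradual change, flags sudden jumps."""
    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.mean = None

    def update_and_score(self, value):
        if self.mean is None:
            self.mean = value
            return 0.0
        score = abs(value - self.mean) / max(self.mean, 1e-9)
        # Adapt the baseline *after* scoring, so the anomaly is judged
        # against behavior up to, but not including, this observation.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * value
        return score

b = AdaptiveBaseline()
for v in [100, 102, 98, 101]:      # steady behavior: low scores
    steady = b.update_and_score(v)
spike = b.update_and_score(500)    # sudden 5x spike: high score
```

The open problem the text describes is visible in the single alpha parameter: set it high and attackers can "boil the frog" by shifting behavior gradually; set it low and legitimate drift generates false positives.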
False Positive Management: Behavioral anomalies are not always attacks. Organizations must tune detection thresholds to minimize alert fatigue while maintaining security coverage.
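Threshold tuning can be tied directly to an alert budget: pick the cutoff so that, on historical (mostly benign) scores, only an affordable number of events would have alerted. A sketch using a simple nearest-rank quantile; the budget of 5 alerts per 1,000 events is an illustrative assumption.

```python
def threshold_for_alert_budget(historical_scores, alerts_per_1000=5):
    """Choose a threshold so roughly `alerts_per_1000` historical events
    would have exceeded it, tying detection to an operational alert budget."""
    quantile = 1 - alerts_per_1000 / 1000
    ranked = sorted(historical_scores)
    idx = min(int(quantile * len(ranked)), len(ranked) - 1)  # nearest rank
    return ranked[idx]

scores = [i / 1000 for i in range(1000)]   # uniform dummy scores in [0, 1)
t = threshold_for_alert_budget(scores, alerts_per_1000=5)
```

Framing the threshold as an alert budget makes the fatigue trade-off explicit: lowering the budget raises the threshold and necessarily lets more marginal anomalies through.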
Explainability: When behavioral detection triggers an alert, security teams need to understand why. Research on explainable anomaly detection for AI agents is still emerging.
Multi-Agent Complexity: As noted in ACM Computing Surveys research, multi-agent systems exhibit emergent behaviors that complicate baseline establishment. A behavior that is anomalous for a single agent may be normal when agents collaborate.
Defense Verification: While some research shows promising results (e.g., firewall approaches achieving 0% ASR on benchmarks), translating benchmark performance to real-world deployments remains challenging.
References
- Various. "Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?". arXiv, 2025.
- Various. "AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways". ACM Computing Surveys, 2025.
- Obsidian Security. "Security for AI Agents: Protecting Intelligent Systems in 2025". Obsidian Security Blog, 2025.
- NIST. "NIST SP 1800-35: Implementing a Zero Trust Architecture". NIST Special Publication, 2024.
- Various. "A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes". arXiv, 2026.