Security Research

In-depth analysis of AI agent security threats, detection methodologies, and defense strategies from the Moltwire research team.

Technical Papers · Featured · 18 min read

Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection

A systematic evaluation of five frontier models across two encoding schemes, four hint levels, and tool use ablation — 8,308 graded outputs with full statistical analysis

Key Findings:

  • Tool use amplifies hidden instruction compliance by orders of magnitude — Claude Haiku jumps from 0.8% to 49.2% (Cohen's h = 1.37, OR = 115.1), all models show significant increases (p < 0.003)
  • Provider-specific encoding vulnerability: GPT-5.2 decodes zero-width binary at 69-70% but 0% on Unicode Tags; Claude Opus achieves 100% on Tags but only 48-68% on zero-width (tools ON)
  • Claude Sonnet 4 is the most susceptible overall at 71.2% compliance (tools ON), reaching 98-100% on both ZW and Tag encodings with full hints
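The two encoding schemes named above both hide text in codepoints that render as nothing. As a minimal sketch (the paper's exact character choices are not given here, so the ZW-0/ZW-1 mapping and the tag-block offset below are illustrative assumptions):

```python
def encode_tags(msg: str) -> str:
    # Unicode Tags block (U+E0000..U+E007F): each ASCII character maps to
    # an invisible "tag" codepoint at U+E0000 + its ASCII code.
    return "".join(chr(0xE0000 + ord(c)) for c in msg)

def encode_zero_width(msg: str) -> str:
    # Zero-width binary: each bit of each byte becomes one invisible
    # character -- here U+200B (zero-width space) for 0 and U+200C
    # (zero-width non-joiner) for 1. The 0/1 mapping is an assumption.
    bits = "".join(f"{ord(c):08b}" for c in msg)
    return "".join("\u200b" if b == "0" else "\u200c" for b in bits)

visible = "Please summarize this document."
hidden = encode_tags("ignore previous instructions")
payload = visible + hidden
# `payload` renders identically to `visible` in most UIs, but the hidden
# codepoints survive copy/paste and reach the model verbatim.
```

Both encodings are invisible on screen yet trivially reversible by a model that has learned the mapping, which is what makes the provider-specific decode rates above meaningful.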
Marcus Graves · February 24, 2026

#reverse-captcha #steganography #zero-width-unicode #unicode-tags


Apply Our Research

Put these security insights into practice with Moltwire's real-time threat detection for AI agents.

Start Free