Security Research

In-depth analysis of AI agent security threats, detection methodologies, and defense strategies from the Moltwire research team.

Technical Papers · Featured · 18 min read

Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection

A systematic evaluation of five frontier models across two encoding schemes, four hint levels, and tool use ablation — 8,308 graded outputs with full statistical analysis

Key Findings:

  • Tool use amplifies hidden instruction compliance by orders of magnitude — Claude Haiku jumps from 0.8% to 49.2% (Cohen's h = 1.37, OR = 115.1), all models show significant increases (p < 0.003)
  • Provider-specific encoding vulnerability: GPT-5.2 decodes zero-width binary at 69-70% but 0% on Unicode Tags; Claude Opus achieves 100% on Tags but only 48-68% on zero-width (tools ON)
  • Claude Sonnet 4 is the most susceptible overall at 71.2% compliance (tools ON), reaching 98-100% on both ZW and Tag encodings with full hints
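The two encoding schemes named above both hide text in codepoints that render as nothing. As a minimal sketch (the paper's exact character choices are not given here, so the ZW-0/ZW-1 mapping and the tag-block offset below are illustrative assumptions):

```python
def encode_tags(msg: str) -> str:
    # Unicode Tags block (U+E0000..U+E007F): each ASCII character maps to
    # an invisible "tag" codepoint at U+E0000 + its ASCII code.
    return "".join(chr(0xE0000 + ord(c)) for c in msg)

def encode_zero_width(msg: str) -> str:
    # Zero-width binary: each bit of each byte becomes one invisible
    # character -- here U+200B (zero-width space) for 0 and U+200C
    # (zero-width non-joiner) for 1. The 0/1 mapping is an assumption.
    bits = "".join(f"{ord(c):08b}" for c in msg)
    return "".join("\u200b" if b == "0" else "\u200c" for b in bits)

visible = "Please summarize this document."
hidden = encode_tags("ignore previous instructions")
payload = visible + hidden
# `payload` renders identically to `visible` in most UIs, but the hidden
# codepoints survive copy/paste and reach the model verbatim.
```

Both encodings are invisible on screen yet trivially reversible by a model that has learned the mapping, which is what makes the provider-specific decode rates above meaningful.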
Marcus Graves · February 24, 2026

#reverse-captcha #steganography #zero-width-unicode #unicode-tags


Apply Our Research

Put these security insights into practice with Moltwire's real-time threat detection for AI agents.

Start Free