Technical PapersFeatured18 min read
Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection
A systematic evaluation of five frontier models across two encoding schemes, four hint levels, and tool use ablation — 8,308 graded outputs with full statistical analysis
Key Findings:
- •Tool use amplifies hidden instruction compliance by orders of magnitude — Claude Haiku jumps from 0.8% to 49.2% (Cohen's h = 1.37, OR = 115.1), all models show significant increases (p < 0.003)
- •Provider-specific encoding vulnerability: GPT-5.2 decodes zero-width binary at 69-70% but 0% on Unicode Tags; Claude Opus achieves 100% on Tags but only 48-68% on zero-width (tools ON)
- •Claude Sonnet 4 is the most susceptible overall at 71.2% compliance (tools ON), reaching 98-100% on both ZW and Tag encodings with full hints
Marcus Graves·February 24, 2026
#reverse-captcha#steganography#zero-width-unicode#unicode-tags