Paper: "Towards a Science of AI Agent Reliability" (arXiv:2602.16666)
Authors: Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan (Princeton University)
Published: February 24, 2026
Synthesis Date: March 9, 2026
This paper establishes the first comprehensive framework for evaluating AI agent reliability as distinct from capability, revealing that 18 months of rapid accuracy progress (slope 0.21/year) have yielded minimal reliability gains (slope 0.03-0.10/year).
✅ Capability ≠ Reliability: Strong correlation between the two (r=0.82-0.92), yet reliability improves far more slowly (slope only 0.15)
⚠️ Consistency failures: Low pass∧k despite high pass@k — agents can solve tasks at least once, but not consistently across repeated attempts
📉 Prompt robustness varies: Semantically equivalent rephrasings of a prompt cause 20-40% accuracy drops
🎯 Discrimination stagnated: Models cannot distinguish tasks they will fail from tasks they will solve (low AUROC)
🔒 Safety floor effects: A residual 1-2% safety-violation rate persists even in the best models
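The pass@k vs. pass∧k gap above can be made concrete with a small sketch. This is not the paper's code: `pass_at_k` is the standard unbiased estimator popularized for code benchmarks, and `pass_all_k` is a naive estimate of pass∧k (all k attempts succeed) under an assumed independence of attempts with empirical success rate c/n.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: probability that at least one of k
    sampled attempts succeeds, given n total attempts with c successes."""
    if n - c < k:
        return 1.0  # too few failures to fill k samples: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_all_k(n: int, c: int, k: int) -> float:
    """Naive pass∧k estimate: probability that ALL k attempts succeed,
    assuming independent attempts with success rate c/n (an assumption)."""
    return (c / n) ** k

# A hypothetical agent that solves a task on 7 of 10 recorded attempts:
n, c, k = 10, 7, 5
print(round(pass_at_k(n, c, k), 3))   # → 1.0   (solving once is near-certain)
print(round(pass_all_k(n, c, k), 3))  # → 0.168 (solving every time is rare)
```

The gap between the two numbers is exactly the consistency failure the paper highlights: high pass@k can coexist with very low pass∧k.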
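The discrimination finding is measured with AUROC: how well a model's confidence score separates tasks it will solve from tasks it will fail. A minimal rank-based AUROC (the Mann-Whitney U formulation), with hypothetical scores and labels, illustrates the metric:

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC via Mann-Whitney U: the probability that a randomly chosen
    success (label 1) receives a higher score than a randomly chosen
    failure (label 0), counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-task confidences and success labels (1 = solved):
confidences = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
solved      = [1,   0,   1,   0,   1,   0]
print(round(auroc(confidences, solved), 3))  # → 0.778
```

An AUROC near 0.5 means confidence carries no signal about which tasks will fail; the paper's "stagnated" finding is that this number has barely improved alongside accuracy.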