anthropic.com
Anthropic published a large-scale study of real agent behavior drawn from millions of interactions across Claude Code and its public API. The standout finding: Claude stops to ask for human input more than twice as often as users actually interrupt it, which flips the usual framing around AI oversight. Experienced users auto-approve in over 40% of sessions, compared to 20% among newcomers, and the longest autonomous runs nearly doubled to 45-plus minutes in just three months. Only 0.8% of tool calls involved irreversible actions like sending customer emails. The research makes a quiet but pointed case that pre-deployment safety testing alone misses the dynamics that matter, a position that puts Anthropic ahead of OpenAI and Google in building measurement infrastructure for agentic deployment at scale.
