testingcatalog.com | ksl
A leaked draft revealed that Anthropic has a model called Mythos – also referenced as Capybara – sitting above Opus in the Claude hierarchy, with dramatically higher scores on coding, reasoning, and cybersecurity benchmarks. The problem: internal evaluations flagged it as posing unprecedented cybersecurity risks, with capabilities that could outpace defensive countermeasures. Anthropic hasn’t released it publicly, citing high serving costs and the need for efficiency work, while granting early access to cybersecurity-focused organizations so defenders can prepare. The decision to hold a model back over offensive cyber capability is still rare, though both OpenAI and Google DeepMind have flagged similar concerns in their own frontier safety reports. Anthropic’s track record of publishing capability thresholds before release makes this disclosure more credible than most pre-launch leaks.
