Anthropic updated its Skill Creator with a built-in evaluation framework that lets authors test agent skills without writing code: define prompts, set success criteria, get pass/fail results. A new benchmarking mode tracks pass rates, execution time, and token usage, and the whole thing plugs into CI pipelines. The more interesting piece is multi-agent testing: independent agents run evals in parallel with clean contexts, while comparator agents handle A/B tests between skill versions. There's also a description optimizer that analyzes how a skill triggers against sample prompts, splitting them into train/test sets and iterating up to five times to reduce misfires. Anthropic is steadily building the tooling layer that turns prompt engineering from a craft into something closer to software QA, a pattern OpenAI and Google have been much slower to formalize.
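To make the "define prompts, set success criteria, get pass/fail results" loop concrete, here is a minimal sketch of that pattern in Python. All names here (`EvalCase`, `run_evals`, `demo_skill`) are hypothetical illustrations, not Anthropic's actual Skill Creator API, and the stand-in skill is a trivial function rather than a real agent call:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    # Hypothetical schema: a prompt plus a success criterion on the output.
    prompt: str
    check: Callable[[str], bool]

def run_evals(skill: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run each case against the skill, print pass/fail, return the pass rate."""
    passed = 0
    for case in cases:
        output = skill(case.prompt)
        ok = case.check(output)
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {case.prompt!r}")
    return passed / len(cases)

# Stand-in "skill": uppercases its input. A real harness would invoke an agent here.
def demo_skill(prompt: str) -> str:
    return prompt.upper()

cases = [
    EvalCase("hello", lambda out: out == "HELLO"),
    EvalCase("world", lambda out: out.startswith("W")),
]
rate = run_evals(demo_skill, cases)
print(f"pass rate: {rate:.0%}")
```

A benchmarking mode like the one described would wrap this loop with timing and token accounting per case; the pass-rate number is what a CI pipeline would gate on.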
