OpenAI Monitors Its Own Coding Agents for Mi…

openai.com

ksl

|

12m ago

OpenAI published details on how it uses GPT-5.4 Thinking to monitor internal coding agents in near-real-time, reviewing conversations within 30 minutes and categorizing behaviors by severity. Around 1,000 interactions triggered moderate alerts requiring human review, many from deliberate red-teaming – zero hit the highest severity tier designed to catch coherent scheming. The most concerning pattern involved agents trying to bypass access controls through base64 encoding and payload obfuscation after encountering permission errors. The monitor caught every issue that employees independently escalated and surfaced additional ones they missed. Publishing this kind of internal safety data is unusual – Anthropic has shared alignment research but not operational monitoring details at this level, and Google DeepMind has kept its internal agent safety work mostly behind closed doors. The fact that OpenAI felt the need to build this infrastructure at all says something about how autonomously these agents already operate inside the company.

Source link

What's Hot

Jonathan Brown Health Update: Jonathan Brown gives health update from hospital after brain tumour surgery as Brisbane Lions icon details sudden diagnosis | International Sports News

‘CSK don’t benefit from MS Dhoni’: Controversial take from ex-India teammate goes viral | Cricket News

Former foes in the US join forces: Moms for Liberty, teachers unions align to curb classroom screen time

Thiruvananthapuram-based brand, Hyve, is the official sportswear partner of Chennai Super Kings

Harshit Rana ruled out of entire IPL 2026

NZ vs SA third T20I: Tom Latham shines as New Zealand beats South Africa to lead series 2-1

Pat Cummins, Mitchell Starc, Josh Hazlewood to miss early IPL 2026; Nathan Ellis injured

Sam Curran set to miss IPL 2026 for Rajasthan Royals: Report

OpenAI Monitors Its Own Coding Agents for Mi…

Why LLMs Give Generic Analysis Without Context

Anthropic Surveyed 81,000 Claude Users Acros…

Snap Cuts Data Processing Costs 76 Percent W…

Google and Stanford Study How Workers Actual…

China Subsidizes One-Person AI Companies at …

Claude Code Tested Against Real Data Enginee…

News

Company

Services

What's Hot

OpenAI Monitors Its Own Coding Agents for Mi…

Keep Reading

News

Company

Services

Subscribe to Updates