developers.openai.com | ksl
OpenAI published a detailed prompting guide for gpt-realtime, its speech-to-speech model now in general availability. The techniques differ substantially from text prompting: pacing instructions control speaking style independently of the speed parameter, phonetic hints fix brand-name pronunciation, and character-by-character output prevents garbled alphanumeric sequences such as phone numbers. Two conversation flow patterns stand out: a state machine approach with JSON-encoded transitions, and dynamic session updates that narrow the available tools as the dialogue progresses. The guide also introduces a supervisor pattern in which a text model handles planning while the realtime model rephrases its output for natural speech, effectively splitting reasoning from delivery. Voice agent builders working with ElevenLabs, Vapi, or Retell will find familiar problems here, but the solutions are tightly coupled to OpenAI’s own tooling.
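
To make the two conversation flow patterns concrete, here is a rough sketch of how they might fit together: a JSON-encoded state table drives transitions, and each state yields a session update that exposes only the tools relevant at that point. The state names, events, tool names, and helper functions are all hypothetical, and the payload shape is only modeled on the Realtime API's `session.update` event; the current API reference may require other fields, so treat this as an illustration and not as the guide's own code.

```python
import json

# Hypothetical state table: every state name, event, and tool name is illustrative.
STATES = json.loads("""
{
  "greeting": {
    "instructions": "Greet the caller and ask what they need.",
    "tools": [],
    "transitions": {"intent_captured": "verify_identity"}
  },
  "verify_identity": {
    "instructions": "Ask for the phone number and read it back digit by digit.",
    "tools": ["lookup_account"],
    "transitions": {"verified": "resolve", "failed": "greeting"}
  },
  "resolve": {
    "instructions": "Resolve the request using the account tools.",
    "tools": ["lookup_account", "create_ticket"],
    "transitions": {}
  }
}
""")


def tool_spec(name: str) -> dict:
    """Build a placeholder function-tool definition (schema is an assumption)."""
    return {
        "type": "function",
        "name": name,
        "description": f"Hypothetical tool: {name}",
        "parameters": {"type": "object", "properties": {}},
    }


def next_state(current: str, event: str) -> str:
    """Follow a transition if one is defined; otherwise stay in the current state."""
    return STATES[current]["transitions"].get(event, current)


def session_update(state: str) -> dict:
    """Build an assumed session.update payload limited to this state's tools."""
    spec = STATES[state]
    return {
        "type": "session.update",
        "session": {
            "instructions": spec["instructions"],
            "tools": [tool_spec(name) for name in spec["tools"]],
        },
    }


if __name__ == "__main__":
    state = next_state("greeting", "intent_captured")
    # In a real agent this JSON would be sent over the realtime connection.
    print(json.dumps(session_update(state), indent=2))
```

Keeping the table as data means the same structure can be pasted into the prompt for the model to follow and used by the application to decide which tools each session update should carry.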
