What happened
OpenAI released three new AI voice models through its Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 offers GPT-5-class reasoning for complex voice interactions. GPT-Realtime-Translate provides live speech translation from more than 70 input languages into 13 output languages. GPT-Realtime-Whisper delivers low-latency streaming speech-to-text transcription. GPT-Realtime-2 costs $32 per million input tokens and $64 per million output tokens; GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
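To put the quoted prices in concrete terms, here is a minimal cost-estimate sketch. The rates are the ones stated above; the function names and the sample workload (token counts and minutes) are hypothetical, chosen only for illustration, and the sketch ignores any discounts, caching, or other billing details the article does not cover.

```python
# Rates as quoted in the article (USD). Everything else is an assumption.
REALTIME2_INPUT_PER_M_TOKENS = 32.0    # per million input tokens
REALTIME2_OUTPUT_PER_M_TOKENS = 64.0   # per million output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-based cost for GPT-Realtime-2 at the quoted rates."""
    return (
        input_tokens / 1_000_000 * REALTIME2_INPUT_PER_M_TOKENS
        + output_tokens / 1_000_000 * REALTIME2_OUTPUT_PER_M_TOKENS
    )


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Duration-based cost for the Translate and Whisper models."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical monthly workload, for illustration only.
    print(f"GPT-Realtime-2: ${realtime2_cost(500_000, 100_000):.2f}")
    print(f"Translate:      ${per_minute_cost(1_000, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper:        ${per_minute_cost(1_000, WHISPER_PER_MINUTE):.2f}")
```

Because GPT-Realtime-2 is priced per token and the other two models per minute, a comparison against an existing stack needs both a token estimate and a duration estimate for the same workload.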
Why it matters
Developers get reasoning, translation, and transcription from a single API, which reduces the work of stitching together separate components for real-time voice applications. Translation and transcription are billed per minute, while GPT-Realtime-2 is billed per token, so the three models follow two different cost structures. Platform engineers can now deploy more capable voice agents with extended conversational context, while procurement teams will need to compare these prices against their current multi-vendor stacks. The release follows OpenAI's recent GPT-5.5 models and continues a rapid pace of developer-facing releases.




