OpenAI releases real-time voice translation

Unusual Whales

13 May 2024 • 1 min read

OpenAI is introducing GPT-4o, a new version of its GPT-4 model, which powers ChatGPT. According to OpenAI CTO Mira Murati, the updated model is significantly faster and enhances capabilities across text, vision, and audio. Murati announced this in a livestream on Monday, stating that the model will be freely available to all users. Paid users will continue to enjoy up to five times the capacity limits of free users, Murati added.

In a blog post, OpenAI mentioned that GPT-4o's capabilities will be rolled out iteratively, with its text and image capabilities beginning to roll out in ChatGPT from today.

OpenAI CEO Sam Altman explained that the model is "natively multimodal," meaning it can generate content or understand commands in voice, text, or images. Developers will have access to the API, which Altman stated is half the price and twice as fast as GPT-4 Turbo, on X.

New features are being introduced to ChatGPT's voice mode as part of the new model. The app will now function as a Her-like voice assistant, providing real-time responses and observing the user's surroundings. Altman noted that the current voice mode is more limited, only responding to one prompt at a time and utilizing only audio input.

Altman reflected on OpenAI's evolution in a blog post after the livestream event. He mentioned that while the company's original vision was to create various benefits for the world, this vision has shifted. OpenAI has faced criticism for not open-sourcing its advanced AI models, and Altman seems to suggest that the focus has changed to making these models available to developers through paid APIs, allowing third parties to create using the technology. "Instead, it now looks like we'll create AI, and then other people will use it to create all sorts of amazing things that we all benefit from."

Before the GPT-4o launch, there were conflicting reports suggesting that OpenAI was unveiling an AI search engine to rival Google and Perplexity, a voice assistant integrated into GPT-4, or a completely new and improved model, GPT-5. OpenAI timed this launch just before Google I/O, the tech giant's flagship conference, where several AI products from the Gemini team are expected to be launched.

Sign up for more like this.