OpenAI’s annual developer day took place Wednesday in San Francisco, with a raft of product and feature announcements. The event’s centerpiece was the company’s introduction of its real-time application programming interface (API).
The feature makes it possible for developers to send and receive spoken-language inputs and outputs during inference, that is, when making predictions with a production large language model (LLM). The hope is that this kind of interaction enables a more fluid, real-time conversation between a person and a language model.
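A session with the real-time API might look roughly like the following sketch. The WebSocket URL, headers, and event names are modeled on OpenAI's published Realtime API documentation but should be treated as illustrative assumptions, not authoritative; the `websockets` package is a third-party dependency.

```python
# Minimal sketch of a real-time conversation turn. Endpoint, headers,
# and event types are assumptions based on OpenAI's Realtime API docs.
import asyncio
import json
import os

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def build_response_request(instructions: str) -> str:
    """Build a 'response.create' event asking the model to reply in audio and text."""
    return json.dumps({
        "type": "response.create",
        "response": {
            "modalities": ["audio", "text"],
            "instructions": instructions,
        },
    })

async def converse() -> None:
    # Requires the third-party 'websockets' package and an API key.
    import websockets
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(REALTIME_URL, extra_headers=headers) as ws:
        await ws.send(build_response_request("Greet the user briefly."))
        async for raw in ws:
            event = json.loads(raw)  # each server message is a JSON event
            if event.get("type") == "response.done":
                break
```

The key design point is that the connection stays open for the whole conversation, so audio can stream in both directions instead of being batched into one request per turn.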
Also: OpenAI’s Altman sees ‘superintelligence’ just around the corner – but he’s short on details
This capability also comes at a hefty premium. OpenAI currently prices the GPT-4o large language model, which is the model that forms the basis for the real-time API, at $2.50 per million tokens of input text, and $10 per million output tokens.
The real-time API costs at least twice that rate, and it bills for both text and audio tokens, since a real-time session handles both kinds of input and output. Text input and output tokens for GPT-4o through the real-time API cost $5 and $20 per million tokens, respectively.
For voice tokens, the cost is a whopping $100 per million audio input tokens and $200 per million audio output tokens.
Also: How to use ChatGPT to optimize your resume
OpenAI notes that with standard statistics for voice conversations, the pricing of audio tokens “equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output.”
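Working backward from those prices, the per-minute figures imply roughly 600 audio input tokens and 1,200 audio output tokens per minute of speech. A minimal sketch of that arithmetic, with the tokens-per-minute rates treated as inferred assumptions rather than published constants:

```python
# Back-of-envelope check of the per-minute audio figures quoted above.
AUDIO_INPUT_PER_M = 100.0    # $ per million audio input tokens
AUDIO_OUTPUT_PER_M = 200.0   # $ per million audio output tokens

# Token rates inferred from the stated $0.06/min and $0.24/min figures.
INPUT_TOKENS_PER_MIN = 600
OUTPUT_TOKENS_PER_MIN = 1200

def cost_per_minute(price_per_million: float, tokens_per_min: int) -> float:
    """Dollar cost of one minute of audio at a given token rate."""
    return price_per_million * tokens_per_min / 1_000_000

print(cost_per_minute(AUDIO_INPUT_PER_M, INPUT_TOKENS_PER_MIN))   # 0.06
print(cost_per_minute(AUDIO_OUTPUT_PER_M, OUTPUT_TOKENS_PER_MIN)) # 0.24
```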
OpenAI gives examples of how real-time voice can be used in generative AI, including an automated health coach giving a person advice, and a language tutor that can engage in conversations with a student to practice a new language.
During the developer conference, OpenAI also offered developers a way to reduce total cost: prompt caching, which reuses tokens from inputs that have previously been submitted to the model. That approach cuts the price of cached GPT-4o input text tokens in half.
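The effect on a request's input bill can be sketched as follows, assuming cached input tokens are billed at exactly half the standard rate; the accounting of which tokens qualify as cached is simplified here and should be checked against OpenAI's pricing documentation.

```python
# Sketch of input cost with prompt caching, assuming cached tokens
# are billed at half the standard GPT-4o input rate.
GPT4O_INPUT_PER_M = 2.50  # $ per million input text tokens

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of a request's input, with cached tokens at half price."""
    fresh = total_tokens - cached_tokens
    cached = cached_tokens * GPT4O_INPUT_PER_M / 2
    return (fresh * GPT4O_INPUT_PER_M + cached) / 1_000_000

# A 10,000-token prompt where 8,000 tokens hit the cache:
print(round(input_cost(10_000, 8_000), 4))  # 0.015
```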
Also: OpenAI’s budget GPT-4o mini model is now cheaper to fine-tune, too
Also introduced Wednesday was LLM "distillation," which lets developers use the data from larger models to train smaller models.
A developer captures the input and output of one of OpenAI's more capable language models, such as GPT-4o, using the technique known as "stored completions." Those stored completions then become the training data to fine-tune a smaller model, such as GPT-4o mini.
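The capture-then-fine-tune loop described above can be sketched like this. The JSONL conversion helper is our own illustration; the commented SDK call at the bottom follows the shape of OpenAI's published fine-tuning API but should be verified against current documentation.

```python
# Sketch of distillation: captured GPT-4o exchanges become fine-tuning
# data for a smaller model. The record format mirrors OpenAI's chat
# fine-tuning JSONL shape; treat it as an assumption.
import json

def to_training_example(prompt: str, completion: str) -> str:
    """Format one captured large-model exchange as a chat fine-tuning record."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    })

def write_training_file(pairs, path="distill.jsonl"):
    """Write (prompt, completion) pairs as one JSONL record per line."""
    with open(path, "w") as f:
        for prompt, completion in pairs:
            f.write(to_training_example(prompt, completion) + "\n")

# With the file uploaded, a fine-tuning job on the smaller model would
# look roughly like (hypothetical call shape):
#   client.fine_tuning.jobs.create(
#       training_file=uploaded_file.id,
#       model="gpt-4o-mini",
#   )
```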
OpenAI bills the distillation service as a way to eliminate a lot of iterative work required by developers to train smaller models from larger models.
“Until now, distillation has been a multi-step, error-prone process,” says the company’s blog on the matter, “which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements.”
Also: Businesses can reach decision dominance using AI. Here’s how
Distillation comes in addition to OpenAI’s existing fine-tuning service; the difference is that the larger model’s input-output pairs serve as the fine-tuning data. To the fine-tuning service, the company on Wednesday added image fine-tuning. A developer submits a dataset of images, just as they would with text, to make an existing model, such as GPT-4o, more specific to a task or a domain of knowledge.
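A single training record for image fine-tuning might look like the sketch below, modeled on OpenAI's chat fine-tuning JSONL format with `image_url` content parts; the field names and the example URL are assumptions to be checked against current documentation.

```python
# Hypothetical shape of one image fine-tuning training record: an
# image plus a question, paired with the desired answer.
import json

def image_training_record(image_url: str, question: str, answer: str) -> str:
    """One JSONL line pairing an image-plus-question with the target answer."""
    return json.dumps({
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
            {"role": "assistant", "content": answer},
        ]
    })
```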
An example in practice is work by food delivery service Grab. The company uses real-world images of street signs to have GPT-4o perform mapping of the company’s delivery routes. “Grab was able to improve lane count accuracy by 20% and speed limit sign localization by 13% over a base GPT-4o model, enabling them to better automate their mapping operations from a previously manual process,” states OpenAI.
Pricing is based on chopping up each image a developer submits into tokens, which are then priced at $3.75 per million input tokens and $15 per million output tokens, the same as standard fine-tuning. Training on image tokens costs $25 per million tokens.
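A rough training-cost estimate follows from those numbers. The tokens-per-image figure below is an assumption for illustration only; actual token counts depend on image size and how OpenAI tiles images into tokens.

```python
# Illustrative image fine-tuning training-cost estimate.
TRAINING_PER_M = 25.0  # $ per million training tokens

def fine_tune_cost(images: int, tokens_per_image: int, epochs: int = 1) -> float:
    """Training cost in dollars for a dataset of images."""
    total_tokens = images * tokens_per_image * epochs
    return total_tokens * TRAINING_PER_M / 1_000_000

# e.g. 1,000 images at an assumed ~1,000 tokens each, one epoch:
print(round(fine_tune_cost(1000, 1000), 2))  # 25.0
```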