OpenAI's Models for Voice Agents | Weekly Edition

PLUS HOT AI Tools & Tutorials

Mar 21, 2025

Hello and welcome to our weekly roundup!

This week, the AI industry was a bit less combative than last week, and developers are back to releasing new models.

Today, we have new tools for building voice agents, a major upgrade to Google Gemini, and a bunch of other updates. Let's discuss.

In case you missed it, YC recently held a Demo Day for its W25 batch. We checked all the AI startups and picked a few that are building handy tools for creators. See our post "YC W25 Startups We Started Using" (Stepan Ikaev and Daniil Andreev, March 18, 2025).

This Creators’ AI Edition:

Featured Materials 🎟️
News of the week 🌍
Useful tools ⚒️
Weekly Guides 📕
AI Meme of the Week 🤡
AI Tweet of the Week 🐦
(Bonus) Materials 🎁

Featured Material 🎟️

Audio Models by OpenAI

OpenAI continues to expand its toolkit for developers who want to build agents. Last week, the company showed a few helpful agent-building solutions, and now it has moved on to something more eye-catching: its latest audio models, designed for building and improving voice agents.

The release includes new speech-to-text and text-to-speech models, now available through the OpenAI API. Here’s what you need to know.

Improvements in Speech-to-Text

The new gpt-4o-transcribe and gpt-4o-mini-transcribe models are more accurate than the previous Whisper models, with lower word error rates. OpenAI attributes the improvements to advances in reinforcement learning and the use of diverse audio datasets.
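
For readers unfamiliar with the metric, word error rate (WER) is commonly defined as the number of word-level substitutions, insertions, and deletions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not OpenAI's evaluation code. The function name and the simple lowercase-and-split normalization are our own assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Normalization here is only lowercasing and whitespace splitting,
    which is an assumption made for this illustration.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

In the usage line, one word out of six is substituted, so the WER is 1/6, or roughly 0.17. Lower is better, and a perfect transcript scores 0.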

Enhanced Text-to-Speech Options

The company has also introduced the gpt-4o-mini-tts model, which lets developers specify how speech should be delivered, not just what is said. This makes it possible to tailor the voice's delivery to applications like customer service or creative projects.

Currently, the text-to-speech models are limited to preset voices.
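
To make the "how it should be said" idea concrete, here is a rough sketch of what a request body for the new text-to-speech model might look like. It only builds and prints the JSON payload and does not call the API. The field names (model, voice, input, instructions) and the "coral" voice reflect our understanding of OpenAI's speech endpoint and should be treated as assumptions, so check the official API reference before relying on them. The helper name build_speech_request is hypothetical.

```python
import json


def build_speech_request(
    text: str,
    instructions: str,
    voice: str = "coral",            # assumed preset voice name
    model: str = "gpt-4o-mini-tts",  # model name from the announcement
) -> dict:
    """Build a JSON-serializable body for a text-to-speech request.

    Hypothetical helper: the field names are assumptions based on our
    reading of the OpenAI speech endpoint, not a verified schema.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": model,
        "voice": voice,                # one of the preset voices
        "input": text,                 # what should be said
        "instructions": instructions,  # how it should be said
    }


payload = build_speech_request(
    text="Your order has shipped and should arrive on Friday.",
    instructions="Speak like a calm, friendly customer service agent.",
)

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```

The point of the sketch is the split between the text and the delivery instructions: the same sentence can be paired with different instructions to get a different tone from the same preset voice.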

The audio models rely on GPT‑4o and GPT‑4o-mini architectures, pre-trained with audio-focused datasets. OpenAI has refined distillation techniques to transfer knowledge from larger models to smaller ones and implemented reinforcement learning methods to boost transcription accuracy.

Availability

You can now access these models through the OpenAI API. OpenAI plans to expand customization options for synthetic voices while maintaining safety standards.

It also promises to collaborate with policymakers, researchers, and developers to address the opportunities and challenges posed by synthetic audio technology.

News Of The Week 🌍

Google Unveils Canvas, Deep Research, and Audio Overview

This week, Google introduced three new services for Gemini: Canvas, Deep Research, and Audio Overview. Canvas provides a clean interface for writing and coding. Deep Research delivers a toolset for data analysis and scientific inquiry. Audio Overview lets you turn your content into a podcast with two hosts.

The recent Gemini updates don't end there. The company also showed an updated Image Editor with plenty of new features. Learn more in our post "Image Gen, Editing & Design Tools" (Stepan Ikaev, March 20, 2025).

Apple Sued for False Advertising Over Apple Intelligence

The lawsuit, filed Wednesday in U.S. District Court in San Jose, seeks class-action status and damages for those who purchased iPhones and other Apple Intelligence-enabled devices. The plaintiffs allege that owners of the devices did not receive promised AI features. Apple has not yet commented.

The news is quite a blow to the company, considering Apple recently delayed the update. In addition, earlier this week, Bloomberg reported that Tim Cook has “lost confidence” in the current AI head, John Giannandrea, “to execute on product development.”

Claude Gains Web Search Capability

Anthropic has unveiled a web search feature for its Claude chatbot. The company's AI finally has access to real-time information beyond its previous knowledge cutoff of October 2024. The feature is currently in preview for paid users in the U.S., with plans to expand to free users and other countries soon.

xAI Acquires Hotshot AI Startup

Elon Musk's AI company, xAI, has acquired Hotshot, a San Francisco-based startup specializing in AI videos. Founded by Aakash Sastry and John Mullan, Hotshot developed text-to-video models such as Hotshot-XL and Hotshot Act One. These models simplify video creation for applications in education, entertainment, and business communication.

Hotshot’s technology will be integrated into xAI’s ecosystem and will contribute to the development of a “Grok Video” model for the chatbot platform.

xAI Launches Image Generation API

Elon Musk's startup did more than make deals this week. xAI has also launched a new API, “grok-2-image-1212,” enabling developers to generate images from text prompts. The API allows up to 10 images per request, with a limit of 5 requests per second.

Images are delivered in JPEG format at $0.07 per image. That puts it in the same price range as alternatives like Black Forest Labs ($0.05) and Ideogram ($0.08).
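
As a quick back-of-the-envelope aid, the limits and price quoted above can be turned into a small planner. This is a sketch under stated assumptions: it uses only the figures from this item (10 images per request, 5 requests per second, $0.07 per image), the function name plan_image_job is hypothetical, and it treats the rate limit as "at most 5 requests in each one-second window," which may not match how xAI actually enforces it.

```python
import math

# Figures quoted in the item above
MAX_IMAGES_PER_REQUEST = 10
MAX_REQUESTS_PER_SECOND = 5
PRICE_PER_IMAGE_CENTS = 7  # $0.07, kept in cents to avoid float drift


def plan_image_job(total_images: int) -> dict:
    """Estimate requests, one-second sending windows, and cost for a job.

    Hypothetical helper. Assumes every request asks for the maximum
    number of images and that the rate limit is counted per
    one-second window.
    """
    if total_images < 0:
        raise ValueError("total_images must be non-negative")
    requests_needed = math.ceil(total_images / MAX_IMAGES_PER_REQUEST)
    windows_needed = math.ceil(requests_needed / MAX_REQUESTS_PER_SECOND)
    cost_usd = total_images * PRICE_PER_IMAGE_CENTS / 100
    return {
        "requests": requests_needed,
        "one_second_windows": windows_needed,
        "cost_usd": cost_usd,
    }


if __name__ == "__main__":
    print(plan_image_job(120))
```

For example, 120 images would need 12 requests spread over at least 3 one-second windows and would cost $8.40 at the quoted price. Actual generation time will be longer, since this only counts how fast requests may be sent.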

Stability AI Reveals Stable Virtual Camera

Stability AI showed a new model, Stable Virtual Camera. It can transform 2D images into immersive 3D videos with realistic depth and dynamic camera movements. You can input up to 32 images to create videos with preset paths like spiral, dolly zoom, and pan.

The model is available for research use under a non-commercial license. You can download the weights on Hugging Face and access the code on GitHub.

Useful Tools ⚒️

Epiphany – Turn voice notes into instant actions in your favorite tools

SeabassAI – Discover real-world AI applications and AI-built programs

Crawl AI – Build your own AI with one prompt

Codegen – AI developer that analyzes + improves codebase 24/7 in Slack

SeabassAI – Real-world AI apps and AI-built programs

SeabassAI is a good place to explore the latest AI tools in depth. It is a community hub where developers can submit projects and users can explore practical AI solutions across industries. If you're tired of the same old picks like ChatGPT, Gemini, and other mainstream apps, check it out.