Hello and welcome to our weekly roundup!
This week, the degree of conflict in the AI industry has dropped a bit (compared to the previous one), and developers are back to releasing many new models.
Today, we have new tools for building voice agents, a major upgrade to Google Gemini, and a bunch of other updates. Let's discuss.
In case you missed it, YC recently held a Demo Day for its W25 batch. We checked all the AI startups and selected a few that are building handy tools for creators. You'll find them in the Bonus Materials section.
This Creators’ AI Edition:
Featured Materials 🎟️
News of the week 🌍
Useful tools ⚒️
Weekly Guides 📕
AI Meme of the Week 🤡
AI Tweet of the Week 🐦
(Bonus) Materials 🎁
Featured Material 🎟️
Audio Models by OpenAI
OpenAI continues to expand its toolkit for developers who want to create agents. Last week, the company showed a few rather helpful solutions, and now it has moved on to more awe-inspiring things. It has unveiled its latest audio models, designed for building and improving the capabilities of voice agents.
The release includes new speech-to-text and text-to-speech models, now available through the OpenAI API. Here’s what you need to know.
Improvements in Speech-to-Text
The new gpt-4o-transcribe and gpt-4o-mini-transcribe models offer higher accuracy, measured as a lower word error rate, than the earlier Whisper models. OpenAI attributes the improvements to advancements in reinforcement learning and the use of diverse audio datasets.
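If "word error rate" is new to you, here is a minimal sketch of how the metric is commonly computed: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of reference words. The function name and sample sentences are our own illustration, not OpenAI's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("fox" -> "box") and one deletion ("jumps") out of 5 words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box"))  # 0.4
```

A lower value is better: 0.0 means the transcript matches the reference word for word.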
Enhanced Text-to-Speech Options
The company has also introduced the gpt-4o-mini-tts model, which allows developers to specify how speech should be delivered. This feature enables the customization of voice characteristics for applications like customer service or creative projects.
Currently, the text-to-speech models are limited to preset voices.
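To give a feel for what "specifying how speech should be delivered" might look like in practice, here is a rough sketch that only builds a request for OpenAI's speech endpoint and does not send it. The helper name, the sample text, the "coral" voice choice, and the placeholder API key are our assumptions; check the official API reference before relying on any field.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, instructions, api_key, voice="coral"):
    """Build (but do not send) a text-to-speech request."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "voice": voice,                # one of the preset voices (assumed choice)
        "input": text,                 # what should be said
        "instructions": instructions,  # how it should be said
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request(
    text="Thanks for calling. How can I help you today?",
    instructions="Speak in a calm, friendly customer-service tone.",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
)
print(json.loads(request.data)["instructions"])
```

Sending the request with urllib.request.urlopen(request) and a real key should return audio bytes that you can write to a file.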
The audio models rely on GPT‑4o and GPT‑4o-mini architectures, pre-trained with audio-focused datasets. OpenAI has refined distillation techniques to transfer knowledge from larger models to smaller ones and implemented reinforcement learning methods to boost transcription accuracy.
Availability
You can now access these models through the OpenAI API. OpenAI plans to expand customization options for synthetic voices while maintaining safety standards.
It also promises to collaborate with policymakers, researchers, and developers to address the opportunities and challenges posed by synthetic audio technology.
News Of The Week 🌍
Google Unveils Canvas, Deep Research, and Audio Overview
This week, Google introduced three new services for Gemini: Canvas, Deep Research, and Audio Overview. Canvas provides a clean interface for writing and coding. Deep Research delivers a toolset for data analysis and scientific inquiry. Audio Overview lets you turn your content into a podcast with two hosts.
The recent Gemini updates don't end there. The company also showed an updated Image Editor with tons of new features.
Apple Sued for False Advertising Over Apple Intelligence
The lawsuit, filed Wednesday in U.S. District Court in San Jose, seeks class-action status and damages for those who purchased iPhones and other Apple Intelligence-enabled devices. The plaintiffs allege that owners of the devices did not receive promised AI features. Apple has not yet commented.
The news is quite a blow to the company, considering Apple recently delayed the update. In addition, earlier this week, Bloomberg reported that Tim Cook has “lost confidence” in the current AI head, John Giannandrea, “to execute on product development.”
Claude Gains Web Search Capability
Anthropic has unveiled a web search feature for its Claude chatbot. The company's AI finally has access to real-time information beyond its previous knowledge cutoff of October 2024. The feature is currently in preview for paid users in the U.S., with plans to expand to free users and other countries soon.
xAI Acquires Hotshot AI Startup
Elon Musk's AI company, xAI, has acquired Hotshot, a San Francisco-based startup specializing in AI videos. Founded by Aakash Sastry and John Mullan, Hotshot developed text-to-video models such as Hotshot-XL and Hotshot Act One. These models simplify video creation for applications in education, entertainment, and business communication.
Hotshot’s technology will be integrated into xAI’s ecosystem and will contribute to the development of a “Grok Video” model for the chatbot platform.
xAI Launches Image Generation API
Elon Musk's startup did more than make deals this week. xAI has also launched a new API, “grok-2-image-1212,” that lets developers generate images from text prompts. The API allows up to 10 images per request, with a limit of 5 requests per second.
Images are delivered in JPEG format at $0.07 per image. That puts it in the same price range as alternatives like Black Forest Labs ($0.05) and Ideogram ($0.08).
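For a sense of what those limits and prices mean for a real batch, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted above; the function name and the 1,000-image example are ours, and actual billing or rate limiting may differ.

```python
import math

# Figures quoted in this edition, in US cents per image.
PRICE_CENTS = {"xAI": 7, "Black Forest Labs": 5, "Ideogram": 8}
MAX_IMAGES_PER_REQUEST = 10   # xAI limit per request
MAX_REQUESTS_PER_SECOND = 5   # xAI rate limit


def plan_batch(total_images):
    """Estimate requests, minimum submission time, and cost for a batch."""
    requests_needed = math.ceil(total_images / MAX_IMAGES_PER_REQUEST)
    # Lower bound only: time to submit the requests, not to generate images.
    min_seconds = math.ceil(requests_needed / MAX_REQUESTS_PER_SECOND)
    cost_cents = {name: total_images * price for name, price in PRICE_CENTS.items()}
    return {
        "requests": requests_needed,
        "min_seconds": min_seconds,
        "cost_cents": cost_cents,
    }


plan = plan_batch(1000)
print(plan["requests"], "requests over at least", plan["min_seconds"], "seconds")
for name, cents in plan["cost_cents"].items():
    print(f"{name}: ${cents / 100:.2f}")
```

At these list prices, 1,000 images come to $70.00 with xAI, versus $50.00 and $80.00 for the two alternatives.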
Stability AI Reveals Stable Virtual Camera
Stability AI showed a new model, Stable Virtual Camera. It can transform 2D images into immersive 3D videos with realistic depth and dynamic camera movements. You can input up to 32 images to create videos with preset paths like spiral, dolly zoom, and pan.
The model is available for research use under a non-commercial license. You can download the weights on Hugging Face and access the code on GitHub.
Useful Tools ⚒️
Epiphany – Turn voice notes into instant actions in your favorite tools
SeabassAI – Discover real-world AI applications and AI-built programs
Crawl AI – Build your own AI with one prompt
Codegen – AI developer that analyzes + improves codebase 24/7 in Slack
SeabassAI – Real-world AI apps and AI-built programs
SeabassAI is a good place to explore the latest AI tools in depth. It is a community hub where developers can submit projects and users can explore practical AI solutions across industries. If you're tired of the same old picks like ChatGPT, Gemini, and other mainstream apps, check it out.
Weekly Guides 📕
How to Use Gemini AI’s Deep Research to Save Hours
How to Grow Your LinkedIn with AI in 3 Simple Steps
Build your first AI marketing automation
How to Build AI Agents with MCP! (Cursor, Cline, VS Code)
AI Meme Of The Week 🤡
Let's not deny it, that guy was really good.
AI Tweet Of The Week 🐦
An agent browser from Perplexity is on the way!
(Bonus) Materials 🏆
YC W25 Startups We Started Using
How AI-coding tool Cursor is changing the way developers work
Yahoo Is Still Here—and It Has Big Plans for AI
Not all AI-assisted programming is vibe coding (but vibe coding rocks)
A high schooler built a website that lets you challenge AI to a Minecraft build-off