At the end of last week, OpenAI finally opened access to its o3-mini model (for all users!), a move widely read as a response to the buzz around DeepSeek. Today, we review it, compare it to DeepSeek R1, and determine which reasoning model is right for your tasks.
We'll run a series of tests in which each participant solves complex problems from different domains. Let's go!
OpenAI o3-mini | Quick Overview
Although we are comparing o3-mini with DeepSeek R1 here, we won't be taking a detailed look at the Chinese startup in this post.
OpenAI's o3-mini is the latest addition to the reasoning model lineup, building upon the foundation laid by the o1 model. While o1 was designed to handle complex tasks across various domains, o3-mini introduces enhanced reasoning capabilities, particularly in science, technology, engineering, and mathematics (STEM) fields.
In performance evaluations, o3-mini improves on its predecessors. It matches o1's proficiency in tasks like mathematics and coding reasoning, and OpenAI reports that it delivers responses approximately 24% faster than o1-mini. Expert testers also found that o3-mini gives more accurate answers, with a 39% reduction in major errors on challenging questions compared to o1-mini.
As for availability (I guess we can thank DeepSeek for this), o3-mini is open to all categories of ChatGPT users, including the free tier. However, the number of messages you get per day depends on your subscription.
Pricing
ChatGPT Pro ($200/month): Unlimited use
Plus ($20/month): 150 daily messages
Free users: Limited access via "Reason" mode (fewer than 150 messages per day)
API Cost: $1.10 per million input tokens and $4.40 per million output tokens
Overall, o3-mini's API pricing is 63% lower than o1-mini's.
Moving toward the comparison, DeepSeek R1 is a notably more economical option. It is priced at $0.14 per million input tokens on a cache hit, $0.55 per million input tokens on a cache miss, and $2.19 per million output tokens.
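To make these rates concrete, here is a minimal Python sketch that estimates the cost of a single API request for each model using only the prices listed above. The `Pricing` class, the `request_cost` helper, and the sample workload of 2,000 input and 1,000 output tokens are hypothetical names and numbers chosen for illustration; they are not part of either vendor's SDK. The sketch also assumes that o3-mini has no separate cached-input rate, simply because none is quoted in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens (figures quoted in this post)."""
    input_per_m: float
    output_per_m: float
    # Cache-hit input price; None means "no separate rate quoted here".
    cached_input_per_m: Optional[float] = None


O3_MINI = Pricing(input_per_m=1.10, output_per_m=4.40)
DEEPSEEK_R1 = Pricing(input_per_m=0.55, output_per_m=2.19, cached_input_per_m=0.14)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of one request.

    cache_hit_ratio is the assumed share of input tokens served from cache
    (0.0 = all misses, 1.0 = all hits).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached_price = (pricing.cached_input_per_m
                    if pricing.cached_input_per_m is not None
                    else pricing.input_per_m)
    cached_tokens = input_tokens * cache_hit_ratio
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * pricing.input_per_m
             + cached_tokens * cached_price
             + output_tokens * pricing.output_per_m)
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 1,000 output tokens per request.
for name, pricing in [("o3-mini", O3_MINI), ("DeepSeek R1", DEEPSEEK_R1)]:
    cost = request_cost(pricing, input_tokens=2_000, output_tokens=1_000)
    print(f"{name}: ${cost:.5f} per request")
```

Passing `cache_hit_ratio=1.0` for DeepSeek R1 shows the best case for repeated prompts. Keep in mind that reasoning models tend to produce long outputs, so the output rate usually dominates the bill.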
o3-mini vs DeepSeek R1
Now, on to the technical side. We won't go through every benchmark or analyze dozens of synthetic tests. They give a general idea of performance, but the numbers say little about real-world scenarios.
Here are the LiveBench data:
LiveBench updates its questions using recent sources such as arXiv papers, news articles, and IMDb movie synopses. Each question has an objective, verifiable answer, allowing automatic scoring without relying on human or LLM judges.
As you can see, the OpenAI model slightly outperforms its competitor in almost all checks. The only exception is math reasoning (76.55 for o3-mini vs. 79.54 for DeepSeek R1), where DeepSeek R1 is better. Keep in mind, though, that these results are for o3-mini (high), the variant with the highest reasoning effort.
Other benchmarks show approximately the same results:
Very close, but o3-mini wins more often: in four tests out of six.
API
To avoid burdening you with unnecessary details, below is a table comparing the APIs of the two models. On cost alone, DeepSeek could be more attractive, but we shouldn't forget its problems with data leaks or its reliability, which leaves much to be desired (unlike o3-mini).
Who stole from whom?
There is also one funny thing. After the release of DeepSeek, some users on X noticed that the model sometimes called itself “ChatGPT.” This led many to suspect that the Chinese AI was trained on a competitor's data. But that's not all.
o3-mini distinguished itself too, occasionally reasoning in Chinese. Now many users think that OpenAI copied some of DeepSeek's open-source code to ship an updated model as soon as possible and didn't have time to clean it up.
This information cannot be reliably confirmed, so you decide which is true.
Prompts Testing | Who’s better?
We will compare o3-mini and DeepSeek R1 across five complex, practical prompts to determine which model performs better. Here are our categories:
Creative Writing
Analysis of Real-Time Content
Logical Problem Solving
Financial Modeling & Projections
Business Strategy
Within these topics, the emphasis will be on use cases for creators and entrepreneurs. At the end of each test, we’ll look at the results and determine the winner.
1. Creative Writing