OpenAI moved on Friday to shore up its position in the market with the launch of o3-mini, a direct response to Chinese company DeepSeek’s R1 model, which matched top-tier performance at a fraction of the computational cost.

“We’re releasing OpenAI o3-mini, the newest, most cost-efficient model in our reasoning series, available in both ChatGPT and the API today,” OpenAI said in an official blog post. The fast, powerful model, first previewed in December 2024, pushes the limits of what small models can achieve while maintaining the low cost and low latency of OpenAI o1-mini.

To drive adoption of the new family of reasoning models, OpenAI also made reasoning capabilities available to free users for the first time. It additionally tripled the usual message limits for paying customers, letting them use the models more often.

Unlike GPT-4o and the GPT family of models, the “o” family of AI models is focused on reasoning tasks. They’re less creative, but have embedded chain-of-thought reasoning that makes them better at solving complex problems, backtracking on flawed analyses, and writing better-structured code.

At the highest level, OpenAI has two main families of AI models: Generative Pre-trained Transformers (GPT) and “Omni” (o).

  • GPT is like the family’s artist: a right-brained type, it’s good for role-playing, conversation, creative writing, summarizing, explanation, brainstorming, chatting, etc.
  • O is the family’s nerd. It sucks at telling stories, but is fantastic at coding, solving math equations, analyzing complex problems, planning its reasoning process step by step, comparing research papers, etc.

The new o3-mini comes in three versions: low, medium, and high. These tiers give users better answers in exchange for more “inference”—extra compute that developers pay for per token.
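To make the tiers concrete, here is a minimal sketch of how a developer might select a reasoning-effort level when calling the model. The `reasoning_effort` field matches the parameter OpenAI documents for its o-series API, but the `build_request` helper itself is purely illustrative, not part of any SDK:

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completions-style request body (illustrative sketch only).

    `effort` selects the reasoning tier: more effort means more inference
    tokens spent "thinking", which the developer pays for per token.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }
```

A developer would pass this body to the chat completions endpoint; only the `reasoning_effort` value changes between the three tiers.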

OpenAI o3-mini low, the version aimed at efficiency, is worse than OpenAI o1-mini at general knowledge and multilingual chain of thought; however, it scores better at other tasks like coding or factuality. The other two versions (o3-mini medium and o3-mini high) beat OpenAI o1-mini on every single benchmark.

Image: OpenAI

DeepSeek’s breakthrough, which delivered better results than OpenAI’s flagship model while using just a fraction of the computing power, triggered a massive tech selloff that wiped nearly $1 trillion from U.S. markets. Investors grew skeptical about future demand for Nvidia’s pricey AI chips, with that company alone losing nearly $600 billion in market value.

The efficiency gap stemmed from DeepSeek’s novel approach to model architecture.

While American companies focused on throwing more computing power at AI development, DeepSeek’s team found ways to streamline how models process information, making them more efficient. The competition intensified when Chinese tech giant Alibaba released Qwen2.5 Max, a model even more capable than the one DeepSeek used as its foundation, signaling what may turn out to be a new wave of Chinese AI innovation.

OpenAI o3-mini is another attempt to close that gap. The new model costs less to operate, runs 24% faster than its predecessor, and matches or beats older models on key benchmarks.

Its pricing is also more competitive. OpenAI o3-mini’s rates—$0.55 per million input tokens and $4.40 per million output tokens—are considerably higher than DeepSeek R1’s $0.14 and $2.19 for the same volumes; however, they narrow the gap between OpenAI and DeepSeek, and represent a major cut compared with the prices charged to run OpenAI o1.

Image: OpenAI

And that might be key to its success. OpenAI o3-mini is closed-source, unlike DeepSeek R1, which is available for free—but for those willing to pay to use it on hosted servers, its appeal will depend on the intended use.

OpenAI o3-mini medium scores 79.6 on the AIME benchmark of math problems. DeepSeek R1 scores 79.8, a mark beaten only by the most powerful model in the family, OpenAI o3-mini high, which scores 87.3 points.

The same pattern can be seen in other benchmarks: on GPQA, which measures proficiency in different scientific disciplines, DeepSeek R1 scores 71.5, o3-mini low 70.6, and o3-mini high 79.7. On Codeforces, a benchmark for coding tasks, R1 sits at the 96.3rd percentile, whereas o3-mini low is at the 93rd percentile and o3-mini high is at the 97th percentile.

So there are differences, but benchmarks alone may not be decisive when choosing a model to carry out a given task.

Testing OpenAI o3-mini against DeepSeek R1

We tested the model with a few tasks to see how well it performed against DeepSeek R1.

The first task was a spy game to test how good it was at multi-step reasoning. We selected the same sample from the BIG-bench dataset on GitHub that we used to evaluate DeepSeek R1. The model must discover the identity of a stalker during a school trip to a remote, snowy location, where students and teachers are confronted by a series of strange disappearances (the full story is available here).

OpenAI o3-mini didn’t do well on the story and drew the wrong conclusions. According to the answer provided in the test, the stalker’s name is Leo. DeepSeek R1 got it right, whereas OpenAI o3-mini got it wrong, saying the stalker’s name was Eric. (Fun fact: we are unable to share the link to the conversation because OpenAI flagged it as unsafe.)

The model is reasonably good at logical language tasks that don’t involve math. For instance, we asked it to write five sentences that end in a specific word before providing its final response. The model was able to understand the task, evaluate its results, and then deliver its answer. It thought about its reply for four seconds, corrected one wrong attempt, and produced a fully correct response.

It is also very good at math, solving problems that some benchmarks rate as extremely challenging. OpenAI o3-mini cracked in just 33 seconds the same difficult problem that DeepSeek R1 needed 275 seconds to solve.

So, a pretty good effort, OpenAI. Your move, DeepSeek.
