A team of international researchers from leading academic institutions and tech companies released a new model on Wednesday that matches, and often outperforms, one of China's most advanced AI reasoning models, DeepSeek.
OpenThinker-32B, developed by the Open Thoughts consortium, achieved a 90.6% accuracy score on the MATH500 benchmark, edging past DeepSeek's 89.4%.
The model also outperformed DeepSeek on general problem-solving tasks, scoring 61.6 on the GPQA-Diamond benchmark compared to DeepSeek's 57.6. On the LCBv2 benchmark, it reached a solid 68.9, showing consistent performance across different evaluation settings.
In other words, it is better than a similarly sized version of DeepSeek R1 at general scientific knowledge (GPQA-Diamond). It even beat DeepSeek at MATH500 while losing at the AIME benchmarks, both of which aim to measure math proficiency.
At coding, it scored slightly lower, 68.9 points versus 71.2 for DeepSeek, but since the model is open source, all of these scores could improve significantly once users begin building on it.
What set this achievement apart was its efficiency: OpenThinker required just 114,000 training examples to reach these results, while DeepSeek used 800,000.
The OpenThoughts-114k dataset came packed with extensive metadata for each problem: ground truth solutions, test cases for code problems, starter code where needed, and domain-specific information.
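For readers who want to inspect the data themselves, here is a minimal sketch using the Hugging Face datasets library. The repo ID open-thoughts/OpenThoughts-114k is assumed from the project's naming, and the exact field names may differ between dataset versions:

```python
from datasets import load_dataset

# Repo ID assumed from the Open Thoughts organization on Hugging Face.
ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

# Inspect one record; the metadata fields described above
# (solutions, test cases, starter code) vary by dataset version.
example = ds[0]
print(example.keys())
```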
An AI judge handled math verification, while the team's custom Curator framework validated code solutions against test cases.
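The article doesn't spell out the pipeline's internals, but validating a code solution against test cases generally looks something like the sketch below. The passes_tests helper and the sample's field layout are illustrative assumptions, not the actual Curator implementation:

```python
import subprocess
import sys

def passes_tests(solution_code: str, test_cases: list[dict], timeout: float = 5.0) -> bool:
    """Run a candidate solution against stdin/stdout test cases and
    report whether every case produces the expected output."""
    for case in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=case["input"],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat hangs as failures
        if result.returncode != 0:
            return False  # crashed solutions fail verification
        if result.stdout.strip() != case["expected_output"].strip():
            return False
    return True

# A toy sample in the shape described above: code plus its test cases.
sample = {
    "code": "print(sum(map(int, input().split())))",
    "tests": [
        {"input": "1 2 3", "expected_output": "6"},
        {"input": "10 20", "expected_output": "30"},
    ],
}

print(passes_tests(sample["code"], sample["tests"]))  # True
```

Filtering training samples through a gate like this is what allows a verified dataset to stay small without sacrificing quality.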
The team claims it completed the task in 90 hours using four nodes with eight H100 GPUs each. A separate model, trained on 137,000 unverified samples using Italy's Leonardo supercomputer, burned through 11,520 A100 hours in just 30 hours.
"Verification serves to maintain quality while scaling up the diversity and size of training prompts," the team noted in its documentation. Even the unverified versions performed well, though they could not match the verified model's peak results, according to the research.
The model is built on Alibaba's Qwen2.5-32B-Instruct LLM and supports a modest 16,000-token context window, enough to handle complex mathematical proofs and lengthy coding problems but well below current standards.
The release arrives amid intensifying competition in AI reasoning capabilities, which seems to be evolving at the speed of thought. OpenAI announced on February 12 that all models following GPT-5 would feature reasoning capabilities. One day later, Elon Musk hyped xAI's Grok-3's enhanced problem-solving abilities, promising it would be the best reasoning model to date, and just a few hours ago, Nous Research released another open-source reasoning model, DeepHermes, based on Meta's Llama 3.1.
The field gained momentum after DeepSeek demonstrated performance comparable to OpenAI's o1 at significantly lower cost. DeepSeek R1 is free to download, use, and modify, and its training techniques were also disclosed.
However, unlike Open Thoughts, which decided to open source everything, the DeepSeek development team kept its training data private.
Because they have access to all the pieces of the puzzle, developers may have an easier time understanding OpenThinker and reproducing its results from scratch than they would with DeepSeek.
For the broader AI community, this release once again demonstrates the viability of building competitive models without massive proprietary datasets. It may also be a more trustworthy option for Western developers who remain unsure about using an open-source Chinese model.
OpenThinker is available for download on Hugging Face. A smaller, less powerful 7B-parameter version is also available for lower-end devices.
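As a rough starting point, here is a sketch of loading the smaller checkpoint with the Hugging Face transformers library. The repo ID open-thoughts/OpenThinker-7B is assumed from the project's organization page, and the generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID assumed from the Open Thoughts organization on Hugging Face;
# swap in the 32B checkpoint if your hardware can hold it.
model_id = "open-thoughts/OpenThinker-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# The model is Qwen2.5-based, so the standard chat template applies.
messages = [{"role": "user", "content": "How many primes are there below 100?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```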
The Open Thoughts team brings together researchers from various US universities, including Stanford, Berkeley, and UCLA, alongside Germany's Juelich Supercomputing Center. The project is also backed by the US-based Toyota Research Institute and other players in the EU AI scene.
Edited by Sebastian Sinclair and Josh Quittner