AI businesses used to measure themselves against OpenAI, the market leader. No more. Now that China's DeepSeek has emerged as the frontrunner, it's become the one to beat.

On Monday, DeepSeek turned the AI industry on its head, causing billions of dollars in losses on Wall Street while raising questions about how efficient some U.S. startups, and the venture capital behind them, really are.

Two more AI heavyweights have now entered the ring: Alibaba in China and the Allen Institute for AI in Seattle, both of which claim their creations are on par with or better than DeepSeek V3.

The Allen Institute for AI, a U.S.-based research organization known for its open-source Molmo vision model, today unveiled a new version of Tülu 3, a free, open-source 405-billion parameter large language model.

"We are thrilled to announce the launch of Tülu 3 405B, the first application of fully open post-training recipes to the largest open-weight models," the Paul Allen-funded non-profit said in a blog post. "With this release, we demonstrate the scalability and effectiveness of our post-training recipe at a 405B parameter scale."

For those who like comparing sizes, Meta's latest LLM, Llama 3.3, has 70 billion parameters, and its largest model to date is Llama 3.1 405B, the same size as Tülu 3.

Training a model of this size required extraordinary computational resources: 32 nodes with 256 GPUs running simultaneously.

While developing its model, the Allen Institute ran into a number of roadblocks. Due to Tülu 3's size, the team had to split the workload across hundreds of specialized chips, with 240 GPUs handling the training process and 16 more managing real-time inference.

Even with this much computing power, the system frequently crashed and required constant monitoring to keep it running.

Tülu 3's breakthrough was its novel Reinforcement Learning with Verifiable Rewards (RLVR) framework, which showed particular strength in mathematical reasoning tests.

Each RLVR iteration took approximately 35 minutes: 550 seconds for inference, 25 seconds for weight transfer, and 1,500 seconds for training, with the AI getting better at problem-solving each round.
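The per-phase numbers check out against the quoted total, as a quick sum shows:

```python
# Reported per-phase timings for one RLVR iteration, in seconds.
inference = 550        # generating candidate answers
weight_transfer = 25   # syncing updated weights across nodes
training = 1_500       # the gradient-update phase

total_seconds = inference + weight_transfer + training
total_minutes = total_seconds / 60
print(total_seconds, round(total_minutes, 1))  # 2075 seconds, about 34.6 minutes
```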

Image: Ai2

The training method, Reinforcement Learning with Verifiable Rewards (RLVR), works like a sophisticated tutoring system.

The AI received specific tasks, like solving math problems, and got instant feedback on whether its answers were correct.

However, unlike traditional AI training (such as the human-feedback approach OpenAI used to train ChatGPT), where the feedback can be subjective, RLVR rewards the model only when it produces verifiably correct answers, much as a math teacher can tell whether a student's solution is right or wrong.

This is why the model excels at math and logic problems but struggles at other tasks like creative writing, roleplaying, or factual analysis.
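The core idea can be sketched in a few lines: instead of a learned, potentially subjective reward model, the reward is a binary check against a known answer. This is an illustrative simplification, not Ai2's actual training code; the `verifiable_reward` function and its comparison logic are hypothetical stand-ins for a real verifier.

```python
def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Binary reward: 1.0 only if the answer is verifiably correct."""
    # For math problems, "verification" can be as simple as a numeric
    # comparison against the known answer; otherwise fall back to an
    # exact (whitespace-normalized) string match.
    try:
        return float(abs(float(model_answer) - float(ground_truth)) < 1e-9)
    except ValueError:
        return float(model_answer.strip() == ground_truth.strip())

# A correct numeric answer earns the full reward...
print(verifiable_reward("42.0", "42"))      # 1.0
# ...while an unverifiable one earns nothing, with no partial credit.
print(verifiable_reward("about 42", "42"))  # 0.0
```

Because the reward signal only fires on checkable outputs, tasks without a crisp right answer (creative writing, roleplay) get no useful gradient, which matches the strengths and weaknesses described above.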

The model is available at Allen AI’s playground, a free site with a UI similar to ChatGPT and other AI chatbots.

Our tests confirmed what you'd expect from a model this size.

It excels at logic and problem-solving. We pulled random problems from a number of math and science benchmarks, and the model produced correct answers that were often easier to follow than the benchmarks' own reference solutions.

However, it failed in other logical language-related tasks that didn’t involve math, such as writing sentences that end in a specific word.

Also, Tülu 3 isn’t multimodal. Instead, it stuck to what it knew best—churning out text. No fancy image generation or embedded Chain-of-Thought tricks here.

On the upside, the model is free to use, either via Allen AI's playground, which requires a simple login, or by downloading the weights to run locally.

Hugging Face offers the model for download, with variants ranging from 8 billion parameters to the enormous 405-billion parameter version.

Chinese Tech Giant Joins the Fray

Meanwhile, China isn’t resting on DeepSeek’s laurels.

Amid all the hubbub, Alibaba dropped Qwen 2.5-Max, a massive language model trained on over 20 trillion tokens.

The Chinese tech giant released the model during the Lunar New Year, just days after DeepSeek R1 rattled the market.

Benchmark tests showed Qwen 2.5-Max outperformed DeepSeek V3 in several key areas, including coding, math, reasoning, and general knowledge, as evaluated using benchmarks like Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond.

The model demonstrated competitive results against industry leaders like GPT-4o and Claude 3.5 Sonnet, according to its model card.

Qwen2.5-Max results in AI benchmarks
Image: Alibaba

Developers can integrate the model through Alibaba's cloud platform, which exposes an OpenAI-compatible API, using familiar tools and techniques.

The company’s documentation showed detailed examples of implementation, suggesting a push for widespread adoption.
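In practice, "OpenAI-compatible" means the standard chat-completions request shape. The sketch below builds such a payload without sending it; the base URL and model name are assumptions drawn from Alibaba's public documentation, so check the current docs before relying on them.

```python
import json

# Assumed endpoint for Alibaba's OpenAI-compatible mode; verify against
# the current Alibaba Cloud Model Studio documentation.
BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

def build_chat_request(prompt: str, model: str = "qwen-max") -> dict:
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_chat_request("Write a haiku about open-source AI.")
# You would POST this JSON to f"{BASE_URL}/chat/completions" with an
# "Authorization: Bearer <API key>" header, exactly as with OpenAI's API.
print(json.dumps(payload, indent=2))
```

Because the request and response shapes mirror OpenAI's, existing client libraries can typically be pointed at the alternate base URL with no other code changes.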

Meanwhile, Alibaba's Qwen Chat web portal, the best option for general users, is genuinely impressive, if you're willing to create an account. It is arguably the most versatile AI chatbot interface available right now.

Qwen Chat allows users to generate text, code, and images seamlessly. It also supports web search, artifacts, and even a very good video generator, all in the same UI, for free.

Additionally, it has a special feature that lets users pit two different models against each other in a "battle" to get the best response.

Overall, Qwen’s UI is more versatile than Allen AI’s.

In text responses, Qwen2.5-Max proved to be better than Tülu 3 at creative writing and reasoning tasks that involved language analysis. For instance, it was able to create phrases that ended in a particular word.

Its video generator is a nice addition, unquestionably competitive with offerings like Kling or Luma Labs, and definitely better than what Sora can produce.

Also, its image generator produces realistic, pleasing images, a clear advantage over OpenAI's DALL-E 3, though still behind top models like Flux or Midjourney.

The triple release of DeepSeek, Qwen2.5-Max, and Tülu 3 just gave the open-source AI world its most significant boost in a while.

DeepSeek had already attracted attention by distilling earlier Qwen technology into its R1 reasoning model, demonstrating that open-source AI can compete with billion-dollar tech giants at a fraction of the cost.

And now Qwen2.5-Max has upped the ante. If DeepSeek follows its established playbook—leveraging Qwen’s architecture—its next reasoning model could pack an even bigger punch.

Still, this could be a good opportunity for the Allen Institute. OpenAI is preparing to introduce its o3 reasoning model, which some industry experts predict could cost users up to $1,000 per query.

If so, Tülu 3’s arrival could be a great open-source alternative—especially for developers wary of building on Chinese technology due to security concerns or regulatory requirements.

edited by Sebastian Sinclair and Josh Quittner

