Grok-3, developed by Elon Musk’s xAI, was unveiled on Monday, with the company making strong claims about its capabilities and showing off a sizable computing facility that hints at even greater ambitions.

The announcement focused primarily on raw computing muscle, benchmark performance, and upcoming features, although many of the actual demonstrations felt like replays of what other AI companies have already accomplished.

The star of the first part of the show wasn’t the AI itself, but rather “Colossus,” a behemoth cluster of 200,000 GPUs that powers Grok-3’s training.

The system came together in two stages: 122 days of synchronous training on 100,000 GPUs, followed by 92 days of scaling up to the full 200,000. According to xAI’s engineers, building this system proved more difficult than creating the AI model itself.

The company already has plans for an even more powerful cluster, with Musk claiming they are aiming for five times the current capacity, which would make it the most powerful GPU cluster in the world.

When it comes to performance, Grok-3 shows impressive results across standard AI benchmarks. The base model (the regular model without Chain of Thought and reasoning embedded) consistently tops the charts in math (AIME), science (GPQA), and coding (LCB) tests.

It also looks very promising in blind tests.

xAI confirmed that the mysterious model codenamed “Chocolate” was a prototype of Grok-3 that had been submitted to the LLM Arena.

In those tests, it achieved the highest Elo score among all LLMs, which means that users, in head-to-head comparisons and without knowing which model they were evaluating, preferred its answers over those of every other AI model.

This benchmark is based solely on the preferences and blind choices of thousands of anonymous users, so it is probably the most reliable way to assess quality without giving developers the opportunity to rig benchmarks by training their AIs on the test datasets.
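To make the ranking idea concrete, here is a minimal sketch of how blind head-to-head votes can be turned into Elo-style ratings. It uses the classic Elo update and is not a description of LLM Arena's actual scoring method. The K-factor of 32, the starting rating of 1000, the model names, and the votes are all made-up assumptions for illustration.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Probability that a model rated r_a beats one rated r_b under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate(votes, k=32.0, start=1000.0):
    """Apply one Elo update per (winner, loser) vote, in order.

    k and start are illustrative defaults, not values used by any real leaderboard.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        exp_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - exp_win)  # an upset win moves ratings more than an expected one
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical blind votes: each tuple is (preferred answer, other answer).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]

ratings = rate(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Because voters never see which model produced which answer, the only signal entering the ratings is the preference itself, which is what makes this kind of score hard to game by training on a fixed test set.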

xAI team shows off Grok 3's benchmark tests during a live presentation. Image: xAI

A specialized “Reasoning Beta” variant of Grok-3, which employs internal chain-of-thought processing and additional compute at test time, pushes math scores even higher, reaching 93% on the AIME 2025 benchmark while the other best-performing models score below 87%.

Interestingly, a smaller version called Grok-3 Mini Reasoning Beta sometimes outperforms its larger sibling, thanks to a longer training time.

In other words, the full-size Grok-3 still has room for improvement once it receives comparable training duration, which seems promising given its greater parameter count.

But when xAI moved to demonstrate Grok-3’s capabilities live, the presentation felt more like a game of catch-up than innovation. The team showcased the model solving physics problems and writing game code from scratch, impressive feats that ChatGPT, Claude, and Google’s Gemini mastered a while ago.

New tools, old tricks

They also introduced DeepSearch, a research agent that, like similar tools from OpenAI and Google, scours the web and generates extensive reports on given topics.

Grok-3 is available immediately to X Premium Plus subscribers, but the most capable and most up-to-date versions will generally live in a standalone app or on Grok.com.

Voice interactions, similar to OpenAI’s “Advanced Voice Mode,” will arrive in the coming weeks, with Musk emphasizing that this isn’t simple text-to-speech but a genuine AI voice model capable of natural, expressive speech.

Developers will get API access in the coming weeks, along with audio transcription capabilities, making Grok-3 a powerful tool for third-party AI-powered apps.

After showcasing a Tetris game created by Grok, xAI also revealed plans for an AI gaming studio that will let developers create games powered by Grok-3.

Right now, the model is being rolled out slowly. We have not yet been granted access to it, but some enthusiasts have tried it and are so far satisfied with the results.

Computer scientist Lex Fridman, one of the loudest voices in the AI space, praised Grok-3’s capabilities.

Others compared it to leading market rivals.

“Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking,” OpenAI co-founder Andrej Karpathy wrote in an extensive post on X. “For now, big congrats to the xAI team, they clearly have huge velocity and momentum.”

X user Penny2x shared a game built from scratch with Grok-3, a 2D platformer similar to Mario Bros.

They said they were impressed by Grok’s ability to understand instructions and make improvements over several iterations.

“I just keep asking for adjustments, and it keeps spitting the game out in a single file that I can download and run,” they wrote in a post on X. “This is incredible. We live in the future. Now everyone is a developer.”

The game is available for testing, thanks to Doge.

The company also confirmed plans to open-source Grok-2 once Grok-3 is fully mature and stable, which is expected to happen sometime in the coming months.

The move would continue xAI’s trend of open-sourcing older versions of its models to spur innovation, though Grok-2 lags behind top-tier models.

For now, Grok-3 appears adept at matching what the best AI models can already do.

The real test will come when xAI releases its promised voice features, gaming tools, and API access in the coming weeks. Until then, the ball is in the court of OpenAI, which is set to release GPT-4.5 soon.
