OpenAI's newest AI model family achieved what many believed was out of reach: an unprecedented 87.5% score on ARC-AGI (the Abstraction and Reasoning Corpus for Artificial General Intelligence), a challenging benchmark whose upper range is roughly what is technically regarded as "human" performance.
The ARC-AGI benchmark tests how close a system is to artificial general intelligence: whether it can think, solve problems, and adapt like a human across different scenarios, even ones it hasn't been trained for. The tasks are remarkably easy for humans to solve, yet extremely difficult for machines to comprehend and resolve.
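To make that contrast concrete, here is a toy sketch, in Python, of the kind of few-shot puzzle ARC-AGI poses. This is our own invented example, not a task from the actual ARC dataset: a handful of input/output grid pairs demonstrate a hidden rule, and the solver must apply that rule to a grid it has never seen.

```python
# Hypothetical ARC-style task (illustration only): each task shows a few
# input/output grid pairs, and the solver must infer the transformation
# and apply it to a brand-new input.

def transform(grid):
    """The hidden rule for this toy task: mirror the grid left-to-right."""
    return [list(reversed(row)) for row in grid]

# Demonstration pairs a human can solve at a glance
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
]

for inp, out in train_pairs:
    assert transform(inp) == out  # the rule is consistent with every example

# The test: apply the inferred rule to an unseen grid
print(transform([[5, 0, 0], [0, 5, 0]]))  # -> [[0, 0, 5], [0, 5, 0]]
```

A human infers "mirror it" from two examples; a pattern-matching model with no such task in its training data historically struggled with exactly this kind of generalization.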
The San Francisco-based AI research firm unveiled o3 and o3-mini last week as part of its "12 Days of OpenAI" campaign, just days after Google announced its own Gemini 2.0 model. The launch suggested OpenAI's newest model was closer to artificial general intelligence than anticipated.
OpenAI's new reasoning-focused model marks a fundamental change in how AI systems approach difficult reasoning. Unlike traditional large language models that rely on pattern matching, o3 introduces a novel "program synthesis" approach that allows it to tackle entirely new problems it hasn't encountered before.
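OpenAI has not published the mechanism behind o3, so as an illustration only, program synthesis in its simplest form can be sketched as a search over compositions of primitive operations until one explains every training example. Every function name below is our own invention, not OpenAI's code:

```python
from itertools import product

# Toy program-synthesis sketch (our simplification, not OpenAI's method):
# enumerate short compositions of primitive grid operations and return the
# first "program" consistent with all of the task's examples.

def identity(g):   return g
def flip_h(g):     return [row[::-1] for row in g]
def flip_v(g):     return g[::-1]
def rotate_180(g): return [row[::-1] for row in g[::-1]]

PRIMITIVES = [identity, flip_h, flip_v, rotate_180]

def synthesize(examples, max_depth=2):
    """Search programs (sequences of primitives) matching all examples."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            def run(g, prog=program):
                for step in prog:
                    g = step(g)
                return g
            if all(run(inp) == out for inp, out in examples):
                return [f.__name__ for f in program]
    return None  # no program within the search budget explains the data

examples = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]  # a 180-degree rotation
print(synthesize(examples))  # -> ['rotate_180']
```

The key property, and the reason this differs from pattern matching, is that the solver composes a new program per task rather than retrieving a memorized answer.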
"This is not just incremental progress, but a genuine breakthrough," the ARC team stated in its assessment report. In a blog post, ARC Prize co-founder Francois Chollet went even further, suggesting that "o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain."
Just for reference, here is what ARC Prize says about its scores: "The average human performance in the study was between 73.3% and 77.2% correct (public training set average: 76.2%; public evaluation set average: 64.2%)."
OpenAI o3 achieved its 87.5% score using high-compute settings, a result unmatched by any other current AI model.
Is o3 AGI? It depends on who you ask.
Despite the impressive results, the ARC Prize board, along with other experts, said AGI has not yet been achieved, so the $1 million prize remains unclaimed. Researchers across the AI field, however, disagreed on whether o3 had crossed the AGI threshold.
Some, including Chollet himself, took issue with whether the benchmark itself was even the best measure of whether a model was approaching true, human-level problem-solving: "Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet," Chollet said. "o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence."
He pointed to an upcoming version of the benchmark, which he said would give a more accurate sense of how far AI remains from human-like reasoning. "Early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training)," Chollet said.
Other observers went further, asserting that OpenAI had effectively gamed the test. "Models like o3 use planning tricks. They outline steps ('scratchpads') to improve accuracy, but they're still advanced text predictors. For example, when o3 'counts letters,' it's generating text about counting, not truly reasoning," Zeroqode co-founder Levon Terteryan wrote on X.
Why OpenAI's o3 Isn't AGI

OpenAI's new reasoning model, o3, is impressive on benchmarks but still far from AGI.

What is AGI?

AGI (Artificial General Intelligence) refers to a system capable of human-level understanding across tasks. It should:

– Play chess like a human. … pic.twitter.com/yn4cuDTFte

— Levon Terteryan (@levon377) December 21, 2024
A similar point of view is shared by other AI scientists, such as the award-winning AI researcher Melanie Mitchell, who argued that o3 isn't truly reasoning but performing a "heuristic search."
Chollet and others have criticized OpenAI for its lack of transparency about how its models operate. The models appear to be trained to search over different chain-of-thought processes "in a fashion perhaps not too dissimilar to AlphaZero-style Monte-Carlo tree search," said Mitchell. That is, the model doesn't know how to solve a new problem outright; it generates many candidate chains of thought, drawing on its extensive knowledge base until one of them succeeds.
In other words, o3 isn't truly creative: it relies on a vast library of approaches and trial and error to work its way to a solution.
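That critique, intelligence as compute-heavy search, can be caricatured in a few lines of Python. This is an analogy of our own construction, not OpenAI's method: sample candidate solutions, check each against a verifier, and let success depend on the search budget rather than on insight.

```python
import random

# Minimal sketch of "solving by brute sampling" (our analogy): propose
# candidates at random and keep the first one a verifier accepts.

def propose_candidate(rng):
    """Stand-in for sampling one chain of thought; here, a random guess."""
    return rng.randint(0, 999)

def verifier(candidate, target=742):
    """Stand-in for checking a candidate against the task's constraints."""
    return candidate == target

def solve_by_search(budget, seed=0):
    rng = random.Random(seed)
    for attempt in range(1, budget + 1):
        c = propose_candidate(rng)
        if verifier(c):
            return c, attempt  # success: found by sheer sampling volume
    return None, budget        # budget exhausted, no solution found

print(solve_by_search(budget=10_000))
```

Nothing in the loop "understands" the task; doubling the budget simply buys more attempts, which is exactly the efficiency objection critics like Joyce raise below.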
"Brute force ≠ intelligence. o3 relied on extreme computing power to reach its unofficial score," Jeff Joyce, host of the Humanity Unchained AI podcast, argued on LinkedIn. "True AGI would need to solve problems efficiently. Even with unlimited resources, o3 couldn't crack over 100 puzzles that humans find easy."
OpenAI researcher Vahid Kazemi sits firmly in the "This is AGI" camp. Pointing to the earlier o1 model, which he called the first to reason rather than merely predict the next token, he asserted that "in my opinion we have already achieved AGI."
He drew a parallel to scientific methodology, contending that since science itself relies on systematic, repeatable steps to validate hypotheses, it's inconsistent to dismiss AI models as non-AGI simply because they follow a set of predetermined instructions. That said, OpenAI has "not achieved 'better than any human at any task,'" he wrote.
In my opinion we have already achieved AGI and it's even more clear with O1. We have not achieved "better than any human at any task" but what we have is "better than most humans at most tasks." Some say LLMs only know how to follow a recipe. Firstly, no one can really explain…
— Vahid Kazemi ( @VahidK ) December 6, 2024
For his part, OpenAI CEO Sam Altman isn't taking a position on whether AGI has been reached. He simply described o3 as a "very very smart model" and o3-mini as an "incredibly smart model" with very good performance and cost.
Being smart, however, may not be enough to prove AGI has been achieved, at least not yet. But stay tuned: "We view this as sort of the beginning of the next phase of AI," he added.