OpenAI's o3 and o4-mini models hallucinate more: an industry alert

By Canuto

OpenAI's new artificial intelligence models, o3 and o4-mini, show alarmingly higher hallucination rates than their predecessors, raising questions about the reliability and accuracy of advanced AI systems in critical tasks.
***

OpenAI admits that its o3 and o4-mini reasoning models hallucinate more than previous versions, according to internal and external tests.
Hallucinations hinder the adoption of AI in sectors that require high accuracy, such as law and finance.
The company is exploring web search integration and new training strategies to reduce errors, although it still does not fully understand hallucinations.

Alarm over rising hallucinations at OpenAI

OpenAI, a leader in artificial intelligence development, recently launched its o3 and o4-mini reasoning models. Despite promising new advances, these versions have raised concern by showing a greater tendency to "hallucinate", that is, to invent information, according to a technical report cited by TechCrunch.

Hallucinations have long been a challenge in the field of AI. They are errors in which the model presents fictitious information as if it were true. Although each new iteration used to bring improvements, the trend has reversed with these new releases: both o3 and o4-mini show higher error rates than earlier reasoning models such as o1, o1-mini and o3-mini, and than traditional models such as GPT-4o.

Worrying figures: internal and external tests confirm the problem

According to data published by OpenAI and reported by TechCrunch, the o3 model hallucinated on 33% of the questions in PersonQA, an internal benchmark that measures the accuracy of a model's knowledge about people. By comparison, the earlier models o1 and o3-mini showed rates of 16% and 14.8%, respectively.

The o4-mini model exceeded even those figures, hallucinating in a troubling 48% of the cases evaluated on the same benchmark. These figures are double or even triple the error rates of the earlier models, which poses a serious obstacle to adoption in areas where accuracy is critical.
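The "double or even triple" comparison can be checked with simple arithmetic. The following is a minimal sketch that uses only the four PersonQA rates quoted above; the `rates` dictionary and the `ratio` helper are illustrative names, not part of any OpenAI tooling.

```python
# Hallucination rates on PersonQA as quoted in this article (percent).
rates = {"o1": 16.0, "o3-mini": 14.8, "o3": 33.0, "o4-mini": 48.0}


def ratio(new: str, old: str) -> float:
    """Return how many times higher the new model's rate is than the old one's."""
    return rates[new] / rates[old]


# Compare each new model against each earlier model.
for new in ("o3", "o4-mini"):
    for old in ("o1", "o3-mini"):
        print(f"{new} vs {old}: {ratio(new, old):.2f}x")
```

On these figures, o3 comes out at roughly twice the rate of the earlier models and o4-mini at roughly three times.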

External laboratories such as Transluce have detected similar trends. They found that the o3 model went so far as to invent details about actions it had supposedly taken, such as claiming to have run code outside of ChatGPT on a 2021 MacBook Pro, something the model cannot do.

Experts discuss the causes and possible solutions

OpenAI itself acknowledges that it does not fully understand why hallucinations have increased. Its technical documentation states that "more research is needed" to understand why scaling up reasoning models can lead to more errors.

The first hypotheses point to the type of reinforcement learning used for the o-series models. Neil Chowdhury, a Transluce researcher and former OpenAI employee, argues that this method could amplify problems that are usually partially mitigated by standard post-training techniques.

Meanwhile, Sarah Schwettmann, co-founder of Transluce, warns that o3's high hallucination rate reduces its real-world usefulness for many applications. Kian Katanforoosh, a professor at Stanford and CEO of Workera, told TechCrunch that the model is promising in programming workflows but tends to provide broken links, which complicates tasks that require verifying external sources.

Implications for critical sectors and alternative strategies

The increase in hallucinations can significantly limit the deployment of AI in sectors such as law, finance and medicine, where errors can have serious consequences. While the creativity that comes from these deviations can help generate innovative ideas, in contexts where accuracy is non-negotiable any failure erodes trust.

To tackle the problem, OpenAI is exploring the integration of web search capabilities into its models. For example, GPT-4o with web search has reached 90% accuracy on SimpleQA, another of OpenAI's accuracy benchmarks. By allowing external information to be verified directly, search is expected to lower hallucination rates, although this will depend on users' willingness to rely on external search engines.

Niko Felix, an OpenAI spokesperson, says that "addressing hallucinations across all our models is an ongoing area of research, and we're continually working to improve their accuracy and reliability."

A new trend in AI: reasoning and its paradox

Over the past year, the artificial intelligence industry has turned to developing reasoning models. This approach, which seeks to improve performance on complex tasks without multiplying computing and data costs, promises to go beyond the capabilities of traditional models.

However, as o3 and o4-mini suggest, strengthening reasoning may have the side effect of amplifying hallucinations. The challenge now is to balance advanced capabilities with reliability, especially before deploying intelligent systems in contexts governed by strict regulations or carrying legal repercussions.

The debate is open and the race for truly reliable artificial intelligence continues, while companies, researchers and users closely watch the next steps of OpenAI and its rivals in the sector.

Original image by DiarioBitcoin, created with artificial intelligence, free to use and released into the public domain.

This article was written by an AI content editor and reviewed by a human editor to guarantee quality and precision.

WARNING: DiarioBitcoin offers informative and educational content on various topics, including cryptocurrencies, AI, technology and regulation. We do not provide financial advice. Crypto-asset investments are high risk and may not be suitable for everyone. Do your own research, consult an expert and check the applicable legislation before investing. You could lose all your capital.