ThePawn02

Gaming and Streaming Content

ChatGPT’s hallucination problem is getting worse according to OpenAI’s own tests and nobody understands why

With better reasoning ability comes even more of the wrong kind of robot dreams.
ThePawn.com May 6, 2025 3 min read

Remember when we reported a month or so ago that Anthropic had discovered that what’s happening inside AI models is very different from how the models themselves describe their “thought” processes? Well, to that mystery surrounding the latest large language models (LLMs), along with countless others, you can now add ever-worsening hallucination. And that’s according to testing by the leading name in chatbots, OpenAI.

The New York Times reports that an OpenAI investigation into its latest o3 and o4-mini LLMs found they are substantially more prone to hallucinating, or making up false information, than the previous o1 model.

“The company found that o3 — its most powerful system — hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI’s previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent,” the Times says.

“When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.”
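To put those SimpleQA figures in perspective, here is a minimal sketch that works out how far o3 and o4-mini sit above o1, both in percentage points and in relative terms. It uses only the three rates quoted above; the function and variable names are made up for illustration and are not part of any OpenAI tooling.

```python
# SimpleQA hallucination rates (percent), as quoted from the Times above.
simpleqa_rates = {"o1": 44, "o3": 51, "o4-mini": 79}


def compare_to_baseline(rates, baseline="o1"):
    """Return {model: (percentage-point change, relative change in %)}.

    Hypothetical helper for illustration only.
    """
    base = rates[baseline]
    result = {}
    for model, rate in rates.items():
        if model == baseline:
            continue
        point_change = rate - base  # absolute gap in percentage points
        relative_change = round(100 * point_change / base, 1)  # gap relative to baseline
        result[model] = (point_change, relative_change)
    return result


changes = compare_to_baseline(simpleqa_rates)
for model, (points, relative) in changes.items():
    print(f"{model}: +{points} points, {relative}% higher than o1")
# prints:
# o3: +7 points, 15.9% higher than o1
# o4-mini: +35 points, 79.5% higher than o1
```

The PersonQA numbers are left out because the article does not give o1's exact rate on that test, only that o3's 33 percent is more than twice as high.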

OpenAI has said that more research is required to understand why the latest models are more prone to hallucination. But according to some industry observers, the shift to so-called “reasoning” models is the prime suspect.

“The newest and most powerful technologies — so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek — are generating more errors, not fewer,” the Times claims.

In simple terms, reasoning models are a type of LLM designed to perform complex tasks. Instead of merely spitting out text based on statistical models of probability, reasoning models break questions or tasks down into individual steps akin to a human thought process.

OpenAI’s first reasoning model, o1, came out last year and was claimed to match the performance of PhD students in physics, chemistry, and biology, and beat them in math and coding thanks to the use of reinforcement learning techniques.

“Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem,” OpenAI said when o1 was released.

However, OpenAI has pushed back against the narrative that reasoning models suffer from increased rates of hallucination. “Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini,” OpenAI’s Gaby Raila told the Times.

Whatever the truth, one thing is for sure. AI models need to largely cut out the nonsense and lies if they are to be anywhere near as useful as their proponents currently envisage. As it stands, it’s hard to trust the output of any LLM. Pretty much everything has to be carefully double-checked.

That’s fine for some tasks. But where the main benefit is saving time or labour, the need to meticulously proofread and fact-check AI output does rather defeat the object of using these models. It remains to be seen whether OpenAI and the rest of the LLM industry can get a handle on all those unwanted robot dreams.
