Look out, OpenAI's latest chatbot hallucinates less and might even count to three

Three shall be the number thou shalt count, and the number of the counting shall be three.

OpenAI has unleashed yet another new chatbot on us poor, unsuspecting humans. We give you o1, a chatbot designed for more advanced reasoning that’s claimed to be better at things like coding, math and solving multistep problems in general.

Perhaps the most significant change from previous OpenAI LLMs is a shift from mimicking patterns found in text training data to a focus on more direct problem solving, courtesy of reinforcement learning. The net result is said to be a more consistent, accurate chatbot.

“We have noticed that this model hallucinates less,” OpenAI’s research lead, Jerry Tworek, told The Verge. Of course, “hallucinates less” doesn’t mean no hallucinations at all. “We can’t say we solved hallucinations,” Tworek says. Ah.

Still, o1 is said to work through problems using something akin to a “chain of thought”, tackling them step by step, much as we humans do. That contributes to much higher claimed performance in tasks like coding and math.

Apparently, o1 scored 83% in a qualifying exam for the International Mathematical Olympiad, far better than the rather feeble 13% notched up by GPT-4o. It has also performed well in coding competitions, and OpenAI says an imminent further update will enable it to match PhD students “in challenging benchmark tasks in physics, chemistry and biology.”

However, despite these advances, or perhaps because of them, this new bot is actually worse by some measures. It has fewer facts about the world at its fingertips, and it can’t browse the web or process images. It’s also currently slower than GPT-4o to respond and spit out answers.

Of course, one immediate question that follows from all this is whether this new chatbot still suffers from any of the surprising limitations of previous bots. Can o1, for instance, even count to three?

Apparently, yes, it can. GPT-4o can reportedly be flummoxed when asked to count the “r”s in the word “strawberry”, managing to count only to two. But o1 gets all the way to three.
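For the record, the ground truth the chatbots are being tested against takes a single line of ordinary code to establish. The sketch below simply counts the letters deterministically; it is only a reference answer and says nothing about how either model works internally.

```python
word = "strawberry"

# Count occurrences of the lowercase letter "r" in the word
r_count = word.count("r")

# 0-based positions of each "r", showing where all three come from
r_positions = [i for i, ch in enumerate(word) if ch == "r"]

print(r_count, r_positions)
```

The three hits are the “r” in “straw” and the double “r” in “berry”, which is the one GPT-4o reportedly collapses into a single letter.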

That step-change in counting ability, however, doesn’t come cheap. Developer access costs $15 per 1 million input tokens and $60 per 1 million output tokens. That’s three and four times, respectively, the price of GPT-4o.
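To put those multiples into concrete numbers, here’s a rough sketch of the arithmetic. The o1 prices are the ones quoted above; the GPT-4o prices ($5 and $15 per million input and output tokens) are an assumption, back-calculated from the “three and four times” claim, and the 2,000-in/1,000-out request is purely hypothetical.

```python
# o1 developer pricing quoted in the article (USD per 1 million tokens)
O1_INPUT_PER_M = 15.0
O1_OUTPUT_PER_M = 60.0

# ASSUMPTION: GPT-4o pricing implied by the "three and four times" multiples
GPT4O_INPUT_PER_M = 5.0
GPT4O_OUTPUT_PER_M = 15.0


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the USD cost of one request, given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Price multiples between the two models
input_ratio = O1_INPUT_PER_M / GPT4O_INPUT_PER_M
output_ratio = O1_OUTPUT_PER_M / GPT4O_OUTPUT_PER_M

# Hypothetical request: 2,000 input tokens and 1,000 output tokens
o1_cost = request_cost(2_000, 1_000, O1_INPUT_PER_M, O1_OUTPUT_PER_M)
gpt4o_cost = request_cost(2_000, 1_000, GPT4O_INPUT_PER_M, GPT4O_OUTPUT_PER_M)

print(f"input x{input_ratio:.0f}, output x{output_ratio:.0f}")
print(f"o1: ${o1_cost:.3f}, GPT-4o: ${gpt4o_cost:.3f}")
```

For that made-up request, o1 works out at about 9 cents against roughly 2.5 cents for GPT-4o: a blended premium of around 3.6 times, landing between the two headline multiples because the bill mixes input and output tokens.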

ChatGPT Plus and Team users reportedly already have access to the initial version of the bot, known as o1-preview. Meanwhile, in future a version called o1-mini will be made available for free, though OpenAI hasn’t put a date on that.


All told, a bot capable of more reliable responses, along with more practical reasoning, certainly sounds like a step towards something both more useful in the real world and closer to general, or human-like, intelligence.

That, indeed, is OpenAI’s plan. “We have been spending many months working on reasoning because we think this is actually the critical breakthrough,” OpenAI’s chief research officer, Bob McGrew, says. “Fundamentally, this is a new modality for models in order to be able to solve the really hard problems that it takes in order to progress towards human-like levels of intelligence.”

Anyway, if it really can count to three, colour me impressed. And as a routine precaution it goes without saying that I for one welcome, well, you know the rest.
