Do AI Models Hallucinate? The Answer In This New Study Might Surprise You


New AI models keep launching at a rapid pace. From Google’s Gemini to OpenAI’s GPT-4o, the list keeps growing.

However, a new study shares findings that settle the debate about whether AI models hallucinate. The answer is yes, and all of them are guilty of it. In some cases the results are merely amusing, while in others they pose a serious problem.

Not every model fabricates information at the same rate, but all of them do it. The kind of misinformation produced depends on the sources a model has been exposed to.

The study, led by researchers at Cornell together with other universities, benchmarked the degree of hallucination by checking model outputs with the assistance of fact-checkers. According to the report, no model performed well, regardless of the topic. Strikingly, all of the models hallucinated, and the ones that hallucinated least did so largely by answering few or no questions at all, effectively avoiding the risk of getting things wrong.

According to the authors, the biggest take-home message is that AI models cannot be fully trusted. Even those deemed the best produce hallucination-free text only about 35% of the time.
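To put a figure like that 35% in context, a metric of this kind can be computed by having fact-checkers label each claim in a model's answers and then counting a response as hallucination-free only if none of its claims turns out to be false. Below is a minimal, hypothetical Python sketch of that kind of scoring; the field names and scoring rules are assumptions made for illustration, not the study's actual methodology or code.

```python
# Hypothetical sketch: scoring fact-checked model responses.
# Field names and rules are illustrative assumptions, not the study's code.

responses = [
    {"answered": True,  "claims_checked": 5, "claims_false": 0},
    {"answered": True,  "claims_checked": 4, "claims_false": 2},
    {"answered": False, "claims_checked": 0, "claims_false": 0},  # model declined to answer
]

def score(rows):
    answered = [r for r in rows if r["answered"]]
    # A response counts as hallucination-free only if every checked claim held up.
    clean = [r for r in answered if r["claims_false"] == 0]
    return {
        "response_rate": len(answered) / len(rows),  # how often the model answers at all
        "hallucination_free_rate": len(clean) / len(answered) if answered else 0.0,
    }

print(score(responses))
# e.g. {'response_rate': 0.666..., 'hallucination_free_rate': 0.5}
```

A scheme like this also makes the trade-off mentioned above visible: a model can push its hallucination-free rate up simply by declining to answer, which drags its response rate down.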

To raise the bar of the benchmark, the authors included topics for which even Wikipedia has no references. More than a dozen models were evaluated, most of them recent releases.

The lineup included models from Meta, Google, Mistral, and OpenAI. According to the study, none of them hallucinates meaningfully less than earlier models, including those from the biggest names in the industry.

For instance, GPT-4o produced no fewer hallucinations than its much older counterpart, GPT-3.5. Still, OpenAI's models had the lowest hallucination rates overall, followed by Mixtral and Perplexity's Sonar.

The most difficult topics turned out to be finance and entertainment/celebrities, while models found IT and geography the easiest to answer, likely because they had more training data in those domains. Wherever Wikipedia answers or references could not be found, the models struggled badly, suggesting that most of them lean heavily on Wikipedia in their training data.

Even models capable of searching the web struggled with non-Wikipedia queries. Did model size matter? Not really: smaller models hallucinated at roughly the same rates as their larger counterparts.

It’s a clear message that vendors aren’t delivering what they claim. Either the models fall short of the promises made, or the promises were exaggerated. Whatever the case, the authors conclude that hallucinations are bound up with today's AI and won’t be going away anytime soon.

What’s the solution? According to the researchers, stronger policies and greater regulation seem to be the only way forward.

Image: DIW-Aigen
