Mastodon Feed: Post

Mastodon Feed, Nov 6, 2025, 7:38 PM

Boosted by jsonstein@masto.deoan.org ("Jeff Sonstein"):
spenot@mas.to ("Spenot") wrote:

@justvanrossum Me too, however... "Beyond proving hallucinations were inevitable, the OpenAI research revealed that industry evaluation methods actively encouraged the problem. Analysis of popular benchmarks, including GPQA, MMLU-Pro, and SWE-bench, found nine out of 10 major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers." https://www.computerworld.com/article/4059383/openai-admits-ai-hallucinations-are-mathematically-inevitable-not-just-engineering-flaws.html