Mastodon Feed: Post

Mastodon Feed · Mar 2, 2026, 5:17 PM

Boosted by kornel ("Kornel"):
pamelafox@fosstodon.org ("Pamela Fox") wrote:

BullshitBench: a benchmark that measures whether models detect nonsense, call it out clearly, and avoid confidently continuing with invalid assumptions.
https://github.com/petergpt/bullshit-benchmark