Mastodon Feed: Post

Mastodon Feed, Dec 17, 2025, 10:39 AM

Simulated Company Shows Most AI Agents Flunk the Job:

"The results should comfort people worried about AI replacing them. The best of them, Claude 3.5 Sonnet from Anthropic, only completed 24% of the tasks. Google’s Gemini 2.0 Flash came in second with 11.4%, and OpenAI’s GPT-4o was third with 8.6%."

lol. https://www.cs.cmu.edu/news/2025/agent-company