Mastodon Feed: Post

baldur@toot.cafe ("Baldur Bjarnason") wrote:

RE: https://mastodon.acm.org/@mxp/116475436932395582

“LLMs Corrupt Your Documents When You Delegate”

https://arxiv.org/abs/2604.15597

> Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models [...] corrupt an average of 25% of document content by the end of long workflows

Coding was the only use case that didn't show catastrophic degradation. Bear in mind, though, that the paper only attempts to benchmark degradation; it doesn't assess the design, reliability, or quality of the output.