Mastodon Feed: Post

May 31, 2026, 11:07 AM

Boosted by baldur@toot.cafe ("Baldur Bjarnason"):
jonny@neuromatch.social ("jonny (nonvenomous)") wrote:

@elebertus
I've read so much LLM code at this point, and there are still patterns that are present but elude my understanding. One thing that's clear, though, is that there are foundational flaw categories that are not improved upon by model version and that appear in wildly different projects using wildly different models and harnesses. Testing is a big nexus of those flaws. I am not close to a satisfying explanation of the dynamics, but every project suffers from fucked testing problems.