Mastodon Feed: Post

dysfun@treehouse.systems ("gaytabase") wrote:

anyway today i am implementing something which appears to require 16 XMM registers for maximum performance. that is to say the compiler could probably use fewer, but you'd lose at least a few cycles doing so.

with avx-512, you are upgraded to 32 XMM registers, which means we can actually do the second half of the algorithm on twice as much data at once and benefit from submit latency being lower than result latency. alas that isn't the most expensive half, but oh well.
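
A rough illustration of the "twice as much data at once" idea, in plain C rather than vector code. Everything here is hypothetical: `round_step`, `run_one` and `run_two` are made-up names, and the round function is an arbitrary stand-in, not the algorithm from the post. The sketch only shows the shape of the trick: with enough registers to hold two independent states, the steps of one stream can be issued while the results of the other are still pending.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical round function (an assumption, not the real algorithm).
   Each call consumes the previous result, so one stream on its own
   forms a single serial dependency chain. */
static inline uint32_t round_step(uint32_t x, uint32_t k) {
    x ^= k;
    x *= 0x9E3779B1u;
    x ^= x >> 15;
    return x;
}

/* One block at a time: every round has to wait for the previous
   round's result before it can start. */
uint32_t run_one(uint32_t a, const uint32_t *keys, size_t rounds) {
    for (size_t r = 0; r < rounds; r++)
        a = round_step(a, keys[r]);
    return a;
}

/* Two independent blocks interleaved. The x and y chains never read
   each other's values, so an out-of-order core is free to start the
   step for y while the step for x is still in flight. This needs
   roughly twice the live state, which is where extra registers help. */
void run_two(uint32_t *a, uint32_t *b, const uint32_t *keys, size_t rounds) {
    uint32_t x = *a;
    uint32_t y = *b;
    for (size_t r = 0; r < rounds; r++) {
        x = round_step(x, keys[r]);
        y = round_step(y, keys[r]);
    }
    *a = x;
    *b = y;
}
```

Both versions compute the same values per block; any speedup from `run_two` depends on the hardware and on what the compiler does with it, so it would need measuring rather than assuming.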