Mastodon Feed: Post

dysfun@treehouse.systems ("gaytabase") wrote:

RE: https://social.afront.org/@stylus/116235651886344794

apparently SVE2 is on phones. and the bitperm impl takes less than twice as long as the loop overhead alone.

and look at the difference in the third impl (there are four sets of three results, each followed by an impl number). the pdep (which is like a nanosecond slower on x86-64) is a full 4 nanoseconds slower here. most of that seems to be because the arm chip does a worse job of hiding the loop-carried dependency than x86-64 does. most interesting!
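
for anyone who hasn't met pdep: here is a rough sketch in portable C of what the operation computes (x86-64 BMI2 calls it PDEP, the SVE2 bitperm extension calls it BDEP), plus two loop shapes that show what "loop-carried dependency" means. this is not the benchmark from the quoted post. the names `pdep_ref`, `chained`, `independent` and `pdep_selfcheck` are made up for illustration, and the bit-by-bit loop is far slower than the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Reference parallel bit deposit: take the low bits of src, lowest first,
   and place them at the positions of the set bits of mask. */
uint64_t pdep_ref(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit of mask */
        if (src & bit) {
            out |= lowest;
        }
        mask &= mask - 1; /* clear that bit and move to the next one */
    }
    return out;
}

/* Loop-carried dependency: each iteration's input is the previous
   iteration's output, so the core cannot start one pdep before the
   previous one has finished. Total time is roughly n times the latency. */
uint64_t chained(uint64_t x, uint64_t mask, int n) {
    for (int i = 0; i < n; i++) {
        x = pdep_ref(x + 1, mask);
    }
    return x;
}

/* Contrast case: inputs depend only on the loop counter, so an
   out-of-order core is free to overlap several deposits. Only the
   cheap xor into acc is carried from one iteration to the next. */
uint64_t independent(uint64_t mask, int n) {
    uint64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc ^= pdep_ref((uint64_t)i, mask);
    }
    return acc;
}

/* Small sanity check of the semantics (hypothetical helper). */
void pdep_selfcheck(void) {
    /* src bits 0 and 2 land on mask bits 4 and 6 */
    assert(pdep_ref(0x5, 0xF0) == 0x50);
}
```

a benchmark shaped like `chained` measures latency, one shaped like `independent` measures something closer to throughput. how much of the gap a given core can hide depends on its out-of-order machinery, which is presumably where the extra nanoseconds come from.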