dysfun@treehouse.systems ("gaytabase") wrote:
i've had a look in agner fog's spreadsheet and frankly i'm none the wiser. zen 5 doesn't exactly look like a speed demon either, and he doesn't list his methodology for masks. frankly it'd be useful to see at least 2 figures there, mask on and mask off, and probably mask half full as well.
meanwhile, i have calculated the worst case for doing it on the cpu and it looks like it may or may not be better, depending on how masks are processed, provided we know the size in advance. there are some means by which i can know the size in advance, of course (the simplest being padding to the worst case size with inert data).
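to make the padding idea concrete, here's a minimal scalar sketch, not the real code. everything named in it is made up for illustration: `WORST_CASE`, `pad_indices`, `gather_sum_fixed`, and the assumption that the gathered values get summed, so that a zero entry in the table counts as "inert". the point is only that the loop has a fixed trip count and needs no mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical worst-case number of indices; not a figure from this thread */
enum { WORST_CASE = 16 };

/* copy n real indices into a fixed-size buffer and fill the tail with an
 * "inert" index, i.e. one that points at data which can't change the result */
static void pad_indices(const uint32_t *idx, size_t n, uint32_t inert,
                        uint32_t out[WORST_CASE])
{
    assert(n <= WORST_CASE);
    for (size_t i = 0; i < WORST_CASE; i++)
        out[i] = (i < n) ? idx[i] : inert;
}

/* scalar gather-and-sum over a fixed trip count: no mask, no length check.
 * assumes the inert index points at a zero, so padding adds nothing. */
static uint64_t gather_sum_fixed(const uint32_t *table,
                                 const uint32_t idx[WORST_CASE])
{
    uint64_t sum = 0;
    for (size_t i = 0; i < WORST_CASE; i++)
        sum += table[idx[i]];
    return sum;
}
```

so with a table of `{0, 10, 20, 30, 40}` and real indices `{1, 3, 4}` padded with index 0, the fixed-size loop still sums to 80. the cost is always doing `WORST_CASE` loads, which is exactly the tradeoff a benchmark would have to settle.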
overall i wouldn't say it's looking promising for avx512 gather intrinsics, but again i'd have to write a benchmark.