Mastodon Feed: Post

kornel ("Kornel") wrote:

I've nerdsniped myself into finding a faster approximation of the cube root (needed for the Lab #color space)

- polynomial approximations alone are not sufficient for x < 0.2. Not even big ones, blended ones, or Padé approximants.
- bit-twiddling tricks don't vectorize, and/or need exp2, which is expensive itself.

In the end I've found that a simple polynomial + 2× Halley's Rational Method is unbeatable. Precise. Autovectorizes nicely. 2.5× faster than std. 5× faster in batches of four.
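
A rough sketch of that shape, reading "2×" as two Halley iterations. This is not the code from the linked file: the quadratic coefficients below are hand-picked placeholders rather than a tuned fit, and the names `initial_guess`, `halley_step`, `cbrt_approx` and `cbrt_approx4` are made up for illustration. It assumes inputs between (6/29)³ ≈ 0.0089 (the point below which Lab switches to a linear segment) and 1. With these untuned coefficients it should land within roughly 1e-5 relative error on that range; a better-fitted polynomial would be needed to do better.

```rust
/// Hypothetical starting guess: an untuned quadratic chosen by hand for this
/// sketch, only meant for x in roughly [0.0089, 1.0].
fn initial_guess(x: f32) -> f32 {
    // Horner form of 0.26 + 1.4*x - 0.66*x^2
    0.26 + x * (1.4 - 0.66 * x)
}

/// One Halley iteration for solving y^3 = x:
/// y <- y * (y^3 + 2x) / (2y^3 + x)
fn halley_step(y: f32, x: f32) -> f32 {
    let y3 = y * y * y;
    y * (y3 + 2.0 * x) / (2.0 * y3 + x)
}

/// Approximate cube root: polynomial guess refined by two Halley iterations.
/// No branches or loops, so the compiler is free to vectorize calls to it.
pub fn cbrt_approx(x: f32) -> f32 {
    let y = initial_guess(x);
    let y = halley_step(y, x);
    halley_step(y, x)
}

/// Batch of four, written as a plain element-wise map over a fixed-size array.
pub fn cbrt_approx4(xs: [f32; 4]) -> [f32; 4] {
    xs.map(cbrt_approx)
}
```

Each Halley step roughly cubes the relative error, so the starting guess only has to be in the right neighbourhood. Far from the true root a step merely halves or doubles the estimate, which is why this sketch restricts the input range instead of going all the way down to zero.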

https://github.com/kornelski/dssim/blob/c86745c423478993a12edf59ec76047ff52b3da4/dssim-core/src/tolab.rs#L48-L61