
It's interesting how RAM speed (or the lack thereof) has ruined advanced data structures. A big flat array usually beats fancy trees with clever logic, and brute-force SIMD can be faster than avoiding redundant work.
I've never been able to use Bloom filters either: each lookup costs too many cache misses.
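
To make the Bloom filter point concrete, here is a rough sketch (not a benchmark) that counts how many distinct cache lines one lookup touches in a classic Bloom filter. Everything in it is an assumption for illustration: 64-byte cache lines, an 8 MiB filter that doesn't fit in cache, k = 7 probes, and the hypothetical helpers `probe_positions` and `cache_lines_touched`.

```python
import hashlib

CACHE_LINE_BITS = 64 * 8  # assumption: 64-byte cache lines


def probe_positions(key: bytes, num_bits: int, k: int) -> list[int]:
    """Bit positions a classic Bloom filter would test for `key` (k independent hashes)."""
    positions = []
    for i in range(k):
        # Derive the i-th hash by salting BLAKE2b with the probe index.
        digest = hashlib.blake2b(
            key, digest_size=8, salt=i.to_bytes(8, "little")
        ).digest()
        positions.append(int.from_bytes(digest, "little") % num_bits)
    return positions


def cache_lines_touched(positions: list[int]) -> int:
    """Number of distinct cache lines covered by the given bit positions."""
    return len({p // CACHE_LINE_BITS for p in positions})


num_bits = 8 * 1024 * 1024 * 8  # assumption: 8 MiB filter, too big to stay cached
k = 7                           # assumption: 7 hash probes per lookup

keys = [f"key-{n}".encode() for n in range(1000)]
avg_lines = sum(
    cache_lines_touched(probe_positions(key, num_bits, k)) for key in keys
) / len(keys)

print(f"average cache lines touched per lookup: {avg_lines:.3f} (k = {k})")
```

The average comes out very close to k, because the probes are spread uniformly over the whole bit array and almost never share a line. A lookup that has to test all k bits in a filter larger than the cache can therefore pay up to k misses, while a single probe into a flat array or hash table pays roughly one.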