Hacker News

Lessons from Hash Table Merging

84 points by attractivechaos

by exDM69

1 subcomments

Maybe this would be a suitable application for "Fibonacci hashing" [0][1], which is a trick for mapping a hash value to a hash table bucket. Instead of just taking the hash modulo the table size, it first multiplies the hash by the constant 2^64/phi, where phi is the golden ratio, and then maps the product to a bucket (the article in [0] keeps the top bits of the product with a shift rather than taking a modulo).
There may be better constants than 2^64/phi; perhaps some large prime number with a roughly equal number of one and zero bits could also work.
This will prevent bucket collisions on hash table resizing that may lead to "accidentally quadratic" behavior [2], while not requiring rehashing with a different salt.
I haven't done a detailed analysis of whether it also helps with hash table merging, but I think it would.
[0] https://probablydance.com/2018/06/16/fibonacci-hashing-the-o...
[1] https://news.ycombinator.com/item?id=43677122
[2] https://accidentallyquadratic.tumblr.com/post/153545455987/r...
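For readers who want to see the trick concretely, here is a minimal Rust sketch of Fibonacci hashing for a power-of-two table. The names `FIB_MULT`, `fib_bucket` and `mask_bucket` are made up for illustration, and the sketch assumes a table of 2^bits buckets with 1 <= bits <= 63. It is not taken from any of the libraries discussed in the thread.

```rust
/// 2^64 / phi as an odd 64-bit integer (phi is the golden ratio).
const FIB_MULT: u64 = 0x9E37_79B9_7F4A_7C15;

/// Map a 64-bit hash to a bucket in a table of 2^bits buckets.
/// Assumption for this sketch: 1 <= bits <= 63.
fn fib_bucket(hash: u64, bits: u32) -> usize {
    debug_assert!((1..=63).contains(&bits));
    // Multiply with wraparound, then keep the top `bits` bits of the product.
    (hash.wrapping_mul(FIB_MULT) >> (64 - bits)) as usize
}

/// Plain masking, for comparison: keeps only the low `bits` bits.
fn mask_bucket(hash: u64, bits: u32) -> usize {
    (hash & ((1u64 << bits) - 1)) as usize
}
```

With 16 buckets (bits = 4), the hashes 16 and 32 both land in bucket 0 under plain masking, while `fib_bucket` sends them to buckets 14 and 12. Keeping the top bits matters here: the low bits of the product depend only on the low bits of the input.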

by willvarfar

1 subcomments

Kudos, neat digging and writeup that makes us think :)
If you merge linear-probed tables by iterating in sorted hash order, you match the storage order, which can congest particular parts of the table and trigger linear probing's worst-case behaviour.
By changing the iteration order, or salting the hash, you can avoid this.
Of course chained hash tables don't suffer from this particular problem.
My quick thought is that hash tables ought to keep an internal salt hidden away. This seems good for avoiding 'attacks' as well as for speeding up merging etc. The only downside I can think of is that creating the table needs to fetch a random salt, which might not be quick, although that can be alleviated by allowing the salt to be set externally at table creation, so people who don't care can set it to 0 or whatever. What am I missing?
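A rough Rust sketch of the "internal salt, optionally set by the caller" idea follows. `SaltedState`, `table_with_salt` and `salted_hash` are hypothetical names invented for this illustration. Rust's standard `HashMap` already seeds each table randomly through `RandomState`, so this only shows what an explicit salt could look like, not how any particular library does it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical hasher factory that mixes a per-table salt into every hash.
#[derive(Clone, Copy, Debug)]
struct SaltedState {
    salt: u64,
}

impl BuildHasher for SaltedState {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        let mut hasher = DefaultHasher::new();
        // Feed the salt first so it perturbs the hash of every key.
        hasher.write_u64(self.salt);
        hasher
    }
}

/// Create an empty table with an explicit salt
/// (callers who do not care can pass 0).
fn table_with_salt<K, V>(salt: u64) -> HashMap<K, V, SaltedState> {
    HashMap::with_hasher(SaltedState { salt })
}

/// Hash one key the way a table built from `state` would.
fn salted_hash<K: Hash>(state: &SaltedState, key: &K) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}
```

Two tables with different salts place the same keys in unrelated positions, so iterating one table in storage order no longer matches the storage order of the other. A fixed salt such as 0 gives that up in exchange for cheap, reproducible construction.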

by SkiFire13

1 subcomments

> I evaluated the following hash table libraries, all based on linear probing.
> Abseil
> Rust standard
> hashbrown
These hash tables are not based on plain linear probing; they use something that's essentially quadratic probing done in chunks. Not sure about the others, but they might be doing something similar.
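To illustrate what "quadratic probing done in chunks" can look like, here is a simplified Rust sketch, loosely modeled on the group-at-a-time probing used by SwissTable-style tables. `GROUP_WIDTH` and `probe_positions` are assumed names and values for this example; real implementations also compare control bytes within each group, which is left out here.

```rust
/// Number of slots examined together; an assumed value for illustration.
const GROUP_WIDTH: usize = 16;

/// Return the first `steps` group start positions probed for `hash` in a
/// table whose bucket count is `bucket_mask + 1` (a power of two).
fn probe_positions(hash: u64, bucket_mask: usize, steps: usize) -> Vec<usize> {
    let mut pos = (hash as usize) & bucket_mask;
    let mut stride = 0;
    let mut out = Vec::with_capacity(steps);
    for _ in 0..steps {
        out.push(pos);
        // The stride grows by one group per step, so the offsets from the
        // starting position are triangular numbers times GROUP_WIDTH.
        stride += GROUP_WIDTH;
        pos = (pos + stride) & bucket_mask;
    }
    out
}
```

For a 64-slot table, `probe_positions(0, 63, 4)` yields the group starts 0, 16, 48, 32, so every group is visited once. Within a group the slots are scanned together, which is the "chunk" part; the jumps between groups are the quadratic part.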

by AlotOfReading

0 subcomment

by tialaramex

3 subcomments

In Rust, don't do this: it's more work and it'll tend to be slower, often much slower.
HashMap implements Extend, so just call h0.extend(h1) and you're done; the people who made your HashMap type are much better equipped to optimize this common operation.
In a new enough C++ you might in theory find the same functionality supported, but Quality of Implementation tends to be pretty frightful.
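For reference, a minimal sketch of the `extend` approach in Rust; `merge_into` is just an illustrative wrapper name, not a standard function.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Merge `h1` into `h0` using the standard `Extend` implementation.
/// For keys present in both maps, the value from `h1` replaces the old one.
fn merge_into<K: Eq + Hash, V>(h0: &mut HashMap<K, V>, h1: HashMap<K, V>) {
    h0.extend(h1);
}
```

Note that `extend` overwrites values for duplicate keys; if the merge should combine values instead (for example, summing counts), the entry API would be needed rather than a bare `extend`.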

by wehateclusters

0 subcomment

by oleggromov

1 subcomments

There's a typo with 'ULL' string suffixes in the hexadecimal numbers in the first code example.