tobin_baker | 2 months ago
Shameless plug for bidirectional linear probing, which outperformed Robin Hood in my benchmarks:
https://github.com/senderista/hashtable-benchmarks
matheusmoreira | 2 months ago
Thanks a lot, this looks awesome!! My language's hash tables use unidirectional linear probing without tombstones. Knuth's algorithm looks like it could be a straight upgrade. Can't believe it was published half a century ago. Makes me wonder what else I'm sleeping on.
tobin_baker | 2 months ago
Even FCFS is a major upgrade over linear probing WRT worst-case probe length (see my benchmarks), despite being a 1-line change. Yet I’ve virtually never seen it in the wild.
A concurrent (though not lock-free) version of BLP is described here:
https://fmt.ewi.utwente.nl/media/92.pdf
I stole part of my algorithm from this paper but have not implemented it so can’t vouch for its performance.
fanf | 2 months ago
Small hash tables can be very cute :-)
A nice way to do that is fibonacci hashing.
tobin_baker | 2 months ago
In my experiments, “Fibonacci hashing” only generates acceptable pseudorandomness for sequential inputs. Multiplication mixes low bits into high bits, but you also need to mix high bits into low bits with e.g. shift/xor. A fast mixing function like the Murmur3 finalizer is almost as fast and gives much better results.
fanf | 2 months ago
Fibonacci hashing calculates the array index by throwing away the low bits so there’s not much point mixing the high bits into them. A hash table needs indexes to be dispersed well enough to keep probe sequences short; good randomness is sufficient, but the point of fibonacci hashing is that something much weaker is often good enough.
[OP] bitshift | 2 months ago
Maybe this is obvious to folks who have seen Robin Hood hashing before, but it took a minute for me to get my head around the significance of the author's Score function:

    Score(Index, Key) is (Index - Key) & mask
Because keys are hashes and hashes are where probe sequences start, this is just a clever way to implicitly store how far away each entry is from its preferred location. And that's the intuition for why tombstones aren't needed: if deleting something leaves a gap, and you know the entries in front of it are already shifted forward, that means it's safe to shift them backward to fill the gap.