[Mesa-dev] [PATCH 0/4] RFC: Quadratic probing for the win

Fri Feb 10 08:41:13 UTC 2017

I've looked into Mesa hash-table some time ago, didn't finish anything 
(sigh, I have troubles finishing things off :( ). Anyway some things I 
got from there, in no particular order:

1. I've created a tool that records hash query traces and then replays 
them, sorta like apitrace for hashing. Well, "tool" is a too big word, 
just a bunch of hacks. It helps to get more precise and less noisy 
benchmark data than perf on full runs. I can dump the code somewhere if 
it can be useful.

2. There is a very interesting recent article benchmarking several 
hash-table algorithms (linear probing, quadratic, cuckoo, robin-hood) in 
various scenarios: "A Seven-Dimensional Analysis of Hashing Methods and 
its Implications on Query Processing" 
http://www.vldb.org/pvldb/vol9/p96-richter.pdf

3. The paper makes several interesting conclusions, for example that the 
most simple hash-function should be used for hash-tables, anything more 
complex than multiply-shift[1] doesn't pays off. Indeed, I found that 
slow FNV1a used by Mesa can be replaced by multiply-shift with better 
results. For arbitrary length inputs FNV1a_Yoshimitsu_Triad can be used, 
it's 2x faster than FNV1a on 8-byte inputs already, on larger inputs the 
gap will only widen (FNV1a uses 4 cycles per byte on Haswell). Hashing 
null-terminated strings (identifiers) is more complex, because arbitrary 
length hash-function on inputs with varying length will be dominated by 
branch-mispredicts -- I used approach to merge strlen() and hashing into 
single function (strlen->mask->hash block) that eliminates almost all 
branch mispredicts, it beats FNV1a on strings too, by 30-40%.

4. The most important thing is to eliminate integer division moving to 
2^N tables, it immediately gives 2x speedup against old Mesa hash-table 
implementation. It's possible to eliminate division with prime-sized 
tables too, using pre-computed magic numbers, but 2^N tables are just 
better.

5. For rehashing special insert() function can be used, with superfluous 
checks removed. It gives a small speedup.

6. On i965 on debug builds Mesa creates huge amount of hash queries, 
because some validation code (turned off on release builds) creates 
insane number of hash tables significantly distorting overall picture. 
Better to use release builds or cut out this part of code.

7. Robin-Hood has some pitfalls like other linear probing algorithms. If 
someone will try to clone hash-table iterating its elements and 
inserting them one-by-one to a newly created empty hash-table, they are 
going to have baaaaad time. Rust was hit by this recently. 
http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion 
At the very least this way of cloning was used in opt_copy_propagation.

[1] 
https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic

08.02.2017 23:35, Thomas Helland пишет:
> I was cleaning up my local git repo, and came across this series.
> Last time it was discussed was all the way back in April 2015.
> Things looked pretty good back then, but we where seeing a smaller
> regression in CPU-bound scenarios as Eric found with forcing software
> rendering while running Minecraft.
>
> I figured I'd do a retest of the series to see how it fares today.
> Using perf on shader-db I see:
> hash_table_search being cut from 3.88% down to 1.83%.
> _mesa_hash_data being cut from 1.47% down to 1.25%
> _mesa_hash_table_rehash going from 0.23% up to 1.16%
> hash_table_insert being cut from 2.26% down to 0.33%
>
> This yields an approximate 10% reduction in shader-db runtime.
>
> The increase in the rehash function is a bit peculiar.
> I'll look into increasing the table one more step, as a four entry
> hash table seems a bit on the low side. I'll also work on getting
> some more reliable numbers from a real world application, along
> with some more runs of shader-db to get better statistical certainty.
> I'll pull out my i3-6100 / RX460 combo and give this a spin
> with Borderlands 2 I think, as Marek's threaded GL work suggests
> this is a title with heavy CPU bottlenecking.
>
> As a side note, Robin Hood hashing was mentioned in the thread from
> back in April 2015. I actually have an implementation of that, but
> I'm still working out an issue that our make-check tests doesn't
> catch that causes corruption in the table when runing shader-db.
> I'm not to sure about it's effect though, as it sacrifices insert
> speed for lookup speed, but one never knows until one tests.
>
> Let me know if this is something I should persue. If not I'll
> mark this series as "junk" in my git repo, and get on with the cleaning.
>
> Thomas Helland (4):
>    util/tests: Expand collision test for hash table
>    util: Change hash_table to use quadratic probing
>    util: Change util/set to use quadratic probing
>    util: Use set_foreach instead of rolling our own
>
>   src/util/hash_table.c                 | 102 +++++++++--------------------
>   src/util/hash_table.h                 |   3 +-
>   src/util/set.c                        | 118 ++++++++++++----------------------
>   src/util/set.h                        |   3 +-
>   src/util/tests/hash_table/collision.c |  14 ++++
>   5 files changed, 89 insertions(+), 151 deletions(-)
>