<p dir="ltr"><br>
On Feb 28, 2015 5:05 PM, "Jason Ekstrand" <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
><br>
><br>
> On Feb 28, 2015 4:55 AM, "Thomas Helland" <<a href="mailto:thomashelland90@gmail.com">thomashelland90@gmail.com</a>> wrote:<br>
> ><br>
> > So here comes my hash-table series mentioned earlier.<br>
> ><br>
> > So, first of all, there's some issues.<br>
> > I've been strugling with hitting assertion failures.<br>
> > The table returns null at times when it apparently should not.<br>
> > It occurs after patch 1, and is fixed by patch 2.<br>
> > It also occured when I was tweaking the amount of free space in the<br>
> > table.<br>
> ><br>
> > With this series it should be tweaked and working (apart from patch 1),<br>
> > but it would be wonderfull if people gave it a run.<br>
> > I'm suspecting we might have some threading issues,<br>
> > but I don't really have much experience with threading in C/C++,<br>
> > so I haven't really debugged it to much.<br>
><br>
> It sounds like we need more unit tests. The ones we currently have are pretty <br>
> sparse (as seen by the double-insertion bug I found recently). Please figure <br>
> out why the first patch fails and write a unit test to trigger that.</p>
<p dir="ltr">I'll look into this and see what I can figure out.</p>
<p dir="ltr">><br>
> > I did not have any good testcases, so a shader-db run had to do.<br>
> > If anyone knows of any cpu-bound applications that specifically<br>
> > hit the hash-table I would be happy to hear about it.<br>
><br>
> <br>
> If you've got an Intel card please also test and give shader-db results with<br>
> INTEL_USE_NIR=1. The NIR code thrashes the hash table/set code more than<br>
> anything else in the driver. NIR also uses different access patterns than the<br>
> rest of the driver since most of the hash table lookups outside of NIR are for<br>
> glUint -> object lookups in entry points.<br>
><br>
> If this is shown to have good improvements, it might be worth looking at the<br>
> C++-based hash table we use in the GLSL compiler which doesn't currently<br>
> use the implementation in util.<br>
></p>
<p dir="ltr">These results are from runs with INTEL_USE_NIR=1.<br>
(I should have probably mentioned that.)<br>
For prog_hash_table I think there's some easy gains to be had.<br>
Just changing from separate chaining to a probing implementation<br>
will improve cache-locality greatly, which should be nice.<br>
Right now, with these patches up to the murmur hash patch,<br>
get_node amounts to 5.3 % of hits with oprofile.<br>
I'll look into this once I find the cause of the assertion failures.</p>
<p dir="ltr">> > These are results from running shader-db with oprofile after each commit.<br>
> > Function master comm1 comm2 comm3 comm4 comm5 comm6 comm7 comm8<br>
> > mesa_hash_data 2.81 ? 3.09 3.05 2.99 3.23 3.25 3.11 3.12<br>
> > hash_table_insert 4.28 ? 2.57 2.58 2.58 2.71 2.52 2.52 2.50<br>
> > hash_table_search 4.59 ? 2.67 2.72 2.70 2.87 2.64 2.64 2.59<br>
> > set_add 2.69 ? 2.81 2.80 2.70 1.69 1.65 1.74 1.72<br>
> > set_search 2.75 ? 2.84 2.86 2.74 2.11 2.07 2.08 2.09<br>
> > runtime 175 ? 170 162 165 157 160 160 164<br>
> ><br>
> > With regards to cheaper hash-functions:<br>
> > It seems this is a case of "much pain for no gain".<br>
> > Looking at this site[1] it looks like most popular<br>
> > hashes have about the same instruction cost<br>
> > for smaller amounts of data(10 bytes).<br>
> > I tried copying the C-code example from the<br>
> > Wikipedia-article on Murmur-hash 3, and using this.<br>
> > It did show a speed-up, but not close to what I would expect.<br>
> ><br>
> > It should also be noted that the crc32c instruction has a latency<br>
> > of three cycles on older processors[2].<br>
> > That article also has a different implementation of crc32c<br>
> > with fallback that I have not tried.<br>
> ><br>
> > [1]<a href="https://github.com/rurban/smhasher"> https://github.com/rurban/smhasher</a><br>
> > [2]<a href="http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software"> http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software</a><br>
> ><br>
> > Thomas Helland (8):<br>
> > util: Change hash_table to use quadratic probing.<br>
> > util: Change table size to be power of two<br>
> > glsl: Change ir_const_expr to use util/hash_table<br>
> > util: Change util/set to use quadratic probing<br>
> > util: Change set to use power-of-two sized table<br>
> > util: Change size of table to have 23% free<br>
> > util: Add murmur3 hashing function<br>
> > util: Add header for hardware crc32c<br>
> ><br>
> > src/glsl/ir_constant_expression.cpp | 20 +++---<br>
> > src/mesa/x86/common_x86.c | 4 ++<br>
> > src/mesa/x86/common_x86_features.h | 8 +++<br>
> > src/util/crc32c_hw.h | 67 ++++++++++++++++++++<br>
> > src/util/hash_table.c | 120 +++++++++++++++++++-----------------<br>
> > src/util/hash_table.h | 55 ++++++++++++++++-<br>
> > src/util/set.c | 97 ++++++++++++++---------------<br>
> > src/util/set.h | 1 -<br>
> > 8 files changed, 258 insertions(+), 114 deletions(-)<br>
> > create mode 100644 src/util/crc32c_hw.h<br>
> ><br>
> > --<br>
> > 2.2.1<br>
> ><br>
> > _______________________________________________<br>
> > mesa-dev mailing list<br>
> ><a href="mailto:mesa-dev@lists.freedesktop.org"> mesa-dev@lists.freedesktop.org</a><br>
> ><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev"> http://</a><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">lists.freedesktop.org</a><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">/mailman/</a><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">listinfo</a><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">/</a><a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev">mesa-de</a>v</p>