[Mesa-dev] [PATCH 0/8] Hash table and hash set reworking

Thu Mar 5 20:46:34 PST 2015

On Thu, Mar 5, 2015 at 5:36 PM, Thomas Helland <thomashelland90 at gmail.com>
wrote:

> 2015-02-28 17:05 GMT+01:00 Jason Ekstrand <jason at jlekstrand.net>:
> >
> > On Feb 28, 2015 4:55 AM, "Thomas Helland" <thomashelland90 at gmail.com>
> wrote:
> >>
> >> So here comes my hash-table series mentioned earlier.
> >>
> >> So, first of all, there's some issues.
> >> I've been strugling with hitting assertion failures.
> >> The table returns null at times when it apparently should not.
> >> It occurs after patch 1, and is fixed by patch 2.
> >> It also occured when I was tweaking the amount of free space in the
> >> table.
> >>
> >> With this series it should be tweaked and working (apart from patch 1),
> >> but it would be wonderfull if people gave it a run.
> >> I'm suspecting we might have some threading issues,
> >> but I don't really have much experience with threading in C/C++,
> >> so I haven't really debugged it to much.
> >
> > It sounds like we need more unit tests.  The ones we currently have are
> > pretty sparse (as seen by the double-insertion bug I found recently).
> > Please figure out why the first patch fails and write a unit test to
> trigger
> > that.
>
> What happens is that we fail to find an empty slot in the table.
> (We roll around back to the starting address, and give up)
> Should we add an assertion to flag that we fail to insert?
>

Yeah, that makes sense.  Yes, we should have an assert in the case that we
fail to insert and we'll have to be a bit more careful about our constants.

>
> The Wikipedia article on quadratic probing sheds some light
> on what is actually going on here, and somehow I managed
> to not notice it when I first read it. Quick summary:
>
> If we assume a form of:
> h(k,i) = k + c1 * i + c2 * i*i mod m
> (h is the hash, k is the starting hash, m is size)
>
> Then
> "For prime m > 2, most choices of c1 and c2 will make h(k,i)
> distinct for i in [0, (m-1)/2]. Such choices include c1 = c2 = 1/2,
> c1 = c2 = 1, and c1 = 0, c2 = 1. Because there are only about
> m/2 distinct probes for a given element, it is difficult to guarantee
> that insertions will succeed when the load factor is > 1/2."
>
> However, if the table is power-of-two sized:
> "... a good choice for the constants are c1 = c2 = 1/2,
> as the values of h(k,i) for i in [0,m-1] are all distinct."
>
> So this is not an issue if I fix up the hashing somewhat to use
> c1 = c2 = 1/2 as suggested, and we use 2^n table-size.
> In fact, it should guarantee that we don't fail to insert!
>
> I'll post a V2 of the series that does this, and squashes
> it together with the power-of-two change, to avoid regressions.
>
> Alternatively, we can use a different type of probing,
> like linear probing or robin-hood hashing.
> These have worse clustering behavior however.
>
> I'm a bit unsure how to write a test for this that works.
> I tried modifying the insert_many test to use the hash_function.
> My hope was that this should cause clustering and failure.
> Sadly this didn't trigger the fault even with 10'000 keys.
>

Yeah, that's tricky.  When I hit the double-insert bug a few weeks back it
took me 30 minutes to write the patch to fix it and 2 or 3 hours to figure
out how to hit it in a unit test.  One thing that would be good (but maybe
not sufficient) would be to have a test throws piles of adds and deletes at
the table and tracks them all in an array or something to verify the
results.  The data structure used to track things doesn't have to be fast,
it just has to be correct.

Also, it might be good to have a test that does a bunch of colliding
inserts but with a series of different hash values.  That way we can test
more than just "insert 1000 things with hash value 3" which tests basic
collision handling but isn't particularly complete.
--Jason

> Is there some way we can save the shader-ir, and then replay it?
> Then we can replay one of the shaders that trigger the fault as a test.
> Surely, the best "unit test" is to run a full shader-db run.
> This really trashes the table quite hard.
>
> Any suggestions would be appreciated.
>
> >
> >> I did not have any good testcases, so a shader-db run had to do.
> >> If anyone knows of any cpu-bound applications that specifically
> >> hit the hash-table I would be happy to hear about it.
> >
> > If you've got an Intel card please also test and give shader-db results
> with
> > INTEL_USE_NIR=1.  The NIR code thrashes the hash table/set code more than
> > anything else in the driver.  NIR also uses different access patterns
> than
> > the rest of the driver since most of the hash table lookups outside of
> NIR
> > are for glUint  -> object lookups in entry points.
> >
> > If this is shown to have good improvements, it might be worth looking at
> the
> > C++-based hash table we use in the GLSL compiler which doesn't currently
> use
> > the implementation in util.
> >
> >> These are results from running shader-db with oprofile after each
> commit.
> >> Function           master comm1  comm2  comm3  comm4  comm5  comm6
> comm7
> >> comm8
> >> mesa_hash_data     2.81   ?      3.09   3.05   2.99   3.23   3.25   3.11
> >> 3.12
> >> hash_table_insert  4.28   ?      2.57   2.58   2.58   2.71   2.52   2.52
> >> 2.50
> >> hash_table_search  4.59   ?      2.67   2.72   2.70   2.87   2.64   2.64
> >> 2.59
> >> set_add            2.69   ?      2.81   2.80   2.70   1.69   1.65   1.74
> >> 1.72
> >> set_search         2.75   ?      2.84   2.86   2.74   2.11   2.07   2.08
> >> 2.09
> >> runtime            175    ?      170    162    165    157    160    160
> >> 164
> >>
> >> With regards to cheaper hash-functions:
> >> It seems this is a case of "much pain for no gain".
> >> Looking at this site[1] it looks like most popular
> >> hashes have about the same instruction cost
> >> for smaller amounts of data(10 bytes).
> >> I tried copying the C-code example from the
> >> Wikipedia-article on Murmur-hash 3, and using this.
> >> It did show a speed-up, but not close to what I would expect.
> >>
> >> It should also be noted that the crc32c instruction has a latency
> >> of three cycles on older processors[2].
> >> That article also has a different implementation of crc32c
> >> with fallback that I have not tried.
> >>
> >> [1] https://github.com/rurban/smhasher
> >> [2]
> >>
> http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software
> >>
> >> Thomas Helland (8):
> >>   util: Change hash_table to use quadratic probing.
> >>   util: Change table size to be power of two
> >>   glsl: Change ir_const_expr to use util/hash_table
> >>   util: Change util/set to use quadratic probing
> >>   util: Change set to use power-of-two sized table
> >>   util: Change size of table to have 23% free
> >>   util: Add murmur3 hashing function
> >>   util: Add header for hardware crc32c
> >>
> >>  src/glsl/ir_constant_expression.cpp |  20 +++---
> >>  src/mesa/x86/common_x86.c           |   4 ++
> >>  src/mesa/x86/common_x86_features.h  |   8 +++
> >>  src/util/crc32c_hw.h                |  67 ++++++++++++++++++++
> >>  src/util/hash_table.c               | 120
> >> +++++++++++++++++++-----------------
> >>  src/util/hash_table.h               |  55 ++++++++++++++++-
> >>  src/util/set.c                      |  97 ++++++++++++++---------------
> >>  src/util/set.h                      |   1 -
> >>  8 files changed, 258 insertions(+), 114 deletions(-)
> >>  create mode 100644 src/util/crc32c_hw.h
> >>
> >> --
> >> 2.2.1
> >>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev at lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150305/bb92a698/attachment-0001.html>