On 31 Mar 2015 02:19, "Eric Anholt" <<a href="mailto:eric@anholt.net">eric@anholt.net</a>> wrote: > > Thomas Helland <<a href="mailto:thomashelland90@gmail.com">thomashelland90@gmail.com</a>> writes: > > > This should give better cache locality, less memory consumption, > > less code, and should also be faster since we avoid a modulo operation. > > Also change table size to be power of two. > > This gives better performance as we can do bitmasking instead of > > modulo operations for fitting the hash in the address space. > > By using the algorithm hash = sh + i/2 + i*i/2 > > we are guaranteed that all retries from the quad probing > > are distinct, and so should be able to completely fill the table. > > This passes the test added to exercise a worst case collision scenario. > > Also, start at size = 16 instead of 4. > > This should reduce some allocation overhead > > when constantly using tables larger than 3 entries. > > The amount of space used before rehash is changed to 70%. > > This should decrease collisions slightly, leading to better performance. > > I've run a test to confirm that the (i + i * i) / 2 probing does > actually get us unique values, up to a maximum table size of (1 << 31) > entries. Cool! I just trusted the article blindly, nice to have it confirmed. > > The amount-filled-before-rehash should probably be a separate commit, > since it's increasing memory overhead, and each commit have its own > performance data justifying it (actually, it looks like that's missing > From this commit message). Similarly for start size = 16 instead of 4. > I wouldn't mind set/hash changes being in the same commits, though. I agree, sounds like a good plan. The performance data is indeed missing from the message. I rebased it in, but f'ed up and used the old commit id when sending out. Maybe I should include the 15'ish top performance hogs for each commit? Or stick them in the cover letter? Might be nice to get a full understanding of the effect things have. At least for upping the starting size this affects the whole top 10 list. So percentage of total will get skewed for everything else. I'll get a new version out by the weekend probably.