[systemd-devel] Hashmap iteration is too costly

vcaputo at pengaru.com vcaputo at pengaru.com
Fri Jul 14 10:18:43 UTC 2017


The current hashmap iteration incurs at least one function call per
iteration, and in my observations using gcc 6.3 w/-g -O2, it's two:

 internal_hashmap_iterate()
 hashmap_iterate_entry()

for every element in the hashmap.

In profiles of `journalctl -b --no-pager` having multiple (50) log files,
the hashmap iteration shows up as ~10% of the CPU in callgrind output.

Since it appears effort has been made to keep the hashmap structure opaque,
making these iterators zero-cost seems non-trivial.

As an experiment, I added two new hashmap functions:

 hashmap_get_array()
 ordered_hashmap_get_array()

These return an allocated array snapshot of the hashmap's contents, in the
same order they would be iterated in.  Their implementation has direct
access to the hashmap internals, and uses inlined versions of the
type-specific iterator functions to avoid the per-element function call
overhead of the usual iterators.

You can find the code here:

https://github.com/systemd/systemd/compare/303608...vcaputo:hashmap_get_array

The last commit includes some numbers, which I'll include them here for
convenience:

Before:

  # time ./journalctl -b -1 --no-pager > /dev/null
   real    0m16.452s
   user    0m16.371s
   sys     0m0.066s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real    0m16.375s
   user    0m16.310s
   sys     0m0.057s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real    0m16.390s
   user    0m16.329s
   sys     0m0.047s

 After:

  # time ./journalctl -b -1 --no-pager > /dev/null
   real    0m15.827s
   user    0m15.769s
   sys     0m0.052s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real    0m15.650s
   user    0m15.592s
   sys     0m0.050s

  # time ./journalctl -b -1 --no-pager > /dev/null
   real    0m15.646s
   user    0m15.574s
   sys     0m0.056s


Note this is adding the cost of a malloc and free, and copying the
elements.  It's still faster than the existing hashmap iterator.

I'm not sold on this being a good solution, it's more meant to shed light
on what I feel is a bit of a wart and hopefully start some discussion.

Regards,
Vito Caputo


More information about the systemd-devel mailing list