[poppler] [RFC] Replace GooHash by std::unordered_map

Adam Reichold adam.reichold at t-online.de
Tue Feb 20 07:58:24 UTC 2018

Hello again,

Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va 
>> escriure:
>>> Hello,
>>> the attached patch replaced Poppler's internal hash table implementation
>>> GooHash by std::unordered_map found in the C++ standard library since
>>> C++11. This continues Poppler's slow drift towards standard library
>>> containers and removes one of the home-grown data structures with the
>>> main goals of reducing code size and improving the long term
>>> maintainability of the code base.
>> Do you have any benchmarks on "rendering" speed and code size?
> Sorry, not at hand. I will try to prepare them during the week.

I did run Splash rendering benchmarks of this branch against master with
the result of rendering the circa 2400 documents of the TeXLive
documentation present on my machine being:

Cumulative run time:
        Result: 90.95 min ∓ 1.1 %
        Reference: 91.57 min ∓ 1.2 %
        Deviation: -0.0 %
Cumulative memory usage:
        Result: 37.2 MB ∓ 0.7 %
        Reference: 37.0 MB ∓ 0.7 %
        Deviation: +0.0 %

(Where result is this patch and the reference is master.) (The
measurement was taken using the perftest script which I proposed here
some time ago for which I'll attach the patch again for reproduceability.)

I'll also attach the detailed comparison, but the gist seems to be that
if there are significant changes, the run time is reduced but the memory
usage is increased in the majority of cases. But this does not seem to
show up in the cumulative results.

Best regards, Adam.

P.S.: One could try to improve the memory usage by tuning the load
factor or calling shrink_to_fit where appropriate. Would you like me to
try to do this?

P.P.S.: One obvious area for improvement would be better
interoperability between GooString and std::string, i.e. not converting
them as C strings so that the length information does not need to be
recomputed. I will try to prepare this as a separate patch on top of
this one or should I include this here?

Best regards, Adam.

> Concerning code size, a release build of libpoppler.so goes from
>    text    data     bss     dec     hex filename
> 2464034  288852     360 2753246  2a02de libpoppler.so.73.0.0
> for the current master to
>    text    data     bss     dec     hex filename
> 2482129  288756     360 2771245  2a492d libpoppler.so.73.0.0
> with the patch applied, i.e. a 0.65% increase in binary size.
> Please note that in my original message, I was referring only to source
> code size, i.e.
> git diff --stat master...remove-goo-hash
>  18 files changed, 168 insertions(+), 803 deletions(-)
>> Cheers,
>>   Albert
> Best regards, Adam.
>>> Best regards, Adam.
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/poppler
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 525 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180220/88e78e00/attachment.sig>

More information about the poppler mailing list