[poppler] [RFC] Replace GooHash by std::unordered_map

Albert Astals Cid aacid at kde.org
Tue Feb 20 23:31:16 UTC 2018


El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va 
escriure:
> Hello again,
> 
> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va
> >> 
> >> escriure:
> >>> Hello,
> >>> 
> >>> the attached patch replaced Poppler's internal hash table implementation
> >>> GooHash by std::unordered_map found in the C++ standard library since
> >>> C++11. This continues Poppler's slow drift towards standard library
> >>> containers and removes one of the home-grown data structures with the
> >>> main goals of reducing code size and improving the long term
> >>> maintainability of the code base.
> >> 
> >> Do you have any benchmarks on "rendering" speed and code size?
> > 
> > Sorry, not at hand. I will try to prepare them during the week.
> 
> I did run Splash rendering benchmarks of this branch against master with
> the result of rendering the circa 2400 documents of the TeXLive
> documentation present on my machine being:

I'm wondering if those 2400 documents are diverse enough, which they may not 
be given they are all coming from "the same place".

> 
> Cumulative run time:
>         Result: 90.95 min ∓ 1.1 %
>         Reference: 91.57 min ∓ 1.2 %
>         Deviation: -0.0 %
> Cumulative memory usage:
>         Result: 37.2 MB ∓ 0.7 %
>         Reference: 37.0 MB ∓ 0.7 %
>         Deviation: +0.0 %
> 
> (Where result is this patch and the reference is master.) (The
> measurement was taken using the perftest script which I proposed here
> some time ago for which I'll attach the patch again for reproduceability.)

Did you really attach this before? i really don't remember it :D

Anyhow it seems the change is mostly a zero-sum runtime wise. 

Which bring us to the code side. First i know it sucks but your spacing is 
broken, please check the whole patch

-	nameToGID->removeInt(macGlyphNames[j]);
-	nameToGID->add(new GooString(macGlyphNames[j]), i);
+          nameToGID[macGlyphNames[j]] = i;

i could fix it, but it's better (for me) if you do :D

Now onto the code, 

  const auto emplaceRangeMap = [&](const char* encodingName, GBool unicodeOut, 
UnicodeMapRange* ranges, int len) {
    residentUnicodeMaps.emplace(
      std::piecewise_construct,
      std::forward_as_tuple(encodingName),
      std::forward_as_tuple(encodingName, unicodeOut, ranges, len)
    );
  };

It's the first time in my life i see std::piecewise_construct and 
std::forward_as_tuple, yes that probably makes me a bad C++ developer, but 
given there's lots of us around, it doesn't make me happy now i don't 
understand what the code does.

Then there's the benefit/risk ratio. The code using GooHash is mostly 
rocksolid and we haven't probably touched any line in it for a long time and 
we have probably neither written new code that uses GooHash.

In that regard we risk breaking working code.

The benefit is not more speed nor less memory usage as your measurements show.

Mostly the benefit is "removing code from maintainership", which i agree is a 
good thing, but as mentioned before it's some code "we mostly ignore", so it's 
not that much of a good thing.

So i'm hesitant of what to do. It really sounds like a nice thing to not have 
custom structures, but on the other hand it's not like they get much in the 
way of coding.

I'd really appreciate other people's comments on this.

Cheers,
  Albert

> 
> I'll also attach the detailed comparison, but the gist seems to be that
> if there are significant changes, the run time is reduced but the memory
> usage is increased in the majority of cases. But this does not seem to
> show up in the cumulative results.
> 
> Best regards, Adam.
> 
> P.S.: One could try to improve the memory usage by tuning the load
> factor or calling shrink_to_fit where appropriate. Would you like me to
> try to do this?
> 
> P.P.S.: One obvious area for improvement would be better
> interoperability between GooString and std::string, i.e. not converting
> them as C strings so that the length information does not need to be
> recomputed. I will try to prepare this as a separate patch on top of
> this one or should I include this here?
> 
> Best regards, Adam.
> 
> > Concerning code size, a release build of libpoppler.so goes from
> > 
> >    text    data     bss     dec     hex filename
> > 
> > 2464034  288852     360 2753246  2a02de libpoppler.so.73.0.0
> > 
> > for the current master to
> > 
> >    text    data     bss     dec     hex filename
> > 
> > 2482129  288756     360 2771245  2a492d libpoppler.so.73.0.0
> > 
> > with the patch applied, i.e. a 0.65% increase in binary size.
> > 
> > 
> > Please note that in my original message, I was referring only to source
> > code size, i.e.
> > 
> > git diff --stat master...remove-goo-hash
> > 
> >  18 files changed, 168 insertions(+), 803 deletions(-)
> >  
> >> Cheers,
> >> 
> >>   Albert
> > 
> > Best regards, Adam.
> > 
> >>> Best regards, Adam.
> >> 
> >> _______________________________________________
> >> poppler mailing list
> >> poppler at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/poppler
> > 
> > _______________________________________________
> > poppler mailing list
> > poppler at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/poppler






More information about the poppler mailing list