[poppler] [PATCH] ~10% speedup for loading/parsing a PDF file through simple GooString optimization

Reece Dunn msclrhd at hotmail.com
Mon Aug 14 07:37:18 PDT 2006

Leonard Rosenthol wrote:

> At 12:03 AM 8/14/2006, Krzysztof Kowalczyk wrote:
> >Looking at the profile data, it looks like a lot of allocations are
> >due to copying between instances of Object that are obtained from
> >Lexer::getObj(). The best solution would probably be to rework
> >Lexer::getObj() and its callers to not do copying.
>          Seems reasonable.
> >However, the code is very undisciplined about how it copies data
> >(sometimes it's the expensive, deep copy, sometimes it's a shallow
> >copy that just copies data like Object obj = *objOrig). Given that
> >it's hard to make a change without breaking stuff. My attempt at a
> >simple change like making Object::string an embedded value instead of
> >a pointer failed and I don't even understand why.
>          What about using a smart pointer?  Something as simple as 
> std::auto_ptr<> for a start - or possibly going all the way to 
> boost::shared_ptr<>.

These won't work as you need to call Object::release() (or whatever
the cleanup function is - I can't remember at the moment).

The problem is that Object is doing two jobs - it acts both as the
owner of the data (i.e. the deep copy) and as a reference to data
owned by another object (i.e. the shallow copy).

I would ideally like to see something like this:

struct Object
{
   Object(){ ... }
   ~Object(){ release(); }
};

so that there are no explicit calls to clean up the object references.
This would eliminate the explicit cleanup calls and remove the need
for many of the goto errN jumps.
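
A minimal sketch of the destructor-based cleanup described above. This
is not Poppler's actual Object: the class body, the liveObjects counter
and the parsePair() function are hypothetical names made up for
illustration, and it assumes a C++11 compiler (for "= delete").

```cpp
#include <cassert>
#include <string>

// Illustration only: counts Objects that currently hold data.
static int liveObjects = 0;

// Hypothetical stand-in for an xpdf-style Object that owns a string.
class Object {
public:
    explicit Object(const std::string &s) : data_(new std::string(s)) {
        ++liveObjects;
    }
    ~Object() { release(); }  // cleanup happens on every exit path

    // Copying is disabled here to keep ownership unambiguous.
    Object(const Object &) = delete;
    Object &operator=(const Object &) = delete;

    // Safe to call more than once; the second call is a no-op.
    void release() {
        if (data_) {
            delete data_;
            data_ = nullptr;
            --liveObjects;
        }
    }

private:
    std::string *data_;
};

// Returns false on the error path. No "goto err1" is needed: the
// destructor of 'first' runs on the early return as well.
bool parsePair(const std::string &a, const std::string &b) {
    Object first(a);
    if (b.empty())
        return false;
    Object second(b);
    return true;
}
```

Whichever way parsePair() returns, every Object it created has been
released by the time control is back in the caller.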

The first step would be to separate out Object (deep copy) from
ObjectRef (shallow copy).
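
One possible shape for that split, as a rough sketch only. Both classes
below are hypothetical and hold nothing but a string; the real Object
carries many more types. The point is just that copying an Object
duplicates the data, while copying an ObjectRef copies a pointer.

```cpp
#include <cassert>
#include <string>

// Owner: copying an Object duplicates the underlying data (deep copy).
class Object {
public:
    explicit Object(const std::string &s) : str_(s) {}
    const std::string &str() const { return str_; }
    void set(const std::string &s) { str_ = s; }

private:
    std::string str_;
};

// Non-owning reference: copying an ObjectRef copies only a pointer
// (shallow copy). The referenced Object must outlive every ObjectRef
// that points at it.
class ObjectRef {
public:
    explicit ObjectRef(Object &target) : target_(&target) {}
    const std::string &str() const { return target_->str(); }

private:
    Object *target_;
};
```

With this split the kind of copy is visible in the type, rather than
depending on which copying idiom a particular call site happened to use.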

I have made several attempts at cleaning up the XPDF codebase,
as it does not make use of the C++ standard library (GooString vs.
std::string, GooList vs. std::list, qsort vs. std::sort), uses goto for
resource cleanup instead of RAII (Resource Acquisition Is Initialization
- i.e. constructors acquire a resource, destructors clean that resource
up), etc. These attempts have met with varying degrees of success.
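
To illustrate the qsort vs. std::sort point, here is a small side-by-side
sketch. The function names sortWithQsort() and sortWithStdSort() are
made up for this example and do not come from the XPDF or Poppler code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort style: an untyped comparator that casts from void pointers.
static int cmpInt(const void *a, const void *b) {
    int x = *static_cast<const int *>(a);
    int y = *static_cast<const int *>(b);
    return (x > y) - (x < y);  // avoids the overflow risk of x - y
}

void sortWithQsort(std::vector<int> &v) {
    if (!v.empty())
        std::qsort(&v[0], v.size(), sizeof(int), cmpInt);
}

// std::sort style: type-safe, no casts, element size is implied.
void sortWithStdSort(std::vector<int> &v) {
    std::sort(v.begin(), v.end());
}
```

Both produce the same ordering; the std::sort version simply leaves
fewer places to get an element size or a cast wrong.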

What I have been doing recently is keeping a build of the Poppler
code and a build of the improved code, then running pdftotext from
both builds on a set of PDF documents and comparing the output. This
is good at identifying regressions.

Q: Is there any point in having GBool/gTrue/gFalse now, since only
very ancient C++ compilers lack support for bool/true/false?

- Reece