[poppler] More speed improvements: ~19% improvement for loading PDF 1.6 reference document.

Krzysztof Kowalczyk kkowalczyk at gmail.com
Sun Aug 13 20:13:08 PDT 2006


Hello,

I've updated bug https://bugs.freedesktop.org/show_bug.cgi?id=7808,
which held my earlier patch that sped up loading of my test document
by ~10%.

I've created two new patches that, when combined, provide a ~19% speed
improvement when loading the PDFReference16.pdf document (the PDF
reference from the Adobe website).

They clean up the previous patch and add further improvements.

Brief overview of changes:
* make GooString use an internal buffer for short strings; refactor
GooString to remove code duplication
* gfree() doesn't have to check for a NULL pointer (the C library does
it anyway; the ISO C standard requires free(NULL) to be a no-op).
gfree() is called so often that removing that check improves the speed
by 1%
* make UGooString use an internal buffer as well; refactor the code to
make it more like GooString
* Parser::getObj(): make 'key' a UGooString to avoid creating
temporary objects, since dictAdd() takes a UGooString as its argument
* Lexer::lookChar() and Lexer::getChar() - getChar() is often called
right after lookChar() (for about 30% of all getChar()s). Currently it
has to re-do all the work that lookChar() did. A very simple
optimization is to cache the last value of lookChar() and return it in
getChar() if available.
* PageLabelInfo.cc: #include <config.h> since it's needed for
compilation on Windows
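
To illustrate the lookChar()/getChar() item above, here is a minimal
sketch of the caching idea. It is not the code from the patch:
MiniLexer, readSlow() and slowReads() are hypothetical names made up
for this example, and the "expensive" read is just an index into a
std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF
#include <string>
#include <utility>

// Hypothetical stand-in for a lexer; NOT poppler's Lexer class.
class MiniLexer {
public:
    explicit MiniLexer(std::string data) : data_(std::move(data)) {}

    // Peek at the next character and remember it for a later getChar().
    int lookChar() {
        if (!hasCached_) {
            cached_ = readSlow();
            hasCached_ = true;
        }
        return cached_;
    }

    // Consume the next character, reusing the peeked value if there is one.
    int getChar() {
        if (hasCached_) {
            hasCached_ = false;
            return cached_;
        }
        return readSlow();
    }

    // Number of times the slow path ran (only here to show the saving).
    int slowReads() const { return slowReads_; }

private:
    // Stands in for the real per-character work done by the lexer.
    int readSlow() {
        assert(pos_ <= data_.size());
        ++slowReads_;
        if (pos_ == data_.size()) return EOF;
        return static_cast<unsigned char>(data_[pos_++]);
    }

    std::string data_;
    std::size_t pos_ = 0;
    int cached_ = 0;
    bool hasCached_ = false;
    int slowReads_ = 0;
};
```

With this layout a lookChar() followed by getChar() runs the slow path
once instead of twice; a getChar() with nothing cached behaves as
before.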

Most of those changes reduce the number of malloc()/free() calls.
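
As a rough illustration of how an internal buffer avoids malloc() for
short strings, here is a minimal sketch. SmallString, kInlineSize (16
here, picked arbitrarily) and usesHeap() are hypothetical names for
this example only; the actual GooString/UGooString changes are in the
patches attached to the bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical illustration; NOT poppler's GooString.
class SmallString {
public:
    explicit SmallString(const char *s) {
        assert(s != nullptr);
        len_ = std::strlen(s);
        if (len_ < kInlineSize) {
            data_ = inline_;  // short string: no heap allocation at all
        } else {
            data_ = static_cast<char *>(std::malloc(len_ + 1));
            if (!data_) std::abort();
        }
        std::memcpy(data_, s, len_ + 1);  // copy including the '\0'
    }

    ~SmallString() {
        // Only heap-backed strings need a free().
        if (data_ != inline_) std::free(data_);
    }

    // Copying is left out to keep the sketch short.
    SmallString(const SmallString &) = delete;
    SmallString &operator=(const SmallString &) = delete;

    const char *c_str() const { return data_; }
    std::size_t length() const { return len_; }
    bool usesHeap() const { return data_ != inline_; }

private:
    static constexpr std::size_t kInlineSize = 16;  // assumed size
    char inline_[kInlineSize];
    char *data_;
    std::size_t len_;
};
```

Strings shorter than the inline buffer cost no malloc()/free() pair;
longer ones fall back to the heap exactly as before.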

There are still plenty of opportunities for improvement, since
malloc()/free() still take a significant portion of the loading time.
This is mostly due to a lot of copying between instances of Object,
which requires re-allocating memory for string/name/cmd objects. I
tried a simple optimization of making the Object::string member inline
rather than a pointer, which would reduce the number of allocations,
but the current code is very undisciplined about ownership of objects,
which wreaks havoc in ways I'm unable to debug at this time.

Other costly operations are Lexer::getChar() and Lexer::getObj().

-- kjk | http://blog.kowalczyk.info

