[Poppler-bugs] [Bug 7808] New: Speedup PDF loading by ~10% by simple GooString optimization

bugzilla-daemon at annarchy.freedesktop.org
Mon Aug 7 22:56:51 PDT 2006


Please do not reply to this email: if you want to comment on the bug, go to
the URL shown below and enter your comments there.

https://bugs.freedesktop.org/show_bug.cgi?id=7808

           Summary: Speedup PDF loading by ~10% by simple GooString
                    optimization
           Product: poppler
           Version: unspecified
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general
        AssignedTo: poppler-bugs at lists.freedesktop.org
        ReportedBy: kkowalczyk at gmail.com


poppler allocates and frees a lot of objects, especially GooString objects. In
my profiling, malloc() and free() are at the top of the list of functions that
take the most time.

I instrumented the code to record sizes of GooStrings at the time of
destruction. It turns out that ~90% of them are 16 bytes or less.

Currently, allocating a GooString requires 2 malloc()s:
* one from new to allocate the object itself
* another one to allocate the string data that the s pointer refers to

A very simple optimization is to keep a small buffer within GooString itself and
only allocate new space if the string gets bigger. This halves the number of
malloc() calls for strings that fit entirely in the buffer. In the case of
poppler, a buffer of size 16 would eliminate 45% of the malloc()/free() calls
that are caused by GooString().
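
The idea can be sketched as follows. This is a minimal illustration, not the
attached patch: the class name SmallBufString, the constant kInlineSize and the
method usesInlineBuffer() are hypothetical, and I assume here that the 16-byte
buffer has to hold the trailing NUL as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumed inline capacity in bytes, including the trailing NUL.
constexpr std::size_t kInlineSize = 16;

// Hypothetical string class with an inline buffer (not the real GooString).
class SmallBufString {
public:
    explicit SmallBufString(const char *text) : length_(std::strlen(text)) {
        if (length_ + 1 <= kInlineSize) {
            // Short string: reuse the buffer inside the object, no second malloc().
            s_ = inline_;
        } else {
            // Long string: fall back to a separate heap allocation.
            s_ = static_cast<char *>(std::malloc(length_ + 1));
            assert(s_ != nullptr);
        }
        std::memcpy(s_, text, length_ + 1);
    }

    ~SmallBufString() {
        // Only free what was actually allocated on the heap.
        if (s_ != inline_) {
            std::free(s_);
        }
    }

    // Copying is left out to keep the sketch short.
    SmallBufString(const SmallBufString &) = delete;
    SmallBufString &operator=(const SmallBufString &) = delete;

    const char *c_str() const { return s_; }
    std::size_t length() const { return length_; }
    bool usesInlineBuffer() const { return s_ == inline_; }

private:
    std::size_t length_;
    char *s_;                   // points at inline_ or at heap memory
    char inline_[kInlineSize];  // storage for short strings
};
```

With this layout, SmallBufString("short") costs a single allocation when
created with new, while a string longer than 15 characters still costs two.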

Another good part is that it doesn't even use more memory. malloc() rounds
allocation sizes up anyway (to a multiple of 16 bytes on Ubuntu) and the
allocator needs book-keeping information (BOOK_KEEP_SIZE), which is
implementation dependent but usually not less than 8 bytes. To see the
rounding, try:

  printf("allocation rounding: %d\n", -(int)((char*)malloc(1) - (char*)malloc(1)));

Assuming those parameters, memory used by the current implementation of
GooString is:
* 2*BOOK_KEEP_SIZE + round_16(sizeof(GooString)) + round_16($str_size)
  = 2*8 + round_16(8) + round_16($str_size)
  = 32 + round_16($str_size)
In the implementation with a static buffer, assuming buffer size 16:
* for $str_size <= 16: BOOK_KEEP_SIZE + round_16(sizeof(GooString))
  = 8 + round_16(8 + 16) = 8 + round_16(24) = 8 + 32 = 40,
  so it actually uses less real memory (40 bytes instead of 48 for a
  non-empty string)
For $str_size > 16 a bit more is used, since the string data still needs its
own allocation and the object itself is bigger.
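
The arithmetic above can be checked with a small sketch. The constants are the
assumptions stated in this report (8 bytes of book-keeping, rounding to 16
bytes, an 8-byte GooString without the buffer), not measured values, and the
function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumptions taken from the report, not measured values.
constexpr std::size_t kBookKeepSize = 8;       // allocator book-keeping per block
constexpr std::size_t kBaseObjectSize = 8;     // sizeof(GooString) without a buffer
constexpr std::size_t kInlineBufferSize = 16;  // size of the proposed static buffer

// Round n up to the next multiple of 16.
constexpr std::size_t round16(std::size_t n) { return (n + 15) / 16 * 16; }

// Current layout: one allocation for the object, one for the string data.
constexpr std::size_t currentCost(std::size_t strSize) {
    return 2 * kBookKeepSize + round16(kBaseObjectSize) + round16(strSize);
}

// Static-buffer layout: the second allocation happens only when the
// string does not fit into the buffer.
constexpr std::size_t inlineCost(std::size_t strSize) {
    return kBookKeepSize + round16(kBaseObjectSize + kInlineBufferSize) +
           (strSize <= kInlineBufferSize ? 0 : kBookKeepSize + round16(strSize));
}
```

Under these assumptions currentCost(10) is 48 and inlineCost(10) is 40, while
for a 40-byte string the costs are 80 and 96, i.e. 16 bytes more with the
buffer.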

How to choose the size of the static space: currently I use 16, but it might be
better to use 24, so that sizeof(GooString) is 32, which it will probably
occupy anyway due to rounding to 16 bytes.

Does it give any real speedup? My test consisted of just loading (no rendering)
the PDFReference16.pdf file (the PDF 1.6 spec, available from the Adobe
website), which is about 8.72 MB in size.

I used a release build and recorded user time averaged over 4 runs. I got a 10%
speedup (from 1453.095 milliseconds to 1303.7425 milliseconds).

On top of that, the implementation is dead simple.

Attached patch has a #define FAST_GOO_STRING in GooString.h so that interested
parties can easily compile both versions and compare the speeds.

It also has #define DO_HIST that adds code to collect string sizes at the time
of deletion. This shows that the majority of strings in poppler are <=16 bytes.
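
For illustration, collecting such a histogram could look roughly like the
sketch below. This is not the DO_HIST code from the patch: SizeHistogram,
kMaxTracked, record() and fractionAtMost() are hypothetical names, and the
assumption is that record() would be called with the string length from the
destructor.

```cpp
#include <cassert>
#include <cstddef>

// Lengths above this cap all go into one overflow bucket (arbitrary choice).
constexpr std::size_t kMaxTracked = 64;

// Hypothetical histogram of string lengths seen at destruction time.
struct SizeHistogram {
    unsigned long counts[kMaxTracked + 1] = {};  // last bucket collects longer strings
    unsigned long total = 0;

    // Record one destroyed string of the given length.
    void record(std::size_t length) {
        std::size_t bucket = length;
        if (bucket > kMaxTracked) {
            bucket = kMaxTracked;
        }
        ++counts[bucket];
        ++total;
    }

    // Fraction of recorded strings with length <= limit (limit < kMaxTracked).
    double fractionAtMost(std::size_t limit) const {
        assert(limit < kMaxTracked);
        if (total == 0) {
            return 0.0;
        }
        unsigned long n = 0;
        for (std::size_t i = 0; i <= limit; ++i) {
            n += counts[i];
        }
        return static_cast<double>(n) / static_cast<double>(total);
    }
};
```

After a run, fractionAtMost(16) would give the share of strings that fit
into a 16-byte buffer.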

However, I do not recommend keeping those #defines in the final version. It's
trivial to remove them.

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

