[poppler] Speed improvements - chapter eleven

Wed Sep 6 04:02:32 PDT 2006

At 03:48 AM 9/6/2006, Krzysztof Kowalczyk wrote:
>I attempt to fix this by adding a way to get direct access to Stream's
>underlying buffer. That way a client (e.g. a Lexer) can request a
>buffer and getChar() logic becomes very fast "if buffer not empty, get
>char from buffer, otherwise ask for another buffer".
>
>Frankly, I was disappointed that it's only ~5%. I was expecting much
>more. It turns out that the culprit is current implementation of flate
>stream, which is frequently used to compress streams inside PDFs. It
>decompresses data in very small chunks (e.g. 8 bytes on average per
>getBuf() call in my test) so we don't save nearly as much as if we
>were getting, say, 256 bytes at a time. I'm working on improving that
>as well, but this change lays the necessary foundation.

         Given these two things, why not consider reading an ENTIRE
PDF stream into memory and decompressing it up front, thus turning what
is now a FlateStream->FileStream path with getChar() logic into a single
MemStream with getBuf() logic?  Yes, it will mean holding the entire
decompressed stream in memory, but assuming a "PC" and not an embedded
device, it's pretty safe to assume the memory is available.  You could
make it a document load option, and you could dispose of the memory when
the stream is closed.
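
To make the trade-off concrete, here is a rough sketch of the idea, not
actual poppler code.  ChunkSource, BufReader and countRefills are
hypothetical names invented for this illustration; only the getBuf() /
getChar() naming follows the discussion above.  A chunk size of 8 mimics
the small reads reported for the flate stream, while a chunk size equal
to the data length mimics a stream that has been fully decompressed into
memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-in for a stream that hands out its data in chunks.
// It is NOT poppler's Stream class; it only models the getBuf() idea.
class ChunkSource {
public:
    ChunkSource(std::string data, std::size_t chunkSize)
        : data_(std::move(data)),
          chunkSize_(std::max<std::size_t>(1, chunkSize)) {}

    // Returns a pointer to the next chunk and stores its length in *len.
    // Returns nullptr (and *len == 0) once all data has been handed out.
    const char *getBuf(std::size_t *len) {
        if (pos_ >= data_.size()) {
            *len = 0;
            return nullptr;
        }
        const std::size_t n = std::min(chunkSize_, data_.size() - pos_);
        const char *p = data_.data() + pos_;
        pos_ += n;
        ++calls_;  // count only calls that actually delivered data
        *len = n;
        return p;
    }

    std::size_t calls() const { return calls_; }

private:
    std::string data_;
    std::size_t chunkSize_;
    std::size_t pos_ = 0;
    std::size_t calls_ = 0;
};

// Lexer-style client: "if buffer not empty, get char from buffer,
// otherwise ask for another buffer".
class BufReader {
public:
    explicit BufReader(ChunkSource &src) : src_(src) {}

    int getChar() {
        if (cur_ == end_) {  // buffer exhausted, ask for another one
            std::size_t len = 0;
            const char *p = src_.getBuf(&len);
            if (p == nullptr) {
                return EOF;
            }
            cur_ = p;
            end_ = p + len;
        }
        return static_cast<unsigned char>(*cur_++);
    }

private:
    ChunkSource &src_;
    const char *cur_ = nullptr;
    const char *end_ = nullptr;
};

// Reads totalBytes through a BufReader and returns how many getBuf()
// calls delivered data.  Returns 0 if the byte count does not match.
std::size_t countRefills(std::size_t totalBytes, std::size_t chunkSize) {
    ChunkSource src(std::string(totalBytes, 'x'), chunkSize);
    BufReader reader(src);
    std::size_t n = 0;
    while (reader.getChar() != EOF) {
        ++n;
    }
    return n == totalBytes ? src.calls() : 0;
}
```

For 4096 bytes of data, countRefills(4096, 8) needs 512 getBuf() calls,
countRefills(4096, 256) needs 16, and countRefills(4096, 4096), the
"whole stream in memory" case, needs just 1.  The sketch only counts
calls; it says nothing about the cost of the decompression itself.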


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr at pdfsages.com>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                              215-938-0880 (fax)