[poppler] Speed improvements - chapter eleven
Leonard Rosenthol
leonardr at pdfsages.com
Wed Sep 6 04:02:32 PDT 2006
At 03:48 AM 9/6/2006, Krzysztof Kowalczyk wrote:
>I attempt to fix this by adding a way to get direct access to a Stream's
>underlying buffer. That way a client (e.g. a Lexer) can request a
>buffer, and the getChar() logic becomes very fast: "if buffer not empty, get
>char from buffer, otherwise ask for another buffer".
>
>Frankly, I was disappointed that it's only ~5%. I was expecting much
>more. It turns out that the culprit is the current implementation of the
>flate stream, which is frequently used to compress streams inside PDFs. It
>decompresses data in very small chunks (e.g. 8 bytes on average per
>getBuf() call in my test), so we don't save nearly as much as if we
>were getting, say, 256 bytes at a time. I'm working on improving that
>as well, but this change lays the necessary foundation.
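The buffered getChar() fast path described in the quote, and why the chunk size matters for it, can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions: `ChunkSource`, `BufferedReader` and `countRefills` are hypothetical names, not poppler's Stream or Lexer classes, and the source simply hands out fixed-size chunks to mimic a decoder that produces only a few bytes per getBuf() call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical data source (not poppler's Stream API). Each getBuf() call hands
// out at most `chunk` bytes, mimicking a decoder that produces small pieces.
class ChunkSource {
public:
    ChunkSource(std::string data, std::size_t chunk)
        : data_(std::move(data)), chunk_(chunk) {}

    // Points *out at the next piece and returns its length; 0 means EOF.
    std::size_t getBuf(const char** out) {
        if (pos_ >= data_.size() || chunk_ == 0) return 0;
        std::size_t n = std::min(chunk_, data_.size() - pos_);
        *out = data_.data() + pos_;
        pos_ += n;
        ++calls_;  // count only calls that delivered data
        return n;
    }

    std::size_t calls() const { return calls_; }

private:
    std::string data_;
    std::size_t chunk_;
    std::size_t pos_ = 0;
    std::size_t calls_ = 0;
};

// Hypothetical client (e.g. a lexer): getChar() is a pointer bump while the
// current buffer has data, and asks for another buffer only when it runs dry.
class BufferedReader {
public:
    explicit BufferedReader(ChunkSource& src) : src_(src) {}

    int getChar() {
        if (cur_ == end_) {                  // buffer empty: request another one
            const char* p = nullptr;
            std::size_t n = src_.getBuf(&p);
            if (n == 0) return -1;           // EOF
            cur_ = p;
            end_ = p + n;
        }
        return static_cast<unsigned char>(*cur_++);  // fast path
    }

private:
    ChunkSource& src_;
    const char* cur_ = nullptr;
    const char* end_ = nullptr;
};

// Reads every byte through getChar() and returns how many getBuf() refills it took.
std::size_t countRefills(const std::string& data, std::size_t chunk) {
    ChunkSource src(data, chunk);
    BufferedReader reader(src);
    std::size_t chars = 0;
    while (reader.getChar() != -1) ++chars;
    assert(chars == data.size());  // every byte was delivered exactly once
    return src.calls();
}
```

Reading 1024 bytes this way takes 128 refills with 8-byte chunks but only 4 with 256-byte chunks, so when the decoder returns tiny pieces the slow "ask for another buffer" branch runs far more often and the fast path saves much less.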
Given these two things, why not consider reading an ENTIRE
PDF stream into memory and decompressing it in one go, thus turning what is
now a FlateStream->FileStream path with getChar() logic into a single
MemStream with getBuf() logic? Yes, it will mean holding the entire
stream in memory, but assuming a "PC" and not an embedded device,
it's pretty safe to assume the memory is available. You could make it a
document load option, and you could free the memory when the stream
is closed.
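A rough sketch of this whole-stream approach, under stated assumptions: the encoded bytes are decoded once into a `std::vector`, and a hypothetical `WholeMemStream` then serves all of it in a single getBuf() call and frees the memory on close(). To avoid external libraries, the decoder here is PDF's RunLengthDecode filter standing in for FlateDecode (a Flate version would run zlib's inflate over the stream instead). `decodeWholeStream` and `WholeMemStream` are illustrative names, not poppler's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in decoder: PDF's RunLengthDecode filter, used only because it needs
// no external library. Decodes the complete encoded stream into one buffer.
std::vector<unsigned char> decodeWholeStream(const std::vector<unsigned char>& in) {
    std::vector<unsigned char> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned len = in[i++];
        if (len == 128) break;                       // end-of-data marker
        if (len < 128) {                             // copy next len+1 bytes literally
            std::size_t n = std::min<std::size_t>(len + 1, in.size() - i);
            out.insert(out.end(), in.begin() + static_cast<std::ptrdiff_t>(i),
                       in.begin() + static_cast<std::ptrdiff_t>(i + n));
            i += n;
        } else {                                     // repeat next byte 257-len times
            if (i >= in.size()) break;               // truncated input
            out.insert(out.end(), 257 - len, in[i++]);
        }
    }
    return out;
}

// Hypothetical in-memory stream (not poppler's MemStream API): owns the fully
// decoded data, so getBuf() can return everything left in one call.
class WholeMemStream {
public:
    explicit WholeMemStream(std::vector<unsigned char> data) : data_(std::move(data)) {}

    // Points *out at the unread remainder and returns its length; 0 means EOF.
    std::size_t getBuf(const unsigned char** out) {
        assert(pos_ <= data_.size());
        std::size_t n = data_.size() - pos_;
        *out = data_.data() + pos_;
        pos_ = data_.size();
        return n;
    }

    // Frees the decoded buffer when the stream is closed.
    void close() {
        std::vector<unsigned char>().swap(data_);
        pos_ = 0;
    }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};
```

For example, the encoded bytes `{2, 'a', 'b', 'c', 254, 'z', 128}` decode to `abczzz`, and the first getBuf() call on the resulting stream returns all 6 bytes at once. The cost is the one Leonard accepts: peak memory grows with the decoded size of the largest stream.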
Leonard
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:leonardr at pdfsages.com>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc. 215-938-7080 (voice)
215-938-0880 (fax)