[poppler] Speed improvements - chapter eleven

Wed Sep 6 11:22:55 PDT 2006

On 9/6/06, Leonard Rosenthol <leonardr at pdfsages.com> wrote:
> At 03:48 AM 9/6/2006, Krzysztof Kowalczyk wrote:
> >Frankly, I was disappointed that it's only ~~5%. I was expecting much
> >more. It turns out that the culprit is current implementation of flate
> >stream, which is frequently used to compress streams inside PDFs. It
> >decompresses data in very small chunks (e.g. 8 bytes on average per
> >getBuf() call in my test) so we don't save nearly as much as if we
> >were getting, say, 256 bytes at a time. I'm working on improving that
> >as well, but this change lays the necessary foundation.
>
>          Given these two things, why not consider reading an ENTIRE
> PDF Stream into memory and decompressing it - thus turning what is
> now a FlateStream->FileStream path with getChar() logic into a single
> MemStream with getBuf() logic??   Yes, it will mean having the entire
> stream in memory - but assuming a "PC" and not an embedded device,
> it's pretty safe to assume memory is present.  You could make it a
> document load option and you could dispose the memory when the stream
> is closed.

That should also improve speed, although I don't think I'm gonna
attempt this optimization myself. All my optimization attempts are
guided by profiler output, so that I get the biggest bang for my
time.
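
For anyone who wants to try it, the whole-stream idea can be sketched roughly as below. This is illustrative Python, not poppler code: `MemStream`, `get_buf`, `load_flate_stream` and `read_all` are hypothetical names picked to mirror the discussion, and the sketch assumes the compressed stream is plain zlib/flate data that fits in memory. It decompresses everything in one call and then counts how many buffer reads a consumer needs at a given chunk size.

```python
import zlib


class MemStream:
    """Hypothetical in-memory stream holding fully decompressed data."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def get_buf(self, n):
        """Return up to n bytes and advance; returns b"" at end of stream."""
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


def load_flate_stream(raw):
    """Decompress an entire flate-compressed stream in a single call.

    Assumption: the whole stream is available as bytes and fits in memory.
    """
    return MemStream(zlib.decompress(raw))


def read_all(stream, chunk_size=256):
    """Drain the stream; return (data, number of get_buf calls that yielded data)."""
    out = bytearray()
    calls = 0
    while True:
        chunk = stream.get_buf(chunk_size)
        if not chunk:
            break
        out += chunk
        calls += 1
    return bytes(out), calls
```

With a 2700-byte payload, draining at 8 bytes per call takes 338 reads, while 256 bytes per call takes 11, which is the kind of per-call overhead difference discussed above. The trade-off is the one Leonard mentions: the whole decompressed stream sits in memory until it is released.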

BTW: it looks like all my changes combined give me about a 50% speedup
when loading PDFs and a measurable, but not dramatic, speedup when
rendering (i.e. between 1% and 15%, depending on the page).

-- kjk