[poppler] Loading documents from GInputStream?

Tommi Komulainen tko at litl.com
Mon May 12 04:14:55 PDT 2008


On Mon, May 12, 2008 at 11:08 AM, Albert Astals Cid <aacid at kde.org> wrote:
> On Monday 12 May 2008, Tommi Komulainen wrote:
>
>
> > Hi,
>  >
>  > I'm investigating the possibility of loading poppler documents with
>  > GInputStream and I'm wondering if someone is already working on it
>  > or has ideas on how to proceed. Quickly looking at Stream.h, it
>  > would appear to me that poppler isn't prepared to load files from
>  > actual non-seekable streams but rather expects to have all content
>  > available when starting. Please correct me if I'm wrong.
>  >
>  > I'm coming to this from browser perspective. The idea would be to
>  > begin displaying the PDF content as soon as possible, without having
>  > to wait for the whole document to download before displaying the title
>  > page.
>  >
>  > Another option I'm considering would be to direct poppler into loading
>  > a temporary, incomplete file and, knowing the expected size of the
>  > file, deal with temporary EOF intelligently.
>  >
>  > Thoughts?
>
>  The XRef of the file is at the end, so you need the whole file to be able
>  to process it.
>
>  Then there are those "web optimized" PDF files that have multiple XRefs
>  forming independent parts inside the PDF file, so you can load a part of
>  it as soon as you find the first XRef. poppler does not have any sort of
>  intelligent algorithm to work with partially downloaded streams.

Ah, evil. Forgive my ignorance about the PDF format, but I take it that
you really need the XRef to be able to display anything? It's not like
you'd only lose images or so?

And I'd guess there's also no way of telling beforehand whether a file
is 'web optimized' or not?


> I think it would be a nice addition to have, I'm just not sure what kind
> of API we'd need.

Given the need for random access I guess you'd need to store the whole
file in memory anyway. And the getChar() implementation could just
block reading the stream when necessary. Would be simple, but far from
optimal.
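
To make the blocking idea a bit more concrete, here is a minimal sketch.
Nothing in it is poppler API: BlockingStreamCache and ReadFn are names I
made up for illustration, and the callback stands in for whatever blocking
read the stream offers (with GIO that would presumably be a thin wrapper
around g_input_stream_read). It assumes the callback returns 0 only at end
of stream.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical blocking read callback: copies up to maxBytes into dst and
// returns the number of bytes copied, or 0 at end of stream.
using ReadFn = std::function<std::size_t(char *dst, std::size_t maxBytes)>;

// Hypothetical helper, not part of poppler: keeps everything read so far
// in memory so that any offset can be requested, in any order.
class BlockingStreamCache {
public:
    explicit BlockingStreamCache(ReadFn read) : read_(std::move(read)) {}

    // Returns the byte at pos, or -1 if the stream ends before pos.
    // Blocks inside the callback until enough data has arrived.
    int getChar(std::size_t pos) {
        while (pos >= buf_.size()) {
            if (!fill()) {
                return -1; // stream ended before reaching pos
            }
        }
        return static_cast<unsigned char>(buf_[pos]);
    }

    // Number of bytes pulled from the stream so far.
    std::size_t cachedBytes() const { return buf_.size(); }

private:
    // Pulls one more chunk from the stream; false means end of stream.
    bool fill() {
        char chunk[4096];
        const std::size_t n = read_(chunk, sizeof chunk);
        if (n == 0) {
            return false;
        }
        buf_.insert(buf_.end(), chunk, chunk + n);
        return true;
    }

    ReadFn read_;
    std::vector<char> buf_;
};
```

A request for a byte near the end of the file (where the XRef lives) would
simply block until the whole download has been cached, which is why this is
simple but far from optimal.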

An alternative to blocking on the stream would be to signal 'try again
later', but I'm not sure what poppler could do in such a case. Skip to
processing some other part of the file?
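
For comparison, a rough sketch of the 'try again later' variant. Again,
all names here (PartialFileBuffer, ReadStatus) are hypothetical and not
anything poppler provides. It assumes the expected file size is known up
front, so a missing byte can be told apart from a real EOF.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result codes for a non-blocking read.
enum class ReadStatus { Ok, TryAgainLater, EndOfFile };

// Hypothetical helper, not part of poppler: holds the part of the file
// downloaded so far and never blocks.
class PartialFileBuffer {
public:
    explicit PartialFileBuffer(std::size_t expectedSize)
        : expectedSize_(expectedSize) {}

    // Called by the downloader whenever another chunk arrives.
    void append(const std::string &chunk) {
        data_.insert(data_.end(), chunk.begin(), chunk.end());
    }

    // Stores the byte at pos in out and returns Ok if it is available.
    // Returns TryAgainLater for a temporary EOF (byte not downloaded yet)
    // and EndOfFile when pos is past the expected size.
    ReadStatus getChar(std::size_t pos, unsigned char &out) const {
        if (pos >= expectedSize_) {
            return ReadStatus::EndOfFile;
        }
        if (pos >= data_.size()) {
            return ReadStatus::TryAgainLater;
        }
        out = static_cast<unsigned char>(data_[pos]);
        return ReadStatus::Ok;
    }

private:
    std::size_t expectedSize_;
    std::vector<char> data_;
};
```

The open question stays the same: every caller would have to cope with
TryAgainLater, and it is not obvious what the parser should do with it.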


- Tommi

