[poppler] poppler really slow when reading some documents
Jonathan Blandford
jrb at redhat.com
Wed Jan 4 19:43:48 PST 2006
On Thu, 2006-01-05 at 01:19 +0100, Albert Astals Cid wrote:
> Well, after some more investigation it seems Acrobat detects that something
> is fishy but is not able to recover from it. Open the document and try to
> navigate to 6.6.4.3 or 6.6.4.4 and you'll see they are not in the TOC (at
> least in Linux Acrobat 7.0.1).
>
> As an idea to catch (not recover from) that kind of error, I think we could
> try something like this:
>
> if you see a \) followed by a space, a \r, a \n, or a combination of them,
> and then /Dest [
> then we assume the \) was really a closing ) and return. We lose the current
> item but do not spend forever trying to find the end. That may introduce
> errors in case someone is twisted enough to write a TOC item containing
> these exact characters.
>
> What do you think?
Thanks for tracking this down, Albert!
The algorithm you described will catch this particular instance, but
might not catch other such errors (such as a missing ')'). I'm
wondering if it also makes sense to put a cap on string sizes for
certain fields. It would be interesting to generate some busted PDF
files along those lines to see how acroread handles those errors. That
might be instructive to get an idea of how tolerant of malformed PDFs we
should be.
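
To make the two ideas concrete, here is a rough sketch of how the
"\) followed by /Dest [" check and a size cap could sit in the same string
scanner. It is written in Python purely for illustration and is not poppler
code: read_literal_string, StringTooLong and MAX_TITLE_BYTES are made-up
names, and the 4096-byte limit is an arbitrary placeholder rather than a
proposed value.

```python
import re

# Hypothetical cap; no specific limit has been proposed in this thread.
MAX_TITLE_BYTES = 4096

# Whitespace (space, \r, \n in any combination) followed by "/Dest [".
_DEST_AFTER = re.compile(rb"[ \r\n]+/Dest \[")


class StringTooLong(ValueError):
    """Raised when a literal string exceeds the configured cap."""


def read_literal_string(buf, start, max_len=MAX_TITLE_BYTES):
    """Scan a PDF literal string that begins with '(' at buf[start].

    Returns (raw_contents, index_after_string, recovered). `recovered` is
    True when an escaped ')' was treated as the real terminator, in which
    case the caller would probably want to drop the item.
    """
    if buf[start:start + 1] != b"(":
        raise ValueError("expected '(' at start of literal string")
    depth = 1
    i = start + 1
    while i < len(buf):
        # Size cap: stop scanning instead of running to end of file.
        if i - (start + 1) > max_len:
            raise StringTooLong("literal string exceeds %d bytes" % max_len)
        c = buf[i:i + 1]
        if c == b"\\":
            # Heuristic: "\)" + whitespace + "/Dest [" is taken to mean the
            # backslash was bogus and the ")" really closed the string.
            if buf[i + 1:i + 2] == b")" and _DEST_AFTER.match(buf, i + 2):
                return buf[start + 1:i], i + 2, True
            i += 2  # otherwise skip the escaped character
            continue
        if c == b"(":
            depth += 1  # unescaped parentheses may nest
        elif c == b")":
            depth -= 1
            if depth == 0:
                return buf[start + 1:i], i + 1, False
        i += 1
    raise ValueError("unterminated literal string")
```

With this shape, a missing ')' still fails (either at the cap or at end of
input) instead of being silently accepted, which is the case the heuristic
alone would not cover.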
Thanks,
-Jonathan