[Poppler-bugs] [Bug 75232] poppler can't parse seemingly OK PDF file

bugzilla-daemon at freedesktop.org
Thu Feb 20 08:09:19 PST 2014


https://bugs.freedesktop.org/show_bug.cgi?id=75232

--- Comment #3 from Hib Eris <hib at hiberis.nl> ---
A document's length is allowed to differ from the length recorded in its
linearization dict because a PDF document can be modified by appending extra
objects and a new xref to the end of the file. Such a *modified* document is
necessarily longer than the length specified in the linearization dict of the
original document. When parsing a *modified* document, one should not rely on
the information in the linearization dict and/or hints table, and should fall
back to the parsing method used for non-linearized documents.
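The length check described above can be sketched roughly as follows. This is
only an illustration in Python, not poppler's actual code: the function names
are hypothetical, and the regular expression is a simplification that assumes
the linearization dict appears uncompressed near the start of the file, with
its /L entry (the declared file length) written as a plain integer.

```python
import re
from typing import Optional


def declared_length(head: bytes) -> Optional[int]:
    """Return the /L value of the linearization dict, or None if absent.

    `head` is assumed to hold the first bytes of the file, where the
    linearization dict of a linearized PDF is expected to live.
    """
    if b"/Linearized" not in head:
        return None
    # "/L" must be followed by whitespace, so "/Linearized" and "/Length"
    # do not match this pattern.
    match = re.search(rb"/L\s+(\d+)", head)
    return int(match.group(1)) if match else None


def can_trust_linearization(data: bytes) -> bool:
    """True only if the file is linearized and its length still matches /L."""
    declared = declared_length(data[:1024])
    return declared is not None and declared == len(data)


# Build a tiny fake "linearized" file whose /L value equals its real length.
head = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 100 /O 5 >>\nendobj\n"
original = head + b" " * (100 - len(head))

# Appending objects and a new xref makes the file longer than /L says.
modified = original + b"\n9 0 obj\n<< >>\nendobj\n"
```

With these inputs, `can_trust_linearization(original)` holds, while the
appended bytes in `modified` make the check fail, which is the cue to parse
the file as a non-linearized document.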

In the particular case of the document at
http://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r36q.pdf, the document has probably
been modified and is therefore treated as a non-linearized document. However,
when parsed as a non-linearized document it appears to be very broken, so
poppler tries to reconstruct an xref table. That does not seem to work well,
and poppler thus fails to render the *modified* document.

Albert's patch adds a fallback which causes poppler to render the original
*unmodified* linearized document. 
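The order of attempts discussed here can be pictured as a chain of fallbacks.
The sketch below is a simplified illustration in Python and not Albert's patch
or poppler's code: `open_with_fallbacks` and the strategy labels are
hypothetical, and each strategy is assumed to be a callable that returns a
parsed document or None on failure.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy is a (label, attempt) pair; attempt() returns a document or None.
Strategy = Tuple[str, Callable[[], Optional[object]]]


def open_with_fallbacks(
    strategies: Sequence[Strategy],
) -> Tuple[Optional[str], Optional[object]]:
    """Try each strategy in order and return the first one that succeeds."""
    for label, attempt in strategies:
        doc = attempt()
        if doc is not None:
            return label, doc
    return None, None


# Stand-ins for the situation in this bug: the first two attempts fail,
# and only the last-resort fallback yields something renderable.
result = open_with_fallbacks([
    ("non-linearized xref", lambda: None),
    ("reconstructed xref", lambda: None),
    ("original linearized data", lambda: "original document"),
])
```

Because the label of the winning strategy is returned along with the
document, a caller could tell when it is showing the original rather than the
modified document.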

Now the question is: is it useful to present the original *unmodified*
document to the user when the *modified* document is broken?

I think it is not, because the document was clearly modified for a reason, and
presenting it without the modifications gives the user a false representation
of it.

But maybe that is what we always do, to some extent, with broken documents, so
for me it can go either way.
