<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - poppler can't parse seemingly OK PDF file"
href="https://bugs.freedesktop.org/show_bug.cgi?id=75232#c3">Comment # 3</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW --- - poppler can't parse seemingly OK PDF file"
href="https://bugs.freedesktop.org/show_bug.cgi?id=75232">bug 75232</a>
from <span class="vcard"><a class="email" href="mailto:hib@hiberis.nl" title="Hib Eris &lt;hib@hiberis.nl&gt;"> <span class="fn">Hib Eris</span></a>
</span></b>
<pre>The document length is allowed to differ from the length given in the
linearization dict because a PDF document can be modified by appending extra
objects and a new xref to the end of the file. Clearly such a *modified*
document is longer than the length specified in the linearization dict of the
original document. When parsing a *modified* document one should not rely on
the information in the linearization dict and/or hint tables, and should fall
back to the parsing method used for non-linearized documents.
In the particular case of the document in
<a href="http://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r36q.pdf">http://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r36q.pdf</a>, the document is probably
modified and therefore treated as a non-linearized document. However, when
parsed as a non-linearized document it appears to be very broken, so
poppler tries to reconstruct an xref table. That does not seem to
work well, and poppler fails to render the *modified* document.
Albert's patch adds a fallback which causes poppler to render the original
*unmodified* linearized document.
Now the question is: is it useful to present the original *unmodified*
document to the user when the *modified* document is broken?
I think it is not, because the document is clearly modified for a reason and
presenting a document without the modifications is giving the user a false
representation of it.
But maybe that is what we always do to some extent with broken documents, so
for me it can go either way.</pre>
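A minimal sketch of the length check described in the comment above, written in Python for illustration only. It is not poppler's code: the function names, the 1024-byte header window and the regular expressions are all assumptions. The only rule it encodes is that the linearization dict can be trusted when its /L value equals the actual file length.

```python
import re


def linearized_length(header: bytes):
    """Return the /L value of a linearization dict in the header, or None.

    Hypothetical helper: a real parser would tokenize the first object
    instead of pattern matching on raw bytes.
    """
    if re.search(rb"/Linearized\s+[\d.]+", header) is None:
        return None  # no linearization dict, so this is a plain PDF
    match = re.search(rb"/L\s+(\d+)", header)
    return int(match.group(1)) if match else None


def can_trust_linearization(data: bytes) -> bool:
    """True only if the declared length equals the real file length.

    A file that has grown through appended objects and a new xref no
    longer matches /L and should be parsed as a non-linearized document.
    """
    declared = linearized_length(data[:1024])  # assumed header window
    return declared is not None and declared == len(data)
```

When can_trust_linearization returns False, the caller would fall back to reading the last xref from the end of the file, as for any non-linearized document.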
</div>
</p>
</body>
</html>