<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - libpoppler cannot parse some PDF files"
href="https://bugs.freedesktop.org/show_bug.cgi?id=88532#c1">Comment # 1</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - libpoppler cannot parse some PDF files"
href="https://bugs.freedesktop.org/show_bug.cgi?id=88532">bug 88532</a>
from <span class="vcard"><a class="email" href="mailto:mails.bugs.freedesktop.org-2009@gunter.ohrner.net" title="Gunter Ohrner <mails.bugs.freedesktop.org-2009@gunter.ohrner.net>"> <span class="fn">Gunter Ohrner</span></a>
</span></b>
<pre>I'm currently encountering the same bug with libpoppler 0.41 on Ubuntu 16.04.
As with Peter's document, it's an accounting statement file as well in my case.
All poppler-based tools seem to fail, even pdftotext. This effectively makes
the affected PDFs unprintable with a standard Linux system.
This problem was also reported to Debian in 2013; however, the resulting
bug report got muddled and later accumulated references to other, unrelated
bugs. So don't be misled by its "fixed" state; the specific problem
described here is not fixed: <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=698456">https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=698456</a>
However, some analysis was done in this report:
<span class="quote">> From: Luc Maisonobe <<a href="mailto:luc@spaceroots.org">luc@spaceroots.org</a>>
> To: <a href="mailto:698456@bugs.debian.org">698456@bugs.debian.org</a>
> Subject: further investigation
> Date: Sat, 19 Jan 2013 19:23:24 +0100
> I have continued debugging the problem. It appears it is really linked
> to loading the LCL1Medium type1 font which is embedded into the file (it
> is object 32 in the file I transformed with qpdf).
>
> The analysis of the font starts correctly (parsing the header
> %!PS-AdobeFont-1.0: LCL1Medium, ignoring a few parameters like /Notice,
> /FullName, /FamilyName, ... extracting /FontName, recognizing the /Encoding
> 256 array, ignoring a series of PostScript commands and recognizing a
> series of dup commands like dup 32 /space put, dup 33 /exclam put ...).
>
> The problem occurs just after the current dict is closed by the
> currentdict end instruction. The next instruction is currentfile eexec
> which is followed by a large binary blob, which itself is followed by 8
> lines composed of 64 characters '0' and an end of line marker (\r),
> followed by a cleartomark line. The binary blob is parsed as if it were
> composed of ASCII lines, and of course most of the lines exceed the
> length of 255 characters. In fact, as soon as the PostScript instruction
> eexec is encountered on currentfile (i.e. when currentfile eexec is
> detected on a line), the parser should no longer look for ends of lines
> directly, but check whether the following blob is an encrypted binary. In
> this case, it should probably decrypt it. The encryption/decryption is
> described in the Adobe Type 1 Font Format book, in chapter 7 (I have
> found this book online using a simple web search).
>
> So in summary, the problem in the file I provided is caused by font
> loading code that parses encrypted data as if it were not encrypted,
> which then confuses the line-breaking algorithm.</span ></pre>
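<pre>To make the quoted analysis more concrete, here is a minimal Python sketch of
the eexec scheme from chapter 7 of the Adobe Type 1 Font Format book. The
constants (55665, 52845, 22719, four leading bytes to discard) are the ones
given there; the function names are hypothetical and this is not poppler's
code. It assumes the binary form of the encrypted section (not the ASCII-hex
form), and the encrypt helper uses a fixed zero prefix purely so the round
trip can be checked.

```python
# Constants from the Adobe Type 1 Font Format book, chapter 7.
EEXEC_KEY = 55665   # initial key for the eexec-encrypted section
C1 = 52845
C2 = 22719
SKIP = 4            # leading bytes that carry no payload and are discarded


def eexec_decrypt(cipher, key=EEXEC_KEY, skip=SKIP):
    """Decrypt a binary eexec blob and drop the leading filler bytes."""
    r = key
    plain = bytearray()
    for c in cipher:
        plain.append(c ^ (r // 256))          # XOR with the high byte of r
        r = ((c + r) * C1 + C2) % 65536       # advance the 16-bit key
    return bytes(plain[skip:])


def eexec_encrypt(plain, key=EEXEC_KEY, prefix=b"\x00" * SKIP):
    """Inverse of eexec_decrypt; the fixed prefix is a simplification."""
    r = key
    cipher = bytearray()
    for p in prefix + plain:
        c = p ^ (r // 256)
        cipher.append(c)
        r = ((c + r) * C1 + C2) % 65536
    return bytes(cipher)


def split_at_eexec(font):
    """Split font data into the clear-text head and the raw bytes after
    'currentfile eexec'. Hypothetical helper: it skips one separator after
    the marker and does not strip the trailing zeros or cleartomark."""
    marker = b"currentfile eexec"
    i = font.find(marker)
    if i == -1:
        return font, b""
    start = i + len(marker)
    if font[start:start + 2] == b"\r\n":
        start += 2
    elif font[start:start + 1] in (b"\r", b"\n", b" "):
        start += 1
    return font[:start], font[start:]
```

The point is only that everything after "currentfile eexec" has to be treated
as an opaque byte stream and run through the decryption step, instead of
being split into lines first.</pre>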
</div>
</p>
</body>
</html>