[Poppler-bugs] [Bug 88532] libpoppler cannot parse some PDF files
bugzilla-daemon at freedesktop.org
Fri Jul 15 14:49:19 UTC 2016
https://bugs.freedesktop.org/show_bug.cgi?id=88532
--- Comment #1 from Gunter Ohrner <mails.bugs.freedesktop.org-2009 at gunter.ohrner.net> ---
I'm currently encountering the same bug with libpoppler 0.41 on Ubuntu 16.04.
As with Peter's document, the affected file in my case is an accounting statement.
All poppler-based tools seem to fail, even pdftotext. This effectively makes
the affected PDFs unprintable with a standard Linux system.
This problem was also reported to Debian in 2013, but the resulting bug
report somehow got mixed up and later contained references to several
different bugs. So don't be confused by its "fixed" state: the specific
problem described here has not been fixed: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=698456
Some useful analysis was done in that report, though:
> From: Luc Maisonobe <luc at spaceroots.org>
> To: 698456 at bugs.debian.org
> Subject: further investigation
> Date: Sat, 19 Jan 2013 19:23:24 +0100
> I have continued debugging the problem. It appears it is really linked
> to loading the LCL1Medium Type 1 font which is embedded into the file (it
> is object 32 in the file I transformed with qpdf).
>
> The analysis of the font starts correctly (parsing the header
> %!PS-AdobeFont-1.0: LCL1Medium, ignoring a few parameters like /Notice,
> /FullName, /FamilyName, ..., extracting /FontName, recognizing the
> /Encoding 256 array, ignoring a sequence of PostScript commands and
> recognizing a series of dup commands like dup 32 /space put, dup 33 /exclam put ...).
>
> The problem occurs just after the current dict is closed by the
> currentdict end instruction. The next instruction is currentfile eexec
> which is followed by a large binary blob, which itself is followed by 8
> lines composed of 64 characters '0' and an end of line marker (\r),
> followed by a cleartomark line. The binary blob is parsed as if it were
> composed of ASCII lines, and of course most of these "lines" exceed the
> 255-character line length limit. Instead, as soon as the PostScript
> instruction eexec is applied to currentfile (i.e. when "currentfile eexec"
> is detected on a line), the parser should stop looking for line ends
> directly and check whether the following blob is encrypted binary data.
> If so, it should probably decrypt it. The encryption/decryption scheme is
> described in chapter 7 of the Adobe Type 1 Font Format book (I found
> this book online with a simple web search).
>
> So, in summary, the problem in the file I provided is caused by font
> loading code that parses encrypted bytes as if they were not encrypted,
> which then confuses the line-break handling.
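The eexec scheme Luc refers to (chapter 7 of the Type 1 specification) is small enough to sketch. The Python below is only an illustration of the idea, not poppler's code: the helper names (split_at_eexec, is_binary_eexec, decode_eexec) are hypothetical, and it assumes the encrypted section has already been cut off before the trailing zeros and cleartomark, for example using the /Length2 entry of the embedded FontFile stream. The constants 55665, 52845 and 22719 and the four leading random bytes come from the specification.

```python
from typing import Tuple

# Constants from chapter 7 of the Adobe Type 1 Font Format specification.
EEXEC_R = 55665   # initial key for the eexec-encrypted portion of the font
C1 = 52845
C2 = 22719
LEN_IV = 4        # random plaintext bytes that prefix the eexec section

HEX_DIGITS = frozenset(b"0123456789abcdefABCDEF")
WHITESPACE = frozenset(b" \t\r\n")


def encrypt(plain: bytes, r: int = EEXEC_R) -> bytes:
    """Type 1 encryption (used here only to build test data)."""
    out = bytearray()
    for p in plain:
        c = p ^ (r >> 8)
        out.append(c)
        r = ((c + r) * C1 + C2) & 0xFFFF  # keep r a 16-bit value
    return bytes(out)


def decrypt(cipher: bytes, r: int = EEXEC_R, skip: int = LEN_IV) -> bytes:
    """Type 1 decryption; drops the leading random bytes."""
    out = bytearray()
    for c in cipher:
        out.append(c ^ (r >> 8))
        # The key is advanced with the *ciphertext* byte, as in encrypt().
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(out[skip:])


def split_at_eexec(font: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: split a font program at 'currentfile eexec'.

    Returns (cleartext part, everything after the eexec operator). From
    this point on, the data must not be read as ASCII lines.
    """
    marker = b"currentfile eexec"
    idx = font.find(marker)
    if idx < 0:
        raise ValueError("no 'currentfile eexec' in font program")
    end = idx + len(marker)
    # Skip the whitespace separating the operator from the encrypted data
    # (assumes the first ciphertext byte is not itself a whitespace byte).
    while end < len(font) and font[end] in WHITESPACE:
        end += 1
    return font[:end], font[end:]


def is_binary_eexec(section: bytes) -> bool:
    """If any of the first four bytes is not a hex digit, treat the
    section as binary; otherwise treat it as hex-encoded."""
    return any(b not in HEX_DIGITS for b in section[:4])


def decode_eexec(section: bytes) -> bytes:
    """Decrypt an eexec section that has already been trimmed of the
    trailing zeros and cleartomark (e.g. using the stream's /Length2)."""
    if not is_binary_eexec(section):
        # bytes.fromhex skips ASCII whitespace such as line breaks (Python 3.7+).
        section = bytes.fromhex(section.decode("ascii"))
    return decrypt(section)
```

For example, split_at_eexec(b"currentdict end\rcurrentfile eexec\r" + encrypt(b"abcd/Private 8 dict dup begin")) yields an encrypted part that decode_eexec turns back into b"/Private 8 dict dup begin". The point of the split is the one Luc makes: once "currentfile eexec" is seen, the remaining bytes are handed to a decryptor rather than to a line reader, so no 255-character line limit is ever hit.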