[Poppler-bugs] [Bug 18460] New: pdftohtml puts garbage into <title> tags and "Document Outline"
bugzilla-daemon at freedesktop.org
Sun Nov 9 16:38:39 PST 2008
http://bugs.freedesktop.org/show_bug.cgi?id=18460
Summary: pdftohtml puts garbage into <title> tags and "Document Outline"
Product: poppler
Version: unspecified
Platform: Other
URL: http://www.maths.mq.edu.au/~ross/poppler/ZhangPeng/readme.html
OS/Version: Mac OS X (All)
Status: NEW
Severity: minor
Priority: low
Component: general
AssignedTo: poppler-bugs at lists.freedesktop.org
ReportedBy: ross at maths.mq.edu.au
Created an attachment (id=20170)
--> (http://bugs.freedesktop.org/attachment.cgi?id=20170)
Zhang Peng's PDF; it contains a Chinese font
With the attached PDF (supplied by Zhang Peng for another purpose; see
http://lists.freedesktop.org/archives/poppler/2008-November/004216.html),
pdftohtml fails to set the <title> tags correctly: their content starts with
the bytes <FE><FF>, which are invalid in UTF-8.
Within the "Document Outline" section, both entries start the same way, and
the first is followed by more garbage.
This can be seen at the URL stated for this bug report:
http://www.maths.mq.edu.au/~ross/poppler/ZhangPeng/readme.html
(You may need to set the encoding manually to UTF8.)
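The leading bytes <FE><FF> match the byte-order mark that the PDF specification
prescribes for text strings encoded in UTF-16BE (the string type used for the
document /Title and for outline item titles). A plausible but unconfirmed
explanation is that these strings are copied into the HTML without being
decoded. Below is a minimal Python sketch of the conversion one would expect,
assuming that diagnosis. The function name is hypothetical, not Poppler's API,
and PDFDocEncoding is only approximated by Latin-1.

```python
import codecs
import html


def pdf_text_string_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: convert a PDF text string (e.g. a /Title entry)
    into UTF-8 bytes suitable for an HTML <title> or outline entry."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # UTF-16BE text string: drop the FE FF byte-order mark, then decode.
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        # Simplification (assumption): treat PDFDocEncoding as Latin-1.
        # The two encodings differ for some code points, e.g. 0x80-0x9F.
        text = raw.decode("latin-1")
    # Escape <, > and & so the title cannot break the surrounding markup.
    return html.escape(text, quote=False).encode("utf-8")


raw_title = b"\xfe\xff\x4e\x2d\x65\x87"  # illustrative bytes, not from readme.pdf
try:
    raw_title.decode("utf-8")  # what a browser sees if the bytes are copied as-is
except UnicodeDecodeError as err:
    print("raw bytes are not UTF-8; first bad byte at offset", err.start)
print(pdf_text_string_to_utf8(raw_title).decode("utf-8"))
```

With these illustrative bytes, the raw string fails UTF-8 decoding at offset 0
(the 0xFE byte), while the converted string is the UTF-8 encoding of "中文".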
Facts:
-----
The document contains Chinese characters, with the following font info:
<</Subtype/Type0
/DescendantFonts 33 0 R
/BaseFont/AdobeSongStd-Light
/Encoding/UniGB-UCS2-H
/Type/Font>>
There is no embedded CMap resource:
> pdffonts readme.pdf
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
AdobeSongStd-Light CID Type 0 no no no 32 0
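/Encoding /UniGB-UCS2-H names one of Adobe's predefined CMaps. Its character
codes are 2-byte UCS-2 (Unicode) values, mapped to CIDs in the Adobe-GB1
collection. This may explain why text extraction can still succeed even though
pdffonts reports no ToUnicode map ("uni" = no): the codes themselves are
Unicode. Below is a minimal sketch under that assumption. The function name is
hypothetical, and the sample bytes are illustrative rather than taken from
readme.pdf.

```python
def ucs2_codes_to_text(codes: bytes) -> str:
    """Hypothetical helper: interpret a string shown with a UniGB-UCS2-H font
    as a sequence of 2-byte, big-endian UCS-2 character codes."""
    if len(codes) % 2:
        raise ValueError("UniGB-UCS2-H character codes are 2 bytes each")
    chars = []
    for i in range(0, len(codes), 2):
        # Each 2-byte code is read directly as a Unicode (UCS-2) code point.
        code = int.from_bytes(codes[i:i + 2], "big")
        chars.append(chr(code))
    return "".join(chars)


print(ucs2_codes_to_text(b"\x4e\x2d\x65\x87"))  # U+4E2D U+6587
```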
Observations:
-----------
(see also
http://lists.freedesktop.org/archives/poppler/2008-November/004220.html)
pdftotext worked fine for me with both Poppler v0.8.2 and Poppler v0.10.0.
However, other software handled readme.pdf inconsistently:
Adobe Reader v8.1.0 and v9.0.0 both showed just blank pages;
Adobe Acrobat Pro v8.1.2 displayed the PDF just fine;
Preview (Mac OS X v10.4.11) displayed the PDF just fine;
pdftohtml translated the PDF into a 2-page HTML document with frames,
*but* with the errors described above.
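To confirm which generated files contain invalid UTF-8, and where, a small
checker can report the offset of the first bad byte in each file. This is a
hypothetical helper script, not part of Poppler. Pass it whichever HTML files
pdftohtml produced, e.g. readme.html.

```python
import sys
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start


if __name__ == "__main__":
    for name in sys.argv[1:]:
        offset = first_invalid_utf8_offset(Path(name).read_bytes())
        if offset is None:
            print(f"{name}: valid UTF-8")
        else:
            print(f"{name}: invalid UTF-8 at byte {offset}")
```

Usage (script name hypothetical): `python3 check_utf8.py readme.html`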