[Poppler-bugs] [Bug 69454] New: pdftohtml should include charset encoding in head section
bugzilla-daemon at freedesktop.org
Tue Sep 17 01:36:54 PDT 2013
https://bugs.freedesktop.org/show_bug.cgi?id=69454
Priority: medium
Bug ID: 69454
Assignee: poppler-bugs at lists.freedesktop.org
Summary: pdftohtml should include charset encoding in head section
Severity: normal
Classification: Unclassified
OS: All
Reporter: samuel.thibault at ens-lyon.org
Hardware: Other
Status: NEW
Version: unspecified
Component: utils
Product: poppler
Created attachment 85950
--> https://bugs.freedesktop.org/attachment.cgi?id=85950&action=edit
test file
Hello,
After converting a PDF file to HTML, all the UTF-8 characters such
as ● are displayed incorrectly in the web browser, because the HTML
file does not declare its character set encoding. pdftohtml should add
this inside its <head>:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Pino Toscano added on http://bugs.debian.org/722281 that “This is
added already in some occasions, but apparently not in frames when
doing the "complex HTML output".”
For instance, after converting http://brl.thefreecat.org/ghm13.pdf
(also attached here), ghm13s.html does not contain any encoding
declaration.
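
Until pdftohtml emits the declaration itself, the generated files could
be patched after conversion. The following is only a minimal Python
sketch of such a workaround, not pdftohtml code: the function names
has_charset_declaration and add_charset_declaration are hypothetical,
and it assumes the output really is UTF-8 and contains a <head> tag.

```python
import re

# The declaration requested in this report.
META_TAG = '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'

# Matches both <meta charset="..."> and the http-equiv form, whose
# content attribute contains "charset=".
_CHARSET_RE = re.compile(r'<meta\b[^>]*\bcharset\s*=', re.IGNORECASE)
# Matches the opening <head> tag (with or without attributes), not <header>.
_HEAD_RE = re.compile(r'<head\b[^>]*>', re.IGNORECASE)


def has_charset_declaration(html: str) -> bool:
    """Return True if the markup already declares a character set."""
    return _CHARSET_RE.search(html) is not None


def add_charset_declaration(html: str) -> str:
    """Insert META_TAG right after the opening <head> tag if none is declared.

    Hypothetical helper: assumes the document is UTF-8 encoded.
    Markup without a <head> tag is returned unchanged.
    """
    if has_charset_declaration(html):
        return html
    match = _HEAD_RE.search(html)
    if match is None:
        return html
    return html[:match.end()] + "\n" + META_TAG + html[match.end():]
```

To apply it to a file such as ghm13s.html, read the file as UTF-8
text, pass the contents through add_charset_declaration, and write the
result back. Calling it on a file that already has a declaration leaves
the contents unchanged.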
Samuel