[Poppler-bugs] [Bug 69454] New: pdftohtml should include charset encoding in head section

Tue Sep 17 01:36:54 PDT 2013

https://bugs.freedesktop.org/show_bug.cgi?id=69454

          Priority: medium
            Bug ID: 69454
          Assignee: poppler-bugs at lists.freedesktop.org
           Summary: pdftohtml should include charset encoding in head
                    section
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: samuel.thibault at ens-lyon.org
          Hardware: Other
            Status: NEW
           Version: unspecified
         Component: utils
           Product: poppler

Created attachment 85950
  --> https://bugs.freedesktop.org/attachment.cgi?id=85950&action=edit
test file

Hello,

After converting a PDF file to HTML, non-ASCII characters such as ●
are displayed incorrectly in the web browser, because the HTML file,
although encoded in UTF-8, does not declare its character encoding.
pdftohtml should add this inside its <head>:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
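Until pdftohtml emits the tag itself, the generated files could be
patched after conversion. Below is a minimal Python sketch of such a
workaround, assuming the output really is UTF-8 as described above.
The names add_charset_meta and META are hypothetical helpers for
illustration and are not part of poppler:

```python
import re

# The declaration proposed in this report.
META = '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'


def add_charset_meta(html: str) -> str:
    """Insert META right after the opening <head> tag.

    The text is returned unchanged if a <meta> tag already declares a
    charset, or if there is no <head> tag to attach the declaration to.
    """
    if re.search(r"<meta[^>]*charset\s*=", html, flags=re.IGNORECASE):
        return html  # an encoding is already declared
    # Match <head> or <head attr=...>, but not <header>.
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + "\n" + META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

This is only a stop-gap for files that have already been generated;
the proper fix is still for pdftohtml to write the tag in every HTML
file it produces.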

Pino Toscano added on http://bugs.debian.org/722281 that “This is
added already in some occasions, but apparently not in frames when
doing the "complex HTML output".”

For instance, after converting http://brl.thefreecat.org/ghm13.pdf
(also attached here), ghm13s.html does not contain any encoding
declaration.
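
To check which of the generated files lack the declaration, something
like the following Python sketch could be used. It relies only on the
standard html.parser module; CharsetFinder and declared_charset are
hypothetical names chosen for this example, and the check covers only
<meta> tags, not HTTP headers or byte-order marks:

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first charset declared by a <meta> tag, if any."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta" or self.charset is not None:
            return
        a = {name: (value or "") for name, value in attrs}
        if "charset" in a:
            # HTML5 form: <meta charset="utf-8">
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Older form: content="text/html;charset=utf-8"
            content = a.get("content", "").lower()
            if "charset=" in content:
                value = content.split("charset=", 1)[1]
                self.charset = value.split(";")[0].strip()


def declared_charset(html: str):
    """Return the declared charset in lower case, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset
```

Given the behaviour reported above, calling declared_charset on the
contents of ghm13s.html would be expected to return None, while a
file carrying the proposed <meta> tag should yield "utf-8".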

Samuel
