[Poppler-bugs] [Bug 101770] pdftohtml: don't put control characters in output

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Jul 13 18:07:25 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=101770

Jason Crain <jason at inspiresomeone.us> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Is it possible to fix       |pdftohtml: don't put
                   |special chars?              |control characters in
                   |                            |output
          Component|utils                       |pdftohtml

--- Comment #5 from Jason Crain <jason at inspiresomeone.us> ---
I'm not an expert on PHP, but it looks like that code calls out to poppler's
pdftohtml, and PHP seems not to like control characters in HTML. I also found
this section in a W3C working draft:

https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#text-0
Text must not contain U+0000 characters. Text must not contain permanently
undefined Unicode characters (noncharacters). Text must not contain control
characters other than space characters.

So pdftohtml should probably not be putting control characters in its output.
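To make the quoted rule concrete, here is a small filter that drops every
character it forbids. It is written in Python purely for illustration (poppler
itself is C++), the function names are hypothetical, and it is not how
pdftohtml is actually implemented. It assumes "control characters" means
U+0000-U+001F and U+007F-U+009F, and that "space characters" are the five the
HTML5 draft defines (SPACE, TAB, LF, FF, CR). Dropping the characters is only
one possible policy; replacing them with a space or U+FFFD would also satisfy
the rule.

```python
# Hypothetical sketch of the W3C text rule quoted above; not poppler code.

# "Space characters" per the HTML5 draft: SPACE, TAB, LF, FF, CR.
HTML_SPACE_CHARS = {0x20, 0x09, 0x0A, 0x0C, 0x0D}


def is_control(cp: int) -> bool:
    """C0 controls (U+0000-U+001F), DELETE and C1 controls (U+007F-U+009F)."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_noncharacter(cp: int) -> bool:
    """Permanently undefined code points: U+FDD0-U+FDEF, plus any code
    point whose low 16 bits are FFFE or FFFF (U+FFFE, U+1FFFF, ...)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def allowed_in_html_text(ch: str) -> bool:
    """True if the character may appear in HTML text under the quoted rule.
    U+0000 needs no special case: it is already a control character."""
    cp = ord(ch)
    if cp in HTML_SPACE_CHARS:
        return True
    return not (is_control(cp) or is_noncharacter(cp))


def strip_disallowed(text: str) -> str:
    """Drop every character the rule forbids before writing HTML output."""
    return "".join(ch for ch in text if allowed_in_html_text(ch))


if __name__ == "__main__":
    # U+0001, U+FFFE and U+007F are removed; the tab is a space character.
    print(repr(strip_disallowed("A\x01B\tC\ufffeD\x7f")))  # 'AB\tCD'
```

Note that escaping such a character as a numeric reference (for example
`&#1;`) does not help, since the rule concerns the text content itself; the
character has to be removed or replaced before serialization.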
