[Poppler-bugs] [Bug 101770] pdftohtml: don't put control characters in output
bugzilla-daemon at freedesktop.org
Thu Jul 13 18:07:25 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=101770
Jason Crain <jason at inspiresomeone.us> changed:
What      |Removed                             |Added
----------------------------------------------------------------------------
Summary   |Is it possible to fix special chars?|pdftohtml: don't put control characters in output
Component |utils                               |pdftohtml
--- Comment #5 from Jason Crain <jason at inspiresomeone.us> ---
I'm not an expert on PHP, but it looks like that code calls out to poppler's
pdftohtml, and PHP seems not to like control characters in HTML. I also found
this section in a W3C working draft:
https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#text-0
Text must not contain U+0000 characters. Text must not contain permanently
undefined Unicode characters (noncharacters). Text must not contain control
characters other than space characters.
So pdftohtml should probably not be putting control characters in its output.
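To make the quoted rule concrete, here is a minimal sketch of a text filter that drops the code points the draft forbids. It is not poppler's code and not a proposed patch; `strip_disallowed` and its helpers are hypothetical names. It assumes the draft's "space characters" are TAB, LF, FF, CR and SPACE, treats "control characters" as Unicode general category Cc (C0 controls, DEL, C1 controls), and takes noncharacters to be U+FDD0..U+FDEF plus the last two code points of every plane.

```python
# Hypothetical sketch (not poppler code): remove code points that the
# 2011 HTML5 working draft says text must not contain.

# Assumption: the draft's "space characters" are TAB, LF, FF, CR and SPACE.
HTML_SPACE_CHARS = {"\t", "\n", "\x0c", "\r", " "}


def is_control(cp: int) -> bool:
    # C0 controls (includes U+0000), DEL, and C1 controls: general category Cc.
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strip_disallowed(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in HTML_SPACE_CHARS:
            out.append(ch)  # whitespace controls are allowed
        elif is_control(cp) or is_noncharacter(cp):
            continue  # forbidden in HTML text, so drop it
        else:
            out.append(ch)
    return "".join(out)
```

For example, `strip_disallowed("a\x00b\x07c")` gives `"abc"`, while tabs and newlines pass through unchanged. Dropping the characters is only one option; a converter could instead substitute U+FFFD or a space, which keeps word boundaries intact.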