[Poppler-bugs] [Bug 104230] New: When extracting as XML all new lines are stripped

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Dec 12 20:37:30 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=104230

            Bug ID: 104230
           Summary: When extracting as XML all new lines are stripped
           Product: poppler
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: pdftohtml
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: clark at electrobeat.dk

Created attachment 136123
  --> https://bugs.freedesktop.org/attachment.cgi?id=136123&action=edit
test pdf

pdftohtml -s -i -xml test.pdf out.xml

VS

pdftohtml -s -i test.pdf out.html

When you extract the text as HTML, all new lines are kept, but if you extract
the text as XML they are stripped out and each line is put in its own tag.
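
As a possible workaround until this is resolved, the line breaks can be approximated from the XML by treating each tag as one line. The sketch below assumes the XML output wraps each line in a <text> element inside a <page> element. The SAMPLE string and the helper names page_lines and rebuild_text are hypothetical, made up for illustration and not taken from the attached test.pdf.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the XML output described above;
# it is NOT the content of the attached test.pdf.
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="100" left="50" width="200" height="16" font="0">first line</text>
<text top="120" left="50" width="200" height="16" font="0">second <b>line</b></text>
</page>
</pdf2xml>"""


def page_lines(xml_text):
    """Return one list per <page>, holding one string per <text> element."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        # itertext() flattens nested markup such as <b> or <i> inside <text>.
        pages.append(["".join(t.itertext()) for t in page.iter("text")])
    return pages


def rebuild_text(xml_text):
    """Join <text> elements with newlines and pages with a blank line."""
    return "\n\n".join("\n".join(lines) for lines in page_lines(xml_text))


print(rebuild_text(SAMPLE))
```

This only restores one break per tag. It does not recover blank lines or paragraph spacing, which would need the top and left attributes.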
