[Poppler-bugs] [Bug 56293] New: Incorrect positioning of text in PDFTOHTML

bugzilla-daemon at freedesktop.org
Mon Oct 22 13:17:34 PDT 2012


https://bugs.freedesktop.org/show_bug.cgi?id=56293

          Priority: medium
            Bug ID: 56293
          Assignee: poppler-bugs at lists.freedesktop.org
           Summary: Incorrect positioning of text in PDFTOHTML
          Severity: normal
    Classification: Unclassified
                OS: All
          Reporter: erik.engstroem at gmail.com
          Hardware: Other
            Status: NEW
           Version: unspecified
         Component: pdftohtml
           Product: poppler

Created attachment 68923
  --> https://bugs.freedesktop.org/attachment.cgi?id=68923&action=edit
PDF file exhibiting this behavior

PDFTOHTML computes incorrect text positions for certain PDF documents. Attached
is a document in which this happens.

The following reasoning explains the problem in more detail.
The first page, rendered as an image, is 1024x1408 pixels. The text "Brief
article", highlighted in the screenshot below, should be positioned 19% from
the top of the page:
http://imageshack.us/a/img526/6343/textshiftedpdf1.png

With pdftohtml -xml, Poppler outputs this text as:
<text top="409" left="447" width="80" height="15" font="0">Brief article</text>

The page dimensions reported by Poppler, taken from the same XML file, are:
<page number="1" position="absolute" top="0" left="0" height="1488"
width="1063">

According to Poppler, the text would therefore be positioned at
409/1488 = 0.27, i.e. 27% from the top, which is clearly wrong.
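As a quick check of the arithmetic above, the following Python sketch parses the two elements quoted from the pdftohtml -xml output and computes the text's top offset as a fraction of the page height. It is only an illustration: the page element is written self-closing here for brevity (in the real output it contains the text elements), the 19% figure is the value read off the screenshot, and the variable and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements copied from the pdftohtml -xml output quoted in this report.
# The <page> element is self-closed here only to keep the sketch short.
PAGE_XML = '<page number="1" position="absolute" top="0" left="0" height="1488" width="1063"/>'
TEXT_XML = '<text top="409" left="447" width="80" height="15" font="0">Brief article</text>'

EXPECTED_FRACTION = 0.19  # position read off the 1024x1408 rendered image
IMAGE_HEIGHT = 1408       # height of the rendered image in pixels


def relative_top(text_el, page_el):
    """Return the text's top offset as a fraction of the page height."""
    return int(text_el.get("top")) / int(page_el.get("height"))


page = ET.fromstring(PAGE_XML)
text = ET.fromstring(TEXT_XML)

frac = relative_top(text, page)
print(f"{text.text!r}: pdftohtml top = {frac:.1%} of page height")
print(f"expected: {EXPECTED_FRACTION:.0%}, about {EXPECTED_FRACTION * IMAGE_HEIGHT:.0f} px in the image")
print(f"pdftohtml top mapped onto the image: {frac * IMAGE_HEIGHT:.0f} px")
```

Scaling both fractions to the image height makes the discrepancy visible in pixels, which is easier to compare against the screenshot than the raw coordinates.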

No warnings or errors were reported when converting this document.
