[Poppler-bugs] [Bug 94518] New: raw_order_layout completely broken

bugzilla-daemon at freedesktop.org
Sat Mar 12 20:47:45 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=94518

            Bug ID: 94518
           Summary: raw_order_layout completely broken
           Product: poppler
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: cpp frontend
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: jeroen.ooms at stat.ucla.edu

See also: https://lists.freedesktop.org/archives/poppler/2016-March/011727.html

Extracting text with raw_order_layout gives malformed and random output (no
text at all for most pages):

  ustring str = p->text(p->page_rect(), page::raw_order_layout);

An example:

 - source: http://arxiv.org/pdf/1403.2805.pdf
 - pdftotext default output: http://pastebin.com/raw/A93xPT4j
 - cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
 - cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ

The raw_order_layout output is missing most of the text and has no spaces
between words, among other problems. It also differs each time I run it,
which suggests a memory bug.
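
One way to make the two symptoms above measurable is a small stdlib-only harness: call the same extraction several times and check whether every result matches the first, and count whitespace-separated words so raw_order_layout output can be compared with physical_layout output for the same page. This is a minimal sketch, not part of poppler. The names `check_determinism`, `DeterminismReport` and `count_words` are hypothetical, and `extract` stands in for the per-page call from the report (with poppler-cpp it could wrap something like `p->text(p->page_rect(), poppler::page::raw_order_layout).to_utf8()`, which is an assumption about how the caller converts the `ustring`). The loop runs inside a single process. If the variation only appears across separate program runs, the analogous check is to save each run's output and compare the files.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Result of running an extraction callable several times.
struct DeterminismReport {
    bool deterministic;               // true if every run matched the first run
    std::size_t first_mismatch;       // index of the first differing run (== runs if none)
    std::vector<std::string> outputs; // output of every run, kept for diffing
};

// Hypothetical helper: calls `extract` `runs` times and records whether all
// outputs are identical. `extract` is a stand-in for a per-page text call,
// e.g. (assumption) converting
//   p->text(p->page_rect(), poppler::page::raw_order_layout).to_utf8()
// into a std::string.
DeterminismReport check_determinism(const std::function<std::string()>& extract,
                                    std::size_t runs) {
    DeterminismReport report{true, runs, {}};
    report.outputs.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        report.outputs.push_back(extract());
        // Compare against the first run; remember only the first mismatch.
        if (report.deterministic && report.outputs[i] != report.outputs[0]) {
            report.deterministic = false;
            report.first_mismatch = i;
        }
    }
    return report;
}

// Hypothetical helper: counts whitespace-separated words. Text whose spaces
// were dropped collapses into fewer "words", and missing text lowers the count.
std::size_t count_words(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    std::size_t n = 0;
    while (in >> word) {
        ++n;
    }
    return n;
}
```

Comparing `count_words` on the physical_layout and raw_order_layout outputs of the same page gives a rough measure of how much text is lost; it does not check reading order.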
