[Poppler-bugs] [Bug 103798] New: libpoppler cannot recreate pdftotext output, because physical_layout is not handled correctly

bugzilla-daemon at freedesktop.org
Fri Nov 17 17:02:49 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=103798

            Bug ID: 103798
           Summary: libpoppler cannot recreate pdftotext output, because
                    physical_layout is not handled correctly
           Product: poppler
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: critical
          Priority: medium
         Component: cpp frontend
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: dummydummy at gmx.fr

Dear maintainer, this bug affects poppler 0.48.0 through at least 0.60.1.

In the file .../gcc/poppler-page.cpp, the function

         ustring page::text(const rectf &r, text_layout_enum layout_mode) const

incorrectly creates its TextOutputDev when called with physical_layout as
layout_mode: the second constructor parameter (which should be true for
physical_layout) is always set to gFalse, because the corresponding code on
lines 272 and 273 (poppler 0.60.1) is:

    const GBool use_raw_order = (layout_mode == raw_order_layout);
    TextOutputDev td(0, gFalse, 0, use_raw_order, gFalse);


By contrast, pdftotext.cc creates its TextOutputDev with the second parameter
set to gTrue when it is invoked with the -layout command-line option.

THE EFFECT is that the text produced by a program using libpoppler differs
from the more faithful text (which, for example, has blank lines where
required) produced by running pdftotext with the -layout option.

Would the following be a solution?
    const GBool use_raw_order = (layout_mode == raw_order_layout);
    const GBool use_physical_layout = !use_raw_order;
    TextOutputDev td(0, use_physical_layout, 0, use_raw_order, gFalse);

I would be grateful if this could be fixed.
The alternative, which I do not relish, would be to compile virtually all of
the poppler source code into my program just to gain access to TextOutputDev
and thus be able to call it with gTrue as the second parameter. That does not
seem to be what libpoppler is meant for.
