[Poppler-bugs] [Bug 103798] New: libpoppler cannot recreate pdftotext output, because physical_layout is not handled correctly
bugzilla-daemon at freedesktop.org
Fri Nov 17 17:02:49 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=103798
Bug ID: 103798
Summary: libpoppler cannot recreate pdftotext output, because
physical_layout is not handled correctly
Product: poppler
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: critical
Priority: medium
Component: cpp frontend
Assignee: poppler-bugs at lists.freedesktop.org
Reporter: dummydummy at gmx.fr
Dear maintainer, this bug affects poppler 0.48.0 up to at least 0.60.1.
In the file .../gcc/poppler-page.cpp, the function
ustring page::text(const rectf &r, text_layout_enum layout_mode) const
when called with physical_layout as layout_mode, incorrectly creates a
TextOutputDev whose second parameter (which should be gTrue for
physical_layout) is always gFalse, because the corresponding code in lines
272 and 273 (poppler 0.60.1) is:
const GBool use_raw_order = (layout_mode == raw_order_layout);
TextOutputDev td(0, gFalse, 0, use_raw_order, gFalse);
By contrast, pdftotext.cc creates its TextOutputDev with the second parameter
set to gTrue when called with the -layout command-line option.
The effect is that the text produced inside a program using libpoppler differs
from the more faithful text (which has, for example, blank lines where
required) produced by invoking pdftotext with the -layout option.
Would the following be a solution?
const GBool use_raw_order = (layout_mode == raw_order_layout);
const GBool use_physical_layout = !use_raw_order;
TextOutputDev td(0, use_physical_layout, 0, use_raw_order, gFalse);
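To make the difference concrete, here is a minimal sketch of the two flag mappings in isolation. It does not use the real poppler headers: the enum is a stand-in for text_layout_enum with only the two values discussed in this report, and layout_flags, current_flags and proposed_flags are hypothetical names introduced purely for illustration.

```cpp
#include <cassert>  // only needed for the checks mentioned in the note below

// Stand-in for poppler's text_layout_enum (assumption: only these two values).
enum text_layout_enum { physical_layout, raw_order_layout };

// Hypothetical holder for the two layout-related TextOutputDev arguments.
struct layout_flags {
    bool physical;  // second constructor argument
    bool raw;       // fourth constructor argument
};

// Mirrors the code quoted from poppler 0.60.1: the second argument is
// always false, whatever layout_mode is.
layout_flags current_flags(text_layout_enum layout_mode) {
    const bool use_raw_order = (layout_mode == raw_order_layout);
    return {false, use_raw_order};
}

// Mirrors the proposed change: physical layout is enabled whenever raw
// order is not requested.
layout_flags proposed_flags(text_layout_enum layout_mode) {
    const bool use_raw_order = (layout_mode == raw_order_layout);
    const bool use_physical_layout = !use_raw_order;
    return {use_physical_layout, use_raw_order};
}
```

With these definitions, current_flags(physical_layout).physical is false, whereas proposed_flags(physical_layout).physical is true, which would match what pdftotext -layout passes. Both mappings agree for raw_order_layout.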
I would be grateful if this could be fixed.
The alternative, which I do not relish, would appear to be compiling virtually
all of the poppler source code into my program, just to give it access to
TextOutputDev so that it can be constructed with gTrue as the second parameter.
This does not appear to be what libpoppler is supposed to be for.