[poppler] configure text extraction within poppler-glib (WAS: disable form feeds using parameter "nopgbrk" in GlobalParams)
jdv at base2bio.com
Wed Dec 7 17:14:11 UTC 2016
I'm the current maintainer of the Perl poppler bindings. A user has been
trying to figure out how to configure the output of
PopplerPage::get_text in poppler-glib. The Perl bindings are just a very
thin layer over poppler-glib using GObject introspection, so any
solution would need to be found in the poppler-glib library itself.
He has been focusing on the GlobalParams class, which is part of core
poppler and is initialized within the PopplerDocument class of
poppler-glib, but there is no public interface to it from within
poppler-glib that I can see. He is concerned because he is seeing
different behavior from the "get_text" method (which again in Perl is
just a direct binding to the poppler-glib method) on different platforms
regarding line feed characters at the end of pages.
The general question, I believe, is this: is there any way to control
the formatting of text extraction within poppler-glib?
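Until that is answered, a caller-side stopgap that does not depend on any
poppler-glib setting is to normalize the end of each page's text after
extraction. What follows is only a minimal sketch in Python, not a poppler
API: the function name normalize_page_text and its "terminator" parameter
are hypothetical, and it assumes the platform differences are limited to
trailing line feed, carriage return, or form feed characters.

```python
def normalize_page_text(text: str, terminator: str = "\n") -> str:
    """Return page text with a consistent ending.

    Hypothetical helper, not part of poppler or its bindings.
    Trailing line feeds, carriage returns and form feeds are removed,
    then a single terminator is appended. Characters inside the page
    text are left untouched.
    """
    return text.rstrip("\n\r\f") + terminator


if __name__ == "__main__":
    # Stand-ins for strings returned by get_text on different platforms.
    pages = ["first page\n", "second page\f", "third page"]
    for page in pages:
        print(repr(normalize_page_text(page)))
```

Passing an empty terminator drops the page-ending characters entirely,
which is roughly the effect the user was hoping to get from "nopgbrk".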