[poppler] configure text extraction within poppler-glib (WAS: disable form feeds using parameter "nopgbrk" in GlobalParams)

Jeremy Volkening jdv at base2bio.com
Wed Dec 7 17:14:11 UTC 2016


I'm the current maintainer of the Perl poppler bindings. A user has been 
trying to figure out how to configure the output of 
PopplerPage::get_text in poppler-glib. The Perl bindings are just a very 
thin layer over poppler-glib using GObject introspection, so any 
solution would need to be found in the poppler-glib library itself.

He has focused on the GlobalParams class, which is part of core 
poppler and is initialized within the PopplerDocument class of 
poppler-glib, but as far as I can see poppler-glib exposes no public 
interface to it. He is concerned because the "get_text" method (which, 
again, in Perl is just a direct binding to the poppler-glib method) 
behaves differently on different platforms with regard to line feed 
characters at the end of pages.

The general question, I believe, is this: is there any way to control 
the formatting of text extraction within poppler-glib?
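
Absent such a control, one caller-side workaround would be to normalize 
the end of each page's text after extraction. The following is only a 
minimal sketch in Python, not anything provided by poppler-glib: the 
helper name normalize_page_text and the sample strings are hypothetical, 
and the samples stand in for what get_text might return on different 
platforms.

```python
def normalize_page_text(text: str) -> str:
    """Strip trailing line feeds, carriage returns, and form feeds.

    Hypothetical helper, not part of poppler-glib or its bindings.
    """
    return text.rstrip("\n\r\f")


# Invented sample strings standing in for the text of one page as
# different platforms might return it (assumption, for illustration).
samples = ["Hello world\n", "Hello world", "Hello world\n\f"]

# After normalization, all variants compare equal.
normalized = [normalize_page_text(s) for s in samples]
```

This only hides the difference on the caller's side; it does not 
explain why the platforms differ, which is still the open question.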

