[poppler] Malformed/random output for raw_order_layout with c++ interface

Jeroen Ooms jeroen.ooms at stat.ucla.edu
Wed Mar 2 10:24:09 UTC 2016


I am trying to get the same (or similar) text output from the C++ interface
as from the 'pdftotext' utility without the -layout option.
However, raw_order_layout gives malformed output (no text at all for most
pages):

  ustring str = p->text(p->page_rect(), page::raw_order_layout);

An example:

 - source: http://arxiv.org/pdf/1403.2805.pdf
 - pdftotext default output: http://pastebin.com/raw/A93xPT4j
 - cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
 - cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ

The last output is obviously malformed: it misses most of the text, has no
spaces, etc. Also, each time I run it I get different results, so it looks
like there is a memory bug.
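To put numbers on "misses most text, has no spaces" and on the run-to-run differences, a small harness along these lines could help. It is only a minimal standard-C++ sketch: `TextStats`, `text_stats` and `is_deterministic` are hypothetical helpers, not part of poppler, and the extractor passed in is assumed to wrap the `text()` call shown above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: crude shape statistics for one page of extracted text.
struct TextStats {
    std::size_t non_space_chars = 0;  // characters that are not whitespace
    std::size_t spaces = 0;           // ' ' characters
    std::size_t lines = 0;            // '\n' characters
};

TextStats text_stats(const std::string& text) {
    TextStats st;
    for (unsigned char c : text) {
        if (c == ' ') {
            ++st.spaces;
        } else if (c == '\n') {
            ++st.lines;
        } else if (!std::isspace(c)) {
            ++st.non_space_chars;
        }
    }
    return st;
}

// Hypothetical helper: calls `extract` `runs` times and reports whether every
// call returned exactly the same string as the first one. A stable extractor
// should always return true; a false result hints at uninitialized or
// corrupted memory somewhere in the extraction path.
bool is_deterministic(const std::function<std::string()>& extract, int runs) {
    assert(runs > 0 && "need at least one run");
    const std::string first = extract();
    for (int i = 1; i < runs; ++i) {
        if (extract() != first) {
            return false;
        }
    }
    return true;
}
```

Assuming `p` is the `poppler::page*` from the snippet above, one could call `is_deterministic([&] { auto bytes = p->text(p->page_rect(), page::raw_order_layout).to_utf8(); return std::string(bytes.begin(), bytes.end()); }, 10)`, and compare `text_stats` of the `physical_layout` and `raw_order_layout` strings for the same page.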

The source code of my bindings is on github:
https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp