[poppler] Malformed/random output for raw_order_layout with c++ interface

Albert Astals Cid aacid at kde.org
Wed Mar 2 21:15:18 UTC 2016

On Wednesday 02 March 2016, at 11:24:09, Jeroen Ooms wrote:
> I am trying to get the same (or similar) text output from the c++ interface
> as when using the 'pdftotext' utility without the -layout option.
> However raw_order_layout gives malformed output (no text at all for most
> pages):
>   ustring str = p->text(p->page_rect(), page::raw_order_layout);
> An example:
>  - source: http://arxiv.org/pdf/1403.2805.pdf
>  - pdftotext default output: http://pastebin.com/raw/A93xPT4j
>  - cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
>  - cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ
> The last output is obviously malformed. It misses most text, has no spaces,
> etc. Also each time I run it, I get different results so it looks like
> there is a memory bug.
> The source code of my bindings is on github:
> https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp

Maybe you can have a look? The pdftotext code is pretty small, so comparing it
with the cpp frontend to find what's wrong should not be very hard.

