[poppler] Malformed/random output for raw_order_layout with c++ interface

Albert Astals Cid aacid at kde.org
Wed Mar 2 21:15:18 UTC 2016


On Wednesday 02 March 2016, at 11:24:09, Jeroen Ooms wrote:
> I am trying to get the same (or similar) text output from the c++ interface
> as when using the 'pdftotext' utility without the -layout option.
> However raw_order_layout gives malformed output (no text at all for most
> pages):
> 
>   ustring str = p->text(p->page_rect(), page::raw_order_layout);
> 
> An example:
> 
>  - source: http://arxiv.org/pdf/1403.2805.pdf
>  - pdftotext default output: http://pastebin.com/raw/A93xPT4j
>  - cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
>  - cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ
> 
> The last output is obviously malformed. It is missing most of the text,
> has no spaces, etc. Also, I get different results each time I run it, so
> it looks like there is a memory bug.
> 
> The source code of my bindings is on github:
> https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp

Maybe you could have a look yourself? The code of pdftotext is pretty small,
so comparing it against the cpp frontend to find what's wrong should not be
very hard.
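
Since the report says the output changes from run to run, a small check like
the one below may help confirm that before digging into the frontend. It is
only a sketch: first_divergent_run is a hypothetical helper, not part of
poppler, and it takes the extraction step as a callback so that it does not
depend on any particular library. With the cpp frontend, the callback would
presumably wrap the p->text(p->page_rect(), page::raw_order_layout) call
from the report and convert the result to a std::string.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper (not part of poppler): calls `extract` `runs` times
// and returns the index of the first run whose output differs from run 0,
// or -1 if every run produced identical bytes.
int first_divergent_run(const std::function<std::string()>& extract, int runs) {
    assert(runs >= 1);  // at least one run is needed as the reference
    const std::string reference = extract();
    for (int i = 1; i < runs; ++i) {
        if (extract() != reference) {
            return i;  // output changed between runs: not deterministic
        }
    }
    return -1;
}
```

A result other than -1 would support the memory-bug suspicion (for example
a read of uninitialized data). A result of -1 does not rule it out, because
repeated calls within one process can happen to see the same memory
contents.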

Cheers,
  Albert

