[poppler] Malformed/random output for raw_order_layout with c++ interface
Albert Astals Cid
aacid at kde.org
Wed Mar 2 21:15:18 UTC 2016
On Wednesday 02 March 2016, at 11:24:09, Jeroen Ooms wrote:
> I am trying to get the same (or similar) text output from the c++ interface
> as when using the 'pdftotext' utility without the -layout option.
> However raw_order_layout gives malformed output (no text at all for most
> pages):
>
> ustring str = p->text(p->page_rect(), page::raw_order_layout);
>
> An example:
>
> - source: http://arxiv.org/pdf/1403.2805.pdf
> - pdftotext default output: http://pastebin.com/raw/A93xPT4j
> - cpp with page::physical_layout: http://pastebin.com/raw/MZFpTRbD
> - cpp with page::raw_order_layout: http://pastebin.com/raw/n8dcsqkZ
>
> The last output is obviously malformed. It misses most text, has no spaces,
> etc. Also each time I run it, I get different results so it looks like
> there is a memory bug.
>
> The source code of my bindings is on github:
> https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp
Maybe you can have a look? The pdftotext code is pretty small, so comparing it
with the cpp frontend to see what's wrong should not be very hard.
Cheers,
Albert