<div dir="ltr">I am trying to get the same (or similar) text output from the C++ interface as the 'pdftotext' utility produces without the -layout option. However, raw_order_layout gives malformed output (no text at all for most pages):<div><br></div><div> ustring str = p->text(p->page_rect(), page::raw_order_layout); </div><div><br></div><div>An example:</div><div><br></div><div> - source: <a href="http://arxiv.org/pdf/1403.2805.pdf">http://arxiv.org/pdf/1403.2805.pdf</a></div><div> - pdftotext default output: <a href="http://pastebin.com/raw/A93xPT4j">http://pastebin.com/raw/A93xPT4j</a></div><div> - cpp with page::physical_layout: <a href="http://pastebin.com/raw/MZFpTRbD">http://pastebin.com/raw/MZFpTRbD</a></div><div> - cpp with page::raw_order_layout: <a href="http://pastebin.com/raw/n8dcsqkZ">http://pastebin.com/raw/n8dcsqkZ</a></div><div><br></div><div>The last output is clearly malformed: most of the text is missing, there are no spaces, and so on. I also get different results each time I run it, which suggests a memory bug.</div><div><br></div><div>The source code of my bindings is on GitHub: <a href="https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp">https://github.com/ropensci/pdftools/blob/master/src/bindings.cpp</a></div>
</div>