[poppler] page.text() does not take page orientation into account?

Jeroen Ooms jeroen.ooms at stat.ucla.edu
Wed Apr 13 23:57:14 UTC 2016


On Tue, Mar 8, 2016 at 2:34 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
> When extracting text from a landscape pdf file using the cpp
> interface, text at the far right of the page does not get extracted. I
> think the problem is that page.text() always assumes portrait
> orientation and hence underestimates the width of the page:
>
>   p->text()
>   p->text(p->page_rect())
>
> Is this expected? What is the best way to extract all text from the
> page, irrespective of size and orientation?
>
> An example landscape pdf is here:
> https://github.com/ropensci/pdftools/files/161587/waurika_news_democrat.pdf

I would still be very interested in a fix or workaround for this
problem. I tried looking through the source but I don't understand it
well enough to figure out what is going wrong here. All help would be
really appreciated.
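
One possible stopgap, sketched below, is to request text from a rectangle
that is large enough whichever way the page is turned: a square whose side
is the longer of the reported width and height. This assumes the missing
text is simply clipped by a too-narrow rectangle, which I have not
confirmed. Rect and covering_rect are hypothetical names used only for
illustration; Rect stands in for poppler::rectf so that the snippet
compiles without the poppler headers.

```cpp
#include <algorithm>

// Hypothetical stand-in for poppler::rectf (x, y, width, height).
struct Rect {
    double x;
    double y;
    double width;
    double height;
};

// Return a square anchored at the box origin whose side is the longer
// of the two page dimensions. It covers the page area whether or not
// width and height were swapped for a rotated page.
// Assumption: asking for a region larger than the page is harmless.
Rect covering_rect(const Rect &box) {
    const double side = std::max(box.width, box.height);
    return Rect{box.x, box.y, side, side};
}
```

With the cpp interface the result would presumably be passed on as
p->text(poppler::rectf(r.x, r.y, r.width, r.height)), where r is
covering_rect applied to the values from p->page_rect(). I have not
tested this against the example file above.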

