[poppler] Confusion about poppler_page_get_text and poppler_page_get_text_layout

Rupert Swarbrick rswarbrick at gmail.com
Mon Dec 19 13:03:58 PST 2011

Hi all,

I'm messing around with a lisp FFI binding to poppler (via the glib
interface) and have bumped into a strange situation.

If I've understood the documentation correctly, poppler_page_get_text
and poppler_page_get_text_layout should give me a string and an array of
rectangles, respectively. The n'th rectangle should be the position on
the page of the n'th character of the string.

If I'm right there, I'm confused. For the first PDF with which I've
tried this, I get a string of length 1541 and an array of rectangles of
length 1477. This is... mystifying!

I presume that I've misunderstood what's supposed to happen (since I
can't imagine that Evince would work on this system if I were
right!). Can anyone clear up what I'm getting wrong?


PS If I have understood the documentation correctly, then I've probably
   made a programming error, but I can't really see how I could end up
   with something "almost right" like this, so I thought I should ask a
   human first!
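
One thing that may be worth ruling out (an assumption on my part, not
something I have confirmed): the string handed back through the glib
interface is UTF-8, so a length measured in bytes can be larger than the
number of characters. If the rectangles are per character, the two
counts would then disagree by the number of continuation bytes. Below is
a minimal sketch in plain C of the comparison; utf8_char_count is a
hypothetical helper, not part of poppler, and it assumes valid UTF-8
input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a poppler function): count the Unicode code
 * points in a NUL-terminated string, assuming it is valid UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped, so
 * only the first byte of each encoded character is counted. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Number of bytes that are not the start of a character. If the text
 * contains any non-ASCII characters this is non-zero, and strlen() will
 * overshoot the character count by exactly this amount. */
static size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_char_count(s);
}
```

If this guess is relevant, the value to compare against the number of
rectangles would be utf8_char_count(text) rather than strlen(text); GLib's
g_utf8_strlen(text, -1) should give the same count. It may of course turn
out that the mismatch has a different cause entirely.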