[poppler] Extracting word and image position from PDF

Albert Astals Cid aacid at kde.org
Wed Feb 15 13:53:02 PST 2012


On Wednesday, 15 February 2012, at 22:59:38, Dan Filimon wrote:
> Hi everyone!
> 
> I've been looking for ways to extract image and word positions (also
> how words form sentences and paragraphs would be useful) from a PDF.
> I'd like to get maps of words/images to rectangles (position, width,
> height).
> 
> Also, it would really be great if I could get the positions and
> hierarchy for every object on a page (sorry about my vague terminology
> when it comes to PDF, I've never worked with it). I tried looking at
> the code but there don't seem to be many comments and I can't find any
> documentation...
> 
> Could you please point me in the right direction?

Poppler::Page::textList() seems to be what you want for the words: it
returns each word on the page together with its bounding rectangle.

http://people.freedesktop.org/~aacid/docs/qt4/classPoppler_1_1Page.html#a75dea3bf58f339f224239b757b4c1bb2
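
To give an idea of what you can do with that list, here is a rough sketch
of grouping word rectangles into lines. It is plain C++ and does not link
against poppler: WordBox is a hypothetical stand-in that you would fill from
each TextBox in the list (as far as I recall, text() and boundingBox() give
you the string and the rectangle). The "same line" rule is only a simple
heuristic I am assuming here, not something poppler does for you.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of Poppler::Page::textList():
// the word's text plus its bounding rectangle (position, width, height).
struct WordBox {
    std::string text;
    double x, y, width, height;
};

// Group words into lines. Heuristic (an assumption, not poppler behaviour):
// a word belongs to the current line if its vertical centre is within half
// the height of the line's first word; otherwise it starts a new line.
std::vector<std::vector<WordBox>> groupIntoLines(std::vector<WordBox> words) {
    auto centreY = [](const WordBox &w) { return w.y + w.height / 2.0; };

    // Order words from top to bottom by vertical centre.
    std::sort(words.begin(), words.end(),
              [&](const WordBox &a, const WordBox &b) {
                  return centreY(a) < centreY(b);
              });

    std::vector<std::vector<WordBox>> lines;
    for (const WordBox &w : words) {
        if (lines.empty() ||
            centreY(w) - centreY(lines.back().front()) >
                lines.back().front().height / 2.0) {
            lines.emplace_back();  // start a new line
        }
        lines.back().push_back(w);
    }

    // Within each line, order words from left to right.
    for (auto &line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const WordBox &a, const WordBox &b) { return a.x < b.x; });
    }
    return lines;
}
```

Calling groupIntoLines() on the boxes of one page gives one vector per
line, each sorted left to right. The half-height threshold is a guess and
will probably need tuning for documents with tight line spacing.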

Albert

> 
> Thanks a lot,
> Dan
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

