[poppler] Python - PDF - highlighted text (not annotation popup) - How to extract its text

bruno gallart bruno.gallart at orange.fr
Sun Nov 9 08:41:49 PST 2014


Good day Albert, and thank you for your reply,
(I am not Catalan but Languedocian, from Béziers, and my Catalan is very distant)

But I have read the Poppler API and I do not see the object and the
method for this (/Rect ---> extract text at x,y coordinates). My
question may be a boring one, but do you know which object I must use
to do this extraction?

Thanks a lot,

Bruno

On 09/11/2014 16:48, Albert Astals Cid wrote:
> On Sunday, 9 November 2014, at 10:38:08, bruno gallart wrote:
>> Hello,
>>
>> I read many PDF texts. I don't use popup annotations; I only highlight
>> text in yellow. I want to extract this text (with Python) so that I can
>> index it with Whoosh afterwards for my studies. I saw that when text is
>> highlighted, the object created in the PDF file is:
>>
>> 20 0 obj
>> <<
>> /C [1 1 0]
>> /F 4
>> /M (D:20141107203743+01'00')
>> /P 7 0 R
>> /T (bruno)
>> /AP <<
>> /N 31 0 R
>> >>
>> /NM (38048b89-6e9f-4434-9cae2b25dfc8c8a2)
>> /Rect [112.707338 807.385499 164.672639 816.770264]
>> /Subj (Surligner)
>> /Subtype /Highlight
>> /QuadPoints [114.570002 816.770274 162.809979 816.770274 114.570002
>> 807.385508 162.809979 807.385508]
>> /CreationDate (D:20141107203743+01'00')
>> >>
>> endobj
>>
>> Unlike a classical annotation, here there is no "/Contents" key, and that
>> is my problem. I have tried pdfMiner, pyPDF, PyPDF2 and now pyPoppler,
>> but I am not very good and cannot find the way to extract the line I
>> want.
>>
>> My question:
>>
>> Can the /QuadPoints key give me a link to the highlighted text? Or can the
>> /Rect key do this?
> They are both "the same"; it seems that in this case Rect has a bit more
> "padding", but they depict the same area.
>
> Yes, you should be able to use that rect to get the text in there.
>
> Cheers,
>    Albert
>
>> If somebody can give me some advice I will be happy.
>>
>> Thanks for your patience
>>
>> Bruno
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
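
A minimal sketch of the coordinate handling discussed above, in plain Python with no PDF library. It turns the flat /QuadPoints array into one bounding box per highlighted quadrilateral and, optionally, flips the y axis. The helper names (quads_to_rects, flip_y) and the page height of 842 points are assumptions made for illustration; they do not come from the thread or from poppler. PDF user space has its origin at the bottom-left of the page, while many text-extraction APIs expect a top-left origin, so the flip may or may not be needed depending on the binding used. The actual "get text in this rectangle" call is library-specific and is left out here.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2
Rect = Tuple[float, float, float, float]


def quads_to_rects(quad_points: Sequence[float]) -> List[Rect]:
    """Return one bounding box per quadrilateral in a flat /QuadPoints array.

    Each quadrilateral is stored as 8 numbers: x1 y1 x2 y2 x3 y3 x4 y4.
    A highlight spanning several lines holds several quadrilaterals.
    """
    if len(quad_points) == 0 or len(quad_points) % 8 != 0:
        raise ValueError("/QuadPoints must hold 8 numbers per quadrilateral")
    rects = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]       # the four x coordinates
        ys = quad_points[i + 1:i + 8:2]   # the four y coordinates
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def flip_y(rect: Rect, page_height: float) -> Rect:
    """Convert a bottom-left-origin rectangle to a top-left-origin one."""
    x1, y1, x2, y2 = rect
    return (x1, page_height - y2, x2, page_height - y1)


if __name__ == "__main__":
    # Values copied from the annotation object quoted in this message.
    quad_points = [114.570002, 816.770274, 162.809979, 816.770274,
                   114.570002, 807.385508, 162.809979, 807.385508]
    page_height = 842.0  # assumption: roughly an A4 page, in points
    for rect in quads_to_rects(quad_points):
        print("PDF coordinates:   ", rect)
        print("top-left origin:   ", flip_y(rect, page_height))
```

The resulting rectangles can then be handed to whatever area-based text call the chosen library provides. Using one box per quadrilateral rather than the single /Rect should avoid picking up unrelated text when a highlight covers several lines of different widths.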
