<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"> <head> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"> <meta name=Generator content="Microsoft Word 12 (filtered medium)"> <style>  </style>  </head> <body lang=FR link=blue vlink=purple> <div class=Section1>
<p>Hello,</p>
<p>I read a lot of PDF documents. I don't add popup annotations; I only highlight text in yellow. I want to extract the highlighted text with Python so that I can index it with Whoosh for my studies. I saw that when text is highlighted, the object created in the PDF file is:</p>
<pre>
20 0 obj
&lt;&lt;
/C [1 1 0]
/F 4
/M (D:20141107203743+01'00')
/P 7 0 R
/T (bruno)
/AP &lt;&lt;
/N 31 0 R
&gt;&gt;
/NM (38048b89-6e9f-4434-9cae2b25dfc8c8a2)
/Rect [112.707338 807.385499 164.672639 816.770264]
/Subj (Surligner)
/Subtype /Highlight
/QuadPoints [114.570002 816.770274 162.809979 816.770274 114.570002 807.385508 162.809979 807.385508]
/CreationDate (D:20141107203743+01'00')
&gt;&gt;
endobj
</pre>
<p>Unlike a classic annotation, this object has no /Contents key, and that is my problem. I have tried pdfMiner, pyPDF, PyPDF2 and now pyPoppler, but I am not very experienced and cannot find a way to extract the text I want.</p>
<p>My question:</p>
<p>Can the /QuadPoints key lead me to the highlighted text? Or can the /Rect key do this?</p>
<p>If somebody can give me some advice, I would be grateful.</p>
<p>Thanks for your patience,</p>
<p>Bruno</p>
</div> </body> </html>