<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
margin-bottom:5.95pt;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;}
@page Section1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=FR link=blue vlink=purple>
<div class=Section1>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>Hello,<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>I read a lot of
PDF documents. I don't add popup annotations; I only highlight text in yellow.
I would like to extract this highlighted text with Python so that I can index it with Whoosh
for my studies. I saw that when text is highlighted, the object created
in the PDF file is: <o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>20 0 obj<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>&lt;&lt;<o:p> </o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/C [1 1 0]<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/F 4<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/M
(D:20141107203743+01'00')<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/P 7 0 R<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/T (bruno)<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/AP
&lt;&lt;<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/N 31 0 R<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>>><o:p> </o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/NM
(38048b89-6e9f-4434-9cae2b25dfc8c8a2)<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/Rect
[112.707338 807.385499 164.672639 816.770264]<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/Subj (Surligner)<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/Subtype
/Highlight<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/QuadPoints
[114.570002 816.770274 162.809979 816.770274 114.570002 807.385508 162.809979
807.385508]<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>/CreationDate
(D:20141107203743+01'00')<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>>><o:p> </o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>endobj<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>Unlike an
ordinary annotation, this object has no /Contents key, and that
is my problem. I have tried pdfMiner, pyPDF, PyPDF2 and now pyPoppler, but
I am not very experienced and cannot find a way to extract the text I want.<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>My
question:<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>Can the
/QuadPoints key lead me to the highlighted text? Or can the /Rect key
do this?<o:p></o:p></span></p>
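<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>To show what I mean, here is a minimal sketch in plain Python. It assumes that /QuadPoints holds groups of eight numbers (the x, y corners of each highlighted quadrilateral) in page coordinates, and that /Rect is only the overall rectangle of the annotation. The helper names quad_boxes and box_contains are my own, not from any library, and the test point is made up. As far as I understand, the highlighted words themselves are not stored in the object, so these boxes would still have to be compared with text positions taken from the page by some other tool.<o:p></o:p></span></p>
<pre>
```python
# Sketch: turn a /QuadPoints array into one bounding box per quadrilateral.
# Assumption: the array holds groups of 8 numbers (x1 y1 x2 y2 x3 y3 x4 y4).
# quad_boxes and box_contains are hypothetical helper names, not library calls.


def quad_boxes(quad_points):
    """Return a list of (x_min, y_min, x_max, y_max), one per quadrilateral."""
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    boxes = []
    for start in range(0, len(quad_points), 8):
        quad = quad_points[start:start + 8]
        xs = quad[0::2]  # every even index is an x coordinate
        ys = quad[1::2]  # every odd index is a y coordinate
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def box_contains(box, x, y):
    """True if the point (x, y) lies inside the box, edges included."""
    x_min, y_min, x_max, y_max = box
    # Clamping a value into a range leaves it unchanged only if it was inside.
    inside_x = min(max(x, x_min), x_max) == x
    inside_y = min(max(y, y_min), y_max) == y
    return inside_x and inside_y


# The /QuadPoints values copied from the object shown in this message.
highlight = [114.570002, 816.770274, 162.809979, 816.770274,
             114.570002, 807.385508, 162.809979, 807.385508]

boxes = quad_boxes(highlight)
print(boxes)

# A made-up point, standing in for the position of a word on the page.
print(box_contains(boxes[0], 120.0, 810.0))
```
</pre>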
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>If somebody
can give me some advice, I would be grateful.<o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>Thanks for
your patience <o:p></o:p></span></p>
<p style='margin-bottom:0cm;margin-bottom:.0001pt'><span lang=OC-FR>Bruno<o:p></o:p></span></p>
</div>
</body>
</html>