<html><head></head><body><div class="ydpffcdc826yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px;"><div></div>
<div dir="ltr" data-setdir="false">FYI, I was able to find the URIs of external links with code similar to this:</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false"><div> Annot *a = annots->getAnnot(x);<br> int type = a->getType();<br> if (type == Annot::typeLink) {<br> AnnotLink *link = static_cast<AnnotLink *>(a);<br> LinkAction *action = link->getAction();<br> // A link annotation may have no action, so check before dereferencing.<br> if (action && action->getKind() == actionURI) {<br> // The kind check above guarantees this is a LinkURI.<br> auto *link_uri_action = static_cast<LinkURI*>(action);<br> // Keep the pointer const: the string is owned by the action object.<br> const char *uri = link_uri_action->getURI()->c_str();<br> }<br> }<br><br></div><div dir="ltr" data-setdir="false">In case it helps anyone else.<br></div><div dir="ltr" data-setdir="false">Shawn</div><div dir="ltr" data-setdir="false"><br></div></div><div><br></div>
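<div dir="ltr" data-setdir="false">One way to tie this back to the structure tree: in the output from my original message, the object numbers under each Link element (86 to 89) match the annotation Ids, so a lookup keyed on the object number can pair each Link element with its target. Here is a minimal sketch, assuming the annotation data has already been collected from poppler. LinkInfo, indexByObject and resolve are hypothetical names for illustration, not part of poppler:</div>
<pre>
```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the data read from one link annotation
// (for example via Annot::getId(), LinkAction::getKind(), LinkURI::getURI()).
struct LinkInfo {
    int objNum;          // annotation object number, e.g. 87
    bool isUri;          // true for a URI action, false for a GoTo action
    std::string target;  // URI text, or a description of the destination
};

// Index the annotations by object number so that a reference found in the
// structure tree can be resolved with a single lookup.
std::map<int, LinkInfo> indexByObject(const std::vector<LinkInfo> &links) {
    std::map<int, LinkInfo> index;
    for (const LinkInfo &l : links) {
        index[l.objNum] = l;
    }
    return index;
}

// Resolve one "Object N 0" reference found under a Link structure element.
// Returns "(unresolved)" when no annotation with that number was collected.
std::string resolve(const std::map<int, LinkInfo> &index, int objNum) {
    auto it = index.find(objNum);
    if (it == index.end()) {
        return "(unresolved)";
    }
    return (it->second.isUri ? "URI: " : "GoTo: ") + it->second.target;
}
```
</pre>
<div dir="ltr" data-setdir="false">To use it, fill a vector of LinkInfo while looping over the page annotations, build the index once, then call resolve() with the object number printed for each Link element during the structure tree traversal.</div>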
</div><div id="yahoo_quoted_2221579686" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Wednesday, October 9, 2019, 11:10:21 AM PDT, Shawn McMurdo <shawn_mcmurdo@yahoo.com> wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv7457390914"><div><div class="yiv7457390914yahoo-style-wrap" style="font-family:Helvetica Neue, Helvetica, Arial, sans-serif;font-size:13px;"><div dir="ltr"><div>Hi,<br>I'm new to poppler and PDF internals, so forgive me if this is obvious.<br>I am trying to find all of the relevant information about links within a tagged PDF that either go to a place in the same document or go to a URL.<br><br>I tried traversing the tree of StructElements starting from the StructTreeRoot like this:<br><br> Catalog *catalog = doc->getCatalog();<br> StructTreeRoot *root = catalog->getStructTreeRoot();<br> unsigned numChildren = root->getNumChildren();<br> for (unsigned i = 0; i < numChildren; i++) {<br> StructElement *child = root->getChild(i);<br> printChild(child); // recursive<br> }<br><br>Partial output from a version 1.4 PDF looks like:<br><br> Document<br> L (block)<br> LI (block)<br> LBody (block)<br> P (block):<br> Link (inline):<br> Object 86 0<br> LI (block)<br> LBody (block)<br> P (block):<br> Link (inline):<br> Object 89 0<br> L (block)<br> LI (block)<br> LBody (block)<br> P (block):<br> Link (inline):<br> Object 88 0<br> LI (block)<br> LBody (block)<br> P (block):<br> Link (inline):<br> Object 87 0<br><br>This finds the links, each of which seems to have an object reference number (87, for example).<br>How can I find out the URI or the destination location in the document for such a link?<br><br>I have tried code similar to these 3 blocks:<br>(I don't really understand the difference between the first two, as the naming is confusing.)<br><br> // 1. DestNameTreeDest<br> int numDests = doc->getCatalog()->numDestNameTree();<br> for (int i = 0; i < numDests; i++) {<br> LinkDest *dest = doc->getCatalog()->getDestNameTreeDest(i);<br> // printf<br> }<br><br> // 2. DestsDest<br> numDests = doc->getCatalog()->numDests();<br> for (int i = 0; i < numDests; i++) {<br> LinkDest *dest = doc->getCatalog()->getDestsDest(i);<br> // printf<br> }<br><br> // 3. 
Annot<br> for (int i = firstPage; i <= lastPage; i++) {<br> Page *p = doc->getPage(i);<br> Annots *annots = p->getAnnots();<br> int numAnnots = annots->getNumAnnots();<br> for (int x = 0; x < numAnnots; x++) {<br> Annot *a = annots->getAnnot(x);<br> int type = a->getType();<br> if (type == Annot::typeLink) {<br> AnnotLink *link = static_cast<AnnotLink *>(a);<br> int kind = link->getAction()->getKind();<br> if (kind == actionGoTo) {<br> // GoTo (kind 0)<br> } else if (kind == actionURI) {<br> // URI (kind 3)<br> }<br> }<br> int id = a->getId();<br> const GooString *name = a->getName();<br> const GooString *contents = a->getContents();<br> // printf<br> }<br> }<br><br>When I run the code on a version 1.4 PDF containing both internal links and web links the first 2 blocks don't seem to find anything.<br>The last block finds the following:<br><br> Annot 0 Type 2 (Link) Kind 0 (GoTo) Id: 86 Contents: <br> Annot 1 Type 2 (Link) Kind 3 (URI) Id: 87 Contents: <br> Annot 2 Type 2 (Link) Kind 3 (URI) Id: 88 Contents: <br> Annot 3 Type 2 (Link) Kind 0 (GoTo) Id: 89 Contents: <br><br>When I run the code on a different version 1.5 PDF containing both internal and web links I see the following:<br><br>Page Destination Name<br> 1 [ XYZ 346 209 null ] "EN-05-10531.indd:Application Number:1832"<br> 1 [ XYZ 343 593 null ] "EN-05-10531.indd:Welcome to the Social Security Benefit Application:1830"<br>---vvv--- Begin Page 1 Annots ---vvv---<br>Printing 5 Annots.<br>Annot 0 Type 2 (Link) Kind 3 (URI) Id: 177 Contents: <br>Annot 1 Type 2 (Link) Kind 3 (URI) Id: 178 Contents: <br>Annot 2 Type 2 (Link) Kind 3 (URI) Id: 179 Contents: <br>Annot 3 Type 2 (Link) Kind 3 (URI) Id: 180 Contents: <br>Annot 4 Type 2 (Link) Kind 3 (URI) Id: 181 Contents: <br>---^^^--- End Page 1 Annots ---^^^---<br> 2 [ XYZ 311 256 null ] "EN-05-10531.indd:Finishing Your Application:1835"<br> 2 [ XYZ 29 522 null ] "EN-05-10531.indd:Questions About Your Benefits:1834"<br> 2 [ XYZ 311 707 null ] "EN-05-10531.indd:Questions About Your 
Work:1833"<br>---vvv--- Begin Page 2 Annots ---vvv---<br>Printing 0 Annots.<br>---^^^--- End Page 2 Annots ---^^^---<br><br>This did find the XYZ destinations for the internal links, but not the URLs of the web links.<br><br>Can anyone help point me in the right direction?<br>Thanks.<br>Shawn<br><br></div><div><br></div></div></div></div></div></div>
</div>
</div></body></html>