[poppler] Extract title from pdf file.

Alec Taylor alec.taylor6 at gmail.com
Thu Nov 10 13:57:54 PST 2011


As previously mentioned, I am adding semantic and logical structuring to
the poppler core.

My plan is to work out which content fits into which category by
post-processing the XML. Any suggestions on how to reverse [or post?!]
engineer this XML back into the PDF would be appreciated.

In a few days I will have very accurate XML generated with
<header></header>, <footer></footer> and table of contents tags.
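
To illustrate the kind of post-processing meant here, a minimal Python
sketch follows. It is not poppler code: the <doc>/<page>/<line> layout, the
function name tag_headers_and_footers and the min_share threshold are all
assumptions made up for the example. The idea is simply that a first or last
line which repeats on most pages (ignoring digits) is probably a header or
footer.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input layout; the real XML may look quite different.
SAMPLE = """<doc>
<page><line>My Book</line><line>Intro text</line><line>Page 1</line></page>
<page><line>My Book</line><line>More text</line><line>Page 2</line></page>
<page><line>My Book</line><line>Final text</line><line>Page 3</line></page>
</doc>"""


def _key(line):
    # Collapse digits so "Page 1" and "Page 2" count as the same footer.
    return re.sub(r"\d+", "#", (line.text or "").strip())


def tag_headers_and_footers(root, min_share=0.6):
    """Rename repeated first/last <line> elements to <header>/<footer>."""
    pages = [page.findall("line") for page in root.iter("page")]
    # Pages with fewer than two lines cannot have both a header and a footer.
    pages = [lines for lines in pages if len(lines) >= 2]
    if not pages:
        return root

    tops = Counter(_key(lines[0]) for lines in pages)
    bottoms = Counter(_key(lines[-1]) for lines in pages)
    threshold = min_share * len(pages)

    for lines in pages:
        top, bottom = _key(lines[0]), _key(lines[-1])
        if top and tops[top] >= threshold:
            lines[0].tag = "header"
        if bottom and bottoms[bottom] >= threshold:
            lines[-1].tag = "footer"
    return root


if __name__ == "__main__":
    tagged = tag_headers_and_footers(ET.fromstring(SAMPLE))
    print(ET.tostring(tagged, encoding="unicode"))
```

A position-based test (y coordinate near the page edge) would likely be more
robust than "first/last line", but that depends on what the XML carries.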

This will involve "pushing" the actual "printed" page numbers, adding a
hyperlink to each ToC entry, and partitioning the page structure as
far as the 1.3 standard allows.
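
As a rough sketch of the ToC linking step, assuming the XML already records
the printed page label on each page: the "printed" and "page" attributes, the
<toc>/<entry> elements, the "#page-N" link format and the function name
link_toc are all hypothetical and chosen only for this example. Each ToC
entry's printed page number is mapped to the physical page it appears on.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: printed labels ("i", "ii", "1") differ from the
# physical page order (1, 2, 3).
SAMPLE = """<doc>
<toc>
  <entry page="i">Preface</entry>
  <entry page="1">Introduction</entry>
  <entry page="9">Not in this file</entry>
</toc>
<page printed="i"/>
<page printed="ii"/>
<page printed="1"/>
</doc>"""


def link_toc(root):
    """Give every page an id and point each ToC entry at its page."""
    physical = {}
    for index, page in enumerate(root.iter("page"), start=1):
        page.set("id", f"page-{index}")
        label = page.get("printed")
        if label is not None:
            # Keep the first page carrying a given printed label.
            physical.setdefault(label, index)

    for entry in root.iter("entry"):
        target = physical.get(entry.get("page"))
        if target is not None:
            entry.set("href", f"#page-{target}")
        # Entries whose printed page is unknown are left without a link.
    return root


if __name__ == "__main__":
    linked = link_toc(ET.fromstring(SAMPLE))
    for entry in linked.iter("entry"):
        print(entry.text, entry.get("href"))
```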

My code is extremely modular, neat & efficient, and I have written an OO
API for it. So it should be easily extendable with author, title, publisher,
year and section title extraction capabilities.
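
To show what such an extension point could look like, here is a small Python
sketch. It does not reflect the actual API: the FieldExtractor and
TitleExtractor classes, the extract_metadata helper, the "size" attribute and
the "largest text on the first page is the title" heuristic are all
assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod

# Hypothetical input: each <line> carries its font size.
SAMPLE = """<doc>
<page>
  <line size="10">Draft</line>
  <line size="24">A Study of PDFs</line>
  <line size="12">A. Author</line>
</page>
</doc>"""


class FieldExtractor(ABC):
    """One pluggable extractor per metadata field."""

    name = ""

    @abstractmethod
    def extract(self, root):
        """Return the field value, or None if it cannot be found."""


class TitleExtractor(FieldExtractor):
    name = "title"

    def extract(self, root):
        first_page = root.find("page")
        if first_page is None:
            return None
        lines = [l for l in first_page.iter("line") if (l.text or "").strip()]
        if not lines:
            return None
        # Heuristic: the largest text on the first page is the title.
        biggest = max(lines, key=lambda l: float(l.get("size", "0")))
        return biggest.text.strip()


def extract_metadata(root, extractors):
    """Run every extractor and collect the results by field name."""
    return {e.name: e.extract(root) for e in extractors}


if __name__ == "__main__":
    print(extract_metadata(ET.fromstring(SAMPLE), [TitleExtractor()]))
```

Author, publisher or year extractors would then be further FieldExtractor
subclasses passed in the same list.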

