[poppler] Extract title from pdf file.

Peter A. Kerzum kerzum at yandex-team.ru
Wed Nov 9 06:51:25 PST 2011


Hi!

We use an approach based on character properties to extract a meaningful title
from the document text. Metadata usually stores the filename in the title field.
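
A minimal sketch of one heuristic of this kind, assuming the text of the first
page has already been extracted as spans that carry a font size and a vertical
position. The Span type and the guess_title function are hypothetical names
used only for illustration, and choosing the largest font size is just one
possible character property; it is not necessarily the approach described
above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical text span taken from the first page of a document."""
    text: str
    font_size: float  # font size in points
    y: float          # distance from the top of the page (smaller = higher)


def guess_title(spans: List[Span], tolerance: float = 0.5) -> str:
    """Guess a title by joining the spans set in the largest font size.

    Assumption: the title is the most prominent text on the first page.
    Spans within `tolerance` points of the largest size count as title text.
    """
    candidates = [s for s in spans if s.text.strip()]
    if not candidates:
        return ""
    largest = max(s.font_size for s in candidates)
    title_spans = [s for s in candidates if largest - s.font_size <= tolerance]
    # Read the selected spans top to bottom so multi-line titles stay in order.
    title_spans.sort(key=lambda s: s.y)
    return " ".join(s.text.strip() for s in title_spans)


if __name__ == "__main__":
    page = [
        Span("Proceedings 2011", 9.0, 20.0),
        Span("Extracting Titles", 18.0, 80.0),
        Span("from PDF Files", 18.0, 102.0),
        Span("Body text starts here.", 10.0, 150.0),
    ]
    print(guess_title(page))
```

A heuristic like this can fail on scanned books where the OCR layer reports
uniform or unreliable font sizes, so the result is better treated as a
candidate title than as a fact about the document.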

--
Peter

On Wednesday 09 November 2011 16:16:14 Alec Taylor wrote:
> On Wed, Nov 9, 2011 at 10:37 PM, Albert Astals Cid <aacid at kde.org> wrote:
> > On Wednesday, 9 November 2011, Alec Taylor wrote:
> >> Incorrect, all getDocInfo tells you is what the meta info says, it
> >> doesn't analyse the actual document, whereas my pdftopdf will update
> >> the metadata with the appropriate info after PDF analysis
> > 
> > Please do not top post, makes reading e-mail incredibly hard.
> > 
> > And no it is not incorrect, if the metadata does not have a title, then
> > the document does not have a title as defined per the spec.
> > 
> > Albert
> 
> But maybe the document doesn't have a title, because it was grabbed
> from scanning the book, then OCRing it. So what I will facilitate is
> the generation of proper metadata (+ more) from a current PDF lacking
> such.
> 
> So if the document does have a title, my pdftopdf tool will find it,
> and add it to the metadata.
> 
> I will contribute pdftopdf to poppler.


