[poppler] Extract title from pdf file.

Wed Nov 9 06:53:27 PST 2011

Hi Peter,

Describe your method!

Cheers,

Alec Taylor

On Thu, Nov 10, 2011 at 1:51 AM, Peter A. Kerzum <kerzum at yandex-team.ru> wrote:
> Hi!
>
> We use an approach based on character properties to extract a meaningful title
> from the document text. The metadata usually stores the filename in the title field.
>
> --
> Peter
>
> On Wednesday 09 November 2011 16:16:14 Alec Taylor wrote:
>> On Wed, Nov 9, 2011 at 10:37 PM, Albert Astals Cid <aacid at kde.org> wrote:
>> > On Wednesday, 9 November 2011, Alec Taylor wrote:
>> >> Incorrect, all getDocInfo tells you is what the meta info says, it
>> >> doesn't analyse the actual document, whereas my pdftopdf will update
>> >> the metadata with the appropriate info after PDF analysis
>> >
>> > Please do not top post, makes reading e-mail incredibly hard.
>> >
>> > And no it is not incorrect, if the metadata does not have a title, then
>> > the document does not have a title as defined per the spec.
>> >
>> > Albert
>>
>> But maybe the document doesn't have a title, because it was grabbed
>> from scanning the book, then OCRing it. So what I will facilitate is
>> the generation of proper metadata (+ more) from a current PDF lacking
>> such.
>>
>> So if the document does have a title, my pdftopdf tool will find it,
>> and add it to the metadata.
>>
>> I will contribute pdftopdf to poppler.
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
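
For illustration only, a title heuristic based on character properties, of the general kind Peter mentions above, could look roughly like the sketch below. This is not Peter's method, which is not described in this thread, and it is not part of poppler. It assumes the text of the document has already been extracted into spans that carry a font size and a page number; the names Span, guess_title and min_chars are hypothetical, and how the spans are obtained is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """A run of extracted text (hypothetical structure, not a poppler type)."""
    text: str
    font_size: float
    page: int = 1


def guess_title(spans: List[Span], min_chars: int = 4) -> Optional[str]:
    """Guess a title as the largest-font text on the first page.

    Spans shorter than min_chars are ignored so that stray symbols or
    page numbers set in a large font are not picked up.
    """
    candidates = [
        s for s in spans
        if s.page == 1 and len(s.text.strip()) >= min_chars
    ]
    if not candidates:
        return None

    largest = max(s.font_size for s in candidates)

    # Keep every first-page span whose size is within half a point of the
    # largest, in their original (reading) order, so multi-line titles
    # are joined into one string.
    parts = [
        s.text.strip() for s in candidates
        if abs(s.font_size - largest) < 0.5
    ]
    return " ".join(parts)


if __name__ == "__main__":
    example = [
        Span("Proceedings 2011", 9.0),
        Span("Extracting Titles", 18.0),
        Span("from PDF Files", 18.0),
        Span("Abstract text here", 10.0),
        Span("Big Heading", 24.0, page=2),
    ]
    print(guess_title(example))
```

A result like this could then be compared with the title field in the metadata, which, as noted above, often holds only the filename.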