[poppler] Extract title from pdf file.

Peter A. Kerzum kerzum at yandex-team.ru
Thu Nov 10 07:00:12 PST 2011


On Thursday 10 November 2011 14:36:39 Leonard Rosenthol wrote:
> EXCEPT that Poppler (and by extension, pdftoxml) does NOT process the
> tagging & structure of the PDF :(.   

This is not true: you can at least get the Outline text with poppler.
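
As a rough illustration of reading Outline text, here is a minimal sketch that walks the bookmark entries in XML of the kind `pdftohtml -xml` is discussed as producing further down this thread. The element layout (nested <outline> blocks holding <item page="..."> entries), the sample document, and the helper names walk_outline and outline_entries are all assumptions made for illustration; check them against the output of your own poppler build before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the bookmark section that
# `pdftohtml -xml` is assumed to write. Not taken from a real PDF.
SAMPLE = """<pdf2xml>
<outline>
<item page="1">Introduction</item>
<outline>
<item page="2">Background</item>
</outline>
<item page="5">Results</item>
</outline>
</pdf2xml>"""


def walk_outline(node, depth=0):
    """Yield (depth, page, title) for every <item>, in document order."""
    for child in node:
        if child.tag == "item":
            page = int(child.get("page", "0"))
            title = (child.text or "").strip()
            yield depth, page, title
        elif child.tag == "outline":
            # Assumption: a nested <outline> holds the children of the
            # item that precedes it, so it sits one level deeper.
            yield from walk_outline(child, depth + 1)


def outline_entries(xml_text):
    """Return all outline entries, or an empty list if there is no outline."""
    root = ET.fromstring(xml_text)
    top = root.find("outline")
    return [] if top is None else list(walk_outline(top))


if __name__ == "__main__":
    for depth, page, title in outline_entries(SAMPLE):
        print("  " * depth + f"{title} (page {page})")
```

This only covers the Outline (bookmarks); it does not touch the tagging and structure tree that Leonard is asking about.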

> That's why I was hoping that you were
> ADDING THIS FEATURE to Poppler's core.
> 
> Leonard
> 
> On 11/9/11 10:44 PM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:
> >Running pdftohtml -xml, analysing XML, processing information back into
> >PDF
> >
> >On Thu, Nov 10, 2011 at 2:01 PM, Leonard Rosenthol <lrosenth at adobe.com> wrote:
> >> On 11/9/11 10:02 AM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:
> >>>>Are you also submitting patches to read & process any tags & structure
> >>>>in the PDF?  If the PDF is already tagged, then it will have any
> >>>>headers/footers already identified accordingly.  You should be using
> >>>>this when present.
> >>>
> >>>Yes, I am using the RapidXML library, which I specifically chose for
> >>>speed and that it is header only.
> >>>
> >> What does an XML library have to do with processing PDF structure &
> >> tagging (ISO 32000-1:2008, 14.7-14.9)???
> >> 
> >> 
> >> Leonard
> >
> >_______________________________________________
> >poppler mailing list
> >poppler at lists.freedesktop.org
> >http://lists.freedesktop.org/mailman/listinfo/poppler

-- 
Пётр Керзум
Search Platform Development Group
St. Petersburg, tel. 8508

