[poppler] Extract title from pdf file.

Wed Nov 9 07:02:24 PST 2011

2011/11/10 Leonard Rosenthol <lrosenth at adobe.com>:
> On 11/9/11 1:26 AM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:
>
>>The easiest way I can think of is to grab it from the headers and footers.
>>
>>I am about to submit a patch (any day now) which separates the headers
>>and footers into separate tags that you can access from
>>pdftohtml -xml.
>
> Are you also submitting patches to read & process any tags & structure in
> the PDF?  If the PDF is already tagged, then it will have any
> headers/footers already identified accordingly.  You should be using this
> when present.

Yes, I am using the RapidXML library, which I chose specifically for
its speed and because it is header-only.

The patch will literally be submitted in the next 3 days, if not earlier.
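
As a rough illustration of the header/footer idea (not the patch itself,
which is C++ and not shown here), the sketch below reads the kind of XML
that pdftohtml -xml already emits and splits text lines into header and
footer candidates by vertical position. The function name
split_header_footer, the 8% margin, and the sample XML are all
assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `pdftohtml -xml` output: each <page>
# carries its height, each <text> carries its top offset and height.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1000" width="800">
<text top="20" left="100" width="300" height="12" font="0">Annual Report 2011</text>
<text top="400" left="100" width="500" height="12" font="0">Body text goes here.</text>
<text top="970" left="380" width="20" height="12" font="0">1</text>
</page>
</pdf2xml>"""


def split_header_footer(xml_text, margin=0.08):
    """Return header/footer candidates per page.

    A line counts as a header if it starts inside the top `margin`
    fraction of the page, and as a footer if it ends inside the bottom
    `margin` fraction. This is a positional heuristic only (an
    assumption); it does not read tagged-PDF structure.
    """
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        page_height = float(page.get("height"))
        headers, footers = [], []
        for text in page.iter("text"):
            # itertext() also collects text nested in inline tags like <b>.
            content = "".join(text.itertext()).strip()
            if not content:
                continue
            top = float(text.get("top"))
            bottom = top + float(text.get("height", 0))
            if top < page_height * margin:
                headers.append(content)
            elif bottom > page_height * (1 - margin):
                footers.append(content)
        pages.append({
            "page": int(page.get("number")),
            "headers": headers,
            "footers": footers,
        })
    return pages


if __name__ == "__main__":
    for entry in split_header_footer(SAMPLE_XML):
        print(entry)
```

A title guess could then be taken from the first page's header
candidates. When the PDF is tagged, its own structure information
should take precedence over a positional guess like this.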

>
>>I will then work on incorporating it all back into the PDF, with ToC
>>linkage (I will make a new pdftopdf utility).
>
> So are you also writing the structure back into the PDF?

That's the plan; however, that may take a while longer. I'll need to
check up on what kind of helpers the current API provides. For
instance, I haven't had a chance to read the poppler-toc.cc file.
Would that be helpful?

> Leonard