[poppler] Extract title from pdf file.
Alec Taylor
alec.taylor6 at gmail.com
Wed Nov 9 07:02:24 PST 2011
2011/11/10 Leonard Rosenthol <lrosenth at adobe.com>:
> On 11/9/11 1:26 AM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:
>
>>The easiest way I can think of is to grab it from the headers and footers.
>>
>>I am about to submit a patch (any day now) that separates the headers
>>and footers into their own tags, which you can then access from
>>pdftohtml -xml.
>
> Are you also submitting patches to read & process any tags & structure in
> the PDF? If the PDF is already tagged, then it will have any
> headers/footers already identified accordingly. You should be using this
> when present.
Yes, I am using the RapidXML library, which I specifically chose for
its speed and because it is header-only.
The patch will literally be submitted in the next 3 days, if not earlier.
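As a rough illustration of the "grab it from the headers and footers" idea, here is a minimal sketch that works on the stock output of pdftohtml -xml, not on the upcoming patch, whose tag names are not given here. It uses Python's standard xml.etree module instead of RapidXML. The heuristic is an assumption: a line that repeats at the same vertical position in the top or bottom 10% of several pages is treated as a running header or footer and therefore as a title candidate. The function name running_header_candidates and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def running_header_candidates(xml_text, min_pages=2, margin=0.1):
    """Hypothetical helper: return text lines that repeat at the same vertical
    position inside the top/bottom `margin` band of at least `min_pages` pages
    of pdftohtml -xml output (assumed heuristic for running headers/footers)."""
    root = ET.fromstring(xml_text)
    # (rounded top coordinate, line text) -> set of page numbers it appears on
    seen = defaultdict(set)
    for page in root.iter("page"):
        height = float(page.get("height"))
        number = page.get("number")
        for text in page.iter("text"):
            top = float(text.get("top"))
            # Skip body text: keep only lines in the top or bottom band.
            if margin * height < top < (1 - margin) * height:
                continue
            # itertext() also collects text nested in <b>/<i> children.
            content = "".join(text.itertext()).strip()
            if content:
                seen[(round(top), content)].add(number)
    # Page numbers differ per page, so they do not repeat and are dropped.
    return [content for (top, content), pages in sorted(seen.items())
            if len(pages) >= min_pages]
```

Pass it the contents of the XML file that pdftohtml -xml writes. If the PDF is tagged, Leonard's point still applies: the structure already in the file is the more reliable source.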
>
>>I will then work on incorporating it all back into the PDF, with ToC
>>linkage (I will make a new pdftopdf utility).
>
> So are you also writing the structure back into the PDF?
That's the plan; however, it may take a while longer. I'll need to
check up on what kind of helpers the current API provides. For
instance, I haven't had a chance to read the poppler-toc.cc file yet.
Would that be helpful?
> Leonard