[poppler] Reverse-engineering an XML file generated by pdftohtml -xml back into the PDF?

Josh Richardson jric at chegg.com
Tue Nov 15 12:23:58 PST 2011


Someone on the list may have a better idea, but I would almost certainly
start with the PDFDoc created by reading the original document, and inject
the metadata you have collected back into it -- I believe this was
Leonard's recommendation as well.
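
As a rough illustration of the first half of that approach, here is a
minimal Python sketch that reads the extended XML and collects the extra
per-page information, keyed by page index, so it can later be applied to
the original PDF with whatever PDF library you prefer. The <page> and
<text> elements and the "number" attribute are what pdftohtml -xml emits;
the "label" and "role" attributes are purely hypothetical stand-ins for
whatever your extended output actually records, and the function name is
made up for this example.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for extended pdftohtml -xml output.
# ASSUMPTION: "label" and "role" are hypothetical attributes added by the
# extended exporter; stock pdftohtml does not write them.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918" label="i">
<text top="40" left="100" width="200" height="12" font="0" role="header"><b>Chapter 1</b></text>
<text top="120" left="100" width="400" height="12" font="0">Body text</text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188" width="918" label="ii">
</page>
</pdf2xml>"""


def collect_page_metadata(xml_text):
    """Return {zero_based_page_index: {"label": ..., "headers": [...]}}."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        # pdftohtml numbers pages from 1; most PDF APIs index from 0.
        index = int(page.get("number")) - 1
        pages[index] = {
            "label": page.get("label"),  # hypothetical attribute
            "headers": [
                # itertext() flattens inline <b>/<i> markup inside <text>.
                "".join(text.itertext()).strip()
                for text in page.iter("text")
                if text.get("role") == "header"  # hypothetical attribute
            ],
        }
    return pages


if __name__ == "__main__":
    for index, meta in sorted(collect_page_metadata(SAMPLE_XML).items()):
        print(index, meta)
```

For a real file you would parse it from disk (for example with
ET.parse(path).getroot()) instead of a string. The resulting mapping is
deliberately independent of any PDF library: the write-back step against
the original document is not shown here.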

--josh

On 11/14/11 10:42 PM, "Alec Taylor" <alec.taylor6 at gmail.com> wrote:

>Good afternoon,
>
>How would I go about reverse-engineering an XML file generated by
>pdftohtml -xml back into the [same] PDF?
>
>I have been spending a long time extending the XML output to include
>proper page numbers and header/footer detection.
>
>It would be extremely useful if I could push the additional logical
>structure information and page numbers back into the PDF the XML was
>generated from.
>
>How would I go about doing this?
>
>Thanks for all suggestions,
>
>Alec Taylor
>
>PS: T-9 days (or less!) until PATCH :)


