[poppler] improved ebook pdf handling

Randall Puljek-Shank puljekshank at gmail.com
Sun Oct 4 22:05:54 PDT 2009


I'd like to improve how pdftohtml handles ebooks.  My goals are:
1. Recognize the table of contents and convert its entries to links
2. Remove running headers and page numbers from the resulting text
3. Recognize columns

I'm thinking that each of these could be a separate switch.  Anyone
interested in helping is welcome, of course, and pointers to similar code
would be appreciated.