I'd like to improve how pdftohtml handles ebooks. Here are my goals:

1. Recognize the table of contents and convert its entries to links
2. Remove running headers and page numbers from the resulting text
3. Recognize columns

I'm thinking each of these could be a separate switch. Anyone interested in helping is welcome, of course, and pointers to similar code would be appreciated.
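
To make goal 2 a bit more concrete, here is a rough sketch of one possible heuristic, written in Python for brevity rather than against pdftohtml's own code. It assumes the text is already available as a list of pages, each a list of lines (a hypothetical input shape, not something pdftohtml provides in this form). The function name `strip_running_lines` and the parameters `edge` and `min_share` are made up for illustration. The idea is that a line near the top or bottom of a page that recurs on many pages, once digits are masked, is probably a running header, footer, or page number.

```python
import re
from collections import Counter


def _normalize(line):
    # Mask digits so "Chapter 3 | 41" and "Chapter 3 | 42" compare equal,
    # and a bare page number becomes the same key on every page.
    return re.sub(r"\d+", "#", line.strip().lower())


def _is_edge(index, page_len, edge):
    # True for the first or last `edge` lines of a page.
    return index < edge or index >= page_len - edge


def strip_running_lines(pages, edge=2, min_share=0.5):
    """Drop edge lines that recur on many pages (assumed heuristic).

    pages: list of pages, each a list of text lines (hypothetical shape).
    edge: how many lines at the top and bottom of a page to inspect.
    min_share: fraction of pages a line must recur on to be dropped.
    """
    counts = Counter()
    for page in pages:
        # Use a set so each candidate counts once per page.
        candidates = {
            _normalize(line)
            for i, line in enumerate(page)
            if _is_edge(i, len(page), edge) and line.strip()
        }
        counts.update(candidates)

    # Require at least two pages so a single page never strips itself.
    threshold = max(2, int(len(pages) * min_share))
    repeated = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line
            for i, line in enumerate(page)
            if not (_is_edge(i, len(page), edge) and _normalize(line) in repeated)
        ]
        cleaned.append(kept)
    return cleaned
```

A real implementation would probably want to use the vertical position of each line on the page instead of its index, and to treat odd and even pages separately, since many books alternate the header between the book title and the chapter title. A body line that happens to repeat near a page edge would also be dropped by this sketch, which is one reason to keep it behind its own switch.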