[poppler] pdftohtml and HTML output

Ihar `Philips` Filipau thephilips at gmail.com
Thu May 17 09:17:08 PDT 2012


On 5/17/12, Toby Hewlett <toby at billminder.co.za> wrote:
>
> I need accurate positional information of each piece of text, so <P>
> with spaces is not suitable, therefore I need to revert to a version
> that generates HTML with <DIVS> and <SPANS>, but which still includes
> the -nodrm switch.
>
> Can anyone advise which version of Poppler utils might be suitable?
>

Check the output of `pdftohtml -xml`. The generated XML has one <text>
element per text fragment, each carrying top, left, width and height
attributes, so it can be easily converted into HTML with divs/spans.
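
A rough sketch of such a conversion, in Python. It assumes the usual
layout of the XML output: <page> elements with width/height attributes,
each containing <text> elements with top/left/width/height attributes.
The function name xml_to_html and the inline CSS are only illustrative
choices, and inline markup such as <b> or <i> inside <text> is flattened
to plain text here.

```python
import html
import xml.etree.ElementTree as ET


def _attr(element, name):
    """Return an attribute value escaped for use inside an HTML attribute."""
    return html.escape(element.get(name, "0"), quote=True)


def xml_to_html(xml_source):
    """Convert `pdftohtml -xml` output into absolutely positioned divs/spans.

    Assumption: the input follows the <page>/<text> layout described above.
    """
    root = ET.fromstring(xml_source)
    out = ["<html><body>"]
    for page in root.iter("page"):
        # One relatively positioned <div> per page acts as the coordinate frame.
        out.append(
            '<div class="page" style="position:relative;'
            "width:{}px;height:{}px\">".format(_attr(page, "width"), _attr(page, "height"))
        )
        for text in page.iter("text"):
            # itertext() also collects text nested in <b>, <i> or <a> children.
            content = html.escape("".join(text.itertext()))
            out.append(
                '<span style="position:absolute;top:{}px;left:{}px;'
                'width:{}px;height:{}px">{}</span>'.format(
                    _attr(text, "top"),
                    _attr(text, "left"),
                    _attr(text, "width"),
                    _attr(text, "height"),
                    content,
                )
            )
        out.append("</div>")
    out.append("</body></html>")
    return "\n".join(out)
```

To try it, something like `pdftohtml -xml -nodrm input.pdf output` should
write output.xml (assuming your build accepts -nodrm); read that file and
pass its contents to xml_to_html. Font information lives in separate
<fontspec> elements, which this sketch ignores.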

