[poppler] pdftohtml and HTML output
Toby Hewlett
toby at billminder.co.za
Thu May 17 08:44:57 PDT 2012
Hi,
I've recently upgraded to Poppler v0.18.4 from an older version (v0.40),
which did not have the -nodrm switch.
The HTML generated by the older version used SPAN tags inside DIVs with
positional information - for example:
<DIV style="position:absolute;top:1207;left:431"><nobr><span
class="ft00">Page 1</span></nobr></DIV>
However, I have noticed that the HTML created by 0.18.4 now uses paragraph
<P> tags and often separates pieces of text with spaces instead of
positioning each piece separately - for example:
<P style="position:absolute;top:270px;left:54px;white-space:nowrap"
class="ft01"> 0214488062         DSL Fast
                   
      04 May 12 - 03 Jun 12        
                   
              R133.33</P>
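
To make the difference concrete, here is a minimal Python sketch (my own illustration, not part of pdftohtml) that reads the top/left values from the inline style of each positioned element. The names `PositionedTextParser` and `extract_positions` are hypothetical, and the sketch assumes the markup looks like the two fragments above. It uses only the standard library.

```python
import re
from html.parser import HTMLParser

# Matches "top:1207" or "left:54px" in an inline style, but not "margin-top:5".
_POS_RE = re.compile(r"(?<![\w-])(top|left)\s*:\s*(\d+)")


class PositionedTextParser(HTMLParser):
    """Collects (top, left, text) for each element that carries a position."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished (top, left, text) tuples
        self._stack = []  # open positioned elements: [tag, top, left, parts]

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        pos = dict(_POS_RE.findall(style))
        if "top" in pos and "left" in pos:
            self._stack.append([tag, int(pos["top"]), int(pos["left"]), []])

    def handle_data(self, data):
        # Text belongs to the innermost open positioned element, if any.
        if self._stack:
            self._stack[-1][3].append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            _, top, left, parts = self._stack.pop()
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join("".join(parts).split())
            self.items.append((top, left, text))


def extract_positions(html):
    """Return a list of (top, left, text) tuples found in an HTML fragment."""
    parser = PositionedTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


old_style = ('<DIV style="position:absolute;top:1207;left:431"><nobr>'
             '<span class="ft00">Page 1</span></nobr></DIV>')
print(extract_positions(old_style))  # [(1207, 431, 'Page 1')]
```

With the older DIV/SPAN output, each piece of text comes back with its own coordinates. Run on the <P> fragment above, the same function returns a single tuple: one top/left pair for the whole line, with the individual fields distinguishable only by the spaces between them.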
I need accurate positional information for each piece of text, so the <P>
output with space-separated text is not suitable. I therefore need to
revert to a version that generates HTML with <DIV> and <SPAN> tags but
still includes the -nodrm switch.
Can anyone advise which version of Poppler utils might be suitable?
Thanks!
Regards
Toby Hewlett