[poppler] pdftohtml inserts spaces between lots of characters
Tim Garton
tim at itcresearch.com
Tue May 26 15:21:34 PDT 2009
All,
Can anyone help me stop pdftohtml from inserting spaces between
characters? A sample PDF is:
http://downloads.datastarved.net/sample_after_abbyy.pdf
When you run pdftohtml on that PDF, the first line of the output
shows up as:
U N I T E D S T A T E S I N T E R N A T I O N A L T R A D E C O M M I S S I O N
But when I run it through pdftotext (part of xpdf, which I believe
poppler is derived from), I get:
UNITED STATES INTERNATIONAL TRADE COMMISSION
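
In case it helps anyone check their own output for the same symptom, here is a rough Python sketch that flags a line as "letter-spaced" when nearly all of its whitespace-separated tokens are single characters. This is not part of poppler or xpdf; the function name `looks_letter_spaced` and the 0.8 threshold are my own assumptions, and it only detects the problem rather than repairing it (the original word boundaries cannot be recovered from the spaced line alone).

```python
def looks_letter_spaced(line: str, threshold: float = 0.8) -> bool:
    """Return True when most whitespace-separated tokens are one character long.

    Hypothetical helper for illustration; the threshold is an arbitrary guess.
    """
    tokens = line.split()
    # Very short lines (e.g. "A B") are too ambiguous to judge.
    if len(tokens) < 4:
        return False
    single = sum(1 for token in tokens if len(token) == 1)
    return single / len(tokens) >= threshold


# The two lines quoted in this message.
spaced = ("U N I T E D S T A T E S I N T E R N A T I O N A L "
          "T R A D E C O M M I S S I O N")
clean = "UNITED STATES INTERNATIONAL TRADE COMMISSION"

if __name__ == "__main__":
    for text in (spaced, clean):
        print(looks_letter_spaced(text), text)
```

Running it over each line of the extracted text would show how much of a document is affected.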
Thanks!
-Tim