[poppler] pdftohtml inserts spaces between lots of characters

Tim Garton tim at itcresearch.com
Tue May 26 15:21:34 PDT 2009


All,
    Can anyone help me get pdftohtml to stop inserting spaces between
individual characters?  A sample PDF is:

http://downloads.datastarved.net/sample_after_abbyy.pdf

When you run pdftohtml on that PDF, the first line of the output shows
up as:

U N I T E D  S T A T E S  I N T E R N A T I O N A L  T R A D E  C O M M I S S I O N

Running the same file through pdftotext (part of xpdf, which I believe
is the backend poppler uses), however, gives:

UNITED STATES INTERNATIONAL TRADE COMMISSION
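
As a stopgap, the spaced-out text could be cleaned up after the fact. Below is
a minimal sketch in Python. It assumes the pattern visible in the line above,
where letters within a word are separated by one space and words by two or
more; I have not verified that against the raw HTML that pdftohtml emits. The
helper name collapse_letter_spacing is made up for this example and is not
part of poppler or xpdf.

```python
import re


def collapse_letter_spacing(line: str) -> str:
    """Rejoin letter-spaced words such as 'U N I T E D  S T A T E S'.

    Assumption: words are separated by two or more spaces, and the letters
    inside a letter-spaced word by exactly one space.
    """
    # Two or more consecutive spaces are treated as a word boundary.
    chunks = re.split(r" {2,}", line.strip())
    fixed = []
    for chunk in chunks:
        parts = chunk.split(" ")
        # Only collapse a chunk made up entirely of single characters;
        # normally spaced text such as "hello world" is left untouched.
        if len(parts) > 1 and all(len(p) == 1 for p in parts):
            fixed.append("".join(parts))
        else:
            fixed.append(chunk)
    return " ".join(fixed)


if __name__ == "__main__":
    print(collapse_letter_spacing("U N I T E D  S T A T E S"))
```

This only hides the symptom, and it would wrongly merge genuine runs of
one-letter words (for example "a b" becomes "ab"), so a real fix in pdftohtml
would still be preferable.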

Thanks!

-Tim

