[poppler] pdftotext convert error!

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Thu Feb 9 00:02:50 PST 2012


Hi,

I don't think this is a poppler issue.

In both PDFs, most (perhaps all) of the fonts are embedded as
non-CID-keyed PostScript Type 1 fonts, so the Unicode code points
for the characters in the embedded fonts cannot be extracted. In
fact, even if you copy and paste the text in Adobe Acrobat (or
Adobe Reader), the extracted text would be garbage.
If you know of any application that can extract readable text
from these PDFs, please let me know.

Regards,
mpsuzuki
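One way to inspect the fonts in such a PDF is poppler's own `pdffonts` utility, which lists each font's type, encoding, and whether it is embedded; versions that print a `uni` column also report whether the font has an explicit ToUnicode map. The sketch below is a hypothetical helper, not part of poppler: it assumes `pdffonts` is on the PATH and that its output has the usual header, dashed separator, and `<num> <gen>` (or `[none]`) object reference at the end of each row. As the `pdffonts` documentation itself notes, a missing ToUnicode map does not always mean the text cannot be converted to Unicode.

```python
import subprocess


def run_pdffonts(pdf_path: str) -> str:
    """Run poppler's pdffonts on a PDF and return its stdout as text."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def fonts_without_tounicode(pdffonts_output: str) -> list[str]:
    """Return the names of fonts whose 'uni' column reads 'no'.

    Fields are read from the right-hand end of each row, because the
    'type' column can contain spaces ("Type 1", "CID Type 0C") and a long
    font name can overflow its fixed-width column.
    """
    lines = pdffonts_output.splitlines()
    # Data rows start after the dashed separator under the header.
    start = next(
        (i + 1 for i, line in enumerate(lines) if line.startswith("---")),
        len(lines),
    )
    missing = []
    for row in lines[start:]:
        tokens = row.split()
        if not tokens:
            continue
        # The object reference is printed as "<num> <gen>", or as "[none]".
        ref_width = 1 if tokens[-1] == "[none]" else 2
        uni = tokens[-ref_width - 1]
        if uni == "no":
            missing.append(tokens[0])
    return missing
```

Usage would be something like `fonts_without_tounicode(run_pdffonts("dcbg.pdf"))`; a non-empty result is consistent with the diagnosis above but does not prove it on its own.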


杨辉强 wrote (2012/02/09 16:30):
> Hi, all:
>    I use the pdftotext in the poppler/util/ directory. When it
> converts the PDF files at the following two URLs, the conversion
> seems to go wrong.
> 
>      http://www.100ec.cn/b2bimages/dcbg.pdf
>      http://sjb.qlwb.com.cn/images/2011-06/16/Q02/qd0216.pdf
> 
> 
> Can you give me some advice? Thank you very much.
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler