[poppler] multibyte/Asian pdftotext

tsuraan tsuraan at gmail.com
Wed May 31 09:31:27 PDT 2006


I'm trying to convert a Japanese PDF file to a text file (preferably
UTF-8, but whatever works) using the pdftotext that ships with
poppler.  When I run the conversion, I get the following output:

Error: Unknown character collection 'Adobe-Japan1'
Error: Couldn't find '90ms-RKSJ-H' CMap file for 'Adobe-Japan1' collection
Error: Unknown CMap '90ms-RKSJ-H' for character collection 'Adobe-Japan1'
Error: Unknown character collection 'Adobe-Japan1'

This block repeats several times and is followed by a few dozen lines of:

Error (9298): No font in show

The PDF file is the first Google hit for "japanese.pdf".

I'm using Gentoo, and I have installed the kochi-substitute,
arphicfonts, and baekmuk-fonts packages.  In addition, I have the line

displayFontT1 Adobe-Japan1 /usr/share/ghostscript/8.16/Resource/pdfcorefont/japanese/Adobe-Japan1-4

in my /etc/xpdfrc file.  It's probably totally bogus, but it seemed
like it might help.  The pdfcorefont files for Japanese, Korean, and
Chinese are in directories parallel to the one listed there, but I'm
not worried about the Korean and Chinese character sets yet.

My copy of xpdf, which links against poppler, cannot display this PDF
either.  Can someone help me set up my fonts correctly so that
pdftotext can convert my files?  Thanks in advance.

--jay

