[poppler] problem displaying pdf contains Chinese character
James Cloos
cloos at jhcloos.com
Sun Apr 5 15:23:47 PDT 2009
>>>>> "Yuanle" == Yuanle Song <sylecn at gmail.com> writes:
>> :; fc-list 宋体 family familylang file
>> :; fc-list 黑体 family familylang file
>> :; fc-list 仿宋_GB2312 family familylang file
>> :; fc-list 楷体_GB2312 family familylang file
Yuanle> As I said I already have them, according to fc-list.
I was hoping to see the full family and familylang output, especially
for the latter two.
In any case, I hit upon something poppler can do to circumvent this
issue. If poppler sets the lang parameter when searching for a font,
the resulting font set is more likely to have glyphs which will work.
E.g., in this case poppler should ask for the equivalent of:
:; fc-match NAME:lang=zh-cn
or at least for the equivalent of:
:; fc-match NAME:lang=zh
choosing the lang it specifies based on the characters the pdf wants
to use it for.
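
As a rough illustration of the idea (not poppler's actual code), the
pattern could be built as below. The helper names and the character-range
heuristic are assumptions made for this sketch. In particular, Han
ideographs are shared with Japanese and traditional Chinese, so mapping
them straight to zh-cn is only a guess that fits this report.

```python
import shutil
import subprocess


def guess_lang(text):
    """Hypothetical heuristic: 'zh-cn' if any CJK ideograph is present."""
    for ch in text:
        # U+4E00..U+9FFF is the CJK Unified Ideographs block.
        if 0x4E00 <= ord(ch) <= 0x9FFF:
            return "zh-cn"
    return None


def fc_pattern(name, text):
    """Build a pattern such as 'NAME:lang=zh-cn' for fc-match.

    Assumption: `name` holds no characters that are special in a
    fontconfig pattern, so no escaping is attempted here.
    """
    lang = guess_lang(text)
    return f"{name}:lang={lang}" if lang else name


def fc_match(pattern):
    """Run fc-match on the pattern; return None if it is not installed."""
    exe = shutil.which("fc-match")
    if exe is None:
        return None
    result = subprocess.run([exe, pattern], capture_output=True, text=True)
    return result.stdout.strip()
```

Calling fc_match(fc_pattern(NAME, text)) with and without CJK text in
the second argument shows whether the lang hint changes the chosen font.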
Yuanle> Please tell me if there is other ways I can get more debug
Yuanle> output or log.
You can try uncompressing the pdf with something like podofo¹ or pdftk².
Then, looking in that file in a pager such as less(1) (or in a text
editor), search for the objects specified by pdffonts. Eg, for:
,----
| 仿宋_GB2312 TrueType no no no 1600 0
`----
Look in the uncompressed version for the regex /^1600 0 obj/. Everything
from that line to the next line matching /^endobj/ should be the /Font
object for 仿宋_GB2312. It would be useful to see those objects for each
of the four fonts listed in your page1to10 report.
(To be explicit, the first digit-string in the regex is the object
column from the pdffonts output and the second digit-string is the
ID column.)
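
If paging through the file by hand is tedious, the same search can be
scripted. This is a minimal sketch with hypothetical function names. It
assumes the pdf has already been uncompressed, that the object header
starts a line, and that lines end in LF or CRLF (a file using bare CR
line endings would need a different pattern).

```python
import re


def object_id_from_pdffonts(line):
    """Return (object, ID) taken from the last two fields of a pdffonts row."""
    fields = line.split()
    return int(fields[-2]), int(fields[-1])


def extract_object(data, num, gen):
    """Return the bytes from '^num gen obj' through the next '^endobj'.

    `data` is the raw content of the uncompressed pdf. Returns None
    when no such object is found.
    """
    pattern = re.compile(
        rb"^%d %d obj\b.*?^endobj\b" % (num, gen),
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(data)
    return match.group(0) if match else None
```

Typical use would be to read new.pdf in binary mode, take the two
numbers from each pdffonts row, and print what extract_object returns.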
The object may include a /FontDescriptor entry. If the contents of any
interesting entries are of the form:
/string [0-9]+ [0-9]+ R
then that is a reference to another object. You'll also want to look at
those referenced objects.
As an example, I'm looking at a file which has this object:
,----
| 4 0 obj
| <<
| /Type /Font
| /BaseFont /OZPPOK+LMRoman12-Regular
| /Encoding 6 0 R
| /FirstChar 49
| /FontDescriptor 9 0 R
| /LastChar 122
| /Subtype /Type1
| /Widths 7 0 R
| >>
| endobj
`----
That shows that I need to look at object 9 0 for the /FontDescriptor,
which looks like:
,----
| 9 0 obj
| <<
| /Type /FontDescriptor
| /Ascent 689
| /CapHeight 689
| /CharSet (/a/e/o/one/z)
| /Descent -194
| /Flags 4
| /FontBBox [ -422 -280 1394 1127 ]
| /FontFile 8 0 R
| /FontName /OZPPOK+LMRoman12-Regular
| /ItalicAngle 0
| /StemV 65
| /XHeight 431
| >>
| endobj
`----
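
Finding which entries point at other objects can also be scripted. The
sketch below uses a hypothetical helper that only catches entries of the
simple form "/Key N G R". References inside arrays or nested
dictionaries are deliberately ignored.

```python
import re

# Matches '/Key N G R', i.e. a name followed by one indirect reference.
REF = re.compile(rb"/(\w+)\s+(\d+)\s+(\d+)\s+R\b")


def find_references(obj_bytes):
    """Map each /Key whose value is a single reference to (number, ID)."""
    return {
        m.group(1).decode("ascii"): (int(m.group(2)), int(m.group(3)))
        for m in REF.finditer(obj_bytes)
    }
```

Applied to the 4 0 object above, this picks out the /Encoding,
/FontDescriptor and /Widths entries. Each pair it returns can then be
looked up with the same /^N G obj/ search as before.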
The contents of the /Font objects and their children would be useful.
Is the pdf something you can post somewhere?
-JiMC
1] PoDoFo is at:
http://podofo.sourceforge.net/
http://sourceforge.net/projects/podofo/
The command line to uncompress a pdf is:
:; podofouncompress original.pdf new.pdf
2] PdfTk is at:
http://www.pdfhacks.com/pdftk
Its cli is:
:; pdftk original.pdf output new.pdf uncompress
Both are probably packaged by your distribution.
--
James Cloos <cloos at jhcloos.com> OpenPGP: 1024D/ED7DAEA6