[poppler] Broken embedded chinese characters

Agus Syawal alfonsus_agus at yahoo.com
Wed Mar 17 03:29:19 PDT 2010


Hello all,
I'm new to this mailing list and joined because of a problem I have with poppler.
At the moment we are building a small PDF viewer for an embedded system using Qt 4.5.2 and Poppler, and we ran into trouble with some Chinese characters in a PDF our client sent as a sample (attached as chineseembedded.pdf, together with two screenshots: the broken rendering is from KPdf, the good one is from KGhostView).

I searched the mailing list archive but didn't find a clear answer about why this happens or how to solve it.
For your information, I'm quite new to working with poppler and PDF files, so I would appreciate any tips and help.

Regarding the file itself, this is the information I gathered using pdffonts:
----------------------------------------------------------
name: BIBABF+TimesNewRoman,Italic
type: TrueType
(emb/sub/uni): yes/yes/yes

name: RXOGNH+Arial,Bold
type: TrueType
(emb/sub/uni): yes/yes/yes

name: FCQAKN+TimesNewRoman,Bold
type: TrueType
(emb/sub/uni): yes/yes/yes

name: XWPRSA+Courier
type: TrueType
(emb/sub/uni): yes/yes/no

name: VMRZRO+font000000001149f5dd
type: CID TrueType
(emb/sub/uni): yes/yes/no

name: KVNASX+TimesNewRoman
type: TrueType
(emb/sub/uni): yes/yes/yes

name: WZUCCW+font000000001149f5fe
type: CID TrueType
(emb/sub/uni): yes/yes/no

name: XWPRSA+Courier-Oblique
type: TrueType
(emb/sub/uni): yes/yes/no

name: FHPJMF+font000000001149f5fe
type: CID TrueType
(emb/sub/uni): yes/yes/no
----------------------------------------------------------
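
To make the listing above easier to compare, the fonts with uni = no can be pulled out automatically. The following is only a rough sketch, assuming the listing keeps the three-line name / type / (emb/sub/uni) layout shown above (this is my reformatted layout, not the raw pdffonts table). The function names parse_font_blocks and fonts_without_tounicode are made up for illustration and are not part of poppler. According to the pdffonts manual, the uni column reports whether the font has an explicit ToUnicode map, so these fonts may be the ones whose text cannot be converted.

```python
def parse_font_blocks(text):
    """Parse 'name / type / (emb/sub/uni)' blocks into a list of dicts."""
    fonts, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        # A blank line or a line of dashes ends the current font block.
        if not line or set(line) == {"-"}:
            if current:
                fonts.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line, ignore it
        key, value = key.strip(), value.strip()
        if key == "(emb/sub/uni)":
            # Split 'yes/yes/no' into three booleans.
            emb, sub, uni = (flag.strip() == "yes" for flag in value.split("/"))
            current.update(emb=emb, sub=sub, uni=uni)
        else:
            current[key] = value
    if current:
        fonts.append(current)
    return fonts


def fonts_without_tounicode(fonts):
    """Return the names of fonts whose 'uni' flag is not 'yes'."""
    return [f["name"] for f in fonts if not f.get("uni", False)]


# A few entries copied from the listing above.
SAMPLE = """\
name: RXOGNH+Arial,Bold
type: TrueType
(emb/sub/uni): yes/yes/yes

name: XWPRSA+Courier
type: TrueType
(emb/sub/uni): yes/yes/no

name: VMRZRO+font000000001149f5dd
type: CID TrueType
(emb/sub/uni): yes/yes/no
"""

if __name__ == "__main__":
    for name in fonts_without_tounicode(parse_font_blocks(SAMPLE)):
        print(name)
```

In the full listing, all three CID TrueType fonts and both Courier fonts have uni = no.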

If I run pdftotext -raw chineseembedded.pdf, the result contains only the Latin characters; the Chinese characters are either skipped or, in some places, shown as boxes or unreadable characters.

I downloaded several other PDF documents from the internet, and their characters are shown correctly. According to pdffonts, the font names in those files are not strange like the ones used in chineseembedded.pdf (for instance: NCNMEN+SymbolMT, NCNLIB+ArialUnicodeMS, with emb: yes, sub: yes, uni: yes).

Please enlighten me as to why this happens and how to fix it.


Regards,
Agus S Y


      
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chineseembedded.pdf
Type: application/pdf
Size: 282505 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100317/368b94b3/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BrokenCharacters.png
Type: image/png
Size: 241786 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100317/368b94b3/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GoodCharacters.png
Type: image/png
Size: 107332 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20100317/368b94b3/attachment-0003.png>

