[poppler] Encoding of font names

Albert Astals Cid aacid at kde.org
Mon Aug 29 11:15:15 PDT 2011


On Tuesday, 30 August 2011, suzuki toshiya wrote:
> Hi,

Hi

> 
> I appreciate your interest & effort about non-Unicode font names!
> 
> Albert Astals Cid wrote:
> > Today I've been working on fixing the names reported by pdffonts
> > for non-Latin-1 fonts. I did not find anything very clear while reading
> > the spec, but I understood that the BaseFont string is encoded using
> > the /Encoding encoding. This has worked fine for some files but not for
> > all, such as one that says
> > /BaseFont /#CB#CE#CC#E5
> > /Encoding /UniGB-UCS2-H
> > If I try to map that to Unicode I get nothing, while Adobe Reader properly
> > maps it to 宋体
> 
> Although I've not tested comprehensively yet, I guess the
> Adobe implementation has some heuristic workaround for
> font names encoded by legacy localization mechanisms.
> 
> 0xCB 0xCE 0xCC 0xE5 is the GB-2312 encoding of 宋体.

Yeah, I know

> 
> # you can check this with:
> # perl -le '{printf("%c%c%c%c\n", 0xCB, 0xCE, 0xCC, 0xE5);}' | iconv -f gbk -t utf-8
> 
> I guess the Adobe implementation proceeds as follows:
> 
> 1) check whether the font name is in hexadecimal syntax "/#xx#xx#xx..."
> 2) if its encoding is one of the predefined CJK CMaps,
>    try to decode the font name using the legacy encoding of its collection:
>    Adobe-CNS1 -> Big5
>    Adobe-GB1 -> GB-2312 (or GBK)
>    Adobe-Japan1 or Adobe-Japan2 -> Shift_JIS (or Windows-31J)
>    Adobe-Korea1 -> Wansung
> 
> Fortunately, the core parts of these legacy localizations are
> almost the same in MS Windows and Mac OS, so the range of possible
> legacy encodings is not so wide.
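
A minimal sketch of the two-step heuristic guessed at above, written in Python rather than poppler's C++ so that it stays short. It is not Adobe's or poppler's actual code. The function names, the LEGACY_CODECS table, and the choice of Python codecs (gbk, cp932, and cp949 as supersets of GB-2312, Shift_JIS and Wansung) are assumptions made for illustration. The caller is assumed to have already resolved the /Encoding CMap to its character collection (for example, UniGB-UCS2-H belongs to Adobe-GB1).

```python
import re

# Assumed mapping from character collection to a legacy codec, following
# the list in the mail. gbk, cp932 and cp949 are used here as supersets of
# GB-2312, Shift_JIS and Wansung respectively.
LEGACY_CODECS = {
    "Adobe-CNS1": "big5",
    "Adobe-GB1": "gbk",
    "Adobe-Japan1": "cp932",
    "Adobe-Japan2": "cp932",
    "Adobe-Korea1": "cp949",
}

HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def name_to_bytes(raw_name):
    """Expand the #xx escapes of a PDF name token into raw bytes."""
    raw = raw_name[1:] if raw_name.startswith("/") else raw_name
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == "#" and HEX_PAIR.fullmatch(raw[i + 1:i + 3]):
            out.append(int(raw[i + 1:i + 3], 16))
            i += 3
        else:
            # Characters outside the escapes are assumed to fit in one byte.
            out += raw[i].encode("latin-1")
            i += 1
    return bytes(out)


def decode_base_font(raw_name, collection):
    """Return the font name as text, or None if no guess succeeds."""
    data = name_to_bytes(raw_name)
    if all(b < 0x80 for b in data):
        return data.decode("ascii")  # plain ASCII name, nothing to guess
    codec = LEGACY_CODECS.get(collection)
    if codec is None:
        return None  # e.g. a collection that is not predefined
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(decode_base_font("/#CB#CE#CC#E5", "Adobe-GB1"))
```

With the BaseFont from the problem file, the call in the last line should print the same two characters that Adobe Reader shows. Returning None for an unknown collection leaves the non-predefined case open instead of guessing.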
> 
> > Any idea what is the proper manipulation one has to do over BaseFont to
> > get the Unicode value?
> 
> I think that if we can require iconv for users who are interested
> in non-Unicode or non-ASCII font names, the conversion is not so
> difficult.

Using iconv from the code seems like a bit of a huge hack to me.

> One of my concerns is that I don't know how to handle non-CJK
> (or CJK-but-not-predefined) localized font names, such as
> Adobe-Vietnam1.
> 
> Is this an urgent issue?

Not at all, I just stumbled upon it today and worked on it, but it is not 
urgent since it has been broken forever :D

Albert

> If not, I will try to write some workaround
> for this issue.
> 
> Regards,
> mpsuzuki
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
