[poppler] Encoding of font names

Leonard Rosenthol lrosenth at adobe.com
Mon Aug 29 11:02:41 PDT 2011


There is no magic here - it's all documented in ISO 32000-1:2008.

First you decode the string according to the rules for Name objects, then
treat the result as UTF-8.
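
As a rough illustration of those two steps, here is a minimal Python sketch.
The function names (decode_pdf_name, name_to_text) are hypothetical and not
taken from poppler or any other library; it assumes the name token has
already been extracted from the file as raw bytes.

```python
import re

# In a PDF name, "#xx" (two hex digits) stands for the byte with that value.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(raw: bytes) -> bytes:
    """Undo the #xx escapes of a name token (hypothetical helper)."""
    if raw.startswith(b"/"):
        raw = raw[1:]  # drop the leading solidus, which is not part of the name
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def name_to_text(raw: bytes) -> str:
    """Decode the name, then treat the resulting bytes as UTF-8."""
    return decode_pdf_name(raw).decode("utf-8")


# A name whose escaped bytes are the UTF-8 encoding of U+5B8B U+4F53.
print(name_to_text(b"/#E5#AE#8B#E4#BD#93"))
```

Note that the strict UTF-8 decode raises UnicodeDecodeError when the bytes
are not valid UTF-8, so a caller would need to decide what to do in that case.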

Leonard


On 8/29/11 1:51 PM, "suzuki toshiya" <mpsuzuki at hiroshima-u.ac.jp> wrote:

>Hi,
>
>I appreciate your interest in, and effort on, non-Unicode font names!
>
>Albert Astals Cid wrote:
>> Today I've been working on trying to fix the names reported by pdffonts
>> for non-Latin-1 fonts. I have not found anything very clear while reading
>> the spec, but I understood that the BaseFont string is encoded using the
>> /Encoding encoding. This has worked fine for some files but not for all,
>> such as one that says
>> /BaseFont /#CB#CE#CC#E5
>> /Encoding /UniGB-UCS2-H
>> If I try to map that to Unicode I get nothing, and Adobe Reader properly
>> maps that to 宋体
>
>Although I have not tested this comprehensively yet, I guess
>the Adobe implementation has some heuristic workaround for
>font names encoded by a legacy localization mechanism.
>
>0xCB 0xCE 0xCC 0xE5 is the GB-2312 encoding of 宋体.
>
># you can check with:
># perl -le '{printf("%c%c%c%c\n", 0xCB, 0xCE, 0xCC, 0xE5);}' | iconv -f gbk -t utf-8
>
>I guess the Adobe implementation proceeds as follows:
>
>1) check whether the font name uses the hexadecimal syntax "/#xx#xx#xx..."
>2) if its encoding is one of the predefined CJK CMaps,
>   try to decode the font name according to the character collection:
>   Adobe-CNS1 -> Big5
>   Adobe-GB1 -> GB-2312 (or GBK)
>   Adobe-Japan1 or Adobe-Japan2 -> Shift_JIS (or Windows-31J)
>   Adobe-Korea1 -> Wansung
>
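
The guessed procedure above could be sketched in Python roughly as follows.
This is only an illustration of the heuristic, not what Adobe Reader or
poppler actually does. The function name guess_font_name and the table
LEGACY_CODECS are hypothetical, the choice of Python codec for each legacy
encoding is an assumption, and trying UTF-8 first is an addition that is
not in the list. The caller is assumed to have already undone the #xx
escapes and looked up the font's character collection.

```python
from typing import Optional

# Assumed mapping from character collection to a Python codec name.
LEGACY_CODECS = {
    "Adobe-CNS1": "big5",
    "Adobe-GB1": "gbk",        # GBK is a superset of GB-2312
    "Adobe-Japan1": "cp932",   # Windows-31J, a Shift_JIS variant
    "Adobe-Japan2": "cp932",
    "Adobe-Korea1": "euc_kr",  # assumed stand-in for Wansung
}


def guess_font_name(name_bytes: bytes, collection: Optional[str]) -> str:
    """Turn already-unescaped name bytes into text, with legacy fallbacks."""
    try:
        return name_bytes.decode("utf-8")
    except UnicodeDecodeError:
        pass
    codec = LEGACY_CODECS.get(collection or "")
    if codec is not None:
        try:
            return name_bytes.decode(codec)
        except UnicodeDecodeError:
            pass
    # Last resort: Latin-1 maps every byte to a character and never fails.
    return name_bytes.decode("latin-1")


# The bytes from /BaseFont /#CB#CE#CC#E5 with an Adobe-GB1 font.
print(guess_font_name(b"\xcb\xce\xcc\xe5", "Adobe-GB1"))
```
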
>Fortunately, the core parts of these legacy localizations are
>almost the same in MS Windows and Mac OS, so the range of possible
>legacy encodings is not so wide.
>
>> Any idea what is the proper manipulation one has to do over BaseFont to
>> get the Unicode value?
>
>I think that if we can require iconv for users who are interested
>in non-Unicode or non-ASCII font names, the conversion is not so
>difficult.
>
>One of my concerns is that I don't know how to handle localized font
>names that are non-CJK (or CJK but not predefined), such as
>Adobe-Vietnam1.
>
>Is this an urgent issue? If not, I will try to write a workaround
>for it.
>
>Regards,
>mpsuzuki
>_______________________________________________
>poppler mailing list
>poppler at lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler


