[Fontconfig] Improving Latin font selection for CJK locales

Gerrit Sangel z0idberg at gmx.de
Mon Jan 28 14:22:10 PST 2008


On Monday, 28 January 2008, Ed Trager wrote:
> Currently, locales are composed of just two elements:
>
>        (1) A "language" code (ISO-639-1, -2 : "en", "ja", "zh", "th", etc.)
> and (2) A "region" code ("US", "CA", "FR", "TW", "HK", "SG", etc. )
>
> This concept is incomplete.  A THIRD ELEMENT, SCRIPT, NEEDS TO BE ADDED. 
> Using four-letter ISO-15924 (
> http://unicode.org/iso15924/iso15924-codes.html ) codes is the obvious
> answer:
>
>        (3) "Script" code (ISO-15924 : "arab", "cyrl", "hans"
> (simplified Chinese), "hant" (traditional Chinese), etc.)
>
> Both "region" and "script" can be considered as "optional".  So we
> could now enumerate locales such as:

I think I suggested this some months ago, and I still strongly support it. 
It would also be necessary for a German locale written in Fraktur (for which 
I am currently gathering information). 

>
>  =>  "Fully Specified" locales with all three elements:
>
>        az_AZ_latn
>        az_AZ_cyrl
>        az_IR_arab
>
>        zh_HK_hans
>        zh_HK_hant
>
>  => Locales missing "region" would also be permissible (and I think
> this variant would be extremely useful and I think translators would
> perhaps favor the generality that this option provides in many
> real-life applications):

I also strongly support this, for example for de_Latf.

But I would urge that the script code be written with its first letter 
capitalized, so it can be properly distinguished from the language or region 
code.
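
To illustrate the capitalization convention, here is a minimal sketch in Python 
of how a parser could classify the parts of such a locale string. The names 
Locale and parse_locale are hypothetical and not part of any existing library. 
The sketch assumes "_" as the separator, a lower-case language code, an 
upper-case two-letter region code and a four-letter script code with only the 
first letter capitalized. Length alone would already separate region from 
script here; the case check just enforces the convention suggested above.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str
    region: Optional[str] = None
    script: Optional[str] = None


def parse_locale(tag: str) -> Locale:
    """Split a locale such as 'az_AZ_Cyrl' or 'de_Latf' into its parts."""
    parts = tag.split("_")
    language, rest = parts[0], parts[1:]
    if not (language.isalpha() and language.islower() and len(language) in (2, 3)):
        raise ValueError(f"bad language code: {language!r}")

    region = None
    script = None
    for part in rest:
        if region is None and len(part) == 2 and part.isalpha() and part.isupper():
            # Two upper-case letters: region code, e.g. "AZ".
            region = part
        elif script is None and len(part) == 4 and part.isalpha() and part == part.capitalize():
            # Four letters, first one capitalized: script code, e.g. "Latf".
            script = part
        else:
            raise ValueError(f"cannot classify {part!r} in {tag!r}")
    return Locale(language, region, script)
```

With this, parse_locale("de_Latf") yields language "de", no region and script 
"Latf", while an all-lower-case "de_latf" is rejected.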

>
>        az_latn
>        az_arab
>        az_cyrl
>
>        zh_hans
>        zh_hant
>
> => Locales missing "script" of course also permissible (this is the
> current "status quo"): Systems would have to have rules for the
> "default" script :
>
>        az_AZ  : defaults to "latn"  (Latin became official in
> Azerbaijan in 1991 although uptake has been apparently slow)
>        az_IR   : defaults to "arab"
>
>        zh_HK  : defaults to "hant"
>        zh_SG : defaults to "hans"
>
> => Locales missing both "region" and "script" are also permissible
> (again this does not differ from current "status quo"):
>
>        ja  : implies (defaults to) "ja_JP_jpan"
>        th  : implies (defaults to) "th_TH_thai"
>
> The CLDR community is one obvious place for discussions about this,
> and I apologize that I have not had the time to investigate how far
> discussions on this topic have gotten in CLDR or other relevant
> communities (like maybe Linux LSB folks?).
>
> Adding a four-letter script code to Locale is the obvious remedy.
> Perhaps the Pango and Fontconfig communities could take the lead in
> creating the minor changes in infrastructure needed to support this
> addition ?
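
The default rules quoted above could be sketched roughly as follows in Python. 
This is only an illustration: the two tables contain nothing but the examples 
given in the quoted mail, the function name resolve_locale is hypothetical, and 
a real system would presumably take such defaults from a maintained data source 
rather than from a hard-coded table. The lower-case script spelling of the 
quoted examples is kept here.

```python
# Default region for a bare language code (only the quoted examples).
DEFAULT_REGION = {"ja": "JP", "th": "TH"}

# Default script for a (language, region) pair (only the quoted examples).
DEFAULT_SCRIPT = {
    ("az", "AZ"): "latn",
    ("az", "IR"): "arab",
    ("zh", "HK"): "hant",
    ("zh", "SG"): "hans",
    ("ja", "JP"): "jpan",
    ("th", "TH"): "thai",
}


def resolve_locale(tag: str) -> str:
    """Fill in a missing region and/or script where a default is known."""
    parts = tag.split("_")
    language = parts[0]
    region = None
    script = None
    for part in parts[1:]:
        if len(part) == 2:
            region = part
        elif len(part) == 4:
            script = part
        else:
            raise ValueError(f"cannot classify {part!r} in {tag!r}")

    if region is None:
        region = DEFAULT_REGION.get(language)
    if script is None:
        script = DEFAULT_SCRIPT.get((language, region))

    # Parts for which no default is known are simply left out.
    return "_".join(p for p in (language, region, script) if p)
```

An explicitly given script always wins, so "zh_HK_hans" stays as it is, while 
"zh_HK" becomes "zh_HK_hant" and "ja" becomes "ja_JP_jpan".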

Another question, although I do not know which applications this would 
concern: for German Fraktur, an application would sometimes have to switch 
fonts within a message string, for some foreign words or upper-case 
abbreviations (maybe this is similar to the CJK Latin font problem). So the 
translation files would need some way to change the script and (maybe) the 
language on the fly, similar to HTML (with <span xml:lang="de-Latf">).

The problem with Fraktur is that it is unified with ordinary Latin, so the two 
can only be distinguished via an optional parameter that says which script is 
to be used.
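
As a rough sketch of what such on-the-fly switching could look like, the 
following Python code splits a translated message into runs, each tagged with 
the language/script that applies to it. It assumes, purely for illustration, 
that translation strings may contain HTML-like <span xml:lang="..."> markup; 
no existing translation format is claimed to work this way, and the names 
ScriptRunParser and split_runs are made up. Only Python's standard 
html.parser module is used.

```python
from html.parser import HTMLParser


class ScriptRunParser(HTMLParser):
    """Collect (lang, text) runs from a message with <span xml:lang=...>."""

    def __init__(self, default_lang):
        super().__init__()
        self.lang_stack = [default_lang]  # innermost language is last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # A span without xml:lang inherits the enclosing language.
            lang = dict(attrs).get("xml:lang") or self.lang_stack[-1]
            self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        if tag == "span" and len(self.lang_stack) > 1:
            self.lang_stack.pop()

    def handle_data(self, data):
        if data:
            self.runs.append((self.lang_stack[-1], data))


def split_runs(message, default_lang):
    parser = ScriptRunParser(default_lang)
    parser.feed(message)
    parser.close()
    return parser.runs


runs = split_runs(
    'Die Datei wurde auf den <span xml:lang="de-Latn">USB</span>-Stick kopiert.',
    "de-Latf",
)
for lang, text in runs:
    print(lang, repr(text))
```

A renderer could then pick a Fraktur font for the "de-Latf" runs and an 
ordinary Latin font for the "de-Latn" run containing the abbreviation.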


Gerrit Sangel

