[Fontconfig] Improving Latin font selection for CJK locales

Ed Trager ed.trager at gmail.com
Mon Jan 28 09:24:51 PST 2008


Hi, everyone,

Behdad notes:

> It could be easier if we could match on scripts instead of languages,
> but that's another issue.

... and I agree completely:

REQUIREMENT: EXPAND DEFINITION OF LOCALE TO INCLUDE OPTIONAL ISO-15924
SCRIPT CODE
=============================================================================

First of all, the notion of "locale" needs to be redefined as composed
of *3* elements instead of *2* elements.

Currently, locales are composed of just two elements:

       (1) A "language" code (ISO-639-1 or -2: "en", "ja", "zh", "th", etc.)
       (2) A "region" code ("US", "CA", "FR", "TW", "HK", "SG", etc.)

This concept is incomplete.  A THIRD ELEMENT, SCRIPT, NEEDS TO BE ADDED.
Using four-letter ISO-15924 codes
( http://unicode.org/iso15924/iso15924-codes.html ) is the obvious
answer:

       (3) A "script" code (ISO-15924: "arab", "cyrl", "hans"
           (simplified Chinese), "hant" (traditional Chinese), etc.)

Both "region" and "script" can be considered as "optional".  So we
could now enumerate locales such as:

 =>  "Fully Specified" locales with all three elements:

       az_AZ_latn
       az_AZ_cyrl
       az_IR_arab

       zh_HK_hans
       zh_HK_hant

 => Locales missing "region" would also be permissible.  I think this
variant would be extremely useful, and translators would perhaps favor
the generality it provides in many real-life applications:

       az_latn
       az_arab
       az_cyrl

       zh_hans
       zh_hant

 => Locales missing "script" are of course also permissible (this is the
current status quo).  Systems would have to have rules for the
"default" script:

       az_AZ  : defaults to "latn"  (Latin became official in
Azerbaijan in 1991, although uptake has apparently been slow)
       az_IR   : defaults to "arab"

       zh_HK  : defaults to "hant"
       zh_SG : defaults to "hans"

 => Locales missing both "region" and "script" are also permissible
(again, this does not differ from the current status quo):

       ja  : implies (defaults to) "ja_JP_jpan"
       th  : implies (defaults to) "th_TH_thai"
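
To make the four cases above concrete, here is a minimal Python sketch
of how such a three-element locale could be parsed and completed.  It
is only an illustration under my own assumptions: the names
parse_locale, resolve, DEFAULT_SCRIPT and DEFAULT_REGION are
hypothetical, the "lang_REGION_script" spelling is just the one used in
this mail, and the default tables contain only the examples listed
above.  Nothing here is existing Fontconfig, Pango, or CLDR behaviour.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical default tables, filled only with the examples from this mail.
DEFAULT_REGION = {"ja": "JP", "th": "TH"}
DEFAULT_SCRIPT = {
    ("az", "AZ"): "latn",
    ("az", "IR"): "arab",
    ("zh", "HK"): "hant",
    ("zh", "SG"): "hans",
    ("ja", None): "jpan",
    ("th", None): "thai",
}

class Locale(NamedTuple):
    language: str
    region: Optional[str]
    script: Optional[str]

# language (2-3 lowercase), optional REGION (2 uppercase),
# optional script (4 lowercase), separated by underscores.
_LOCALE_RE = re.compile(r"([a-z]{2,3})(?:_([A-Z]{2}))?(?:_([a-z]{4}))?")

def parse_locale(tag: str) -> Locale:
    """Split a tag such as 'az_AZ_latn', 'zh_hant', 'az_IR' or 'ja'."""
    m = _LOCALE_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a lang[_REGION][_script] tag: {tag!r}")
    return Locale(*m.groups())

def resolve(tag: str) -> Locale:
    """Fill in missing elements from the (assumed) default tables."""
    lang, region, script = parse_locale(tag)
    if region is None and script is None:
        # Bare language: "ja" implies "ja_JP_jpan".
        region = DEFAULT_REGION.get(lang)
    if script is None:
        # Prefer a region-specific default, then a language-wide one.
        script = DEFAULT_SCRIPT.get((lang, region),
                                    DEFAULT_SCRIPT.get((lang, None)))
    return Locale(lang, region, script)
```

Note that a tag like "zh_hant" is left without a region on purpose:
that is exactly the region-less variant I think translators would find
useful.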

The CLDR community is one obvious place for discussions about this,
and I apologize that I have not had the time to investigate how far
discussions on this topic have gotten in CLDR or other relevant
communities (like maybe Linux LSB folks?).

Adding a four-letter script code to the locale is the obvious remedy.
Perhaps the Pango and Fontconfig communities could take the lead in
creating the minor changes in infrastructure needed to support this
addition?

Let's return to Behdad's Japanese example for a minute.  Recall that
modern Japanese is, for all intents and purposes, really composed of
four scripts (Han, Katakana, Hiragana, Latin).  So, for a Japanese
locale, perhaps I really ought to be able to specify a different font
set for each and every one of those four scripts independently, if I so
desire.
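
As a rough illustration of what per-script font sets could mean, the
Python sketch below splits a Japanese string into script runs and looks
up a font list for each run.  Everything in it is an assumption made
for the example: the font family names are placeholders, script_of
guesses the script from the Unicode character name instead of using
real script property data, and neutral characters (spaces, digits,
punctuation) are simply glued onto the preceding run.  A real itemizer
such as Pango's is considerably more careful.

```python
import unicodedata

# Hypothetical per-script font sets for a Japanese locale.
# The family names are placeholders, not real fonts.
FONTS_JA = {
    "Han": ["ExampleMincho"],
    "Hiragana": ["ExampleKana"],
    "Katakana": ["ExampleKana"],
    "Latin": ["ExampleJapaneseFontWithGoodLatin"],
}

def script_of(ch: str) -> str:
    """Crude script guess based on the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    if name.startswith("HIRAGANA"):
        return "Hiragana"
    if name.startswith("KATAKANA"):
        return "Katakana"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"  # spaces, digits, punctuation, everything else

def split_runs(text: str) -> list:
    """Return a list of (script, substring) runs in text order."""
    runs = []
    for ch in text:
        script = script_of(ch)
        # Neutral characters extend the current run instead of breaking it.
        if runs and (script == "Common" or script == runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs

def fonts_for(text: str) -> list:
    """Pair every run with its font list (empty if no set is configured)."""
    return [(run, FONTS_JA.get(script, [])) for script, run in split_runs(text)]
```

For a string mixing kanji, kana and an English word, fonts_for returns
one entry per run, so the Latin run can get a different font set from
the surrounding Japanese text.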

Best Wishes -- Ed Trager


On Jan 27, 2008 11:38 PM, Behdad Esfahbod <behdad at behdad.org> wrote:
> Hi,
>
> This keeps coming up again and again: CJK users want Pango to choose
> Latin fonts differently under a CJK locale than it does under a non-CJK
> locale.
>
> Making that work is currently impossible in Pango+fontconfig.  The
> reason being that Pango passes a Latin "lang" to fontconfig for Latin
> runs, and fontconfig and font configurations have no way to
> differentiate the Latin in CJK locale from Latin in Latin locale cases.
>
> I'd like to propose adding a new element named "locale" that holds the
> original locale language.  Fontconfig needs not know about this at all
> except that filling it in in FcDefaultSubstitute() like it does for
> "lang".  Then users can write configuration that is sensitive to locale.
>
> Pango then can pass PangoContext language as "locale".  PangoContext
> language defaults to the locale, so this is all consistent.
>
> I can do this all in Pango only, but given that I want to encourage CJK
> font developer/packagers to write such configuration for their fonts,
> would be nice to have it upstreamed.
>
> As an example, one would write:
>
>         <match>
>                 <test name="lang">
>                         <string>en</string>
>                 </test>
>                 <test name="locale">
>                         <string>ja</string>
>                 </test>
>                 <edit name="family" mode="prepend" binding="same">
>                         <string>SomeJapaneseFontWithGoodLatin</string>
>                 </edit>
>         </match>
>
> It could be easier if we could match on scripts instead of languages,
> but that's another issue.
>
> Keith, what do you think?
>
> --
> behdad
> http://behdad.org/
>
> "Those who would give up Essential Liberty to purchase a little
>  Temporary Safety, deserve neither Liberty nor Safety."
>         -- Benjamin Franklin, 1759
>
> _______________________________________________
> Fontconfig mailing list
> Fontconfig at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/fontconfig
>

