[Fontconfig] ISO 15924 font selection

Gerrit Sangel z0idberg at gmx.de
Tue Dec 4 02:06:51 PST 2007


On Tuesday, 4 December 2007, you wrote:

> I also had Unicode scripts in mind, instead of ISO 15924, and I had a
> user-readable version in mind, like "arabic" and "latin".  Pango already
> has that information and it can be deduced from standard Unicode script
> names.  Doesn't mean it can't be ISO 15924 names though, but the mapping
> is not one to one, and I really don't understand why Fraktur is a
> different script than Latin in there.  I don't think this feature if
> added should be used for things like Fraktur.

Well, in my opinion the case of Fraktur is more or less the same as that of
Han unification. Apart from the long s, Fraktur shares its code points with
normal Latin, so the script can’t really be inferred from the code points. The
difference may only be in the glyphs, but the appearance is (imho) far too
different to call it just another style like serif or sans serif. The two are
used differently as well: foreign words are usually not set in Fraktur, so the
script information sometimes has to change within a sentence.

Doing this via CSS would work, but it is not really flexible. First of all, as
far as I know, there is no real “standard” Fraktur font available, so the web
designer could not just specify one particular font. He would have to list
several fonts in CSS, which I think would be a bit too much work. If he did it
via a script tag instead, he could simply write
<p xml:lang="de-Latf">Das iſt Fraktur <span xml:lang="de">und das 
Antiqua</span></p>
and leave the choice of font to the user.

But what are the benefits of Unicode scripts? Is there a list available? As 
the Unicode website states, the Unicode Consortium was appointed to manage 
ISO 15924. So I would have guessed that this is the “official” script list 
for Unicode.

> > > In my opinion, it is much more flexible than defining fonts according
> > > to a specific region (e.g. TW or CN). In some cases, it is even
> > > necessary, because the region does not differ.
> >
> > Yeah, conflicts among multiple scripts used for the same language in the
> > same territory do exist, which fontconfig doesn't handle well at all.
>
> If we add script tags in addition to language tags, orthographies then can
> be extended to tell what script is used in them.  Matching can skip if
> script tags don't match.

Well, but why would script tags not match? I would guess (I’m no linguist)
that you can write every language in every script, even though the result may
not be quite correct most of the time. So I don’t think there should be a
limitation.
I think the main purpose of script tags is to allow a script to be specified
for a language which is usually not written in that script.

As far as I know, the different ISO standards would not conflict either. ISO
639 codes are written entirely in lowercase, ISO 3166 codes entirely in
uppercase, and ISO 15924 codes have the first letter in uppercase and the
other three in lowercase.
As for the ordering, RFC 4646 puts the script between the language and the
region, so language-script-region.
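
The case conventions above are enough to tell the three kinds of subtag apart,
whatever order they appear in. A minimal Python sketch of that idea follows.
The function name and the patterns are my own assumptions for illustration,
not anything fontconfig provides, and it relies on the tag being written in
the conventional case (real tags are case-insensitive).

```python
import re

# Shapes follow the conventions named above: ISO 639 codes are lowercase,
# ISO 3166 codes are two uppercase letters, and ISO 15924 codes are four
# letters with only the first one capitalised.
SHAPES = {
    "language": re.compile(r"[a-z]{2,3}"),
    "region": re.compile(r"[A-Z]{2}"),
    "script": re.compile(r"[A-Z][a-z]{3}"),
}

def classify_subtags(tag):
    """Map 'language', 'script' and 'region' to the subtags found in tag."""
    found = {}
    for subtag in tag.split("-"):
        for kind, shape in SHAPES.items():
            if shape.fullmatch(subtag):
                # Keep the first subtag of each kind; ignore later duplicates.
                found.setdefault(kind, subtag)
                break
    return found

print(classify_subtags("de-Latf"))
print(classify_subtags("zh-Hant-TW"))
```

Because the lookup goes by shape rather than position, "zh-Hant-TW" and
"zh-TW-Hant" yield the same three components.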

> > > Do I understand this correctly, that the user can specify a font in the
> > > config file according to a specific language?
> >
> > You can match on the language and prepend a family name to make that
> > preferred.
> >
> > > I see this in Firefox (even though it does not seem to use fontconfig,
> > > but I guess an addon could be written to solve it)
> >
> > firefox does use fontconfig, although the language-based selection is
> > internal, not based on modifying fontconfig matching rules.
> >
> > > So I think a possible way would be to define a general rule for a
> > > language (according to ISO-639) or a script (ISO 15924) at first and
> > > then a specific rule for a language or script which would override the
> > > general rule.
> >
> > The pattern matching and editing rules should be able to handle this
> > without change, except for the addition of ISO 15924 script codes to the
> > existing set of language/territory pairs.
>
> Another piece of information that can improve language matching is to
> use ISO 639-3 macrolanguage information.  That can tell fontconfig, for
> example, that Dari is a Persian language:
>
>   http://bugzilla.gnome.org/show_bug.cgi?id=470907

Well, but this is for *languages*, not *scripts*. Another example might be
this:
I have a Japanese text that I want to write in the old character forms that
were in use before the simplification after WW2. Although some old characters
are encoded separately, others were unified with the new forms because the
differences are only minor and stylistic. I would have to use a higher-level
protocol to say that these should be old characters. But the language itself
does not differ. ISO 15924 has several tags for Han, namely Hani (Han
ideographs), Hans (simplified Han) and Hant (traditional Han). So I would tag
such text as “ja-Hant” and the browser could select a font which has the old
glyphs.
In this case you could not distinguish by language or by region, because both
are the same as for modern Japanese. *Only* the script differs.
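
To make the idea of a general rule plus a more specific override concrete,
here is a small Python sketch. It is only an illustration under my own
assumptions: the rule table, the function name and the font family names are
hypothetical placeholders, and this is not how fontconfig matches patterns.

```python
# Hypothetical rule table: keys are language tags, values are font family
# names. The family names are placeholders, not real fonts.
FONT_RULES = {
    "ja": "Example Mincho",
    "ja-Hant": "Example Old-Form Mincho",
    "de": "Example Antiqua",
    "de-Latf": "Example Fraktur",
}

def pick_font(tag, rules=FONT_RULES, default="Example Fallback"):
    """Try the full tag first, then drop trailing subtags one at a time."""
    parts = tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in rules:
            return rules[candidate]
        parts.pop()  # fall back to the next more general tag
    return default

print(pick_font("ja-Hant"))
print(pick_font("ja-JP"))
print(pick_font("fr"))
```

A text tagged “ja-Hant” gets the specific rule, while “ja-JP” has no rule of
its own and falls back to the general “ja” rule.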

So I would really argue for ISO 15924. In my opinion, this is the best
solution, because
a) an established standard exists
b) it is consistent with ISO 639 and 3166
c) it is managed by the Unicode Consortium
d) why reinvent the wheel?

And I don’t think names like “arabic” or “latin” are that useful. First,
because they are explicitly aimed at English speakers, which I don’t like
much, especially in this case. Second, because the ISO 15924 tags are derived
from more or less user-readable names, and with four letters they are still
quite readable. Arabic is Arab and Latin is Latn.
Third, if the web designer already has to look up which language/country code
he needs, I don’t think looking up the script code as well would be very
exhausting.
http://www.unicode.org/iso15924/iso15924-en.html


Gerrit

