[HarfBuzz] hb_script_from_iso15924_tag

Sun Mar 9 21:06:23 PDT 2014

A couple of minor points first:

- This should map both Hans and Hant to HB_SCRIPT_HAN.

- The Georgian script in Unicode unifies various variants including
Khutsuri, so Geok needs to be mapped to HB_SCRIPT_GEORGIAN (see
Property_Value_Alias column of
http://www.unicode.org/iso15924/iso15924-codes.html).

A more fundamental point is that ISO 15924 has several script tags that
represent combinations of hb_script_t values and so cannot be handled by
this API (hb_script_t includes only script tags that can be values of the
Unicode Script property), specifically:

Hrkt = Hira + Kata
Jpan = Hani + Hira + Kata
Kore = Hang + Hani
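
To make the mappings above concrete, here is a rough sketch of a lookup
table covering the variant codes (Hans, Hant, Geok) and the three combined
codes. It is only an illustration: expand_iso15924 and script_expansion_t
are hypothetical names, not HarfBuzz API, scripts are represented as plain
four-letter strings rather than hb_script_t, and the input is assumed to be
in canonical title case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of HarfBuzz: expand an ISO 15924 code into
 * the Unicode Script property values it covers. Scripts are plain
 * four-letter strings here to keep the sketch self-contained. */
typedef struct {
    const char *iso;        /* ISO 15924 code as it appears in a tag */
    const char *scripts[3]; /* Unicode scripts it stands for, NULL-padded */
} script_expansion_t;

static const script_expansion_t expansions[] = {
    { "Hans", { "Hani", NULL,   NULL   } },
    { "Hant", { "Hani", NULL,   NULL   } },
    { "Geok", { "Geor", NULL,   NULL   } },
    { "Hrkt", { "Hira", "Kata", NULL   } },
    { "Jpan", { "Hani", "Hira", "Kata" } },
    { "Kore", { "Hang", "Hani", NULL   } },
};

/* Writes up to 3 script codes into out[] and returns how many were written.
 * Codes not in the table are assumed to name a single Unicode script and
 * are passed through unchanged. */
static size_t expand_iso15924(const char *iso, const char *out[3])
{
    size_t i, n;

    for (i = 0; i < sizeof expansions / sizeof expansions[0]; i++) {
        if (strcmp(iso, expansions[i].iso) == 0) {
            for (n = 0; n < 3 && expansions[i].scripts[n] != NULL; n++)
                out[n] = expansions[i].scripts[n];
            return n;
        }
    }
    out[0] = iso;
    return 1;
}
```

The point of returning a count plus an array is that a single-valued
return type like hb_script_t has no way to express the last three rows.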

BCP 47 is also relevant here.  With BCP 47, a common (I guess by far the
most common) place to find an ISO 15924 script tag will be as part of an
IETF language tag. BCP 47 and the IETF language subtag registry also have a
Suppress-Script feature (http://tools.ietf.org/html/bcp47#section-3.1.9).
This is a field in the record for a primary language subtag that
"indicates a script used to write the overwhelming majority of documents
for the given language".  So for example, "en" has Suppress-Script: "Latn"
and "th" has Suppress-Script: "Thai".  BCP 47 says that such a script
should not be included in the language tag.  To put it the other way round,
specifying a language tag of "en" is equivalent to specifying "en-Latn".
Suppress-Script relies on the combined script tags; for example, "ja" has
Suppress-Script: "Jpan".

Given the above, I think it would be more useful to have an API that would
turn an hb_language_t into a list of zero or more (up to 3)
hb_script_t's; this would take into account any explicit script subtag
included in the language tag as well as any script implied by the
Suppress-Script field.  This would be useful for allowing script
itemization to take into account any explicit language tagging (I guess
this is what the Microsoft DirectWrite docs are talking about when they
mention "language guided" script itemization).
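
A very rough sketch of what I have in mind, in plain C. Everything here is
hypothetical: scripts_for_language_tag is not an existing HarfBuzz function,
it takes a BCP 47 string rather than an hb_language_t, it returns four-letter
codes rather than hb_script_t, and the Suppress-Script table holds only the
three entries quoted above instead of the real registry. Tag parsing is
simplified too: it assumes canonical casing and treats the first four-letter
alphabetic subtag before any singleton as the script.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCRIPTS 3

/* Expand a four-letter script code into out[]; returns the count. */
static size_t expand_script(const char *code, char out[MAX_SCRIPTS][5])
{
    /* Column 0 is the ISO 15924 code, the rest are Unicode scripts. */
    static const char *const table[][4] = {
        { "Hans", "Hani", NULL,   NULL   },
        { "Hant", "Hani", NULL,   NULL   },
        { "Geok", "Geor", NULL,   NULL   },
        { "Hrkt", "Hira", "Kata", NULL   },
        { "Jpan", "Hani", "Hira", "Kata" },
        { "Kore", "Hang", "Hani", NULL   },
    };
    size_t i, n;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strncmp(code, table[i][0], 4) == 0) {
            for (n = 0; n < MAX_SCRIPTS && table[i][n + 1] != NULL; n++)
                strcpy(out[n], table[i][n + 1]);
            return n;
        }
    }
    memcpy(out[0], code, 4);   /* anything else: a single script */
    out[0][4] = '\0';
    return 1;
}

/* Illustrative excerpt of Suppress-Script data (not the full registry). */
static const char *suppress_script(const char *lang, size_t len)
{
    static const char *const table[][2] = {
        { "en", "Latn" }, { "th", "Thai" }, { "ja", "Jpan" },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen(table[i][0]) == len && strncmp(lang, table[i][0], len) == 0)
            return table[i][1];
    return NULL;
}

static int is_script_subtag(const char *s, size_t len)
{
    size_t i;

    if (len != 4)
        return 0;
    for (i = 0; i < len; i++)
        if (!isalpha((unsigned char) s[i]))
            return 0;
    return 1;
}

/* Returns 0..MAX_SCRIPTS scripts for a language tag: an explicit script
 * subtag wins, otherwise the Suppress-Script of the primary language. */
size_t scripts_for_language_tag(const char *tag, char out[MAX_SCRIPTS][5])
{
    size_t primary_len = strcspn(tag, "-");
    const char *p = tag + primary_len;
    const char *implied;

    while (*p == '-') {
        size_t len;

        p++;
        len = strcspn(p, "-");
        if (len <= 1)
            break;   /* singleton: extension or private use follows */
        if (is_script_subtag(p, len))
            return expand_script(p, out);
        p += len;
    }
    implied = suppress_script(tag, primary_len);
    return implied != NULL ? expand_script(implied, out) : 0;
}
```

With this, "ja" would give Hani, Hira and Kata, "sr-Cyrl" would give Cyrl,
and a tag with neither an explicit script nor a table entry would give an
empty list.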

James