[HarfBuzz] Itemising Japanese scripts
Adam Twardoch (List)
list.adam at twardoch.com
Sun Apr 24 15:36:22 UTC 2016
I think they should always be merged. They were encoded as three scripts in Unicode in the early days, when it was not at all obvious how the script property was to be used. Certainly the notion of script itemisation in OpenType came much later, and the fact that OpenType unifies them under one "kana" tag clearly indicates the preferred usage in an OT context.
Hiragana is often used together with kanji within the same word. For example, the word 食べました (tabemashita, “ate”) has one root kanji followed by 4 hiragana to indicate the inflection.
The way kanji and kana are used is actually conceptually much closer to how uppercase and lowercase Latin letters are used in many European languages than to mixing, say, Arabic and Latin scripts. Any “normal” Japanese text always consists of kanji mixed with hiragana, with occasional katakana mixed in.
In case of the three Japanese scripts, they definitely should be merged within runs.
Unfortunately, Microsoft has not produced a script-specific spec for CJK that would explain this.
But since there is only one "kana" OT tag, I strongly suspect that Uniscribe/DW merges the CJK scripts.
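To make the run-count difference concrete, here is a minimal Python sketch of itemisation with and without merging. It is only an illustration, not how HarfBuzz or Uniscribe do it: the three code point ranges are a deliberately simplified stand-in for the full Unicode Script property, and the names `script_of`, `itemise` and the merged label "Japanese" are made up for this example.

```python
# Simplified, hypothetical script lookup. A real itemiser would use the full
# Unicode Script property; these ranges cover only the basic letters.
RANGES = [
    (0x3041, 0x3096, "Hiragana"),
    (0x30A1, 0x30FA, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),  # CJK Unified Ideographs block only
]
JAPANESE = {"Hiragana", "Katakana", "Han"}


def script_of(ch):
    """Return the (simplified) script name of one character."""
    cp = ord(ch)
    for first, last, name in RANGES:
        if first <= cp <= last:
            return name
    return "Other"


def itemise(text, merge_japanese=False):
    """Split text into (script, substring) runs.

    With merge_japanese=True, Han, Hiragana and Katakana all map to one
    assumed label, "Japanese", so they end up in the same run.
    """
    runs = []
    for ch in text:
        script = script_of(ch)
        if merge_japanese and script in JAPANESE:
            script = "Japanese"
        if runs and runs[-1][0] == script:
            # Same script as the previous character: extend the current run.
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


sample = "私はパンを食べました"
print(len(itemise(sample)))                       # separate scripts
print(len(itemise(sample, merge_japanese=True)))  # merged
```

With this short sample the unmerged itemiser produces six runs and the merged one produces a single run, which is the same effect as the ~2000 runs reported for ~7000 characters. The sketch does not address which OpenType script tag a merged run should be shaped with.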
> On 24.04.2016, at 16:43, Khaled Hosny <khaledhosny at eglug.org> wrote:
> I’m wondering what is the best practice of itemising Japanese scripts
> (Han, Hiragana, Katakana), should they be merged somehow or is it better
> to keep them in separate runs?
> I’m currently treating them as separate scripts so they end up in
> separate runs, but in the ~7000 characters of Japanese text I’m testing
> with I get ~2000 runs, if it were some English or Arabic text it would
> be just 1 run so it seems quite inefficient (though I didn’t make any