[HarfBuzz] Question regarding the use of HB_SCRIPT_KATAKANA for "regular" Japanese

Ariel Malka ariel at chronotext.org
Sun Dec 22 15:17:29 PST 2013


> As it happens, those three scripts are all considered "simple", so the shaping
> logic in HarfBuzz is the same for all three.

Good to know. For the record, the recent HarfBuzz-flavored Android OS has a
function for checking whether a script is complex: http://goo.gl/KL1KUi

> Where it does make a difference
> is if the font has ligatures, kerning, etc for those.  OpenType organizes
> those features by script, and if you request the wrong script you will miss
> out on the features.

Makes sense to me for Hebrew, Arabic, Thai, etc., but I was a bit surprised
to find out that LATN was also a complex script.

So, for instance, if I shaped some text containing Hebrew and English
solely with the HEBR script, I would probably lose kerning and ffi-like
ligatures for the English part (which is what I'm currently doing in
my "simple" BIDI implementation...)
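To make the "features are organized by script" point concrete, here is a minimal Python model (not HarfBuzz code) of an OpenType font whose features are keyed by script tag. The font table is hypothetical: it assumes a font that registers `kern` and `liga` under `latn` but only `kern` under `hebr`, and it simplifies the lookup to "requested tag, else DFLT". Shaping the English run with `hebr` then loses the ligatures.

```python
# Hypothetical font: features registered per OpenType script tag.
# Real fonts keep this in the GSUB/GPOS ScriptList; this dict only models it.
FONT_FEATURES = {
    "latn": {"kern", "liga"},
    "hebr": {"kern"},
    "DFLT": set(),
}

def features_for(font, script_tag):
    """Return the features a shaper would find for script_tag.

    Simplified: an unknown tag falls back to the DFLT entry; real
    shaping engines may try additional fallbacks.
    """
    return font.get(script_tag, font.get("DFLT", set()))

def shape_runs(font, runs):
    """Map each (text, script_tag) run to the sorted features applied to it."""
    return [(text, sorted(features_for(font, tag))) for text, tag in runs]

# Shaping everything as Hebrew (the "simple BIDI" approach):
all_hebr = shape_runs(FONT_FEATURES, [("שלום", "hebr"), ("office", "hebr")])
# Shaping each run with its own script:
per_run = shape_runs(FONT_FEATURES, [("שלום", "hebr"), ("office", "latn")])

print(all_hebr)  # [('שלום', ['kern']), ('office', ['kern'])]
print(per_run)   # [('שלום', ['kern']), ('office', ['kern', 'liga'])]
```

With a real font, the same effect shows up when the script set on the HarfBuzz buffer does not match the text of the run being shaped.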

> How you do font selection and what script you pass to HarfBuzz are two
> completely separate issues.  Font fallback stack should be per-language.

I understand that the best approach will always be to make decisions based
on "language" rather than solely on "script", but it creates a problem:

Say you work on an API for Unicode text rendering: you can't promise your
users that they can render arbitrary text without providing language
context per span.
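A per-language fallback stack, as Behdad suggests, could look roughly like this minimal Python sketch. The stacks are assumptions for illustration (built from the two fonts mentioned in this thread). The coverage callback stands in for a real cmap lookup and is hypothetical, as is the `default_lang` parameter: it is exactly the guess an API is forced to make when a span carries no language context.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical per-language fallback stacks (font names from this thread).
FALLBACK_STACKS: Dict[str, List[str]] = {
    "ja": ["MTLmr3m.ttf", "DroidSansFallback.ttf"],
    "zh-Hans": ["DroidSansFallback.ttf"],
}

def pick_font(text: str, lang: Optional[str],
              has_glyph: Callable[[str, str], bool],
              default_lang: str = "zh-Hans") -> Optional[str]:
    """Return the first font in the language's stack that covers every char.

    When lang is None, default_lang is used instead: the API has to guess.
    """
    stack = FALLBACK_STACKS.get(lang or default_lang, [])
    for font in stack:
        if all(has_glyph(font, ch) for ch in text):
            return font
    return None

# Hypothetical coverage check: pretend every font covers every character.
def covers_everything(font: str, ch: str) -> bool:
    return True

print(pick_font("直径", "ja", covers_everything))       # MTLmr3m.ttf
print(pick_font("直径", "zh-Hans", covers_everything))  # DroidSansFallback.ttf
print(pick_font("直径", None, covers_everything))       # DroidSansFallback.ttf
```

The same two characters end up in different fonts depending only on the language tag, which is why script detection alone cannot make this choice.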

Or, to come back to the origin of my message: solutions like ICU's
"scrptrun", which do script detection, are not appropriate (because
they won't help you find the right font, due to the lack of language
context...)
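Roughly, scrptrun-style detection splits text into runs of a single script, letting "Common" characters such as punctuation join the current run. The Python sketch below is a simplified stand-in, not ICU's algorithm: it classifies by Unicode block ranges, which only approximates the real Unicode Script property, and it ignores scrptrun's handling of paired punctuation. Applied to 直径 it yields a single Han run, which says nothing about whether the text is Japanese or Chinese.

```python
# Simplified script classification by Unicode block; only an approximation
# of the Unicode Script property that ICU's scrptrun actually uses.
RANGES = [
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),  # CJK Unified Ideographs only
]

def script_of(ch: str) -> str:
    cp = ord(ch)
    for lo, hi, name in RANGES:
        if lo <= cp <= hi:
            return name
    return "Common"  # punctuation, spaces and anything not listed above

def script_runs(text: str):
    """Split text into (script, substring) runs; Common joins the current run."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script == "Common" or script == runs[-1][0]):
            runs[-1][1] += ch
        elif runs and runs[-1][0] == "Common":
            # A leading Common run adopts the first real script it meets.
            runs[-1] = [script, runs[-1][1] + ch]
        else:
            runs.append([script, ch])
    return [(s, t) for s, t in runs]

print(script_runs("直径"))  # [('Han', '直径')]
print(script_runs("ユニコードは、すべての文字に固有の番号を付与します"))
```

The second call yields alternating Katakana, Hiragana and Han runs, matching the "mix of scripts" described in the original question.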

I guess the problem is even more general, e.g. with UTF-8-encoded HTML pages
rendered in modern browsers, as demonstrated by the creator of
liblinebreak: http://wyw.dcweb.cn/lang_utf8.htm

On Sun, Dec 22, 2013 at 10:47 PM, Behdad Esfahbod <behdad at behdad.org> wrote:

> On 13-12-22 10:10 AM, Ariel Malka wrote:
> > I'm trying to render "regular" (i.e. modern, horizontal) Japanese with
> > HarfBuzz.
> >
> > So far, I have been using HB_SCRIPT_KATAKANA and it looks similar to what
> > is rendered by browsers.
> >
> > But after examining other rendering solutions, I can see that "automatic
> > script detection" often takes place.
> >
> > For instance, the Mapnik project uses ICU's "scrptrun", which, given the
> > following sentence:
> >
> > ユニコードは、すべての文字に固有の番号を付与します
> >
> > would detect a mix of Katakana, Hiragana and Han scripts.
> >
> > But, for instance, it would not change anything if I rendered the sentence
> > by mixing the 3 different scripts (i.e. instead of using only
> > HB_SCRIPT_KATAKANA), would it?
> >
> > Or are there situations where it would make a difference?
>
> As it happens, those three scripts are all considered "simple", so the shaping
> logic in HarfBuzz is the same for all three.  Where it does make a difference
> is if the font has ligatures, kerning, etc for those.  OpenType organizes
> those features by script, and if you request the wrong script you will miss
> out on the features.
>
>
> > I'm asking because I suspect a catch-22 situation here. For example, the
> > word "diameter" in Japanese is 直径, which, given to "scrptrun", would be
> > detected as Han script.
> >
> > As far as I understand, this could be a problem on systems where
> > DroidSansFallback.ttf is used, because the word would look like Simplified
> > Chinese.
> >
> > Now, if we were using MTLmr3m.ttf, which is preferred for Japanese, the word
> > would have been rendered as intended.
>
> How you do font selection and what script you pass to HarfBuzz are two
> completely separate issues.  Font fallback stack should be per-language.
>
> > Reference: https://code.google.com/p/chromium/issues/detail?id=183830
> >
> > Any feedback would be appreciated. Note that the wisdom accumulated here will
> > be translated into tangible info and code samples (see
> > https://github.com/arielm/Unicode)
> >
> > Thanks!
> > Ariel
> >
> >
> > _______________________________________________
> > HarfBuzz mailing list
> > HarfBuzz at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/harfbuzz
> >
>
> --
> behdad
> http://behdad.org/
>