[HarfBuzz] hangul shaper patches

Dohyun Kim nomosnomos at gmail.com
Wed Jan 22 19:39:17 PST 2014


I've just found that the Jieubsida fonts [1] from the Tsukurimashou Font
Project [2] do not work well with the current Hangul shaper.

~$ hb-unicode-encode AC00 | hb-shape --script=hang JieubsidaBatang.otf
[uni1100=0+0|uni1161=0+833]

The expected output, obtained here by shaping as Latin script with the
jamo features enabled by hand, is:

~$ hb-unicode-encode AC00 | hb-shape --script=latn --features=ljmo,vjmo JieubsidaBatang.otf
[uniAC00=0+833]

The reason seems to be that the Hangul shaper currently applies the *jmo
features too early. The author of the Jieubsida fonts intended the *jmo
features to be applied after the ccmp feature, and ordered the GSUB
lookups accordingly. The Hangul shaper, however, applies the *jmo
features before everything else.
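
For reference, the two glyphs in the first output correspond to the
canonical decomposition of U+AC00 into its leading consonant and vowel
jamo, which is presumably what the shaper produces before it applies the
*jmo features. Below is a minimal sketch of that arithmetic in C,
following the Hangul syllable decomposition algorithm from the Unicode
Standard. The name hangul_decompose is made up for illustration; it is
not a HarfBuzz function.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable algorithm. */
enum {
  S_BASE  = 0xAC00, /* first precomposed syllable */
  L_BASE  = 0x1100, /* first leading consonant (choseong) */
  V_BASE  = 0x1161, /* first vowel (jungseong) */
  T_BASE  = 0x11A7, /* one before the first trailing consonant (jongseong) */
  L_COUNT = 19,
  V_COUNT = 21,
  T_COUNT = 28,     /* 27 trailing consonants plus "no trailing consonant" */
  N_COUNT = V_COUNT * T_COUNT,
  S_COUNT = L_COUNT * N_COUNT
};

/* Hypothetical helper, not part of HarfBuzz.
 * Writes the conjoining jamo for syllable s into out and returns how many
 * were written (2 for an LV syllable, 3 for an LVT syllable), or 0 if s is
 * not a precomposed Hangul syllable. */
size_t
hangul_decompose (uint32_t s, uint32_t out[3])
{
  if (s < S_BASE || s >= S_BASE + S_COUNT)
    return 0;

  uint32_t s_index = s - S_BASE;
  out[0] = L_BASE + s_index / N_COUNT;
  out[1] = V_BASE + (s_index % N_COUNT) / T_COUNT;

  uint32_t t_index = s_index % T_COUNT;
  if (t_index == 0)
    return 2; /* no trailing consonant */

  out[2] = T_BASE + t_index;
  return 3;
}
```

For U+AC00 this gives U+1100 U+1161, i.e. the uni1100 and uni1161 glyphs
in the output above; for U+AC01 it gives U+1100 U+1161 U+11A8.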

I am curious what the output is on a Windows 8 machine; I do not have
access to one at the moment.

[1] http://sourceforge.jp/frs/redir.php?m=iij&f=%2Ftsukurimashou%2F59405%2Fjieubsida-otf-0.8.zip
[2] http://sourceforge.jp/projects/tsukurimashou/



2014/1/22 Behdad Esfahbod <behdad at behdad.org>:
> Thanks Jonathan.  I've merged these.
>
> A few points:
>
> /* Same order as the feature array below */
> enum {
>   NONE,
>
>   LJMO,
>   VJMO,
>   TJMO,
>
>   FIRST_HANGUL_FEATURE = LJMO,
>   HANGUL_FEATURE_COUNT = TJMO + 1
> };
>
> Do you really need the NONE?  I don't see where / how that's used.
>
> I just want to note that by applying the jamo features only to one character
> at a time, we disallow contextual rules, but I guess that's what Uniscribe
> does also?
>
> I'm a bit uncomfortable that we are moving marks BEFORE normalization.  But
> then again, we are turning as much of that piece off as possible.  Perhaps I
> should turn more of it off.
>
> Anyway, looks like everyone's happy with this, so am I.  Thanks again.
>
> Cheers,
> behdad
>
> On 14-01-19 08:30 PM, Jonathan Kew wrote:
>> Hi Behdad,
>>
>> I'm attaching a series of patches for improvements to the Hangul shaper. These
>> provide support for Old Hangul sequences that do not have a precomposed
>> Unicode form, and handle the tone-mark reordering.
>>
>> With these patches, we exactly match uniscribe on the wikipedia test corpus
>> using malgun.ttf, except for (a) cases where there's a character that's not
>> supported in the font, so uniscribe gives .notdef but harfbuzz finds a
>> compatibility fallback, and (b) a handful of words where there's an <LV, T>
>> sequence that uniscribe doesn't support (it has no corresponding LVT
>> syllable), but we handle by decomposing to <L, V, T> and applying jamo features.
>>
>> JK
>
> --
> behdad
> http://behdad.org/



-- 
Dohyun Kim
Seoul, Republic of Korea

