[HarfBuzz] Hangul GSUB features
Jonathan Kew
jfkthame at googlemail.com
Sat Jan 25 11:17:47 PST 2014
On 25/1/14 17:36, mskala at ansuz.sooke.bc.ca wrote:
>>> * Conditional on some assessment of the structure of the syllable
>>> (perhaps the existence of a precomposed glyph?) the *jmo features may
>>> be applied - presumably to the output of ccmp, if it was applied.
>>
>> Yes - remembering that the decision as to which *jmo feature, if any, applies
>> to a given glyph was made *before* ccmp, and knows nothing about any changes
>> that happened there.
>
> What happens to these decisions when ccmp makes substitutions? If we have
> a single glyph L tagged for ljmo and ccmp replaces it with a single glyph,
> is the new glyph also tagged for ljmo?
Yes.
> If we have something like L tagged
> for ljmo followed by LV not tagged, and ccmp replaces the pair of them
> with a single LLV glyph, will the LLV glyph be tagged?
Yes (at least, I think that's right - it'd be worth double-checking).
However, note that if you have, say, LV (not tagged for any *jmo
feature) followed by T (tagged tjmo) and replace the pair with LVT, I
don't think the resulting LVT will inherit the tjmo. When GSUB does a
many-to-one substitution, the result inherits the feature flags of the
first glyph in the input sequence, and the feature flags of the
subsequent glyph(s) are lost.
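The inheritance rules above can be sketched with a toy glyph buffer. This is an assumption-laden model written for illustration, not harfbuzz's actual buffer code. Each hypothetical `Slot` carries the feature flags the shaper assigned before GSUB ran. A one-to-one substitution keeps them, and a many-to-one (ligature) substitution keeps only the first component's flags:

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Toy model (assumption): a buffer slot is a glyph name plus the feature
# flags the shaper assigned before any GSUB lookup ran.
@dataclass(frozen=True)
class Slot:
    glyph: str
    features: FrozenSet[str]

def single(buf: List[Slot], i: int, new: str) -> List[Slot]:
    """One-to-one: the replacement glyph keeps the original glyph's flags."""
    return buf[:i] + [Slot(new, buf[i].features)] + buf[i + 1:]

def ligature(buf: List[Slot], start: int, count: int, lig: str) -> List[Slot]:
    """Many-to-one: the ligature inherits the flags of the FIRST component;
    the flags of the following components are dropped."""
    return buf[:start] + [Slot(lig, buf[start].features)] + buf[start + count:]

ljmo, untagged = frozenset({"ljmo"}), frozenset()
# L (ljmo) + LV (untagged) -> LLV: tagged, because L comes first.
print(ligature([Slot("L", ljmo), Slot("LV", untagged)], 0, 2, "LLV")[0].features)  # frozenset({'ljmo'})
# LV (untagged) + T (tjmo) -> LVT: the tjmo flag on T is lost.
print(ligature([Slot("LV", untagged), Slot("T", frozenset({"tjmo"}))], 0, 2, "LVT")[0].features)  # frozenset()
```

As the reply itself says for the L + LV case, this should be double-checked against the real implementation.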
> If we have
> something like a single LLL glyph tagged for ljmo (the shaper would do
> that, right?) and ccmp splits it into three glyphs L L L, which if any of
> the new glyphs will inherit the tagging status of the original?
All of them: a one-to-many substitution copies the feature flags of the
original glyph to each of its replacements.
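In the same toy model as before, with a hypothetical `Slot` type rather than harfbuzz's real data structures, a one-to-many substitution looks like this. Every glyph produced from the LLL glyph gets a copy of its ljmo flag:

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Toy model (assumption): glyph name plus the feature flags assigned before GSUB.
@dataclass(frozen=True)
class Slot:
    glyph: str
    features: FrozenSet[str]

def multiple(buf: List[Slot], i: int, replacements: List[str]) -> List[Slot]:
    """One-to-many: each replacement glyph gets a copy of the original's flags."""
    flags = buf[i].features
    return buf[:i] + [Slot(g, flags) for g in replacements] + buf[i + 1:]

out = multiple([Slot("LLL", frozenset({"ljmo"}))], 0, ["L", "L", "L"])
print([s.features == frozenset({"ljmo"}) for s in out])  # [True, True, True]
```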
With the current harfbuzz code, I think the two problems you're facing
in using *jmo in your font are that:
(a) precomposed characters (LV, LVT) do not get tagged for any *jmo
features, and if you decompose them with ccmp, the resulting glyphs
still aren't tagged for *jmo (unlike the case where the shaper
decomposes them); and
(b) sequences with multiple L, V and/or T jamos are not recognized as
matching the <L, V [,T]?> pattern, and so do not get tagged for *jmo. In
something like <L, L, L, V, V, V, T, T, T>, the only two glyphs that
would be tagged for *jmo features would be the adjacent <L, V> pair; all
the rest would be considered "not part of a valid syllable" and left
untagged.
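A toy model of the tagging in (b) may make it concrete. It is a simplification of the behaviour described above, not the harfbuzz Hangul shaper's code. It works on hypothetical jamo category strings ("L", "V", "T", plus "LV"/"LVT" for precomposed syllables) rather than real code points. It tags only the <L, V [,T]?> pattern, and it does not model the shaper's own decomposition of precomposed syllables:

```python
from typing import List, Optional

def tag_jamo(cats: List[str]) -> List[Optional[str]]:
    """Toy model (assumption): tag only <L, V [,T]?> runs with ljmo/vjmo/tjmo.

    cats holds jamo categories ("L", "V", "T"), or "LV"/"LVT" for
    precomposed syllables; anything not in a matching run stays None.
    """
    tags: List[Optional[str]] = [None] * len(cats)
    i = 0
    while i < len(cats):
        if cats[i] == "L" and i + 1 < len(cats) and cats[i + 1] == "V":
            tags[i], tags[i + 1] = "ljmo", "vjmo"
            if i + 2 < len(cats) and cats[i + 2] == "T":
                tags[i + 2] = "tjmo"
                i += 3
            else:
                i += 2
        else:
            i += 1  # not the start of a "valid" syllable: leave untagged
    return tags

print(tag_jamo(list("LLLVVVTTT")))
# [None, None, 'ljmo', 'vjmo', None, None, None, None, None]
print(tag_jamo(["LV", "T"]))  # [None, None]
```

In the first call, only the third L and the first V form a match. The T jamos are never directly preceded by a matched <L, V> pair, so they stay untagged.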
But if you ignore the *jmo features altogether, and do everything in a
series of ccmp lookups, I don't see why it shouldn't work as you intend.
>>> If I go this route, defining no *jmo tables, can I depend on ccmp and liga
>>> always being applied and always in that order?
>>
>> Currently, at least in harfbuzz, ccmp and liga (and the *jmo features, when
>> used) are all applied "together", with the order of lookups being their order
>
> What does applying them "together" mean? Is it just that nothing other
> than feature application is done in between applying features, or are
> they somehow simultaneous? In other words, does the output of each one
> become the input of the next, or are they all looking at the same input
> with the output somehow recombined?
What actually happens is more like the description in
http://www.microsoft.com/typography/otspec/chapter2.htm:
"After choosing which features to use, the client assembles all lookups
from the selected features. Multiple lookups may be needed to define the
data required for different substitution and positioning actions, as
well as to control the sequencing and effects of those actions.
To implement features, a client applies the lookups in the order the
lookup definitions occur in the LookupList. As a result, within the GSUB
or GPOS table, lookups from several different features may be
interleaved during text processing."
So for the L glyph in an <L, V, T> sequence, for example, the selected
features will include ljmo, as well as the "global" features ccmp and
liga (and others such as rlig, locl, etc.) We collect a list of all the
lookups from all these features, and apply those lookups in the order
they're defined in the font's LookupList, *not* in any predetermined
feature order.
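The lookup-collection step quoted from the spec can be sketched as follows. The feature-to-lookup map is a hypothetical font invented for illustration. The point is only that lookups are merged and then run in LookupList index order:

```python
from typing import Dict, Iterable, List

def merged_lookup_order(feature_map: Dict[str, List[int]],
                        selected: Iterable[str]) -> List[int]:
    """Collect the lookups of every selected feature and return them in
    LookupList (index) order, not in feature order."""
    indices = set()
    for tag in selected:
        indices.update(feature_map.get(tag, []))
    return sorted(indices)

# Hypothetical font: ccmp owns lookups 0 and 3, ljmo owns 1, liga owns 2.
feature_map = {"ccmp": [0, 3], "ljmo": [1], "liga": [2]}
print(merged_lookup_order(feature_map, ["ccmp", "ljmo", "liga"]))  # [0, 1, 2, 3]
print(merged_lookup_order(feature_map, ["ccmp", "liga"]))          # [0, 2, 3]
```

Here ccmp's lookup 3 runs after liga's lookup 2, which is the interleaving the spec passage describes. The second call stands for a glyph that is not tagged for ljmo.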
Some shapers - particularly the Indic one - do apply features in
separate passes, because (unfortunately) that's how Microsoft chose to
implement their Indic fonts and shaper, but we have not found this to be
necessary for Hangul, and would prefer to avoid it.
>
> If I have glyphs L V T, with features ljmo and vjmo run in that order
> (glyph L tagged for ljmo and glyph V tagged for vjmo), and I want ljmo to
> change L into L.alt and vjmo to change V into V.alt, should vjmo contain a
> rule like "sub L.alt V' T" or like "sub L V' T"?
As you'll see from the above, this depends on how you order the lookups
(rather than on a fixed feature order imposed by the shaper).
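To illustrate, here is a toy run over <L, V, T> with the glyph names from the question. Each lookup makes one left-to-right pass and touches only glyphs tagged for its feature. This is a simplification of real GSUB processing written for this example. Whether vjmo's context must say L.alt or L depends only on whether the ljmo lookup comes earlier in the list:

```python
from typing import Callable, List, Optional, Set, Tuple

Rule = Callable[[List[str], int], Optional[str]]

def run_lookups(glyphs: List[str], tags: List[Set[str]],
                lookup_list: List[Tuple[str, Rule]]) -> List[str]:
    """Toy model (assumption): apply lookups in LookupList order; each makes
    one left-to-right pass over glyphs tagged for its feature."""
    out = list(glyphs)
    for feature, rule in lookup_list:
        for i in range(len(out)):
            if feature in tags[i]:
                new = rule(out, i)
                if new is not None:
                    out[i] = new
    return out

def ljmo_rule(g: List[str], i: int) -> Optional[str]:
    # sub L by L.alt
    return "L.alt" if g[i] == "L" else None

def vjmo_rule(left: str) -> Rule:
    # sub <left> V' T by V.alt
    def rule(g: List[str], i: int) -> Optional[str]:
        if (g[i] == "V" and 0 < i < len(g) - 1
                and g[i - 1] == left and g[i + 1] == "T"):
            return "V.alt"
        return None
    return rule

glyphs, tags = ["L", "V", "T"], [{"ljmo"}, {"vjmo"}, {"tjmo"}]
print(run_lookups(glyphs, tags, [("ljmo", ljmo_rule), ("vjmo", vjmo_rule("L.alt"))]))
# ['L.alt', 'V.alt', 'T']
print(run_lookups(glyphs, tags, [("vjmo", vjmo_rule("L.alt")), ("ljmo", ljmo_rule)]))
# ['L.alt', 'V', 'T']  (vjmo ran first and never saw L.alt)
```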
>
> I thought that with multiple lookups in a single feature, substitution
> would still stop as soon as it found a match - so that the multiple
> lookups have the same effect as a single long lookup, with the advantages
> over really using a single long lookup being that using more than one
> allows sharing parts of tables among separate features, and splitting into
> more than one table allows representing runs of simpler rules in more
> concise table formats.
>
> But some quick experiments with FontForge suggest that in fact (at least
> in FontForge) it's as you imply: with multiple lookups in a feature, each
> one is applied to the output of the previous one. Thanks for bringing
> that to my attention! It will make things a lot easier for me.
Perhaps you were confusing this with the case of multiple *subtables*
within a single *lookup*. In this case, once a match occurs in one of
the subtables, the lookup is considered to have finished, and the
following subtables are not applied.
But multiple *lookups* within a single *feature* are definitely
supported and used.
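A deliberately tiny model of the subtable/lookup distinction follows. It uses hypothetical glyph names, with plain dicts standing in for single-substitution subtables, and ignores contextual and other subtable types:

```python
from typing import Dict, List

Subtable = Dict[str, str]   # toy single-substitution subtable: input glyph -> output glyph
Lookup = List[Subtable]

def apply_lookup(glyph: str, lookup: Lookup) -> str:
    # Subtables are tried in order; the first one that matches ends the lookup.
    for subtable in lookup:
        if glyph in subtable:
            return subtable[glyph]
    return glyph

def apply_lookups(glyph: str, lookups: List[Lookup]) -> str:
    # Each lookup sees the output of the previous lookup.
    for lookup in lookups:
        glyph = apply_lookup(glyph, lookup)
    return glyph

lookups = [
    [{"a": "b"}, {"a": "c"}],  # lookup 0: second subtable never fires for "a"
    [{"b": "d"}],              # lookup 1: acts on lookup 0's output
]
print(apply_lookups("a", lookups))  # d
```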
>
> Something else I hadn't realized, but have just now verified at least in
> the case of FontForge, was that the order of tables in the font can
> override the "ccmp must be applied first" rule. I thought that was
> advice for renderers, but apparently it's the font's responsibility to
> implement it by putting ccmp first in the file.
Yes - again, see above.
I have not tested whether Uniscribe behaves this way for Hangul, or
whether it runs the features separately (as seems to be implied by the
old documentation). Provided you design your lookups to be applied in
the documented ccmp/ljmo/vjmo/tjmo/liga order *and* arrange the lookups
this way in the font, it shouldn't matter whether the shapers run them
"all at once" according to the generic OpenType spec or in separate passes.
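A rough check of that last point, under stated assumptions: if each feature's lookups sit in the LookupList in the same relative order as ccmp/ljmo/vjmo/tjmo/liga, then running everything "all at once" in index order visits the lookups in the same sequence as one pass per feature. The sketch compares the two sequences for a hypothetical feature-to-lookup map. It ignores which glyphs each feature is applied to, and assumes no lookup is shared between features. A real map could be read from a font with fontTools by merging the FeatureTag and Feature.LookupListIndex of each GSUB FeatureRecord:

```python
from typing import Dict, List

FEATURE_ORDER = ["ccmp", "ljmo", "vjmo", "tjmo", "liga"]

def all_at_once(feature_map: Dict[str, List[int]]) -> List[int]:
    """Generic OpenType model: merge every feature's lookups, run in index order."""
    return sorted({i for tag in FEATURE_ORDER for i in feature_map.get(tag, [])})

def separate_passes(feature_map: Dict[str, List[int]]) -> List[int]:
    """Pass-per-feature model: each feature's lookups in turn, in FEATURE_ORDER."""
    return [i for tag in FEATURE_ORDER for i in sorted(feature_map.get(tag, []))]

def order_independent(feature_map: Dict[str, List[int]]) -> bool:
    # True when both shaping strategies would run the same lookup sequence.
    return all_at_once(feature_map) == separate_passes(feature_map)

good = {"ccmp": [0, 1], "ljmo": [2], "vjmo": [3], "tjmo": [4], "liga": [5]}
bad = {"ccmp": [0, 4], "ljmo": [1], "liga": [2]}  # a ccmp lookup placed after liga's
print(order_independent(good))  # True
print(order_independent(bad))   # False
```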
JK