[HarfBuzz] Hangul GSUB features

Sat Jan 25 09:36:31 PST 2014

On Sat, 25 Jan 2014, Jonathan Kew wrote:
> Because (4) and (5) are not canonically equivalent, they will not *function*
> as equivalents in general-purpose Unicode-based software, even when that
...
> And because these sequences are not equivalent, and will not be folded
> together during normalization or other Unicode-aware operations, I think we're
> actually doing users a *disservice* and hurting the reusability of data if we
> force them to display the same. This will mislead users into expecting
> interoperable behavior that will not actually work.

That's a philosophical question we're unlikely to settle here, but
fortunately, we don't need to.  I think that for my purposes, I want to at
least have the possibility of rendering such sequences available, even if
it ends up being a non-default compile-time option when the fonts are
being generated.

> >     * Conditional on some assessment of the structure of the syllable
> >       (perhaps the existence of a precomposed glyph?) the *jmo features may
> >       be applied - presumably to the output of ccmp, if it was applied.
>
> Yes - remembering that the decision as to which *jmo feature, if any, applies
> to a given glyph was made *before* ccmp, and knows nothing about any changes
> that happened there.

What happens to these decisions when ccmp makes substitutions?  If we have
a single glyph L tagged for ljmo and ccmp replaces it with a single glyph,
is the new glyph also tagged for ljmo?  If we have something like L tagged
for ljmo followed by LV not tagged, and ccmp replaces the pair of them
with a single LLV glyph, will the LLV glyph be tagged?  If we have
something like a single LLL glyph tagged for ljmo (the shaper would do
that, right?) and ccmp splits it into three glyphs L L L, which if any of
the new glyphs will inherit the tagging status of the original?

[liga]
> it's enabled by default, authors may turn it off (directly, or as a
> side-effect of other styling). You probably don't want your basic Hangul
> support to break when ligatures are disabled.

True, though the effect in the current architecture would be to disable
precomposed syllables and instead render all syllables by composing parts,
just as if no precomposed syllables exist.  That won't look very bad, and
if precomposed syllables are not ligatures, they are awfully similar in
nature to ligatures - especially in a handwriting-styled font.  Disabling
them when ligatures are disabled might actually be the right thing to do.
Someone who wanted to make the suggestion not to disable the feature a bit
stronger might put the substitutions in rlig instead of liga.

> > If I go this route, defining no *jmo tables, can I depend on ccmp and liga
> > always being applied and always in that order?
>
> Currently, at least in harfbuzz, ccmp and liga (and the *jmo features, when
> used) are all applied "together", with the order of lookups being their order

What does applying them "together" mean?  Is it just that nothing other
than feature application is done in between applying features, or are
they somehow simultaneous?  In other words, does the output of each one
become the input of the next, or are they all looking at the same input
with the output somehow recombined?

If I have glyphs L V T, with features ljmo and vjmo run in that order
(glyph L tagged for ljmo and glyph V tagged for vjmo), and I want ljmo to
change L into L.alt and vjmo to change V into V.alt, should vjmo contain a
rule like "sub L.alt V' T" or like "sub L V' T"?

> > Is there some longer
> > sequence of global tables I can depend on always being applied and always in
> > a specific order?
>
> Remember that you can have a whole sequence of lookups within a single
> feature; you don't need multiple features to achieve this.

I thought that with multiple lookups in a single feature, substitution
would still stop as soon as it found a match - so that the multiple
lookups have the same effect as a single long lookup, with the advantages
over really using a single long lookup being that using more than one
allows sharing parts of tables among separate features, and splitting into
more than one table allows representing runs of simpler rules in more
concise table formats.

But some quick experiments with FontForge suggest that in fact (at least
in FontForge) it's as you imply:  with multiple lookups in a feature, each
one is applied to the output of the previous one.  Thanks for bringing
that to my attention!  It will make things a lot easier for me.
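
For anyone following along, here is a rough Python sketch of the behaviour I
think I observed: each lookup in a feature runs over the whole glyph run, and
the next lookup sees its output.  This is a toy model only, not FontForge or
HarfBuzz code; the names apply_lookup and apply_feature and the dict-based
rule format are made up for illustration, and it ignores context rules,
lookup flags, and per-glyph feature tagging.

```python
def apply_lookup(glyphs, lookup):
    """Walk the run once; at each position apply the first matching rule.

    `lookup` is a dict mapping a tuple of input glyph names to a tuple of
    output glyph names (a hypothetical format for this sketch).
    """
    out = []
    i = 0
    while i < len(glyphs):
        for pattern, replacement in lookup.items():
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched here: copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def apply_feature(glyphs, lookups):
    """Apply the lookups in order, each one to the previous one's output."""
    for lookup in lookups:
        glyphs = apply_lookup(glyphs, lookup)
    return glyphs


split = {("LV",): ("L", "V")}            # break a precomposed LV apart
join = {("L", "V", "T"): ("LVT",)}       # rejoin three parts into LVT

print(apply_feature(["LV", "T"], [split, join]))   # ['LVT']
print(apply_feature(["LV", "T"], [join, split]))   # ['L', 'V', 'T']
```

With split before join, the join lookup sees the L V T that split produced;
in the other order it never gets the chance.  Under my old assumption (stop
at the first match) both orders would have given the same result.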

Something else I hadn't realized, but have just now verified at least in
the case of FontForge, was that the order of tables in the font can
override the "ccmp must be applied first" rule.  I thought that was
advice for renderers, but apparently it's the font's responsibility to
implement it by putting ccmp first in the file.
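
To make that concrete, a small sketch of the ordering as I understand it,
assuming the engine gathers the lookups of all features applied together and
runs them in the order they are stored in the font, regardless of which
feature refers to them.  The function name and the lookup indices below are
hypothetical; real engines may apply some features in separate stages.

```python
def lookup_order(feature_lookups, active_features):
    """Return the lookup indices to run, in font (lookup-list) order.

    `feature_lookups` maps a feature tag to the list of lookup indices it
    refers to; the order of `active_features` is deliberately ignored.
    """
    indices = set()
    for tag in active_features:
        indices.update(feature_lookups.get(tag, []))
    return sorted(indices)


# Hypothetical font in which the liga lookups were stored before ccmp's.
font = {"ccmp": [2], "liga": [0, 1]}
print(lookup_order(font, ["ccmp", "liga"]))   # [0, 1, 2]
```

In that hypothetical font the liga lookups (0 and 1) run before the ccmp
lookup (2), even though ccmp is listed first among the features.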

> > Will the "shaper", even in the absence of *jmo tables,
> > perform some translations on the sequence of code points that I need to know
> > about in building my substitution table(s)?
>
> Yes; as described earlier, it will replace <L, V [, T]> and <LV, T> sequences
> with precomposed syllables where possible; and it will also decompose <LV, T>
> to <L, V, T> if a suitable <LVT> does not exist. However, I don't think this
> should matter to you, as your tables are presumably designed to support these
> equivalents anyway.

Yes - if I'm splitting and rejoining precomposed syllables myself with the
intention of hiding all distinctions between the precomposed and
uncomposed sequences, then it doesn't matter if the shaper does some of
that kind of thing first.
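
For reference, the composition being discussed follows the standard Unicode
arithmetic for modern Hangul syllables, sketched below.  This is not the
shaper's code: it only handles the modern jamo ranges, the function names
are mine, and it knows nothing about whether a font actually has a glyph for
the result, which is what "where possible" depends on.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing"


def compose(l, v, t=None):
    """Compose modern jamo code points <L, V [, T]> into one syllable."""
    l_index = l - L_BASE
    v_index = v - V_BASE
    t_index = 0 if t is None else t - T_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("not a modern L, V, T sequence")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


def decompose(s):
    """Split a precomposed syllable into (L, V, T); T is None for LV."""
    s_index = s - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l = L_BASE + s_index // (V_COUNT * T_COUNT)
    v = V_BASE + (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t_index = s_index % T_COUNT
    return l, v, (T_BASE + t_index if t_index else None)


def compose_lv_t(lv, t):
    """Combine a precomposed LV syllable with a trailing jamo: <LV, T>."""
    l, v, existing_t = decompose(lv)
    if existing_t is not None:
        raise ValueError("syllable already has a trailing consonant")
    return compose(l, v, t)


print(hex(compose(0x1112, 0x1161, 0x11AB)))   # 0xd55c
print(hex(compose_lv_t(0xD558, 0x11AB)))      # 0xd55c
```

Both calls arrive at U+D55C, once from <L, V, T> and once from <LV, T>, which
is the equivalence my tables are meant to preserve.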

-- 
Matthew Skala
mskala at ansuz.sooke.bc.ca                 People before principles.
http://ansuz.sooke.bc.ca/