[HarfBuzz] Regression in ZWJ handling for Indic CV ligatures

Ian-Mathew Hornburg imhornburg at gmail.com
Sat Mar 23 13:05:09 PDT 2013


The recent commit a8cf7b4 seems to have fixed the ZWJ regression in
Indic scripts that I recently described and that Khaled reported for me
here [http://lists.freedesktop.org/archives/harfbuzz/2013-March/003035.html],
but it also seems to have introduced another regression in Indic.

Relevant background: In Bengali and Oriya (and, I think, some other
Indic scripts, but these are the ones I’m personally familiar with),
certain special consonant-vowel combinations trigger a lookup for
ligatures. Which ligatures are included differs from font to font,
though good fonts contain them. Unicode also describes a method for
*not* selecting the ligated form, involving ZWJs and ZWNJs. The chapter
on Bengali in the Unicode standard describes how the behavior is meant
to work.

A given font can choose whether or not to use the ligated forms as the
default for rendering. If the ligated form is the default, a ZWNJ can
be inserted between the consonant and vowel to request the non-ligated
form (e.g., C-ZWNJ-V). If the non-ligated form is the default
(uncommon, but possible), a ZWJ can be inserted in between (C-ZWJ-V)
to explicitly request the ligated form. While it’s not mentioned in
either the Unicode or OpenType standards, the Oriya script also
contains many of these special consonant-vowel ligatures, just like
Bengali.

While most fonts default to the ligated versions, where available,
it’s my understanding of the two specs that a ZWJ can still be included
to explicitly request the CV ligature, even where it’s technically
redundant. Testing with 0.9.13 showed correct behavior: the superfluous
ZWJs worked just fine with the Bengali and Oriya fonts I tested.
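
To make the rule I'm describing concrete, here is a minimal sketch in Python. The function name and its parameters are hypothetical and are not part of HarfBuzz or any spec; it also assumes the font actually contains a ligature for the pair in question.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def expected_form(font_defaults_to_ligature, joiner=None):
    """Return the form expected for a C (+ joiner) + V sequence.

    Hypothetical helper: assumes the font has a ligature for this pair.
    """
    if joiner == ZWJ:
        # Explicit request for the ligature; redundant when the font
        # already ligates by default, but it should still be honored.
        return "ligated"
    if joiner == ZWNJ:
        # Explicit request for the non-ligated form.
        return "non-ligated"
    if joiner is None:
        # No joiner: the font's own default decides.
        return "ligated" if font_defaults_to_ligature else "non-ligated"
    raise ValueError("joiner must be ZWJ, ZWNJ, or None")
```

Under this reading, C-ZWJ-V gives the ligated form whether or not the font ligates by default, which is the case that currently misbehaves.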

With regard to Oriya, I’ve been testing with INDOLIPI’s e-Oriya OT
font, since it’s the only OpenType Oriya font I know of that contains
many Oriya CV ligatures. [Freely available here:
http://www.aai.uni-hamburg.de/indtib/INDOLIPI/Indolipi.htm]

For example, because e-Oriya OT defaults to CV ligatures where
available, the sequence <0B15, 0B3F> should result in a ligated form
(rather than the vowel-sign positioned above). Since it’s semantically
the same, the sequence <0B15, ZWJ, 0B3F> *should* result in the same
output, but currently outputs the non-ligated form. The sequence
<0B15, ZWNJ, 0B3F> correctly displays the non-ligated form.
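
For anyone who wants to reproduce this, the three variants of a sequence can be generated mechanically. This is only a sketch: the helper names are made up for illustration, and it assumes the joiner always goes immediately before the final vowel sign, as in the examples above.

```python
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER
ZWJ = 0x200D   # ZERO WIDTH JOINER


def joiner_variants(codepoints):
    """Return the plain, ZWJ, and ZWNJ variants of a sequence.

    Hypothetical helper: the joiner is inserted before the last code
    point, which is assumed to be the vowel sign.
    """
    head, vowel = list(codepoints[:-1]), codepoints[-1]
    return {
        "plain": head + [vowel],
        "zwj": head + [ZWJ, vowel],
        "zwnj": head + [ZWNJ, vowel],
    }


def to_text(codepoints):
    """Turn a list of code points into a string to feed to a shaper."""
    return "".join(chr(cp) for cp in codepoints)


def notation(codepoints):
    """Format code points in the <XXXX, XXXX> notation used here."""
    return "<" + ", ".join(f"{cp:04X}" for cp in codepoints) + ">"


if __name__ == "__main__":
    # Oriya KA + VOWEL SIGN I, the e-Oriya OT example above.
    for name, seq in joiner_variants([0x0B15, 0x0B3F]).items():
        print(name, notation(seq))
```

The strings from `to_text` can then be passed to whatever shaping front end is being tested.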

Bengali text, strangely, currently functions *correctly* for ligatures
that’re of the form CV, but fails for those that’re CCV. For example,
with the Vrinda font, the equivalent sequences <0997, 09C1> and <0997,
ZWJ, 09C1> both correctly render a CV ligature, and <0997, ZWNJ, 09C1>
correctly renders a non-ligated form.

The following are two CCV sequences that have special ligatures and
are not rendering correctly:
    <09A4, 09CD, 09B0, 09C1> (correctly returns the default “tru” ligature)
    <09A4, 09CD, 09B0, 200D, 09C1> (should render the same as the line
above; instead returns the form shown on the next line)
    <09A4, 09CD, 09B0, 200C, 09C1> (correctly returns the “tr” ligature
with the vowel sign attached below)

    <09A8, 09CD, 09A4, 09C1> (correctly returns the default “ntu” ligature)
    <09A8, 09CD, 09A4, 200D, 09C1> (same misbehavior as the ZWJ case above)
    <09A8, 09CD, 09A4, 200C, 09C1> (correctly returns the “nt” ligature
with the vowel sign attached below)
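
The expectation for these two cases could be turned into a small regression check along the following lines. This is a sketch only: `shape` is a hypothetical callable standing in for whatever shaper is under test, and it is assumed to return just the visible glyphs (so an invisible glyph for the joiner itself doesn't affect the comparison). It also assumes a font that ligates these sequences by default, as Vrinda does.

```python
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER
ZWJ = 0x200D   # ZERO WIDTH JOINER

# The two Bengali CCV sequences listed above.
CASES = {
    "tru": [0x09A4, 0x09CD, 0x09B0, 0x09C1],
    "ntu": [0x09A8, 0x09CD, 0x09A4, 0x09C1],
}


def with_joiner(codepoints, joiner):
    """Insert a joiner immediately before the final vowel sign."""
    return codepoints[:-1] + [joiner, codepoints[-1]]


def check_case(shape, codepoints):
    """Compare the plain, ZWJ, and ZWNJ renderings of one sequence.

    `shape` is a hypothetical function mapping a list of code points
    to a list of visible glyph names or IDs.
    """
    plain = shape(codepoints)
    zwj = shape(with_joiner(codepoints, ZWJ))
    zwnj = shape(with_joiner(codepoints, ZWNJ))
    return {
        # A ZWJ is redundant here, so the output should match the default.
        "zwj_matches_plain": zwj == plain,
        # A ZWNJ should suppress the ligature the font uses by default.
        "zwnj_differs_from_plain": zwnj != plain,
    }
```

With the behavior described above, `zwj_matches_plain` would come out false for both sequences while `zwnj_differs_from_plain` stays true.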

I’m not entirely sure how this should best be handled for fonts that
do not contain any CV ligature lookups. I would guess that the
sequences of CV, C-ZWJ-V, and C-ZWNJ-V should likely all render the
same since the expected behavior would be “satisfied”.

If it would be helpful, I can supply a list of these kinds of
ligatures for Bengali and Oriya scripts for testing purposes.


