[HarfBuzz] Harfbuzz Sinhala (si) script support status update

Harshula harshula at gmail.com
Sun Jul 22 09:36:52 PDT 2012


Hi Behdad and Jonathan,

1) I did a quick test of the latest commits. Basic Sinhala shaping seems
to have improved for the Bhashitha font (IIRC, the original version was
for Windows) and regressed with the GNU FreeFont (FreeSerif) and LKLUG
fonts.

The following file contains strings that represent the minimal shaping
support required:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt

This is what the output should look like:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.png

FreeSerif font:
http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip

LKLUG font:
http://sinhala.sourceforge.net/files/lklug.qa.ttf

Both Pango and ICU are able to shape the content of
icu-sinhala-rendering.txt correctly using either FreeSerif or LKLUG
fonts.


2) Please see my comments below regarding some of the commits. Also,
please double-check against the standard (SLS 1134):
http://www.icta.lk/en/programmes/pli-development/104-local-languages-initiative-/658-sinhala-standard-sls-1134-2004.html


On Thu, 2012-07-19 at 06:52 -0700, Behdad Esfahbod wrote:

> New commits:
> commit 422ecd2d3c198a36d07d409341cb82ea57c7ad6b
> Author: Behdad Esfahbod <behdad at behdad.org>
> Date:   Wed Jul 18 23:25:58 2012 -0400
> 
>     [Indic] Accept a forced Rakar sequence at the end of syllable
>     
>     In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra.  If you put that at the
>     end of a Consonant,Matra syllable, you get a dotted-circle from
>     Uniscribe.  Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
>     And people have been encoding that sequence...  So, allow a forced
>     "ZWJ,Virama,ZWJ,Ra" sequence at the of syllables.
>     
>     Fixes some 100 or more of Sinhala failures.  Now at 622 only (0.23%).

A Rakaaranshaya <AL,ZWJ,Ra> immediately follows a consonant. Any
dependent vowel(s) follow after the Rakaaranshaya.

More generally, consonant clusters are encoded first and dependent
vowels follow afterwards, i.e. in phonetic order. Applications such as
text-to-speech expect phonetic order.

IIRC, <Cons,ZWJ,AL,ZWJ,Ra> and <DV,ZWJ,AL,ZWJ,Ra> are invalid encodings.

There are encoding errors in Wikipedia. If invalid sequences remain
identifiable in the rendered text, it is easier for people to find and
fix them.
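
As a rough illustration of how such errors could be located in text, here
is a minimal Python sketch. It assumes the Unicode code points for
Al-Lakuna (U+0DCA), ZWJ (U+200D) and Ra (U+0DBB), and approximates
"Cons" and "DV" with the Sinhala consonant and vowel-sign ranges. The
function name is hypothetical, and it only flags the two sequences
named above; it is not a full SLS 1134 validator.

```python
import re

ZWJ = "\u200D"  # ZERO WIDTH JOINER
AL = "\u0DCA"   # Sinhala sign Al-Lakuna
RA = "\u0DBB"   # Sinhala letter Rayanna (Ra)

# Assumed character classes for "Cons" and "DV" (dependent vowel signs).
CONS = "[\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6]"
DV = "[\u0DCF-\u0DD4\u0DD6\u0DD8-\u0DDF\u0DF2\u0DF3]"

# Matches <Cons,ZWJ,AL,ZWJ,Ra> and <DV,ZWJ,AL,ZWJ,Ra>.
INVALID_RAKAARANSHAYA = re.compile(f"(?:{CONS}|{DV}){ZWJ}{AL}{ZWJ}{RA}")


def find_invalid_rakaaranshaya(text):
    """Return the start index of every invalid sequence found in text."""
    return [m.start() for m in INVALID_RAKAARANSHAYA.finditer(text)]
```

A valid <Cons,AL,ZWJ,Ra,DV> string produces an empty list, while a
string with the Rakaaranshaya forced after a dependent vowel is
reported at the index of that vowel.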

> commit 6fc1732003d71cf90d37247482772c3da884687f
> Author: Behdad Esfahbod <behdad at behdad.org>
> Date:   Wed Jul 18 17:49:19 2012 -0400
> 
>     [Indic] Allow joiners on both sides of Halant at the same time
>     
>     The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
>     for Rakar.  Fixes two-thousand Sinhala tests.  Not many left.

<ZWJ,AL,ZWJ> is an invalid sequence. Valid sequences involving ZWJ are:

<Cons,AL,ZWJ,Cons>
<Cons,ZWJ,AL,Cons>

Note that you could theoretically have something like
<Cons,ZWJ,AL,Cons,AL,ZWJ,Cons>.
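
To make the distinction concrete, the following Python sketch labels
every Al-Lakuna in a string by where the ZWJ sits relative to it. The
function name and labels are made up for illustration, and the sketch
looks only at the joiners; it does not check that a consonant actually
appears on each side.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER
AL = "\u0DCA"   # Sinhala sign Al-Lakuna


def classify_al_joiners(text):
    """Return (index, label) for every Al-Lakuna in text."""
    results = []
    for i, ch in enumerate(text):
        if ch != AL:
            continue
        zwj_before = i > 0 and text[i - 1] == ZWJ
        zwj_after = i + 1 < len(text) and text[i + 1] == ZWJ
        if zwj_before and zwj_after:
            label = "invalid"   # <ZWJ,AL,ZWJ>
        elif zwj_after:
            label = "AL,ZWJ"    # as in <Cons,AL,ZWJ,Cons>
        elif zwj_before:
            label = "ZWJ,AL"    # as in <Cons,ZWJ,AL,Cons>
        else:
            label = "plain"     # Al-Lakuna with no joiner
        results.append((i, label))
    return results
```

For the theoretical <Cons,ZWJ,AL,Cons,AL,ZWJ,Cons> case, the first
Al-Lakuna is labelled "ZWJ,AL" and the second "AL,ZWJ"; neither is
flagged as invalid.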

> commit 3285e107c9a83aeb552e67f9460680ff6d167d88
> Author: Behdad Esfahbod <behdad at behdad.org>
> Date:   Wed Jul 18 17:22:14 2012 -0400
> 
>     [Indic] Implement Sinhala "Al Lakuna" Reph behavior
>     
>     In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.

Wasn't this working before? It is simply <Ra,AL,ZWJ,Cons> and I thought
the generic rules took care of it.
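
For reference, the sequence in question can be picked out of a string
with a short Python sketch like the one below. It assumes Ra is U+0DBB,
Al-Lakuna is U+0DCA, ZWJ is U+200D, and that "Cons" is the Sinhala
consonant range; the function name is hypothetical and this says
nothing about how HarfBuzz itself detects the sequence.

```python
import re

ZWJ = "\u200D"  # ZERO WIDTH JOINER
AL = "\u0DCA"   # Sinhala sign Al-Lakuna
RA = "\u0DBB"   # Sinhala letter Rayanna (Ra)

# Assumed character class for "Cons".
CONS = "[\u0D9A-\u0DB1\u0DB3-\u0DBB\u0DBD\u0DC0-\u0DC6]"

# <Ra,AL,ZWJ> followed by a consonant (the consonant is not consumed).
EXPLICIT_REPH = re.compile(f"{RA}{AL}{ZWJ}(?={CONS})")


def find_explicit_reph(text):
    """Return the index of each Ra that starts <Ra,AL,ZWJ,Cons>."""
    return [m.start() for m in EXPLICIT_REPH.finditer(text)]
```

Without the ZWJ, <Ra,AL,Cons> is not matched, which mirrors the point
that the form is requested only explicitly.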

cya,
#



