[HarfBuzz] Harfbuzz Sinhala (si) script support status update
Behdad Esfahbod
behdad at behdad.org
Sun Jul 22 20:31:22 PDT 2012
On 07/22/2012 10:33 PM, Harshula wrote:
> Hi,
>
> 1) I did a quick test of medium complexity Sinhala shaping and harfbuzz
> segfaults. Harfbuzz has definitely gone backwards for Sinhala shaping
> following the latest commits and is unusable.
>
> string: ර්කෙ
> encoding: <Ra,Al,ZWJ,Ka,kombuva>
>
> kombuva = U+0DD9
I can't reproduce the segfault with either the LKLUG font or FreeSerif, so
I need more information. Also, can you confirm that your HarfBuzz build has
glib compiled in?
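
For anyone else trying to reproduce this, here is a minimal Python sketch that
builds the reported string from explicit code points, which rules out a ZWJ
being dropped when the text is copied and pasted. Only U+0DD9 is given in the
report; the other code point values are my reading of the Unicode Sinhala
block, and the helper name `describe` is purely illustrative.

```python
# Code points for the reported sequence <Ra, Al, ZWJ, Ka, kombuva>.
# Only KOMBUVA (U+0DD9) is stated in the report; the rest are assumed
# from the Unicode Sinhala block.
RA = 0x0DBB       # Sinhala letter rayanna
AL = 0x0DCA       # Sinhala sign al-lakuna
ZWJ = 0x200D      # zero width joiner
KA = 0x0D9A       # Sinhala letter alpapraana kayanna
KOMBUVA = 0x0DD9  # Sinhala vowel sign kombuva

SEQUENCE = [RA, AL, ZWJ, KA, KOMBUVA]

# Build the test string directly from the code points.
text = "".join(chr(cp) for cp in SEQUENCE)


def describe(s):
    """Return the code points of s in U+XXXX notation, space separated."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


if __name__ == "__main__":
    print(describe(text))
```

The resulting `text` can be written to a UTF-8 file and fed to whatever
shaping test is being run.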
> 2) Perhaps you would like to use the following script:
> http://git.savannah.gnu.org/cgit/sinhala.git/plain/utils/gen-unicode-sinhala.py
>
> And then run:
> gen-unicode-sinhala.py glyphs
Thanks. Added to the test suite.
behdad
> The output will contain almost all the valid encoding sequences that a
> layout engine needs to handle. Some of these encoding sequences are (a)
> not supported by any font or (b) not used in the Sinhala language.
> That said, a layout engine should not segfault on that output.
>
> cya,
> #
>
> On Mon, 2012-07-23 at 02:36 +1000, Harshula wrote:
>> Hi Behdad and Jonathan,
>>
>> 1) I did a quick test of the latest commits. Basic Sinhala shaping seems
>> to have improved for Bhashitha font (IIRC, the original version was for
>> Windows) and gone backwards with GNU Free Font and LKLUG font.
>>
>> The following file contains strings that represent the minimal shaping
>> support required:
>> http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt
>>
>> This is what the output should look like:
>> http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.png
>>
>> FreeSerif font:
>> http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip
>>
>> LKLUG font:
>> http://sinhala.sourceforge.net/files/lklug.qa.ttf
>>
>> Both Pango and ICU are able to shape the content of
>> icu-sinhala-rendering.txt correctly using either FreeSerif or LKLUG
>> fonts.
>>
>>
>> 2) Please see my comments below regarding some of the commits. Also,
>> double check with the standard (SLS 1134):
>> http://www.icta.lk/en/programmes/pli-development/104-local-languages-initiative-/658-sinhala-standard-sls-1134-2004.html
>>
>>
>> On Thu, 2012-07-19 at 06:52 -0700, Behdad Esfahbod wrote:
>>
>>> New commits:
>>> commit 422ecd2d3c198a36d07d409341cb82ea57c7ad6b
>>> Author: Behdad Esfahbod <behdad at behdad.org>
>>> Date: Wed Jul 18 23:25:58 2012 -0400
>>>
>>> [Indic] Accept a forced Rakar sequence at the end of syllable
>>>
>>> In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra. If you put that at the
>>> end of a Consonant,Matra syllable, you get a dotted-circle from
>>> Uniscribe. Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
>>> And people have been encoding that sequence... So, allow a forced
>>> "ZWJ,Virama,ZWJ,Ra" sequence at the end of syllables.
>>>
>>> Fixes some 100 or more of Sinhala failures. Now at 622 only (0.23%).
>>
>> A Rakaaranshaya <AL,ZWJ,Ra> immediately follows a consonant. Any
>> dependent vowel(s) follow after the Rakaaranshaya.
>>
>> More generally, consonant clusters are encoded first and dependent
>> vowels follow afterwards. i.e. in phonetic order. Applications like text
>> to speech expect phonetic order.
>>
>> IIRC, <Cons,ZWJ,AL,ZWJ,Ra> and <DV,ZWJ,AL,ZWJ,Ra> are invalid encodings.
>>
>> There are encoding errors in Wikipedia. Allowing them to be identifiable
>> makes it easier for people to fix the errors.
>>
>>> commit 6fc1732003d71cf90d37247482772c3da884687f
>>> Author: Behdad Esfahbod <behdad at behdad.org>
>>> Date: Wed Jul 18 17:49:19 2012 -0400
>>>
>>> [Indic] Allow joiners on both sides of Halant at the same time
>>>
>>> The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
>>> for Rakar. Fixes two-thousand Sinhala tests. Not many left.
>>
>> <ZWJ,AL,ZWJ> is an invalid sequence. Valid sequences involving ZWJ are:
>>
>> <Cons,AL,ZWJ,Cons>
>> <Cons,ZWJ,AL,Cons>
>>
>> Note you could theoretically have something like
>> <Cons,ZWJ,AL,Cons,AL,ZWJ,Cons> .
>>
>>> commit 3285e107c9a83aeb552e67f9460680ff6d167d88
>>> Author: Behdad Esfahbod <behdad at behdad.org>
>>> Date: Wed Jul 18 17:22:14 2012 -0400
>>>
>>> [Indic] Implement Sinhala "Al Lakuna" Reph behavior
>>>
>>> In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
>>
>> Wasn't this working before? It is simply <Ra,AL,ZWJ,Cons> and I thought
>> the generic rules took care of it.
>>
>> cya,
>> #
>
>
>
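
The encoding rules Harshula describes in the quoted text above could be
checked mechanically. What follows is only a rough sketch under stated
assumptions, not how HarfBuzz classifies syllables: the function names are
hypothetical, the consonant and dependent-vowel ranges are approximations of
the Unicode Sinhala block, and it flags just two of the cases called invalid
in the thread (ZWJ on both sides of AL, and a Rakaaranshaya placed after a
dependent vowel).

```python
ZWJ = "\u200d"  # zero width joiner
AL = "\u0dca"   # Sinhala sign al-lakuna
RA = "\u0dbb"   # Sinhala letter rayanna


def is_dependent_vowel(ch):
    """Approximate test for a Sinhala dependent vowel sign (assumed ranges)."""
    return "\u0dcf" <= ch <= "\u0ddf" or ch in ("\u0df2", "\u0df3")


def find_suspect_sequences(text):
    """Return (index, reason) pairs for each AL that looks mis-encoded."""
    issues = []
    for i, ch in enumerate(text):
        if ch != AL:
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        # <ZWJ,AL,ZWJ> is described as invalid in the thread.
        if prev_ch == ZWJ and next_ch == ZWJ:
            issues.append((i, "ZWJ on both sides of AL"))
        # A Rakaaranshaya <AL,ZWJ,Ra> should directly follow a consonant,
        # with any dependent vowels coming after it (phonetic order).
        if (next_ch == ZWJ and text[i + 2:i + 3] == RA
                and is_dependent_vowel(prev_ch)):
            issues.append((i, "Rakaaranshaya after a dependent vowel"))
    return issues
```

Reporting the index with each reason is meant to make errors identifiable
in the text, in line with the point above about fixing mis-encoded Wikipedia
content.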