[HarfBuzz] Harfbuzz Sinhala (si) script support status update

Harshula harshula at gmail.com
Sun Jul 22 20:37:05 PDT 2012


On Sun, 2012-07-22 at 23:10 -0400, Behdad Esfahbod wrote:
> Which fonts is this with?

The following three:

FreeSerif font:
http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip

LKLUG font:
http://sinhala.sourceforge.net/files/lklug.qa.ttf

Bhashitha:
http://www.icta.lk/en/programmes/pli-development/104-local-languages-initiative-/625-unicode-compliant-local-language-fonts.html

cya,
#

> On 07/22/2012 10:33 PM, Harshula wrote:
> > Hi,
> > 
> > 1) I did a quick test of medium-complexity Sinhala shaping, and HarfBuzz
> > segfaults. HarfBuzz has definitely gone backwards for Sinhala shaping
> > following the latest commits and is unusable.
> > 
> > string: ර්‍කෙ
> > encoding: <Ra,Al,ZWJ,Ka,kombuva>
> > 
> > kombuva = U+0DD9
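The test string above contains an invisible ZERO WIDTH JOINER, which is easy to lose when copying. This is a small Python sketch that uses only the standard `unicodedata` module to spell out each code point with its Unicode name. The helper name `describe` is illustrative only. It should print five lines, from U+0DBB SINHALA LETTER RAYANNA through U+0DD9 SINHALA VOWEL SIGN KOMBUVA.

```python
import unicodedata

# The test string from the report, written with escapes so the
# invisible ZERO WIDTH JOINER cannot be dropped: <Ra, Al, ZWJ, Ka, kombuva>.
TEST_STRING = "\u0DBB\u0DCA\u200D\u0D9A\u0DD9"


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    for line in describe(TEST_STRING):
        print(line)
```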
> > 
> > 2) Perhaps you would like to use the following script:
> > http://git.savannah.gnu.org/cgit/sinhala.git/plain/utils/gen-unicode-sinhala.py
> > 
> > And then run:
> > gen-unicode-sinhala.py glyphs
> > 
> > The output will contain almost all the valid encoding sequences that a
> > layout engine needs to handle. Some of these encoding sequences are (a)
> > not supported by any font or (b) not used by the Sinhala language.
> > That said, a layout engine should not segfault on that output.
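One way to test the "should not segfault" requirement is to shape each generated sequence in a separate process. A segfault then shows up as a signal in that process and does not kill the harness. This is a sketch under several assumptions. The generator output is assumed to have been reduced to one encoding sequence per line. HarfBuzz's `hb-shape` utility is assumed to be on PATH and to accept `FONT-FILE TEXT` as positional arguments. `find_crashes` is a hypothetical helper name.

```python
import subprocess
import sys


def find_crashes(sequences, command_prefix):
    """Run command_prefix + [seq] for each sequence in its own process and
    collect (seq, signal_number) for every run killed by a signal.

    On POSIX, subprocess reports death-by-signal as a negative return code
    (e.g. -11 for SIGSEGV); ordinary error exits are positive and ignored.
    """
    crashes = []
    for seq in sequences:
        result = subprocess.run(command_prefix + [seq], capture_output=True)
        if result.returncode < 0:
            crashes.append((seq, -result.returncode))
    return crashes


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed usage: python find_crashes.py FONT-FILE < sequences.txt
    font_file = sys.argv[1]
    # strip() leaves ZWJ (U+200D) intact, since it is not whitespace.
    sequences = [line.strip() for line in sys.stdin if line.strip()]
    # Assumption: hb-shape takes FONT-FILE and TEXT as positional arguments.
    for seq, signum in find_crashes(sequences, ["hb-shape", font_file]):
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in seq)
        print(f"signal {signum}: {codepoints}")
```

Using one process per sequence is slow, but it is the simplest way to survive a crashing shaper and still report exactly which input triggered it.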
> > 
> > cya,
> > #
> > 
> > On Mon, 2012-07-23 at 02:36 +1000, Harshula wrote:
> >> Hi Behdad and Jonathan,
> >>
> >> 1) I did a quick test of the latest commits. Basic Sinhala shaping seems
> >> to have improved for the Bhashitha font (IIRC, the original version was
> >> for Windows) but gone backwards with the GNU FreeFont and LKLUG fonts.
> >>
> >> The following file contains strings that represent the minimal shaping
> >> support required:
> >> http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt
> >>
> >> This is what the output should look like:
> >> http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.png
> >>
> >> FreeSerif font:
> >> http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip
> >>
> >> LKLUG font:
> >> http://sinhala.sourceforge.net/files/lklug.qa.ttf
> >>
> >> Both Pango and ICU are able to shape the content of
> >> icu-sinhala-rendering.txt correctly using either FreeSerif or LKLUG
> >> fonts.
> >>
> >>
> >> 2) Please see my comments below regarding some of the commits. Also,
> >> double-check against the standard (SLS 1134):
> >> http://www.icta.lk/en/programmes/pli-development/104-local-languages-initiative-/658-sinhala-standard-sls-1134-2004.html
> >>
> >>
> >> On Thu, 2012-07-19 at 06:52 -0700, Behdad Esfahbod wrote:
> >>
> >>> New commits:
> >>> commit 422ecd2d3c198a36d07d409341cb82ea57c7ad6b
> >>> Author: Behdad Esfahbod <behdad at behdad.org>
> >>> Date:   Wed Jul 18 23:25:58 2012 -0400
> >>>
> >>>     [Indic] Accept a forced Rakar sequence at the end of syllable
> >>>     
> >>>     In Sinhala, Rakar is formed by Al-Lakuna,ZWJ,Ra.  If you put that at the
> >>>     end of a Consonant,Matra syllable, you get a dotted-circle from
> >>>     Uniscribe.  Apparently adding a ZWJ before the Al-Lakuna "fixes" that.
> >>>     And people have been encoding that sequence...  So, allow a forced
> >>>     "ZWJ,Virama,ZWJ,Ra" sequence at the end of syllables.
> >>>     
> >>>     Fixes 100 or more Sinhala failures.  Now at only 622 (0.23%).
> >>
> >> A Rakaaranshaya <AL,ZWJ,Ra> immediately follows a consonant. Any
> >> dependent vowel(s) follow after the Rakaaranshaya.
> >>
> >> More generally, consonant clusters are encoded first and dependent
> >> vowels follow afterwards, i.e., in phonetic order. Applications like
> >> text-to-speech expect phonetic order.
> >>
> >> IIRC, <Cons,ZWJ,AL,ZWJ,Ra> and <DV,ZWJ,AL,ZWJ,Ra> are invalid encodings.
> >>
> >> There are encoding errors in Wikipedia. Keeping such errors
> >> identifiable, rather than silently accepting them, makes it easier for
> >> people to fix them.
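The placement rule above (a Rakaaranshaya <AL,ZWJ,Ra> must immediately follow a consonant) can be turned into a check that flags encodings like the ones found in Wikipedia. This is a minimal sketch, not HarfBuzz's implementation and not a full SLS 1134 validator. It assumes Sinhala consonants are exactly U+0D9A..U+0DC6. `misplaced_rakaaranshaya` is a hypothetical name.

```python
AL_LAKUNA = "\u0DCA"   # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"         # ZERO WIDTH JOINER
RAYANNA = "\u0DBB"     # SINHALA LETTER RAYANNA
RAKAARANSHAYA = AL_LAKUNA + ZWJ + RAYANNA  # <AL, ZWJ, Ra>


def is_consonant(ch):
    # Assumption: Sinhala consonants are exactly U+0D9A..U+0DC6.
    return "\u0D9A" <= ch <= "\u0DC6"


def misplaced_rakaaranshaya(text):
    """Return the index of every <AL, ZWJ, Ra> that does not immediately
    follow a consonant, e.g. in <Cons, ZWJ, AL, ZWJ, Ra> or
    <DV, ZWJ, AL, ZWJ, Ra>."""
    bad = []
    start = text.find(RAKAARANSHAYA)
    while start != -1:
        if start == 0 or not is_consonant(text[start - 1]):
            bad.append(start)
        start = text.find(RAKAARANSHAYA, start + 1)
    return bad


# Phonetic order (Ka + Rakaaranshaya + kombuva) passes...
print(misplaced_rakaaranshaya("\u0D9A\u0DCA\u200D\u0DBB\u0DD9"))  # []
# ...while the vowel-first "forced Rakar" encoding is flagged.
print(misplaced_rakaaranshaya("\u0D9A\u0DD9\u200D\u0DCA\u200D\u0DBB"))  # [3]
```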
> >>
> >>> commit 6fc1732003d71cf90d37247482772c3da884687f
> >>> Author: Behdad Esfahbod <behdad at behdad.org>
> >>> Date:   Wed Jul 18 17:49:19 2012 -0400
> >>>
> >>>     [Indic] Allow joiners on both sides of Halant at the same time
> >>>     
> >>>     The sequence <ZWJ,Al-Lakuna,ZWJ> is used in Sinhala to explicitly ask
> >>>     for Rakar.  Fixes two-thousand Sinhala tests.  Not many left.
> >>
> >> <ZWJ,AL,ZWJ> is an invalid sequence. Valid sequences involving ZWJ are:
> >>
> >> <Cons,AL,ZWJ,Cons>
> >> <Cons,ZWJ,AL,Cons>
> >>
> >> Note you could theoretically have something like
> >> <Cons,ZWJ,AL,Cons,AL,ZWJ,Cons> .
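The joiner rule above can be sketched as a check on every Al-lakuna. It rejects <ZWJ,AL,ZWJ> outright and accepts a neighbouring ZWJ only in the two listed shapes, <Cons,AL,ZWJ,Cons> and <Cons,ZWJ,AL,Cons>. This is a hedged illustration that encodes only the patterns stated in this message. It also assumes the consonant range U+0D9A..U+0DC6. `invalid_joiners_around_al` is a hypothetical name.

```python
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"        # ZERO WIDTH JOINER


def is_consonant(ch):
    # Assumption: Sinhala consonants are exactly U+0D9A..U+0DC6.
    # An empty string (used for out-of-range positions) is not a consonant.
    return "\u0D9A" <= ch <= "\u0DC6"


def invalid_joiners_around_al(text):
    """Return indices of Al-lakuna characters whose adjacent ZWJs do not
    fit <Cons,AL,ZWJ,Cons> or <Cons,ZWJ,AL,Cons>."""

    def at(i):
        # Character at position i, or "" when i is outside the string.
        return text[i] if 0 <= i < len(text) else ""

    bad = []
    for i, ch in enumerate(text):
        if ch != AL_LAKUNA:
            continue
        zwj_before = at(i - 1) == ZWJ
        zwj_after = at(i + 1) == ZWJ
        if zwj_before and zwj_after:
            bad.append(i)  # <ZWJ,AL,ZWJ> is never valid
        elif zwj_after and not (is_consonant(at(i - 1)) and is_consonant(at(i + 2))):
            bad.append(i)  # must be <Cons,AL,ZWJ,Cons>
        elif zwj_before and not (is_consonant(at(i - 2)) and is_consonant(at(i + 1))):
            bad.append(i)  # must be <Cons,ZWJ,AL,Cons>
    return bad


# <Cons,ZWJ,AL,Cons,AL,ZWJ,Cons> (Ka, Ssa, Ra) is accepted.
print(invalid_joiners_around_al("\u0D9A\u200D\u0DCA\u0DC2\u0DCA\u200D\u0DBB"))  # []
```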
> >>
> >>> commit 3285e107c9a83aeb552e67f9460680ff6d167d88
> >>> Author: Behdad Esfahbod <behdad at behdad.org>
> >>> Date:   Wed Jul 18 17:22:14 2012 -0400
> >>>
> >>>     [Indic] Implement Sinhala "Al Lakuna" Reph behavior
> >>>     
> >>>     In Sinhala, Reph is formed only explicitly, by the presence of a ZWJ.
> >>
> >> Wasn't this working before? It is simply <Ra,AL,ZWJ,Cons> and I thought
> >> the generic rules took care of it.
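The explicit reph encoding described above is <Ra,AL,ZWJ,Cons>. The commit message says that without the ZWJ, <Ra,AL,Cons> does not request a reph in Sinhala. A tiny sketch of that test follows. The function name is hypothetical, and the consonant range U+0D9A..U+0DC6 is an assumption.

```python
RAYANNA = "\u0DBB"    # SINHALA LETTER RAYANNA
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"        # ZERO WIDTH JOINER


def is_consonant(ch):
    # Assumption: Sinhala consonants are exactly U+0D9A..U+0DC6.
    return "\u0D9A" <= ch <= "\u0DC6"


def starts_with_explicit_reph(syllable):
    """True if the syllable begins with <Ra, AL, ZWJ, Cons>."""
    return (
        len(syllable) >= 4
        and syllable[:3] == RAYANNA + AL_LAKUNA + ZWJ
        and is_consonant(syllable[3])
    )


print(starts_with_explicit_reph("\u0DBB\u0DCA\u200D\u0D9A\u0DD9"))  # True
print(starts_with_explicit_reph("\u0DBB\u0DCA\u0D9A"))              # False (no ZWJ)
```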
> >>
> >> cya,
> >> #