[HarfBuzz] New Indic standard?

G Karunakar indlinux at gmail.com
Wed Aug 19 13:31:01 PDT 2009

On Wed, Aug 19, 2009 at 10:08 PM, Ed Trager<ed.trager at gmail.com> wrote:
>> What does such a test suite involve? In the past, we have prepared
>> a list of base characters, plus allowed conjuncts (along with
>> example words) for some Indic languages in ICU. Along with these,
>> we have prepared screenshots of the expected rendering, which can
>> be compared to Harfbuzz rendering. Does that suffice?
> Is the pre-existing list of base characters plus allowed conjuncts that
> you prepared for ICU testing comprehensive?
> If not comprehensive, can you provide a ballpark percentage of how
> much has been covered or not?
> Does it cover ALL the "major" languages commonly written using one of
> the Indic scripts, or just some of them?
> Are there annotations indicating, for example, conjuncts that are
> specifically allowed for some languages (say, perhaps classical
> Sanskrit) but not allowed or deprecated or considered old-fashioned
> etc. for some other languages (say, Modern Hindi)?
> Ideally we need to know more details about what you or anyone else has
> available before the question of "sufficiency" can be settled ...
> Where is the URL for the ICU test suite that you mention?  I would
> like to look at that, as I am sure others would too.  Having a test
> suite that is publicly available would be a great first step.
> Setting up such a resource so that people could contribute / edit /
> add additional test cases would be a great next step.

Perhaps not for ICU, but UTF-8 text samples have been made for some
languages. At least these are the ones I can point to immediately:

List of practical two-consonant conjuncts,
with some example words
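
For illustration only, here is a minimal Python sketch of how candidate
two-consonant conjunct strings could be enumerated for rendering tests. It
is not how the list above was produced. It assumes the Devanagari consonant
range U+0915..U+0939 joined by the virama U+094D, and it yields every
possible pair, so a "practical" list would be a hand-picked subset of this
output. The function name is hypothetical.

```python
# Sketch: enumerate consonant + virama + consonant sequences for Devanagari.
# Assumption: consonants are taken as the contiguous range U+0915..U+0939.

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)
CONSONANTS = [chr(cp) for cp in range(0x0915, 0x0939 + 1)]


def two_consonant_conjuncts():
    """Return every C1 + virama + C2 string over the consonant range."""
    return [c1 + VIRAMA + c2 for c1 in CONSONANTS for c2 in CONSONANTS]


if __name__ == "__main__":
    pairs = two_consonant_conjuncts()
    # One candidate per line, ready to feed to a shaper and compare with
    # reference renderings.
    for s in pairs[:5]:
        print(s, " ".join("U+%04X" % ord(ch) for ch in s))
```

Many of these sequences never occur in real words, which is why a curated
list with example words is the more useful test input.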

Sample text containing most Devanagari characters (95% of the current
Unicode Devanagari range)
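
A coverage figure like this can be checked with a few lines of Python. The
sketch below is an assumption about how one might measure it, not the method
used for the sample above: it counts which assigned code points in the
Devanagari block (U+0900..U+097F) appear in a text. Which code points count
as "assigned" depends on the Unicode version shipped with the Python
interpreter, so the percentage can differ between versions. The function
name is hypothetical.

```python
import unicodedata

# Devanagari block boundaries in Unicode.
BLOCK_START, BLOCK_END = 0x0900, 0x097F


def devanagari_coverage(text):
    """Return (fraction covered, sorted list of missing code points)."""
    # Code points with category "Cn" are unassigned in this Unicode version.
    assigned = {
        cp for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    }
    seen = {ord(ch) for ch in text} & assigned
    missing = sorted(assigned - seen)
    return len(seen) / len(assigned), missing


if __name__ == "__main__":
    sample = "\u0939\u093f\u0928\u094d\u0926\u0940"  # the word "Hindi"
    fraction, missing = devanagari_coverage(sample)
    print("coverage: %.1f%%" % (fraction * 100))
    print("missing:", " ".join("U+%04X" % cp for cp in missing[:10]), "...")
```

The list of missing code points is the more useful output, since it shows
exactly which characters still need sample text.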

OK, I don't have rendering outputs available right now, but I can
generate them and put them online.

Samples for Malayalam

Of course, this is not comprehensive enough to be used directly in a
test suite. At least for Devanagari/Hindi, I can get back with such test
data in a month.


