[HarfBuzz] Fwd: Discrepancies in IndicMatraCategory.txt and the book

Behdad Esfahbod behdad at behdad.org
Fri Jun 1 12:45:47 PDT 2012


-------- Original Message --------
Subject: Discrepancies in IndicMatraCategory.txt and the book
Date: Fri, 01 Jun 2012 15:44:51 -0400
From: Behdad Esfahbod <behdad at behdad.org>
To: Unicore Mailing List <unicore at unicode.org>,  Kenneth Whistler
<kenw at sybase.com>


I received my paperback a couple days ago and have been enjoying browsing it
on the couch.  Every time I open it up I come across something interesting
that I had not seen in the PDF version before...

Anyway, being busy with Indic shaping these days, I thought I would put the
Indic data in the UCD to the test.

I first compared Table 4-4 Class Zero Combining Marks—Reordrant of the
standard to the Indic_Matra_Category=Left.  Here's what I found:

  * The standard lists U+0DDA as a left matra, but UCD correctly marks it as
Left_And_Top.  Indeed, the standard lists it as "Left and Top" in Table 4-6.
Class Zero Combining Marks—Split.  So, U+0DDA needs to be removed from Table 4-4.

  * The standard lists U+1A55 as reordrant.  IndicMatraCategory.txt doesn't
list it, correctly, because this is a medial consonant, not a matra.  I'm not
sure, though, whether a text rendering system can simply handle pre-base
reordering characters as if they were left matras...  I have not implemented
them yet.

  * The standard lists U+1C29 as reordrant, but UCD marks it Top_And_Left.
This is NOT a split glyph, so I think fixing UCD to mark it Left would be better.

  * The standard lists U+1C34 and U+1C35 as reordrants, but IndicMatraCategory
doesn't list them since they are not vowels.

  * The same goes for U+AA34: it's a pre-base reordering Ra, not a vowel.

  * The standard lists U+11184 as a left matra.  UCD has U+111B4 instead, and
the charts suggest that the value in the standard is a typo.
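
For anyone who wants to repeat this kind of check, a comparison along these
lines could be scripted roughly as below.  This is only a minimal sketch, not
the procedure used for this report: it assumes the usual UCD data-file line
format ("code point or range ; value # comment"), the sample lines are a
made-up excerpt rather than the real file contents, and the names
parse_ucd_property and diff_against_table are hypothetical.

```python
import re

# Matches "0DDA ; Left_And_Top" or "0E40..0E44 ; Visual_Order_Left".
LINE_RE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")


def parse_ucd_property(lines):
    """Map each code point to its property value (assumed UCD line format)."""
    values = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)  # single code point or range
        for cp in range(first, last + 1):
            values[cp] = m.group(3)
    return values


def diff_against_table(values, wanted_value, table_code_points):
    """Return (only_in_table, only_in_ucd) for one property value."""
    in_ucd = {cp for cp, v in values.items() if v == wanted_value}
    table = set(table_code_points)
    return sorted(table - in_ucd), sorted(in_ucd - table)


# Illustrative sample only, NOT the real IndicMatraCategory.txt contents.
SAMPLE = """\
# hypothetical excerpt
093F          ; Left # DEVANAGARI VOWEL SIGN I
0DD9          ; Left
0DDA          ; Left_And_Top
0E40..0E44    ; Visual_Order_Left
111B4         ; Left
""".splitlines()

values = parse_ucd_property(SAMPLE)

# Hypothetical hand-typed excerpt of Table 4-4 (including the suspected typo).
table_4_4 = [0x093F, 0x0DD9, 0x0DDA, 0x11184]

only_in_table, only_in_ucd = diff_against_table(values, "Left", table_4_4)
print("only in table:", [f"U+{cp:04X}" for cp in only_in_table])
print("only in UCD:  ", [f"U+{cp:04X}" for cp in only_in_ucd])
```

With real data, the sample lines would be replaced by the lines of the data
file and the list by the code points typed in from the table; each entry in
either result list is then a candidate discrepancy to check against the charts.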

Table 4-5. Thai, Lao, and Tai Viet Logical Order Exceptions is in agreement
with Indic_Matra_Category=Visual_Order_Left.  Good.

Table 4-6. Class Zero Combining Marks—Split:

  * For "Left and right" and "Left, top, and right", the standard and UCD
agree except for one item: the standard marks U+0DDD as Top+Left+Right, while
UCD marks it Left+Right.  The standard seems to be right here.  UCD needs to
be fixed.

  * "Left and top" disagrees with UCD on U+1C29 as explained above.

  * The standard doesn't list U+0AC9 in "Top and right" but UCD does.  UCD
seems to be right to me.

  * The standard doesn't list U+1112E and U+1112F in "Top and bottom".  It should.

  * The standard doesn't list U+A9C0 in "Bottom and right" but UCD does.  This
is, again, not a split mark, so I think UCD should mark it Right instead.
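
The per-character disagreements above can also be tabulated mechanically.
The following is a small sketch under stated assumptions: the two
dictionaries are typed in from the findings in this message rather than read
from the actual table or data file, the value spellings follow the
Left_And_Top style used above, None stands for "not listed under that
heading", and the function name mismatches is hypothetical.

```python
# Categories as described in this message; not read from the actual files.
standard = {
    0x0DDD: "Top_And_Left_And_Right",
    0x1C29: "Left",   # listed as reordrant in Table 4-4
    0x0AC9: None,     # not listed under "Top and right"
    0xA9C0: None,     # not listed under "Bottom and right"
}
ucd = {
    0x0DDD: "Left_And_Right",
    0x1C29: "Top_And_Left",
    0x0AC9: "Top_And_Right",
    0xA9C0: "Bottom_And_Right",
}


def mismatches(standard, ucd):
    """Return (code point, standard value, UCD value) wherever the two differ."""
    out = []
    for cp in sorted(set(standard) | set(ucd)):
        s, u = standard.get(cp), ucd.get(cp)
        if s != u:
            out.append((cp, s, u))
    return out


for cp, s, u in mismatches(standard, ucd):
    print(f"U+{cp:04X}: standard={s or 'unlisted'}  UCD={u or 'unlisted'}")
```

A report like this only says where the two sources differ; deciding which
side is right still takes a look at the charts, as in the notes above.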

That's it for now.

