[HarfBuzz] Fwd: Discrepancies in IndicMatraCategory.txt and the book
Behdad Esfahbod
behdad at behdad.org
Fri Jun 1 12:45:47 PDT 2012
FYI.
-------- Original Message --------
Subject: Discrepancies in IndicMatraCategory.txt and the book
Date: Fri, 01 Jun 2012 15:44:51 -0400
From: Behdad Esfahbod <behdad at behdad.org>
To: Unicore Mailing List <unicore at unicode.org>, Kenneth Whistler
<kenw at sybase.com>
Hi,
I received my paperback a couple of days ago and have been enjoying browsing it
on the couch. Every time I open it up I come across something interesting
that I had not seen in the PDF version before...
Anyway, being busy with Indic shaping these days, I thought I would put the
Indic data in UCD to the test.
I first compared Table 4-4 Class Zero Combining Marks—Reordrant of the
standard to the Indic_Matra_Category=Left. Here's what I found:
* The standard lists U+0DDA as a left matra, but UCD correctly marks it as
Left_And_Top. Indeed, the standard itself lists it as "Left and top" in Table 4-6
Class Zero Combining Marks—Split. So, U+0DDA needs to be removed from Table 4-4.
* The standard lists U+1A55 as reordrant. IndicMatraCategory correctly doesn't
list it, because this is a medial consonant, not a matra. I'm not sure, though,
whether a text rendering system can simply handle pre-base reordering
characters as if they were left matras... I have not implemented them yet.
* The standard lists U+1C29 as reordrant, but UCD marks it Top_And_Left.
This is NOT a split glyph, so I think fixing UCD to mark it Left would be better.
* The standard lists U+1C34 and U+1C35 as reordrants, but IndicMatraCategory
doesn't list them since they are not vowels.
* The same goes for U+AA34: it's a pre-base reordering Ra.
* The standard lists U+11184 as left matra. UCD has U+111B4, and the charts
suggest that U+11184 in the standard is a typo.
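For anyone who wants to repeat the Table 4-4 check, here is a minimal sketch of
the comparison. It assumes the usual UCD property-file layout
("code point or range ; value # comment"). SAMPLE_UCD is a hand-made excerpt
holding only the values quoted in this message, not a copy of the real
IndicMatraCategory.txt, and the function names are hypothetical.

```python
import re

# Illustrative excerpt in UCD property-file layout; values are the ones
# quoted in this message, not the contents of the real data file.
SAMPLE_UCD = """\
0AC9          ; Top_And_Right
0DDA          ; Left_And_Top
0DDD          ; Left_And_Right
1C29          ; Top_And_Left
A9C0          ; Bottom_And_Right
111B4         ; Left
"""

# Code points this message reports as listed in Table 4-4 (reordrant).
TABLE_4_4 = {0x0DDA, 0x1A55, 0x1C29, 0x1C34, 0x1C35, 0xAA34, 0x11184}

LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_property_file(text):
    """Map each code point to its property value, expanding ranges."""
    categories = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = LINE_RE.match(line)
        if not match:
            continue  # blank or unrecognised line
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        for cp in range(first, last + 1):
            categories[cp] = match.group(3)
    return categories


def mismatches(table, categories, expected):
    """Code points in the table whose data value is not `expected`."""
    return {cp: categories.get(cp)
            for cp in sorted(table)
            if categories.get(cp) != expected}


categories = parse_property_file(SAMPLE_UCD)
table_only = mismatches(TABLE_4_4, categories, "Left")
# Code points the data marks Left that the table does not list.
data_only = {cp for cp, v in categories.items() if v == "Left"} - TABLE_4_4

for cp, value in table_only.items():
    print(f"U+{cp:04X}: table says Left, data says {value or 'unlisted'}")
for cp in sorted(data_only):
    print(f"U+{cp:04X}: data says Left, table does not list it")
```

Running the check in both directions is what surfaces the U+11184 / U+111B4
pair: one shows up as table-only and the other as data-only.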
Table 4-5. Thai, Lao, and Tai Viet Logical Order Exceptions is in agreement
with Indic_Matra_Category=Visual_Order_Left. Good.
Table 4-6. Class Zero Combining Marks—Split:
* For "Left and right" and "Left, top, and right", the standard and UCD
agree except for one item: The standard marks U+0DDD as Top+Left+Right, while
UCD marks it Left+Right. The standard seems to be right here. UCD needs to be
fixed.
* "Left and top" disagrees with UCD on U+1C29 as explained above.
* The standard doesn't list U+0AC9 in "Top and right" but UCD does. UCD
seems to be right to me.
* The standard doesn't list U+1112E and U+1112F in "Top and bottom". It should.
* The standard doesn't list U+A9C0 in "Bottom and right" but UCD does. This
is, again, not a split mark, so I think UCD should mark it Right instead.
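The split categories are spelled differently in the two sources ("Left, top,
and right" in the table headings, Left_And_Right style values in the data), so
it helps to reduce both to a set of positions before comparing. A rough sketch,
with a hypothetical helper name and only two of the cases quoted above:

```python
import re


def positions(label):
    """Reduce 'Top_And_Left' or 'Left, top, and right' to a set of positions."""
    words = re.findall(r"[A-Za-z]+", label.lower())
    return frozenset(word for word in words if word != "and")


# (wording in the standard, value in the data) as quoted in this message.
CASES = {
    0x0DDA: ("Left and top", "Left_And_Top"),
    0x0DDD: ("Left, top, and right", "Left_And_Right"),
}

for cp, (standard, data) in CASES.items():
    only_standard = positions(standard) - positions(data)
    only_data = positions(data) - positions(standard)
    if only_standard or only_data:
        print(f"U+{cp:04X}: standard only {sorted(only_standard)}, "
              f"data only {sorted(only_data)}")
    else:
        print(f"U+{cp:04X}: agree")
```

Comparing sets also makes the order of the components irrelevant, so
Top_And_Left and "Left and top" count as the same placement.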
That's it for now.
Cheers,
behdad