[HarfBuzz] The canonical ordering of hamza marks

Khaled Hosny khaledhosny at eglug.org
Fri Oct 18 14:23:04 PDT 2013


On Fri, Oct 18, 2013 at 02:04:48PM -0700, Roozbeh Pournader wrote:
> On Fri, Oct 18, 2013 at 7:52 AM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> 
> > > Very obscure test data, just to demonstrate the algorithm:
> > >
> > > src: 0618 0619 064E 064F 0654 0658 0653 0654 0651 0656 0651 065C 0655 0650
> > > ccc:   30   31   30   31  230  230  230  230   33  220   33  220  220   32
> > > MCM:                      Yes  Yes       Yes                      Yes
> > >
> > > out: 0654 0658 0651 0651 0618 064E 0619 064F 0650 0656 065C 0655 0653 0654
> > > ccc:  230  230   33   33   30   30   31   31   32  220  220  220  230  230
> > > MCM:  Yes  Yes                                               Yes       Yes
> >
> > I think the order of Hamza below is not right; I'd expect it to come at
> > least before other below marks, regardless of whether there are other
> > MCM marks in the sequence or not.
> >
> 
> The order is right. It is blocked by the U+065C, which has the same
> combining class of 220. If a user is intentionally putting something of the
> same combining class before the hamza below, we shouldn't reorder them,
> since he intended that order. The same cannot be said about characters of
> different combining classes, since various normalizations may reorder them.
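
The normalization point can be checked directly. A minimal sketch in
Python, assuming only the standard unicodedata module; BEH is an arbitrary
base letter picked for illustration and is not part of the test data:

```python
import unicodedata

BEH = "\u0628"  # arbitrary base letter, chosen only for illustration

def nfd_codepoints(text):
    """Return the NFD form of text as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]

# Different combining classes (0654 is ccc 230, 0655 is ccc 220):
# canonical ordering puts the lower class first, so the typed order is lost.
different_classes = nfd_codepoints(BEH + "\u0654\u0655")

# Same combining class (065C and 0655 are both ccc 220):
# normalization never swaps them, so the typed order survives.
same_class = nfd_codepoints(BEH + "\u065C\u0655")

print(different_classes)
print(same_class)
```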

OK, I don’t really grasp the description of the algorithm, so I’ll need
something testable to check it against.
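
One way to get something testable is a short Python sketch that
reproduces the quoted src/out rows. The steps below are inferred from
that test data and from the blocking explanation, not taken from the
proposal text, so they are assumptions: (1) put the marks in canonical
order, (2) move any shadda (ccc 33) to the front, (3) move the MCM marks
that begin the ccc 230 run in front of the shadda. The MCM set is only
what the table marks "Yes". How an unblocked below-MCM mark would be
placed is not shown by the data, so the sketch leaves the ccc 220 run
alone.

```python
import unicodedata

# Assumption: MCM membership is taken only from the "Yes" cells in the table.
MCM = {0x0654, 0x0655, 0x0658}

SRC = [0x0618, 0x0619, 0x064E, 0x064F, 0x0654, 0x0658, 0x0653,
       0x0654, 0x0651, 0x0656, 0x0651, 0x065C, 0x0655, 0x0650]
EXPECTED = [0x0654, 0x0658, 0x0651, 0x0651, 0x0618, 0x064E, 0x0619,
            0x064F, 0x0650, 0x0656, 0x065C, 0x0655, 0x0653, 0x0654]

def ccc(cp):
    """Canonical combining class of a code point."""
    return unicodedata.combining(chr(cp))

def reorder(marks):
    """Hypothetical reordering inferred from the quoted test data."""
    # Step 1: canonical ordering is a stable sort by combining class.
    ordered = sorted(marks, key=ccc)
    # Step 2: pull every shadda (ccc 33) out so it can go to the front.
    shadda = [m for m in ordered if ccc(m) == 33]
    others = [m for m in ordered if ccc(m) != 33]
    # Step 3: find the ccc 230 run and the MCM marks that begin it.
    start = next((i for i, m in enumerate(others) if ccc(m) == 230),
                 len(others))
    end = start
    while end < len(others) and ccc(others[end]) == 230 and others[end] in MCM:
        end += 1
    leading_mcm = others[start:end]
    remaining = others[:start] + others[end:]
    return leading_mcm + shadda + remaining

result = reorder(SRC)
print(" ".join(f"{m:04X}" for m in result))
print("matches quoted out row:", result == EXPECTED)
```

In this data the ccc 220 run in canonical order is 0656 065C 0655, so
0655 does not begin the run and stays where it is, which is the blocking
behaviour described above.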

> > I disagree here. 0653 is actually a special form of Hamza and should be
> > treated like the other MCM marks. The madda used in the Quran serves a
> > quite different purpose and has its own code point: U+06E4 ARABIC SMALL
> > HIGH MADDA.
> >
> 
> Korans use two different kinds of madda. U+06E4 (Small High Madda) is the
> "small" madda used over U+06E5, U+06E6, U+06E7 and U+08F3, or if someone
> wants to use a smaller madda differentiated from the normal madda for some
> semantic or visual reason. I actually have a Unicode editorial committee
> action to clarify that in the text of Unicode 7.0.

I have never seen any semantic differentiation between madda over U+06E5
et al. and madda over other letters in the Quran, and there can't be one,
since those small letters are (by definition) exactly the same as their
full-size counterparts, except that they were omitted in the archaic
orthography in use when the Quran was first written. Furthermore,
<alef,quranic madda> ≠ <alef with madda above>, and since <U+0627,U+0653>
is canonically equivalent to U+0622, it cannot be used to represent
<alef,quranic madda>.
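
The equivalence is easy to confirm. A minimal sketch, assuming Python's
standard unicodedata module (the variable names are only illustrative):

```python
import unicodedata

ALEF = "\u0627"
MADDAH_ABOVE = "\u0653"
SMALL_HIGH_MADDA = "\u06E4"
ALEF_WITH_MADDA_ABOVE = "\u0622"

# <U+0627, U+0653> composes to U+0622 under NFC, so the two spellings
# cannot be told apart after normalization.
composes = (unicodedata.normalize("NFC", ALEF + MADDAH_ABOVE)
            == ALEF_WITH_MADDA_ABOVE)

# <U+0627, U+06E4> has no precomposed form and is left unchanged.
stays_distinct = (unicodedata.normalize("NFC", ALEF + SMALL_HIGH_MADDA)
                  == ALEF + SMALL_HIGH_MADDA)

print(composes, stays_distinct)
```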

Regards,
Khaled


