[HarfBuzz] The canonical ordering of hamza marks
Khaled Hosny
khaledhosny at eglug.org
Fri Oct 18 14:23:04 PDT 2013
On Fri, Oct 18, 2013 at 02:04:48PM -0700, Roozbeh Pournader wrote:
> On Fri, Oct 18, 2013 at 7:52 AM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>
> > > Very obscure test data, just to demonstrate the algorithm:
> > >
> > > src: 0618 0619 064E 064F 0654 0658 0653 0654 0651 0656 0651 065C 0655 0650
> > > ccc: 30 31 30 31 230 230 230 230 33 220 33 220 220 32
> > > MCM: Yes Yes Yes Yes
> > >
> > > out: 0654 0658 0651 0651 0618 064E 0619 064F 0650 0656 065C 0655 0653 0654
> > > ccc: 230 230 33 33 30 30 31 31 32 220 220 220 230 230
> > > MCM: Yes Yes Yes Yes
> >
> > I think the order of Hamza below is not right, I'd expect it to come at
> > least before other below marks, regardless of whether there are other
> > MCM marks in the sequence or not.
> >
>
> The order is right. It is blocked by the U+065C, which has the same
> combining class of 220. If a user is intentionally putting something of the
> same combining class before the hamza below, we shouldn't reorder them,
> since he intended that order. The same cannot be said about characters of
> different combining classes, since various normalizations may reorder them.

OK, I don’t really grasp the description of the algorithm, so I’ll need
something testable to check.

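As a starting point for something testable, here is a minimal sketch of the
blocking rule described in the quoted reply, assuming Python's standard
unicodedata module. It covers only the standard canonical ordering (a stable
sort by combining class), not the proposed reordering of MCM marks. The names
ccc and canonical_order, and the choice of U+0628 as a base letter, are made
up for illustration.

```python
import unicodedata

# Code points from the "src" row of the test data quoted above.
SRC = [0x0618, 0x0619, 0x064E, 0x064F, 0x0654, 0x0658, 0x0653,
       0x0654, 0x0651, 0x0656, 0x0651, 0x065C, 0x0655, 0x0650]


def ccc(cp):
    """Return the canonical combining class of a code point."""
    return unicodedata.combining(chr(cp))


def canonical_order(cps):
    """Canonically order a run of non-starters (hypothetical helper).

    Python's sort is stable, so marks with the same combining class keep
    their relative order. That is the "blocking" behaviour: U+065C and
    U+0655 are both ccc 220 and are never swapped.
    """
    return sorted(cps, key=ccc)


src_ccc = [ccc(cp) for cp in SRC]
ordered = canonical_order(SRC)

# Cross-check against the library's own NFD, using an arbitrary base letter
# (U+0628 ARABIC LETTER BEH, an assumption for this example).
base = "\u0628"
nfd = unicodedata.normalize("NFD", base + "".join(chr(cp) for cp in SRC))

print(" ".join(f"{cp:04X}" for cp in ordered))
```

In the result U+065C still precedes U+0655, since both have combining class
220, while U+0650 (ccc 32) moves ahead of all the below marks.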
> > I disagree here, 0653 is actually a special form of Hamza and should be
> > treated as other MCM marks. The madda used in Quran serves a quite
> > different purpose and has its own code point: U+06E4 ARABIC SMALL
> > HIGH MADDA.
> >
>
> Korans use two different kinds of madda. U+06E4 (Small High Madda) is the
> "small" madda used over U+06E5, U+06E6, U+06E7 and U+08F3, or if someone
> wants to use a smaller madda differentiated from the normal madda for some
> semantic or visual reason. I actually have a Unicode editorial committee
> action to clarify that in the text of Unicode 7.0.

I have never seen any semantic differentiation between madda over U+06E5
et al. and madda over other letters in the Quran, and there can't be, since
those small letters (by definition) are exactly the same as their full
size counterparts except that they were omitted in the archaic
orthography in use when the Quran was first written. Furthermore,
<alef,quranic madda> ≠ <alef with madda above>, and since <U+0627,U+0653>
is canonically equivalent to U+0622 it cannot be used to represent
<alef,quranic madda>.
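
The canonical equivalence mentioned above can be checked with a short sketch,
again assuming Python's standard unicodedata module (the constant names are
only illustrative). It shows that <U+0627,U+0653> composes to U+0622 under
NFC, while <U+0627,U+06E4> is left as a two-character sequence.

```python
import unicodedata

ALEF = "\u0627"              # ARABIC LETTER ALEF
MADDAH_ABOVE = "\u0653"      # ARABIC MADDAH ABOVE
ALEF_WITH_MADDA = "\u0622"   # ARABIC LETTER ALEF WITH MADDA ABOVE
SMALL_HIGH_MADDA = "\u06E4"  # ARABIC SMALL HIGH MADDA

# NFC composes <alef, maddah above> into the precomposed letter.
composed = unicodedata.normalize("NFC", ALEF + MADDAH_ABOVE)

# NFD decomposes the precomposed letter back into the same pair.
decomposed = unicodedata.normalize("NFD", ALEF_WITH_MADDA)

# The small high madda has no composition with alef, so NFC leaves it alone.
quranic = unicodedata.normalize("NFC", ALEF + SMALL_HIGH_MADDA)

print(composed == ALEF_WITH_MADDA)
print(decomposed == ALEF + MADDAH_ABOVE)
print(quranic == ALEF + SMALL_HIGH_MADDA)
```

Any normalizing process will therefore turn <U+0627,U+0653> into U+0622 and
back, which is why the pair cannot carry a meaning distinct from U+0622.
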
Regards,
Khaled