[HarfBuzz] The canonical ordering of hamza marks
Roozbeh Pournader
roozbeh at google.com
Fri Oct 18 14:04:48 PDT 2013
On Fri, Oct 18, 2013 at 7:52 AM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> > Very obscure test data, just to demonstrate the algorithm:
> >
> > src: 0618 0619 064E 064F 0654 0658 0653 0654 0651 0656 0651 065C 0655 0650
> > ccc:   30   31   30   31  230  230  230  230   33  220   33  220  220   32
> > MCM:                      Yes  Yes       Yes                      Yes
> >
> > out: 0654 0658 0651 0651 0618 064E 0619 064F 0650 0656 065C 0655 0653 0654
> > ccc:  230  230   33   33   30   30   31   31   32  220  220  220  230  230
> > MCM:  Yes  Yes                                               Yes       Yes
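A minimal Python sketch that reproduces the test vector above, under stated assumptions. The marks after one base letter are first stably sorted by combining class (canonical ordering), except that shadda (U+0651, ccc 33) is assumed to sort ahead of the other harakat, because that is what the output shows. Then, for class 220 and then class 230, a leading run of MCMs in that class is moved to the front. The MCM set is hypothetical and holds only the marks flagged "Yes" in the data (U+0654, U+0655, U+0658); `reorder_marks` and `sort_class` are illustrative names, not HarfBuzz API.

```python
import unicodedata

# Hypothetical MCM set: only the marks flagged "Yes" in the test data.
MCM = {0x0654, 0x0655, 0x0658}
SHADDA = 0x0651


def sort_class(cp: int) -> int:
    """Class used for the stable sort. Assumption: shadda (ccc 33) sorts
    just below fathatan (ccc 27) so it lands before the other harakat."""
    return 26 if cp == SHADDA else unicodedata.combining(chr(cp))


def reorder_marks(marks: list[int]) -> list[int]:
    """Reorder the combining marks (all ccc > 0) that follow one base letter."""
    # Step 1: canonical-style ordering. Python's sort is stable, so marks
    # of equal class keep the order in which they were typed.
    out = sorted(marks, key=sort_class)
    # Step 2: for ccc 220, then ccc 230, move the MCMs at the start of that
    # class's run to the front, after any MCMs already moved there.
    start = 0
    for cc in (220, 230):
        i = start
        while i < len(out) and unicodedata.combining(chr(out[i])) < cc:
            i += 1
        j = i
        while (j < len(out) and out[j] in MCM
               and unicodedata.combining(chr(out[j])) == cc):
            j += 1
        # j == i means nothing moves: either no marks of this class, or
        # the class run starts with a non-MCM, which blocks the MCMs after it.
        if j > i:
            out = out[:start] + out[i:j] + out[start:i] + out[j:]
            start += j - i
    return out


if __name__ == "__main__":
    src = [0x0618, 0x0619, 0x064E, 0x064F, 0x0654, 0x0658, 0x0653,
           0x0654, 0x0651, 0x0656, 0x0651, 0x065C, 0x0655, 0x0650]
    print(" ".join(f"{cp:04X}" for cp in reorder_marks(src)))
    # prints: 0654 0658 0651 0651 0618 064E 0619 064F 0650 0656 065C 0655 0653 0654
```

In this sketch `reorder_marks([0x0650, 0x0655])` moves the hamza below ahead of the kasra, while `reorder_marks([0x065C, 0x0655, 0x0650])` leaves it after U+065C, which is the blocking behaviour Roozbeh describes in his reply.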
>
> I think the order of hamza below is not right; I'd expect it to come at
> least before other below marks, regardless of whether there are other
> MCM marks in the sequence or not.
>
The order is right. It is blocked by U+065C, which has the same
combining class, 220. If a user intentionally puts something of the
same combining class before the hamza below, we shouldn't reorder them,
since he intended that order. The same cannot be said about characters of
different combining classes, since various normalizations may reorder them.
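The point about different combining classes can be checked with Python's standard `unicodedata` module: canonical normalization (NFD and NFC alike) reorders adjacent marks of different classes but never swaps marks of the same class. The strings below are illustrative; U+0628 (beh) is used as the base only because it does not compose with these marks.

```python
import unicodedata


def show(s: str) -> str:
    """Each code point in hex, followed by its canonical combining class."""
    return " ".join(f"{ord(c):04X}({unicodedata.combining(c)})" for c in s)


# Beh + hamza below (ccc 220) + kasra (ccc 32): different classes.
typed_diff = "\u0628\u0655\u0650"
# Beh + U+065C (ccc 220) + hamza below (ccc 220): same class.
typed_same = "\u0628\u065C\u0655"

for typed in (typed_diff, typed_same):
    nfd = unicodedata.normalize("NFD", typed)
    status = "kept" if nfd == typed else "reordered"
    print(show(typed), "->", show(nfd), f"({status})")
# prints:
# 0628(0) 0655(220) 0650(32) -> 0628(0) 0650(32) 0655(220) (reordered)
# 0628(0) 065C(220) 0655(220) -> 0628(0) 065C(220) 0655(220) (kept)
```

So the typed order of kasra and hamza below does not survive normalization, while the typed order of U+065C and U+0655 does, which is why only the latter can be taken as intentional.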

> I disagree here; 0653 is actually a special form of hamza and should be
> treated like the other MCM marks. The madda used in the Quran serves a quite
> different purpose and has its own code point, U+06E4 ARABIC SMALL
> HIGH MADDA.
>
Korans use two different kinds of madda. U+06E4 (Small High Madda) is the
"small" madda used over U+06E5, U+06E6, U+06E7 and U+08F3, or if someone
wants to use a smaller madda differentiated from the normal madda for some
semantic or visual reason. I actually have a Unicode editorial committee
action to clarify that in the text of Unicode 7.0.
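The disagreement over U+0653 comes down to which marks belong in the MCM set. The hypothetical helper `leading_mcms` below (an illustration, not HarfBuzz API) picks out the marks that would move to the front: the leading MCMs in the canonically ordered run of one combining class. Run on the ccc-230 marks of the test data, it shows how the result changes if U+0653 is added to the set; both sets are illustrative.

```python
import unicodedata


def leading_mcms(marks: list[int], cc: int, mcm: set[int]) -> list[int]:
    """Return the leading run of MCMs among the marks of class `cc`.
    `marks` is assumed to be in canonical order already."""
    moved = []
    for cp in marks:
        if unicodedata.combining(chr(cp)) != cc:
            continue  # marks of other classes are ignored here
        if cp not in mcm:
            break  # the first non-MCM of this class blocks the rest
        moved.append(cp)
    return moved


# The ccc-230 marks from the test data, in their typed (and canonical) order.
run_230 = [0x0654, 0x0658, 0x0653, 0x0654]

without_madda = {0x0654, 0x0655, 0x0658}  # U+0653 not an MCM
with_madda = without_madda | {0x0653}     # U+0653 treated as an MCM

print([f"{cp:04X}" for cp in leading_mcms(run_230, 230, without_madda)])
print([f"{cp:04X}" for cp in leading_mcms(run_230, 230, with_madda)])
# prints:
# ['0654', '0658']
# ['0654', '0658', '0653', '0654']
```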