<div dir="ltr">BTW, Khaled, do you have examples of U+06E8 in the text of the Koran? I would appreciate sura and aya numbers if you do.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Oct 18, 2013 at 4:33 PM, Roozbeh Pournader <span dir="ltr">&lt;<a href="mailto:roozbeh@google.com" target="_blank">roozbeh@google.com</a>&gt;</span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Let me try to approach the problem from another angle.<div><br></div><div>Unicode, although originally planned to be more semantic, has become more and more a graphical encoding. This is evident from which new characters get encoded and which do not. The UTC continually directs people to use existing code points for things that are graphically similar to already-encoded characters but semantically very different, yet it encodes new characters that are semantically the same as existing ones when their exact visual representation is important and depends on rules that are very hard to derive.</div> <div><br></div><div>This is inevitable to some degree, since text rendering technology and fonts should not be expected to be very complex. So plain text representation becomes more visual in order to make life easier for the rendering engines.</div> <div><br></div><div>This can be seen in a lot of the newer characters in the Arabic blocks. The open tanweens or arrowheads in the Arabic Extended-A block were encoded because they were graphically different, while the committee did not encode a "waw with madda above" and recommended that "waw+madda above" be used for it instead. 
The diacritical hamza was the most controversial, and the controversy is the main reason for the hole at U+08A1 (it is reserved for a Beh With Hamza Above, which will be in Unicode 7.0).</div> <div><br></div><div>All in all, this means that the UTC considers anything that very much looks like U+0653 a madda above, and anything that may need to be visually distinguished from it and be smaller in size a small high madda. The glyphs used in the chart show a significant size difference, and have shown that difference since the small high madda got encoded in Unicode 2.0. Unicode actually doesn't prescribe exact usage of a lot of the Koranic marks, because the marks may be used very differently across the various Koranic traditions from Indonesia to Morocco.</div> <div><br></div><div>I don't think it's a good idea to consider madda to be a certain kind of hamza. Yes, in the modern Arabic language Alef+madda above is semantically equivalent to hamza+alef or alef+alef, but there is no hint of a hamza semantic when some minority languages using the Arabic script take a madda and put it over a waw to get a new vowel.</div> <div><br></div><div>I understand that this means there may be no "real" semantic difference between a normal madda and a small high madda, but there's really no semantic difference between a yeh and a farsi yeh either, and they are separately encoded. Unicode is quite graphical in its encoding.</div> <div><br></div><div>Regarding U+06C7 and U+06C8, the UTC has agreed not to encode such characters anymore, except where hamza above is used as a diacritic without hamza semantics. 
So there may well be future siblings for U+0681, U+076C, U+08A1, and U+08A8, but no future siblings to U+06C7 and U+06C8.</div> <div><br></div><div>Please tell me if there's anything I've failed to address.</div> </div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <span dir="ltr">&lt;<a href="mailto:khaledhosny@eglug.org" target="_blank">khaledhosny@eglug.org</a>&gt;</span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:<br> > Khaled, you are referring to a specific style of writing the Koran. There<br> > are several others, which Unicode should be able to represent.<br> <br> </div>I’m not sure I follow here. If you think there should be a way to<br> differentiate between two forms of prolongation mark (aka Quranic<br> Madda), something I have never seen but I’m open to learning something new,<br> then a new code point should be encoded, instead of abusing a Hamza (aka<br> the other Madda) that has an incompatible normalization behaviour in<br> Unicode.<br> <br> And you ignored my other point.<br> <br> Regards,<br> Khaled<br> <div><div><br> > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny &lt;<a href="mailto:khaledhosny@eglug.org" target="_blank">khaledhosny@eglug.org</a>&gt; wrote:<br> ><br> > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:<br> > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny &lt;<a href="mailto:khaledhosny@eglug.org" target="_blank">khaledhosny@eglug.org</a>&gt;<br> > > wrote:<br> > > ><br> > > > > Furthermore, &lt;alef,quranic madda&gt; ≠ &lt;alef with madda above&gt;<br> > > > ><br> > > ><br> > > > Why?<br> > ><br> > > Because every Mushaf printed in Egypt (and most of the Arabic world)<br> > > since 1919[1] has a note at the end of the Madda description stating that “…<br> > > and this mark should not be used to indicate an 
omitted Alef after[sic]<br> > > a written Alef, as in آمنوا, that were mistakenly put in many<br> > > Mushafs …”, which to me is a very frank indication that the two marks<br> > > are not the same thing.<br> > ><br> > > Also a vowel mark (which the Quranic Madda is) should not “blend” with<br> > > its base letter, the same way that U+06C7 is not canonically equivalent<br> > > to &lt;U+0648,U+064F&gt; etc.<br> > ><br> > > Regards,<br> > > Khaled<br> > ><br> > > 1. The date of the first Mushaf printed by Al-Azhar, where most of the<br> > > Quranic annotation marks were formalized and standardized.<br> > ><br> </div></div></blockquote></div><br></div> </div></div></blockquote></div><br></div>
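<div dir="ltr">The normalization behaviour discussed in this thread can be checked directly. What follows is a minimal sketch, assuming Python's standard unicodedata module; the constant names and the helper functions nfc and canonically_equivalent are made up for illustration and are not part of any message above. It compares alef followed by U+0653 (madda above) or U+06E4 (small high madda) against the precomposed U+0622, and U+06C7 against waw followed by damma.</div>

```python
import unicodedata

# Code points named in the thread (constant names are illustrative only).
ALEF = "\u0627"                    # ARABIC LETTER ALEF
WAW = "\u0648"                     # ARABIC LETTER WAW
DAMMA = "\u064F"                   # ARABIC DAMMA
MADDA_ABOVE = "\u0653"             # ARABIC MADDAH ABOVE
SMALL_HIGH_MADDA = "\u06E4"        # ARABIC SMALL HIGH MADDA
ALEF_WITH_MADDA_ABOVE = "\u0622"   # ARABIC LETTER ALEF WITH MADDA ABOVE
LETTER_U = "\u06C7"                # ARABIC LETTER U


def nfc(text: str) -> str:
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # alef + madda above composes to the precomposed letter U+0622.
    print(canonically_equivalent(ALEF + MADDA_ABOVE, ALEF_WITH_MADDA_ABOVE))
    # alef + small high madda stays a two-code-point sequence.
    print(canonically_equivalent(ALEF + SMALL_HIGH_MADDA, ALEF_WITH_MADDA_ABOVE))
    # U+06C7 has no canonical decomposition to waw + damma.
    print(canonically_equivalent(WAW + DAMMA, LETTER_U))
    # waw + madda above has no precomposed form, so NFC leaves it alone.
    print(nfc(WAW + MADDA_ABOVE) == WAW + MADDA_ABOVE)
```

<div dir="ltr">Comparing NFD forms rather than raw strings is a deliberate choice here: it tests canonical equivalence itself, independent of whether either input happened to be precomposed.</div>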