[HarfBuzz] The canonical ordering of hamza marks

Thu May 8 11:53:05 PDT 2014

Here's a relevant document Roozbeh sent to the UTC recently:

  http://www.unicode.org/L2/L2014/14127-arabic-marks-order.pdf

On 13-10-18 07:33 PM, Roozbeh Pournader wrote:
> Let me try to approach the problem from another angle.
> 
> Unicode, although originally planned to be more semantic, has become more and
> more a graphical encoding. This can be evidenced by the new characters encoded
> or not encoded. The UTC continually directs people to existing code points
> for things that are graphically similar to already-encoded characters but
> semantically very different, yet it encodes new characters that are
> semantically the same as existing characters when their exact visual
> representation is important and is based on rules that are very hard to derive.
> 
> This is inevitable to some degree, since text rendering technology and fonts
> should not be expected to be very complex. So plain text representation
> becomes more visual in order to make life easier for the rendering engines.
> 
> This can be evidenced by a lot of the newer characters in the Arabic blocks.
> The open tanweens or arrowheads in the Arabic Extended-A block were encoded
> because they were graphically different, while the committee did not encode a
> "waw with madda above" and recommended "waw+madda above" to be used for it
> instead. The diacritical hamza was the most controversial, and the controversy
> is the main reason for the hole at U+08A1 (it is reserved for a Beh With Hamza
> Above, which will be in Unicode 7.0).
> 
> All in all, this means that UTC considers anything that very much looks like
> U+0653 a madda above, and anything that may need to be visually distinguished
> from it and be smaller in size a small high madda. The glyphs used in the
> chart show a significant size difference, and have been showing that difference
> since the small high madda got encoded in Unicode 2.0. Unicode actually
> doesn't prescribe exact usage of a lot of the Koranic marks, because the marks
> may be used very differently across the various Koranic traditions from
> Indonesia to Morocco.
> 
> I don't think it's a good idea to consider madda to be a certain kind of
> hamza. Yes, in the modern Arabic language Alef+madda above is semantically
> equivalent to hamza+alef or alef+alef, but there is no hint of a hamza
> semantic when some minority languages using the Arabic script take a madda
> and put it over a waw to get a new vowel.
> 
> I understand that means that there may be no "real" semantic difference
> between a normal madda and a small high madda, but there's really no semantic
> difference between a yeh and a farsi yeh either, and they are separately
> encoded. Unicode is quite graphical in its encoding.
> 
> Regarding U+06C7 and U+06C8, the UTC has agreed to not encode such characters
> anymore, except for the use of hamza above for diacritic usages of non-hamza
> semantics. So there may as well be future siblings for U+0681, U+076C, U+08A1,
> and U+08A8, but no future siblings to U+06C7 and U+06C8.
> 
> Please tell me if there's anything I've failed to address.
> 
> 
> On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> 
>     On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:
>     > Khaled, you are referring to a specific style of writing the Koran. There
>     > are several others, which Unicode should be able to represent.
> 
>     I’m not sure I follow here. If you think there should be a way to
>     differentiate between two forms of prolongation mark (aka Quranic
>     Madda), something I have never seen but I’m open to learning something new,
>     then a new code point should be encoded, instead of abusing a Hamza (aka
>     the other Madda) that has an incompatible normalization behaviour in
>     Unicode.
> 
>     And you ignored my other point.
> 
>     Regards,
>     Khaled
> 
>     > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>     >
>     > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:
>     > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>     > > >
>     > > > > Furthermore, <alef,quranic madda> ≠ <alef with madda above>
>     > > > >
>     > > >
>     > > > Why?
>     > >
>     > > Because every Mushaf printed in Egypt (and most of the Arabic world)
>     > > since 1919[1] has a note at the end of Madda description stating that “…
>     > > and this mark should not be used to indicate an omitted Alef after[sic]
>     > > a written Alef, as in آمنوا, that were mistakenly put in many
>     > > Mushafs …”, which to me is a very frank indication that the two marks
>     > > are not the same thing.
>     > >
>     > > Also a vowel mark (which the Quranic Madda is) should not “blend” with
>     > > its base letter, the same way that U+06C7 is not canonically equivalent
>     > > to <U+0648,U+064F> etc.
>     > >
>     > > Regards,
>     > > Khaled
>     > >
>     > > 1. The date of first Mushaf printed by Al-Azhar where most of the
>     > > Quranic annotation marks were formalized and standardized.
>     > >
> 
> 
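
For anyone who wants to check the normalization behaviour discussed in the
thread above, here is a minimal sketch using Python's standard unicodedata
module. The helper name canonically_equivalent and the constant names are
made up for illustration; the results depend on the Unicode data shipped with
your Python, though the properties queried here should be stable ones. It only
shows what the current data says, not which encoding choice is right.

```python
import unicodedata

ALEF = "\u0627"               # ARABIC LETTER ALEF
ALEF_WITH_MADDA = "\u0622"    # ARABIC LETTER ALEF WITH MADDA ABOVE
MADDA_ABOVE = "\u0653"        # ARABIC MADDAH ABOVE
SMALL_HIGH_MADDA = "\u06E4"   # ARABIC SMALL HIGH MADDA (the Quranic one)
WAW = "\u0648"                # ARABIC LETTER WAW
DAMMA = "\u064F"              # ARABIC DAMMA
LETTER_U = "\u06C7"           # ARABIC LETTER U


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    pairs = [
        ("alef + madda above   vs  alef with madda above",
         ALEF + MADDA_ABOVE, ALEF_WITH_MADDA),
        ("alef + small high madda  vs  alef with madda above",
         ALEF + SMALL_HIGH_MADDA, ALEF_WITH_MADDA),
        ("waw + damma  vs  U+06C7",
         WAW + DAMMA, LETTER_U),
    ]
    for label, a, b in pairs:
        print(f"{label}: {canonically_equivalent(a, b)}")

    # Canonical combining classes drive the reordering of marks
    # during normalization.
    for mark in (MADDA_ABOVE, SMALL_HIGH_MADDA, DAMMA):
        print(f"U+{ord(mark):04X} ccc={unicodedata.combining(mark)}")
```

Only the first pair should come out as equivalent: U+0653 composes with alef,
while the small high madda does not, and U+06C7 has no decomposition at all.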

-- 
behdad
http://behdad.org/