[HarfBuzz] The canonical ordering of hamza marks
Behdad Esfahbod
behdad at behdad.org
Thu May 8 11:53:05 PDT 2014
Here's a relevant document Roozbeh sent to the UTC recently:
http://www.unicode.org/L2/L2014/14127-arabic-marks-order.pdf
On 13-10-18 07:33 PM, Roozbeh Pournader wrote:
> Let me try to approach the problem from another angle.
>
> Unicode, although originally planned to be more semantic, has become more and
> more a graphical encoding. This can be evidenced by the new characters encoded
> or not encoded. The UTC continually directs people to existing code points
> for things that are graphically similar to already-encoded characters but
> semantically very different, yet it encodes new characters that are
> semantically the same as existing ones when their exact visual representation
> is important and is based on rules that are very hard to derive.
>
> This is inevitable to some degree, since text rendering technology and fonts
> should not be expected to be very complex. So plain text representation
> becomes more visual in order to make life easier for the rendering engines.
>
> This can be evidenced by a lot of the newer characters in the Arabic blocks.
> The open tanweens or arrowheads in the Arabic Extended-A block were encoded
> because they were graphically different, while the committee did not encode a
> "waw with madda above" and recommended "waw+madda above" to be used for it
> instead. The diacritical hamza was the most controversial, and the controversy
> is the main reason for the hole at U+08A1 (it is reserved for a Beh With Hamza
> Above, which will be in Unicode 7.0).
>
> All in all, this means that UTC considers anything that very much looks like
> U+0653 a madda above, and anything that may need to be visually distinguished
> from it and be smaller in size a small high madda. The glyphs used in the
> chart show a significant size difference, and have been showing that difference
> since the small high madda got encoded in Unicode 2.0. Unicode actually
> doesn't prescribe exact usage of a lot of the Koranic marks, because the marks
> may be used very differently across the various Koranic traditions from
> Indonesia to Morocco.
>
> I don't think it's a good idea to consider madda to be a certain kind of
> hamza. Yes, in the modern Arabic language Alef+madda above is semantically
> equivalent to hamza+alef or alef+alef, but there is no hint of a hamza
> semantic when some minority languages using the Arabic script take a madda
> and put it over a waw to get a new vowel.
>
> I understand that means that there may be no "real" semantic difference
> between a normal madda and a small high madda, but there's really no semantic
> difference between a yeh and a farsi yeh either, and they are separately
> encoded. Unicode is quite graphical in its encoding.
>
> Regarding U+06C7 and U+06C8, the UTC has agreed not to encode such characters
> anymore, except where a hamza above is used as a diacritic with non-hamza
> semantics. So there may well be future siblings for U+0681, U+076C, U+08A1,
> and U+08A8, but no future siblings for U+06C7 and U+06C8.
>
> Please tell me if there's anything I've failed to address.
>
>
> On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>
> On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:
> > Khaled, you are referring to a specific style of writing the Koran. There
> > are several others, which Unicode should be able to represent.
>
> I’m not sure I follow here. If you think there should be a way to
> differentiate between two forms of prolongation mark (aka Quranic
> Madda), something I have never seen but I’m open to learning something new,
> then a new code point should be encoded, instead of abusing a Hamza (aka
> the other Madda) that has an incompatible normalization behaviour in
> Unicode.
>
> And you ignored my other point.
>
> Regards,
> Khaled
>
> > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> >
> > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:
> > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> > > >
> > > > > Furthermore, <alef,quranic madda> ≠ <alef with madda above>
> > > > >
> > > >
> > > > Why?
> > >
> > > Because every Mushaf printed in Egypt (and most of the Arabic world)
> > > since 1919[1] has a note at the end of the Madda description stating that “…
> > > and this mark should not be used to indicate an omitted Alef after[sic]
> > > a written Alef, as in آمنوا, that were mistakenly put in many
> > > Mushafs …”, which to me is a very frank indication that the two marks
> > > are not the same thing.
> > >
> > > Also a vowel mark (which the Quranic Madda is) should not “blend” with
> > > its base letter, the same way that U+06C7 is not canonically equivalent
> > > to <U+0648,U+064F> etc.
> > >
> > > Regards,
> > > Khaled
> > >
> > > 1. The date of the first Mushaf printed by Al-Azhar, in which most of the
> > > Quranic annotation marks were formalized and standardized.
> > >
>
>
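The normalization behaviour argued about in the quoted thread can be checked
directly. Below is a minimal sketch, assuming Python's standard unicodedata
module and a Python build whose Unicode database is at least 7.0 (needed for
U+08A1). The constant and helper names are made up for illustration; nothing
here comes from HarfBuzz itself.

```python
import unicodedata

ALEF = "\u0627"              # ARABIC LETTER ALEF
MADDA_ABOVE = "\u0653"       # ARABIC MADDAH ABOVE
SMALL_HIGH_MADDA = "\u06E4"  # ARABIC SMALL HIGH MADDA
ALEF_WITH_MADDA = "\u0622"   # ARABIC LETTER ALEF WITH MADDA ABOVE


def nfc(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Return the NFD (canonically decomposed) form of text."""
    return unicodedata.normalize("NFD", text)


def is_atomic(char):
    """True if the character has no decomposition mapping at all."""
    return unicodedata.decomposition(char) == ""


# <alef, madda above> is canonically equivalent to U+0622, so NFC composes it.
madda_composes = nfc(ALEF + MADDA_ABOVE) == ALEF_WITH_MADDA

# <alef, small high madda> has no precomposed form and is left as two code points.
small_madda_stays = nfc(ALEF + SMALL_HIGH_MADDA) == ALEF + SMALL_HIGH_MADDA

# U+06C7 is encoded atomically; it is not equivalent to <U+0648, U+064F>.
u_is_atomic = is_atomic("\u06C7") and nfd("\u06C7") != "\u0648\u064F"

# Hamza-semantics letters decompose (U+0624 -> waw + hamza above), while the
# "diacritic hamza" letters named in the thread do not.
waw_hamza_decomposes = nfd("\u0624") == "\u0648\u0654"
diacritic_hamza_atomic = all(
    is_atomic(ch) for ch in ("\u0681", "\u076C", "\u08A1", "\u08A8")
)

print(madda_composes, small_madda_stays, u_is_atomic,
      waw_hamza_decomposes, diacritic_hamza_atomic)
```

This only shows what the Unicode character database says today; it does not
settle which mark a given Quranic text ought to use.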
--
behdad
http://behdad.org/