[HarfBuzz] The canonical ordering of hamza marks

Khaled Hosny khaledhosny at eglug.org
Fri Oct 18 17:03:11 PDT 2013


AFAIK, the only place U+06E8 is used is as the second Noon of ننجي in
Aya 88, Sura 21.

Regards,
Khaled

On Fri, Oct 18, 2013 at 04:39:55PM -0700, Roozbeh Pournader wrote:
> BTW, Khaled, do you have examples of U+06E8 in the text of the Koran? I
> would appreciate sura and aya numbers if you do.
> 
> 
> On Fri, Oct 18, 2013 at 4:33 PM, Roozbeh Pournader <roozbeh at google.com> wrote:
> 
> > Let me try to approach the problem from another angle.
> >
> > Unicode, although originally planned to be more semantic, has become more
> > and more a graphical encoding. This can be evidenced by the new characters
> > encoded or not encoded. The UTC continuously refers people to use existing
> > code points for things that are graphically similar to already-encoded
> > characters but are semantically very different, but encodes new characters
> > that are semantically the same as existing characters, but their exact
> > visual representation is important and is based on rules that are very hard
> > to derive.
> >
> > This is inevitable to some degree, since text rendering technology and
> > fonts should not be expected to be very complex. So plain text
> > representation becomes more visual in order to make life easier for the
> > rendering engines.
> >
> > This can be evidenced by a lot of the newer characters in the Arabic
> > blocks. The open tanweens or arrowheads in the Arabic Extended-A block were
> > encoded because they were graphically different, while the committee did
> > not encode a "waw with madda above" and recommended "waw+madda above" to be
> > used for it instead. The diacritical hamza was the most controversial, and
> > the controversy is the main reason for the hole at U+08A1 (it is reserved
> > for a Beh With Hamza Above, which will be in Unicode 7.0).
> >
> > All in all, this means that UTC considers anything that very much looks
> > like U+0653 a madda above, and anything that may need to be visually
> > distinguished from it and be smaller in size a small high madda. The glyphs
> > used in the chart show a significant size difference, and have shown
> > that difference since the small high madda got encoded in Unicode 2.0.
> > Unicode actually doesn't prescribe exact usage of a lot of the Koranic
> > marks, because the marks may be used very differently across the various
> > Koranic traditions from Indonesia to Morocco.
> >
> > I don't think it's a good idea to consider madda to be a certain kind of
> > hamza. Yes, in the modern Arabic language Alef+madda above is semantically
> > equivalent to hamza+alef or alef+alef, but there is no hint of a hamza
> > semantic when some minority languages using the Arabic script take a madda
> > and put it over a waw to get a new vowel.
> >
> > I understand that means that there may be no "real" semantic difference
> > between a normal madda and a small high madda, but there's really no
> > semantic difference between a yeh and a farsi yeh either, and they are
> > separately encoded. Unicode is quite graphical in its encoding.
> >
> > Regarding U+06C7 and U+06C8, the UTC has agreed to not encode such
> > characters anymore, except for the use of hamza above for diacritic usages
> > of non-hamza semantics. So there may well be future siblings for U+0681,
> > U+076C, U+08A1, and U+08A8, but no future siblings to U+06C7 and U+06C8.
> >
> > Please tell me if there's anything I haven't addressed.
> >
> >
> > On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> >
> >> On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:
> >> > Khaled, you are referring to a specific style of writing the Koran. There
> >> > are several others, which Unicode should be able to represent.
> >>
> >> I’m not sure I follow here. If you think there should be a way to
> >> differentiate between two forms of the prolongation mark (aka Quranic
> >> Madda), something I have never seen but I’m open to learning about,
> >> then a new code point should be encoded, instead of abusing a Hamza (aka
> >> the other Madda) that has an incompatible normalization behaviour in
> >> Unicode.
> >>
> >> And you ignored my other point.
> >>
> >> Regards,
> >> Khaled
> >>
> >> > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> >> >
> >> > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:
> >> > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> >> > > >
> >> > > > > Furthermore, <alef,quranic madda> ≠ <alef with madda above>
> >> > > > >
> >> > > >
> >> > > > Why?
> >> > >
> >> > > Because every Mushaf printed in Egypt (and most of the Arabic world)
> >> > > since 1919[1] has a note at the end of the Madda description stating that
> >> > > “… and this mark should not be used to indicate an omitted Alef after[sic]
> >> > > a written Alef, as in آمنوا, that were mistakenly put in many
> >> > > Mushafs …”, which to me is a very frank indication that the two marks
> >> > > are not the same thing.
> >> > >
> >> > > Also a vowel mark (which the Quranic Madda is) should not “blend” with
> >> > > its base letter, the same way that U+06C7 is not canonically equivalent
> >> > > to <U+0648,U+064F> etc.
> >> > >
> >> > > Regards,
> >> > > Khaled
> >> > >
> >> > > 1. The date of the first Mushaf printed by Al-Azhar, in which most of
> >> > > the Quranic annotation marks were formalized and standardized.
> >> > >
> >>
> >
> >
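
The normalization point in the quoted messages can be checked directly.
What follows is only a minimal sketch using Python's standard unicodedata
module. It assumes that the "madda above" under discussion is U+0653 and that
the "small high madda" is U+06E4; the constant and helper names are
illustrative and do not come from HarfBuzz or from the thread.

```python
import unicodedata

# Code points assumed to be the ones under discussion in the thread.
ALEF = "\u0627"
MADDAH_ABOVE = "\u0653"           # the "normal" madda above
SMALL_HIGH_MADDA = "\u06E4"       # assumed to be the "small high madda"
ALEF_WITH_MADDA_ABOVE = "\u0622"  # precomposed alef with madda above
WAW = "\u0648"
DAMMA = "\u064F"
ARABIC_LETTER_U = "\u06C7"


def nfc(text):
    """Return the canonical composed form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Return the canonical decomposed form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# <alef, madda above> composes into the precomposed letter under NFC ...
maddah_composes = nfc(ALEF + MADDAH_ABOVE) == ALEF_WITH_MADDA_ABOVE

# ... while <alef, small high madda> is left as a two-character sequence.
small_madda_stays = nfc(ALEF + SMALL_HIGH_MADDA) == ALEF + SMALL_HIGH_MADDA

# U+06C7 has no canonical decomposition, and <waw, damma> does not compose,
# so the two are not canonically equivalent.
u_is_atomic = nfd(ARABIC_LETTER_U) == ARABIC_LETTER_U
waw_damma_stays = nfc(WAW + DAMMA) == WAW + DAMMA

for label, result in [
    ("alef + U+0653 composes to U+0622", maddah_composes),
    ("alef + U+06E4 stays a sequence", small_madda_stays),
    ("U+06C7 has no decomposition", u_is_atomic),
    ("waw + damma stays a sequence", waw_damma_stays),
]:
    print(label, "->", result)
```

If the assumptions hold, text encoded with U+0653 after an alef is silently
merged into U+0622 by any process that normalizes to NFC, whereas text encoded
with U+06E4 keeps the mark as a separate character.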


