[HarfBuzz] The canonical ordering of hamza marks

Roozbeh Pournader roozbeh at google.com
Fri Oct 18 16:33:01 PDT 2013


Let me try to approach the problem from another angle.

Unicode, although originally planned to be more semantic, has become more
and more a graphical encoding. The evidence is in which new characters get
encoded and which do not. The UTC keeps directing people to existing code
points for things that are graphically similar to already-encoded
characters but semantically very different, yet it encodes new characters
that are semantically the same as existing ones when their exact visual
representation is important and follows rules that are very hard to
derive.

This is inevitable to some degree, since text rendering technology and
fonts should not be expected to be very complex. So plain text
representation becomes more visual in order to make life easier for the
rendering engines.

This can be evidenced by a lot of the newer characters in the Arabic
blocks. The open tanweens or arrowheads in the Arabic Extended-A block were
encoded because they were graphically different, while the committee did
not encode a "waw with madda above" and recommended "waw+madda above" to be
used for it instead. The diacritical hamza was the most controversial, and
the controversy is the main reason for the hole at U+08A1 (it is reserved
for a Beh With Hamza Above, which will be in Unicode 7.0).

All in all, this means that the UTC considers anything that looks very much
like U+0653 a madda above, and anything that may need to be visually
distinguished from it, and to be smaller in size, a small high madda. The
glyphs used in the chart show a significant size difference, and have shown
that difference since the small high madda got encoded in Unicode 2.0.
Unicode actually doesn't prescribe exact usage of a lot of the Koranic
marks, because the marks may be used very differently across the various
Koranic traditions from Indonesia to Morocco.

I don't think it's a good idea to consider madda to be a certain kind of
hamza. Yes, in the modern Arabic language alef+madda above is semantically
equivalent to hamza+alef or alef+alef, but there is no hint of hamza
semantics when some minority languages using the Arabic script take a madda
and put it over a waw to get a new vowel.
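
To make the normalization side of this concrete, here is a minimal sketch
using Python's standard unicodedata module (my choice of tool for
illustration; the helper name nfc is hypothetical). It shows that
alef + madda above composes to the precomposed U+0622 under NFC, while
waw + madda above has no precomposed form and stays as two code points.

```python
import unicodedata

ALEF = "\u0627"             # ARABIC LETTER ALEF
WAW = "\u0648"              # ARABIC LETTER WAW
MADDAH_ABOVE = "\u0653"     # ARABIC MADDAH ABOVE
ALEF_WITH_MADDA = "\u0622"  # ARABIC LETTER ALEF WITH MADDA ABOVE


def nfc(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)


# Alef followed by madda above composes to the single code point U+0622.
alef_madda = nfc(ALEF + MADDAH_ABOVE)

# Waw followed by madda above has nothing to compose to, so it is unchanged.
waw_madda = nfc(WAW + MADDAH_ABOVE)

print([f"U+{ord(c):04X}" for c in alef_madda])  # ['U+0622']
print([f"U+{ord(c):04X}" for c in waw_madda])   # ['U+0648', 'U+0653']
```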

I understand that means that there may be no "real" semantic difference
between a normal madda and a small high madda, but there's really no
semantic difference between a yeh and a farsi yeh either, and they are
separately encoded. Unicode is quite graphical in its encoding.
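
As a rough illustration of "separately encoded", the sketch below (again
Python's standard unicodedata module; the helper name related is
hypothetical) checks that neither pair is folded together by any of the
four normalization forms, so plain text keeps the distinction.

```python
import unicodedata

PAIRS = [
    ("\u064A", "\u06CC"),  # yeh vs. farsi yeh
    ("\u0653", "\u06E4"),  # madda above vs. small high madda
]

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def related(a, b):
    """True if any normalization form maps a and b to the same string."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


for a, b in PAIRS:
    print(unicodedata.name(a), "/", unicodedata.name(b), "->", related(a, b))
```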

Regarding U+06C7 and U+06C8, the UTC has agreed not to encode such
characters anymore, except where hamza above is used as a diacritic
without hamza semantics. So there may well be future siblings of U+0681,
U+076C, U+08A1, and U+08A8, but no future siblings of U+06C7 and U+06C8.
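
For reference, the normalization status of these letters can be checked
with a short sketch using Python's standard unicodedata module (the helper
name canonical_decomposition is hypothetical). U+0624, waw with hamza
above, is included only as a contrast, since it does have a canonical
decomposition. The results for U+08A1 and U+08A8 depend on the Unicode
version bundled with the Python in use.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of code points,
    or None if ch has no canonical decomposition."""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as "<isolated>"; skip them.
    if not mapping or mapping.startswith("<"):
        return None
    return [int(cp, 16) for cp in mapping.split()]


for cp in (0x0624, 0x06C7, 0x06C8, 0x0681, 0x076C, 0x08A1, 0x08A8):
    parts = canonical_decomposition(chr(cp))
    shown = " ".join(f"U+{p:04X}" for p in parts) if parts else "none"
    print(f"U+{cp:04X}: {shown}")

# Waw followed by damma does not compose to U+06C7 under NFC.
waw_damma = unicodedata.normalize("NFC", "\u0648\u064F")
```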

Please tell me if there's anything I've failed to address.


On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:

> On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:
> > Khaled, you are referring to a specific style of writing the Koran. There
> > are several others, which Unicode should be able to represent.
>
> I’m not sure I follow here. If you think there should be a way to
> differentiate between two forms of prolongation mark (aka Quranic
> Madda), something I have never seen but I’m open to learn something new,
> then a new code point should be encoded, instead of abusing a Hamza (aka
> the other Madda) that has an incompatible normalization behaviour in
> Unicode.
>
> And you ignored my other point.
>
> Regards,
> Khaled
>
> > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> >
> > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:
> > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
> > > >
> > > > > Furthermore, <alef,quranic madda> ≠ <alef with madda above>
> > > > >
> > > >
> > > > Why?
> > >
> > > Because every Mushaf printed in Egypt (and most of the Arabic world)
> > > since 1919[1] has a note at the end of Madda description stating that “…
> > > and this mark should not be used to indicate an omitted Alef after[sic]
> > > a written Alef, as in آمنوا, that were mistakenly put in many
> > > Mushafs …”, which to me is a very frank indication that the two marks
> > > are not the same thing.
> > >
> > > Also a vowel mark (which the Quranic Madda is) should not “blend” with
> > > its base letter, the same way that U+06C7 is not canonically equivalent
> > > to <U+0648,U+064F> etc.
> > >
> > > Regards,
> > > Khaled
> > >
> > > 1. The date of first Mushaf printed by Al-Azhar where most of the
> > > Quranic annotation marks were formalized and standardized.
> > >
>