[HarfBuzz] The canonical ordering of hamza marks

Fri Oct 18 16:39:55 PDT 2013

BTW, Khaled, do you have examples of U+06E8 in the text of the Koran? I
would appreciate sura and aya numbers if you do.

On Fri, Oct 18, 2013 at 4:33 PM, Roozbeh Pournader <roozbeh at google.com> wrote:

> Let me try to approach the problem from another angle.
>
> Unicode, although originally planned to be more semantic, has become more
> and more a graphical encoding. This is evident from which new characters
> do and do not get encoded. The UTC continually directs people to existing
> code points for things that are graphically similar to already-encoded
> characters but semantically very different, yet it encodes new characters
> that are semantically the same as existing ones when their exact visual
> representation is important and is based on rules that are very hard to
> derive.
>
> This is inevitable to some degree, since text rendering technology and
> fonts should not be expected to be very complex. So plain text
> representation becomes more visual in order to make life easier for the
> rendering engines.
>
> This is evident in a lot of the newer characters in the Arabic
> blocks. The open tanweens or arrowheads in the Arabic Extended-A block were
> encoded because they were graphically different, while the committee did
> not encode a "waw with madda above" and recommended "waw+madda above" to be
> used for it instead. The diacritical hamza was the most controversial, and
> the controversy is the main reason for the hole at U+08A1 (it is reserved
> for a Beh With Hamza Above, which will be in Unicode 7.0).
>
> All in all, this means that UTC considers anything that very much looks
> like U+0653 a madda above, and anything that may need to be visually
> distinguished from it and be smaller in size a small high madda. The glyphs
> used in the chart show a significant size difference, and have shown
> that difference since the small high madda got encoded in Unicode 2.0.
> Unicode actually doesn't prescribe exact usage of a lot of the Koranic
> marks, because the marks may be used very differently across the various
> Koranic traditions from Indonesia to Morocco.
>
> I don't think it's a good idea to consider madda to be a certain kind of
> hamza. Yes, in the modern Arabic language Alef+madda above is semantically
> equivalent to hamza+alef or alef+alef, but there is no hint of a hamza
> semantic when some minority languages using the Arabic script take a madda
> and put it over a waw to get a new vowel.
>
> I understand that means that there may be no "real" semantic difference
> between a normal madda and a small high madda, but there's really no
> semantic difference between a yeh and a farsi yeh either, and they are
> separately encoded. Unicode is quite graphical in its encoding.
>
> Regarding U+06C7 and U+06C8, the UTC has agreed not to encode such
> characters anymore, except for the use of hamza above for diacritic usages
> of non-hamza semantics. So there may well be future siblings for U+0681,
> U+076C, U+08A1, and U+08A8, but no future siblings to U+06C7 and U+06C8.
>
> Please tell me if there's anything I've failed to address.
>
>
> On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>
>> On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:
>> > Khaled, you are referring to a specific style of writing the Koran.
>> > There are several others, which Unicode should be able to represent.
>>
>> I’m not sure I follow here. If you think there should be a way to
>> differentiate between two forms of prolongation mark (aka Quranic
>> Madda), something I have never seen, though I’m open to learning
>> something new, then a new code point should be encoded instead of
>> abusing a Hamza (aka the other Madda) that has an incompatible
>> normalization behaviour in Unicode.
>>
>> And you ignored my other point.
>>
>> Regards,
>> Khaled
>>
>> > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>> >
>> > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:
>> > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny <khaledhosny at eglug.org> wrote:
>> > > >
>> > > > > Furthermore, <alef,quranic madda> ≠ <alef with madda above>
>> > > > >
>> > > >
>> > > > Why?
>> > >
>> > > Because every Mushaf printed in Egypt (and most of the Arab world)
>> > > since 1919[1] has a note at the end of the Madda description stating
>> > > that “… and this mark should not be used to indicate an omitted Alef
>> > > after[sic] a written Alef, as in آمنوا, that were mistakenly put in
>> > > many Mushafs …”, which to me is a very frank indication that the two
>> > > marks are not the same thing.
>> > >
>> > > Also a vowel mark (which the Quranic Madda is) should not “blend” with
>> > > its base letter, the same way that U+06C7 is not canonically
>> > > equivalent to <U+0648,U+064F> etc.
>> > >
>> > > Regards,
>> > > Khaled
>> > >
>> > > 1. The date of the first Mushaf printed by Al-Azhar, in which most of
>> > > the Quranic annotation marks were formalized and standardized.
>> > >
>>
>
>
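
For anyone who wants to check the normalization behaviour discussed in the
quoted messages, here is a minimal sketch. It assumes Python's standard
unicodedata module, so the results reflect whichever Unicode version ships
with the interpreter. The helper names (nfc, nfd, canonically_equivalent) are
illustrative only and are not part of HarfBuzz. The sketch reports what
normalization does with U+0653 and U+06E4; it takes no position on which
code point a given Koranic tradition should use, which is the open question
in this thread.

```python
import unicodedata

# Code points mentioned in the thread, written as escapes to avoid
# bidirectional display issues in plain text.
ALEF = "\u0627"              # ARABIC LETTER ALEF
ALEF_WITH_MADDA = "\u0622"   # ARABIC LETTER ALEF WITH MADDA ABOVE
MADDA_ABOVE = "\u0653"       # ARABIC MADDAH ABOVE
SMALL_HIGH_MADDA = "\u06E4"  # ARABIC SMALL HIGH MADDA
WAW = "\u0648"               # ARABIC LETTER WAW
DAMMA = "\u064F"             # ARABIC DAMMA
LETTER_U = "\u06C7"          # ARABIC LETTER U


def nfc(text):
    """Return the canonical composed form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Return the canonical decomposed form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return nfd(a) == nfd(b)


if __name__ == "__main__":
    pairs = [
        ("alef + U+0653 vs U+0622", ALEF + MADDA_ABOVE, ALEF_WITH_MADDA),
        ("alef + U+06E4 vs U+0622", ALEF + SMALL_HIGH_MADDA, ALEF_WITH_MADDA),
        ("waw + damma vs U+06C7", WAW + DAMMA, LETTER_U),
    ]
    for label, left, right in pairs:
        print(label, "->", canonically_equivalent(left, right))
```

The design point is that equivalence is tested by comparing NFD forms rather
than comparing the raw strings, so the check does not depend on whether the
input was stored composed or decomposed.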