[HarfBuzz] dotted circle is not appearing for dependant vowel

Shriramana Sharma samjnaa at gmail.com
Tue Jul 24 04:51:26 PDT 2012


On Tue, Jul 24, 2012 at 3:26 PM, Pravin Satpute <psatpute at redhat.com> wrote:
>
>    I see the dotted circle is still not appearing with dependant vowels
> (U+093f), Is this intentionally?
>    Might be since you are removing test cases generating dotted circle
> in Uniscribe before running it with harfbuzz.

May I take this opportunity to record what I have long felt on the
topic of dotted circles.

I feel that dotted circles should not be displayed except when not
doing so can cause non-canonically-equivalent encoded sequences to
appear the same. That is, they should be displayed only to distinguish
between such sequences. (This is to protect against phishing and
such.)

For example, the long vowel आ has no canonical decomposition to अ + ा,
yet it would appear the same as the latter if no dotted circle is
shown. There are many such "do not use" recommendations for
independent vowels in the Indic Unicode chapters because of the
absence of canonical equivalences (unfortunate IMO, but well...).
Reordering vowel signs like ि are a similar case: if the sequence अिक
is mistakenly typed (or maliciously introduced) for अकि and there is
no dotted circle, the two sequences would appear the same, which is
not appropriate from a security viewpoint as they are not
canonically equivalent.
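
To make the notion of canonical equivalence concrete, here is a
minimal sketch using Python's standard unicodedata module. The helper
name canonically_equivalent is only illustrative; it is not part of
HarfBuzz or of any shaping engine. Two strings are treated as
canonically equivalent when their NFD forms are identical.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Independent vowel AA versus letter A followed by vowel sign AA.
AA = "\u0906"
A_PLUS_AA_SIGN = "\u0905\u093E"

# Vowel sign I placed after the vowel A versus after the consonant KA.
A_I_KA = "\u0905\u093F\u0915"
A_KA_I = "\u0905\u0915\u093F"

# Positive control: QA (U+0958) does decompose to KA + NUKTA.
QA = "\u0958"
KA_NUKTA = "\u0915\u093C"

if __name__ == "__main__":
    print(canonically_equivalent(AA, A_PLUS_AA_SIGN))
    print(canonically_equivalent(A_I_KA, A_KA_I))
    print(canonically_equivalent(QA, KA_NUKTA))
```

The first two pairs compare as not equivalent, so a renderer that
draws them identically gives the reader no way to tell them apart.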

My point is that there may be many reasons for unexpected combinations
of characters in Indic. Vedic texts are one. Minority orthographies
(which may use rare combinations of vowel signs and diacritics) are
another. Legitimate creative use (like काााााा for "kaaaa", a shout)
is yet another. Imposing a limited orthography (i.e. recognizing only
a certain set of sequence patterns and producing dotted circles
for sequences that do not fit them) would deny such users the use of
the rendering system.

Of course, this usability can also be achieved by first imposing a
generic orthography (i.e. script grammar) and later adding more
recognized sequences as per user community request. (This is also much
easier to produce and deliver to the community in open source
ecosystems than in proprietary ones.)

This would be advisable since it may be difficult to predict which
sequences in Indic would be confusable, especially with non-spacing
marks. For example, तु and तुु would be confusable if there is no
dotted circle and the second ु is overlaid upon the first.

But these sequences are not obvious in advance, so it appears that
creating regexes for sequences where dotted circles should *not* be
produced might be easier than doing so for sequences where they
*should* be produced, and it would be appropriate to err on the side
of caution.
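
A rough sketch of that allowlist idea in Python follows. Everything in
it is an assumption made for illustration: the character classes are a
deliberately simplified Devanagari grammar (no nukta, no Vedic signs),
and the names ALLOWED and needs_dotted_circle are hypothetical, not
anything HarfBuzz or Uniscribe actually uses. A cluster gets a dotted
circle unless it fully matches one of the recognized patterns, and
more patterns can be appended later on request.

```python
import re

# Simplified character classes (assumption: far from a full script grammar).
CONSONANT = "[\u0915-\u0939]"    # KA..HA
VIRAMA = "\u094D"
INDEP_VOWEL = "[\u0904-\u0914]"  # independent vowels
MATRA = "[\u093E-\u094C]"        # dependent vowel signs
MODIFIER = "[\u0901-\u0903]"     # candrabindu, anusvara, visarga

# Sequences for which a dotted circle should NOT be produced.
ALLOWED = [
    re.compile(f"(?:{CONSONANT}{VIRAMA})*{CONSONANT}{MATRA}?{MODIFIER}?"),
    re.compile(f"{INDEP_VOWEL}{MODIFIER}?"),
]


def needs_dotted_circle(cluster: str, allowed=ALLOWED) -> bool:
    """Err on the side of caution: flag anything not explicitly recognized."""
    return not any(p.fullmatch(cluster) for p in allowed)


# A pattern added later "as per user community request":
# a consonant followed by a repeated vowel sign AA (the shout).
SHOUT = re.compile(f"{CONSONANT}\u093E+")
EXTENDED = ALLOWED + [SHOUT]

if __name__ == "__main__":
    ta_u = "\u0924\u0941"
    ta_u_u = "\u0924\u0941\u0941"
    shout = "\u0915" + "\u093E" * 6
    print(needs_dotted_circle(ta_u))
    print(needs_dotted_circle(ta_u_u))
    print(needs_dotted_circle(shout), needs_dotted_circle(shout, EXTENDED))
```

With the base list, the doubled ु and the shout are both flagged; with
the extended list, only the shout becomes acceptable.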

I had to say this, being a scholar of Sanskrit and Vedic, which really
push scripts (and hence software support for them) to their limits.
Pravin (the OP on this thread) and I have plans for developing a Lohit
Devanagari Vedic font, so we'll be coming back to this...

-- 
Shriramana Sharma


