<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Feb 24, 2015 at 5:03 AM, Richard Wordingham <span dir="ltr">&lt;<a href="mailto:richard.wordingham@ntlworld.com" target="_blank">richard.wordingham@ntlworld.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Are we still left with IndicSyllabicCategory.txt as the only<br> functional definition of the properties? </blockquote><div><br></div><div>Not necessarily. USE seems to use a combination of Indic syllabic, Indic positional, and general categories, with some codepoints as exceptions. HarfBuzz has been using very similar techniques too, with tables automatically derived from the Unicode data files and then some exceptions in code.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">1. Is &lt;consonant&gt;&lt;dependent_vowel&gt;_&lt;dependent_vowel&gt; an allowed context<br> for a 'Consonant_Medial' if it is allowed for an invisible stacker plus<br> consonant?<br> <br> 2. Is &lt;consonant&gt;&lt;dependent_vowel&gt;_# an allowed context for a<br> 'Consonant_Medial' if it is allowed for an invisible stacker plus<br> consonant?<br> <br> 3. Are they allowed contexts for 'Consonant_Subjoined' if they are<br> allowed for an invisible stacker plus consonant?<br></blockquote><div><br></div><div>They could be, as soon as we have evidence that there is a need to allow them (if we don't allow them at the moment). Generally, give us the character sequence that should work and doesn't, and explain why your sequence is correct according to the Unicode encoding of the script, and HarfBuzz will get the patterns fixed to allow the character sequence.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Correction: I checked as I wrote and see that the USE specification was<br> released yesterday. 
If the blog page is correct, the Universal Shaping<br> Engine rejects the phonetic ordering of the Tai Tham encoding model.<br> The word /pɛːt/ 'eight' must be encoded &lt;PA, SAKOT, DA, SIGN AE&gt;! I<br> shall be studying the specification today. At first sight the USE<br> appears to reject the current encoding system.<br></blockquote><div><br></div><div>USE is really in draft mode IMO. There are several small details that it doesn't consider properly when it comes to Unicode. With HarfBuzz, we have been trying to be both closer to Unicode's definition of things, and more accepting of different sequences (i.e. show fewer dotted circles).</div><div><br></div><div>I specifically don't like USE's hard requirements on character ordering, which may sometimes go against Unicode recommendations. If you find examples where the Unicode standard recommends a different order than USE does, please tell both the HarfBuzz community and the USE authors (contact Andrew Glass).</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> An important question for U+1A7A, U+1A7B and U+1A7C is:<br> <br> 4. May a 'syllable modifier' be followed by something other than a<br> syllable modifier? The description implies not, which reduces the<br> useful of what could have been a useful waste bin taxon, sweeping up<br> all the pure killers.<br></blockquote><div><br></div><div>My expectation in defining the new Syllable_Modifier was that they would typically occur at the end of syllables, but not necessarily at the very end. For example, I wouldn't be surprised if a Visarga or Bindu character follows them. (USE may be more restrictive, but that's their problem.)</div><div><br></div><div>Still, I may have miscategorized the characters discussed here. They are quite under-documented in the standard and the proposals anyway. 
Any help in understanding them better would be appreciated, especially interesting real-world cases that may not fit the current model. I can help with getting the clarification into Unicode and fixes into HarfBuzz.</div><div><br></div><div>Also, the whole Syllable_Modifier category is sometimes just a catch-all for some of the weird or under-documented things that don't easily fall into any other class. Feel free to suggest splits of the category.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span> &gt; Please take a look and send me or UTC your suggestions (or file bugs<br> &gt; at <a href="https://github.com/roozbehp/unicode-data/issues" target="_blank">https://github.com/roozbehp/unicode-data/issues</a>). If there was<br> &gt; still a need to change something in HarfBuzz, we can do that too.<br> <br> </span>By 'sending to the UTC', are you suggesting anything more than a bug<br> report or document submission?</blockquote><div><br></div><div>No. A bug report or a document submission are the best ways forward. </div></div></div></div>