ICU has quick-check functions (<a href="http://icu-project.org/apiref/icu4c/unorm2_8h.html#ad81711834f00bbeb97738004f4f08450">http://icu-project.org/apiref/icu4c/unorm2_8h.html#ad81711834f00bbeb97738004f4f08450</a>) that return YES, NO, or MAYBE for whether the text is already normalized, which tells you whether a normalization pass is required. If you're making a pass over the data anyway, this is not *much* more expensive than just checking for non-ASCII characters. Something to consider, whether ICU itself is used or only the principle. <div>
<br></div><div>-s <br><br><div class="gmail_quote">On Thu, Aug 9, 2012 at 10:32 AM, Jonathan Kew <span dir="ltr">&lt;<a href="mailto:jfkthame@googlemail.com" target="_blank">jfkthame@googlemail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Behdad,<br>
<br>
While complex-script shaping is obviously far more interesting, in practice there is a lot of very simple ASCII text on the web. So what would you think of adding a minor optimization that looks like it can give us about 10% gain on shaping ASCII text with simple fonts? The idea is to make hb_buffer_add check whether any non-ASCII characters have been put in the buffer; and if not, there's no need to run the normalization pass.<br>
<br>
(Of course, there are plenty of non-ASCII characters that could also be present without normalization becoming relevant, but I didn't want to make the check any more expensive than a simple character-code comparison, and optimizing performance of ASCII-only runs will benefit a lot of real-world text for minimal effort.)<br>
<br>
This was prompted by profile data such as <a href="http://people.mozilla.com/~bgirard/cleopatra/?report=c2e6bea3647461c0675e59441b78c0f5c409ac0d" target="_blank">http://people.mozilla.com/~bgirard/cleopatra/?report=c2e6bea3647461c0675e59441b78c0f5c409ac0d</a> (see <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=762710#c25" target="_blank">https://bugzilla.mozilla.org/show_bug.cgi?id=762710#c25</a>), which relates to layout of a large, almost purely ASCII document. This shows the normalization pass - which we know is redundant for ASCII-only text - contributing around 10% of the total shaping time. With this patch, that time simply vanishes from the profile.<span class="HOEnZb"><font color="#888888"><br>
<br>
JK<br>
<br>
</font></span><br>_______________________________________________<br>
HarfBuzz mailing list<br>
<a href="mailto:HarfBuzz@lists.freedesktop.org">HarfBuzz@lists.freedesktop.org</a><br>
<a href="http://lists.freedesktop.org/mailman/listinfo/harfbuzz" target="_blank">http://lists.freedesktop.org/mailman/listinfo/harfbuzz</a><br>
<br></blockquote></div><br></div>
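<div>For reference, the ASCII check proposed in the quoted message could look roughly like the sketch below. This is not the actual patch and not HarfBuzz's real buffer code: the <code>sketch_buffer_t</code> type, its fixed capacity, and the <code>sketch_*</code> function names are all made up for illustration. The only idea taken from the thread is that the add function records whether any code point above 0x7F was seen, so the shaper can skip the normalization pass when none was.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAPACITY 256

/* Hypothetical stand-in for a shaping buffer; NOT HarfBuzz's hb_buffer_t. */
typedef struct {
    uint32_t codepoints[SKETCH_CAPACITY];
    size_t   len;
    bool     has_non_ascii;  /* sticky: set once any code point > 0x7F is added */
} sketch_buffer_t;

static void sketch_buffer_init(sketch_buffer_t *buf)
{
    assert(buf != NULL);
    buf->len = 0;
    buf->has_non_ascii = false;
}

/* Append one code point. Returns false if the buffer is full.
 * The extra cost per character is a single comparison. */
static bool sketch_buffer_add(sketch_buffer_t *buf, uint32_t cp)
{
    assert(buf != NULL);
    if (buf->len >= SKETCH_CAPACITY)
        return false;
    buf->codepoints[buf->len++] = cp;
    if (cp > 0x7F)
        buf->has_non_ascii = true;
    return true;
}

/* Conservative answer: ASCII-only text never needs normalization;
 * anything else is treated as "might need it". */
static bool sketch_needs_normalization(const sketch_buffer_t *buf)
{
    assert(buf != NULL);
    return buf->has_non_ascii;
}
```

<div>A caller would run the normalization pass only when <code>sketch_needs_normalization()</code> returns true. The flag is deliberately conservative: plenty of non-ASCII text is already normalized, and that is the gap a three-state quick check such as ICU's would narrow at somewhat higher cost.</div>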