[HarfBuzz] optimization for ASCII-only text

Jonathan Kew jfkthame at googlemail.com
Thu Aug 9 10:32:37 PDT 2012


Hi Behdad,

While complex-script shaping is obviously far more interesting, in 
practice there is a lot of very simple ASCII text on the web. So what 
would you think of adding a minor optimization that looks like it can 
give us about 10% gain on shaping ASCII text with simple fonts? The idea 
is to make hb_buffer_t::add record whether any non-ASCII character has 
been added to the buffer; if none has, there's no need to run the 
normalization pass.

(Of course, there are plenty of non-ASCII characters that could also be 
present without normalization becoming relevant, but I didn't want to 
make the check any more expensive than a simple character-code 
comparison, and optimizing performance of ASCII-only runs will benefit a 
lot of real-world text for minimal effort.)
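
To illustrate the idea in isolation, here is a minimal standalone sketch. 
It is not HarfBuzz code: SketchBuffer, sketch_normalize and sketch_shape 
are hypothetical names made up for this example, and normalize_calls 
exists only so the effect of the check can be observed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a shaping buffer; NOT the real hb_buffer_t.
struct SketchBuffer {
  std::vector<uint32_t> codepoints;
  bool have_non_ascii = false;   // set once any codepoint > 0x7f is added
  unsigned normalize_calls = 0;  // illustration only: counts normalization runs

  void add (uint32_t codepoint) {
    codepoints.push_back (codepoint);
    // One comparison per character; the flag stays set once it is true.
    have_non_ascii |= codepoint > 0x7f;
  }

  void reset () {
    codepoints.clear ();
    have_non_ascii = false;      // a reused buffer starts out ASCII-only again
  }
};

// Placeholder for the (comparatively expensive) normalization pass.
static void sketch_normalize (SketchBuffer &b) { b.normalize_calls++; }

// Placeholder for the shaping entry point: skip normalization when the
// buffer is known to hold only ASCII.
static void sketch_shape (SketchBuffer &b) {
  if (b.have_non_ascii)
    sketch_normalize (b);
  // ... the remaining shaping steps would follow here ...
}
```

The flag is deliberately conservative: a character such as U+00E9 sets 
it even if normalization would turn out to be a no-op for that run, so 
the only cost of a false positive is the pass we already run today.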

This was prompted by profile data such as 
http://people.mozilla.com/~bgirard/cleopatra/?report=c2e6bea3647461c0675e59441b78c0f5c409ac0d 
(see https://bugzilla.mozilla.org/show_bug.cgi?id=762710#c25), which 
relates to layout of a large, almost purely ASCII document. The profile 
shows the normalization pass, which we know is redundant for ASCII-only 
text, contributing around 10% of the total shaping time. With this 
patch, that time simply vanishes from the profile.

JK

-------------- next part --------------
diff --git a/src/hb-buffer-private.hh b/src/hb-buffer-private.hh
index 9864ca2..6378458 100644
--- a/src/hb-buffer-private.hh
+++ b/src/hb-buffer-private.hh
@@ -95,6 +95,8 @@ struct hb_buffer_t {
   bool in_error; /* Allocation failed */
   bool have_output; /* Whether we have an output buffer going on */
   bool have_positions; /* Whether we have positions */
+  bool have_non_ascii; /* Whether any non-ASCII characters are present;
+                          if not, we don't need to normalize */
 
   unsigned int idx; /* Cursor into ->info and ->pos arrays */
   unsigned int len; /* Length of ->info and ->pos arrays */
diff --git a/src/hb-buffer.cc b/src/hb-buffer.cc
index db4edce..1626e6b 100644
--- a/src/hb-buffer.cc
+++ b/src/hb-buffer.cc
@@ -152,6 +152,7 @@ hb_buffer_t::reset (void)
   in_error = false;
   have_output = false;
   have_positions = false;
+  have_non_ascii = false;
 
   idx = 0;
   len = 0;
@@ -179,6 +180,8 @@ hb_buffer_t::add (hb_codepoint_t  codepoint,
   glyph->mask = mask;
   glyph->cluster = cluster;
 
+  have_non_ascii |= codepoint > 0x7f;
+
   len++;
 }
 
@@ -557,7 +560,8 @@ hb_buffer_get_empty (void)
 
     true, /* in_error */
     true, /* have_output */
-    true  /* have_positions */
+    true, /* have_positions */
+    false /* have_non_ascii */
   };
 
   return const_cast<hb_buffer_t *> (&_hb_buffer_nil);
diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index d1e1d6c..945bd98 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -500,10 +500,11 @@ hb_ot_shape_internal (hb_ot_shape_context_t *c)
 
   hb_ensure_native_direction (c->buffer);
 
-  _hb_ot_shape_normalize (c->font, c->buffer,
-			  c->plan->shaper->normalization_preference ?
-			  c->plan->shaper->normalization_preference (c->plan) :
-			  HB_OT_SHAPE_NORMALIZATION_MODE_DEFAULT);
+  if (c->buffer->have_non_ascii)
+    _hb_ot_shape_normalize (c->font, c->buffer,
+			    c->plan->shaper->normalization_preference ?
+			    c->plan->shaper->normalization_preference (c->plan) :
+			    HB_OT_SHAPE_NORMALIZATION_MODE_DEFAULT);
 
   hb_ot_shape_setup_masks (c);
 

