[HarfBuzz] optimization for ASCII-only text
Jonathan Kew
jfkthame at googlemail.com
Thu Aug 9 10:32:37 PDT 2012
Hi Behdad,
While complex-script shaping is obviously far more interesting, in
practice there is a lot of very simple ASCII text on the web. So what
would you think of adding a minor optimization that looks like it can
give us about 10% gain on shaping ASCII text with simple fonts? The idea
is to have hb_buffer_t::add record whether any non-ASCII characters have
been put in the buffer; if none have, hb_ot_shape_internal can skip the
normalization pass entirely.
(Of course, there are plenty of non-ASCII characters that could also be
present without normalization becoming relevant, but I didn't want to
make the check any more expensive than a simple character-code
comparison, and optimizing performance of ASCII-only runs will benefit a
lot of real-world text for minimal effort.)
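To make the idea concrete, here is a minimal standalone sketch of the
flag-and-skip pattern. It is not HarfBuzz code: SketchBuffer,
sketch_normalize and sketch_shape are hypothetical stand-ins for
hb_buffer_t, _hb_ot_shape_normalize and hb_ot_shape_internal, and the
normalize_calls counter exists only to make the skip observable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hb_buffer_t; only the fields needed here.
struct SketchBuffer {
  std::vector<uint32_t> codepoints;
  bool have_non_ascii = false;   // set once any code point > 0x7f is added
  unsigned normalize_calls = 0;  // illustration only: counts normalizer runs

  void add(uint32_t codepoint) {
    codepoints.push_back(codepoint);
    // One comparison per character; the flag is sticky until reset().
    have_non_ascii |= codepoint > 0x7f;
  }

  void reset() {
    codepoints.clear();
    have_non_ascii = false;
  }
};

// Placeholder for the real normalization pass (assumed to be costly).
void sketch_normalize(SketchBuffer &buffer) {
  buffer.normalize_calls++;
}

// Placeholder for the shaping entry point: normalize only when needed.
void sketch_shape(SketchBuffer &buffer) {
  if (buffer.have_non_ascii)
    sketch_normalize(buffer);
  // ... the rest of shaping would follow here ...
}

// Small self-check: an ASCII-only run never reaches the normalizer.
void sketch_demo() {
  SketchBuffer buffer;
  buffer.add('H');
  buffer.add('i');
  sketch_shape(buffer);
  assert(buffer.normalize_calls == 0);
}
```

The flag is deliberately conservative: a single non-ASCII character
makes the whole run take the normal path, so the shortcut cannot change
results for text that does need normalization.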
This was prompted by profile data such as
http://people.mozilla.com/~bgirard/cleopatra/?report=c2e6bea3647461c0675e59441b78c0f5c409ac0d
(see https://bugzilla.mozilla.org/show_bug.cgi?id=762710#c25), which
relates to layout of a large, almost purely ASCII document. This shows
the normalization pass - which we know is redundant for ASCII-only text
- contributing around 10% of the total shaping time. With this patch,
that time simply vanishes from the profile.
JK
-------------- next part --------------
diff --git a/src/hb-buffer-private.hh b/src/hb-buffer-private.hh
index 9864ca2..6378458 100644
--- a/src/hb-buffer-private.hh
+++ b/src/hb-buffer-private.hh
@@ -95,6 +95,8 @@ struct hb_buffer_t {
bool in_error; /* Allocation failed */
bool have_output; /* Whether we have an output buffer going on */
bool have_positions; /* Whether we have positions */
+ bool have_non_ascii; /* Whether any non-ASCII characters are present;
+ if not, we don't need to normalize */
unsigned int idx; /* Cursor into ->info and ->pos arrays */
unsigned int len; /* Length of ->info and ->pos arrays */
diff --git a/src/hb-buffer.cc b/src/hb-buffer.cc
index db4edce..1626e6b 100644
--- a/src/hb-buffer.cc
+++ b/src/hb-buffer.cc
@@ -152,6 +152,7 @@ hb_buffer_t::reset (void)
in_error = false;
have_output = false;
have_positions = false;
+ have_non_ascii = false;
idx = 0;
len = 0;
@@ -179,6 +180,8 @@ hb_buffer_t::add (hb_codepoint_t codepoint,
glyph->mask = mask;
glyph->cluster = cluster;
+ have_non_ascii |= codepoint > 0x7f;
+
len++;
}
@@ -557,7 +560,8 @@ hb_buffer_get_empty (void)
true, /* in_error */
true, /* have_output */
- true /* have_positions */
+ true, /* have_positions */
+ false /* have_non_ascii */
};
return const_cast<hb_buffer_t *> (&_hb_buffer_nil);
diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index d1e1d6c..945bd98 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -500,10 +500,11 @@ hb_ot_shape_internal (hb_ot_shape_context_t *c)
hb_ensure_native_direction (c->buffer);
- _hb_ot_shape_normalize (c->font, c->buffer,
- c->plan->shaper->normalization_preference ?
- c->plan->shaper->normalization_preference (c->plan) :
- HB_OT_SHAPE_NORMALIZATION_MODE_DEFAULT);
+ if (c->buffer->have_non_ascii)
+ _hb_ot_shape_normalize (c->font, c->buffer,
+ c->plan->shaper->normalization_preference ?
+ c->plan->shaper->normalization_preference (c->plan) :
+ HB_OT_SHAPE_NORMALIZATION_MODE_DEFAULT);
hb_ot_shape_setup_masks (c);