[cairo] Report for the pango patch and proper profiles
behdad at behdad.org
Fri Dec 1 00:59:45 PST 2006
On Wed, 2006-11-29 at 09:24 -0500, Xan Lopez wrote on cairo list:
> Anyway, thanks to the nice guys from opened-hand I received the patch
> that fixes the symbol resolving brokenness of my oprofiler, so I can
> now provide sane profiles. I'm attaching profiles for cairo, pango and
> pangocairo from a timetext run.
Can you attach a merged profile too (one for all libraries, oprofile
does that I believe)? With separate profiles, it's not as easy to see
what's going on, without guessing and all...
> An immediate comment from the pangocairo profile: is the glyph_extents
> cache working? it's taking a lot of CPU time and we are exposing the
> same ~20 characters all over again (this might be getting
> borderline off-topic for this list, now that I think about it. If I'm
> bothering someone please just tell). I'm attaching the profiles
> because I've not yet received the fd.o web space, sorry about it.
On Wed, 2006-11-29 at 10:44 -0500, Xan Lopez wrote on gtk-i18n-list:
> In a follow up of my profiling experiments I tried to figure out
> why pango_cairo_fc_font_get_glyph_extents was appearing so high in the
> charts. I'm using the timetext program, which exposes a 67
> (including whitespace) character long string with 22 different
> characters as much as possible in one minute. Once I checked that the
> cache was actually working I modified the function to just compute the
> extents of the first character received and then use that value all
> the time (yeah, weird experiments are fun). I hoped this would make
> the function drop in the profile, but it did nothing! Puzzled, I
> decided to count how many times this function was being called. As it
> turns out, for 292 expose events of a 67 character long GtkLabel we
> are calling it 56,191 times. That's almost 3 times per character per
> expose event, 4 if you ignore whitespace. Does it sound right?
Ok, I'm finally happy enough to reply to this.
First, the string is 64 chars long. And your 56,191 should really be
56,192. And that is exactly 3 pango_font_get_glyph_extents() calls per
character per expose, plus 2 initial ones:
56,192 = 292 * 64 * 3 + 64 * 2.
The initial ones come from the shaper and PangoLayout, and their results
are cached in PangoLayout.
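The count above is easy to double-check. A minimal sketch in Python, just for the arithmetic; the function and parameter names are made up for illustration and are not part of Pango or timetext:

```python
def expected_extents_calls(exposes, chars, per_draw, initial):
    """Total get_glyph_extents() calls for a label of `chars` characters.

    per_draw: calls per character on every expose
    initial:  one-off calls per character (shaper + PangoLayout, cached afterwards)
    """
    return exposes * chars * per_draw + chars * initial


# Numbers from this thread: 292 exposes, 64 characters,
# 3 calls per character per expose, 2 initial calls per character.
total = expected_extents_calls(exposes=292, chars=64, per_draw=3, initial=2)
print(total)  # 56192
```

The same function with per_draw set to 2 or 1 gives the totals one would expect for the same run if fewer calls were made per expose.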
The 3-per-draw came as a surprise to me. I used to think that the only
place we were making those calls was in pango_renderer_draw_layout_line,
and that I had nuked that in favor of using a new API call
So I expected Pango HEAD to make one fewer such call per glyph. But
that was not the case. Further investigation showed that it was
actually working correctly there, but that an innocent-looking bugfix of
mine was making another place in PangoLayout make an extra call now:
So, I fixed that today:
and that brings us down to 2 get_glyph_extents() calls per glyph per
expose.
The remaining two are apparently generated by the PangoLayoutIter that
pango-renderer.c creates to render the layout. Looking into those, one
is done in update_run() to cache run_logical_rect. Of that cached
value, we only use the width (unless user asks for the run logical_rect
of course), so I switched that one over to
pango_glyph_string_get_width() too right now:
and now there's only 1 get_glyph_extents() call per glyph per expose.
That one is a bit harder to fix, unless one caches per-line or per-item
extents. There's actually a patch for this already:
The problem is, PangoLayout has API that gives away pointer to internal
structures, such that you can modify the glyph widths, and that is
legitimate use. For example Firefox+Pango uses that to justify lines.
So I was under the impression that we cannot meaningfully cache much,
until I figured, well: we can cache, until the user asks for that evil
pointer! So, unless we have handed out a pointer to internal stuff, we
can cache. And even when we have given the pointer away, we can cache
as part of a single pango operation. So, I'm going to look more into
this, and come up with a sensible scheme that doesn't introduce
regressions. With that in place, it may make sense to revert some of my
pango_glyph_string_get_width() uses above and use the cached values;
maybe not. Donno.
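To make the "cache until the pointer is handed out" idea concrete, here is a toy model in Python. It is only a sketch of the policy, not Pango code: the class, its methods and the counter are all hypothetical, and the "cache within a single pango operation" part is not modeled.

```python
class CachedLine:
    """Toy layout line that caches its total glyph width (hypothetical, not Pango API)."""

    def __init__(self, widths):
        self._widths = list(widths)
        self._cached_width = None
        self._pointer_handed_out = False
        self.measure_count = 0  # how many times the widths were really summed

    def _measure(self):
        self.measure_count += 1
        return sum(self._widths)

    def width(self):
        # Once the internal list has escaped, the caller may have changed
        # it behind our back, so the cache can no longer be trusted.
        if self._pointer_handed_out:
            return self._measure()
        if self._cached_width is None:
            self._cached_width = self._measure()
        return self._cached_width

    def glyph_widths(self):
        """Hand out the internal list itself (the 'evil pointer')."""
        self._pointer_handed_out = True
        self._cached_width = None
        return self._widths


line = CachedLine([10, 20, 30])
for _ in range(5):
    line.width()           # measured once, then served from the cache
widths = line.glyph_widths()
widths[0] = 15             # e.g. an application justifying the line
print(line.width(), line.measure_count)  # 65 2
```

The point of the flag is that callers who never ask for the internal structures pay for one measurement only, while callers who do ask keep today's behaviour.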
Anyway, that's all for now. Some numbers:
For timetext.c :
Before: Drawn label 112412 times. Average time spent drawing (in seconds): 0.000105
After:  Drawn label 123871 times. Average time spent drawing (in seconds): 0.000063
So, in this case, the expose time was improved 40%, and overall
performance was improved 10%. I expect (much) higher numbers for the
latter on tiny gadgets, but can't tell unless I'm offered one ;-).
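For the record, the percentages follow from the two timetext lines like this. A small Python sketch, assuming the first line is the run before the fixes and the second the run after; the helper names are made up for illustration:

```python
def relative_drop(before, after):
    """Fraction by which a value went down."""
    return (before - after) / before


def relative_gain(before, after):
    """Fraction by which a value went up."""
    return (after - before) / before


# Average time spent drawing, in seconds
expose_drop = relative_drop(0.000105, 0.000063)
# Labels drawn in the one-minute timetext run
draw_gain = relative_gain(112412, 123871)

print(f"expose time: {expose_drop:.0%} lower")   # expose time: 40% lower
print(f"labels drawn: {draw_gain:.1%} higher")   # labels drawn: 10.2% higher
```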
I use a small toy of mine to write /probes/ that can do cool things
about what library calls are being made without having to
compile/install pango all the time. It's a tiny script called bprobe,
available from GNOME CVS. For example, this is the probe I used to
figure out what's going on:
However, this only works because Pango doesn't have the machinery to
avoid PLT usage for local symbols. That's another thing to look into,
may have measurable performance (and size) improvements on smaller
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759