[cairo] Oprofiling Cairo on ARM

Tue Nov 28 11:23:57 PST 2006

On 11/28/06, Carl Worth <cworth at cworth.org> wrote:
> On Mon, 27 Nov 2006 18:32:23 +0200, "Xan Lopez" wrote:
> > I finally managed to get oprofile running on my ARM environment.
>
> Thanks for doing this!
> > demangle it manually to get coherent results. Anyway, I'm attaching the top
> > offenders in an oprofiled gtk-theme-torturer run with Cairo 1.3.4.
>
> >   00007ee8 224      83.8951  libcairo.so.2.10.0       cairo_rectangle
> > 00052ed8 4662      6.0572  libcairo.so.2.10.0       __adddf3
>
> >   00007ee8 480      82.0513  libcairo.so.2.10.0       cairo_rectangle
> > 00053294 3782      4.9139  libcairo.so.2.10.0       __muldf3
>
> So the above shows that 10% of the time is spent doing floating-point
> adds and multiplies on behalf of cairo_rectangle, right?
>
> So this should benefit from a general short-circuiting of identity
> matrix transformations in _cairo_matrix_transform_point and
> _cairo_matrix_transform_distance.
>
> I know Daniel has been experimenting with patches for this. And I
> think he was just waiting to see a test case for which it was a
> bottleneck. Is that right Daniel? Maybe we want to add a test that
> just does a bunch of cairo_rectangle;cairo_fill with integer
> coordinates and an identity matrix?

You read my mind, er, hard drive's contents. So the good news is that
if you combine the short-circuiting with a no-FP fixed_from_double
(which is called twice for each cairo_rectangle), you can really speed
up this exact perf test case. Patches in the works.
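
To make the short-circuiting idea concrete, here is a minimal sketch
in C. It is not the patch itself: sketch_ctm_t, sketch_ctm_set and
sketch_transform_point are hypothetical names, and the cached
is_identity flag is an assumption about how one might avoid comparing
six doubles per point on a soft-float target. The no-FP
fixed_from_double half is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cairo_matrix_t plus a cached flag, so the
 * hot path tests one integer instead of comparing six doubles. */
typedef struct {
    double xx, yx;
    double xy, yy;
    double x0, y0;
    bool is_identity;   /* refreshed whenever the coefficients change */
} sketch_ctm_t;

static void
sketch_ctm_set (sketch_ctm_t *m,
                double xx, double yx,
                double xy, double yy,
                double x0, double y0)
{
    assert (m != NULL);
    m->xx = xx; m->yx = yx;
    m->xy = xy; m->yy = yy;
    m->x0 = x0; m->y0 = y0;
    /* The FP comparisons happen once per matrix change, not per point. */
    m->is_identity = (xx == 1.0 && yx == 0.0 &&
                      xy == 0.0 && yy == 1.0 &&
                      x0 == 0.0 && y0 == 0.0);
}

static void
sketch_transform_point (const sketch_ctm_t *m, double *x, double *y)
{
    double new_x, new_y;

    assert (m != NULL && x != NULL && y != NULL);

    if (m->is_identity)
        return;             /* short-circuit: no multiplies or adds */

    new_x = m->xx * *x + m->xy * *y + m->x0;
    new_y = m->yx * *x + m->yy * *y + m->y0;
    *x = new_x;
    *y = new_y;
}
```

With an identity matrix the point is returned untouched, so none of
the __adddf3/__muldf3 calls from the profile would be made on that
path. A distance transform could take the same early exit, since it
only differs in ignoring the x0/y0 translation.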

Sorry for the delay; I've been busy getting my new Nokia 770 (thanks
to Xan!) set up so I can submit perf diffs for the 770 instead of just
crossing my fingers that I did the right thing.

> As for what remains in the profile:
>
> > 0000add0 3241      4.2110  libcairo.so.2.10.0       _cairo_bentley_ottmann_tessellate_polygon
> > 00039bb4 1824      2.3699  libcairo.so.2.10.0       _cairo_xlib_surface_show_glyphs
> > 000061b4 1735      2.2542  libcairo.so.2.10.0       .plt
> > 0000f0bc 1612      2.0944  libcairo.so.2.10.0       _cairo_hash_table_lookup_internal
> > 000534ec 1434      1.8632  libcairo.so.2.10.0       __aeabi_ddiv
> > 000172a0 1133      1.4721  libcairo.so.2.10.0       _cairo_scaled_glyph_lookup
> > 00012c90 968       1.2577  libcairo.so.2.10.0       _cairo_path_fixed_interpret
>
> These are all rather small percentages individually, so I don't see
> any easy fixes that are going to make a big difference in this
> profile.

Actually, I do have somewhat easy fixes for
_cairo_xlib_surface_show_glyphs and _cairo_hash_table_lookup_internal.
My no-FP version of _cairo_lround (which I'm thoroughly testing at the
moment) seems to reduce the show_glyphs cost nicely, and I think I've
traced the hash table lookup slowdown to the __umodsi3 (mod) call that
shows up close to the top of 770 profiles. As far as I can tell, the
770 is really slow at integer division (which has to be performed to
get the result of the mod), and that is what makes the lookup slow.
And that lookup gets called a _lot_, especially for the text-bound
perf cases.

An easy fix is to make the hash table sizes powers of 2 so you can
just use a simple integer mask to perform the mod. Obviously, this
could put us at higher risk of collisions, but if our hash functions
are good enough, it shouldn't make a difference. This is all in
theory; I haven't coded anything up yet. When I have time, I mean to
make the change to cairo-hash and then count the collisions using
various text tests (cairo-perf, gedit, some others I have lying
around) to see if we get more collisions or not. Of course, to be
safe, we could limit the use of power-of-two hash table sizes to when
AVOID_FLOATING_POINT is defined.
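
The mask trick can be sketched as follows. This is only an
illustration, not code from cairo-hash: sketch_bucket_mod,
sketch_bucket_mask and sketch_is_power_of_two are hypothetical names,
and the 32-bit hash width is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* True when n has exactly one bit set. */
static int
sketch_is_power_of_two (uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Generic bucket selection. With an arbitrary table size this needs
 * an integer division, which is where a libgcc helper such as
 * __umodsi3 comes in on cores without a divide instruction. */
static uint32_t
sketch_bucket_mod (uint32_t hash, uint32_t table_size)
{
    return hash % table_size;
}

/* Bucket selection for power-of-two sizes: one AND, no division.
 * Only valid when table_size is a power of two. */
static uint32_t
sketch_bucket_mask (uint32_t hash, uint32_t table_size)
{
    assert (sketch_is_power_of_two (table_size));
    return hash & (table_size - 1);
}
```

For a power-of-two size both functions pick the same bucket, for
example 77 % 16 and 77 & 15 are both 13. The catch is that the mask
keeps only the low bits of the hash, so how evenly entries spread
depends entirely on how well those low bits are mixed, which is the
collision question raised above.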

> It would be interesting
> to know how much time is spent in cairo compared to the rest of the
> GTK+ stack, for example.

I have done some looking into the question of how much time is spent
where in the stack. If my initial profiles are correct, we are rapidly
becoming an insignificant (<10%) part of the whole picture, which
would explain why certain large improvements in cairo don't show up
much in the torturer. Overall, though, this is a good thing for us, as
we are quickly moving out of the line of blame :) So, after my next
batch of patches, I think we'll be at the point where X, pangocairo,
pango, and gtk (in that order) will need attention if any significant
improvement in the torturer is to be seen.

Dan