[cairo] Oprofiling Cairo on ARM

Tue Nov 28 11:46:12 PST 2006

Hey there,

On 11/28/06, Daniel Amelang <daniel.amelang at gmail.com> wrote:
> On 11/28/06, Carl Worth <cworth at cworth.org> wrote:
> > On Mon, 27 Nov 2006 18:32:23 +0200, "Xan Lopez" wrote:
> > > I finally managed to get oprofile running on my ARM environment.
> >
> > Thanks for doing this!
> > > demangle it manually to get coherent results. Anyway, I'm attaching the top
> > > offenders in an oprofiled gtk-theme-torturer run with Cairo 1.3.4.
> >
> > >   00007ee8 224      83.8951  libcairo.so.2.10.0       cairo_rectangle
> > > 00052ed8 4662      6.0572  libcairo.so.2.10.0       __adddf3
> >
> > >   00007ee8 480      82.0513  libcairo.so.2.10.0       cairo_rectangle
> > > 00053294 3782      4.9139  libcairo.so.2.10.0       __muldf3
> >
> > So the above shows that 10% of the time is spent doing floating-point
> > adds and multiplies on behalf of cairo_rectangle, right?
> >
> > So this should benefit from a general short-circuiting of identity
> > matrix transformations in _cairo_matrix_transform_point and
> > _cairo_matrix_transform_distance.
> >
> > I know Daniel has been experimenting with patches for this. And I
> > think he was just waiting to see a test case for which it was a
> > bottleneck. Is that right Daniel? Maybe we want to add a test that
> > just does a bunch of cairo_rectangle;cairo_fill with integer
> > coordinates and an identity matrix?
>
> You read my mind, er, hard drive's contents. So the good news is that
> if you combine the short-circuiting with a no-FP fixed_from_double
> (which is called twice for each cairo_rectangle), you can really speed
> up this exact perf test case. Patches in the works.
>
> Sorry for the delay, I've been busy getting my new Nokia 770 (thanks
> to Xan!) set up so I can submit perf diffs for the 770, instead of just
> crossing my fingers that I did the right thing.
>

Can't wait to test them :)
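
For anyone following along, here is a rough sketch of the two ideas
discussed above: skipping the multiply/add work when the matrix is the
identity, and converting a double to fixed point using only integer
operations. This is not Daniel's patch and not cairo's actual code. The
names (matrix_t, matrix_init, matrix_transform_point,
fixed_from_double_nofp) are hypothetical, and it assumes a 16.16
fixed-point format and a standard IEEE 754 double whose byte order
matches that of 64-bit integers (which may not hold on ARM targets using
the old FPA word order). It also truncates toward zero instead of
rounding.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t fixed_16_16;      /* hypothetical 16.16 fixed-point type */

typedef struct {
    double xx, yx, xy, yy, x0, y0;
    int is_identity;              /* cached so the hot path avoids FP compares */
} matrix_t;

/* Set the matrix and record once whether it is the identity. */
static void matrix_init(matrix_t *m, double xx, double yx,
                        double xy, double yy, double x0, double y0)
{
    m->xx = xx; m->yx = yx; m->xy = xy; m->yy = yy; m->x0 = x0; m->y0 = y0;
    m->is_identity = (xx == 1.0 && yx == 0.0 && xy == 0.0 &&
                      yy == 1.0 && x0 == 0.0 && y0 == 0.0);
}

/* Transform a point in place; the identity case does no FP math at all. */
static void matrix_transform_point(const matrix_t *m, double *x, double *y)
{
    double nx, ny;

    if (m->is_identity)
        return;

    nx = m->xx * *x + m->xy * *y + m->x0;
    ny = m->yx * *x + m->yy * *y + m->y0;
    *x = nx;
    *y = ny;
}

/* Convert a double to 16.16 fixed point with integer operations only.
 * Out-of-range values, infinities and NaNs saturate; denormals become 0. */
static fixed_16_16 fixed_from_double_nofp(double d)
{
    uint64_t bits, mant;
    int negative, exponent, rshift;
    uint32_t mag;

    memcpy(&bits, &d, sizeof bits);          /* reinterpret without FP ops */
    negative = (int)(bits >> 63);
    exponent = (int)((bits >> 52) & 0x7ff);  /* biased exponent */
    mant = (bits & ((1ULL << 52) - 1)) | (1ULL << 52);

    if (exponent == 0)
        return 0;                            /* zero or denormal */

    /* d * 2^16 = mant * 2^(exponent - 1023 - 52 + 16) */
    rshift = 1059 - exponent;
    if (rshift >= 53)
        return 0;                            /* magnitude below 2^-16 */
    if (rshift < 22)
        return negative ? INT32_MIN : INT32_MAX;  /* does not fit in 32 bits */

    mag = (uint32_t)(mant >> rshift);
    return negative ? -(int32_t)mag : (int32_t)mag;
}
```

On a soft-float target, each == in matrix_init is itself a library call,
which is why the sketch tests the matrix once and caches the answer
instead of comparing on every transform.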

(...)

> > It would be interesting
> > to know how much time is spent in cairo compared to the rest of the
> > GTK+ stack, for example.
>
> I have done some looking into the question of how much time is spent
> where in the stack. If my initial profiles are correct, we are rapidly
> becoming an insignificant (<10%) part of the whole picture, which
> would explain why certain large improvements in cairo don't show up
> much in the torturer. Overall, for us though, this is a good thing, as
> we are quickly moving out of the way of the blame :) So, after my next
> batch of patches, I think we'll be at the point that X, pangocairo,
> pango, and gtk (in that order) will need attention if any significant
> improvement in the torturer is to be seen.
>

According to my profiles that is absolutely correct. Fixing the X
server would give us the biggest wins by far at this point. I sent a
full profile to the list, but it got stuck in the review queue because
it was 160K. Tomorrow I'll cut it down to around 100K and send it
again. And yes, I'm the lamest person in the world for lacking some
kind of web space to post that kind of stuff. Anyway, the top
offenders right now are X, pangocairo and pango, with a special
mention for a surprising g_type_check_is_a () (maybe someone should
embark on a holy mission to remove every type check that sits
outside of a public entry point in the API, if there are any around).

> Dan
>

Cheers, Xan