[Pixman] fast-scale branch performance improvements

Siarhei Siamashka siarhei.siamashka at gmail.com
Tue Mar 16 11:17:50 PDT 2010


On Tuesday 16 March 2010, Siarhei Siamashka wrote:
> On Tuesday 16 March 2010, Alexander Larsson wrote:
> > On Tue, 2010-03-16 at 16:51 +0200, Siarhei Siamashka wrote:
> > > Regarding Alex's branch and performance, I already mentioned that it was
> > > much slower for the over_8888_0565 case in my benchmark when compared
> > > against my branch on ARM Cortex-A8 (the other scaling cases are OK). I'm
> > > using the following test program for benchmarking these optimizations:
> > > http://cgit.freedesktop.org/~siamashka/pixman/commit/?h=test-n-bench&id=93ec60149cb3535f70a9e285de0b359ff444f26e
> > >
> > > The test program benchmarks scaling when the source and destination
> > > image sizes are approximately the same (so the performance can be more
> > > or less directly compared to a simple nonscaled blit).
> > >
> > > The results are (variance is only in the last digit):
> > >
> > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=5.06 MPix/s (1.21 FPS)
> > > vs.
> > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=8.72 MPix/s (2.08 FPS)
> > >
> > > which is quite a large difference (about 1.7x).
> >
> > Can you retry with my new branch:
> > http://cgit.freedesktop.org/~alexl/pixman/log/?h=alex-scaler2
>
> Now it is:
> op=3, src_fmt=20028888, dst_fmt=10020565, speed=5.16 MPix/s (1.23 FPS)
>
> A little bit better, but still not good.

Found the problem, it's here:
> + SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, r5g6b5, 8888_565),
This should have a8r8g8b8 instead of a8b8g8r8, so this fast path was never
run at all in the benchmark. Once fixed, it shows the expected performance.
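
To illustrate why a wrong source format in the table entry disables the fast
path entirely, here is a minimal sketch of a lookup keyed on (op, source
format, destination format). The struct fast_path_t, the two tables, and
lookup_fast_path are hypothetical simplifications for illustration, not
pixman's actual internals. Only the numeric codes are meant to match what the
benchmark prints (op=3 is OVER, src_fmt=20028888 is a8r8g8b8, dst_fmt=10020565
is r5g6b5).

```c
#include <stddef.h>
#include <stdint.h>

/* Format codes as printed by the benchmark (hexadecimal). */
#define FMT_a8r8g8b8 0x20028888u  /* ARGB channel order */
#define FMT_a8b8g8r8 0x20038888u  /* ABGR channel order */
#define FMT_r5g6b5   0x10020565u
#define OP_OVER      3

/* Hypothetical, simplified fast path entry (not pixman's real struct). */
typedef struct
{
    int         op;
    uint32_t    src_fmt;
    uint32_t    dst_fmt;
    const char *name;
} fast_path_t;

/* Entry as it appeared in the patch: wrong source channel order. */
static const fast_path_t broken_table[] = {
    { OP_OVER, FMT_a8b8g8r8, FMT_r5g6b5, "8888_565" },
};

/* Entry with the corrected source format. */
static const fast_path_t fixed_table[] = {
    { OP_OVER, FMT_a8r8g8b8, FMT_r5g6b5, "8888_565" },
};

/* Return the name of the first entry matching all three keys, or NULL
 * when nothing matches and the caller must use the general (slow) path. */
const char *
lookup_fast_path (const fast_path_t *table, size_t n,
                  int op, uint32_t src_fmt, uint32_t dst_fmt)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (table[i].op == op &&
            table[i].src_fmt == src_fmt &&
            table[i].dst_fmt == dst_fmt)
        {
            return table[i].name;
        }
    }
    return NULL;
}
```

With these definitions, lookup_fast_path (broken_table, 1, OP_OVER,
FMT_a8r8g8b8, FMT_r5g6b5) returns NULL, so the benchmarked operation falls
through to the general path, while the same call on fixed_table returns the
"8888_565" entry.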


Also, the 'alex-scaler2' branch is substantially slower than 'alex-scaler' for
normal repeat:

== nearest tiled SRC (alex-scaler) ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=90.91 MPix/s (21.67 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=63.82 MPix/s (15.22 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=92.16 MPix/s (21.97 FPS)

== nearest tiled SRC (alex-scaler2) ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=76.54 MPix/s (18.25 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=50.44 MPix/s (12.03 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=67.14 MPix/s (16.01 FPS)
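
To put a number on "substantially slower", the relative slowdown can be
computed from each pair of MPix/s figures above. The helper below is just an
illustrative sketch (slowdown_percent is a made-up name, not part of the test
program).

```c
/* Percentage by which `slow` falls short of `fast`.
 * Both arguments are throughputs in the same unit (e.g. MPix/s),
 * and `fast` must be non-zero. */
double
slowdown_percent (double fast, double slow)
{
    return (fast - slow) / fast * 100.0;
}
```

For the three rows above this gives roughly 16% (8888 to 8888), 21% (8888 to
0565) and 27% (0565 to 0565), so the 16bpp to 16bpp case regressed the most.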

One more anomaly is that the 16bpp case somehow managed to get slower than
32bpp for normal repeat on ARM Cortex-A8. I'm checking what's wrong here.

-- 
Best regards,
Siarhei Siamashka

