[Pixman] [PATCH 00/12] Implement more vmx fast paths

Thu Jul 16 02:50:05 PDT 2015

On Wed, 15 Jul 2015 11:48:28 -0400
Adam Jackson <ajax at redhat.com> wrote:

> On Thu, 2015-07-02 at 13:04 +0300, Oded Gabbay wrote:
> > Hi,
> > 
> > This patch-set implements the most heavily used fast paths, according to
> > profiling done by me using the cairo traces package.
> 
> I finally got a chance to try this series on a power7, and the results
> are... mixed.  A sampling of x11perf numbers (against Xvfb, just
> switching pixman before and after):
> 
>       before          after                Operation
> ------------   -------------------------   -------------------------
>    6856255.6      5564651.7 (     0.812)   10x10 rectangle 
>     125522.9       455209.1 (     3.627)   100x100 rectangle 
>       5419.2        29705.8 (     5.482)   500x500 rectangle 
> 
> This one is telling, I think.  This should be the vmx_fill path, and it
> looks like a nice win for large ops but a hit for small ops.  Is the
> vmx setup cost that high, or is there something else going on?
> 
>    1641838.0      1684290.9 (     1.026)   Char in 80-char aa line (Charter 10) 
>     432916.1       466759.2 (     1.078)   Char in 30-char aa line (Charter 24) 
>    1412008.5      1545401.0 (     1.094)   Char in 80-char aa line (Courier 12) 
>    1440361.7      1947014.6 (     1.352)   Char in 80-char rgb line (Charter 10) 
>     384600.6       576289.5 (     1.498)   Char in 30-char rgb line (Charter 24) 
>    1258381.8      1811421.7 (     1.439)   Char in 80-char rgb line (Courier 12) 
> 
> Render text gets faster, nice.
> 
>    1202555.7      1228256.6 (     1.021)   Scroll 10x10 pixels 
>     162282.8       131857.7 (     0.813)   Scroll 100x100 pixels 
>       6819.8         6256.2 (     0.917)   Scroll 500x500 pixels 
>    1695720.5      1752339.8 (     1.033)   Copy 10x10 from pixmap to window 
>     210222.2       165836.1 (     0.789)   Copy 100x100 from pixmap to window 
>      14408.8        10600.1 (     0.736)   Copy 500x500 from pixmap to window
> 
> This should be the vmx_blit path, and it gets quite a bit worse for
> large ops.  Eesh.
> 
>    1021293.5      1060568.6 (     1.038)   PutImage 10x10 square 
>      54803.7        56420.0 (     1.029)   PutImage 100x100 square 
>       1933.5         1935.4 (     1.001)   PutImage 500x500 square 
>    1418641.0      1432543.1 (     1.010)   ShmPutImage 10x10 square 
>     194769.2       160047.5 (     0.822)   ShmPutImage 100x100 square 
>      11951.2        10968.1 (     0.918)   ShmPutImage 500x500 square 
> 
> Again, blit path, and usually worse for large ops.
> 
>     576975.4       573388.4 (     0.994)   Composite 10x10 from pixmap to window 
>     156830.4       131246.8 (     0.837)   Composite 100x100 from pixmap to window 
>      12172.5        10150.2 (     0.834)   Composite 500x500 from pixmap to window 
> 
> Not-quite-a-blit path, but no transformation, and the same kind of
> performance hit.
> 
>     176570.2       176330.2 (     0.999)   Scale 5x5 from pixmap to 10x10 window 
>       4598.0         4460.9 (     0.970)   Scale 50x50 from pixmap to 100x100 window 
>        189.9          185.9 (     0.979)   Scale 250x250 from pixmap to 500x500 window 
>     269540.6       269767.4 (     1.001)   Scale 10x10 from pixmap to 5x5 window 
>     267201.2       268220.5 (     1.004)   Scale 100x100 from pixmap to 5x5 window 
>        766.8          740.1 (     0.965)   Scale 500x500 from pixmap to 250x250 window
> 
> All within the noise margin, so I suspect the series just doesn't hit
> these paths.  (Ignore the implausible numbers from "Scale 100x100",
> that's an x11perf bug I just pushed a fix for.)
> 
> I'm a little hesitant to take a 10% to 20% hit to software blit
> performance.  It might be that vmx_blt is just a mistake to try, that
> the CPU and compiler are smarter than we are.
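
For reference, the parenthesised figure in the quoted tables appears to be
simply after/before, so values below 1.0 are regressions. The following is a
minimal sketch of that arithmetic, not part of x11perf or pixman. The helper
name `ratio` and the `results` dictionary are made up for illustration; the
numbers are copied from four rows of the tables above.

```python
# (before, after) operations per second, copied from the quoted x11perf tables.
results = {
    "10x10 rectangle": (6856255.6, 5564651.7),
    "100x100 rectangle": (125522.9, 455209.1),
    "500x500 rectangle": (5419.2, 29705.8),
    "Copy 500x500 from pixmap to window": (14408.8, 10600.1),
}


def ratio(before, after):
    """Hypothetical helper: relative throughput, after divided by before."""
    return after / before


for name, (before, after) in results.items():
    r = ratio(before, after)
    # Below 1.0 the patched pixman is slower for this operation.
    verdict = "regression" if r < 1.0 else "improvement"
    print(f"{name}: {r:.3f} ({verdict})")
```

Read this way, the 10x10 fill and the 500x500 copy both land below 1.0, which
matches the "hit for small ops" and "worse for large ops" remarks in the
quoted text.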

By the way, the 'R' and 'RT' test results from lowlevel-blt-bench
also represent the handling of small images. As mentioned in
the boilerplate text of the lowlevel-blt-bench report:

    R   - random rectangles with 32x32 average size are copied from
          random locations of one 1920x1080 buffer to another
    RT  - as R, but 8x8 average sized rectangles are copied

So regressions of this type should not normally go unnoticed, even
when using only pixman's own benchmarking tools. But thanks for posting
the x11perf numbers. They are also interesting.
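
To give a feel for how small the R and RT workloads are, here is a rough
sketch of rectangle sampling along the lines of the report text quoted above.
It is not the actual lowlevel-blt-bench code. The uniform size distribution
(1 to 2*avg - 1, which has the stated average) and the names `random_rect` and
`mean_area` are assumptions made for illustration; only the 1920x1080 buffer
and the 32x32 and 8x8 average sizes come from the report.

```python
import random

BUF_W, BUF_H = 1920, 1080  # buffer size quoted in the report text


def random_rect(avg_size, rng):
    """Return (x, y, w, h) that fits inside the buffer.

    Assumption: w and h are uniform on 1 .. 2*avg_size - 1, so their
    mean is avg_size. The real benchmark may use another distribution.
    """
    w = rng.randint(1, 2 * avg_size - 1)
    h = rng.randint(1, 2 * avg_size - 1)
    x = rng.randint(0, BUF_W - w)
    y = rng.randint(0, BUF_H - h)
    return x, y, w, h


def mean_area(avg_size, count=10000, seed=0):
    """Average pixel count per rectangle over `count` samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(count):
        _, _, w, h = random_rect(avg_size, rng)
        total += w * h
    return total / count


if __name__ == "__main__":
    print("R  mean area:", mean_area(32))
    print("RT mean area:", mean_area(8))
```

Under these assumptions an R operation touches on the order of a thousand
pixels and an RT operation well under a hundred, so both sit in the same
small-operation range as the 10x10 x11perf cases.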

-- 
Best regards,
Siarhei Siamashka