[Pixman] [PATCH 00/12] Implement more vmx fast paths
Adam Jackson
ajax at redhat.com
Wed Jul 15 08:48:28 PDT 2015
On Thu, 2015-07-02 at 13:04 +0300, Oded Gabbay wrote:
> Hi,
>
> This patch-set implements the most heavily used fast paths, according to
> profiling done by me using the cairo traces package.
I finally got a chance to try this series on a power7, and the results
are... mixed. A sampling of x11perf numbers (against Xvfb, just
switching pixman before and after):
before after Operation
------------ ------------------------- -------------------------
6856255.6 5564651.7 ( 0.812) 10x10 rectangle
125522.9 455209.1 ( 3.627) 100x100 rectangle
5419.2 29705.8 ( 5.482) 500x500 rectangle
This one is telling, I think. This should be the vmx_fill path, and it
looks like a nice win for large ops but a hit for small ops. Is the
vmx setup cost that high, or is there something else going on?
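If per-call setup cost is what hurts the 10x10 case, one common mitigation is to send narrow fills to a plain scalar loop and keep the wide path for larger rows. The following is a minimal, portable sketch of that idea only. It is not code from the patch series: `fill_rect`, `fill_row_wide`, and the `WIDE_FILL_MIN_WIDTH` cutoff are hypothetical names, the "wide" path is a 16-byte `memcpy` stand-in rather than real VMX intrinsics, and the cutoff value is an assumption that would have to come from profiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, in pixels per row. Not a measured value. */
#define WIDE_FILL_MIN_WIDTH 16

/* Plain scalar fill of one row of 32-bit pixels. */
static void fill_row_scalar(uint32_t *dst, int width, uint32_t pixel)
{
    for (int i = 0; i < width; i++)
        dst[i] = pixel;
}

/* Stand-in for a vector path: stores four pixels (16 bytes) per step,
 * then finishes the remaining 0..3 pixels with scalar stores. */
static void fill_row_wide(uint32_t *dst, int width, uint32_t pixel)
{
    const uint32_t block[4] = { pixel, pixel, pixel, pixel };
    int i = 0;

    for (; i + 4 <= width; i += 4)
        memcpy(dst + i, block, sizeof block);
    for (; i < width; i++)
        dst[i] = pixel;
}

/* Fill a width x height rectangle at (x, y). stride is in pixels.
 * Rows narrower than the cutoff skip the wide path entirely. */
static void fill_rect(uint32_t *bits, int stride, int x, int y,
                      int width, int height, uint32_t pixel)
{
    assert(x >= 0 && y >= 0 && width >= 0 && height >= 0);
    assert(x + width <= stride);

    for (int row = 0; row < height; row++) {
        uint32_t *dst = bits + (size_t)(y + row) * stride + x;

        if (width < WIDE_FILL_MIN_WIDTH)
            fill_row_scalar(dst, width, pixel);
        else
            fill_row_wide(dst, width, pixel);
    }
}
```

With a cutoff like this, the 10x10 case would take the scalar loop and the 100x100 and 500x500 cases would take the wide one. Whether that recovers the small-rectangle loss on power7 would still need to be measured.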
1641838.0 1684290.9 ( 1.026) Char in 80-char aa line (Charter 10)
432916.1 466759.2 ( 1.078) Char in 30-char aa line (Charter 24)
1412008.5 1545401.0 ( 1.094) Char in 80-char aa line (Courier 12)
1440361.7 1947014.6 ( 1.352) Char in 80-char rgb line (Charter 10)
384600.6 576289.5 ( 1.498) Char in 30-char rgb line (Charter 24)
1258381.8 1811421.7 ( 1.439) Char in 80-char rgb line (Courier 12)
Render text gets faster, nice.
1202555.7 1228256.6 ( 1.021) Scroll 10x10 pixels
162282.8 131857.7 ( 0.813) Scroll 100x100 pixels
6819.8 6256.2 ( 0.917) Scroll 500x500 pixels
1695720.5 1752339.8 ( 1.033) Copy 10x10 from pixmap to window
210222.2 165836.1 ( 0.789) Copy 100x100 from pixmap to window
14408.8 10600.1 ( 0.736) Copy 500x500 from pixmap to window
This should be the vmx_blt path, and it gets quite a bit worse for
large ops. Eesh.
1021293.5 1060568.6 ( 1.038) PutImage 10x10 square
54803.7 56420.0 ( 1.029) PutImage 100x100 square
1933.5 1935.4 ( 1.001) PutImage 500x500 square
1418641.0 1432543.1 ( 1.010) ShmPutImage 10x10 square
194769.2 160047.5 ( 0.822) ShmPutImage 100x100 square
11951.2 10968.1 ( 0.918) ShmPutImage 500x500 square
Again, blit path, and usually worse for large ops.
576975.4 573388.4 ( 0.994) Composite 10x10 from pixmap to window
156830.4 131246.8 ( 0.837) Composite 100x100 from pixmap to window
12172.5 10150.2 ( 0.834) Composite 500x500 from pixmap to window
Not-quite-a-blit path, but no transformation, and the same kind of
performance hit.
176570.2 176330.2 ( 0.999) Scale 5x5 from pixmap to 10x10 window
4598.0 4460.9 ( 0.970) Scale 50x50 from pixmap to 100x100 window
189.9 185.9 ( 0.979) Scale 250x250 from pixmap to 500x500 window
269540.6 269767.4 ( 1.001) Scale 10x10 from pixmap to 5x5 window
267201.2 268220.5 ( 1.004) Scale 100x100 from pixmap to 5x5 window
766.8 740.1 ( 0.965) Scale 500x500 from pixmap to 250x250 window
All within the noise margin, so I suspect the series just doesn't hit
these paths. (Ignore the implausible numbers from "Scale 100x100",
that's an x11perf bug I just pushed a fix for.)
I'm a little hesitant to take a 10% to 20% hit to software blit
performance. It might be that vmx_blt is just a mistake to try, that
the CPU and compiler are smarter than we are.
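One way to test the "compiler is smarter" theory would be to time the vector blit against a baseline that just hands each row to `memcpy`. Below is a minimal sketch of such a baseline. It is not pixman code: `blit_rows_memcpy` is a hypothetical name, and the sketch assumes 32-bit pixels, strides given in pixels, and non-overlapping source and destination (so it would not cover the Scroll case as written).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height block of 32-bit pixels from (src_x, src_y) in src
 * to (dst_x, dst_y) in dst, one row at a time. Strides are in pixels.
 * The two regions must not overlap, because memcpy is used per row. */
static void blit_rows_memcpy(const uint32_t *src, int src_stride,
                             uint32_t *dst, int dst_stride,
                             int src_x, int src_y, int dst_x, int dst_y,
                             int width, int height)
{
    assert(width >= 0 && height >= 0);
    assert(src_x + width <= src_stride);
    assert(dst_x + width <= dst_stride);

    for (int row = 0; row < height; row++) {
        const uint32_t *s = src + (size_t)(src_y + row) * src_stride + src_x;
        uint32_t *d = dst + (size_t)(dst_y + row) * dst_stride + dst_x;

        memcpy(d, s, (size_t)width * sizeof *s);
    }
}
```

If a loop like this matches or beats the hand-written vector blit at 100x100 and 500x500, that would support dropping the blit fast path while keeping the fill and text paths.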
- ajax