Unfair comparison of pixman to EXA (r100)

Carl Worth cworth at cworth.org
Thu Apr 26 11:25:18 PDT 2007


After comparing EXA to NoAccel, the next thing I wanted to see is what
happens when we compare using cairo-xlib with EXA to using cairo's
image backend, (client-side compositing with the pixman code in cairo
which was copied from the X server's fb code).

First, I should make it very clear that this comparison is totally
unfair. When rendering with cairo's image backend, cairo is just
pushing data around inside CPU memory---it's not talking to the X
server or a video card, nor ever getting anything to actually appear
on a display device.

Obviously, we expect at least some non-zero overhead for those
operations, so it's not fair to compare against a backend which
doesn't have any at all. A fair comparison would be to make a version
of cairo's xlib backend that just drew everything with cairo image
surfaces and then copied things to the X server when complete. That
would be traditional client-side rendering.

The comparison is so unfair that cairo-perf-diff-files wouldn't even
let me generate a report for it unless I lied first. I had to fudge
the pixman.perf report by changing all occurrences of "image" in the
backend column to be "xlib" instead. This file is available here:

http://cairographics.org/~cworth/cairo-exa-vs-xaa/pixman-unfair.perf
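
For anyone wanting to repeat the fudge, here is a minimal sketch of the
relabeling. It assumes each result line carries the backend as a
whitespace-delimited token such as "image-rgba" or "image-rgb"; the
column layout in the sample line and the relabel_backend name are
illustrative assumptions, not taken from the actual .perf file.

```python
import re

# Assumption: the backend column is a standalone token like "image-rgba"
# or "image-rgb". Test names that merely contain "image" (for example
# paint_image_rgb_over-256) must be left untouched.
_BACKEND = re.compile(r"(?<!\S)image(?=-rgba?(?!\S))")


def relabel_backend(perf_text: str) -> str:
    """Rename the 'image' backend to 'xlib' in the backend column only."""
    return _BACKEND.sub("xlib", perf_text)


# Hypothetical sample line; the real file layout may differ.
sample = "[  0] image-rgb  paint_image_rgb_over-256  0.10\n"
print(relabel_backend(sample), end="")
```

Anchoring the match on the whole token is the point of the sketch: a
blind search-and-replace of "image" would also rewrite the test names.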

And the comparison from that to exa-dri.perf is here (again, with
highlights quoted inline below):

http://cairographics.org/~cworth/cairo-exa-vs-xaa/exa-dri-vs-pixman-unfair.txt

For the highlights: nothing at all shows up as a speedup from pixman
to EXA. In spite of the unfairness of the test, this really surprised
me, since EXA shows up as up to 110x faster than NoAccel for some
tests, and the NoAccel and pixman speeds should be similar (aside from
the unfairness, and any optimizations in pixman that haven't made
their way into the X server's fb code yet[*]).

So I'd be glad to have someone point out something wrong in my
methodology and prove that these results are totally bogus. But for
now, this is what I'm seeing:

old: pixman-unfair
new: exa-dri
Slowdowns
=========
 xlib-rgb        long-lines-uncropped-100    4.87 -> 599.89: 123.26x slowdown
█████████████████████████████████████████████████████████████▏

OK, the above is a _huge_ performance problem, and is a test case
designed to expose a specific performance bug (which the X server has
had for a long time, and pixman had as well until recently).

So this is one where at least I know exactly what is happening. When
the server implements its support for XRenderCompositeTrapezoids it
constructs a mask large enough to fit all of the trapezoids,
(regardless of whether the destination surface is much smaller or is
clipped to something much smaller).

That means that whenever geometry is presented to the X server which
is partially on-screen, but mostly off-screen, (imagine zooming in to
see a small beach on a vector map with a polygon representing a large
lake, for example), the X server spends a lot of time doing useless
rasterization work.

So the roughly 100x slowdown above reflects the 100x ratio of
offscreen to onscreen geometry for the trapezoids in this test. This
should be a simple fix in the X server, after which all of that
performance can come back (and cairo could also be modified to clip
trapezoids before sending them to the server).

 xlib-rgba              subimage_copy-512    0.00 ->   0.08: 25.14x slowdown
████████████▏
 xlib-rgb        paint_image_rgb_over-256    0.10 ->   1.86: 19.35x slowdown
█████████▏
 xlib-rgba                 rectangles-512    3.33 ->  45.38: 13.64x slowdown
██████▍
 xlib-rgba         box-outline-stroke-100    0.01 ->   0.06:  7.35x slowdown
███▏
 xlib-rgba  paint_similar_rgba_source-256    0.11 ->   0.73:  6.85x slowdown
██▍
 xlib-rgb              unaligned_clip-100    0.05 ->   0.28:  5.42x slowdown
██▎
 xlib-rgba           box-outline-fill-100    0.01 ->   0.06:  4.96x slowdown
██

[snip dozens of other test cases in the range of 2x to 5x slowdown]

Some of those are obviously the same problems where we saw EXA being
much slower than NoAccel. But others are perhaps worse (I see many
cases where NoAccel shows slowdowns of 30-50x compared to pixman, and
I don't think the unfairness of the comparison accounts for all of
that.)

Anyway, instead of getting disappointed that EXA isn't already making
more of cairo faster than client-side rendering (on my r100 at least),
I'll choose to get excited about how much potential we have to improve
things here.

Feedback on any of the above is quite welcome,

-Carl
 
[*] Fortunately, Soeren is working hard on getting pixman and fb
merged right now.
