Baselining EXA quality (r100)

Carl Worth cworth at cworth.org
Wed May 16 11:03:38 PDT 2007


In concert with the effort I recently started to baseline EXA's
performance, I also want to baseline its quality. Again, I did this
with the hardware I had readily available, (still an r100---haven't
gotten a fancy new Intel GM965 yet).

The three things I decided to use for testing are the X test suite,
the rendercheck program, and cairo's test suite. The results I got for
each are detailed below.

To summarize the results:

X test suite: EXA fails fewer tests than XAA (82 compared to 96), but
	      I don't know how to interpret the details of the failures.

Rendercheck: XAA passes all tests I ran while EXA fails two,
             (transformed source and mask).

	     XAA fails other tests which I did not run to completion,
	     (and which I haven't run against EXA at all).

Cairo test suite: From a first look, it appears this suite found 1 bug
                  in XAA and 2 or 3 bugs in EXA. This suite provides
                  images showing the failures:

	http://people.freedesktop.org/~cworth/cairo-exa-vs-xaa/quality/

Hopefully that's helpful, and hopefully the details below provide
enough information for anybody who wants to replicate this kind of
testing with other driver+hardware combinations.

-Carl

X test suite
============
Instructions for obtaining, building and running the suite can be
found here:

	http://xorg.freedesktop.org/wiki/BuildingXtest

I followed those instructions and ran the test suite against an XAA X
server, and then an EXA X server, (adding only the AccelMethod and
AccelDFS options to the configuration file, as in the snippet below).
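
For reference, the relevant xorg.conf bits look like this, (the rest
of the Device section is driver-specific and elided here):

	Section "Device"
		...
		Option "AccelMethod" "EXA"
		Option "AccelDFS"    "true"
	EndSection
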
When comparing the results of vswrpt from each run, the following
lines are different:

        CASES TESTS  PASS UNSUP UNTST NOTIU  WARN   FIP  FAIL UNRES  UNIN ABORT
XAA:
Xlib4      29   324   280    11    27     5     0     0     1     0     0     0
Xlib8      29   165   133    10    22     0     0     0     0     0     0     0
Xlib9      46  1472  1174    23    36   201     8     0    30     0     0     0
TOTAL     996  5552  4156    96   789   268    10     0    96   137     0     0

EXA:
Xlib4      29   324   275    11    27     5     0     0     6     0     0     0
Xlib8      29   165   132    10    22     0     0     0     0     1     0     0
Xlib9      46  1472  1192    23    36   201     9     0    11     0     0     0
TOTAL     996  5552  4168    96   789   268    11     0    82   138     0     0

Finding the differences in the above chart can be challenging, (wdiff
helps, but then the columns get messed up; the side-by-side diff
sketched after the summary keeps them intact). Here's a summary of
what the above shows when changing from XAA to EXA:

	Xlib4:  5 PASS become FAIL
	Xlib8:  1 PASS becomes UNRES
	Xlib9: 19 FAIL become PASS
	Xlib9:  1 PASS becomes WARN
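
To find those differences yourself, assuming the vswrpt report from
each run is saved to a file, (xaa-vswrpt.txt and exa-vswrpt.txt are
hypothetical names), this shows only the changed lines, with the
columns left intact:

	diff --side-by-side --suppress-common-lines \
		xaa-vswrpt.txt exa-vswrpt.txt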

I haven't yet looked into chasing down the specific test cases that
have behavioral changes. Does anyone have more information about how
to go about that?

Rendercheck
===========
The rendercheck utility can be obtained via git as follows:

	git clone git://anongit.freedesktop.org/git/xorg/app/rendercheck
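
Building it follows the usual pattern for xorg apps, (this assumes
autotools are installed; I'm going from memory on the exact steps):

	cd rendercheck
	./autogen.sh && make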

I ran into some gotchas when naively running the rendercheck program
that results from compiling:

1. It takes forever to complete.

   I computed that on my laptop the composite and cacomposite tests
   would each take over 17 hours to complete. And since I wanted to
   run this against multiple X servers I decided I just didn't have
   the patience, so I dropped those tests.

2. It generates enormous amounts of data.

   Once some tests start spewing errors, they spew a *lot*. The
   gradients test had spewed many hundreds of megabytes of errors in a
   rather short time before I interrupted it and dropped it from my
   runs.

3. It doesn't save any data, nor warn the user to save it.

   This is especially problematic in light of the above two
   problems. After waiting forever for a test to complete, a user can
   be in the sad situation of realizing that the output spewed to the
   terminal, and now lost, was the only information generated by
   rendercheck, (aside from a final count of tests passed and tests
   run).

So it would be nice to see some fixes made to this tool to make it
more usable.

As is, here's the command-line I ended up using:

	./rendercheck -t fill,dcoords,scoords,mcoords,tscoords,tmcoords,blend,repeat,triangles,bug7366 > precious-error-log.rendercheck

The explicit list of tests passed to the -t option differs from the
default by not including composite, cacomposite, and gradients (as
described above). This is somewhat unfortunate, as each of these tests
was definitely spewing some actual errors with an XAA server before I
got bored and killed them. Maybe someone with more patience (and hard
drive space) than I have can go back and run these tests to completion
against XAA and EXA, (or fix the tests to be more efficient first).
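
One more note for anyone rerunning this: piping the output through
tee, rather than using a plain redirect, sidesteps gotcha 3 somewhat,
saving everything while still showing progress on the terminal, (the
test list is abbreviated here, and 2>&1 also catches anything sent to
stderr):

	./rendercheck -t fill,dcoords 2>&1 | tee rendercheck.log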

As for results, here is the final line of output from runs against
both XAA and EXA:

	XAA: 3571749 tests passed of 3571749 total
	EXA: 3571747 tests passed of 3571749 total

That is, rendercheck is only counting two small strikes against
EXA. Looking at the log file, the two failures are in

	transformed src coords test 2
and	transformed mask coords test 2

More details can be seen in the log files here:

http://people.freedesktop.org/~cworth/cairo-exa-vs-xaa/quality/xaa.rendercheck
http://people.freedesktop.org/~cworth/cairo-exa-vs-xaa/quality/exa.rendercheck
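
To locate failures in a log of your own, a case-insensitive grep
should be enough, (though the exact wording of the messages may vary
between rendercheck versions):

	grep -in fail precious-error-log.rendercheck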

Cairo test suite
================
The results of running the cairo test suite are easiest to see by
just looking at the resulting images:

	http://people.freedesktop.org/~cworth/cairo-exa-vs-xaa/quality/
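
For anyone replicating this part, the suite can be aimed at a
particular backend with cairo's CAIRO_TEST_TARGET environment
variable, so a run against a given X server looks roughly like this,
(the DISPLAY value is hypothetical):

	cd cairo/test
	CAIRO_TEST_TARGET=xlib DISPLAY=:1 make check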

The NoAccel case has 0 failures, (we've basically been using that as a
baseline for cairo releases, so that's not too surprising).

With XAA, 3 different tests are flagged as failures, but two of them,
(radial-gradient and random-intersections), look fine to me by visual
inspection. The 3rd failure, (unantialiased-shapes with destination
alpha), is a definite bug.

With EXA, 10 different tests are flagged as failures. It looks to me
like there are perhaps only 2 or 3 bugs indicated by the failures:

1. On several of the tests, cairo draws a checkered
   background. Whenever this is drawn with a 25x25 offset, the
   resulting pattern is positioned incorrectly, (this may be the same
   bug that rendercheck found). This appears to account for 6 of the
   10 failures. A minimal sketch of this kind of drawing appears at
   the end of this mail.

2. The pixman-rotate test demonstrates an old bug, (long since fixed
   in pixman and apparently in the X server software), in which
   rotated sample points were taken with the wrong sub-pixel
   offset. So perhaps a similar bug exists in EXA+ati.

3. The rotate-image-surface-paint test result is horribly wrong. It
   doesn't look like any familiar bug to me.

The remaining two failures, (clip-operator and source-clip-scale),
look reasonable by visual inspection, but perhaps source-clip-scale
deserves a closer look, (the difference image has a couple of pixels
that stand out in odd ways that might indicate bugs).
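
For anyone who wants to poke at the offset-pattern failure in
isolation, here's a minimal sketch of the kind of drawing involved,
(untested, and my own reconstruction rather than code from the test
suite: the helper name and tile size are made up; the repeating
pattern with a 25x25 offset is the operative part):

	#include <cairo.h>

	/* Paint a repeating checker pattern offset by (25, 25), the
	 * operation that appears mis-positioned under EXA. Assumes cr
	 * targets an Xlib surface. */
	static void
	paint_offset_checker (cairo_t *cr)
	{
	    cairo_surface_t *tile;
	    cairo_t *cr2;
	    cairo_pattern_t *pattern;
	    cairo_matrix_t matrix;

	    /* Build a single 32x32 tile holding 4 checker cells. */
	    tile = cairo_surface_create_similar (cairo_get_target (cr),
	                                         CAIRO_CONTENT_COLOR,
	                                         32, 32);
	    cr2 = cairo_create (tile);
	    cairo_set_source_rgb (cr2, 0.75, 0.75, 0.75);
	    cairo_paint (cr2);
	    cairo_set_source_rgb (cr2, 0.25, 0.25, 0.25);
	    cairo_rectangle (cr2, 0, 0, 16, 16);
	    cairo_rectangle (cr2, 16, 16, 16, 16);
	    cairo_fill (cr2);
	    cairo_destroy (cr2);

	    /* Repeat the tile shifted by (25, 25). The pattern matrix
	     * maps user space to pattern space, so translating it by
	     * (-25, -25) moves the pattern +25 in each direction. */
	    pattern = cairo_pattern_create_for_surface (tile);
	    cairo_pattern_set_extend (pattern, CAIRO_EXTEND_REPEAT);
	    cairo_matrix_init_translate (&matrix, -25, -25);
	    cairo_pattern_set_matrix (pattern, &matrix);

	    cairo_set_source (cr, pattern);
	    cairo_paint (cr);

	    cairo_pattern_destroy (pattern);
	    cairo_surface_destroy (tile);
	}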