Baselining EXA quality (r100)

Eric Anholt eric at anholt.net
Thu May 17 11:58:54 PDT 2007


On Wed, 2007-05-16 at 11:03 -0700, Carl Worth wrote: 
> In concert with the effort I recently started to baseline EXA's
> performance, I also want to baseline its quality. Again, I did this
> with the hardware I had readily available, (still an r100---haven't
> gotten a fancy new Intel GM965 yet).
> 
> The three things I decided to use for testing are the X test suite,
> the rendercheck program, and cairo's test suite. The results I got for
> each are detailed below.
> 
> To summarize the results:
> 
> X test suite: EXA fails fewer tests than XAA (82 compared to 96), but
> 	      I don't know how to interpret the details of the failures.
> 
> Rendercheck: XAA passes all tests I ran while EXA fails two,
>              (transformed source and mask).
> 
> 	     XAA fails other tests which I did not run to completion,
> 	     (and which I haven't run against EXA at all).
> 
> Cairo test suite: From a first look, it appears this suite found 1 bug
>                   in XAA and 2 or 3 bugs in EXA. This suite provides
>                   images showing the failures:
> 
> 	http://people.freedesktop.org/~cworth/cairo-exa-vs-xaa/quality/
> 
> Hopefully that's helpful, and hopefully the details below provide
> enough information for anybody who wants to replicate this kind of
> testing with other driver+hardware combinations.
> 
> -Carl
> 
> X test suite
> ============
> Instructions for obtaining, building and running the suite can be
> found here:
> 
> 	http://xorg.freedesktop.org/wiki/BuildingXtest
> 
> I followed those instructions and ran the test suite against an XAA X
> server, and then an EXA X server, (adding only AccelMethod:exa and
> AccelDFS:True options to the configuration file, as sketched below).
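> 
> A minimal sketch of the relevant xorg.conf section, (the Identifier
> and Driver values are placeholders; substitute whatever your setup
> already uses):
> 
> 	Section "Device"
> 	    Identifier "Card0"
> 	    Driver     "radeon"
> 	    Option     "AccelMethod" "EXA"
> 	    Option     "AccelDFS"    "True"
> 	EndSection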
> 
> When comparing the results of vswrpt from each run, the following
> lines are different:
> 
>         CASES TESTS  PASS UNSUP UNTST NOTIU  WARN   FIP  FAIL UNRES  UNIN ABORT
> XAA:
> Xlib4      29   324   280    11    27     5     0     0     1     0     0     0
> Xlib8      29   165   133    10    22     0     0     0     0     0     0     0
> Xlib9      46  1472  1174    23    36   201     8     0    30     0     0     0
> TOTAL     996  5552  4156    96   789   268    10     0    96   137     0     0
> 
> EXA:
> Xlib4      29   324   275    11    27     5     0     0     6     0     0     0
> Xlib8      29   165   132    10    22     0     0     0     0     1     0     0
> Xlib9      46  1472  1192    23    36   201     9     0    11     0     0     0
> TOTAL     996  5552  4168    96   789   268    11     0    82   138     0     0
> 
> Finding the differences in the above chart can be challenging, (wdiff
> helps, but then the columns get messed up; see the side-by-side diff
> sketch after the summary). Here's a summary of what the above shows
> when changing from XAA to EXA:
> 
> 	Xlib4:  5 PASS become FAIL
> 	Xlib8:  1 PASS becomes UNRES
> 	Xlib9: 19 FAIL become PASS
> 	Xlib9:  1 PASS becomes WARN
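> 
> For a comparison that keeps the columns aligned, something like the
> following works, (xaa.rpt and exa.rpt are hypothetical names for the
> two saved vswrpt summaries; side-by-side diff marks changed lines
> with a '|'):
> 
> 	diff --side-by-side xaa.rpt exa.rpt | grep '|'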
> 
> I haven't yet looked into chasing down the specific test cases that
> have behavioral changes. Does anyone have more information about how
> to go about that?
> 
> Rendercheck
> ===========
> The rendercheck utility can be obtained via git as follows:
> 
> 	git clone git://anongit.freedesktop.org/git/xorg/app/rendercheck
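> 
> Building it follows the usual X.Org autotools flow, (assuming the
> checkout carries the standard autogen.sh script):
> 
> 	cd rendercheck
> 	./autogen.sh
> 	make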
> 
> I ran into some gotchas when naively running the rendercheck binary
> that results from compiling:
> 
> 1. It takes forever to complete.
> 
>    I computed that on my laptop the composite and cacomposite tests
>    would each take over 17 hours to complete. And since I wanted to
>    run this against multiple X servers, I decided I just didn't have the
>    patience, so I dropped those tests.
> 
> 2. It generates enormous amounts of data.
> 
>    Once some tests start spewing errors, they spew a *lot*. The
>    gradients test had spewed many hundreds of megabytes of errors in a
>    rather short time before I interrupted it and dropped it from my
>    runs.

I don't think I've seen the gradients test ever succeed.

> 3. It doesn't save any data, nor does it warn the user to save it.
> 
>    This is especially problematic in light of the above two
>    problems. After waiting forever for a test to complete, a user can
>    be in the sad situation of realizing that the output spewed to the
>    terminal and now lost was the only information generated by
>    rendercheck, (aside from a final count of tests passed and tests
>    run).
> 
> So it would be nice to see some fixes made to this tool to make it
> more usable.
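> 
> In the meantime, piping through tee is one way to both watch the
> output and keep a copy, (a sketch; substitute the full test list from
> the command below):
> 
> 	./rendercheck -t fill,blend 2>&1 | tee rendercheck.log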
> 
> As is, here's the command-line I ended up using:
> 
> 	./rendercheck -t fill,dcoords,scoords,mcoords,tscoords,tmcoords,blend,repeat,triangles,bug7366 > precious-error-log.rendercheck
> 
> The explicit list of tests passed to the -t option differs from the
> default by not including composite, cacomposite, and gradients (as
> described above). This is somewhat unfortunate, as each of these tests
> was definitely spewing some actual errors with an XAA server before I
> got bored and killed it. Maybe someone with more patience (and hard
> drive space) than I have can go back and run these tests to completion
> against XAA and EXA, (or fix the tests to be more efficient first).

I leave all the tests on by default, because I haven't come up with a
good baseline set of tests.  You obviously want to run all the ops at
least once with composite and cacomposite, to make sure you're
programming the blend ops right (and on radeon and most intel, you want
to do it with at least an x8r8g8b8 and an a8r8g8b8 format on both srcs
and dests).  You also obviously want to run all the formats once with
{ca,}composite, to make sure you didn't screw up texture or
destination format handling.  The full cross product here is too big,
but the subset of tests you're probably interested in is
driver-dependent -- for example, running x8r8g8b8/a8r8g8b8 against all
the blend ops isn't important on hardware that is actually aware of
32-bit textures/destinations with no alpha channel and substitutes 1.0
when fetching the alpha, and a8 pictures aren't so magic on some
hardware.

So there are options for subsetting the ops (-o) and the formats, as
well as the tests, which let you come up with the set of tests you
think is interesting; a sketch of one such invocation follows.
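
As a concrete illustration, (treat this as an untested sketch: it
assumes a -f option selects formats the same way -t selects tests, and
the particular op/format subset is only an example, not a recommended
baseline):

	./rendercheck -t composite,cacomposite \
	              -o Over,Add,Src \
	              -f a8r8g8b8,x8r8g8b8,a8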

Now that we have more than one render acceleration implementation, it
may be possible to come up with a reasonable baseline set of tests, but
I'm leaving that open to someone else to propose.

-- 
Eric Anholt                             anholt at FreeBSD.org
eric at anholt.net                         eric.anholt at intel.com
