[Pixman] [PATCH] test: Change composite so that it tests randomly generated images

Siarhei Siamashka siarhei.siamashka at gmail.com
Fri Oct 8 02:51:14 PDT 2010


On Tuesday 05 October 2010 20:47:46 Soeren Sandmann wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > On Sunday 07 March 2010, Søren Sandmann wrote:
> > > Previously this test would try to exhaustively test all combinations
> > > of formats and operators, which meant that it would take years to run.
> > > 
> > > Instead, generate random images and test those. The random seed is
> > > based on time(), so we will get different tests every time it is run.
> > > Whenever the test fails, it will print what the value of lcg_seed was
> > > when the test began.
> > > 
> > > This patch also adds a table of random seeds that have failed at some
> > > point. These seeds are run first before the random testing begins.
> > > Whenever this test fails, you are supposed to add the seed that
> > > failed to the table, so that we can track that the bug doesn't get
> > > reintroduced.
> > 
> > I don't quite like any nondeterministic random factor in the standard
> > regression tests. Preferably the results of such tests should be
> > reproducible from run to run, even if they are not perfect and do not
> > provide full coverage.
> 
> I just sent a new set of patches that don't have the nondeterministic
> behavior. However, considering that several of the tinderboxes here:
> 
>         http://tinderbox.x.org/
> 
> are running make check over and over it would be really useful to make
> them run different sets of tests each time.

First, I would like to say that I don't have any major objections to these
patches. Let's try it this way and see whether it proves useful. The tests
have to be improved in one way or another eventually.

Now about the tinderbox stuff. If we could afford a dedicated server for
pixman testing with lots of computational power, I think it would make sense
to run the tests in a number of different ways. For example, assuming an
x86-64 host system, it makes sense to run them both with and without the
"-m32" option in CFLAGS. Also, trying the "--disable-sse2" and
"--disable-mmx --disable-sse2" configurations would help to spot problems or
regressions in the MMX and C fast paths, which would otherwise be hidden by
the SSE2 optimizations. A mingw cross-compiler combined with binfmt-misc and
wine could provide some testing of win32 compatibility. User-mode qemu with
binfmt-misc could be used for basic big endian compatibility checks (for
example, when compiling pixman for big endian mips).

The current tests also have the problem that we prefer them to run for no
more than a few minutes. Increasing the scope of the fast paths (for example,
adding more optimizations for complex transformation cases) can push the
testing time beyond these reasonable limits, or make the tests less reliable
if the testing time stays the same: roughly speaking, doubling the number of
code paths to be exercised within a fixed time budget halves the number of
random images tested per path.

Another useful testing mode (yet to be introduced) could be to run the tests
overnight, or until interrupted.
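
To make the idea more concrete, here is a minimal sketch of what such an
"until interrupted" driver could look like. It is only an illustration, not
the actual pixman test code: the run_composite_test() helper is hypothetical
and stands in for one iteration of the composite test with a given seed.

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper standing in for one iteration of the composite
 * test with a given random seed; returns 0 on success. */
extern int run_composite_test (uint32_t seed);

static volatile sig_atomic_t interrupted = 0;

static void
on_sigint (int sig)
{
    (void) sig;
    interrupted = 1;
}

int
main (void)
{
    unsigned long iterations = 0;
    uint32_t      seed;

    signal (SIGINT, on_sigint);

    while (!interrupted)
    {
        /* Pick a different seed on every iteration, so that repeated
         * runs keep exercising new randomly generated images. */
        seed = (uint32_t) time (NULL) ^ (uint32_t) iterations;

        if (run_composite_test (seed) != 0)
        {
            /* Report the seed, so that the failure can be reproduced
             * and the seed added to the table of known-bad seeds. */
            printf ("FAILED: lcg_seed = 0x%08x\n", (unsigned int) seed);
            return EXIT_FAILURE;
        }

        iterations++;
    }

    printf ("interrupted after %lu iterations, no failures\n", iterations);
    return EXIT_SUCCESS;
}

Whether this ends up as a separate driver or just an extra mode of the
existing tests does not matter much; the point is that an interrupted run
loses nothing, because any failure it has found is fully identified by its
seed.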

> > This is quite important for having a clearly defined formal patch
> > submission process (Before submitting a patch, one needs to make sure
> > that the regression tests pass. If they don't pass, the problem has to
> > be investigated and patch fixed or regression tests updated if needed).
> > 
> > With the randomness in the tests, a patch contributor may end up in
> > different confusing situations:
> > - regression test fails for him, even if his patch is fine (if the
> > problem was introduced by somebody else earlier)
> > - regression test passes for him, but fails for the others later (due to
> > the bug in the patch). In this case it would be hard to say whether the
> > contributor did a proper job of running the regression tests in the
> > first place.
> 
> I'm not sure this is a big problem. If the test fails, it would print
> out the seed that failed so the contributor can try the test again
> without the patch applied. This will allow him to determine whether
> the problem was introduced by him or not. If it wasn't, hopefully he
> would report the bug.
> 
> It's true that some people, if the test fails intermittently for them,
> might try submitting anyway hoping that nobody will notice. However, I
> don't think most people would do that, and if the test fails so rarely
> that they could get away with something like that, any fixed subset of
> the test would likely also miss the problem.

One of the drawbacks is that all of this makes the learning curve a bit
steeper: contributors now need to understand what these 'seed' things are,
how to find the test programs in the pixman source tree, how to run them
individually, and so on.
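
For what it's worth, my understanding of what these 'seed' things amount to
is roughly the following (a simplified sketch along the lines of the patch
description quoted above; run_composite_test() is again a hypothetical helper
rather than the actual test code):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: one run of the composite test with a given
 * random seed, returning 0 on success. */
extern int run_composite_test (uint32_t seed);

/* Seeds that exposed bugs in the past; whoever hits a new failure is
 * expected to append the printed seed to this table. */
static const uint32_t known_bad_seeds[] =
{
    0x00000000  /* placeholder; real seeds from failing runs go here */
};

int
main (void)
{
    unsigned int i;
    uint32_t     seed;

    /* Replay the known-bad seeds first, so that fixed bugs cannot
     * silently come back. */
    for (i = 0; i < sizeof (known_bad_seeds) / sizeof (known_bad_seeds[0]); i++)
    {
        if (run_composite_test (known_bad_seeds[i]) != 0)
        {
            printf ("regression: lcg_seed = 0x%08x\n",
                    (unsigned int) known_bad_seeds[i]);
            return 1;
        }
    }

    /* Only then test new randomly generated images, seeded from time(). */
    seed = (uint32_t) time (NULL);
    if (run_composite_test (seed) != 0)
    {
        printf ("new failure: lcg_seed = 0x%08x\n", (unsigned int) seed);
        return 1;
    }

    return 0;
}

So reproducing a reported failure means running the test again with the
reported seed (or adding it to the table) rather than simply re-running
'make check', which is exactly the extra step a first-time contributor has
to learn.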

IMHO, using the git bisect feature with just 'make check' is a lot easier and
does not require much understanding of what is happening under the hood (for
example, 'git bisect run make check' can drive the whole bisection
automatically).

This also highlights the need for formal instructions for pixman contributors
about how to submit patches and what the basic requirements are. I think that
the current RELEASING file does not serve this purpose well.

-- 
Best regards,
Siarhei Siamashka