Weston testing (Re: New to wayland, need help pls.)

Pekka Paalanen ppaalanen at gmail.com
Wed Apr 16 23:53:50 PDT 2014


On Wed, 16 Apr 2014 20:41:50 +0000
"Bryce W. Harrington" <b.harrington at samsung.com> wrote:

> On Wed, Apr 16, 2014 at 08:51:21AM +0300, Pekka Paalanen wrote:
> > On Tue, 15 Apr 2014 19:08:12 +0000
> > "Bryce W. Harrington" <b.harrington at samsung.com> wrote:
> > 
> > > Maybe adding more tests?  Any particular features recently added that
> > > could benefit from tests added?
> > 
> > Yes! That is a good one. Either adding completely new tests, or maybe
> > working with Marek Chalupa in converting all possible FAIL_TEST cases to
> > normal TEST cases, as permitted by Marek's
> > http://lists.freedesktop.org/archives/wayland-devel/2014-April/014190.html
> > 
> > I can't list any features off-hand that would need new tests which would
> > be easy to implement.
> 
> Would there be value in just basic unit tests on some of the established
> APIs?  E.g. take xdg-shell's API and check passing in negative or
> zero dimensions, crazy long strings for string params, stuff like that.
> A lot of the protocol's code is generated so it's unclear to me whether
> this level of testing would be useful or not, but if it is, it tends to
> be pretty newbie-friendly.

Even when the wrappers are generated, the argument validity checks are
written manually, so yes, there would be value in writing such tests.
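
For example, a rough standalone sketch (plain libwayland-client, not
Weston's test harness; it assumes the compositor implements wl_surface
version 3 so that set_buffer_scale is available) could send an
out-of-range argument and verify the compositor answers with a protocol
error instead of crashing:

    /* Sketch: send an invalid argument and expect a protocol error. */
    #include <assert.h>
    #include <errno.h>
    #include <string.h>
    #include <wayland-client.h>

    static struct wl_compositor *compositor;

    static void
    registry_global(void *data, struct wl_registry *registry, uint32_t name,
                    const char *interface, uint32_t version)
    {
            if (strcmp(interface, "wl_compositor") == 0 && version >= 3)
                    compositor = wl_registry_bind(registry, name,
                                                  &wl_compositor_interface, 3);
    }

    static void
    registry_global_remove(void *data, struct wl_registry *registry,
                           uint32_t name)
    {
    }

    static const struct wl_registry_listener registry_listener = {
            registry_global,
            registry_global_remove
    };

    int
    main(void)
    {
            struct wl_display *display = wl_display_connect(NULL);
            struct wl_registry *registry;
            struct wl_surface *surface;

            assert(display);
            registry = wl_display_get_registry(display);
            wl_registry_add_listener(registry, &registry_listener, NULL);
            wl_display_roundtrip(display);
            assert(compositor);

            surface = wl_compositor_create_surface(compositor);

            /* set_buffer_scale requires a positive scale; zero must raise
             * the wl_surface.invalid_scale protocol error. */
            wl_surface_set_buffer_scale(surface, 0);
            wl_display_roundtrip(display);

            /* The connection should now be in the protocol error state,
             * not crashed and not silently accepting the request. */
            assert(wl_display_get_error(display) == EPROTO);

            return 0;
    }

Something along those lines, wrapped in the existing TEST() machinery,
would cover one manually written validity check per test.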

Another easy bit is testing object destruction order for related
objects. For example, if you create a wl_viewport for a wl_surface and
then the wl_surface is destroyed, the wl_viewport should become inert:
using it should simply be ignored by the compositor and not cause
crashes or protocol errors. "Inert" is a specific term in the
specification language for exactly that.

The sub-surface tests do extensive destroy-order testing, but I think
they might lack the "ignore" part, and I'm sure there are other places
where more such tests could be added.
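
A rough idea of what the missing "ignore" check could look like, as a
fragment (it assumes the wl_compositor and wl_subcompositor globals
have already been bound by the usual client setup):

    #include <assert.h>
    #include <wayland-client.h>

    /* After destroying the wl_surface behind a wl_subsurface, the
     * wl_subsurface becomes inert; requests on it should be ignored
     * without raising a protocol error. */
    static void
    test_subsurface_inert(struct wl_display *display,
                          struct wl_compositor *compositor,
                          struct wl_subcompositor *subcompositor)
    {
            struct wl_surface *parent = wl_compositor_create_surface(compositor);
            struct wl_surface *child = wl_compositor_create_surface(compositor);
            struct wl_subsurface *sub =
                    wl_subcompositor_get_subsurface(subcompositor, child, parent);

            /* Destroy the wl_surface; 'sub' is now inert. */
            wl_surface_destroy(child);
            wl_display_roundtrip(display);

            /* Using the inert object must be ignored, not be an error. */
            wl_subsurface_set_position(sub, 10, 10);
            wl_display_roundtrip(display);
            assert(wl_display_get_error(display) == 0);

            wl_subsurface_destroy(sub);
            wl_surface_destroy(parent);
    }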

Also, every protocol error defined in the protocol specifications should
be exercised by a test; that should be easy, too.
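
A small helper would keep those tests short; the one below is
hypothetical (nothing with this name exists in the suite today), but it
only uses the public wl_display error queries:

    #include <assert.h>
    #include <errno.h>
    #include <wayland-client.h>

    /* Hypothetical helper: assert that the connection died with a
     * specific protocol error on a specific interface. */
    static void
    assert_protocol_error(struct wl_display *display,
                          const struct wl_interface *expected_interface,
                          uint32_t expected_code)
    {
            const struct wl_interface *interface;
            uint32_t id, code;

            assert(wl_display_get_error(display) == EPROTO);

            code = wl_display_get_protocol_error(display, &interface, &id);
            assert(interface == expected_interface);
            assert(code == expected_code);
    }

E.g. the set_buffer_scale(0) case above would end with
assert_protocol_error(display, &wl_surface_interface,
WL_SURFACE_ERROR_INVALID_SCALE).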

However, I'm not sure xdg_shell in particular is a good target at the
moment, as I think there might still be changes coming.

> > But I guess we should at some point hook up a
> > real renderer to the headless backend, and add infrastructure for the
> > test framework to do screencaptures, so we could also test rendering
> > and things like window stacking and surface/buffer transformations.
> 
> That sounds keen, and could open up a lot of testing potential.  Shame
> it's not already in place!  Could you elaborate on how you think this
> could be created?  I'd like to explore this further myself.

Well, there is the test protocol extension; you could add a
client-initiated screenshooting feature to it, perhaps limited to a
designated sub-rectangle.

Easiest for the headless backend would probably be to start by plugging
in the pixman renderer and defining a hardcoded single-resolution output.
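
As a very rough sketch of the output side (the surrounding headless
backend structures, error handling and the exact weston_output setup
calls are glossed over and depend on the Weston version), the existing
pixman-renderer entry points would be wired up roughly like this, with
pixman_renderer_init() replacing the no-op renderer at backend init:

    #include <pixman.h>

    #define HEADLESS_WIDTH  1024
    #define HEADLESS_HEIGHT 640

    /* Sketch: back a fixed-resolution headless output with a pixman
     * image so the pixman renderer has somewhere to draw into. */
    static pixman_image_t *
    headless_output_enable_pixman(struct weston_output *output)
    {
            pixman_image_t *image;

            image = pixman_image_create_bits(PIXMAN_x8r8g8b8,
                                             HEADLESS_WIDTH, HEADLESS_HEIGHT,
                                             NULL, 0);
            if (!image)
                    return NULL;

            /* Let the pixman renderer manage per-output state and point
             * it at our in-memory "framebuffer". */
            if (pixman_renderer_output_create(output) < 0) {
                    pixman_image_unref(image);
                    return NULL;
            }
            pixman_renderer_output_set_buffer(output, image);

            return image;
    }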

Since the tests also run the desktop-shell client, which draws "random"
things (it takes colors from configs and paints the clock, for
instance), maybe it could check on init whether the test extension is
advertised by the compositor. If the test extension is there, the
desktop-shell client would use only hardcoded configs, paint a hardcoded
time in the clock, and so on, so that the rendered output is always the
same.
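
The detection itself is just a registry-listener detail; a sketch (the
"wl_test" interface name is an assumption about what the test extension
advertises):

    #include <string.h>
    #include <wayland-client.h>

    /* Sketch: note in desktop-shell's global handler whether the test
     * extension is present, so drawing can switch to fixed colors and
     * a fixed clock time. */
    static int deterministic_mode;

    static void
    global_handler(void *data, struct wl_registry *registry, uint32_t name,
                   const char *interface, uint32_t version)
    {
            if (strcmp(interface, "wl_test") == 0)
                    deterministic_mode = 1;

            /* ... existing handling of wl_output, desktop_shell, etc. ... */
    }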

Window initial position randomization in shell.c might be a problem.

You'd also need to save the screenshots as image files, so a
developer can check the results.
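
Nothing fancy is needed for that; e.g. a plain PPM writer (sketch,
assuming 32-bit x8r8g8b8 pixels) would already let a developer eyeball a
failing result:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: dump an x8r8g8b8 pixel buffer as a binary PPM file. */
    static int
    write_ppm(const char *filename, const uint32_t *pixels,
              int width, int height)
    {
            FILE *fp = fopen(filename, "wb");
            int x, y;

            if (!fp)
                    return -1;

            fprintf(fp, "P6\n%d %d\n255\n", width, height);
            for (y = 0; y < height; y++) {
                    for (x = 0; x < width; x++) {
                            uint32_t p = pixels[y * width + x];
                            unsigned char rgb[3] = {
                                    (p >> 16) & 0xff,  /* R */
                                    (p >> 8) & 0xff,   /* G */
                                    p & 0xff           /* B */
                            };

                            fwrite(rgb, 1, 3, fp);
                    }
            }

            fclose(fp);
            return 0;
    }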

> > Pixman's fuzzer test might be a good inspiration on how to actually do
> > the tests once the infrastructure is up:
> > http://lists.freedesktop.org/archives/pixman/2014-April/003229.html
> > Provided we can guarantee pixel-perfect rendering, which might not
> > always be true, so checksum-based checking might not always work.
> 
> I take it this is similar to the Cairo testsuite's perceptual diff
> stuff?  (Checksumming two bitmaps pixel by pixel)

I suppose, though I haven't looked at Cairo.

> What we're seeing in Cairo is that sometimes pixel differences indicate
> issues, but a lot of the time it's ignorable, and deciding which is which is
> fairly hard.  But used in situations where you are 100% sure exactly
> what the images should look like, this can be quite effective.

Yes, Pixman does the same separation. Some tests require bit-perfect
results, others are known to have a little variance.

I suppose that with a pixman renderer we can have the guarantees Pixman
has, since we know which operations we use (or do we?), but with the
gl-renderer or, say, the rpi-renderer, I don't think we can require
bit-perfect reproduction.
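
So the comparison helper would probably need both modes: exact for the
pixman-rendered cases and a per-channel tolerance for the rest. A
sketch, again assuming x8r8g8b8 pixels:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Sketch: compare two x8r8g8b8 images either exactly (tolerance 0)
     * or allowing each color channel to differ by up to 'tolerance'. */
    static bool
    images_match(const uint32_t *a, const uint32_t *b,
                 int width, int height, int tolerance)
    {
            int i, shift;

            for (i = 0; i < width * height; i++) {
                    for (shift = 0; shift <= 16; shift += 8) {
                            int ca = (a[i] >> shift) & 0xff;
                            int cb = (b[i] >> shift) & 0xff;

                            if (abs(ca - cb) > tolerance)
                                    return false;
                    }
            }

            return true;
    }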

We can already choose which backend to use when running 'make check';
we could have the headless backend expose the pixman renderer by default
and a software-rendered gl-renderer as an option.

> In practice, just having all the different rendering tests can be a
> handy workload for exposing crashes.  I suspect coupling it with some
> timing measurements might also help expose performance regressions.

I don't think we can make automated performance testing very reliable. I
wouldn't want a build bot for some distribution to fail 'make check'
just because it runs in a virtual machine on some heavily loaded
hardware.

So performance testing would be a separate case.

> > But I think that is getting quite heavy for just introductory work.
> 
> For something like this, 80% of the work is just deciding on a good
> design.  Most of the implementation work is going to be basic coding.
> There are going to be a few hard tasks (like attaching snapshot-renderer
> functionality to the headless backend), but even those are likely to get
> owners if they're well enough described.

Right. We need someone to put in the infrastructure. :-)


Thanks,
pq

