[PATCH weston] Added simple unit/integration test framework and corresponding test program.

Jon A. Cruz jonc at osg.samsung.com
Fri May 29 17:15:23 PDT 2015


On 05/09/2015 07:38 AM, Daniel Stone wrote:
> Hi,
> Seems Pekka and I did an entirely unintentional tag-team, as this kind
> of high-level goal summary was exactly what I had in mind ...
> 
> On 6 May 2015 at 12:57, Pekka Paalanen <ppaalanen at gmail.com> wrote:

[SNIP]

This is a quick high-level summary of why I'm attacking the problem and
the ways in which I'm doing so. My main concern is to get quickly to
the point where we have a good set of automated tests of different
types that can be easily applied on as many platforms as possible. One
thing I've seen that makes such testing easier is to base things on a
good testing framework and then layer project- and task-specific
'fixtures' on top to allow for quick and simple testing.

Also it appears that Wayland/weston has need of a few *different* types
of testing, with perhaps three main buckets quickly visible: unit
tests, integration/white-box weston tests, and generic Wayland
acceptance tests. Unit tests should be written alongside all new code,
but I do feel that they are the least important of the three for now.
Integration tests would allow devs to more easily work on adding
features to weston and verify that they don't break things in the
process. Finally, the highest-level acceptance tests should be runnable
against any Wayland compositor and verify the protocol implementation.
This last one currently lines up with the initial stated goals of
Wayland FITS (though not all of its actual use cases/tests).

I did a survey of existing tools and frameworks to try to find a good
fit. Given that we are focusing on testing C code in C (though a runner
could be in some other language), there is not much suited to Wayland's
needs. C++ and Java have some of the better frameworks around, but they
are too tied to the wrong languages. Many of the existing C frameworks
cover only half of a solution (one is simple but can't fork tests,
another forks tests but lacks logging, another makes registering tests
non-trivial, etc.). There were also some interesting things about igt,
but most of its value seems to be in the actual tests and fixtures, not
the framework.

So rather than try to extract some files from another solution and split
it into a framework that would fit our needs, it was less work to just
start from scratch and implement something that covered all the
requirements. Of course, one of those is keeping the overhead of
maintaining the framework itself low. The main approach was drawn from
JUnit and other xUnit frameworks, with heavy influence from TestNG,
Google Test and Boost Test. The assertion sets from Google Test and
Boost Test are among the most helpful things when writing unit tests.
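
To illustrate what that buys you: a rich xUnit-style equality assertion
reports the file, line and both values on failure, so the test body
needs no extra logging. A minimal sketch of the idea (the macro name is
illustrative only, not the actual API from this patch):

/* Illustrative sketch only -- not the macro names from the patch. */
#include <stdio.h>
#include <stdlib.h>

#define TEST_ASSERT_EQ(expected, actual) \
        do { \
                long long exp_ = (long long)(expected); \
                long long act_ = (long long)(actual); \
                if (exp_ != act_) { \
                        fprintf(stderr, "%s:%d: expected %lld, got %lld\n", \
                                __FILE__, __LINE__, exp_, act_); \
                        abort(); \
                } \
        } while (0)

A failing TEST_ASSERT_EQ(5, count) then already tells you where it
failed and by how much, without any printf calls in the test itself.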

Once the low-level framework is in place, adding a few mid-level
fixtures for weston-specific setup will fall into place much more
quickly.

> What I would like to see in terms of organisation is a set of simple C
> helpers that allows tests to set these environments up very easily:
> start a compositor instance with a certain set of modules or config
> fragments, spawn a test client (perhaps in a separate thread), etc.
> And no shell scripts required: the harder we make it to gdb something,
> the more we're just shooting ourselves in the foot.


That does line up with my main goals. In the past, creating such
helpers as "fixtures" for common testing frameworks has been a major
win. For certain uses I've even written basic mock/fake/stub HTTP
servers to help drive tests. Allowing a developer working on a function
a very simple way to set up the needed support infrastructure lets them
focus on coding and testing instead of on the framework itself.
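
As a rough sketch of the shape I have in mind for the weston side
(every name below is hypothetical, nothing here is from this patch), a
fixture wraps the compositor spawn/teardown so the test body never has
to deal with it:

/* Hypothetical fixture sketch -- none of these names are real. */
#include <sys/types.h>

struct compositor_fixture {
        pid_t compositor_pid;  /* headless compositor instance */
        char *socket_name;     /* value to export as WAYLAND_DISPLAY */
};

/* Runs before each test using the fixture. */
static void
compositor_fixture_setup(struct compositor_fixture *fix)
{
        /* fork/exec a headless compositor, wait for its socket to
         * appear, then fill in compositor_pid and socket_name. */
}

/* Runs after each test, pass or fail. */
static void
compositor_fixture_teardown(struct compositor_fixture *fix)
{
        /* kill the compositor, remove the socket, free setup data. */
}

A test that needs a live compositor would then just declare that it
uses this fixture and be handed a ready socket name, instead of
repeating the spawn/wait/cleanup dance in every test file.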

Also, I very much like being able to run things from inside Emacs
running gdb. Shell scripts break that and make it much harder to test
and debug.

Some fixtures should be focused on server work and some on client work.
Regardless, we would be doing well if a developer could set up a new
test and get into their function in the debugger in under 5-10 minutes
end-to-end.

>> This is not a wish for some sort of new "test specification language"
>> with which you can express oceans and the Moon, but a call for some
>> better structure. ;-)
> 
> Agreed.

Yes. I've seen good payoff just from some basic organization that
conveys sufficient meaning. Perhaps something as simple as a few
subdirectories and a README at their top level.

> 
>> So, maybe take this as more like a note, that we have many kinds of
>> tests with different runtimes.
>>
>> I do have one wish: making it easy to run a test in gdb. That has
>> problems like choosing which process you want to attach gdb to, since
>> usually there are at least the compositor and the test client. Also,
>> how do you get gdb in there; start the process in gdb, or maybe add a
>> raise(SIGSTOP) on some conditions so gdb can attach "from outside".
>>
>> That's just a brain dump of the moment. :-)
> 
> And this. But I think that comes with more helpers, not more
> copy-and-wasted shell scripts.
> 

Definitely sounds in line with my experience.
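
For the attach-from-outside case Pekka mentioned, the usual trick is an
opt-in stop so gdb can attach at leisure. Roughly along these lines
(the environment variable name is only an example, not something the
framework currently defines):

/* Sketch of opt-in "attach from outside" support. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
maybe_wait_for_debugger(void)
{
        if (!getenv("TEST_WAIT_FOR_DEBUGGER"))
                return;

        fprintf(stderr, "pid %d stopped; attach with: gdb -p %d\n",
                (int)getpid(), (int)getpid());
        raise(SIGSTOP); /* gdb attaches, then 'signal SIGCONT' resumes */
}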

> Here's a few things that I would really like to see, which is partly
> coloured by having worked quite a bit with intel-gpu-tools recently.
> 
> Self-contained executables: as above, I want to kill off the shell
> scripts to the extent that we can (perhaps one shell fragment you can
> source to set up to get paths).

At first I read this as meaning splitting the tests into a bunch of
individual executables. However, I believe your intent is actually to
avoid needing to do more than launch gdb on a single binary with
minimal config.

That's one of the main things that first got me wanting to change some
of the Wayland testing. I'm used to being able to work through my code
in a debugger early and often. Those scripts and environment stuff get
in the way.

And, yes, my current working tree has a config.sh that I source for
paths and then go from there.

> Differentiation between failures and unmet requirements: if we want to
> test something which isn't available on the target system, then that's
> a prerequisite failure, not a failure of the test itself.

There might be some subtleties to look into here, but I'm hoping to be
able to get by with just a simple flag-as-skipped in a test setup
function. Does that sound sufficient?
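
Concretely, I'm picturing something along these lines (all of the names
here are hypothetical, just to show the shape):

/* Hypothetical sketch -- none of these names are from the patch. */
#include <stdbool.h>

struct test_context {
        bool skipped;
        const char *skip_reason;
};

static void
test_skip(struct test_context *ctx, const char *reason)
{
        ctx->skipped = true;
        ctx->skip_reason = reason;
}

/* Setup marks the whole test as SKIP, not FAIL, when a prerequisite
 * (here, EGL) is missing on the target system. */
static void
egl_tests_setup(struct test_context *ctx, bool have_egl)
{
        if (!have_egl) {
                test_skip(ctx, "EGL not available on this target");
                return;
        }
        /* ...normal setup continues... */
}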

> Better output: just a pass/fail/skip by default with no other
> messages, with (optionally?) showing all output on fail/skip, but
> nothing at all on pass. See Bryce's 9/10 for what I mean: seeing that
> the buffer was successfully allocated is useless when the test passes,
> but helpful when you're trying to debug a failure. So this implies
> some kind of buffering in the core, but also infrastructure to handle
> signals and assertion failures which will dump all buffered output, a
> la igt. Machine-readable is a must here, in order to support ...

Some frameworks I know just print a dot per test... some print nothing
at all. I left that level of minimalism for later, choosing instead to
start with command-line output consistent with Google Test. The
internal architecture allows for adding something even more minimal
with very little effort (e.g. TAP took me about 2 hours). For
buffering, it includes 'tracepoints' that should at least be a good
start. Boost and Google have similar concepts, but some of the details
vary.
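
The tracepoint idea is basically cheap buffered context: nothing is
printed while a test passes, and the buffer is dumped from the failure
or signal path. A stripped-down sketch of the concept (not the actual
implementation in the patch):

/* Stripped-down sketch of buffered tracepoints; illustrative only. */
#include <stdio.h>

#define MAX_TRACE 64

static const char *trace_buf[MAX_TRACE];
static int trace_count;

/* Record context cheaply; nothing is printed on the happy path. */
#define TRACEPOINT(msg) \
        do { \
                if (trace_count < MAX_TRACE) \
                        trace_buf[trace_count++] = __FILE__ ": " msg; \
        } while (0)

/* Called only from the failure/signal path; passing tests stay silent. */
static void
dump_tracepoints(void)
{
        int i;

        for (i = 0; i < trace_count; i++)
                fprintf(stderr, "  trace: %s\n", trace_buf[i]);
}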

> A real test runner/harness: run all tests by default, but allow
> inclusion/exclusion of specific patterns, print the list of available
> tests, print a concise summary, allow comparison of runs to runs.
> Piglit does quite well here, with its output optionally taking a
> previous run to compare to, showing you the current status
> (pass/fail/skip/crash), as well as differences between runs (newly
> passing, regressed to fail, etc).

The framework itself was set up with test runners midway between Google
Test and Boost Test. The latter adds a bit more with regard to
organizing tests into arbitrary hierarchies, but Google Test follows
the common JUnit practice that most existing runners (such as those in
Jenkins) expect.

Comparison from run to run is normally not considered in scope for unit
testing, with boolean pass/fail being the norm. However, logging the
output in common formats will allow for easy tracking by other tools.
Jenkins itself has much of this added functionality, and having
JUnit-style XML allows one to plug right in.


> As little duplication as possible: no copy-and-wasted shell scripts,
> no having to put every test in 4 different lists.

Definitely agree here.

Common practice for unit testing is to create a foo-test.c file (in a
parallel folder hierarchy) for every main foo.c to be tested. Then one
just adds that filename to the source list of one of the existing test
binaries.
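
As a concrete (and purely hypothetical) illustration of that layout, a
unit src/foo.c would get a tests/foo-test.c whose cases exercise just
that unit, and the new file is listed once in the sources of an
existing test binary. The test file itself stays tiny:

/* tests/foo-test.c -- hypothetical example of the parallel layout;
 * foo.h/foo_add() and the plain assert() stand in for the unit under
 * test and the framework's own assertions. */
#include <assert.h>

#include "foo.h"

static void
test_foo_add_basics(void)
{
        assert(foo_add(0, 0) == 0);
        assert(foo_add(2, 3) == 5);
}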

> Same implementation language for tests: C.

There are some who argue that a C++ framework could easily be used with
C. However, just enough tricky low-level edge cases can pop up that C
makes the better test language here.

> No onerous dependencies for the harness: Python 2 is available
> basically everywhere. Python 3 is getting there. Perl is usually
> available, but I can't tell you how much I dislike it. C also works.
> More exotic languages ... no thanks. The more difficult we make it to
> run on targets, the less it will get run.


> Coherence with test frameworks around us: people testing
> Wayland/Weston are likely to be testing DRM/KMS, (E)GL(ES), GStreamer,
> and the various toolkits. The closer we are to what these projects
> implement - especially in terms of compatibility with their automated
> run output - the more impact our framework will have.

That is a very good point. However, following standard practices from
the testing world, especially from Test-Driven Development (TDD), will
allow almost any developer to contribute tests very quickly. As for
compatibility, if any frameworks need output beyond TAP or JUnit XML,
such additions can be made quickly. Just point out the details and we
can have things up in almost no time.

> I posit that igt + Piglit is a really good starting point for this.
> Others might disagree. Honestly I don't care too much, as long as we
> get something which not only meets these requirements, but also isn't
> onerous to maintain. XTS is a perfect example of what happens when you
> build up a huge infrastructure that ends up getting abandoned and
> becoming a net cost, as the cost of maintaining it outweighs the
> benefits the tests bring.

I think the set of tests that have been collected in igt + Piglit is
their main strength. Most, however, are more on the level of white-box
testing of graphics drivers. The good point, though, is that we can
look to those projects to pull in the parts that make sense, since I've
been told they are under a compatible license. I had even started by
looking into extracting their framework parts and building from that,
but those are nowhere near the feature set of Google Test, which makes
a difference for unit testing at the least.

> It's blindingly obvious from our discussions on IRC and the style
> review I did earlier that we have very differing views on how testing
> should work. But what we do agree on is that testing (all of unit,
> integration and functional) is very important, and that without a
> great infrastructure to support this, we're shooting ourselves in the
> foot. I think the best way to go about this is to work with our
> existing tests - including Bryce's headless work - and piece-by-piece
> build up a framework which supports that. My main worry right now is
> that by designing a grand framework upfront and moulding the tests to
> fit that, that we might end up with a bit of a mismatch between that.
> But ultimately, as long as we get a framework which makes it as easy
> to write tests as possible, and can eventually fulfill all these
> goals, then I'm sure we'll all be happy.
> 
> Thanks for your work on these. Looking forward to see how it all develops.

I also think we need to avoid some grand solution and instead just get a
few simple pieces in a few standard layers and build up from there.
Getting things out of the way of developers is the main thing I've seen
actually enable them to write tests. That the base framework has gotten
90% of needed functionality in just about 2kloc is a good sign. (Note
that I'd definitely call it past that magic 80% level where you discover
another 80% yet to be done).

We need to fill in the mid-level stuff with additional Wayland and
Weston helper code, but by using the common 'fixture' paradigm for
those we should minimize the work needed to actually write and use the
tests. And finally, splitting the weston specifics out into a separate
folder altogether makes it easy to build my Wayland FITS replacement on
top of the other two layers.


-- 
Jon A. Cruz - Senior Open Source Developer
Samsung Open Source Group
jonc at osg.samsung.com

