[Piglit] [PATCH] framework/backends/junit: Report expected failures/crashes as skipped.

Mark Janes mark.a.janes at intel.com
Tue Mar 3 14:26:00 PST 2015


Jose Fonseca <jfonseca at vmware.com> writes:

> Thanks for the reviews.
>
>
> My setup is not as complicated as yours, mostly because my main focus 
> has been llvmpipe, and we're not actively adding support for new OpenGL 
> extensions to it at the moment, so most of the new piglit tests either 
> skip or pass.  New failures are relatively rare.

One of the lessons I learned with Jenkins is that if the automation is
complex, it should be in git and not in Jenkins projects.  Our Jenkins
jobs typically just execute a single script with a constrained set of
parameters.

This facilitates testing/debugging without invoking builds on the
Jenkins instance.  It is also much easier to handle development branches
by branching your scripts, as opposed to cloning Jenkins projects.

We orchestrate multi-platform jobs with Python via the Jenkins Remote
Access API.  Binaries and test results are communicated between builds
via a shared drive.
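
For what it's worth, the orchestration boils down to something like the
sketch below.  This is only illustrative -- the URL, job name,
credentials and parameter name are placeholders rather than our real
setup:

# Trigger a parameterized Jenkins job through the Remote Access API and
# poll its queue item until the build actually starts.
import time
import requests

JENKINS_URL = "https://jenkins.example.com"   # placeholder instance
JOB_NAME = "piglit-quick"                     # placeholder job
AUTH = ("user", "api-token")                  # Jenkins user + API token

def trigger(params):
    """Queue a build with the given parameters; return the queue item URL."""
    resp = requests.post(
        "{}/job/{}/buildWithParameters".format(JENKINS_URL, JOB_NAME),
        params=params, auth=AUTH)
    resp.raise_for_status()
    return resp.headers["Location"]           # Jenkins points at the queue item

def wait_for_build(queue_url):
    """Poll the queue item until Jenkins assigns it a build."""
    while True:
        item = requests.get(queue_url + "api/json", auth=AUTH).json()
        if "executable" in item:
            return item["executable"]["url"]
        time.sleep(10)

build_url = wait_for_build(trigger({"MESA_SHA": "origin/master"}))
print("build started: " + build_url)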

> Previously I used 
> https://wiki.jenkins-ci.org/display/JENKINS/Email-ext+plugin -- it 
> allows controlling if/when emails are sent with great flexibility.  In 
> particular it allows sending emails for new regressions, but not for 
> tests that were already failing previously.
>
> Still, nothing beats being able to look at a bunch of test jobs and 
> immediately tell all blue == all good.   Although I have used Jenkins 
> for many years now, this is a lesson I only learned recently -- it's 
> better to mask out expected failures somehow and get a boolean "all 
> pass" or "fail" for the whole test suite than to try to track pass/fail 
> for individual tests.  The latter just doesn't scale...
>
>
>
> We do have internal branches where we run piglit.  I confess I don't 
> have a good solution yet.  Your trick of maintaining a database of the 
> git commit in which each test was added is quite neat.  Another thing 
> worth considering would be to branch or tag piglit whenever Mesa is 
> branched, and keep using a matching (and unchanging) piglit commit.

Tagging piglit is a simpler solution than what I've done.  The primary
use case, though, is for developers to test their branches before they
send them to the mailing list.  Without a recent rebase, their branches
will typically report failures on tests which were fixed on master.

> We also run test suites through different APIs (namely D3D9/10).  These 
> test suites rarely get updated, and llvmpipe conformance is actually 
> quite good to start with, so it's easy to get "all pass" there.
>
>
> Piglit, by being continuously updated/extended, is indeed more of a 
> challenge than other test suites.
>
>
> We also use piglit for testing our OpenGL guest driver, but we use an 
> internal testing infrastructure to drive it, not Jenkins.  So our 
> experiences there don't apply.
>
>
> I also have a few benchmarks on Jenkins.  Again, I only keep track of 
> performance metrics via the Jenkins Plots and Measurements plugins, but 
> I don't produce pass/fail based on those metrics.  I am, however, 
> considering doing something of the sort -- e.g., getting the history of 
> the metrics via the Jenkins JSON API, fitting it to a probability 
> distribution, and failing when performance goes below a given 
> percentile.
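
That percentile gate sounds easy enough to wire up; roughly something
like the sketch below.  This is purely illustrative -- the numbers, the
normal fit and the 5% cutoff are assumptions, and the history would come
from the Jenkins JSON API or the plot plugin's CSVs rather than being
hard-coded:

# Sketch of a percentile-based pass/fail gate for a benchmark metric.
import sys
from scipy import stats

history = [412.0, 405.3, 418.9, 409.1, 411.7]   # older builds (placeholder data)
latest = 362.4                                   # value from the build under test

mu, sigma = stats.norm.fit(history)              # fit a normal distribution
cutoff = stats.norm.ppf(0.05, loc=mu, scale=sigma)   # 5th-percentile threshold

if latest < cutoff:
    sys.exit("performance regression: %.1f < %.1f" % (latest, cutoff))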

I'm curious to know which benchmarks you use.  We experience variability
in a lot of benchmarks, especially if they are cpu-heavy and running on
an under-powered system.  It would be great to have workloads that
produce reliable trends without having to run them repeatedly.

thanks for the details!

-Mark

> Jose
>
>
> On 03/03/15 18:34, Mark Janes wrote:
>> Thanks Jose!  This is an improvement.
>>
>> In my experience, broken tests are introduced and fixed in mesa on a
>> daily basis.  This has a few consequences:
>>
>>   - On a daily basis, I look at failures and update the expected
>>     pass/fails depending on whether it is a new test or a regression.
>>     Much of this process is automated.
>>
>>   - Branches quickly diverge on the basis of passing/failing tests.
>>     Having separate pass/fail configs on release branches is
>>     unmanageable.  To account for this, my automation records the
>>     relevant commit sha as the value in the config file (the key is the
>>     test name).  I post-process the junit XML to filter out failures
>>     from tests whose commit occurred after the branch point (see the
>>     sketch after this list).
>>
>>   - For platforms that are too slow to build each checkin, I run an
>>     automated bisect which builds/tests in Jenkins, then updates config
>>     files.
>>
>>   - Our platform matrix generates over 350k unskipped tests for each
>>     build.  We filter out skipped tests because of the memory Jenkins
>>     consumes when displaying this many tests.
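
The branch-point filtering mentioned above boils down to roughly the
following (hypothetical file and branch names; the real script covers
more cases):

# Drop failures whose associated commit is not in the release branch.
# expected.json maps test name -> sha recorded by the automation.
import json
import subprocess
import xml.etree.ElementTree as ET

BRANCH = "origin/10.5"                       # release branch under test (placeholder)
config = json.load(open("expected.json"))

def in_branch(sha):
    """True if the commit is already contained in BRANCH."""
    return subprocess.call(
        ["git", "merge-base", "--is-ancestor", sha, BRANCH]) == 0

tree = ET.parse("results.xml")
for case in tree.iter("testcase"):
    failure = case.find("failure")
    sha = config.get(case.get("name"))
    if failure is not None and sha and not in_branch(sha):
        case.remove(failure)                 # not a regression on this branch
        ET.SubElement(case, "skipped", message="test landed after branch point")

tree.write("results.xml")
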
>>
>> I am interested in learning more about your test system, and sharing
>> lessons learned / techniques.
>>
>> -Mark
>>
>> Reviewed-by: Mark Janes <mark.a.janes at intel.com>
>>
>> Jose Fonseca <jfonseca at vmware.com> writes:
>>
>>> I recently tried the junit backend's ability to ignore expected
>>> failures/crashes and found it a godsend -- instead of having to look at
>>> test graph results periodically, I can just tell Jenkins to email me
>>> when things go south.
>>>
>>> The only drawback is that reporting the expected issues as passing
>>> makes it too easy to forget about them and to misinterpret the pass
>>> rates.  So this change modifies the junit backend to report the
>>> expected issues as skipped, making it more obvious when looking at the
>>> test graphs that these tests are not really passing, and that whatever
>>> functionality they target is not being fully covered.
>>>
>>> This change also makes use of the junit `message` attribute to explain
>>> the reason for the skip.  (In fact, we could consider using the
>>> `message` attribute on other kinds of failures to convey the piglit
>>> result, instead of using the non-standard `type`.)
>>> ---
>>>   framework/backends/junit.py | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/framework/backends/junit.py b/framework/backends/junit.py
>>> index 82f9c29..53b6086 100644
>>> --- a/framework/backends/junit.py
>>> +++ b/framework/backends/junit.py
>>> @@ -129,17 +129,19 @@ class JUnitBackend(FileBackend):
>>>               # Add relevant result value, if the result is pass then it doesn't
>>>               # need one of these statuses
>>>               if data['result'] == 'skip':
>>> -                etree.SubElement(element, 'skipped')
>>> +                res = etree.SubElement(element, 'skipped')
>>>
>>>               elif data['result'] in ['warn', 'fail', 'dmesg-warn', 'dmesg-fail']:
>>>                   if expected_result == "failure":
>>>                       err.text += "\n\nWARN: passing test as an expected failure"
>>> +                    res = etree.SubElement(element, 'skipped', message='expected failure')
>>>                   else:
>>>                       res = etree.SubElement(element, 'failure')
>>>
>>>               elif data['result'] == 'crash':
>>>                   if expected_result == "error":
>>>                       err.text += "\n\nWARN: passing test as an expected crash"
>>> +                    res = etree.SubElement(element, 'skipped', message='expected crash')
>>>                   else:
>>>                       res = etree.SubElement(element, 'error')
>>>
>>> --
>>> 2.1.0

