[Intel-gfx] [RFC i-g-t 0/4] Redundant test pruning
Chris Wilson
chris at chris-wilson.co.uk
Tue Jun 27 11:46:28 UTC 2017
Quoting Daniel Vetter (2017-06-27 10:14:40)
> On Tue, Jun 27, 2017 at 09:02:02AM +0100, Tvrtko Ursulin wrote:
> >
> > On 26/06/2017 17:09, Daniel Vetter wrote:
> > > On Fri, Jun 23, 2017 at 12:31:39PM +0100, Tvrtko Ursulin wrote:
> > > > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > >
> > > > A small series which saves test execution time by removing redundant tests.
> > > >
> > > > Tvrtko Ursulin (4):
> > > > igt: Remove default from the engine list
> > > > gem_exec_basic: Exercise the default engine selection
> > > > gem_sync: Add all and store_all subtests
> > > > extended.testlist: Remove some test-subtest combinations
> > >
> > > Ack on patches 1&2, but I'm not sold on patch 3. Atm gem_* takes a
> > > ridiculous amount of machine time to run, and you're adding more stuff.
> > > Are those tests really drastically better at catching races if we run
> > > them 10x longer? Is there no better way to exercise the races (lots more
> > > machines, maybe slower ones, which is atm impossible since a full run
> > > just takes way, way too long and we need an entire farm just for one
> > > machine)?
> >
> > New gem_sync subtests were suggested by Chris after I sent the first
> > version of the series, with the goal of getting the same coverage in less
> > time.
> >
> > If you look at patch 4, it removes 18 * 150s of gem_sync subtests and adds
> > 4 * 150s back, so in total we are (18 - 4) * 150s = 35 minutes better off
> > in the best case, a bit less on smaller machines.
>
> So why keep the other 18 tests when we have coverage from the new ones?
> Some developer modes for testing (like e.g. kms_frontbuffer_tracking has)
> are all nice, but piling up ever higher amounts of redundant tests isn't
> great imo.
They are redundant? The subtle differences have dramatic impact on
timings and bug discovery. I was suggesting that if we were going to
run a cutdown test, it may as well be engineered for the task. I would
be very happy if we could replace all of the bulk stress tests with a
fuzzy approach. We obviously have to keep a minimal set to check
expected behaviour and to catch old regressions, but trying to capture
all the ways the hw can fail and muck up the driver should be
automated. I've been wondering if we can write a mock device powered by
BPF (or something) and see if we can do fault injection for the more
obscure code paths. Regular fuzzing over the abi to maximise code
coverage is much easier than defining how the hw is supposed to react
and fuzzing the hw through the driver.
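To make the fuzzing idea concrete, even something as dumb as this would
be a start (a rough sketch, not a subtest proposal: the device node,
the single ioctl, the seed and the iteration count are all placeholders,
assuming the kernel uapi drm headers are installed):

/*
 * Sketch: hammer one known query ioctl with randomised parameters and
 * rely on the kernel staying alive; any errno is an acceptable answer,
 * an oops or hang is the only failure.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <drm/i915_drm.h> /* assumes kernel uapi / libdrm headers */

int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR); /* assumes card0 is i915 */
	if (fd < 0)
		return 77; /* exit code 77 == skip, as igt uses */

	srand(0x12345678); /* fixed seed so a failure reproduces */
	for (int i = 0; i < 1 << 16; i++) {
		struct drm_i915_getparam gp;
		int value = 0;

		gp.param = rand(); /* mostly invalid on purpose */
		gp.value = &value;

		ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
	}

	close(fd);
	return 0;
}

A real version would of course want to walk the whole ioctl table and
feed coverage back into the input selection.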
I don't agree that cutting them out of CI helps me at all in trying to
find bugs with an mtbf of over 24 hours. CI scales by adding more
machines, not by reducing tests. We need more diversity in our tests,
not less.
-Chris