[Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

Chris Wilson chris at chris-wilson.co.uk
Tue May 16 09:04:20 UTC 2017


On Tue, May 16, 2017 at 08:54:51AM +0000, Lofstedt, Marta wrote:
> 
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > Sent: Tuesday, May 16, 2017 11:21 AM
> > To: Lofstedt, Marta <marta.lofstedt at intel.com>
> > Cc: Daniel Vetter <daniel at ffwll.ch>; Martin Peres
> > <martin.peres at linux.intel.com>; intel-gfx at lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture
> > the state of the GPU
> > 
> > On Tue, May 16, 2017 at 07:42:51AM +0000, Lofstedt, Marta wrote:
> > > I hereby withdraw this patch.
> > > The idea was to know whether we were already wedged at the beginning of
> > > testing; that would give us information on how to interpret silly results, such
> > > as tests starting to get skipped and/or dmesg-warns/incompletes on
> > > tests that usually should be skipped.
> > > Also, we are planning to soon deploy a piglit.conf solution where testing
> > > will be terminated on wedged, so I agree that my test isn't really needed.
> > 
> > Not everything is broken by wedged; internally we just use that as an
> > indicator that GEM is hosed. KMS should still work, we must still be able to
> > drive the displays to show the error and keep the servers alive until the data
> > is saved (and hopefully degrade gracefully so that we don't have to interrupt
> > their immediate session).
> 
> It doesn't matter whether it is broken or not; if we are terminally wedged, the rest of the results may be silly. Look, for example, at CI_DRM_2612: fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all tests are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc, where the machine hangs. But that is a GEM test, so it should have been skipped, right? My conclusion from seeing this pattern multiple times is that after a terminal wedge, silly things can happen, i.e. we can't trust the results, and since we don't want silly bugs, the CI testing should be stopped.

The machine didn't hang; it was remotely killed because the run timed out.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

