[Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

Lofstedt, Marta marta.lofstedt at intel.com
Tue May 16 10:07:41 UTC 2017



> -----Original Message-----
> From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> Sent: Tuesday, May 16, 2017 12:48 PM
> To: Lofstedt, Marta <marta.lofstedt at intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>; Martin Peres
> <martin.peres at linux.intel.com>; intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture
> the state of the GPU
> 
> On Tue, May 16, 2017 at 09:43:52AM +0000, Lofstedt, Marta wrote:
> >
> >
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > Sent: Tuesday, May 16, 2017 12:04 PM
> > > To: Lofstedt, Marta <marta.lofstedt at intel.com>
> > > Cc: Daniel Vetter <daniel at ffwll.ch>; Martin Peres
> > > <martin.peres at linux.intel.com>; intel-gfx at lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a
> > > test to capture the state of the GPU
> > >
> > > On Tue, May 16, 2017 at 08:54:51AM +0000, Lofstedt, Marta wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > > Sent: Tuesday, May 16, 2017 11:21 AM
> > > > > To: Lofstedt, Marta <marta.lofstedt at intel.com>
> > > > > Cc: Daniel Vetter <daniel at ffwll.ch>; Martin Peres
> > > > > <martin.peres at linux.intel.com>; intel-gfx at lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add
> > > > > a test to capture the state of the GPU
> > > > >
> > > > > On Tue, May 16, 2017 at 07:42:51AM +0000, Lofstedt, Marta wrote:
> > > > > > > I hereby pull-out this patch.
> > > > > > > The idea of it was to know if we were already wedged at the
> > > > > > > beginning of testing, that would give us information on how to
> > > > > > > interpret silly results; such that test starting to get skipped
> > > > > > > and/or we got dmesg-warns/incomplete on tests that usually
> > > > > > > should be skipped.
> > > > > > > Also, we are planning to soon deploy a piglit.conf solution
> > > > > > > where testing will be terminated on wedged, so I agree that my
> > > > > > > test isn't really needed.
> > > > >
> > > > > Not everything is broken by wedged; internally we just use that
> > > > > as an indicator that GEM is hosed. KMS should still work, we
> > > > > must still be able to drive the displays to show the error and
> > > > > keep the servers alive until the data is saved (and hopefully
> > > > > > gracefully degrade that we don't have to interrupt their
> > > > > > immediate session).
> > > >
> > > > > It doesn't matter if it is broken or not, if we are terminally
> > > > > wedged the rest of the result may be silly. Look for example at
> > > > > CI_DRM_2612, the fi-elk-e7500 is wedged at
> > > > > igt@gem_busy@basic-hang-default, then all test are skipped until
> > > > > gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but
> > > > > it is a gem test so it should have been skipped, right. My
> > > > > conclusion from seeing this pattern multiple times is that after
> > > > > terminally wedged, silly things can happen, i.e. we can't trust
> > > > > the results, and since we don't want silly bugs, the CI testing
> > > > > should be stopped.
> > >
> > > > The machine didn't hang, it was remotely killed because the run
> > > > timed out.
> > How do you know that?
> 
> The dmesg is a stream of flip timeouts until we run out of total BAT runtime
> (12 minutes + some startup slack).
> -Chris

Then look at CI_DRM_2602: it wedged at igt@gem_busy@basic-hang-default and, after a lot of skipping, we got an incomplete result for another test, this time gem_exec_reloc@basic-gtt-cpu-noreloc.

So gem_exec_reloc@basic-cpu-gtt-noreloc and gem_exec_reloc@basic-gtt-cpu-noreloc are falsely getting blamed, and my conclusion is that this is due to the permanent wedging that started at gem_busy@basic-hang-default. To avoid bug reports against those two tests, my suggestion is to stop testing once we are terminally wedged.
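
To make the suggestion concrete, here is a minimal sketch of the "stop after terminally wedged" policy. It is not the planned piglit.conf mechanism and not existing piglit or IGT code: run_until_wedged, run_test and is_wedged are hypothetical names, and how the wedged state is actually detected is left to the caller.

```python
from typing import Callable, Iterable, List, Tuple


def run_until_wedged(
    tests: Iterable[str],
    run_test: Callable[[str], str],
    is_wedged: Callable[[], bool],
) -> List[Tuple[str, str]]:
    """Run tests in order and stop right after the first one that
    leaves the GPU wedged.

    run_test and is_wedged are hypothetical callbacks supplied by the
    caller; this sketch does not say how the wedged state is detected.
    """
    results: List[Tuple[str, str]] = []
    for name in tests:
        results.append((name, run_test(name)))
        if is_wedged():
            # Later results could not be trusted, so no further test is
            # run and none of them can be blamed for the wedge.
            break
    return results
```

With such a policy the run would end at the test that wedged the GPU, so later tests would not show up as skipped or incomplete in the results.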

> 
> --
> Chris Wilson, Intel Open Source Technology Centre

