[Piglit] [PATCH] Add dmesg option for reboot policy

Daniel Vetter daniel at ffwll.ch
Wed Nov 25 04:42:16 PST 2015


On Tue, Nov 24, 2015 at 02:10:34PM +0000, Emil Velikov wrote:
> Hi Yann,
> 
> The plan of having such a module is pretty sound.
> 
> That said I think that the actual policy/implementation could use some tweaks.
> 
> On 24 November 2015 at 12:14,  <yann.argotti at linux.intel.com> wrote:
> > From: Yann Argotti <yann.argotti at linux.intel.com>
> > Date: Tue, 24 Nov 2015 12:16:34 +0100
> >
> >  This adds a policy which advises the user when to reboot the system, to
> >  avoid noisy test results caused by the system becoming unstable and
> >  therefore to let testing continue successfully. To do this, a new Dmesg
> >  class is proposed which does not filter dmesg and monitors whether one of
> >  the following events occurs:
> >   - gpu reset failed (not just "gpu reset happened": that happens way too
> >     often, and many tests even provoke hangs intentionally)
> >   - gpu crash
> >   - Oops
> >   - BUG
> >   - lockdep splat that causes the locking validator to get disabled
> >  If one of these issues happens, piglit test execution is stopped (the test
> >  thread pool is terminated) and piglit exits with code 3 to signal that a
> >  reboot is advised. Test execution is then resumed, whether or not the
> >  system was rebooted, as usual with the "resume" command-line parameter.
> >
> Shouldn't one check for the above issues and trigger only when the GPU
> reset was not successful?
> Otherwise the idea of robustness, WebGL and friends goes down the drain.

This is exactly the idea. i915.ko prints different garbage into dmesg when
the gpu reset failed compared to when it succeeded. We can use that to
make a sensible decision for when to reboot.
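To make the "failed reset vs. recovered hang" distinction concrete, here is a minimal Python sketch (piglit's own language). It assumes each condition can be recognised by a substring in new dmesg lines. The regex patterns, the event names and the helpers `classify` and `reboot_advised` are hypothetical illustrations, not the patch's Dmesg class, and the real i915 message texts vary between kernel versions:

```python
import re

# Hypothetical patterns: the exact strings i915 and the core kernel print
# vary between kernel versions; these are placeholders, not the patch's list.
PATTERNS = {
    "gpu_hang": re.compile(r"GPU HANG"),
    "reset_failed": re.compile(r"[Ff]ailed to reset"),
    "oops": re.compile(r"\bOops\b"),
    "bug": re.compile(r"\bBUG\b"),
    "lockdep_off": re.compile(r"turning off the locking correctness validator"),
}

# A hang that was reset successfully is not by itself a reason to reboot.
REBOOT_EVENTS = {"reset_failed", "oops", "bug", "lockdep_off"}


def classify(lines):
    """Return the set of event names whose pattern matches any dmesg line."""
    return {name for line in lines
            for name, pattern in PATTERNS.items() if pattern.search(line)}


def reboot_advised(lines):
    """True if the new dmesg lines show something worse than a recovered hang."""
    return bool(classify(lines) & REBOOT_EVENTS)


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    recovered = ["[drm] GPU HANG: ecode 0:0x00000000, reason: Ring hung"]
    failed = recovered + ["[drm:i915_reset] *ERROR* Failed to reset chip: -5"]
    print(reboot_advised(recovered))  # False
    print(reboot_advised(failed))     # True
```

Matching on word boundaries (`\bBUG\b`) keeps ordinary words such as "DEBUG" from triggering a reboot.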

And I'd expect mesa testing to reboot unconditionally, even after a
successful reset, since we have a track record of slightly screwing up
reset handling for some obscure features (like mis-setting some workaround
(wa) bits). So for testing mesa with piglit (where we never expect a gpu
hang to happen), always rebooting is likely the right approach.
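The two policies can be expressed as a single switch. This is a sketch under stated assumptions: the event names and the `should_reboot`/`finish_run` helpers are hypothetical, and only the exit code 3 comes from the patch description. Setting `reboot_on_any_hang=True` models the mesa/piglit case described above; `False` models runs where hangs are provoked on purpose:

```python
import sys

# Exit code the patch uses to tell the caller that a reboot is advised.
REBOOT_EXIT_CODE = 3

# Events treated as fatal in every mode (names are hypothetical).
FATAL_EVENTS = {"reset_failed", "oops", "bug", "lockdep_off"}


def should_reboot(events, reboot_on_any_hang):
    """Decide on a reboot from a set of event names found in dmesg.

    reboot_on_any_hang=True: no hang is expected, and even a successful
    reset may leave the GPU misconfigured, so any hang means reboot.
    reboot_on_any_hang=False: hangs may be intentional, so only a failed
    reset (or an oops, BUG or lockdep splat) means reboot.
    """
    if events & FATAL_EVENTS:
        return True
    return reboot_on_any_hang and "gpu_hang" in events


def finish_run(events, reboot_on_any_hang=True):
    """Exit with the patch's reboot code if a reboot is advised."""
    if should_reboot(events, reboot_on_any_hang):
        sys.exit(REBOOT_EXIT_CODE)
```

Keeping the fatal set separate from the hang check means the stricter mesa mode only adds conditions; it never weakens the baseline policy.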

igt then has piles of testcases that intentionally hang the gpu, to
validate all the reset logic. So in total there's no gap, at least for
intel.
-Daniel


> After all I'd imagine that the kernel devs want to know when GPU reset
> does not work properly, although in a perfect world userspace should not
> be able to lock up/crash the GPU.
> 
> From a quick look at the patterns used, it seems that we'll trigger on
> any BUG/Oops, regardless of the source: gpu, wifi, fs or other driver.
> This is bound to cause a lot of false positives, especially if the human
> running these tests does not read through the actual messages.
> 
> These are topics for discussion, rather than a request for changing
> the current patch.
> 
> Cheers,
> Emil
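Emil's concern about BUG/Oops reports from unrelated drivers could be narrowed by counting only reports that implicate a GPU module. A rough Python sketch, assuming a report runs from an "Oops"/"BUG:" line to the kernel's "---[ end trace" marker and that module symbols in the backtrace carry a "[module]" suffix; the module list and all function names are hypothetical:

```python
import re

# Assumed report boundaries: starts at an "Oops"/"BUG:" line, ends at the
# kernel's "---[ end trace ... ]---" marker.
START = re.compile(r"\bOops\b|\bBUG:")
END = re.compile(r"---\[ end trace")
# Hypothetical list of modules whose crashes should count as GPU related.
# Bracketed names match backtrace symbols such as "func+0x10/0x20 [i915]",
# not the unbracketed "Modules linked in:" list.
GPU_MODULES = ("[i915]", "[drm]", "[drm_kms_helper]")


def split_reports(lines):
    """Group dmesg lines into oops/BUG reports (lists of lines)."""
    reports, current = [], None
    for line in lines:
        if current is None:
            if START.search(line):
                current = [line]
        else:
            current.append(line)
            if END.search(line):
                reports.append(current)
                current = None
    if current is not None:  # report cut off before its end marker
        reports.append(current)
    return reports


def gpu_related(report):
    """True if any line of the report names a symbol from a GPU module."""
    return any(mod in line for line in report for mod in GPU_MODULES)


def gpu_reports(lines):
    """Return only the oops/BUG reports that implicate a GPU module."""
    return [r for r in split_reports(lines) if gpu_related(r)]
```

A wifi or filesystem oops would then still be visible in dmesg but would not, on its own, stop the piglit run.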

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

