[Intel-gfx] [PATCH] drm/i915: Reinstate error level message for non-simulated gpu hangs

Chris Wilson chris at chris-wilson.co.uk
Wed Oct 1 11:10:56 CEST 2014


On Wed, Oct 01, 2014 at 01:52:20AM -0700, Kenneth Graunke wrote:
> On Wednesday, October 01, 2014 10:29:07 AM Daniel Vetter wrote:
> > On Wed, Oct 01, 2014 at 09:19:50AM +0100, Chris Wilson wrote:
> > > On Wed, Oct 01, 2014 at 10:13:00AM +0200, Daniel Vetter wrote:
> > > > On Wed, Oct 01, 2014 at 07:28:39AM +0100, Chris Wilson wrote:
> > > > > On Wed, Oct 01, 2014 at 01:04:19AM +0200, Daniel Vetter wrote:
> > > > > > This seems to have been accidentally lost in
> > > > > > 
> > > > > > commit be62acb4cce1389a28296852737e3917d9cc5b25
> > > > > > Author: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > > > > > Date:   Fri Aug 30 16:19:28 2013 +0300
> > > > > > 
> > > > > >     drm/i915: ban badly behaving contexts
> > > > > > 
> > > > > > Without this real gpu hangs only log output at info level, which gets
> > > > > > filtered away by piglit's testrunner.
> > > > > 
> > > > > A successful GPU hang is not an error. Might be a warn or a notice, but
> > > > > it certainly isn't a driver error.
> > > > 
> > > > Well not of the kernel driver, but might very well be a bug in the
> > > > userspace driver. With this piglit marks tests that hung the gpu as
> > > > dmesg-fail, without this they might even pass. Ken raised this on irc and
> > > > I agree that it's a must-have feature for developers that their testsuite
> > > > can tell them when stuff broke. Providing this some other way is a lot more
> > > > work and imo should be done in a separate patch, this here is just the
> > > > minimal fix for this regression.
> > > 
> > > I strongly disagree that we should be working around self-imposed
> > > limitations of the test suite by making users believe their kernel is
> > > broken.
> > 
> > So what else should piglit do then?
> > -Daniel
> 
> Your GPU hanging is clearly more severe than "info" - it may impact your system stability, and likely represents a bug somewhere in the graphics drivers (whether kernel or userspace).  I think we all agree on that.
> 
> Piglit runs "dmesg --level emerg,alert,crit,err,warn,notice", which covers everything except "info" and "debug".  So anything other than info/debug would be just fine.

If we are happy with KERN_NOTICE (normal, but significant condition), that
is what I would suggest. Actually, we should make the GPU hang detection
itself the notice (to aid regular users). But that probably runs into
complications with the simulated hangs in igt causing a WARN test
failure - but again, I'd rather our interface with the user (and
user-facing bug reporting tools) be correct.
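
To make the level argument concrete, here is a rough sketch of the filter
Ken quoted. It is not Piglit's code: it only models the level list from the
dmesg invocation above, and the names visible_to_piglit and hidden_levels
are made up for illustration.

```python
# Syslog severity names as accepted by `dmesg --level`, most to least severe.
LEVELS = ["emerg", "alert", "crit", "err", "warn", "notice", "info", "debug"]

# The level list quoted in this thread from Piglit's dmesg invocation.
PIGLIT_LEVELS = "emerg,alert,crit,err,warn,notice".split(",")


def visible_to_piglit(level):
    """Return True if a message logged at `level` passes the quoted filter.

    Hypothetical helper: it models the filter only, not Piglit itself.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown log level: {level}")
    return level in PIGLIT_LEVELS


def hidden_levels():
    """List the levels that the quoted filter drops."""
    return [lvl for lvl in LEVELS if not visible_to_piglit(lvl)]


if __name__ == "__main__":
    for lvl in LEVELS:
        status = "seen" if visible_to_piglit(lvl) else "filtered"
        print(f"{lvl:7s} {status}")
```

Under that assumption a hang reported at notice would reach the test runner
just as one at err would; only info and debug are dropped.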
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


