[Intel-gfx] [PATCH] drm/i915: Ignore -EIO from __i915_wait_request() during mmio flip

Chris Wilson chris at chris-wilson.co.uk
Tue Jun 16 05:10:33 PDT 2015


On Mon, Jun 15, 2015 at 06:34:51PM +0200, Daniel Vetter wrote:
> On Thu, Jun 11, 2015 at 09:01:08PM +0100, Chris Wilson wrote:
> > On Thu, Jun 11, 2015 at 07:14:28PM +0300, ville.syrjala at linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > 
> > > When the GPU gets reset __i915_wait_request() returns -EIO to the
> > > mmio flip worker. Currently we WARN whenever we get anything other
> > > than 0. Ignore the -EIO too since it's a perfectly normal thing
> > > to get during a GPU reset.
> > 
> > Nak. I consider it is a bug in __i915_wait_request(). I am discussing
> > with Thomas Elf how to fix this wrt the next generation of individual
> > ring resets.
> 
> We should only get an -EIO if the gpu is truly gone, but an -EAGAIN when
> the reset is ongoing. Neither is currently handled. For lockless users we
> probably want a version of wait_request which just dtrt (of waiting for
> the reset handler to complete without trying to grab the mutex and then
> returning). Or some other means of retrying.
> 
> Returning -EIO from the low-level wait function still seems appropriate,
> but callers need to eat/handle it appropriately. WARN_ON isn't it here
> ofc.

Bleh, a few years ago you decided not to take the EIO handling along the
call paths that don't care.

I disagree. There are two classes of callers, those that care about
EIO/EAGAIN and those that simply want to know when the GPU is no longer
processing that request. That latter class is still popping up in
bugzilla with frozen displays. For the former, we actually only care
about backoff if we are holding the mutex - and that is only required
for EAGAIN. The only user that cares about EIO is throttle().
 
> Also we have piles of flip vs. gpu hang testcases ... do they fail to
> provoke this or is this another case of bug lost in bugzilla?

We have a few bugs every year for incorrect EIOs returned by
wait_request, but none for this case.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list