[Intel-gfx] [PATCH 03/15] drm/i915: Upgrade execbuffer fail after resume failure to EIO

Fri Aug 8 11:46:07 CEST 2014

On Fri, Aug 08, 2014 at 10:17:10AM +0100, Chris Wilson wrote:
> On Wed, Aug 06, 2014 at 10:39:16AM +0200, Daniel Vetter wrote:
> > On Wed, Aug 06, 2014 at 09:12:32AM +0100, Chris Wilson wrote:
> > > On Wed, Aug 06, 2014 at 09:56:45AM +0200, Daniel Vetter wrote:
> > > > On Tue, Aug 05, 2014 at 07:51:14AM -0700, Rodrigo Vivi wrote:
> > > > > From: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > 
> > > > > If we try to execute on a known ring, but it has failed to be
> > > > > initialised correctly, report that the GPU is hung rather than the
> > > > > command invalid. This leaves us reporting EINVAL only if the user
> > > > > requests execution on a ring that is not supported by the device.
> > > > > 
> > > > > This should prevent UXA from getting stuck in a null render loop after a
> > > > > failed resume.
> > > > > 
> > > > > v2 (Rodrigo): Fix conflict and add VCS2 ring and
> > > > >    	      s/intel_ring_buffer/intel_engine_cs.
> > > > > 
> > > > > Reported-by: Jiri Kosina <jikos at jikos.cz>
> > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> > > > 
> > > > This isn't required any more, see
> > > > 
> > > > commit 074c6adaf4e7d1423d373bd5d1afc20b683cb4d0
> > > > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Date:   Wed Apr 9 09:19:43 2014 +0100
> > > > 
> > > >     drm/i915: Mark device as wedged if we fail to resume
> > > > 
> > > > for the alternate merged patch.
> > > 
> > > Hmm, there is still a path that ends here, but the example above is
> > > already fixed as you say.
> > 
> > We have the EIO check both in the resume and driver load paths. Which
> > other path are we missing?
> 
> The GPU may be set to wedged, but this check in execbuffer occurs before
> we check for a wedged GPU.

But we no longer free the ring structures over suspend/resume, so at least
the commit message is outdated.

I wonder whether the easier fix wouldn't be to continue ring init if we
get an -EIO.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
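
The ordering issue raised in the thread can be illustrated with a small standalone sketch. It is not the i915 code: every name below (`sketch_device`, `sketch_ring`, `execbuf_ring_check_first`, `execbuf_patched`) is hypothetical, and the structures are reduced to the three flags the discussion turns on. It assumes only what the thread states: the ring check in execbuffer runs before the wedged check, and the patch would report EIO rather than EINVAL for a known ring that failed to initialise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real i915 structures. */
enum { SKETCH_NUM_RINGS = 5 };

struct sketch_ring {
	bool supported;   /* the device has this ring */
	bool initialized; /* ring init completed successfully */
};

struct sketch_device {
	bool wedged; /* GPU has been marked as hung */
	struct sketch_ring ring[SKETCH_NUM_RINGS];
};

/*
 * Ordering as described in the thread: the ring check comes first, so a
 * known but uninitialised ring yields -EINVAL even when the GPU is wedged.
 */
static int execbuf_ring_check_first(const struct sketch_device *dev, int id)
{
	assert(dev != NULL);

	if (id < 0 || id >= SKETCH_NUM_RINGS || !dev->ring[id].supported)
		return -EINVAL; /* ring not supported by the device */
	if (!dev->ring[id].initialized)
		return -EINVAL; /* returned before the wedged check is reached */
	if (dev->wedged)
		return -EIO;
	return 0;
}

/*
 * Behaviour the patch description asks for: -EINVAL only for a ring the
 * device does not support, -EIO for a known ring that failed to initialise.
 */
static int execbuf_patched(const struct sketch_device *dev, int id)
{
	assert(dev != NULL);

	if (id < 0 || id >= SKETCH_NUM_RINGS || !dev->ring[id].supported)
		return -EINVAL;
	if (!dev->ring[id].initialized)
		return -EIO; /* report a hung GPU, not an invalid command */
	if (dev->wedged)
		return -EIO;
	return 0;
}
```

With a supported ring that failed to initialise on a wedged device, the first function returns -EINVAL and the second returns -EIO. Both return -EINVAL for a ring index the device does not have.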