[Intel-gfx] [PATCH] drm/i915: Upgrade execbuffer fail after resume failure to EIO

Chris Wilson chris at chris-wilson.co.uk
Tue Mar 25 17:29:55 CET 2014


On Tue, Mar 25, 2014 at 04:36:05PM +0100, Daniel Vetter wrote:
> On Tue, Mar 25, 2014 at 08:15:54AM +0000, Chris Wilson wrote:
> > On Tue, Mar 25, 2014 at 09:07:00AM +0100, Daniel Vetter wrote:
> > > On Tue, Mar 25, 2014 at 9:03 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > > If we try to execute on a known ring, but it has failed to be
> > > > initialised correctly, report that the GPU is hung rather than the
> > > > command invalid. This leaves us reporting EINVAL only if the user
> > > > requests execution on a ring that is not supported by the device.
> > > >
> > > > This should prevent UXA from getting stuck in a null render loop after a
> > > > failed resume.
> > > >
> > > > Reported-by: Jiri Kosina <jikos at jikos.cz>
> > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > 
> > > Feels a bit like duct-tape ... Shouldn't we instead stop tearing down
> > > ringbuffer structures over suspend/resume? And maybe the same for init
> > > with your patch applied.
> > 
> > Even if we did, we would still end up with g45 failing to restart
> > the ring at random, so we need some w/a.
> >  
> > > Or we simply check for wedged first thing in execbuf ...
> > 
> > See the first 2 patches ;-) The first is actually essential as we have
> > no other guard against writing into the non-existent ring.
> > 
> > I thought about doing that. However, I prefer doing arg validation
> > first i.e. get all (or as much as is feasible) of the EINVAL checks out
> > of the way so that we avoid touching hardware or leaking any information
> > to a malicious client. The problem in this case is where is not actually
> > an invalid arg.
> > 
> > Note that we will detect the EIO later before touching hardware (so long
> > as the first two patches are in place).
> 
> Yeah I've seen the other patches. I think we should try to keep all the
> ring structures around even when the hw init failed. I've made some feeble
> attempts a while ago to split the structure init from the hw init stuff,
> but kinda never fully materialized ...
> 
> Imo if our set of valid rings semi-randomly changes at runtime even,
> that's not good.

Agreed, but sadly we can't trust hardware to always work, and we need
something to prevent explosions. I quite like the idea of marking the
GPU wedged if hw init fails so that we lose acceleration but keep
modesetting around.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list