[Intel-gfx] [PATCH] drm/i915: Restore context and pd for ringbuffer submission after reset

Chris Wilson chris at chris-wilson.co.uk
Sat Feb 4 19:46:16 UTC 2017


On Sat, Feb 04, 2017 at 07:37:13PM +0000, Chris Wilson wrote:
> Following a reset, the context and page directory registers are lost.
> However, the queue of requests that we resubmit after the reset may
> depend upon them - the registers are restored from a context image, but
> that restore may be inhibited and may simply be absent from the request
> if it was in the middle of a sequence using the same context. If we
> prime the CCID/PD registers with the first request in the queue (even
> for the hung request), we prevent invalid memory access for the
> following requests (and continually hung engines).
> 
> Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> ---
> 
> This could do with going to stable but requires a few odds and ends, such
> as dma_fence_set_error(). Oh well, fortunately it is not as bad as it might
> seem since these registers are restored from the context - but that then
> requires a mesa context to reset the GPU state (as fortunately we called
> MI_SET_CONTEXT at the start of every batch!), but any other request in
> the meantime will likely hang again.
> 
> (Also I left gen8/ringbuffer reset_hw as an exercise for the reader)
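
To make the failure mode in the quoted description concrete, here is a
rough toy model in C. It is only a sketch and not the patch itself: every
name in it (toy_engine, toy_request, toy_prime, toy_replay, toy_demo) is
hypothetical and does not correspond to any i915 symbol, and the two
integer fields merely stand in for the CCID and PD registers. It assumes
that a request in the middle of a same-context sequence carries no
restore of its own, as the description says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are real i915 symbols. */
struct toy_request {
	uint32_t ctx_id;     /* context the request expects to run in */
	uint32_t pd_addr;    /* page directory the request expects */
	bool emits_restore;  /* true if the request reloads ctx/pd itself */
};

struct toy_engine {
	uint32_t ccid;       /* stands in for the CCID register */
	uint32_t pd;         /* stands in for the PD register */
};

/* A reset loses the register contents. */
static void toy_reset(struct toy_engine *e)
{
	e->ccid = 0;
	e->pd = 0;
}

/* Prime the registers from the first request in the queue. */
static void toy_prime(struct toy_engine *e,
		      const struct toy_request *rq, size_t count)
{
	if (count == 0)
		return;
	e->ccid = rq[0].ctx_id;
	e->pd = rq[0].pd_addr;
}

/* Replay the queue; count requests that ran with the wrong registers. */
static size_t toy_replay(struct toy_engine *e,
			 const struct toy_request *rq, size_t count)
{
	size_t bad = 0;

	assert(e != NULL);
	for (size_t i = 0; i < count; i++) {
		if (rq[i].emits_restore) {
			e->ccid = rq[i].ctx_id;
			e->pd = rq[i].pd_addr;
		}
		if (e->ccid != rq[i].ctx_id || e->pd != rq[i].pd_addr)
			bad++;
	}
	return bad;
}

/* Reset, optionally prime, then replay a small hypothetical queue. */
size_t toy_demo(bool prime)
{
	/* Two mid-sequence requests in context 1, then a switch to 2. */
	const struct toy_request queue[] = {
		{ .ctx_id = 1, .pd_addr = 0x1000, .emits_restore = false },
		{ .ctx_id = 1, .pd_addr = 0x1000, .emits_restore = false },
		{ .ctx_id = 2, .pd_addr = 0x2000, .emits_restore = true },
	};
	const size_t count = sizeof(queue) / sizeof(queue[0]);
	struct toy_engine engine;

	toy_reset(&engine);
	if (prime)
		toy_prime(&engine, queue, count);
	return toy_replay(&engine, queue, count);
}
```

In this model toy_demo(false) counts two requests replayed against stale
registers, while toy_demo(true) counts none, which is the effect the
priming step is meant to have.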

I'm also puzzled as to how this escaped igt; the fence test should have
tried to write through the aliasing ppgtt without a context restore
(i.e. into randomness) following the hang. Weird. On the positive side,
it may mean the impact isn't as large as I think it should be.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

