[Intel-gfx] [PATCH 2/2] drm/i915: Reorder hw init to avoid executing with invalid context/mm state

Ville Syrjälä ville.syrjala at linux.intel.com
Wed Jan 21 03:21:33 PST 2015


On Wed, Jan 21, 2015 at 10:12:12AM +0000, Chris Wilson wrote:
> On Fri, Dec 05, 2014 at 07:03:42PM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 05, 2014 at 04:26:23PM +0000, Chris Wilson wrote:
> > > On Fri, Dec 05, 2014 at 04:53:49PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Dec 05, 2014 at 02:38:46PM +0000, Chris Wilson wrote:
> > > > > On Fri, Dec 05, 2014 at 04:31:35PM +0200, Ville Syrjälä wrote:
> > > > > > On Fri, Dec 05, 2014 at 02:15:22PM +0000, Chris Wilson wrote:
> > > > > > > Currently we initialise the rings, add the first context switch to the
> > > > > > > ring and execute our golden state then enable (aliasing or full) ppgtt.
> > > > > > > However, as we enable ppgtt using direct MMIO but load the PD using
> > > > > > > MI_LRI, we end up executing the context switch and golden render state
> > > > > > > with an invalid PD generating page faults. To solve this issue, first do
> > > > > > > the ppgtt PD setup, then set the default context and write the commands
> > > > > > > to run the render state into the ring, before we activate the ring. This
> > > > > > > allows us to be sure that the register state is valid before we begin
> > > > > > > execution.
> > > > > > > 
> > > > > > > This was spotted when writing the seqno/request conversion, but only with
> > > > > > > the ERROR capture did I realise that it was a necessity now.
> > > > > > > 
> > > > > > > RFC: cleanup the error handling in i915_gem_init_hw.
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++----------
> > > > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 ++++++---
> > > > > > >  2 files changed, 16 insertions(+), 13 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > > index c1c11418231b..c13842d3cbc9 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > > > > @@ -4796,15 +4796,15 @@ i915_gem_init_hw(struct drm_device *dev)
> > > > > > >  	 */
> > > > > > >  	init_unused_rings(dev);
> > > > > > >  
> > > > > > > -	for_each_ring(ring, dev_priv, i) {
> > > > > > > -		ret = ring->init_hw(ring);
> > > > > > > -		if (ret)
> > > > > > > -			return ret;
> > > > > > > -	}
> > > > > > > -
> > > > > > >  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
> > > > > > >  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
> > > > > > 
> > > > > > This is going to assume ring->head/tail are already valid?
> > > > > 
> > > > > We write into the ring obj, not the ring itself, which should be setup
> > > > > during the various intel_init_engine, i.e. the backing storage is
> > > > > independent of the actual registers.
> > > > 
> > > > I mean the software shadows, not the registers themselves. When the GPU
> > > > hangs I expect ring->head != ring->tail. So what makes those two identical
> > > > again after the GPU reset?
> > > 
> > > Why would they be equal after reset?
> > 
> > They wouldn't. That's my whole point.
> > 
> > > At the moment, we discard all
> > > outstanding requests which makes them equal.
> > 
> > OK, so you're saying somewhere we set them to some sane values. But
> > I don't see any obvious code for that. The obvious code was in
> > init_ring_common(), but you just removed said code in this patch.
> 
> When we clear the request list, we should be updating the position of
> last known head, which should then be used to reset the ring->head
> (which in turn is used to reprogram the hw on reset/enable).

It's been a while, but IIRC the problem as I saw it was that we don't
call i915_gem_retire_requests_ring() on reset and hence never update
ring->last_retired_head. And even if we did, we don't seem to call
intel_ring_update_space() until the end of init_ring_common(), so we
wouldn't do the ring->head = ring->last_retired_head assignment either.
Thus ring->head would be left at its pre-reset value.
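
To illustrate the concern, here is a toy model in C. It is only a sketch:
struct toy_ring and the toy_* functions are made up for this mail and are
not the driver's code, and the space calculation is simplified. It shows
that if nothing records a retired head before the space update runs, the
software head keeps its pre-reset value.

```c
#include <assert.h>

/* Toy stand-in for the driver's software ring state; not the real struct. */
struct toy_ring {
	int head;              /* software shadow of the hw head */
	int tail;              /* software shadow of the hw tail */
	int last_retired_head; /* -1 means nothing retired since last update */
	int size;
	int space;
};

/* Models the head = last_retired_head assignment discussed above. */
static void toy_ring_update_space(struct toy_ring *r)
{
	assert(r->size > 0);
	if (r->last_retired_head != -1) {
		r->head = r->last_retired_head;
		r->last_retired_head = -1;
	}
	/* Simplified free-space computation (assumption, no reserved slack). */
	r->space = r->head - r->tail;
	if (r->space <= 0)
		r->space += r->size;
}

/* Models retiring every outstanding request: last known head reaches tail. */
static void toy_retire_all(struct toy_ring *r)
{
	r->last_retired_head = r->tail;
}

/*
 * Models a reset. retire_first selects whether outstanding requests are
 * retired before the space update. Returns the resulting software head.
 */
static int toy_head_after_reset(int head, int tail, int retire_first)
{
	struct toy_ring r = { head, tail, -1, 4096, 0 };

	if (retire_first)
		toy_retire_all(&r);
	toy_ring_update_space(&r);
	return r.head;
}
```

With a hung ring at head=64, tail=256, toy_head_after_reset(64, 256, 0)
leaves head at 64, while toy_head_after_reset(64, 256, 1) moves it to 256
so that head == tail again.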

-- 
Ville Syrjälä
Intel OTC

