[Intel-gfx] [PATCH] drm/i915/execlists: Clear STOP_RING bit on reset
Chris Wilson
chris at chris-wilson.co.uk
Tue Sep 10 10:02:34 UTC 2019
Quoting Mika Kuoppala (2019-09-10 10:54:43)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > Quoting Mika Kuoppala (2019-09-10 10:31:05)
> >> Chris Wilson <chris at chris-wilson.co.uk> writes:
> >>
> >> > During reset, we try to ensure no forward progress of the CS prior to
> >> > the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
> >> > register is context saved and do we end up in the odd situation where we
> >> > save the STOP_RING bit and so try to stop the engine again immediately
> >> > upon resume. This is quite unexpected and causes us to complain about an
> >> > early CS completion event!
> >>
> >> The completion event is a product of resume with a stop set?
> >
> > A completion event is the product of STOP_RING. Whether it is the
> > completion event that we keep failing on...
> >
> >> If my memory serves me well, we have fought this before.
> >
> > Exactly. We've reduced the frequency of when we apply the STOP_RING, but
> > have not eliminated it.
> >
> >> But I have still missing pieces. Why would we not want to
> >> set this for all contexts primed for execution? In gen8 too.
> >
> > It's not in the gen8 context, afaict. I searched the context image for an
> > LRI with the RING_MI_MODE register:
> > https://patchwork.freedesktop.org/patch/329919/?series=66468&rev=1
> >
> >> I mean, queuing context with a ring stopped just doesn't
> >> sound right on any gen.
> >
> > We clear the STOP_RING in the register on resume just in case, and that
> > is being flagged on Icelake (which I put down to the reset not being very
> > thorough!). The remaining question was whether we were restoring it from
> > the context image.
>
> Hmm yeah, I was confused of the sequence of setup. With that cleared
> and with the context state being worked on, perhaps we can add
> sanity checkers to the queuing path.
Yeah, I think there's definitely some fun we can have here. At the very
least a check that CTX_RING_START == ring->start would be a good
sanitycheck.
> Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
As always, the only way to be sure if this changes the mtbf is to let is
soak. One day I may be able to run my own extended testing on icl!
-Chris
More information about the Intel-gfx
mailing list