[Intel-gfx] [PATCH] drm/i915: fix SFC reset flow
Chris Wilson
chris at chris-wilson.co.uk
Tue Sep 17 19:49:19 UTC 2019
Quoting Daniele Ceraolo Spurio (2019-09-17 20:45:02)
>
>
> On 9/17/2019 11:57 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-09-17 19:36:35)
> >>
> >> On 9/17/2019 12:57 AM, Chris Wilson wrote:
> >>> Quoting Daniele Ceraolo Spurio (2019-09-16 22:41:04)
> >>>> Our assumption that the we can ask the HW to lock the SFC even if not
> >>>> currently in use does not match the HW commitment. The expectation from
> >>>> the HW is that SW will not try to lock the SFC if the engine is not
> >>>> using it and if we do that the behavior is undefined; on ICL the HW
> >>>> ends up to returning the ack and ignoring our lock request, but this is
> >>>> not guaranteed and we shouldn't expect it going forward.
> >>>>
> >>>> Reported-by: Owen Zhang <owen.zhang at intel.com>
> >>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> >>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >>>> ---
> >>>> @@ -366,10 +368,13 @@ static u32 gen11_lock_sfc(struct intel_engine_cs *engine)
> >>>> sfc_forced_lock_ack_bit,
> >>>> sfc_forced_lock_ack_bit,
> >>>> 1000, 0, NULL)) {
> >>>> - DRM_DEBUG_DRIVER("Wait for SFC forced lock ack failed\n");
> >>>> + /* did we race the unlock? */
> >>>> + if (intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit)
> >>>> + DRM_ERROR("Wait for SFC forced lock ack failed\n");
> >>> What's our plan if this *ERROR* is ever triggered?
> >>>
> >>> If it remains do nothing and check the logs on death, then it remains
> >>> just a debug splat. If there is a plan to actually do something to
> >>> handle the error, do it!
> >>> -Chris
> >> AFAIU the only thing we can do is escalate to full gpu reset. However,
> >> the probability of this failing should be next to non-existent (only one
> >> engine can use the SFC at any time so there is no lock contention), so
> >> I'm not convinced the fallback is worth the effort. The error is still
> >> useful IMO to catch unexpected behavior on new platforms, as it happened
> >> in this case with the media team reporting seeing this message on gen12
> >> with the previous behavior. This said, I'm happy to add the extra logic
> >> if you believe it is worth it.
> > We've see this message on every icl run!
> > -Chris
>
> I've never noticed it, which tests are hitting it? My understanding from
> what the HW team said is that on ICL the ack will always come back (even
> if it is not part of the "official" SW/HW interface) and the HW tweak
> that stops that is a gen12 change. Something else might be wrong is this
> is firing off in our ICL CI, also because I don't think we have any test
> case that actually uses the SFC, do we?
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6911/fi-icl-u2/igt@i915_selftest@live_hangcheck.html
All icl, live_hangcheck or live_reset, for as long as I can remember.
-Chris
More information about the Intel-gfx
mailing list