[Intel-gfx] [PATCH] drm/i915: fix SFC reset flow

Tue Sep 17 18:57:48 UTC 2019

Quoting Daniele Ceraolo Spurio (2019-09-17 19:36:35)
> 
> 
> On 9/17/2019 12:57 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-09-16 22:41:04)
> >> Our assumption that the we can ask the HW to lock the SFC even if not
> >> currently in use does not match the HW commitment. The expectation from
> >> the HW is that SW will not try to lock the SFC if the engine is not
> >> using it and if we do that the behavior is undefined; on ICL the HW
> >> ends up to returning the ack and ignoring our lock request, but this is
> >> not guaranteed and we shouldn't expect it going forward.
> >>
> >> Reported-by: Owen Zhang <owen.zhang at intel.com>
> >> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >> ---
> >> @@ -366,10 +368,13 @@ static u32 gen11_lock_sfc(struct intel_engine_cs *engine)
> >>                                           sfc_forced_lock_ack_bit,
> >>                                           sfc_forced_lock_ack_bit,
> >>                                           1000, 0, NULL)) {
> >> -               DRM_DEBUG_DRIVER("Wait for SFC forced lock ack failed\n");
> >> +               /* did we race the unlock? */
> >> +               if (intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit)
> >> +                       DRM_ERROR("Wait for SFC forced lock ack failed\n");
> > What's our plan if this *ERROR* is ever triggered?
> >
> > If it remains do nothing and check the logs on death, then it remains
> > just a debug splat. If there is a plan to actually do something to
> > handle the error, do it!
> > -Chris
> 
> AFAIU the only thing we can do is escalate to full gpu reset. However, 
> the probability of this failing should be next to non-existent (only one 
> engine can use the SFC at any time so there is no lock contention), so 
> I'm not convinced the fallback is worth the effort. The error is still 
> useful IMO to catch unexpected behavior on new platforms, as it happened 
> in this case with the media team reporting seeing this message on gen12 
> with the previous behavior. This said, I'm happy to add the extra logic 
> if you believe it is worth it.

We've see this message on every icl run!
-Chris