[Intel-gfx] [PATCH] drm/i915: fix SFC reset flow

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Tue Sep 17 19:45:02 UTC 2019



On 9/17/2019 11:57 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-09-17 19:36:35)
>>
>> On 9/17/2019 12:57 AM, Chris Wilson wrote:
>>> Quoting Daniele Ceraolo Spurio (2019-09-16 22:41:04)
>>>> Our assumption that the we can ask the HW to lock the SFC even if not
>>>> currently in use does not match the HW commitment. The expectation from
>>>> the HW is that SW will not try to lock the SFC if the engine is not
>>>> using it and if we do that the behavior is undefined; on ICL the HW
>>>> ends up to returning the ack and ignoring our lock request, but this is
>>>> not guaranteed and we shouldn't expect it going forward.
>>>>
>>>> Reported-by: Owen Zhang <owen.zhang at intel.com>
>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>>> ---
>>>> @@ -366,10 +368,13 @@ static u32 gen11_lock_sfc(struct intel_engine_cs *engine)
>>>>                                            sfc_forced_lock_ack_bit,
>>>>                                            sfc_forced_lock_ack_bit,
>>>>                                            1000, 0, NULL)) {
>>>> -               DRM_DEBUG_DRIVER("Wait for SFC forced lock ack failed\n");
>>>> +               /* did we race the unlock? */
>>>> +               if (intel_uncore_read_fw(uncore, sfc_usage) & sfc_usage_bit)
>>>> +                       DRM_ERROR("Wait for SFC forced lock ack failed\n");
>>> What's our plan if this *ERROR* is ever triggered?
>>>
>>> If it remains do nothing and check the logs on death, then it remains
>>> just a debug splat. If there is a plan to actually do something to
>>> handle the error, do it!
>>> -Chris
>> AFAIU the only thing we can do is escalate to full gpu reset. However,
>> the probability of this failing should be next to non-existent (only one
>> engine can use the SFC at any time so there is no lock contention), so
>> I'm not convinced the fallback is worth the effort. The error is still
>> useful IMO to catch unexpected behavior on new platforms, as it happened
>> in this case with the media team reporting seeing this message on gen12
>> with the previous behavior. This said, I'm happy to add the extra logic
>> if you believe it is worth it.
> We've see this message on every icl run!
> -Chris

I've never noticed it, which tests are hitting it? My understanding from 
what the HW team said is that on ICL the ack will always come back (even 
if it is not part of the "official" SW/HW interface) and the HW tweak 
that stops that is a gen12 change. Something else might be wrong is this 
is firing off in our ICL CI, also because I don't think we have any test 
case that actually uses the SFC, do we?

Daniele



More information about the Intel-gfx mailing list