[Intel-gfx] [PATCH 3/3] drm/i915/guc: sleep on enable
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Mon Oct 15 21:41:47 UTC 2018
On 15/10/18 12:23, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2018-10-15 19:33:26)
>>
>>
>> On 14/10/18 10:02, Chris Wilson wrote:
>>> Seems like there's a missing ack before the guc is ready for commands.
>>>
>>
>> I'm assuming you're running without HuC since the HuC auth H2G comes
>> before this one.
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4981/fi-apl-guc/boot0.log
> i915.enable_guc=3
> <7>[ 6.877175] [drm:intel_uc_fw_fetch [i915]] GuC fw fetch i915/bxt_guc_ver9_29.bin
> <7>[ 6.877268] [drm:intel_uc_fw_fetch [i915]] GuC fw fetch PENDING
> <7>[ 6.879780] [drm:intel_uc_fw_fetch [i915]] GuC fw size 146432 ptr 000000003fdb20d0
> <7>[ 6.879869] [drm:intel_uc_fw_fetch [i915]] GuC fw version 9.29 (wanted 9.29)
> <7>[ 6.880425] [drm:intel_uc_fw_fetch [i915]] GuC fw fetch SUCCESS
> <7>[ 6.880723] [drm:intel_uc_fw_fetch [i915]] HuC fw fetch i915/bxt_huc_ver01_07_1398.bin
> <7>[ 6.880807] [drm:intel_uc_fw_fetch [i915]] HuC fw fetch PENDING
> <7>[ 6.882529] [drm:intel_uc_fw_fetch [i915]] HuC fw size 154432 ptr 000000000aad61c4
> <7>[ 6.882621] [drm:intel_uc_fw_fetch [i915]] HuC fw version 1.7 (wanted 1.7)
> <7>[ 6.883098] [drm:intel_uc_fw_fetch [i915]] HuC fw fetch SUCCESS
>
>> What we're polling to indicate load completion (GS_UKERNEL_READY) is
>> definitely what the firmware uses to signal readiness. The other check
>> we do (GS_MIA_CORE_STATE) should only apply for rc6 scenarios. From what
>> I can see from the firmware code, all the initialization steps are done
>> before GS_UKERNEL_READY is written to the status register so there
>> shouldn't be any missing acks in principle.
>
>> Is the GuC returning anything in the scratch 0 register? It should be
>> printed out by the H2G error message. The value of the status register
>> (0xc000) could also provide interesting debug info.
>
> When do you want to know? As you are probably aware, our first
> indication of failure is from wait_for_guc_preempt_report() and
> the wait there on report->report_return_status timing out.
>
> Michel asked what was the value when it timed out, but alas apl-guc was
> not available for comment.
> -Chris
>
I think I found the root cause of the issue (with the help of one of the
GuC devs). The GuC suspend/resume protocol requires a couple of extra
steps to make sure GuC is done managing its state; waiting on the H2G
return is not enough. Since we're not doing those steps correctly, GuC
is still in the middle of the resume process when the preemption request
arrives, which causes the failure. A patch to fix this is incoming.
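To illustrate the general shape of such a handshake, here is a minimal
sketch. It is not the actual patch and not the real GuC interface: every
name and value below is hypothetical, and the "firmware" is simulated. The
host marks a status word as pending, sends the request, and then polls
until the firmware overwrites the status, instead of treating the return
of the request as completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values; these are NOT the real GuC definitions. */
enum fw_sleep_status {
	FW_STATUS_PENDING = 0,	/* written by the host before the request */
	FW_STATUS_DONE    = 1,	/* written by the firmware when finished */
};

/* Simulated firmware: it stays busy for 'busy_polls' status reads. */
struct fake_fw {
	uint32_t status;
	unsigned int busy_polls;
};

/* Simulated register read; the firmware makes progress on each read. */
static uint32_t fake_read_status(struct fake_fw *fw)
{
	if (fw->busy_polls > 0)
		fw->busy_polls--;
	else
		fw->status = FW_STATUS_DONE;
	return fw->status;
}

/*
 * Step 1: mark the status as pending and "send" the request. In this
 * model the request returns immediately, before the firmware is done.
 */
static void fake_send_resume(struct fake_fw *fw, unsigned int busy_polls)
{
	assert(fw != NULL);
	fw->status = FW_STATUS_PENDING;
	fw->busy_polls = busy_polls;
}

/*
 * Step 2: poll the status with a bounded number of reads.
 * Returns 0 once the firmware reports completion, -1 on timeout.
 */
static int fake_wait_resume_done(struct fake_fw *fw, unsigned int max_polls)
{
	assert(fw != NULL);
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read_status(fw) == FW_STATUS_DONE)
			return 0;
	}
	return -1;
}
```

In this model, a caller that issues further requests right after
fake_send_resume() without calling fake_wait_resume_done() is in the
position described above: the firmware is still mid-resume when the next
request arrives.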

Note that since we ensure the HW is idle before suspend, we could
theoretically skip the guc_resume step, as there is nothing to restore,
but this is untested from the GuC side, so it is not recommended yet. We
still need to do guc_suspend, since that step ensures that all GuC timers
are correctly disabled.

I think you mentioned you were also seeing issues even outside of the
suspend/resume path, so we probably have a different issue as well :(
Daniele