[PATCH v1] drm/i915/guc: Flush ct receive tasklet during reset preparation
Dong, Zhanjun
zhanjun.dong at intel.com
Tue Nov 5 15:38:03 UTC 2024
On 2024-11-04 6:20 p.m., Daniele Ceraolo Spurio wrote:
>
>
>
> On 10/30/2024 3:38 PM, Zhanjun Dong wrote:
>> GuC to host communication is interrupt driven, the handling has 3
>> parts: interrupt context, tasklet and request queue worker.
>> During GuC reset prepare, interrupt is disabled before destroy
>> contexts steps start. The IRQ and worker flushed to finish
>> in progress message handling if there are. The tasklet flush is
>> missing, it might causes 2 race conditions:
>> 1. Tasklet runs after IRQ flushed, add request to queue after worker
>> flush started, causes unexpected G2H message request processing,
>> meanwhile, reset prepare code already get the context destroyed.
>> This will causes error reported about bad context state.
>> 2. Tasklet runs after intel_guc_submission_reset_prepare,
>> ct_try_receive_message start to run, while intel_uc_reset_prepare
>> already finished guc sanitize and set ct->enable to false. This will
>> causes warning on incorrect ct->enable state.
>>
>> Add the missing tasklet flush to flush all 3 parts.
>>
>> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
>> Cc: John Harrison <John.C.Harrison at Intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 9ede6f240d79..353a9167c9a4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -1688,6 +1688,10 @@ void intel_guc_submission_reset_prepare(struct
>> intel_guc *guc)
>> spin_lock_irq(guc_to_gt(guc)->irq_lock);
>> spin_unlock_irq(guc_to_gt(guc)->irq_lock);
>> + /* Flush tasklet */
>> + tasklet_disable(&guc->ct.receive_tasklet);
>> + tasklet_enable(&guc->ct.receive_tasklet);
>> +
>
> It looks like we might have the same problem around suspend/resume,
> because AFAICS the tasklet is never stopped anywhere except driver
> unload. Maybe it's worth adding the tasklet disabling/enabling to the
> interrupt disabling/enabling functions, i.e. guc->interrupts.disable/
> enable(), so it's automatically called any time we want to disable GuC
> interrupts? not a blocker.
>
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>
> Daniele
>
Thanks Daniele for review.
I like the idea to put tasklet disabling/enabling to the
> interrupt disabling/enabling functions. Let me do some investigation
on suspend/resume workflow and run some test first. It might take some time.
This patch might fix multiple issues, I would like to get it merged
after we got positive CI.Full result.
Regards,
Zhanjun Dong
>> guc_flush_submissions(guc);
>> guc_flush_destroyed_contexts(guc);
>> flush_work(&guc->ct.requests.worker);
>
More information about the dri-devel
mailing list