[PATCH v1] drm/i915/guc: Flush ct receive tasklet during reset preparation

Dong, Zhanjun zhanjun.dong at intel.com
Tue Nov 5 15:38:03 UTC 2024



On 2024-11-04 6:20 p.m., Daniele Ceraolo Spurio wrote:
> 
> 
> 
> On 10/30/2024 3:38 PM, Zhanjun Dong wrote:
>> GuC to host communication is interrupt driven; the handling has 3
>> parts: interrupt context, tasklet and request queue worker.
>> During GuC reset prepare, interrupts are disabled before the destroy
>> contexts steps start. The IRQ and worker are flushed to finish any
>> in-progress message handling. The tasklet flush is missing, which
>> might cause 2 race conditions:
>> 1. The tasklet runs after the IRQ is flushed and adds a request to the
>> queue after the worker flush has started, causing unexpected G2H
>> message request processing while the reset prepare code has already
>> destroyed the context. This causes an error to be reported about bad
>> context state.
>> 2. The tasklet runs after intel_guc_submission_reset_prepare and
>> ct_try_receive_message starts to run, while intel_uc_reset_prepare
>> has already finished the GuC sanitize and set ct->enable to false.
>> This causes a warning about an incorrect ct->enable state.
>>
>> Add the missing tasklet flush to flush all 3 parts.
>>
>> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
>> Cc: John Harrison <John.C.Harrison at Intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 9ede6f240d79..353a9167c9a4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -1688,6 +1688,10 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
>>       spin_lock_irq(guc_to_gt(guc)->irq_lock);
>>       spin_unlock_irq(guc_to_gt(guc)->irq_lock);
>> +    /* Flush tasklet */
>> +    tasklet_disable(&guc->ct.receive_tasklet);
>> +    tasklet_enable(&guc->ct.receive_tasklet);
>> +
> 
> It looks like we might have the same problem around suspend/resume, 
> because AFAICS the tasklet is never stopped anywhere except driver 
> unload. Maybe it's worth adding the tasklet disabling/enabling to the
> interrupt disabling/enabling functions, i.e. guc->interrupts.disable/enable(),
> so it's automatically called any time we want to disable GuC
> interrupts? Not a blocker.
> 
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> 
> Daniele
> 
Thanks, Daniele, for the review.

I like the idea of putting the tasklet disabling/enabling into the
interrupt disabling/enabling functions. Let me investigate the
suspend/resume workflow and run some tests first. It might take some time.
This patch might fix multiple issues, so I would like to get it merged
once we have a positive CI.Full result.
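
To make the suggested pairing concrete, here is a rough userspace model in C. It is only a sketch: every mock_* name is a hypothetical stand-in invented for illustration, not i915 or kernel API, and the single-threaded model cannot show that the real tasklet_disable() also waits for a handler that is already running.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a kernel tasklet; hypothetical, not the real API. */
struct mock_tasklet {
	int disable_count;         /* > 0: the handler must not run */
	bool scheduled;            /* a run has been requested */
	void (*func)(void *data);  /* handler */
	void *data;
};

/* Hypothetical model of the GuC state discussed in this thread. */
struct mock_guc {
	bool irq_enabled;
	struct mock_tasklet receive_tasklet;
	int messages_handled;
};

static void mock_tasklet_schedule(struct mock_tasklet *t)
{
	t->scheduled = true;
}

/* Models softirq processing: run the handler only if it is not disabled. */
static bool mock_tasklet_run_pending(struct mock_tasklet *t)
{
	if (!t->scheduled || t->disable_count > 0)
		return false;
	t->scheduled = false;
	t->func(t->data);
	return true;
}

static void mock_receive(void *data)
{
	struct mock_guc *guc = data;

	/* Loosely mirrors the concern: no receive once the path is shut down. */
	assert(guc->irq_enabled);
	guc->messages_handled++;
}

static void mock_guc_init(struct mock_guc *guc)
{
	guc->irq_enabled = true;
	guc->messages_handled = 0;
	guc->receive_tasklet = (struct mock_tasklet){
		.func = mock_receive,
		.data = guc,
	};
}

/* Suggested pairing: stop the tasklet whenever interrupts are stopped. */
static void mock_guc_interrupts_disable(struct mock_guc *guc)
{
	guc->irq_enabled = false;
	guc->receive_tasklet.disable_count++;  /* models tasklet_disable() */
}

static void mock_guc_interrupts_enable(struct mock_guc *guc)
{
	guc->receive_tasklet.disable_count--;  /* models tasklet_enable() */
	guc->irq_enabled = true;
}
```

In this model a receive that was scheduled before mock_guc_interrupts_disable() stays pending and only runs after mock_guc_interrupts_enable(). Whether that deferred run is acceptable across suspend/resume is exactly what would need checking.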

Regards,
Zhanjun Dong

>>       guc_flush_submissions(guc);
>>       guc_flush_destroyed_contexts(guc);
>>       flush_work(&guc->ct.requests.worker);
> 


