[PATCH v1] drm/i915/guc: Flush ct receive tasklet during reset preparation

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Mon Nov 4 23:20:48 UTC 2024




On 10/30/2024 3:38 PM, Zhanjun Dong wrote:
> GuC to host communication is interrupt driven, the handling has 3
> parts: interrupt context, tasklet and request queue worker.
> During GuC reset prepare, interrupt is disabled before destroy
> contexts steps start. The IRQ and worker flushed to finish
> in progress message handling if there are. The tasklet flush is
> missing, it might causes 2 race conditions:
> 1. Tasklet runs after IRQ flushed, add request to queue after worker
> flush started, causes unexpected G2H message request processing,
> meanwhile, reset prepare code already get the context destroyed.
> This will causes error reported about bad context state.
> 2. Tasklet runs after intel_guc_submission_reset_prepare,
> ct_try_receive_message start to run, while intel_uc_reset_prepare
> already finished guc sanitize and set ct->enable to false. This will
> causes warning on incorrect ct->enable state.
>
> Add the missing tasklet flush to flush all 3 parts.
>
> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9ede6f240d79..353a9167c9a4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1688,6 +1688,10 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
>   	spin_lock_irq(guc_to_gt(guc)->irq_lock);
>   	spin_unlock_irq(guc_to_gt(guc)->irq_lock);
>   
> +	/* Flush tasklet */
> +	tasklet_disable(&guc->ct.receive_tasklet);
> +	tasklet_enable(&guc->ct.receive_tasklet);
> +

It looks like we might have the same problem around suspend/resume, 
because AFAICS the tasklet is never stopped anywhere except driver 
unload. Maybe it's worth adding the tasklet disabling/enabling to the 
interrupt disabling/enabling functions, i.e. 
guc->interrupts.disable/enable(), so it's automatically called any time 
we want to disable GuC interrupts? not a blocker.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>

Daniele

>   	guc_flush_submissions(guc);
>   	guc_flush_destroyed_contexts(guc);
>   	flush_work(&guc->ct.requests.worker);



More information about the Intel-gfx mailing list