[PATCH v1] drm/i915/guc: Flush ct receive tasklet during reset preparation

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Mon Nov 4 18:26:47 UTC 2024


Just some minor nits on header. Otherwise, LGTM:

Reviewed-by: Alan Previn <alan.previn.teres.alexis at intel.com>

On Wed, 2024-10-30 at 15:38 -0700, Zhanjun Dong wrote:
> GuC to host communication is interrupt driven; the handling has 3
> parts: interrupt context, tasklet and request queue worker.
> During GuC reset prepare, interrupt is disabled before destroy
> contexts steps start. The IRQ and worker flushed to finish
alan: "and worker are flushed to finish"
> in progress message handling if there are. The tasklet flush is
alan: "any outstanding in-progress message handling. But, the tasklet
flush..."
> missing, which might cause 2 race conditions:
> 1. The tasklet runs after the IRQ is flushed and adds a request to the
> queue after the worker flush has started. This causes unexpected G2H
> message request processing while the reset prepare code has already
> destroyed the context, so an error about bad context state is reported.
> 2. The tasklet runs after intel_guc_submission_reset_prepare and
> ct_try_receive_message starts to run, while intel_uc_reset_prepare
> has already finished the GuC sanitize and set ct->enable to false. This
> causes a warning about an incorrect ct->enable state.
> 
> Add the missing tasklet flush to flush all 3 parts.
> 
> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9ede6f240d79..353a9167c9a4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1688,6 +1688,10 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)

alan: I still feel like we should just be killing off the GuC at this
point (via GT_RESTT) before any of the following reset prep sequences.
But as per our offline conversation, we agreed that might be too
intrusive a change for i915 while new design ideas are being
concentrated on Xe.


>         spin_lock_irq(guc_to_gt(guc)->irq_lock);
>         spin_unlock_irq(guc_to_gt(guc)->irq_lock);
>  
> +       /* Flush tasklet */
> +       tasklet_disable(&guc->ct.receive_tasklet);
> +       tasklet_enable(&guc->ct.receive_tasklet);
> +
>         guc_flush_submissions(guc);
>         guc_flush_destroyed_contexts(guc);
>         flush_work(&guc->ct.requests.worker);
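
For illustration only, the first race in the commit message can be
modeled with a small single-threaded C sketch. This is not kernel code:
the struct, the helper names and simulate_reset_prepare() are all
hypothetical, and the tasklet flush is simplified to "run the pending
tasklet to completion", which is an assumption of the model rather than
a description of what tasklet_disable()/tasklet_enable() do internally.
It only shows why flushing the IRQ and the worker, but not the stage
between them, can leave a request to be processed after the context is
gone.

```c
#include <stdbool.h>

/* Hypothetical model of the three G2H handling stages (not i915 code). */
struct g2h_model {
	bool irq_pending;     /* interrupt raised, handler not yet run */
	bool tasklet_pending; /* tasklet scheduled by the IRQ handler */
	int queued_requests;  /* requests waiting for the worker */
	bool ctx_destroyed;   /* set once reset prepare destroys contexts */
	int stale_requests;   /* requests processed after destruction */
};

/* Stage 1: the IRQ handler hands the message over to the tasklet. */
static void run_irq(struct g2h_model *m)
{
	if (m->irq_pending) {
		m->irq_pending = false;
		m->tasklet_pending = true;
	}
}

/* Stage 2: the tasklet queues a request for the worker. */
static void run_tasklet(struct g2h_model *m)
{
	if (m->tasklet_pending) {
		m->tasklet_pending = false;
		m->queued_requests++;
	}
}

/* Stage 3: the worker drains the queue and notes stale requests. */
static void run_worker(struct g2h_model *m)
{
	while (m->queued_requests > 0) {
		m->queued_requests--;
		if (m->ctx_destroyed)
			m->stale_requests++;
	}
}

/*
 * Simplified reset prepare: flush the IRQ, optionally flush the tasklet,
 * flush the worker, then destroy the context. Anything still pending
 * afterwards runs late. Returns how many requests were processed
 * against a destroyed context.
 */
int simulate_reset_prepare(bool flush_tasklet)
{
	struct g2h_model m = { .irq_pending = true };

	run_irq(&m);                 /* IRQ flush */
	if (flush_tasklet)
		run_tasklet(&m);     /* the flush this patch adds */
	run_worker(&m);              /* worker flush */
	m.ctx_destroyed = true;      /* contexts torn down */

	run_tasklet(&m);             /* a late tasklet, if one is left */
	run_worker(&m);              /* the worker picks up its request */

	return m.stale_requests;
}
```

In this model, simulate_reset_prepare(false) reports one stale request
and simulate_reset_prepare(true) reports none, which matches the
reasoning for flushing all 3 parts.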


