[PATCH 2/9] drm/xe: Cancel pending TLB inval workers on teardown

Thu Aug 21 22:13:12 UTC 2025

On Wed, Aug 20, 2025 at 11:30:50PM +0000, Stuart Summers wrote:
> There is a small window of time during the driver teardown where
> the TLB invalidation fence->gt is set to NULL but an invalidation
> worker is still alive. This won't cause an issue currently because
> the tlb_invalidation structure itself is still present. However as
> part of a future refactor, we are looking to split that structure
> out. To be safe, go ahead and cancel any outstanding TLB invalidation
> fence worker thread on _fini() since at that time we are already
> tearing down the driverworker thread on _fini() since at that time we
> are already tearing down the driver and don't need to continue
> monitoring this.
> 
> Add a new _fini() routine on the GT TLB invalidation
> side to handle this worker cleanup on driver teardown.
> 
> Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.c                  |  2 ++
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 21 +++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index a3397f04abcc..bbf3f4da7f54 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -69,6 +69,8 @@ static void gt_fini(struct drm_device *drm, void *arg)
>  {
>  	struct xe_gt *gt = arg;
>  
> +	xe_gt_tlb_invalidation_fini(gt);
> +
>  	destroy_workqueue(gt->ordered_wq);
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 75854b963d66..170f6c97268f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -188,6 +188,27 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
>  	mutex_unlock(&gt->tlb_invalidation.seqno_lock);
>  }
>  
> +/**
> + *
> + * xe_gt_tlb_invalidation_fini - Clean up GT TLB invalidation state
> + *
> + * Cancel pending fence workers and clean up any additional
> + * GT TLB invalidation state.
> + */
> +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
> +{
> +	/*
> +	 * At this point, we expect any outstanding TLB transactions
> +	 * to still signal through the normal channels of interrupts
> +	 * or MMIO polling updates, etc. There could be some workers
> +	 * triggered as a result of timeouts in those channels, though,
> +	 * that we want to make sure get cleaned up during an unbind.
> +	 * So here, cancel the delayed work, but let the normal fence
> +	 * signaling handle the rest.
> +	 */

You should assert this is true then, e.g., the pending list is empty.

I haven't checked but later in the series when we move this to an
embedded struct, is the fini owned by tlb_inval layer (i.e., it has its
own drmm or devm fini handler)? That would be a better placement IMO.

Matt

> +	cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
> +}
> +
>  static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
>  {
>  	int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> index f7f0f2eaf4b5..3e4cff3922d6 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> @@ -16,6 +16,7 @@ struct xe_vm;
>  struct xe_vma;
>  
>  int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
> +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt);
>  
>  void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
>  int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
> -- 
> 2.34.1
>