[PATCH 2/9] drm/xe: Cancel pending TLB inval workers on teardown
Matthew Brost
matthew.brost at intel.com
Thu Aug 21 22:13:12 UTC 2025
On Wed, Aug 20, 2025 at 11:30:50PM +0000, Stuart Summers wrote:
> There is a small window of time during the driver teardown where
> the TLB invalidation fence->gt is set to NULL but an invalidation
> worker is still alive. This won't cause an issue currently because
> the tlb_invalidation structure itself is still present. However as
> part of a future refactor, we are looking to split that structure
> out. To be safe, go ahead and cancel any outstanding TLB invalidation
> fence worker thread on _fini() since at that time we are already
> tearing down the driver and don't need to continue monitoring this.
>
> Add a new _fini() routine on the GT TLB invalidation
> side to handle this worker cleanup on driver teardown.
>
> Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt.c | 2 ++
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 21 +++++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h | 1 +
> 3 files changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index a3397f04abcc..bbf3f4da7f54 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -69,6 +69,8 @@ static void gt_fini(struct drm_device *drm, void *arg)
> {
> struct xe_gt *gt = arg;
>
> + xe_gt_tlb_invalidation_fini(gt);
> +
> destroy_workqueue(gt->ordered_wq);
> }
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 75854b963d66..170f6c97268f 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -188,6 +188,27 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
> mutex_unlock(&gt->tlb_invalidation.seqno_lock);
> }
>
> +/**
> + * xe_gt_tlb_invalidation_fini - Clean up GT TLB invalidation state
> + * @gt: GT structure to clean up
> + *
> + * Cancel pending fence workers and clean up any additional
> + * GT TLB invalidation state.
> + */
> +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
> +{
> + /*
> + * At this point, we expect any outstanding TLB transactions
> + * to still signal through the normal channels of interrupts
> + * or MMIO polling updates, etc. There could be some workers
> + * triggered as a result of timeouts in those channels, though,
> + * that we want to make sure get cleaned up during an unbind.
> + * So here, cancel the delayed work, but let the normal fence
> + * signaling handle the rest.
> + */
You should assert this is true then, e.g., that the pending list is
empty (rough sketch below).

I haven't checked, but later in the series when we move this to an
embedded struct, is the fini owned by the tlb_inval layer (i.e., does it
have its own drmm or devm fini handler)? That would be a better
placement IMO.
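
Something along these lines would work (untested; assumes the pending
list is still gt->tlb_invalidation.pending_fences at this point in the
series, adjust names if the refactor moves them):

void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
{
	/* Normal fence signaling should have drained the pending list */
	xe_gt_assert(gt, list_empty(&gt->tlb_invalidation.pending_fences));

	/* Only the timeout worker is left to stop */
	cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
}

For the ownership question, the tlb_inval init could register its own
teardown action rather than relying on gt_fini, e.g. (callback name is
made up and would need the drmm action signature):

	return drmm_add_action_or_reset(&gt_to_xe(gt)->drm,
					tlb_invalidation_fini_cb, gt);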
Matt
> + cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
> +}
> +
> static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
> {
> int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> index f7f0f2eaf4b5..3e4cff3922d6 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> @@ -16,6 +16,7 @@ struct xe_vm;
> struct xe_vma;
>
> int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
> +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt);
>
> void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
> int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
> --
> 2.34.1
>