[PATCH 2/9] drm/xe: Cancel pending TLB inval workers on teardown
Summers, Stuart
stuart.summers at intel.com
Thu Aug 21 23:34:53 UTC 2025
On Thu, 2025-08-21 at 15:13 -0700, Matthew Brost wrote:
> On Wed, Aug 20, 2025 at 11:30:50PM +0000, Stuart Summers wrote:
> > There is a small window of time during the driver teardown where
> > the TLB invalidation fence->gt is set to NULL but an invalidation
> > worker is still alive. This won't cause an issue currently because
> > the tlb_invalidation structure itself is still present. However as
> > part of a future refactor, we are looking to split that structure
> > out. To be safe, go ahead and cancel any outstanding TLB
> > invalidation
> > fence worker thread on _fini() since at that time we are already
> > tearing down the driverworker thread on _fini() since at that time
> > we
> > are already tearing down the driver and don't need to continue
> > monitoring this.
> >
> > Add a new _fini() routine on the GT TLB invalidation
> > side to handle this worker cleanup on driver teardown.
> >
> > Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt.c | 2 ++
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 21
> > +++++++++++++++++++++
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h | 1 +
> > 3 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c
> > b/drivers/gpu/drm/xe/xe_gt.c
> > index a3397f04abcc..bbf3f4da7f54 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -69,6 +69,8 @@ static void gt_fini(struct drm_device *drm, void
> > *arg)
> > {
> > struct xe_gt *gt = arg;
> >
> > + xe_gt_tlb_invalidation_fini(gt);
> > +
> > destroy_workqueue(gt->ordered_wq);
> > }
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > index 75854b963d66..170f6c97268f 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > @@ -188,6 +188,27 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt
> > *gt)
> > mutex_unlock(>->tlb_invalidation.seqno_lock);
> > }
> >
> > +/**
> > + *
> > + * xe_gt_tlb_invalidation_fini - Clean up GT TLB invalidation
> > state
> > + *
> > + * Cancel pending fence workers and clean up any additional
> > + * GT TLB invalidation state.
> > + */
> > +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
> > +{
> > + /*
> > + * At this point, we expect any outstanding TLB
> > transactions
> > + * to still signal through the normal channels of
> > interrupts
> > + * or MMIO polling updates, etc. There could be some
> > workers
> > + * triggered as a result of timeouts in those channels,
> > though,
> > + * that we want to make sure get cleaned up during an
> > unbind.
> > + * So here, cancel the delayed work, but let the normal
> > fence
> > + * signaling handle the rest.
> > + */
>
> You should assert this is true then, e.g., the pending list is empty.
The problem is this seems a little racey. I don't think we can
guarantee that the responses are back from GuC by the time we hit this
_fini() routine.
>
> I haven't checked but later in the series when we move this to an
> embedded struct, is the fini owned by tlb_inval layer (i.e., it has
> its
> own drmm or devm fini handler)? That would be a better placement IMO.
Oh sorry I should have been more clear in the cover letter. Yes I am
doing that, but I thought it best to fix this independently first. At
the time this series starts, we don't have a good place to inject a new
drmm fini handler. I modify this path later in this series to do
exactly that (remove the call from gt_fini() and add it to a new
tlb_fini() that gets called via the drm layer).
Thanks,
Stuart
>
> Matt
>
> > + cancel_delayed_work(>->tlb_invalidation.fence_tdr);
> > +}
> > +
> > static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int
> > seqno)
> > {
> > int seqno_recv = READ_ONCE(gt-
> > >tlb_invalidation.seqno_recv);
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > index f7f0f2eaf4b5..3e4cff3922d6 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > @@ -16,6 +16,7 @@ struct xe_vm;
> > struct xe_vma;
> >
> > int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
> > +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt);
> >
> > void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
> > int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
> > --
> > 2.34.1
> >
More information about the Intel-xe
mailing list