[PATCH 2/9] drm/xe: Cancel pending TLB inval workers on teardown
Matthew Brost
matthew.brost at intel.com
Mon Aug 25 18:20:32 UTC 2025
On Mon, Aug 25, 2025 at 12:06:44PM -0600, Summers, Stuart wrote:
> On Mon, 2025-08-25 at 17:57 +0000, Stuart Summers wrote:
> > Add a new _fini() routine on the GT TLB invalidation
> > side to handle this worker cleanup on driver teardown.
> >
> > v2: Move the TLB teardown to the gt fini() routine called during
> > gt_init rather than in gt_alloc. This way the GT structure stays
> > alive for while we reset the TLB state.
> >
> > Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_gt.c | 2 ++
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 12 ++++++++++++
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h | 1 +
> > 3 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index a3397f04abcc..178c4783bbda 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -603,6 +603,8 @@ static void xe_gt_fini(void *arg)
> > struct xe_gt *gt = arg;
> > int i;
> >
> > + xe_gt_tlb_invalidation_fini(gt);
> > +
> > for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
> > xe_hw_fence_irq_finish(>->fence_irq[i]);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > index 75854b963d66..db00c5adead9 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > @@ -188,6 +188,18 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt
> > *gt)
> > mutex_unlock(>->tlb_invalidation.seqno_lock);
> > }
> >
> > +/**
> > + *
> > + * xe_gt_tlb_invalidation_fini - Clean up GT TLB invalidation state
> > + *
> > + * Cancel pending fence workers and clean up any additional
> > + * GT TLB invalidation state.
> > + */
> > +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
> > +{
> > + xe_gt_tlb_invalidation_reset(gt);
>
> I've been seeing an issue on fault injection, running in a tight while
I think fault injection case will be fixed by [1] whenever that merges.
[1] https://patchwork.freedesktop.org/series/152870/
For general safety though, I think calling tlb_invalidation_reset is a
good idea.
> loop, where occasionally we see that a couple of sysfs files weren't
> properly torn down on a prior driver instance, followed by TLB
> invalidation timeouts. Up until today I was only able to reproduce that
> with this series, so I wanted to be sure we weren't causing something
> here, particularly with this _reset() call (one of the reasons I had
> declined to include this in the original series). Today though, even
> without the series, I was able to reproduce that behavior (-EEXIST on
> sysfs create, followed by TLB inval timeout). So I don't think we
> should block this series on that debug.
>
I agree. The prior CI run LGTM. The failure here [2] should be fixed by
[3] which merged last night.
[2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152022v8/shard-lnl-7/igt@xe_exec_compute_mode@many-execqueues-userptr-rebind.html
[3] https://patchwork.freedesktop.org/series/153197/
> I see a few things in ci-buglog that could be related, although I don't
> see any results in those to confirm:
> https://gfx-ci.igk.intel.com/cibuglog-ng/issue/7175?query_key=a5707a4d3ae2ebb8c04ef6cea0ef747322df4ee1
> https://gfx-ci.igk.intel.com/cibuglog-ng/issue/10412?query_key=402f2615406c4afa4814a29849b312a0c7b66e9c
> https://gfx-ci.igk.intel.com/cibuglog-ng/issue/15004?query_key=e0ce601ae69ec76bbdf27293dc2919ba07357de3
>
> Anyway, at least for this series, I think we can ignore that issue.
>
Again agree. I think if this CI run is clean, go ahead and merge.
With that:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> Thanks,
> Stuart
>
> > +}
> > +
> > static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
> > {
> > int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > index f7f0f2eaf4b5..3e4cff3922d6 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > @@ -16,6 +16,7 @@ struct xe_vm;
> > struct xe_vma;
> >
> > int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
> > +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt);
> >
> > void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
> > int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
>
More information about the Intel-xe
mailing list