[PATCH 2/9] drm/xe: Cancel pending TLB inval workers on teardown

Matthew Brost matthew.brost at intel.com
Mon Aug 25 18:20:32 UTC 2025


On Mon, Aug 25, 2025 at 12:06:44PM -0600, Summers, Stuart wrote:
> On Mon, 2025-08-25 at 17:57 +0000, Stuart Summers wrote:
> > Add a new _fini() routine on the GT TLB invalidation
> > side to handle this worker cleanup on driver teardown.
> > 
> > v2: Move the TLB teardown to the gt fini() routine called during
> >     gt_init rather than in gt_alloc. This way the GT structure stays
> >     alive while we reset the TLB state.
> > 
> > Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_gt.c                  |  2 ++
> >  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 12 ++++++++++++
> >  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  1 +
> >  3 files changed, 15 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index a3397f04abcc..178c4783bbda 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -603,6 +603,8 @@ static void xe_gt_fini(void *arg)
> >         struct xe_gt *gt = arg;
> >         int i;
> >  
> > +       xe_gt_tlb_invalidation_fini(gt);
> > +
> >         for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
> >                 xe_hw_fence_irq_finish(&gt->fence_irq[i]);
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > index 75854b963d66..db00c5adead9 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > @@ -188,6 +188,18 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt
> > *gt)
> >         mutex_unlock(&gt->tlb_invalidation.seqno_lock);
> >  }
> >  
> > +/**
> > + * xe_gt_tlb_invalidation_fini - Clean up GT TLB invalidation state
> > + * @gt: GT structure
> > + *
> > + * Cancel pending fence workers and clean up any additional
> > + * GT TLB invalidation state.
> > + */
> > +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt)
> > +{
> > +       xe_gt_tlb_invalidation_reset(gt);
> 
> I've been seeing an issue on fault injection, running in a tight while

I think the fault injection case will be fixed by [1] once that merges.

[1] https://patchwork.freedesktop.org/series/152870/

For general safety though, I think calling tlb_invalidation_reset is a
good idea.
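
As a rough illustration of why the reset helps at teardown, here is a
minimal user-space sketch, not the actual xe code. Every name in it
(model_fence, model_tlb_inval, model_tlb_inval_reset,
model_tlb_inval_fini) is hypothetical. It assumes, for illustration
only, that reset disarms the timeout worker and signals every pending
fence so that no waiter is left blocked.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for a TLB invalidation fence. */
struct model_fence {
	int seqno;
	bool signaled;
};

/* Hypothetical stand-in for the per-GT TLB invalidation state. */
struct model_tlb_inval {
	struct model_fence *pending[MAX_PENDING];
	int n_pending;
	bool timeout_worker_armed;
};

/* Assumed reset behavior: stop the timeout worker, then signal and
 * drop every fence that is still pending. */
static void model_tlb_inval_reset(struct model_tlb_inval *t)
{
	assert(t->n_pending <= MAX_PENDING);

	t->timeout_worker_armed = false;
	for (int i = 0; i < t->n_pending; i++)
		t->pending[i]->signaled = true;
	t->n_pending = 0;
}

/* Teardown reuses reset, mirroring the shape of the _fini() in the patch. */
static void model_tlb_inval_fini(struct model_tlb_inval *t)
{
	model_tlb_inval_reset(t);
}
```

In this model, once fini has run there is no armed worker and no
unsignaled fence left to reference the state being torn down.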

> loop, where occasionally we see that a couple of sysfs files weren't
> properly torn down on a prior driver instance, followed by TLB
> invalidation timeouts. Up until today I was only able to reproduce that
> with this series, so I wanted to be sure we weren't causing something
> here, particularly with this _reset() call (one of the reasons I had
> declined to include this in the original series). Today though, even
> without the series, I was able to reproduce that behavior (-EEXIST on
> sysfs create, followed by TLB inval timeout). So I don't think we
> should block this series on that debug.
> 

I agree. The prior CI run LGTM. The failure here [2] should be fixed by
[3] which merged last night.

[2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-152022v8/shard-lnl-7/igt@xe_exec_compute_mode@many-execqueues-userptr-rebind.html
[3] https://patchwork.freedesktop.org/series/153197/

> I see a few things in ci-buglog that could be related, although I don't
> see any results in those to confirm:
> https://gfx-ci.igk.intel.com/cibuglog-ng/issue/7175?query_key=a5707a4d3ae2ebb8c04ef6cea0ef747322df4ee1
> https://gfx-ci.igk.intel.com/cibuglog-ng/issue/10412?query_key=402f2615406c4afa4814a29849b312a0c7b66e9c
> https://gfx-ci.igk.intel.com/cibuglog-ng/issue/15004?query_key=e0ce601ae69ec76bbdf27293dc2919ba07357de3
> 
> Anyway, at least for this series, I think we can ignore that issue.
> 

Again, agreed. I think if this CI run is clean, go ahead and merge.

With that:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>

> Thanks,
> Stuart
> 
> > +}
> > +
> >  static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
> >  {
> >         int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > index f7f0f2eaf4b5..3e4cff3922d6 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > @@ -16,6 +16,7 @@ struct xe_vm;
> >  struct xe_vma;
> >  
> >  int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
> > +void xe_gt_tlb_invalidation_fini(struct xe_gt *gt);
> >  
> >  void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
> >  int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt);
> 

