[PATCH 5/5] drm/xe: Split TLB invalidation code in frontend and backend
Matthew Brost
matthew.brost at intel.com
Wed Jul 23 21:22:01 UTC 2025
On Wed, Jul 23, 2025 at 02:55:24PM -0600, Summers, Stuart wrote:
> On Wed, 2025-07-23 at 13:47 -0700, Matthew Brost wrote:
> >
>
> <cut>
> (just to reduce the noise in the rest of the patch here for now...)
>
> > > > > >
> > > > > > /**
> > > > > > - * xe_tlb_inval_reset - Initialize TLB invalidation reset
> > > > > > + * xe_tlb_inval_reset() - TLB invalidation reset
> > > > > > * @tlb_inval: TLB invalidation client
> > > > > > *
> > > > > > * Signal any pending invalidation fences, should be called
> > > > > > during a GT reset
> > > > > > */
> > > > > > void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval)
> > > > > > {
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > struct xe_tlb_inval_fence *fence, *next;
> > > > > > int pending_seqno;
> > > > > >
> > > > > > /*
> > > > > > - * we can get here before the CTs are even
> > > > > > initialized if
> > > > > > we're wedging
> > > > > > - * very early, in which case there are not going to
> > > > > > be
> > > > > > any pending
> > > > > > - * fences so we can bail immediately.
> > > > > > + * we can get here before the backends are even
> > > > > > initialized if we're
> > > > > > + * wedging very early, in which case there are not
> > > > > > going
> > > > > > to be any
> > > > > > + * pendind fences so we can bail immediately.
> > > > > > */
> > > > > > -	if (!xe_guc_ct_initialized(&gt->uc.guc.ct))
> > > > > > + if (!tlb_inval->ops->initialized(tlb_inval))
> > > > > > return;
> > > > > >
> > > > > > /*
> > > > > > - * CT channel is already disabled at this point. No
> > > > > > new
> > > > > > TLB requests can
> > > > > > + * Backend is already disabled at this point. No new
> > > > > > TLB
> > > > > > requests can
> > > > > > * appear.
> > > > > > */
> > > > > >
> > > > > > -	mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > -	spin_lock_irq(&gt->tlb_inval.pending_lock);
> > > > > > -	cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > + tlb_inval->ops->lock(tlb_inval);
> > > > >
> > > > > I think you want a dedicated lock embedded in struct
> > > > > xe_tlb_inval,
> > > > > rather than reaching into the backend to grab one.
> > > > >
> > > > > This will deadlock as written: G2H TLB inval messages are
> > > > > sometimes
> > > > > processed while holding ct->lock (non-fast path, unlikely) and
> > > > > sometimes
> > > > > without it (fast path, likely).
> > > >
> > > > Ugh, I'm off today. Ignore the deadlock part, I was confusing
> > > > myself... I was thinking this was xe_tlb_inval_done_handler; it is
> > > > not. I still think xe_tlb_inval should have its own lock, but this
> > > > patch as written should work with
> > > > s/xe_guc_ct_send/xe_guc_ct_send_locked.
> > >
> > > So one reason I didn't go that way is we did just the reverse
> > > recently
> > > - moved from a TLB dedicated lock to the more specific CT lock
> > > since
> > > these are all going into the CT handler anyway when we use GuC
> > > submission. Then this embedded version allows us to lock at the
> > > bottom
> > > data layer rather than having a separate lock in the upper layer.
> > > Another thing is we might want to have different types of
> > > invalidation
> > > running in parallel without locking the data in the upper layer
> > > since
> > > the real contention would be in the lower level pipelining anyway.
> > >
> >
> > I can see the reasoning behind this approach, and maybe it’s fine.
> >
> > But consider the case where the GuC backend has to look up a VM,
> > iterate
> > over a list of exec queues, and send multiple H2Gs to the hardware,
> > each
> > with a corresponding G2H (per-context invalidations). In the worst
> > case,
> > the CT code may have to wait for and process some G2Hs because our
> > G2H
> > credits are exhausted—all while holding the CT lock, which currently
> > blocks any hardware submissions (i.e., hardware submissions need the
> > CT
> > lock). Now imagine multiple sources issuing invalidations: they could
> > grab the CT lock before a submission waiting on it, further delaying
> > that
> > submission.
> >
> > The longer a mutex is held, the more likely the thread holding it is
> > to be switched out while still holding it.
> >
> > This doesn’t seem scalable compared to using a finer-grained CT lock
> > (e.g., only taking it in xe_guc_ct_send).
> >
> > I’m not saying this won’t work as you have it—I think it will—but the
> > consequences of holding the CT lock for an extended period need to be
> > considered.
>
> Couple more thoughts.. so in the case you mentioned, ideally I'd like
> to have just a single invalidation per request, rather than across a
> whole VM. That's the reason we have the range based invalidation to
Yes, this is range based.
> begin with. If we get to the point where we want to make that even
> finer, that's great, but we should still just have a single
> invalidation per request (again, ideally).
>
Maybe you have a different idea, but I was thinking of queue-based
invalidations: the frontend assigns a single seqno, the backend issues N
invalidations to the hardware—one per GCID mapped in the VM/GT tuple—and
then signals the frontend when all invalidations associated with the
seqno are complete. With the GuC, a GCID corresponds to each exec queue’s
gucid mapped in the VM/GT tuple. Different backends can handle this
differently.
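
To make that concrete, here is a minimal user-space sketch of the idea
(not driver code; every name in it is made up): one frontend seqno fans
out to N per-GCID invalidations, and the frontend fence is only signaled
once the pending count for that seqno drops to zero.

#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical per-seqno bookkeeping: one frontend seqno, N backend invals */
struct inval_seqno {
	unsigned int seqno;	/* assigned once by the frontend */
	atomic_int pending;	/* one count per GCID mapped in the VM/GT tuple */
};

/* Frontend: assign a seqno and arm the pending count for N invalidations */
static void frontend_issue(struct inval_seqno *s, unsigned int seqno, int ngcid)
{
	s->seqno = seqno;
	atomic_store(&s->pending, ngcid);
	printf("seqno %u: issuing %d per-GCID invalidations\n", seqno, ngcid);
}

/* Backend: called once per completed per-GCID invalidation (e.g. per G2H) */
static void backend_gcid_done(struct inval_seqno *s)
{
	if (atomic_fetch_sub(&s->pending, 1) == 1)
		printf("seqno %u: all invalidations done, signal the fence\n",
		       s->seqno);
}

int main(void)
{
	struct inval_seqno s;
	int ngcid = 3;	/* e.g. three exec queues mapped in the VM/GT tuple */

	frontend_issue(&s, 1, ngcid);
	for (int i = 0; i < ngcid; i++)
		backend_gcid_done(&s);

	return 0;
}

In the real backend the "signal the fence" step would of course be
whatever completion mechanism the backend already uses; the point is
only that the frontend hands out one seqno and waits on one fence.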
> Also, you already have some patches up on the list that do some
> coalescing of invalidations so we reduce the number of invalidations
> for multiple ranges. I didn't want to include those patches here
> because IMO they are really a separate feature here and it'd be nice to
> review that on its own.
>
I agree it is a separate thing that should help in some cases and
should be reviewed on its own.
That doesn't help in the case of multiple VMs issuing invalidations,
though (think eviction occurring or MMU notifiers firing). The lock
contention moves from a dedicated TLB invalidation lock to a widely
shared CT lock. If multiple TLB invalidations are contending, all other
users of the CT lock now contend at this higher level. In other words,
by only acquiring the CT lock for the last part of an invalidation,
other (non-invalidation) waiters get better QoS.
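
Roughly what I mean, as a user-space sketch (made-up names, not the
actual CT code): keep the widely shared lock scoped to just the send,
and let invalidation ordering be protected by a lock that only
invalidations contend on.

#include <pthread.h>

/* Hypothetical locks: one shared by all CT users, one only for invalidations */
static pthread_mutex_t ct_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t seqno_lock = PTHREAD_MUTEX_INITIALIZER;

static void ct_send(const char *msg)
{
	(void)msg;	/* stand-in for the actual H2G send */
}

/*
 * Coarse: the shared CT lock is held across the whole invalidation, so
 * submissions (and every other CT user) queue up behind every step of it.
 */
static void inval_coarse(const char *msg)
{
	pthread_mutex_lock(&ct_lock);
	/* assign seqno, build actions, maybe wait for G2H credits ... */
	ct_send(msg);
	pthread_mutex_unlock(&ct_lock);
}

/*
 * Fine-grained: invalidations order themselves on their own lock and only
 * take the shared CT lock for the actual send, so non-invalidation waiters
 * contend for a much shorter window.
 */
static void inval_fine(const char *msg)
{
	pthread_mutex_lock(&seqno_lock);
	/* assign seqno, build actions ... */
	pthread_mutex_lock(&ct_lock);
	ct_send(msg);
	pthread_mutex_unlock(&ct_lock);
	pthread_mutex_unlock(&seqno_lock);
}

int main(void)
{
	inval_coarse("inval");
	inval_fine("inval");
	return 0;
}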
Matt
> So basically, the per request lock here also pushes us to implement in
> a more efficient and precise way rather than just hammering as many
> invalidations over a given range as possible.
>
> And of course there are going to need to be bigger hammer invalidations
> sometimes (like the full VF invalidation we're doing in the
> invalidate_all() routines), but those still fall into the same category
> of precision, just with a larger scope (rather than multiple smaller
> invalidations).
>
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > > Thanks,
> > > Stuart
> > >
> > > >
> > > > Matt
> > > >
> > > > >
> > > > > I’d call this lock seqno_lock, since it protects exactly
> > > > > that—the
> > > > > order
> > > > > in which a seqno is assigned by the frontend and handed to the
> > > > > backend.
> > > > >
> > > > > Prime this lock for reclaim as well—do what primelockdep() does
> > > > > in
> > > > > xe_guc_ct.c—to make it clear that memory allocations are not
> > > > > allowed
> > > > > while the lock is held as TLB invalidations can be called from
> > > > > two
> > > > > reclaim paths:
> > > > >
> > > > > - MMU notifier callbacks
> > > > > - The dma-fence signaling path of VM binds that require a TLB
> > > > > invalidation
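
To be concrete about the priming I mean above (a sketch only; seqno_lock
is the name I'm suggesting, it doesn't exist yet), mirroring what
primelockdep() does in xe_guc_ct.c:

/* Sketch: prime the proposed seqno_lock against reclaim, like
 * primelockdep() in xe_guc_ct.c does for ct->lock. */
static void xe_tlb_inval_primelockdep(struct xe_tlb_inval *tlb_inval)
{
	if (!IS_ENABLED(CONFIG_LOCKDEP))
		return;

	fs_reclaim_acquire(GFP_KERNEL);
	might_lock(&tlb_inval->seqno_lock);	/* hypothetical lock */
	fs_reclaim_release(GFP_KERNEL);
}

With that in place lockdep will complain if anyone allocates with
GFP_KERNEL while holding the lock, which is exactly what we want given
the reclaim paths listed above.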
> > > > >
> > > > > > + spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > + cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > > /*
> > > > > > * We might have various kworkers waiting for TLB
> > > > > > flushes
> > > > > > to complete
> > > > > > * which are not tracked with an explicit TLB fence,
> > > > > > however at this
> > > > > > - * stage that will never happen since the CT is
> > > > > > already
> > > > > > disabled, so
> > > > > > - * make sure we signal them here under the assumption
> > > > > > that we have
> > > > > > + * stage that will never happen since the backend is
> > > > > > already disabled,
> > > > > > + * so make sure we signal them here under the
> > > > > > assumption
> > > > > > that we have
> > > > > > * completed a full GT reset.
> > > > > > */
> > > > > > - if (gt->tlb_inval.seqno == 1)
> > > > > > + if (tlb_inval->seqno == 1)
> > > > > > pending_seqno = TLB_INVALIDATION_SEQNO_MAX -
> > > > > > 1;
> > > > > > else
> > > > > > - pending_seqno = gt->tlb_inval.seqno - 1;
> > > > > > - WRITE_ONCE(gt->tlb_inval.seqno_recv, pending_seqno);
> > > > > > + pending_seqno = tlb_inval->seqno - 1;
> > > > > > + WRITE_ONCE(tlb_inval->seqno_recv, pending_seqno);
> > > > > >
> > > > > >  	list_for_each_entry_safe(fence, next,
> > > > > > -				 &gt->tlb_inval.pending_fences, link)
> > > > > > -		inval_fence_signal(gt_to_xe(gt), fence);
> > > > > > -	spin_unlock_irq(&gt->tlb_inval.pending_lock);
> > > > > > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > +				 &tlb_inval->pending_fences, link)
> > > > > > +		xe_tlb_inval_fence_signal(fence);
> > > > > > +	spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > > +	tlb_inval->ops->unlock(tlb_inval);
> > > > > > }
> > > > > >
> > > > > > -static bool tlb_inval_seqno_past(struct xe_gt *gt, int
> > > > > > seqno)
> > > > > > +static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval
> > > > > > *tlb_inval, int seqno)
> > > > > > {
> > > > > > - int seqno_recv = READ_ONCE(gt->tlb_inval.seqno_recv);
> > > > > > + int seqno_recv = READ_ONCE(tlb_inval->seqno_recv);
> > > > > > +
> > > > > > + lockdep_assert_held(&tlb_inval->pending_lock);
> > > > > >
> > > > > > if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX
> > > > > > /
> > > > > > 2))
> > > > > > return false;
> > > > > > @@ -201,44 +192,20 @@ static bool tlb_inval_seqno_past(struct
> > > > > > xe_gt *gt, int seqno)
> > > > > > return seqno_recv >= seqno;
> > > > > > }
> > > > > >
> > > > > > -static int send_tlb_inval(struct xe_guc *guc, const u32
> > > > > > *action,
> > > > > > int len)
> > > > > > -{
> > > > > > - struct xe_gt *gt = guc_to_gt(guc);
> > > > > > -
> > > > > > - xe_gt_assert(gt, action[1]); /* Seqno */
> > > > > > - lockdep_assert_held(&guc->ct.lock);
> > > > > > -
> > > > > > - /*
> > > > > > - * XXX: The seqno algorithm relies on TLB
> > > > > > invalidation
> > > > > > being processed
> > > > > > - * in order which they currently are, if that changes
> > > > > > the
> > > > > > algorithm will
> > > > > > - * need to be updated.
> > > > > > - */
> > > > > > -
> > > > > > - xe_gt_stats_incr(gt, XE_GT_STATS_ID_TLB_INVAL, 1);
> > > > > > -
> > > > > > - return xe_guc_ct_send(&guc->ct, action, len,
> > > > > > - G2H_LEN_DW_TLB_INVALIDATE, 1);
> > > > > > -}
> > > > > > -
> > > > > > static void xe_tlb_inval_fence_prep(struct
> > > > > > xe_tlb_inval_fence
> > > > > > *fence)
> > > > > > {
> > > > > > struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > - struct xe_device *xe = gt_to_xe(gt);
> > > > > > -
> > > > > > -	lockdep_assert_held(&gt->uc.guc.ct.lock);
> > > > > >
> > > > > > fence->seqno = tlb_inval->seqno;
> > > > > > - trace_xe_tlb_inval_fence_send(xe, fence);
> > > > > > + trace_xe_tlb_inval_fence_send(tlb_inval->xe, fence);
> > > > > >
> > > > > > spin_lock_irq(&tlb_inval->pending_lock);
> > > > > > fence->inval_time = ktime_get();
> > > > > > list_add_tail(&fence->link, &tlb_inval-
> > > > > > >pending_fences);
> > > > > >
> > > > > > if (list_is_singular(&tlb_inval->pending_fences))
> > > > > > - queue_delayed_work(system_wq,
> > > > > > - &tlb_inval->fence_tdr,
> > > > > > - tlb_timeout_jiffies(gt));
> > > > > > +		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > > > > > +				   tlb_inval->ops->timeout_delay(tlb_inval));
> > > > > > spin_unlock_irq(&tlb_inval->pending_lock);
> > > > > >
> > > > > > tlb_inval->seqno = (tlb_inval->seqno + 1) %
> > > > > > @@ -247,202 +214,63 @@ static void
> > > > > > xe_tlb_inval_fence_prep(struct
> > > > > > xe_tlb_inval_fence *fence)
> > > > > > tlb_inval->seqno = 1;
> > > > > > }
> > > > > >
> > > > > > -#define MAKE_INVAL_OP(type) ((type <<
> > > > > > XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
> > > > > > - XE_GUC_TLB_INVAL_MODE_HEAVY <<
> > > > > > XE_GUC_TLB_INVAL_MODE_SHIFT | \
> > > > > > - XE_GUC_TLB_INVAL_FLUSH_CACHE)
> > > > > > -
> > > > > > -static int send_tlb_inval_ggtt(struct xe_gt *gt, int seqno)
> > > > > > -{
> > > > > > - u32 action[] = {
> > > > > > - XE_GUC_ACTION_TLB_INVALIDATION,
> > > > > > - seqno,
> > > > > > - MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
> > > > > > - };
> > > > > > -
> > > > > > -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> > > > > > -}
> > > > > > -
> > > > > > -static int send_tlb_inval_all(struct xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > - struct xe_tlb_inval_fence
> > > > > > *fence)
> > > > > > -{
> > > > > > - u32 action[] = {
> > > > > > - XE_GUC_ACTION_TLB_INVALIDATION_ALL,
> > > > > > - 0, /* seqno, replaced in send_tlb_inval */
> > > > > > - MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > > > - };
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > -
> > > > > > - xe_gt_assert(gt, fence);
> > > > > > -
> > > > > > -	return send_tlb_inval(&gt->uc.guc, action, ARRAY_SIZE(action));
> > > > > > -}
> > > > > > +#define xe_tlb_inval_issue(__tlb_inval, __fence, op, args...)	\
> > > > > > +({									\
> > > > > > +	int __ret;							\
> > > > > > +									\
> > > > > > +	xe_assert((__tlb_inval)->xe, (__tlb_inval)->ops);		\
> > > > > > +	xe_assert((__tlb_inval)->xe, (__fence));			\
> > > > > > +									\
> > > > > > +	(__tlb_inval)->ops->lock((__tlb_inval));			\
> > > > > > +	xe_tlb_inval_fence_prep((__fence));				\
> > > > > > +	__ret = op((__tlb_inval), (__fence)->seqno, ##args);		\
> > > > > > +	if (__ret < 0)							\
> > > > > > +		xe_tlb_inval_fence_signal_unlocked((__fence));		\
> > > > > > +	(__tlb_inval)->ops->unlock((__tlb_inval));			\
> > > > > > +									\
> > > > > > +	__ret == -ECANCELED ? 0 : __ret;				\
> > > > > > +})
> > > > > >
> > > > > > /**
> > > > > > - * xe_gt_tlb_invalidation_all - Invalidate all TLBs across
> > > > > > PF
> > > > > > and all VFs.
> > > > > > - * @gt: the &xe_gt structure
> > > > > > - * @fence: the &xe_tlb_inval_fence to be signaled on
> > > > > > completion
> > > > > > + * xe_tlb_inval_all() - Issue a TLB invalidation for all
> > > > > > TLBs
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + * @fence: invalidation fence which will be signal on TLB
> > > > > > invalidation
> > > > > > + * completion
> > > > > > *
> > > > > > - * Send a request to invalidate all TLBs across PF and all
> > > > > > VFs.
> > > > > > + * Issue a TLB invalidation for all TLBs. Completion of TLB
> > > > > > is
> > > > > > asynchronous and
> > > > > > + * caller can use the invalidation fence to wait for
> > > > > > completion.
> > > > > > *
> > > > > > * Return: 0 on success, negative error code on error
> > > > > > */
> > > > > > int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > struct xe_tlb_inval_fence *fence)
> > > > > > {
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > - int err;
> > > > > > -
> > > > > > - err = send_tlb_inval_all(tlb_inval, fence);
> > > > > > - if (err)
> > > > > > - xe_gt_err(gt, "TLB invalidation request
> > > > > > failed
> > > > > > (%pe)", ERR_PTR(err));
> > > > > > -
> > > > > > - return err;
> > > > > > -}
> > > > > > -
> > > > > > -/*
> > > > > > - * Ensure that roundup_pow_of_two(length) doesn't overflow.
> > > > > > - * Note that roundup_pow_of_two() operates on unsigned long,
> > > > > > - * not on u64.
> > > > > > - */
> > > > > > -#define MAX_RANGE_TLB_INVALIDATION_LENGTH
> > > > > > (rounddown_pow_of_two(ULONG_MAX))
> > > > > > -
> > > > > > -static int send_tlb_inval_ppgtt(struct xe_gt *gt, u64 start,
> > > > > > u64
> > > > > > end,
> > > > > > - u32 asid, int seqno)
> > > > > > -{
> > > > > > -#define MAX_TLB_INVALIDATION_LEN 7
> > > > > > - u32 action[MAX_TLB_INVALIDATION_LEN];
> > > > > > - u64 length = end - start;
> > > > > > - int len = 0;
> > > > > > -
> > > > > > - action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > > > > - action[len++] = seqno;
> > > > > > - if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > > > - length > MAX_RANGE_TLB_INVALIDATION_LENGTH) {
> > > > > > - action[len++] =
> > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
> > > > > > - } else {
> > > > > > - u64 orig_start = start;
> > > > > > - u64 align;
> > > > > > -
> > > > > > - if (length < SZ_4K)
> > > > > > - length = SZ_4K;
> > > > > > -
> > > > > > - /*
> > > > > > - * We need to invalidate a higher granularity
> > > > > > if
> > > > > > start address
> > > > > > - * is not aligned to length. When start is
> > > > > > not
> > > > > > aligned with
> > > > > > - * length we need to find the length large
> > > > > > enough
> > > > > > to create an
> > > > > > - * address mask covering the required range.
> > > > > > - */
> > > > > > - align = roundup_pow_of_two(length);
> > > > > > - start = ALIGN_DOWN(start, align);
> > > > > > - end = ALIGN(end, align);
> > > > > > - length = align;
> > > > > > - while (start + length < end) {
> > > > > > - length <<= 1;
> > > > > > - start = ALIGN_DOWN(orig_start,
> > > > > > length);
> > > > > > - }
> > > > > > -
> > > > > > - /*
> > > > > > - * Minimum invalidation size for a 2MB page
> > > > > > that
> > > > > > the hardware
> > > > > > - * expects is 16MB
> > > > > > - */
> > > > > > - if (length >= SZ_2M) {
> > > > > > - length = max_t(u64, SZ_16M, length);
> > > > > > - start = ALIGN_DOWN(orig_start,
> > > > > > length);
> > > > > > - }
> > > > > > -
> > > > > > - xe_gt_assert(gt, length >= SZ_4K);
> > > > > > - xe_gt_assert(gt, is_power_of_2(length));
> > > > > > -		xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1,
> > > > > > -						    ilog2(SZ_2M) + 1)));
> > > > > > - xe_gt_assert(gt, IS_ALIGNED(start, length));
> > > > > > -
> > > > > > - action[len++] =
> > > > > > MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
> > > > > > - action[len++] = asid;
> > > > > > - action[len++] = lower_32_bits(start);
> > > > > > - action[len++] = upper_32_bits(start);
> > > > > > - action[len++] = ilog2(length) - ilog2(SZ_4K);
> > > > > > - }
> > > > > > -
> > > > > > - xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN);
> > > > > > -
> > > > > > -	return send_tlb_inval(&gt->uc.guc, action, len);
> > > > > > -}
> > > > > > -
> > > > > > -static int __xe_tlb_inval_ggtt(struct xe_gt *gt,
> > > > > > - struct xe_tlb_inval_fence
> > > > > > *fence)
> > > > > > -{
> > > > > > - int ret;
> > > > > > -
> > > > > > -	mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > - xe_tlb_inval_fence_prep(fence);
> > > > > > -
> > > > > > - ret = send_tlb_inval_ggtt(gt, fence->seqno);
> > > > > > - if (ret < 0)
> > > > > > - inval_fence_signal_unlocked(gt_to_xe(gt),
> > > > > > fence);
> > > > > > -
> > > > > > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > - /*
> > > > > > - * -ECANCELED indicates the CT is stopped for a GT
> > > > > > reset.
> > > > > > TLB caches
> > > > > > - * should be nuked on a GT reset so this error can
> > > > > > be
> > > > > > ignored.
> > > > > > - */
> > > > > > - if (ret == -ECANCELED)
> > > > > > - return 0;
> > > > > > -
> > > > > > - return ret;
> > > > > > +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->all);
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > - * xe_tlb_inval_ggtt - Issue a TLB invalidation on this GT
> > > > > > for
> > > > > > the GGTT
> > > > > > + * xe_tlb_inval_ggtt() - Issue a TLB invalidation for the
> > > > > > GGTT
> > > > > > * @tlb_inval: TLB invalidation client
> > > > > > *
> > > > > > - * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > > > invalidation is
> > > > > > - * synchronous.
> > > > > > + * Issue a TLB invalidation for the GGTT. Completion of TLB
> > > > > > is
> > > > > > asynchronous and
> > > > > > + * caller can use the invalidation fence to wait for
> > > > > > completion.
> > > > > > *
> > > > > > * Return: 0 on success, negative error code on error
> > > > > > */
> > > > > > int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval)
> > > > > > {
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > - struct xe_device *xe = gt_to_xe(gt);
> > > > > > - unsigned int fw_ref;
> > > > > > -
> > > > > > -	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
> > > > > > - gt->uc.guc.submission_state.enabled) {
> > > > > > - struct xe_tlb_inval_fence fence;
> > > > > > - int ret;
> > > > > > -
> > > > > > - xe_tlb_inval_fence_init(tlb_inval, &fence,
> > > > > > true);
> > > > > > - ret = __xe_tlb_inval_ggtt(gt, &fence);
> > > > > > - if (ret)
> > > > > > - return ret;
> > > > > > -
> > > > > > - xe_tlb_inval_fence_wait(&fence);
> > > > > > - } else if (xe_device_uc_enabled(xe) &&
> > > > > > !xe_device_wedged(xe)) {
> > > > > > -		struct xe_mmio *mmio = &gt->mmio;
> > > > > > -
> > > > > > - if (IS_SRIOV_VF(xe))
> > > > > > - return 0;
> > > > > > -
> > > > > > - fw_ref = xe_force_wake_get(gt_to_fw(gt),
> > > > > > XE_FW_GT);
> > > > > > - if (xe->info.platform == XE_PVC ||
> > > > > > GRAPHICS_VER(xe) >= 20) {
> > > > > > -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1,
> > > > > > -					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
> > > > > > -			xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC0,
> > > > > > -					PVC_GUC_TLB_INV_DESC0_VALID);
> > > > > > -		} else {
> > > > > > -			xe_mmio_write32(mmio, GUC_TLB_INV_CR,
> > > > > > -					GUC_TLB_INV_CR_INVALIDATE);
> > > > > > -		}
> > > > > > - xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > > > > - }
> > > > > > + struct xe_tlb_inval_fence fence, *fence_ptr = &fence;
> > > > > > + int ret;
> > > > > >
> > > > > > - return 0;
> > > > > > + xe_tlb_inval_fence_init(tlb_inval, fence_ptr, true);
> > > > > > +	ret = xe_tlb_inval_issue(tlb_inval, fence_ptr, tlb_inval->ops->ggtt);
> > > > > > + xe_tlb_inval_fence_wait(fence_ptr);
> > > > > > +
> > > > > > + return ret;
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > - * xe_tlb_inval_range - Issue a TLB invalidation on this GT
> > > > > > for
> > > > > > an address range
> > > > > > + * xe_tlb_inval_range() - Issue a TLB invalidation for an
> > > > > > address range
> > > > > > * @tlb_inval: TLB invalidation client
> > > > > > * @fence: invalidation fence which will be signal on TLB
> > > > > > invalidation
> > > > > > * completion
> > > > > > @@ -460,33 +288,12 @@ int xe_tlb_inval_range(struct
> > > > > > xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > struct xe_tlb_inval_fence *fence, u64
> > > > > > start, u64 end,
> > > > > > u32 asid)
> > > > > > {
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > - struct xe_device *xe = gt_to_xe(gt);
> > > > > > - int ret;
> > > > > > -
> > > > > > - xe_gt_assert(gt, fence);
> > > > > > -
> > > > > > - /* Execlists not supported */
> > > > > > - if (xe->info.force_execlist) {
> > > > > > - __inval_fence_signal(xe, fence);
> > > > > > - return 0;
> > > > > > - }
> > > > > > -
> > > > > > -	mutex_lock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > - xe_tlb_inval_fence_prep(fence);
> > > > > > -
> > > > > > -	ret = send_tlb_inval_ppgtt(gt, start, end, asid, fence->seqno);
> > > > > > - if (ret < 0)
> > > > > > - inval_fence_signal_unlocked(xe, fence);
> > > > > > -
> > > > > > -	mutex_unlock(&gt->uc.guc.ct.lock);
> > > > > > -
> > > > > > - return ret;
> > > > > > +	return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt,
> > > > > > +				  start, end, asid);
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > - * xe_tlb_inval_vm - Issue a TLB invalidation on this GT for
> > > > > > a
> > > > > > VM
> > > > > > + * xe_tlb_inval_vm() - Issue a TLB invalidation for a VM
> > > > > > * @tlb_inval: TLB invalidation client
> > > > > > * @vm: VM to invalidate
> > > > > > *
> > > > > > @@ -496,27 +303,22 @@ void xe_tlb_inval_vm(struct
> > > > > > xe_tlb_inval
> > > > > > *tlb_inval, struct xe_vm *vm)
> > > > > > {
> > > > > > struct xe_tlb_inval_fence fence;
> > > > > > u64 range = 1ull << vm->xe->info.va_bits;
> > > > > > - int ret;
> > > > > >
> > > > > > xe_tlb_inval_fence_init(tlb_inval, &fence, true);
> > > > > > -
> > > > > > -	ret = xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> > > > > > -	if (ret < 0)
> > > > > > -		return;
> > > > > > -
> > > > > > +	xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid);
> > > > > > xe_tlb_inval_fence_wait(&fence);
> > > > > > }
> > > > > >
> > > > > > /**
> > > > > > - * xe_tlb_inval_done_handler - TLB invalidation done handler
> > > > > > - * @gt: gt
> > > > > > + * xe_tlb_inval_done_handler() - TLB invalidation done
> > > > > > handler
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > * @seqno: seqno of invalidation that is done
> > > > > > *
> > > > > > * Update recv seqno, signal any TLB invalidation fences,
> > > > > > and
> > > > > > restart TDR
> > > > >
> > > > > I'd mention that this function is safe to be called from any
> > > > > context (i.e., process, atomic, and hardirq contexts are allowed).
> > > > >
> > > > > We might need to convert tlb_inval.pending_lock to a
> > > > > raw_spinlock_t
> > > > > for
> > > > > PREEMPT_RT enablement. Same for the GuC fast_lock. AFAIK we
> > > > > haven’t
> > > > > had
> > > > > any complaints, so maybe I’m just overthinking it, but also
> > > > > perhaps
> > > > > not.
> > > > >
> > > > > > */
> > > > > > -static void xe_tlb_inval_done_handler(struct xe_gt *gt, int
> > > > > > seqno)
> > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > int seqno)
> > > > > > {
> > > > > > - struct xe_device *xe = gt_to_xe(gt);
> > > > > > + struct xe_device *xe = tlb_inval->xe;
> > > > > > struct xe_tlb_inval_fence *fence, *next;
> > > > > > unsigned long flags;
> > > > > >
> > > > > > @@ -535,77 +337,53 @@ static void
> > > > > > xe_tlb_inval_done_handler(struct xe_gt *gt, int seqno)
> > > > > > * officially process the CT message like if racing
> > > > > > against
> > > > > > * process_g2h_msg().
> > > > > > */
> > > > > > -	spin_lock_irqsave(&gt->tlb_inval.pending_lock, flags);
> > > > > > -	if (tlb_inval_seqno_past(gt, seqno)) {
> > > > > > -		spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> > > > > > +	spin_lock_irqsave(&tlb_inval->pending_lock, flags);
> > > > > > +	if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) {
> > > > > > +		spin_unlock_irqrestore(&tlb_inval->pending_lock, flags);
> > > > > > return;
> > > > > > }
> > > > > >
> > > > > > - WRITE_ONCE(gt->tlb_inval.seqno_recv, seqno);
> > > > > > + WRITE_ONCE(tlb_inval->seqno_recv, seqno);
> > > > > >
> > > > > >  	list_for_each_entry_safe(fence, next,
> > > > > > -				 &gt->tlb_inval.pending_fences, link) {
> > > > > > +				 &tlb_inval->pending_fences, link) {
> > > > > > trace_xe_tlb_inval_fence_recv(xe, fence);
> > > > > >
> > > > > > - if (!tlb_inval_seqno_past(gt, fence->seqno))
> > > > > > +		if (!xe_tlb_inval_seqno_past(tlb_inval, fence->seqno))
> > > > > > break;
> > > > > >
> > > > > > - inval_fence_signal(xe, fence);
> > > > > > + xe_tlb_inval_fence_signal(fence);
> > > > > > }
> > > > > >
> > > > > > -	if (!list_empty(&gt->tlb_inval.pending_fences))
> > > > > > +	if (!list_empty(&tlb_inval->pending_fences))
> > > > > >  		mod_delayed_work(system_wq,
> > > > > > -				 &gt->tlb_inval.fence_tdr,
> > > > > > -				 tlb_timeout_jiffies(gt));
> > > > > > +				 &tlb_inval->fence_tdr,
> > > > > > +				 tlb_inval->ops->timeout_delay(tlb_inval));
> > > > > >  	else
> > > > > > -		cancel_delayed_work(&gt->tlb_inval.fence_tdr);
> > > > > > +		cancel_delayed_work(&tlb_inval->fence_tdr);
> > > > > >
> > > > > > -	spin_unlock_irqrestore(&gt->tlb_inval.pending_lock, flags);
> > > > > > -}
> > > > > > -
> > > > > > -/**
> > > > > > - * xe_guc_tlb_inval_done_handler - TLB invalidation done
> > > > > > handler
> > > > > > - * @guc: guc
> > > > > > - * @msg: message indicating TLB invalidation done
> > > > > > - * @len: length of message
> > > > > > - *
> > > > > > - * Parse seqno of TLB invalidation, wake any waiters for
> > > > > > seqno,
> > > > > > and signal any
> > > > > > - * invalidation fences for seqno. Algorithm for this depends
> > > > > > on
> > > > > > seqno being
> > > > > > - * received in-order and asserts this assumption.
> > > > > > - *
> > > > > > - * Return: 0 on success, -EPROTO for malformed messages.
> > > > > > - */
> > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32
> > > > > > *msg,
> > > > > > u32 len)
> > > > > > -{
> > > > > > - struct xe_gt *gt = guc_to_gt(guc);
> > > > > > -
> > > > > > - if (unlikely(len != 1))
> > > > > > - return -EPROTO;
> > > > > > -
> > > > > > - xe_tlb_inval_done_handler(gt, msg[0]);
> > > > > > -
> > > > > > - return 0;
> > > > > > + spin_unlock_irqrestore(&tlb_inval->pending_lock,
> > > > > > flags);
> > > > > > }
> > > > > >
> > > > > > static const char *
> > > > > > -inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > > +xe_inval_fence_get_driver_name(struct dma_fence *dma_fence)
> > > > > > {
> > > > > > return "xe";
> > > > > > }
> > > > > >
> > > > > > static const char *
> > > > > > -inval_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > > > > +xe_inval_fence_get_timeline_name(struct dma_fence
> > > > > > *dma_fence)
> > > > > > {
> > > > > > - return "inval_fence";
> > > > > > + return "tlb_inval_fence";
> > > > > > }
> > > > > >
> > > > > > static const struct dma_fence_ops inval_fence_ops = {
> > > > > > - .get_driver_name = inval_fence_get_driver_name,
> > > > > > - .get_timeline_name = inval_fence_get_timeline_name,
> > > > > > + .get_driver_name = xe_inval_fence_get_driver_name,
> > > > > > + .get_timeline_name =
> > > > > > xe_inval_fence_get_timeline_name,
> > > > > > };
> > > > > >
> > > > > > /**
> > > > > > - * xe_tlb_inval_fence_init - Initialize TLB invalidation
> > > > > > fence
> > > > > > + * xe_tlb_inval_fence_init() - Initialize TLB invalidation
> > > > > > fence
> > > > > > * @tlb_inval: TLB invalidation client
> > > > > > * @fence: TLB invalidation fence to initialize
> > > > > > * @stack: fence is stack variable
> > > > > > @@ -618,15 +396,12 @@ void xe_tlb_inval_fence_init(struct
> > > > > > xe_tlb_inval *tlb_inval,
> > > > > > struct xe_tlb_inval_fence
> > > > > > *fence,
> > > > > > bool stack)
> > > > > > {
> > > > > > - struct xe_gt *gt = tlb_inval->private;
> > > > > > -
> > > > > > - xe_pm_runtime_get_noresume(gt_to_xe(gt));
> > > > > > + xe_pm_runtime_get_noresume(tlb_inval->xe);
> > > > > >
> > > > > > -	spin_lock_irq(&gt->tlb_inval.lock);
> > > > > > -	dma_fence_init(&fence->base, &inval_fence_ops,
> > > > > > -		       &gt->tlb_inval.lock,
> > > > > > +	spin_lock_irq(&tlb_inval->lock);
> > > > > > +	dma_fence_init(&fence->base, &inval_fence_ops, &tlb_inval->lock,
> > > > > >  		       dma_fence_context_alloc(1), 1);
> > > > > > -	spin_unlock_irq(&gt->tlb_inval.lock);
> > > > > > + spin_unlock_irq(&tlb_inval->lock);
> > > > >
> > > > > While here, 'fence_lock' is probably a better name.
> > > > >
> > > > > Matt
> > > > >
> > > > > > INIT_LIST_HEAD(&fence->link);
> > > > > > if (stack)
> > > > > > set_bit(FENCE_STACK_BIT, &fence->base.flags);
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > index 7adee3f8c551..cdeafc8d4391 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.h
> > > > > > @@ -18,24 +18,30 @@ struct xe_vma;
> > > > > > int xe_gt_tlb_inval_init_early(struct xe_gt *gt);
> > > > > >
> > > > > > void xe_tlb_inval_reset(struct xe_tlb_inval *tlb_inval);
> > > > > > -int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > -void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > > > xe_vm *vm);
> > > > > > int xe_tlb_inval_all(struct xe_tlb_inval *tlb_inval,
> > > > > > struct xe_tlb_inval_fence *fence);
> > > > > > +int xe_tlb_inval_ggtt(struct xe_tlb_inval *tlb_inval);
> > > > > > +void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct
> > > > > > xe_vm *vm);
> > > > > > int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval,
> > > > > > struct xe_tlb_inval_fence *fence,
> > > > > > u64 start, u64 end, u32 asid);
> > > > > > -int xe_guc_tlb_inval_done_handler(struct xe_guc *guc, u32
> > > > > > *msg,
> > > > > > u32 len);
> > > > > >
> > > > > > void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval,
> > > > > > struct xe_tlb_inval_fence
> > > > > > *fence,
> > > > > > bool stack);
> > > > > > -void xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence
> > > > > > *fence);
> > > > > >
> > > > > > +/**
> > > > > > + * xe_tlb_inval_fence_wait() - TLB invalidiation fence wait
> > > > > > + * @fence: TLB invalidation fence to wait on
> > > > > > + *
> > > > > > + * Wait on a TLB invalidiation fence until it signals, non
> > > > > > interruptable
> > > > > > + */
> > > > > > static inline void
> > > > > > xe_tlb_inval_fence_wait(struct xe_tlb_inval_fence *fence)
> > > > > > {
> > > > > > dma_fence_wait(&fence->base, false);
> > > > > > }
> > > > > >
> > > > > > +void xe_tlb_inval_done_handler(struct xe_tlb_inval
> > > > > > *tlb_inval,
> > > > > > int seqno);
> > > > > > +
> > > > > > #endif /* _XE_TLB_INVAL_ */
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > index 05b6adc929bb..c1ad96d24fc8 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> > > > > > @@ -9,10 +9,85 @@
> > > > > > #include <linux/workqueue.h>
> > > > > > #include <linux/dma-fence.h>
> > > > > >
> > > > > > -/** struct xe_tlb_inval - TLB invalidation client */
> > > > > > +struct xe_tlb_inval;
> > > > > > +
> > > > > > +/** struct xe_tlb_inval_ops - TLB invalidation ops (backend)
> > > > > > */
> > > > > > +struct xe_tlb_inval_ops {
> > > > > > + /**
> > > > > > + * @all: Invalidate all TLBs
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + * @seqno: Seqno of TLB invalidation
> > > > > > + *
> > > > > > + * Return 0 on success, -ECANCELED if backend is mid-
> > > > > > reset, error on
> > > > > > + * failure
> > > > > > + */
> > > > > > + int (*all)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > seqno);
> > > > > > +
> > > > > > + /**
> > > > > > + * @ggtt: Invalidate global translation TLBs
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + * @seqno: Seqno of TLB invalidation
> > > > > > + *
> > > > > > + * Return 0 on success, -ECANCELED if backend is mid-
> > > > > > reset, error on
> > > > > > + * failure
> > > > > > + */
> > > > > > + int (*ggtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > seqno);
> > > > > > +
> > > > > > + /**
> > > > > > + * @ppttt: Invalidate per-process translation TLBs
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + * @seqno: Seqno of TLB invalidation
> > > > > > + * @start: Start address
> > > > > > + * @end: End address
> > > > > > + * @asid: Address space ID
> > > > > > + *
> > > > > > + * Return 0 on success, -ECANCELED if backend is mid-
> > > > > > reset, error on
> > > > > > + * failure
> > > > > > + */
> > > > > > + int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32
> > > > > > seqno,
> > > > > > u64 start,
> > > > > > + u64 end, u32 asid);
> > > > > > +
> > > > > > + /**
> > > > > > + * @initialized: Backend is initialized
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + *
> > > > > > + * Return: True if back is initialized, False
> > > > > > otherwise
> > > > > > + */
> > > > > > + bool (*initialized)(struct xe_tlb_inval *tlb_inval);
> > > > > > +
> > > > > > + /**
> > > > > > + * @flush: Flush pending TLB invalidations
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + */
> > > > > > + void (*flush)(struct xe_tlb_inval *tlb_inval);
> > > > > > +
> > > > > > + /**
> > > > > > + * @timeout_delay: Timeout delay for TLB invalidation
> > > > > > + * @tlb_inval: TLB invalidation client
> > > > > > + *
> > > > > > + * Return: Timeout delay for TLB invalidation in
> > > > > > jiffies
> > > > > > + */
> > > > > > + long (*timeout_delay)(struct xe_tlb_inval
> > > > > > *tlb_inval);
> > > > > > +
> > > > > > + /**
> > > > > > + * @lock: Lock resources protecting the backend seqno
> > > > > > management
> > > > > > + */
> > > > > > + void (*lock)(struct xe_tlb_inval *tlb_inval);
> > > > > > +
> > > > > > + /**
> > > > > > + * @unlock: Lock resources protecting the backend
> > > > > > seqno
> > > > > > management
> > > > > > + */
> > > > > > + void (*unlock)(struct xe_tlb_inval *tlb_inval);
> > > > > > +};
> > > > > > +
> > > > > > +/** struct xe_tlb_inval - TLB invalidation client (frontend)
> > > > > > */
> > > > > > struct xe_tlb_inval {
> > > > > > /** @private: Backend private pointer */
> > > > > > void *private;
> > > > > > + /** @xe: Pointer to Xe device */
> > > > > > + struct xe_device *xe;
> > > > > > + /** @ops: TLB invalidation ops */
> > > > > > + const struct xe_tlb_inval_ops *ops;
> > > > > > /** @tlb_inval.seqno: TLB invalidation seqno,
> > > > > > protected
> > > > > > by CT lock */
> > > > > > #define TLB_INVALIDATION_SEQNO_MAX 0x100000
> > > > > > int seqno;
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > >
>