[Intel-xe] [PATCH] drm/xe: fix tlb_invalidation_seqno_past()

Christopher Snowhill kode54 at gmail.com
Mon May 8 03:42:13 UTC 2023


Why is XE_FENCE_INITIAL_SEQNO defined as (-127) when seqno variables
are unsigned almost everywhere, except in the logic that checks for
wraps?
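
For reference, a small sketch of the arithmetic behind the question. It assumes only the (-127) value quoted above. INITIAL_SEQNO, initial_seqno_u32() and seqno_later() are hypothetical names for illustration, not the driver's own code. Stored in a 32-bit unsigned field, -127 becomes 4294967169, the same value as the seqno in the "Timedout job" line quoted below, and a signed 32-bit difference still orders values correctly across the wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the (-127) value quoted above; the name is hypothetical. */
#define INITIAL_SEQNO (-127)

/* Value observed when the initial seqno is stored in a 32-bit unsigned field. */
static uint32_t initial_seqno_u32(void)
{
	return (uint32_t)INITIAL_SEQNO; /* 2^32 - 127 */
}

/*
 * Wrap-safe "a is later than b": subtract as unsigned, then interpret the
 * difference as a signed 32-bit value. Holds while the two values are less
 * than 2^31 apart.
 */
static bool seqno_later(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

With these helpers, a seqno of 1 compares as later than 4294967169, because the unsigned difference is 128.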

On Sun, May 7, 2023 at 8:26 PM Christopher Snowhill <kode54 at gmail.com> wrote:
>
> False alarm. Looks like there's a different seqno overflow somewhere in here:
>
> [  854.230925] xe 0000:28:00.0: [drm] Engine reset: guc_id=59
> [  854.236564] xe 0000:28:00.0: [drm] Timedout job: seqno=4294967169, guc_id=59, flags=0x8
> [  859.885349] xe 0000:28:00.0: [drm] Ioctl argument check failed at drivers/gpu/drm/xe/xe_exec.c:178: engine->flags & ENGINE_FLAG_BANNED
> [  859.885366] xe 0000:28:00.0: [drm] Ioctl argument check failed at drivers/gpu/drm/xe/xe_exec.c:178: engine->flags & ENGINE_FLAG_BANNED
>
> Also, curses to gmail for not defaulting to plain text mode.
>
>
> On Sun, May 7, 2023 at 8:19 PM Christopher Snowhill <kode54 at gmail.com> wrote:
> >
> > Wow, this patch made intel-compute-runtime suddenly start working properly instead of causing a "GPU hang" that wasn't really a hang but instead a seqno overflow.
> >
> > On Sun, May 7, 2023 at 5:53 PM Matthew Brost <matthew.brost at intel.com> wrote:
> >>
> >> On Fri, May 05, 2023 at 03:49:10PM +0100, Matthew Auld wrote:
> >> > Checking seqno_recv >= seqno looks like it will incorrectly report true
> >> > when the seqno has wrapped (not unlikely given
> >> > TLB_INVALIDATION_SEQNO_MAX). Calling xe_gt_tlb_invalidation_wait() might
> >> > then return before the flush has been completed by the GuC.
> >> >
> >> > Fix this by treating a large negative delta as an indication that the
> >> > seqno has wrapped around. Similar to how we treat a large positive delta
> >> > as an indication that the seqno_recv must have wrapped around, but in
> >> > that case the seqno has likely also signalled.
> >> >
> >> > It looks like we could also potentially make the seqno use the full
> >> > 32bits as supported by the GuC.
> >>
> >> Yeah, we definitely could use more of the space, but in the end we
> >> still have the seqno wrap issue. I think I set this to a low value to
> >> prove the wrapping protection worked (it didn't) by triggering wraps
> >> more often than a full 32-bit seqno would.
> >>
> >> With that, this patch LGTM.
> >>
> >> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
> >>
> >> >
> >> > Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> >> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> >> > Cc: Matthew Brost <matthew.brost at intel.com>
> >> > ---
> >> >  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 7 ++++---
> >> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> >> > index 604f189dbd70..67822b3dd353 100644
> >> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> >> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> >> > @@ -251,14 +251,15 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
> >> >
> >> >  static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
> >> >  {
> >> > -     if (gt->tlb_invalidation.seqno_recv >= seqno)
> >> > -             return true;
> >> > +     if (seqno - gt->tlb_invalidation.seqno_recv <
> >> > +         -(TLB_INVALIDATION_SEQNO_MAX / 2))
> >> > +             return false;
> >> >
> >> >       if (seqno - gt->tlb_invalidation.seqno_recv >
> >> >           (TLB_INVALIDATION_SEQNO_MAX / 2))
> >> >               return true;
> >> >
> >> > -     return false;
> >> > +     return gt->tlb_invalidation.seqno_recv >= seqno;
> >> >  }
> >> >
> >> >  /**
> >> > --
> >> > 2.40.0
> >> >
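
The behaviour change in the diff above can be reproduced in a standalone sketch. The two functions copy the pre-patch and post-patch logic of tlb_invalidation_seqno_past(), with seqno_recv passed as a plain argument instead of being read from struct xe_gt. SEQNO_MAX and its value are assumptions chosen for illustration; the real TLB_INVALIDATION_SEQNO_MAX is defined in the driver and is not shown in this thread.

```c
#include <stdbool.h>

/* Assumed stand-in for TLB_INVALIDATION_SEQNO_MAX; value is illustrative only. */
#define SEQNO_MAX 0x100000

/* Pre-patch logic: the plain >= check runs first and misfires after a wrap. */
static bool seqno_past_old(int seqno_recv, int seqno)
{
	if (seqno_recv >= seqno)
		return true;

	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	return false;
}

/* Post-patch logic: large deltas in either direction are treated as wraps. */
static bool seqno_past_new(int seqno_recv, int seqno)
{
	/* Large negative delta: seqno has wrapped, seqno_recv has not yet. */
	if (seqno - seqno_recv < -(SEQNO_MAX / 2))
		return false;

	/* Large positive delta: seqno_recv has wrapped past an older seqno. */
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	return seqno_recv >= seqno;
}
```

Under these assumptions, with seqno_recv = SEQNO_MAX - 2 and a freshly wrapped seqno = 3, the old check reports the invalidation as already complete while the new one reports it as still pending, which matches the early-return problem described in the commit message.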
