[Intel-xe] [PATCH] drm/xe: fix tlb_invalidation_seqno_past()

Christopher Snowhill kode54 at gmail.com
Mon May 8 03:26:32 UTC 2023


False alarm. Looks like there's a different seqno overflow somewhere in here:

[  854.230925] xe 0000:28:00.0: [drm] Engine reset: guc_id=59
[  854.236564] xe 0000:28:00.0: [drm] Timedout job: seqno=4294967169, guc_id=59, flags=0x8
[  859.885349] xe 0000:28:00.0: [drm] Ioctl argument check failed at drivers/gpu/drm/xe/xe_exec.c:178: engine->flags & ENGINE_FLAG_BANNED
[  859.885366] xe 0000:28:00.0: [drm] Ioctl argument check failed at drivers/gpu/drm/xe/xe_exec.c:178: engine->flags & ENGINE_FLAG_BANNED
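
For reference, the seqno in the "Timedout job" line sits just below 2^32, which is why this looks like an overflow. The sketch below only does the arithmetic on that one value; that the job seqno is a 32-bit counter in the driver is an assumption on my part, and both helper names are made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not driver code. They assume the logged seqno is
 * a 32-bit counter, which the log line alone does not prove.
 */

/* Number of increments left before a 32-bit counter wraps back to 0. */
static uint64_t increments_until_wrap32(uint32_t seqno)
{
	return (UINT64_C(1) << 32) - seqno;
}

/* The same 32 bits interpreted as a two's complement signed value. */
static int64_t as_signed32(uint32_t seqno)
{
	return seqno >= UINT32_C(0x80000000)
		? (int64_t)seqno - (INT64_C(1) << 32)
		: (int64_t)seqno;
}
```

Feeding in 4294967169 (0xffffff81) gives 127 increments until the wrap, and -127 when the same bits are read as signed.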

Also, curses to gmail for not defaulting to plain text mode.


On Sun, May 7, 2023 at 8:19 PM Christopher Snowhill <kode54 at gmail.com> wrote:
>
> Wow, this patch made intel-compute-runtime suddenly start working properly instead of causing a "GPU hang" that wasn't really a hang but instead a seqno overflow.
>
> On Sun, May 7, 2023 at 5:53 PM Matthew Brost <matthew.brost at intel.com> wrote:
>>
>> On Fri, May 05, 2023 at 03:49:10PM +0100, Matthew Auld wrote:
>> > Checking seqno_recv >= seqno looks like it will incorrectly report true
>> > when the seqno has wrapped (not unlikely given
>> > TLB_INVALIDATION_SEQNO_MAX). Calling xe_gt_tlb_invalidation_wait() might
>> > then return before the flush has been completed by the GuC.
>> >
>> > Fix this by treating a large negative delta as an indication that the
>> > seqno has wrapped around. Similar to how we treat a large positive delta
>> > as an indication that the seqno_recv must have wrapped around, but in
>> > that case the seqno has likely also signalled.
>> >
>> > It looks like we could also potentially make the seqno use the full
>> > 32bits as supported by the GuC.
>>
>> Yea we def could use more of the space but in the end we have the seqno
>> wrap issue. I think I set this to a low value to prove the wrapping
>> protection worked (it didn't) by triggering wraps more often than a
>> full 32-bit counter would.
>>
>> With that, this patch LGTM.
>>
>> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>>
>> >
>> > Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>> > Cc: Matthew Brost <matthew.brost at intel.com>
>> > ---
>> >  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 7 ++++---
>> >  1 file changed, 4 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> > index 604f189dbd70..67822b3dd353 100644
>> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> > @@ -251,14 +251,15 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
>> >
>> >  static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
>> >  {
>> > -     if (gt->tlb_invalidation.seqno_recv >= seqno)
>> > -             return true;
>> > +     if (seqno - gt->tlb_invalidation.seqno_recv <
>> > +         -(TLB_INVALIDATION_SEQNO_MAX / 2))
>> > +             return false;
>> >
>> >       if (seqno - gt->tlb_invalidation.seqno_recv >
>> >           (TLB_INVALIDATION_SEQNO_MAX / 2))
>> >               return true;
>> >
>> > -     return false;
>> > +     return gt->tlb_invalidation.seqno_recv >= seqno;
>> >  }
>> >
>> >  /**
>> > --
>> > 2.40.0
>> >
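
To make the before/after behaviour of the quoted diff easier to see, here is a standalone sketch of both versions of the check. It takes seqno_recv as a plain argument instead of reading it from struct xe_gt, and SEQNO_MAX is a placeholder: the real value of TLB_INVALIDATION_SEQNO_MAX is not shown in this thread, so 0x100000 is only an assumed number. The function names are likewise made up.

```c
#include <stdbool.h>

/* Assumed placeholder; the thread does not state the real maximum. */
#define SEQNO_MAX 0x100000

/* Logic before the patch: a plain >= test runs first. */
static bool seqno_past_old(int seqno_recv, int seqno)
{
	if (seqno_recv >= seqno)
		return true;

	/* Large positive delta: seqno_recv has wrapped past seqno. */
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	return false;
}

/* Logic after the patch: both wrap cases are checked before >=. */
static bool seqno_past_new(int seqno_recv, int seqno)
{
	/* Large negative delta: seqno has wrapped, seqno_recv has not. */
	if (seqno - seqno_recv < -(SEQNO_MAX / 2))
		return false;

	/* Large positive delta: seqno_recv has wrapped past seqno. */
	if (seqno - seqno_recv > (SEQNO_MAX / 2))
		return true;

	return seqno_recv >= seqno;
}
```

With seqno_recv = SEQNO_MAX - 1 and a freshly wrapped seqno = 2, the old version reports the invalidation as already past while the new one does not, which matches the early return from xe_gt_tlb_invalidation_wait() described in the commit message.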

