<div dir="ltr">Wow, this patch made intel-compute-runtime suddenly start working properly instead of causing a "GPU hang" that wasn't really a hang but instead a seqno overflow.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, May 7, 2023 at 5:53 PM Matthew Brost <<a href="mailto:matthew.brost@intel.com">matthew.brost@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, May 05, 2023 at 03:49:10PM +0100, Matthew Auld wrote:<br>
> Checking seqno_recv >= seqno looks like it will incorrectly report true<br>
> when the seqno has wrapped (not unlikely given<br>
> TLB_INVALIDATION_SEQNO_MAX). Calling xe_gt_tlb_invalidation_wait() might<br>
> then return before the flush has been completed by the GuC.<br>
> <br>
> Fix this by treating a large negative delta as an indication that the<br>
> seqno has wrapped around. Similar to how we treat a large positive delta<br>
> as an indication that the seqno_recv must have wrapped around, but in<br>
> that case the seqno has likely also signalled.<br>
> <br>
> It looks like we could also potentially make the seqno use the full<br>
> 32bits as supported by the GuC.<br>
<br>
Yea we def could use more of the space but in the end we have the seqno<br>
wrap issue. I think I set this to a low value to prove the wrapping<br>
protection worked (it didn't) by triigering wraps more often than the<br>
wrapping 32 bits.<br>
<br>
With, this patch LGTM.<br>
<br>
Reviewed-by: Matthew Brost <<a href="mailto:matthew.brost@intel.com" target="_blank">matthew.brost@intel.com</a>><br>
<br>
> <br>
> Signed-off-by: Matthew Auld <<a href="mailto:matthew.auld@intel.com" target="_blank">matthew.auld@intel.com</a>><br>
> Cc: Thomas Hellström <<a href="mailto:thomas.hellstrom@linux.intel.com" target="_blank">thomas.hellstrom@linux.intel.com</a>><br>
> Cc: Matthew Brost <<a href="mailto:matthew.brost@intel.com" target="_blank">matthew.brost@intel.com</a>><br>
> ---<br>
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 7 ++++---<br>
> 1 file changed, 4 insertions(+), 3 deletions(-)<br>
> <br>
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c<br>
> index 604f189dbd70..67822b3dd353 100644<br>
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c<br>
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c<br>
> @@ -251,14 +251,15 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,<br>
> <br>
> static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)<br>
> {<br>
> - if (gt->tlb_invalidation.seqno_recv >= seqno)<br>
> - return true;<br>
> + if (seqno - gt->tlb_invalidation.seqno_recv <<br>
> + -(TLB_INVALIDATION_SEQNO_MAX / 2))<br>
> + return false;<br>
> <br>
> if (seqno - gt->tlb_invalidation.seqno_recv ><br>
> (TLB_INVALIDATION_SEQNO_MAX / 2))<br>
> return true;<br>
> <br>
> - return false;<br>
> + return gt->tlb_invalidation.seqno_recv >= seqno;<br>
> }<br>
> <br>
> /**<br>
> -- <br>
> 2.40.0<br>
> <br>
</blockquote></div>