[Intel-xe] [PATCH v4 3/7] drm/xe/tlb: increment next seqno after successful CT send

Matthew Auld matthew.auld at intel.com
Thu Jul 6 15:22:28 UTC 2023


On 06/07/2023 16:15, Matthew Brost wrote:
> On Thu, Jul 06, 2023 at 10:42:37AM +0100, Matthew Auld wrote:
>> On 06/07/2023 04:59, Matthew Brost wrote:
>>> On Wed, Jul 05, 2023 at 05:06:06PM +0100, Matthew Auld wrote:
>>>> If we are in the middle of a GT reset or similar the CT might be
>>>> disabled, such that the CT send fails. However we already incremented
>>>> gt->tlb_invalidation.seqno which might lead to warnings, since we
>>>> effectively just skipped a seqno:
>>>>
>>>>       0000:00:02.0: drm_WARN_ON(expected_seqno != msg[0])
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>> Cc: José Roberto de Souza <jose.souza at intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 11 ++++++-----
>>>>    1 file changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> index 2fcb477604e2..b38da572d268 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> @@ -124,10 +124,6 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>>>    		trace_xe_gt_tlb_invalidation_fence_send(fence);
>>>>    	}
>>>>    	action[1] = seqno;
>>>> -	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
>>>> -		TLB_INVALIDATION_SEQNO_MAX;
>>>> -	if (!gt->tlb_invalidation.seqno)
>>>> -		gt->tlb_invalidation.seqno = 1;
>>>>    	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
>>>>    				    G2H_LEN_DW_TLB_INVALIDATE, 1);
>>>>    	if (!ret && fence) {
>>>> @@ -137,8 +133,13 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>>>    					   &gt->tlb_invalidation.fence_tdr,
>>>>    					   TLB_TIMEOUT);
>>>>    	}
>>>> -	if (!ret)
>>>
>>> Do we now (after this entire series) have another race where the
>>> below warn could fire as the CT fast path executes before we update the
>>> seqno value? Would it be better to just roll back the seqno on error?
>>
>> Ohh, so we do the CT send and the g2h message gets processed double quick
>> on the fast-path, before we've even had a chance to add the fence to the
>> list? I missed that. Maybe the easiest is just to sample seqno_recv here
>> under the lock, and signal the fence directly if the seqno was already
>> written. Thanks for catching that.
>>
> 
> Yep, that is a race in the later patches, in particular this snippet, right?
> 
>   	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
>   				    G2H_LEN_DW_TLB_INVALIDATE, 1);
>   	if (!ret && fence) {
> +		spin_lock_irq(&gt->tlb_invalidation.pending_lock);
>   		fence->invalidation_time = ktime_get();
> -		if (queue_work)
> +		list_add_tail(&fence->link,
> +			      &gt->tlb_invalidation.pending_fences);
> +
> +		if (list_is_singular(&gt->tlb_invalidation.pending_fences))
>   			queue_delayed_work(system_wq,
>   					   &gt->tlb_invalidation.fence_tdr,
>   					   TLB_TIMEOUT);
> +
> +		spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
> +	} else if (ret < 0 && fence) {
> +		trace_xe_gt_tlb_invalidation_fence_signal(fence);
> +		dma_fence_signal(&fence->base);
> +		dma_fence_put(&fence->base);
>   	}
> 
> So before the list_add_tail we should check the recv seqno and possibly
> signal the fence? That makes sense to me.

Yup, exactly that.
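
Something like the below is what I had in mind, then (rough, untested sketch
against the patch #7 snippet above; tlb_invalidation_seqno_past() is just a
stand-in for whatever check against seqno_recv we end up with, and it assumes
the G2H side updates seqno_recv under the same pending_lock):

	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
				    G2H_LEN_DW_TLB_INVALIDATE, 1);
	if (!ret && fence) {
		spin_lock_irq(&gt->tlb_invalidation.pending_lock);
		/*
		 * The G2H handler may already have run on the CT fast path
		 * before we grabbed the lock, in which case the fence would
		 * never be found on the pending list. Signal it directly
		 * instead of adding it.
		 */
		if (tlb_invalidation_seqno_past(gt, seqno)) {
			trace_xe_gt_tlb_invalidation_fence_signal(fence);
			dma_fence_signal(&fence->base);
			dma_fence_put(&fence->base);
		} else {
			fence->invalidation_time = ktime_get();
			list_add_tail(&fence->link,
				      &gt->tlb_invalidation.pending_fences);

			if (list_is_singular(&gt->tlb_invalidation.pending_fences))
				queue_delayed_work(system_wq,
						   &gt->tlb_invalidation.fence_tdr,
						   TLB_TIMEOUT);
		}
		spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
	} else if (ret < 0 && fence) {
		trace_xe_gt_tlb_invalidation_fence_signal(fence);
		dma_fence_signal(&fence->base);
		dma_fence_put(&fence->base);
	}

That way the fence either lands on the pending list before the G2H handler
goes looking for it, or gets signalled directly here if the invalidation
already completed.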

>   
>> I don't think there is any race with updating tlb_invalidation.seqno, since
>> the receiver side only considers seqno_recv. So I'm not sure it matters
>> where we increment tlb_invalidation.seqno, so long as we are under ct->lock
>> and ideally don't leave it incremented on error for the next user. I can do
>> the roll back instead if you prefer?
>>
> 
> Right, so this patch itself is actually fine. It's patch #7, as discussed
> above, that needs a fix.
> 
> Ok, with that:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>

Thanks.
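
For reference, the roll-back alternative discussed above would have looked
roughly like this (sketch only; everything here is still under ct->lock, so
restoring the old value should be safe). Incrementing only after a successful
send seemed marginally simpler, so I'll keep the patch as-is:

	int prev_seqno = gt->tlb_invalidation.seqno;

	/* Bump the seqno up front, as before... */
	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
		TLB_INVALIDATION_SEQNO_MAX;
	if (!gt->tlb_invalidation.seqno)
		gt->tlb_invalidation.seqno = 1;

	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
				    G2H_LEN_DW_TLB_INVALIDATE, 1);
	...
	/* ...but undo it on failure, so the next user doesn't skip a seqno. */
	if (ret < 0)
		gt->tlb_invalidation.seqno = prev_seqno;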

> 
>>>
>>> Matt
>>>
>>>> +	if (!ret) {
>>>> +		gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
>>>> +			TLB_INVALIDATION_SEQNO_MAX;
>>>> +		if (!gt->tlb_invalidation.seqno)
>>>> +			gt->tlb_invalidation.seqno = 1;
>>>>    		ret = seqno;
>>>> +	}
>>>>    	if (ret < 0 && fence)
>>>>    		invalidation_fence_signal(fence);
>>>>    	mutex_unlock(&guc->ct.lock);
>>>> -- 
>>>> 2.41.0
>>>>

