[Intel-xe] [PATCH v4 3/7] drm/xe/tlb: increment next seqno after successful CT send

Matthew Auld matthew.auld at intel.com
Thu Jul 6 09:42:37 UTC 2023


On 06/07/2023 04:59, Matthew Brost wrote:
> On Wed, Jul 05, 2023 at 05:06:06PM +0100, Matthew Auld wrote:
>> If we are in the middle of a GT reset or similar the CT might be
>> disabled, such that the CT send fails. However we already incremented
>> gt->tlb_invalidation.seqno which might lead to warnings, since we
>> effectively just skipped a seqno:
>>
>>      0000:00:02.0: drm_WARN_ON(expected_seqno != msg[0])
>>
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: José Roberto de Souza <jose.souza at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> index 2fcb477604e2..b38da572d268 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> @@ -124,10 +124,6 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>   		trace_xe_gt_tlb_invalidation_fence_send(fence);
>>   	}
>>   	action[1] = seqno;
>> -	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
>> -		TLB_INVALIDATION_SEQNO_MAX;
>> -	if (!gt->tlb_invalidation.seqno)
>> -		gt->tlb_invalidation.seqno = 1;
>>   	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
>>   				    G2H_LEN_DW_TLB_INVALIDATE, 1);
>>   	if (!ret && fence) {
>> @@ -137,8 +133,13 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>   					   &gt->tlb_invalidation.fence_tdr,
>>   					   TLB_TIMEOUT);
>>   	}
>> -	if (!ret)
> 
> Do we now (after this entire series) have another race where the
> below warn could fire if the CT fast path executes before we update the
> seqno value? Would it be better to just roll back the seqno on error?

Ohh, so we do the CT send, and the g2h message gets processed on the 
fast-path before we even had a chance to add the fence to the list? 
I missed that. Maybe the easiest fix is to sample seqno_recv here under 
the lock, and signal the fence directly if the seqno was already 
written. Thanks for catching that.
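If we go the seqno_recv-sampling route, the comparison has to account for seqno wrap-around. Roughly something like the below, as a self-contained sketch; the helper name and the MAX value here are illustrative, not the driver's actual code:

```c
/* Sketch of a wrap-aware "has this seqno already been acked?" check,
 * as would be needed when sampling seqno_recv under ct->lock right
 * after a successful send. Helper name and MAX are hypothetical.
 */
#include <stdbool.h>

#define TLB_INVALIDATION_SEQNO_MAX 0x100000 /* hypothetical value */

bool tlb_seqno_past(int seqno_recv, int seqno)
{
	if (seqno_recv >= seqno)
		/* acked, unless recv is actually far behind after a wrap */
		return (seqno_recv - seqno) < (TLB_INVALIDATION_SEQNO_MAX / 2);
	/* seqno_recv may have wrapped past 0 back to small values */
	return (seqno - seqno_recv) > (TLB_INVALIDATION_SEQNO_MAX / 2);
}
```

With that, the send path could signal the fence immediately instead of arming the timeout when the ack already landed.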

I don't think there is any race with updating tlb_invalidation.seqno 
itself, since the receiver side only considers seqno_recv. So I'm not 
sure it matters where we increment tlb_invalidation.seqno, so long as we 
are under ct->lock and ideally don't leave it incremented on error for 
the next user. I can do the rollback instead if you prefer?
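For reference, the rollback alternative would just be the inverse of the skip-zero wrap increment in the diff. A minimal standalone sketch; the helper names and the MAX value are mine, only the wrap behaviour mirrors the patch:

```c
/* Skip-zero wrap increment as in the diff, plus a hypothetical
 * rollback counterpart for the error path. Seqno 0 is never handed
 * out, so valid seqnos cycle through 1..MAX-1.
 */
#define TLB_INVALIDATION_SEQNO_MAX 0x100000 /* hypothetical value */

int tlb_seqno_next(int seqno)
{
	seqno = (seqno + 1) % TLB_INVALIDATION_SEQNO_MAX;
	if (!seqno)
		seqno = 1; /* 0 is never used as a seqno */
	return seqno;
}

/* Undo the increment if the CT send failed. */
int tlb_seqno_rollback(int seqno)
{
	return seqno == 1 ? TLB_INVALIDATION_SEQNO_MAX - 1 : seqno - 1;
}
```

Either way the invariant is the same: on return from send_tlb_invalidation() with an error, tlb_invalidation.seqno is unchanged for the next caller.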

> 
> Matt
> 
>> +	if (!ret) {
>> +		gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
>> +			TLB_INVALIDATION_SEQNO_MAX;
>> +		if (!gt->tlb_invalidation.seqno)
>> +			gt->tlb_invalidation.seqno = 1;
>>   		ret = seqno;
>> +	}
>>   	if (ret < 0 && fence)
>>   		invalidation_fence_signal(fence);
>>   	mutex_unlock(&guc->ct.lock);
>> -- 
>> 2.41.0
>>
