[Intel-xe] [PATCH v4 3/7] drm/xe/tlb: increment next seqno after successful CT send
Matthew Auld
matthew.auld at intel.com
Thu Jul 6 15:22:28 UTC 2023
On 06/07/2023 16:15, Matthew Brost wrote:
> On Thu, Jul 06, 2023 at 10:42:37AM +0100, Matthew Auld wrote:
>> On 06/07/2023 04:59, Matthew Brost wrote:
>>> On Wed, Jul 05, 2023 at 05:06:06PM +0100, Matthew Auld wrote:
>>>> If we are in the middle of a GT reset or similar the CT might be
>>>> disabled, such that the CT send fails. However we already incremented
>>>> gt->tlb_invalidation.seqno which might lead to warnings, since we
>>>> effectively just skipped a seqno:
>>>>
>>>> 0000:00:02.0: drm_WARN_ON(expected_seqno != msg[0])
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>> Cc: José Roberto de Souza <jose.souza at intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 11 ++++++-----
>>>> 1 file changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> index 2fcb477604e2..b38da572d268 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> @@ -124,10 +124,6 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>>> trace_xe_gt_tlb_invalidation_fence_send(fence);
>>>> }
>>>> action[1] = seqno;
>>>> - gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
>>>> - TLB_INVALIDATION_SEQNO_MAX;
>>>> - if (!gt->tlb_invalidation.seqno)
>>>> - gt->tlb_invalidation.seqno = 1;
>>>> ret = xe_guc_ct_send_locked(&guc->ct, action, len,
>>>> G2H_LEN_DW_TLB_INVALIDATE, 1);
>>>> if (!ret && fence) {
>>>> @@ -137,8 +133,13 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>>> >->tlb_invalidation.fence_tdr,
>>>> TLB_TIMEOUT);
>>>> }
>>>> - if (!ret)
>>>
>>> Do we now (after this entire series) have the another race where the
>>> below warn could fire as the CT fast path executes before we update the
>>> seqno value? Would it be better to just roll back the seqno on error?
>>
>> Ohh, so we do the CT send, we process the g2h message double quick on the
>> fast-path, before we even had a chance to add the fence to the list? I
>> missed that. Maybe easiest is just to sample seqno_recv here under the lock,
>> and signal the fence directly if the seqno was already written. Thanks for
>> catching that.
>>
>
> Yep, that is a race is the later patches, in particular this snippet, right?
>
> ret = xe_guc_ct_send_locked(&guc->ct, action, len,
> G2H_LEN_DW_TLB_INVALIDATE, 1);
> if (!ret && fence) {
> + spin_lock_irq(>->tlb_invalidation.pending_lock);
> fence->invalidation_time = ktime_get();
> - if (queue_work)
> + list_add_tail(&fence->link,
> + >->tlb_invalidation.pending_fences);
> +
> + if (list_is_singular(>->tlb_invalidation.pending_fences))
> queue_delayed_work(system_wq,
> >->tlb_invalidation.fence_tdr,
> TLB_TIMEOUT);
> +
> + spin_unlock_irq(>->tlb_invalidation.pending_lock);
> + } else if (ret < 0 && fence) {
> + trace_xe_gt_tlb_invalidation_fence_signal(fence);
> + dma_fence_signal(&fence->base);
> + dma_fence_put(&fence->base);
> }
>
> So before the list_add_tail we should check the recv seqno and possibly
> signal the fence? That makes sense to me.
Yup, exactly that.
>
>> I don't think there is any race with updating tlb_invalidation.seqno, the
>> receiver side only considers the seqno_recv. So not sure it matters where we
>> increment tlb_invalidation.seqno so long as we are under ct->lock and
>> ideally don't leave it incremented on error for the next user. I can do the
>> roll back instead if your prefer?
>>
>
> Right, so this patch itself is actually fine. It patch #7 as discussed
> above that needs a fix.
>
> Ok, with that:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
Thanks.
>
>>>
>>> Matt
>>>
>>>> + if (!ret) {
>>>> + gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
>>>> + TLB_INVALIDATION_SEQNO_MAX;
>>>> + if (!gt->tlb_invalidation.seqno)
>>>> + gt->tlb_invalidation.seqno = 1;
>>>> ret = seqno;
>>>> + }
>>>> if (ret < 0 && fence)
>>>> invalidation_fence_signal(fence);
>>>> mutex_unlock(&guc->ct.lock);
>>>> --
>>>> 2.41.0
>>>>
More information about the Intel-xe
mailing list