[PATCH] drm/amdgpu: refine kiq access register

Christian König christian.koenig at amd.com
Wed Apr 22 12:33:08 UTC 2020


On 22.04.2020 14:20, Tao, Yintian wrote:
> Hi Christian,
>
>
> Please see my inline comments.
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: April 22, 2020 19:57
> To: Tao, Yintian <Yintian.Tao at amd.com>; Liu, Monk <Monk.Liu at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
> Cc: amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>
> On 22.04.2020 13:49, Tao, Yintian wrote:
>> Hi Christian,
>>
>>
>> Can you help answer the questions below? Thanks in advance.
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: April 22, 2020 19:03
>> To: Tao, Yintian <Yintian.Tao at amd.com>; Liu, Monk <Monk.Liu at amd.com>;
>> Kuehling, Felix <Felix.Kuehling at amd.com>
>> Cc: amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>>
>> On 22.04.2020 11:29, Yintian Tao wrote:
>>> According to the current KIQ register access method, there is a
>>> race condition when using the KIQ to read registers if multiple
>>> clients want to read at the same time, as in the example below:
>>> 1. client-A starts to read REG-0 through the KIQ
>>> 2. client-A polls seqno-0
>>> 3. client-B starts to read REG-1 through the KIQ
>>> 4. client-B polls seqno-1
>>> 5. the KIQ completes both read operations
>>> 6. client-A reads the register value from the wb buffer and
>>>    gets the REG-1 value
>>>
>>> In addition, if multiple clients frequently write registers through
>>> the KIQ, the KIQ ring buffer may be overwritten.
>>>
>>> Therefore, allocate a fixed number of wb slots for rreg use and limit
>>> the number of submissions based on the KIQ ring_size in order to
>>> prevent the overwrite problem.
>>>
>>> v2: directly use amdgpu_device_wb_get() for each read instead
>>>        of reserving a fixed number of slots.
>>>        If there is not enough KIQ ring buffer space or no free rreg
>>>        slot, print an error and return instead of busy waiting.
>> I would split that into three patches. One for each problem we have here:
>>
>> 1. Fix kgd_hiq_mqd_load() and maybe other occasions to use spin_lock_irqsave().
>> [yttao]: Do you mean that we need to use spin_lock_irqsave() in functions like kgd_hiq_mqd_load()?
> Yes, I strongly think so.
>
> See, when you have one spin lock you either always need to lock it with irqs disabled or never.
>
> In other words, we always need to use either spin_lock() or spin_lock_irqsave(), but never mix them on the same lock.
>
> The only exception to this rule is when you take multiple locks, e.g.
> you can do:
>
> spin_lock_irqsave(&a, flags);
> spin_lock(&b);
> spin_lock(&c);
> ....
> spin_unlock(&c);
> spin_unlock(&b);
> spin_unlock_irqrestore(&a, flags);
>
> Here you don't need to use spin_lock_irqsave for b and c. But we rarely have that case in the code.
> [yttao]: Thanks, I got it. I will submit another patch for it.
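
For illustration, a minimal sketch of the rule applied to the KIQ ring lock; the function name kiq_submit_example() is hypothetical and this is not the actual kgd_hiq_mqd_load() code:

	/* Hypothetical example: every path that takes kiq->ring_lock has to
	 * do so with irqs disabled, because other paths already take it with
	 * spin_lock_irqsave(). Mixing in a plain spin_lock() on the same lock
	 * can deadlock if an interrupt arrives while the lock is held and the
	 * handler tries to take it again. */
	static void kiq_submit_example(struct amdgpu_device *adev)
	{
		struct amdgpu_kiq *kiq = &adev->gfx.kiq;
		struct amdgpu_ring *ring = &kiq->ring;
		unsigned long flags;

		spin_lock_irqsave(&kiq->ring_lock, flags);	/* not spin_lock() */
		amdgpu_ring_alloc(ring, 32);
		/* ... emit the packets for this submission ... */
		amdgpu_ring_commit(ring);
		spin_unlock_irqrestore(&kiq->ring_lock, flags);
	}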
>
>> 2. Prevent the overrun of the KIQ ring. Please drop the approach with the
>> atomic here. Instead just add an amdgpu_fence_wait_polling() into
>> amdgpu_fence_emit_polling() as I discussed with Monk.
>> [yttao]: Sorry, I don't fully get your idea about amdgpu_fence_wait_polling(). Can you give more details? Thanks in advance.
>>
>> "That is actually only a problem because the KIQ uses polling waits.
>>
>> See amdgpu_fence_emit() waits for the oldest possible fence to be signaled before emitting a new one.
>>
>> I suggest that we do the same in amdgpu_fence_emit_polling(). A one-liner like the following should be enough:
>>
>> amdgpu_fence_wait_polling(ring, seq - ring->fence_drv.num_fences_mask, timeout);"
>> [yttao]: num_fences_mask is not used in the KIQ fence polling path; it is only used in the dma_fence framework.
>> 		If I understand correctly, do you want the prototype code below? If the prototype is wrong, can you help give a sample? Thanks in advance.
>>
>> int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s) {
>>           uint32_t seq;
>>
>>           if (!s)
>>                   return -EINVAL;
>> +		amdgpu_fence_wait_polling(ring, seq, timeout);
>>           seq = ++ring->fence_drv.sync_seq;
> Your understanding sounds more or less correct. The code should look something like this:
>
> seq = ++ring->fence_drv.sync_seq;
> amdgpu_fence_wait_polling(ring, seq -
> number_of_allowed_submissions_to_the_kiq, timeout);
> [yttao]: Do we need to wait first, just like below? Otherwise, amdgpu_ring_emit_wreg() may overwrite the KIQ ring buffer.

There should always be room for at least one more submission.

As long as we always submit a fence after checking for free room we
should be fine.

Regards,
Christian.

> +	amdgpu_fence_wait_polling(ring, seq - number_of_allowed_submissions_to_the_kiq, timeout);
> 	spin_lock_irqsave(&kiq->ring_lock, flags);
> 	amdgpu_ring_alloc(ring, 32);
> 	amdgpu_ring_emit_wreg(ring, reg, v);
> 	amdgpu_fence_emit_polling(ring, &seq); /* wait */
> 	amdgpu_ring_commit(ring);
> 	spin_unlock_irqrestore(&kiq->ring_lock, flags);
>
> I just used num_fences_mask as number_of_allowed_submissions_to_the_kiq
> because it is probably a good value to start with.
>
> But you could pass that as a parameter as well if you think that makes more sense.
>
>>           amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
>>                                  seq, 0);
>>
>>           *s = seq;
>>
>>           return 0;
>> }
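
Putting the suggestion together, a rough sketch of how amdgpu_fence_emit_polling() could look with the wait folded in; the extra timeout parameter and the use of num_fences_mask as the submission limit are assumptions based on the discussion above, not the final patch:

	int amdgpu_fence_emit_polling(struct amdgpu_ring *ring, uint32_t *s,
				      uint32_t timeout)
	{
		uint32_t seq;
		signed long r;

		if (!s)
			return -EINVAL;

		seq = ++ring->fence_drv.sync_seq;
		/* Wait for the oldest fence that could still occupy a slot in
		 * the ring before emitting a new one, so the KIQ ring cannot
		 * be overrun. */
		r = amdgpu_fence_wait_polling(ring,
					      seq - ring->fence_drv.num_fences_mask,
					      timeout);
		if (r < 1)
			return -ETIMEDOUT;

		amdgpu_ring_emit_fence(ring, ring->fence_drv.gpu_addr,
				       seq, 0);

		*s = seq;

		return 0;
	}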
>>
>>
>>
>>
>> 3. Use amdgpu_device_wb_get() each time we need to submit a read.
>> [yttao]: yes, I will do it.
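
A rough sketch of what the per-read slot allocation could look like in amdgpu_kiq_rreg(); the emit_rreg signature taking a reg_val_offs argument, the error handling and the MAX_KIQ_REG_WAIT timeout are assumptions, and the polling loop is omitted:

	uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
	{
		struct amdgpu_kiq *kiq = &adev->gfx.kiq;
		struct amdgpu_ring *ring = &kiq->ring;
		uint32_t seq, reg_val_offs = 0, value = 0;
		unsigned long flags;

		spin_lock_irqsave(&kiq->ring_lock, flags);
		/* One wb slot per read instead of a single shared slot, so
		 * concurrent readers cannot see each other's values. */
		if (amdgpu_device_wb_get(adev, &reg_val_offs)) {
			spin_unlock_irqrestore(&kiq->ring_lock, flags);
			pr_err("critical bug! too many kiq readers\n");
			return ~0;
		}
		amdgpu_ring_alloc(ring, 32);
		amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
		amdgpu_fence_emit_polling(ring, &seq);	/* plus a timeout, see above */
		amdgpu_ring_commit(ring);
		spin_unlock_irqrestore(&kiq->ring_lock, flags);

		/* ... poll for seq with the usual MAX_KIQ_REG_WAIT timeout handling ... */

		value = adev->wb.wb[reg_val_offs];
		amdgpu_device_wb_free(adev, reg_val_offs);
		return value;
	}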
> Thanks,
> Christian.
>
>> Regards,
>> Christian.
>>


