[PATCH] drm/amdgpu: optimize the padding with hw optimization

Wed Aug 7 08:21:27 UTC 2024

On 04/08/2024 19:11, Marek Olšák wrote:
> On Thu, Aug 1, 2024 at 2:55 PM Marek Olšák <maraeo at gmail.com> wrote:
>>
>> On Thu, Aug 1, 2024, 03:37 Christian König <christian.koenig at amd.com> wrote:
>>>
>>> Am 01.08.24 um 08:53 schrieb Marek Olšák:
>>>
>>> On Thu, Aug 1, 2024, 00:28 Khatri, Sunil <sukhatri at amd.com> wrote:
>>>>
>>>>
>>>> On 8/1/2024 8:49 AM, Marek Olšák wrote:
>>>>>> +       /* Header is at index 0, followed by num_nops - 1 NOP packet's */
>>>>>> +       for (i = 1; i < num_nop; i++)
>>>>>> +               amdgpu_ring_write(ring, ring->funcs->nop);
>>>>> This loop should be removed. It's unnecessary CPU overhead and we
>>>>> should never get more than 0x3fff NOPs (maybe use BUG_ON). Leaving the
>>>>> whole packet body uninitialized is the fastest option.
>>>> That was the original intent to just move the WPTR for the no of nops
>>>> and tried too. Based on Christian inputs we should not let the nops packet
>>>>
>>>> as garbage or whatever was there originally as a threat/safety measure.
>>>
>>>
>>> It doesn't help safety. It can only be read by the GPU with kernel-level permissions.
>>>
>>> Initializing the packet body is useless and adds CPU overhead, especially with the 256 NOPs or so that we use for no reason.
>>>
>>>
>>> Not filling the remaining ring buffers with NOPs is a pretty clear NAK from my side. Leaving garbage in the ring buffer is not even remotely defensive.
>>
>>
>> What are you defending against? You know the ring is kernel-owned memory, right?
> 
> This was pushed without any justification why you need to clear
> kernel-allocated memory with some constant number up to 30000 times
> per second that only the kernel can read.

I see that this seems to be controversial, but FWIW, if the loop ends up 
staying, at least we could replace it with memset32 as I have shown in 
https://lore.kernel.org/amd-gfx/20240715104026.6311-1-tursulin@igalia.com/ 
that the inefficient amdgpu_ring_write can show up in the profile.

And also maybe consider other than gfx? Again, I did something in 
https://lore.kernel.org/amd-gfx/20240712152855.45284-4-tursulin@igalia.com/, 
but AMD folks will know if there is a similar (like in this series) 
approach which also improves the GPU side processing and not just CPU side.

Regards,

Tvrtko