[PATCH] drm/amdgpu: optimize the padding with hw optimization

Sun Aug 4 18:11:45 UTC 2024

On Thu, Aug 1, 2024 at 2:55 PM Marek Olšák <maraeo at gmail.com> wrote:
>
> On Thu, Aug 1, 2024, 03:37 Christian König <christian.koenig at amd.com> wrote:
>>
>> Am 01.08.24 um 08:53 schrieb Marek Olšák:
>>
>> On Thu, Aug 1, 2024, 00:28 Khatri, Sunil <sukhatri at amd.com> wrote:
>>>
>>>
>>> On 8/1/2024 8:49 AM, Marek Olšák wrote:
>>> >> +       /* Header is at index 0, followed by num_nops - 1 NOP packet's */
>>> >> +       for (i = 1; i < num_nop; i++)
>>> >> +               amdgpu_ring_write(ring, ring->funcs->nop);
>>> > This loop should be removed. It's unnecessary CPU overhead and we
>>> > should never get more than 0x3fff NOPs (maybe use BUG_ON). Leaving the
>>> > whole packet body uninitialized is the fastest option.
>>> That was the original intent to just move the WPTR for the no of nops
>>> and tried too. Based on Christian inputs we should not let the nops packet
>>>
>>> as garbage or whatever was there originally as a threat/safety measure.
>>
>>
>> It doesn't help safety. It can only be read by the GPU with kernel-level permissions.
>>
>> Initializing the packet body is useless and adds CPU overhead, especially with the 256 NOPs or so that we use for no reason.
>>
>>
>> Not filling the remaining ring buffers with NOPs is a pretty clear NAK from my side. Leaving garbage in the ring buffer is not even remotely defensive.
>
>
> What are you defending against? You know the ring is kernel-owned memory, right?

This was pushed without any justification why you need to clear
kernel-allocated memory with some constant number up to 30000 times
per second that only the kernel can read.

Marek