[PATCH libdrm] amdgpu: add a faster BO list API

Thu Jan 10 11:51:03 UTC 2019

Am 10.01.19 um 12:41 schrieb Marek Olšák:
>
>
> On Thu, Jan 10, 2019, 4:15 AM Koenig, Christian 
> <Christian.Koenig at amd.com <mailto:Christian.Koenig at amd.com> wrote:
>
>     Am 10.01.19 um 00:39 schrieb Marek Olšák:
>>     On Wed, Jan 9, 2019 at 1:41 PM Christian König
>>     <ckoenig.leichtzumerken at gmail.com
>>     <mailto:ckoenig.leichtzumerken at gmail.com>> wrote:
>>
>>         Am 09.01.19 um 17:14 schrieb Marek Olšák:
>>>         On Wed, Jan 9, 2019 at 8:09 AM Christian König
>>>         <ckoenig.leichtzumerken at gmail.com
>>>         <mailto:ckoenig.leichtzumerken at gmail.com>> wrote:
>>>
>>>             Am 09.01.19 um 13:36 schrieb Marek Olšák:
>>>>
>>>>
>>>>             On Wed, Jan 9, 2019, 5:28 AM Christian König
>>>>             <ckoenig.leichtzumerken at gmail.com
>>>>             <mailto:ckoenig.leichtzumerken at gmail.com> wrote:
>>>>
>>>>                 Looks good, but I'm wondering what's the actual
>>>>                 improvement?
>>>>
>>>>
>>>>             No malloc calls and 1 less for loop copying the bo list.
>>>
>>>             Yeah, but didn't we want to get completely rid of the bo
>>>             list?
>>>
>>>
>>>         If we have multiple IBs (e.g. gfx + compute) that share a BO
>>>         list, I think it's faster to send the BO list to the kernel
>>>         only once.
>>
>>         That's not really faster.
>>
>>         The only thing we safe us is a single loop over all BOs to
>>         lockup the handle into a pointer and that is only a tiny
>>         fraction of the overhead.
>>
>>         The majority of the overhead is locking the BOs and reserving
>>         space for the submission.
>>
>>         What could really help here is to submit gfx+comput together
>>         in just one CS IOCTL. This way we would need the locking and
>>         space reservation only once.
>>
>>         It's a bit of work in the kernel side, but certainly doable.
>>
>>
>>     OK. Any objections to this patch?
>
>     In general I'm wondering if we couldn't avoid adding so much new
>     interface.
>
>
> There are Vulkan drivers that still use the bo_list interface.
>
>
>     For example we can avoid the malloc() when we just cache the last
>     freed bo_list structure in the device. We would just need an
>     atomic pointer exchange operation for that.
>
>
>     This way we even don't need to change mesa at all.
>
>
> There is still the for loop that we need to get rid of.

Yeah, but that I'm fine to handle with a amdgpu_bo_list_create_raw which 
only takes the handles and still returns the amdgpu_bo_list structure we 
are used to.

See what I'm mostly concerned about is having another CS function to 
maintain.

>
>
>     Regarding optimization, this chunk can be replaced by a cast on 64bit:
>>     +	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
>>     +	for (i = 0; i < num_chunks; i++)
>>     +		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
>
> It can't. The input is an array of structures. The ioctl takes an 
> array of pointers.

Ah! Haven't seen this, sorry for the noise.

Christian.

>
> Marek
>
>
>     Regards,
>     Christian.
>
>>
>>     Thanks,
>>     Marek
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190110/db7fb88e/attachment-0001.html>