[PATCH libdrm] amdgpu: add a faster BO list API

Marek Olšák maraeo at gmail.com
Thu Jan 10 12:25:26 UTC 2019


On Thu, Jan 10, 2019, 6:51 AM Christian König <
ckoenig.leichtzumerken at gmail.com> wrote:

> Am 10.01.19 um 12:41 schrieb Marek Olšák:
>
>
>
> On Thu, Jan 10, 2019, 4:15 AM Koenig, Christian <Christian.Koenig at amd.com>
> wrote:
>
>> Am 10.01.19 um 00:39 schrieb Marek Olšák:
>>
>> On Wed, Jan 9, 2019 at 1:41 PM Christian König <
>> ckoenig.leichtzumerken at gmail.com> wrote:
>>
>>> Am 09.01.19 um 17:14 schrieb Marek Olšák:
>>>
>>> On Wed, Jan 9, 2019 at 8:09 AM Christian König <
>>> ckoenig.leichtzumerken at gmail.com> wrote:
>>>
>>>> Am 09.01.19 um 13:36 schrieb Marek Olšák:
>>>>
>>>>
>>>>
>>>> On Wed, Jan 9, 2019, 5:28 AM Christian König <
>>>> ckoenig.leichtzumerken at gmail.com> wrote:
>>>>
>>>>> Looks good, but I'm wondering what's the actual improvement?
>>>>>
>>>>
>>>> No malloc calls and 1 less for loop copying the bo list.
>>>>
>>>>
>>>> Yeah, but didn't we want to get completely rid of the bo list?
>>>>
>>>
>>> If we have multiple IBs (e.g. gfx + compute) that share a BO list, I
>>> think it's faster to send the BO list to the kernel only once.
>>>
>>>
>>> That's not really faster.
>>>
>>> The only thing it saves us is a single loop over all BOs to translate
>>> each handle into a pointer, and that is only a tiny fraction of the overhead.
>>>
>>> The majority of the overhead is locking the BOs and reserving space for
>>> the submission.
>>>
>>> What could really help here is to submit gfx+compute together in just one
>>> CS IOCTL. This way we would need the locking and space reservation only
>>> once.
>>>
>>> It's a bit of work on the kernel side, but certainly doable.
>>>
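
For reference, here is a minimal sketch of the pattern in question with the
existing libdrm API: one BO list shared by a gfx and a compute submission
(setup and error handling omitted, names illustrative). Each
amdgpu_cs_submit() call is still its own CS ioctl, so the locking and space
reservation Christian describes happen twice:

    #include <amdgpu.h>      /* libdrm_amdgpu */
    #include <amdgpu_drm.h>  /* AMDGPU_HW_IP_* */

    static void submit_pair(amdgpu_device_handle dev,
                            amdgpu_context_handle ctx,
                            amdgpu_bo_handle *bos, uint32_t num_bos,
                            struct amdgpu_cs_ib_info *gfx_ib,
                            struct amdgpu_cs_ib_info *comp_ib)
    {
        amdgpu_bo_list_handle list;
        struct amdgpu_cs_request req = {0};

        /* One BO list creation, one handle translation in libdrm. */
        amdgpu_bo_list_create(dev, num_bos, bos, NULL, &list);

        req.ip_type = AMDGPU_HW_IP_GFX;
        req.resources = list;          /* shared list, first ioctl  */
        req.number_of_ibs = 1;
        req.ibs = gfx_ib;
        amdgpu_cs_submit(ctx, 0, &req, 1);

        req.ip_type = AMDGPU_HW_IP_COMPUTE;
        req.ibs = comp_ib;             /* same list, second ioctl   */
        amdgpu_cs_submit(ctx, 0, &req, 1);

        amdgpu_bo_list_destroy(list);
    }
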
>>
>> OK. Any objections to this patch?
>>
>>
>> In general I'm wondering if we couldn't avoid adding so many new
>> interfaces.
>>
>
> There are Vulkan drivers that still use the bo_list interface.
>
>
>> For example we can avoid the malloc() when we just cache the last freed
>> bo_list structure in the device. We would just need an atomic pointer
>> exchange operation for that.
>>
>
>> This way we even don't need to change mesa at all.
>>
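
A minimal sketch of that caching idea (hypothetical structures and field
names, not the actual libdrm code):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* "last_freed_bo_list" and the bo_list layout are hypothetical;
     * the point is the single atomic pointer exchange in each path. */
    struct bo_list { uint32_t handle; /* plus whatever bookkeeping */ };
    struct device  { _Atomic(struct bo_list *) last_freed_bo_list; };

    static struct bo_list *bo_list_alloc(struct device *dev)
    {
        /* Try to grab the cached list; fall back to malloc(). */
        struct bo_list *list =
            atomic_exchange(&dev->last_freed_bo_list, NULL);
        return list ? list : malloc(sizeof(*list));
    }

    static void bo_list_free(struct device *dev, struct bo_list *list)
    {
        /* Park this list; free whatever was parked before, if anything. */
        free(atomic_exchange(&dev->last_freed_bo_list, list));
    }
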
>
> There is still the for loop that we need to get rid of.
>
>
> Yeah, but I'd be fine handling that with an amdgpu_bo_list_create_raw which
> only takes the handles and still returns the amdgpu_bo_list structure we
> are used to.
>
> See, what I'm mostly concerned about is having another CS function to
> maintain.
>

There is no maintenance cost. It's just a wrapper. Eventually all drivers
will switch to it.
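
For reference, the raw interface being discussed looks roughly like this
(sketched from the posted patch; the final signatures may differ). The
caller fills the kernel's own struct drm_amdgpu_bo_list_entry array
(bo_handle plus bo_priority), so libdrm needs no allocation and no per-BO
translation loop:

    /* Sketch based on the posted patch; final signatures may differ. */
    int amdgpu_bo_list_create_raw(amdgpu_device_handle dev,
                                  uint32_t number_of_buffers,
                                  struct drm_amdgpu_bo_list_entry *buffers,
                                  uint32_t *result);
    int amdgpu_bo_list_destroy_raw(amdgpu_device_handle dev,
                                   uint32_t bo_list);
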

Marek


>
>
>> Regarding optimization, this chunk can be replaced by a cast on 64-bit:
>>
>> +	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
>> +	for (i = 0; i < num_chunks; i++)
>> +		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
>>
> It can't. The input is an array of structures. The ioctl takes an array
> of pointers.
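
To spell out why the loop can't be replaced by a cast, here is a schematic
version of the submit path using the kernel's uapi types (the function and
variable names are illustrative, not the actual libdrm code):

    #include <stdint.h>
    #include <string.h>
    #include <xf86drm.h>     /* drmCommandWriteRead */
    #include <amdgpu_drm.h>  /* DRM_AMDGPU_CS, union drm_amdgpu_cs */

    /* The chunks are a contiguous array of structures, but
     * drm_amdgpu_cs_in.chunks must point to an array of __u64 values
     * that are each a pointer to one chunk, hence the translation loop. */
    static int submit_chunks(int fd, uint32_t ctx_id,
                             struct drm_amdgpu_cs_chunk *chunks,
                             unsigned num_chunks)
    {
        uint64_t chunk_array[8]; /* assume num_chunks <= 8 for the sketch */
        union drm_amdgpu_cs cs;
        unsigned i;

        for (i = 0; i < num_chunks; i++)
            chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];

        memset(&cs, 0, sizeof(cs));
        cs.in.ctx_id = ctx_id;
        cs.in.num_chunks = num_chunks;
        cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;

        return drmCommandWriteRead(fd, DRM_AMDGPU_CS, &cs, sizeof(cs));
    }
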
>
>
> Ah! Haven't seen this, sorry for the noise.
>
> Christian.
>
>
> Marek
>
>
>> Regards,
>> Christian.
>>
>>
>> Thanks,
>> Marek
>>
>>
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>