[PATCH v3 0/9] AMDGPU Usermode queues

Shashank Sharma shashank.sharma at amd.com
Tue Apr 11 09:48:37 UTC 2023


On 11/04/2023 11:37, Christian König wrote:
> Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>>
>> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma 
>>> <shashank.sharma at amd.com> wrote:
>>>>
>>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>> [AMD Official Use Only - General]
>>>>>>
>>>>>> Hello Bas,
>>>>>>
>>>>>> This is not the correct interpretation of the code, the 
>>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser 
>>>>>> has to exclusively call  it.
>>>>>>
>>>>>> Please see the sample test program in the existing libDRM series 
>>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which 
>>>>>> does the destroy_OP
>>>>>>
>>>>>> for the IOCTL.
>>>>> In the presence of crashes the kernel should always be able to clean
>>>>> this up no? Otherwise there is a resource leak?
>>>> The crash handling is the same as any of the existing GPU resource 
>>>> which
>>>> are allocated and freed with IOCTL_OPs.
>>> Most of those are handled in the when the DRM fd gets closed (i.e.
>>> when the process exits):
>>>
>>> - buffers through drm_gem_release()
>>> - mappings in amdgpu_vm_fini
>>> - contexts in amdgpu_ctx_mgr_fini
>>>
>>> etc.
>>>
>>> Why would we do things differently for userspace queues? It doesn't
>>> look complicated looking at the above patch (which does seem to work).
>>
>> As the code is in initial stage, I have not given much thoughts about 
>> handling resource leak due to app crash, but this seems like a good 
>> suggestion.
>>
>> I am taking a note and will try to accommodate this in an upcoming 
>> version of the series.
>
> Bas is right, the application doesn't necessary needs to clean up on 
> exit (but it's still good custody to do so).
>
> See amdgpu_driver_postclose_kms() for how we cleanup (for example) the 
> ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
>
Thanks for the pointers Christian,

I also feel like that its good to have this cleanup for those apps which 
did not clean-up themselves (due to crash or coding error).

So something like,

on close_fd,

for_idr_each {

    get_queue()

    if (queue)

        free_queue

}

But we will also keep the queue_free_OP as well, so that if an app 
allocate multiple queues, and wants to free some in between, it can do it.

- Shashank

> Regards,
> Christian.
>
>>
>> - Shashank
>>
>>>> To be honest a crash handling can be very elaborated and complex one,
>>>> and hence only can be done at the driver unload IMO, which doesn't 
>>>> help
>>>> at that stage,
>>>>
>>>> coz anyways driver will re-allocate the resources on next load.
>>>>
>>>> - Shashank
>>>>
>>>>>> - Shashank
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>> Sent: 10 April 2023 11:26
>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander 
>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix 
>>>>>> <Felix.Kuehling at amd.com>; Koenig, Christian 
>>>>>> <Christian.Koenig at amd.com>; Yadav, Arvind <Arvind.Yadav at amd.com>
>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>
>>>>>> Hi Shashank,
>>>>>>
>>>>>> I think I found the issue: I wasn't destroying the user queue in 
>>>>>> my program and the kernel doesn't clean up any remaining user 
>>>>>> queues in the postclose hook. I think we need something like
>>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29 
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> While running things multiple times now works, I still have 
>>>>>> problems doing multiple submissions from the same queue. Looking 
>>>>>> forward to the updated test/sample
>>>>>>
>>>>>> Thanks,
>>>>>> Bas
>>>>>>
>>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank 
>>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>>> [AMD Official Use Only - General]
>>>>>>>
>>>>>>> Hello Bas,
>>>>>>> Thanks for trying this out.
>>>>>>>
>>>>>>> This could be due to the doorbell as you mentioned, Usermode 
>>>>>>> queue uses doorbell manager internally.
>>>>>>> This week, we are planning to publis the latest libDRM sample 
>>>>>>> code which uses a doorbell object (instead of the doorbell hack 
>>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>>> We have tested this full stack (libDRM test + Usermode queue + 
>>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well 
>>>>>>> for us.
>>>>>>>
>>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to 
>>>>>>> build
>>>>>>> your kernel:
>>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr 
>>>>>>>
>>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>>> changes with doorbell objects.
>>>>>>>
>>>>>>> Regards
>>>>>>> Shashank
>>>>>>> -----Original Message-----
>>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>>> Sent: 10 April 2023 02:37
>>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix 
>>>>>>> <Felix.Kuehling at amd.com>;
>>>>>>> Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>
>>>>>>> Hi Shashank,
>>>>>>>
>>>>>>> I tried writing a program to experiment with usermode queues and 
>>>>>>> I found some weird behavior: The first run of the program works 
>>>>>>> as expected, while subsequent runs don't seem to do anything 
>>>>>>> (and I allocate everything in GTT, so it should be zero 
>>>>>>> initialized consistently). Is this a known issue?
>>>>>>>
>>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl 
>>>>>>> so it could very well be that I do the doorbell wrong 
>>>>>>> (especially since the ioctl implementation was never shared 
>>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write 
>>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it 
>>>>>>> possible to update the test in libdrm?
>>>>>>>
>>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bas
>>>>>>>
>>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma 
>>>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>>> This patch series introduces AMDGPU usermode queues for gfx 
>>>>>>>> workloads.
>>>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>>>> graphics hardware without any interaction with kernel/DRM 
>>>>>>>> schedulers.
>>>>>>>> In this method, a userspace graphics application can create its 
>>>>>>>> own
>>>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>>>
>>>>>>>> The general idea of how this is supposed to work:
>>>>>>>> - The application creates the following GPU objetcs:
>>>>>>>>     - A queue object to hold the workload packets.
>>>>>>>>     - A read pointer object.
>>>>>>>>     - A write pointer object.
>>>>>>>>     - A doorbell page.
>>>>>>>> - The application picks a 32-bit offset in the doorbell page 
>>>>>>>> for this queue.
>>>>>>>> - The application uses the usermode_queue_create IOCTL 
>>>>>>>> introduced in
>>>>>>>>     this patch, by passing the the GPU addresses of these 
>>>>>>>> objects (read
>>>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell 
>>>>>>>> offset from
>>>>>>>>     the doorbell page)
>>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>>> - The application can start submitting the data in the queue as 
>>>>>>>> soon as
>>>>>>>>     the kernel IOCTL returns.
>>>>>>>> - After filling the workload data in the queue, the app must 
>>>>>>>> write the
>>>>>>>>     number of dwords added in the queue into the doorbell 
>>>>>>>> offset, and the
>>>>>>>>     GPU will start fetching the data.
>>>>>>>>
>>>>>>>> libDRM changes for this series and a sample DRM test program 
>>>>>>>> can be
>>>>>>>> found in the MESA merge request here:
>>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>>
>>>>>>>> This patch series depends on the doorbell-manager changes, 
>>>>>>>> which are
>>>>>>>> being reviewed here:
>>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>>
>>>>>>>> Alex Deucher (1):
>>>>>>>>     drm/amdgpu: UAPI for user queue management
>>>>>>>>
>>>>>>>> Arvind Yadav (2):
>>>>>>>>     drm/amdgpu: add new parameters in v11_struct
>>>>>>>>     drm/amdgpu: map wptr BO into GART
>>>>>>>>
>>>>>>>> Shashank Sharma (6):
>>>>>>>>     drm/amdgpu: add usermode queue base code
>>>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>>     drm/amdgpu: create context space for usermode queue
>>>>>>>>     drm/amdgpu: map usermode queue into MES
>>>>>>>>     drm/amdgpu: generate doorbell index for userqueue
>>>>>>>>
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
>>>>>>>> ++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
>>>>>>>> ++++++++++++++++++ 230 ++++++++++++++
>>>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
>>>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
>>>>>>>>    include/uapi/drm/amdgpu_drm.h                 | 55 ++++
>>>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>>    create mode 100644
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>>    create mode 100644 
>>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 2.40.0
>>>>>>>>
>


More information about the amd-gfx mailing list