[PATCH v3 0/9] AMDGPU Usermode queues

Christian König christian.koenig at amd.com
Tue Apr 11 09:37:59 UTC 2023


Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>
> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma 
>> <shashank.sharma at amd.com> wrote:
>>>
>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>> <Shashank.Sharma at amd.com> wrote:
>>>>> [AMD Official Use Only - General]
>>>>>
>>>>> Hello Bas,
>>>>>
>>>>> This is not the correct interpretation of the code, the 
>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser 
>>>>> has to exclusively call  it.
>>>>>
>>>>> Please see the sample test program in the existing libDRM series 
>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which does 
>>>>> the destroy_OP
>>>>>
>>>>> for the IOCTL.
>>>> In the presence of crashes the kernel should always be able to clean
>>>> this up no? Otherwise there is a resource leak?
>>> The crash handling is the same as any of the existing GPU resource 
>>> which
>>> are allocated and freed with IOCTL_OPs.
>> Most of those are handled in the when the DRM fd gets closed (i.e.
>> when the process exits):
>>
>> - buffers through drm_gem_release()
>> - mappings in amdgpu_vm_fini
>> - contexts in amdgpu_ctx_mgr_fini
>>
>> etc.
>>
>> Why would we do things differently for userspace queues? It doesn't
>> look complicated looking at the above patch (which does seem to work).
>
> As the code is in initial stage, I have not given much thoughts about 
> handling resource leak due to app crash, but this seems like a good 
> suggestion.
>
> I am taking a note and will try to accommodate this in an upcoming 
> version of the series.

Bas is right, the application doesn't necessary needs to clean up on 
exit (but it's still good custody to do so).

See amdgpu_driver_postclose_kms() for how we cleanup (for example) the 
ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.

Regards,
Christian.

>
> - Shashank
>
>>> To be honest a crash handling can be very elaborated and complex one,
>>> and hence only can be done at the driver unload IMO, which doesn't help
>>> at that stage,
>>>
>>> coz anyways driver will re-allocate the resources on next load.
>>>
>>> - Shashank
>>>
>>>>> - Shashank
>>>>>
>>>>> -----Original Message-----
>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>> Sent: 10 April 2023 11:26
>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander 
>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix 
>>>>> <Felix.Kuehling at amd.com>; Koenig, Christian 
>>>>> <Christian.Koenig at amd.com>; Yadav, Arvind <Arvind.Yadav at amd.com>
>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>
>>>>> Hi Shashank,
>>>>>
>>>>> I think I found the issue: I wasn't destroying the user queue in 
>>>>> my program and the kernel doesn't clean up any remaining user 
>>>>> queues in the postclose hook. I think we need something like
>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29 
>>>>>
>>>>> ?
>>>>>
>>>>> While running things multiple times now works, I still have 
>>>>> problems doing multiple submissions from the same queue. Looking 
>>>>> forward to the updated test/sample
>>>>>
>>>>> Thanks,
>>>>> Bas
>>>>>
>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank 
>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>> [AMD Official Use Only - General]
>>>>>>
>>>>>> Hello Bas,
>>>>>> Thanks for trying this out.
>>>>>>
>>>>>> This could be due to the doorbell as you mentioned, Usermode 
>>>>>> queue uses doorbell manager internally.
>>>>>> This week, we are planning to publis the latest libDRM sample 
>>>>>> code which uses a doorbell object (instead of the doorbell hack 
>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>> We have tested this full stack (libDRM test + Usermode queue + 
>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well 
>>>>>> for us.
>>>>>>
>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to 
>>>>>> build
>>>>>> your kernel:
>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr 
>>>>>>
>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>> changes with doorbell objects.
>>>>>>
>>>>>> Regards
>>>>>> Shashank
>>>>>> -----Original Message-----
>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>> Sent: 10 April 2023 02:37
>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix 
>>>>>> <Felix.Kuehling at amd.com>;
>>>>>> Koenig, Christian <Christian.Koenig at amd.com>
>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>
>>>>>> Hi Shashank,
>>>>>>
>>>>>> I tried writing a program to experiment with usermode queues and 
>>>>>> I found some weird behavior: The first run of the program works 
>>>>>> as expected, while subsequent runs don't seem to do anything (and 
>>>>>> I allocate everything in GTT, so it should be zero initialized 
>>>>>> consistently). Is this a known issue?
>>>>>>
>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl 
>>>>>> so it could very well be that I do the doorbell wrong (especially 
>>>>>> since the ioctl implementation was never shared AFAICT?), but it 
>>>>>> seems like the kernel submissions (i.e. write wptr in dwords to 
>>>>>> the wptr va and to the doorbell). Is it possible to update the 
>>>>>> test in libdrm?
>>>>>>
>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>
>>>>>> Thanks,
>>>>>> Bas
>>>>>>
>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma 
>>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>> This patch series introduces AMDGPU usermode queues for gfx 
>>>>>>> workloads.
>>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>>> graphics hardware without any interaction with kernel/DRM 
>>>>>>> schedulers.
>>>>>>> In this method, a userspace graphics application can create its own
>>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>>
>>>>>>> The general idea of how this is supposed to work:
>>>>>>> - The application creates the following GPU objetcs:
>>>>>>>     - A queue object to hold the workload packets.
>>>>>>>     - A read pointer object.
>>>>>>>     - A write pointer object.
>>>>>>>     - A doorbell page.
>>>>>>> - The application picks a 32-bit offset in the doorbell page for 
>>>>>>> this queue.
>>>>>>> - The application uses the usermode_queue_create IOCTL 
>>>>>>> introduced in
>>>>>>>     this patch, by passing the the GPU addresses of these 
>>>>>>> objects (read
>>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell 
>>>>>>> offset from
>>>>>>>     the doorbell page)
>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>> - The application can start submitting the data in the queue as 
>>>>>>> soon as
>>>>>>>     the kernel IOCTL returns.
>>>>>>> - After filling the workload data in the queue, the app must 
>>>>>>> write the
>>>>>>>     number of dwords added in the queue into the doorbell 
>>>>>>> offset, and the
>>>>>>>     GPU will start fetching the data.
>>>>>>>
>>>>>>> libDRM changes for this series and a sample DRM test program can be
>>>>>>> found in the MESA merge request here:
>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>
>>>>>>> This patch series depends on the doorbell-manager changes, which 
>>>>>>> are
>>>>>>> being reviewed here:
>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>
>>>>>>> Alex Deucher (1):
>>>>>>>     drm/amdgpu: UAPI for user queue management
>>>>>>>
>>>>>>> Arvind Yadav (2):
>>>>>>>     drm/amdgpu: add new parameters in v11_struct
>>>>>>>     drm/amdgpu: map wptr BO into GART
>>>>>>>
>>>>>>> Shashank Sharma (6):
>>>>>>>     drm/amdgpu: add usermode queue base code
>>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>     drm/amdgpu: create context space for usermode queue
>>>>>>>     drm/amdgpu: map usermode queue into MES
>>>>>>>     drm/amdgpu: generate doorbell index for userqueue
>>>>>>>
>>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           |   3 +
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  10 +-
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   6 +
>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
>>>>>>> ++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
>>>>>>> ++++++++++++++++++ 230 ++++++++++++++
>>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    |  66 ++++
>>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     |  16 +-
>>>>>>>    include/uapi/drm/amdgpu_drm.h                 |  55 ++++
>>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>    create mode 100644
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>    create mode 100644 
>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>
>>>>>>> -- 
>>>>>>> 2.40.0
>>>>>>>



More information about the amd-gfx mailing list