[PATCH v3 0/9] AMDGPU Usermode queues

Shashank Sharma shashank.sharma at amd.com
Tue Apr 11 10:55:16 UTC 2023


On 11/04/2023 12:00, Bas Nieuwenhuizen wrote:
> On Tue, Apr 11, 2023 at 11:48 AM Shashank Sharma
> <shashank.sharma at amd.com> wrote:
>>
>> On 11/04/2023 11:37, Christian König wrote:
>>> Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>>>> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>>>>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma
>>>>> <shashank.sharma at amd.com> wrote:
>>>>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>>>> [AMD Official Use Only - General]
>>>>>>>>
>>>>>>>> Hello Bas,
>>>>>>>>
>>>>>>>> This is not the correct interpretation of the code, the
>>>>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser
>>>>>>>> has to exclusively call  it.
>>>>>>>>
>>>>>>>> Please see the sample test program in the existing libDRM series
>>>>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which
>>>>>>>> does the destroy_OP
>>>>>>>>
>>>>>>>> for the IOCTL.
>>>>>>> In the presence of crashes the kernel should always be able to clean
>>>>>>> this up no? Otherwise there is a resource leak?
>>>>>> The crash handling is the same as any of the existing GPU resource
>>>>>> which
>>>>>> are allocated and freed with IOCTL_OPs.
>>>>> Most of those are handled in the when the DRM fd gets closed (i.e.
>>>>> when the process exits):
>>>>>
>>>>> - buffers through drm_gem_release()
>>>>> - mappings in amdgpu_vm_fini
>>>>> - contexts in amdgpu_ctx_mgr_fini
>>>>>
>>>>> etc.
>>>>>
>>>>> Why would we do things differently for userspace queues? It doesn't
>>>>> look complicated looking at the above patch (which does seem to work).
>>>> As the code is in initial stage, I have not given much thoughts about
>>>> handling resource leak due to app crash, but this seems like a good
>>>> suggestion.
>>>>
>>>> I am taking a note and will try to accommodate this in an upcoming
>>>> version of the series.
>>> Bas is right, the application doesn't necessary needs to clean up on
>>> exit (but it's still good custody to do so).
>>>
>>> See amdgpu_driver_postclose_kms() for how we cleanup (for example) the
>>> ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
>>>
>> Thanks for the pointers Christian,
>>
>> I also feel like that its good to have this cleanup for those apps which
>> did not clean-up themselves (due to crash or coding error).
> I think the patch I linked earlier does exactly this: keep the IOCTL,
> but on fini goes through the list and destroys the queue:
> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29

Yep, just needs additional check to free only when its not already 
freed, like doble free check. Will try to reuse most of it.

- Shashank

>> So something like,
>>
>> on close_fd,
>>
>> for_idr_each {
>>
>>      get_queue()
>>
>>      if (queue)
>>
>>          free_queue
>>
>> }
>>
>> But we will also keep the queue_free_OP as well, so that if an app
>> allocate multiple queues, and wants to free some in between, it can do it.
>>
>> - Shashank
>>
>>> Regards,
>>> Christian.
>>>
>>>> - Shashank
>>>>
>>>>>> To be honest a crash handling can be very elaborated and complex one,
>>>>>> and hence only can be done at the driver unload IMO, which doesn't
>>>>>> help
>>>>>> at that stage,
>>>>>>
>>>>>> coz anyways driver will re-allocate the resources on next load.
>>>>>>
>>>>>> - Shashank
>>>>>>
>>>>>>>> - Shashank
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>>>> Sent: 10 April 2023 11:26
>>>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>>>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix
>>>>>>>> <Felix.Kuehling at amd.com>; Koenig, Christian
>>>>>>>> <Christian.Koenig at amd.com>; Yadav, Arvind <Arvind.Yadav at amd.com>
>>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>>
>>>>>>>> Hi Shashank,
>>>>>>>>
>>>>>>>> I think I found the issue: I wasn't destroying the user queue in
>>>>>>>> my program and the kernel doesn't clean up any remaining user
>>>>>>>> queues in the postclose hook. I think we need something like
>>>>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>>>>>>>>
>>>>>>>> ?
>>>>>>>>
>>>>>>>> While running things multiple times now works, I still have
>>>>>>>> problems doing multiple submissions from the same queue. Looking
>>>>>>>> forward to the updated test/sample
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bas
>>>>>>>>
>>>>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
>>>>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>>>>> [AMD Official Use Only - General]
>>>>>>>>>
>>>>>>>>> Hello Bas,
>>>>>>>>> Thanks for trying this out.
>>>>>>>>>
>>>>>>>>> This could be due to the doorbell as you mentioned, Usermode
>>>>>>>>> queue uses doorbell manager internally.
>>>>>>>>> This week, we are planning to publis the latest libDRM sample
>>>>>>>>> code which uses a doorbell object (instead of the doorbell hack
>>>>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>>>>> We have tested this full stack (libDRM test + Usermode queue +
>>>>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well
>>>>>>>>> for us.
>>>>>>>>>
>>>>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to
>>>>>>>>> build
>>>>>>>>> your kernel:
>>>>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
>>>>>>>>>
>>>>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>>>>> changes with doorbell objects.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Shashank
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>>>>> Sent: 10 April 2023 02:37
>>>>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>>>>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix
>>>>>>>>> <Felix.Kuehling at amd.com>;
>>>>>>>>> Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>>>
>>>>>>>>> Hi Shashank,
>>>>>>>>>
>>>>>>>>> I tried writing a program to experiment with usermode queues and
>>>>>>>>> I found some weird behavior: The first run of the program works
>>>>>>>>> as expected, while subsequent runs don't seem to do anything
>>>>>>>>> (and I allocate everything in GTT, so it should be zero
>>>>>>>>> initialized consistently). Is this a known issue?
>>>>>>>>>
>>>>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl
>>>>>>>>> so it could very well be that I do the doorbell wrong
>>>>>>>>> (especially since the ioctl implementation was never shared
>>>>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write
>>>>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it
>>>>>>>>> possible to update the test in libdrm?
>>>>>>>>>
>>>>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Bas
>>>>>>>>>
>>>>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma
>>>>>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>>>>> This patch series introduces AMDGPU usermode queues for gfx
>>>>>>>>>> workloads.
>>>>>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>>>>>> graphics hardware without any interaction with kernel/DRM
>>>>>>>>>> schedulers.
>>>>>>>>>> In this method, a userspace graphics application can create its
>>>>>>>>>> own
>>>>>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>>>>>
>>>>>>>>>> The general idea of how this is supposed to work:
>>>>>>>>>> - The application creates the following GPU objetcs:
>>>>>>>>>>      - A queue object to hold the workload packets.
>>>>>>>>>>      - A read pointer object.
>>>>>>>>>>      - A write pointer object.
>>>>>>>>>>      - A doorbell page.
>>>>>>>>>> - The application picks a 32-bit offset in the doorbell page
>>>>>>>>>> for this queue.
>>>>>>>>>> - The application uses the usermode_queue_create IOCTL
>>>>>>>>>> introduced in
>>>>>>>>>>      this patch, by passing the the GPU addresses of these
>>>>>>>>>> objects (read
>>>>>>>>>>      ptr, write ptr, queue base address and 32-bit doorbell
>>>>>>>>>> offset from
>>>>>>>>>>      the doorbell page)
>>>>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>>>>> - The application can start submitting the data in the queue as
>>>>>>>>>> soon as
>>>>>>>>>>      the kernel IOCTL returns.
>>>>>>>>>> - After filling the workload data in the queue, the app must
>>>>>>>>>> write the
>>>>>>>>>>      number of dwords added in the queue into the doorbell
>>>>>>>>>> offset, and the
>>>>>>>>>>      GPU will start fetching the data.
>>>>>>>>>>
>>>>>>>>>> libDRM changes for this series and a sample DRM test program
>>>>>>>>>> can be
>>>>>>>>>> found in the MESA merge request here:
>>>>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>>>>
>>>>>>>>>> This patch series depends on the doorbell-manager changes,
>>>>>>>>>> which are
>>>>>>>>>> being reviewed here:
>>>>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>>>>
>>>>>>>>>> Alex Deucher (1):
>>>>>>>>>>      drm/amdgpu: UAPI for user queue management
>>>>>>>>>>
>>>>>>>>>> Arvind Yadav (2):
>>>>>>>>>>      drm/amdgpu: add new parameters in v11_struct
>>>>>>>>>>      drm/amdgpu: map wptr BO into GART
>>>>>>>>>>
>>>>>>>>>> Shashank Sharma (6):
>>>>>>>>>>      drm/amdgpu: add usermode queue base code
>>>>>>>>>>      drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>>>>      drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>>>>      drm/amdgpu: create context space for usermode queue
>>>>>>>>>>      drm/amdgpu: map usermode queue into MES
>>>>>>>>>>      drm/amdgpu: generate doorbell index for userqueue
>>>>>>>>>>
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
>>>>>>>>>> ++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
>>>>>>>>>> ++++++++++++++++++ 230 ++++++++++++++
>>>>>>>>>>     .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
>>>>>>>>>>     drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
>>>>>>>>>>     include/uapi/drm/amdgpu_drm.h                 | 55 ++++
>>>>>>>>>>     9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>>>>     create mode 100644
>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>>>>     create mode 100644
>>>>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 2.40.0
>>>>>>>>>>


More information about the amd-gfx mailing list