[PATCH v3 0/9] AMDGPU Usermode queues
Shashank Sharma
shashank.sharma at amd.com
Tue Apr 11 09:48:37 UTC 2023
On 11/04/2023 11:37, Christian König wrote:
> Am 10.04.23 um 16:26 schrieb Shashank Sharma:
>>
>> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
>>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma
>>> <shashank.sharma at amd.com> wrote:
>>>>
>>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
>>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>> [AMD Official Use Only - General]
>>>>>>
>>>>>> Hello Bas,
>>>>>>
>>>>>> This is not the correct interpretation of the code, the
>>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser
>>>>>> has to exclusively call it.
>>>>>>
>>>>>> Please see the sample test program in the existing libDRM series
>>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which
>>>>>> does the destroy_OP
>>>>>>
>>>>>> for the IOCTL.
>>>>> In the presence of crashes the kernel should always be able to clean
>>>>> this up no? Otherwise there is a resource leak?
>>>> The crash handling is the same as any of the existing GPU resource
>>>> which
>>>> are allocated and freed with IOCTL_OPs.
>>> Most of those are handled in the when the DRM fd gets closed (i.e.
>>> when the process exits):
>>>
>>> - buffers through drm_gem_release()
>>> - mappings in amdgpu_vm_fini
>>> - contexts in amdgpu_ctx_mgr_fini
>>>
>>> etc.
>>>
>>> Why would we do things differently for userspace queues? It doesn't
>>> look complicated looking at the above patch (which does seem to work).
>>
>> As the code is in initial stage, I have not given much thoughts about
>> handling resource leak due to app crash, but this seems like a good
>> suggestion.
>>
>> I am taking a note and will try to accommodate this in an upcoming
>> version of the series.
>
> Bas is right, the application doesn't necessary needs to clean up on
> exit (but it's still good custody to do so).
>
> See amdgpu_driver_postclose_kms() for how we cleanup (for example) the
> ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
>
Thanks for the pointers Christian,
I also feel like that its good to have this cleanup for those apps which
did not clean-up themselves (due to crash or coding error).
So something like,
on close_fd,
for_idr_each {
get_queue()
if (queue)
free_queue
}
But we will also keep the queue_free_OP as well, so that if an app
allocate multiple queues, and wants to free some in between, it can do it.
- Shashank
> Regards,
> Christian.
>
>>
>> - Shashank
>>
>>>> To be honest a crash handling can be very elaborated and complex one,
>>>> and hence only can be done at the driver unload IMO, which doesn't
>>>> help
>>>> at that stage,
>>>>
>>>> coz anyways driver will re-allocate the resources on next load.
>>>>
>>>> - Shashank
>>>>
>>>>>> - Shashank
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>> Sent: 10 April 2023 11:26
>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix
>>>>>> <Felix.Kuehling at amd.com>; Koenig, Christian
>>>>>> <Christian.Koenig at amd.com>; Yadav, Arvind <Arvind.Yadav at amd.com>
>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>
>>>>>> Hi Shashank,
>>>>>>
>>>>>> I think I found the issue: I wasn't destroying the user queue in
>>>>>> my program and the kernel doesn't clean up any remaining user
>>>>>> queues in the postclose hook. I think we need something like
>>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> While running things multiple times now works, I still have
>>>>>> problems doing multiple submissions from the same queue. Looking
>>>>>> forward to the updated test/sample
>>>>>>
>>>>>> Thanks,
>>>>>> Bas
>>>>>>
>>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
>>>>>> <Shashank.Sharma at amd.com> wrote:
>>>>>>> [AMD Official Use Only - General]
>>>>>>>
>>>>>>> Hello Bas,
>>>>>>> Thanks for trying this out.
>>>>>>>
>>>>>>> This could be due to the doorbell as you mentioned, Usermode
>>>>>>> queue uses doorbell manager internally.
>>>>>>> This week, we are planning to publis the latest libDRM sample
>>>>>>> code which uses a doorbell object (instead of the doorbell hack
>>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
>>>>>>> We have tested this full stack (libDRM test + Usermode queue +
>>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well
>>>>>>> for us.
>>>>>>>
>>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to
>>>>>>> build
>>>>>>> your kernel:
>>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
>>>>>>>
>>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
>>>>>>> changes with doorbell objects.
>>>>>>>
>>>>>>> Regards
>>>>>>> Shashank
>>>>>>> -----Original Message-----
>>>>>>> From: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>>> Sent: 10 April 2023 02:37
>>>>>>> To: Sharma, Shashank <Shashank.Sharma at amd.com>
>>>>>>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>>>>>>> <Alexander.Deucher at amd.com>; Kuehling, Felix
>>>>>>> <Felix.Kuehling at amd.com>;
>>>>>>> Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
>>>>>>>
>>>>>>> Hi Shashank,
>>>>>>>
>>>>>>> I tried writing a program to experiment with usermode queues and
>>>>>>> I found some weird behavior: The first run of the program works
>>>>>>> as expected, while subsequent runs don't seem to do anything
>>>>>>> (and I allocate everything in GTT, so it should be zero
>>>>>>> initialized consistently). Is this a known issue?
>>>>>>>
>>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl
>>>>>>> so it could very well be that I do the doorbell wrong
>>>>>>> (especially since the ioctl implementation was never shared
>>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write
>>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it
>>>>>>> possible to update the test in libdrm?
>>>>>>>
>>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bas
>>>>>>>
>>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma
>>>>>>> <shashank.sharma at amd.com> wrote:
>>>>>>>> This patch series introduces AMDGPU usermode queues for gfx
>>>>>>>> workloads.
>>>>>>>> Usermode queues is a method of GPU workload submission into the
>>>>>>>> graphics hardware without any interaction with kernel/DRM
>>>>>>>> schedulers.
>>>>>>>> In this method, a userspace graphics application can create its
>>>>>>>> own
>>>>>>>> workqueue and submit it directly in the GPU HW.
>>>>>>>>
>>>>>>>> The general idea of how this is supposed to work:
>>>>>>>> - The application creates the following GPU objetcs:
>>>>>>>> - A queue object to hold the workload packets.
>>>>>>>> - A read pointer object.
>>>>>>>> - A write pointer object.
>>>>>>>> - A doorbell page.
>>>>>>>> - The application picks a 32-bit offset in the doorbell page
>>>>>>>> for this queue.
>>>>>>>> - The application uses the usermode_queue_create IOCTL
>>>>>>>> introduced in
>>>>>>>> this patch, by passing the the GPU addresses of these
>>>>>>>> objects (read
>>>>>>>> ptr, write ptr, queue base address and 32-bit doorbell
>>>>>>>> offset from
>>>>>>>> the doorbell page)
>>>>>>>> - The kernel creates the queue and maps it in the HW.
>>>>>>>> - The application can start submitting the data in the queue as
>>>>>>>> soon as
>>>>>>>> the kernel IOCTL returns.
>>>>>>>> - After filling the workload data in the queue, the app must
>>>>>>>> write the
>>>>>>>> number of dwords added in the queue into the doorbell
>>>>>>>> offset, and the
>>>>>>>> GPU will start fetching the data.
>>>>>>>>
>>>>>>>> libDRM changes for this series and a sample DRM test program
>>>>>>>> can be
>>>>>>>> found in the MESA merge request here:
>>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>>>>>>>
>>>>>>>> This patch series depends on the doorbell-manager changes,
>>>>>>>> which are
>>>>>>>> being reviewed here:
>>>>>>>> https://patchwork.freedesktop.org/series/115802/
>>>>>>>>
>>>>>>>> Alex Deucher (1):
>>>>>>>> drm/amdgpu: UAPI for user queue management
>>>>>>>>
>>>>>>>> Arvind Yadav (2):
>>>>>>>> drm/amdgpu: add new parameters in v11_struct
>>>>>>>> drm/amdgpu: map wptr BO into GART
>>>>>>>>
>>>>>>>> Shashank Sharma (6):
>>>>>>>> drm/amdgpu: add usermode queue base code
>>>>>>>> drm/amdgpu: add new IOCTL for usermode queue
>>>>>>>> drm/amdgpu: create GFX-gen11 MQD for userqueue
>>>>>>>> drm/amdgpu: create context space for usermode queue
>>>>>>>> drm/amdgpu: map usermode queue into MES
>>>>>>>> drm/amdgpu: generate doorbell index for userqueue
>>>>>>>>
>>>>>>>> drivers/gpu/drm/amd/amdgpu/Makefile | 3 +
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +-
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 +
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
>>>>>>>> ++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
>>>>>>>> ++++++++++++++++++ 230 ++++++++++++++
>>>>>>>> .../gpu/drm/amd/include/amdgpu_userqueue.h | 66 ++++
>>>>>>>> drivers/gpu/drm/amd/include/v11_structs.h | 16 +-
>>>>>>>> include/uapi/drm/amdgpu_drm.h | 55 ++++
>>>>>>>> 9 files changed, 677 insertions(+), 9 deletions(-) create mode
>>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>>>>>> create mode 100644
>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
>>>>>>>> create mode 100644
>>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.40.0
>>>>>>>>
>
More information about the amd-gfx
mailing list