[PATCH v11 00/28] AMDGPU usermode queues
Sharma, Shashank
shashank.sharma at amd.com
Wed Sep 25 09:14:54 UTC 2024
On 19/09/2024 18:59, Alex Deucher wrote:
> On Mon, Sep 9, 2024 at 4:07 PM Shashank Sharma <shashank.sharma at amd.com> wrote:
>> This patch series introduces base code of AMDGPU usermode queues for gfx
>> workloads. Usermode queues are a method of submitting GPU workloads to the
>> graphics hardware without any interaction with the kernel/DRM schedulers. In
>> this method, a userspace graphics application can create its own workqueue
>> and submit it directly to the GPU HW.
>>
>> The general idea of how Userqueues are supposed to work:
>> - The application creates the following GPU objects:
>> - A queue object to hold the workload packets.
>> - A read pointer object.
>> - A write pointer object.
>> - A doorbell page.
>> - Other supporting buffer objects as per target IP engine (shadow, GDS
>> etc, information available with AMDGPU_INFO_IOCTL)
> The queue, rptr, wptr, and metadata buffers don't have to be separate
> buffers. Userspace could suballocate them out of the same buffer. We
> just need the virtual addresses. However, we need to keep track of
> the GPU virtual addresses used by the user queue for these buffers and
> prevent them from being unmapped until the queue is destroyed, similar
> to what we do on the KFD side. Otherwise, the user could unmap one of
> the buffers and submit work to the user queue which could cause it to
> hang.
Noted, thanks Alex.
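
For illustration only, a minimal sketch of how userspace might suballocate
the ring, rptr and wptr out of a single buffer and hand only the resulting
GPU virtual addresses to the kernel. The struct, the helper names, the
256-byte ring alignment and the 64-bit pointer slots are all assumptions
made for this example; none of them come from the series or the UAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one user-allocated buffer; not part of the UAPI. */
struct userq_layout {
	uint64_t queue_va; /* ring buffer base */
	uint64_t rptr_va;  /* read pointer slot */
	uint64_t wptr_va;  /* write pointer slot */
	uint64_t end_va;   /* first byte past the suballocations */
};

/* Round va up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t va, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (va + align - 1) & ~(align - 1);
}

/* Carve the ring, rptr and wptr out of one buffer starting at base_va. */
static struct userq_layout userq_suballoc(uint64_t base_va, uint64_t ring_bytes)
{
	struct userq_layout l;

	l.queue_va = align_up(base_va, 256);               /* assumed ring alignment */
	l.rptr_va  = align_up(l.queue_va + ring_bytes, 8); /* assumed 64-bit slot */
	l.wptr_va  = l.rptr_va + 8;
	l.end_va   = l.wptr_va + 8;
	return l;
}
```

Whatever the final layout looks like, the range from queue_va to end_va is
what would have to stay mapped until the queue is destroyed.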
> Alex
>
>> - The application picks a 32-bit offset in the doorbell page for this
>> queue.
>> - The application uses the usermode_queue_create IOCTL introduced in
>> this patch, by passing the GPU addresses of these objects (read ptr,
>> write ptr, queue base address, shadow, gds) with doorbell object and
>> 32-bit doorbell offset in the doorbell page.
>> - The kernel creates the queue and maps it in the HW.
>> - The application maps the GPU buffers in process address space.
>> - The application can start submitting the data in the queue as soon as
>> the kernel IOCTL returns.
>> - After filling the workload data in the queue, the app must write the
>> number of dwords added in the queue into the doorbell offset and the
>> WPTR buffer. The GPU will start fetching the data as soon as it's done.
>> - This series adds usermode queue support for all three MES based IPs
>> (GFX, SDMA and Compute).
>> - This series also adds eviction fences to handle migration of the
>> userqueue mapped buffers by TTM.
>> - For synchronization of userqueues, we have added a secure semaphore
>> IOCTL, which is being reviewed separately here:
>> https://patchwork.freedesktop.org/patch/611971/
>>
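
The submission step described above (fill the ring, then update the WPTR
buffer and the doorbell) can be sketched roughly as below. This is a
CPU-only model under stated assumptions, not the libdrm or Mesa code: the
struct and function names are hypothetical, the write pointer is assumed
to be a running count of dwords, the doorbell slot is assumed to be 64 bits
wide, and the free-space check against the read pointer is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace view of one queue; field names are illustrative. */
struct userq {
	uint32_t *ring;              /* CPU mapping of the queue buffer */
	uint32_t ring_dwords;        /* ring size in dwords, power of two */
	volatile uint64_t *wptr;     /* CPU mapping of the WPTR buffer */
	volatile uint64_t *doorbell; /* CPU mapping of the doorbell slot */
	uint64_t wptr_shadow;        /* local count of dwords written so far */
};

/* Copy ndw dwords into the ring, then publish the new write pointer. */
static void userq_submit(struct userq *q, const uint32_t *pkt, uint32_t ndw)
{
	assert((q->ring_dwords & (q->ring_dwords - 1)) == 0);
	assert(ndw <= q->ring_dwords);

	/* Wrap around the ring using the power-of-two size as a mask. */
	for (uint32_t i = 0; i < ndw; i++)
		q->ring[(q->wptr_shadow + i) & (q->ring_dwords - 1)] = pkt[i];
	q->wptr_shadow += ndw;

	/* Make the packet contents visible before the new write pointer. */
	atomic_thread_fence(memory_order_release);

	*q->wptr = q->wptr_shadow;     /* WPTR buffer first ... */
	*q->doorbell = q->wptr_shadow; /* ... then ring the doorbell */
}
```

A real client would also have to compare against the read pointer before
overwriting ring entries, and a write to a real doorbell mapping may need
stronger ordering than the fence used in this model.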
>> libDRM UAPI changes for this series can be found here:
>> (This also contains an example test utility which demonstrates
>> the usage of userqueue UAPI)
>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
>>
>> MESA changes consuming this series can be seen in the MR here:
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010
>>
>> Alex Deucher (1):
>> drm/amdgpu: UAPI for user queue management
>>
>> Arvind Yadav (4):
>> drm/amdgpu: enable SDMA usermode queues
>> drm/amdgpu: Add input fence to sync bo unmap
>> drm/amdgpu: fix MES GFX mask
>> Revert "drm/amdgpu: don't allow userspace to create a doorbell BO"
>>
>> Shashank Sharma (18):
>> drm/amdgpu: add usermode queue base code
>> drm/amdgpu: add new IOCTL for usermode queue
>> drm/amdgpu: add helpers to create userqueue object
>> drm/amdgpu: create MES-V11 usermode queue for GFX
>> drm/amdgpu: create context space for usermode queue
>> drm/amdgpu: map usermode queue into MES
>> drm/amdgpu: map wptr BO into GART
>> drm/amdgpu: generate doorbell index for userqueue
>> drm/amdgpu: cleanup leftover queues
>> drm/amdgpu: enable GFX-V11 userqueue support
>> drm/amdgpu: enable compute/gfx usermode queue
>> drm/amdgpu: update userqueue BOs and PDs
>> drm/amdgpu: add kernel config for gfx-userqueue
>> drm/amdgpu: add gfx eviction fence helpers
>> drm/amdgpu: add userqueue suspend/resume functions
>> drm/amdgpu: suspend gfx userqueues
>> drm/amdgpu: resume gfx userqueues
>> Revert "drm/amdgpu/gfx11: only enable CP GFX shadowing on SR-IOV"
>>
>> drivers/gpu/drm/amd/amdgpu/Kconfig | 8 +
>> drivers/gpu/drm/amd/amdgpu/Makefile | 10 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 11 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 +
>> .../drm/amd/amdgpu/amdgpu_eviction_fence.c | 297 ++++++++
>> .../drm/amd/amdgpu/amdgpu_eviction_fence.h | 67 ++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 68 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 11 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 -
>> drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 +-
>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 713 ++++++++++++++++++
>> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 74 ++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 644 ++++++++++++++++
>> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 42 +-
>> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 16 +-
>> .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c | 395 ++++++++++
>> .../gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h | 30 +
>> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 5 +
>> .../gpu/drm/amd/include/amdgpu_userqueue.h | 100 +++
>> drivers/gpu/drm/amd/include/v11_structs.h | 4 +-
>> include/uapi/drm/amdgpu_drm.h | 252 +++++++
>> 22 files changed, 2722 insertions(+), 45 deletions(-)
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.c
>> create mode 100644 drivers/gpu/drm/amd/amdgpu/mes_v11_0_userqueue.h
>> create mode 100644 drivers/gpu/drm/amd/include/amdgpu_userqueue.h
>>
>> --
>> 2.45.1
>>