[RFC] amdgpu: Add a context flag to disable implicit sync
Joshua Ashton
joshua at froggi.es
Wed Aug 7 19:23:13 UTC 2024
I was thinking about this more recently. I was initially considering "maybe this should be a per-BO import," but I couldn't think of anything in the GL model that would actually benefit given its not "true" bindless and there's no update-after-bind there.
Worth others more familiar with GL asking that question to themselves also. I am definitely not totally up on what's possible there.
Overall, I think I am OK with this approach, even though I think mixing implicit and explicit sync is gross, and I want the pain that is implicit sync to just go away forever. :-)
- Joshie 🐸✨
On August 7, 2024 4:39:32 PM GMT+01:00, Faith Ekstrand <faith at gfxstrand.net> wrote:
>Previously, AMDGPU_GEM_CREATE_EXPLICIT_SYNC was used to disable implicit
>synchronization on BOs when explicit synchronization can be used. The
>problem is that this flag is per-BO and affects all amdgpu users in the
>system, not just the usermode drver which sets it. This can lead to
>some unintended consequences for userspace if not used carefully.
>
>Since the introduction of DMA_BUF_IOCTL_EXPORT_SYNC_FILE and
>DMA_BUF_IOCTL_IMPORT_SYNC_FILE, many userspace window system components
>have grown the ability to convert between the Vulkan explicit sync model
>and the legacy implicit sync model used by X11 and Wayland in the past.
>This allows both old and new components to exist simultaneously and talk
>to each other. In particular, XWayland is able to convert between the
>two to let Vulkan apps work seamlessly with older X11 compositors that
>aren't aware of explicit synchronizaiton. This is rapidly becoming the
>backbone of synchronization in the Linux window system space.
>
>Unfortunately, AMDGPU_GEM_CREATE_EXPLICIT_SYNC breaks this because it
>disables all implicit synchronization on the given BO, regardless of
>which userspace driver is rendering with it and regardless of the
>assumptions made by the client application. In particular, this is
>causing issues with RADV and radeonsi. RADV sets the flag to disable
>implicit sync on window system buffers so that it can safely have them
>resident at all times without causing internal over-synchronization.
>The BO is then handed off to a compositor which composites using
>radeonsi. If the compositor uses the EGL_ANDROID_native_fence_sync
>extension to pass explicit sync files through to radeonsi, then
>everything is fine. However, if that buffer ever gets handed to an
>instance of radeonsi with any assumption of implicit synchronization,
>radeonsi won't be able sync on the BO because RADV disabled implict sync
>on that BO system-wide. It doesn't matter whether some window system
>component used DMA_BUF_IOCTL_IMPORT_SYNC_FILE to set the appropriate
>fence on the BO, amdgpu will ignore it.
>
>This new flag disables implicit sync on the context rather than the BO.
>This way RADV can disable implicit sync (which is what RADV has always
>wanted) without affecting other components in the system. If RADV (or
>some other driver) wants implicit sync on some BO, it can use
>DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE to
>manually synchronize with other implicit-sync components. This is the
>approach we've taken with NVK/nouveau, ANV/xe, and similar to the
>approach taken by ANV/i915 and it works well for those drivers.
>
>Ideally, I would like to see something like this back-ported to at least
>the kernel where DMA_BUF_IOCTL_IMPORT/EXPORT_SYNC_FILE were introduced
>so that we don't have to wait another year for the fix to reach users.
>However, I understand that back-porting UAPI is problematic and I'll
>leave that decision up to the amdgpu maintainers. Michel suggested that
>a new CTX_OP would make more sense if we want to back-port it but the
>create flag made more sense to me from an API design PoV.
>
>Signed-off-by: Faith Ekstrand <faith.ekstrand at collabora.com>
>Cc: Alex Deucher <alexander.deucher at amd.com>
>Cc: Christian König <christian.koenig at amd.com>
>Cc: David Airlie <airlied at gmail.com>
>Cc: Michel Dänzer <mdaenzer at redhat.com>
>Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 12 ++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 7 +++++++
> include/uapi/drm/amdgpu_drm.h | 12 +++++++++++-
> 4 files changed, 28 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>index ec888fc6ead8..8410b4426541 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>@@ -1196,7 +1196,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
> struct dma_resv *resv = bo->tbo.base.resv;
> enum amdgpu_sync_mode sync_mode;
>
>- sync_mode = amdgpu_bo_explicit_sync(bo) ?
>+ sync_mode = (amdgpu_ctx_explicit_sync(p->ctx) ||
>+ amdgpu_bo_explicit_sync(bo)) ?
> AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
> r = amdgpu_sync_resv(p->adev, &p->sync, resv, sync_mode,
> &fpriv->vm);
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>index 5cb33ac99f70..a304740ccedf 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>@@ -318,7 +318,8 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx,
> }
>
> static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
>- struct drm_file *filp, struct amdgpu_ctx *ctx)
>+ uint32_t flags, struct drm_file *filp,
>+ struct amdgpu_ctx *ctx)
> {
> struct amdgpu_fpriv *fpriv = filp->driver_priv;
> u32 current_stable_pstate;
>@@ -334,6 +335,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
> ctx->mgr = mgr;
> spin_lock_init(&ctx->ring_lock);
>
>+ ctx->flags = flags;
> ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter);
> ctx->reset_counter_query = ctx->reset_counter;
> ctx->generation = amdgpu_vm_generation(mgr->adev, &fpriv->vm);
>@@ -474,6 +476,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
> struct amdgpu_fpriv *fpriv,
> struct drm_file *filp,
> int32_t priority,
>+ uint32_t flags,
> uint32_t *id)
> {
> struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>@@ -493,7 +496,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
> }
>
> *id = (uint32_t)r;
>- r = amdgpu_ctx_init(mgr, priority, filp, ctx);
>+ r = amdgpu_ctx_init(mgr, priority, flags, filp, ctx);
> if (r) {
> idr_remove(&mgr->ctx_handles, *id);
> *id = 0;
>@@ -666,7 +669,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp)
> {
> int r;
>- uint32_t id, stable_pstate;
>+ uint32_t id, stable_pstate, flags;
> int32_t priority;
>
> union drm_amdgpu_ctx *args = data;
>@@ -675,6 +678,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>
> id = args->in.ctx_id;
> priority = args->in.priority;
>+ flags = args->in.flags;
>
> /* For backwards compatibility, we need to accept ioctls with garbage
> * in the priority field. Garbage values in the priority field, result
>@@ -685,7 +689,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>
> switch (args->in.op) {
> case AMDGPU_CTX_OP_ALLOC_CTX:
>- r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
>+ r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, flags, &id);
> args->out.alloc.ctx_id = id;
> break;
> case AMDGPU_CTX_OP_FREE_CTX:
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>index 85376baaa92f..9431c8d2ea59 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>@@ -45,6 +45,7 @@ struct amdgpu_ctx_entity {
> struct amdgpu_ctx {
> struct kref refcount;
> struct amdgpu_ctx_mgr *mgr;
>+ uint32_t flags;
> unsigned reset_counter;
> unsigned reset_counter_query;
> uint64_t generation;
>@@ -84,6 +85,12 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
> bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio);
> void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t ctx_prio);
>
>+static inline bool
>+amdgpu_ctx_explicit_sync(struct amdgpu_ctx *ctx)
>+{
>+ return ctx->flags & AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC;
>+}
>+
> int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp);
>
>diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>index 96e32dafd4f0..e9d87a6e3d86 100644
>--- a/include/uapi/drm/amdgpu_drm.h
>+++ b/include/uapi/drm/amdgpu_drm.h
>@@ -125,7 +125,14 @@ extern "C" {
> #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS (1 << 5)
> /* Flag that BO is always valid in this VM */
> #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID (1 << 6)
>-/* Flag that BO sharing will be explicitly synchronized */
>+/* Flag that BO sharing will be explicitly synchronized
>+ *
>+ * This flag should not be used unless the client can guarantee that no
>+ * other driver which ever touches this BO will ever want to use implicit
>+ * synchronization as it disables implicit sync on this BO system-wide.
>+ * Instead, drivers which use an explicit synchronization model should
>+ * prefer AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC.
>+ */
> #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC (1 << 7)
> /* Flag that indicates allocating MQD gart on GFX9, where the mtype
> * for the second page onward should be set to NC. It should never
>@@ -240,6 +247,9 @@ union drm_amdgpu_bo_list {
> #define AMDGPU_CTX_OP_GET_STABLE_PSTATE 5
> #define AMDGPU_CTX_OP_SET_STABLE_PSTATE 6
>
>+/* indicate that all synchronization will be explicit */
>+#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC (1<<0)
>+
> /* GPU reset status */
> #define AMDGPU_CTX_NO_RESET 0
> /* this the context caused it */
More information about the dri-devel
mailing list