[RFC] amdgpu: Add a context flag to disable implicit sync

Wed Aug 7 20:33:16 UTC 2024

On Wed, Aug 7, 2024 at 10:25 PM Faith Ekstrand <faith at gfxstrand.net> wrote:

> On Wed, Aug 7, 2024 at 2:23 PM Joshua Ashton <joshua at froggi.es> wrote:
>
>> I was thinking about this more recently. I was initially considering
>> "maybe this should be a per-BO import," but I couldn't think of anything in
>> the GL model that would actually benefit given its not "true" bindless and
>> there's no update-after-bind there.
>>
>
> That's also an option and it's the way it works on i915. However, then you
> have to pass lists of things to the kernel and that's kinda gross. If we
> need it, we can do that. Otherwise, I think a more global thing is fine.  I
> think Bas is currently investigating a per-submit approach which is a tad
> different from either but should also work okay.
>
>

Yeah, I'm working on a per-submit thing (also using BOOKKEEP fences instead
of using the EXPLICIT wait mode to ensure we also don't add implicit
fences).

We do have a per-BO list on submission already, so we could add things
there, it is just very annoying to implement as currently at the point we
do fence wait/signal we lost the association with the BO list. Given that
I  don't see an use case anytime soon (there are some theoreticals like
radeonsi might start doing READ usage instead of RW usage with extra
tracking) I feel it isn't worth that added implementation complexity

Worth others more familiar with GL asking that question to themselves also.
>> I am definitely not totally up on what's possible there.
>>
>> Overall, I think I am OK with this approach, even though I think mixing
>> implicit and explicit sync is gross, and I want the pain that is implicit
>> sync to just go away forever. :-)
>>
>
> So say we all...
>
> ~Faith
>
>
>
>> - Joshie 🐸✨
>>
>>
>> On August 7, 2024 4:39:32 PM GMT+01:00, Faith Ekstrand <
>> faith at gfxstrand.net> wrote:
>> >Previously, AMDGPU_GEM_CREATE_EXPLICIT_SYNC was used to disable implicit
>> >synchronization on BOs when explicit synchronization can be used.  The
>> >problem is that this flag is per-BO and affects all amdgpu users in the
>> >system, not just the usermode drver which sets it.  This can lead to
>> >some unintended consequences for userspace if not used carefully.
>> >
>> >Since the introduction of DMA_BUF_IOCTL_EXPORT_SYNC_FILE and
>> >DMA_BUF_IOCTL_IMPORT_SYNC_FILE, many userspace window system components
>> >have grown the ability to convert between the Vulkan explicit sync model
>> >and the legacy implicit sync model used by X11 and Wayland in the past.
>> >This allows both old and new components to exist simultaneously and talk
>> >to each other.  In particular, XWayland is able to convert between the
>> >two to let Vulkan apps work seamlessly with older X11 compositors that
>> >aren't aware of explicit synchronizaiton.  This is rapidly becoming the
>> >backbone of synchronization in the Linux window system space.
>> >
>> >Unfortunately, AMDGPU_GEM_CREATE_EXPLICIT_SYNC breaks this because it
>> >disables all implicit synchronization on the given BO, regardless of
>> >which userspace driver is rendering with it and regardless of the
>> >assumptions made by the client application.  In particular, this is
>> >causing issues with RADV and radeonsi.  RADV sets the flag to disable
>> >implicit sync on window system buffers so that it can safely have them
>> >resident at all times without causing internal over-synchronization.
>> >The BO is then handed off to a compositor which composites using
>> >radeonsi.  If the compositor uses the EGL_ANDROID_native_fence_sync
>> >extension to pass explicit sync files through to radeonsi, then
>> >everything is fine.  However, if that buffer ever gets handed to an
>> >instance of radeonsi with any assumption of implicit synchronization,
>> >radeonsi won't be able sync on the BO because RADV disabled implict sync
>> >on that BO system-wide.  It doesn't matter whether some window system
>> >component used DMA_BUF_IOCTL_IMPORT_SYNC_FILE to set the appropriate
>> >fence on the BO, amdgpu will ignore it.
>> >
>> >This new flag disables implicit sync on the context rather than the BO.
>> >This way RADV can disable implicit sync (which is what RADV has always
>> >wanted) without affecting other components in the system.  If RADV (or
>> >some other driver) wants implicit sync on some BO, it can use
>> >DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE to
>> >manually synchronize with other implicit-sync components.  This is the
>> >approach we've taken with NVK/nouveau, ANV/xe, and similar to the
>> >approach taken by ANV/i915 and it works well for those drivers.
>> >
>> >Ideally, I would like to see something like this back-ported to at least
>> >the kernel where DMA_BUF_IOCTL_IMPORT/EXPORT_SYNC_FILE were introduced
>> >so that we don't have to wait another year for the fix to reach users.
>> >However, I understand that back-porting UAPI is problematic and I'll
>> >leave that decision up to the amdgpu maintainers.  Michel suggested that
>> >a new CTX_OP would make more sense if we want to back-port it but the
>> >create flag made more sense to me from an API design PoV.
>> >
>> >Signed-off-by: Faith Ekstrand <faith.ekstrand at collabora.com>
>> >Cc: Alex Deucher <alexander.deucher at amd.com>
>> >Cc: Christian König <christian.koenig at amd.com>
>> >Cc: David Airlie <airlied at gmail.com>
>> >Cc: Michel Dänzer <mdaenzer at redhat.com>
>> >Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>> >---
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  3 ++-
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 12 ++++++++----
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  7 +++++++
>> > include/uapi/drm/amdgpu_drm.h           | 12 +++++++++++-
>> > 4 files changed, 28 insertions(+), 6 deletions(-)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> >index ec888fc6ead8..8410b4426541 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> >@@ -1196,7 +1196,8 @@ static int amdgpu_cs_sync_rings(struct
>> amdgpu_cs_parser *p)
>> >               struct dma_resv *resv = bo->tbo.base.resv;
>> >               enum amdgpu_sync_mode sync_mode;
>> >
>> >-              sync_mode = amdgpu_bo_explicit_sync(bo) ?
>> >+              sync_mode = (amdgpu_ctx_explicit_sync(p->ctx) ||
>> >+                           amdgpu_bo_explicit_sync(bo)) ?
>> >                       AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
>> >               r = amdgpu_sync_resv(p->adev, &p->sync, resv, sync_mode,
>> >                                    &fpriv->vm);
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> >index 5cb33ac99f70..a304740ccedf 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> >@@ -318,7 +318,8 @@ static int amdgpu_ctx_get_stable_pstate(struct
>> amdgpu_ctx *ctx,
>> > }
>> >
>> > static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
>> >-                         struct drm_file *filp, struct amdgpu_ctx *ctx)
>> >+                         uint32_t flags, struct drm_file *filp,
>> >+                         struct amdgpu_ctx *ctx)
>> > {
>> >       struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> >       u32 current_stable_pstate;
>> >@@ -334,6 +335,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr
>> *mgr, int32_t priority,
>> >       ctx->mgr = mgr;
>> >       spin_lock_init(&ctx->ring_lock);
>> >
>> >+      ctx->flags = flags;
>> >       ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter);
>> >       ctx->reset_counter_query = ctx->reset_counter;
>> >       ctx->generation = amdgpu_vm_generation(mgr->adev, &fpriv->vm);
>> >@@ -474,6 +476,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device
>> *adev,
>> >                           struct amdgpu_fpriv *fpriv,
>> >                           struct drm_file *filp,
>> >                           int32_t priority,
>> >+                          uint32_t flags,
>> >                           uint32_t *id)
>> > {
>> >       struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>> >@@ -493,7 +496,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device
>> *adev,
>> >       }
>> >
>> >       *id = (uint32_t)r;
>> >-      r = amdgpu_ctx_init(mgr, priority, filp, ctx);
>> >+      r = amdgpu_ctx_init(mgr, priority, flags, filp, ctx);
>> >       if (r) {
>> >               idr_remove(&mgr->ctx_handles, *id);
>> >               *id = 0;
>> >@@ -666,7 +669,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void
>> *data,
>> >                    struct drm_file *filp)
>> > {
>> >       int r;
>> >-      uint32_t id, stable_pstate;
>> >+      uint32_t id, stable_pstate, flags;
>> >       int32_t priority;
>> >
>> >       union drm_amdgpu_ctx *args = data;
>> >@@ -675,6 +678,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void
>> *data,
>> >
>> >       id = args->in.ctx_id;
>> >       priority = args->in.priority;
>> >+      flags = args->in.flags;
>> >
>> >       /* For backwards compatibility, we need to accept ioctls with
>> garbage
>> >        * in the priority field. Garbage values in the priority field,
>> result
>> >@@ -685,7 +689,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void
>> *data,
>> >
>> >       switch (args->in.op) {
>> >       case AMDGPU_CTX_OP_ALLOC_CTX:
>> >-              r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
>> >+              r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, flags,
>> &id);
>> >               args->out.alloc.ctx_id = id;
>> >               break;
>> >       case AMDGPU_CTX_OP_FREE_CTX:
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> >index 85376baaa92f..9431c8d2ea59 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> >@@ -45,6 +45,7 @@ struct amdgpu_ctx_entity {
>> > struct amdgpu_ctx {
>> >       struct kref                     refcount;
>> >       struct amdgpu_ctx_mgr           *mgr;
>> >+      uint32_t                        flags;
>> >       unsigned                        reset_counter;
>> >       unsigned                        reset_counter_query;
>> >       uint64_t                        generation;
>> >@@ -84,6 +85,12 @@ struct dma_fence *amdgpu_ctx_get_fence(struct
>> amdgpu_ctx *ctx,
>> > bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio);
>> > void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t
>> ctx_prio);
>> >
>> >+static inline bool
>> >+amdgpu_ctx_explicit_sync(struct amdgpu_ctx *ctx)
>> >+{
>> >+      return ctx->flags & AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC;
>> >+}
>> >+
>> > int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>> >                    struct drm_file *filp);
>> >
>> >diff --git a/include/uapi/drm/amdgpu_drm.h
>> b/include/uapi/drm/amdgpu_drm.h
>> >index 96e32dafd4f0..e9d87a6e3d86 100644
>> >--- a/include/uapi/drm/amdgpu_drm.h
>> >+++ b/include/uapi/drm/amdgpu_drm.h
>> >@@ -125,7 +125,14 @@ extern "C" {
>> > #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS     (1 << 5)
>> > /* Flag that BO is always valid in this VM */
>> > #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID     (1 << 6)
>> >-/* Flag that BO sharing will be explicitly synchronized */
>> >+/* Flag that BO sharing will be explicitly synchronized
>> >+ *
>> >+ * This flag should not be used unless the client can guarantee that no
>> >+ * other driver which ever touches this BO will ever want to use
>> implicit
>> >+ * synchronization as it disables implicit sync on this BO system-wide.
>> >+ * Instead, drivers which use an explicit synchronization model should
>> >+ * prefer AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC.
>> >+ */
>> > #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC               (1 << 7)
>> > /* Flag that indicates allocating MQD gart on GFX9, where the mtype
>> >  * for the second page onward should be set to NC. It should never
>> >@@ -240,6 +247,9 @@ union drm_amdgpu_bo_list {
>> > #define AMDGPU_CTX_OP_GET_STABLE_PSTATE       5
>> > #define AMDGPU_CTX_OP_SET_STABLE_PSTATE       6
>> >
>> >+/* indicate that all synchronization will be explicit */
>> >+#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC (1<<0)
>> >+
>> > /* GPU reset status */
>> > #define AMDGPU_CTX_NO_RESET           0
>> > /* this the context caused it */
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20240807/342fc500/attachment-0001.htm>