<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Am 07.08.24 um 22:33 schrieb Bas Nieuwenhuizen:<br>
<blockquote type="cite"
cite="mid:CAP+8YyFMxBOSwBNLpcoUw8bsfNdYDiJoYT=8E3B=d3U-c3Am+A@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Wed, Aug 7, 2024 at 10:25 PM Faith Ekstrand
<<a href="mailto:faith@gfxstrand.net"
moz-do-not-send="true" class="moz-txt-link-freetext">faith@gfxstrand.net</a>>
wrote:</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Aug 7, 2024 at
2:23 PM Joshua Ashton <<a
href="mailto:joshua@froggi.es" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">joshua@froggi.es</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I
was thinking about this more recently. I was initially
considering "maybe this should be a per-BO import,"
but I couldn't think of anything in the GL model that
would actually benefit given its not "true" bindless
and there's no update-after-bind there.<br>
</blockquote>
<div><br>
</div>
<div>That's also an option and it's the way it works on
i915. However, then you have to pass lists of things
to the kernel and that's kinda gross. If we need it,
we can do that. Otherwise, I think a more global thing
is fine. I think Bas is currently investigating a
per-submit approach which is a tad different from
either but should also work okay.<br>
</div>
<div> </div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Yeah, I'm working on a per-submit thing (also using
BOOKKEEP fences instead of using the EXPLICIT wait mode to
ensure we also don't add implicit fences). <br>
</div>
</div>
</div>
</blockquote>
<br>
Yeah agree. Your implementation with the per submission flag and
using BOOKKEEP actually sounds like the right thing to do to me as
well.<br>
<br>
We need to keep in mind that synchronization goes in both ways, e.g.
explicit->implicit as well as implicit->explicit.<br>
<br>
I would rather like to keep the implicit->explicit handling
(which this patch here completely disables) and only allow the
explicit->implicit handling (which by using BOOKKEEP fences).<br>
<br>
This way it is possible that we still over synchronize for example
for a double buffered BO is re-used by an explicit client and
implicit display server, but that's probably not something we should
optimize in the first place.<br>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite"
cite="mid:CAP+8YyFMxBOSwBNLpcoUw8bsfNdYDiJoYT=8E3B=d3U-c3Am+A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>We do have a per-BO list on submission already, so we
could add things there, it is just very annoying to
implement as currently at the point we do fence wait/signal
we lost the association with the BO list. Given that I
don't see an use case anytime soon (there are some
theoreticals like radeonsi might start doing READ usage
instead of RW usage with extra tracking) I feel it isn't
worth that added implementation complexity<br>
</div>
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Worth others more familiar with GL asking that
question to themselves also. I am definitely not
totally up on what's possible there.<br>
<br>
Overall, I think I am OK with this approach, even
though I think mixing implicit and explicit sync is
gross, and I want the pain that is implicit sync to
just go away forever. :-)<br>
</blockquote>
<div><br>
</div>
<div>So say we all...</div>
<div><br>
</div>
<div>~Faith</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
- Joshie 🐸✨<br>
<br>
<br>
On August 7, 2024 4:39:32 PM GMT+01:00, Faith Ekstrand
<<a href="mailto:faith@gfxstrand.net"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">faith@gfxstrand.net</a>>
wrote:<br>
>Previously, AMDGPU_GEM_CREATE_EXPLICIT_SYNC was
used to disable implicit<br>
>synchronization on BOs when explicit
synchronization can be used. The<br>
>problem is that this flag is per-BO and affects
all amdgpu users in the<br>
>system, not just the usermode drver which sets
it. This can lead to<br>
>some unintended consequences for userspace if not
used carefully.<br>
><br>
>Since the introduction of
DMA_BUF_IOCTL_EXPORT_SYNC_FILE and<br>
>DMA_BUF_IOCTL_IMPORT_SYNC_FILE, many userspace
window system components<br>
>have grown the ability to convert between the
Vulkan explicit sync model<br>
>and the legacy implicit sync model used by X11 and
Wayland in the past.<br>
>This allows both old and new components to exist
simultaneously and talk<br>
>to each other. In particular, XWayland is able to
convert between the<br>
>two to let Vulkan apps work seamlessly with older
X11 compositors that<br>
>aren't aware of explicit synchronizaiton. This is
rapidly becoming the<br>
>backbone of synchronization in the Linux window
system space.<br>
><br>
>Unfortunately, AMDGPU_GEM_CREATE_EXPLICIT_SYNC
breaks this because it<br>
>disables all implicit synchronization on the given
BO, regardless of<br>
>which userspace driver is rendering with it and
regardless of the<br>
>assumptions made by the client application. In
particular, this is<br>
>causing issues with RADV and radeonsi. RADV sets
the flag to disable<br>
>implicit sync on window system buffers so that it
can safely have them<br>
>resident at all times without causing internal
over-synchronization.<br>
>The BO is then handed off to a compositor which
composites using<br>
>radeonsi. If the compositor uses the
EGL_ANDROID_native_fence_sync<br>
>extension to pass explicit sync files through to
radeonsi, then<br>
>everything is fine. However, if that buffer ever
gets handed to an<br>
>instance of radeonsi with any assumption of
implicit synchronization,<br>
>radeonsi won't be able sync on the BO because RADV
disabled implict sync<br>
>on that BO system-wide. It doesn't matter whether
some window system<br>
>component used DMA_BUF_IOCTL_IMPORT_SYNC_FILE to
set the appropriate<br>
>fence on the BO, amdgpu will ignore it.<br>
><br>
>This new flag disables implicit sync on the
context rather than the BO.<br>
>This way RADV can disable implicit sync (which is
what RADV has always<br>
>wanted) without affecting other components in the
system. If RADV (or<br>
>some other driver) wants implicit sync on some BO,
it can use<br>
>DMA_BUF_IOCTL_EXPORT_SYNC_FILE and
DMA_BUF_IOCTL_IMPORT_SYNC_FILE to<br>
>manually synchronize with other implicit-sync
components. This is the<br>
>approach we've taken with NVK/nouveau, ANV/xe, and
similar to the<br>
>approach taken by ANV/i915 and it works well for
those drivers.<br>
><br>
>Ideally, I would like to see something like this
back-ported to at least<br>
>the kernel where
DMA_BUF_IOCTL_IMPORT/EXPORT_SYNC_FILE were introduced<br>
>so that we don't have to wait another year for the
fix to reach users.<br>
>However, I understand that back-porting UAPI is
problematic and I'll<br>
>leave that decision up to the amdgpu maintainers.
Michel suggested that<br>
>a new CTX_OP would make more sense if we want to
back-port it but the<br>
>create flag made more sense to me from an API
design PoV.<br>
><br>
>Signed-off-by: Faith Ekstrand <<a
href="mailto:faith.ekstrand@collabora.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">faith.ekstrand@collabora.com</a>><br>
>Cc: Alex Deucher <<a
href="mailto:alexander.deucher@amd.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">alexander.deucher@amd.com</a>><br>
>Cc: Christian König <<a
href="mailto:christian.koenig@amd.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">christian.koenig@amd.com</a>><br>
>Cc: David Airlie <<a
href="mailto:airlied@gmail.com" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">airlied@gmail.com</a>><br>
>Cc: Michel Dänzer <<a
href="mailto:mdaenzer@redhat.com" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">mdaenzer@redhat.com</a>><br>
>Cc: Bas Nieuwenhuizen <<a
href="mailto:bas@basnieuwenhuizen.nl"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">bas@basnieuwenhuizen.nl</a>><br>
>---<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++-<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 12
++++++++----<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 7
+++++++<br>
> include/uapi/drm/amdgpu_drm.h | 12
+++++++++++-<br>
> 4 files changed, 28 insertions(+), 6 deletions(-)<br>
><br>
>diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
>index ec888fc6ead8..8410b4426541 100644<br>
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c<br>
>@@ -1196,7 +1196,8 @@ static int
amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)<br>
> struct dma_resv *resv =
bo->tbo.base.resv;<br>
> enum amdgpu_sync_mode sync_mode;<br>
> <br>
>- sync_mode =
amdgpu_bo_explicit_sync(bo) ?<br>
>+ sync_mode =
(amdgpu_ctx_explicit_sync(p->ctx) ||<br>
>+
amdgpu_bo_explicit_sync(bo)) ?<br>
> AMDGPU_SYNC_EXPLICIT :
AMDGPU_SYNC_NE_OWNER;<br>
> r = amdgpu_sync_resv(p->adev,
&p->sync, resv, sync_mode,<br>
>
&fpriv->vm);<br>
>diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
>index 5cb33ac99f70..a304740ccedf 100644<br>
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c<br>
>@@ -318,7 +318,8 @@ static int
amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx,<br>
> }<br>
> <br>
> static int amdgpu_ctx_init(struct amdgpu_ctx_mgr
*mgr, int32_t priority,<br>
>- struct drm_file *filp,
struct amdgpu_ctx *ctx)<br>
>+ uint32_t flags, struct
drm_file *filp,<br>
>+ struct amdgpu_ctx *ctx)<br>
> {<br>
> struct amdgpu_fpriv *fpriv =
filp->driver_priv;<br>
> u32 current_stable_pstate;<br>
>@@ -334,6 +335,7 @@ static int
amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t
priority,<br>
> ctx->mgr = mgr;<br>
> spin_lock_init(&ctx->ring_lock);<br>
> <br>
>+ ctx->flags = flags;<br>
> ctx->reset_counter =
atomic_read(&mgr->adev->gpu_reset_counter);<br>
> ctx->reset_counter_query =
ctx->reset_counter;<br>
> ctx->generation =
amdgpu_vm_generation(mgr->adev, &fpriv->vm);<br>
>@@ -474,6 +476,7 @@ static int
amdgpu_ctx_alloc(struct amdgpu_device *adev,<br>
> struct amdgpu_fpriv
*fpriv,<br>
> struct drm_file *filp,<br>
> int32_t priority,<br>
>+ uint32_t flags,<br>
> uint32_t *id)<br>
> {<br>
> struct amdgpu_ctx_mgr *mgr =
&fpriv->ctx_mgr;<br>
>@@ -493,7 +496,7 @@ static int
amdgpu_ctx_alloc(struct amdgpu_device *adev,<br>
> }<br>
> <br>
> *id = (uint32_t)r;<br>
>- r = amdgpu_ctx_init(mgr, priority, filp,
ctx);<br>
>+ r = amdgpu_ctx_init(mgr, priority, flags,
filp, ctx);<br>
> if (r) {<br>
>
idr_remove(&mgr->ctx_handles, *id);<br>
> *id = 0;<br>
>@@ -666,7 +669,7 @@ int amdgpu_ctx_ioctl(struct
drm_device *dev, void *data,<br>
> struct drm_file *filp)<br>
> {<br>
> int r;<br>
>- uint32_t id, stable_pstate;<br>
>+ uint32_t id, stable_pstate, flags;<br>
> int32_t priority;<br>
> <br>
> union drm_amdgpu_ctx *args = data;<br>
>@@ -675,6 +678,7 @@ int amdgpu_ctx_ioctl(struct
drm_device *dev, void *data,<br>
> <br>
> id = args->in.ctx_id;<br>
> priority = args->in.priority;<br>
>+ flags = args->in.flags;<br>
> <br>
> /* For backwards compatibility, we need to
accept ioctls with garbage<br>
> * in the priority field. Garbage values in
the priority field, result<br>
>@@ -685,7 +689,7 @@ int amdgpu_ctx_ioctl(struct
drm_device *dev, void *data,<br>
> <br>
> switch (args->in.op) {<br>
> case AMDGPU_CTX_OP_ALLOC_CTX:<br>
>- r = amdgpu_ctx_alloc(adev, fpriv,
filp, priority, &id);<br>
>+ r = amdgpu_ctx_alloc(adev, fpriv,
filp, priority, flags, &id);<br>
> args->out.alloc.ctx_id = id;<br>
> break;<br>
> case AMDGPU_CTX_OP_FREE_CTX:<br>
>diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h<br>
>index 85376baaa92f..9431c8d2ea59 100644<br>
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h<br>
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h<br>
>@@ -45,6 +45,7 @@ struct amdgpu_ctx_entity {<br>
> struct amdgpu_ctx {<br>
> struct kref refcount;<br>
> struct amdgpu_ctx_mgr *mgr;<br>
>+ uint32_t flags;<br>
> unsigned
reset_counter;<br>
> unsigned
reset_counter_query;<br>
> uint64_t generation;<br>
>@@ -84,6 +85,12 @@ struct dma_fence
*amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,<br>
> bool amdgpu_ctx_priority_is_valid(int32_t
ctx_prio);<br>
> void amdgpu_ctx_priority_override(struct
amdgpu_ctx *ctx, int32_t ctx_prio);<br>
> <br>
>+static inline bool<br>
>+amdgpu_ctx_explicit_sync(struct amdgpu_ctx *ctx)<br>
>+{<br>
>+ return ctx->flags &
AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC;<br>
>+}<br>
>+<br>
> int amdgpu_ctx_ioctl(struct drm_device *dev, void
*data,<br>
> struct drm_file *filp);<br>
> <br>
>diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h<br>
>index 96e32dafd4f0..e9d87a6e3d86 100644<br>
>--- a/include/uapi/drm/amdgpu_drm.h<br>
>+++ b/include/uapi/drm/amdgpu_drm.h<br>
>@@ -125,7 +125,14 @@ extern "C" {<br>
> #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS (1
<< 5)<br>
> /* Flag that BO is always valid in this VM */<br>
> #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID (1
<< 6)<br>
>-/* Flag that BO sharing will be explicitly
synchronized */<br>
>+/* Flag that BO sharing will be explicitly
synchronized<br>
>+ *<br>
>+ * This flag should not be used unless the client
can guarantee that no<br>
>+ * other driver which ever touches this BO will
ever want to use implicit<br>
>+ * synchronization as it disables implicit sync
on this BO system-wide.<br>
>+ * Instead, drivers which use an explicit
synchronization model should<br>
>+ * prefer AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC.<br>
>+ */<br>
> #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC
(1 << 7)<br>
> /* Flag that indicates allocating MQD gart on
GFX9, where the mtype<br>
> * for the second page onward should be set to
NC. It should never<br>
>@@ -240,6 +247,9 @@ union drm_amdgpu_bo_list {<br>
> #define AMDGPU_CTX_OP_GET_STABLE_PSTATE 5<br>
> #define AMDGPU_CTX_OP_SET_STABLE_PSTATE 6<br>
> <br>
>+/* indicate that all synchronization will be
explicit */<br>
>+#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC
(1<<0)<br>
>+<br>
> /* GPU reset status */<br>
> #define AMDGPU_CTX_NO_RESET 0<br>
> /* this the context caused it */<br>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</body>
</html>