[PATCH 15/15] RFC: drm/amdgpu: Implement a proper implicit fencing uapi

Wed Jun 23 15:15:44 UTC 2021

Am 23.06.21 um 17:12 schrieb Daniel Vetter:
> On Wed, Jun 23, 2021 at 05:07:17PM +0200, Christian König wrote:
>> Am 23.06.21 um 17:03 schrieb Daniel Vetter:
>>> On Wed, Jun 23, 2021 at 04:58:27PM +0200, Bas Nieuwenhuizen wrote:
>>>> On Wed, Jun 23, 2021 at 4:50 PM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>>>>> On Wed, Jun 23, 2021 at 4:02 PM Christian König
>>>>> <christian.koenig at amd.com> wrote:
>>>>>> Am 23.06.21 um 15:49 schrieb Daniel Vetter:
>>>>>>> On Wed, Jun 23, 2021 at 3:44 PM Christian König
>>>>>>> <christian.koenig at amd.com> wrote:
>>>>>>>> Am 23.06.21 um 15:38 schrieb Bas Nieuwenhuizen:
>>>>>>>>> On Wed, Jun 23, 2021 at 2:59 PM Christian König
>>>>>>>>> <christian.koenig at amd.com> wrote:
>>>>>>>>>> Am 23.06.21 um 14:18 schrieb Daniel Vetter:
>>>>>>>>>>> On Wed, Jun 23, 2021 at 11:45 AM Bas Nieuwenhuizen
>>>>>>>>>>> <bas at basnieuwenhuizen.nl> wrote:
>>>>>>>>>>>> On Tue, Jun 22, 2021 at 6:55 PM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
>>>>>>>>>>>>> WARNING: Absolutely untested beyond "gcc isn't dying in agony".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Implicit fencing done properly needs to treat the implicit fencing
>>>>>>>>>>>>> slots like a funny kind of IPC mailbox. In other words it needs to be
>>>>>>>>>>>>> explicitly. This is the only way it will mesh well with explicit
>>>>>>>>>>>>> fencing userspace like vk, and it's also the bare minimum required to
>>>>>>>>>>>>> be able to manage anything else that wants to use the same buffer on
>>>>>>>>>>>>> multiple engines in parallel, and still be able to share it through
>>>>>>>>>>>>> implicit sync.
>>>>>>>>>>>>>
>>>>>>>>>>>>> amdgpu completely lacks such an uapi. Fix this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Luckily the concept of ignoring implicit fences exists already, and
>>>>>>>>>>>>> takes care of all the complexities of making sure that non-optional
>>>>>>>>>>>>> fences (like bo moves) are not ignored. This support was added in
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit 177ae09b5d699a5ebd1cafcee78889db968abf54
>>>>>>>>>>>>> Author: Andres Rodriguez <andresx7 at gmail.com>
>>>>>>>>>>>>> Date:   Fri Sep 15 20:44:06 2017 -0400
>>>>>>>>>>>>>
>>>>>>>>>>>>>          drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortuantely it's the wrong semantics, because it's a bo flag and
>>>>>>>>>>>>> disables implicit sync on an allocated buffer completely.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We _do_ want implicit sync, but control it explicitly. For this we
>>>>>>>>>>>>> need a flag on the drm_file, so that a given userspace (like vulkan)
>>>>>>>>>>>>> can manage the implicit sync slots explicitly. The other side of the
>>>>>>>>>>>>> pipeline (compositor, other process or just different stage in a media
>>>>>>>>>>>>> pipeline in the same process) can then either do the same, or fully
>>>>>>>>>>>>> participate in the implicit sync as implemented by the kernel by
>>>>>>>>>>>>> default.
>>>>>>>>>>>>>
>>>>>>>>>>>>> By building on the existing flag for buffers we avoid any issues with
>>>>>>>>>>>>> opening up additional security concerns - anything this new flag here
>>>>>>>>>>>>> allows is already.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All drivers which supports this concept of a userspace-specific
>>>>>>>>>>>>> opt-out of implicit sync have a flag in their CS ioctl, but in reality
>>>>>>>>>>>>> that turned out to be a bit too inflexible. See the discussion below,
>>>>>>>>>>>>> let's try to do a bit better for amdgpu.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This alone only allows us to completely avoid any stalls due to
>>>>>>>>>>>>> implicit sync, it does not yet allow us to use implicit sync as a
>>>>>>>>>>>>> strange form of IPC for sync_file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For that we need two more pieces:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - a way to get the current implicit sync fences out of a buffer. Could
>>>>>>>>>>>>>        be done in a driver ioctl, but everyone needs this, and generally a
>>>>>>>>>>>>>        dma-buf is involved anyway to establish the sharing. So an ioctl on
>>>>>>>>>>>>>        the dma-buf makes a ton more sense:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2F20210520190007.534046-4-jason%40jlekstrand.net%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7Cbbfb6a2fd1ab4ab448d608d936594aae%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637600579485508943%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=EKK8SOxfcGPIcH3EFEN656G76vQnn2jGlGyMkuY4f5k%3D&reserved=0
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Current drivers in upstream solves this by having the opt-out flag
>>>>>>>>>>>>>        on their CS ioctl. This has the downside that very often the CS
>>>>>>>>>>>>>        which must actually stall for the implicit fence is run a while
>>>>>>>>>>>>>        after the implicit fence point was logically sampled per the api
>>>>>>>>>>>>>        spec (vk passes an explicit syncobj around for that afaiui), and so
>>>>>>>>>>>>>        results in oversync. Converting the implicit sync fences into a
>>>>>>>>>>>>>        snap-shot sync_file is actually accurate.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Simillar we need to be able to set the exclusive implicit fence.
>>>>>>>>>>>>>        Current drivers again do this with a CS ioctl flag, with again the
>>>>>>>>>>>>>        same problems that the time the CS happens additional dependencies
>>>>>>>>>>>>>        have been added. An explicit ioctl to only insert a sync_file (while
>>>>>>>>>>>>>        respecting the rules for how exclusive and shared fence slots must
>>>>>>>>>>>>>        be update in struct dma_resv) is much better. This is proposed here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>        https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2F20210520190007.534046-5-jason%40jlekstrand.net%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7Cbbfb6a2fd1ab4ab448d608d936594aae%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637600579485508943%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nrsIXQYXctDMFvYuUfM2LO%2BozUfzopfs34ZpTKjNCKo%3D&reserved=0
>>>>>>>>>>>>>
>>>>>>>>>>>>> These three pieces together allow userspace to fully control implicit
>>>>>>>>>>>>> fencing and remove all unecessary stall points due to them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, as much as the implicit fencing model fundamentally allows:
>>>>>>>>>>>>> There is only one set of fences, you can only choose to sync against
>>>>>>>>>>>>> only writers (exclusive slot), or everyone. Hence suballocating
>>>>>>>>>>>>> multiple buffers or anything else like this is fundamentally not
>>>>>>>>>>>>> possible, and can only be fixed by a proper explicit fencing model.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Aside from that caveat this model gets implicit fencing as closely to
>>>>>>>>>>>>> explicit fencing semantics as possible:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the actual implementation I opted for a simple setparam ioctl, no
>>>>>>>>>>>>> locking (just atomic reads/writes) for simplicity. There is a nice
>>>>>>>>>>>>> flag parameter in the VM ioctl which we could use, except:
>>>>>>>>>>>>> - it's not checked, so userspace likely passes garbage
>>>>>>>>>>>>> - there's already a comment that userspace _does_ pass garbage in the
>>>>>>>>>>>>>        priority field
>>>>>>>>>>>>> So yeah unfortunately this flag parameter for setting vm flags is
>>>>>>>>>>>>> useless, and we need to hack up a new one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> v2: Explain why a new SETPARAM (Jason)
>>>>>>>>>>>>>
>>>>>>>>>>>>> v3: Bas noticed I forgot to hook up the dependency-side shortcut. We
>>>>>>>>>>>>> need both, or this doesn't do much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> v4: Rebase over the amdgpu patch to always set the implicit sync
>>>>>>>>>>>>> fences.
>>>>>>>>>>>> So I think there is still a case missing in this implementation.
>>>>>>>>>>>> Consider these 3 cases
>>>>>>>>>>>>
>>>>>>>>>>>> (format: a->b: b waits on a. Yes, I know arrows are hard)
>>>>>>>>>>>>
>>>>>>>>>>>> explicit->explicit: This doesn't wait now, which is good
>>>>>>>>>>>> Implicit->explicit: This doesn't wait now, which is good
>>>>>>>>>>>> explicit->implicit : This still waits as the explicit submission still
>>>>>>>>>>>> adds shared fences and most things that set an exclusive fence for
>>>>>>>>>>>> implicit sync will hence wait on it.
>>>>>>>>>>>>
>>>>>>>>>>>> This is probably good enough for what radv needs now but also sounds
>>>>>>>>>>>> like a risk wrt baking in new uapi behavior that we don't want to be
>>>>>>>>>>>> the end result.
>>>>>>>>>>>>
>>>>>>>>>>>> Within AMDGPU this is probably solvable in two ways:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Downgrade AMDGPU_SYNC_NE_OWNER to AMDGPU_SYNC_EXPLICIT for shared fences.
>>>>>>>>>>> I'm not sure that works. I think the right fix is that radeonsi also
>>>>>>>>>>> switches to this model, with maybe a per-bo CS flag to set indicate
>>>>>>>>>>> write access, to cut down on the number of ioctls that are needed
>>>>>>>>>>> otherwise on shared buffers. This per-bo flag would essentially select
>>>>>>>>>>> between SYNC_NE_OWNER and SYNC_EXPLICIT on a per-buffer basis.
>>>>>>>>>> Yeah, but I'm still not entirely sure why that approach isn't sufficient?
>>>>>>>>>>
>>>>>>>>>> Problem with the per context or per vm flag is that you then don't get
>>>>>>>>>> any implicit synchronization any more when another process starts using
>>>>>>>>>> the buffer.
>>>>>>>>> That is exactly what I want for Vulkan :)
>>>>>>>> Yeah, but as far as I know this is not something we can do.
>>>>>>>>
>>>>>>>> See we have use cases like screen capture and debug which rely on that
>>>>>>>> behavior.
>>>>>>> They will keep working, if (and only if) the vulkan side sets the
>>>>>>> winsys fences correctly. Also, everything else in vulkan aside from
>>>>>>> winsys is explicitly not synced at all, you have to import drm syncobj
>>>>>>> timeline on the gl side.
>>>>>>>
>>>>>>>> The only thing we can do is to say on a per buffer flag that a buffer
>>>>>>>> should not participate in implicit sync at all.
>>>>>>> Nah, this doesn't work. Because it's not a global decision, is a local
>>>>>>> decision for the rendered. Vulkan wants to control implicit sync
>>>>>>> explicitly, and the kernel can't force more synchronization. If a
>>>>>>> buffer is shared as a winsys buffer between vulkan client and gl using
>>>>>>> compositor, then you _have_ to use implicit sync on it. But vk needs
>>>>>>> to set the fences directly (and if the app gets it wrong, you get
>>>>>>> misrendering, but that is the specified behavour of vulkan).
>>>>>> Yeah, but that's exactly what we tried to avoid.
>>>>>>
>>>>>> Mhm, when we attach the flag to the process/VM then this would break the
>>>>>> use case of VA-API and Vulkan in the same process.
>>>>>>
>>>>>> But I think if you attach the flag to the context that should indeed
>>>>>> work fine.
>>>>> Yeah that's a question I have, whether the drm_file is shared within
>>>>> one process among everything, or whether radeonsi/libva/vk each have
>>>>> their own. If each have their own drm_file, then we should be fine,
>>>>> otherwise we need to figure out another place to put this (worst case
>>>>> as a CS extension that vk just sets on every submit).
>>>> libdrm_amdgpu dedupes it all so we mostly end up with one drm_file per
>>>> process (modulo minigbm on chromeos and modulo a master fd).
>>>>
>>>> That said the current proposal is for the context right? And on the
>>>> context this should pretty much work? So I'm not sure why this is the
>>>> part we are discussing?
>>> It's on the fpriv->vm, so on the FD. I assumed vulkan at least would want
>>> to have it's private VM for this. And on the quick I didn't see any other
>>> way to create a VM than to have an FD of your own.
>> You can't have your own FD in libdrm_amdgpu userspace. We had a pretty hard
>> design discussion about that already.
>>
>> What you could do is to load your own copy of libdrm_amdgpu, but I won't
>> recommend that.
>>
>> Just putting the flag on the context instead of the VM is much cleaner as
>> far as I can see anyway.
> Helper for the blind? If you gues expect me to move that myself ...

Add the flag to struct amdgpu_ctx, you can use amdgpu_ctx_ioctl() to set 
it. Then during CS that is available as p->ctx.

If I'm not totally mistaken that is also what Bas had in mind with his 
comment.

Christian.

> -Daniel
>
>> Christian.
>>
>>> If there's something else that means "gpu context with it's own vm" then
>>> the flag would need to be moved there, pointers appreciated (but maybe
>>> someone with hw + userspace can do that quicker).
>>> -Daniel
>>>
>>>>> Also yes this risks that a vk app which was violationing the winsys
>>>>> spec will now break, which is why I think we should do this sooner
>>>>> than later. Otherwise the list of w/a we might need to apply in vk
>>>>> userspace will become very long :-( At least since this is purely
>>>>> opt-in from userspace, we only need to have the w/a list in userspace,
>>>>> where mesa has the infrastructure for that already.
>>>>> -Daniel
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>>> The current amdgpu uapi just doesn't allow any other model without an
>>>>>>>>>>> explicit opt-in. So current implicit sync userspace just has to
>>>>>>>>>>> oversync, there's not much choice.
>>>>>>>>>>>
>>>>>>>>>>>> 2) Have an EXPLICIT fence owner that is used for explicit submissions
>>>>>>>>>>>> that is ignored by AMDGPU_SYNC_NE_OWNER.
>>>>>>>>>>>>
>>>>>>>>>>>> But this doesn't solve cross-driver interactions here.
>>>>>>>>>>> Yeah cross-driver is still entirely unsolved, because
>>>>>>>>>>> amdgpu_bo_explicit_sync() on the bo didn't solve that either.
>>>>>>>>>> Hui? You have lost me. Why is that still unsolved?
>>>>>>>>> The part we're trying to solve with this patch is Vulkan should not
>>>>>>>>> participate in any implicit sync at all wrt submissions (and then
>>>>>>>>> handle the implicit sync for WSI explicitly using the fence
>>>>>>>>> import/export stuff that Jason wrote). As long we add shared fences to
>>>>>>>>> the dma_resv we participate in implicit sync (at the level of an
>>>>>>>>> implicit sync read) still, at least from the perspective of later jobs
>>>>>>>>> waiting on these fences.
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> -Daniel
>>>>>>>>>>>
>>>>>>>>>>>>> Cc: mesa-dev at lists.freedesktop.org
>>>>>>>>>>>>> Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>>>>>>>>>>>> Cc: Dave Airlie <airlied at gmail.com>
>>>>>>>>>>>>> Cc: Rob Clark <robdclark at chromium.org>
>>>>>>>>>>>>> Cc: Kristian H. Kristensen <hoegsberg at google.com>
>>>>>>>>>>>>> Cc: Michel Dänzer <michel at daenzer.net>
>>>>>>>>>>>>> Cc: Daniel Stone <daniels at collabora.com>
>>>>>>>>>>>>> Cc: Sumit Semwal <sumit.semwal at linaro.org>
>>>>>>>>>>>>> Cc: "Christian König" <christian.koenig at amd.com>
>>>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>>>>>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>>>>>>>>>> Cc: Deepak R Varma <mh12gx2825 at gmail.com>
>>>>>>>>>>>>> Cc: Chen Li <chenli at uniontech.com>
>>>>>>>>>>>>> Cc: Kevin Wang <kevin1.wang at amd.com>
>>>>>>>>>>>>> Cc: Dennis Li <Dennis.Li at amd.com>
>>>>>>>>>>>>> Cc: Luben Tuikov <luben.tuikov at amd.com>
>>>>>>>>>>>>> Cc: linaro-mm-sig at lists.linaro.org
>>>>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  7 +++++--
>>>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++++++++++++++++++++
>>>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  6 ++++++
>>>>>>>>>>>>>       include/uapi/drm/amdgpu_drm.h           | 10 ++++++++++
>>>>>>>>>>>>>       4 files changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>>>>> index 65df34c17264..c5386d13eb4a 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>>>>>>>>>> @@ -498,6 +498,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>>>>>>>>>>>>              struct amdgpu_bo *gds;
>>>>>>>>>>>>>              struct amdgpu_bo *gws;
>>>>>>>>>>>>>              struct amdgpu_bo *oa;
>>>>>>>>>>>>> +       bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
>>>>>>>>>>>>>              int r;
>>>>>>>>>>>>>
>>>>>>>>>>>>>              INIT_LIST_HEAD(&p->validated);
>>>>>>>>>>>>> @@ -577,7 +578,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>>>>>>>>>>>>>
>>>>>>>>>>>>>                      e->bo_va = amdgpu_vm_bo_find(vm, bo);
>>>>>>>>>>>>>
>>>>>>>>>>>>> -               if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
>>>>>>>>>>>>> +               if (bo->tbo.base.dma_buf &&
>>>>>>>>>>>>> +                   !(no_implicit_sync || amdgpu_bo_explicit_sync(bo))) {
>>>>>>>>>>>>>                              e->chain = dma_fence_chain_alloc();
>>>>>>>>>>>>>                              if (!e->chain) {
>>>>>>>>>>>>>                                      r = -ENOMEM;
>>>>>>>>>>>>> @@ -649,6 +651,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
>>>>>>>>>>>>>       {
>>>>>>>>>>>>>              struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
>>>>>>>>>>>>>              struct amdgpu_bo_list_entry *e;
>>>>>>>>>>>>> +       bool no_implicit_sync = READ_ONCE(fpriv->vm.no_implicit_sync);
>>>>>>>>>>>>>              int r;
>>>>>>>>>>>>>
>>>>>>>>>>>>>              list_for_each_entry(e, &p->validated, tv.head) {
>>>>>>>>>>>>> @@ -656,7 +659,7 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
>>>>>>>>>>>>>                      struct dma_resv *resv = bo->tbo.base.resv;
>>>>>>>>>>>>>                      enum amdgpu_sync_mode sync_mode;
>>>>>>>>>>>>>
>>>>>>>>>>>>> -               sync_mode = amdgpu_bo_explicit_sync(bo) ?
>>>>>>>>>>>>> +               sync_mode = no_implicit_sync || amdgpu_bo_explicit_sync(bo) ?
>>>>>>>>>>>>>                              AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
>>>>>>>>>>>>>                      r = amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode,
>>>>>>>>>>>>>                                           &fpriv->vm);
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> index c080ba15ae77..f982626b5328 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> @@ -1724,6 +1724,26 @@ int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv)
>>>>>>>>>>>>>              return 0;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>
>>>>>>>>>>>>> +int amdgpu_setparam_ioctl(struct drm_device *dev, void *data,
>>>>>>>>>>>>> +                         struct drm_file *filp)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +       struct drm_amdgpu_setparam *setparam = data;
>>>>>>>>>>>>> +       struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       switch (setparam->param) {
>>>>>>>>>>>>> +       case AMDGPU_SETPARAM_NO_IMPLICIT_SYNC:
>>>>>>>>>>>>> +               if (setparam->value)
>>>>>>>>>>>>> +                       WRITE_ONCE(fpriv->vm.no_implicit_sync, true);
>>>>>>>>>>>>> +               else
>>>>>>>>>>>>> +                       WRITE_ONCE(fpriv->vm.no_implicit_sync, false);
>>>>>>>>>>>>> +               break;
>>>>>>>>>>>>> +       default:
>>>>>>>>>>>>> +               return -EINVAL;
>>>>>>>>>>>>> +       }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +       return 0;
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>>       const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>>>>>>>>>>>>>              DRM_IOCTL_DEF_DRV(AMDGPU_GEM_CREATE, amdgpu_gem_create_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>>>>>>>>>>              DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>>>>>>>>>> @@ -1742,6 +1762,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>>>>>>>>>>>>>              DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>>>>>>>>>>              DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>>>>>>>>>>              DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>>>>>>>>>> +       DRM_IOCTL_DEF_DRV(AMDGPU_SETPARAM, amdgpu_setparam_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
>>>>>>>>>>>>>       };
>>>>>>>>>>>>>
>>>>>>>>>>>>>       static const struct drm_driver amdgpu_kms_driver = {
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>>>>>>>>>>> index ddb85a85cbba..0e8c440c6303 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>>>>>>>>>>> @@ -321,6 +321,12 @@ struct amdgpu_vm {
>>>>>>>>>>>>>              bool                    bulk_moveable;
>>>>>>>>>>>>>              /* Flag to indicate if VM is used for compute */
>>>>>>>>>>>>>              bool                    is_compute_context;
>>>>>>>>>>>>> +       /*
>>>>>>>>>>>>> +        * Flag to indicate whether implicit sync should always be skipped on
>>>>>>>>>>>>> +        * this context. We do not care about races at all, userspace is allowed
>>>>>>>>>>>>> +        * to shoot itself with implicit sync to its fullest liking.
>>>>>>>>>>>>> +        */
>>>>>>>>>>>>> +       bool no_implicit_sync;
>>>>>>>>>>>>>       };
>>>>>>>>>>>>>
>>>>>>>>>>>>>       struct amdgpu_vm_manager {
>>>>>>>>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> index 0cbd1540aeac..9eae245c14d6 100644
>>>>>>>>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>>>>>>>>> @@ -54,6 +54,7 @@ extern "C" {
>>>>>>>>>>>>>       #define DRM_AMDGPU_VM                  0x13
>>>>>>>>>>>>>       #define DRM_AMDGPU_FENCE_TO_HANDLE     0x14
>>>>>>>>>>>>>       #define DRM_AMDGPU_SCHED               0x15
>>>>>>>>>>>>> +#define DRM_AMDGPU_SETPARAM            0x16
>>>>>>>>>>>>>
>>>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
>>>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
>>>>>>>>>>>>> @@ -71,6 +72,7 @@ extern "C" {
>>>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
>>>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
>>>>>>>>>>>>>       #define DRM_IOCTL_AMDGPU_SCHED         DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
>>>>>>>>>>>>> +#define DRM_IOCTL_AMDGPU_SETPARAM      DRM_IOW(DRM_COMMAND_BASE + DRM_AMDGPU_SETPARAM, struct drm_amdgpu_setparam)
>>>>>>>>>>>>>
>>>>>>>>>>>>>       /**
>>>>>>>>>>>>>        * DOC: memory domains
>>>>>>>>>>>>> @@ -306,6 +308,14 @@ union drm_amdgpu_sched {
>>>>>>>>>>>>>              struct drm_amdgpu_sched_in in;
>>>>>>>>>>>>>       };
>>>>>>>>>>>>>
>>>>>>>>>>>>> +#define AMDGPU_SETPARAM_NO_IMPLICIT_SYNC       1
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +struct drm_amdgpu_setparam {
>>>>>>>>>>>>> +       /* AMDGPU_SETPARAM_* */
>>>>>>>>>>>>> +       __u32   param;
>>>>>>>>>>>>> +       __u32   value;
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>>       /*
>>>>>>>>>>>>>        * This is not a reliable API and you should expect it to fail for any
>>>>>>>>>>>>>        * number of reasons and have fallback path that do not use userptr to
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 2.32.0.rc2
>>>>>>>>>>>>>
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&data=04%7C01%7Cchristian.koenig%40amd.com%7Cbbfb6a2fd1ab4ab448d608d936594aae%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637600579485508943%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=o%2BceWYx09dRBKYizcr2Hg8usS4Nl8%2Bz%2F4YM4hgWPYpE%3D&reserved=0