[RFC PATCH v3 13/21] drm/exec: Rework contended locking
Thomas Hellström
thomas.hellstrom at linux.intel.com
Tue May 28 08:07:58 UTC 2024
On Tue, 2024-05-28 at 08:51 +0200, Christian König wrote:
> Hi Thomas,
>
> Am 28.05.24 um 08:36 schrieb Thomas Hellström:
> > Hi, Christian.
> >
> > I'd appreciate if you could respond to the below, since it is a bit
> > hard to try to design around a problem I don't believe exists, and
> > come
> > up with a good solution for that.
> >
> > In short.
> > 1) To prefault userptr we have to exit the ww transaction anyway.
>
> Yeah and I would rather like to have that handling in drm_exec at
> some time.
>
> Basically a GEM object with outdated struct pages backing it can be
> handled in the same loop at the ww transaction.
OK.
>
> > 2) Any contended lock held at loop start is completely encapsulated
> > in
> > the ww transaction and can and will be unlocked when exiting it, so
> > this patch doesn't introduce any additional problems for userptr
> > handling AFAICT.
>
> The drm_exec object was intentionally design to not have anything
> locked
> at the beginning of the loop. See the discussion I had with Sima
> around
> that when pushing the drm_exec object upstream.
>
> I would really like to stick with that design and honestly don't see
> the
> reason to change that. Contenting on a trylock seem to be much more
> questionable.
The change here is to make sure we *don't* have contention in a
trylock, which is otherwise inherent in the current drm_exec design.
What I'm trying to say here is that we end up with the contended lock
grabbed at loop start you already conceptually have a conflicting lock
held (the we_class::acquire_key). Both these can be resolved.
>
> > 3) The need for a fully capable ww transaction helper moving
> > forward.
> > If we need a tool that also does userptr locking, then I think we
> > need
> > to separate that from the ww transaction tool and only pass the
> > latter
> > around to TTM.
>
> drm_exec is *not* meant to be a ww_transaction helper.
>
> The functionality here is to support drivers in their CS interface
> and
> that includes userptr handling as well as a couple of other things.
Then if so, I don't think drm_exec is the correct functionality to pass
to TTM to resolve the eviction issues, but rather a ww transaction
helper that can be used standalone *and* by drm_exec. Now the
functionality would be more or less what drm exec is today, but
slightly augmented.
But then IMHO instead of changing name and more or less replicating
what drm_exec is today wouldn't it be a better idea to subclass
drm_exec into a full-fledged CS helper at the time when that
functionality is indeed added?
/Thomas
>
> Regards,
> Christian.
>
> >
> > Thanks,
> > Thomas
> >
> > On Wed, 2024-05-22 at 19:42 +0200, Thomas Hellström wrote:
> > > On Wed, 2024-05-22 at 18:52 +0200, Christian König wrote:
> > > > Am 22.05.24 um 16:32 schrieb Thomas Hellström:
> > > > > On Wed, 2024-05-22 at 07:52 +0200, Christian König wrote:
> > > > > > Am 21.05.24 um 09:16 schrieb Thomas Hellström:
> > > > > > > If contention and backoff occurs during a drm_exec ww
> > > > > > > transaction,
> > > > > > > the contended lock is not locked again until the next
> > > > > > > orinary
> > > > > > > attempt to lock a dma_resv lock. However, with the
> > > > > > > introduction
> > > > > > > of
> > > > > > > drm_exec_trylock(), that doesn't work, since the locking
> > > > > > > of
> > > > > > > the
> > > > > > > contended lock needs to be a sleeping lock. Neither can
> > > > > > > we
> > > > > > > ignore
> > > > > > > locking the contended lock during a trylock since that
> > > > > > > would
> > > > > > > violate
> > > > > > > at least the ww_mutex annotations.
> > > > > > >
> > > > > > > So resolve this by actually locking the contended lock
> > > > > > > during
> > > > > > > drm_exec_retry_on_contention(). However, this introduces
> > > > > > > a
> > > > > > > new
> > > > > > > point
> > > > > > > of failure since locking the contended lock may return -
> > > > > > > EINTR.
> > > > > > >
> > > > > > > Hence drm_exec_retry_on_contention() must take an error
> > > > > > > parameter
> > > > > > > and
> > > > > > > also return a value indicating success.
> > > > > > After thinking more about that I have to pretty clearly NAK
> > > > > > this.
> > > > > >
> > > > > I thought we were beyond upfront NAKing in the first reply :/
> > > > Well my memory could fail me, but I mentioned concerns on this
> > > > approach
> > > > before.
> > > >
> > > > I was a bit annoyed seeing that again. But could as well be
> > > > that my
> > > > response never got out or that I'm mixing things up.
> > > I haven't seen it at least. Last discussion on this I saw was
> > > here. I didn't see a follow-up on that.
> > >
> > > https://lore.kernel.org/dri-devel/953c157bf69df12d831a781f0f638d93717bb044.camel@linux.intel.com/
> > >
> > >
> > > > > > It's an intentional design decision to guarantee that at
> > > > > > the
> > > > > > start of
> > > > > > the loop no object is locked.
> > > > > >
> > > > > > This is because Sima and I wanted to integrate userptr
> > > > > > handling
> > > > > > into
> > > > > > drm_exec as well in the long term.
> > > > > First I agree the interface looks worse with this patch.
> > > > > But I thought generic userptr handling were going to end up
> > > > > as a
> > > > > gpuvm
> > > > > helper (without using GEM objects) as we've discussed
> > > > > previously.
> > > > We might be talking past each other. That sounds like SVM, e.g.
> > > > on
> > > > demand paging.
> > > >
> > > > What I mean is pre-faulting during command submission like
> > > > radeon,
> > > > amdgpu and i915 do for the userptr handling.
> > > Yes, then we're talking about the same thing.
> > >
> > > We discussed in this thread here, started by Dave.
> > >
> > > https://lore.kernel.org/dri-devel/CAPM=9twPgn+fpbkig0Vhjt=cJdHQFbNH_Z=sRhSZwuvLKhavbA@mail.gmail.com/
> > >
> > > I still think the right place is in drm_gpuvm for this sort of
> > > stuff.
> > > And I think that's the concluding argument by Sima as well.
> > >
> > > In any case, If the planned drm_exec development is to be a full
> > > execbuf helper, I think we need a capable sub-helper for ONLY the
> > > ww
> > > transaction locking as well, with support for the various locking
> > > primitives. In particular if we're going to be able to port i915
> > > ww
> > > transaction locking over. There are more uses of the ww locking
> > > transacions than execbuf.
> > >
> > > > For that you need to re-start the whole handling similar to how
> > > > you
> > > > need
> > > > to re-start for the mutex locking when you detect that the page
> > > > array
> > > > is
> > > > stale, the difference is that you are not allowed to hold any
> > > > resv
> > > > locks
> > > > while pre-faulting.
> > > >
> > > > That's why it is a requirement that the drm_exec loop starts
> > > > without
> > > > any
> > > > locks held.
> > > But wouldn't you need an outer (userptr) loop and an inner
> > > (ww_transaction) loop for this? Why would we want to re-validate
> > > userptrs on -EDEADLKS?
> > >
> > > > > Anyway if still there would be helpers in drm_exec for some
> > > > > other
> > > > > generic userptr solution, those need to be done before the
> > > > > ww_acquire_ctx_init(). The contended locking here is done
> > > > > after,
> > > > > so
> > > > > I
> > > > > can't really see how these would clash.
> > > > Yes, that indeed was a problem. The ww_acquire_ctx_init() was
> > > > intentionally moved into drm_exec_cleanup() to partially
> > > > prevent
> > > > that
> > > > issue.
> > > >
> > > > I haven't fully figured out how to do handle everything
> > > > exactly,
> > > > but
> > > > at
> > > > least in principle it can be made work. With this change here
> > > > it
> > > > becomes
> > > > impossible.
> > > >
> > > > > Still, If we need to come up with another solution, I think
> > > > > it's
> > > > > fair
> > > > > we clearly sort out why.
> > > > >
> > > > > > I think we should just document that drm_exec_trylock()
> > > > > > can't
> > > > > > be
> > > > > > used
> > > > > > to
> > > > > > lock the first BO in the loop and explicitly WARN if that's
> > > > > > the
> > > > > > case.
> > > > > Unfortunately that's not sufficient for the general use-case.
> > > > > If
> > > > > we
> > > > > want to keep the ttm_bo_vm approach of dropping the mmap lock
> > > > > when
> > > > > there is contention on the bo resv, we need to be able to
> > > > > trylock
> > > > > on
> > > > > first lock.
> > > > Mhm, why exactly do we still have that dance in the first
> > > > place?
> > > >
> > > > I mean we have sorted out the mmap() and dma_resv() locking
> > > > order
> > > > long
> > > > ago. See dma_resv_lockdep() which is enforcing that.
> > > I explained that in my reply here:
> > >
> > > https://lore.kernel.org/dri-devel/953c157bf69df12d831a781f0f638d93717bb044.camel@linux.intel.com/
> > >
> > > We shouldn't be holding the mmap lock when waiting for stuff. In
> > > particular not while waiting for mutexes that may be blocked by
> > > gpu
> > > activity.
> > >
> > > > > Also bo creation is using trylock but might be able to use
> > > > > a sleeping lock there. But if that sleeping lock triggers an
> > > > > -
> > > > > EDEADLK
> > > > > (DEBUG_WW_MUTEX_SLOWPATH) we have the weird situation of
> > > > > referencing an
> > > > > object that never was fully created as a contending object.
> > > > I wanted to eliminate that as well by not validating the BO
> > > > during
> > > > initialization any more.
> > > > So bo creation would then be:
> > > >
> > > > ttm_bo_init(bo)
> > > >
> > > > drm_exec_while_not_all_locked() {
> > > > drm_exec_prepare_object(bo, 1);
> > > >
> > > > ttm_bo_validate(bo);
> > > > }
> > > >
> > > > if (r)
> > > > ttm_bo_put(bo);
> > > >
> > > > return r;
> > > >
> > > > I have that on a branch here somewhere prepared, but never got
> > > > the
> > > > time
> > > > to clean it up.
> > > Still, bo creation and validation may be part of a ww transaction
> > > as
> > > well, like page-table bos (Although those are pre-locked so
> > > perhaps
> > > not
> > > a good example). But in the general case, I'm not sure this is
> > > sufficient for all use-cases.
> > >
> > > /Thomas
> > >
> > >
> > >
> > > > Regards,
> > > > Christian.
> > > >
> > > > > So the only really working alternative solution I can see is
> > > > > that
> > > > > drm_exec_trylock simply fails if there is a contended lock
> > > > > and
> > > > > we'd
> > > > > need to live with the weird bo creation situation described
> > > > > above.
> > > > >
> > > > > /Thomas
> > > > >
> > > > > > Regards,
> > > > > > Christian.
> > > > > >
> > > > > > > Cc: Christian König<christian.koenig at amd.com>
> > > > > > > Cc: Somalapuram Amaranath<Amaranath.Somalapuram at amd.com>
> > > > > > > Cc: Matthew Brost<matthew.brost at intel.com>
> > > > > > > Cc:<dri-devel at lists.freedesktop.org>
> > > > > > > Signed-off-by: Thomas
> > > > > > > Hellström<thomas.hellstrom at linux.intel.com>
> > > > > > > ---
> > > > > > > .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16
> > > > > > > ++++---
> > > > > > > --
> > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6
> > > > > > > ++--
> > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 4 +-
> > > > > > > -
> > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 8
> > > > > > > ++---
> > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 8
> > > > > > > ++---
> > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 4 +-
> > > > > > > -
> > > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 8
> > > > > > > ++---
> > > > > > > drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
> > > > > > > drivers/gpu/drm/drm_exec.c | 35
> > > > > > > ++++++++++++++-----
> > > > > > > drivers/gpu/drm/drm_gpuvm.c | 8
> > > > > > > ++---
> > > > > > > drivers/gpu/drm/imagination/pvr_job.c | 2 +-
> > > > > > > drivers/gpu/drm/msm/msm_gem_submit.c | 2 +-
> > > > > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 2 +-
> > > > > > > drivers/gpu/drm/tests/drm_exec_test.c | 12
> > > > > > > +++----
> > > > > > > drivers/gpu/drm/xe/xe_gt_pagefault.c | 4 +-
> > > > > > > -
> > > > > > > drivers/gpu/drm/xe/xe_vm.c | 10
> > > > > > > +++---
> > > > > > > include/drm/drm_exec.h | 23
> > > > > > > +++++++++---
> > > > > > > 17 files changed, 92 insertions(+), 62 deletions(-)
> > > > > > >
> > > > > > > diff --git
> > > > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > index e4d4e55c08ad..4a08a692aa1f 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > @@ -1152,12 +1152,12 @@ static int
> > > > > > > reserve_bo_and_vm(struct
> > > > > > > kgd_mem
> > > > > > > *mem,
> > > > > > > drm_exec_init(&ctx->exec,
> > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > 0);
> > > > > > > drm_exec_until_all_locked(&ctx->exec) {
> > > > > > > ret = amdgpu_vm_lock_pd(vm, &ctx->exec,
> > > > > > > 2);
> > > > > > > - drm_exec_retry_on_contention(&ctx-
> > > > > > > >exec);
> > > > > > > + ret = drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret))
> > > > > > > goto error;
> > > > > > >
> > > > > > > ret = drm_exec_prepare_obj(&ctx->exec,
> > > > > > > &bo-
> > > > > > > > tbo.base, 1);
> > > > > > > - drm_exec_retry_on_contention(&ctx-
> > > > > > > >exec);
> > > > > > > + ret = drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret))
> > > > > > > goto error;
> > > > > > > }
> > > > > > > @@ -1199,14 +1199,14 @@ static int
> > > > > > > reserve_bo_and_cond_vms(struct
> > > > > > > kgd_mem *mem,
> > > > > > >
> > > > > > > ret = amdgpu_vm_lock_pd(entry-
> > > > > > > > bo_va-
> > > > > > > > base.vm,
> > > > > > > &ctx-
> > > > > > > >exec,
> > > > > > > 2);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec, ret);
> > > > > > > if (unlikely(ret))
> > > > > > > goto error;
> > > > > > > ++ctx->n_vms;
> > > > > > > }
> > > > > > >
> > > > > > > ret = drm_exec_prepare_obj(&ctx->exec,
> > > > > > > &bo-
> > > > > > > > tbo.base, 1);
> > > > > > > - drm_exec_retry_on_contention(&ctx-
> > > > > > > >exec);
> > > > > > > + ret = drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret))
> > > > > > > goto error;
> > > > > > > }
> > > > > > > @@ -2619,7 +2619,7 @@ static int
> > > > > > > validate_invalid_user_pages(struct
> > > > > > > amdkfd_process_info *process_info)
> > > > > > > list_for_each_entry(peer_vm,
> > > > > > > &process_info-
> > > > > > > > vm_list_head,
> > > > > > > vm_list_node) {
> > > > > > > ret = amdgpu_vm_lock_pd(peer_vm,
> > > > > > > &exec,
> > > > > > > 2);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret))
> > > > > > > goto unreserve_out;
> > > > > > > }
> > > > > > > @@ -2631,7 +2631,7 @@ static int
> > > > > > > validate_invalid_user_pages(struct
> > > > > > > amdkfd_process_info *process_info)
> > > > > > >
> > > > > > > gobj = &mem->bo->tbo.base;
> > > > > > > ret =
> > > > > > > drm_exec_prepare_obj(&exec,
> > > > > > > gobj,
> > > > > > > 1);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret))
> > > > > > > goto unreserve_out;
> > > > > > > }
> > > > > > > @@ -2875,7 +2875,7 @@ int
> > > > > > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info,
> > > > > > > struct
> > > > > > > dma_fence __rcu *
> > > > > > > list_for_each_entry(peer_vm,
> > > > > > > &process_info-
> > > > > > > > vm_list_head,
> > > > > > > vm_list_node) {
> > > > > > > ret = amdgpu_vm_lock_pd(peer_vm,
> > > > > > > &exec,
> > > > > > > 2);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret)) {
> > > > > > > pr_err("Locking VM PD
> > > > > > > failed,
> > > > > > > ret:
> > > > > > > %d\n", ret);
> > > > > > > goto ttm_reserve_fail;
> > > > > > > @@ -2891,7 +2891,7 @@ int
> > > > > > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info,
> > > > > > > struct
> > > > > > > dma_fence __rcu *
> > > > > > >
> > > > > > > gobj = &mem->bo->tbo.base;
> > > > > > > ret =
> > > > > > > drm_exec_prepare_obj(&exec,
> > > > > > > gobj,
> > > > > > > 1);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > if (unlikely(ret)) {
> > > > > > > pr_err("drm_exec_prepare
> > > > > > > _obj
> > > > > > > failed, ret: %d\n", ret);
> > > > > > > goto ttm_reserve_fail;
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > index ec888fc6ead8..299e46a6d934 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > @@ -897,7 +897,7 @@ static int
> > > > > > > amdgpu_cs_parser_bos(struct
> > > > > > > amdgpu_cs_parser *p,
> > > > > > >
> > > > > > > drm_exec_until_all_locked(&p->exec) {
> > > > > > > r = amdgpu_vm_lock_pd(&fpriv->vm, &p-
> > > > > > > >exec,
> > > > > > > 1
> > > > > > > + p-
> > > > > > > > gang_size);
> > > > > > > - drm_exec_retry_on_contention(&p->exec);
> > > > > > > + r = drm_exec_retry_on_contention(&p-
> > > > > > > >exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_free_user_pages;
> > > > > > >
> > > > > > > @@ -905,7 +905,7 @@ static int
> > > > > > > amdgpu_cs_parser_bos(struct
> > > > > > > amdgpu_cs_parser *p,
> > > > > > > /* One fence for TTM and one for
> > > > > > > each
> > > > > > > CS
> > > > > > > job */
> > > > > > > r = drm_exec_prepare_obj(&p-
> > > > > > > >exec,
> > > > > > > &e-
> > > > > > > > bo-
> > > > > > > > tbo.base,
> > > > > > > 1 + p-
> > > > > > > > gang_size);
> > > > > > > - drm_exec_retry_on_contention(&p-
> > > > > > > > exec);
> > > > > > > + r =
> > > > > > > drm_exec_retry_on_contention(&p-
> > > > > > > > exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto
> > > > > > > out_free_user_pages;
> > > > > > >
> > > > > > > @@ -915,7 +915,7 @@ static int
> > > > > > > amdgpu_cs_parser_bos(struct
> > > > > > > amdgpu_cs_parser *p,
> > > > > > > if (p->uf_bo) {
> > > > > > > r = drm_exec_prepare_obj(&p-
> > > > > > > >exec,
> > > > > > > &p-
> > > > > > > > uf_bo->tbo.base,
> > > > > > > 1 + p-
> > > > > > > > gang_size);
> > > > > > > - drm_exec_retry_on_contention(&p-
> > > > > > > > exec);
> > > > > > > + r =
> > > > > > > drm_exec_retry_on_contention(&p-
> > > > > > > > exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto
> > > > > > > out_free_user_pages;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > index cfdf558b48b6..8b2b86c7a6c5 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > @@ -74,7 +74,7 @@ int amdgpu_map_static_csa(struct
> > > > > > > amdgpu_device
> > > > > > > *adev, struct amdgpu_vm *vm,
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > if (likely(!r))
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > &bo-
> > > > > > > > tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r)) {
> > > > > > > DRM_ERROR("failed to reserve
> > > > > > > CSA,PD
> > > > > > > BOs:
> > > > > > > err=%d\n", r);
> > > > > > > goto error;
> > > > > > > @@ -114,7 +114,7 @@ int amdgpu_unmap_static_csa(struct
> > > > > > > amdgpu_device *adev, struct amdgpu_vm *vm,
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > if (likely(!r))
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > &bo-
> > > > > > > > tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r)) {
> > > > > > > DRM_ERROR("failed to reserve
> > > > > > > CSA,PD
> > > > > > > BOs:
> > > > > > > err=%d\n", r);
> > > > > > > goto error;
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > index 67c234bcf89f..17e16c971e21 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > @@ -239,12 +239,12 @@ static void
> > > > > > > amdgpu_gem_object_close(struct
> > > > > > > drm_gem_object *obj,
> > > > > > > drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES,
> > > > > > > 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > r = drm_exec_prepare_obj(&exec, &bo-
> > > > > > > > tbo.base,
> > > > > > > 1);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_unlock;
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_unlock;
> > > > > > > }
> > > > > > > @@ -776,13 +776,13 @@ int amdgpu_gem_va_ioctl(struct
> > > > > > > drm_device
> > > > > > > *dev, void *data,
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > if (gobj) {
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > gobj);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&exec);
> > > > > > > + r =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error;
> > > > > > > }
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(&fpriv->vm, &exec,
> > > > > > > 2);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > index 5ca5c47ab54e..1b1a5147606e 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > @@ -1221,12 +1221,12 @@ int
> > > > > > > amdgpu_mes_ctx_map_meta_data(struct
> > > > > > > amdgpu_device *adev,
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > &ctx_data-
> > > > > > > > meta_data_obj-
> > > > > > > > tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error_fini_exec;
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error_fini_exec;
> > > > > > > }
> > > > > > > @@ -1292,12 +1292,12 @@ int
> > > > > > > amdgpu_mes_ctx_unmap_meta_data(struct
> > > > > > > amdgpu_device *adev,
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > &ctx_data-
> > > > > > > > meta_data_obj-
> > > > > > > > tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_unlock;
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_unlock;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > index e22cb2b5cd92..72b8213e352c 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > @@ -77,7 +77,7 @@ int amdgpu_seq64_map(struct
> > > > > > > amdgpu_device
> > > > > > > *adev,
> > > > > > > struct amdgpu_vm *vm,
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > if (likely(!r))
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > &bo-
> > > > > > > > tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error;
> > > > > > > }
> > > > > > > @@ -138,7 +138,7 @@ void amdgpu_seq64_unmap(struct
> > > > > > > amdgpu_device
> > > > > > > *adev, struct amdgpu_fpriv *fpriv)
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > if (likely(!r))
> > > > > > > r = drm_exec_lock_obj(&exec,
> > > > > > > &bo-
> > > > > > > > tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > index e01c1c8e64c4..63392ce43945 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > @@ -89,12 +89,12 @@ static int map_ring_data(struct
> > > > > > > amdgpu_device
> > > > > > > *adev, struct amdgpu_vm *vm,
> > > > > > > drm_exec_init(&exec, 0, 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > r = drm_exec_lock_obj(&exec, &bo-
> > > > > > > >tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error_fini_exec;
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto error_fini_exec;
> > > > > > > }
> > > > > > > @@ -152,12 +152,12 @@ static int unmap_ring_data(struct
> > > > > > > amdgpu_device *adev, struct amdgpu_vm *vm,
> > > > > > > drm_exec_init(&exec, 0, 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > r = drm_exec_lock_obj(&exec, &bo-
> > > > > > > >tbo.base);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_unlock;
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + r = drm_exec_retry_on_contention(&exec,
> > > > > > > r);
> > > > > > > if (unlikely(r))
> > > > > > > goto out_unlock;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > index 386875e6eb96..a3aa7fd22f6a 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > @@ -1499,7 +1499,7 @@ static int
> > > > > > > svm_range_reserve_bos(struct
> > > > > > > svm_validate_context *ctx, bool intr)
> > > > > > > vm = drm_priv_to_vm(pdd-
> > > > > > > >drm_priv);
> > > > > > >
> > > > > > > r = amdgpu_vm_lock_pd(vm, &ctx-
> > > > > > > > exec,
> > > > > > > 2);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec);
> > > > > > > + r =
> > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > exec, r);
> > > > > > > if (unlikely(r)) {
> > > > > > > pr_debug("failed %d to
> > > > > > > reserve
> > > > > > > bo\n", r);
> > > > > > > goto unreserve_out;
> > > > > > > diff --git a/drivers/gpu/drm/drm_exec.c
> > > > > > > b/drivers/gpu/drm/drm_exec.c
> > > > > > > index 2da094bdf8a4..3770a5d30213 100644
> > > > > > > --- a/drivers/gpu/drm/drm_exec.c
> > > > > > > +++ b/drivers/gpu/drm/drm_exec.c
> > > > > > > @@ -28,12 +28,12 @@
> > > > > > > * drm_exec_init(&exec,
> > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT);
> > > > > > > * drm_exec_until_all_locked(&exec) {
> > > > > > > * ret = drm_exec_prepare_obj(&exec, boA,
> > > > > > > 1);
> > > > > > > - * drm_exec_retry_on_contention(&exec);
> > > > > > > + * ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > * if (ret)
> > > > > > > * goto error;
> > > > > > > *
> > > > > > > * ret = drm_exec_prepare_obj(&exec, boB,
> > > > > > > 1);
> > > > > > > - * drm_exec_retry_on_contention(&exec);
> > > > > > > + * ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > * if (ret)
> > > > > > > * goto error;
> > > > > > > * }
> > > > > > > @@ -48,7 +48,8 @@
> > > > > > > */
> > > > > > >
> > > > > > > /* Dummy value used to initially enter the retry loop
> > > > > > > */
> > > > > > > -#define DRM_EXEC_DUMMY ((void *)~0)
> > > > > > > +#define DRM_EXEC_DUMMY ERR_PTR(-ESTALE)
> > > > > > > +#define DRM_EXEC_CONTENDED ERR_PTR(-EDEADLK)
> > > > > > >
> > > > > > > /* Unlock all objects and drop references */
> > > > > > > static void drm_exec_unlock_all(struct drm_exec
> > > > > > > *exec)
> > > > > > > @@ -131,8 +132,7 @@ bool drm_exec_cleanup(struct drm_exec
> > > > > > > *exec)
> > > > > > > return true;
> > > > > > > }
> > > > > > >
> > > > > > > - drm_exec_unlock_all(exec);
> > > > > > > - exec->num_objects = 0;
> > > > > > > + exec->contended = NULL;
> > > > > > > return true;
> > > > > > > }
> > > > > > > EXPORT_SYMBOL(drm_exec_cleanup);
> > > > > > > @@ -194,6 +194,27 @@ static int
> > > > > > > drm_exec_lock_contended(struct
> > > > > > > drm_exec *exec)
> > > > > > > return ret;
> > > > > > > }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * drm_exec_handle_contended() - Perform cleanup before
> > > > > > > a ww
> > > > > > > transaction restart
> > > > > > > + * @exec: Pointer to the drm_exec object.
> > > > > > > + *
> > > > > > > + * Unlocks all held resvs and re-locks the contended
> > > > > > > object.
> > > > > > > + *
> > > > > > > + * Return: 0 on success, negative error code on failure.
> > > > > > > + */
> > > > > > > +int drm_exec_handle_contended(struct drm_exec *exec)
> > > > > > > +{
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > + drm_exec_unlock_all(exec);
> > > > > > > + exec->num_objects = 0;
> > > > > > > + ret = drm_exec_lock_contended(exec);
> > > > > > > + exec->contended = DRM_EXEC_CONTENDED;
> > > > > > > +
> > > > > > > + return ret;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(drm_exec_handle_contended);
> > > > > > > +
> > > > > > > /**
> > > > > > > * drm_exec_lock_obj - lock a GEM object for use
> > > > > > > * @exec: the drm_exec object with the state
> > > > > > > @@ -209,10 +230,6 @@ int drm_exec_lock_obj(struct
> > > > > > > drm_exec
> > > > > > > *exec,
> > > > > > > struct drm_gem_object *obj)
> > > > > > > {
> > > > > > > int ret;
> > > > > > >
> > > > > > > - ret = drm_exec_lock_contended(exec);
> > > > > > > - if (unlikely(ret))
> > > > > > > - return ret;
> > > > > > > -
> > > > > > > if (exec->prelocked == obj) {
> > > > > > > drm_gem_object_put(exec->prelocked);
> > > > > > > exec->prelocked = NULL;
> > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > b/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > index f9eb56f24bef..0923d6ae18e2 100644
> > > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > @@ -1254,18 +1254,18 @@ drm_gpuvm_exec_lock(struct
> > > > > > > drm_gpuvm_exec
> > > > > > > *vm_exec)
> > > > > > >
> > > > > > > drm_exec_until_all_locked(exec) {
> > > > > > > ret = drm_gpuvm_prepare_vm(gpuvm, exec,
> > > > > > > num_fences);
> > > > > > > - drm_exec_retry_on_contention(exec);
> > > > > > > + ret = drm_exec_retry_on_contention(exec,
> > > > > > > ret);
> > > > > > > if (ret)
> > > > > > > goto err;
> > > > > > >
> > > > > > > ret = drm_gpuvm_prepare_objects(gpuvm,
> > > > > > > exec,
> > > > > > > num_fences);
> > > > > > > - drm_exec_retry_on_contention(exec);
> > > > > > > + ret = drm_exec_retry_on_contention(exec,
> > > > > > > ret);
> > > > > > > if (ret)
> > > > > > > goto err;
> > > > > > >
> > > > > > > if (vm_exec->extra.fn) {
> > > > > > > ret = vm_exec-
> > > > > > > >extra.fn(vm_exec);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > ret);
> > > > > > > if (ret)
> > > > > > > goto err;
> > > > > > > }
> > > > > > > @@ -1346,7 +1346,7 @@ drm_gpuvm_exec_lock_range(struct
> > > > > > > drm_gpuvm_exec *vm_exec,
> > > > > > > drm_exec_until_all_locked(exec) {
> > > > > > > ret = drm_gpuvm_prepare_range(gpuvm,
> > > > > > > exec,
> > > > > > > addr,
> > > > > > > range,
> > > > > > > vm_exec-
> > > > > > > > num_fences);
> > > > > > > - drm_exec_retry_on_contention(exec);
> > > > > > > + ret = drm_exec_retry_on_contention(exec,
> > > > > > > ret);
> > > > > > > if (ret)
> > > > > > > goto err;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > b/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > index 78c2f3c6dce0..6e0ce6c4576c 100644
> > > > > > > --- a/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > +++ b/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > @@ -574,7 +574,7 @@ prepare_job_resvs_for_each(struct
> > > > > > > drm_exec
> > > > > > > *exec, struct pvr_job_data *job_data,
> > > > > > > drm_exec_until_all_locked(exec) {
> > > > > > > int err = jobs_lock_all_objs(exec,
> > > > > > > job_data,
> > > > > > > job_count);
> > > > > > >
> > > > > > > - drm_exec_retry_on_contention(exec);
> > > > > > > + err = drm_exec_retry_on_contention(exec,
> > > > > > > err);
> > > > > > > if (err)
> > > > > > > return err;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > index fba78193127d..01992b43ea4b 100644
> > > > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > @@ -259,7 +259,7 @@ static int submit_lock_objects(struct
> > > > > > > msm_gem_submit *submit)
> > > > > > > for (unsigned i = 0; i < submit->nr_bos;
> > > > > > > i++)
> > > > > > > {
> > > > > > > struct drm_gem_object *obj =
> > > > > > > submit-
> > > > > > > > bos[i].obj;
> > > > > > > ret =
> > > > > > > drm_exec_prepare_obj(&submit-
> > > > > > > > exec,
> > > > > > > obj, 1);
> > > > > > > -
> > > > > > > drm_exec_retry_on_contention(&su
> > > > > > > bmit-
> > > > > > > > exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&submit->exec, ret);
> > > > > > > if (ret)
> > > > > > > goto error;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > index ee02cd833c5e..0c871634fdfb 100644
> > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > @@ -1350,7 +1350,7 @@ nouveau_uvmm_bind_job_submit(struct
> > > > > > > nouveau_job *job,
> > > > > > > drm_exec_init(exec, vme->flags, 0);
> > > > > > > drm_exec_until_all_locked(exec) {
> > > > > > > ret = bind_lock_validate(job, exec, vme-
> > > > > > > > num_fences);
> > > > > > > - drm_exec_retry_on_contention(exec);
> > > > > > > + ret = drm_exec_retry_on_contention(exec,
> > > > > > > ret);
> > > > > > > if (ret) {
> > > > > > > op = list_last_op(&bind_job-
> > > > > > > >ops);
> > > > > > > goto unwind;
> > > > > > > diff --git a/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > b/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > index 81f928a429ba..28558fdb08df 100644
> > > > > > > --- a/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > +++ b/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > @@ -63,7 +63,7 @@ static void test_lock(struct kunit
> > > > > > > *test)
> > > > > > > drm_exec_init(&exec,
> > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > ret = drm_exec_lock_obj(&exec, &gobj);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > > @@ -83,14 +83,14 @@ static void test_lock_unlock(struct
> > > > > > > kunit
> > > > > > > *test)
> > > > > > > drm_exec_init(&exec,
> > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > ret = drm_exec_lock_obj(&exec, &gobj);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > >
> > > > > > > drm_exec_unlock_obj(&exec, &gobj);
> > > > > > > ret = drm_exec_lock_obj(&exec, &gobj);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > > @@ -110,13 +110,13 @@ static void test_duplicates(struct
> > > > > > > kunit
> > > > > > > *test)
> > > > > > > drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES,
> > > > > > > 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > ret = drm_exec_lock_obj(&exec, &gobj);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > >
> > > > > > > ret = drm_exec_lock_obj(&exec, &gobj);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > > @@ -137,7 +137,7 @@ static void test_prepare(struct kunit
> > > > > > > *test)
> > > > > > > drm_exec_init(&exec,
> > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > ret = drm_exec_prepare_obj(&exec, &gobj,
> > > > > > > 1);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > index 040dd142c49c..20ec1ab1b52d 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > @@ -200,7 +200,7 @@ static int handle_pagefault(struct
> > > > > > > xe_gt
> > > > > > > *gt,
> > > > > > > struct pagefault *pf)
> > > > > > > drm_exec_init(&exec, 0, 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > ret = xe_pf_begin(&exec, vma, atomic,
> > > > > > > tile-
> > > > > > > > id);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > if (ret)
> > > > > > > goto unlock_dma_resv;
> > > > > > >
> > > > > > > @@ -543,7 +543,7 @@ static int handle_acc(struct xe_gt
> > > > > > > *gt,
> > > > > > > struct
> > > > > > > acc *acc)
> > > > > > > drm_exec_init(&exec, 0, 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > ret = xe_pf_begin(&exec, vma, true,
> > > > > > > tile-
> > > > > > > > id);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + ret =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > ret);
> > > > > > > if (ret)
> > > > > > > break;
> > > > > > > }
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > index e2ec148c9c33..335524e803e7 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > @@ -501,7 +501,7 @@ static void
> > > > > > > preempt_rebind_work_func(struct
> > > > > > > work_struct *w)
> > > > > > > bool done = false;
> > > > > > >
> > > > > > > err = xe_preempt_work_begin(&exec, vm,
> > > > > > > &done);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + err =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > err);
> > > > > > > if (err || done) {
> > > > > > > drm_exec_fini(&exec);
> > > > > > > if (err &&
> > > > > > > xe_vm_validate_should_retry(&exec, err, &end))
> > > > > > > @@ -1052,7 +1052,7 @@ static void
> > > > > > > xe_vma_destroy_unlocked(struct
> > > > > > > xe_vma *vma)
> > > > > > > drm_exec_init(&exec, 0, 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > err = xe_vm_lock_vma(&exec, vma);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + err =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > err);
> > > > > > > if (XE_WARN_ON(err))
> > > > > > > break;
> > > > > > > }
> > > > > > > @@ -2148,11 +2148,11 @@ static struct xe_vma
> > > > > > > *new_vma(struct
> > > > > > > xe_vm
> > > > > > > *vm, struct drm_gpuva_op_map *op,
> > > > > > > err = 0;
> > > > > > > if (!bo->vm) {
> > > > > > > err =
> > > > > > > drm_exec_lock_obj(&exec,
> > > > > > > xe_vm_obj(vm));
> > > > > > > -
> > > > > > > drm_exec_retry_on_conten
> > > > > > > tion
> > > > > > > (&
> > > > > > > exec);
> > > > > > > + err =
> > > > > > > drm_exec_retry_on_contention(&exec, err);
> > > > > > > }
> > > > > > > if (!err) {
> > > > > > > err =
> > > > > > > drm_exec_lock_obj(&exec,
> > > > > > > &bo->ttm.base);
> > > > > > > -
> > > > > > > drm_exec_retry_on_conten
> > > > > > > tion
> > > > > > > (&
> > > > > > > exec);
> > > > > > > + err =
> > > > > > > drm_exec_retry_on_contention(&exec, err);
> > > > > > > }
> > > > > > > if (err) {
> > > > > > > drm_exec_fini(&exec);
> > > > > > > @@ -2884,7 +2884,7 @@ static int
> > > > > > > vm_bind_ioctl_ops_execute(struct
> > > > > > > xe_vm *vm,
> > > > > > > DRM_EXEC_IGNORE_DUPLICATES, 0);
> > > > > > > drm_exec_until_all_locked(&exec) {
> > > > > > > err =
> > > > > > > vm_bind_ioctl_ops_lock_and_prep(&exec,
> > > > > > > vm,
> > > > > > > vops);
> > > > > > > - drm_exec_retry_on_contention(&exec);
> > > > > > > + err =
> > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > err);
> > > > > > > if (err)
> > > > > > > goto unlock;
> > > > > > >
> > > > > > > diff --git a/include/drm/drm_exec.h
> > > > > > > b/include/drm/drm_exec.h
> > > > > > > index aa786b828a0a..fafb40d96e38 100644
> > > > > > > --- a/include/drm/drm_exec.h
> > > > > > > +++ b/include/drm/drm_exec.h
> > > > > > > @@ -51,6 +51,8 @@ struct drm_exec {
> > > > > > > struct drm_gem_object *prelocked;
> > > > > > > };
> > > > > > >
> > > > > > > +int drm_exec_handle_contended(struct drm_exec *exec);
> > > > > > > +
> > > > > > > /**
> > > > > > > * drm_exec_obj() - Return the object for a give
> > > > > > > drm_exec
> > > > > > > index
> > > > > > > * @exec: Pointer to the drm_exec context
> > > > > > > @@ -113,15 +115,26 @@ __PASTE(__drm_exec_,
> > > > > > > __LINE__): \
> > > > > > > /**
> > > > > > > * drm_exec_retry_on_contention - restart the loop to
> > > > > > > grap
> > > > > > > all
> > > > > > > locks
> > > > > > > * @exec: drm_exec object
> > > > > > > + * @_ret: The current error status
> > > > > > > *
> > > > > > > * Control flow helper to continue when a contention
> > > > > > > was
> > > > > > > detected
> > > > > > > and we need to
> > > > > > > * clean up and re-start the loop to prepare all GEM
> > > > > > > objects.
> > > > > > > + *
> > > > > > > + * Return: If no loop restart occurred: The error
> > > > > > > status.
> > > > > > > */
> > > > > > > -#define
> > > > > > > drm_exec_retry_on_contention(exec) \
> > > > > > > - do
> > > > > > > { \
> > > > > > > - if
> > > > > > > (unlikely(drm_exec_is_contended(exec))) \
> > > > > > > - goto
> > > > > > > *__drm_exec_retry_ptr; \
> > > > > > > - } while (0)
> > > > > > > +#define drm_exec_retry_on_contention(exec,
> > > > > > > _ret) \
> > > > > > > + ({
> > > > > > >
> > > > > > > \
> > > > > > > + struct drm_exec *__exec =
> > > > > > > (exec); \
> > > > > > > + int __ret =
> > > > > > > (_ret); \
> > > > > > > +
> > > > > > >
> > > > > > > \
> > > > > > > + if
> > > > > > > (unlikely(drm_exec_is_contended(__exec)))
> > > > > > > { \
> > > > > > > + WARN_ON(__ret != -
> > > > > > > EDEADLK); \
> > > > > > > + __ret =
> > > > > > > drm_exec_handle_contended(__exec); \
> > > > > > > + if
> > > > > > > (!__ret) \
> > > > > > > + goto
> > > > > > > *__drm_exec_retry_ptr; \
> > > > > > > + }
> > > > > > >
> > > > > > > \
> > > > > > > + __ret;
> > > > > > >
> > > > > > > \
> > > > > > > + })
> > > > > > >
> > > > > > > /**
> > > > > > > * drm_exec_is_contended - check for contention
>
More information about the Intel-xe
mailing list