[RFC PATCH v3 13/21] drm/exec: Rework contended locking

Thomas Hellström thomas.hellstrom at linux.intel.com
Wed May 29 07:18:04 UTC 2024


On Tue, 2024-05-28 at 13:03 +0200, Christian König wrote:
> Am 28.05.24 um 10:07 schrieb Thomas Hellström:
> > On Tue, 2024-05-28 at 08:51 +0200, Christian König wrote:
> > > > 2) Any contended lock held at loop start is completely
> > > > encapsulated
> > > > in
> > > > the ww transaction and can and will be unlocked when exiting
> > > > it, so
> > > > this patch doesn't introduce any additional problems for
> > > > userptr
> > > > handling AFAICT.
> > > The drm_exec object was intentionally design to not have anything
> > > locked
> > > at the beginning of the loop. See the discussion I had with Sima
> > > around
> > > that when pushing the drm_exec object upstream.
> > > 
> > > I would really like to stick with that design and honestly don't
> > > see
> > > the
> > > reason to change that. Contenting on a trylock seem to be much
> > > more
> > > questionable.
> > The change here is to make sure we *don't* have contention in a
> > trylock, which is otherwise inherent in the current drm_exec
> > design.
> 
> My sentence was probably a bit misleading. What I wanted to say is
> that 
> trylock as first thing in the loop sounds really odd to me.
> 
> See the intention of a trylock is to acquire something optional. What
> we 
> do for the freshly allocated BO and the 'don't try to block with the 
> mmap lock held' case is actually kind something different.
> 
> A clean approach would be to to have the BO initialization and
> backing 
> store allocation steps separated. In this case you don't even need to
> use trylock here.
> 
> And for the VM fault locking case the clean approach would be to tell
> the drm_exec object of the vm_fault parameters so that this helper
> can 
> do the drop all locks, drop the mmap lock, acquire the blocking lock
> and 
> return -EAGAIN.
> 
> This has the huge benefit that we not only stop blocking for the
> faulted 
> BO, but eventually all others which might need to move so that the 
> faulted BO is CPU accessible. I think that this is actually the more 
> problematic case.

Yes, this fault handler approach sounds ok to me. Do you mean we make
drm_exec aware of both the mmap lock and the desire not to block while
it is held? I figure that could come in handy for SVM gpu pagefaults as
well.

For the bo initalization, the suggested solution there deviates a lot
from the non-exec case, where a never-failing trylock on creation is
sometimes necessary. This will probably be an unwanted obstacle for
drm_exec conversion. 

What do you think about the following for bo init? This also basically
means the user of drm_exec can also *choose* to avoid trylock problems
already at the start of the drm_exec loop.

https://patchwork.freedesktop.org/patch/595863/?series=133643&rev=7
and
https://patchwork.freedesktop.org/patch/595866/?series=133643&rev=7

Then if we also do the fault handler mode we could drop this patch
assuming that drivers have all tools to handle existing cases of
ww_mutex_trylock(). 


> 
> > What I'm trying to say here is that we end up with the contended
> > lock
> > grabbed at loop start you already conceptually have a conflicting
> > lock
> > held (the we_class::acquire_key). Both these can be resolved.
> 
> Yeah, I'm perfectly aware of that. But this is just a shortcoming of 
> lockdep and not a real problem.
> 
> During the drm_exec code review we already moved the
> ww_acquire_init() 
> into the cleanup function so that it's only called at the start of
> the 
> loop. Background is that we ran into lockdep warnings with that
> otherwise.
> 
> But functionally it would still work if we do this in drm_exec_ini().
> 
> > > > 3) The need for a fully capable ww transaction helper moving
> > > > forward.
> > > > If we need a tool that also does userptr locking, then I think
> > > > we
> > > > need
> > > > to separate that from the ww transaction tool and only pass the
> > > > latter
> > > > around to TTM.
> > > drm_exec is *not* meant to be a ww_transaction helper.
> > > 
> > > The functionality here is to support drivers in their CS
> > > interface
> > > and
> > > that includes userptr handling as well as a couple of other
> > > things.
> > Then if so, I don't think drm_exec is the correct functionality to
> > pass
> > to TTM to resolve the eviction issues, but rather a ww transaction
> > helper that can be used standalone *and* by drm_exec. Now the
> > functionality would be more or less what drm exec is today, but
> > slightly augmented.
> > 
> > But then IMHO instead of changing name and more or less replicating
> > what drm_exec is today wouldn't it be a better idea to subclass
> > drm_exec into a full-fledged CS helper at the time when that
> > functionality is indeed added?
> 
> Mhm, interesting idea. But it still means that the base object needs
> to 
> be designed in a way to not prevent the implementation of the
> subclass.

Sure, although I am still of the opinion that the patch at hand does
not prevent that, but agreed violates a design principle.

I think we need to be careful to avoid a situaion where it is possible
to open-code something that is desriable / needed by drivers, and that
is not possible withing drm_exec, but still the driver needs to use
drm_exec since it's required for other tools.

/Thomas


> 
> Christian.
> 
> > 
> > /Thomas
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Thanks,
> > > > Thomas
> > > > 
> > > > On Wed, 2024-05-22 at 19:42 +0200, Thomas Hellström wrote:
> > > > > On Wed, 2024-05-22 at 18:52 +0200, Christian König wrote:
> > > > > > Am 22.05.24 um 16:32 schrieb Thomas Hellström:
> > > > > > > On Wed, 2024-05-22 at 07:52 +0200, Christian König wrote:
> > > > > > > > Am 21.05.24 um 09:16 schrieb Thomas Hellström:
> > > > > > > > > If contention and backoff occurs during a drm_exec ww
> > > > > > > > > transaction,
> > > > > > > > > the contended lock is not locked again until the next
> > > > > > > > > orinary
> > > > > > > > > attempt to lock a dma_resv lock. However, with the
> > > > > > > > > introduction
> > > > > > > > > of
> > > > > > > > > drm_exec_trylock(), that doesn't work, since the
> > > > > > > > > locking
> > > > > > > > > of
> > > > > > > > > the
> > > > > > > > > contended lock needs to be a sleeping lock. Neither
> > > > > > > > > can
> > > > > > > > > we
> > > > > > > > > ignore
> > > > > > > > > locking the contended lock during a trylock since
> > > > > > > > > that
> > > > > > > > > would
> > > > > > > > > violate
> > > > > > > > > at least the ww_mutex annotations.
> > > > > > > > > 
> > > > > > > > > So resolve this by actually locking the contended
> > > > > > > > > lock
> > > > > > > > > during
> > > > > > > > > drm_exec_retry_on_contention(). However, this
> > > > > > > > > introduces
> > > > > > > > > a
> > > > > > > > > new
> > > > > > > > > point
> > > > > > > > > of failure since locking the contended lock may
> > > > > > > > > return -
> > > > > > > > > EINTR.
> > > > > > > > > 
> > > > > > > > > Hence drm_exec_retry_on_contention() must take an
> > > > > > > > > error
> > > > > > > > > parameter
> > > > > > > > > and
> > > > > > > > > also return a value indicating success.
> > > > > > > > After thinking more about that I have to pretty clearly
> > > > > > > > NAK
> > > > > > > > this.
> > > > > > > >                                      
> > > > > > > I thought we were beyond upfront NAKing in the first
> > > > > > > reply :/
> > > > > > Well my memory could fail me, but I mentioned concerns on
> > > > > > this
> > > > > > approach
> > > > > > before.
> > > > > > 
> > > > > > I was a bit annoyed seeing that again. But could as well be
> > > > > > that my
> > > > > > response never got out or that I'm mixing things up.
> > > > > I haven't seen it at least. Last discussion on this I saw was
> > > > > here. I didn't see a follow-up on that.
> > > > > 
> > > > > https://lore.kernel.org/dri-devel/953c157bf69df12d831a781f0f638d93717bb044.camel@linux.intel.com/
> > > > > 
> > > > > 
> > > > > > > > It's an intentional design decision to guarantee that
> > > > > > > > at
> > > > > > > > the
> > > > > > > > start of
> > > > > > > > the loop no object is locked.
> > > > > > > > 
> > > > > > > > This is because Sima and I wanted to integrate userptr
> > > > > > > > handling
> > > > > > > > into
> > > > > > > > drm_exec as well in the long term.
> > > > > > > First I agree the interface looks worse with this patch.
> > > > > > > But I thought generic userptr handling were going to end
> > > > > > > up
> > > > > > > as a
> > > > > > > gpuvm
> > > > > > > helper (without using GEM objects) as we've discussed
> > > > > > > previously.
> > > > > > We might be talking past each other. That sounds like SVM,
> > > > > > e.g.
> > > > > > on
> > > > > > demand paging.
> > > > > > 
> > > > > > What I mean is pre-faulting during command submission like
> > > > > > radeon,
> > > > > > amdgpu and i915 do for the userptr handling.
> > > > > Yes, then we're talking about the same thing.
> > > > > 
> > > > > We discussed in this thread here, started by Dave.
> > > > > 
> > > > > https://lore.kernel.org/dri-devel/CAPM=9twPgn+fpbkig0Vhjt=cJdHQFbNH_Z=sRhSZwuvLKhavbA@mail.gmail.com/
> > > > > 
> > > > > I still think the right place is in drm_gpuvm for this sort
> > > > > of
> > > > > stuff.
> > > > > And I think that's the concluding argument by Sima as well.
> > > > > 
> > > > > In any case, If the planned drm_exec development is to be a
> > > > > full
> > > > > execbuf helper, I think we need a capable sub-helper for ONLY
> > > > > the
> > > > > ww
> > > > > transaction locking as well, with support for the various
> > > > > locking
> > > > > primitives. In particular if we're going to be able to port
> > > > > i915
> > > > > ww
> > > > > transaction locking over. There are more uses of the ww
> > > > > locking
> > > > > transacions than execbuf.
> > > > > 
> > > > > > For that you need to re-start the whole handling similar to
> > > > > > how
> > > > > > you
> > > > > > need
> > > > > > to re-start for the mutex locking when you detect that the
> > > > > > page
> > > > > > array
> > > > > > is
> > > > > > stale, the difference is that you are not allowed to hold
> > > > > > any
> > > > > > resv
> > > > > > locks
> > > > > > while pre-faulting.
> > > > > > 
> > > > > > That's why it is a requirement that the drm_exec loop
> > > > > > starts
> > > > > > without
> > > > > > any
> > > > > > locks held.
> > > > > But wouldn't you need an outer (userptr) loop and an inner
> > > > > (ww_transaction) loop for this? Why would we want to re-
> > > > > validate
> > > > > userptrs on -EDEADLKS?
> > > > > 
> > > > > > > Anyway if still there would be helpers in drm_exec for
> > > > > > > some
> > > > > > > other
> > > > > > > generic userptr solution, those need to be done before
> > > > > > > the
> > > > > > > ww_acquire_ctx_init(). The contended locking here is done
> > > > > > > after,
> > > > > > > so
> > > > > > > I
> > > > > > > can't really see how these would clash.
> > > > > > Yes, that indeed was a problem. The ww_acquire_ctx_init()
> > > > > > was
> > > > > > intentionally moved into drm_exec_cleanup() to partially
> > > > > > prevent
> > > > > > that
> > > > > > issue.
> > > > > > 
> > > > > > I haven't fully figured out how to do handle everything
> > > > > > exactly,
> > > > > > but
> > > > > > at
> > > > > > least in principle it can be made work. With this change
> > > > > > here
> > > > > > it
> > > > > > becomes
> > > > > > impossible.
> > > > > > 
> > > > > > > Still, If we need to come up with another solution, I
> > > > > > > think
> > > > > > > it's
> > > > > > > fair
> > > > > > > we clearly sort out why.
> > > > > > > 
> > > > > > > > I think we should just document that drm_exec_trylock()
> > > > > > > > can't
> > > > > > > > be
> > > > > > > > used
> > > > > > > > to
> > > > > > > > lock the first BO in the loop and explicitly WARN if
> > > > > > > > that's
> > > > > > > > the
> > > > > > > > case.
> > > > > > > Unfortunately that's not sufficient for the general use-
> > > > > > > case.
> > > > > > > If
> > > > > > > we
> > > > > > > want to keep the ttm_bo_vm approach of dropping the mmap
> > > > > > > lock
> > > > > > > when
> > > > > > > there is contention on the bo resv, we need to be able to
> > > > > > > trylock
> > > > > > > on
> > > > > > > first lock.
> > > > > > Mhm, why exactly do we still have that dance in the first
> > > > > > place?
> > > > > > 
> > > > > > I mean we have sorted out the mmap() and dma_resv() locking
> > > > > > order
> > > > > > long
> > > > > > ago. See dma_resv_lockdep() which is enforcing that.
> > > > > I explained that in my reply here:
> > > > > 
> > > > > https://lore.kernel.org/dri-devel/953c157bf69df12d831a781f0f638d93717bb044.camel@linux.intel.com/
> > > > > 
> > > > > We shouldn't be holding the mmap lock when waiting for stuff.
> > > > > In
> > > > > particular not while waiting for mutexes that may be blocked
> > > > > by
> > > > > gpu
> > > > > activity.
> > > > > 
> > > > > > >     Also bo creation is using trylock but might be able
> > > > > > > to use
> > > > > > > a sleeping lock there. But if that sleeping lock triggers
> > > > > > > an
> > > > > > > -
> > > > > > > EDEADLK
> > > > > > > (DEBUG_WW_MUTEX_SLOWPATH) we have the weird situation of
> > > > > > > referencing an
> > > > > > > object that never was fully created as a contending
> > > > > > > object.
> > > > > > I wanted to eliminate that as well by not validating the BO
> > > > > > during
> > > > > > initialization any more.
> > > > > > So bo creation would then be:
> > > > > > 
> > > > > > ttm_bo_init(bo)
> > > > > > 
> > > > > > drm_exec_while_not_all_locked() {
> > > > > >        drm_exec_prepare_object(bo, 1);
> > > > > > 
> > > > > >        ttm_bo_validate(bo);
> > > > > > }
> > > > > > 
> > > > > > if (r)
> > > > > >        ttm_bo_put(bo);
> > > > > > 
> > > > > > return r;
> > > > > > 
> > > > > > I have that on a branch here somewhere prepared, but never
> > > > > > got
> > > > > > the
> > > > > > time
> > > > > > to clean it up.
> > > > > Still, bo creation and validation may be part of a ww
> > > > > transaction
> > > > > as
> > > > > well, like page-table bos (Although those are pre-locked so
> > > > > perhaps
> > > > > not
> > > > > a good example). But in the general case, I'm not sure this
> > > > > is
> > > > > sufficient for all use-cases.
> > > > > 
> > > > > /Thomas
> > > > > 
> > > > > 
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > > > So the only really working alternative solution I can see
> > > > > > > is
> > > > > > > that
> > > > > > > drm_exec_trylock simply fails if there is a contended
> > > > > > > lock
> > > > > > > and
> > > > > > > we'd
> > > > > > > need to live with the weird bo creation situation
> > > > > > > described
> > > > > > > above.
> > > > > > > 
> > > > > > > /Thomas
> > > > > > > 
> > > > > > > > Regards,
> > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > > Cc: Christian König<christian.koenig at amd.com>
> > > > > > > > > Cc: Somalapuram
> > > > > > > > > Amaranath<Amaranath.Somalapuram at amd.com>
> > > > > > > > > Cc: Matthew Brost<matthew.brost at intel.com>
> > > > > > > > > Cc:<dri-devel at lists.freedesktop.org>
> > > > > > > > > Signed-off-by: Thomas
> > > > > > > > > Hellström<thomas.hellstrom at linux.intel.com>
> > > > > > > > > ---
> > > > > > > > >      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |
> > > > > > > > > 16
> > > > > > > > > ++++---
> > > > > > > > > --
> > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        | 
> > > > > > > > > 6
> > > > > > > > > ++--
> > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       | 
> > > > > > > > > 4 +-
> > > > > > > > > -
> > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       | 
> > > > > > > > > 8
> > > > > > > > > ++---
> > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       | 
> > > > > > > > > 8
> > > > > > > > > ++---
> > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c     | 
> > > > > > > > > 4 +-
> > > > > > > > > -
> > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c  | 
> > > > > > > > > 8
> > > > > > > > > ++---
> > > > > > > > >      drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 
> > > > > > > > > 2 +-
> > > > > > > > >      drivers/gpu/drm/drm_exec.c                    |
> > > > > > > > > 35
> > > > > > > > > ++++++++++++++-----
> > > > > > > > >      drivers/gpu/drm/drm_gpuvm.c                   | 
> > > > > > > > > 8
> > > > > > > > > ++---
> > > > > > > > >      drivers/gpu/drm/imagination/pvr_job.c         | 
> > > > > > > > > 2 +-
> > > > > > > > >      drivers/gpu/drm/msm/msm_gem_submit.c          | 
> > > > > > > > > 2 +-
> > > > > > > > >      drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 
> > > > > > > > > 2 +-
> > > > > > > > >      drivers/gpu/drm/tests/drm_exec_test.c         |
> > > > > > > > > 12
> > > > > > > > > +++----
> > > > > > > > >      drivers/gpu/drm/xe/xe_gt_pagefault.c          | 
> > > > > > > > > 4 +-
> > > > > > > > > -
> > > > > > > > >      drivers/gpu/drm/xe/xe_vm.c                    |
> > > > > > > > > 10
> > > > > > > > > +++---
> > > > > > > > >      include/drm/drm_exec.h                        |
> > > > > > > > > 23
> > > > > > > > > +++++++++---
> > > > > > > > >      17 files changed, 92 insertions(+), 62
> > > > > > > > > deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git
> > > > > > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > > > index e4d4e55c08ad..4a08a692aa1f 100644
> > > > > > > > > ---
> > > > > > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > > > +++
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > > > > > > > > @@ -1152,12 +1152,12 @@ static int
> > > > > > > > > reserve_bo_and_vm(struct
> > > > > > > > > kgd_mem
> > > > > > > > > *mem,
> > > > > > > > >      	drm_exec_init(&ctx->exec,
> > > > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&ctx->exec) {
> > > > > > > > >      		ret = amdgpu_vm_lock_pd(vm, &ctx-
> > > > > > > > > >exec,
> > > > > > > > > 2);
> > > > > > > > > -		drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (unlikely(ret))
> > > > > > > > >      			goto error;
> > > > > > > > >      
> > > > > > > > >      		ret = drm_exec_prepare_obj(&ctx-
> > > > > > > > > >exec,
> > > > > > > > > &bo-
> > > > > > > > > > tbo.base, 1);
> > > > > > > > > -		drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (unlikely(ret))
> > > > > > > > >      			goto error;
> > > > > > > > >      	}
> > > > > > > > > @@ -1199,14 +1199,14 @@ static int
> > > > > > > > > reserve_bo_and_cond_vms(struct
> > > > > > > > > kgd_mem *mem,
> > > > > > > > >      
> > > > > > > > >      			ret =
> > > > > > > > > amdgpu_vm_lock_pd(entry-
> > > > > > > > > > bo_va-
> > > > > > > > > > base.vm,
> > > > > > > > >      						&ctx
> > > > > > > > > -
> > > > > > > > > > exec,
> > > > > > > > > 2);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&ctx-
> > > > > > > > > > exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec, ret);
> > > > > > > > >      			if (unlikely(ret))
> > > > > > > > >      				goto error;
> > > > > > > > >      			++ctx->n_vms;
> > > > > > > > >      		}
> > > > > > > > >      
> > > > > > > > >      		ret = drm_exec_prepare_obj(&ctx-
> > > > > > > > > >exec,
> > > > > > > > > &bo-
> > > > > > > > > > tbo.base, 1);
> > > > > > > > > -		drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (unlikely(ret))
> > > > > > > > >      			goto error;
> > > > > > > > >      	}
> > > > > > > > > @@ -2619,7 +2619,7 @@ static int
> > > > > > > > > validate_invalid_user_pages(struct
> > > > > > > > > amdkfd_process_info *process_info)
> > > > > > > > >      		list_for_each_entry(peer_vm,
> > > > > > > > > &process_info-
> > > > > > > > > > vm_list_head,
> > > > > > > > >      				    vm_list_node) {
> > > > > > > > >      			ret =
> > > > > > > > > amdgpu_vm_lock_pd(peer_vm,
> > > > > > > > > &exec,
> > > > > > > > > 2);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      			if (unlikely(ret))
> > > > > > > > >      				goto unreserve_out;
> > > > > > > > >      		}
> > > > > > > > > @@ -2631,7 +2631,7 @@ static int
> > > > > > > > > validate_invalid_user_pages(struct
> > > > > > > > > amdkfd_process_info *process_info)
> > > > > > > > >      
> > > > > > > > >      			gobj = &mem->bo->tbo.base;
> > > > > > > > >      			ret =
> > > > > > > > > drm_exec_prepare_obj(&exec,
> > > > > > > > > gobj,
> > > > > > > > > 1);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      			if (unlikely(ret))
> > > > > > > > >      				goto unreserve_out;
> > > > > > > > >      		}
> > > > > > > > > @@ -2875,7 +2875,7 @@ int
> > > > > > > > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info,
> > > > > > > > > struct
> > > > > > > > > dma_fence __rcu *
> > > > > > > > >      		list_for_each_entry(peer_vm,
> > > > > > > > > &process_info-
> > > > > > > > > > vm_list_head,
> > > > > > > > >      				    vm_list_node) {
> > > > > > > > >      			ret =
> > > > > > > > > amdgpu_vm_lock_pd(peer_vm,
> > > > > > > > > &exec,
> > > > > > > > > 2);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      			if (unlikely(ret)) {
> > > > > > > > >      				pr_err("Locking VM
> > > > > > > > > PD
> > > > > > > > > failed,
> > > > > > > > > ret:
> > > > > > > > > %d\n", ret);
> > > > > > > > >      				goto
> > > > > > > > > ttm_reserve_fail;
> > > > > > > > > @@ -2891,7 +2891,7 @@ int
> > > > > > > > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info,
> > > > > > > > > struct
> > > > > > > > > dma_fence __rcu *
> > > > > > > > >      
> > > > > > > > >      			gobj = &mem->bo->tbo.base;
> > > > > > > > >      			ret =
> > > > > > > > > drm_exec_prepare_obj(&exec,
> > > > > > > > > gobj,
> > > > > > > > > 1);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      			if (unlikely(ret)) {
> > > > > > > > >      				pr_err("drm_exec_pre
> > > > > > > > > pare
> > > > > > > > > _obj
> > > > > > > > > failed, ret: %d\n", ret);
> > > > > > > > >      				goto
> > > > > > > > > ttm_reserve_fail;
> > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > index ec888fc6ead8..299e46a6d934 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > > > > > > > @@ -897,7 +897,7 @@ static int
> > > > > > > > > amdgpu_cs_parser_bos(struct
> > > > > > > > > amdgpu_cs_parser *p,
> > > > > > > > >      
> > > > > > > > >      	drm_exec_until_all_locked(&p->exec) {
> > > > > > > > >      		r = amdgpu_vm_lock_pd(&fpriv->vm,
> > > > > > > > > &p-
> > > > > > > > > > exec,
> > > > > > > > > 1
> > > > > > > > > + p-
> > > > > > > > > > gang_size);
> > > > > > > > > -		drm_exec_retry_on_contention(&p-
> > > > > > > > > >exec);
> > > > > > > > > +		r = drm_exec_retry_on_contention(&p-
> > > > > > > > > > exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_free_user_pages;
> > > > > > > > >      
> > > > > > > > > @@ -905,7 +905,7 @@ static int
> > > > > > > > > amdgpu_cs_parser_bos(struct
> > > > > > > > > amdgpu_cs_parser *p,
> > > > > > > > >      			/* One fence for TTM and one
> > > > > > > > > for
> > > > > > > > > each
> > > > > > > > > CS
> > > > > > > > > job */
> > > > > > > > >      			r = drm_exec_prepare_obj(&p-
> > > > > > > > > > exec,
> > > > > > > > > &e-
> > > > > > > > > > bo-
> > > > > > > > > > tbo.base,
> > > > > > > > >      						 1 +
> > > > > > > > > p-
> > > > > > > > > > gang_size);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention(&p-
> > > > > > > > > > exec);
> > > > > > > > > +			r =
> > > > > > > > > drm_exec_retry_on_contention(&p-
> > > > > > > > > > exec,
> > > > > > > > > r);
> > > > > > > > >      			if (unlikely(r))
> > > > > > > > >      				goto
> > > > > > > > > out_free_user_pages;
> > > > > > > > >      
> > > > > > > > > @@ -915,7 +915,7 @@ static int
> > > > > > > > > amdgpu_cs_parser_bos(struct
> > > > > > > > > amdgpu_cs_parser *p,
> > > > > > > > >      		if (p->uf_bo) {
> > > > > > > > >      			r = drm_exec_prepare_obj(&p-
> > > > > > > > > > exec,
> > > > > > > > > &p-
> > > > > > > > > > uf_bo->tbo.base,
> > > > > > > > >      						 1 +
> > > > > > > > > p-
> > > > > > > > > > gang_size);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention(&p-
> > > > > > > > > > exec);
> > > > > > > > > +			r =
> > > > > > > > > drm_exec_retry_on_contention(&p-
> > > > > > > > > > exec,
> > > > > > > > > r);
> > > > > > > > >      			if (unlikely(r))
> > > > > > > > >      				goto
> > > > > > > > > out_free_user_pages;
> > > > > > > > >      		}
> > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > > > index cfdf558b48b6..8b2b86c7a6c5 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> > > > > > > > > @@ -74,7 +74,7 @@ int amdgpu_map_static_csa(struct
> > > > > > > > > amdgpu_device
> > > > > > > > > *adev, struct amdgpu_vm *vm,
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > >      		if (likely(!r))
> > > > > > > > >      			r = drm_exec_lock_obj(&exec,
> > > > > > > > > &bo-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r)) {
> > > > > > > > >      			DRM_ERROR("failed to reserve
> > > > > > > > > CSA,PD
> > > > > > > > > BOs:
> > > > > > > > > err=%d\n", r);
> > > > > > > > >      			goto error;
> > > > > > > > > @@ -114,7 +114,7 @@ int
> > > > > > > > > amdgpu_unmap_static_csa(struct
> > > > > > > > > amdgpu_device *adev, struct amdgpu_vm *vm,
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > >      		if (likely(!r))
> > > > > > > > >      			r = drm_exec_lock_obj(&exec,
> > > > > > > > > &bo-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r)) {
> > > > > > > > >      			DRM_ERROR("failed to reserve
> > > > > > > > > CSA,PD
> > > > > > > > > BOs:
> > > > > > > > > err=%d\n", r);
> > > > > > > > >      			goto error;
> > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > > > index 67c234bcf89f..17e16c971e21 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > > > > > > > @@ -239,12 +239,12 @@ static void
> > > > > > > > > amdgpu_gem_object_close(struct
> > > > > > > > > drm_gem_object *obj,
> > > > > > > > >      	drm_exec_init(&exec,
> > > > > > > > > DRM_EXEC_IGNORE_DUPLICATES,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		r = drm_exec_prepare_obj(&exec, &bo-
> > > > > > > > > > tbo.base,
> > > > > > > > > 1);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_unlock;
> > > > > > > > >      
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_unlock;
> > > > > > > > >      	}
> > > > > > > > > @@ -776,13 +776,13 @@ int amdgpu_gem_va_ioctl(struct
> > > > > > > > > drm_device
> > > > > > > > > *dev, void *data,
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		if (gobj) {
> > > > > > > > >      			r = drm_exec_lock_obj(&exec,
> > > > > > > > > gobj);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&exec);
> > > > > > > > > +			r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      			if (unlikely(r))
> > > > > > > > >      				goto error;
> > > > > > > > >      		}
> > > > > > > > >      
> > > > > > > > >      		r = amdgpu_vm_lock_pd(&fpriv->vm,
> > > > > > > > > &exec,
> > > > > > > > > 2);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error;
> > > > > > > > >      	}
> > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > > > index 5ca5c47ab54e..1b1a5147606e 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > > > > > > > > @@ -1221,12 +1221,12 @@ int
> > > > > > > > > amdgpu_mes_ctx_map_meta_data(struct
> > > > > > > > > amdgpu_device *adev,
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		r = drm_exec_lock_obj(&exec,
> > > > > > > > >      				      &ctx_data-
> > > > > > > > > > meta_data_obj-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error_fini_exec;
> > > > > > > > >      
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error_fini_exec;
> > > > > > > > >      	}
> > > > > > > > > @@ -1292,12 +1292,12 @@ int
> > > > > > > > > amdgpu_mes_ctx_unmap_meta_data(struct
> > > > > > > > > amdgpu_device *adev,
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		r = drm_exec_lock_obj(&exec,
> > > > > > > > >      				      &ctx_data-
> > > > > > > > > > meta_data_obj-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_unlock;
> > > > > > > > >      
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_unlock;
> > > > > > > > >      	}
> > > > > > > > > diff --git
> > > > > > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > > > index e22cb2b5cd92..72b8213e352c 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
> > > > > > > > > @@ -77,7 +77,7 @@ int amdgpu_seq64_map(struct
> > > > > > > > > amdgpu_device
> > > > > > > > > *adev,
> > > > > > > > > struct amdgpu_vm *vm,
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > >      		if (likely(!r))
> > > > > > > > >      			r = drm_exec_lock_obj(&exec,
> > > > > > > > > &bo-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error;
> > > > > > > > >      	}
> > > > > > > > > @@ -138,7 +138,7 @@ void amdgpu_seq64_unmap(struct
> > > > > > > > > amdgpu_device
> > > > > > > > > *adev, struct amdgpu_fpriv *fpriv)
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > >      		if (likely(!r))
> > > > > > > > >      			r = drm_exec_lock_obj(&exec,
> > > > > > > > > &bo-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error;
> > > > > > > > >      	}
> > > > > > > > > diff --git
> > > > > > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > > > index e01c1c8e64c4..63392ce43945 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
> > > > > > > > > @@ -89,12 +89,12 @@ static int map_ring_data(struct
> > > > > > > > > amdgpu_device
> > > > > > > > > *adev, struct amdgpu_vm *vm,
> > > > > > > > >      	drm_exec_init(&exec, 0, 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		r = drm_exec_lock_obj(&exec, &bo-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error_fini_exec;
> > > > > > > > >      
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto error_fini_exec;
> > > > > > > > >      	}
> > > > > > > > > @@ -152,12 +152,12 @@ static int
> > > > > > > > > unmap_ring_data(struct
> > > > > > > > > amdgpu_device *adev, struct amdgpu_vm *vm,
> > > > > > > > >      	drm_exec_init(&exec, 0, 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		r = drm_exec_lock_obj(&exec, &bo-
> > > > > > > > > > tbo.base);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_unlock;
> > > > > > > > >      
> > > > > > > > >      		r = amdgpu_vm_lock_pd(vm, &exec, 0);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		r =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > r);
> > > > > > > > >      		if (unlikely(r))
> > > > > > > > >      			goto out_unlock;
> > > > > > > > >      	}
> > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > > > index 386875e6eb96..a3aa7fd22f6a 100644
> > > > > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> > > > > > > > > @@ -1499,7 +1499,7 @@ static int
> > > > > > > > > svm_range_reserve_bos(struct
> > > > > > > > > svm_validate_context *ctx, bool intr)
> > > > > > > > >      			vm = drm_priv_to_vm(pdd-
> > > > > > > > > > drm_priv);
> > > > > > > > >      
> > > > > > > > >      			r = amdgpu_vm_lock_pd(vm,
> > > > > > > > > &ctx-
> > > > > > > > > > exec,
> > > > > > > > > 2);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&ctx-
> > > > > > > > > > exec);
> > > > > > > > > +			r =
> > > > > > > > > drm_exec_retry_on_contention(&ctx-
> > > > > > > > > > exec, r);
> > > > > > > > >      			if (unlikely(r)) {
> > > > > > > > >      				pr_debug("failed %d
> > > > > > > > > to
> > > > > > > > > reserve
> > > > > > > > > bo\n", r);
> > > > > > > > >      				goto unreserve_out;
> > > > > > > > > diff --git a/drivers/gpu/drm/drm_exec.c
> > > > > > > > > b/drivers/gpu/drm/drm_exec.c
> > > > > > > > > index 2da094bdf8a4..3770a5d30213 100644
> > > > > > > > > --- a/drivers/gpu/drm/drm_exec.c
> > > > > > > > > +++ b/drivers/gpu/drm/drm_exec.c
> > > > > > > > > @@ -28,12 +28,12 @@
> > > > > > > > >       *	drm_exec_init(&exec,
> > > > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT);
> > > > > > > > >       *	drm_exec_until_all_locked(&exec) {
> > > > > > > > >       *		ret = drm_exec_prepare_obj(&exec,
> > > > > > > > > boA,
> > > > > > > > > 1);
> > > > > > > > > - *		drm_exec_retry_on_contention(&exec);
> > > > > > > > > + *		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >       *		if (ret)
> > > > > > > > >       *			goto error;
> > > > > > > > >       *
> > > > > > > > >       *		ret = drm_exec_prepare_obj(&exec,
> > > > > > > > > boB,
> > > > > > > > > 1);
> > > > > > > > > - *		drm_exec_retry_on_contention(&exec);
> > > > > > > > > + *		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >       *		if (ret)
> > > > > > > > >       *			goto error;
> > > > > > > > >       *	}
> > > > > > > > > @@ -48,7 +48,8 @@
> > > > > > > > >       */
> > > > > > > > >      
> > > > > > > > >      /* Dummy value used to initially enter the retry
> > > > > > > > > loop
> > > > > > > > > */
> > > > > > > > > -#define DRM_EXEC_DUMMY ((void *)~0)
> > > > > > > > > +#define DRM_EXEC_DUMMY ERR_PTR(-ESTALE)
> > > > > > > > > +#define DRM_EXEC_CONTENDED ERR_PTR(-EDEADLK)
> > > > > > > > >      
> > > > > > > > >      /* Unlock all objects and drop references */
> > > > > > > > >      static void drm_exec_unlock_all(struct drm_exec
> > > > > > > > > *exec)
> > > > > > > > > @@ -131,8 +132,7 @@ bool drm_exec_cleanup(struct
> > > > > > > > > drm_exec
> > > > > > > > > *exec)
> > > > > > > > >      		return true;
> > > > > > > > >      	}
> > > > > > > > >      
> > > > > > > > > -	drm_exec_unlock_all(exec);
> > > > > > > > > -	exec->num_objects = 0;
> > > > > > > > > +	exec->contended = NULL;
> > > > > > > > >      	return true;
> > > > > > > > >      }
> > > > > > > > >      EXPORT_SYMBOL(drm_exec_cleanup);
> > > > > > > > > @@ -194,6 +194,27 @@ static int
> > > > > > > > > drm_exec_lock_contended(struct
> > > > > > > > > drm_exec *exec)
> > > > > > > > >      	return ret;
> > > > > > > > >      }
> > > > > > > > >      
> > > > > > > > > +/**
> > > > > > > > > + * drm_exec_handle_contended() - Perform cleanup
> > > > > > > > > before
> > > > > > > > > a ww
> > > > > > > > > transaction restart
> > > > > > > > > + * @exec: Pointer to the drm_exec object.
> > > > > > > > > + *
> > > > > > > > > + * Unlocks all held resvs and re-locks the contended
> > > > > > > > > object.
> > > > > > > > > + *
> > > > > > > > > + * Return: 0 on success, negative error code on
> > > > > > > > > failure.
> > > > > > > > > + */
> > > > > > > > > +int drm_exec_handle_contended(struct drm_exec *exec)
> > > > > > > > > +{
> > > > > > > > > +	int ret;
> > > > > > > > > +
> > > > > > > > > +	drm_exec_unlock_all(exec);
> > > > > > > > > +	exec->num_objects = 0;
> > > > > > > > > +	ret = drm_exec_lock_contended(exec);
> > > > > > > > > +	exec->contended = DRM_EXEC_CONTENDED;
> > > > > > > > > +
> > > > > > > > > +	return ret;
> > > > > > > > > +}
> > > > > > > > > +EXPORT_SYMBOL(drm_exec_handle_contended);
> > > > > > > > > +
> > > > > > > > >      /**
> > > > > > > > >       * drm_exec_lock_obj - lock a GEM object for use
> > > > > > > > >       * @exec: the drm_exec object with the state
> > > > > > > > > @@ -209,10 +230,6 @@ int drm_exec_lock_obj(struct
> > > > > > > > > drm_exec
> > > > > > > > > *exec,
> > > > > > > > > struct drm_gem_object *obj)
> > > > > > > > >      {
> > > > > > > > >      	int ret;
> > > > > > > > >      
> > > > > > > > > -	ret = drm_exec_lock_contended(exec);
> > > > > > > > > -	if (unlikely(ret))
> > > > > > > > > -		return ret;
> > > > > > > > > -
> > > > > > > > >      	if (exec->prelocked == obj) {
> > > > > > > > >      		drm_gem_object_put(exec->prelocked);
> > > > > > > > >      		exec->prelocked = NULL;
> > > > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > > > b/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > > > index f9eb56f24bef..0923d6ae18e2 100644
> > > > > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > > > > > > @@ -1254,18 +1254,18 @@ drm_gpuvm_exec_lock(struct
> > > > > > > > > drm_gpuvm_exec
> > > > > > > > > *vm_exec)
> > > > > > > > >      
> > > > > > > > >      	drm_exec_until_all_locked(exec) {
> > > > > > > > >      		ret = drm_gpuvm_prepare_vm(gpuvm,
> > > > > > > > > exec,
> > > > > > > > > num_fences);
> > > > > > > > > -		drm_exec_retry_on_contention(exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			goto err;
> > > > > > > > >      
> > > > > > > > >      		ret =
> > > > > > > > > drm_gpuvm_prepare_objects(gpuvm,
> > > > > > > > > exec,
> > > > > > > > > num_fences);
> > > > > > > > > -		drm_exec_retry_on_contention(exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			goto err;
> > > > > > > > >      
> > > > > > > > >      		if (vm_exec->extra.fn) {
> > > > > > > > >      			ret = vm_exec-
> > > > > > > > > > extra.fn(vm_exec);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > > > ret);
> > > > > > > > >      			if (ret)
> > > > > > > > >      				goto err;
> > > > > > > > >      		}
> > > > > > > > > @@ -1346,7 +1346,7 @@
> > > > > > > > > drm_gpuvm_exec_lock_range(struct
> > > > > > > > > drm_gpuvm_exec *vm_exec,
> > > > > > > > >      	drm_exec_until_all_locked(exec) {
> > > > > > > > >      		ret = drm_gpuvm_prepare_range(gpuvm,
> > > > > > > > > exec,
> > > > > > > > > addr,
> > > > > > > > > range,
> > > > > > > > >      					     
> > > > > > > > > vm_exec-
> > > > > > > > > > num_fences);
> > > > > > > > > -		drm_exec_retry_on_contention(exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			goto err;
> > > > > > > > >      	}
> > > > > > > > > diff --git a/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > > > b/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > > > index 78c2f3c6dce0..6e0ce6c4576c 100644
> > > > > > > > > --- a/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > > > +++ b/drivers/gpu/drm/imagination/pvr_job.c
> > > > > > > > > @@ -574,7 +574,7 @@ prepare_job_resvs_for_each(struct
> > > > > > > > > drm_exec
> > > > > > > > > *exec, struct pvr_job_data *job_data,
> > > > > > > > >      	drm_exec_until_all_locked(exec) {
> > > > > > > > >      		int err = jobs_lock_all_objs(exec,
> > > > > > > > > job_data,
> > > > > > > > > job_count);
> > > > > > > > >      
> > > > > > > > > -		drm_exec_retry_on_contention(exec);
> > > > > > > > > +		err =
> > > > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > > > err);
> > > > > > > > >      		if (err)
> > > > > > > > >      			return err;
> > > > > > > > >      	}
> > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > > > index fba78193127d..01992b43ea4b 100644
> > > > > > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > > > > > @@ -259,7 +259,7 @@ static int
> > > > > > > > > submit_lock_objects(struct
> > > > > > > > > msm_gem_submit *submit)
> > > > > > > > >      		for (unsigned i = 0; i < submit-
> > > > > > > > > >nr_bos;
> > > > > > > > > i++)
> > > > > > > > > {
> > > > > > > > >      			struct drm_gem_object *obj =
> > > > > > > > > submit-
> > > > > > > > > > bos[i].obj;
> > > > > > > > >      			ret =
> > > > > > > > > drm_exec_prepare_obj(&submit-
> > > > > > > > > > exec,
> > > > > > > > > obj, 1);
> > > > > > > > > -
> > > > > > > > > 			drm_exec_retry_on_contention
> > > > > > > > > (&su
> > > > > > > > > bmit-
> > > > > > > > > > exec);
> > > > > > > > > +			ret =
> > > > > > > > > drm_exec_retry_on_contention(&submit->exec, ret);
> > > > > > > > >      			if (ret)
> > > > > > > > >      				goto error;
> > > > > > > > >      		}
> > > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > > > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > > > index ee02cd833c5e..0c871634fdfb 100644
> > > > > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
> > > > > > > > > @@ -1350,7 +1350,7 @@
> > > > > > > > > nouveau_uvmm_bind_job_submit(struct
> > > > > > > > > nouveau_job *job,
> > > > > > > > >      	drm_exec_init(exec, vme->flags, 0);
> > > > > > > > >      	drm_exec_until_all_locked(exec) {
> > > > > > > > >      		ret = bind_lock_validate(job, exec,
> > > > > > > > > vme-
> > > > > > > > > > num_fences);
> > > > > > > > > -		drm_exec_retry_on_contention(exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (ret) {
> > > > > > > > >      			op = list_last_op(&bind_job-
> > > > > > > > > > ops);
> > > > > > > > >      			goto unwind;
> > > > > > > > > diff --git a/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > > > b/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > > > index 81f928a429ba..28558fdb08df 100644
> > > > > > > > > --- a/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > > > +++ b/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > > > > > @@ -63,7 +63,7 @@ static void test_lock(struct kunit
> > > > > > > > > *test)
> > > > > > > > >      	drm_exec_init(&exec,
> > > > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		ret = drm_exec_lock_obj(&exec,
> > > > > > > > > &gobj);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > > @@ -83,14 +83,14 @@ static void
> > > > > > > > > test_lock_unlock(struct
> > > > > > > > > kunit
> > > > > > > > > *test)
> > > > > > > > >      	drm_exec_init(&exec,
> > > > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		ret = drm_exec_lock_obj(&exec,
> > > > > > > > > &gobj);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > >      
> > > > > > > > >      		drm_exec_unlock_obj(&exec, &gobj);
> > > > > > > > >      		ret = drm_exec_lock_obj(&exec,
> > > > > > > > > &gobj);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > > @@ -110,13 +110,13 @@ static void
> > > > > > > > > test_duplicates(struct
> > > > > > > > > kunit
> > > > > > > > > *test)
> > > > > > > > >      	drm_exec_init(&exec,
> > > > > > > > > DRM_EXEC_IGNORE_DUPLICATES,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		ret = drm_exec_lock_obj(&exec,
> > > > > > > > > &gobj);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > >      
> > > > > > > > >      		ret = drm_exec_lock_obj(&exec,
> > > > > > > > > &gobj);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > > @@ -137,7 +137,7 @@ static void test_prepare(struct
> > > > > > > > > kunit
> > > > > > > > > *test)
> > > > > > > > >      	drm_exec_init(&exec,
> > > > > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		ret = drm_exec_prepare_obj(&exec,
> > > > > > > > > &gobj,
> > > > > > > > > 1);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > > > b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > > > index 040dd142c49c..20ec1ab1b52d 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
> > > > > > > > > @@ -200,7 +200,7 @@ static int
> > > > > > > > > handle_pagefault(struct
> > > > > > > > > xe_gt
> > > > > > > > > *gt,
> > > > > > > > > struct pagefault *pf)
> > > > > > > > >      	drm_exec_init(&exec, 0, 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		ret = xe_pf_begin(&exec, vma,
> > > > > > > > > atomic,
> > > > > > > > > tile-
> > > > > > > > > > id);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			goto unlock_dma_resv;
> > > > > > > > >      
> > > > > > > > > @@ -543,7 +543,7 @@ static int handle_acc(struct
> > > > > > > > > xe_gt
> > > > > > > > > *gt,
> > > > > > > > > struct
> > > > > > > > > acc *acc)
> > > > > > > > >      	drm_exec_init(&exec, 0, 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		ret = xe_pf_begin(&exec, vma, true,
> > > > > > > > > tile-
> > > > > > > > > > id);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		ret =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > ret);
> > > > > > > > >      		if (ret)
> > > > > > > > >      			break;
> > > > > > > > >      	}
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > > > b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > > > index e2ec148c9c33..335524e803e7 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > > > > > > > @@ -501,7 +501,7 @@ static void
> > > > > > > > > preempt_rebind_work_func(struct
> > > > > > > > > work_struct *w)
> > > > > > > > >      		bool done = false;
> > > > > > > > >      
> > > > > > > > >      		err = xe_preempt_work_begin(&exec,
> > > > > > > > > vm,
> > > > > > > > > &done);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		err =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > err);
> > > > > > > > >      		if (err || done) {
> > > > > > > > >      			drm_exec_fini(&exec);
> > > > > > > > >      			if (err &&
> > > > > > > > > xe_vm_validate_should_retry(&exec, err, &end))
> > > > > > > > > @@ -1052,7 +1052,7 @@ static void
> > > > > > > > > xe_vma_destroy_unlocked(struct
> > > > > > > > > xe_vma *vma)
> > > > > > > > >      	drm_exec_init(&exec, 0, 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		err = xe_vm_lock_vma(&exec, vma);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		err =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > err);
> > > > > > > > >      		if (XE_WARN_ON(err))
> > > > > > > > >      			break;
> > > > > > > > >      	}
> > > > > > > > > @@ -2148,11 +2148,11 @@ static struct xe_vma
> > > > > > > > > *new_vma(struct
> > > > > > > > > xe_vm
> > > > > > > > > *vm, struct drm_gpuva_op_map *op,
> > > > > > > > >      			err = 0;
> > > > > > > > >      			if (!bo->vm) {
> > > > > > > > >      				err =
> > > > > > > > > drm_exec_lock_obj(&exec,
> > > > > > > > > xe_vm_obj(vm));
> > > > > > > > > -
> > > > > > > > > 				drm_exec_retry_on_co
> > > > > > > > > nten
> > > > > > > > > tion
> > > > > > > > > (&
> > > > > > > > > exec);
> > > > > > > > > +				err =
> > > > > > > > > drm_exec_retry_on_contention(&exec, err);
> > > > > > > > >      			}
> > > > > > > > >      			if (!err) {
> > > > > > > > >      				err =
> > > > > > > > > drm_exec_lock_obj(&exec,
> > > > > > > > > &bo->ttm.base);
> > > > > > > > > -
> > > > > > > > > 				drm_exec_retry_on_co
> > > > > > > > > nten
> > > > > > > > > tion
> > > > > > > > > (&
> > > > > > > > > exec);
> > > > > > > > > +				err =
> > > > > > > > > drm_exec_retry_on_contention(&exec, err);
> > > > > > > > >      			}
> > > > > > > > >      			if (err) {
> > > > > > > > >      				drm_exec_fini(&exec)
> > > > > > > > > ;
> > > > > > > > > @@ -2884,7 +2884,7 @@ static int
> > > > > > > > > vm_bind_ioctl_ops_execute(struct
> > > > > > > > > xe_vm *vm,
> > > > > > > > >      		      DRM_EXEC_IGNORE_DUPLICATES,
> > > > > > > > > 0);
> > > > > > > > >      	drm_exec_until_all_locked(&exec) {
> > > > > > > > >      		err =
> > > > > > > > > vm_bind_ioctl_ops_lock_and_prep(&exec,
> > > > > > > > > vm,
> > > > > > > > > vops);
> > > > > > > > > -		drm_exec_retry_on_contention(&exec);
> > > > > > > > > +		err =
> > > > > > > > > drm_exec_retry_on_contention(&exec,
> > > > > > > > > err);
> > > > > > > > >      		if (err)
> > > > > > > > >      			goto unlock;
> > > > > > > > >      
> > > > > > > > > diff --git a/include/drm/drm_exec.h
> > > > > > > > > b/include/drm/drm_exec.h
> > > > > > > > > index aa786b828a0a..fafb40d96e38 100644
> > > > > > > > > --- a/include/drm/drm_exec.h
> > > > > > > > > +++ b/include/drm/drm_exec.h
> > > > > > > > > @@ -51,6 +51,8 @@ struct drm_exec {
> > > > > > > > >      	struct drm_gem_object *prelocked;
> > > > > > > > >      };
> > > > > > > > >      
> > > > > > > > > +int drm_exec_handle_contended(struct drm_exec
> > > > > > > > > *exec);
> > > > > > > > > +
> > > > > > > > >      /**
> > > > > > > > >       * drm_exec_obj() - Return the object for a give
> > > > > > > > > drm_exec
> > > > > > > > > index
> > > > > > > > >       * @exec: Pointer to the drm_exec context
> > > > > > > > > @@ -113,15 +115,26 @@ __PASTE(__drm_exec_,
> > > > > > > > > __LINE__):					
> > > > > > > > > 	\
> > > > > > > > >      /**
> > > > > > > > >       * drm_exec_retry_on_contention - restart the
> > > > > > > > > loop to
> > > > > > > > > grap
> > > > > > > > > all
> > > > > > > > > locks
> > > > > > > > >       * @exec: drm_exec object
> > > > > > > > > + * @_ret: The current error status
> > > > > > > > >       *
> > > > > > > > >       * Control flow helper to continue when a
> > > > > > > > > contention
> > > > > > > > > was
> > > > > > > > > detected
> > > > > > > > > and we need to
> > > > > > > > >       * clean up and re-start the loop to prepare all
> > > > > > > > > GEM
> > > > > > > > > objects.
> > > > > > > > > + *
> > > > > > > > > + * Return: If no loop restart occurred: The error
> > > > > > > > > status.
> > > > > > > > >       */
> > > > > > > > > -#define
> > > > > > > > > drm_exec_retry_on_contention(exec)		
> > > > > > > > > 	\
> > > > > > > > > -	do
> > > > > > > > > {						
> > > > > > > > > 	\
> > > > > > > > > -		if
> > > > > > > > > (unlikely(drm_exec_is_contended(exec)))	\
> > > > > > > > > -			goto
> > > > > > > > > *__drm_exec_retry_ptr;		\
> > > > > > > > > -	} while (0)
> > > > > > > > > +#define drm_exec_retry_on_contention(exec,
> > > > > > > > > _ret)			\
> > > > > > > > > +	({					
> > > > > > > > > 	
> > > > > > > > > 	
> > > > > > > > > 	\
> > > > > > > > > +		struct drm_exec *__exec =
> > > > > > > > > (exec);			\
> > > > > > > > > +		int __ret =
> > > > > > > > > (_ret);					\
> > > > > > > > > +						
> > > > > > > > > 	
> > > > > > > > > 	
> > > > > > > > > 	\
> > > > > > > > > +		if
> > > > > > > > > (unlikely(drm_exec_is_contended(__exec)))
> > > > > > > > > {		\
> > > > > > > > > +			WARN_ON(__ret != -
> > > > > > > > > EDEADLK);			\
> > > > > > > > > +			__ret =
> > > > > > > > > drm_exec_handle_contended(__exec);	\
> > > > > > > > > +			if
> > > > > > > > > (!__ret)					\
> > > > > > > > > +				goto
> > > > > > > > > *__drm_exec_retry_ptr;		\
> > > > > > > > > +		}				
> > > > > > > > > 	
> > > > > > > > > 	
> > > > > > > > > 	\
> > > > > > > > > +		__ret;				
> > > > > > > > > 	
> > > > > > > > > 	
> > > > > > > > > 	\
> > > > > > > > > +	})
> > > > > > > > >      
> > > > > > > > >      /**
> > > > > > > > >       * drm_exec_is_contended - check for contention



More information about the dri-devel mailing list