<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    Am 22.05.24 um 16:32 schrieb Thomas Hellström:<br>
    <blockquote type="cite" cite="mid:3f3d7fb84a11e8341a79331a0a682df0f3e8897d.camel@linux.intel.com">
      <pre class="moz-quote-pre" wrap="">On Wed, 2024-05-22 at 07:52 +0200, Christian König wrote:
</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">Am 21.05.24 um 09:16 schrieb Thomas Hellström:
</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">If contention and backoff occurs during a drm_exec ww transaction,
the contended lock is not locked again until the next orinary
attempt to lock a dma_resv lock. However, with the introduction of
drm_exec_trylock(), that doesn't work, since the locking of the
contended lock needs to be a sleeping lock. Neither can we ignore
locking the contended lock during a trylock since that would
violate
at least the ww_mutex annotations.

So resolve this by actually locking the contended lock during
drm_exec_retry_on_contention(). However, this introduces a new
point
of failure since locking the contended lock may return -EINTR.

Hence drm_exec_retry_on_contention() must take an error parameter
and
also return a value indicating success.
</pre>
        </blockquote>
        <pre class="moz-quote-pre" wrap="">
After thinking more about that I have to pretty clearly NAK this.
                                  
</pre>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">I thought we were beyond upfront NAKing in the first reply :/</pre>
    </blockquote>
    <br>
    Well my memory could fail me, but I mentioned concerns on this
    approach before.<br>
    <br>
    I was a bit annoyed seeing that again. But could as well be that my
    response never got out or that I'm mixing things up.<br>
    <br>
    <span style="white-space: pre-wrap">
</span>
    <blockquote type="cite" cite="mid:3f3d7fb84a11e8341a79331a0a682df0f3e8897d.camel@linux.intel.com">
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">It's an intentional design decision to guarantee that at the start of
the loop no object is locked.

This is because Sima and I wanted to integrate userptr handling into 
drm_exec as well in the long term.
</pre>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
First I agree the interface looks worse with this patch.
But I thought generic userptr handling were going to end up as a gpuvm
helper (without using GEM objects) as we've discussed previously.</pre>
    </blockquote>
    <br>
    We might be talking past each other. That sounds like SVM, e.g. on
    demand paging.<br>
    <br>
    What I mean is pre-faulting during command submission like radeon,
    amdgpu and i915 do for the userptr handling.<br>
    <br>
    For that you need to re-start the whole handling similar to how you
    need to re-start for the mutex locking when you detect that the page
    array is stale, the difference is that you are not allowed to hold
    any resv locks while pre-faulting.<br>
    <br>
    That's why it is a requirement that the drm_exec loop starts without
    any locks held.<br>
    <br>
    <blockquote type="cite" cite="mid:3f3d7fb84a11e8341a79331a0a682df0f3e8897d.camel@linux.intel.com">
      <pre class="moz-quote-pre" wrap="">Anyway if still there would be helpers in drm_exec for some other
generic userptr solution, those need to be done before the
ww_acquire_ctx_init(). The contended locking here is done after, so I
can't really see how these would clash.</pre>
    </blockquote>
    <br>
    Yes, that indeed was a problem. The ww_acquire_ctx_init() was
    intentionally moved into drm_exec_cleanup() to partially prevent
    that issue.<br>
    <br>
    I haven't fully figured out how to do handle everything exactly, but
    at least in principle it can be made work. With this change here it
    becomes impossible.<br>
    <br>
    <blockquote type="cite" cite="mid:3f3d7fb84a11e8341a79331a0a682df0f3e8897d.camel@linux.intel.com">
      <pre class="moz-quote-pre" wrap="">Still, If we need to come up with another solution, I think it's fair
we clearly sort out why.

</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">I think we should just document that drm_exec_trylock() can't be used
to 
lock the first BO in the loop and explicitly WARN if that's the case.
</pre>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
Unfortunately that's not sufficient for the general use-case. If we
want to keep the ttm_bo_vm approach of dropping the mmap lock when
there is contention on the bo resv, we need to be able to trylock on
first lock.</pre>
    </blockquote>
    <br>
    Mhm, why exactly do we still have that dance in the first place?<br>
    <br>
    I mean we have sorted out the mmap() and dma_resv() locking order
    long ago. See dma_resv_lockdep() which is enforcing that.<br>
    <br>
    <blockquote type="cite" cite="mid:3f3d7fb84a11e8341a79331a0a682df0f3e8897d.camel@linux.intel.com">
      <pre class="moz-quote-pre" wrap=""> Also bo creation is using trylock but might be able to use
a sleeping lock there. But if that sleeping lock triggers an -EDEADLK
(DEBUG_WW_MUTEX_SLOWPATH) we have the weird situation of referencing an
object that never was fully created as a contending object.</pre>
    </blockquote>
    <br>
    I wanted to eliminate that as well by not validating the BO during
    initialization any more.<br>
    <br>
    So bo creation would then be:<br>
    <br>
    ttm_bo_init(bo)<br>
    <br>
    drm_exec_while_not_all_locked() {<br>
        drm_exec_prepare_object(bo, 1);<br>
    <br>
        ttm_bo_validate(bo);<br>
    }<br>
    <br>
    if (r)<br>
        ttm_bo_put(bo);<br>
    <br>
    return r;<br>
    <br>
    I have that on a branch here somewhere prepared, but never got the
    time to clean it up.<br>
    <br>
    Regards,<br>
    Christian.<br>
    <br>
    <blockquote type="cite" cite="mid:3f3d7fb84a11e8341a79331a0a682df0f3e8897d.camel@linux.intel.com">
      <pre class="moz-quote-pre" wrap="">

So the only really working alternative solution I can see is that
drm_exec_trylock simply fails if there is a contended lock and we'd
need to live with the weird bo creation situation described above.

/Thomas

</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">
Regards,
Christian.

</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">
Cc: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:christian.koenig@amd.com"><christian.koenig@amd.com></a>
Cc: Somalapuram Amaranath <a class="moz-txt-link-rfc2396E" href="mailto:Amaranath.Somalapuram@amd.com"><Amaranath.Somalapuram@amd.com></a>
Cc: Matthew Brost <a class="moz-txt-link-rfc2396E" href="mailto:matthew.brost@intel.com"><matthew.brost@intel.com></a>
Cc: <a class="moz-txt-link-rfc2396E" href="mailto:dri-devel@lists.freedesktop.org"><dri-devel@lists.freedesktop.org></a>
Signed-off-by: Thomas Hellström <a class="moz-txt-link-rfc2396E" href="mailto:thomas.hellstrom@linux.intel.com"><thomas.hellstrom@linux.intel.com></a>
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 16 ++++-----
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  6 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c       |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |  8 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  8 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c     |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c  |  8 ++---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  2 +-
  drivers/gpu/drm/drm_exec.c                    | 35
++++++++++++++-----
  drivers/gpu/drm/drm_gpuvm.c                   |  8 ++---
  drivers/gpu/drm/imagination/pvr_job.c         |  2 +-
  drivers/gpu/drm/msm/msm_gem_submit.c          |  2 +-
  drivers/gpu/drm/nouveau/nouveau_uvmm.c        |  2 +-
  drivers/gpu/drm/tests/drm_exec_test.c         | 12 +++----
  drivers/gpu/drm/xe/xe_gt_pagefault.c          |  4 +--
  drivers/gpu/drm/xe/xe_vm.c                    | 10 +++---
  include/drm/drm_exec.h                        | 23 +++++++++---
  17 files changed, 92 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e4d4e55c08ad..4a08a692aa1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1152,12 +1152,12 @@ static int reserve_bo_and_vm(struct kgd_mem
*mem,
      drm_exec_init(&ctx->exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
      drm_exec_until_all_locked(&ctx->exec) {
              ret = amdgpu_vm_lock_pd(vm, &ctx->exec, 2);
-               drm_exec_retry_on_contention(&ctx->exec);
+               ret = drm_exec_retry_on_contention(&ctx->exec,
ret);
              if (unlikely(ret))
                      goto error;
  
              ret = drm_exec_prepare_obj(&ctx->exec, &bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base, 1);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&ctx->exec);
+               ret = drm_exec_retry_on_contention(&ctx->exec,
ret);
              if (unlikely(ret))
                      goto error;
      }
@@ -1199,14 +1199,14 @@ static int reserve_bo_and_cond_vms(struct
kgd_mem *mem,
  
                      ret = amdgpu_vm_lock_pd(entry->bo_va-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">base.vm,
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                                               &ctx->exec, 2);
-                       drm_exec_retry_on_contention(&ctx->exec);
+                       ret = drm_exec_retry_on_contention(&ctx-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">exec, ret);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                       if (unlikely(ret))
                              goto error;
                      ++ctx->n_vms;
              }
  
              ret = drm_exec_prepare_obj(&ctx->exec, &bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base, 1);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&ctx->exec);
+               ret = drm_exec_retry_on_contention(&ctx->exec,
ret);
              if (unlikely(ret))
                      goto error;
      }
@@ -2619,7 +2619,7 @@ static int validate_invalid_user_pages(struct
amdkfd_process_info *process_info)
              list_for_each_entry(peer_vm, &process_info-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">vm_list_head,
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                                   vm_list_node) {
                      ret = amdgpu_vm_lock_pd(peer_vm, &exec,
2);
-                       drm_exec_retry_on_contention(&exec);
+                       ret = drm_exec_retry_on_contention(&exec,
ret);
                      if (unlikely(ret))
                              goto unreserve_out;
              }
@@ -2631,7 +2631,7 @@ static int validate_invalid_user_pages(struct
amdkfd_process_info *process_info)
  
                      gobj = &mem->bo->tbo.base;
                      ret = drm_exec_prepare_obj(&exec, gobj,
1);
-                       drm_exec_retry_on_contention(&exec);
+                       ret = drm_exec_retry_on_contention(&exec,
ret);
                      if (unlikely(ret))
                              goto unreserve_out;
              }
@@ -2875,7 +2875,7 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct
dma_fence __rcu *
              list_for_each_entry(peer_vm, &process_info-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">vm_list_head,
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                                   vm_list_node) {
                      ret = amdgpu_vm_lock_pd(peer_vm, &exec,
2);
-                       drm_exec_retry_on_contention(&exec);
+                       ret = drm_exec_retry_on_contention(&exec,
ret);
                      if (unlikely(ret)) {
                              pr_err("Locking VM PD failed, ret:
%d\n", ret);
                              goto ttm_reserve_fail;
@@ -2891,7 +2891,7 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct
dma_fence __rcu *
  
                      gobj = &mem->bo->tbo.base;
                      ret = drm_exec_prepare_obj(&exec, gobj,
1);
-                       drm_exec_retry_on_contention(&exec);
+                       ret = drm_exec_retry_on_contention(&exec,
ret);
                      if (unlikely(ret)) {
                              pr_err("drm_exec_prepare_obj
failed, ret: %d\n", ret);
                              goto ttm_reserve_fail;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..299e46a6d934 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -897,7 +897,7 @@ static int amdgpu_cs_parser_bos(struct
amdgpu_cs_parser *p,
  
      drm_exec_until_all_locked(&p->exec) {
              r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec, 1 + p-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">gang_size);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&p->exec);
+               r = drm_exec_retry_on_contention(&p->exec, r);
              if (unlikely(r))
                      goto out_free_user_pages;
  
@@ -905,7 +905,7 @@ static int amdgpu_cs_parser_bos(struct
amdgpu_cs_parser *p,
                      /* One fence for TTM and one for each CS
job */
                      r = drm_exec_prepare_obj(&p->exec, &e->bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base,
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                                                1 + p-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">gang_size);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                        drm_exec_retry_on_contention(&p->exec);
+                       r = drm_exec_retry_on_contention(&p->exec,
r);
                      if (unlikely(r))
                              goto out_free_user_pages;
  
@@ -915,7 +915,7 @@ static int amdgpu_cs_parser_bos(struct
amdgpu_cs_parser *p,
              if (p->uf_bo) {
                      r = drm_exec_prepare_obj(&p->exec, &p-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">uf_bo->tbo.base,
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                                                1 + p-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">gang_size);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                        drm_exec_retry_on_contention(&p->exec);
+                       r = drm_exec_retry_on_contention(&p->exec,
r);
                      if (unlikely(r))
                              goto out_free_user_pages;
              }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index cfdf558b48b6..8b2b86c7a6c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -74,7 +74,7 @@ int amdgpu_map_static_csa(struct amdgpu_device
*adev, struct amdgpu_vm *vm,
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
              if (likely(!r))
                      r = drm_exec_lock_obj(&exec, &bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r)) {
                      DRM_ERROR("failed to reserve CSA,PD BOs:
err=%d\n", r);
                      goto error;
@@ -114,7 +114,7 @@ int amdgpu_unmap_static_csa(struct
amdgpu_device *adev, struct amdgpu_vm *vm,
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
              if (likely(!r))
                      r = drm_exec_lock_obj(&exec, &bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r)) {
                      DRM_ERROR("failed to reserve CSA,PD BOs:
err=%d\n", r);
                      goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 67c234bcf89f..17e16c971e21 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -239,12 +239,12 @@ static void amdgpu_gem_object_close(struct
drm_gem_object *obj,
      drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
      drm_exec_until_all_locked(&exec) {
              r = drm_exec_prepare_obj(&exec, &bo->tbo.base, 1);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto out_unlock;
  
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto out_unlock;
      }
@@ -776,13 +776,13 @@ int amdgpu_gem_va_ioctl(struct drm_device
*dev, void *data,
      drm_exec_until_all_locked(&exec) {
              if (gobj) {
                      r = drm_exec_lock_obj(&exec, gobj);
-                       drm_exec_retry_on_contention(&exec);
+                       r = drm_exec_retry_on_contention(&exec,
r);
                      if (unlikely(r))
                              goto error;
              }
  
              r = amdgpu_vm_lock_pd(&fpriv->vm, &exec, 2);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error;
      }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 5ca5c47ab54e..1b1a5147606e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1221,12 +1221,12 @@ int amdgpu_mes_ctx_map_meta_data(struct
amdgpu_device *adev,
      drm_exec_until_all_locked(&exec) {
              r = drm_exec_lock_obj(&exec,
                                    &ctx_data->meta_data_obj-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error_fini_exec;
  
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error_fini_exec;
      }
@@ -1292,12 +1292,12 @@ int amdgpu_mes_ctx_unmap_meta_data(struct
amdgpu_device *adev,
      drm_exec_until_all_locked(&exec) {
              r = drm_exec_lock_obj(&exec,
                                    &ctx_data->meta_data_obj-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto out_unlock;
  
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto out_unlock;
      }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index e22cb2b5cd92..72b8213e352c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -77,7 +77,7 @@ int amdgpu_seq64_map(struct amdgpu_device *adev,
struct amdgpu_vm *vm,
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
              if (likely(!r))
                      r = drm_exec_lock_obj(&exec, &bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error;
      }
@@ -138,7 +138,7 @@ void amdgpu_seq64_unmap(struct amdgpu_device
*adev, struct amdgpu_fpriv *fpriv)
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
              if (likely(!r))
                      r = drm_exec_lock_obj(&exec, &bo-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">tbo.base);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error;
      }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
index e01c1c8e64c4..63392ce43945 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
@@ -89,12 +89,12 @@ static int map_ring_data(struct amdgpu_device
*adev, struct amdgpu_vm *vm,
      drm_exec_init(&exec, 0, 0);
      drm_exec_until_all_locked(&exec) {
              r = drm_exec_lock_obj(&exec, &bo->tbo.base);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error_fini_exec;
  
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto error_fini_exec;
      }
@@ -152,12 +152,12 @@ static int unmap_ring_data(struct
amdgpu_device *adev, struct amdgpu_vm *vm,
      drm_exec_init(&exec, 0, 0);
      drm_exec_until_all_locked(&exec) {
              r = drm_exec_lock_obj(&exec, &bo->tbo.base);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto out_unlock;
  
              r = amdgpu_vm_lock_pd(vm, &exec, 0);
-               drm_exec_retry_on_contention(&exec);
+               r = drm_exec_retry_on_contention(&exec, r);
              if (unlikely(r))
                      goto out_unlock;
      }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 386875e6eb96..a3aa7fd22f6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1499,7 +1499,7 @@ static int svm_range_reserve_bos(struct
svm_validate_context *ctx, bool intr)
                      vm = drm_priv_to_vm(pdd->drm_priv);
  
                      r = amdgpu_vm_lock_pd(vm, &ctx->exec, 2);
-                       drm_exec_retry_on_contention(&ctx->exec);
+                       r = drm_exec_retry_on_contention(&ctx-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">exec, r);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                       if (unlikely(r)) {
                              pr_debug("failed %d to reserve
bo\n", r);
                              goto unreserve_out;
diff --git a/drivers/gpu/drm/drm_exec.c
b/drivers/gpu/drm/drm_exec.c
index 2da094bdf8a4..3770a5d30213 100644
--- a/drivers/gpu/drm/drm_exec.c
+++ b/drivers/gpu/drm/drm_exec.c
@@ -28,12 +28,12 @@
   *  drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
   *  drm_exec_until_all_locked(&exec) {
   *          ret = drm_exec_prepare_obj(&exec, boA, 1);
- *             drm_exec_retry_on_contention(&exec);
+ *             ret = drm_exec_retry_on_contention(&exec, ret);
   *          if (ret)
   *                  goto error;
   *
   *          ret = drm_exec_prepare_obj(&exec, boB, 1);
- *             drm_exec_retry_on_contention(&exec);
+ *             ret = drm_exec_retry_on_contention(&exec, ret);
   *          if (ret)
   *                  goto error;
   *  }
@@ -48,7 +48,8 @@
   */
  
  /* Dummy value used to initially enter the retry loop */
-#define DRM_EXEC_DUMMY ((void *)~0)
+#define DRM_EXEC_DUMMY ERR_PTR(-ESTALE)
+#define DRM_EXEC_CONTENDED ERR_PTR(-EDEADLK)
  
  /* Unlock all objects and drop references */
  static void drm_exec_unlock_all(struct drm_exec *exec)
@@ -131,8 +132,7 @@ bool drm_exec_cleanup(struct drm_exec *exec)
              return true;
      }
  
-       drm_exec_unlock_all(exec);
-       exec->num_objects = 0;
+       exec->contended = NULL;
      return true;
  }
  EXPORT_SYMBOL(drm_exec_cleanup);
@@ -194,6 +194,27 @@ static int drm_exec_lock_contended(struct
drm_exec *exec)
      return ret;
  }
  
+/**
+ * drm_exec_handle_contended() - Perform cleanup before a ww
transaction restart
+ * @exec: Pointer to the drm_exec object.
+ *
+ * Unlocks all held resvs and re-locks the contended object.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int drm_exec_handle_contended(struct drm_exec *exec)
+{
+       int ret;
+
+       drm_exec_unlock_all(exec);
+       exec->num_objects = 0;
+       ret = drm_exec_lock_contended(exec);
+       exec->contended = DRM_EXEC_CONTENDED;
+
+       return ret;
+}
+EXPORT_SYMBOL(drm_exec_handle_contended);
+
  /**
   * drm_exec_lock_obj - lock a GEM object for use
   * @exec: the drm_exec object with the state
@@ -209,10 +230,6 @@ int drm_exec_lock_obj(struct drm_exec *exec,
struct drm_gem_object *obj)
  {
      int ret;
  
-       ret = drm_exec_lock_contended(exec);
-       if (unlikely(ret))
-               return ret;
-
      if (exec->prelocked == obj) {
              drm_gem_object_put(exec->prelocked);
              exec->prelocked = NULL;
diff --git a/drivers/gpu/drm/drm_gpuvm.c
b/drivers/gpu/drm/drm_gpuvm.c
index f9eb56f24bef..0923d6ae18e2 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -1254,18 +1254,18 @@ drm_gpuvm_exec_lock(struct drm_gpuvm_exec
*vm_exec)
  
      drm_exec_until_all_locked(exec) {
              ret = drm_gpuvm_prepare_vm(gpuvm, exec,
num_fences);
-               drm_exec_retry_on_contention(exec);
+               ret = drm_exec_retry_on_contention(exec, ret);
              if (ret)
                      goto err;
  
              ret = drm_gpuvm_prepare_objects(gpuvm, exec,
num_fences);
-               drm_exec_retry_on_contention(exec);
+               ret = drm_exec_retry_on_contention(exec, ret);
              if (ret)
                      goto err;
  
              if (vm_exec->extra.fn) {
                      ret = vm_exec->extra.fn(vm_exec);
-                       drm_exec_retry_on_contention(exec);
+                       ret = drm_exec_retry_on_contention(exec,
ret);
                      if (ret)
                              goto err;
              }
@@ -1346,7 +1346,7 @@ drm_gpuvm_exec_lock_range(struct
drm_gpuvm_exec *vm_exec,
      drm_exec_until_all_locked(exec) {
              ret = drm_gpuvm_prepare_range(gpuvm, exec, addr,
range,
                                            vm_exec-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">num_fences);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(exec);
+               ret = drm_exec_retry_on_contention(exec, ret);
              if (ret)
                      goto err;
      }
diff --git a/drivers/gpu/drm/imagination/pvr_job.c
b/drivers/gpu/drm/imagination/pvr_job.c
index 78c2f3c6dce0..6e0ce6c4576c 100644
--- a/drivers/gpu/drm/imagination/pvr_job.c
+++ b/drivers/gpu/drm/imagination/pvr_job.c
@@ -574,7 +574,7 @@ prepare_job_resvs_for_each(struct drm_exec
*exec, struct pvr_job_data *job_data,
      drm_exec_until_all_locked(exec) {
              int err = jobs_lock_all_objs(exec, job_data,
job_count);
  
-               drm_exec_retry_on_contention(exec);
+               err = drm_exec_retry_on_contention(exec, err);
              if (err)
                      return err;
      }
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c
b/drivers/gpu/drm/msm/msm_gem_submit.c
index fba78193127d..01992b43ea4b 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -259,7 +259,7 @@ static int submit_lock_objects(struct
msm_gem_submit *submit)
              for (unsigned i = 0; i < submit->nr_bos; i++) {
                      struct drm_gem_object *obj = submit-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">bos[i].obj;
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">                       ret = drm_exec_prepare_obj(&submit->exec,
obj, 1);
-                       drm_exec_retry_on_contention(&submit-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">exec);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">+                        ret =
drm_exec_retry_on_contention(&submit->exec, ret);
                      if (ret)
                              goto error;
              }
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index ee02cd833c5e..0c871634fdfb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -1350,7 +1350,7 @@ nouveau_uvmm_bind_job_submit(struct
nouveau_job *job,
      drm_exec_init(exec, vme->flags, 0);
      drm_exec_until_all_locked(exec) {
              ret = bind_lock_validate(job, exec, vme-
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">num_fences);
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">-                drm_exec_retry_on_contention(exec);
+               ret = drm_exec_retry_on_contention(exec, ret);
              if (ret) {
                      op = list_last_op(&bind_job->ops);
                      goto unwind;
diff --git a/drivers/gpu/drm/tests/drm_exec_test.c
b/drivers/gpu/drm/tests/drm_exec_test.c
index 81f928a429ba..28558fdb08df 100644
--- a/drivers/gpu/drm/tests/drm_exec_test.c
+++ b/drivers/gpu/drm/tests/drm_exec_test.c
@@ -63,7 +63,7 @@ static void test_lock(struct kunit *test)
      drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
      drm_exec_until_all_locked(&exec) {
              ret = drm_exec_lock_obj(&exec, &gobj);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              KUNIT_EXPECT_EQ(test, ret, 0);
              if (ret)
                      break;
@@ -83,14 +83,14 @@ static void test_lock_unlock(struct kunit
*test)
      drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
      drm_exec_until_all_locked(&exec) {
              ret = drm_exec_lock_obj(&exec, &gobj);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              KUNIT_EXPECT_EQ(test, ret, 0);
              if (ret)
                      break;
  
              drm_exec_unlock_obj(&exec, &gobj);
              ret = drm_exec_lock_obj(&exec, &gobj);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              KUNIT_EXPECT_EQ(test, ret, 0);
              if (ret)
                      break;
@@ -110,13 +110,13 @@ static void test_duplicates(struct kunit
*test)
      drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
      drm_exec_until_all_locked(&exec) {
              ret = drm_exec_lock_obj(&exec, &gobj);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              KUNIT_EXPECT_EQ(test, ret, 0);
              if (ret)
                      break;
  
              ret = drm_exec_lock_obj(&exec, &gobj);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              KUNIT_EXPECT_EQ(test, ret, 0);
              if (ret)
                      break;
@@ -137,7 +137,7 @@ static void test_prepare(struct kunit *test)
      drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
      drm_exec_until_all_locked(&exec) {
              ret = drm_exec_prepare_obj(&exec, &gobj, 1);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              KUNIT_EXPECT_EQ(test, ret, 0);
              if (ret)
                      break;
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c
b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 040dd142c49c..20ec1ab1b52d 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -200,7 +200,7 @@ static int handle_pagefault(struct xe_gt *gt,
struct pagefault *pf)
      drm_exec_init(&exec, 0, 0);
      drm_exec_until_all_locked(&exec) {
              ret = xe_pf_begin(&exec, vma, atomic, tile->id);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              if (ret)
                      goto unlock_dma_resv;
  
@@ -543,7 +543,7 @@ static int handle_acc(struct xe_gt *gt, struct
acc *acc)
      drm_exec_init(&exec, 0, 0);
      drm_exec_until_all_locked(&exec) {
              ret = xe_pf_begin(&exec, vma, true, tile->id);
-               drm_exec_retry_on_contention(&exec);
+               ret = drm_exec_retry_on_contention(&exec, ret);
              if (ret)
                      break;
      }
diff --git a/drivers/gpu/drm/xe/xe_vm.c
b/drivers/gpu/drm/xe/xe_vm.c
index e2ec148c9c33..335524e803e7 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -501,7 +501,7 @@ static void preempt_rebind_work_func(struct
work_struct *w)
              bool done = false;
  
              err = xe_preempt_work_begin(&exec, vm, &done);
-               drm_exec_retry_on_contention(&exec);
+               err = drm_exec_retry_on_contention(&exec, err);
              if (err || done) {
                      drm_exec_fini(&exec);
                      if (err &&
xe_vm_validate_should_retry(&exec, err, &end))
@@ -1052,7 +1052,7 @@ static void xe_vma_destroy_unlocked(struct
xe_vma *vma)
      drm_exec_init(&exec, 0, 0);
      drm_exec_until_all_locked(&exec) {
              err = xe_vm_lock_vma(&exec, vma);
-               drm_exec_retry_on_contention(&exec);
+               err = drm_exec_retry_on_contention(&exec, err);
              if (XE_WARN_ON(err))
                      break;
      }
@@ -2148,11 +2148,11 @@ static struct xe_vma *new_vma(struct xe_vm
*vm, struct drm_gpuva_op_map *op,
                      err = 0;
                      if (!bo->vm) {
                              err = drm_exec_lock_obj(&exec,
xe_vm_obj(vm));
-
                                drm_exec_retry_on_contention(&exec);
+                               err =
drm_exec_retry_on_contention(&exec, err);
                      }
                      if (!err) {
                              err = drm_exec_lock_obj(&exec,
&bo->ttm.base);
-
                                drm_exec_retry_on_contention(&exec);
+                               err =
drm_exec_retry_on_contention(&exec, err);
                      }
                      if (err) {
                              drm_exec_fini(&exec);
@@ -2884,7 +2884,7 @@ static int vm_bind_ioctl_ops_execute(struct
xe_vm *vm,
                    DRM_EXEC_IGNORE_DUPLICATES, 0);
      drm_exec_until_all_locked(&exec) {
              err = vm_bind_ioctl_ops_lock_and_prep(&exec, vm,
vops);
-               drm_exec_retry_on_contention(&exec);
+               err = drm_exec_retry_on_contention(&exec, err);
              if (err)
                      goto unlock;
  
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
index aa786b828a0a..fafb40d96e38 100644
--- a/include/drm/drm_exec.h
+++ b/include/drm/drm_exec.h
@@ -51,6 +51,8 @@ struct drm_exec {
      struct drm_gem_object *prelocked;
  };
  
+int drm_exec_handle_contended(struct drm_exec *exec);
+
  /**
   * drm_exec_obj() - Return the object for a give drm_exec index
   * @exec: Pointer to the drm_exec context
@@ -113,15 +115,26 @@ __PASTE(__drm_exec_,
__LINE__):                                              \
  /**
   * drm_exec_retry_on_contention - restart the loop to grap all
locks
   * @exec: drm_exec object
+ * @_ret: The current error status
   *
   * Control flow helper to continue when a contention was detected
and we need to
   * clean up and re-start the loop to prepare all GEM objects.
+ *
+ * Return: If no loop restart occurred: The error status.
   */
-#define drm_exec_retry_on_contention(exec)                     \
-       do {                                                    \
-               if (unlikely(drm_exec_is_contended(exec)))      \
-                       goto *__drm_exec_retry_ptr;             \
-       } while (0)
+#define drm_exec_retry_on_contention(exec,
_ret)                   \
+       ({                                                      
        \
+               struct drm_exec *__exec =
(exec);                 \
+               int __ret =
(_ret);                                 \
+                                                               
        \
+               if (unlikely(drm_exec_is_contended(__exec)))
{               \
+                       WARN_ON(__ret != -
EDEADLK);                       \
+                       __ret =
drm_exec_handle_contended(__exec);      \
+                       if
(!__ret)                                        \
+                               goto
*__drm_exec_retry_ptr;          \
+               }                                               
        \
+               __ret;                                          
        \
+       })
  
  /**
   * drm_exec_is_contended - check for contention
</pre>
        </blockquote>
        <pre class="moz-quote-pre" wrap="">
</pre>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
</pre>
    </blockquote>
    <br>
  </body>
</html>