<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">That won't work correctly. The TTM BO
      is unreferenced in a couple of more places which we don't have
      control over.<br>
      <br>
      To make it even worse we actually can't take the reservation lock
      during GPU reset because the reservation object might already be
      destroyed when we remove the BO from the list.<br>
      <br>
      I will take a look at this myself today to find a solution which
      should work.<br>
      <br>
      Regards,<br>
      Christian.<br>
      <br>
      Am 11.09.2018 um 07:41 schrieb zhoucm1:<br>
    </div>
    <blockquote type="cite"
      cite="mid:83f7a45f-1aee-a2a8-bc82-f3433157c6cb@amd.com">
      <br>
      <br>
      On 2018年09月11日 11:37, zhoucm1 wrote:
      <br>
      <blockquote type="cite">
        <br>
        <br>
        On 2018年09月11日 11:32, Deng, Emily wrote:
        <br>
        <blockquote type="cite">
          <blockquote type="cite">-----Original Message-----
            <br>
            From: amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a>
            On Behalf Of
            <br>
            zhoucm1
            <br>
            Sent: Tuesday, September 11, 2018 11:28 AM
            <br>
            To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Zhou,
            David(ChunMing)
            <br>
            <a class="moz-txt-link-rfc2396E" href="mailto:David1.Zhou@amd.com"><David1.Zhou@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
            <br>
            Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
            <br>
            <br>
            <br>
            <br>
            On 2018年09月11日 11:23, Deng, Emily wrote:
            <br>
            <blockquote type="cite">
              <blockquote type="cite">-----Original Message-----
                <br>
                From: Zhou, David(ChunMing)
                <br>
                Sent: Tuesday, September 11, 2018 11:03 AM
                <br>
                To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>;
                <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
                <br>
                Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock
                issue.
                <br>
                <br>
                <br>
                <br>
                On 2018年09月11日 10:51, Emily Deng wrote:
                <br>
                <blockquote type="cite">It will ramdomly have the dead
                  lock issue when test TDR:
                  <br>
                  1. amdgpu_device_handle_vram_lost gets the lock
                  shadow_list_lock 2.
                  <br>
                  amdgpu_bo_create locked the bo's resv lock 3.
                  <br>
                  amdgpu_bo_create_shadow is waiting for the
                  shadow_list_lock 4.
                  <br>
                  amdgpu_device_recover_vram_from_shadow is waiting for
                  the bo's resv
                  <br>
                  lock.
                  <br>
                  <br>
                  v2:
                  <br>
                        Make a local copy of the list
                  <br>
                  <br>
                  Signed-off-by: Emily Deng <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
                  <br>
                  ---
                  <br>
                      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21
                  <br>
                </blockquote>
                ++++++++++++++++++++-
                <br>
                <blockquote type="cite">    1 file changed, 20
                  insertions(+), 1 deletion(-)
                  <br>
                  <br>
                  diff --git
                  a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                  <br>
                  b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                  <br>
                  index 2a21267..8c81404 100644
                  <br>
                  --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                  <br>
                  +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                  <br>
                  @@ -3105,6 +3105,9 @@ static int
                  <br>
                </blockquote>
                amdgpu_device_handle_vram_lost(struct amdgpu_device
                *adev)
                <br>
                <blockquote type="cite">        long r = 1;
                  <br>
                          int i = 0;
                  <br>
                          long tmo;
                  <br>
                  +    struct list_head local_shadow_list;
                  <br>
                  +
                  <br>
                  +    INIT_LIST_HEAD(&local_shadow_list);
                  <br>
                  <br>
                          if (amdgpu_sriov_runtime(adev))
                  <br>
                              tmo = msecs_to_jiffies(8000);
                  <br>
                  @@ -3112,8 +3115,19 @@ static int
                  <br>
                </blockquote>
                amdgpu_device_handle_vram_lost(struct amdgpu_device
                *adev)
                <br>
                <blockquote type="cite">            tmo =
                  msecs_to_jiffies(100);
                  <br>
                  <br>
                          DRM_INFO("recover vram bo from shadow
                  start\n");
                  <br>
                  +
                  <br>
                  +    mutex_lock(&adev->shadow_list_lock);
                  <br>
                  +    list_splice_init(&adev->shadow_list,
                  &local_shadow_list);
                  <br>
                  +    mutex_unlock(&adev->shadow_list_lock);
                  <br>
                  +
                  <br>
                  +
                  <br>
                          mutex_lock(&adev->shadow_list_lock);
                  <br>
                </blockquote>
                local_shadow_list is a local variable, I think it
                doesn't need lock
                <br>
                at all, no one change it. Otherwise looks good to me.
                <br>
              </blockquote>
              The bo->shadow_list which now is in local_shadow_list
              maybe destroy in
              <br>
              case that it already in amdgpu_bo_destroy, then it will
              change
              <br>
            </blockquote>
            local_shadow_list, so need lock the shadow_list_lock.
            <br>
            Ah, sorry for noise, I forget you don't reference these BOs.
            <br>
          </blockquote>
          Yes, I don't reference these Bos, as I found even reference
          these Bos, it still couldn't avoid the case that another
          process is already
          <br>
          in amdgpu_bo_destroy.
          <br>
        </blockquote>
        ??? that shouldn't happen, the reference is belonged to list.
        But back to here, we don't need reference them.
        <br>
        And since no shadow BO is added to local after splice, we'd
        better to use list_next_entry to iterate the local shadow list
        instead of list_for_each_entry_safe.
        <br>
        <br>
        Thanks,
        <br>
        David Zhou
        <br>
        <blockquote type="cite">
          <blockquote type="cite">Thanks,
            <br>
            David Zhou
            <br>
            <blockquote type="cite">Best wishes
              <br>
              Emily Deng
              <br>
              <blockquote type="cite">Thanks,
                <br>
                David Zhou
                <br>
                <blockquote type="cite">-   
                  list_for_each_entry_safe(bo, tmp,
                  &adev->shadow_list, shadow_list) {
                  <br>
                  +    list_for_each_entry_safe(bo, tmp,
                  &local_shadow_list, shadow_list) {
                  <br>
                </blockquote>
              </blockquote>
            </blockquote>
          </blockquote>
        </blockquote>
      </blockquote>
      because shadow list doesn't take bo reference, we should give a
      amdgpu_bo_ref(bo) with attached patch before unlock.
      <br>
      You can have a try.
      <br>
      <br>
      Thanks,
      <br>
      David Zhou
      <br>
      <blockquote type="cite">
        <blockquote type="cite">
          <blockquote type="cite">
            <blockquote type="cite">
              <blockquote type="cite">
                <blockquote type="cite">+
                  mutex_unlock(&adev->shadow_list_lock);
                  <br>
                  +
                  <br>
                  +        if (!bo)
                  <br>
                  +            continue;
                  <br>
                  +
                  <br>
                              next = NULL;
                  <br>
                             
                  amdgpu_device_recover_vram_from_shadow(adev, ring, bo,
                  <br>
                </blockquote>
                &next);
                <br>
                <blockquote type="cite">            if (fence) {
                  <br>
                  @@ -3132,9 +3146,14 @@ static int
                  <br>
                  amdgpu_device_handle_vram_lost(struct amdgpu_device
                  *adev)
                  <br>
                  <br>
                              dma_fence_put(fence);
                  <br>
                              fence = next;
                  <br>
                  +        mutex_lock(&adev->shadow_list_lock);
                  <br>
                          }
                  <br>
                          mutex_unlock(&adev->shadow_list_lock);
                  <br>
                  <br>
                  +    mutex_lock(&adev->shadow_list_lock);
                  <br>
                  +    list_splice_init(&local_shadow_list,
                  &adev->shadow_list);
                  <br>
                  +    mutex_unlock(&adev->shadow_list_lock);
                  <br>
                  +
                  <br>
                          if (fence) {
                  <br>
                              r = dma_fence_wait_timeout(fence, false,
                  tmo);
                  <br>
                              if (r == 0)
                  <br>
                </blockquote>
              </blockquote>
            </blockquote>
            _______________________________________________
            <br>
            amd-gfx mailing list
            <br>
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
            <br>
            <a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
            <br>
          </blockquote>
        </blockquote>
        <br>
        _______________________________________________
        <br>
        amd-gfx mailing list
        <br>
        <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
        <br>
        <a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
        <br>
      </blockquote>
      <br>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>