<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">That won't work correctly. The TTM BO
is unreferenced in a couple of more places which we don't have
control over.<br>
<br>
To make it even worse we actually can't take the reservation lock
during GPU reset because the reservation object might already be
destroyed when we remove the BO from the list.<br>
<br>
I will take a look at this myself today to find a solution which
should work.<br>
<br>
Regards,<br>
Christian.<br>
<br>
Am 11.09.2018 um 07:41 schrieb zhoucm1:<br>
</div>
<blockquote type="cite"
cite="mid:83f7a45f-1aee-a2a8-bc82-f3433157c6cb@amd.com">
<br>
<br>
On 2018年09月11日 11:37, zhoucm1 wrote:
<br>
<blockquote type="cite">
<br>
<br>
On 2018年09月11日 11:32, Deng, Emily wrote:
<br>
<blockquote type="cite">
<blockquote type="cite">-----Original Message-----
<br>
From: amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a>
On Behalf Of
<br>
zhoucm1
<br>
Sent: Tuesday, September 11, 2018 11:28 AM
<br>
To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Zhou,
David(ChunMing)
<br>
<a class="moz-txt-link-rfc2396E" href="mailto:David1.Zhou@amd.com"><David1.Zhou@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<br>
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
<br>
<br>
<br>
<br>
On 2018年09月11日 11:23, Deng, Emily wrote:
<br>
<blockquote type="cite">
<blockquote type="cite">-----Original Message-----
<br>
From: Zhou, David(ChunMing)
<br>
Sent: Tuesday, September 11, 2018 11:03 AM
<br>
To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<br>
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock
issue.
<br>
<br>
<br>
<br>
On 2018年09月11日 10:51, Emily Deng wrote:
<br>
<blockquote type="cite">It will ramdomly have the dead
lock issue when test TDR:
<br>
1. amdgpu_device_handle_vram_lost gets the lock
shadow_list_lock 2.
<br>
amdgpu_bo_create locked the bo's resv lock 3.
<br>
amdgpu_bo_create_shadow is waiting for the
shadow_list_lock 4.
<br>
amdgpu_device_recover_vram_from_shadow is waiting for
the bo's resv
<br>
lock.
<br>
<br>
v2:
<br>
Make a local copy of the list
<br>
<br>
Signed-off-by: Emily Deng <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21
<br>
</blockquote>
++++++++++++++++++++-
<br>
<blockquote type="cite"> 1 file changed, 20
insertions(+), 1 deletion(-)
<br>
<br>
diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
index 2a21267..8c81404 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
<br>
@@ -3105,6 +3105,9 @@ static int
<br>
</blockquote>
amdgpu_device_handle_vram_lost(struct amdgpu_device
*adev)
<br>
<blockquote type="cite"> long r = 1;
<br>
int i = 0;
<br>
long tmo;
<br>
+ struct list_head local_shadow_list;
<br>
+
<br>
+ INIT_LIST_HEAD(&local_shadow_list);
<br>
<br>
if (amdgpu_sriov_runtime(adev))
<br>
tmo = msecs_to_jiffies(8000);
<br>
@@ -3112,8 +3115,19 @@ static int
<br>
</blockquote>
amdgpu_device_handle_vram_lost(struct amdgpu_device
*adev)
<br>
<blockquote type="cite"> tmo =
msecs_to_jiffies(100);
<br>
<br>
DRM_INFO("recover vram bo from shadow
start\n");
<br>
+
<br>
+ mutex_lock(&adev->shadow_list_lock);
<br>
+ list_splice_init(&adev->shadow_list,
&local_shadow_list);
<br>
+ mutex_unlock(&adev->shadow_list_lock);
<br>
+
<br>
+
<br>
mutex_lock(&adev->shadow_list_lock);
<br>
</blockquote>
local_shadow_list is a local variable, I think it
doesn't need lock
<br>
at all, no one change it. Otherwise looks good to me.
<br>
</blockquote>
The bo->shadow_list which now is in local_shadow_list
maybe destroy in
<br>
case that it already in amdgpu_bo_destroy, then it will
change
<br>
</blockquote>
local_shadow_list, so need lock the shadow_list_lock.
<br>
Ah, sorry for noise, I forget you don't reference these BOs.
<br>
</blockquote>
Yes, I don't reference these Bos, as I found even reference
these Bos, it still couldn't avoid the case that another
process is already
<br>
in amdgpu_bo_destroy.
<br>
</blockquote>
??? that shouldn't happen, the reference is belonged to list.
But back to here, we don't need reference them.
<br>
And since no shadow BO is added to local after splice, we'd
better to use list_next_entry to iterate the local shadow list
instead of list_for_each_entry_safe.
<br>
<br>
Thanks,
<br>
David Zhou
<br>
<blockquote type="cite">
<blockquote type="cite">Thanks,
<br>
David Zhou
<br>
<blockquote type="cite">Best wishes
<br>
Emily Deng
<br>
<blockquote type="cite">Thanks,
<br>
David Zhou
<br>
<blockquote type="cite">-
list_for_each_entry_safe(bo, tmp,
&adev->shadow_list, shadow_list) {
<br>
+ list_for_each_entry_safe(bo, tmp,
&local_shadow_list, shadow_list) {
<br>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
because shadow list doesn't take bo reference, we should give a
amdgpu_bo_ref(bo) with attached patch before unlock.
<br>
You can have a try.
<br>
<br>
Thanks,
<br>
David Zhou
<br>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">+
mutex_unlock(&adev->shadow_list_lock);
<br>
+
<br>
+ if (!bo)
<br>
+ continue;
<br>
+
<br>
next = NULL;
<br>
amdgpu_device_recover_vram_from_shadow(adev, ring, bo,
<br>
</blockquote>
&next);
<br>
<blockquote type="cite"> if (fence) {
<br>
@@ -3132,9 +3146,14 @@ static int
<br>
amdgpu_device_handle_vram_lost(struct amdgpu_device
*adev)
<br>
<br>
dma_fence_put(fence);
<br>
fence = next;
<br>
+ mutex_lock(&adev->shadow_list_lock);
<br>
}
<br>
mutex_unlock(&adev->shadow_list_lock);
<br>
<br>
+ mutex_lock(&adev->shadow_list_lock);
<br>
+ list_splice_init(&local_shadow_list,
&adev->shadow_list);
<br>
+ mutex_unlock(&adev->shadow_list_lock);
<br>
+
<br>
if (fence) {
<br>
r = dma_fence_wait_timeout(fence, false,
tmo);
<br>
if (r == 0)
<br>
</blockquote>
</blockquote>
</blockquote>
_______________________________________________
<br>
amd-gfx mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
<br>
</blockquote>
</blockquote>
<br>
_______________________________________________
<br>
amd-gfx mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
<br>
</blockquote>
<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
</blockquote>
<br>
</body>
</html>