<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"Microsoft YaHei";
panose-1:2 11 5 3 2 2 4 2 2 4;}
@font-face
{font-family:"\@Microsoft YaHei";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
color:black;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;
color:black;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><b><span style="color:windowtext">From:</span></b><span style="color:windowtext"> Christian König <ckoenig.leichtzumerken@gmail.com>
<br>
<b>Sent:</b> Tuesday, September 11, 2018 2:40 PM<br>
<b>To:</b> Zhou, David(ChunMing) <David1.Zhou@amd.com>; Deng, Emily <Emily.Deng@amd.com>; Zhou, David(ChunMing) <David1.Zhou@amd.com>; amd-gfx@lists.freedesktop.org<br>
<b>Subject:</b> Re: [PATCH] drm/amdgpu: Fix the dead lock issue.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">That won't work correctly. The TTM BO is unreferenced in a couple more places that we have no control over.<br>
<br>
To make matters worse, we can't take the reservation lock during GPU reset, because the reservation object might already be destroyed by the time we remove the BO from the list.<br>
<br>
I will look at this myself today to find a solution that works.<span style="color:windowtext"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:windowtext">Ok, thanks very much.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:windowtext"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:windowtext">Best wishes<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:windowtext">Emily Deng<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="color:windowtext"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:windowtext"><o:p> </o:p></span></p>
<p class="MsoNormal"><br>
<br>
Regards,<br>
Christian.<br>
<br>
On 11.09.2018 at 07:41, zhoucm1 wrote:<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><br>
<br>
On 2018-09-11 11:37, zhoucm1 wrote:
<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><br>
<br>
On 2018-09-11 11:32, Deng, Emily wrote: <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">-----Original Message----- <br>
From: amd-gfx <a href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a> On Behalf Of
<br>
zhoucm1 <br>
Sent: Tuesday, September 11, 2018 11:28 AM <br>
To: Deng, Emily <a href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Zhou, David(ChunMing)
<br>
<a href="mailto:David1.Zhou@amd.com"><David1.Zhou@amd.com></a>; <a href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a> <br>
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. <br>
<br>
<br>
<br>
On 2018-09-11 11:23, Deng, Emily wrote: <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">-----Original Message----- <br>
From: Zhou, David(ChunMing) <br>
Sent: Tuesday, September 11, 2018 11:03 AM <br>
To: Deng, Emily <a href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; <a href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a> <br>
Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue. <br>
<br>
<br>
<br>
On 2018-09-11 10:51, Emily Deng wrote: <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">A deadlock occurs randomly when testing TDR: <br>
1. amdgpu_device_handle_vram_lost takes shadow_list_lock. <br>
2. amdgpu_bo_create takes the BO's resv lock. <br>
3. amdgpu_bo_create_shadow waits for shadow_list_lock. <br>
4. amdgpu_device_recover_vram_from_shadow waits for the BO's resv lock. <br>
<br>
v2: <br>
Make a local copy of the list <br>
<br>
Signed-off-by: Emily Deng <a href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
<br>
--- <br>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">++++++++++++++++++++- <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> 1 file changed, 20 insertions(+), 1 deletion(-) <br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c <br>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c <br>
index 2a21267..8c81404 100644 <br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c <br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c <br>
@@ -3105,6 +3105,9 @@ static int <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> long r = 1; <br>
int i = 0; <br>
long tmo; <br>
+ struct list_head local_shadow_list; <br>
+ <br>
+ INIT_LIST_HEAD(&local_shadow_list); <br>
<br>
if (amdgpu_sriov_runtime(adev)) <br>
tmo = msecs_to_jiffies(8000); <br>
@@ -3112,8 +3115,19 @@ static int <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> tmo = msecs_to_jiffies(100); <br>
<br>
DRM_INFO("recover vram bo from shadow start\n"); <br>
+ <br>
+ mutex_lock(&adev->shadow_list_lock); <br>
+ list_splice_init(&adev->shadow_list, &local_shadow_list); <br>
+ mutex_unlock(&adev->shadow_list_lock); <br>
+ <br>
+ <br>
mutex_lock(&adev->shadow_list_lock); <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">local_shadow_list is a local variable, so I don't think it needs the lock at all; nobody else changes it. Otherwise this looks good to me. <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">The bo->shadow_list entries that are now on local_shadow_list may be destroyed if the BO is already in amdgpu_bo_destroy, which would then modify <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">local_shadow_list, so shadow_list_lock still needs to be taken. <br>
Ah, sorry for the noise, I forgot that you don't reference these BOs. <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">Yes, I don't reference these BOs. I found that even taking a reference could not rule out the case where another process is already in amdgpu_bo_destroy. <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">??? That shouldn't happen; the reference belongs to the list. But back to the point: we don't need to reference them.
<br>
And since no shadow BO is added to the local list after the splice, it would be better to iterate the local shadow list with list_next_entry instead of list_for_each_entry_safe.
<br>
<br>
Thanks, <br>
David Zhou <br>
<br>
<o:p></o:p></p>
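<p class="MsoNormal">A minimal user-space sketch of the "splice to a local list, then iterate without the lock" pattern discussed here. It assumes a hand-rolled circular list and a pthread mutex in place of the kernel's list.h and struct mutex; every demo_* name is hypothetical and is not an amdgpu or kernel API. It also assumes that nothing removes entries from the local list concurrently, which is the very point under debate in this thread.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct demo_node {
    struct demo_node *prev, *next;
};

static inline void demo_list_init(struct demo_node *head)
{
    head->prev = head->next = head;
}

static inline void demo_list_add_tail(struct demo_node *n, struct demo_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Move all entries of 'from' to the front of 'to' and leave 'from' empty. */
static inline void demo_list_splice_init(struct demo_node *from, struct demo_node *to)
{
    struct demo_node *first = from->next;
    struct demo_node *last = from->prev;

    if (first == from)
        return; /* nothing to move */

    first->prev = to;
    last->next = to->next;
    to->next->prev = last;
    to->next = first;
    demo_list_init(from);
}

/* Hypothetical device with a shared list protected by a mutex. */
struct demo_device {
    pthread_mutex_t shadow_list_lock;
    struct demo_node shadow_list;
};

/* Visit every entry with the lock dropped; returns the number visited. */
int demo_recover_all(struct demo_device *dev)
{
    struct demo_node local, *pos;
    int visited = 0;

    demo_list_init(&local);

    /* Hold the lock only long enough to take over the whole list. */
    pthread_mutex_lock(&dev->shadow_list_lock);
    demo_list_splice_init(&dev->shadow_list, &local);
    pthread_mutex_unlock(&dev->shadow_list_lock);

    /* Per-entry work may take other locks here without inverting order. */
    for (pos = local.next; pos != &local; pos = pos->next)
        visited++;

    /* Hand the entries back to the shared list. */
    pthread_mutex_lock(&dev->shadow_list_lock);
    demo_list_splice_init(&local, &dev->shadow_list);
    pthread_mutex_unlock(&dev->shadow_list_lock);

    return visited;
}
```

<p class="MsoNormal">Because the shared list is empty while the local copy is being walked, entries added in the meantime land on the shared list and are simply preserved when the local entries are spliced back.</p>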
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Thanks, <br>
David Zhou <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Best wishes <br>
Emily Deng <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Thanks, <br>
David Zhou <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">- list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
<br>
+ list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) { <o:p></o:p></p>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<p class="MsoNormal">Because the shadow list doesn't hold a BO reference, we should take one with amdgpu_bo_ref(bo) before unlocking, as in the attached patch.
<br>
Please give it a try. <br>
<br>
Thanks, <br>
David Zhou <br>
<br>
<o:p></o:p></p>
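<p class="MsoNormal">The suggestion above, taking a reference while the list lock is still held and only then unlocking, can be sketched in user space roughly as follows. This is an illustration under assumptions, not the attached patch: demo_bo, demo_bo_get and demo_bo_put are hypothetical stand-ins for a reference-counted BO, and a pthread mutex replaces shadow_list_lock. Note that a reference taken this way only helps if the count has not already dropped to zero, which is the case raised elsewhere in this thread.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted buffer object. */
struct demo_bo {
    atomic_int refcount;
    int destroyed; /* set when the last reference is dropped */
};

static inline void demo_bo_get(struct demo_bo *bo)
{
    atomic_fetch_add(&bo->refcount, 1);
}

static inline void demo_bo_put(struct demo_bo *bo)
{
    /* atomic_fetch_sub returns the previous value; 1 means we were last. */
    if (atomic_fetch_sub(&bo->refcount, 1) == 1)
        bo->destroyed = 1;
}

/*
 * Pin 'bo' while 'lock' is held, then work on it with the lock dropped.
 * Returns 1 if the object was still alive during the unlocked section.
 */
int demo_work_on_pinned(pthread_mutex_t *lock, struct demo_bo *bo)
{
    int alive;

    pthread_mutex_lock(lock);
    demo_bo_get(bo); /* the list cannot change while we take the pin */
    pthread_mutex_unlock(lock);

    alive = !bo->destroyed; /* the unlocked work would happen here */

    demo_bo_put(bo); /* drop the pin once done */
    return alive;
}
```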
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">+ mutex_unlock(&adev->shadow_list_lock); <br>
+ <br>
+ if (!bo) <br>
+ continue; <br>
+ <br>
next = NULL; <br>
amdgpu_device_recover_vram_from_shadow(adev, ring, bo, <o:p></o:p></p>
</blockquote>
<p class="MsoNormal">&next); <br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"> if (fence) { <br>
@@ -3132,9 +3146,14 @@ static int <br>
amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) <br>
<br>
dma_fence_put(fence); <br>
fence = next; <br>
+ mutex_lock(&adev->shadow_list_lock); <br>
} <br>
mutex_unlock(&adev->shadow_list_lock); <br>
<br>
+ mutex_lock(&adev->shadow_list_lock); <br>
+ list_splice_init(&local_shadow_list, &adev->shadow_list); <br>
+ mutex_unlock(&adev->shadow_list_lock); <br>
+ <br>
if (fence) { <br>
r = dma_fence_wait_timeout(fence, false, tmo); <br>
if (r == 0) <o:p></o:p></p>
</blockquote>
</blockquote>
</blockquote>
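<p class="MsoNormal">The four-step deadlock in the commit message is a lock-order inversion: one path takes shadow_list_lock and then wants the resv lock, while the other takes the resv lock and then wants shadow_list_lock. The single-threaded sketch below replays that interleaving with two pthread mutexes and uses pthread_mutex_trylock so that it reports the conflict instead of hanging. The mutex and function names are hypothetical, and default (non-recursive) mutexes are assumed.</p>

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for shadow_list_lock and a BO's resv lock. */
static pthread_mutex_t demo_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_resv_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Replays the interleaving from the commit message in one thread.
 * Returns 1 when each path would block on the lock the other one holds.
 */
int demo_abba_would_deadlock(void)
{
    int create_blocked, recover_blocked;

    /* Step 1: the recovery path takes the list lock. */
    pthread_mutex_lock(&demo_list_lock);
    /* Step 2: the create path takes the resv lock. */
    pthread_mutex_lock(&demo_resv_lock);

    /* Step 3: the create path now wants the list lock. */
    create_blocked = (pthread_mutex_trylock(&demo_list_lock) == EBUSY);
    /* Step 4: the recovery path now wants the resv lock. */
    recover_blocked = (pthread_mutex_trylock(&demo_resv_lock) == EBUSY);

    pthread_mutex_unlock(&demo_resv_lock);
    pthread_mutex_unlock(&demo_list_lock);

    return create_blocked && recover_blocked;
}
```

<p class="MsoNormal">With blocking lock calls on two real threads, steps 3 and 4 would each wait forever, which is why the patch tries to avoid holding shadow_list_lock while the resv lock is taken.</p>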
<p class="MsoNormal">_______________________________________________ <br>
amd-gfx mailing list <br>
<a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a> <br>
<a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
<o:p></o:p></p>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>