<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Malgun Gothic";
panose-1:2 11 5 3 2 0 0 2 0 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"\@Malgun Gothic";
panose-1:2 11 5 3 2 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div>
<div class="WordSection1">
<p class="MsoNormal">I wouldn’t know if it was another bug elsewhere.<o:p></o:p></p>
<p class="MsoNormal">From what I saw, the leak came from the !p->xnack_enabled early return in svm_range_restore_pages.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If it helps, I saw this on Aldebaran, where a shader deliberately performs a bad memory access in a debugged, ptraced child process.<o:p></o:p></p>
<p class="MsoNormal">The VM fault message shows up in dmesg, and without this fix each run leaves one more stale KFD process behind.<o:p></o:p></p>
<p class="MsoNormal">At this point I’m only assuming that the IV retry bit is set; I never confirmed it.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Jon<o:p></o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Yang, Philip &lt;Philip.Yang@amd.com&gt; <br>
<b>Sent:</b> Wednesday, September 1, 2021 12:30 PM<br>
<b>To:</b> Kim, Jonathan &lt;Jonathan.Kim@amd.com&gt;; Yang, Philip &lt;Philip.Yang@amd.com&gt;; Sierra Guiza, Alejandro (Alex) &lt;Alex.Sierra@amd.com&gt;; amd-gfx@lists.freedesktop.org<br>
<b>Subject:</b> Re: [PATCH] drm/amdkfd: drop process ref count when xnack disable<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 2021-09-01 9:45 a.m., Kim, Jonathan wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">We were seeing process leaks on a couple of machines running certain tests that triggered vm faults on purpose.<o:p></o:p></p>
<p class="MsoNormal">I think svm_range_restore_pages gets called unconditionally on vm fault handling (unless the retry interrupt payload bit is supposed to be clear with xnack off)?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</blockquote>
<p>Yes. With xnack off, sh_mem_config retry should be off and the retry bit should be clear in the fault interrupt vector, so we should not try to recover the VM fault; we should just report it back to the application and evict the user queues. Maybe another bug causes
p->xnack_enabled and sh_mem_config retry to mismatch under specific conditions?<o:p></o:p></p>
<p>Regards,<o:p></o:p></p>
<p>Philip<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">Either way, this patch prevents the process leaks we are seeing and is also:<o:p></o:p></p>
<p class="MsoNormal">Reviewed-by: Jonathan Kim <a href="mailto:jonathan.kim@amd.com">&lt;jonathan.kim@amd.com&gt;</a><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Jon<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> amd-gfx <a href="mailto:amd-gfx-bounces@lists.freedesktop.org">&lt;amd-gfx-bounces@lists.freedesktop.org&gt;</a> <b>On Behalf Of </b>philip yang<br>
<b>Sent:</b> Wednesday, September 1, 2021 7:30 AM<br>
<b>To:</b> Sierra Guiza, Alejandro (Alex) <a href="mailto:Alex.Sierra@amd.com">&lt;Alex.Sierra@amd.com&gt;</a>;
<a href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Re: [PATCH] drm/amdkfd: drop process ref count when xnack disable<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<p> <o:p></o:p></p>
<div>
<p class="MsoNormal">On 2021-08-31 10:41 p.m., Alex Sierra wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>During svm restore pages interrupt handler, kfd_process ref count was<o:p></o:p></pre>
<pre>never dropped when xnack was disabled. Therefore, the object was never<o:p></o:p></pre>
<pre>released.<o:p></o:p></pre>
</blockquote>
<p>Good catch, but if xnack is off, we should never reach this path to recover the fault.<o:p></o:p></p>
<p>The fix looks good to me.<o:p></o:p></p>
<p>Reviewed-by: Philip Yang <a href="mailto:philip.yang@amd.com">&lt;philip.yang@amd.com&gt;</a><o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre> <o:p></o:p></pre>
<pre>Signed-off-by: Alex Sierra <a href="mailto:alex.sierra@amd.com">&lt;alex.sierra@amd.com&gt;</a><o:p></o:p></pre>
<pre>---<o:p></o:p></pre>
<pre> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 ++-<o:p></o:p></pre>
<pre> 1 file changed, 2 insertions(+), 1 deletion(-)<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c<o:p></o:p></pre>
<pre>index 8f9b5b53dab5..110c46cd7fac 100644<o:p></o:p></pre>
<pre>--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c<o:p></o:p></pre>
<pre>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c<o:p></o:p></pre>
<pre>@@ -2484,7 +2484,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,<o:p></o:p></pre>
<pre> 	}<o:p></o:p></pre>
<pre> 	if (!p->xnack_enabled) {<o:p></o:p></pre>
<pre> 		pr_debug("XNACK not enabled for pasid 0x%x\n", pasid);<o:p></o:p></pre>
<pre>-		return -EFAULT;<o:p></o:p></pre>
<pre>+		r = -EFAULT;<o:p></o:p></pre>
<pre>+		goto out;<o:p></o:p></pre>
<pre> 	}<o:p></o:p></pre>
<pre> 	svms = &amp;p->svms;<o:p></o:p></pre>
<pre> <o:p></o:p></pre>
</blockquote>
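<p>For illustration only, here is a minimal user-space sketch of the pattern the hunk above applies: an early return that skips the reference drop is replaced by a jump to a shared exit label. The names fake_process, fake_lookup, fake_unref, restore_leaky and restore_fixed are hypothetical stand-ins and not kernel APIs. The sketch assumes, as the commit message implies, that the real function takes a process reference before the xnack check and drops it at its out label; that part is not shown in the quoted hunk.</p>

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted process object. */
struct fake_process {
	int refcount;
	bool xnack_enabled;
};

/* Hypothetical lookup: takes a reference that the caller must drop. */
static struct fake_process *fake_lookup(struct fake_process *p)
{
	p->refcount++;
	return p;
}

/* Hypothetical counterpart that drops the reference again. */
static void fake_unref(struct fake_process *p)
{
	p->refcount--;
}

/* Before the fix: the early return skips the unref, so one reference leaks. */
static int restore_leaky(struct fake_process *proc)
{
	struct fake_process *p = fake_lookup(proc);

	if (!p->xnack_enabled)
		return -EFAULT;

	/* ... fault recovery would go here ... */
	fake_unref(p);
	return 0;
}

/* After the fix: every path leaves through "out", so the count stays balanced. */
static int restore_fixed(struct fake_process *proc)
{
	struct fake_process *p = fake_lookup(proc);
	int r = 0;

	if (!p->xnack_enabled) {
		r = -EFAULT;
		goto out;
	}

	/* ... fault recovery would go here ... */
out:
	fake_unref(p);
	return r;
}
```

<p>Calling restore_leaky on an object with xnack_enabled cleared leaves its count one higher than before each call, which would match the one stale process per run described earlier in the thread. restore_fixed returns the same error code but leaves the count unchanged.</p>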
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>