<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2021-09-07 12:48 p.m., Felix
Kuehling wrote:<br>
</div>
<blockquote type="cite" cite="mid:03c5e276-c478-c33c-9f75-e03a56ef16a6@amd.com">
<pre class="moz-quote-pre" wrap="">On 2021-09-07 at 12:07 p.m., James Zhu wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Separate iommu_resume from kfd_resume, and move it before
other amdgpu ip init/resume.
Fixed Bugzilla: <a class="moz-txt-link-freetext" href="https://bugzilla.kernel.org/show_bug.cgi?id=211277">https://bugzilla.kernel.org/show_bug.cgi?id=211277</a>
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">I think the change is OK. But I don't understand how the IOMMUv2
initialization sequence could affect a crash in DM. The display should
not depend on IOMMUv2 at all. What am I missing?</pre>
</blockquote>
<p>[JZ] It is a weird issue. Disabling the VCN IP block, disabling the
gpu_off feature, or booting with pci=noats each makes the DM crash go
away. The issue also occurs quite randomly: sometimes after a few
suspend/resume (S/R) cycles, sometimes only after a few hundred. The
most I saw was 2422 S/R cycles.</p>
<p>But every time the DM crashes, I see one or two IOMMU errors just
before it:<br>
</p>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; letter-spacing: normal; text-align: start;
text-indent: 0px; text-transform: none; white-space: normal;
word-spacing: 0px;"><b>AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=**** flags=0x0070]</b></div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;">Since we can't stop the
HW/FW/SW right away once an IO page fault is detected, I can't tell
which part tried to access system memory through the IOMMU.</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;"><br>
</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;">But after moving the IOMMU
device init/resume ahead of the other amdgpu IP init/resume, both the
DM crash and the IOMMU page faults are gone.</div>
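<p>A minimal sketch of the ordering idea, written as a standalone C
model rather than driver code. Every name below (<code>model_dev</code>,
<code>model_resume_iommu</code>, <code>model_resume_ip_block</code>,
<code>model_run_sequence</code>, <code>MODEL_EFAULT</code>) is
hypothetical and does not exist in amdgpu. It assumes, purely for
illustration, that an IP block which issues DMA before the IOMMU has
been resumed triggers an IO page fault; that assumption is not a
confirmed root cause of the DM crash. The steps run in order and the
first error aborts the sequence, the same <code>if (r) return r;</code>
pattern the patch uses.</p>
<pre>
```c
/*
 * Hypothetical model of an ordered resume sequence. None of these
 * names are real amdgpu APIs; they only illustrate step ordering.
 */

/* Hypothetical error code for a modelled IO page fault. */
#define MODEL_EFAULT 14

struct model_dev {
	int iommu_ready;	/* set once the modelled IOMMU is resumed */
	int page_faults;	/* number of modelled IO page faults */
};

/* One resume step; returns 0 on success or a negative error code. */
typedef int (*resume_step)(struct model_dev *dev);

/* Modelled IOMMU resume: afterwards, device DMA is translated. */
static int model_resume_iommu(struct model_dev *dev)
{
	dev->iommu_ready = 1;
	return 0;
}

/*
 * Modelled IP block resume that issues DMA to system memory.
 * Illustrative assumption only: DMA before the IOMMU is ready faults.
 */
static int model_resume_ip_block(struct model_dev *dev)
{
	if (!dev->iommu_ready) {
		dev->page_faults++;
		return -MODEL_EFAULT;
	}
	return 0;
}

/* Run the steps in order and stop at the first error. */
static int model_run_sequence(struct model_dev *dev,
			      const resume_step *steps, int nsteps)
{
	int i, r;

	for (i = 0; i != nsteps; i++) {
		r = steps[i](dev);
		if (r)
			return r;
	}
	return 0;
}
```
</pre>
<p>With the IP block step ahead of the IOMMU step, the model stops at
the IP block with one recorded fault; with the IOMMU step first, both
steps return 0. Writing the sequence as an ordered table just makes
the dependency explicit: anything that may DMA through the IOMMU is
listed after the IOMMU resume.</p>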
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;"><br>
</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;">These patches don't directly
explain why the issue is fixed, but the new sequence makes more sense
to me.</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;"><br>
</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;">Could I get your RB
(Reviewed-by) on these patches?<br>
</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;"><br>
</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;">Thanks!</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;">James<br>
</div>
<div style="box-sizing: border-box; font-family: 'Segoe UI',
system-ui, 'Apple Color Emoji', 'Segoe UI Emoji', sans-serif;
font-size: 14px; font-style: normal; font-variant-ligatures: normal;
font-variant-caps: normal; font-weight: 400; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px;"><br>
</div>
<blockquote type="cite" cite="mid:03c5e276-c478-c33c-9f75-e03a56ef16a6@amd.com">
<pre class="moz-quote-pre" wrap="">
Regards,
Felix
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com">&lt;James.Zhu@amd.com&gt;</a>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 653bd8f..e3f0308 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2393,6 +2393,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -3147,6 +3151,10 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
 {
 	int r;
 
+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		return r;
+
 	r = amdgpu_device_ip_resume_phase1(adev);
 	if (r)
 		return r;
@@ -4602,6 +4610,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 				dev_warn(tmp_adev->dev, "asic atom init failed!");
 			} else {
 				dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
+				r = amdgpu_amdkfd_resume_iommu(tmp_adev);
+				if (r)
+					goto out;
+
 				r = amdgpu_device_ip_resume_phase1(tmp_adev);
 				if (r)
 					goto out;
</pre>
</blockquote>
</blockquote>
</body>
</html>