<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>I wouldn't call it premature. Reverting is the usual practice when
there is a serious regression that isn't fully understood or
root-caused. As far as I can tell, the problem has been reproduced
on multiple systems and different GPUs, and was clearly traced to
Christian's commit. I think that justifies reverting it for now.<br>
</p>
<p>I agree with Christian that a general HDP memory access problem
causing RAS errors would potentially cause problems in other tests
as well. For example, common operations such as GART table updates,
GPUVM page table updates, and PCIe peer-to-peer accesses in ROCm
applications all use HDP, but we're not seeing obvious problems from
those. So we need to understand what's special about this test. I
asked questions to that effect on our other email thread.</p>
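<p>For anyone following along, here is a rough user-space sketch of how
the reverted fast path divides one request between the BAR window and
the older indexed-MMIO loop. Only the min() clamp and the count
computation come from the quoted diff; <code>struct vram_split</code>,
<code>split_vram_access</code> and the example function are made-up
names for illustration, and nothing here touches hardware or HDP.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how one request divides between the two paths. */
struct vram_split {
    uint64_t bar_bytes;  /* bytes inside the CPU-visible BAR window */
    uint64_t mmio_bytes; /* bytes left for the indexed MMIO fallback */
};

/*
 * Same clamp as the quoted diff:
 *   last = min(pos + size, visible_vram_size);
 *   if (last > pos) count = last - pos;
 * Assumes pos + size does not overflow 64 bits.
 */
static struct vram_split split_vram_access(uint64_t pos, uint64_t size,
                                           uint64_t visible_vram_size)
{
    struct vram_split s = { 0, size };
    uint64_t end = pos + size;
    uint64_t last = end < visible_vram_size ? end : visible_vram_size;

    if (last > pos) {
        s.bar_bytes = last - pos;
        s.mmio_bytes = size - s.bar_bytes;
    }
    return s;
}

/* Example: a 16-byte access straddling a 256-byte visible window. */
static void split_vram_access_example(void)
{
    struct vram_split s = split_vram_access(248, 16, 256);

    assert(s.bar_bytes == 8);
    assert(s.mmio_bytes == 8);
}
```

<p>In this sketch, an access that starts at or above the visible size
gets <code>bar_bytes == 0</code> and goes entirely through the fallback
loop, while one that straddles the boundary uses both paths.</p>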
<p>Regards,<br>
Felix<br>
</p>
<div class="moz-cite-prefix">On 2020-04-14 at 10:51 a.m.,
Kim, Jonathan wrote:<br>
</div>
<blockquote type="cite" cite="mid:MN2PR12MB451836BC6F9C0F002EE1C3D685DA0@MN2PR12MB4518.namprd12.prod.outlook.com">
<div class="WordSection1">
<p class="msipheader4d0fcdd7" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:Arial,sans-serif;color:#0078D7">[AMD
Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I think it’s premature to push this revert.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">With more testing, I’m getting failures
from different tests or sometimes none at all on my machine.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Kent, let’s continue the discussion on the
original thread.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Jon<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Koenig, Christian
<a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com">&lt;Christian.Koenig@amd.com&gt;</a> <br>
<b>Sent:</b> Tuesday, April 14, 2020 10:47 AM<br>
<b>To:</b> Deucher, Alexander
<a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com">&lt;Alexander.Deucher@amd.com&gt;</a><br>
<b>Cc:</b> Russell, Kent <a class="moz-txt-link-rfc2396E" href="mailto:Kent.Russell@amd.com">&lt;Kent.Russell@amd.com&gt;</a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Kuehling, Felix
<a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com">&lt;Felix.Kuehling@amd.com&gt;</a>; Kim, Jonathan
<a class="moz-txt-link-rfc2396E" href="mailto:Jonathan.Kim@amd.com">&lt;Jonathan.Kim@amd.com&gt;</a><br>
<b>Subject:</b> Re: [PATCH] Revert "drm/amdgpu: use the
BAR if possible in amdgpu_device_vram_access v2"<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal">That's exactly my concern as
well. <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">This looks a bit like the test
creates erroneous data somehow, but there doesn't
seem to be a RAS check in the MM data path.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">And now that we use the BAR
path it goes up in flames.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I just don't see how we could
create erroneous data in a test case.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Christian.<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 14.04.2020 at 16:35,
"Deucher, Alexander" &lt;<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>&gt; wrote:<o:p></o:p></p>
<blockquote style="border:none;border-left:solid
#CCCCCC 1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<p style="margin:15.0pt"><span style="font-size:10.0pt;font-family:Arial,sans-serif;color:#317100">[AMD
Public Use]<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">If
this causes an issue, any access to vram
via the BAR could cause an issue.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Alex<o:p></o:p></span></p>
</div>
<div class="MsoNormal" style="text-align:center" align="center">
<hr width="98%" size="2" align="center">
</div>
<div>
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> amd-gfx <<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>>
on behalf of Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>><br>
<b>Sent:</b> Tuesday, April 14, 2020
10:19 AM<br>
<b>To:</b> Koenig, Christian <<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
<<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>><br>
<b>Cc:</b> Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
Kim, Jonathan <<a href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true">Jonathan.Kim@amd.com</a>><br>
<b>Subject:</b> RE: [PATCH] Revert
"drm/amdgpu: use the BAR if possible in
amdgpu_device_vram_access v2"</span>
<o:p></o:p></p>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">[AMD Official Use
Only - Internal Distribution Only]<br>
<br>
On VG20 or MI100, as soon as we run the
subtest, we get the dmesg output below,
and then the kernel ends up hanging. I
don't know enough about the test itself
to know why this is occurring, but Jon
Kim and Felix were discussing it on a
separate thread when the issue was first
reported, so they can hopefully provide
some additional information.<br>
<br>
Kent<br>
<br>
> -----Original Message-----<br>
> From: Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>><br>
> Sent: Tuesday, April 14, 2020 9:52
AM<br>
> To: Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
> Subject: Re: [PATCH] Revert
"drm/amdgpu: use the BAR if possible in<br>
> amdgpu_device_vram_access v2"<br>
> <br>
> On 13.04.20 at 20:20, Kent
Russell wrote:<br>
> > This reverts commit
c12b84d6e0d70f1185e6daddfd12afb671791b6e.<br>
> > The original patch causes a
RAS event and subsequent kernel
hard-hang<br>
> > when running the
KFDMemoryTest.PtraceAccessInvisibleVram
on VG20 and<br>
> > Arcturus<br>
> ><br>
> > dmesg output at hang time:<br>
> > [drm] RAS event of type
ERREVENT_ATHUB_INTERRUPT detected!<br>
> > amdgpu 0000:67:00.0: GPU reset
begin!<br>
> > Evicting PASID 0x8000 queues<br>
> > Started evicting pasid 0x8000<br>
> > qcm fence wait loop timeout
expired<br>
> > The cp might be in an
unrecoverable state due to an
unsuccessful<br>
> > queues preemption Failed to
evict process queues Failed to suspend<br>
> > process 0x8000 Finished
evicting pasid 0x8000 Started restoring
pasid<br>
> > 0x8000 Finished restoring
pasid 0x8000 [drm] UVD VCPU state may
lost<br>
> > due to RAS
ERREVENT_ATHUB_INTERRUPT<br>
> > amdgpu: [powerplay] Failed to
send message 0x26, response 0x0<br>
> > amdgpu: [powerplay] Failed to
set soft min gfxclk !<br>
> > amdgpu: [powerplay] Failed to
upload DPM Bootup Levels!<br>
> > amdgpu: [powerplay] Failed to
send message 0x7, response 0x0<br>
> > amdgpu: [powerplay]
[DisableAllSMUFeatures] Failed to
disable all smu<br>
> features!<br>
> > amdgpu: [powerplay]
[DisableDpmTasks] Failed to disable all
smu features!<br>
> > amdgpu: [powerplay]
[PowerOffAsic] Failed to disable DPM!<br>
> >
[drm:amdgpu_device_ip_suspend_phase2
[amdgpu]] *ERROR* suspend of IP<br>
> > block &lt;powerplay&gt; failed
-5<br>
> <br>
> Do you have more information on
what's going wrong here? This is a
really<br>
> important patch for KFD debugging.<br>
> <br>
> ><br>
> > Signed-off-by: Kent Russell
<<a href="mailto:kent.russell@amd.com" moz-do-not-send="true">kent.russell@amd.com</a>><br>
> <br>
> Reviewed-by: Christian König <<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>><br>
> <br>
> > ---<br>
> >
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
| 26 ----------------------<br>
> > 1 file changed, 26
deletions(-)<br>
> ><br>
> > diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> >
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> > index
cf5d6e585634..a3f997f84020 100644<br>
> > ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> > +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> > @@ -254,32 +254,6 @@ void
amdgpu_device_vram_access(struct<br>
> amdgpu_device *adev, loff_t pos,<br>
> > uint32_t hi = ~0;<br>
> > uint64_t last;<br>
> ><br>
> > -<br>
> > -#ifdef CONFIG_64BIT<br>
> > - last = min(pos + size,
adev->gmc.visible_vram_size);<br>
> > - if (last > pos) {<br>
> > - void __iomem *addr
= adev->mman.aper_base_kaddr + pos;<br>
> > - size_t count =
last - pos;<br>
> > -<br>
> > - if (write) {<br>
> > -
memcpy_toio(addr, buf, count);<br>
> > - mb();<br>
> > -
amdgpu_asic_flush_hdp(adev, NULL);<br>
> > - } else {<br>
> > -
amdgpu_asic_invalidate_hdp(adev, NULL);<br>
> > - mb();<br>
> > -
memcpy_fromio(buf, addr, count);<br>
> > - }<br>
> > -<br>
> > - if (count == size)<br>
> > - return;<br>
> > -<br>
> > - pos += count;<br>
> > - buf += count / 4;<br>
> > - size -= count;<br>
> > - }<br>
> > -#endif<br>
> > -<br>
> >
spin_lock_irqsave(&adev->mmio_idx_lock,
flags);<br>
> > for (last = pos + size;
pos < last; pos += 4) {<br>
> > uint32_t tmp =
pos >> 31;<br>
_______________________________________________<br>
amd-gfx mailing list<br>
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
<a href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Calexander.deucher%40amd.com%7C68e0bfea2a5f4a909ab108d7e07ed164%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637224707637289768&sdata=ttNOHJt0IwywpOIWahKjjuC6OkT1jxduc6iMzYzndpg%3D&reserved=0" moz-do-not-send="true">https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Calexander.deucher%40amd.com%7C68e0bfea2a5f4a909ab108d7e07ed164%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637224707637289768&sdata=ttNOHJt0IwywpOIWahKjjuC6OkT1jxduc6iMzYzndpg%3D&reserved=0</a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 14.04.2020 at 16:35, "Deucher,
Alexander" &lt;<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>&gt; wrote:<o:p></o:p></p>
<blockquote style="border:none;border-left:solid #CCCCCC
1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<p style="margin:15.0pt"><span style="font-size:10.0pt;font-family:Arial,sans-serif;color:#317100">[AMD
Public Use]<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">If this
causes an issue, any access to vram via the
BAR could cause an issue.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Alex<o:p></o:p></span></p>
</div>
<div class="MsoNormal" style="text-align:center" align="center">
<hr width="98%" size="2" align="center">
</div>
<div>
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> amd-gfx <<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>>
on behalf of Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>><br>
<b>Sent:</b> Tuesday, April 14, 2020 10:19 AM<br>
<b>To:</b> Koenig, Christian <<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
<<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>><br>
<b>Cc:</b> Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
Kim, Jonathan <<a href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true">Jonathan.Kim@amd.com</a>><br>
<b>Subject:</b> RE: [PATCH] Revert
"drm/amdgpu: use the BAR if possible in
amdgpu_device_vram_access v2"</span>
<o:p></o:p></p>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal">[AMD Official Use Only -
Internal Distribution Only]<br>
<br>
On VG20 or MI100, as soon as we run the
subtest, we get the dmesg output below, and
then the kernel ends up hanging. I don't know
enough about the test itself to know why this
is occurring, but Jon Kim and Felix were
discussing it on a separate thread when the
issue was first reported, so they can
hopefully provide some additional information.<br>
<br>
Kent<br>
<br>
> -----Original Message-----<br>
> From: Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>><br>
> Sent: Tuesday, April 14, 2020 9:52 AM<br>
> To: Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>>;
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
> Subject: Re: [PATCH] Revert "drm/amdgpu:
use the BAR if possible in<br>
> amdgpu_device_vram_access v2"<br>
> <br>
> On 13.04.20 at 20:20, Kent
Russell wrote:<br>
> > This reverts commit
c12b84d6e0d70f1185e6daddfd12afb671791b6e.<br>
> > The original patch causes a RAS
event and subsequent kernel hard-hang<br>
> > when running the
KFDMemoryTest.PtraceAccessInvisibleVram on
VG20 and<br>
> > Arcturus<br>
> ><br>
> > dmesg output at hang time:<br>
> > [drm] RAS event of type
ERREVENT_ATHUB_INTERRUPT detected!<br>
> > amdgpu 0000:67:00.0: GPU reset
begin!<br>
> > Evicting PASID 0x8000 queues<br>
> > Started evicting pasid 0x8000<br>
> > qcm fence wait loop timeout expired<br>
> > The cp might be in an unrecoverable state due to an unsuccessful queues preemption<br>
> > Failed to evict process queues<br>
> > Failed to suspend process 0x8000<br>
> > Finished evicting pasid 0x8000<br>
> > Started restoring pasid 0x8000<br>
> > Finished restoring pasid 0x8000<br>
> > [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT<br>
> > amdgpu: [powerplay] Failed to send
message 0x26, response 0x0<br>
> > amdgpu: [powerplay] Failed to set
soft min gfxclk !<br>
> > amdgpu: [powerplay] Failed to upload
DPM Bootup Levels!<br>
> > amdgpu: [powerplay] Failed to send
message 0x7, response 0x0<br>
> > amdgpu: [powerplay]
[DisableAllSMUFeatures] Failed to disable all
smu features!<br>
> > amdgpu: [powerplay]
[DisableDpmTasks] Failed to disable all smu
features!<br>
> > amdgpu: [powerplay] [PowerOffAsic]
Failed to disable DPM!<br>
> > [drm:amdgpu_device_ip_suspend_phase2
[amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5<br>
> <br>
> Do you have more information on what's
going wrong here? This is a really<br>
> important patch for KFD debugging.<br>
> <br>
> ><br>
> > Signed-off-by: Kent Russell <<a href="mailto:kent.russell@amd.com" moz-do-not-send="true">kent.russell@amd.com</a>><br>
> <br>
> Reviewed-by: Christian König <<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>><br>
> <br>
> > ---<br>
> >
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |
26 ----------------------<br>
> > 1 file changed, 26 deletions(-)<br>
> ><br>
> > diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> > index cf5d6e585634..a3f997f84020
100644<br>
> > ---
a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> > +++
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
> > @@ -254,32 +254,6 @@ void
amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,<br>
> > uint32_t hi = ~0;<br>
> > uint64_t last;<br>
> ><br>
> > -<br>
> > -#ifdef CONFIG_64BIT<br>
> > - last = min(pos + size,
adev->gmc.visible_vram_size);<br>
> > - if (last > pos) {<br>
> > - void __iomem *addr =
adev->mman.aper_base_kaddr + pos;<br>
> > - size_t count = last -
pos;<br>
> > -<br>
> > - if (write) {<br>
> > -
memcpy_toio(addr, buf, count);<br>
> > - mb();<br>
> > -
amdgpu_asic_flush_hdp(adev, NULL);<br>
> > - } else {<br>
> > -
amdgpu_asic_invalidate_hdp(adev, NULL);<br>
> > - mb();<br>
> > -
memcpy_fromio(buf, addr, count);<br>
> > - }<br>
> > -<br>
> > - if (count == size)<br>
> > - return;<br>
> > -<br>
> > - pos += count;<br>
> > - buf += count / 4;<br>
> > - size -= count;<br>
> > - }<br>
> > -#endif<br>
> > -<br>
> >
spin_lock_irqsave(&adev->mmio_idx_lock,
flags);<br>
> > for (last = pos + size; pos
< last; pos += 4) {<br>
> > uint32_t tmp = pos
>> 31;<br>
_______________________________________________<br>
amd-gfx mailing list<br>
<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><o:p></o:p></p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 14.04.2020 at 16:35, "Deucher,
Alexander" <<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>> wrote:<o:p></o:p></p>
</div>
</div>
<div>
<p style="margin:15.0pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#317100">[AMD
Public Use]<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">If this causes an
issue, any access to vram via the BAR could cause an
issue.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Alex<o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>