<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>I wouldn't call it premature. Reverting is common practice when
      there is a serious regression that isn't fully understood or
      root-caused. As far as I can tell, the problem has been reproduced
      on multiple systems with different GPUs, and clearly traced to
      Christian's commit. I think that justifies reverting it for now.<br>
    </p>
    <p>I agree with Christian that a general HDP memory access problem
      causing RAS errors would potentially cause problems in other tests
      as well. For example, common operations such as GART table
      updates, GPUVM page table updates, and PCIe peer-to-peer accesses
      in ROCm applications all use HDP, but we're not seeing obvious
      problems from those. So we need to understand what's special about
      this test. I asked questions to that effect on our other email
      thread.</p>
    <p>Regards,<br>
        Felix<br>
    </p>
    <div class="moz-cite-prefix">On 2020-04-14 at 10:51 a.m., Kim,
      Jonathan wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:MN2PR12MB451836BC6F9C0F002EE1C3D685DA0@MN2PR12MB4518.namprd12.prod.outlook.com">
      
      <style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
p.msipheader4d0fcdd7, li.msipheader4d0fcdd7, div.msipheader4d0fcdd7
        {mso-style-name:msipheader4d0fcdd7;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
span.EmailStyle20
        {mso-style-type:personal-compose;
        font-family:"Arial",sans-serif;
        color:#0078D7;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style>
      <div class="WordSection1">
        <p class="msipheader4d0fcdd7" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:'Arial',sans-serif;color:#0078D7">[AMD
            Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">I think it’s premature to push this revert.<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">With more testing, I’m getting failures
          from different tests or sometimes none at all on my machine.<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">Kent, let’s continue the discussion on the
          original thread.<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">Thanks,<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">Jon<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <div style="border:none;border-top:solid #E1E1E1
            1.0pt;padding:3.0pt 0in 0in 0in">
            <p class="MsoNormal"><b>From:</b> Koenig, Christian
              <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com">&lt;Christian.Koenig@amd.com&gt;</a> <br>
              <b>Sent:</b> Tuesday, April 14, 2020 10:47 AM<br>
              <b>To:</b> Deucher, Alexander
              <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com">&lt;Alexander.Deucher@amd.com&gt;</a><br>
              <b>Cc:</b> Russell, Kent <a class="moz-txt-link-rfc2396E" href="mailto:Kent.Russell@amd.com">&lt;Kent.Russell@amd.com&gt;</a>;
              <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Kuehling, Felix
              <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com">&lt;Felix.Kuehling@amd.com&gt;</a>; Kim, Jonathan
              <a class="moz-txt-link-rfc2396E" href="mailto:Jonathan.Kim@amd.com">&lt;Jonathan.Kim@amd.com&gt;</a><br>
              <b>Subject:</b> Re: [PATCH] Revert "drm/amdgpu: use the
              BAR if possible in amdgpu_device_vram_access v2"<o:p></o:p></p>
          </div>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <div>
            <div>
              <div>
                <div>
                  <p class="MsoNormal">That's exactly my concern as
                    well. <o:p></o:p></p>
                  <div>
                    <p class="MsoNormal"><o:p> </o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal">This looks a bit like the test
                      creates erroneous data somehow, but there doesn't
                      seem to be a RAS check in the MM data path.<o:p></o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal"><o:p> </o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal">And now that we use the BAR
                      path it goes up in flames.<o:p></o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal"><o:p> </o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal">I just don't see how a test
                      case could create erroneous data.<o:p></o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal"><o:p> </o:p></p>
                  </div>
                  <div>
                    <p class="MsoNormal">Christian.<o:p></o:p></p>
                  </div>
                </div>
                <div>
                  <p class="MsoNormal"><o:p> </o:p></p>
                  <div>
                    <p class="MsoNormal">On 14.04.2020 at 16:35,
                      "Deucher, Alexander" &lt;<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>&gt; wrote:<o:p></o:p></p>
                    <blockquote style="border:none;border-left:solid
                      #CCCCCC 1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
                      <div>
                        <p style="margin:15.0pt"><span style="font-size:10.0pt;font-family:'Arial',sans-serif;color:#317100">[AMD
                            Public Use]<o:p></o:p></span></p>
                        <p class="MsoNormal"><o:p> </o:p></p>
                        <div>
                          <div>
                            <p class="MsoNormal"><span style="font-size:12.0pt;color:black">If
                                this causes an issue, any access to vram
                                via the BAR could cause an issue.<o:p></o:p></span></p>
                          </div>
                          <div>
                            <p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
                          </div>
                          <div>
                            <p class="MsoNormal"><span style="font-size:12.0pt;color:black">Alex<o:p></o:p></span></p>
                          </div>
                          <div class="MsoNormal" style="text-align:center" align="center">
                            <hr width="98%" size="2" align="center">
                          </div>
                          <div>
                            <p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> amd-gfx &lt;<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>&gt;
                                on behalf of Russell, Kent &lt;<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>&gt;<br>
                                <b>Sent:</b> Tuesday, April 14, 2020
                                10:19 AM<br>
                                <b>To:</b> Koenig, Christian &lt;<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>&gt;;
                                <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
                                &lt;<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>&gt;<br>
                                <b>Cc:</b> Kuehling, Felix &lt;<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>&gt;;
                                Kim, Jonathan &lt;<a href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true">Jonathan.Kim@amd.com</a>&gt;<br>
                                <b>Subject:</b> RE: [PATCH] Revert
                                "drm/amdgpu: use the BAR if possible in
                                amdgpu_device_vram_access v2"</span>
                              <o:p></o:p></p>
                            <div>
                              <p class="MsoNormal"> <o:p></o:p></p>
                            </div>
                          </div>
                          <div>
                            <div>
                              <p class="MsoNormal">[AMD Official Use
                                Only - Internal Distribution Only]<br>
                                <br>
                                On VG20 or MI100, as soon as we run the
                                subtest, we get the dmesg output below,
                                and then the kernel ends up hanging. I
                                don't know enough about the test itself
                                to know why this is occurring, but Jon
                                Kim and Felix were discussing it on a
                                separate thread when the issue was first
                                reported, so they can hopefully provide
                                some additional information.<br>
                                <br>
                                 Kent<br>
                                <br>
                                > -----Original Message-----<br>
                                > From: Christian König &lt;<a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>&gt;<br>
                                > Sent: Tuesday, April 14, 2020 9:52
                                AM<br>
                                > To: Russell, Kent &lt;<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>&gt;;
                                <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                                > Subject: Re: [PATCH] Revert
                                "drm/amdgpu: use the BAR if possible in<br>
                                > amdgpu_device_vram_access v2"<br>
                                > <br>
                                > On 13.04.20 at 20:20, Kent Russell
                                wrote:<br>
                                > > This reverts commit
                                c12b84d6e0d70f1185e6daddfd12afb671791b6e.<br>
                                > > The original patch causes a
                                RAS event and subsequent kernel
                                hard-hang<br>
                                > > when running the
                                KFDMemoryTest.PtraceAccessInvisibleVram
                                on VG20 and<br>
                                > > Arcturus<br>
                                > ><br>
                                > > dmesg output at hang time:<br>
                                > > [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!<br>
                                > > amdgpu 0000:67:00.0: GPU reset begin!<br>
                                > > Evicting PASID 0x8000 queues<br>
                                > > Started evicting pasid 0x8000<br>
                                > > qcm fence wait loop timeout expired<br>
                                > > The cp might be in an unrecoverable state due to an unsuccessful queues preemption<br>
                                > > Failed to evict process queues<br>
                                > > Failed to suspend process 0x8000<br>
                                > > Finished evicting pasid 0x8000<br>
                                > > Started restoring pasid 0x8000<br>
                                > > Finished restoring pasid 0x8000<br>
                                > > [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT<br>
                                > > amdgpu: [powerplay] Failed to send message 0x26, response 0x0<br>
                                > > amdgpu: [powerplay] Failed to set soft min gfxclk !<br>
                                > > amdgpu: [powerplay] Failed to upload DPM Bootup Levels!<br>
                                > > amdgpu: [powerplay] Failed to send message 0x7, response 0x0<br>
                                > > amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!<br>
                                > > amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!<br>
                                > > amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!<br>
                                > > [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block &lt;powerplay&gt; failed -5<br>
                                > <br>
                                > Do you have more information on
                                what's going wrong here? This is a
                                really<br>
                                > important patch for KFD debugging.<br>
                                > <br>
                                > ><br>
                                > > Signed-off-by: Kent Russell
                                &lt;<a href="mailto:kent.russell@amd.com" moz-do-not-send="true">kent.russell@amd.com</a>&gt;<br>
                                > <br>
                                > Reviewed-by: Christian König &lt;<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>&gt;<br>
                                > <br>
                                > > ---<br>
                                > >  
                                drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                                | 26 ----------------------<br>
                                > >   1 file changed, 26
                                deletions(-)<br>
                                > ><br>
                                > > diff --git
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                                > >
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                                > > index
                                cf5d6e585634..a3f997f84020 100644<br>
                                > > ---
                                a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                                > > +++
                                b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                                > > @@ -254,32 +254,6 @@ void
                                amdgpu_device_vram_access(struct<br>
                                > amdgpu_device *adev, loff_t pos,<br>
                                > >      uint32_t hi = ~0;<br>
                                > >      uint64_t last;<br>
                                > ><br>
                                > > -<br>
                                > > -#ifdef CONFIG_64BIT<br>
                                > > -   last = min(pos + size,
                                adev->gmc.visible_vram_size);<br>
                                > > -   if (last > pos) {<br>
                                > > -           void __iomem *addr
                                = adev->mman.aper_base_kaddr + pos;<br>
                                > > -           size_t count =
                                last - pos;<br>
                                > > -<br>
                                > > -           if (write) {<br>
                                > > -                  
                                memcpy_toio(addr, buf, count);<br>
                                > > -                   mb();<br>
                                > > -                  
                                amdgpu_asic_flush_hdp(adev, NULL);<br>
                                > > -           } else {<br>
                                > > -                  
                                amdgpu_asic_invalidate_hdp(adev, NULL);<br>
                                > > -                   mb();<br>
                                > > -                  
                                memcpy_fromio(buf, addr, count);<br>
                                > > -           }<br>
                                > > -<br>
                                > > -           if (count == size)<br>
                                > > -                   return;<br>
                                > > -<br>
                                > > -           pos += count;<br>
                                > > -           buf += count / 4;<br>
                                > > -           size -= count;<br>
                                > > -   }<br>
                                > > -#endif<br>
                                > > -<br>
                                > >     
                                spin_lock_irqsave(&adev->mmio_idx_lock,
                                flags);<br>
                                > >      for (last = pos + size;
                                pos < last; pos += 4) {<br>
                                > >              uint32_t tmp =
                                pos >> 31;<br>
_______________________________________________<br>
                                amd-gfx mailing list<br>
                                <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                                <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><o:p></o:p></p>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <p class="MsoNormal"><o:p> </o:p></p>
                </div>
              </div>
              <div>
                <p class="MsoNormal"><o:p> </o:p></p>
                <div>
                  <p class="MsoNormal">Am 14.04.2020 16:35 schrieb
                    "Deucher, Alexander" <<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>>:<o:p></o:p></p>
                  <blockquote style="border:none;border-left:solid
                    #CCCCCC 1.0pt;padding:0in 0in 0in
6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
                    <div>
                      <p style="margin:15.0pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#317100">[AMD
                          Public Use]<o:p></o:p></span></p>
                      <p class="MsoNormal"><o:p> </o:p></p>
                      <div>
                        <div>
                          <p class="MsoNormal"><span style="font-size:12.0pt;color:black">If
                              this causes an issue, any access to vram
                              via the BAR could cause an issue.<o:p></o:p></span></p>
                        </div>
                        <div>
                          <p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
                        </div>
                        <div>
                          <p class="MsoNormal"><span style="font-size:12.0pt;color:black">Alex<o:p></o:p></span></p>
                        </div>
                        <div class="MsoNormal" style="text-align:center" align="center">
                          <hr width="98%" size="2" align="center">
                        </div>
                        <div>
                          <p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> amd-gfx <<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>>
                              on behalf of Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>><br>
                              <b>Sent:</b> Tuesday, April 14, 2020 10:19
                              AM<br>
                              <b>To:</b> Koenig, Christian <<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
                              <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
                              <<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>><br>
                              <b>Cc:</b> Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
                              Kim, Jonathan <<a href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true">Jonathan.Kim@amd.com</a>><br>
                              <b>Subject:</b> RE: [PATCH] Revert
                              "drm/amdgpu: use the BAR if possible in
                              amdgpu_device_vram_access v2"</span>
                            <o:p></o:p></p>
                          <div>
                            <p class="MsoNormal"> <o:p></o:p></p>
                          </div>
                        </div>
                        <div>
                          <div>
                            <p class="MsoNormal">[AMD Official Use Only
                              - Internal Distribution Only]<br>
                              <br>
                              On VG20 or MI100, as soon as we run the
                              subtest, we get the dmesg output below,
                              and then the kernel ends up hanging. I
                              don't know enough about the test itself to
                              know why this is occurring, but Jon Kim
                              and Felix were discussing it on a separate
                              thread when the issue was first reported,
                              so they can hopefully provide some
                              additional information.<br>
                              <br>
                               Kent<br>
                              <br>
                              > -----Original Message-----<br>
                              > From: Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>><br>
                              > Sent: Tuesday, April 14, 2020 9:52 AM<br>
                              > To: Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>>;
                              <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                              > Subject: Re: [PATCH] Revert
                              "drm/amdgpu: use the BAR if possible in<br>
                              > amdgpu_device_vram_access v2"<br>
                              > <br>
                              > Am 13.04.20 um 20:20 schrieb Kent
                              Russell:<br>
                              > > This reverts commit
                              c12b84d6e0d70f1185e6daddfd12afb671791b6e.<br>
                              > > The original patch causes a RAS
                              event and subsequent kernel hard-hang<br>
                              > > when running the
                              KFDMemoryTest.PtraceAccessInvisibleVram on
                              VG20 and<br>
                              > > Arcturus<br>
                              > ><br>
                              > > dmesg output at hang time:<br>
                              > > [drm] RAS event of type
                              ERREVENT_ATHUB_INTERRUPT detected!<br>
                              > > amdgpu 0000:67:00.0: GPU reset
                              begin!<br>
                              > > Evicting PASID 0x8000 queues<br>
                              > > Started evicting pasid 0x8000<br>
                              > > qcm fence wait loop timeout
                              expired<br>
                              > > The cp might be in an
                              unrecoverable state due to an unsuccessful<br>
                              > > queues preemption Failed to
                              evict process queues Failed to suspend<br>
                              > > process 0x8000 Finished evicting
                              pasid 0x8000 Started restoring pasid<br>
                              > > 0x8000 Finished restoring pasid
                              0x8000 [drm] UVD VCPU state may lost<br>
                              > > due to RAS
                              ERREVENT_ATHUB_INTERRUPT<br>
                              > > amdgpu: [powerplay] Failed to
                              send message 0x26, response 0x0<br>
                              > > amdgpu: [powerplay] Failed to
                              set soft min gfxclk !<br>
                              > > amdgpu: [powerplay] Failed to
                              upload DPM Bootup Levels!<br>
                              > > amdgpu: [powerplay] Failed to
                              send message 0x7, response 0x0<br>
                              > > amdgpu: [powerplay]
                              [DisableAllSMUFeatures] Failed to disable
                              all smu<br>
                              > features!<br>
                              > > amdgpu: [powerplay]
                              [DisableDpmTasks] Failed to disable all
                              smu features!<br>
                              > > amdgpu: [powerplay]
                              [PowerOffAsic] Failed to disable DPM!<br>
                              > >
                              [drm:amdgpu_device_ip_suspend_phase2
                              [amdgpu]] *ERROR* suspend of IP<br>
                              > > block <powerplay> failed
                              -5<br>
                              > <br>
                              > Do you have more information on
                              what's going wrong here since this is a
                              really<br>
                              > important patch for KFD debugging.<br>
                              > <br>
                              > ><br>
                              > > Signed-off-by: Kent Russell <<a href="mailto:kent.russell@amd.com" moz-do-not-send="true">kent.russell@amd.com</a>><br>
                              > <br>
                              > Reviewed-by: Christian König <<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>><br>
                              > <br>
                              > > ---<br>
                              > >  
                              drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                              | 26 ----------------------<br>
                              > >   1 file changed, 26
                              deletions(-)<br>
                              > ><br>
                              > > diff --git
                              a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                              > >
                              b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                              > > index cf5d6e585634..a3f997f84020
                              100644<br>
                              > > ---
                              a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                              > > +++
                              b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                              > > @@ -254,32 +254,6 @@ void
                              amdgpu_device_vram_access(struct<br>
                              > amdgpu_device *adev, loff_t pos,<br>
                              > >      uint32_t hi = ~0;<br>
                              > >      uint64_t last;<br>
                              > ><br>
                              > > -<br>
                              > > -#ifdef CONFIG_64BIT<br>
                              > > -   last = min(pos + size,
                              adev->gmc.visible_vram_size);<br>
                              > > -   if (last > pos) {<br>
                              > > -           void __iomem *addr =
                              adev->mman.aper_base_kaddr + pos;<br>
                              > > -           size_t count = last
                              - pos;<br>
                              > > -<br>
                              > > -           if (write) {<br>
                              > > -                  
                              memcpy_toio(addr, buf, count);<br>
                              > > -                   mb();<br>
                              > > -                  
                              amdgpu_asic_flush_hdp(adev, NULL);<br>
                              > > -           } else {<br>
                              > > -                  
                              amdgpu_asic_invalidate_hdp(adev, NULL);<br>
                              > > -                   mb();<br>
                              > > -                  
                              memcpy_fromio(buf, addr, count);<br>
                              > > -           }<br>
                              > > -<br>
                              > > -           if (count == size)<br>
                              > > -                   return;<br>
                              > > -<br>
                              > > -           pos += count;<br>
                              > > -           buf += count / 4;<br>
                              > > -           size -= count;<br>
                              > > -   }<br>
                              > > -#endif<br>
                              > > -<br>
                              > >     
                              spin_lock_irqsave(&adev->mmio_idx_lock,
                              flags);<br>
                              > >      for (last = pos + size; pos
                              < last; pos += 4) {<br>
                              > >              uint32_t tmp = pos
                              >> 31;<br>
_______________________________________________<br>
                              amd-gfx mailing list<br>
                              <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                              <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><o:p></o:p></p>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <p class="MsoNormal"><o:p> </o:p></p>
              </div>
            </div>
            <div>
              <p class="MsoNormal"><o:p> </o:p></p>
              <div>
              </div>
              <p class="MsoNormal"><o:p> </o:p></p>
            </div>
          </div>
          <div>
            <p class="MsoNormal"><o:p> </o:p></p>
            <div>
            </div>
            <p class="MsoNormal"><o:p> </o:p></p>
          </div>
        </div>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
          <div>
            <p class="MsoNormal">On 14.04.2020 at 16:35, "Deucher,
              Alexander" <<a href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true">Alexander.Deucher@amd.com</a>> wrote:<o:p></o:p></p>
          </div>
        </div>
        <div>
          <p style="margin:15.0pt"><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:#317100">[AMD
              Public Use]<o:p></o:p></span></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <div>
            <div>
              <p class="MsoNormal"><span style="font-size:12.0pt;color:black">If this causes an
                  issue, any access to vram via the BAR could cause an
                  issue.<o:p></o:p></span></p>
            </div>
            <div>
              <p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
            </div>
            <div>
              <p class="MsoNormal"><span style="font-size:12.0pt;color:black">Alex<o:p></o:p></span></p>
            </div>
            <div class="MsoNormal" style="text-align:center" align="center">
              <hr width="98%" size="2" align="center">
            </div>
            <div id="divRplyFwdMsg">
              <p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> amd-gfx <<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>>
                  on behalf of Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>><br>
                  <b>Sent:</b> Tuesday, April 14, 2020 10:19 AM<br>
                  <b>To:</b> Koenig, Christian <<a href="mailto:Christian.Koenig@amd.com" moz-do-not-send="true">Christian.Koenig@amd.com</a>>;
                  <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
                  <<a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>><br>
                  <b>Cc:</b> Kuehling, Felix <<a href="mailto:Felix.Kuehling@amd.com" moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
                  Kim, Jonathan <<a href="mailto:Jonathan.Kim@amd.com" moz-do-not-send="true">Jonathan.Kim@amd.com</a>><br>
                  <b>Subject:</b> RE: [PATCH] Revert "drm/amdgpu: use
                  the BAR if possible in amdgpu_device_vram_access v2"</span>
                <o:p></o:p></p>
              <div>
                <p class="MsoNormal"> <o:p></o:p></p>
              </div>
            </div>
            <div>
              <div>
                <p class="MsoNormal">[AMD Official Use Only - Internal
                  Distribution Only]<br>
                  <br>
                  On VG20 or MI100, as soon as we run the subtest, we
                  get the dmesg output below, and then the kernel ends
                  up hanging. I don't know enough about the test itself
                  to know why this is occurring, but Jon Kim and Felix
                  were discussing it on a separate thread when the issue
                  was first reported, so they can hopefully provide some
                  additional information.<br>
                  <br>
                   Kent<br>
                  <br>
                  > -----Original Message-----<br>
                  > From: Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com" moz-do-not-send="true">ckoenig.leichtzumerken@gmail.com</a>><br>
                  > Sent: Tuesday, April 14, 2020 9:52 AM<br>
                  > To: Russell, Kent <<a href="mailto:Kent.Russell@amd.com" moz-do-not-send="true">Kent.Russell@amd.com</a>>;
                  <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  > Subject: Re: [PATCH] Revert "drm/amdgpu: use the
                  BAR if possible in<br>
                  > amdgpu_device_vram_access v2"<br>
                  > <br>
                  > Am 13.04.20 um 20:20 schrieb Kent Russell:<br>
                  > > This reverts commit
                  c12b84d6e0d70f1185e6daddfd12afb671791b6e.<br>
                  > > The original patch causes a RAS event and a
                  subsequent kernel hard hang<br>
                  > > when running the
                  KFDMemoryTest.PtraceAccessInvisibleVram test on VG20 and<br>
                  > > Arcturus.<br>
                  > ><br>
                  > > dmesg output at hang time:<br>
                  > > [drm] RAS event of type
                  ERREVENT_ATHUB_INTERRUPT detected!<br>
                  > > amdgpu 0000:67:00.0: GPU reset begin!<br>
                  > > Evicting PASID 0x8000 queues<br>
                  > > Started evicting pasid 0x8000<br>
                  > > qcm fence wait loop timeout expired<br>
                  > > The cp might be in an unrecoverable state
                  due to an unsuccessful queues preemption<br>
                  > > Failed to evict process queues<br>
                  > > Failed to suspend process 0x8000<br>
                  > > Finished evicting pasid 0x8000<br>
                  > > Started restoring pasid 0x8000<br>
                  > > Finished restoring pasid 0x8000<br>
                  > > [drm] UVD VCPU state may lost due to RAS
                  ERREVENT_ATHUB_INTERRUPT<br>
                  > > amdgpu: [powerplay] Failed to send message
                  0x26, response 0x0<br>
                  > > amdgpu: [powerplay] Failed to set soft min
                  gfxclk !<br>
                  > > amdgpu: [powerplay] Failed to upload DPM
                  Bootup Levels!<br>
                  > > amdgpu: [powerplay] Failed to send message
                  0x7, response 0x0<br>
                  > > amdgpu: [powerplay] [DisableAllSMUFeatures]
                  Failed to disable all smu features!<br>
                  > > amdgpu: [powerplay] [DisableDpmTasks] Failed
                  to disable all smu features!<br>
                  > > amdgpu: [powerplay] [PowerOffAsic] Failed to
                  disable DPM!<br>
                  > > [drm:amdgpu_device_ip_suspend_phase2
                  [amdgpu]] *ERROR* suspend of IP<br>
                  > > block <powerplay> failed -5<br>
                  > <br>
                  > Do you have more information on what's going
                  wrong here? This is a really<br>
                  > important patch for KFD debugging.<br>
                  > <br>
                  > ><br>
                  > > Signed-off-by: Kent Russell <<a href="mailto:kent.russell@amd.com" moz-do-not-send="true">kent.russell@amd.com</a>><br>
                  > <br>
                  > Reviewed-by: Christian König <<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>><br>
                  > <br>
                  > > ---<br>
                  > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
                  | 26 ----------------------<br>
                  > >   1 file changed, 26 deletions(-)<br>
                  > ><br>
                  > > diff --git
                  a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                  > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                  > > index cf5d6e585634..a3f997f84020 100644<br>
                  > > ---
                  a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                  > > +++
                  b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<br>
                  > > @@ -254,32 +254,6 @@ void
                  amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,<br>
                  > >      uint32_t hi = ~0;<br>
                  > >      uint64_t last;<br>
                  > ><br>
                  > > -<br>
                  > > -#ifdef CONFIG_64BIT<br>
                  > > -   last = min(pos + size,
                  adev->gmc.visible_vram_size);<br>
                  > > -   if (last > pos) {<br>
                  > > -           void __iomem *addr =
                  adev->mman.aper_base_kaddr + pos;<br>
                  > > -           size_t count = last - pos;<br>
                  > > -<br>
                  > > -           if (write) {<br>
                  > > -                   memcpy_toio(addr, buf,
                  count);<br>
                  > > -                   mb();<br>
                  > > -                  
                  amdgpu_asic_flush_hdp(adev, NULL);<br>
                  > > -           } else {<br>
                  > > -                  
                  amdgpu_asic_invalidate_hdp(adev, NULL);<br>
                  > > -                   mb();<br>
                  > > -                   memcpy_fromio(buf, addr,
                  count);<br>
                  > > -           }<br>
                  > > -<br>
                  > > -           if (count == size)<br>
                  > > -                   return;<br>
                  > > -<br>
                  > > -           pos += count;<br>
                  > > -           buf += count / 4;<br>
                  > > -           size -= count;<br>
                  > > -   }<br>
                  > > -#endif<br>
                  > > -<br>
                  > >     
                  spin_lock_irqsave(&adev->mmio_idx_lock, flags);<br>
                  > >      for (last = pos + size; pos < last;
                  pos += 4) {<br>
                  > >              uint32_t tmp = pos >> 31;<br>
                  _______________________________________________<br>
                  amd-gfx mailing list<br>
                  <a href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><o:p></o:p></p>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>