<div dir="ltr"><div>Hi Jim,</div><div><br></div><div>My replies are inline below, marked with "R:".</div><div class="gmail_extra"><br></div><div class="gmail_extra">Regards,</div><div class="gmail_extra">Luís<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jul 12, 2018 at 3:16 AM, jimqu <span dir="ltr"><<a href="mailto:jimqu@amd.com" target="_blank">jimqu@amd.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF"><span class="gmail-">
    <p><br>
    </p>
    <br>
    <div class="gmail-m_3173675840618728897moz-cite-prefix">On 2018-07-12 05:27, Luís Mendes
      wrote:<br>
    </div>
    </span><blockquote type="cite">
      
      <div dir="ltr">
        <div>Hi Jim,</div>
        <div><br>
        </div><span class="gmail-">
        <div>I followed your suggestion and was able to bisect the
          kernel patches.</div>
        <div>The offending patch is: drm/amdgpu: defer test IBs on the
          rings at boot (V3)<br>
        </div>
        <div>commit:
          <table summary="commit info" class="gmail-m_3173675840618728897gmail-commit-info">
            <tbody>
              <tr>
                <th><br>
                </th>
                <td colspan="2" class="gmail-m_3173675840618728897gmail-sha1"><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.18-rc4&id=2c773de2ecb8c327f2448bd1eecad224e9227087" target="_blank">2c773de2ecb8c327f2448bd1eecad2<wbr>24e9227087</a></td>
              </tr>
            </tbody>
          </table>
        </div>
        <div><br>
        </div>
        <div>After reverting this patch, the IB test succeeded with
          kernel v4.18-rc4 on both systems, and the amdgpu driver
          loaded correctly on both the SAPPHIRE RX 550 4GB and the
          SAPPHIRE RX 460 2GB.</div>
        <div><br>
        </div>
      </span></div>
    </blockquote>
    <br>
    Alex, Christian, what do you think about the patch?<span class="gmail-"><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>The GPU hang remains, however.<br>
        </div>
        <div> I will try to configure a remote IPMI connection to see
          what is happening during the kernel boot, or set up a serial
          console for the kernel.</div>
        <div><br>
        </div>
      </div>
    </blockquote>
    <br></span>
    <b>You can set up a remote connection over ssh. You can also blacklist
    amdgpu first and then modprobe amdgpu manually.</b><br></div></blockquote><div>R: I was able to set up a remote serial console with the console=ttyS0,115200n8 kernel parameter.<br></div><div>The boot log is attached as file kernel_bisected_v4.18-rc4_log.txt.<br></div><div>The first noticeable issue seems to be:</div><div>[    6.131989] amdgpu: [powerplay]<br>[    6.131989]  last message was failed ret is 65535<br>...</div><div>and later it hangs with:</div><div>[   33.504100] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out<br>[   43.744094] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out<br>[   53.984089] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:54:HDMI-A-1] flip_done timed out<br>[   64.224036] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out<br>[   64.224141] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
    <b>What about xinit? What is the Mesa driver version on your platform?</b></div></blockquote><div>R: I am running Ubuntu 18.04 with the bisected kernel 4.18-rc4, using libdrm-2.4.92 and mesa-18.1.0.</div><div>The xinit output is attached as xinit_log.txt.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><div><div class="gmail-h5"><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>Thanks & Regards,</div>
        <div>Luís<br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Jul 11, 2018 at 10:56 AM, jimqu
          <span dir="ltr"><<a href="mailto:jimqu@amd.com" target="_blank">jimqu@amd.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <p>Hi Luis,</p>
              <p><br>
              </p>
              <p>Let us trace the issues one by one.</p>
              <p><br>
              </p>
              <p>IB test failure:</p>
              <p>This should be a regression in 4.18; you can bisect
                the kernel patches.</p>
              <p>GPU hang:</p>
              <p>Fix the IB test failure first.</p>
              <p><br>
              </p>
              <p>Thanks</p>
              <span class="gmail-m_3173675840618728897HOEnZb"><font color="#888888">
                  <p>JimQu<br>
                  </p>
                </font></span>
              <div>
                <div class="gmail-m_3173675840618728897h5">
                  <p><br>
                  </p>
                  <br>
                  <div class="gmail-m_3173675840618728897m_-5542977703135971300moz-cite-prefix">On
                    2018-07-11 17:34, Luís Mendes wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>Hi Jim,</div>
                      <div><br>
                      </div>
                      <div>Thanks for your interest in this issue.
                        Actually there are multiple issues, not only
                        the failing IB ring test: I am having
                        quite some trouble getting the SAPPHIRE RX
                        550 4GB on a Tyan S7025 and the SAPPHIRE RX 460 2GB
                        on a TYAN S7002 to work. Both systems use the same
                        Ubuntu 18.04 with a vanilla kernel.<br>
                      </div>
                      <div><br>
                      </div>
                      <div><b>1. Could you also test an earlier kernel, v4.17
                          or v4.16?</b><br>
                      </div>
                      <div>I've tested kernels v4.17.5 and v4.16.6 on the
                        same system. Both pass the IB ring test, and the
                        system boots into X using the NVIDIA card as the
                        display-connected card.</div>
                      <div>dmesg log attached for kernel 4.17.5, file
                        TYAN_S7025_kernelv4.17.5_amdgp<wbr>u_IB_ring_test_OK.txt.<br>
                      </div>
                      <div><br>
                      </div>
                      <div><b>2. Could you test the issue with only
                          amdgpu?</b></div>
                      <div>
                        <div>- I've tested on a TYAN S7002 system with a
                          single SAPPHIRE RX 460 2GB, on-board VGA
                          enabled and used as primary display.</div>
                        <div>Kernel v4.18-rc4 fails the IB ring test, but the
                          system is able to enter X through the on-board
                          VGA. <br>
                        </div>
                        <div>dmesg log attached for kernel 4.18-rc4,
                          file TYAN_S7002_kernel_v4.18-rc4_IB<wbr>_ring_test_fail.txt.</div>
                        <div><br>
                        </div>
                        <div>- Same TYAN S7002 system, but now with the
                          on-board VGA disabled and using the RX 460 as the
                          display-connected card.<br>
                        </div>
                        <div>
                          <div>Kernels v4.17.5 and v4.16.6 are able to
                            pass the IB ring test, but the GPU hangs before
                            entering X. I don't have logs for these yet.<br>
                          </div>
                          <br>
                          <div>Regards,</div>
                          <div>Luís Mendes</div>
                          <div>Aparapi contributor and MSc Researcher<br>
                          </div>
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                          <br>
                        </div>
                        <br>
                      </div>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Wed, Jul 11, 2018 at
                        3:49 AM, Qu, Jim <span dir="ltr"><<a href="mailto:Jim.Qu@amd.com" target="_blank">Jim.Qu@amd.com</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Luis,<br>
                          <br>
                          1. Could you also test an earlier kernel, v4.17 or
                          v4.16?<br>
                          2. Could you test the issue with only amdgpu?<br>
                          <br>
                          Thanks<br>
                          JimQu<br>
                          <br>
                          ______________________________<wbr>__________<br>
                          From: amd-gfx <<a href="mailto:amd-gfx-bounces@lists.freedesktop.org" target="_blank">amd-gfx-bounces@lists.freedes<wbr>ktop.org</a>>
                          on behalf of Luís Mendes <<a href="mailto:luis.p.mendes@gmail.com" target="_blank">luis.p.mendes@gmail.com</a>><br>
                          Sent: 2018-07-11 6:04:00<br>
                          To: Michel Dänzer; Koenig, Christian; amd-gfx
                          list<br>
                          Subject: Re: Regression with kernel 4.18 - AMD RX
                          550 fails IB ring test on power-up<br>
                          <span class="gmail-m_3173675840618728897m_-5542977703135971300im gmail-m_3173675840618728897m_-5542977703135971300HOEnZb"><br>
                            Hi,<br>
                            <br>
                            The issue remains in kernel 4.18-rc4 using
                            the SAPPHIRE RX 550 4GB.<br>
                            <br>
                            Logs are attached.<br>
                            <br>
                            Regards,<br>
                            Luis<br>
                            <br>
                          </span>
                          <div class="gmail-m_3173675840618728897m_-5542977703135971300HOEnZb">
                            <div class="gmail-m_3173675840618728897m_-5542977703135971300h5">On
                              Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes
                              <<a href="mailto:luis.p.mendes@gmail.com" target="_blank">luis.p.mendes@gmail.com</a>>
                              wrote:<br>
                              Hi,<br>
                              <br>
                              I've tried kernel 4.18-rc2 on a system
                              with an NVIDIA GTX 1050 Ti and an AMD RX
                              550 4GB, and the RX 550 card is failing the
                              IB ring test.<br>
                              <br>
                              [    5.033217] [drm:gfx_v8_0_ring_test_ib
                              [amdgpu]] *ERROR* amdgpu: ib test failed
                              (scratch(0xC040)=0xFFFFFFFF)<br>
                              [    5.033264] [drm:amdgpu_ib_ring_tests
                              [amdgpu]] *ERROR* amdgpu: failed testing
                              IB on ring 6 (-22).<br>
                              <br>
                              Please see the attached log.<br>
                              <br>
                              Regards,<br>
                              Luís<br>
                              <br>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div></div>