<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi Luis,<br>
<br>
well, what "drm/amdgpu: defer test IBs on the rings at boot (V3)"
does is delay the IB test a bit and run it asynchronously to the
rest of the boot-up.<br>
<br>
So what most likely happens is that some hardware feature (like
power or clock gating) which doesn't work correctly on your
system kicks in and causes the IB test to fail.<br>
<br>
It's rather likely that this problem is also responsible for the
crashes you experience later on, so I think we should concentrate on
fixing that.<br>
<br>
Regards,<br>
Christian.<br>
<br>
On 11.07.2018 at 23:27, Luís Mendes wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAEzXK1q5OdOaYzLRw9YWmAeOUQxGjb2R8jTEjQX1SPNhTGsS_w@mail.gmail.com">
<div dir="ltr">
<div>Hi Jim,</div>
<div><br>
</div>
<div>I followed your suggestion and was able to bisect the
kernel patches.</div>
<div>The offending patch is: drm/amdgpu: defer test IBs on the
rings at boot (V3)<br>
</div>
<div>commit:
<table summary="commit info" class="gmail-commit-info">
<tbody>
<tr>
<th><br>
</th>
<td colspan="2" class="gmail-sha1"><a
href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.18-rc4&id=2c773de2ecb8c327f2448bd1eecad224e9227087"
moz-do-not-send="true">2c773de2ecb8c327f2448bd1eecad224e9227087</a></td>
</tr>
</tbody>
</table>
</div>
<div><br>
</div>
<div>After reverting this patch, the IB test succeeded with
kernel v4.18-rc4 on both systems, and the amdgpu driver
loaded correctly on both the SAPPHIRE RX 550 4GB and the SAPPHIRE
RX 460 2GB.</div>
<div><br>
</div>
<div>The GPU hang remains, however.<br>
</div>
<div>I will try to configure a remote IPMI connection, or set up a
serial console for the kernel, to see what is happening during
boot.</div>
<div><br>
</div>
<div>Thanks & Regards,</div>
<div>Luís<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Jul 11, 2018 at 10:56 AM, jimqu
<span dir="ltr"><<a href="mailto:jimqu@amd.com"
target="_blank" moz-do-not-send="true">jimqu@amd.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>Hi Luis,</p>
<p><br>
</p>
<p>Let us trace the issues one by one.</p>
<p><br>
</p>
<p>IB test failure:</p>
<p>This should be a regression in 4.18; you can bisect
the kernel patches.</p>
<p>GPU hang:</p>
<p>Fix the IB test failure first.</p>
<p><br>
</p>
<p>Thanks</p>
<span class="HOEnZb"><font color="#888888">
<p>JimQu<br>
</p>
</font></span>
<div>
<div class="h5">
<p><br>
</p>
<br>
<div class="m_-5542977703135971300moz-cite-prefix">On
2018-07-11 17:34, Luís Mendes wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Jim,</div>
<div><br>
</div>
<div>Thanks for your interest in this issue.
Actually there are multiple issues here, not only
the failing IB ring test: I am having
quite some trouble getting the SAPPHIRE RX
550 4GB on a Tyan S7025 and the SAPPHIRE RX 460 2GB
on a TYAN S7002 to work, both systems running the same
Ubuntu 18.04 with a vanilla kernel.<br>
</div>
<div><br>
</div>
<div><b>1. Could you also test an earlier kernel, v4.17
or v4.16?</b><br>
</div>
<div>I've tested kernels v4.17.5 and v4.16.6 on the
same system; both pass the IB
ring test and the system boots into X using the NVIDIA
card as the display-connected card.</div>
<div>dmesg log attached for kernel 4.17.5, file
TYAN_S7025_kernelv4.17.5_<wbr>amdgpu_IB_ring_test_OK.txt.<br>
</div>
<div><br>
</div>
<div><b>2. Could you test the issue with only
amdgpu?</b></div>
<div>
<div>- I've tested on a TYAN S7002 system with a
single SAPPHIRE RX 460 2GB, on-board VGA
enabled and used as primary display.</div>
<div>Kernel v4.18-rc4 fails the IB ring test, but the
system is able to enter X through the on-board
VGA.<br>
</div>
<div>dmesg log attached for kernel 4.18-rc4,
file TYAN_S7002_kernel_v4.18-rc4_<wbr>IB_ring_test_fail.txt.</div>
<div><br>
</div>
<div>- Same TYAN S7002 system, but now with
on-board VGA disabled and using the RX 460 as
the display-connected card.<br>
</div>
<div>
<div>Kernels v4.17.5 and v4.16.6 are able to
pass the IB ring test, but the GPU hangs before
entering X. I don't have logs for these yet.<br>
</div>
<br>
<div>Regards,</div>
<div>Luís Mendes</div>
<div>Aparapi contributor and MSc Researcher<br>
</div>
<div><br>
</div>
<div><br>
</div>
<br>
</div>
<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Jul 11, 2018 at
3:49 AM, Qu, Jim <span dir="ltr"><<a
href="mailto:Jim.Qu@amd.com" target="_blank"
moz-do-not-send="true">Jim.Qu@amd.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">Hi Luis,<br>
<br>
1. Could you also test an earlier kernel, v4.17 or
v4.16?<br>
2. Could you test the issue with only amdgpu?<br>
<br>
Thanks<br>
JimQu<br>
<br>
______________________________<wbr>__________<br>
From: amd-gfx <<a
href="mailto:amd-gfx-bounces@lists.freedesktop.org"
target="_blank" moz-do-not-send="true">amd-gfx-bounces@lists.freedes<wbr>ktop.org</a>>
on behalf of Luís Mendes <<a
href="mailto:luis.p.mendes@gmail.com"
target="_blank" moz-do-not-send="true">luis.p.mendes@gmail.com</a>><br>
Sent: 2018-07-11 6:04:00<br>
To: Michel Dänzer; Koenig, Christian; amd-gfx
list<br>
Subject: Re: Regression with kernel 4.18 - AMD RX
550 fails IB ring test on power-up<br>
<span class="m_-5542977703135971300im
m_-5542977703135971300HOEnZb"><br>
Hi,<br>
<br>
Issue remains in kernel 4.18-rc4 using
SAPPHIRE RX 550 4GB.<br>
<br>
Logs follow attached.<br>
<br>
Regards,<br>
Luis<br>
<br>
</span>
<div class="m_-5542977703135971300HOEnZb">
<div class="m_-5542977703135971300h5">On
Tue, Jun 26, 2018 at 10:08 AM, Luís Mendes
<<a
href="mailto:luis.p.mendes@gmail.com"
target="_blank" moz-do-not-send="true">luis.p.mendes@gmail.com</a>>
wrote:<br>
Hi,<br>
<br>
I've tried kernel 4.18-rc2 on a system
with an NVIDIA GTX 1050 Ti and an AMD RX
550 4GB, and the RX 550 card is failing the
IB ring test.<br>
<br>
[ 5.033217] [drm:gfx_v8_0_ring_test_ib
[amdgpu]] *ERROR* amdgpu: ib test failed
(scratch(0xC040)=0xFFFFFFFF)<br>
[ 5.033264] [drm:amdgpu_ib_ring_tests
[amdgpu]] *ERROR* amdgpu: failed testing
IB on ring 6 (-22).<br>
<br>
Please see the attached log.<br>
<br>
Regards,<br>
Luís<br>
<br>
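Editorial sketch, not part of the original message: the per-ring
failure line quoted above can be picked out of a saved dmesg log with
a short script, which may help when comparing logs across bisect
steps.<br>
The function name <tt>failed_ib_rings</tt> is hypothetical, and the
pattern assumes the message format is exactly the one shown in the two
quoted lines; other kernel versions may word it differently.<br>
<pre>
```python
import re

# Per-ring failure message, as quoted in the log above, e.g.
# "... *ERROR* amdgpu: failed testing IB on ring 6 (-22)."
# Assumption: the wording matches the quoted 4.18-rc2 log.
RING_FAIL = re.compile(r"failed testing IB on ring (\d+) \((-?\d+)\)")


def failed_ib_rings(log_text):
    """Return (ring, error_code) pairs for each IB test failure found."""
    # Collapse all whitespace so a message wrapped across lines
    # (as in a mail client) still matches the pattern.
    flat = " ".join(log_text.split())
    return [(int(ring), int(err)) for ring, err in RING_FAIL.findall(flat)]


if __name__ == "__main__":
    sample = (
        "[ 5.033217] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: "
        "ib test failed (scratch(0xC040)=0xFFFFFFFF)\n"
        "[ 5.033264] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: "
        "failed testing IB on ring 6 (-22).\n"
    )
    print(failed_ib_rings(sample))
```
</pre>
Run against the two quoted lines, this reports ring 6 with error code
-22; an empty list means no such failure line was found.<br>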
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>