<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Thank you, Alex! It does fix the crash. (GPU post failed after that, but at least it exits gracefully.)<br>
<br>
Regards,<br>
Amber<br>
<br>
<div class="moz-cite-prefix">On 2019-05-08 10:48 p.m., Deucher, Alexander wrote:<br>
</div>
<blockquote type="cite" cite="mid:BN6PR12MB1809AAA827BDC68A79EBFF14F7330@BN6PR12MB1809.namprd12.prod.outlook.com">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
The attached patch should fix it.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Alex</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> amd-gfx
<a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org">
&lt;amd-gfx-bounces@lists.freedesktop.org&gt;</a> on behalf of Lin, Amber <a class="moz-txt-link-rfc2396E" href="mailto:Amber.Lin@amd.com">
&lt;Amber.Lin@amd.com&gt;</a><br>
<b>Sent:</b> Wednesday, May 8, 2019 4:56 PM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">
amd-gfx@lists.freedesktop.org</a><br>
<b>Subject:</b> Kernel crash at reloading amdgpu</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">
Hi,<br>
<br>
When I run "rmmod amdgpu; modprobe amdgpu", the kernel crashes. This is on<br>
vega20. Here is what happens in amdgpu_device_init():<br>
<br>
<br>
/* check if we need to reset the asic<br>
 * E.g., driver was not cleanly unloaded previously, etc.<br>
 */<br>
if (!amdgpu_sriov_vf(adev) &&<br>
    amdgpu_asic_need_reset_on_init(adev)) {<br>
    r = amdgpu_asic_reset(adev);<br>
    if (r) {<br>
        dev_err(adev->dev, "asic reset on init failed\n");<br>
        goto failed;<br>
    }<br>
}<br>
<br>
amdgpu_asic_need_reset_on_init()/soc15_need_reset_on_init() returns true<br>
and it goes to amdgpu_asic_reset()/soc15_asic_mode1_reset(), where it<br>
calls psp_gpu_reset():<br>
<br>
int psp_gpu_reset(struct amdgpu_device *adev)<br>
{<br>
    if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)<br>
        return 0;<br>
<br>
    return psp_mode1_reset(&adev->psp);<br>
}<br>
<br>
At this point, however, psp_mode1_reset has NOT yet been assigned to<br>
psp_v11_0_mode1_reset(). That only happens in amdgpu_device_ip_init(),<br>
which runs after amdgpu_asic_reset(). Calling through the null function<br>
pointer crashes the kernel, and I have to reboot my system.<br>
<br>
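To illustrate the ordering problem outside the driver, here is a minimal<br>
standalone sketch. All types and names in it (psp_sketch, psp_funcs_sketch,<br>
guarded_reset, and so on) are hypothetical stand-ins, not the real amdgpu<br>
definitions, and the guard is only one possible way to fail gracefully,<br>
not necessarily what an actual fix would do:<br>
<br>
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu structures. */
struct psp_sketch;

struct psp_funcs_sketch {
    int (*mode1_reset)(struct psp_sketch *psp);
};

struct psp_sketch {
    /* Stays NULL until the (simulated) IP init step installs the table. */
    const struct psp_funcs_sketch *funcs;
};

/* Pretend hardware reset that always succeeds. */
static int sketch_mode1_reset(struct psp_sketch *psp)
{
    (void)psp;
    return 0;
}

static const struct psp_funcs_sketch sketch_funcs = {
    .mode1_reset = sketch_mode1_reset,
};

/* Check the table and the callback before calling through them. */
static int guarded_reset(struct psp_sketch *psp)
{
    if (!psp->funcs || !psp->funcs->mode1_reset)
        return -EINVAL; /* not initialized yet: fail instead of crashing */
    return psp->funcs->mode1_reset(psp);
}

/* Walk through both orderings: reset before init, then reset after init. */
static void self_check(void)
{
    struct psp_sketch psp = { .funcs = NULL };

    assert(guarded_reset(&psp) == -EINVAL); /* reset requested too early */
    psp.funcs = &sketch_funcs;              /* simulated IP init */
    assert(guarded_reset(&psp) == 0);       /* reset now goes through */
}
```
<br>
In this sketch, a reset requested before the table is installed returns an<br>
error instead of dereferencing a null pointer. Once the table is in place,<br>
the same call reaches the callback.<br>
<br>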
Does anyone have an idea how to fix this properly?<br>
<br>
BTW this is the log:<br>
<br>
[ 157.686303] PGD 0 P4D 0<br>
[ 157.688837] Oops: 0000 [#1] SMP PTI<br>
[ 157.692331] CPU: 0 PID: 1902 Comm: kworker/0:2 Tainted: G W 5.0.0-rc1-kfd+ #6<br>
[ 157.700760] Hardware name: ASUS All Series/X99-E WS, BIOS 1302 01/05/2016<br>
[ 157.707543] Workqueue: events work_for_cpu_fn<br>
[ 157.711976] RIP: 0010:psp_gpu_reset+0x18/0x30 [amdgpu]<br>
[ 157.717106] Code: ff ff ff 5b c3 b8 ea ff ff ff c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 83 bf c8 22 01 00 02 74 03 31 c0 c3 48 8b 87 c0 23 01 00 <48> 8b 40 50 48 85 c0 74 ed 48 81 c7 88 23 01 00 e9 03 3b 8d d6 0f<br>
[ 157.735852] RSP: 0018:ffffaa2544243ce0 EFLAGS: 00010246<br>
[ 157.741077] RAX: 0000000000000000 RBX: ffff97e946f60000 RCX: 0000000000000000<br>
[ 157.748202] RDX: 0000000000000027 RSI: ffffffff976655a0 RDI: ffff97e946f60000<br>
[ 157.755326] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002<br>
[ 157.762459] R10: ffffaa2544243ba0 R11: 38a79ac3ec19edd5 R12: ffff97e946f75088<br>
[ 157.769608] R13: 000000000000000a R14: ffff97e946f75128 R15: 0000000000000001<br>
[ 157.776741] FS: 0000000000000000(0000) GS:ffff97e94f800000(0000) knlGS:0000000000000000<br>
[ 157.784827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033<br>
[ 157.790564] CR2: 0000000000000050 CR3: 00000008083e6003 CR4: 00000000001606f0<br>
[ 157.797696] Call Trace:<br>
[ 157.800184] soc15_asic_reset+0x81/0x1f0 [amdgpu]<br>
[ 157.804936] amdgpu_device_init+0xcf1/0x1800 [amdgpu]<br>
[ 157.809993] ? rcu_read_lock_sched_held+0x74/0x80<br>
[ 157.814734] amdgpu_driver_load_kms+0x65/0x270 [amdgpu]<br>
<br>
Thanks.<br>
<br>
Regards,<br>
Amber<br>
_______________________________________________<br>
amd-gfx mailing list<br>
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a><br>
</div>
</span></font></div>
</blockquote>
<br>
</body>
</html>