Kernel crash at reloading amdgpu
Lin, Amber
Amber.Lin at amd.com
Thu May 9 14:20:02 UTC 2019
Thank you Alex! It does fix the crash. (GPU post failed following that but at least it exits gracefully.)
Regards,
Amber
On 2019-05-08 10:48 p.m., Deucher, Alexander wrote:
The attached patch should fix it.
Alex
________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Lin, Amber <Amber.Lin at amd.com>
Sent: Wednesday, May 8, 2019 4:56 PM
To: amd-gfx at lists.freedesktop.org
Subject: Kernel crash at reloading amdgpu
Hi,
When I do "rmmod amdgpu; modprobe amdgpu", the kernel crashes. This is on
vega20. What happens in amdgpu_device_init() is:
    /* check if we need to reset the asic
     * E.g., driver was not cleanly unloaded previously, etc.
     */
    if (!amdgpu_sriov_vf(adev) &&
        amdgpu_asic_need_reset_on_init(adev)) {
        r = amdgpu_asic_reset(adev);
        if (r) {
            dev_err(adev->dev, "asic reset on init failed\n");
            goto failed;
        }
    }
amdgpu_asic_need_reset_on_init()/soc15_need_reset_on_init() returns true
and it goes to amdgpu_asic_reset()/soc15_asic_mode1_reset(), where it
calls psp_gpu_reset():
    int psp_gpu_reset(struct amdgpu_device *adev)
    {
        if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP)
            return 0;
        return psp_mode1_reset(&adev->psp);
    }
Here, however, psp_mode1_reset is NOT assigned as
psp_v11_0_mode1_reset() until amdgpu_device_ip_init(), which runs after
amdgpu_asic_reset(). This null function pointer causes the kernel crash,
and I have to reboot my system.
Does anyone have an idea how to fix this properly?
BTW this is the log:
[ 157.686303] PGD 0 P4D 0
[ 157.688837] Oops: 0000 [#1] SMP PTI
[ 157.692331] CPU: 0 PID: 1902 Comm: kworker/0:2 Tainted: G W 5.0.0-rc1-kfd+ #6
[ 157.700760] Hardware name: ASUS All Series/X99-E WS, BIOS 1302 01/05/2016
[ 157.707543] Workqueue: events work_for_cpu_fn
[ 157.711976] RIP: 0010:psp_gpu_reset+0x18/0x30 [amdgpu]
[ 157.717106] Code: ff ff ff 5b c3 b8 ea ff ff ff c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 83 bf c8 22 01 00 02 74 03 31 c0 c3 48 8b 87 c0 23 01 00 <48> 8b 40 50 48 85 c0 74 ed 48 81 c7 88 23 01 00 e9 03 3b 8d d6 0f
[ 157.735852] RSP: 0018:ffffaa2544243ce0 EFLAGS: 00010246
[ 157.741077] RAX: 0000000000000000 RBX: ffff97e946f60000 RCX: 0000000000000000
[ 157.748202] RDX: 0000000000000027 RSI: ffffffff976655a0 RDI: ffff97e946f60000
[ 157.755326] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
[ 157.762459] R10: ffffaa2544243ba0 R11: 38a79ac3ec19edd5 R12: ffff97e946f75088
[ 157.769608] R13: 000000000000000a R14: ffff97e946f75128 R15: 0000000000000001
[ 157.776741] FS: 0000000000000000(0000) GS:ffff97e94f800000(0000) knlGS:0000000000000000
[ 157.784827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 157.790564] CR2: 0000000000000050 CR3: 00000008083e6003 CR4: 00000000001606f0
[ 157.797696] Call Trace:
[ 157.800184] soc15_asic_reset+0x81/0x1f0 [amdgpu]
[ 157.804936] amdgpu_device_init+0xcf1/0x1800 [amdgpu]
[ 157.809993] ? rcu_read_lock_sched_held+0x74/0x80
[ 157.814734] amdgpu_driver_load_kms+0x65/0x270 [amdgpu]
Thanks.
Regards,
Amber