Amdgpu kernel oops and freezing on system suspend and hibernate

Harvey harv at gmx.de
Tue Mar 23 15:26:50 UTC 2021


Alex,

Thanks for the hint, but is this patch intended for kernel 5.11.8?

I applied it on top of 5.11.8 and the system freezes again:


Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=615, emitted seq=617
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Mär 23 16:18:51 obelix kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Mär 23 16:18:51 obelix kernel: BUG: kernel NULL pointer dereference, address: 0000000000000029
Mär 23 16:18:51 obelix kernel: #PF: supervisor read access in kernel mode
Mär 23 16:18:51 obelix kernel: #PF: error_code(0x0000) - not-present page
Mär 23 16:18:51 obelix kernel: PGD 0 P4D 0
Mär 23 16:18:51 obelix kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mär 23 16:18:51 obelix kernel: CPU: 12 PID: 178 Comm: kworker/12:1 Not tainted 5.11.8-arch1-1-custom #1
Mär 23 16:18:51 obelix kernel: Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.117 10/29/2020
Mär 23 16:18:51 obelix kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
Mär 23 16:18:51 obelix kernel: RSP: 0018:ffffa35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
Mär 23 16:18:51 obelix kernel: RDX: 000000000080005c RSI: 0000000000000001 RDI: ffff8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: ffff8b044a877bc0 R08: 0000000000000001 R09: 0000000000000000
Mär 23 16:18:51 obelix kernel: R10: 0000000000000000 R11: ffffffffafccba00 R12: ffff8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: ffff8b044bf60000 R14: ffff8b04414a1000 R15: ffff8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS:  0000000000000000(0000) GS:ffff8b075f900000(0000) knlGS:0000000000000000
Mär 23 16:18:51 obelix kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029 CR3: 00000001ab010000 CR4: 0000000000350ee0
Mär 23 16:18:51 obelix kernel: Call Trace:
Mär 23 16:18:51 obelix kernel:  stop_cpsch+0xa0/0xc0 [amdgpu]
Mär 23 16:18:51 obelix kernel:  kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
Mär 23 16:18:51 obelix kernel:  kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
Mär 23 16:18:51 obelix kernel:  amdgpu_device_gpu_recover.cold+0x36e/0x95d [amdgpu]
Mär 23 16:18:51 obelix kernel:  amdgpu_job_timedout+0x121/0x140 [amdgpu]
Mär 23 16:18:51 obelix kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Mär 23 16:18:51 obelix kernel:  process_one_work+0x214/0x3e0
Mär 23 16:18:51 obelix kernel:  worker_thread+0x4d/0x3d0
Mär 23 16:18:51 obelix kernel:  ? rescuer_thread+0x3c0/0x3c0
Mär 23 16:18:51 obelix kernel:  kthread+0x133/0x150
Mär 23 16:18:51 obelix kernel:  ? __kthread_bind_mask+0x60/0x60
Mär 23 16:18:51 obelix kernel:  ret_from_fork+0x22/0x30
Mär 23 16:18:51 obelix kernel: Modules linked in: rfcomm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi cmac algif_hash snd_hda_intel algif_skcipher snd_intel_dspcfg soundwire_intel af_alg soundwire_ge>
Mär 23 16:18:51 obelix kernel:  sr_mod cdrom uas usb_storage dm_crypt cbc encrypted_keys dm_mod trusted tpm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper serio_raw ccp xhc>
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029
Mär 23 16:18:51 obelix kernel: ---[ end trace 8a72c5e07cbe6b63 ]---
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
Mär 23 16:18:51 obelix kernel: RSP: 0018:ffffa35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
Mär 23 16:18:51 obelix kernel: RDX: 000000000080005c RSI: 0000000000000001 RDI: ffff8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: ffff8b044a877bc0 R08: 0000000000000001 R09: 0000000000000000
Mär 23 16:18:51 obelix kernel: R10: 0000000000000000 R11: ffffffffafccba00 R12: ffff8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: ffff8b044bf60000 R14: ffff8b04414a1000 R15: ffff8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS:  0000000000000000(0000) GS:ffff8b075f900000(0000) knlGS:0000000000000000
Mär 23 16:18:51 obelix kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029 CR3: 0000000105594000 CR4: 0000000000350ee0
Mär 23 16:19:10 obelix systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mär 23 16:19:10 obelix audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mär 23 16:19:10 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
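
As a cross-check of the oops above, here is a minimal sketch (not a diagnosis) that decodes the faulting instruction from the "Code:" line and compares the resulting address with CR2. The byte string and register values are copied from the log; the helper names are made up for illustration, and the decoder only handles the one simple encoding that appears here (opcode 0x8b with an 8-bit displacement, no REX prefix, no SIB byte). Under those assumptions the marked bytes decode to a 32-bit load from [rax + 0x28], and with RAX = 1 that gives 0x29, which matches the reported fault address. The preceding bytes 48 8b 47 10 load RAX from [rdi + 0x10], so it looks as if the pointer stored at offset 0x10 of the first argument held the value 1 instead of a valid address.

```python
# Bytes around the fault, copied from the "Code:" line of the oops.
# The <..> marker is the first byte of the faulting instruction.
CODE = "55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78"

# Register values copied from the register dump.
RAX = 0x0000000000000001
CR2 = 0x0000000000000029

# Register numbering for ModRM fields when no REX prefix is present.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code):
    """Return the bytes from the <xx> marker onward as a list of ints."""
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_disp8(b):
    """Decode 'mov r32, [base + disp8]' (opcode 0x8b, ModRM mod=01, no SIB).

    Returns (destination register index, base register index, displacement).
    """
    opcode, modrm, disp = b[0], b[1], b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("not a simple mov r32, [reg + disp8]")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return reg, rm, disp


reg, rm, disp = decode_mov_disp8(faulting_bytes(CODE))
assert REG64[rm] == "rax"  # the base register of the load is RAX
fault_address = RAX + disp

print(f"load from [{REG64[rm]} + {disp:#x}] = {fault_address:#x}, "
      f"CR2 = {CR2:#x}, match = {fault_address == CR2}")
```

For a full disassembly, the kernel tree's scripts/decodecode can be fed the same "Code:" line.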

Greetings
Harvey

On 22.03.21 at 20:22, Alex Deucher wrote:
> On Thu, Mar 18, 2021 at 8:19 AM Harvey <harv at gmx.de> wrote:
>>
>> Alex,
>>
>> I waited for kernel 5.11.7 to hit our repos yesterday evening and tested
>> again:
>>
>> 1. The suspend issue is gone - suspend and resume now work as expected.
>>
>> 2. System hibernation seems to be a different beast - still freezing
> 
> You need this patch:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/711c13547aad08f2cfe996e0cddc3d56f1233081
> 
> Alex

-- 
I am root. If you see me laughing, you'd better have a backup!
