Amdgpu kernel oops and freezing on system suspend and hibernate
Harvey
harv at gmx.de
Tue Mar 23 15:26:50 UTC 2021
Alex,
thanks for the hint, but is this patch intended for kernel 5.11.8?
I applied the patch on top of 5.11.8 and the system freezes again:
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=615, emitted seq=617
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Mär 23 16:18:51 obelix kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Mär 23 16:18:51 obelix kernel: BUG: kernel NULL pointer dereference, address: 0000000000000029
Mär 23 16:18:51 obelix kernel: #PF: supervisor read access in kernel mode
Mär 23 16:18:51 obelix kernel: #PF: error_code(0x0000) - not-present page
Mär 23 16:18:51 obelix kernel: PGD 0 P4D 0
Mär 23 16:18:51 obelix kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mär 23 16:18:51 obelix kernel: CPU: 12 PID: 178 Comm: kworker/12:1 Not tainted 5.11.8-arch1-1-custom #1
Mär 23 16:18:51 obelix kernel: Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.117 10/29/2020
Mär 23 16:18:51 obelix kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
Mär 23 16:18:51 obelix kernel: RSP: 0018:ffffa35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
Mär 23 16:18:51 obelix kernel: RDX: 000000000080005c RSI: 0000000000000001 RDI: ffff8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: ffff8b044a877bc0 R08: 0000000000000001 R09: 0000000000000000
Mär 23 16:18:51 obelix kernel: R10: 0000000000000000 R11: ffffffffafccba00 R12: ffff8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: ffff8b044bf60000 R14: ffff8b04414a1000 R15: ffff8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS: 0000000000000000(0000) GS:ffff8b075f900000(0000) knlGS:0000000000000000
Mär 23 16:18:51 obelix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029 CR3: 00000001ab010000 CR4: 0000000000350ee0
Mär 23 16:18:51 obelix kernel: Call Trace:
Mär 23 16:18:51 obelix kernel: stop_cpsch+0xa0/0xc0 [amdgpu]
Mär 23 16:18:51 obelix kernel: kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
Mär 23 16:18:51 obelix kernel: kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
Mär 23 16:18:51 obelix kernel: amdgpu_device_gpu_recover.cold+0x36e/0x95d [amdgpu]
Mär 23 16:18:51 obelix kernel: amdgpu_job_timedout+0x121/0x140 [amdgpu]
Mär 23 16:18:51 obelix kernel: drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Mär 23 16:18:51 obelix kernel: process_one_work+0x214/0x3e0
Mär 23 16:18:51 obelix kernel: worker_thread+0x4d/0x3d0
Mär 23 16:18:51 obelix kernel: ? rescuer_thread+0x3c0/0x3c0
Mär 23 16:18:51 obelix kernel: kthread+0x133/0x150
Mär 23 16:18:51 obelix kernel: ? __kthread_bind_mask+0x60/0x60
Mär 23 16:18:51 obelix kernel: ret_from_fork+0x22/0x30
Mär 23 16:18:51 obelix kernel: Modules linked in: rfcomm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi cmac algif_hash snd_hda_intel algif_skcipher snd_intel_dspcfg soundwire_intel af_alg soundwire_ge>
Mär 23 16:18:51 obelix kernel: sr_mod cdrom uas usb_storage dm_crypt cbc encrypted_keys dm_mod trusted tpm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper serio_raw ccp xhc>
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029
Mär 23 16:18:51 obelix kernel: ---[ end trace 8a72c5e07cbe6b63 ]---
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
Mär 23 16:18:51 obelix kernel: RSP: 0018:ffffa35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
Mär 23 16:18:51 obelix kernel: RDX: 000000000080005c RSI: 0000000000000001 RDI: ffff8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: ffff8b044a877bc0 R08: 0000000000000001 R09: 0000000000000000
Mär 23 16:18:51 obelix kernel: R10: 0000000000000000 R11: ffffffffafccba00 R12: ffff8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: ffff8b044bf60000 R14: ffff8b04414a1000 R15: ffff8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS: 0000000000000000(0000) GS:ffff8b075f900000(0000) knlGS:0000000000000000
Mär 23 16:18:51 obelix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029 CR3: 0000000105594000 CR4: 0000000000350ee0
Mär 23 16:19:10 obelix systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mär 23 16:19:10 obelix audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mär 23 16:19:10 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
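
For what it's worth, the faulting address in the oops can be cross-checked against the register dump. The following is only a minimal Python sketch, assuming the marked byte 8b in the Code: line and the two bytes after it form the faulting instruction. The helper name decode_mov_load_disp8 is made up for illustration and handles just this one encoding; it is not a general x86 disassembler. All constants are copied from the log above.

```python
# Sketch: check that CR2 equals the base register plus the displacement
# encoded in the faulting instruction. Constants are copied from the oops.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp8(insn: bytes):
    """Decode `mov r32, [r64 + disp8]` (opcode 0x8b, ModRM mod=01, no SIB).

    Hypothetical helper for this one instruction form only.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B:
        raise ValueError("not a 'mov r32, r/m32' opcode")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG64[rm], disp


faulting_insn = bytes.fromhex("8b5028")  # bytes starting at the marked 8b
registers = {"rax": 0x0000000000000001, "rdi": 0xFFFF8B044A877BC0}
cr2 = 0x0000000000000029

dest, base, disp = decode_mov_load_disp8(faulting_insn)
fault_address = registers[base] + disp
print(f"mov {dest}, [{base}+{disp:#x}] -> address {fault_address:#x}")
assert fault_address == cr2
```

If that reading is right, the pointer loaded into RAX just before the fault (the preceding bytes 48 8b 47 10 decode to mov rax, [rdi+0x10]) held the value 1 rather than a valid address, and reading offset 0x28 from it gives the 0x29 reported in CR2.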
Greetings
Harvey
On 22.03.21 at 20:22, Alex Deucher wrote:
> On Thu, Mar 18, 2021 at 8:19 AM Harvey <harv at gmx.de> wrote:
>>
>> Alex,
>>
>> I waited for kernel 5.11.7 to hit our repos yesterday evening and tested
>> again:
>>
>> 1. The suspend issue is gone - suspend and resume now work as expected.
>>
>> 2. System hibernation seems to be a different beast - still freezing
>
> You need this patch:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/711c13547aad08f2cfe996e0cddc3d56f1233081
>
> Alex
--
I am root. If you see me laughing, you'd better have a backup!