[bug] 5.11-rc5 brought page allocation failure issue [ttm][amdgpu]

Mikhail Gavrilov mikhail.v.gavrilov at gmail.com
Tue Feb 9 19:22:14 UTC 2021


On Mon, 8 Feb 2021 at 14:18, Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Are the other problems gone as well?
>

And yes and no.
The issue with monitor turns off was gone after rc6 (git3aaf0a27ffc2)
But both traces
1) BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196 (kernel 5.11 specific)
2) WARNING: CPU: 14 PID: 504 at kernel/locking/lockdep.c:4618
lockdep_init_map_waits+0x18b/0x210 (Navi specific)
are still happening on every boot.

1)
[    5.806032] BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196
[    5.806048] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
504, name: systemd-udevd
[    5.806064] 1 lock held by systemd-udevd/504:
[    5.806073]  #0: ffff9c5ac2e4f258 (&dev->mutex){....}-{3:3}, at:
device_driver_attach+0x3b/0xb0
[    5.806097] CPU: 14 PID: 504 Comm: systemd-udevd Not tainted
5.11.0-0.rc6.20210204git61556703b610.145.fc34.x86_64 #1
[    5.806117] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[    5.806135] Call Trace:
[    5.806142]  dump_stack+0x8b/0xb0
[    5.806153]  ___might_sleep.cold+0xb6/0xc6
[    5.806163]  ? dcn30_clock_source_create+0x34/0xb0 [amdgpu]
[    5.806338]  kmem_cache_alloc_trace+0x204/0x230
[    5.806353]  dcn30_clock_source_create+0x34/0xb0 [amdgpu]
[    5.806516]  dcn30_create_resource_pool+0x1de/0x13b0 [amdgpu]
[    5.806678]  ? rcu_read_lock_sched_held+0x3f/0x80
[    5.806690]  ? trace_kmalloc+0xb2/0xe0
[    5.806699]  ? __kmalloc+0x191/0x280
[    5.806710]  ? dc_create_resource_pool+0x110/0x1d0 [amdgpu]
[    5.806869]  dc_create_resource_pool+0x110/0x1d0 [amdgpu]
[    5.807026]  dc_create+0x205/0x790 [amdgpu]
[    5.807181]  ? trace_kmalloc+0xb2/0xe0
[    5.807190]  ? kmem_cache_alloc_trace+0x174/0x230
[    5.807203]  amdgpu_dm_init.isra.0+0x1b9/0x250 [amdgpu]
[    5.807369]  ? dev_vprintk_emit+0x171/0x195
[    5.807385]  ? dev_printk_emit+0x3e/0x40
[    5.807403]  dm_hw_init+0xe/0x20 [amdgpu]
[    5.807563]  amdgpu_device_init.cold+0x179f/0x1afd [amdgpu]
[    5.807728]  ? pci_conf1_read+0x9b/0xf0
[    5.807741]  amdgpu_driver_load_kms+0x68/0x280 [amdgpu]
[    5.807877]  amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
[    5.808009]  local_pci_probe+0x42/0x80
[    5.808020]  pci_device_probe+0xd9/0x1a0
[    5.808031]  really_probe+0xf2/0x440
[    5.808042]  driver_probe_device+0xe1/0x150
[    5.808053]  device_driver_attach+0xa8/0xb0
[    5.808063]  __driver_attach+0x8c/0x150
[    5.808071]  ? device_driver_attach+0xb0/0xb0
[    5.808080]  ? device_driver_attach+0xb0/0xb0
[    5.808090]  bus_for_each_dev+0x67/0x90
[    5.808101]  bus_add_driver+0x12e/0x1f0
[    5.808111]  driver_register+0x8f/0xe0
[    5.808119]  ? 0xffffffffc0c02000
[    5.808128]  do_one_initcall+0x67/0x320
[    5.808138]  ? rcu_read_lock_sched_held+0x3f/0x80
[    5.808148]  ? trace_kmalloc+0xb2/0xe0
[    5.808157]  ? kmem_cache_alloc_trace+0x174/0x230
[    5.808169]  do_init_module+0x5c/0x270
[    5.808179]  __do_sys_init_module+0x130/0x190
[    5.808196]  do_syscall_64+0x33/0x40
[    5.808205]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    5.808216] RIP: 0033:0x7f4d133aa40e
[    5.808225] Code: 48 8b 0d 65 1a 0c 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 32 1a 0c 00 f7 d8 64 89
01 48
[    5.808256] RSP: 002b:00007ffc81317fb8 EFLAGS: 00000246 ORIG_RAX:
00000000000000af
[    5.808272] RAX: ffffffffffffffda RBX: 0000563f79509ee0 RCX: 00007f4d133aa40e
[    5.808285] RDX: 0000563f7951daa0 RSI: 0000000000b8a85e RDI: 0000563f79f03db0
[    5.808298] RBP: 0000563f79f03db0 R08: 0000563f79509fd0 R09: 00007ffc813146be
[    5.808311] R10: 0000563a1aa70959 R11: 0000000000000246 R12: 0000563f7951daa0
[    5.808324] R13: 0000563f7950e9c0 R14: 0000000000000000 R15: 0000563f7951f100


2)
[    6.064107] BUG: key ffff9c5adb339148 has not been registered!
[    6.064119] ------------[ cut here ]------------
[    6.064121] DEBUG_LOCKS_WARN_ON(1)
[    6.064124] WARNING: CPU: 14 PID: 504 at
kernel/locking/lockdep.c:4618 lockdep_init_map_waits+0x18b/0x210
[    6.064131] Modules linked in: amdgpu(+) drm_ttm_helper ttm
iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul
crc32c_intel cec igb drm ghash_clmulni_intel ccp nvme dca i2c_algo_bit
nvme_core wmi pinctrl_amd fuse
[    6.064147] CPU: 14 PID: 504 Comm: systemd-udevd Tainted: G
W        --------- ---
5.11.0-0.rc6.20210204git61556703b610.145.fc34.x86_64 #1
[    6.064152] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 3402 01/13/2021
[    6.064156] RIP: 0010:lockdep_init_map_waits+0x18b/0x210
[    6.064159] Code: 00 85 c0 0f 84 77 ff ff ff 8b 3d 08 5e f1 01 85
ff 0f 85 69 ff ff ff 48 c7 c6 cc 98 60 9a 48 c7 c7 7d d4 5a 9a e8 51
3a b7 00 <0f> 0b e9 4f ff ff ff e8 c9 82 bd 00 85 c0 74 21 44 8b 15 d6
5d f1
[    6.064165] RSP: 0018:ffffbba701be78c8 EFLAGS: 00010292
[    6.064168] RAX: 0000000000000016 RBX: ffffffff9a247b80 RCX: 0000000000000027
[    6.064171] RDX: ffff9c61c87db2a8 RSI: 0000000000000001 RDI: ffff9c61c87db2a0
[    6.064174] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffbba701be7700
[    6.064177] R10: ffffbba701be76f8 R11: 0000000000000000 R12: ffff9c5adb339148
[    6.064180] R13: 0000000000000000 R14: ffff9c5adb610348 R15: ffff9c5adb610348
[    6.064183] FS:  00007f4d1279c340(0000) GS:ffff9c61c8600000(0000)
knlGS:0000000000000000
[    6.064186] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.064189] CR2: 0000563f79657000 CR3: 0000000111396000 CR4: 0000000000350ee0
[    6.064192] Call Trace:
[    6.064194]  __kernfs_create_file+0x7b/0x100
[    6.064198]  sysfs_add_file_mode_ns+0xa2/0x190
[    6.064202]  sysfs_create_bin_file+0x50/0x70
[    6.064205]  hdcp_create_workqueue+0x3bd/0x410 [amdgpu]
[    6.064365]  amdgpu_dm_init.isra.0.cold+0x293/0x13e7 [amdgpu]
[    6.064526]  ? dev_vprintk_emit+0x171/0x195
[    6.064529]  ? psp_set_srm+0xb0/0xb0 [amdgpu]
[    6.064691]  ? hdcp_update_display+0x1f0/0x1f0 [amdgpu]
[    6.064847]  ? dev_printk_emit+0x3e/0x40
[    6.064851]  dm_hw_init+0xe/0x20 [amdgpu]
[    6.065005]  amdgpu_device_init.cold+0x179f/0x1afd [amdgpu]
[    6.065160]  ? pci_conf1_read+0x9b/0xf0
[    6.065164]  amdgpu_driver_load_kms+0x68/0x280 [amdgpu]
[    6.065291]  amdgpu_pci_probe+0x129/0x1b0 [amdgpu]
[    6.065415]  local_pci_probe+0x42/0x80
[    6.065418]  pci_device_probe+0xd9/0x1a0
[    6.065421]  really_probe+0xf2/0x440
[    6.065425]  driver_probe_device+0xe1/0x150
[    6.065428]  device_driver_attach+0xa8/0xb0
[    6.065431]  __driver_attach+0x8c/0x150
[    6.065433]  ? device_driver_attach+0xb0/0xb0
[    6.065435]  ? device_driver_attach+0xb0/0xb0
[    6.065438]  bus_for_each_dev+0x67/0x90
[    6.065441]  bus_add_driver+0x12e/0x1f0
[    6.065445]  driver_register+0x8f/0xe0
[    6.065447]  ? 0xffffffffc0c02000
[    6.065449]  do_one_initcall+0x67/0x320
[    6.065452]  ? rcu_read_lock_sched_held+0x3f/0x80
[    6.065455]  ? trace_kmalloc+0xb2/0xe0
[    6.065458]  ? kmem_cache_alloc_trace+0x174/0x230
[    6.065462]  do_init_module+0x5c/0x270
[    6.065465]  __do_sys_init_module+0x130/0x190
[    6.065469]  do_syscall_64+0x33/0x40
[    6.065472]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    6.065475] RIP: 0033:0x7f4d133aa40e
[    6.065477] Code: 48 8b 0d 65 1a 0c 00 f7 d8 64 89 01 48 83 c8 ff
c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 32 1a 0c 00 f7 d8 64 89
01 48
[    6.065483] RSP: 002b:00007ffc81317fb8 EFLAGS: 00000246 ORIG_RAX:
00000000000000af
[    6.065487] RAX: ffffffffffffffda RBX: 0000563f79509ee0 RCX: 00007f4d133aa40e
[    6.065490] RDX: 0000563f7951daa0 RSI: 0000000000b8a85e RDI: 0000563f79f03db0
[    6.065493] RBP: 0000563f79f03db0 R08: 0000563f79509fd0 R09: 00007ffc813146be
[    6.065496] R10: 0000563a1aa70959 R11: 0000000000000246 R12: 0000563f7951daa0
[    6.065499] R13: 0000563f7950e9c0 R14: 0000000000000000 R15: 0000563f7951f100
[    6.065503] irq event stamp: 304459
[    6.065505] hardirqs last  enabled at (304459):
[<ffffffff99169d57>] console_unlock+0x527/0x640
[    6.065510] hardirqs last disabled at (304458):
[<ffffffff99169ca2>] console_unlock+0x472/0x640
[    6.065514] softirqs last  enabled at (304350):
[<ffffffff99e01152>] asm_call_irq_on_stack+0x12/0x20
[    6.065518] softirqs last disabled at (304345):
[<ffffffff99e01152>] asm_call_irq_on_stack+0x12/0x20
[    6.065522] ---[ end trace 3e996d7d10608635 ]---


Full kernel log is here: https://pastebin.com/sguf7Tac

-- 
Best Regards,
Mike Gavrilov.


More information about the amd-gfx mailing list