Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)
Christian König
christian.koenig at amd.com
Tue Feb 20 13:50:04 UTC 2024
Hi Erhard,
On 20.02.24 at 13:45, Erhard Furtner wrote:
> On Tue, 20 Feb 2024 16:12:44 +0700
> Bagas Sanjaya <bagasdotme at gmail.com> wrote:
>
>>> [ 0.000000] Linux version 6.7.5-Zen3 (root at supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
>> Is it vanilla kernel (i.e. no patches applied)? Can you also check current
>> mainline (v6.8-rc5)?
>>
>> Confused...
> Yes, this kernel was built from upstream git stable sources, no additional patches.
>
> It's just that I use my own custom kernel .config, which is why I attached it. But the kernel should run in qemu too.
Yeah, and that's probably the problem. The test is not supposed to be
compiled and executed on bare metal, but rather only as a unit test
through User Mode Linux.
We probably don't check that correctly in the Kconfig for some reason.
Can you provide your .config file?
Thanks,
Christian.
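
A quick way to check a .config against the point above is sketched below. It is a minimal illustration only, assuming the relevant symbols are CONFIG_DRM_TTM_KUNIT_TEST and CONFIG_UML. The helper names and the sample config text are hypothetical and are not taken from the attached .config.

```python
import re

def enabled_symbols(config_text):
    """Return {symbol: value} for every CONFIG_ option set to y or m."""
    pattern = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)$")
    found = {}
    for line in config_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found

def ttm_test_outside_uml(config_text):
    """True if the TTM KUnit test is enabled while CONFIG_UML is not."""
    symbols = enabled_symbols(config_text)
    return ("CONFIG_DRM_TTM_KUNIT_TEST" in symbols
            and "CONFIG_UML" not in symbols)

# Hypothetical excerpt of a bare-metal x86 config (an assumption, not the
# reporter's actual file).
sample = """\
CONFIG_X86_64=y
CONFIG_KUNIT=m
CONFIG_DRM_TTM_KUNIT_TEST=m
"""
print(ttm_test_outside_uml(sample))  # prints True
```

A True result only says that the test is enabled in a non-UML build; it does not by itself prove that this is the cause of the corruption.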
>
> Also the issue is reproducible on v6.8-rc5 (dmesg attached).
>
> Additionally I tried 'modprobe -v ttm-device-test' on v6.8-rc5 with KASAN enabled instead of KFENCE, same kernel .config otherwise. With KASAN I get a different dmesg and the test completes with a failure. And I don't seem to get memory corruption afterwards:
>
> [...]
> KTAP version 1
> 1..1
> KTAP version 1
> # Subtest: ttm_device
> # module: ttm_device_test
> 1..5
> ok 1 ttm_device_init_basic
> # ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
> Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
> list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
> num_dev == 3 (0x3)
> not ok 2 ttm_device_init_multiple
> ok 3 ttm_device_fini_basic
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 2146 at drivers/gpu/drm/ttm/ttm_device.c:206 ttm_device_init+0x23/0x281 [ttm]
> Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher amdgpu wmi_bmof amd64_edac edac_mce_amd snd_hda_codec_hdmi input_leds snd_hda_intel amdxcp snd_intel_dspcfg kvm_amd snd_hda_codec snd_hwdep snd_hda_core mfd_core snd_pcm gpu_sched snd_timer video drm_suballoc_helper snd i2c_algo_bit drm_ttm_helper gpio_amdpt soundcore ttm drm_exec button drm_display_helper rapl gpio_generic wmi drm_buddy k10temp evdev joydev lz4 lz4_compress lz4_decompress sg zram nct6775 nct6775_core hwmon_vid hwmon loop configfs hid_generic usbhid hid sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel xhci_pci libaes xhci_hcd crypto_simd ccp cryptd usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
> CPU: 5 PID: 2146 Comm: kunit_try_catch Tainted: G B N 6.8.0-rc5-Zen3 #3
> Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> RIP: 0010:ttm_device_init+0x23/0x281 [ttm]
> Code: 31 ff e9 fa e4 d5 e6 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 18 8b 44 24 50 48 89 14 24 89 44 24 0c 4d 85 c0 75 0c <0f> 0b bd ea ff ff ff e9 2f 02 00 00 48 89 fb 49 89 f7 49 89 ce 4d
> RSP: 0018:ffffc9000611fcf8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff888190184000 RCX: ffff888100651b18
> RDX: ffff88817d4a6400 RSI: ffffffffc2033d40 RDI: ffff888106abc000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff888106abc000 R14: 0000000000000000 R15: ffff888100651b18
> FS: 0000000000000000(0000) GS:ffff8887de880000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007feb67e03b20 CR3: 00000001608ac000 CR4: 0000000000b50ef0
> Call Trace:
> <TASK>
> ? __warn+0x113/0x14c
> ? ttm_device_init+0x23/0x281 [ttm]
> ? report_bug+0x1b3/0x229
> ? ttm_device_init+0x23/0x281 [ttm]
> ? handle_bug+0x3c/0x7c
> ? exc_invalid_op+0x17/0x46
> ? asm_exc_invalid_op+0x1a/0x20
> ? ttm_device_init+0x23/0x281 [ttm]
> ? local_clock_noinstr+0xc/0xa8
> ttm_device_kunit_init+0xf1/0x10f [ttm_kunit_helpers]
> ttm_device_init_no_vma_man+0x145/0x1e7 [ttm_device_test]
> ? ttm_device_init_pools+0x61e/0x61e [ttm_device_test]
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? timekeeping_get_ns+0x60/0xf8
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? ktime_get_ts64+0x68/0x109
> kunit_try_run_case+0x269/0x3cc [kunit]
> ? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_raw_spin_unlock+0x5d/0x1b6
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? kunit_try_catch_throw+0x6a/0x6a [kunit]
> ? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
> kunit_generic_run_threadfn_adapter+0x54/0x86 [kunit]
> kthread+0x25e/0x26d
> ? kthread_complete_and_exit+0x1f/0x1f
> ret_from_fork+0x23/0x54
> ? kthread_complete_and_exit+0x1f/0x1f
> ret_from_fork_asm+0x11/0x20
> </TASK>
> ---[ end trace 0000000000000000 ]---
> ok 4 ttm_device_init_no_vma_man
> KTAP version 1
> # Subtest: ttm_device_init_pools
> ok 1 No DMA allocations, no DMA32 required
> ok 2 DMA allocations, DMA32 required
> ok 3 No DMA allocations, DMA32 required
> ok 4 DMA allocations, no DMA32 required
> # ttm_device_init_pools: pass:4 fail:0 skip:0 total:4
> ok 5 ttm_device_init_pools
> # ttm_device: pass:4 fail:1 skip:0 total:5
> # Totals: pass:7 fail:1 skip:0 total:8
> not ok 1 ttm_device
> [...]
>
>
> Regards,
> Erhard
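
To pull the failing cases out of a quoted KTAP log like the one above, something along these lines may help. It is a rough sketch, not a full KTAP parser: it assumes each result line has the form "ok N name" or "not ok N name", optionally prefixed by mail quote markers, and the function name is hypothetical.

```python
import re

# Matches KTAP result lines such as "ok 1 name" or "not ok 2 name".
RESULT = re.compile(r"^(not ok|ok) (\d+) (.+)$")

def failed_tests(log_text):
    """Return the names of all tests reported as 'not ok' in the log."""
    failures = []
    for line in log_text.splitlines():
        # Drop mail quote markers ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = RESULT.match(line)
        if match and match.group(1) == "not ok":
            failures.append(match.group(3))
    return failures

# A few result lines copied from the log above.
log = """\
> ok 1 ttm_device_init_basic
> not ok 2 ttm_device_init_multiple
> ok 3 ttm_device_fini_basic
> not ok 1 ttm_device
"""
print(failed_tests(log))  # prints ['ttm_device_init_multiple', 'ttm_device']
```

Note that the suite-level line ("not ok 1 ttm_device") is reported alongside the individual case, since the sketch does not track nesting.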