[6.4-rc7][regression] slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]

Mikhail Gavrilov mikhail.v.gavrilov at gmail.com
Wed Jun 21 07:37:43 UTC 2023


Hi,
after commit 5b711e7f9c73e5ff44d6ac865711d9a05c2a0360, I see a KASAN
sanitizer report on every boot:

Backtrace:
[   18.600551] ==================================================================
[   18.600558] BUG: KASAN: slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.600943] Write of size 8 at addr ffff8881e4d3a098 by task kworker/8:1/133

[   18.600952] CPU: 8 PID: 133 Comm: kworker/8:1 Tainted: G        W    L    -------  ---  6.4.0-0.rc7.53.fc39.x86_64+debug #1
[   18.600960] Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY, BIOS G513QY.331 02/24/2023
[   18.600966] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
[   18.601253] Call Trace:
[   18.601256]  <TASK>
[   18.601260]  dump_stack_lvl+0x76/0xd0
[   18.601267]  print_report+0xcf/0x670
[   18.601275]  ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.601573]  ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.601865]  kasan_report+0xa8/0xe0
[   18.601870]  ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.602163]  amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.602455]  gfx_v9_0_ring_emit_ib_gfx+0x4cc/0xd50 [amdgpu]
[   18.602767]  ? amdgpu_sw_ring_ib_begin+0x1b4/0x3d0 [amdgpu]
[   18.603061]  amdgpu_ib_schedule+0x7cb/0x1570 [amdgpu]
[   18.603354]  gfx_v9_0_ring_test_ib+0x375/0x540 [amdgpu]
[   18.603656]  ? __pfx_gfx_v9_0_ring_test_ib+0x10/0x10 [amdgpu]
[   18.603959]  ? __pfx_lock_acquire+0x10/0x10
[   18.603966]  amdgpu_ib_ring_tests+0x2bc/0x490 [amdgpu]
[   18.604260]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[   18.604544]  process_one_work+0x888/0x1460
[   18.604551]  ? worker_thread+0x2c8/0x12c0
[   18.604555]  ? __pfx_process_one_work+0x10/0x10
[   18.604562]  worker_thread+0x104/0x12c0
[   18.604567]  ? __kthread_parkme+0xc1/0x1f0
[   18.604573]  ? __pfx_worker_thread+0x10/0x10
[   18.604577]  kthread+0x2ee/0x3c0
[   18.604581]  ? __pfx_kthread+0x10/0x10
[   18.604586]  ret_from_fork+0x2c/0x50
[   18.604593]  </TASK>

[   18.604598] Allocated by task 466:
[   18.604601]  kasan_save_stack+0x33/0x60
[   18.604606]  kasan_set_track+0x25/0x30
[   18.604610]  __kasan_kmalloc+0x8f/0xa0
[   18.604614]  __kmalloc+0x62/0x160
[   18.604618]  amdgpu_ring_mux_init+0x6e/0x1b0 [amdgpu]
[   18.604905]  gfx_v9_0_sw_init+0xffe/0x2930 [amdgpu]
[   18.605197]  amdgpu_device_init+0x3c36/0x7fc0 [amdgpu]
[   18.605476]  amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
[   18.605753]  amdgpu_pci_probe+0x279/0x9a0 [amdgpu]
[   18.606029]  local_pci_probe+0xdd/0x190
[   18.606034]  pci_device_probe+0x23a/0x770
[   18.606039]  really_probe+0x3e2/0xb80
[   18.606044]  __driver_probe_device+0x18c/0x450
[   18.606048]  driver_probe_device+0x4a/0x120
[   18.606052]  __driver_attach+0x1e5/0x4a0
[   18.606056]  bus_for_each_dev+0x109/0x190
[   18.606061]  bus_add_driver+0x2a1/0x570
[   18.606064]  driver_register+0x134/0x460
[   18.606069]  do_one_initcall+0xd5/0x3b0
[   18.606073]  do_init_module+0x238/0x770
[   18.606079]  load_module+0x5581/0x6f10
[   18.606082]  __do_sys_init_module+0x1f2/0x220
[   18.606086]  do_syscall_64+0x60/0x90
[   18.606091]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

[   18.606099] The buggy address belongs to the object at ffff8881e4d3a000
                which belongs to the cache kmalloc-128 of size 128
[   18.606106] The buggy address is located 24 bytes to the right of
                allocated 128-byte region [ffff8881e4d3a000, ffff8881e4d3a080)

[   18.606115] The buggy address belongs to the physical page:
[   18.606119] page:00000000024dbf3d refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1e4d3a
[   18.606126] head:00000000024dbf3d order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   18.606132] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[   18.606138] page_type: 0xffffffff()
[   18.606143] raw: 0017ffffc0010200 ffff8881000428c0 dead000000000122 0000000000000000
[   18.606148] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[   18.606153] page dumped because: kasan: bad access detected

[   18.606159] Memory state around the buggy address:
[   18.606162]  ffff8881e4d39f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   18.606167]  ffff8881e4d3a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   18.606172] >ffff8881e4d3a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   18.606176]                             ^
[   18.606180]  ffff8881e4d3a100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
[   18.606184]  ffff8881e4d3a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   18.606189] ==================================================================
[   18.606201] Disabling lock debugging due to kernel taint

From the bisect log:
5b711e7f9c73e5ff44d6ac865711d9a05c2a0360 is the first bad commit
commit 5b711e7f9c73e5ff44d6ac865711d9a05c2a0360
Author: Jiadong Zhu <Jiadong.Zhu at amd.com>
Date:   Thu May 25 18:42:15 2023 +0800

    drm/amdgpu: Implement gfx9 patch functions for resubmission

    Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
    during the resubmission scenario.

    Signed-off-by: Jiadong Zhu <Jiadong.Zhu at amd.com>
    Acked-by: Alex Deucher <alexander.deucher at amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
    Cc: stable at vger.kernel.org # 6.3.x

 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 80 +++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

This appears only on my laptop, an ASUS ROG Strix G15 Advantage Edition
G513QY-HQ007 (Radeon 6800M).
I did not see this problem with the desktop Radeon 7900XTX or Radeon 6900XT.


Is there anything else I can help with?

-- 
Best Regards,
Mike Gavrilov.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.zip
Type: application/zip
Size: 45492 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20230621/0e4002bb/attachment-0001.zip>

