BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

Christian König ckoenig.leichtzumerken at gmail.com
Wed Apr 26 11:50:42 UTC 2023


Sending this once more from my mailing list address, since AMD's internal
servers are blocking the mail.

Regards,
Christian.

On 26.04.23 at 13:48, Christian König wrote:
> WTF? I owe you a beer!
>
> I fixed exactly that problem during the review of the cleanup patch,
> and because of that I didn't consider that the old code was still
> there.
>
> It also explains why we didn't see this in our testing.
>
> @Mikhail, can you test that patch with drm-misc-next?
>
> Thanks,
> Christian.
>
> On 26.04.23 at 04:00, Chen, Guchun wrote:
>> After reviewing the whole history, I think the attached patch may
>> fix your problem. Could you please give it a try?
>>
>> Regards,
>> Guchun
>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Mikhail Gavrilov
>>> Sent: Tuesday, April 25, 2023 9:20 PM
>>> To: Koenig, Christian <Christian.Koenig at amd.com>
>>> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>; dri-devel <dri-devel at lists.freedesktop.org>; amd-gfx list <amd-gfx at lists.freedesktop.org>; Linux List Kernel Mailing <linux-kernel at vger.kernel.org>
>>> Subject: Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
>>>
>>> On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov
>>> <mikhail.v.gavrilov at gmail.com> wrote:
>>>> Importantly, don't give up.
>>>> https://youtu.be/25zhHBGIHJ8 [40 min]
>>>> https://youtu.be/utnDR26eYBY [50 min]
>>>> https://youtu.be/DJQ_tiimW6g [12 min]
>>>> https://youtu.be/Y6AH1oJKivA [6 min]
>>>> Yes, the issue is always reproducible, but sometimes it does not
>>>> happen on the first attempt.
>>>> I also uploaded other videos which prove that the issue definitely
>>>> exists if someone launches those games one after another.
>>>> Reproducing it is only a matter of time.
>>>>
>>>> Anyway, I didn't want you to spend so much time trying to reproduce it.
>>>> This monkey business suits me better than you.
>>>> It would be better if I could collect more useful info.
>>> Christian,
>>> Did you manage to reproduce the problem?
>>>
>>> Over the weekend I hit a slab-use-after-free in
>>> amdgpu_vm_handle_moved.
>>> I wasn't playing any games at the time.
>>> The Xwayland process was affected, so it led to a desktop hang.
>>>
>>> ==================================================================
>>> BUG: KASAN: slab-use-after-free in amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>>> Read of size 8 at addr ffff888295c66190 by task Xwayland:cs0/173185
>>>
>>> CPU: 21 PID: 173185 Comm: Xwayland:cs0 Tainted: G        W L -------  --- 6.3.0-0.rc7.20230420gitcb0856346a60.59.fc39.x86_64+debug #1
>>> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4601 02/02/2023
>>> Call Trace:
>>>   <TASK>
>>>   dump_stack_lvl+0x76/0xd0
>>>   print_report+0xcf/0x670
>>>   ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>>>   ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>>>   kasan_report+0xa8/0xe0
>>>   ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>>>   amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>>>   amdgpu_cs_ioctl+0x2b7e/0x5630 [amdgpu]
>>>   ? __pfx___lock_acquire+0x10/0x10
>>>   ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>>>   ? mark_lock+0x101/0x16e0
>>>   ? __lock_acquire+0xe54/0x59f0
>>>   ? __pfx_lock_release+0x10/0x10
>>>   ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>>>   drm_ioctl_kernel+0x1fc/0x3d0
>>>   ? __pfx_drm_ioctl_kernel+0x10/0x10
>>>   drm_ioctl+0x4c5/0xaa0
>>>   ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>>>   ? __pfx_drm_ioctl+0x10/0x10
>>>   ? _raw_spin_unlock_irqrestore+0x66/0x80
>>>   ? lockdep_hardirqs_on+0x81/0x110
>>>   ? _raw_spin_unlock_irqrestore+0x4f/0x80
>>>   amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>>>   __x64_sys_ioctl+0x131/0x1a0
>>>   do_syscall_64+0x60/0x90
>>>   ? do_syscall_64+0x6c/0x90
>>>   ? lockdep_hardirqs_on+0x81/0x110
>>>   ? do_syscall_64+0x6c/0x90
>>>   ? lockdep_hardirqs_on+0x81/0x110
>>>   ? do_syscall_64+0x6c/0x90
>>>   ? lockdep_hardirqs_on+0x81/0x110
>>>   ? do_syscall_64+0x6c/0x90
>>>   ? lockdep_hardirqs_on+0x81/0x110
>>>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>> RIP: 0033:0x7ffb71b0892d
>>> Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
>>> RSP: 002b:00007ffb677fe840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>>> RAX: ffffffffffffffda RBX: 00007ffb677fe9f8 RCX: 00007ffb71b0892d
>>> RDX: 00007ffb677fe900 RSI: 00000000c0186444 RDI: 000000000000000d
>>> RBP: 00007ffb677fe890 R08: 00007ffb677fea50 R09: 00007ffb677fe8e0
>>> R10: 0000556c4611bec0 R11: 0000000000000246 R12: 00007ffb677fe900
>>> R13: 00000000c0186444 R14: 000000000000000d R15: 00007ffb677fe9f8
>>> </TASK>
>>>
>>> Allocated by task 173181:
>>>   kasan_save_stack+0x33/0x60
>>>   kasan_set_track+0x25/0x30
>>>   __kasan_kmalloc+0x8f/0xa0
>>>   __kmalloc_node+0x65/0x160
>>>   amdgpu_bo_create+0x31e/0xfb0 [amdgpu]
>>>   amdgpu_bo_create_user+0xca/0x160 [amdgpu]
>>>   amdgpu_gem_create_ioctl+0x398/0x980 [amdgpu]
>>>   drm_ioctl_kernel+0x1fc/0x3d0
>>>   drm_ioctl+0x4c5/0xaa0
>>>   amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>>>   __x64_sys_ioctl+0x131/0x1a0
>>>   do_syscall_64+0x60/0x90
>>>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> Freed by task 173185:
>>>   kasan_save_stack+0x33/0x60
>>>   kasan_set_track+0x25/0x30
>>>   kasan_save_free_info+0x2e/0x50
>>>   __kasan_slab_free+0x10b/0x1a0
>>>   slab_free_freelist_hook+0x11e/0x1d0
>>>   __kmem_cache_free+0xc0/0x2e0
>>>   ttm_bo_release+0x667/0x9e0 [ttm]
>>>   amdgpu_bo_unref+0x35/0x70 [amdgpu]
>>>   amdgpu_gem_object_free+0x73/0xb0 [amdgpu]
>>>   drm_gem_handle_delete+0xe3/0x150
>>>   drm_ioctl_kernel+0x1fc/0x3d0
>>>   drm_ioctl+0x4c5/0xaa0
>>>   amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>>>   __x64_sys_ioctl+0x131/0x1a0
>>>   do_syscall_64+0x60/0x90
>>>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> Last potentially related work creation:
>>>   kasan_save_stack+0x33/0x60
>>>   __kasan_record_aux_stack+0x97/0xb0
>>>   __call_rcu_common.constprop.0+0xf8/0x1af0
>>>   drm_sched_fence_release_scheduled+0xb8/0xe0 [gpu_sched]
>>>   dma_resv_reserve_fences+0x4dc/0x7f0
>>>   ttm_eu_reserve_buffers+0x3f6/0x1190 [ttm]
>>>   amdgpu_cs_ioctl+0x204d/0x5630 [amdgpu]
>>>   drm_ioctl_kernel+0x1fc/0x3d0
>>>   drm_ioctl+0x4c5/0xaa0
>>>   amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>>>   __x64_sys_ioctl+0x131/0x1a0
>>>   do_syscall_64+0x60/0x90
>>>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> Second to last potentially related work creation:
>>>   kasan_save_stack+0x33/0x60
>>>   __kasan_record_aux_stack+0x97/0xb0
>>>   __call_rcu_common.constprop.0+0xf8/0x1af0
>>>   drm_sched_fence_release_scheduled+0xb8/0xe0 [gpu_sched]
>>>   amdgpu_ctx_add_fence+0x2b1/0x390 [amdgpu]
>>>   amdgpu_cs_ioctl+0x44d0/0x5630 [amdgpu]
>>>   drm_ioctl_kernel+0x1fc/0x3d0
>>>   drm_ioctl+0x4c5/0xaa0
>>>   amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>>>   __x64_sys_ioctl+0x131/0x1a0
>>>   do_syscall_64+0x60/0x90
>>>   entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> The buggy address belongs to the object at ffff888295c66000
>>>  which belongs to the cache kmalloc-1k of size 1024
>>> The buggy address is located 400 bytes inside of
>>>  freed 1024-byte region [ffff888295c66000, ffff888295c66400)
>>>
>>> The buggy address belongs to the physical page:
>>> page:00000000125ffbe3 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x295c60
>>> head:00000000125ffbe3 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>> anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>>> raw: 0017ffffc0010200 ffff88810004cdc0 0000000000000000 dead000000000001
>>> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>>> page dumped because: kasan: bad access detected
>>>
>>> Memory state around the buggy address:
>>>   ffff888295c66080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>   ffff888295c66100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>  >ffff888295c66180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>                           ^
>>>   ffff888295c66200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>   ffff888295c66280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ==================================================================
>>>
>>> -- 
>>> Best Regards,
>>> Mike Gavrilov.
>



More information about the amd-gfx mailing list