<div dir="ltr">Hi Vitaly,<br><br>Thank you for looking into this issue!<br><br>We reproduced these issues with a Radeon RX 580 (Polaris 20) passed through to a QEMU (4.0.0) VM via VFIO.<br>All three bugs are reproducible on the recent 6.8-rc4 Linux kernel (<a href="https://github.com/torvalds/linux/tree/v6.8-rc4">https://github.com/torvalds/linux/tree/v6.8-rc4</a>), which I have just re-verified with the reproducer programs from the original reports.<br><br>Below are the QEMU arguments used, followed by the in-VM output of lspci -vvv and /proc/cpuinfo.<br>Should you need any more information, please let us know.<br><br><b>QEMU arguments</b><br>qemu-system-x86_64 \<br>-m 2G \<br>-cpu host \<br>-kernel $KERNEL \<br>-append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \<br>-drive file=$DRIVE_FILE,format=qcow2 \<br>-enable-kvm \<br>-device vfio-pci,host=$PCI_ADDR,id=gpu,multifunction=on,x-vga=on \<br>-nographic<br><br><b>root@qemu:~# lspci -vvv</b><br>00:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 47)<br> Subsystem: Gigabyte Technology Co., Ltd Ellesmere [Radeon RX 470/480/570/570X/580/580X/59]<br> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB+<br> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <-<br> Latency: 0<br> Interrupt: pin A routed to IRQ 24<br> Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]<br> Region 2: Memory at f0000000 (64-bit, prefetchable) [size=2M]<br> Region 4: I/O ports at c000 [size=256]<br> Region 5: Memory at feb80000 (32-bit, non-prefetchable) [size=256K]<br> Expansion ROM at 000c0000 [disabled] [size=128K]<br> Capabilities: [48] Vendor Specific Information: Len=08 <?><br> Capabilities: [50] Power Management version 3<br> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)<br> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-<br> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00<br> DevCap: MaxPayload 256 
bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited<br> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-<br> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-<br> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+<br> MaxPayload 128 bytes, MaxReadReq 512 bytes<br> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-<br> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <1us<br> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+<br> LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+<br> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-<br> LnkSta: Speed 5GT/s (downgraded), Width x4 (downgraded)<br> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-<br> DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+<br> 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, Max1<br> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-<br> FRS-<br> AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-<br> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,<br> AtomicOpsCtl: ReqEn+<br> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPha-<br> EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-<br> Retimer- 2Retimers- CrosslinkRes: unsupported<br> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+<br> Address: 00000000fee01004 Data: 0021<br> Kernel driver in use: amdgpu<br><br><b>root@snapuzz:~# cat /proc/cpuinfo</b><br>processor : 0<br>vendor_id : GenuineIntel<br>cpu family : 6<br>model : 183<br>model name : 13th Gen Intel(R) Core(TM) i9-13900K<br>stepping : 1<br>microcode : 0x1<br>cpu MHz : 2995.200<br>cache size : 16384 KB<br>physical id : 0<br>siblings : 1<br>core id : 0<br>cpu cores : 1<br>apicid : 0<br>initial apicid : 0<br>fpu : yes<br>fpu_exception : yes<br>cpuid level : 31<br>wp : yes<br>flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflushs<br>vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only 
ept_ad ept_1gb flexprioritg<br>bugs : spectre_v1 spectre_v2 spec_store_bypass mds swapgs itlb_multihit mmio_unknown eb<br>bogomips : 5990.40<br>clflush size : 64<br>cache_alignment : 64<br>address sizes : 40 bits physical, 48 bits virtual<br>power management:<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 22, 2024 at 10:57 AM vitaly prosyak <<a href="mailto:vprosyak@amd.com">vprosyak@amd.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u> <div> <p>Hi Joonkyo,</p> <p>Thanks for your report!</p> <p>I reproduced the first issue with 'amdgpu_gem_userptr_ioctl' with KASAN enabled, but I could not reproduce the other two issues.</p> <p>Could you indicate which ASIC you used to reproduce the issues?</p> <p>Could you provide details of your system?</p> <p>Your response is much appreciated.</p> <p>Vitaly<br> </p> <p><br> </p> <p>I placed your findings below to keep the context and track the issues.<br> </p> <p>===========================================================================================================================<br> </p> <pre><fieldset><legend>Reporting a slab-use-after-free in amdgpu.eml</legend></fieldset><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">Subject: </div>Reporting a slab-use-after-free in amdgpu</td></tr><tr><td><div style="display:inline">From: </div>Joonkyo Jung <a href="mailto:joonkyoj@yonsei.ac.kr" target="_blank"><joonkyoj@yonsei.ac.kr></a></td></tr><tr><td><div style="display:inline">Date: </div>2024-02-16, 04:22</td></tr></tbody></table><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">To: </div><a href="mailto:alexander.deucher@amd.com" target="_blank">alexander.deucher@amd.com</a>, <a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>, <a 
href="mailto:Xinhui.Pan@amd.com" target="_blank">Xinhui.Pan@amd.com</a></td></tr><tr><td><div style="display:inline">CC: </div><a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a>, Dokyung Song <a href="mailto:dokyungs@yonsei.ac.kr" target="_blank"><dokyungs@yonsei.ac.kr></a>, <a href="mailto:jisoo.jang@yonsei.ac.kr" target="_blank">jisoo.jang@yonsei.ac.kr</a>, <a href="mailto:yw9865@yonsei.ac.kr" target="_blank">yw9865@yonsei.ac.kr</a></td></tr></tbody></table> <div lang="x-unicode"><div dir="ltr">Hello,

We would like to report a slab-use-after-free bug in the AMDGPU DRM driver in Linux kernel v6.8-rc4 that we found with our customized Syzkaller. The bug can be triggered by sending two ioctls to the AMDGPU DRM driver in succession.

In amdgpu_bo_move, struct ttm_resource *old_mem = bo->resource is assigned. As the alloc and free stacks show, within that same amdgpu_bo_move call, amdgpu_move_blit ends up freeing bo->resource in ttm_bo_move_accel_cleanup via ttm_bo_wait_free_node(bo, man->use_tt). amdgpu_bo_move nevertheless continues and reaches trace_amdgpu_bo_move(abo, new_mem->mem_type, old_mem->mem_type) at the end, which reads the freed old_mem and causes the use-after-free.

Steps to reproduce are as below. 
union drm_amdgpu_gem_create *arg1;

arg1 = malloc(sizeof(union drm_amdgpu_gem_create));

arg1->in.bo_size = 0x8;
arg1->in.alignment = 0x0;
arg1->in.domains = 0x4;
arg1->in.domain_flags = 0x9;
ioctl(fd, 0xc0206440, arg1);

arg1->in.bo_size = 0x7fffffff;
arg1->in.alignment = 0x0;
arg1->in.domains = 0x4;
arg1->in.domain_flags = 0x9;
ioctl(fd, 0xc0206440, arg1);

The KASAN report is as follows:
==================================================================
BUG: KASAN: slab-use-after-free in amdgpu_bo_move+0x1479/0x1550
Read of size 4 at addr ffff88800f5bee80 by task syz-executor/219
Call Trace:
 <TASK>
 amdgpu_bo_move+0x1479/0x1550
 ttm_bo_handle_move_mem+0x4d0/0x700
 ttm_mem_evict_first+0x945/0x1230
 ttm_bo_mem_space+0x6c7/0x940
 ttm_bo_validate+0x286/0x650
 ttm_bo_init_reserved+0x34c/0x490
 amdgpu_bo_create+0x94b/0x1610
 amdgpu_bo_create_user+0xa3/0x130
 amdgpu_gem_create_ioctl+0x4bc/0xc10
 drm_ioctl_kernel+0x300/0x410
 drm_ioctl+0x648/0xb30
 amdgpu_drm_ioctl+0xc8/0x160
 </TASK>

Allocated by task 219:
 kmalloc_trace+0x211/0x390
 amdgpu_vram_mgr_new+0x1d6/0xbe0
 ttm_resource_alloc+0xfd/0x1e0
 ttm_bo_mem_space+0x255/0x940
 ttm_bo_validate+0x286/0x650
 ttm_bo_init_reserved+0x34c/0x490
 amdgpu_bo_create+0x94b/0x1610
 amdgpu_bo_create_user+0xa3/0x130
 amdgpu_gem_create_ioctl+0x4bc/0xc10
 drm_ioctl_kernel+0x300/0x410
 drm_ioctl+0x648/0xb30
 amdgpu_drm_ioctl+0xc8/0x160

Freed by task 219:
 kfree+0x111/0x2d0
 ttm_resource_free+0x17e/0x1e0
 ttm_bo_move_accel_cleanup+0x77e/0x9b0
 amdgpu_move_blit+0x3db/0x670
 amdgpu_bo_move+0xfa2/0x1550
 ttm_bo_handle_move_mem+0x4d0/0x700
 ttm_mem_evict_first+0x945/0x1230
 ttm_bo_mem_space+0x6c7/0x940
 ttm_bo_validate+0x286/0x650
 ttm_bo_init_reserved+0x34c/0x490
 amdgpu_bo_create+0x94b/0x1610
 amdgpu_bo_create_user+0xa3/0x130
 amdgpu_gem_create_ioctl+0x4bc/0xc10
 drm_ioctl_kernel+0x300/0x410
 drm_ioctl+0x648/0xb30
 amdgpu_drm_ioctl+0xc8/0x160

The buggy address belongs to the object at ffff88800f5bee70
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 16 
bytes inside of freed 96-byte region [ffff88800f5bee70, ffff88800f5beed0)

Should you need any more information, please do not hesitate to contact us.

Best regards,
Joonkyo Jung
</div> </div> <fieldset><legend>Reporting a null-ptr-deref in amdgpu.eml</legend></fieldset><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">Subject: </div>Reporting a null-ptr-deref in amdgpu</td></tr><tr><td><div style="display:inline">From: </div>Joonkyo Jung <a href="mailto:joonkyoj@yonsei.ac.kr" target="_blank"><joonkyoj@yonsei.ac.kr></a></td></tr><tr><td><div style="display:inline">Date: </div>2024-02-16, 04:20</td></tr></tbody></table><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">To: </div><a href="mailto:alexander.deucher@amd.com" target="_blank">alexander.deucher@amd.com</a>, <a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>, <a href="mailto:Xinhui.Pan@amd.com" target="_blank">Xinhui.Pan@amd.com</a></td></tr><tr><td><div style="display:inline">CC: </div>Dokyung Song <a href="mailto:dokyungs@yonsei.ac.kr" target="_blank"><dokyungs@yonsei.ac.kr></a>, <a href="mailto:jisoo.jang@yonsei.ac.kr" target="_blank">jisoo.jang@yonsei.ac.kr</a>, <a href="mailto:yw9865@yonsei.ac.kr" target="_blank">yw9865@yonsei.ac.kr</a>, <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a></td></tr></tbody></table> <div lang="x-unicode"><div dir="ltr">Hello,

We would like to report a null-ptr-deref bug in the AMDGPU DRM driver in Linux kernel v6.8-rc4 that we found with our customized Syzkaller. The bug can be triggered by sending two ioctls to the AMDGPU DRM driver in succession.

The first ioctl, amdgpu_ctx_ioctl, creates a ctx and returns ctx_id = 1 to userspace. The second ioctl, which can be any ioctl that eventually calls amdgpu_ctx_get_entity, carries this ctx_id and passes the context check. 
Here, drm_amdgpu_wait_cs is used as an example. The validations in amdgpu_ctx_get_entity can also be passed using user-provided values from the ioctl arguments. This eventually leads to drm_sched_entity_init, where the null-ptr-deref triggers at RCU_INIT_POINTER(entity->last_scheduled, NULL);

Steps to reproduce are as below.

union drm_amdgpu_ctx *arg1;
union drm_amdgpu_wait_cs *arg2;

arg1 = malloc(sizeof(union drm_amdgpu_ctx));
arg2 = malloc(sizeof(union drm_amdgpu_wait_cs));

arg1->in.op = 0x1;
ioctl(AMDGPU_renderD128_DEVICE_FILE, 0x140106442, arg1);

arg2->in.handle = 0x0;
arg2->in.timeout = 0x2000000000000;
arg2->in.ip_type = 0x9;
arg2->in.ip_instance = 0x0;
arg2->in.ring = 0x0;
arg2->in.ctx_id = 0x1;
ioctl(AMDGPU_renderD128_DEVICE_FILE, 0xc0206449, arg2);

The KASAN report is as follows:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
Call Trace:
 <TASK>
 ? drm_sched_entity_init+0x16e/0x650
 ? drm_sched_entity_init+0x208/0x650
 amdgpu_ctx_get_entity+0x944/0xc30
 amdgpu_cs_wait_ioctl+0x13d/0x3f0
 drm_ioctl_kernel+0x300/0x410
 drm_ioctl+0x648/0xb30
 amdgpu_drm_ioctl+0xc8/0x160
 </TASK>

Should you need any more information, please do not hesitate to contact us. 
Best regards,
Joonkyo Jung
</div> </div> <fieldset><legend>Reporting a use-after-free in amdgpu.eml</legend></fieldset><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">Subject: </div>Reporting a use-after-free in amdgpu</td></tr><tr><td><div style="display:inline">From: </div>정준교 <a href="mailto:joonkyoj@yonsei.ac.kr" target="_blank"><joonkyoj@yonsei.ac.kr></a></td></tr><tr><td><div style="display:inline">Date: </div>2024-02-14, 21:34</td></tr></tbody></table><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">To: </div><a href="mailto:alexander.deucher@amd.com" target="_blank">alexander.deucher@amd.com</a>, <a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>, <a href="mailto:Xinhui.Pan@amd.com" target="_blank">Xinhui.Pan@amd.com</a></td></tr><tr><td><div style="display:inline">CC: </div><a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a>, Dokyung Song <a href="mailto:dokyungs@yonsei.ac.kr" target="_blank"><dokyungs@yonsei.ac.kr></a>, <a href="mailto:jisoo.jang@yonsei.ac.kr" target="_blank">jisoo.jang@yonsei.ac.kr</a>, <a href="mailto:yw9865@yonsei.ac.kr" target="_blank">yw9865@yonsei.ac.kr</a></td></tr></tbody></table> <div lang="x-unicode"><div dir="ltr">Hello,

We would like to report a use-after-free bug in the AMDGPU DRM driver in Linux kernel 6.2 that we found with our customized Syzkaller. The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl to the AMDGPU DRM driver with an invalid addr and size. We have verified that this bug can still be triggered on the latest RC kernel, v6.8-rc4.

Steps to reproduce are as below. 
struct drm_amdgpu_gem_userptr *arg;

arg = malloc(sizeof(struct drm_amdgpu_gem_userptr));

arg->addr = 0xffffffffffff0000;
arg->size = 0x80000000;
arg->flags = 0x7;
ioctl(AMDGPU_renderD128_DEVICE_FILE, 0xc1186451, arg);

The KASAN report is as follows:
==================================================================
BUG: KASAN: use-after-free in switch_mm_irqs_off+0x89d/0xb70
Read of size 8 at addr ffff88801f17bc00 by task syz-executor/386
Call Trace:
 <TASK>
 switch_mm_irqs_off+0x89d/0xb70
 __schedule+0xa62/0x2630
 preempt_schedule_common+0x45/0xd0
 vfree+0x4d/0x60
 ttm_tt_fini+0xdf/0x1c0
 amdgpu_ttm_backend_destroy+0x9f/0xe0
 ttm_bo_cleanup_memtype_use+0x142/0x1f0
 ttm_bo_release+0x67d/0xc00
 ttm_bo_put+0x7c/0xa0
 amdgpu_bo_unref+0x3b/0x80
 amdgpu_gem_object_free+0x7f/0xc0
 drm_gem_object_free+0x5d/0x90
 amdgpu_gem_userptr_ioctl+0x452/0x7e0
 drm_ioctl_kernel+0x284/0x500
 drm_ioctl+0x55e/0xa50
 amdgpu_drm_ioctl+0xe3/0x1d0
 </TASK>

Allocated by task 385:
 kmem_cache_alloc+0x174/0x300
 copy_process+0x32d1/0x6640
 kernel_clone+0xcd/0x690

Freed by task 386:
 kmem_cache_free+0x13b/0x550
 mmu_interval_notifier_remove+0x4c8/0x610
 amdgpu_hmm_unregister+0x47/0x90
 amdgpu_gem_object_free+0x75/0xc0
 drm_gem_object_free+0x5d/0x90
 amdgpu_gem_userptr_ioctl+0x452/0x7e0
 drm_ioctl_kernel+0x284/0x500
 drm_ioctl+0x55e/0xa50
 amdgpu_drm_ioctl+0xe3/0x1d0

The buggy address belongs to the object at ffff88801f17bb80
 which belongs to the cache mm_struct of size 2016
The buggy address is located 128 bytes inside of
 2016-byte region [ffff88801f17bb80, ffff88801f17c360)

The buggy address belongs to the physical page:
page:000000002c2a61bd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1f178
head:000000002c2a61bd order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
memcg:ffff8880141aa301
flags: 0x100000000010200(slab|head|node=0|zone=1)
raw: 0100000000010200 ffff88800a44fc80 ffffea00006ca400 dead000000000004
raw: 0000000000000000 00000000800f000f 00000001ffffffff ffff8880141aa301
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88801f17bb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88801f17bb80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88801f17bc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff88801f17bc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88801f17bd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================</div></div></pre> <p></p> </div> </blockquote></div>