<div dir="ltr">Hi Vitaly,<br><br>Thank you for looking into this issue!<br><br>We have reproduced this issue with a Radeon RX 580 (Polaris 20) passthrough-ed to a QEMU (4.0.0) VM by VFIO.<br>All bugs were reproducible on the recent 6.8-rc4 Linux kernel (<a href="https://github.com/torvalds/linux/tree/v6.8-rc4">https://github.com/torvalds/linux/tree/v6.8-rc4</a>), which I double checked right now with previous programs.<br><br>Below are the QEMU arguments used, in-VM lspci -vvv and /proc/cpuinfo.<br>Should you need any more information, please let us know.<br><br><b>QEMU arguments</b><br>qemu-system-x86_64<br>-m 2G \<br>-cpu host \<br>-kernel $KERNEL \<br>-append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \<br>-drive file=$DRIVE_FILE,format=qcow2 \<br>-enable-kvm \<br>-device vfio-pci,host=$PCI_ADDR,id=gpu,multifunction=on,x-vga=on \<br>-nographic<br><br><b>root@qemu:~# lspci -vvv</b><br>00:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 47)<br> Subsystem: Gigabyte Technology Co., Ltd Ellesmere [Radeon RX 470/480/570/570X/580/580X/59]<br> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB+<br> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <-<br> Latency: 0<br> Interrupt: pin A routed to IRQ 24<br> Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]<br> Region 2: Memory at f0000000 (64-bit, prefetchable) [size=2M]<br> Region 4: I/O ports at c000 [size=256]<br> Region 5: Memory at feb80000 (32-bit, non-prefetchable) [size=256K]<br> Expansion ROM at 000c0000 [disabled] [size=128K]<br> Capabilities: [48] Vendor Specific Information: Len=08 <?><br> Capabilities: [50] Power Management version 3<br> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)<br> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-<br> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00<br> DevCap: MaxPayload 256 
bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited<br> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-<br> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-<br> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+<br> MaxPayload 128 bytes, MaxReadReq 512 bytes<br> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-<br> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <1us<br> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+<br> LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+<br> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-<br> LnkSta: Speed 5GT/s (downgraded), Width x4 (downgraded)<br> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-<br> DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+<br> 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, Max1<br> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-<br> FRS-<br> AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-<br> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,<br> AtomicOpsCtl: ReqEn+<br> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPha-<br> EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-<br> Retimer- 2Retimers- CrosslinkRes: unsupported<br> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+<br> Address: 00000000fee01004 Data: 0021<br> Kernel driver in use: amdgpu<br><br><b>root@snapuzz:~# cat /proc/cpuinfo</b><br>processor : 0<br>vendor_id : GenuineIntel<br>cpu family : 6<br>model : 183<br>model name : 13th Gen Intel(R) Core(TM) i9-13900K<br>stepping : 1<br>microcode : 0x1<br>cpu MHz : 2995.200<br>cache size : 16384 KB<br>physical id : 0<br>siblings : 1<br>core id : 0<br>cpu cores : 1<br>apicid : 0<br>initial apicid : 0<br>fpu : yes<br>fpu_exception : yes<br>cpuid level : 31<br>wp : yes<br>flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflushs<br>vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only 
ept_ad ept_1gb flexprioritg<br>bugs : spectre_v1 spectre_v2 spec_store_bypass mds swapgs itlb_multihit mmio_unknown eb<br>bogomips : 5990.40<br>clflush size : 64<br>cache_alignment : 64<br>address sizes : 40 bits physical, 48 bits virtual<br>power management:<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 22, 2024 at 10:57 AM vitaly prosyak <<a href="mailto:vprosyak@amd.com">vprosyak@amd.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div>
<p>Hi Joonkyo,</p>
<p>Thanks for your report!</p>
<p>I reproduced the first issue with 'amdgpu_gem_userptr_ioctl' with
KASAN enabled, but I could not reproduce the other two issues.</p>
<p>Could you indicate which ASIC you used to reproduce the
issue?</p>
<p>Could you provide details of your system?</p>
<p>Your response would be much appreciated.</p>
<p>Vitaly<br>
</p>
<p><br>
</p>
<p>I placed your findings below to keep the context and track the
issues.<br>
</p>
<p>===========================================================================================================================<br>
</p>
<pre><fieldset><legend>Reporting a slab-use-after-free in amdgpu.eml</legend></fieldset><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">Subject: </div>Reporting a slab-use-after-free in amdgpu</td></tr><tr><td><div style="display:inline">From: </div>Joonkyo Jung <a href="mailto:joonkyoj@yonsei.ac.kr" target="_blank"><joonkyoj@yonsei.ac.kr></a></td></tr><tr><td><div style="display:inline">Date: </div>2024-02-16, 04:22</td></tr></tbody></table><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">To: </div><a href="mailto:alexander.deucher@amd.com" target="_blank">alexander.deucher@amd.com</a>, <a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>, <a href="mailto:Xinhui.Pan@amd.com" target="_blank">Xinhui.Pan@amd.com</a></td></tr><tr><td><div style="display:inline">CC: </div><a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a>, Dokyung Song <a href="mailto:dokyungs@yonsei.ac.kr" target="_blank"><dokyungs@yonsei.ac.kr></a>, <a href="mailto:jisoo.jang@yonsei.ac.kr" target="_blank">jisoo.jang@yonsei.ac.kr</a>, <a href="mailto:yw9865@yonsei.ac.kr" target="_blank">yw9865@yonsei.ac.kr</a></td></tr></tbody></table>
<div lang="x-unicode"><div dir="ltr">Hello,
We would like to report a slab-use-after-free bug in the AMDGPU DRM driver in the Linux kernel v6.8-rc4 that we found with our customized Syzkaller.
The bug can be triggered by sending two ioctls to the AMDGPU DRM driver in succession.
amdgpu_bo_move first assigns struct ttm_resource *old_mem = bo->resource.
As the allocation and free stacks show, within the same call to amdgpu_bo_move, amdgpu_move_blit eventually frees bo->resource in ttm_bo_move_accel_cleanup via ttm_bo_wait_free_node(bo, man->use_tt).
However, amdgpu_bo_move continues after that and reaches trace_amdgpu_bo_move(abo, new_mem->mem_type, old_mem->mem_type) at the end, which reads the freed resource and causes the use-after-free.
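The pattern described above can be modelled in plain userspace C. The sketch below is not the kernel code: struct resource, struct buffer, move_and_free_old and traced_move are hypothetical stand-ins for ttm_resource, the buffer object, the blit/cleanup path and amdgpu_bo_move. It shows one possible way to avoid the late read, namely copying mem_type before the call that frees the old resource. Whether the driver should be fixed this way is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ttm_resource. */
struct resource {
    unsigned int mem_type;
};

/* Hypothetical stand-in for the buffer object holding bo->resource. */
struct buffer {
    struct resource *resource;
};

/* Models the move path: installs new_mem and frees the old resource. */
static void move_and_free_old(struct buffer *bo, struct resource *new_mem)
{
    struct resource *old = bo->resource;

    bo->resource = new_mem;
    free(old); /* any pointer to the old resource dangles from here on */
}

/*
 * Models the caller. The old memory type is copied before the move, so the
 * final "trace" step never dereferences the freed resource.
 * Returns (old_type << 8) | new_type as a stand-in for the tracepoint.
 */
static unsigned int traced_move(struct buffer *bo, struct resource *new_mem)
{
    unsigned int old_type;

    assert(bo != NULL && bo->resource != NULL && new_mem != NULL);

    old_type = bo->resource->mem_type; /* snapshot before the free */
    move_and_free_old(bo, new_mem);

    return (old_type << 8) | bo->resource->mem_type;
}
```

In this model, reading old->mem_type after move_and_free_old() would be the equivalent of the access flagged by KASAN.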
Steps to reproduce are as below.
union drm_amdgpu_gem_create *arg1;
arg1 = malloc(sizeof(union drm_amdgpu_gem_create));
arg1->in.bo_size = 0x8;
arg1->in.alignment = 0x0;
arg1->in.domains = 0x4;
arg1->in.domain_flags = 0x9;
ioctl(fd, 0xc0206440, arg1);
arg1->in.bo_size = 0x7fffffff;
arg1->in.alignment = 0x0;
arg1->in.domains = 0x4;
arg1->in.domain_flags = 0x9;
ioctl(fd, 0xc0206440, arg1);
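For readers matching the raw request number above to a driver entry point, the following sketch splits a request value into its fields. It assumes the common Linux _IOC layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction) as used on x86. struct ioc_fields and ioc_decode are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ioc_fields {
    unsigned int nr;   /* command number */
    unsigned int type; /* "magic" byte; DRM uses 'd' (0x64) */
    unsigned int size; /* size of the ioctl argument in bytes */
    unsigned int dir;  /* 1 = write, 2 = read, 3 = read and write */
};

/* Split a 32-bit ioctl request into its four bit fields. */
static void ioc_decode(uint32_t request, struct ioc_fields *out)
{
    assert(out != NULL);

    out->nr = request & 0xffu;            /* bits 0..7   */
    out->type = (request >> 8) & 0xffu;   /* bits 8..15  */
    out->size = (request >> 16) & 0x3fffu; /* bits 16..29 */
    out->dir = (request >> 30) & 0x3u;    /* bits 30..31 */
}
```

Applied to 0xc0206440, this yields type 0x64, number 0x40, a 32-byte argument and a read/write direction. 0x40 is the start of the DRM driver-private command range, which is consistent with amdgpu_gem_create_ioctl appearing in the trace.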
The KASAN report is as follows:
==================================================================
BUG: KASAN: slab-use-after-free in amdgpu_bo_move+0x1479/0x1550
Read of size 4 at addr ffff88800f5bee80 by task syz-executor/219
Call Trace:
<TASK>
amdgpu_bo_move+0x1479/0x1550
ttm_bo_handle_move_mem+0x4d0/0x700
ttm_mem_evict_first+0x945/0x1230
ttm_bo_mem_space+0x6c7/0x940
ttm_bo_validate+0x286/0x650
ttm_bo_init_reserved+0x34c/0x490
amdgpu_bo_create+0x94b/0x1610
amdgpu_bo_create_user+0xa3/0x130
amdgpu_gem_create_ioctl+0x4bc/0xc10
drm_ioctl_kernel+0x300/0x410
drm_ioctl+0x648/0xb30
amdgpu_drm_ioctl+0xc8/0x160
</TASK>
Allocated by task 219:
kmalloc_trace+0x211/0x390
amdgpu_vram_mgr_new+0x1d6/0xbe0
ttm_resource_alloc+0xfd/0x1e0
ttm_bo_mem_space+0x255/0x940
ttm_bo_validate+0x286/0x650
ttm_bo_init_reserved+0x34c/0x490
amdgpu_bo_create+0x94b/0x1610
amdgpu_bo_create_user+0xa3/0x130
amdgpu_gem_create_ioctl+0x4bc/0xc10
drm_ioctl_kernel+0x300/0x410
drm_ioctl+0x648/0xb30
amdgpu_drm_ioctl+0xc8/0x160
Freed by task 219:
kfree+0x111/0x2d0
ttm_resource_free+0x17e/0x1e0
ttm_bo_move_accel_cleanup+0x77e/0x9b0
amdgpu_move_blit+0x3db/0x670
amdgpu_bo_move+0xfa2/0x1550
ttm_bo_handle_move_mem+0x4d0/0x700
ttm_mem_evict_first+0x945/0x1230
ttm_bo_mem_space+0x6c7/0x940
ttm_bo_validate+0x286/0x650
ttm_bo_init_reserved+0x34c/0x490
amdgpu_bo_create+0x94b/0x1610
amdgpu_bo_create_user+0xa3/0x130
amdgpu_gem_create_ioctl+0x4bc/0xc10
drm_ioctl_kernel+0x300/0x410
drm_ioctl+0x648/0xb30
amdgpu_drm_ioctl+0xc8/0x160
The buggy address belongs to the object at ffff88800f5bee70
which belongs to the cache kmalloc-96 of size 96
The buggy address is located 16 bytes inside of
freed 96-byte region [ffff88800f5bee70, ffff88800f5beed0)
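The offset and region bounds in the report follow directly from the addresses. As a quick cross-check, here is a small sketch; offset_in_object and object_end are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a faulting address inside the object that starts at base. */
static uint64_t offset_in_object(uint64_t addr, uint64_t base)
{
    assert(addr >= base);
    return addr - base;
}

/* Exclusive end address of an object of the given size. */
static uint64_t object_end(uint64_t base, uint64_t size)
{
    return base + size;
}
```

With the faulting address 0xffff88800f5bee80 and the object start 0xffff88800f5bee70, the offset is 16 bytes, and the 96-byte object ends at 0xffff88800f5beed0, matching the region printed by KASAN. The 4-byte read at that offset is consistent with the old_mem->mem_type access named earlier.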
Should you need any more information, please do not hesitate to contact us.
Best regards,
Joonkyo Jung
</div>
</div>
<fieldset><legend>Reporting a null-ptr-deref in amdgpu.eml</legend></fieldset><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">Subject: </div>Reporting a null-ptr-deref in amdgpu</td></tr><tr><td><div style="display:inline">From: </div>Joonkyo Jung <a href="mailto:joonkyoj@yonsei.ac.kr" target="_blank"><joonkyoj@yonsei.ac.kr></a></td></tr><tr><td><div style="display:inline">Date: </div>2024-02-16, 04:20</td></tr></tbody></table><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">To: </div><a href="mailto:alexander.deucher@amd.com" target="_blank">alexander.deucher@amd.com</a>, <a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>, <a href="mailto:Xinhui.Pan@amd.com" target="_blank">Xinhui.Pan@amd.com</a></td></tr><tr><td><div style="display:inline">CC: </div>Dokyung Song <a href="mailto:dokyungs@yonsei.ac.kr" target="_blank"><dokyungs@yonsei.ac.kr></a>, <a href="mailto:jisoo.jang@yonsei.ac.kr" target="_blank">jisoo.jang@yonsei.ac.kr</a>, <a href="mailto:yw9865@yonsei.ac.kr" target="_blank">yw9865@yonsei.ac.kr</a>, <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a></td></tr></tbody></table>
<div lang="x-unicode"><div dir="ltr">Hello,
We would like to report a null-ptr-deref bug in the AMDGPU DRM driver in the Linux kernel v6.8-rc4 that we found with our customized Syzkaller.
The bug can be triggered by sending two ioctls to the AMDGPU DRM driver in succession.
The first ioctl, amdgpu_ctx_ioctl, creates a ctx and returns ctx_id = 1 to userspace.
The second ioctl, which can be any ioctl that eventually calls amdgpu_ctx_get_entity, carries this ctx_id and passes the context check.
In this example we use drm_amdgpu_wait_cs.
The validations in amdgpu_ctx_get_entity can likewise be satisfied with user-provided values from the ioctl arguments.
This eventually leads to drm_sched_entity_init, where the null-ptr-deref triggers on RCU_INIT_POINTER(entity->last_scheduled, NULL);
Steps to reproduce are as below.
union drm_amdgpu_ctx *arg1;
union drm_amdgpu_wait_cs *arg2;
arg1 = malloc(sizeof(union drm_amdgpu_ctx));
arg2 = malloc(sizeof(union drm_amdgpu_wait_cs));
arg1->in.op = 0x1;
ioctl(AMDGPU_renderD128_DEVICE_FILE, 0x140106442, arg1);
arg2->in.handle = 0x0;
arg2->in.timeout = 0x2000000000000;
arg2->in.ip_type = 0x9;
arg2->in.ip_instance = 0x0;
arg2->in.ring = 0x0;
arg2->in.ctx_id = 0x1;
ioctl(AMDGPU_renderD128_DEVICE_FILE, 0xc0206449, arg2);
The KASAN report is as follows:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
Call Trace:
<TASK>
? drm_sched_entity_init+0x16e/0x650
? drm_sched_entity_init+0x208/0x650
amdgpu_ctx_get_entity+0x944/0xc30
amdgpu_cs_wait_ioctl+0x13d/0x3f0
drm_ioctl_kernel+0x300/0x410
drm_ioctl+0x648/0xb30
amdgpu_drm_ioctl+0xc8/0x160
</TASK>
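The two addresses in the report above are related through KASAN's shadow mapping. Assuming the generic-mode x86-64 layout (shadow offset 0xdffffc0000000000, one shadow byte per 8 bytes of memory), the non-canonical shadow address maps back to the low address range that KASAN prints. shadow_to_addr is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: x86-64 with generic KASAN. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL
/* One shadow byte covers 8 bytes of memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* First byte of the 8-byte granule that a shadow address describes. */
static uint64_t shadow_to_addr(uint64_t shadow)
{
    assert(shadow >= KASAN_SHADOW_OFFSET);
    return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```

For 0xdffffc0000000005 this gives 0x28, and the granule spans 0x28 to 0x2f, matching the reported range. This suggests an access at a small offset from a NULL pointer.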
Should you need any more information, please do not hesitate to contact us.
Best regards,
Joonkyo Jung
</div>
</div>
<fieldset><legend>Reporting a use-after-free in amdgpu.eml</legend></fieldset><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">Subject: </div>Reporting a use-after-free in amdgpu</td></tr><tr><td><div style="display:inline">From: </div>정준교 <a href="mailto:joonkyoj@yonsei.ac.kr" target="_blank"><joonkyoj@yonsei.ac.kr></a></td></tr><tr><td><div style="display:inline">Date: </div>2024-02-14, 21:34</td></tr></tbody></table><table width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td><div style="display:inline">To: </div><a href="mailto:alexander.deucher@amd.com" target="_blank">alexander.deucher@amd.com</a>, <a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>, <a href="mailto:Xinhui.Pan@amd.com" target="_blank">Xinhui.Pan@amd.com</a></td></tr><tr><td><div style="display:inline">CC: </div><a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a>, Dokyung Song <a href="mailto:dokyungs@yonsei.ac.kr" target="_blank"><dokyungs@yonsei.ac.kr></a>, <a href="mailto:jisoo.jang@yonsei.ac.kr" target="_blank">jisoo.jang@yonsei.ac.kr</a>, <a href="mailto:yw9865@yonsei.ac.kr" target="_blank">yw9865@yonsei.ac.kr</a></td></tr></tbody></table>
<div lang="x-unicode"><div dir="ltr">Hello,
We would like to report a use-after-free bug in the AMDGPU DRM driver in the Linux kernel 6.2 that we found with our customized Syzkaller.
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl to the AMDGPU DRM driver, with invalid addr and size.
We have tested that this bug can still be triggered in the latest RC kernel, v6.8-rc4.
Steps to reproduce are as below.
struct drm_amdgpu_gem_userptr *arg;
arg = malloc(sizeof(struct drm_amdgpu_gem_userptr));
arg->addr = 0xffffffffffff0000;
arg->size = 0x80000000;
arg->flags = 0x7;
ioctl(AMDGPU_renderD128_DEVICE_FILE, 0xc1186451, arg);
The KASAN report is as follows:
==================================================================
BUG: KASAN: use-after-free in switch_mm_irqs_off+0x89d/0xb70
Read of size 8 at addr ffff88801f17bc00 by task syz-executor/386
Call Trace:
<TASK>
switch_mm_irqs_off+0x89d/0xb70
__schedule+0xa62/0x2630
preempt_schedule_common+0x45/0xd0
vfree+0x4d/0x60
ttm_tt_fini+0xdf/0x1c0
amdgpu_ttm_backend_destroy+0x9f/0xe0
ttm_bo_cleanup_memtype_use+0x142/0x1f0
ttm_bo_release+0x67d/0xc00
ttm_bo_put+0x7c/0xa0
amdgpu_bo_unref+0x3b/0x80
amdgpu_gem_object_free+0x7f/0xc0
drm_gem_object_free+0x5d/0x90
amdgpu_gem_userptr_ioctl+0x452/0x7e0
drm_ioctl_kernel+0x284/0x500
drm_ioctl+0x55e/0xa50
amdgpu_drm_ioctl+0xe3/0x1d0
</TASK>
Allocated by task 385:
kmem_cache_alloc+0x174/0x300
copy_process+0x32d1/0x6640
kernel_clone+0xcd/0x690
Freed by task 386:
kmem_cache_free+0x13b/0x550
mmu_interval_notifier_remove+0x4c8/0x610
amdgpu_hmm_unregister+0x47/0x90
amdgpu_gem_object_free+0x75/0xc0
drm_gem_object_free+0x5d/0x90
amdgpu_gem_userptr_ioctl+0x452/0x7e0
drm_ioctl_kernel+0x284/0x500
drm_ioctl+0x55e/0xa50
amdgpu_drm_ioctl+0xe3/0x1d0
The buggy address belongs to the object at ffff88801f17bb80
which belongs to the cache mm_struct of size 2016
The buggy address is located 128 bytes inside of
2016-byte region [ffff88801f17bb80, ffff88801f17c360)
The buggy address belongs to the physical page:
page:000000002c2a61bd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1f178
head:000000002c2a61bd order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
memcg:ffff8880141aa301
flags: 0x100000000010200(slab|head|node=0|zone=1)
raw: 0100000000010200 ffff88800a44fc80 ffffea00006ca400 dead000000000004
raw: 0000000000000000 00000000800f000f 00000001ffffffff ffff8880141aa301
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88801f17bb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88801f17bb80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88801f17bc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88801f17bc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88801f17bd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================</div></div></pre>
<p></p>
</div>
</blockquote></div>