Reporting a use-after-free in amdgpu and null-ptr-deref
Joonkyo Jung
joonkyoj at yonsei.ac.kr
Thu Feb 22 02:42:56 UTC 2024
Hi Vitaly,
Thank you for looking into this issue!
We reproduced these issues with a Radeon RX 580 (Polaris 20)
passed through to a QEMU (4.0.0) VM via VFIO.
All bugs are reproducible on the recent 6.8-rc4 Linux kernel (
https://github.com/torvalds/linux/tree/v6.8-rc4), which I have just
re-checked with the previously posted programs.
Below are the QEMU arguments used, followed by the in-VM output of lspci -vvv and /proc/cpuinfo.
Should you need any more information, please let us know.
*QEMU arguments*
qemu-system-x86_64 \
-m 2G \
-cpu host \
-kernel $KERNEL \
-append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
-drive file=$DRIVE_FILE,format=qcow2 \
-enable-kvm \
-device vfio-pci,host=$PCI_ADDR,id=gpu,multifunction=on,x-vga=on \
-nographic
*root at qemu:~# lspci -vvv*
00:03.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
Ellesmere [Radeon RX 47)
Subsystem: Gigabyte Technology Co., Ltd Ellesmere [Radeon RX
470/480/570/570X/580/580X/59]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <-
Latency: 0
Interrupt: pin A routed to IRQ 24
Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f0000000 (64-bit, prefetchable) [size=2M]
Region 4: I/O ports at c000 [size=256]
Region 5: Memory at feb80000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
<4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr-
TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit
Latency L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x4 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-
NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported,
ExtFmt+ EETLPPrefix+, Max1
EmergencyPowerReduction Not Supported,
EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+
OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkSta2: Current De-emphasis Level: -3.5dB,
EqualizationComplete- EqualizationPha-
EqualizationPhase2- EqualizationPhase3-
LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee01004 Data: 0021
Kernel driver in use: amdgpu
*root at snapuzz:~# cat /proc/cpuinfo*
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 183
model name : 13th Gen Intel(R) Core(TM) i9-13900K
stepping : 1
microcode : 0x1
cpu MHz : 2995.200
cache size : 16384 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 31
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflushs
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only
ept_ad ept_1gb flexprioritg
bugs : spectre_v1 spectre_v2 spec_store_bypass mds swapgs
itlb_multihit mmio_unknown eb
bogomips : 5990.40
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
On Thu, Feb 22, 2024 at 10:57 AM vitaly prosyak <vprosyak at amd.com> wrote:
> Hi Joonkyo,
>
> Thanks for your report!
>
> I reproduced the first issue with 'amdgpu_gem_userptr_ioctl' when KASAN
> is enabled, but I could not reproduce the other two issues.
>
> Could you indicate which ASIC you used to reproduce the issue?
>
> Could you provide details of your system?
>
> Your response is much appreciated.
>
> Vitaly
>
>
> I placed your findings below to keep the context and track the issues.
>
>
> ===========================================================================================================================
>
> Reporting a slab-use-after-free in amdgpu.eml
>
> Subject:
> Reporting a slab-use-after-free in amdgpu
>
> From:
> Joonkyo Jung <joonkyoj at yonsei.ac.kr>
>
> Date:
> 2024-02-16, 04:22
>
> To:
> alexander.deucher at amd.com, christian.koenig at amd.com, Xinhui.Pan at amd.com
>
> CC:
> amd-gfx at lists.freedesktop.org, Dokyung Song <dokyungs at yonsei.ac.kr>, jisoo.jang at yonsei.ac.kr, yw9865 at yonsei.ac.kr
>
> Hello,
>
> We would like to report a slab-use-after-free bug in the AMDGPU DRM driver
> in the Linux kernel v6.8-rc4 that we found with our customized Syzkaller.
> The bug can be triggered by sending two ioctls to the AMDGPU DRM driver in succession.
>
> In amdgpu_bo_move, old_mem is assigned as struct ttm_resource *old_mem = bo->resource.
> As the allocation and free stacks show, within that same call to amdgpu_bo_move,
> amdgpu_move_blit ends up freeing bo->resource in ttm_bo_move_accel_cleanup via ttm_bo_wait_free_node(bo, man->use_tt).
> amdgpu_bo_move nevertheless continues after that and reaches
> trace_amdgpu_bo_move(abo, new_mem->mem_type, old_mem->mem_type) at the end,
> where the read through old_mem causes the use-after-free bug.
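
To illustrate the pattern described above, here is a minimal user-space sketch, not kernel code and not a proposed patch. The struct and function names are hypothetical stand-ins for the TTM types. It only shows that copying the needed field before the call that frees the resource avoids the late read.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real TTM types. */
struct resource { unsigned int mem_type; };
struct buffer   { struct resource *res; };

/* Models a move helper that releases the old resource and installs the new one. */
static void move_and_release(struct buffer *bo, struct resource *new_res)
{
    free(bo->res);
    bo->res = new_res;
}

/* Performs the move and returns the old memory type without touching freed memory. */
static unsigned int move_traced(struct buffer *bo, struct resource *new_res)
{
    unsigned int old_type = bo->res->mem_type; /* snapshot before the free */

    move_and_release(bo, new_res);
    return old_type; /* safe: the freed resource is never dereferenced */
}
```

Keeping a pointer to the old resource and reading mem_type from it after move_and_release() would be the user-space analogue of the reported read.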
>
> Steps to reproduce are as below.
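> union drm_amdgpu_gem_create *arg1;
>
> /* fd is an open file descriptor for the amdgpu DRM device */
> arg1 = malloc(sizeof(union drm_amdgpu_gem_create));
> arg1->in.bo_size = 0x8;
> arg1->in.alignment = 0x0;
> arg1->in.domains = 0x4; /* AMDGPU_GEM_DOMAIN_VRAM */
> arg1->in.domain_flags = 0x9; /* CPU_ACCESS_REQUIRED | VRAM_CLEARED */
> ioctl(fd, 0xc0206440, arg1); /* DRM_IOCTL_AMDGPU_GEM_CREATE */
>
> /* second request: same domain and flags, much larger size */
> arg1->in.bo_size = 0x7fffffff;
> arg1->in.alignment = 0x0;
> arg1->in.domains = 0x4;
> arg1->in.domain_flags = 0x9;
> ioctl(fd, 0xc0206440, arg1);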
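
For readers unfamiliar with raw request numbers such as 0xc0206440, the following small helper is a sketch of how to split one into its fields. It assumes the asm-generic ioctl encoding used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The struct and function names are made up for this illustration.

```c
#include <stdint.h>

/* Fields of an ioctl request number under the asm-generic layout (assumed). */
struct ioctl_fields {
    unsigned int dir;   /* bit 0 = write (user to kernel), bit 1 = read */
    unsigned int size;  /* argument size in bytes */
    unsigned int type;  /* "magic" character, 'd' for DRM */
    unsigned int nr;    /* command number */
};

/* Splits a 32-bit request number into its four bit fields. */
static struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;

    f.nr   = req & 0xffu;
    f.type = (req >> 8) & 0xffu;
    f.size = (req >> 16) & 0x3fffu;
    f.dir  = (req >> 30) & 0x3u;
    return f;
}
```

Decoding 0xc0206440 this way gives a read/write request on type 'd' with a 32-byte argument and command number 0x40. DRM driver-private commands start at 0x40, so this is the driver's command 0.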
>
> The KASAN report is as follows:
> ==================================================================
> BUG: KASAN: slab-use-after-free in amdgpu_bo_move+0x1479/0x1550
> Read of size 4 at addr ffff88800f5bee80 by task syz-executor/219
> Call Trace:
> <TASK>
> amdgpu_bo_move+0x1479/0x1550
> ttm_bo_handle_move_mem+0x4d0/0x700
> ttm_mem_evict_first+0x945/0x1230
> ttm_bo_mem_space+0x6c7/0x940
> ttm_bo_validate+0x286/0x650
> ttm_bo_init_reserved+0x34c/0x490
> amdgpu_bo_create+0x94b/0x1610
> amdgpu_bo_create_user+0xa3/0x130
> amdgpu_gem_create_ioctl+0x4bc/0xc10
> drm_ioctl_kernel+0x300/0x410
> drm_ioctl+0x648/0xb30
> amdgpu_drm_ioctl+0xc8/0x160
> </TASK>
>
> Allocated by task 219:
> kmalloc_trace+0x211/0x390
> amdgpu_vram_mgr_new+0x1d6/0xbe0
> ttm_resource_alloc+0xfd/0x1e0
> ttm_bo_mem_space+0x255/0x940
> ttm_bo_validate+0x286/0x650
> ttm_bo_init_reserved+0x34c/0x490
> amdgpu_bo_create+0x94b/0x1610
> amdgpu_bo_create_user+0xa3/0x130
> amdgpu_gem_create_ioctl+0x4bc/0xc10
> drm_ioctl_kernel+0x300/0x410
> drm_ioctl+0x648/0xb30
> amdgpu_drm_ioctl+0xc8/0x160
>
> Freed by task 219:
> kfree+0x111/0x2d0
> ttm_resource_free+0x17e/0x1e0
> ttm_bo_move_accel_cleanup+0x77e/0x9b0
> amdgpu_move_blit+0x3db/0x670
> amdgpu_bo_move+0xfa2/0x1550
> ttm_bo_handle_move_mem+0x4d0/0x700
> ttm_mem_evict_first+0x945/0x1230
> ttm_bo_mem_space+0x6c7/0x940
> ttm_bo_validate+0x286/0x650
> ttm_bo_init_reserved+0x34c/0x490
> amdgpu_bo_create+0x94b/0x1610
> amdgpu_bo_create_user+0xa3/0x130
> amdgpu_gem_create_ioctl+0x4bc/0xc10
> drm_ioctl_kernel+0x300/0x410
> drm_ioctl+0x648/0xb30
> amdgpu_drm_ioctl+0xc8/0x160
>
> The buggy address belongs to the object at ffff88800f5bee70
> which belongs to the cache kmalloc-96 of size 96
> The buggy address is located 16 bytes inside of
> freed 96-byte region [ffff88800f5bee70, ffff88800f5beed0)
>
> Should you need any more information, please do not hesitate to contact us.
>
> Best regards,
> Joonkyo Jung
>
> Reporting a null-ptr-deref in amdgpu.eml
>
> Subject:
> Reporting a null-ptr-deref in amdgpu
>
> From:
> Joonkyo Jung <joonkyoj at yonsei.ac.kr>
>
> Date:
> 2024-02-16, 04:20
>
> To:
> alexander.deucher at amd.com, christian.koenig at amd.com, Xinhui.Pan at amd.com
>
> CC:
> Dokyung Song <dokyungs at yonsei.ac.kr>, jisoo.jang at yonsei.ac.kr, yw9865 at yonsei.ac.kr, amd-gfx at lists.freedesktop.org
>
> Hello,
>
> We would like to report a null-ptr-deref bug in the AMDGPU DRM driver in
> the Linux kernel v6.8-rc4 that we found with our customized Syzkaller.
> The bug can be triggered by sending two ioctls to the AMDGPU DRM driver in succession.
>
> The first ioctl, amdgpu_ctx_ioctl, creates a context and returns ctx_id = 1 to userspace.
> The second ioctl, which can be any ioctl that eventually calls
> amdgpu_ctx_get_entity, carries this ctx_id and passes the context check.
> Here we use drm_amdgpu_wait_cs as an example.
> The validations in amdgpu_ctx_get_entity can also be passed with user-provided values from the ioctl arguments.
> This eventually leads to drm_sched_entity_init, where the null-ptr-deref
> triggers on RCU_INIT_POINTER(entity->last_scheduled, NULL);
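
As a generic illustration of this class of bug (a user-supplied index that is in range but selects an empty slot), here is a small user-space sketch. The table, its size and all names are hypothetical and do not mirror the driver's data structures. The root cause in the driver is not established by this sketch.

```c
#include <errno.h>
#include <stddef.h>

#define NUM_IP_TYPES 10  /* hypothetical table size, not the driver's constant */

struct sched { int ready; };

/* Hypothetical per-IP table; a slot stays NULL when nothing backs that IP type. */
static struct sched *sched_table[NUM_IP_TYPES];

/* Looks up a scheduler by user-supplied IP type; returns 0 or a negative errno. */
static int get_sched(unsigned int ip_type, struct sched **out)
{
    if (ip_type >= NUM_IP_TYPES)
        return -EINVAL;              /* range check on the user value */
    if (sched_table[ip_type] == NULL)
        return -ENOENT;              /* in range, but the slot is empty */
    *out = sched_table[ip_type];
    return 0;
}
```

A range check alone lets the last valid index through even when its slot is empty. The second check is what keeps a later dereference from faulting.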
>
> Steps to reproduce are as below.
> union drm_amdgpu_ctx *arg1;
> union drm_amdgpu_wait_cs *arg2;
>
> arg1 = malloc(sizeof(union drm_amdgpu_ctx));
> arg2 = malloc(sizeof(union drm_amdgpu_wait_cs));
>
> arg1->in.op = 0x1; /* AMDGPU_CTX_OP_ALLOC_CTX */
> ioctl(AMDGPU_renderD128_DEVICE_FILE, 0x140106442, arg1);
>
> arg2->in.handle = 0x0;
> arg2->in.timeout = 0x2000000000000;
> arg2->in.ip_type = 0x9;
> arg2->in.ip_instance = 0x0;
> arg2->in.ring = 0x0;
> arg2->in.ctx_id = 0x1;
> ioctl(AMDGPU_renderD128_DEVICE_FILE, 0xc0206449, arg2);
>
> The KASAN report is as follows:
> general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
> Call Trace:
> <TASK>
> ? drm_sched_entity_init+0x16e/0x650
> ? drm_sched_entity_init+0x208/0x650
> amdgpu_ctx_get_entity+0x944/0xc30
> amdgpu_cs_wait_ioctl+0x13d/0x3f0
> drm_ioctl_kernel+0x300/0x410
> drm_ioctl+0x648/0xb30
> amdgpu_drm_ioctl+0xc8/0x160
> </TASK>
>
> Should you need any more information, please do not hesitate to contact us.
>
> Best regards,
> Joonkyo Jung
>
> Reporting a use-after-free in amdgpu.eml
>
> Subject:
> Reporting a use-after-free in amdgpu
>
> From:
> 정준교 <joonkyoj at yonsei.ac.kr>
>
> Date:
> 2024-02-14, 21:34
>
> To:
> alexander.deucher at amd.com, christian.koenig at amd.com, Xinhui.Pan at amd.com
>
> CC:
> amd-gfx at lists.freedesktop.org, Dokyung Song <dokyungs at yonsei.ac.kr>, jisoo.jang at yonsei.ac.kr, yw9865 at yonsei.ac.kr
>
> Hello,
>
> We would like to report a use-after-free bug in the AMDGPU DRM driver in
> the Linux kernel 6.2 that we found with our customized Syzkaller.
> The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl to the AMDGPU DRM driver, with invalid addr and size.
> We have tested that this bug can still be triggered in the latest RC kernel, v6.8-rc4.
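
One property of the addr and size values used in the reproducer is that their sum wraps around 64 bits. The sketch below shows an overflow-aware range check in plain C. The function name is hypothetical, the builtin is a GCC/Clang extension, and whether this wrap-around is the actual root cause is not established here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if [addr, addr + size) is non-empty and addr + size fits in
 * 64 bits. Conservative: a range ending exactly at 2^64 is also rejected.
 */
static bool range_is_sane(uint64_t addr, uint64_t size)
{
    uint64_t end;

    if (size == 0)
        return false;
    /* GCC/Clang builtin: returns true when the addition overflows. */
    if (__builtin_add_overflow(addr, size, &end))
        return false;
    return true;
}
```

With addr = 0xffffffffffff0000 and size = 0x80000000, the check rejects the range because the 64-bit sum wraps.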
>
> Steps to reproduce are as below.
>
> struct drm_amdgpu_gem_userptr *arg;
> arg = malloc(sizeof(struct drm_amdgpu_gem_userptr));
> arg->addr = 0xffffffffffff0000;
> arg->size = 0x80000000;
> arg->flags = 0x7; /* READONLY | ANONONLY | VALIDATE */
> ioctl(AMDGPU_renderD128_DEVICE_FILE, 0xc1186451, arg);
>
> The KASAN report is as follows:
> ==================================================================
> BUG: KASAN: use-after-free in switch_mm_irqs_off+0x89d/0xb70
> Read of size 8 at addr ffff88801f17bc00 by task syz-executor/386
> Call Trace:
> <TASK>
> switch_mm_irqs_off+0x89d/0xb70
> __schedule+0xa62/0x2630
> preempt_schedule_common+0x45/0xd0
> vfree+0x4d/0x60
> ttm_tt_fini+0xdf/0x1c0
> amdgpu_ttm_backend_destroy+0x9f/0xe0
> ttm_bo_cleanup_memtype_use+0x142/0x1f0
> ttm_bo_release+0x67d/0xc00
> ttm_bo_put+0x7c/0xa0
> amdgpu_bo_unref+0x3b/0x80
> amdgpu_gem_object_free+0x7f/0xc0
> drm_gem_object_free+0x5d/0x90
> amdgpu_gem_userptr_ioctl+0x452/0x7e0
> drm_ioctl_kernel+0x284/0x500
> drm_ioctl+0x55e/0xa50
> amdgpu_drm_ioctl+0xe3/0x1d0
> </TASK>
>
> Allocated by task 385:
> kmem_cache_alloc+0x174/0x300
> copy_process+0x32d1/0x6640
> kernel_clone+0xcd/0x690
>
> Freed by task 386:
> kmem_cache_free+0x13b/0x550
> mmu_interval_notifier_remove+0x4c8/0x610
> amdgpu_hmm_unregister+0x47/0x90
> amdgpu_gem_object_free+0x75/0xc0
> drm_gem_object_free+0x5d/0x90
> amdgpu_gem_userptr_ioctl+0x452/0x7e0
> drm_ioctl_kernel+0x284/0x500
> drm_ioctl+0x55e/0xa50
> amdgpu_drm_ioctl+0xe3/0x1d0
>
> The buggy address belongs to the object at ffff88801f17bb80
> which belongs to the cache mm_struct of size 2016
> The buggy address is located 128 bytes inside of
> 2016-byte region [ffff88801f17bb80, ffff88801f17c360)
>
> The buggy address belongs to the physical page:
> page:000000002c2a61bd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1f178
> head:000000002c2a61bd order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0
> memcg:ffff8880141aa301
> flags: 0x100000000010200(slab|head|node=0|zone=1)
> raw: 0100000000010200 ffff88800a44fc80 ffffea00006ca400 dead000000000004
> raw: 0000000000000000 00000000800f000f 00000001ffffffff ffff8880141aa301
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff88801f17bb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff88801f17bb80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff88801f17bc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff88801f17bc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88801f17bd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>
>