[PATCH] dmabuf: fix dmabuf file poll uaf issue

zhiguojiang justinjiang at vivo.com
Fri Apr 12 06:19:50 UTC 2024



On 2024/4/3 2:22, T.J. Mercier wrote:
> On Tue, Apr 2, 2024 at 1:08 AM Christian König <christian.koenig at amd.com> wrote:
>> Am 02.04.24 um 08:49 schrieb zhiguojiang:
>>>> As far as I can see that's not because of the DMA-buf code, but
>>>> because you are somehow using this interface incorrectly.
>>>>
>>>> When dma_buf_poll() is called it is mandatory for the caller to hold
>>>> a reference to the file descriptor on which the poll operation is
>>>> executed.
>>>>
>>>> So adding code like "if (!file_count(file))" in the beginning of
>>>> dma_buf_poll() is certainly broken.
>>>>
>>>> My best guess is that you have some unbalanced
>>>> dma_buf_get()/dma_buf_put() somewhere instead.
>>>>
>>>>
>>> Hi Christian,
>>>
>>> The kernel dma_buf_poll() code should not cause system crashes due to
>>> user-mode usage logic issues, should it?
>> What user mode logical issues are you talking about? Closing a file
>> while polling on it is perfectly valid.
>>
>> dma_buf_poll() is called by the filesystem layer and it's the filesystem
>> layer which should make sure that a file can't be closed while polling
>> for an event.
>>
>> If that doesn't work then you have stumbled over a massive bug in the fs
>> layer. And I have some doubts that this is actually the case.
>>
>> What is more likely is that some driver messes up the reference count
>> and because of this you see an UAF.
>>
>> Anyway as far as I can see the DMA-buf code is correct regarding this.
>>
>> Regards,
>> Christian.
> I tried to reproduce this problem but I couldn't. What I see is a ref
> get taken when poll is first called. So subsequently closing the fd in
> userspace while it's being polled doesn't take the refcount all the
> way to 0. That happens when dma_buf_poll_cb fires, either due to an
> event or when the fd is closed upon timeout.
>
> I don't really see how this could be triggered from userspace so I am
> also suspicious of dma_buf_get/put.
Hi,

Panic signature:

 > list_del corruption, ffffff8a6f143a90->next is LIST_POISON1
 > (dead000000000100)
 > ------------[ cut here ]------------
 > kernel BUG at lib/list_debug.c:55!
 > Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
 > pc : __list_del_entry_valid+0x98/0xd4
 > lr : __list_del_entry_valid+0x98/0xd4
 > sp : ffffffc01d413d00
 > x29: ffffffc01d413d00 x28: 00000000000000c0 x27: 0000000000000020
 > x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000080007
 > x23: ffffff8b22e5dcc0 x22: ffffff88a6be12d0 x21: ffffff8b22e572b0
 > x20: ffffff80254ed0a0 x19: ffffff8a6f143a00 x18: ffffffda5efed3c0
 > x17: 6165642820314e4f x16: 53494f505f545349 x15: 4c20736920747865
 > x14: 6e3e2d3039613334 x13: 2930303130303030 x12: 0000000000000018
 > x11: ffffff8b6c188000 x10: 00000000ffffffff x9 : 6c8413a194897b00
 > x8 : 6c8413a194897b00 x7 : 74707572726f6320 x6 : 6c65645f7473696c
 > x5 : ffffff8b6c3b2a3e x4 : ffffff8b6c3b2a40 x3 : ffff103000001005
 > x2 : 0000000000000001 x1 : 00000000000000c0 x0 : 000000000000004e
 > Call trace:
 >  __list_del_entry_valid+0x98/0xd4
 >  dma_buf_file_release+0x48/0x90
 >  __fput+0xf4/0x280
 >  ____fput+0x10/0x20
 >  task_work_run+0xcc/0xf4
 >  do_notify_resume+0x2a0/0x33c
 >  el0_svc+0x5c/0xa4
 >  el0t_64_sync_handler+0x68/0xb4
 >  el0t_64_sync+0x1a0/0x1a4

The crash happens because the same dma-buf file is released twice within
a short time, so this looks like a race in the dma-buf release path.

The ftrace log below shows how the dma-buf file gets released twice:

Line 715473:        android.display-2220  [006] 22160.660738: bprint:    
__fget_files __fget: [file: 0xffffff8ab9e57b80, dmabuf: 
0xffffff8b1baa6a00, count: 0x3] call:(('do_epoll_ctl', 
356)ffffffd4ad46411c<-('__arm64_sys_epoll_ctl', 
112)ffffffd4ad463dd8<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44<-('el0_svc', 
36)ffffffd4ae57ffcc<-('el0t_64_sync_handler', 136)ffffffd4ae57ff44)

Line 715475:        android.display-2220  [006] 22160.660739: bprint:    
get_file get_file[file: 0xffffff8ab9e57b80, dmabuf: 0xffffff8b1baa6a00, 
f_count: 0x4] call:(('dma_buf_poll', 760)ffffffd4adb685c8<-('ep_insert', 
1120)ffffffd4ad464bcc<-('do_epoll_ctl', 
1196)ffffffd4ad464464<-('__arm64_sys_epoll_ctl', 
112)ffffffd4ad463dd8<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44)

Line 715477:        android.display-2220  [006] 22160.660740: bprint:    
fput_many fput for dma buf file[file: 0xffffff8ab9e57b80, dmabuf: 
0xffffff8b1baa6a00, count: 0x4] call:(('dma_buf_poll', 
1104)ffffffd4adb68720<-('ep_insert', 
1120)ffffffd4ad464bcc<-('do_epoll_ctl', 
1196)ffffffd4ad464464<-('__arm64_sys_epoll_ctl', 
112)ffffffd4ad463dd8<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44)

Line 715479:        android.display-2220  [006] 22160.660741: bprint:    
fput_many fput for dma buf file[file: 0xffffff8ab9e57b80, dmabuf: 
0xffffff8b1baa6a00, count: 0x3] call:(('do_epoll_ctl', 
652)ffffffd4ad464244<-('__arm64_sys_epoll_ctl', 
112)ffffffd4ad463dd8<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44<-('el0_svc', 
36)ffffffd4ae57ffcc<-('el0t_64_sync_handler', 136)ffffffd4ae57ff44)

-> Here task 2220 calls epoll_ctl on the dma_buf file. It takes two
references (__fget, get_file) and drops both, so the get/put calls match.
After this the file refcount is 2.

Line 716521:        RenderThread-3470  [005] 22160.664236: bprint:    
fput_many fput for dma buf file[file: 0xffffff8ab9e57b80, dmabuf: 
0xffffff8b1baa6a00, count: 0x2] call:(('vm_area_free_no_check', 
140)ffffffd4acf5021c<-('__do_munmap', 
1572)ffffffd4ad306eb8<-('__vm_munmap', 
216)ffffffd4ad3095b4<-('__arm64_sys_munmap', 
68)ffffffd4ad3094c4<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44)

Line 716525:        RenderThread-3470  [005] 22160.664243: bprint:    
fput_many fput for dma buf file[file: 0xffffff8ab9e57b80, dmabuf: 
0xffffff8b1baa6a00, count: 0x1] call:(('close_fd', 
376)ffffffd4ad40b404<-('__arm64_sys_close', 
24)ffffffd4ad3bfb1c<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44<-('el0_svc', 
36)ffffffd4ae57ffcc<-('el0t_64_sync_handler', 136)ffffffd4ae57ff44)

Line 716527:        RenderThread-3470  [005] 22160.664244: bprint:    
fput_many fput for dma buf file ret: 0, [file: 0xffffff8ab9e57b80, 
dmabuf: 0xffffff8b1baa6a00] start to free

-> Here task 3470 does munmap and close(fd), which drops the file refcount
to zero, and the file starts to be freed.

Line 716566:        android.display-2220  [006] 22160.664424: bprint:    
get_file get_file[file: 0xffffff8ab9e57b80, dmabuf: 0xffffff8b1baa6a00, 
f_count: 0x1] call:(('dma_buf_poll', 
760)ffffffd4adb685c8<-('do_epoll_wait', 
1020)ffffffd4ad46229c<-('do_epoll_pwait', 
84)ffffffd4ad463b70<-('__arm64_sys_epoll_pwait', 
276)ffffffd4ad463d1c<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44)

Line 716568:        android.display-2220  [006] 22160.664425: bprint:    
fput_many fput for dma buf file[file: 0xffffff8ab9e57b80, dmabuf: 
0xffffff8b1baa6a00, count: 0x1] call:(('dma_buf_poll', 
1104)ffffffd4adb68720<-('do_epoll_wait', 
1020)ffffffd4ad46229c<-('do_epoll_pwait', 
84)ffffffd4ad463b70<-('__arm64_sys_epoll_pwait', 
276)ffffffd4ad463d1c<-('invoke_syscall', 
96)ffffffd4acebe00c<-('el0_svc_common', 
140)ffffffd4acebdf40<-('do_el0_svc', 40)ffffffd4acebde44)

Line 716570:        android.display-2220  [006] 22160.664427: bprint:    
fput_many fput for dma buf file ret: 0, [file: 0xffffff8ab9e57b80, 
dmabuf: 0xffffff8b1baa6a00] start to free

-> Here task 2220 polls again via epoll_pwait. dma_buf_poll internally does
a get/put on the file, so the free starts a second time, which leads to the
final crash.

Here is the basic flow:

1. Thread A installs the dma_buf fd via dma_buf_export; the fd refcount is 1.

2. Thread A adds the fd to an epoll list via epoll_ctl(EPOLL_CTL_ADD).

3. After using the dma-buf, Thread A closes the fd. The fd refcount is now 0,
   so __fput runs to release the file.

4. Thread A does not call epoll_ctl(EPOLL_CTL_DEL) manually, so the file
   still resides in an epoll list. __fput does call eventpoll_release to
   remove the file from the epoll list it is bound to, but there is a small
   time window in which Thread B can jump in.

5. During that window, Thread B polls the dma_buf fd, which does an
   fget/fput on the file. The fd refcount goes 0 -> 1 -> 0, so __fput is
   called again. This leads to __fput running twice for the same file.

6. So the potential fix is to use get_file_rcu, which checks whether the
   file refcount is already zero, meaning the file is being freed. If so,
   we just return without doing the dma_buf_poll.

Here is the race condition:

Thread A                          Thread B
dma_buf_export
fd_refcount is 1
epoll_ctl(EPOLL_ADD)
add dma_buf_fd to epoll list
close(dma_buf_fd)
fd_refcount is 0
__fput
                                  dma_buf_poll
                                  fget
                                  fput
                                  fd_refcount is zero again

Thanks


