[PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap

Christian König christian.koenig at amd.com
Thu May 22 11:57:45 UTC 2025


On 5/22/25 10:02, wangtao wrote:
>> -----Original Message-----
>> From: Christian König <christian.koenig at amd.com>
>> Sent: Wednesday, May 21, 2025 7:57 PM
>> To: wangtao <tao.wangtao at honor.com>; T.J. Mercier
>> <tjmercier at google.com>
>> Cc: sumit.semwal at linaro.org; benjamin.gaignard at collabora.com;
>> Brian.Starkey at arm.com; jstultz at google.com; linux-media at vger.kernel.org;
>> dri-devel at lists.freedesktop.org; linaro-mm-sig at lists.linaro.org; linux-
>> kernel at vger.kernel.org; wangbintian(BintianWang)
>> <bintian.wang at honor.com>; yipengxiang <yipengxiang at honor.com>; liulu
>> 00013167 <liulu.liu at honor.com>; hanfeng 00012985 <feng.han at honor.com>;
>> amir73il at gmail.com
>> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
>> DMA_BUF_IOCTL_RW_FILE for system_heap
>>
>> On 5/21/25 12:25, wangtao wrote:
>>> [wangtao] I previously explained that
>>> read/sendfile/splice/copy_file_range
>>> syscalls can't achieve dmabuf direct IO zero-copy.
>>
>> And why can't you work on improving those syscalls instead of creating a new
>> IOCTL?
>>
> [wangtao] As I mentioned in previous emails, these syscalls cannot
> achieve dmabuf zero-copy due to technical constraints.

Yeah, and why can't you work on removing those technical constraints?

What is blocking you from improving the sendfile system call or proposing a patch to remove the copy_file_range restrictions?

Regards,
Christian.

> Could you
> specify the technical points, code, or principles that need
> optimization? 
> 
> Let me explain again why these syscalls can't work:
> 1. read() syscall
>    - dmabuf fops lacks read callback implementation. Even if implemented,
>      file_fd info cannot be transferred
>    - read(file_fd, dmabuf_ptr, len) with remap_pfn_range-based mmap
>      cannot access dmabuf_buf pages, forcing buffer-mode reads
> 
> 2. sendfile() syscall
>    - Requires CPU copy from page cache to memory file(tmpfs/shmem):
>      [DISK] --DMA--> [page cache] --CPU copy--> [MEMORY file]
>    - CPU overhead (both buffer/direct modes involve copies):
>      55.08% do_sendfile
>     |- 55.08% do_splice_direct
>     |-|- 55.08% splice_direct_to_actor
>     |-|-|- 22.51% copy_splice_read
>     |-|-|-|- 16.57% f2fs_file_read_iter
>     |-|-|-|-|- 15.12% __iomap_dio_rw
>     |-|-|- 32.33% direct_splice_actor
>     |-|-|-|- 32.11% iter_file_splice_write
>     |-|-|-|-|- 28.42% vfs_iter_write
>     |-|-|-|-|-|- 28.42% do_iter_write
>     |-|-|-|-|-|-|- 28.39% shmem_file_write_iter
>     |-|-|-|-|-|-|-|- 24.62% generic_perform_write
>     |-|-|-|-|-|-|-|-|- 18.75% __pi_memmove
> 
> 3. splice() requires one end to be a pipe, incompatible with regular files or dmabuf.
> 
> 4. copy_file_range()
>    - Blocked by cross-FS restrictions (Amir's commit 868f9f2f8e00)
>    - Even without this restriction, implementing the copy_file_range
>      callback in dmabuf fops would only allow dmabuf reads from regular
>      files. This is because copy_file_range relies on
>      file_out->f_op->copy_file_range, which cannot support dmabuf write
>      operations to regular files.
> 
> Test results confirm these limitations:
> T.J. Mercier's results, 1G from ext4 on 6.12.20, with 3 > drop_caches:
> Method                  | read/sendfile (ms)
> ------------------------|-------------------
> udmabuf buffer read     | 1210
> udmabuf direct read     | 671
> udmabuf buffer sendfile | 1096
> udmabuf direct sendfile | 2340
> 
> My 3GHz CPU tests (cache cleared):
> Method                  | alloc | read  | vs. (%)
> ------------------------------------------------
> udmabuf buffer read     | 135   | 546   | 180%
> udmabuf direct read     | 159   | 300   | 99%
> udmabuf buffer sendfile | 134   | 303   | 100%
> udmabuf direct sendfile | 141   | 912   | 301%
> dmabuf buffer read      | 22    | 362   | 119%
> my patch direct read    | 29    | 265   | 87%
> 
> My 1GHz CPU tests (cache cleared):
> Method                  | alloc | read  | vs. (%)
> ------------------------------------------------
> udmabuf buffer read     | 552   | 2067  | 198%
> udmabuf direct read     | 540   | 627   | 60%
> udmabuf buffer sendfile | 497   | 1045  | 100%
> udmabuf direct sendfile | 527   | 2330  | 223%
> dmabuf buffer read      | 40    | 1111  | 106%
> patch direct read       | 44    | 310   | 30%
> 
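The "vs. (%)" column in the two tables above is not defined in the mail. It appears to be each read time divided by the "udmabuf buffer sendfile" read time (the 100% row); that is an inference from the numbers, not something the author states. A small sketch under that assumption, with the read values copied from the tables and a made-up function name:

```python
def relative_to_baseline(read_times, baseline="udmabuf buffer sendfile"):
    """Express each read time as a rounded percentage of the baseline row.

    The baseline choice is an assumption inferred from the 100% row.
    """
    base = read_times[baseline]
    return {name: round(100 * t / base) for name, t in read_times.items()}


# "read" column of the 3GHz table
cpu_3ghz = {
    "udmabuf buffer read": 546,
    "udmabuf direct read": 300,
    "udmabuf buffer sendfile": 303,
    "udmabuf direct sendfile": 912,
    "dmabuf buffer read": 362,
    "my patch direct read": 265,
}

# "read" column of the 1GHz table
cpu_1ghz = {
    "udmabuf buffer read": 2067,
    "udmabuf direct read": 627,
    "udmabuf buffer sendfile": 1045,
    "udmabuf direct sendfile": 2330,
    "dmabuf buffer read": 1111,
    "patch direct read": 310,
}

for label, table in (("3GHz", cpu_3ghz), ("1GHz", cpu_1ghz)):
    for name, pct in relative_to_baseline(table).items():
        print(f"{label} {name}: {pct}%")
```

Under this reading the computed percentages match every row of both tables, so the column compares read time only and leaves the alloc column out.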
> Test observations align with expectations:
> 1. udmabuf buffer read requires slow CPU copies
> 2. udmabuf direct read achieves zero-copy but has page retrieval
>    latency from vaddr
> 3. udmabuf buffer sendfile suffers CPU copy overhead
> 4. udmabuf direct sendfile combines CPU copies with frequent DMA
>    operations due to small pipe buffers
> 5. dmabuf buffer read also requires CPU copies
> 6. My direct read patch enables zero-copy with better performance
>    on low-power CPUs
> 7. udmabuf creation time remains problematic (as you’ve noted).
> 
>>> My focus is enabling dmabuf direct I/O for [regular file] <--DMA-->
>>> [dmabuf] zero-copy.
>>
>> Yeah and that focus is wrong. You need to work on a general solution to the
>> issue and not specific to your problem.
>>
>>> Any API achieving this would work. Are there other uAPIs you think
>>> could help? Could you recommend experts who might offer suggestions?
>>
>> Well once more: Either work on sendfile or copy_file_range or eventually
>> splice to make it what you want to do.
>>
>> When that is done we can discuss with the VFS people if that approach is
>> feasible.
>>
>> But just bypassing the VFS review by implementing a DMA-buf specific IOCTL
>> is a NO-GO. That is clearly not something you can do in any way.
> [wangtao] The issue is that only dmabuf lacks Direct I/O zero-copy support. Tmpfs/shmem
> already work with Direct I/O zero-copy. As explained, existing syscalls or
> generic methods can't enable dmabuf direct I/O zero-copy, which is why I
> propose adding an IOCTL command.
> 
> I respect your perspective. Could you clarify specific technical aspects,
> code requirements, or implementation principles for modifying sendfile()
> or copy_file_range()? This would help advance our discussion.
> 
> Thank you for engaging in this dialogue.
> 
>>
>> Regards,
>> Christian.


