[PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap
Christian König
christian.koenig at amd.com
Thu May 22 11:57:45 UTC 2025
On 5/22/25 10:02, wangtao wrote:
>> -----Original Message-----
>> From: Christian König <christian.koenig at amd.com>
>> Sent: Wednesday, May 21, 2025 7:57 PM
>> To: wangtao <tao.wangtao at honor.com>; T.J. Mercier
>> <tjmercier at google.com>
>> Cc: sumit.semwal at linaro.org; benjamin.gaignard at collabora.com;
>> Brian.Starkey at arm.com; jstultz at google.com; linux-media at vger.kernel.org;
>> dri-devel at lists.freedesktop.org; linaro-mm-sig at lists.linaro.org; linux-
>> kernel at vger.kernel.org; wangbintian(BintianWang)
>> <bintian.wang at honor.com>; yipengxiang <yipengxiang at honor.com>; liulu
>> 00013167 <liulu.liu at honor.com>; hanfeng 00012985 <feng.han at honor.com>;
>> amir73il at gmail.com
>> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
>> DMA_BUF_IOCTL_RW_FILE for system_heap
>>
>> On 5/21/25 12:25, wangtao wrote:
>>> [wangtao] I previously explained that
>>> read/sendfile/splice/copy_file_range
>>> syscalls can't achieve dmabuf direct IO zero-copy.
>>
>> And why can't you work on improving those syscalls instead of creating a new
>> IOCTL?
>>
> [wangtao] As I mentioned in previous emails, these syscalls cannot
> achieve dmabuf zero-copy due to technical constraints.
Yeah, and why can't you work on removing those technical constraints?
What is blocking you from improving the sendfile system call or proposing a patch to remove the copy_file_range restrictions?
Regards,
Christian.
> Could you
> specify the technical points, code, or principles that need
> optimization?
>
> Let me explain again why these syscalls can't work:
> 1. read() syscall
> - dmabuf fops lacks read callback implementation. Even if implemented,
> file_fd info cannot be transferred
> - read(file_fd, dmabuf_ptr, len) with remap_pfn_range-based mmap
> cannot access dmabuf_buf pages, forcing buffer-mode reads
>
> 2. sendfile() syscall
> - Requires CPU copy from page cache to memory file(tmpfs/shmem):
> [DISK] --DMA--> [page cache] --CPU copy--> [MEMORY file]
> - CPU overhead (both buffer/direct modes involve copies):
> 55.08% do_sendfile
> |- 55.08% do_splice_direct
> |-|- 55.08% splice_direct_to_actor
> |-|-|- 22.51% copy_splice_read
> |-|-|-|- 16.57% f2fs_file_read_iter
> |-|-|-|-|- 15.12% __iomap_dio_rw
> |-|-|- 32.33% direct_splice_actor
> |-|-|-|- 32.11% iter_file_splice_write
> |-|-|-|-|- 28.42% vfs_iter_write
> |-|-|-|-|-|- 28.42% do_iter_write
> |-|-|-|-|-|-|- 28.39% shmem_file_write_iter
> |-|-|-|-|-|-|-|- 24.62% generic_perform_write
> |-|-|-|-|-|-|-|-|- 18.75% __pi_memmove
>
> 3. splice() requires one end to be a pipe, incompatible with regular files or dmabuf.
>
> 4. copy_file_range()
> - Blocked by cross-FS restrictions (Amir's commit 868f9f2f8e00)
> - Even without this restriction, implementing
> the copy_file_range callback in dmabuf fops would only allow dmabuf read
> from regular files. This is because copy_file_range relies on
> file_out->f_op->copy_file_range, which cannot support dmabuf write
> operations to regular files.
>
> Test results confirm these limitations:
> T.J. Mercier's 1G from ext4 on 6.12.20, w/ 3 > drop_caches:
> Method                  | read/sendfile (ms)
> ------------------------|-------------------
> udmabuf buffer read     | 1210
> udmabuf direct read     | 671
> udmabuf buffer sendfile | 1096
> udmabuf direct sendfile | 2340
>
> My 3GHz CPU tests (cache cleared):
> Method                  | alloc | read | vs. (%)
> ------------------------|-------|------|--------
> udmabuf buffer read     |   135 |  546 | 180%
> udmabuf direct read     |   159 |  300 |  99%
> udmabuf buffer sendfile |   134 |  303 | 100%
> udmabuf direct sendfile |   141 |  912 | 301%
> dmabuf buffer read      |    22 |  362 | 119%
> my patch direct read    |    29 |  265 |  87%
>
> My 1GHz CPU tests (cache cleared):
> Method                  | alloc | read | vs. (%)
> ------------------------|-------|------|--------
> udmabuf buffer read     |   552 | 2067 | 198%
> udmabuf direct read     |   540 |  627 |  60%
> udmabuf buffer sendfile |   497 | 1045 | 100%
> udmabuf direct sendfile |   527 | 2330 | 223%
> dmabuf buffer read      |    40 | 1111 | 106%
> patch direct read       |    44 |  310 |  30%
>
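
The "vs. (%)" column in the two tables above appears to be each method's read time relative to the "udmabuf buffer sendfile" row, which is the 100% entry in both. A minimal sketch that recomputes the column under that assumption follows; the read values are copied from the tables, while the function and variable names are hypothetical.

```python
# Read times copied from the 3GHz and 1GHz tables above.
READ_3GHZ = {
    "udmabuf buffer read": 546,
    "udmabuf direct read": 300,
    "udmabuf buffer sendfile": 303,
    "udmabuf direct sendfile": 912,
    "dmabuf buffer read": 362,
    "patch direct read": 265,
}
READ_1GHZ = {
    "udmabuf buffer read": 2067,
    "udmabuf direct read": 627,
    "udmabuf buffer sendfile": 1045,
    "udmabuf direct sendfile": 2330,
    "dmabuf buffer read": 1111,
    "patch direct read": 310,
}

# Assumption: the row reported as 100% is the baseline.
BASELINE = "udmabuf buffer sendfile"


def vs_column(reads):
    """Return each method's read time as a rounded percentage of the baseline."""
    base = reads[BASELINE]
    return {method: round(100 * t / base) for method, t in reads.items()}
```

Under this reading, the reported figures for the patch (87% at 3GHz, 30% at 1GHz) follow directly from 265/303 and 310/1045.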
> Test observations align with expectations:
> 1. dmabuf buffer read requires slow CPU copies
> 2. udmabuf direct read achieves zero-copy but has page retrieval
> latency from vaddr
> 3. udmabuf buffer sendfile suffers CPU copy overhead
> 4. udmabuf direct sendfile combines CPU copies with frequent DMA
> operations due to small pipe buffers
> 5. dmabuf buffer read also requires CPU copies
> 6. My direct read patch enables zero-copy with better performance
> on low-power CPUs
> 7. udmabuf creation time remains problematic (as you’ve noted).
>
>>> My focus is enabling dmabuf direct I/O for [regular file] <--DMA-->
>>> [dmabuf] zero-copy.
>>
>> Yeah and that focus is wrong. You need to work on a general solution to the
>> issue and not specific to your problem.
>>
>>> Any API achieving this would work. Are there other uAPIs you think
>>> could help? Could you recommend experts who might offer suggestions?
>>
>> Well once more: Either work on sendfile or copy_file_range or eventually
>> splice to make it what you want to do.
>>
>> When that is done we can discuss with the VFS people if that approach is
>> feasible.
>>
>> But just bypassing the VFS review by implementing a DMA-buf specific IOCTL
>> is a NO-GO. That is clearly not something you can do in any way.
> [wangtao] The issue is that only dmabuf lacks Direct I/O zero-copy support. Tmpfs/shmem
> already work with Direct I/O zero-copy. As explained, existing syscalls or
> generic methods can't enable dmabuf direct I/O zero-copy, which is why I
> propose adding an IOCTL command.
>
> I respect your perspective. Could you clarify specific technical aspects,
> code requirements, or implementation principles for modifying sendfile()
> or copy_file_range()? This would help advance our discussion.
>
> Thank you for engaging in this dialogue.
>
>>
>> Regards,
>> Christian.