[PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap
wangtao
tao.wangtao at honor.com
Fri May 16 07:40:25 UTC 2025
> -----Original Message-----
> From: Christian König <christian.koenig at amd.com>
> Sent: Thursday, May 15, 2025 10:26 PM
> To: wangtao <tao.wangtao at honor.com>; sumit.semwal at linaro.org;
> benjamin.gaignard at collabora.com; Brian.Starkey at arm.com;
> jstultz at google.com; tjmercier at google.com
> Cc: linux-media at vger.kernel.org; dri-devel at lists.freedesktop.org; linaro-
> mm-sig at lists.linaro.org; linux-kernel at vger.kernel.org;
> wangbintian(BintianWang) <bintian.wang at honor.com>; yipengxiang
> <yipengxiang at honor.com>; liulu 00013167 <liulu.liu at honor.com>; hanfeng
> 00012985 <feng.han at honor.com>
> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
> DMA_BUF_IOCTL_RW_FILE for system_heap
>
> On 5/15/25 16:03, wangtao wrote:
> > [wangtao] My Test Configuration (CPU 1GHz, 5-test average):
> > Allocation: 32x32MB buffer creation
> > - dmabuf 53ms vs. udmabuf 694ms (10X slower)
> > - Note: shmem shows excessive allocation time
>
> Yeah, that is something already noted by others as well. But that is
> orthogonal.
>
> >
> > Read 1024MB File:
> > - dmabuf direct 326ms vs. udmabuf direct 461ms (40% slower)
> > - Note: pin_user_pages_fast consumes majority CPU cycles
> >
> > Key function call timing: See details below.
>
> Those aren't valid, you are comparing different functionalities here.
>
> Please try using udmabuf with sendfile() as confirmed to be working by T.J.
[wangtao] With buffered IO, a dmabuf file read/write needs one memory copy;
Direct IO removes that copy and achieves zero-copy. sendfile() cuts the copies
from two (read + write) to one, but with udmabuf it still keeps at least one
copy, so it is not zero-copy.
If udmabuf sendfile goes through buffered IO (the file page cache), its read
latency matches a dmabuf buffered read, but the allocation time is much longer.
With Direct IO, sendfile is even slower than buffered IO, because the data is
spliced through a pipe whose default size is only 16 pages.
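To make the comparison concrete, here is a minimal sketch of what the
"udmabuf direct read" case does: the memfd behind the udmabuf is mapped and
the file is read into that mapping with O_DIRECT, so the CPU copy disappears
but every read must pin the destination pages (which is where
pin_user_pages_fast shows up in the profile). File name, sizing and error
handling are simplified here.

/* Minimal sketch of the "udmabuf direct read" path (simplified, no error
 * handling): file data lands directly in the memfd pages behind the
 * udmabuf, avoiding the CPU copy at the cost of pinning the pages.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

#define BUF_SZ (32UL << 20)	/* 32MB, page aligned as udmabuf requires */

static int udmabuf_direct_read(const char *path)
{
	int back_fd = open(path, O_RDONLY | O_DIRECT);
	int memfd = memfd_create("udmabuf-src", MFD_ALLOW_SEALING);
	int dev = open("/dev/udmabuf", O_RDWR);
	struct udmabuf_create create = { .memfd = memfd, .size = BUF_SZ };
	int dmabuf_fd;
	void *p;

	ftruncate(memfd, BUF_SZ);
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);	/* udmabuf requires this seal */
	dmabuf_fd = ioctl(dev, UDMABUF_CREATE, &create);

	/* Direct IO into the mapped memfd: no CPU copy, but the read has to
	 * pin the destination pages (pin_user_pages_fast) on every call. */
	p = mmap(NULL, BUF_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
	pread(back_fd, p, BUF_SZ, 0);

	munmap(p, BUF_SZ);
	close(dev);
	close(memfd);
	close(back_fd);
	return dmabuf_fd;	/* buffer stays alive through the dmabuf fd */
}

The dmabuf direct read rows below go through the proposed
DMA_BUF_IOCTL_RW_FILE instead, so no user mapping has to be pinned per read.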
The test data below shows that udmabuf direct read is much faster than
udmabuf sendfile, and that dmabuf direct read outperforms udmabuf direct read
by a large margin.
Issue: after a udmabuf has been mapped via map_dma_buf, an application that
uses the memfd or the udmabuf for Direct IO may cause errors, yet there is no
safeguard in place to prevent this.
Test: allocate 32x32MB buffers and read a 1024MB file:
Metric                  | alloc (ms) | read (ms) | total (ms)
------------------------|------------|-----------|-----------
udmabuf buffer read     |        539 |      2017 |       2555
udmabuf direct read     |        522 |       658 |       1179
udmabuf buffer sendfile |        505 |      1040 |       1546
udmabuf direct sendfile |        510 |      2269 |       2780
dmabuf buffer read      |         51 |      1068 |       1118
dmabuf direct read      |         52 |       297 |        349
udmabuf sendfile test steps (a rough C sketch follows the list):
1. Open the data file (1024MB), get back_fd
2. Create a memfd (32MB)            # loop steps 2-6
3. Allocate a udmabuf from the memfd
4. Call sendfile(memfd, back_fd)
5. Close the memfd after sendfile
6. Close the udmabuf
7. Close back_fd
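For clarity, a rough sketch of that loop (error handling and timing are
omitted; "data.bin" is a placeholder name, and the direct-IO variant simply
opens back_fd with O_DIRECT):

/* Rough sketch of the udmabuf sendfile test steps 1-7 above
 * (buffered variant). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <unistd.h>
#include <linux/udmabuf.h>

#define CHUNK_SZ  (32UL << 20)		/* 32MB per memfd/udmabuf */
#define NR_CHUNKS 32			/* 32 x 32MB = 1024MB */

int main(void)
{
	int back_fd = open("data.bin", O_RDONLY);	/* step 1 */
	int dev = open("/dev/udmabuf", O_RDWR);
	off_t off = 0;

	for (int i = 0; i < NR_CHUNKS; i++) {
		int memfd = memfd_create("chunk", MFD_ALLOW_SEALING); /* step 2 */
		struct udmabuf_create create = {
			.memfd = memfd, .size = CHUNK_SZ,
		};
		int dmabuf_fd;

		ftruncate(memfd, CHUNK_SZ);
		fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);
		dmabuf_fd = ioctl(dev, UDMABUF_CREATE, &create);      /* step 3 */

		sendfile(memfd, back_fd, &off, CHUNK_SZ);	      /* step 4 */

		close(memfd);					      /* step 5 */
		close(dmabuf_fd);				      /* step 6 */
	}

	close(dev);
	close(back_fd);						      /* step 7 */
	return 0;
}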
>
> Regards,
> Christian.