[PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap
wangtao
tao.wangtao at honor.com
Tue May 20 04:06:08 UTC 2025
> -----Original Message-----
> From: wangtao
> Sent: Monday, May 19, 2025 8:04 PM
> To: 'T.J. Mercier' <tjmercier at google.com>; Christian König
> <christian.koenig at amd.com>
> Cc: sumit.semwal at linaro.org; benjamin.gaignard at collabora.com;
> Brian.Starkey at arm.com; jstultz at google.com; linux-media at vger.kernel.org;
> dri-devel at lists.freedesktop.org; linaro-mm-sig at lists.linaro.org; linux-
> kernel at vger.kernel.org; wangbintian(BintianWang)
> <bintian.wang at honor.com>; yipengxiang <yipengxiang at honor.com>; liulu
> 00013167 <liulu.liu at honor.com>; hanfeng 00012985 <feng.han at honor.com>
> Subject: RE: [PATCH 2/2] dmabuf/heaps: implement
> DMA_BUF_IOCTL_RW_FILE for system_heap
>
>
>
> > -----Original Message-----
> > From: T.J. Mercier <tjmercier at google.com>
> > Sent: Saturday, May 17, 2025 2:37 AM
> > To: Christian König <christian.koenig at amd.com>
> > Cc: wangtao <tao.wangtao at honor.com>; sumit.semwal at linaro.org;
> > benjamin.gaignard at collabora.com; Brian.Starkey at arm.com;
> > jstultz at google.com; linux-media at vger.kernel.org; dri-
> > devel at lists.freedesktop.org; linaro-mm-sig at lists.linaro.org; linux-
> > kernel at vger.kernel.org; wangbintian(BintianWang)
> > <bintian.wang at honor.com>; yipengxiang <yipengxiang at honor.com>; liulu
> > 00013167 <liulu.liu at honor.com>; hanfeng 00012985
> <feng.han at honor.com>
> > Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
> DMA_BUF_IOCTL_RW_FILE
> > for system_heap
> >
> > On Fri, May 16, 2025 at 1:36 AM Christian König
> > <christian.koenig at amd.com>
> > wrote:
> > >
> > > On 5/16/25 09:40, wangtao wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Christian König <christian.koenig at amd.com>
> > > >> Sent: Thursday, May 15, 2025 10:26 PM
> > > >> To: wangtao <tao.wangtao at honor.com>; sumit.semwal at linaro.org;
> > > >> benjamin.gaignard at collabora.com; Brian.Starkey at arm.com;
> > > >> jstultz at google.com; tjmercier at google.com
> > > >> Cc: linux-media at vger.kernel.org; dri-devel at lists.freedesktop.org;
> > > >> linaro- mm-sig at lists.linaro.org; linux-kernel at vger.kernel.org;
> > > >> wangbintian(BintianWang) <bintian.wang at honor.com>; yipengxiang
> > > >> <yipengxiang at honor.com>; liulu 00013167 <liulu.liu at honor.com>;
> > > >> hanfeng
> > > >> 00012985 <feng.han at honor.com>
> > > >> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
> > > >> DMA_BUF_IOCTL_RW_FILE for system_heap
> > > >>
> > > >> On 5/15/25 16:03, wangtao wrote:
> > > >>> [wangtao] My Test Configuration (CPU 1GHz, 5-test average):
> > > >>> Allocation: 32x32MB buffer creation
> > > >>> - dmabuf 53ms vs. udmabuf 694ms (10X slower)
> > > >>> - Note: shmem shows excessive allocation time
> > > >>
> > > >> Yeah, that is something already noted by others as well. But that
> > > >> is orthogonal.
> > > >>
> > > >>>
> > > >>> Read 1024MB File:
> > > >>> - dmabuf direct 326ms vs. udmabuf direct 461ms (40% slower)
> > > >>> - Note: pin_user_pages_fast consumes majority CPU cycles
> > > >>>
> > > >>> Key function call timing: See details below.
> > > >>
> > > >> Those aren't valid, you are comparing different functionalities here.
> > > >>
> > > >> Please try using udmabuf with sendfile() as confirmed to be
> > > >> working by T.J.
> > > > [wangtao] Using buffer IO with dmabuf file read/write requires one
> > memory copy.
> > > > Direct IO removes this copy to enable zero-copy. The sendfile
> > > > system call reduces memory copies from two (read/write) to one.
> > > > However, with udmabuf, sendfile still keeps at least one copy,
> > > > failing zero-copy.
> > >
> > >
> > > Then please work on fixing this.
> > >
> > > Regards,
> > > Christian.
> > >
> > >
> > > >
> > > > If udmabuf sendfile uses buffer IO (file page cache), read latency
> > > > matches dmabuf buffer read, but allocation time is much longer.
> > > > With Direct IO, the default 16-page pipe size makes it slower
> > > > than buffer IO.
> > > >
> > > > Test data shows:
> > > > udmabuf direct read is much faster than udmabuf sendfile.
> > > > dmabuf direct read outperforms udmabuf direct read by a large margin.
> > > >
> > > > Issue: After udmabuf is mapped via map_dma_buf, apps using memfd
> > > > or udmabuf for Direct IO might cause errors, but there are no
> > > > safeguards to prevent this.
> > > >
> > > > Allocate 32x32MB buffer and read 1024 MB file Test:
> > > > Metric | alloc (ms) | read (ms) | total (ms)
> > > > -----------------------|------------|-----------|-----------
> > > > udmabuf buffer read | 539 | 2017 | 2555
> > > > udmabuf direct read | 522 | 658 | 1179
> >
> > I can't reproduce the part where udmabuf direct reads are faster than
> > buffered reads. That's the opposite of what I'd expect. Something
> > seems wrong with those buffered reads.
> >
> > > > udmabuf buffer sendfile| 505 | 1040 | 1546
> > > > udmabuf direct sendfile| 510 | 2269 | 2780
> >
> > I can reproduce the 3.5x slower udambuf direct sendfile compared to
> > udmabuf direct read. It's a pretty disappointing result, so it seems
> > like something could be improved there.
> >
> > 1G from ext4 on 6.12.17 | read/sendfile (ms)
> > ------------------------|-------------------
> > udmabuf buffer read | 351
> > udmabuf direct read | 540
> > udmabuf buffer sendfile | 255
> > udmabuf direct sendfile | 1990
> >
> [wangtao] By the way, did you clear the file cache during testing?
> Looking at your data again, buffered read and sendfile are faster than
> direct I/O, which suggests the file cache was not cleared. Without
> clearing the cache, the results are unfair and not a reliable
> reference. On embedded devices it is nearly impossible to keep
> multi-GB files stably cached; if such files could be cached, we might
> as well cache the dmabufs directly and save the time spent creating
> dmabufs and reading the file data.
> You can call posix_fadvise(file_fd, 0, len, POSIX_FADV_DONTNEED) after
> opening the file or before closing it to drop the file cache, so that
> actual file I/O is measured.
>
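
For reference, the cache-drop step can look roughly like this (a
minimal sketch; the helper name and error handling are illustrative,
not code from the patch):

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Drop the page cache for one file so the timed read hits real I/O. */
static int drop_file_cache(const char *path)
{
        struct stat st;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
                return -1;
        if (fstat(fd, &st) == 0) {
                fsync(fd);  /* flush dirty pages so DONTNEED can evict */
                posix_fadvise(fd, 0, st.st_size, POSIX_FADV_DONTNEED);
        }
        close(fd);
        return 0;
}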

[wangtao] Please confirm whether the file cache was cleared during
testing. I reduced the test size from 3GB to 1GB. The results without
cache clearing are broadly in line with yours, but udmabuf buffer read
is still slower than direct read in my tests. Comparative data
(percentages are relative to the udmabuf buffer sendfile row):

Your test reading 1GB (ext4 on 6.12.17):
Method                  | read/sendfile (ms) | vs. (%)
------------------------|--------------------|--------
udmabuf buffer read     | 351                | 138%
udmabuf direct read     | 540                | 212%
udmabuf buffer sendfile | 255                | 100%
udmabuf direct sendfile | 1990               | 780%

My 3.5GHz tests (f2fs):

Without cache clearing:
Method                  | alloc (ms) | read (ms) | vs. (%)
------------------------|------------|-----------|--------
udmabuf buffer read     | 140        | 386       | 310%
udmabuf direct read     | 151        | 326       | 262%
udmabuf buffer sendfile | 136        | 124       | 100%
udmabuf direct sendfile | 132        | 892       | 717%
dmabuf buffer read      | 23         | 154       | 124%
my patch direct read    | 29         | 271       | 218%

With cache clearing:
Method                  | alloc (ms) | read (ms) | vs. (%)
------------------------|------------|-----------|--------
udmabuf buffer read     | 135        | 546       | 180%
udmabuf direct read     | 159        | 300       | 99%
udmabuf buffer sendfile | 134        | 303       | 100%
udmabuf direct sendfile | 141        | 912       | 301%
dmabuf buffer read      | 22         | 362       | 119%
my patch direct read    | 29         | 265       | 87%
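
For clarity, the "udmabuf direct read" rows above follow roughly the
flow below (a simplified sketch rather than the exact test code: the
helper name, buffer size and file name are illustrative, and the 32x
loop, timing and error handling are omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SZ  (32UL << 20)    /* one 32MB buffer of the 32x loop */

static int udmabuf_direct_read(const char *path)
{
        struct udmabuf_create create = { 0 };
        int memfd, devfd, dmabuf_fd, file_fd;
        void *map;

        /* Back the udmabuf with a sealed memfd (the alloc cost above). */
        memfd = memfd_create("udmabuf-bench", MFD_ALLOW_SEALING);
        ftruncate(memfd, BUF_SZ);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        /* Turn the memfd into a dma-buf via /dev/udmabuf. */
        devfd = open("/dev/udmabuf", O_RDWR);
        create.memfd = memfd;
        create.offset = 0;
        create.size = BUF_SZ;
        dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

        /* Map the dma-buf and read the file into it with O_DIRECT, so
         * the data bypasses the page cache; pinning the destination
         * pages (pin_user_pages_fast) is where most CPU time goes. */
        map = mmap(NULL, BUF_SZ, PROT_READ | PROT_WRITE, MAP_SHARED,
                   dmabuf_fd, 0);
        file_fd = open(path, O_RDONLY | O_DIRECT);
        read(file_fd, map, BUF_SZ);

        close(file_fd);
        munmap(map, BUF_SZ);
        close(dmabuf_fd);
        close(devfd);
        close(memfd);
        return 0;
}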

Results without cache clearing are not representative of embedded
mobile devices. Notably, on a low-power 1GHz CPU, even without cache
clearing the sendfile latency exceeds the dmabuf direct I/O read time.

My 1GHz tests:

Without cache clearing:
Method                  | alloc (ms) | read (ms) | vs. (%)
------------------------|------------|-----------|--------
udmabuf buffer read     | 546        | 1745      | 442%
udmabuf direct read     | 511        | 704       | 178%
udmabuf buffer sendfile | 496        | 395       | 100%
udmabuf direct sendfile | 498        | 2332      | 591%
dmabuf buffer read      | 43         | 453       | 115%
my patch direct read    | 49         | 310       | 79%

With cache clearing:
Method                  | alloc (ms) | read (ms) | vs. (%)
------------------------|------------|-----------|--------
udmabuf buffer read     | 552        | 2067      | 198%
udmabuf direct read     | 540        | 627       | 60%
udmabuf buffer sendfile | 497        | 1045      | 100%
udmabuf direct sendfile | 527        | 2330      | 223%
dmabuf buffer read      | 40         | 1111      | 106%
my patch direct read    | 44         | 310       | 30%
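
The "udmabuf sendfile" rows reuse the same memfd/udmabuf setup but pull
the file data in with sendfile() instead of read(); opening the backing
file with O_DIRECT gives the "direct sendfile" case. Again a simplified
sketch with an illustrative helper:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* memfd has already been created, sealed and turned into a udmabuf. */
static int udmabuf_sendfile_read(int memfd, const char *path, size_t len,
                                 int use_direct)
{
        off_t off = 0;
        int back_fd = open(path, use_direct ? O_RDONLY | O_DIRECT
                                            : O_RDONLY);

        if (back_fd < 0)
                return -1;
        /* sendfile() copies into the memfd pages backing the udmabuf,
         * so at least one CPU copy remains; with O_DIRECT it is also
         * throttled by the default 16-page pipe used internally. */
        while (off < (off_t)len)
                if (sendfile(memfd, back_fd, &off, len - off) <= 0)
                        break;
        close(back_fd);
        return 0;
}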

Reducing CPU overhead and power consumption is critical for mobile
devices, so we need simpler and more efficient dmabuf direct I/O
support. Since Christian evaluated the sendfile approach based on your
data, could you confirm whether the cache was cleared? If it was not,
please share the post-cache-clearing numbers. Thank you for your
support.
> >
> > > > dmabuf buffer read | 51 | 1068 | 1118
> > > > dmabuf direct read | 52 | 297 | 349
> > > >
> > > > udmabuf sendfile test steps:
> > > > 1. Open data file (1024MB), get back_fd
> > > > 2. Create memfd (32MB)            # loop steps 2-6
> > > > 3. Allocate udmabuf with memfd
> > > > 4. Call sendfile(memfd, back_fd)
> > > > 5. Close memfd after sendfile
> > > > 6. Close udmabuf
> > > > 7. Close back_fd
> > > >
> > > >>
> > > >> Regards,
> > > >> Christian.
> > > >
> > >