[PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap
Christian König
christian.koenig at amd.com
Fri May 16 08:36:11 UTC 2025
On 5/16/25 09:40, wangtao wrote:
>
>
>> -----Original Message-----
>> From: Christian König <christian.koenig at amd.com>
>> Sent: Thursday, May 15, 2025 10:26 PM
>> To: wangtao <tao.wangtao at honor.com>; sumit.semwal at linaro.org;
>> benjamin.gaignard at collabora.com; Brian.Starkey at arm.com;
>> jstultz at google.com; tjmercier at google.com
>> Cc: linux-media at vger.kernel.org; dri-devel at lists.freedesktop.org; linaro-
>> mm-sig at lists.linaro.org; linux-kernel at vger.kernel.org;
>> wangbintian(BintianWang) <bintian.wang at honor.com>; yipengxiang
>> <yipengxiang at honor.com>; liulu 00013167 <liulu.liu at honor.com>; hanfeng
>> 00012985 <feng.han at honor.com>
>> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
>> DMA_BUF_IOCTL_RW_FILE for system_heap
>>
>> On 5/15/25 16:03, wangtao wrote:
>>> [wangtao] My Test Configuration (CPU 1GHz, 5-test average):
>>> Allocation: 32x32MB buffer creation
>>> - dmabuf 53ms vs. udmabuf 694ms (10X slower)
>>> - Note: shmem shows excessive allocation time
>>
>> Yeah, that is something already noted by others as well. But that is
>> orthogonal.
>>
>>>
>>> Read 1024MB File:
>>> - dmabuf direct 326ms vs. udmabuf direct 461ms (40% slower)
>>> - Note: pin_user_pages_fast consumes majority CPU cycles
>>>
>>> Key function call timing: See details below.
>>
>> Those aren't valid; you are comparing different functionalities here.
>>
>> Please try using udmabuf with sendfile() as confirmed to be working by T.J.
> [wangtao] Using buffer IO for dmabuf file read/write requires one memory copy.
> Direct IO removes this copy and enables zero-copy. The sendfile system call
> reduces memory copies from two (read/write) to one. With udmabuf, however,
> sendfile still keeps at least one copy, so zero-copy is not achieved.
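>
> For reference, a minimal sketch of the udmabuf direct-read path compared
> below (not the proposed dmabuf ioctl); the file name is an assumption and
> O_DIRECT requires block-aligned length and offset:
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> /* Read len bytes with O_DIRECT straight into the udmabuf's pages via
>  * its mmap, so the page-cache copy made by buffer IO is skipped. */
> static ssize_t direct_read_into_udmabuf(int udmabuf_fd, size_t len)
> {
>         int file_fd = open("data.bin", O_RDONLY | O_DIRECT);
>         void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
>                          MAP_SHARED, udmabuf_fd, 0);
>         ssize_t n = pread(file_fd, buf, len, 0);
>
>         munmap(buf, len);
>         close(file_fd);
>         return n;
> }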
Then please work on fixing this.
Regards,
Christian.
>
> If udmabuf sendfile uses buffer IO (the file page cache), read latency
> matches dmabuf buffer read, but allocation takes much longer.
> With Direct IO, sendfile's internal pipe defaults to 16 pages, which makes
> it slower than buffer IO.
>
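> (For illustration only, and not what was measured here: one way around the
> 16-page default is to splice() through an explicitly enlarged pipe instead
> of calling sendfile(); F_SETPIPE_SZ is capped by /proc/sys/fs/pipe-max-size.)
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <unistd.h>
>
> /* Copy len bytes from back_fd into memfd through a pipe grown to 1MB,
>  * instead of sendfile()'s internal 16-page pipe. Short writes and
>  * error handling are ignored for brevity. */
> static int splice_copy(int back_fd, int memfd, size_t len)
> {
>         int p[2];
>
>         if (pipe(p))
>                 return -1;
>         fcntl(p[1], F_SETPIPE_SZ, 1024 * 1024);
>         while (len) {
>                 ssize_t n = splice(back_fd, NULL, p[1], NULL, len, 0);
>
>                 if (n <= 0)
>                         break;
>                 splice(p[0], NULL, memfd, NULL, n, 0);
>                 len -= n;
>         }
>         close(p[0]);
>         close(p[1]);
>         return len ? -1 : 0;
> }
>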
> Test data shows:
> udmabuf direct read is much faster than udmabuf sendfile.
> dmabuf direct read outperforms udmabuf direct read by a large margin.
>
> Issue: after a udmabuf has been mapped via map_dma_buf, an app that still
> uses the memfd or udmabuf for Direct IO can cause errors, yet there are no
> safeguards to prevent this.
>
> Allocate 32x32MB buffers and read a 1024MB file:
> Metric                  | alloc (ms) | read (ms) | total (ms)
> ------------------------|------------|-----------|-----------
> udmabuf buffer read     |        539 |      2017 |       2555
> udmabuf direct read     |        522 |       658 |       1179
> udmabuf buffer sendfile |        505 |      1040 |       1546
> udmabuf direct sendfile |        510 |      2269 |       2780
> dmabuf buffer read      |         51 |      1068 |       1118
> dmabuf direct read      |         52 |       297 |        349
>
> udmabuf sendfile test steps (a rough C sketch follows the list):
> 1. Open the data file (1024MB), get back_fd
> 2. Create memfd (32MB)            # loop steps 2-6
> 3. Allocate udmabuf with the memfd
> 4. Call sendfile(memfd, back_fd)
> 5. Close memfd after sendfile
> 6. Close udmabuf
> 7. Close back_fd
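>
> A minimal C sketch of that loop, assuming a back file named data.bin and
> the default udmabuf size limit (error handling and timing omitted):
>
> /* Hypothetical test harness for steps 1-7 above; the file name, sizes
>  * and loop count are assumptions, not the exact harness used. */
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <sys/sendfile.h>
> #include <unistd.h>
> #include <linux/udmabuf.h>
>
> #define CHUNK (32UL << 20)                      /* 32MB per buffer  */
> #define TOTAL (1024UL << 20)                    /* 1024MB data file */
>
> int main(void)
> {
>         int back_fd = open("data.bin", O_RDONLY);             /* 1 */
>         int dev_fd = open("/dev/udmabuf", O_RDWR);
>         off_t off = 0;
>
>         for (unsigned long i = 0; i < TOTAL / CHUNK; i++) {
>                 int memfd = memfd_create("chunk", MFD_ALLOW_SEALING); /* 2 */
>                 ftruncate(memfd, CHUNK);
>                 fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);
>
>                 struct udmabuf_create create = {              /* 3 */
>                         .memfd  = memfd,
>                         .offset = 0,
>                         .size   = CHUNK,
>                 };
>                 int ubuf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);
>
>                 sendfile(memfd, back_fd, &off, CHUNK);        /* 4 */
>
>                 close(memfd);                                 /* 5 */
>                 close(ubuf_fd);                               /* 6 */
>         }
>         close(dev_fd);
>         close(back_fd);                                       /* 7 */
>         return 0;
> }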
>
>>
>> Regards,
>> Christian.
>