[PATCH 0/2] Support direct I/O read and write for memory allocated by dmabuf

Christian König christian.koenig at amd.com
Thu Jul 11 14:25:59 UTC 2024


Am 10.07.24 um 18:34 schrieb T.J. Mercier:
> On Wed, Jul 10, 2024 at 8:08 AM Lei Liu <liulei.rjpt at vivo.com> wrote:
>> on 2024/7/10 22:48, Christian König wrote:
>>> Am 10.07.24 um 16:35 schrieb Lei Liu:
>>>> on 2024/7/10 22:14, Christian König wrote:
>>>>> Am 10.07.24 um 15:57 schrieb Lei Liu:
>>>>>> Use vm_insert_page to establish a mapping for the memory allocated
>>>>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>>>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>>>> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
>>>> Hello! Could you explain why direct_io is disabled on DMABUF? Is
>>>> there any historical reason for this?
>>> It's basically one of the most fundamental design decision of DMA-Buf.
>>> The attachment/map/fence model DMA-buf uses is not really compatible
>>> with direct I/O on the underlying pages.
>> Thank you! Is there any related documentation on this? I would like to
>> understand and learn more about the fundamental reasons for the lack of
>> support.
> Hi Lei and Christian,
>
> This is now the third request I've seen from three different companies
> who are interested in this,

Yeah, completely agree. This is a re-occurring pattern :)

Maybe we should document the preferred solution for that.

> but the others are not for reasons of read
> performance that you mention in the commit message on your first
> patch. Someone else at Google ran a comparison between a normal read()
> and a direct I/O read() into a preallocated user buffer and found that
> with large readahead (16 MB) the throughput can actually be slightly
> higher than direct I/O. If you have concerns about read performance,
> have you tried increasing the readahead size?
>
> The other motivation is to load a gajillion byte file from disk into a
> dmabuf without evicting the entire contents of pagecache while doing
> so. Something like this (which does not currently work because read()
> tries to GUP on the dmabuf memory as you mention):
>
> static int dmabuf_heap_alloc(int heap_fd, size_t len)
> {
>      struct dma_heap_allocation_data data = {
>          .len = len,
>          .fd = 0,
>          .fd_flags = O_RDWR | O_CLOEXEC,
>          .heap_flags = 0,
>      };
>      int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
>      if (ret < 0)
>          return ret;
>      return data.fd;
> }
>
> int main(int, char **argv)
> {
>          const char *file_path = argv[1];
>          printf("File: %s\n", file_path);
>          int file_fd = open(file_path, O_RDONLY | O_DIRECT);
>
>          struct stat st;
>          stat(file_path, &st);
>          ssize_t file_size = st.st_size;
>          ssize_t aligned_size = (file_size + 4095) & ~4095;
>
>          printf("File size: %zd Aligned size: %zd\n", file_size, aligned_size);
>          int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
>          int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);
>
>          void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
> MAP_SHARED, dmabuf_fd, 0);
>          printf("VM at 0x%lx\n", (unsigned long)vm);
>
>          dma_buf_sync sync_flags { DMA_BUF_SYNC_START |
> DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE };
>          ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>
>          ssize_t rc = read(file_fd, vm, file_size);
>          printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");
>
>          sync_flags.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ |
> DMA_BUF_SYNC_WRITE;
>          ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
> }
>
> Or replace the mmap() + read() with sendfile().

Or copy_file_range(). That's pretty much exactly what I suggested on the 
other mail thread around that topic as well.

> So I would also like to see the above code (or something else similar)
> be able to work and I understand some of the reasons why it currently
> does not, but I don't understand why we should actively prevent this
> type of behavior entirely.

+1

Regards,
Christian.

>
> Best,
> T.J.
>
>
>
>
>
>
>
>
>>>>> We already discussed enforcing that in the DMA-buf framework and
>>>>> this patch probably means that we should really do that.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> Thank you for your response. With the application of AI large model
>>>> edgeification, we urgently need support for direct_io on DMABUF to
>>>> read some very large files. Do you have any new solutions or plans
>>>> for this?
>>> We have seen similar projects over the years and all of those turned
>>> out to be complete shipwrecks.
>>>
>>> There is currently a patch set under discussion to give the network
>>> subsystem DMA-buf support. If you are interest in network direct I/O
>>> that could help.
>> Is there a related introduction link for this patch?
>>
>>> Additional to that a lot of GPU drivers support userptr usages, e.g.
>>> to import malloced memory into the GPU driver. You can then also do
>>> direct I/O on that malloced memory and the kernel will enforce correct
>>> handling with the GPU driver through MMU notifiers.
>>>
>>> But as far as I know a general DMA-buf based solution isn't possible.
>> 1.The reason we need to use DMABUF memory here is that we need to share
>> memory between the CPU and APU. Currently, only DMABUF memory is
>> suitable for this purpose. Additionally, we need to read very large files.
>>
>> 2. Are there any other solutions for this? Also, do you have any plans
>> to support direct_io for DMABUF memory in the future?
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Lei Liu.
>>>>
>>>>>> Lei Liu (2):
>>>>>>     mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>>>>> dmabuf
>>>>>>     mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
>>>>>> allocated
>>>>>>       memory with direct_io support
>>>>>>
>>>>>>    drivers/dma-buf/heaps/system_heap.c |  5 +++--
>>>>>>    fs/proc/task_mmu.c                  |  8 +++++++-
>>>>>>    include/linux/mm.h                  |  1 +
>>>>>>    mm/memory.c                         | 15 ++++++++++-----
>>>>>>    mm/rmap.c                           |  9 +++++----
>>>>>>    5 files changed, 26 insertions(+), 12 deletions(-)
>>>>>>



More information about the dri-devel mailing list