[RFC] Make use of non-dynamic dmabuf in RDMA

Tue Aug 24 20:00:27 UTC 2021

> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Tuesday, August 24, 2021 12:44 PM
> To: Dave Airlie <airlied at gmail.com>
> Cc: John Hubbard <jhubbard at nvidia.com>; Jason Gunthorpe <jgg at ziepe.ca>; Christian König <christian.koenig at amd.com>; Gal Pressman
> <galpress at amazon.com>; Daniel Vetter <daniel at ffwll.ch>; Sumit Semwal <sumit.semwal at linaro.org>; Doug Ledford
> <dledford at redhat.com>; open list:DMA BUFFER SHARING FRAMEWORK <linux-media at vger.kernel.org>; dri-devel
> <dri-devel at lists.freedesktop.org>; Linux Kernel Mailing List <linux-kernel at vger.kernel.org>; linux-rdma <linux-rdma at vger.kernel.org>; Gabbay,
> Oded (Habana) <ogabbay at habana.ai>; Tayar, Tomer (Habana) <ttayar at habana.ai>; Yossi Leybovich <sleybo at amazon.com>; Alexander
> Matushevsky <matua at amazon.com>; Leon Romanovsky <leonro at nvidia.com>; Xiong, Jianxin <jianxin.xiong at intel.com>
> Subject: Re: [RFC] Make use of non-dynamic dmabuf in RDMA
> 
> On Tue, Aug 24, 2021 at 3:16 PM Dave Airlie <airlied at gmail.com> wrote:
> >
> > On Wed, 25 Aug 2021 at 03:36, John Hubbard <jhubbard at nvidia.com> wrote:
> > >
> > > On 8/24/21 10:32 AM, Jason Gunthorpe wrote:
> > > ...
> > > >>> And yes at least for the amdgpu driver we migrate the memory to
> > > >>> host memory as soon as it is pinned and I would expect that
> > > >>> other GPU drivers do something similar.
> > > >>
> > > >> Well...for many topologies, migrating to host memory will result
> > > >> in a dramatically slower p2p setup. For that reason, some GPU
> > > >> drivers may want to allow pinning of video memory in some situations.
> > > >>
> > > >> Ideally, you've got modern ODP devices and you don't even need to pin.
> > > >> But if not, and you still hope to do high performance p2p between
> > > >> a GPU and a non-ODP Infiniband device, then you would need to
> > > >> leave the pinned memory in vidmem.
> > > >>
> > > >> So I think we don't want to rule out that behavior, right? Or is
> > > >> the thinking more like, "you're lucky that this old non-ODP setup
> > > >> works at all, and we'll make it work by routing through host/cpu
> > > >> memory, but it will be slow"?
> > > >
> > > > > I think it depends on the user. If the user creates memory which
> > > > > is permanently located on the GPU then it should be pinnable in
> > > > > this way without forced migration. But if the memory is inherently
> > > > > migratable then it just cannot be pinned in the GPU at all, as we
> > > > > can't indefinitely block migration from happening, e.g. if the CPU
> > > > > touches it later or something.
> > > >
> > >
> > > OK. I just want to avoid creating any API-level assumptions that
> > > dma_buf_pin() necessarily implies or requires migrating to host memory.
> >
> > > I'm not sure we should be allowing dma_buf_pin at all on
> > > non-migratable memory. What's to stop someone just pinning all the
> > > VRAM and making the GPU unusable?
> 
> In a lot of cases we have GPUs with more VRAM than system memory, but we allow pinning in system memory.
> 
> Alex
> 

In addition, the dma-buf exporter can be a non-GPU device.

Jianxin

> >
> > I understand that not considering more than a single user in these
> > situations is enterprise thinking, but I do worry about the "pinning is
> > always fine" type of thinking when things are shared or multi-user.
> >
> > My impression from this is we've designed hardware that didn't
> > consider the problem, and now to let us use that hardware in horrible
> > ways we should just allow it to pin all the things.
> >
> > Dave.