[PATCH v16 0/4] RDMA: Add dma-buf support

Thu Feb 4 18:29:23 UTC 2021

On Thu, Feb 04, 2021 at 08:50:38AM -0500, Alex Deucher wrote:
> On Thu, Feb 4, 2021 at 2:48 AM John Hubbard <jhubbard at nvidia.com> wrote:
> >
> > On 12/15/20 1:27 PM, Jianxin Xiong wrote:
> > > This patch series adds dma-buf importer role to the RDMA driver in
> > > attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
> > > chosen for a few reasons: first, the API is relatively simple and allows
> > > a lot of flexibility in implementing the buffer manipulation ops.
> > > Second, it doesn't require page structure. Third, dma-buf is already
> > > supported in many GPU drivers. However, we are aware that existing GPU
> > > drivers don't allow pinning device memory via the dma-buf interface.
> > > Pinning would simply cause the backing storage to migrate to system RAM.
> > > True peer-to-peer access is only possible using dynamic attach, which
> > > requires on-demand paging support from the NIC to work. For this reason,
> > > this series only works with ODP capable NICs.
> >
> > Hi,
> >
> > Looking ahead to after this patchset is merged...
> >
> > Are there design thoughts out there, about the future of pinning to vidmem,
> > for this? It would allow a huge group of older GPUs and NICs and such to
> > do p2p with this approach, and it seems like a natural next step, right?
> 
> The argument is that vram is a scarce resource, but I don't know if
> that is really the case these days.  At this point, we often have as
> much vram as system ram if not more.

I thought the main argument was that GPU memory could move at any time
between the GPU and CPU and the DMA buf would always track its current
location?

IMHO there is no reason not to have a special API to create small
amounts of GPU dedicated locked memory that cannot be moved off the
GPU.

For instance this paper:

http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2014-ASHESIPDPS.pdf

Considers using the GPU to directly drive the RDMA work
queues. Putting the queues themselves in GPU VRAM would make alot of
sense.

But that is impossible without fixed non-invalidating dma bufs.

Jason