Enabling peer to peer device transactions for PCIe devices

Thu Nov 24 16:24:22 UTC 2016

On Thu, Nov 24, 2016 at 12:40:37AM +0000, Sagalovitch, Serguei wrote:
> On Wed, Nov 23, 2016 at 02:11:29PM -0700, Logan Gunthorpe wrote:
> 
> > Perhaps I am not following what Serguei is asking for, but I
> > understood the desire was for a complex GPU allocator that could
> > migrate pages between GPU and CPU memory under control of the GPU
> > driver, among other things. The desire is for DMA to continue to work
> > even after these migrations happen.
> 
> The main issue is how to solve use cases when p2p is
> requested/initiated via CPU pointers where such pointers could
> point to a non-system memory location, e.g. VRAM.

Okay, but your list is conflating a whole bunch of problems:

 1) How to go from a __user pointer to a p2p DMA address
  a) How to validate, set up the IOMMU for, and maybe in the worst case
     bounce buffer these p2p DMAs
 2) How to allow drivers (i.e. a GPU allocator) to dynamically
    remap pages in a VMA to/from p2p DMA addresses
 3) How to expose uncacheable p2p DMA addresses to user space via mmap

> to allow "get_user_pages" to work transparently, similar to
> how it is/was done for the "DAX Device" case. Unfortunately,
> based on my understanding, the "DAX Device" implementation
> deals only with permanently "locked" memory (fixed location)
> unrelated to the "get_user_pages"/"put_page" scope,
> which doesn't satisfy the requirements for "eviction" / "moving" of
> memory while keeping the CPU address intact.

Hurm, isn't that issue with DAX only to do with being coherent with
the page cache?

A GPU allocator would not use the page cache; it would have to
construct VMAs some other way.

> My understanding is that it will not solve the RDMA MR issue where the "lock"
> could be held for the whole application life, but (a) it will not make
> the RDMA MR case worse and (b) it should be enough for all other cases of
> "get_user_pages"/"put_page" controlled by the kernel.

Right. There is no solution to the RDMA MR issue on old hardware. Apps
that are using GPU+RDMA+Old hardware will have to use short-lived MRs
and pay that performance cost, or give up on migration.

Jason