[RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region

Mon Oct 5 13:13:02 UTC 2020

On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> Dma-buf is a standard cross-driver buffer sharing mechanism that can be
> used to support peer-to-peer access from RDMA devices.
> 
> Device memory exported via dma-buf is associated with a file descriptor.
> This is passed to the user space as a property associated with the
> buffer allocation. When the buffer is registered as a memory region,
> the file descriptor is passed to the RDMA driver along with other
> parameters.
> 
> Implement the common code for importing dma-buf object and mapping
> dma-buf pages.
> 
> Signed-off-by: Jianxin Xiong <jianxin.xiong at intel.com>
> Reviewed-by: Sean Hefty <sean.hefty at intel.com>
> Acked-by: Michael J. Ruhl <michael.j.ruhl at intel.com>
> ---
>  drivers/infiniband/core/Makefile      |   2 +-
>  drivers/infiniband/core/umem.c        |   4 +
>  drivers/infiniband/core/umem_dmabuf.c | 291 ++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/umem_dmabuf.h |  14 ++
>  drivers/infiniband/core/umem_odp.c    |  12 ++
>  include/rdma/ib_umem.h                |  19 ++-
>  6 files changed, 340 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.c
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.h

I think this is using ODP too literally. dmabuf isn't going to need
fine-grained page faults, and I'm not sure this locking scheme is OK -
ODP is horrifically complicated.

If this is the approach then I think we should make dmabuf its own
standalone API, reg_user_mr_dmabuf().

The implementation in mlx5 will be much more understandable: it would
just do dma_buf_dynamic_attach() and program the XLT exactly the same
as a normal umem.

The move_notify() simply zaps the XLT and triggers a work item to
reload it after the move. Locking is provided by the dma_resv_lock.
Only a small disruption to the page fault handler is needed.
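
As a rough illustration of that flow, here is a single-threaded
userspace toy model in C. It is not kernel code: every toy_* name is
made up for this sketch, the bool stands in for dma_resv_lock, and
xlt_addr stands in for the XLT. The only point is the ordering: map
and program inside one critical section, and zap plus queue a reload
from move_notify(), which the exporter calls with the lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a dmabuf-backed MR; none of this is kernel API. */
struct toy_mr {
	bool resv_held;      /* stands in for dma_resv_lock on the dmabuf */
	uint64_t buf_addr;   /* where the exporter currently keeps the buffer */
	uint64_t xlt_addr;   /* what the "HW" translation currently points at */
	bool xlt_valid;      /* false after a zap */
	bool reload_queued;  /* stands in for a queued work item */
};

static void toy_resv_lock(struct toy_mr *mr)
{
	assert(!mr->resv_held);
	mr->resv_held = true;
}

static void toy_resv_unlock(struct toy_mr *mr)
{
	assert(mr->resv_held);
	mr->resv_held = false;
}

/* Programming the translation is only legal while the lock is held. */
static void toy_program_xlt(struct toy_mr *mr, uint64_t mapped_addr)
{
	assert(mr->resv_held);
	mr->xlt_addr = mapped_addr;
	mr->xlt_valid = true;
}

/* Registration: map and program in the same critical section. */
static void toy_reg_mr(struct toy_mr *mr)
{
	toy_resv_lock(mr);
	uint64_t mapped = mr->buf_addr;  /* models mapping the attachment */
	toy_program_xlt(mr, mapped);     /* lock is still held here */
	toy_resv_unlock(mr);
}

/* Importer callback: zap the translation and queue a reload. */
static void toy_move_notify(struct toy_mr *mr)
{
	assert(mr->resv_held);           /* exporter calls this locked */
	mr->xlt_valid = false;
	mr->reload_queued = true;
}

/* Exporter side: move the buffer and notify, all under the lock. */
static void toy_exporter_move(struct toy_mr *mr, uint64_t new_addr)
{
	toy_resv_lock(mr);
	mr->buf_addr = new_addr;
	toy_move_notify(mr);
	toy_resv_unlock(mr);
}

/* Deferred work: remap and reprogram after the move. */
static void toy_reload_work(struct toy_mr *mr)
{
	toy_resv_lock(mr);
	if (mr->reload_queued) {
		toy_program_xlt(mr, mr->buf_addr);
		mr->reload_queued = false;
	}
	toy_resv_unlock(mr);
}
```

If the lock were dropped between reading buf_addr and calling
toy_program_xlt(), the assert in toy_program_xlt() would fire, which
is the model's way of saying the HW could be programmed with a stale
address.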

> +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> +				     DMA_BIDIRECTIONAL);
> +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);

This doesn't look right, this lock has to be held up until the HW is
programmed.

The use of atomic probably looks wrong as well.

> +	k = 0;
> +	total_pages = ib_umem_odp_num_pages(umem_odp);
> +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> +		addr = sg_dma_address(sg);
> +		pages = sg_dma_len(sg) >> page_shift;
> +		while (pages > 0 && k < total_pages) {
> +			umem_odp->dma_list[k++] = addr | access_mask;
> +			umem_odp->npages++;
> +			addr += page_size;
> +			pages--;

This isn't fragmenting the sg into a page list properly. It won't work
for segments whose DMA address or length is not page aligned.
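
To make the alignment problem concrete, here is a minimal userspace
sketch in C of one way to split DMA segments into page entries. It is
only an illustration, not a proposed replacement: toy_seg,
toy_segs_to_pages() and the TOY_PAGE_* macros are hypothetical names,
a 4K page size is assumed, and it does not try to merge two segments
that happen to share a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct toy_seg {
	uint64_t dma_addr;
	uint64_t dma_len;
};

/*
 * Emit the page-aligned address of every page touched by each segment.
 * Returns the number of entries written, at most max_pages.
 */
static size_t toy_segs_to_pages(const struct toy_seg *segs, size_t nsegs,
				uint64_t *pages, size_t max_pages)
{
	size_t k = 0;

	assert(segs && pages);

	for (size_t i = 0; i < nsegs; i++) {
		uint64_t cur, end;

		if (segs[i].dma_len == 0)
			continue;

		/* Round the start down; walk until the byte range ends. */
		cur = segs[i].dma_addr & TOY_PAGE_MASK;
		end = segs[i].dma_addr + segs[i].dma_len;

		for (; cur < end && k < max_pages; cur += TOY_PAGE_SIZE)
			pages[k++] = cur;
	}

	return k;
}
```

A segment at 0x1800 with length 0x1000 touches two pages, 0x1000 and
0x2000, and the sketch emits both. Computing the count as
len >> page_shift gives one page for that segment and starts from the
unaligned address 0x1800.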

And really we don't need the dma_list for this case. With a fixed DMA
SGL that maps the whole buffer, a normal umem sgl is OK and the normal
umem XLT programming in mlx5 is fine.

Jason