[PATCH v5 4/5] RDMA/mlx5: Support dma-buf based userspace memory region

Jason Gunthorpe jgg at ziepe.ca
Fri Oct 16 18:58:29 UTC 2020


On Fri, Oct 16, 2020 at 06:40:01AM +0000, Xiong, Jianxin wrote:
> > > +	if (!mr)
> > > +		return -EINVAL;
> > > +
> > > +	return mlx5_ib_update_xlt(mr, 0, mr->npages, PAGE_SHIFT, flags);
> > > +}
> > > +
> > > +static struct ib_umem_dmabuf_ops mlx5_ib_umem_dmabuf_ops = {
> > > +	.init = mlx5_ib_umem_dmabuf_xlt_init,
> > > +	.update = mlx5_ib_umem_dmabuf_xlt_update,
> > > +	.invalidate = mlx5_ib_umem_dmabuf_xlt_invalidate,
> > > +};
> > 
> > I'm not really convinced these should be ops, this is usually a bad design pattern.
> > 
> > Why do I need so much code to extract the sgl from the dma_buf? I would prefer the dma_buf layer simplify this, not by adding a wrapper
> > around it in the IB core code...
> > 
> 
> We just need a way to call a device specific function to update the NIC's translation
> table.  I considered three ways: (1) ops registered with ib_umem_get_dmabuf; 
> (2) a single function pointer registered with ib_umem_get_dmabuf; (3) a method 
> in 'struct ib_device'. Option (1) was chosen here with no strong reason. We could
> consolidate the three functions of the ops into one, but then we will need to 
> define commands or flags for different update operations.   

I'd rather the driver directly provide the dma_buf ops. Inserting
layers that do nothing but call other layers is usually a bad idea. I
haven't looked carefully yet at how that would be arranged.
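
For reference, a rough userspace sketch of the trade-off between options
(1) and (2) discussed above. Every name here (demo_mr, demo_umem_ops,
demo_xlt, ...) is hypothetical and only stands in for the real structures;
none of it is the actual ib_umem or dma_buf API. It shows that folding the
three callbacks into one forces a command argument, and that the ops
variant ends up as thin wrappers around the same function:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's MR; not the real mlx5_ib_mr. */
struct demo_mr {
	int npages;
	int mapped;	/* pages currently programmed in the translation table */
};

/* Option (2): a single callback, which then needs a command argument. */
enum demo_xlt_cmd { DEMO_XLT_INIT, DEMO_XLT_UPDATE, DEMO_XLT_INVALIDATE };

static int demo_xlt(struct demo_mr *mr, enum demo_xlt_cmd cmd)
{
	if (!mr)
		return -EINVAL;

	switch (cmd) {
	case DEMO_XLT_INIT:
	case DEMO_XLT_INVALIDATE:
		mr->mapped = 0;			/* no valid translations */
		return 0;
	case DEMO_XLT_UPDATE:
		mr->mapped = mr->npages;	/* program every page */
		return 0;
	}
	return -EINVAL;
}

/* Option (1): an ops table with one callback per operation. */
struct demo_umem_ops {
	int (*init)(struct demo_mr *mr);
	int (*update)(struct demo_mr *mr);
	int (*invalidate)(struct demo_mr *mr);
};

/* Each op is only a thin wrapper that forwards to the single function. */
static int demo_init(struct demo_mr *mr)
{
	return demo_xlt(mr, DEMO_XLT_INIT);
}

static int demo_update(struct demo_mr *mr)
{
	return demo_xlt(mr, DEMO_XLT_UPDATE);
}

static int demo_invalidate(struct demo_mr *mr)
{
	return demo_xlt(mr, DEMO_XLT_INVALIDATE);
}

static const struct demo_umem_ops demo_ops = {
	.init = demo_init,
	.update = demo_update,
	.invalidate = demo_invalidate,
};
```

Both forms reach the same code; the ops table only adds a layer of
indirection on top of it.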

> > > +	ncont = npages;
> > > +	order = ilog2(roundup_pow_of_two(ncont));
> > 
> > We still need to deal with contiguity here, this ncont/npages is just obfuscation.
> 
> Since the pages can move, we can't take advantage of contiguity here. This handling
> is similar to the ODP case. The variables 'ncont' and 'page_shift' here are not necessary.
> They are kept just for the sake of signifying the semantics of the following functions that
> use them.

Well, in this case we can manage it, and the performance boost is high
enough that we need to. The work on mlx5 to do it is a bit involved though.
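
To illustrate what "dealing with contiguity" means, here is a simplified
userspace sketch that picks the largest usable page shift for a list of
DMA segments. It is not the mlx5 code; demo_seg, demo_best_page_shift and
DEMO_PAGE_SHIFT are made-up names, and it assumes page-aligned segments
and ignores limits a real driver has (total length, supported page
sizes). The idea is that the IOVA and DMA address must agree in their low
bits, and every interior segment boundary must fall on a page boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumed 4 KiB base page */

/* Hypothetical DMA segment: one physically contiguous range. */
struct demo_seg {
	uint64_t dma_addr;
	uint64_t len;
};

/*
 * Return the largest shift in [DEMO_PAGE_SHIFT, max_shift] such that the
 * mapping starting at iova can be described with pages of that size.
 */
static unsigned int demo_best_page_shift(const struct demo_seg *segs,
					 size_t nsegs, uint64_t iova,
					 unsigned int max_shift)
{
	uint64_t mask = 0;
	uint64_t va = iova;
	unsigned int shift = DEMO_PAGE_SHIFT;
	size_t i;

	assert(nsegs > 0);
	assert(max_shift >= DEMO_PAGE_SHIFT && max_shift < 64);

	for (i = 0; i < nsegs; i++) {
		/* Low bits of the IOVA and the DMA address must agree. */
		mask |= segs[i].dma_addr ^ va;
		va += segs[i].len;
		/* Interior segment boundaries must be page aligned. */
		if (i != nsegs - 1)
			mask |= va;
	}

	/* Grow the shift while no constraint bit below it is set. */
	while (shift < max_shift && !(mask & ((1ULL << (shift + 1)) - 1)))
		shift++;

	return shift;
}
```

With one 2 MiB segment this allows a 2 MiB page (shift 21), while
scattered 4 KiB pages stay at shift 12. When the pages move, the same
computation would have to be redone for the new mapping.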
 
> > > +	err = ib_umem_dmabuf_init_mapping(umem, mr);
> > > +	if (err) {
> > > +		dereg_mr(dev, mr);
> > > +		return ERR_PTR(err);
> > > +	}
> > 
> > Did you test the page fault path at all? Looks like some xarray code is missing here, and this is also missing the related complex teardown
> > logic.
> > 
> > Does this mean you didn't test the pagefault_dmabuf_mr() at all?
> 
> Thanks for the hint. I was unable to get the test runs reaching the
> pagefault_dmabuf_mr() function. Now I have found the reason. Along
> the path of all the page fault handlers, the array "odp_mkeys" is checked
> against the mr key. Since the dmabuf mr keys are not in the list the
> handler is never called.
>
> On the other hand, it seems that pagefault_dmabuf_mr() is not needed at all.
> The pagefault is gracefully handled by retrying until the work thread finished
> programming the NIC.

This is a bug of some kind: page faults that can't find an mkey in the
xarray should cause a completion with error.
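
A minimal sketch of the behaviour I'd expect, written as plain userspace
C. The fixed-size table below only stands in for the xarray, and all of
the names (demo_mkey_table, demo_handle_fault, ...) are hypothetical
rather than taken from the driver. The point is just that a failed lookup
is reported as an error instead of being retried:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_MKEYS 8

struct demo_mkey {
	uint32_t key;
	int in_use;
};

/* Stand-in for the xarray of mkeys that can take page faults. */
struct demo_mkey_table {
	struct demo_mkey slots[DEMO_MAX_MKEYS];
};

/* Register a key so that later faults on it can be resolved. */
static int demo_mkey_store(struct demo_mkey_table *t, uint32_t key)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_MKEYS; i++) {
		if (!t->slots[i].in_use) {
			t->slots[i].key = key;
			t->slots[i].in_use = 1;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Look a key up; returns NULL if it was never registered. */
static struct demo_mkey *demo_mkey_load(struct demo_mkey_table *t,
					uint32_t key)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_MKEYS; i++) {
		if (t->slots[i].in_use && t->slots[i].key == key)
			return &t->slots[i];
	}
	return NULL;
}

/*
 * Return 0 if the fault can be resolved, or -EFAULT if the work request
 * has to be completed with error because the mkey is unknown.
 */
static int demo_handle_fault(struct demo_mkey_table *t, uint32_t key)
{
	if (!demo_mkey_load(t, key))
		return -EFAULT;
	return 0;
}
```

In this model a dmabuf MR whose key was never stored would fail the
lookup, so its faults would complete with error rather than succeed
after retries.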

Jason

