[PATCH v7 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
Xiong, Jianxin
jianxin.xiong at intel.com
Thu Nov 5 00:36:25 UTC 2020
> -----Original Message-----
> From: Jason Gunthorpe <jgg at ziepe.ca>
> Sent: Wednesday, November 04, 2020 4:07 PM
> To: Xiong, Jianxin <jianxin.xiong at intel.com>
> Cc: linux-rdma at vger.kernel.org; dri-devel at lists.freedesktop.org; Doug Ledford <dledford at redhat.com>; Leon Romanovsky
> <leon at kernel.org>; Sumit Semwal <sumit.semwal at linaro.org>; Christian Koenig <christian.koenig at amd.com>; Vetter, Daniel
> <daniel.vetter at intel.com>
> Subject: Re: [PATCH v7 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
>
> On Wed, Nov 04, 2020 at 02:06:34PM -0800, Jianxin Xiong wrote:
> > +	umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, access_flags,
> > +				  &mlx5_ib_dmabuf_attach_ops);
> > +	if (IS_ERR(umem)) {
> > +		mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
> > +		return ERR_PTR(PTR_ERR(umem));
> > +	}
> > +
> > +	mr = alloc_mr_from_cache(pd, umem, virt_addr, access_flags);
>
> It is very subtle, but this calls mlx5_umem_find_best_pgsz() which calls ib_umem_find_best_pgsz() which goes over the SGL to determine
> the page size to use.
>
When this is called here, the umem sglist is still NULL because dma_buf_map_attachment()
is not called until a page fault occurs. In patch 1/5, the function ib_umem_find_best_pgsz()
has been modified to always return PAGE_SIZE for dma-buf based MRs.
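Conceptually the patch 1/5 change is an early exit along these lines (just a sketch, not the
exact hunk; the is_dmabuf flag name is illustrative):

	unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
					     unsigned long pgsz_bitmap,
					     unsigned long virt)
	{
		/* A dma-buf umem has no SGL until the first page fault maps
		 * the attachment, so only PAGE_SIZE can be reported here.
		 */
		if (umem->is_dmabuf)
			return (pgsz_bitmap & PAGE_SIZE) ? PAGE_SIZE : 0;

		/* ... existing per-SGE walk for regular umems unchanged ... */
	}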
> As part of this it does validation of the IOVA vs first page offset vs first page dma address. These little details come into play if the IOVA and
> offset are not PAGE_SIZE aligned, which is very possible if the dma buf exporter or system PAGE_SIZE is over 4k.
>
> In other words, the dma_address of the first SGL must be the page aligned starting point of the MR. Since the 'skip' approach is being done
> when breaking the SGL into blocks the ib_umem_find_best_pgsz() sees an invalid page size.
>
> Slicing it has to be done in a way that gives a properly formed SGL.
>
> My suggestion is to just change the SGL in place. Iterate to the starting SGE in the SGL and assign it to the sg table, modify it to have an offset
> dma_address and a reduced length
>
> Count the number of SGEs to span the remaining range and use that as the new nmap
>
> Edit the last SGE to have a reduced length
Do you still think modifying the SGL in place is needed, given the above explanation? I do see
some benefits of doing so -- hiding the discrepancy between the sgl and addr/length from the device drivers and avoiding special handling in the code that uses the sgl.
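If we do go with editing in place, the saved state could live in the dma-buf umem wrapper,
something like the following (struct and field names are only illustrative, mirroring the
umem_p fields used in your sketch below):

	/* Illustrative bookkeeping so the edits to the first/last SGEs can be
	 * undone before the SGL is handed back to the exporter at unmap time.
	 */
	struct ib_umem_dmabuf {
		struct ib_umem umem;
		struct dma_buf_attachment *attach;
		struct scatterlist *first_sg;
		struct scatterlist *last_sg;
		dma_addr_t first_dma_address;
		unsigned int first_dma_length;
		unsigned int first_length;
		unsigned int last_dma_length;
		unsigned int last_length;
	};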
>
> Upon unmap undo the edits so the exporter doesn't see the mangled SGL.
>
> It would be saner if the exporter could do this, but oh well.
>
> Approximately like this:
>
> 	struct ib_umem *umem = &umem_p->umem;
> 	struct scatterlist *sg;
> 	unsigned long cur = 0;
> 	int i;
>
> 	for_each_sg(umem_p->umem.sg_head.sgl, sg, umem_p->umem.nmap, i) {
> 		if (cur + sg_dma_len(sg) > ALIGN_DOWN(umem->address, PAGE_SIZE)) {
> 			unsigned long offset;
>
> 			umem_p->first_sg = sg;
> 			umem_p->first_dma_address = sg->dma_address;
> 			umem_p->first_dma_length = sg_dma_len(sg);
> 			umem_p->first_length = sg->length;
> 			offset = ALIGN_DOWN(umem->address, PAGE_SIZE) - cur;
> 			sg->dma_address += offset;
> 			sg_dma_len(sg) -= offset;
> 			sg->length -= offset;
> 		}
> 		if (ALIGN(umem->address + umem->length, PAGE_SIZE) < cur + sg_dma_len(sg)) {
> 			unsigned long trim;
>
> 			umem_p->last_sg = sg;
> 			umem_p->last_dma_length = sg_dma_len(sg);
> 			umem_p->last_length = sg->length;
> 			trim = cur + sg_dma_len(sg) - ALIGN(umem->address + umem->length, PAGE_SIZE);
> 			sg_dma_len(sg) -= trim;
> 			sg->length -= trim;
> 			return npages;
> 		}
> 		cur += sg_dma_len(sg);
> 	}
> 	/* It is essential that the length of the SGL exactly matches
> 	 * the adjusted page aligned length of umem->length */
> 	return -EINVAL;
>
> Further, this really only works if the umem->page_size is locked to 4k because this doesn't have code to resize the MKEY, or change the
> underlying page size when the SGL changes.
Yes, now it's locked to 4K.
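Coming back to undoing the edits at unmap time: with bookkeeping like the above, the revert
is just restoring the saved values before the SGL goes back to the exporter, roughly (untested
sketch, the helper name is made up):

	static void ib_umem_dmabuf_revert_sgl(struct ib_umem_dmabuf *umem_p)
	{
		/* Restore the first SGE that was offset/shortened, if any */
		if (umem_p->first_sg) {
			umem_p->first_sg->dma_address = umem_p->first_dma_address;
			sg_dma_len(umem_p->first_sg) = umem_p->first_dma_length;
			umem_p->first_sg->length = umem_p->first_length;
			umem_p->first_sg = NULL;
		}
		/* Restore the last SGE that was trimmed, if any */
		if (umem_p->last_sg) {
			sg_dma_len(umem_p->last_sg) = umem_p->last_dma_length;
			umem_p->last_sg->length = umem_p->last_length;
			umem_p->last_sg = NULL;
		}
	}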
>
> So, I'd say put something like the above in the core code to validate and properly form the umem->sgl
>
> Then modify the alloc_mr_from_cache to use only PAGE_SIZE:
>
> 	if (umem->is_dma_buf)
> 		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> 	else
> 		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
>
This should have been addressed in patch 1/5.
Thanks,
Jianxin
> Jason