[Lsf-pc] [LSF/MM/BPF proposal]: Physr discussion

Dan Williams dan.j.williams at intel.com
Mon Jan 23 19:36:51 UTC 2023


Jason Gunthorpe via Lsf-pc wrote:
> I would like to have a session at LSF to talk about Matthew's
> physr discussion starter:
> 
>  https://lore.kernel.org/linux-mm/YdyKWeU0HTv8m7wD@casper.infradead.org/
> 
> I have become interested in this with some immediacy because of
> IOMMUFD and this other discussion with Christoph:
> 
>  https://lore.kernel.org/kvm/4-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/

I think this is a worthwhile discussion. My main hangup with 'struct
page' elimination in general is that if anything needs to be allocated
to describe a physical address for other parts of the kernel to operate
on it, why not a 'struct page'? There are of course several difficulties
allocating a 'struct page' array, but I look at subsection support and
the tail page space optimization work as evidence that some of the pain
can be mitigated. What more needs to be done? I also think this is
somewhat of a separate consideration from replacing a bio_vec with physr,
which has value independent of the mechanism used to manage
phys_addr_t => dma_addr_t.

> Which results in, more or less, we have no way to do P2P DMA
> operations without struct page - and from the RDMA side solving this
> well at the DMA API means advancing at least some part of the physr
> idea.
> 
> So - my objective is to enable the DMA API to "DMA map" something that
> is not a scatterlist, may or may not contain struct pages, but can
> still contain P2P DMA data. From there I would move RDMA MR's to use
> this new API, modify DMABUF to export it, complete the above VFIO
> series, and finally, use all of this to add back P2P support to VFIO
> when working with IOMMUFD by allowing IOMMUFD to obtain a safe
> reference to the VFIO memory using DMABUF. From there we'd want to see
> pin_user_pages optimized, and that also will need some discussion how
> best to structure it.
> 
> I also have several ideas on how something like physr can optimize the
> iommu driver ops when working with dma-iommu.c and IOMMUFD.
> 
> I've been working on an implementation and hope to have something
> draft to show on the lists in a few weeks. It is pretty clear there
> are several interesting decisions to make that I think will benefit
> from a live discussion.
> 
> Providing a kernel-wide alternative to scatterlist is something that
> has general interest across all the driver subsystems. I've started to
> view the general problem rather like xarray where the main focus is to
> create the appropriate abstraction and then go about transforming
> users to take advantage of the cleaner abstraction. scatterlist
> suffers here because it has an incredibly leaky API, a huge number of
> (often sketchy driver) users, and has historically been very difficult
> to improve.

When I read "general interest across all the driver subsystems" it is
hard not to ask "have all possible avenues to enable 'struct page' been
exhausted?"

> The session would quickly go over the current state of whatever the
> mailing list discussion evolves into and an open discussion around the
> different ideas.

Sounds good to me.

