[Intel-gfx] [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

Daniel Vetter daniel.vetter at ffwll.ch
Tue Nov 22 19:29:05 UTC 2022


On Tue, 22 Nov 2022 at 19:50, Jason Gunthorpe <jgg at ziepe.ca> wrote:
>
> On Tue, Nov 22, 2022 at 07:08:25PM +0100, Daniel Vetter wrote:
> > On Tue, 22 Nov 2022 at 19:04, Jason Gunthorpe <jgg at ziepe.ca> wrote:
> > >
> > > On Tue, Nov 22, 2022 at 06:08:00PM +0100, Daniel Vetter wrote:
> > > > tldr; DMA buffers aren't normal memory, and it cannot be
> > > > guaranteed that you can use them like normal memory (e.g. that
> > > > calling get_user_pages works, or that they're accounted like any
> > > > other normal memory).
> > > >
> > > > Since some userspace only runs on integrated devices, where all
> > > > buffers are actually resident system memory, there's a huge
> > > > temptation to assume that a struct page is always present and
> > > > usable like for any other pagecache-backed mmap. This has the
> > > > potential to result in a uapi nightmare.
> > > >
> > > > To close this gap, require that DMA buffer mmaps are VM_PFNMAP, which
> > > > blocks get_user_pages and all the other struct page based
> > > > infrastructure for everyone. In spirit this is the uapi counterpart to
> > > > the kernel-internal CONFIG_DMABUF_DEBUG.
> > > >
> > > > Motivated by a recent patch which wanted to switch the system dma-buf
> > > > heap to vm_insert_page instead of vm_insert_pfn.
> > > >
> > > > v2:
> > > >
> > > > Jason brought up that we also want to guarantee that all ptes have the
> > > > pte_special flag set, to catch fast get_user_pages (on architectures
> > > > that support this). Allowing VM_MIXEDMAP (like VM_SPECIAL does) would
> > > > still allow vm_insert_page, but limiting to VM_PFNMAP will catch that.
> > > >
> > > > From auditing the various functions that insert pfn pte entries
> > > > (vm_insert_pfn_prot, remap_pfn_range and all its callers like
> > > > dma_mmap_wc) it looks like VM_PFNMAP is already required anyway, so
> > > > this should be the correct flag to check for.
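
Concretely, the enforcement boils down to checking the vma after the
exporter's mmap hook has run, roughly like the sketch below. The exact
placement in dma_buf_mmap_internal() and the WARN_ON/-EINVAL combo are
illustrative, not necessarily the literal patch:

#include <linux/dma-buf.h>
#include <linux/mm.h>

static int dma_buf_mmap_internal(struct file *file,
                                 struct vm_area_struct *vma)
{
        struct dma_buf *dmabuf = file->private_data;
        int ret;

        /* ops/size validation elided for brevity */

        ret = dmabuf->ops->mmap(dmabuf, vma);
        if (ret)
                return ret;

        /* Exporters must set up a special pfn mapping, so that none of
         * the struct page based mm paths (gup and friends) can get at
         * the backing storage. */
        if (WARN_ON(!(vma->vm_flags & VM_PFNMAP)))
                return -EINVAL;

        return 0;
}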
> > >
> > > I didn't look at how this actually gets used, but it is a bit of a
> > > pain to insert a lifetime-controlled object like a struct page as a
> > > special PTE/VM_PFNMAP.
> > >
> > > How is the lifetime model implemented here? How do you know when
> > > userspace has finally unmapped the page?
> >
> > The vma has a filp which is the refcounted dma_buf. With dma_buf you
> > never get an individual page; it's always the entire object. And it's
> > up to the allocator how exactly it wants to use or not use the page's
> > refcount. So if gup goes in and elevates the refcount, you can break
> > stuff, which is why I'm doing this.
>
> But how does move work?

You nuke all the ptes. Drivers that move have slightly more than a
bare struct file: they also have a struct address_space, so that
unmap_mapping_range() works. Refaulting, and any coherency issues when
a refault races against a dma-buf migration, are up to the
driver/exporter to handle correctly. None of them rely on struct page
the way mm/ does when it moves stuff around for
compaction/ksm/numa-balancing/whatever.
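
Roughly, the exporter side of that dance looks like the sketch below.
my_buffer, exporter_move_notify() and exporter_fault() are made-up
names for illustration; real drivers wire this through their own bo
and fault paths:

#include <linux/dma-buf.h>
#include <linux/mm.h>

struct my_buffer {                      /* hypothetical exporter object */
        struct dma_buf *dmabuf;
        unsigned long pfn;              /* first pfn of current backing storage */
        size_t size;
};

/* Buffer migrates: zap every user mapping. The next access faults and
 * picks up a pte pointing at the new location. */
static void exporter_move_notify(struct my_buffer *bo)
{
        unmap_mapping_range(bo->dmabuf->file->f_mapping, 0, bo->size, 1);
}

static vm_fault_t exporter_fault(struct vm_fault *vmf)
{
        struct my_buffer *bo = vmf->vma->vm_private_data;

        /* vmf_insert_pfn() requires VM_PFNMAP and inserts pte_special
         * entries on architectures that have them, which is what keeps
         * (fast) gup out. */
        return vmf_insert_pfn(vmf->vma, vmf->address, bo->pfn + vmf->pgoff);
}

The vma itself keeps the dma_buf alive: dma_buf_mmap() points
vma->vm_file at the dma_buf's file, so the object can't go away before
the last unmap.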
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

