[Intel-gfx] [Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

Daniel Vetter daniel.vetter at ffwll.ch
Wed Nov 23 10:06:55 UTC 2022


On Wed, 23 Nov 2022 at 10:39, Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Am 23.11.22 um 10:30 schrieb Daniel Vetter:
> > On Wed, 23 Nov 2022 at 10:06, Christian König <christian.koenig at amd.com> wrote:
> >> Am 22.11.22 um 20:50 schrieb Daniel Vetter:
> >>> On Tue, 22 Nov 2022 at 20:34, Jason Gunthorpe <jgg at ziepe.ca> wrote:
> >>>> On Tue, Nov 22, 2022 at 08:29:05PM +0100, Daniel Vetter wrote:
> >>>>> You nuke all the ptes. Drivers that move have slightly more than a
> >>>>> bare struct file, they also have a struct address_space so that
> >>>>> invalidate_mapping_range() works.
> >>>> Okay, this is one of the ways that this can be made to work correctly,
> >>>> as long as you never allow GUP/GUP_fast to succeed on the PTEs. (this
> >>>> was the DAX mistake)
> >>> Hence this patch, to enforce that no dma-buf exporter gets this wrong.
> >>> Which some did, and then blamed bug reporters for the resulting splats
> >>> :-) One of the things we've reverted was the ttm huge pte support,
> >>> since that doesn't have the pmd_special flag (yet) and so would let
> >>> gup_fast through.
> >> The problem is not only gup, a lot of people seem to assume that when
> >> you are able to grab a reference to a page that the ptes pointing to
> >> that page can't change any more. And that's obviously incorrect.
> >>
> >> I witnessed tons of discussions about that already. Some customers even
> >> modified our code assuming that and then wondered why the heck they ran
> >> into data corruption.
> >>
> >> It's gotten so bad that I've even proposed intentionally mangling the
> >> page reference count on TTM allocated pages:
> >> https://patchwork.kernel.org/project/dri-devel/patch/20220927143529.135689-1-christian.koenig@amd.com/
> > Yeah maybe something like this could be applied after we land this
> > patch here.
>
> I think both should land ASAP. We don't have any other way than to
> clearly point out incorrect approaches if we want to prevent the
> resulting data corruption.
>
> > Well maybe should have the same check in gem mmap code to
> > make sure no driver
>
> Reads like the sentence is somehow cut of?

Yeah, just wanted to say that we need to make sure in as many places
as possible that VM_PFNMAP is set for bo mmaps.

> >> I think it would be better that instead of having special flags in the
> >> ptes and vmas that you can't follow them to a page structure we would
> >> add something to the page indicating that you can't grab a reference to
> >> it. But this might break some use cases as well.
> > Afaik the problem with that is that there's no free page bits left for
> > these debug checks. Plus the pte+vma flags are the flags to make this
> > clear already.
>
> Maybe a GFP flag to set the page reference count to zero or something
> like this?

Hm yeah that might work. I'm not sure what it will all break though?
And we'd need to make sure that underflowing the page refcount dies in
a backtrace.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list