[Linaro-mm-sig] Re: [PATCH] dma-buf: Require VM_PFNMAP vma for mmap

Jason Gunthorpe jgg at ziepe.ca
Wed Nov 23 13:28:37 UTC 2022


On Wed, Nov 23, 2022 at 02:12:25PM +0100, Christian König wrote:
> On 23.11.22 at 13:53, Jason Gunthorpe wrote:
> > On Wed, Nov 23, 2022 at 01:49:41PM +0100, Christian König wrote:
> > > On 23.11.22 at 13:46, Jason Gunthorpe wrote:
> > > > On Wed, Nov 23, 2022 at 11:06:55AM +0100, Daniel Vetter wrote:
> > > > 
> > > > > > Maybe a GFP flag to set the page reference count to zero or something
> > > > > > like this?
> > > > > Hm yeah that might work. I'm not sure what it will all break though?
> > > > > And we'd need to make sure that underflowing the page refcount dies in
> > > > > a backtrace.
> > > > Mucking with the refcount like this to protect against crazy
> > > > out-of-tree drivers seems horrible.
> > > Well, not only out-of-tree drivers. The in-tree KVM got that horribly
> > > wrong as well; those were the latest guys complaining about it.
> > kvm was taking refs on special PTEs? That seems really unlikely?
> 
> Well then look at this code here:
> 
> commit add6a0cd1c5ba51b201e1361b05a5df817083618
> Author: Paolo Bonzini <pbonzini at redhat.com>
> Date:   Tue Jun 7 17:51:18 2016 +0200
> 
>     KVM: MMU: try to fix up page faults before giving up
> 
>     The vGPU folks would like to trap the first access to a BAR by setting
>     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
>     handler then can use remap_pfn_range to place some non-reserved pages
>     in the VMA.
> 
>     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
>     and fixup_user_fault together help supporting it.  The patch also
>     supports VM_MIXEDMAP vmas where the pfns are not reserved and thus
>     subject to reference counting.
> 
>     Cc: Xiao Guangrong <guangrong.xiao at linux.intel.com>
>     Cc: Andrea Arcangeli <aarcange at redhat.com>
>     Cc: Radim Krčmář <rkrcmar at redhat.com>
>     Tested-by: Neo Jia <cjia at nvidia.com>
>     Reported-by: Kirti Wankhede <kwankhede at nvidia.com>
>     Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>

This patch is known to be broken in so many ways. It also has a major
security hole: it ignores the PTE flags that make the page read-only.
That it also ignores the special bit is somehow not surprising :(

This probably doesn't work, but is the general idea of what KVM needs
to do:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedeedb..4161241fc3228c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			return r;
 	}
 
+	/*
+	 * Special PTEs are never convertible into a struct page, even if the
+	 * driver that owns them might have put a PFN with a struct page into
+	 * the PFNMAP. If the arch doesn't support special then we cannot
+	 * safely process these pages.
+	 */
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+	if (pte_special(*ptep))
+		return -EINVAL;
+#else
+	return -EINVAL;
+#endif
+
 	if (write_fault && !pte_write(*ptep)) {
 		pfn = KVM_PFN_ERR_RO_FAULT;
 		goto out;
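
To make the intended ordering of the checks concrete, here is a minimal
userspace sketch of the decision logic only. It is not kernel code:
struct model_pte, enum model_result, model_check_pte() and the
arch_has_special parameter are hypothetical stand-ins for pte_t,
KVM_PFN_ERR_RO_FAULT, the hunk above and CONFIG_ARCH_HAS_PTE_SPECIAL.
It ignores locking entirely; in the real function a rejected PTE would
presumably still have to be unmapped and unlocked before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two PTE bits the checks look at. */
struct model_pte {
	bool special;	/* stands in for pte_special() */
	bool writable;	/* stands in for pte_write() */
};

/* Hypothetical outcome; MODEL_RO_FAULT stands in for KVM_PFN_ERR_RO_FAULT. */
enum model_result {
	MODEL_OK = 0,
	MODEL_RO_FAULT = 1,
};

/*
 * Reject special PTEs first, then honour the read-only bit.
 * arch_has_special models CONFIG_ARCH_HAS_PTE_SPECIAL: without it a
 * special PTE cannot be recognised, so every PTE is refused.
 * Returns 0 with *res filled in, or -EINVAL if the PTE must not be used.
 */
static int model_check_pte(const struct model_pte *pte, bool arch_has_special,
			   bool write_fault, enum model_result *res)
{
	assert(pte && res);

	if (!arch_has_special || pte->special)
		return -EINVAL;

	if (write_fault && !pte->writable) {
		*res = MODEL_RO_FAULT;
		return 0;
	}

	*res = MODEL_OK;
	return 0;
}
```

The special check comes before the write check so that a special PTE is
refused regardless of its permissions.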

Jason

