Hi all,
These are the leftovers from my pull request that landed in 5.12:
https://lore.kernel.org/dri-devel/CAKMK7uHQ=6OJcRguCUtiB456RWdCfwSNEXV8pQsfs...
The only changes compared to the old submission are:
- dropped the vfio and kvm patches
- added a patch to just remove follow_pfn at the end
Assuming no objections I'd like to land these three patches in my topic branch for 5.13, for sufficient amounts of testing in linux-next before the merge window.
Acks/reviews, especially on the two mm patches, are very much sought after.
Cheers, Daniel
Daniel Vetter (3):
  mm: Add unsafe_follow_pfn
  media/videobuf1|2: Mark follow_pfn usage as unsafe
  mm: unexport follow_pfn
 drivers/media/common/videobuf2/frame_vector.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 include/linux/mm.h                            |  4 +-
 mm/memory.c                                   | 46 ++++++++++++-------
 mm/nommu.c                                    | 28 ++++++++---
 security/Kconfig                              | 13 ++++++
 6 files changed, 68 insertions(+), 27 deletions(-)
Way back it was a reasonable assumption that iomem mappings never change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carveouts to cma regions. This means if we miss the unmap, the pfn might contain pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
- even /dev/mem now invalidates mappings when the kernel requests that iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")
Accessing pfns obtained from ptes without holding all the locks is therefore no longer a good idea.
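To illustrate the window, here's a hypothetical driver fragment (not from this series; demo_program_dma_engine() is a made-up helper):

static int demo_start_dma(struct vm_area_struct *vma, unsigned long uaddr)
{
	unsigned long pfn;
	int ret;

	/* follow_pfn() takes and drops the pagetable lock internally */
	ret = follow_pfn(vma, uaddr, &pfn);
	if (ret)
		return ret;

	/*
	 * The pte can be shot down right here: unmap_mapping_range() from
	 * a gpu driver, cma migration, or /dev/mem revocation.
	 */

	/* The pfn may by now point at pagecache or anon memory. */
	return demo_program_dma_engine(PFN_PHYS(pfn));
}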
Unfortunately there are some users where this is not fixable (like v4l userptr of iomem mappings) or involves a pile of work (vfio type1 iommu). For now, annotate these as unsafe and splat appropriately.
This patch adds an unsafe_follow_pfn, which later patches will then roll out to all appropriate places.
Also mark up follow_pfn as EXPORT_SYMBOL_GPL. The only safe way for drivers/modules to use it is together with an mmu_notifier, and that's all _GPL stuff.
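For reference, a minimal sketch of the safe pattern (all names here are hypothetical, this is not part of the patch): pair the lookup with an mmu_interval_notifier so pte invalidations revoke the cached pfn.

struct demo_map {
	struct mmu_interval_notifier notifier;
	unsigned long pfn;
	bool pfn_valid;
};

static bool demo_invalidate(struct mmu_interval_notifier *mni,
			    const struct mmu_notifier_range *range,
			    unsigned long cur_seq)
{
	struct demo_map *map = container_of(mni, struct demo_map, notifier);

	mmu_interval_set_seq(mni, cur_seq);
	WRITE_ONCE(map->pfn_valid, false);	/* cached pfn is now stale */
	return true;
}

static const struct mmu_interval_notifier_ops demo_mni_ops = {
	.invalidate = demo_invalidate,
};

/* registration at map time would look roughly like:
 *	mmu_interval_notifier_insert(&map->notifier, current->mm,
 *				     uaddr, PAGE_SIZE, &demo_mni_ops);
 */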
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--
v5: Suggestions from Christoph
- reindent for less weirdness
- use IS_ENABLED instead of #ifdef
- same checks for nommu, for consistency
- EXPORT_SYMBOL_GPL for follow_pfn
- kerneldoc was already updated in previous versions to explain when
  follow_pfn can be used safely
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 34 ++++++++++++++++++++++++++++++++--
 mm/nommu.c         | 27 ++++++++++++++++++++++++++-
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 73 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64a71bf20536..caec8b25d66f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1695,6 +1695,8 @@ int follow_pte(struct mm_struct *mm, unsigned long address,
 	       pte_t **ptepp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot,
 		resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 5efa07fb6cdc..e8a145505b69 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4741,7 +4741,12 @@ EXPORT_SYMBOL_GPL(follow_pte);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * This function does not allow the caller to read the permissions
  * of the PTE. Do not use it.
@@ -4765,7 +4770,32 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	pte_unmap_unlock(ptep, ptl);
 	return 0;
 }
-EXPORT_SYMBOL(follow_pfn);
+EXPORT_SYMBOL_GPL(follow_pfn);
+
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	if (IS_ENABLED(CONFIG_STRICT_FOLLOW_PFN)) {
+		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+		return -EINVAL;
+	}
+
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
diff --git a/mm/nommu.c b/mm/nommu.c
index 5c9ab799c0e6..1dc983f50e2c 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -130,7 +130,32 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	*pfn = address >> PAGE_SHIFT;
 	return 0;
 }
-EXPORT_SYMBOL(follow_pfn);
+EXPORT_SYMBOL_GPL(follow_pfn);
+
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	if (IS_ENABLED(CONFIG_STRICT_FOLLOW_PFN)) {
+		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+		return -EINVAL;
+	}
+
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);

 LIST_HEAD(vmap_area_list);
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").

+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe manner. Examples include v4l userptr for zero-copy
+	  buffer sharing.
+
+	  If this option is switched on, such access is rejected. Only disable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
On Tue, Mar 16, 2021 at 04:33:01PM +0100, Daniel Vetter wrote:
Way back it was a reasonable assumption that iomem mappings never change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carveouts to
cma regions. This means if we miss the unmap, the pfn might contain pagecache or anon memory (well, anything allocated with __GFP_MOVABLE)
- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")
Accessing pfns obtained from ptes without holding all the locks is therefore no longer a good idea.
Unfortunately there are some users where this is not fixable (like v4l userptr of iomem mappings) or involves a pile of work (vfio type1 iommu). For now, annotate these as unsafe and splat appropriately.
This patch adds an unsafe_follow_pfn, which later patches will then roll out to all appropriate places.
Also mark up follow_pfn as EXPORT_SYMBOL_GPL. The only safe way for drivers/modules to use it is together with an mmu_notifier, and that's all _GPL stuff.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
The media model assumes that buffers are all preallocated, so that when a media pipeline is running we never miss a deadline because the buffers aren't allocated or available.
This means we cannot fix the v4l follow_pfn usage through an mmu_notifier without breaking how this all works. The only real fix is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and tell everyone to cut over to dma-buf memory sharing for zerocopy.
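For illustration, a rough userspace sketch of what the dma-buf path looks like (hypothetical fds, not part of this patch; the dma-buf fd would come from the producing device, e.g. via DRM PRIME):

#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int demo_queue_dmabuf(int video_fd, int dmabuf_fd)
{
	struct v4l2_requestbuffers req = {
		.count  = 4,
		.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
		.memory = V4L2_MEMORY_DMABUF,	/* instead of V4L2_MEMORY_USERPTR */
	};
	struct v4l2_buffer buf = {
		.index  = 0,
		.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
		.memory = V4L2_MEMORY_DMABUF,
	};

	if (ioctl(video_fd, VIDIOC_REQBUFS, &req) < 0)
		return -1;

	buf.m.fd = dmabuf_fd;	/* the buffer travels as a dma-buf fd */
	return ioctl(video_fd, VIDIOC_QBUF, &buf);
}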
Userptr for normal memory will keep working as-is; this only affects the zerocopy userptr usage enabled in 50ac952d2263 ("[media] videobuf2-dma-sg: Support io userptr operations on io memory").
Acked-by: Tomasz Figa <tfiga@chromium.org>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--
v3:
- Reference the commit that enabled the zerocopy userptr use case to
  make it abundantly clear that this patch only affects that, and not
  normal memory userptr. The old commit message already explained that
  normal memory userptr is unaffected, but I guess that was not clear
  enough.
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index a0e65481a201..1a82ec13ea00 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -70,7 +70,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			break;

 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-			err = follow_pfn(vma, start, &nums[ret]);
+			err = unsafe_follow_pfn(vma, start, &nums[ret]);
 			if (err) {
 				if (ret == 0)
 					ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	user_address = untagged_baddr;

 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
-		ret = follow_pfn(vma, user_address, &this_pfn);
+		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
 		if (ret)
 			break;
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger picture. But since it does leak the pte beyond where it drops the pt lock, without anything else like an mmu notifier guaranteeing coherence, the problem is at least clearly visible in the vfio code. So good enough with me.
I've decided to keep the explanation that after dropping the pt lock you must have an mmu notifier if you keep using the pte somehow, by adjusting it and moving it into the kerneldoc for the new follow_pte() function.
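As a reference for what that kerneldoc asks for, a minimal sketch of a well-behaved short-term caller (hypothetical, not from this patch):

static int demo_peek_pfn(struct vm_area_struct *vma, unsigned long uaddr,
			 unsigned long *pfn, bool *writable)
{
	spinlock_t *ptl;
	pte_t *ptep;
	int ret;

	ret = follow_pte(vma->vm_mm, uaddr, &ptep, &ptl);
	if (ret)
		return ret;

	/* the pte is only stable while ptl is held ... */
	*pfn = pte_pfn(*ptep);
	*writable = pte_write(*ptep);
	pte_unmap_unlock(ptep, ptl);
	/* ... so don't use the pfn past here without an mmu notifier */

	return 0;
}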
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/mm.h |  2 --
 mm/memory.c        | 26 +++++---------------------
 mm/nommu.c         | 13 +------------
 3 files changed, 6 insertions(+), 35 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index caec8b25d66f..304588e2f829 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1693,8 +1693,6 @@ int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
 			  pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pte(struct mm_struct *mm, unsigned long address,
 	       pte_t **ptepp, spinlock_t **ptlp);
-int follow_pfn(struct vm_area_struct *vma, unsigned long address,
-	unsigned long *pfn);
 int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
 		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index e8a145505b69..317e653c8aeb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4724,7 +4724,10 @@ int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
  * should be taken for read.
  *
  * KVM uses this function. While it is arguably less bad than ``follow_pfn``,
- * it is not a good general-purpose API.
+ * it is not a good general-purpose API: If callers use the pte after they've
+ * unlocked @ptlp they must ensure coherency with pte updates by using a
+ * &mmu_notifier to follow updates. Any caller not following these requirements
+ * must use unsafe_follow_pfn() instead.
  *
  * Return: zero on success, -ve otherwise.
  */
@@ -4735,25 +4738,7 @@ int follow_pte(struct mm_struct *mm, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(follow_pte);

-/**
- * follow_pfn - look up PFN at a user virtual address
- * @vma: memory mapping
- * @address: user virtual address
- * @pfn: location to store found PFN
- *
- * Only IO mappings and raw PFN mappings are allowed. Note that callers must
- * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
- * If this is not feasible, or the access to the @pfn is only very short term,
- * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
- * the access instead. Any caller not following these requirements must use
- * unsafe_follow_pfn() instead.
- *
- * This function does not allow the caller to read the permissions
- * of the PTE. Do not use it.
- *
- * Return: zero and the pfn at @pfn on success, -ve otherwise.
- */
-int follow_pfn(struct vm_area_struct *vma, unsigned long address,
+static int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn)
 {
 	int ret = -EINVAL;
@@ -4770,7 +4755,6 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	pte_unmap_unlock(ptep, ptl);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(follow_pfn);

 /**
  * unsafe_follow_pfn - look up PFN at a user virtual address
diff --git a/mm/nommu.c b/mm/nommu.c
index 1dc983f50e2c..cee29d0791b3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -111,17 +111,7 @@ unsigned int kobjsize(const void *objp)
 	return page_size(page);
 }

-/**
- * follow_pfn - look up PFN at a user virtual address
- * @vma: memory mapping
- * @address: user virtual address
- * @pfn: location to store found PFN
- *
- * Only IO mappings and raw PFN mappings are allowed.
- *
- * Returns zero and the pfn at @pfn on success, -ve otherwise.
- */
-int follow_pfn(struct vm_area_struct *vma, unsigned long address,
+static int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn)
 {
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
@@ -130,7 +120,6 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	*pfn = address >> PAGE_SHIFT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(follow_pfn);

 /**
  * unsafe_follow_pfn - look up PFN at a user virtual address
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger picture. But since it does leak the pte beyond where it drops the pt lock, without anything else like an mmu notifier guaranteeing coherence, the problem is at least clearly visible in the vfio code. So good enough with me.
I've decided to keep the explanation that after dropping the pt lock you must have an mmu notifier if you keep using the pte somehow by adjusting it and moving it into the kerneldoc for the new follow_pte() function.
I think this is the right thing to do.
Alex is working on fixing VFIO and while kvm is still racy using follow pte, I think they are working on it too?
Jason
On 24/03/21 13:52, Jason Gunthorpe wrote:
I think this is the right thing to do.
Alex is working on fixing VFIO and while kvm is still racy using follow pte, I think they are working on it too?
Yeah, or at least we have a plan.
Paolo
On Wed, Mar 24, 2021 at 09:52:11AM -0300, Jason Gunthorpe wrote:
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger picture. But since it does leak the pte beyond where it drops the pt lock, without anything else like an mmu notifier guaranteeing coherence, the problem is at least clearly visible in the vfio code. So good enough with me.
I've decided to keep the explanation that after dropping the pt lock you must have an mmu notifier if you keep using the pte somehow by adjusting it and moving it into the kerneldoc for the new follow_pte() function.
I think this is the right thing to do.
Was just about to smash this into the topic branch for testing in linux-next. Feel like an ack on the series, or at least the two mm patches? -Daniel
On Wed, Mar 24, 2021 at 08:17:10PM +0100, Daniel Vetter wrote:
On Wed, Mar 24, 2021 at 09:52:11AM -0300, Jason Gunthorpe wrote:
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger picture. But since it does leak the pte beyond where it drops the pt lock, without anything else like an mmu notifier guaranteeing coherence, the problem is at least clearly visible in the vfio code. So good enough with me.
I've decided to keep the explanation that after dropping the pt lock you must have an mmu notifier if you keep using the pte somehow by adjusting it and moving it into the kerneldoc for the new follow_pte() function.
I think this is the right thing to do.
Was just about to smash this into the topic branch for testing in linux-next. Feel like an ack on the series, or at least the two mm patches?
Pushed them to my topic branch for a bit of testing in linux-next, hopefully goes all fine for a pull for 5.13. -Daniel
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger
vfio and kvm
picture. But since it does leak the pte beyond where it drops the pt lock, without anything else like an mmu notifier guaranteeing coherence, the problem is at least clearly visible in the vfio code. So good enough with me.
I've decided to keep the explanation that after dropping the pt lock you must have an mmu notifier if you keep using the pte somehow by adjusting it and moving it into the kerneldoc for the new follow_pte() function.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
On Mon, Mar 29, 2021 at 10:31:01AM -0300, Jason Gunthorpe wrote:
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger
vfio and kvm
Hm, I thought kvm is non-racy due to the mmu notifier catching races?
picture. But since it does leak the pte beyond where it drops the pt lock, without anything else like an mmu notifier guaranteeing coherence, the problem is at least clearly visible in the vfio code. So good enough with me.
I've decided to keep the explanation that after dropping the pt lock you must have an mmu notifier if you keep using the pte somehow by adjusting it and moving it into the kerneldoc for the new follow_pte() function.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Thanks for your r-b tags, I'll add them. -Daniel
On 08/04/21 12:05, Daniel Vetter wrote:
On Mon, Mar 29, 2021 at 10:31:01AM -0300, Jason Gunthorpe wrote:
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger
vfio and kvm
Hm, I thought kvm is non-racy due to the mmu notifier catching races?
No, but the plan is indeed to have some struct for each page that uses follow_pfn and update it from the MMU notifiers.
Paolo
On Thu, Apr 08, 2021 at 01:40:59PM +0200, Paolo Bonzini wrote:
On 08/04/21 12:05, Daniel Vetter wrote:
On Mon, Mar 29, 2021 at 10:31:01AM -0300, Jason Gunthorpe wrote:
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use follow_pte()")) have lost their callsites of follow_pfn(). All the other ones have been switched over to unsafe_follow_pfn because they cannot be fixed without breaking userspace api.
Arguably the vfio code is still racy, but that's kinda a bigger
vfio and kvm
Hm, I thought kvm is non-racy due to the mmu notifier catching races?
No, but the plan is indeed to have some struct for each page that uses follow_pfn and update it from the MMU notifiers.
Thanks for clarifying, I've fixed the commit message to mention both vfio and kvm as Jason suggested. I didn't know that the follow_pte usage in kvm still has some gaps wrt what's invalidated with mmu notifiers.
Thanks, Daniel