[PATCH v2 12/15] drm/amdgpu: Call find_vma under mmap_sem

Jason Gunthorpe jgg at mellanox.com
Tue Oct 29 17:19:12 UTC 2019

On Tue, Oct 29, 2019 at 04:28:43PM +0000, Kuehling, Felix wrote:
> On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg at mellanox.com>
> >
> > find_vma() must be called under the mmap_sem, reorganize this code to
> > do the vma check after entering the lock.
> >
> > Further, fix the unlocked use of struct task_struct's mm, instead use
> > the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
> > must be converted to a mm_get before acquiring mmap_sem or calling
> > find_vma().
> >
> > Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
> > Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads")
> > Cc: Alex Deucher <alexander.deucher at amd.com>
> > Cc: Christian K├Ânig <christian.koenig at amd.com>
> > Cc: David (ChunMing) Zhou <David1.Zhou at amd.com>
> > Cc: amd-gfx at lists.freedesktop.org
> > Signed-off-by: Jason Gunthorpe <jgg at mellanox.com>
> One question inline to confirm my understanding. Otherwise this patch is
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


> > -	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> > -		vma->vm_file)) {
> > -		r = -EPERM;
> > -		goto out;
> > -	}
> > +	mm = mirror->hmm->mmu_notifier.mm;
> > +	if (!mmget_not_zero(mm)) /* Happens during process shutdown */
> This works because mirror->hmm->mmu_notifier holds an mmgrab reference 
> to the mm?

Yes, this makes sure the mm pointer remains valid

> So the MM will not just go away, but if the mmget refcount is 0, it
> means the mm is marked for destruction and shouldn't be used any
> more.

Not just marked for destruction, but that another thread is
progressing or finished release().

The other detail here is that in general you can't get the mmap_sem
without also having a mmget as exit_mmap() does not lock the mmap_sem
in some places where it alters the datastructures. ie racing
find_vma() with exit_mmap() is not allowed.

This means we have to hold the mmget across the hmm_range_fault(), but
we can drop the mmget and then test mmu_range_read_retry() under the
driver lock. It will return true if the mmget refcount has gone to
zero in the mean time.

But I think this is probably a poor driver design, a driver should
just hold the mmget() until it has completed establishing the shadow
PTEs, as it is hard to see a reason not to..


More information about the dri-devel mailing list