[PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

Fri Jun 7 21:37:07 UTC 2019

On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg at mellanox.com>
> 
> hmm_release() is called exactly once per hmm. ops->release() cannot
> accidentally trigger any action that would recurse back onto
> hmm->mirrors_sem.
> 
> This fixes a use after-free race of the form:
> 
>         CPU0                                   CPU1
>                                             hmm_release()
>                                               up_write(&hmm->mirrors_sem);
>   hmm_mirror_unregister(mirror)
>    down_write(&hmm->mirrors_sem);
>    up_write(&hmm->mirrors_sem);
>    kfree(mirror)
>                                               mirror->ops->release(mirror)
> 
> The only user we have today for ops->release is an empty function, so this
> is unambiguously safe.
> 
> As a consequence of plugging this race drivers are not allowed to
> register/unregister mirrors from within a release op.
> 
> Signed-off-by: Jason Gunthorpe <jgg at mellanox.com>

I agree with the analysis above but I'm not sure that release() will
always be an empty function. It might be more efficient to write back
all data migrated to a device "in one pass" instead of relying
on unmap_vmas() calling hmm_start_range_invalidate() per VMA.

I think the bigger issue is potential deadlocks while calling
sync_cpu_device_pagetables() and tasks calling hmm_mirror_unregister():

Say you have three threads:
- Thread A is in try_to_unmap(), either without holding mmap_sem or with
mmap_sem held for read.
- Thread B has some unrelated driver calling hmm_mirror_unregister().
This doesn't require mmap_sem.
- Thread C is about to call migrate_vma().

Thread A                Thread B                 Thread C
try_to_unmap            hmm_mirror_unregister    migrate_vma
----------------------  -----------------------  ----------------------
hmm_invalidate_range_start
down_read(mirrors_sem)
                         down_write(mirrors_sem)
                         // Blocked on A
                                                   device_lock
device_lock
// Blocked on C
                                                   migrate_vma()
                                                   hmm_invalidate_range_s
                                                   down_read(mirrors_sem)
                                                   // Blocked on B
                                                   // Deadlock

Perhaps we should consider using SRCU for walking the mirror->list?

> ---
>   mm/hmm.c | 28 +++++++++-------------------
>   1 file changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 709d138dd49027..3a45dd3d778248 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>   	WARN_ON(!list_empty(&hmm->ranges));
>   	mutex_unlock(&hmm->lock);
>   
> -	down_write(&hmm->mirrors_sem);
> -	mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
> -					  list);
> -	while (mirror) {
> -		list_del_init(&mirror->list);
> -		if (mirror->ops->release) {
> -			/*
> -			 * Drop mirrors_sem so the release callback can wait
> -			 * on any pending work that might itself trigger a
> -			 * mmu_notifier callback and thus would deadlock with
> -			 * us.
> -			 */
> -			up_write(&hmm->mirrors_sem);
> +	down_read(&hmm->mirrors_sem);
> +	list_for_each_entry(mirror, &hmm->mirrors, list) {
> +		/*
> +		 * Note: The driver is not allowed to trigger
> +		 * hmm_mirror_unregister() from this thread.
> +		 */
> +		if (mirror->ops->release)
>   			mirror->ops->release(mirror);
> -			down_write(&hmm->mirrors_sem);
> -		}
> -		mirror = list_first_entry_or_null(&hmm->mirrors,
> -						  struct hmm_mirror, list);
>   	}
> -	up_write(&hmm->mirrors_sem);
> +	up_read(&hmm->mirrors_sem);
>   
>   	hmm_put(hmm);
>   }
> @@ -287,7 +277,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
>   	struct hmm *hmm = mirror->hmm;
>   
>   	down_write(&hmm->mirrors_sem);
> -	list_del_init(&mirror->list);
> +	list_del(&mirror->list);
>   	up_write(&hmm->mirrors_sem);
>   	hmm_put(hmm);
>   	memset(&mirror->hmm, POISON_INUSE, sizeof(mirror->hmm));
>