[PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

John Hubbard jhubbard at nvidia.com
Fri Jun 7 03:47:28 UTC 2019


On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg at mellanox.com>
> 
> hmm_release() is called exactly once per hmm. ops->release() cannot
> accidentally trigger any action that would recurse back onto
> hmm->mirrors_sem.
> 
> This fixes a use after-free race of the form:
> 
>        CPU0                                   CPU1
>                                            hmm_release()
>                                              up_write(&hmm->mirrors_sem);
>  hmm_mirror_unregister(mirror)
>   down_write(&hmm->mirrors_sem);
>   up_write(&hmm->mirrors_sem);
>   kfree(mirror)
>                                              mirror->ops->release(mirror)
> 
> The only user we have today for ops->release is an empty function, so this
> is unambiguously safe.
> 
> As a consequence of plugging this race drivers are not allowed to
> register/unregister mirrors from within a release op.
> 
> Signed-off-by: Jason Gunthorpe <jgg at mellanox.com>
> ---
>  mm/hmm.c | 28 +++++++++-------------------
>  1 file changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 709d138dd49027..3a45dd3d778248 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  	WARN_ON(!list_empty(&hmm->ranges));
>  	mutex_unlock(&hmm->lock);
>  
> -	down_write(&hmm->mirrors_sem);
> -	mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
> -					  list);
> -	while (mirror) {
> -		list_del_init(&mirror->list);
> -		if (mirror->ops->release) {
> -			/*
> -			 * Drop mirrors_sem so the release callback can wait
> -			 * on any pending work that might itself trigger a
> -			 * mmu_notifier callback and thus would deadlock with
> -			 * us.
> -			 */
> -			up_write(&hmm->mirrors_sem);
> +	down_read(&hmm->mirrors_sem);

This is cleaner and simpler, but I suspect it is leading to the deadlock
that Ralph Campbell is seeing in his driver testing. (And in general, holding
a lock during a driver callback usually leads to deadlocks.)

Ralph, is this the one? It's the only place in this patchset where I can
see a lock around a callback to driver code, that wasn't there before. So
I'm pretty sure it is the one...


thanks,
-- 
John Hubbard
NVIDIA


More information about the amd-gfx mailing list