[PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release
Jason Gunthorpe
jgg at ziepe.ca
Fri Jun 7 12:58:42 UTC 2019
On Thu, Jun 06, 2019 at 08:47:28PM -0700, John Hubbard wrote:
> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg at mellanox.com>
> >
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> >
> > This fixes a use after-free race of the form:
> >
> > CPU0 CPU1
> > hmm_release()
> > up_write(&hmm->mirrors_sem);
> > hmm_mirror_unregister(mirror)
> > down_write(&hmm->mirrors_sem);
> > up_write(&hmm->mirrors_sem);
> > kfree(mirror)
> > mirror->ops->release(mirror)
> >
> > The only user we have today for ops->release is an empty function, so this
> > is unambiguously safe.
> >
> > As a consequence of plugging this race drivers are not allowed to
> > register/unregister mirrors from within a release op.
> >
> > Signed-off-by: Jason Gunthorpe <jgg at mellanox.com>
> > mm/hmm.c | 28 +++++++++-------------------
> > 1 file changed, 9 insertions(+), 19 deletions(-)
> >
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 709d138dd49027..3a45dd3d778248 100644
> > +++ b/mm/hmm.c
> > @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> > WARN_ON(!list_empty(&hmm->ranges));
> > mutex_unlock(&hmm->lock);
> >
> > - down_write(&hmm->mirrors_sem);
> > - mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
> > - list);
> > - while (mirror) {
> > - list_del_init(&mirror->list);
> > - if (mirror->ops->release) {
> > - /*
> > - * Drop mirrors_sem so the release callback can wait
> > - * on any pending work that might itself trigger a
> > - * mmu_notifier callback and thus would deadlock with
> > - * us.
> > - */
> > - up_write(&hmm->mirrors_sem);
> > + down_read(&hmm->mirrors_sem);
>
> This is cleaner and simpler, but I suspect it is leading to the deadlock
> that Ralph Campbell is seeing in his driver testing. (And in general, holding
> a lock during a driver callback usually leads to deadlocks.)
I think Ralph has never seen this patch (it is new), so it must be one
of the earlier patches..
> Ralph, is this the one? It's the only place in this patchset where I can
> see a lock around a callback to driver code, that wasn't there before. So
> I'm pretty sure it is the one...
Can you share the lockdep report please?
Thanks,
Jason
More information about the amd-gfx
mailing list