[PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

Jason Gunthorpe jgg@mellanox.com
Mon Oct 21 13:57:51 UTC 2019


On Sun, Oct 20, 2019 at 02:21:42PM +0000, Koenig, Christian wrote:
> Am 18.10.19 um 22:36 schrieb Jason Gunthorpe:
> > On Thu, Oct 17, 2019 at 04:47:20PM +0000, Koenig, Christian wrote:
> >
> >>> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> >>> called while holding mm->mmap_sem, so they are always serialized.
> >> Not even remotely.
> >>
> >> For calling get_user_pages()/hmm_range_fault() you only need to hold the
> >> mmap_sem in read mode.
> > Right
> >   
> >> And IIRC invalidate_range_start() is sometimes called without holding
> >> the mmap_sem at all.
> > Yep
> >   
> >> So again how are they serialized?
> > The 'driver lock' thing does it, read the hmm documentation, the hmm
> > approach is basically the only approach that was correct of all the
> > drivers..
> 
> Well that's what I did, but what HMM does still doesn't look correct 
> to me.

It has a bug, but the basic flow seems to work.

https://patchwork.kernel.org/patch/11191

> > So long as the 'driver lock' is held the range cannot become
> > invalidated as the 'driver lock' prevents progress of invalidation.
> 
> Correct, but the problem is it doesn't wait for ongoing operations to 
> complete.
>
> See I'm talking about the following case:
> 
> Thread A    Thread B
> invalidate_range_start()
>                      mmu_range_read_begin()
>                      get_user_pages()/hmm_range_fault()
>                      grab_driver_lock()
> Updating the ptes
> invalidate_range_end()
> 
> As far as I can see in invalidate_range_start() the driver lock is taken 
> to make sure that we can't start any invalidation while the driver is 
> using the pages for a command submission.

Again, this uses the seqlock-like scheme *and* the driver lock.

In this case, after grab_driver_lock(), mmu_range_read_retry() will
return true (forcing a retry) if Thread A has progressed to 'updating
the ptes'.

For instance here is how the concurrency resolves for retry:

       CPU1                                CPU2
                                  seq = mmu_range_read_begin()
invalidate_range_start()
  invalidate_seq++ 
                                  get_user_pages()/hmm_range_fault()
  grab_driver_lock()
  ungrab_driver_lock()
                                  grab_driver_lock()
                                  seq != invalidate_seq, so retry

  Updating the ptes

invalidate_range_end()
  invalidate_seq++


And here is how it resolves for blocking:

       CPU1                                CPU2
                                  seq = mmu_range_read_begin()
invalidate_range_start()
                                  get_user_pages()/hmm_range_fault()

                                  grab_driver_lock()
                                  seq == invalidate_seq, so continue
  invalidate_seq++ 
                                  ungrab_driver_lock()
  grab_driver_lock()
  // Cannot progress to 'Updating the ptes' while the driver_lock is held
  ungrab_driver_lock()
  Updating the ptes

invalidate_range_end()
  invalidate_seq++

For the above I've simplified the mechanics of the invalidate_seq; you
need to look through the patch to see how it actually works.

> Well we don't update the seqlock after the update to the protected data 
> structure (the page table) happened, but rather before that.

??? This is what mn_itree_inv_end() does; it is called by
invalidate_range_end().

> That doesn't look like the normal pattern for a seqlock to me, and as far 
> as I can see that is quite a bug in the HMM design/logic.

Well, hmm has a bug because it doesn't use a seqlock pattern, see the
above URL.

One of the motivations for this work is to squash that bug by adding a
seqlock-like pattern. But the basic hmm flow and collision-retry
approach seems sound.

Do you see a problem with this patch?

Jason


More information about the amd-gfx mailing list