[PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking
Jason Gunthorpe
jgg at mellanox.com
Mon Oct 21 13:57:51 UTC 2019
On Sun, Oct 20, 2019 at 02:21:42PM +0000, Koenig, Christian wrote:
> Am 18.10.19 um 22:36 schrieb Jason Gunthorpe:
> > On Thu, Oct 17, 2019 at 04:47:20PM +0000, Koenig, Christian wrote:
> >
> >>> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> >>> called while holding mm->map_sem, so they are always serialized.
> >> Not even remotely.
> >>
> >> For calling get_user_pages()/hmm_range_fault() you only need to hold the
> >> mmap_sem in read mode.
> > Right
> >
> >> And IIRC invalidate_range_start() is sometimes called without holding
> >> the mmap_sem at all.
> > Yep
> >
> >> So again how are they serialized?
> > The 'driver lock' thing does it, read the hmm documentation, the hmm
> > approach is basically the only approach that was correct of all the
> > drivers..
>
> Well that's what I've did, but what HMM does still doesn't looks correct
> to me.
It has a bug, but the basic flow seems to work.
https://patchwork.kernel.org/patch/11191
> > So long as the 'driver lock' is held the range cannot become
> > invalidated as the 'driver lock' prevents progress of invalidation.
>
> Correct, but the problem is it doesn't wait for ongoing operations to
> complete.
>
> See I'm talking about the following case:
>
> Thread A Thread B
> invalidate_range_start()
> mmu_range_read_begin()
> get_user_pages()/hmm_range_fault()
> grab_driver_lock()
> Updating the ptes
> invalidate_range_end()
>
> As far as I can see in invalidate_range_start() the driver lock is taken
> to make sure that we can't start any invalidation while the driver is
> using the pages for a command submission.
Again, this uses the seqlock like scheme *and* the driver lock.
In this case after grab_driver_lock() mmu_range_read_retry() will
return false if Thread A has progressed to 'updating the ptes.
For instance here is how the concurrency resolves for retry:
CPU1 CPU2
seq = mmu_range_read_begin()
invalidate_range_start()
invalidate_seq++
get_user_pages()/hmm_range_fault()
grab_driver_lock()
ungrab_driver_lock()
grab_driver_lock()
seq != invalidate_seq, so retry
Updating the ptes
invalidate_range_end()
invalidate_seq++
And here is how it resolves for blocking:
CPU1 CPU2
seq = mmu_range_read_begin()
invalidate_range_start()
get_user_pages()/hmm_range_fault()
grab_driver_lock()
seq == invalidate_seq, so conitnue
invalidate_seq++
ungrab_driver_lock()
grab_driver_lock()
// Cannot progress to 'Updating the ptes' while the drive_lock is held
ungrab_driver_lock()
Updating the ptes
invalidate_range_end()
invalidate_seq++
For the above I've simplified the mechanics of the invalidate_seq, you
need to look through the patch to see how it actually works.
> Well we don't update the seqlock after the update to the protected data
> structure (the page table) happened, but rather before that.
??? This is what mn_itree_inv_end() does, it is called by
invalidate_range_end
> That doesn't looks like the normal patter for a seqlock to me and as far
> as I can see that is quite a bug in the HMM design/logic.
Well, hmm has a bug because it doesn't use a seqlock pattern, see the
above URL.
One of the motivations for this work is to squash that bug by adding a
seqlock like pattern. But the basic hmm flow and collision-retry
approach seems sound.
Do you see a problem with this patch?
Jason
More information about the amd-gfx
mailing list