[PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Oct 17 08:54:43 UTC 2019
On 16.10.19 at 18:04, Jason Gunthorpe wrote:
> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>> On 15.10.19 at 20:12, Jason Gunthorpe wrote:
>>> From: Jason Gunthorpe <jgg at mellanox.com>
>>>
>>> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
>>> scif_dma, vhost, gntdev, hmm) are using a common pattern where
>>> they only use invalidate_range_start/end and immediately check the
>>> invalidating range against some driver data structure to tell if the
>>> driver is interested. Half of them use an interval_tree, the others are
>>> simple linear search lists.
>>>
>>> Of the ones I checked they largely seem to have various kinds of races,
>>> bugs and poor implementation. This is a result of the complexity in how
>>> the notifier interacts with get_user_pages(). It is extremely difficult to
>>> use it correctly.
>>>
>>> Consolidate all of this code together into the core mmu_notifier and
>>> provide a locking scheme similar to hmm_mirror that allows the user to
>>> safely use get_user_pages() and reliably know if the page list still
>>> matches the mm.
>> That sounds really good, but could you outline for a moment how that is
>> achieved?
> It uses the same basic scheme as hmm and rdma odp, outlined in the
> revisions to hmm.rst later on.
>
> Basically,
>
> seq = mmu_range_read_begin(&mrn);
>
> // This is a speculative region
> .. get_user_pages()/hmm_range_fault() ..
How do we enforce that this get_user_pages()/hmm_range_fault() doesn't
see outdated page table information?

In other words, how is the following race prevented:
    CPU A                                  CPU B
    invalidate_range_start()
                                           mmu_range_read_begin()
                                           get_user_pages()/hmm_range_fault()
    updating the PTEs
    invalidate_range_end()
I mean get_user_pages() tries to circumvent this issue by grabbing a
reference to the pages in question, but that isn't sufficient for the
SVM use case.
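For illustration, here is a minimal sketch of how a sequence-count scheme
can close exactly this window. This is a simplified model, not the actual
mmu_range_notifier internals from the series; invalidate_seq, mrn_wq and
the function names here are invented:

    /* Toy model: invalidate_seq is made odd by invalidate_range_start()
     * and even again by invalidate_range_end().
     */
    static DECLARE_WAIT_QUEUE_HEAD(mrn_wq);

    struct mrn {
            unsigned long invalidate_seq;  /* odd while an invalidation runs */
    };

    static unsigned long mrn_read_begin(struct mrn *mrn)
    {
            /* If an invalidation is already in flight (the race above),
             * block here until invalidate_range_end() completes, then
             * sample the sequence the retry check will compare against.
             */
            wait_event(mrn_wq, !(READ_ONCE(mrn->invalidate_seq) & 1));
            return READ_ONCE(mrn->invalidate_seq);
    }

    static bool mrn_read_retry(struct mrn *mrn, unsigned long seq)
    {
            /* An invalidation that started after mrn_read_begin() bumped
             * the sequence, so the speculatively gathered pages must be
             * discarded and the whole thing retried.
             */
            return READ_ONCE(mrn->invalidate_seq) != seq;
    }

So in the diagram above the begin side either waits for the running
invalidation to finish, or, if the invalidation starts later, the retry
check under the driver lock fails and the stale pages are never used.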
That's the reason why we had this horrible solution with a r/w lock and
a linked list of BOs in an interval tree.
Regards,
Christian.
> // Result cannot be dereferenced
>
> take_lock(driver->update);
> if (mmu_range_read_retry(&mrn, seq)) {
> // collision! The results are not correct
> goto again
> }
>
> // no collision, and now under lock. Now we can de-reference the pages/etc
> // program HW
> // Now the invalidate callback is responsible to synchronize against changes
> unlock(driver->update)
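Fleshed out into C, a driver fault path built on this pattern might look
roughly like the sketch below. Only mmu_range_read_begin() and
mmu_range_read_retry() come from the series; struct driver, d->update,
program_hw() and the rest are invented placeholders:

    /* Hypothetical fault path: gather pages speculatively, then
     * validate the collection under the driver lock before use.
     */
    static int driver_populate_range(struct driver *d, unsigned long start,
                                     int npages, struct page **pages)
    {
            unsigned long seq;
            int ret;

    again:
            seq = mmu_range_read_begin(&d->mrn);

            /* Speculative region: may race with an invalidation */
            ret = get_user_pages_fast(start, npages, FOLL_WRITE, pages);
            if (ret != npages) {
                    if (ret > 0)
                            release_pages(pages, ret);
                    return ret < 0 ? ret : -EFAULT;
            }

            mutex_lock(&d->update);
            if (mmu_range_read_retry(&d->mrn, seq)) {
                    /* Collision: the page list no longer matches the mm */
                    mutex_unlock(&d->update);
                    release_pages(pages, npages);
                    goto again;
            }

            /* No collision: while d->update is held the pages are known
             * to match the mm, because the invalidate callback takes the
             * same lock.  Safe to program the HW now.
             */
            program_hw(d, start, npages, pages);
            mutex_unlock(&d->update);
            return 0;
    }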
>
> Basically, anything that was using hmm_mirror correctly transitions
> over fairly trivially, just with the modification to store a sequence
> number to close the race described in the hmm commit.
>
> For something like AMD gpu I expect it to transition to use dma_fence
> from the notifier for coherency right before it unlocks driver->update.
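For the notifier side of that, the shape would be something like the
sketch below. The callback signature is abbreviated and d->last_job is
an invented stand-in for whatever fence tracks outstanding HW access to
the range; the point is only the ordering Jason describes, waiting on
the dma_fence right before driver->update is released:

    /* Hypothetical invalidate callback for a dma_fence based driver */
    static bool driver_invalidate(struct mmu_range_notifier *mrn,
                                  const struct mmu_notifier_range *range)
    {
            struct driver *d = container_of(mrn, struct driver, mrn);

            mutex_lock(&d->update);

            /* New faults now either block on d->update or fail the
             * mmu_range_read_retry() check, so the mapping can go.
             */
            unmap_hw_range(d, range->start, range->end);

            /* Coherency: wait for in-flight HW work to finish before
             * the core proceeds to change the PTEs, right before the
             * lock is dropped.
             */
            if (d->last_job)
                    dma_fence_wait(d->last_job, false);

            mutex_unlock(&d->update);
            return true;
    }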
>
> Jason