[PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking
Koenig, Christian
Christian.Koenig at amd.com
Thu Oct 17 16:47:20 UTC 2019
Sending once more as text.
Am 17.10.19 um 18:26 schrieb Yang, Philip:
> On 2019-10-17 4:54 a.m., Christian König wrote:
>> Am 16.10.19 um 18:04 schrieb Jason Gunthorpe:
>>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
>>>> Am 15.10.19 um 20:12 schrieb Jason Gunthorpe:
>>>>> From: Jason Gunthorpe <jgg at mellanox.com>
>>>>>
>>>>> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp,
>>>>> hfi1,
>>>>> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
>>>>> they only use invalidate_range_start/end and immediately check the
>>>>> invalidating range against some driver data structure to tell if the
>>>>> driver is interested. Half of them use an interval_tree, the others are
>>>>> simple linear search lists.
>>>>>
>>>>> Of the ones I checked they largely seem to have various kinds of races,
>>>>> bugs and poor implementation. This is a result of the complexity in how
>>>>> the notifier interacts with get_user_pages(). It is extremely
>>>>> difficult to
>>>>> use it correctly.
>>>>>
>>>>> Consolidate all of this code together into the core mmu_notifier and
>>>>> provide a locking scheme similar to hmm_mirror that allows the user to
>>>>> safely use get_user_pages() and reliably know if the page list still
>>>>> matches the mm.
>>>> That sounds really good, but could you outline for a moment how that is
>>>> archived?
>>> It uses the same basic scheme as hmm and rdma odp, outlined in the
>>> revisions to hmm.rst later on.
>>>
>>> Basically,
>>>
>>> seq = mmu_range_read_begin(&mrn);
>>>
>>> // This is a speculative region
>>> .. get_user_pages()/hmm_range_fault() ..
>> How do we enforce that this get_user_pages()/hmm_range_fault() doesn't
>> see outdated page table information?
>>
>> In other words how the the following race prevented:
>>
>> CPU A CPU B
>> invalidate_range_start()
>> mmu_range_read_begin()
>> get_user_pages()/hmm_range_fault()
>> Updating the ptes
>> invalidate_range_end()
>>
>>
>> I mean get_user_pages() tries to circumvent this issue by grabbing a
>> reference to the pages in question, but that isn't sufficient for the
>> SVM use case.
>>
>> That's the reason why we had this horrible solution with a r/w lock and
>> a linked list of BOs in an interval tree.
>>
>> Regards,
>> Christian.
> get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> called while holding mm->map_sem, so they are always serialized.
Not even remotely.
For calling get_user_pages()/hmm_range_fault() you only need to hold the
mmap_sem in read mode.
And IIRC invalidate_range_start() is sometimes called without holding
the mmap_sem at all.
So again how are they serialized?
Regards,
Christian.
>
> Philip
>>> // Result cannot be derferenced
>>>
>>> take_lock(driver->update);
>>> if (mmu_range_read_retry(&mrn, range.notifier_seq) {
>>> // collision! The results are not correct
>>> goto again
>>> }
>>>
>>> // no collision, and now under lock. Now we can de-reference the
>>> pages/etc
>>> // program HW
>>> // Now the invalidate callback is responsible to synchronize against
>>> changes
>>> unlock(driver->update)
>>>
>>> Basically, anything that was using hmm_mirror correctly transisions
>>> over fairly trivially, just with the modification to store a sequence
>>> number to close that race described in the hmm commit.
>>>
>>> For something like AMD gpu I expect it to transition to use dma_fence
>>> from the notifier for coherency right before it unlocks driver->update.
>>>
>>> Jason
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list