[PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

Sun Oct 20 14:21:42 UTC 2019

On 18.10.19 at 22:36, Jason Gunthorpe wrote:
> On Thu, Oct 17, 2019 at 04:47:20PM +0000, Koenig, Christian wrote:
>
>>> get_user_pages()/hmm_range_fault() and invalidate_range_start() both are
>>> called while holding mm->mmap_sem, so they are always serialized.
>> Not even remotely.
>>
>> For calling get_user_pages()/hmm_range_fault() you only need to hold the
>> mmap_sem in read mode.
> Right
>   
>> And IIRC invalidate_range_start() is sometimes called without holding
>> the mmap_sem at all.
> Yep
>   
>> So again how are they serialized?
> The 'driver lock' thing does it, read the hmm documentation, the hmm
> approach is basically the only approach that was correct of all the
> drivers..

Well, that's what I did, but what HMM does still doesn't look correct
to me.

> So long as the 'driver lock' is held the range cannot become
> invalidated as the 'driver lock' prevents progress of invalidation.

Correct, but the problem is it doesn't wait for ongoing operations to 
complete.

See I'm talking about the following case:

Thread A                 Thread B
invalidate_range_start()
                         mmu_range_read_begin()
                         get_user_pages()/hmm_range_fault()
                         grab_driver_lock()
Updating the ptes
invalidate_range_end()

As far as I can see, the driver lock is taken in invalidate_range_start()
to make sure that we can't start any invalidation while the driver is
using the pages for a command submission.

But the pages we got from get_user_pages()/hmm_range_fault() might not
be up to date, because the driver lock is dropped again in
invalidate_range_start() and not in invalidate_range_end(). So Thread B
can grab the driver lock before Thread A has updated the ptes.

> Holding the driver lock and using the seq based mmu_range_read_retry()
> tells if the previous unlocked get_user_pages() is still valid or
> needs to be discarded.
>
> So it doesn't matter if get_user_pages() races or not, the result is not
> to be used until the driver lock is held and mmu_range_read_retry()
> called, which provides the coherence.
>
> It is the usual seqlock pattern.
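
The pattern described above can be modelled in user space roughly as
follows. This is only a sketch under assumptions: none of these names are
kernel APIs (struct range_model, model_invalidate(), model_read_begin(),
model_read_retry(), model_snapshot() and model_race() are hypothetical
stand-ins), and the driver lock is reduced to a flag so that one
interleaving can be replayed deterministically in a single thread. It
only covers an invalidation that starts after the read section has begun.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct range_model {
	bool locked;        /* models the 'driver lock' (flag only) */
	unsigned long seq;  /* bumped by every invalidation */
	int backing;        /* models what the CPU page table points at */
};

static void model_lock(struct range_model *r)
{
	assert(!r->locked);
	r->locked = true;
}

static void model_unlock(struct range_model *r)
{
	assert(r->locked);
	r->locked = false;
}

/* Models an invalidation: bump seq under the lock, then change the backing. */
static void model_invalidate(struct range_model *r, int new_backing)
{
	model_lock(r);
	r->seq++;
	model_unlock(r);
	r->backing = new_backing;
}

/* Models the start of a read section: sample the sequence number. */
static unsigned long model_read_begin(const struct range_model *r)
{
	return r->seq;
}

/* Must be called with the driver lock held. */
static bool model_read_retry(const struct range_model *r, unsigned long seq)
{
	assert(r->locked);
	return r->seq != seq;
}

/* Example race hook: one invalidation that switches the backing to 2. */
static void model_race(struct range_model *r)
{
	model_invalidate(r, 2);
}

/*
 * Models the driver side: unlocked lookup, then validate under the lock.
 * 'race' (may be NULL) is injected once between lookup and validation.
 */
static int model_snapshot(struct range_model *r,
			  void (*race)(struct range_model *))
{
	for (;;) {
		unsigned long seq = model_read_begin(r);
		int pages = r->backing;  /* models the unlocked get_user_pages() */

		if (race) {
			race(r);
			race = NULL;
		}

		model_lock(r);
		if (model_read_retry(r, seq)) {
			model_unlock(r);
			continue;  /* result is stale, look up again */
		}
		/* A real driver would program the device here, under the lock. */
		model_unlock(r);
		return pages;
	}
}
```

With backing initially 1, model_snapshot(&r, model_race) discards the
first lookup and returns 2 from the second pass.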

Well, we don't update the seqlock after the update to the protected data
structure (the page table) has happened, but rather before that.

That doesn't look like the normal pattern for a seqlock to me, and as far
as I can see that is quite a bug in the HMM design/logic.
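
For comparison, the normal seqlock pattern referred to here looks roughly
like the sketch below: the writer bumps the count before and after
touching the data, so the count is odd for the whole duration of the
update. This is a minimal user-space illustration with C11 atomics, not
the kernel's seqlock implementation. All names (struct seq_protected,
seq_write(), seq_read_begin(), seq_read_retry(), seq_read()) are
hypothetical, a single writer is assumed, and sequentially consistent
atomics are used for simplicity rather than speed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example type; the writer keeps a == b. */
struct seq_protected {
	atomic_uint seq;  /* odd while a write is in progress */
	atomic_int a;
	atomic_int b;
};

/* Single writer: bump seq before AND after the update. */
static void seq_write(struct seq_protected *s, int v)
{
	unsigned int old = atomic_fetch_add(&s->seq, 1);  /* now odd */

	assert((old & 1u) == 0);  /* no nested or concurrent writer */
	atomic_store(&s->a, v);
	atomic_store(&s->b, v);
	atomic_fetch_add(&s->seq, 1);  /* now even: update complete */
}

/* Wait until no write is in progress, then sample the count. */
static unsigned int seq_read_begin(struct seq_protected *s)
{
	unsigned int seq;

	while ((seq = atomic_load(&s->seq)) & 1u)
		;  /* writer active, spin */
	return seq;
}

/* True if a write started or finished since seq_read_begin(). */
static bool seq_read_retry(struct seq_protected *s, unsigned int start)
{
	return atomic_load(&s->seq) != start;
}

/* Reader: repeat until a consistent snapshot was observed. */
static int seq_read(struct seq_protected *s)
{
	unsigned int start;
	int a, b;

	do {
		start = seq_read_begin(s);
		a = atomic_load(&s->a);
		b = atomic_load(&s->b);
	} while (seq_read_retry(s, start));

	assert(a == b);
	return a;
}
```

The point of the second increment is that a reader which starts in the
middle of an update either waits in seq_read_begin() or fails
seq_read_retry(), so it never uses data sampled before the update has
finished.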

Regards,
Christian.

>
> Jason